  • Split sample and interactions in OLS reg

    Dear Statalist,

    I am estimating a model to analyze whether the gap between high- and low-educated fathers in time spent reading with children increased during the Great Recession. I use two time-use surveys, one carried out before the recession (2002-2003) and one during it (2009-2010), and I model the data with OLS regressions. My DV is paternal time (minutes per day) spent reading with the child, and my main IV is a dummy for educational level (highedu: university degree, yes/no), plus some controls. I ran separate regressions for 2002 and 2009, and the gap increases:

    reg interactive_care highedu age age2 i.agekid child2 child3plus adhousemember weekday outsourc unemp i.edtrysp2 i.empsp2 if sampleM==1 & year==1
    reg interactive_care highedu age age2 i.agekid child2 child3plus adhousemember weekday outsourc unemp i.edtrysp2 i.empsp2 if sampleM==1 & year==2

    Code:
    
          Source |       SS           df       MS      Number of obs   =     3,859
    -------------+----------------------------------   F(15, 3843)     =     17.00
           Model |   344213.18        15  22947.5453   Prob > F        =    0.0000
        Residual |  5186151.99     3,843  1349.50611   R-squared       =    0.0622
    -------------+----------------------------------   Adj R-squared   =    0.0586
           Total |  5530365.17     3,858  1433.47983   Root MSE        =    36.736
    
    -------------------------------------------------------------------------------
    interactive~e |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    --------------+----------------------------------------------------------------
          highedu |   3.116415    1.50637     2.07   0.039     .1630547    6.069775
              age |   .9132052   1.082397     0.84   0.399    -1.208921    3.035332
             age2 |  -.0140666   .0134793    -1.04   0.297    -.0404939    .0123606
                  |
           agekid |
        age 5-12  |  -12.11388   1.357326    -8.92   0.000    -14.77503   -9.452735
           child2 |   1.592523    1.32161     1.20   0.228    -.9986006    4.183647
       child3plus |  -.0482715   2.096733    -0.02   0.982    -4.159088    4.062545
    adhousemember |  -4.721231   1.683448    -2.80   0.005    -8.021768   -1.420693
          weekday |   -7.25971   1.247098    -5.82   0.000    -9.704746   -4.814673
         outsourc |   .8800731   1.472389     0.60   0.550    -2.006666    3.766812
            unemp |   9.298752   2.876453     3.23   0.001     3.659233    14.93827
                  |
         edtrysp2 |
               2  |   3.217508   1.733236     1.86   0.063    -.1806433    6.615659
               3  |   7.223192   2.141351     3.37   0.001     3.024898    11.42149
                  |
           empsp2 |
               2  |   1.516971   2.604079     0.58   0.560    -3.588538     6.62248
               3  |  -4.548975   3.198141    -1.42   0.155    -10.81919    1.721241
               4  |  -2.104414   1.374925    -1.53   0.126    -4.800066    .5912383
                  |
            _cons |   10.77885   21.30815     0.51   0.613    -30.99751    52.55521
    -------------------------------------------------------------------------------
    Code:
          Source |       SS           df       MS      Number of obs   =     1,793
    -------------+----------------------------------   F(15, 1777)     =      8.94
           Model |  284081.073        15  18938.7382   Prob > F        =    0.0000
        Residual |  3762506.99     1,777  2117.33652   R-squared       =    0.0702
    -------------+----------------------------------   Adj R-squared   =    0.0624
           Total |  4046588.06     1,792  2258.14066   Root MSE        =    46.015
    
    -------------------------------------------------------------------------------
    interactive~e |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    --------------+----------------------------------------------------------------
          highedu |   6.284536   2.592455     2.42   0.015     1.199953    11.36912
              age |   2.409453   1.907376     1.26   0.207    -1.331485     6.15039
             age2 |  -.0280814   .0232523    -1.21   0.227    -.0736861    .0175233
                  |
           agekid |
        age 5-12  |  -16.83917   2.524653    -6.67   0.000    -21.79077   -11.88757
           child2 |   .9450514   2.392282     0.40   0.693     -3.74693    5.637033
       child3plus |  -7.204184   3.906855    -1.84   0.065     -14.8667    .4583302
    adhousemember |   -9.11387   3.291141    -2.77   0.006    -15.56878   -2.658956
          weekday |  -11.93212   2.232042    -5.35   0.000    -16.30982   -7.554419
         outsourc |   1.347878   3.626624     0.37   0.710    -5.765019    8.460774
            unemp |    4.64624   3.325663     1.40   0.163    -1.876383    11.16886
                  |
         edtrysp2 |
               2  |   2.272788   3.407544     0.67   0.505    -4.410428    8.956003
               3  |    3.28916   3.842698     0.86   0.392    -4.247523    10.82584
                  |
           empsp2 |
               2  |   7.270302   3.064111     2.37   0.018     1.260661    13.27994
               3  |   3.908624    23.1384     0.17   0.866    -41.47273    49.28997
               4  |    2.26103   2.593797     0.87   0.383    -2.826183    7.348243
                  |
            _cons |  -16.61599   38.41347    -0.43   0.665    -91.95632    58.72434
    -------------------------------------------------------------------------------
    As the regression results show, highly educated fathers spent about 3 minutes more per day than less educated fathers in 2002-2003 and about 6 minutes more in 2009-2010. The point estimate of the gap has roughly doubled.

    However, if I pool both surveys and include an interaction term (highedu##year09), the interaction is not significant:

    reg interactive_care highedu##year09 age age2 i.agekid child2 child3plus adhousemember weekday outsourc unemp i.edtrysp2 i.empsp2 if sampleM==1

    Code:
          Source |       SS           df       MS      Number of obs   =     5,652
    -------------+----------------------------------   F(17, 5634)     =     24.17
           Model |  654926.044        17  38525.0614   Prob > F        =    0.0000
        Residual |  8978407.06     5,634  1593.61148   R-squared       =    0.0680
    -------------+----------------------------------   Adj R-squared   =    0.0652
           Total |   9633333.1     5,651  1704.71299   Root MSE        =     39.92
    
    --------------------------------------------------------------------------------
    interactive_~e |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
         1.highedu |   3.477233   1.569883     2.21   0.027     .3996584    6.554808
          1.year09 |   3.078955   1.475245     2.09   0.037     .1869058    5.971003
                   |
    highedu#year09 |
              1 1  |    2.33746   2.456134     0.95   0.341    -2.477509    7.152429
                   |
               age |   1.378432   .9524214     1.45   0.148    -.4886806    3.245545
              age2 |  -.0184018   .0117715    -1.56   0.118    -.0414785    .0046748
                   |
            agekid |
         age 5-12  |  -13.55779   1.219893   -11.11   0.000    -15.94925   -11.16633
            child2 |   1.342821   1.180038     1.14   0.255    -.9705076     3.65615
        child3plus |  -2.365831   1.887097    -1.25   0.210    -6.065268    1.333606
     adhousemember |  -6.111541   1.536433    -3.98   0.000    -9.123542   -3.099541
           weekday |  -8.814364   1.109592    -7.94   0.000    -10.98959   -6.639136
          outsourc |   .8374926   1.398754     0.60   0.549    -1.904603    3.579589
             unemp |   6.945984   2.109734     3.29   0.001     2.810094    11.08188
                   |
          edtrysp2 |
                2  |   2.906014   1.586254     1.83   0.067    -.2036535    6.015682
                3  |   5.941424   1.898214     3.13   0.002     2.220193    9.662655
                   |
            empsp2 |
                2  |   4.248734   1.910134     2.22   0.026     .5041366    7.993332
                3  |  -3.760329   3.394674    -1.11   0.268     -10.4152     2.89454
                4  |  -.8518696   1.243339    -0.69   0.493    -3.289293    1.585554
                   |
             _cons |   1.491899   18.91278     0.08   0.937    -35.58444    38.56824
    --------------------------------------------------------------------------------

    Why do the results differ?

    This is my very first post in the forum, so I apologize in advance if I have made any mistakes. Please let me know and I will be happy to fix them.
    Many thanks in advance.

  • #2
    Your question is clearly posed and the results you show are easily readable because you used code blocks. So congratulations on an excellent first post.

    There are two things to consider here.

    1. The combined regression is not exactly equivalent to the two separate regressions. In the two separate regressions, each variable receives a different coefficient in each year. In the combined regression, every variable except highedu is constrained to have the same coefficient in both years. So you cannot expect the single regression to produce the same results as the two separate regressions: they are different models.

    2. Even just looking at the first two outputs from the two separate regressions, although the absolute difference is pretty large, as you note, if you look at the confidence intervals for highedu in the two models, they clearly overlap considerably. So, even the first two do not make a loud and clear statement about a change from 2002 to 2009.
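
    As a rough check on point 2, the change in the highedu coefficient and an approximate standard error for it can be computed by hand from the two outputs above. This is only a sketch: it assumes the 2002 and 2009 samples are independent (two cross sections) and relies on a normal approximation, and the scalar names are made up for illustration.

    ```stata
    * Coefficients and standard errors of highedu, copied from the two outputs above
    scalar b_2002  = 3.116415
    scalar se_2002 = 1.50637
    scalar b_2009  = 6.284536
    scalar se_2009 = 2.592455

    * Difference between the two coefficients and its standard error,
    * ASSUMING the two samples are independent
    scalar gap_change = scalar(b_2009) - scalar(b_2002)
    scalar se_change  = sqrt(scalar(se_2002)^2 + scalar(se_2009)^2)

    display "change in gap = " %6.3f (scalar(gap_change))
    display "approx. SE    = " %6.3f (scalar(se_change))
    display "approx. z     = " %6.3f (scalar(gap_change)/scalar(se_change))
    display "approx. 95% CI lower = " %6.3f (scalar(gap_change) - 1.96*scalar(se_change))
    display "approx. 95% CI upper = " %6.3f (scalar(gap_change) + 1.96*scalar(se_change))
    ```

    The change is roughly 3.2 minutes with a standard error of about 3.0, so an interval around it comfortably includes zero, which is in line with the overlapping confidence intervals.
    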

    Another question: are these survey data two cross sections, or is the 2002 cohort followed up in 2009? If the data are longitudinal, then you need to use a longitudinal analysis that accounts for intra-respondent correlation.

    Now, what to do? Let me assume, so we can move forward, that the data are not longitudinal.

    In situations like this, we generally rely on the combined model. Sometimes, when there is good reason to believe that the effects of the other variables in the model changed to an important extent between the two survey waves, we would use a combined model in which those other variables are also interacted with the year indicator variable. Several of the variables in your model seem to show large differences in coefficient between the two separate models, so you might want to consider this approach for them. We don't necessarily care about the results associated with those additional interactions, and we may not even bother to look at them carefully, but we include them so that the model can appropriately allow for the differences. Anyway, I would proceed with the combined model, perhaps adding interaction terms for some or all of the other variables. I would also extract from the results the adjusted mean effect of highedu in each of the two years by running
    Code:
    margins year09, dydx(highedu)
    and the actual expected outcomes for both levels of highedu in each year by running
    Code:
    margins highedu#year09
    In reporting these results, I would downplay, if not altogether ignore, the p-values. The null hypothesis of zero difference between 2002 and 2009 is simply not a credible hypothesis. It's a straw man. Moreover, even if you did reject that null hypothesis, it does not follow that you can accept any other particular alternative hypothesis. So the p-value is really pretty useless here. Focus on the actual coefficients (or predicted margins) and their confidence intervals. You then have a point estimate and a range of values that are plausible bounds for the uncertainty of that estimate. If 0 happens to fall within the confidence interval, it does not follow that there is no difference. It just means that the difference is likely to be small, and, regardless, the data available are insufficient to pin it down precisely, indeed, not even sufficient to clearly identify the direction of the effect.
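
    To illustrate the relationship between the separate and combined models, here is a small simulation sketch. It is not the poster's data: the outcome y, the sample size, and every number in the data-generating line are invented for illustration; only the variable names year09, highedu, and age echo the thread. When every covariate is interacted with the year indicator, the interaction coefficient reproduces the difference between the two separate-regression coefficients exactly.

    ```stata
    * Simulated data (all values are made up for illustration)
    clear
    set seed 12345
    set obs 2000
    generate byte   year09  = runiform() < 0.3
    generate byte   highedu = runiform() < 0.4
    generate double age     = 25 + 30*runiform()
    generate double y = 10 + 3*highedu + 3*year09 + 3*highedu*year09 ///
        + (0.2 + 0.3*year09)*age + rnormal(0, 40)

    * Separate regressions by year, storing the highedu coefficient from each
    regress y i.highedu c.age if year09 == 0
    scalar b0 = _b[1.highedu]
    regress y i.highedu c.age if year09 == 1
    scalar b1 = _b[1.highedu]

    * Combined model with ALL covariates interacted with year09
    regress y i.year09##(i.highedu c.age)
    display "separate-model change in gap: " %9.4f (scalar(b1) - scalar(b0))
    display "interaction coefficient:      " %9.4f (_b[1.year09#1.highedu])

    * Adjusted effect of highedu in each year
    margins year09, dydx(highedu)
    ```

    Note the c. prefix on the continuous variable inside the interaction; without it Stata would treat age as categorical. If only some covariates are interacted, the two displayed numbers will generally differ, which is the situation described in point 1.
    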



    Last edited by Clyde Schechter; 06 Oct 2017, 10:36.


    • #3
      Dear Clyde,

      Your response is so good and clarifying! I really appreciate it.

      Many thanks again and best wishes


      • #4
        Excellent answer!
        Last edited by Neyla Tad; 17 Oct 2017, 09:04.


        • #5
          I'm confused by your models. What is the variable you are calling "did?" From the context of your question, it sounds like you are interested in a difference in differences estimator of the effect of something, and "did" is a common abbreviation for "difference in differences." But there is no single variable in a DID model that corresponds to the difference in differences estimator. That estimator is given by the coefficient of the interaction between a pre-post variable that distinguishes the era prior to the intervention from the era after, and a treatment group variable that distinguishes those who were exposed to the intervention from those who were not. So the standard approach would be:

          Code:
          tobit debt pre_post##treatment_group perhaps_other_covariates_, ...
          


          Note that I have used the double ## operator here. This operator tells Stata to expand pre_post##treatment_group into three terms: pre_post, treatment_group, and pre_post#treatment_group (note single # here). The DID estimator of the intervention causal effect is the coefficient of the pre_post#treatment_group interaction term.
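
          As a small runnable sketch of that expansion, the two regressions below fit the same model. It uses the auto dataset that ships with Stata purely as a stand-in; the variable heavy is made up for illustration, and nothing here is a real difference-in-differences design.

          ```stata
          * Stand-in data: Stata's built-in auto dataset
          sysuse auto, clear

          * Hypothetical binary variable, created only for this illustration
          generate byte heavy = weight > 3000

          * The ## operator: main effects plus interaction in one term
          regress price i.foreign##i.heavy

          * The same model written out with the single # interaction
          regress price i.foreign i.heavy i.foreign#i.heavy
          ```

          Both commands report identical coefficients; the second form simply makes explicit that the main effects are in the model alongside the interaction.
          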

          Specifically in response to your final question, once you put the model in these terms, it is absolutely essential to have both pre_post and treatment_group included in the regression. The model is mis-specified without both of those and the interaction coefficient will not be a valid DID estimator without them. There are only two exceptions to this rule that I can think of:

          1. (Common) In panel data estimators, the treatment_group variable will be constant within panels, and hence collinear with the panel fixed effects. In this situation you can omit the plain treatment_group variable. But really, you don't even need to think about this, because if you use pre_post##treatment_group in this context, Stata will recognize the collinearity and will omit the treatment_group term for you automatically.

          2. (Very, very rare) If there is a scientific reason to assert that the two groups start from an identical baseline value of the outcome in the pre-intervention era, then the treatment_group variable coefficient (which estimates the pre-intervention difference between the groups) is constrained to zero, and one way to impose that constraint is to omit it. But in the real world, there is almost never any basis for this constraint and in my career I have never seen a valid application of this. It is a theoretical possibility, but one that never, or almost never, actually arises.


          • #6
            the results you show are easily readable because you used code blocks.
            However I can't read your Stata code-the forum font is too small. Next time, please put code as well as results between CODE delimiters.
            Steve Samuels
            Statistical Consulting

            Stata 14.2


            • #7
              Dear Clyde,

              I only just saw that you replied again to my message, but better late than never! It was nice to see your reply now because I am still working on and off on this research. Below I answer your questions and post new (although quite similar) ones. Thank you very much for taking the time to respond; it is much appreciated.

              Regarding your questions: unfortunately, this is not a panel dataset. It is a repeated cross-sectional time-use survey, in which different individuals are asked similar questions about how they use their time. The time points are 2002 and 2009. Also unfortunately, I think difference-in-differences cannot be applied, because the economic recession hit all the regions at the same time and with the same strength, so there is no "control" group to compare with. Therefore the analysis cannot claim causal effects; it can only describe trends over time and how associations between variables changed during the recession, which can also be informative.

              We saw, as you said, that the gap between higher- and lower-educated parents in time reading to children did not widen during the economic crisis (which is good news, by the way). Now I wonder: did the gender gap in "physical" care time (i.e., changing nappies, feeding, etc.) narrow during the crisis? To estimate that I follow a strategy similar to the one before. Let's see.

              First, in order to see whether maternal time varied during that time, I estimate a pooled OLS model for the sample of mothers.

              Code:
              reg physical_care year09 notworking parttime  highedu notworking_p parttime_p  highedu_p  age age2 child2 child3plus i.agekid adhousemember outsourc weekday ordinaryday if Sample==1 & female==1
              Code:
                    Source |       SS           df       MS      Number of obs   =     5,707
              -------------+----------------------------------   F(16, 5690)     =    160.40
                     Model |  16262600.8        16  1016412.55   Prob > F        =    0.0000
                  Residual |  36057064.9     5,690  6336.91826   R-squared       =    0.3108
              -------------+----------------------------------   Adj R-squared   =    0.3089
                     Total |  52319665.7     5,706  9169.23689   Root MSE        =    79.605
              -------------------------------------------------------------------------------
              physical_care |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
              --------------+----------------------------------------------------------------
                     year09 |   6.514089   2.451322     2.66   0.008     1.708563    11.31961
                 notworking |   29.39478   2.415414    12.17   0.000     24.65965    34.12991
                   parttime |  -.5359691   3.784368    -0.14   0.887    -7.954773    6.882835
                    highedu |   16.74664    2.67177     6.27   0.000     11.50895    21.98432
               notworking_p |  -11.97941   3.615724    -3.31   0.001     -19.0676   -4.891211
                 parttime_p |  -6.404455   10.07338    -0.64   0.525    -26.15212    13.34321
                  highedu_p |   1.173255   2.620091     0.45   0.654    -3.963121     6.30963
                        age |   .2298632   1.937995     0.12   0.906    -3.569345    4.029072
                       age2 |   -.028714   .0251241    -1.14   0.253    -.0779668    .0205387
                     child2 |   5.401378   2.338805     2.31   0.021     .8164294    9.986327
                 child3plus |   5.783494   3.715687     1.56   0.120    -1.500668    13.06766
                            |
                     agekid |
                  age 5-12  |  -84.30545   2.431283   -34.68   0.000    -89.07169   -79.53921
              adhousemember |  -12.02322   3.042572    -3.95   0.000    -17.98782   -6.058616
                   outsourc |   1.225287   2.786685     0.44   0.660    -4.237677    6.688251
                    weekday |   11.56794   2.228676     5.19   0.000     7.198886    15.93699
                ordinaryday |   15.36074   2.672701     5.75   0.000     10.12123    20.60025
                      _cons |   118.1332   36.75558     3.21   0.001     46.07827    190.1882
              -------------------------------------------------------------------------------
              Looking at our main variable of interest, year09, we can see that mothers spent more time on physical care of children in 2009 than in 2002 (about 6.5 minutes per day). Now let's look at fathers. (I think the variable names make quite clear what they are; the "_p" suffix means "partner".)

              Code:
              reg physical_care year09 notworking parttime  highedu notworking_p parttime_p  highedu_p  age age2 child2 child3plus i.agekid adhousemember outsourc weekday ordinaryday if Sample==1 & female==0
              Code:
                    Source |       SS           df       MS      Number of obs   =     5,661
              -------------+----------------------------------   F(16, 5644)     =     76.53
                     Model |  3107283.92        16  194205.245   Prob > F        =    0.0000
                  Residual |  14322087.8     5,644  2537.57757   R-squared       =    0.1783
              -------------+----------------------------------   Adj R-squared   =    0.1759
                     Total |  17429371.7     5,660   3079.3943   Root MSE        =    50.374
              
              -------------------------------------------------------------------------------
              physical_care |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
              --------------+----------------------------------------------------------------
                     year09 |   7.997194   1.555608     5.14   0.000     4.947605    11.04678
                 notworking |   20.06011   2.337274     8.58   0.000     15.47815    24.64207
                   parttime |   1.701226   6.326556     0.27   0.788    -10.70126    14.10371
                    highedu |   7.295091   1.661885     4.39   0.000     4.037157    10.55303
               notworking_p |   -8.08506   1.536962    -5.26   0.000     -11.0981   -5.072025
                 parttime_p |   .5643193   2.399471     0.24   0.814    -4.139565    5.268204
                  highedu_p |   10.73067    1.69639     6.33   0.000     7.405098    14.05625
                        age |   .2348947   1.202115     0.20   0.845    -2.121713    2.591503
                       age2 |  -.0117216   .0148537    -0.79   0.430    -.0408406    .0173973
                     child2 |   3.846812    1.48769     2.59   0.010     .9303687    6.763256
                 child3plus |   .4949186   2.368423     0.21   0.834      -4.1481    5.137937
                            |
                     agekid |
                  age 5-12  |  -31.83281   1.536667   -20.72   0.000    -34.84526   -28.82035
              adhousemember |  -6.118985   1.933806    -3.16   0.002    -9.909987   -2.327982
                   outsourc |   3.348145   1.762696     1.90   0.058    -.1074166    6.803706
                    weekday |  -8.513854   1.422443    -5.99   0.000    -11.30239   -5.725318
                ordinaryday |   1.937963   1.681042     1.15   0.249    -1.357525     5.23345
                      _cons |    50.5189   23.92205     2.11   0.035     3.622479    97.41531
              -------------------------------------------------------------------------------

              Looking at the same variable again, we see that fathers also increased their time, by slightly more than mothers did (about 8 versus 6.5 minutes). However, the increases for the two genders are quite similar and, as Clyde said, the two confidence intervals overlap. Therefore we could say that the gender gap did not narrow significantly (quite unfortunately, by the way). I also ran a similar analysis including both genders in one regression and interacting year09 with female, with a very similar result (a gap about 3 minutes smaller, non-significant). So far, I think, everything is OK.

              But this is where I run into some issues. Following your advice, Clyde, I interact some of the variables in the analysis with year09. By doing so I can also address questions like: did the effect of not working vary during the crisis? That is, did non-working fathers spend more time in 2009? And mothers? Or, did parents spend more time on weekdays during the recession? And so on. So I do this:

              Code:
              reg physical_care year09##(notworking parttime highedu highedu_p agekid weekday) notworking_p parttime_p age age2 child2 child3plus  adhousemember outsourc ordinaryday if Sample==1 & female==0
              Code:
                    Source |       SS           df       MS      Number of obs   =     5,661
              -------------+----------------------------------   F(22, 5638)     =     57.26
                     Model |  3183048.19        22  144684.009   Prob > F        =    0.0000
                  Residual |  14246323.5     5,638  2526.83993   R-squared       =    0.1826
              -------------+----------------------------------   Adj R-squared   =    0.1794
                     Total |  17429371.7     5,660   3079.3943   Root MSE        =    50.268
              
              ------------------------------------------------------------------------------------
                   physical_care |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
              -------------------+----------------------------------------------------------------
                        1.year09 |   16.79016    3.27612     5.13   0.000      10.3677    23.21261
                    1.notworking |   15.67719   3.231755     4.85   0.000     9.341709    22.01268
                      1.parttime |   11.50659   10.77043     1.07   0.285    -9.607595    32.62078
                                 |
                         highedu |
                Higher educated  |   8.394986    2.04855     4.10   0.000      4.37904    12.41093
                                 |
                       highedu_p |
                Higher educated  |   10.81093   2.089569     5.17   0.000     6.714571    14.90729
                                 |
                          agekid |
                       age 5-12  |  -27.29879   1.794926   -15.21   0.000    -30.81754   -23.78005
                       1.weekday |  -8.006533   1.717233    -4.66   0.000    -11.37297   -4.640095
                                 |
               year09#notworking |
                            1 1  |   7.803818   4.663229     1.67   0.094    -1.337905    16.94554
                                 |
                 year09#parttime |
                            1 1  |  -15.10782   13.29699    -1.14   0.256    -41.17504     10.9594
                                 |
                  year09#highedu |
              1#Higher educated  |   -3.03452   3.444636    -0.88   0.378    -9.787332    3.718292
                                 |
                year09#highedu_p |
              1#Higher educated  |  -.8161455   3.402739    -0.24   0.810    -7.486823    5.854532
                                 |
                   year09#agekid |
                     1#age 5-12  |  -14.21146   2.933732    -4.84   0.000    -19.96271   -8.460219
                                 |
                  year09#weekday |
                            1 1  |  -1.636105   2.973636    -0.55   0.582    -7.465577    4.193366
                                 |
                    notworking_p |  -8.053496   1.535017    -5.25   0.000    -11.06272   -5.044271
                      parttime_p |    .200979   2.396543     0.08   0.933    -4.497167    4.899125
                             age |  -.0341028   1.202086    -0.03   0.977    -2.390653    2.322447
                            age2 |  -.0079183   .0148568    -0.53   0.594    -.0370433    .0212067
                          child2 |   3.858897   1.484996     2.60   0.009     .9477326     6.77006
                      child3plus |   .6850271   2.364611     0.29   0.772    -3.950521    5.320575
                   adhousemember |  -6.278957   1.930123    -3.25   0.001    -10.06274   -2.495172
                        outsourc |   4.307881   1.772669     2.43   0.015     .8327675    7.782995
                     ordinaryday |   1.995561   1.680929     1.19   0.235    -1.299706    5.290829
                           _cons |   52.00098   23.91066     2.17   0.030     5.126893    98.87507
              ------------------------------------------------------------------------------------
              As we see, for fathers the only significant interaction is with the age of the child, and perhaps with "not working". So we could say that fathers who were not working during the recession spent more time with their children than non-working fathers did before (although the difference is not quite significant). This could also be a compositional effect, I guess, because many more men were not working in 2009 than in 2002 (and they were a more diverse group), which pushes the time up.

              But here is my main doubt: why is the coefficient of "year09" now double what it was without the interactions?

              Let's see what happened with mothers:

              Code:
              reg physical_care year09##(notworking parttime highedu highedu_p agekid weekday) notworking_p parttime_p age age2 child2 child3plus  adhousemember outsourc ordinaryday if Sample==1 & female==1
              Code:
                    Source |       SS           df       MS      Number of obs   =     5,707
              -------------+----------------------------------   F(22, 5684)     =    117.12
                     Model |  16319688.5        22  741804.025   Prob > F        =    0.0000
                  Residual |  35999977.1     5,684  6333.56389   R-squared       =    0.3119
              -------------+----------------------------------   Adj R-squared   =    0.3093
                     Total |  52319665.7     5,706  9169.23689   Root MSE        =    79.584
              
              ------------------------------------------------------------------------------------
                   physical_care |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
              -------------------+----------------------------------------------------------------
                        1.year09 |  -1.267485   6.107805    -0.21   0.836    -13.24111    10.70614
                    1.notworking |   26.92887   2.856012     9.43   0.000        21.33    32.52774
                      1.parttime |  -2.243114   5.600007    -0.40   0.689    -13.22126    8.735035
                                 |
                         highedu |
                Higher educated  |   14.53052   3.324429     4.37   0.000     8.013368    21.04766
                                 |
                       highedu_p |
                Higher educated  |   1.575131   3.232101     0.49   0.626    -4.761019    7.911282
                                 |
                          agekid |
                       age 5-12  |  -87.65553   2.846615   -30.79   0.000    -93.23598   -82.07508
                       1.weekday |   13.52641    2.69712     5.02   0.000     8.239021    18.81379
                                 |
               year09#notworking |
                            1 1  |   7.717046    5.17276     1.49   0.136    -2.423538    17.85763
                                 |
                 year09#parttime |
                            1 1  |   5.223155   7.682928     0.68   0.497    -9.838315    20.28463
                                 |
                  year09#highedu |
              1#Higher educated  |   6.709502   5.492459     1.22   0.222    -4.057813    17.47682
                                 |
                year09#highedu_p |
              1#Higher educated  |  -1.117851   5.414904    -0.21   0.836    -11.73313    9.497425
                                 |
                   year09#agekid |
                     1#age 5-12  |   10.55914   4.642917     2.27   0.023     1.457252    19.66103
                                 |
                  year09#weekday |
                            1 1  |  -5.877306   4.692496    -1.25   0.210    -15.07639    3.321777
                                 |
                    notworking_p |  -11.92778   3.626798    -3.29   0.001    -19.03768   -4.817868
                      parttime_p |  -5.463012   10.08012    -0.54   0.588     -25.2239    14.29788
                             age |   .4557001   1.942843     0.23   0.815    -3.353014    4.264414
                            age2 |  -.0319871   .0251972    -1.27   0.204    -.0813832    .0174091
                          child2 |    5.36734   2.338554     2.30   0.022     .7828825    9.951798
                      child3plus |   5.742402   3.716569     1.55   0.122    -1.543491    13.02829
                   adhousemember |  -11.92203   3.042754    -3.92   0.000    -17.88699   -5.957075
                        outsourc |   .1866478   2.821645     0.07   0.947    -5.344853    5.718148
                     ordinaryday |   15.50866   2.676497     5.79   0.000     10.26171    20.75562
                           _cons |   116.8047   36.82046     3.17   0.002     44.62256    188.9869
              ------------------------------------------------------------------------------------
              Why is the coefficient of the main variable of interest (year09) now -1.27 and non-significant for mothers, while for fathers it has increased a lot and is highly significant? That is my main question. If you have read this far, thank you so much. As I said, I really appreciate any answer. I suspect that the conclusion I have to report in the paper is that both genders increased their time with approximately the same intensity, and therefore the gender gap did not vary significantly, and that fathers with "older" children in 2009 are spending significantly less time with their children, while the opposite holds for mothers. But I am not very sure about the interpretation of the interactions and, especially, why the effect of the main variable changes so drastically. I am sorry for such a long message, but I think the detail helps to explain what I am doing. I hope I have explained myself clearly!

              Thank you so so much in advance.
              All the best
              Last edited by John Fasano; 01 Jun 2018, 08:38.



              • #8
                But my main doubt is coming now: Why now the coefficient of "year09" is the double than without including interactions???
                When you add new variables to a model, and especially so if the new variables include interaction terms with pre-existing variables, the meaning of the coefficients is changed. In the model with interactions, the coefficient of year09 does not represent "the effect of year 2009." In the original model it does. You are comparing apples to oranges and worrying because they are different colors. And when you introduce several interactions, it is impossible to interpret any one of them in isolation.

                Another aspect of this is relying on statistical significance. In your mind, I imagine you are interpreting "not statistically significant" as "no effect" or "no difference." But widespread as this practice is, it is just dead wrong and leads to endless confusion and paradoxes. One of your interaction terms' coefficients is nearly as large as one of its constituent effects, yet the latter was statistically significant and the former not. That's a meaningless comparison. The estimation of interaction effects is always substantially underpowered compared to the estimation of the constituent effects.

                I think that to really understand what is going on in these complicated interaction models you need to use the -margins- command to actually see the expected values of outcomes under various combinations of conditions, and you need to similarly estimate both condition-specific and average marginal effects of gender. If you are not familiar with -margins-, the best introduction I know of is the excellent Richard Williams' https://www3.nd.edu/~rwilliam/stats/Margins01.pdf. It contains numerous worked examples, including interaction models, and is miraculously clearly written.
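
                A minimal sketch of that suggestion follows. The -regress- line is only a guess at the model behind the output above, reconstructed from the variable names in the table, so treat it as an assumption and adjust it to the actual specification. The `///` line continuation works in a do-file, not in the Command window.

                ```stata
                * Assumed reconstruction of the interaction model shown in the output above
                regress physical_care i.year09##(i.notworking i.parttime i.highedu i.highedu_p i.agekid i.weekday) ///
                    notworking_p parttime_p age age2 child2 child3plus adhousemember outsourc ordinaryday

                * Expected minutes of physical care in each survey year, averaged over the sample
                margins year09

                * Average marginal effect of the 2009 survey across all observed covariate patterns
                margins, dydx(year09)

                * Effect of the 2009 survey separately by the respondent's education level
                margins highedu, dydx(year09)
                ```

                The coefficient on 1.year09 in the table is the 2009 difference only for observations at the reference level of every interacted variable, whereas -margins, dydx(year09)- averages over the whole estimation sample, so the two numbers need not agree.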



                • #9
                  Thank you very much Clyde!!



                  • #10
                    Hi,

                    I apologize in advance for replying to a dated post. However, the topic is very relevant to the fundamental question that I want to ask. I have read the above posts and especially thank @Clyde Schechter for making very clear comments.

                    I am looking at the effect of a variable on a certain outcome around a change in law. The longitudinal panel data are split into two time periods, one before the law (prior to year 2000) and another after its implementation (post year 2000). I want to understand whether the effect of the independent variable on the outcome "persists" (a positive relationship remains positive after the law change) and "intensifies" (the effect of the independent variable is larger than it was in the pre-law period, prior to year 2000).

                    I understand that these are two different models: (a) split-sample regressions of the outcome on the independent variable, and (b) a whole-sample regression that interacts only the independent variable of interest with a time dummy (0 or 1 for the pre and post periods). I learned from the readings that splitting the sample is equivalent to interacting all the control variables, as well as the independent variable, with the time dummy. However, I am still not clear on how one should decide between splitting the sample and interacting only one variable. Theoretically, the change in law matters, and it may influence firms differently depending on size, expenses, and other control variables. When I use the single interaction, I am assuming that the coefficients of the control variables remain the same before and after the law.

                    When I use the split sample, I get positive coefficients both before and after the year 2000, and I can compare directly whether the effect of the independent variable persists and intensifies. However, with the interaction term (indep var x time dummy), can I test only the intensification effect and not the persistence of the sign, since the resulting coefficient is "relative" to the prior period? Please correct me if I am wrong. Also, is it reasonable to compare the coefficients across split samples when the independent variable is insignificant (the confidence interval ranges from negative to high positive values)?
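
                    To make the two approaches concrete, here is a minimal sketch using Stata's shipped auto dataset as a stand-in. It is an illustration under stated assumptions, not the data described above: foreign plays the role of the post-law dummy, mpg the independent variable of interest, and weight a control. It uses plain OLS on cross-sectional data, so a panel estimator would need its own adaptation.

                    ```stata
                    * Illustration only: foreign stands in for the post-period dummy, mpg for the variable of interest
                    sysuse auto, clear

                    * Split-sample approach: one regression per period
                    regress price mpg weight if foreign == 0
                    regress price mpg weight if foreign == 1

                    * Fully interacted pooled model: same point estimates as the two regressions above
                    regress price i.foreign##(c.mpg c.weight)

                    * Period-specific slopes of mpg (speaks to persistence of the sign)
                    margins foreign, dydx(mpg)

                    * Post-period slope computed by hand, then a test of the change in slope (intensification)
                    lincom mpg + 1.foreign#c.mpg
                    test 1.foreign#c.mpg
                    ```

                    In this sketch the interaction coefficient is the change in slope, while -margins- and -lincom- recover the slope in each period, so both questions can be examined from the one pooled model. The standard errors can differ from the split-sample ones because the pooled model assumes a single residual variance for both periods.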

                    I would really appreciate it if someone could help.

                    Thanks,
                    Nishant
