  • Fixed Effects Regression: Problem with interpretation of interaction term vs. sample split

    Dear all,

    I want to investigate the relationship between working from home (WFH) and job satisfaction by running a fixed effects regression with Stata 13.0.

    Code:
    xtset pers_id wave
    The results (1) reveal that there is a statistically significant relationship.
    Now I want to examine whether employees are more satisfied when working from home depending on whether they are male or female, and depending on whether they have children or not.
    Therefore, I added the interaction terms 'WFH*female' (2) and 'WFH*children' (3) (all three variables are dummy variables).
    (I also control for other individual-specific and firm-specific variables.)

    How can I interpret the results? Are interaction terms the right way to test this in a fixed effects regression? Is it problematic if the interaction term includes a time-invariant variable or a time-varying variable?


    Code:
    (1) eststo: xtreg msat_job_z WFH $individualistic_control $firm_control_2 $wave, fe vce(cluster firm_id)
    (2) eststo: xtreg msat_job_z WFH WFH_Female $individualistic_control $firm_control_2 $wave, fe vce(cluster firm_id)
    (3) eststo: xtreg msat_job_z WFH WFH_children $individualistic_control $firm_control_2 $wave, fe vce(cluster firm_id)
    
    -------------------------------------------------------------------------
                                 (1)             (2)             (3)
                          Job Sat._z      Job Sat._z      Job Sat._z
    -------------------------------------------------------------------------
    WFH                     0.178***        0.220***         0.149**
                            (0.0608)        (0.0679)        (0.0598)
    
    Female                         0               0               0
                                 (.)             (.)             (.)
    
    Children                 -0.0444         -0.0446         -0.0572
                             (0.266)         (0.266)         (0.264)
    
    WFH*Female                                 -0.18
                                             (0.144)
    
    WFH*Children                                                0.104
                                                              (0.106)
    
    _cons                     -11.67          -11.42          -11.62
                             (8.114)         (8.131)         (8.118)
    -------------------------------------------------------------------------
    N                          12287           12287           12287
    -------------------------------------------------------------------------


    If I rerun the FE regression with sample splits I get the following results:

    (1) Females
    (2) Males
    (3) with Children
    (4) without Children



    Code:
       
    (1) eststo: xtreg msat_job_z WFH $individualistic_control $firm_control_2 $wave if female==1, fe vce(cluster firm_id)
    (2) eststo: xtreg msat_job_z WFH $individualistic_control $firm_control_2 $wave if female==0, fe vce(cluster firm_id)
    (3) eststo: xtreg msat_job_z WFH $individualistic_control $firm_control_2 $wave if children==1, fe vce(cluster firm_id)
    (4) eststo: xtreg msat_job_z WFH $individualistic_control $firm_control_2 $wave if children==0, fe vce(cluster firm_id)
    
    ------------------------------------------------------------------------------------
                                 (1)             (2)             (3)             (4)
                          msat_job_z      msat_job_z      msat_job_z      msat_job_z
    ------------------------------------------------------------------------------------
    WFH                       0.0675        0.215***         0.379**          0.114*
                             (0.129)        (0.0677)         (0.150)        (0.0630)
    
    Female                         0               0               0               0
                                 (.)             (.)             (.)             (.)
    
    Children                 -0.0634         -0.0576               0               0
                            (0.0969)         (0.264)             (.)             (.)
    
    _cons                     -1.360         -14.38*          -6.409          -12.40
                             (9.859)         (8.656)         (17.97)         (7.743)
    ------------------------------------------------------------------------------------
    N                           3409            8878            3040            9247
    ------------------------------------------------------------------------------------

    How can I explain the differences in the results? Is a sample split or an interaction term more appropriate?



    Thank you very much in advance!
    I really appreciate your help!


    Best regards,
    Lena
    Attached Files
    Last edited by Lena Funke; 26 May 2019, 07:03.

  • #2
    Lena:
    welcome to this forum.
    Things would be easier to handle if you switched to -fvvarlist- notation to create interactions and categorical variables.
    Kind regards,
    Carlo
    (StataNow 18.5)



    • #3
      Thank you for your reply, Carlo Lazzaro!

      Would this be the right code?

      Code:
      xtreg msat_job_z wfh wfh#female wfh#children children $individualistic_control $firm_control $wave, fe vce (cluster firm_id)

      Kind regards,
      Lena



      • #4
        Lena:
        I would tweak your code a bit:
        Code:
        xtreg msat_job_z  i.wfh##i.female i.wfh##i.children  $individualistic_control $firm_control $wave, fe vce (cluster firm_id)
        The suggested variations do nothing different from your code, but they:
        - make your code more efficient;
        - remind you that using the prefixes -i.- (for categorical variables) and -c.- (for continuous variables) is a good habit (with two-level categorical variables the -i.- prefix is redundant, but it does no harm); see the sketch just below.
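        For instance, a minimal sketch showing the two prefixes side by side; 'tenure' is a purely hypothetical continuous control standing in for whatever continuous variables your macros contain:
        Code:
        * i. flags categorical variables, c. flags continuous ones ('tenure' is illustrative only)
        xtreg msat_job_z i.wfh##i.female i.wfh##i.children c.tenure, fe vce(cluster firm_id)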
        Kind regards,
        Carlo
        (StataNow 18.5)



        • #5
          Thank you Carlo for your help!

          If I enter your code, I get the attached results.
          (Sorry for the wrong labeling: weiblich=female; ein oder mehrere Kinder=having children; ja=yes)

          How can I interpret the results, given that the interaction terms are not significant? And why do they differ from the results of the sample splits?

          Thank you and kind regards,
          Lena
          Attached Files



          • #6
            In terms of explaining the difference between the results with interaction terms and the split sample approach, the difference arises because you have included interaction terms of female and children only with wfh and not with any of the other variables in the model. When you do that, you end up with a model that (implicitly) constrains the effects of each of the other variables to be the same in both sexes and with and without children. By contrast, with the split sample approach, separate effects of those variables are estimated in each sample. And because it is usually the case that those other variables are correlated with the variables of primary interest (in your case wfh), the results for those will also be somewhat different from what is obtained in a limited interaction model. To get results that exactly match the split-sample approach you must extend the interaction to include all of the other variables in the model as well. Here's a simple example:
            Code:
            . sysuse auto, clear
            (1978 Automobile Data)
            
            . regress price i.foreign##c.mpg headroom
            
                  Source |       SS           df       MS      Number of obs   =        74
            -------------+----------------------------------   F(4, 69)        =      7.21
                   Model |   187174722         4  46793680.6   Prob > F        =    0.0001
                Residual |   447890674        69  6491169.19   R-squared       =    0.2947
            -------------+----------------------------------   Adj R-squared   =    0.2538
                   Total |   635065396        73  8699525.97   Root MSE        =    2547.8
            
            -------------------------------------------------------------------------------
                    price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            --------------+----------------------------------------------------------------
                  foreign |
                 Foreign  |  -803.0123   2839.986    -0.28   0.778     -6468.63    4862.606
                      mpg |  -359.3438   85.02122    -4.23   0.000    -528.9565   -189.7311
                          |
            foreign#c.mpg |
                 Foreign  |   109.8623    119.977     0.92   0.363    -129.4854    349.2099
                          |
                 headroom |  -316.4086   416.8755    -0.76   0.450    -1148.052    515.2352
                    _cons |   14195.01    2600.35     5.46   0.000     9007.452    19382.56
            -------------------------------------------------------------------------------
            
            . margins foreign, dydx(mpg)
            
            Average marginal effects                        Number of obs     =         74
            Model VCE    : OLS
            
            Expression   : Linear prediction, predict()
            dy/dx w.r.t. : mpg
            
            ------------------------------------------------------------------------------
                         |            Delta-method
                         |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
            mpg          |
                 foreign |
               Domestic  |  -359.3438   85.02122    -4.23   0.000    -528.9565   -189.7311
                Foreign  |  -249.4815   84.10352    -2.97   0.004    -417.2634   -81.69958
            ------------------------------------------------------------------------------
            
            . by foreign, sort: regress price mpg headroom
            
            --------------------------------------------------------------------------------------------------------------------------------
            -> foreign = Domestic
            
                  Source |       SS           df       MS      Number of obs   =        52
            -------------+----------------------------------   F(2, 49)        =      9.02
                   Model |   131609778         2    65804889   Prob > F        =    0.0005
                Residual |   357585023        49  7297653.52   R-squared       =    0.2690
            -------------+----------------------------------   Adj R-squared   =    0.2392
                   Total |   489194801        51  9592054.92   Root MSE        =    2701.4
            
            ------------------------------------------------------------------------------
                   price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                     mpg |  -374.1402   91.63609    -4.08   0.000    -558.2898   -189.9906
                headroom |  -472.0061   474.6421    -0.99   0.325    -1425.835    481.8229
                   _cons |   14979.11   2891.614     5.18   0.000     9168.191    20790.02
            ------------------------------------------------------------------------------
            
            --------------------------------------------------------------------------------------------------------------------------------
            -> foreign = Foreign
            
                  Source |       SS           df       MS      Number of obs   =        22
            -------------+----------------------------------   F(2, 19)        =      6.75
                   Model |  59964963.4         2  29982481.7   Prob > F        =    0.0061
                Residual |  84398249.4        19  4442013.13   R-squared       =    0.4154
            -------------+----------------------------------   Adj R-squared   =    0.3538
                   Total |   144363213        21   6874438.7   Root MSE        =    2107.6
            
            ------------------------------------------------------------------------------
                   price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                     mpg |  -252.3255     69.617    -3.62   0.002    -398.0356   -106.6155
                headroom |   700.0356   946.4659     0.74   0.469     -1280.94    2681.012
                   _cons |   10805.83   2995.142     3.61   0.002      4536.93    17074.74
            ------------------------------------------------------------------------------
            
            . regress price i.foreign##c.(mpg headroom)
            
                  Source |       SS           df       MS      Number of obs   =        74
            -------------+----------------------------------   F(5, 68)        =      5.94
                   Model |   193082124         5  38616424.8   Prob > F        =    0.0001
                Residual |   441983272        68     6499754   R-squared       =    0.3040
            -------------+----------------------------------   Adj R-squared   =    0.2529
                   Total |   635065396        73  8699525.97   Root MSE        =    2549.5
            
            ------------------------------------------------------------------------------------
                         price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------------+----------------------------------------------------------------
                       foreign |
                      Foreign  |  -4173.273   4535.836    -0.92   0.361    -13224.39    4877.848
                           mpg |  -374.1402   86.48154    -4.33   0.000    -546.7114    -201.569
                      headroom |  -472.0061   447.9434    -1.05   0.296    -1365.863    421.8509
                               |
                 foreign#c.mpg |
                      Foreign  |   121.8147   120.7092     1.01   0.316    -119.0568    362.6862
                               |
            foreign#c.headroom |
                      Foreign  |   1172.042     1229.4     0.95   0.344    -1281.188    3625.272
                               |
                         _cons |   14979.11    2728.96     5.49   0.000     9533.552    20424.66
            ------------------------------------------------------------------------------------
            
            . margins foreign, dydx(mpg)
            
            Average marginal effects                        Number of obs     =         74
            Model VCE    : OLS
            
            Expression   : Linear prediction, predict()
            dy/dx w.r.t. : mpg
            
            ------------------------------------------------------------------------------
                         |            Delta-method
                         |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
            mpg          |
                 foreign |
               Domestic  |  -374.1402   86.48154    -4.33   0.000    -546.7114    -201.569
                Foreign  |  -252.3255   84.21197    -3.00   0.004    -420.3679   -84.28314
            ------------------------------------------------------------------------------
            
            .
            There is an additional complication that arises with -xtreg, fe-, because the fixed-effects linear model implicitly contains indicator variables for each of the panels (pers_id in your case). And those, too, must be included in the interaction in order to exactly replicate the split sample results. You cannot do that using -xtreg, fe-. Instead you have to emulate the fixed effects regression by using -regress- rather than -xtreg- and include i.(female children)##i.pers_id in the model. If the number of distinct pers_id's is very large you will get an unwieldy amount of output, and you may also exceed the maximum matrix size for the regression calculations.
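            A minimal sketch of that emulation, should your dataset be small enough to permit it (the control macros and wave dummies are omitted here only to keep the line readable; as noted above, they too would need to enter the interaction for an exact replication):
            Code:
            * person dummies stand in for the fixed effects and are themselves interacted
            * with the moderators; expect very lengthy output
            regress msat_job_z i.wfh##i.(female children) i.(female children)##i.pers_id, vce(cluster firm_id)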

            I note that in your model syntax you have incorporated the additional variables by dereferencing some macros. In order to incorporate these into the interaction, you may not be able to just write i.(female children)##$controls, because you may need to specify c. or i. individually for each variable listed in $controls if they are a mix of discrete and continuous. (If they are all continuous, or all discrete, then you can do it very simply as i.(female children)##c.($controls) or i.(female children)##i.($controls).)
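            As an illustration only, suppose $individualistic_control contained a continuous variable (call it age) and a categorical one (call it edu_level); both names are hypothetical. The interaction would then be spelled out variable by variable, for example:
            Code:
            * 'age' (continuous) and 'edu_level' (categorical) are illustrative stand-ins;
            * this shows only the prefix notation, not the full split-sample replication
            xtreg msat_job_z i.wfh##i.(female children) i.(female children)##(c.age i.edu_level) $wave, fe vce(cluster firm_id)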

            As a practical matter, in many situations, the not-of-primary-interest variables are almost uncorrelated with the variables of primary interest, or there may be good reason to believe that their effects really don't differ by sex or parenthood status. In those situations, the simple interaction approach and the split sample results will be reasonably close to each other anyway. But sometimes neither of these conditions comes to the rescue and the two approaches produce discordant results.

            Finally, I want to clarify that when I refer to "producing the same results" in the above, I am referring only to the coefficient estimates. Split samples and interaction analyses are based on different sample sizes, so the standard errors, test statistics, p-values, and confidence limits will in general be different no matter what you do.

            Added: Crossed with #5.



            • #7
              Clyde Schechter, thank you very much for your detailed reply!

              I now understand why my model with the interaction terms reveals different results than the sample split approach.

              If the number of distinct pers_id's is very large you will get an unwieldy amount of output, and you may also exceed the maximum matrix size for the regression calculations.
              Unfortunately, the number of pers_id's is very large, and I cannot sufficiently increase the maximum matrix size with the Stata version I am using.

              In order to incorporate these into the interaction, you may not be able to just write i.(female children)##$controls, because you may need to specify c. or i. individually for each variable listed in $controls if they are a mix of discrete and continuous.
              Thank you for the hint. Yes, my controls are a mix of discrete and continuous variables.

              I now know how I could produce the same results (with regard to the coefficient estimates), thank you! But I am still unsure which approach is the appropriate one for testing my hypotheses.
              I want to examine whether the positive "effect" of working from home on job satisfaction differs between male and female employees, and between employees with and without children.

              Based on the results of the sample splits, can I say that the positive relationship between WFH and job satisfaction applies only to men and not to women? And, correspondingly, that employees with children are more satisfied when working from home than employees without children?

              Or would the interaction approach be the appropriate one, indicating that gender and having children do not moderate the relationship between WFH and job satisfaction?



              Thank you in advance!


              Kind regards,
              Lena



              • #8
                Well, in fact I think that even looking back at your results shown in #1 and #2 of this thread, you have been misled because you are focusing on the flawed concept of statistical significance. Your situation is actually a good example of one reason that the American Statistical Association now recommends that statistical significance be abandoned. See https://www.tandfonline.com/doi/full...5.2019.1583913.

                Your interaction term may not be "statistically significant" but that does not mean that there is no moderation of the wfh effect by sex, nor is it correct to conclude from either analysis that the wfh effect on job satisfaction occurs for men but not women. The notion that "not statistically significant" means "no effect" is a widespread fallacy--one of the reasons that the concept of statistical significance is so often misleading.

                Looking at the results in #5, we see that a yes answer to WFH is associated with a 0.19 higher expected level of job satisfaction among males. The interaction coefficient is approximately -0.17, which, even though it turns out not to be statistically significant, is nearly as large! When you add them, you are left with an estimated effect of 0.02 in women. The corresponding effects in your split sample analysis are 0.215 and 0.07. The differences between the results of the two approaches are really very minor. Both of them show a much larger effect for men than for women. And the male-female difference in effect is nearly the same in both analyses.
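                If you want Stata to carry out that addition for you (together with a standard error for the sum), here is a sketch based on the interaction model from #4, assuming wfh and female are 0/1 indicators as in your data:
                Code:
                * wfh effect for women (the children interaction is left at its base level)
                lincom 1.wfh + 1.wfh#1.female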

                So whichever analysis you choose to rely on, the qualitative conclusion is really the same. The important thing to remember is that the lack of statistical significance of the interaction, properly understood, means that the data do not provide enough precision for a tight estimate of how big the male-female effect difference is: it appears to be something in the 0.15 range (with males greater), but it could actually be closer to zero, or it could also be around twice as large. So the interpretation is not that there is no difference: the conclusion is that we can't say very much about the difference. Our best guess is that it is pretty large, but we could be quite wrong, or it could be even larger--the data just can't tell us.

                Similarly, in the split sample analyses, the fact that the effect of WFH on satisfaction for women is not statistically significant does not mean that WFH has no effect: all we can say is that the estimated effect is small, close to zero in fact, but that the data do not give a very precise estimate, and it could be much larger or could even be negative or zero.

                If you abandon the fallacy that "not significant" means "no effect" you can see that all your different analyses lead to the same conclusions, of which the most troublesome is that the effect of WFH in females, and the difference between that and the effect in males, are both only vaguely estimable in this data. If I had to, I would guess that this situation arises because the sample doesn't contain very many women. If it is crucial to make more precise statements about these effects in women, you probably need a sample with more of them.



                • #9
                  I am really grateful for your comprehensive reply!

                  If you abandon the fallacy that "not significant" means "no effect" you can see that all your different analyses lead to the same conclusions
                  This helps me a lot in better understanding my data and my approach. It is a good point that statistical significance and practical significance are not the same thing.

                  Looking at the results in #5, we see that a yes answer to WFH is associated with a 0.19 higher expected level of job satisfaction among males. The interaction coefficient is approximately -0.17, which, even though it turns out not to be statistically significant, is nearly as large! When you add them, you are left with an estimated effect of 0.02 in women.
                  Can I transfer this calculation to the interaction term WFH*children by adding 0.099 to the WFH coefficient of 0.19 (plus the main-effect coefficient associated with children)? Or is the interpretation difficult with two interaction terms, since the coefficient of WFH then refers to men without children? For the interpretation, would it perhaps be easier to add the interaction terms separately, as in the attached screenshot in #1? Is there a difference in interpreting interactions with time-varying dummies in a fixed effects regression?


                  If I had to, I would guess that this situation arises because the sample doesn't contain very many women. If it is crucial to make more precise statements about these effects in women, you probably need a sample with more of them.
                  You are absolutely right. Unfortunately, there are fewer women in the sample than men, approximately 20% vs. 80%, and there are far fewer female home-workers than male home-workers.


                  Thank you and kind regards,
                  Lena






                  • #10
                    Or is the interpretation difficult with two interaction terms, since the coefficient of WFH then refers to men without children? For the interpretation, would it perhaps be easier to add the interaction terms separately, as in the attached screenshot in #1?
                    With the model you have, you really have to consider four separate effects of WFH, one for men without children, one for women without children, one for men with children, and one for women with children. Rather than doing the additions manually, and perhaps getting it wrong, let -margins- do it for you:

                    Code:
                    margins female#children, dydx(wfh)
                    Is there a difference in interpreting interactions with time-varying dummies in a fixed effects regression?
                    No, the interpretation is the same whether the dummies are time-varying or not.



                    • #11
                      Thank you very much! Do you know what the problem could be if the -margins- code leads to the attached result?

                      Attached Files



                      • #12
                        Well, following fixed-effects regressions there are many parameters that cannot be estimated because they depend on the way the model gets identified in the presence of those fixed effects. Stata is sometimes a bit overly conservative in that regard. The marginal effects you are looking for are estimable. To get them, however, you need to add the -noestimcheck- option to your -margins, dydx()- command.

                        Just a word of warning: do not indiscriminately use -noestimcheck- whenever Stata tells you the things you want aren't estimable. Stata is usually right when it says that. In fact, if you were looking for expected values of the outcome under particular conditions, Stata would be right in declining to give them to you, as those would be artifacts of the identification constraints. But these marginal effects are, in fact, identified, so, for these, go ahead and use the -noestimcheck- option.
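                        For example, re-using the -margins- call from #10:
                        Code:
                        margins female#children, dydx(wfh) noestimcheck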



                        • #13
                          Thank you Clyde Schechter! Now it works, and I will keep your warning about -noestimcheck- in mind.

