
  • Panel data FE: right approach for remote work experience and objective career outcomes?

    Hi everyone,

    I’m analyzing panel data (14 waves from 2008/09 until 2021/22) and would like feedback on my approach.
    RQ: Did remote-work (home office) experience before COVID lead to better income development during and after the pandemic? (I compare two groups, home-office-experienced vs. non-experienced individuals, and their income development over time.)
    I’m using a fixed-effects (FE) model to control for time-invariant unobserved heterogeneity:

    xtreg ln_income i.wave##ho_pre age working_hours overtime yeduc ft_empl commute_time isco_group marry, fe cluster(id)
    • ln_income = Log income (DV)
    • wave = Panel wave (time variable)
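    • ho_pre = 1 if the individual worked from home (home office) at least once, 0 otherwise (never worked from home)
    • i.wave##ho_pre = full factorial interaction (wave dummies, the ho_pre main effect and the wave#ho_pre terms) to track how the income gap between home-office-experienced and non-experienced individuals evolves over time
    • Controls: age, working hours, overtime, education (yeduc), full-time employment (ft_empl), plus the remaining covariates in the command (commute_time, isco_group, marry)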

    Fixed-effects (within) regression               Number of obs     =     30,006
    Group variable: id                              Number of groups  =      6,632

    R-squared:                                      Obs per group:
         Within  = 0.3342                                         min =          1
         Between = 0.4924                                         avg =        4.5
         Overall = 0.4745                                         max =         10

                                                    F(29, 6631)       =     153.11
    corr(u_i, Xb) = 0.1885                          Prob > F          =     0.0000

                                      (Std. err. adjusted for 6,632 clusters in id)
    -------------------------------------------------------------------------------
                  |               Robust
        ln_income | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    --------------+----------------------------------------------------------------
             wave |
       2 2009/10  |    .030249   .0141047     2.14   0.032     .0025992    .0578988
       3 2010/11  |   .0502897   .0145352     3.46   0.001     .0217959    .0787834
       5 2012/13  |   .0643812   .0258303     2.49   0.013     .0137456    .1150169
       7 2014/15  |   .0841755   .0384626     2.19   0.029     .0087764    .1595745
       8 2015/16  |   .1314256   .0444306     2.96   0.003     .0443273    .2185238
       9 2016/17  |   .1159861   .0505932     2.29   0.022     .0168071    .2151651
      10 2017/18  |   .1417508   .0568954     2.49   0.013     .0302174    .2532841
      11 2018/19  |   .1421079   .0634181     2.24   0.025      .017788    .2664277
      12 2019/20  |   .1835436   .0702016     2.61   0.009     .0459259    .3211613
      13 2020/21  |   .1972685   .0758904     2.60   0.009      .048499     .346038
                  |
         1.ho_pre |  -.1104748   .0189677    -5.82   0.000    -.1476575   -.0732921
                  |
      wave#ho_pre |
     2 2009/10#1  |   .0565599   .0319277     1.77   0.077    -.0060287    .1191485
     3 2010/11#1  |   .0360609    .018341     1.97   0.049     .0001066    .0720152
     5 2012/13#1  |    .051195   .0184046     2.78   0.005      .015116     .087274
     7 2014/15#1  |   .0698457   .0191718     3.64   0.000     .0322629    .1074285
     8 2015/16#1  |   .0816179   .0190157     4.29   0.000      .044341    .1188948
     9 2016/17#1  |   .0866077    .019312     4.48   0.000     .0487499    .1244655
    10 2017/18#1  |   .1009338   .0191027     5.28   0.000     .0634864    .1383811
    11 2018/19#1  |   .1070736   .0199722     5.36   0.000     .0679217    .1462256
    12 2019/20#1  |   .1236747   .0201869     6.13   0.000      .084102    .1632475
    13 2020/21#1  |   .1186338   .0205628     5.77   0.000     .0783241    .1589434
                  |
              age |   .0133326   .0062536     2.13   0.033     .0010736    .0255916
    working_hours |   .0100461   .0004773    21.05   0.000     .0091104    .0109818
         overtime |    .015544   .0056095     2.77   0.006     .0045476    .0265404
            yeduc |    .056992   .0069028     8.26   0.000     .0434603    .0705237
          ft_empl |   .2289353   .0113705    20.13   0.000     .2066454    .2512252
     commute_time |   .0001998   .0000679     2.94   0.003     .0000667     .000333
            marry |   .0343692   .0070297     4.89   0.000     .0205888    .0481496
       isco_group |  -.0145504    .003519    -4.13   0.000    -.0214487   -.0076521
            _cons |   5.504766   .2068468    26.61   0.000      5.09928    5.910253
    --------------+----------------------------------------------------------------
          sigma_u |  .38441371
          sigma_e |  .21556466
              rho |  .76077203   (fraction of variance due to u_i)
    -------------------------------------------------------------------------------



    Is this the best way to model the effect of pre-COVID home office experience on income trends? Do you have other recommendations for specifying the model? I am not completely sure, because individuals changed their working mode over time: more and more people started to work remotely over the 14-year period. Does this introduce bias? Some start earlier, some later and some not at all (ho_pre = 0). How can I account for that? Should I create categories?
    I’d really appreciate any insights on how to improve the approach.

  • #2
    Sophie:
    welcome to this forum.
    Please use CODE delimiters to post what you typed and what Stata gave you back (as per the FAQ). Thanks.
    That said, I would go with:
    Code:
    xtreg ln_income i.wave##i.ho_pre c.age##c.age working_hours overtime yeduc ft_empl commute_time isco_group marry, fe cluster(id)
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Thank you, Carlo! Here are both the old and the updated versions:

      Code:
      . xtreg ln_income i.wave##ho_pre age working_hours overtime yeduc ft_empl commute_time isco_group marry, fe cluster(id)
      
      Fixed-effects (within) regression               Number of obs     =     30,006
      Group variable: id                              Number of groups  =      6,632
      
      R-squared:                                      Obs per group:
           Within  = 0.3342                                         min =          1
           Between = 0.4924                                         avg =        4.5
           Overall = 0.4745                                         max =         10
      
                                                      F(29, 6631)       =     153.11
      corr(u_i, Xb) = 0.1885                          Prob > F          =     0.0000
      
                                        (Std. err. adjusted for 6,632 clusters in id)
      -------------------------------------------------------------------------------
                    |               Robust
          ln_income | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
      --------------+----------------------------------------------------------------
               wave |
         2 2009/10  |    .030249   .0141047     2.14   0.032     .0025992    .0578988
         3 2010/11  |   .0502897   .0145352     3.46   0.001     .0217959    .0787834
         5 2012/13  |   .0643812   .0258303     2.49   0.013     .0137456    .1150169
         7 2014/15  |   .0841755   .0384626     2.19   0.029     .0087764    .1595745
         8 2015/16  |   .1314256   .0444306     2.96   0.003     .0443273    .2185238
         9 2016/17  |   .1159861   .0505932     2.29   0.022     .0168071    .2151651
        10 2017/18  |   .1417508   .0568954     2.49   0.013     .0302174    .2532841
        11 2018/19  |   .1421079   .0634181     2.24   0.025      .017788    .2664277
        12 2019/20  |   .1835436   .0702016     2.61   0.009     .0459259    .3211613
        13 2020/21  |   .1972685   .0758904     2.60   0.009      .048499     .346038
                    |
           1.ho_pre |  -.1104748   .0189677    -5.82   0.000    -.1476575   -.0732921
                    |
        wave#ho_pre |
       2 2009/10#1  |   .0565599   .0319277     1.77   0.077    -.0060287    .1191485
       3 2010/11#1  |   .0360609    .018341     1.97   0.049     .0001066    .0720152
       5 2012/13#1  |    .051195   .0184046     2.78   0.005      .015116     .087274
       7 2014/15#1  |   .0698457   .0191718     3.64   0.000     .0322629    .1074285
       8 2015/16#1  |   .0816179   .0190157     4.29   0.000      .044341    .1188948
       9 2016/17#1  |   .0866077    .019312     4.48   0.000     .0487499    .1244655
      10 2017/18#1  |   .1009338   .0191027     5.28   0.000     .0634864    .1383811
      11 2018/19#1  |   .1070736   .0199722     5.36   0.000     .0679217    .1462256
      12 2019/20#1  |   .1236747   .0201869     6.13   0.000      .084102    .1632475
      13 2020/21#1  |   .1186338   .0205628     5.77   0.000     .0783241    .1589434
                    |
                age |   .0133326   .0062536     2.13   0.033     .0010736    .0255916
      working_hours |   .0100461   .0004773    21.05   0.000     .0091104    .0109818
           overtime |    .015544   .0056095     2.77   0.006     .0045476    .0265404
              yeduc |    .056992   .0069028     8.26   0.000     .0434603    .0705237
            ft_empl |   .2289353   .0113705    20.13   0.000     .2066454    .2512252
       commute_time |   .0001998   .0000679     2.94   0.003     .0000667     .000333
         isco_group |  -.0145504    .003519    -4.13   0.000    -.0214487   -.0076521
              marry |   .0343692   .0070297     4.89   0.000     .0205888    .0481496
              _cons |   5.504766   .2068468    26.61   0.000      5.09928    5.910253
      --------------+----------------------------------------------------------------
            sigma_u |  .38441371
            sigma_e |  .21556466
                rho |  .76077203   (fraction of variance due to u_i)
      -------------------------------------------------------------------------------
      
      .

      Code:
      . xtreg ln_income i.wave##i.ho_pre c.age##c.age working_hours overtime yeduc ft_empl commute_time isco_group marry, fe cluster(id)
      
      Fixed-effects (within) regression               Number of obs     =     30,006
      Group variable: id                              Number of groups  =      6,632
      
      R-squared:                                      Obs per group:
           Within  = 0.3464                                         min =          1
           Between = 0.5007                                         avg =        4.5
           Overall = 0.4808                                         max =         10
      
                                                      F(30, 6631)       =     161.59
      corr(u_i, Xb) = 0.2459                          Prob > F          =     0.0000
      
                                        (Std. err. adjusted for 6,632 clusters in id)
      -------------------------------------------------------------------------------
                    |               Robust
          ln_income | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
      --------------+----------------------------------------------------------------
               wave |
         2 2009/10  |   .0266135   .0138667     1.92   0.055    -.0005697    .0537966
         3 2010/11  |   .0463352   .0144248     3.21   0.001     .0180579    .0746124
         5 2012/13  |   .0633388   .0256791     2.47   0.014     .0129995    .1136781
         7 2014/15  |   .0865242   .0382291     2.26   0.024     .0115828    .1614656
         8 2015/16  |   .1363587   .0441606     3.09   0.002     .0497897    .2229276
         9 2016/17  |    .123267   .0502775     2.45   0.014     .0247069     .221827
        10 2017/18  |   .1513436   .0565406     2.68   0.007     .0405058    .2621815
        11 2018/19  |   .1548333   .0630418     2.46   0.014     .0312512    .2784154
        12 2019/20  |   .2009384   .0697907     2.88   0.004     .0641262    .3377506
        13 2020/21  |   .2190629   .0754593     2.90   0.004     .0711384    .3669874
                    |
           1.ho_pre |  -.1236207   .0188535    -6.56   0.000    -.1605796   -.0866618
                    |
        wave#ho_pre |
       2 2009/10#1  |   .0649743   .0316039     2.06   0.040     .0030204    .1269281
       3 2010/11#1  |   .0396058   .0182565     2.17   0.030     .0038172    .0753944
       5 2012/13#1  |   .0550159   .0183364     3.00   0.003     .0190707    .0909611
       7 2014/15#1  |   .0760793   .0191224     3.98   0.000     .0385932    .1135654
       8 2015/16#1  |   .0888263   .0189229     4.69   0.000     .0517313    .1259213
       9 2016/17#1  |   .0941021    .019202     4.90   0.000     .0564599    .1317443
      10 2017/18#1  |   .1099984    .018975     5.80   0.000     .0728013    .1471955
      11 2018/19#1  |   .1165298   .0198376     5.87   0.000     .0776418    .1554178
      12 2019/20#1  |   .1334823   .0199724     6.68   0.000     .0943299    .1726346
      13 2020/21#1  |   .1295048   .0203273     6.37   0.000     .0896568    .1693527
                    |
                age |   .0685511   .0072001     9.52   0.000     .0544367    .0826655
                    |
        c.age#c.age |  -.0007637   .0000536   -14.26   0.000    -.0008686   -.0006587
                    |
      working_hours |     .01016   .0004802    21.16   0.000     .0092187    .0111013
           overtime |   .0166727     .00557     2.99   0.003     .0057537    .0275917
              yeduc |    .042304   .0067302     6.29   0.000     .0291106    .0554973
            ft_empl |   .2348326   .0113414    20.71   0.000     .2125999    .2570653
       commute_time |   .0001964   .0000653     3.01   0.003     .0000684    .0003244
         isco_group |  -.0126011    .003486    -3.61   0.000    -.0194348   -.0057673
              marry |    .012239   .0071127     1.72   0.085    -.0017042    .0261822
              _cons |   4.743417   .2098892    22.60   0.000     4.331967    5.154868
      --------------+----------------------------------------------------------------
            sigma_u |  .38597758
            sigma_e |  .21359598
                rho |  .76555607   (fraction of variance due to u_i)
      -------------------------------------------------------------------------------
      Kind regards, Sophie
      (StataNow 18.5)



      • #4
        Sophie:
        1) adding -c.age##c.age- reveals a turning point (maximum) in -age- at -(.0685511)/(2*(-.0007637)) = 44.880909.
        If the observed range of -age- includes 45, the maximum is confirmed.
        2) In addition, this extra predictor slightly increases the within R-squared compared with your previous specification (0.3464 vs. 0.3342).
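        The turning-point calculation in 1) can also be left to -nlcom-, which additionally reports a delta-method standard error and confidence interval. A minimal sketch, assuming Sophie's dataset is in memory and -xtset- with panel variable -id- (variable names are those of the model in #3):

```stata
* Refit the specification with a quadratic in age (as suggested in #2)
xtreg ln_income i.wave##i.ho_pre c.age##c.age working_hours overtime yeduc ft_empl commute_time isco_group marry, fe cluster(id)

* Turning point of the age profile: -b_age / (2 * b_age^2), with a delta-method SE
nlcom -_b[age] / (2 * _b[c.age#c.age])

* Check whether the turning point lies inside the observed age range of the estimation sample
summarize age if e(sample)
```

        The -nlcom- point estimate should match the 44.88 computed by hand above, up to the rounding of the displayed coefficients.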
        What I would do next:
        a) testing the joint statistical significance of -wave- and -i.wave##i.ho_pre- via -testparm-;
        b) testing for potential misspecification of the functional form of the dependent variable by replicating the -linktest- procedure by hand, as in the following toy example:
        Code:
        . use "https://www.stata-press.com/data/r18/nlswork.dta"
        (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
        
        . xtreg ln_wage c.age##c.age, fe vce(cluster idcode)
        
        Fixed-effects (within) regression               Number of obs     =     28,510
        Group variable: idcode                          Number of groups  =      4,710
        
        R-squared:                                      Obs per group:
             Within  = 0.1087                                         min =          1
             Between = 0.1006                                         avg =        6.1
             Overall = 0.0865                                         max =         15
        
                                                        F(2, 4709)        =     507.42
        corr(u_i, Xb) = 0.0440                          Prob > F          =     0.0000
        
                                     (Std. err. adjusted for 4,710 clusters in idcode)
        ------------------------------------------------------------------------------
                     |               Robust
             ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
                 age |   .0539076    .004307    12.52   0.000     .0454638    .0623515
                     |
         c.age#c.age |  -.0005973    .000072    -8.30   0.000    -.0007384   -.0004562
                     |
               _cons |    .639913   .0624195    10.25   0.000     .5175415    .7622845
        -------------+----------------------------------------------------------------
             sigma_u |   .4039153
             sigma_e |  .30245467
                 rho |  .64073314   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        
        . predict fitted, xb
        
        . g sq_fitted=fitted^2
        
        
        . xtreg ln_wage fitted sq_fitted , fe vce(cluster idcode)
        
        Fixed-effects (within) regression               Number of obs     =     28,510
        Group variable: idcode                          Number of groups  =      4,710
        
        R-squared:                                      Obs per group:
             Within  = 0.1092                                         min =          1
             Between = 0.1033                                         avg =        6.1
             Overall = 0.0881                                         max =         15
        
                                                        F(2, 4709)        =     523.09
        corr(u_i, Xb) = 0.0467                          Prob > F          =     0.0000
        
                                     (Std. err. adjusted for 4,710 clusters in idcode)
        ------------------------------------------------------------------------------
                     |               Robust
             ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
              fitted |   2.569185   .7085064     3.63   0.000     1.180181    3.958189
           sq_fitted |    -.47432   .2153021    -2.20   0.028    -.8964128   -.0522272
               _cons |  -1.290258    .580562    -2.22   0.026    -2.428431   -.1520844
        -------------+----------------------------------------------------------------
             sigma_u |    .403403
             sigma_e |  .30238578
                 rho |  .64025357   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        
        . test sq_fitted
        
         ( 1)  sq_fitted = 0
        
               F(  1,  4709) =    4.85
                    Prob > F =    0.0276
        
        .
        In this toy example, as expected, the model turns out to be misspecified (-sq_fitted- is significant, p = 0.0276), since it includes only one predictor (-age- and its square).
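        Checks a) and b) can be sketched on Sophie's own specification as follows. This assumes her dataset is in memory and -xtset- with panel variable -id-; -fitted- and -sq_fitted- are illustrative names borrowed from the toy example:

```stata
* Refit the specification with a quadratic in age
xtreg ln_income i.wave##i.ho_pre c.age##c.age working_hours overtime yeduc ft_empl commute_time isco_group marry, fe cluster(id)

* a) Joint significance of the wave dummies
testparm i.wave

* a) Joint significance of the wave-by-ho_pre interaction terms
testparm i.wave#i.ho_pre

* b) Hand-made -linktest-: linear prediction and its square, estimation sample only
capture drop fitted sq_fitted
predict fitted if e(sample), xb
generate sq_fitted = fitted^2

* Regress the outcome on the prediction and its square with the same FE and clustering
xtreg ln_income fitted sq_fitted, fe cluster(id)

* A significant coefficient on sq_fitted points to a misspecified functional form
test sq_fitted
```

        As in the toy example, the key quantity is the p-value of -test sq_fitted-.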
        Kind regards,
        Carlo
        (Stata 19.0)

