Hi everyone,
I’m analyzing panel data (14 waves from 2008/09 until 2021/22) and want to ask for feedback on my approach.
RQ: Did remote-work (home office, "ho") experience before COVID lead to better income development during and after the pandemic? (I compare two groups, ho-experienced vs. non-ho-experienced, and their income development over time.)
I’m using a Fixed Effects (FE) model to control for unobserved heterogeneity:
xtreg ln_income i.wave##ho_pre age working_hours overtime yeduc ft_empl commute_time isco_group marry, fe cluster(id)
- ln_income = log income (dependent variable)
- wave = panel wave (time variable)
- ho_pre = 1 if the individual used home office at least once, 0 otherwise (no home office at all)
- i.wave##ho_pre = main effects of wave and ho_pre plus their interaction, to track income differences over time by home office experience
- Controls: age, working hours, overtime, education (yeduc), full-time employment (ft_empl), commute time, occupation group (isco_group), marital status (marry)
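Written out, the command above corresponds roughly to the following equation. This is only a sketch of my reading of the specification: the symbols (alpha_i, gamma_w, delta, beta_w, theta) are notation introduced here, wave 1 is the omitted base category as in the output, and the sums run over the waves present in the estimation sample. ho_pre carries a t subscript because the output reports a coefficient for 1.ho_pre, which a within estimator can only estimate when the variable changes within at least some individuals.

```latex
\ln(\text{income}_{it}) = \alpha_i
  + \sum_{w \neq 1} \gamma_w \, \mathbf{1}[\text{wave}_t = w]
  + \delta \, \text{ho\_pre}_{it}
  + \sum_{w \neq 1} \beta_w \, \mathbf{1}[\text{wave}_t = w] \times \text{ho\_pre}_{it}
  + \mathbf{x}_{it}' \boldsymbol{\theta}
  + \varepsilon_{it}
```

Here the beta_w are the wave#ho_pre coefficients: each one is the difference in log income between the two groups in wave w, relative to that difference in wave 1.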
Fixed-effects (within) regression               Number of obs     =     30,006
Group variable: id                              Number of groups  =      6,632
R-squared:                                      Obs per group:
     Within  = 0.3342                                         min =          1
     Between = 0.4924                                         avg =        4.5
     Overall = 0.4745                                         max =         10
                                                F(29, 6631)       =     153.11
corr(u_i, Xb) = 0.1885                          Prob > F          =     0.0000
                                  (Std. err. adjusted for 6,632 clusters in id)
-------------------------------------------------------------------------------
              |               Robust
    ln_income | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
--------------+----------------------------------------------------------------
         wave |
    2 2009/10 |    .030249   .0141047     2.14   0.032     .0025992    .0578988
    3 2010/11 |   .0502897   .0145352     3.46   0.001     .0217959    .0787834
    5 2012/13 |   .0643812   .0258303     2.49   0.013     .0137456    .1150169
    7 2014/15 |   .0841755   .0384626     2.19   0.029     .0087764    .1595745
    8 2015/16 |   .1314256   .0444306     2.96   0.003     .0443273    .2185238
    9 2016/17 |   .1159861   .0505932     2.29   0.022     .0168071    .2151651
   10 2017/18 |   .1417508   .0568954     2.49   0.013     .0302174    .2532841
   11 2018/19 |   .1421079   .0634181     2.24   0.025      .017788    .2664277
   12 2019/20 |   .1835436   .0702016     2.61   0.009     .0459259    .3211613
   13 2020/21 |   .1972685   .0758904     2.60   0.009      .048499     .346038
              |
     1.ho_pre |  -.1104748   .0189677    -5.82   0.000    -.1476575   -.0732921
              |
  wave#ho_pre |
  2 2009/10#1 |   .0565599   .0319277     1.77   0.077    -.0060287    .1191485
  3 2010/11#1 |   .0360609    .018341     1.97   0.049     .0001066    .0720152
  5 2012/13#1 |    .051195   .0184046     2.78   0.005      .015116     .087274
  7 2014/15#1 |   .0698457   .0191718     3.64   0.000     .0322629    .1074285
  8 2015/16#1 |   .0816179   .0190157     4.29   0.000      .044341    .1188948
  9 2016/17#1 |   .0866077    .019312     4.48   0.000     .0487499    .1244655
 10 2017/18#1 |   .1009338   .0191027     5.28   0.000     .0634864    .1383811
 11 2018/19#1 |   .1070736   .0199722     5.36   0.000     .0679217    .1462256
 12 2019/20#1 |   .1236747   .0201869     6.13   0.000      .084102    .1632475
 13 2020/21#1 |   .1186338   .0205628     5.77   0.000     .0783241    .1589434
              |
          age |   .0133326   .0062536     2.13   0.033     .0010736    .0255916
working_hours |   .0100461   .0004773    21.05   0.000     .0091104    .0109818
     overtime |    .015544   .0056095     2.77   0.006     .0045476    .0265404
        yeduc |    .056992   .0069028     8.26   0.000     .0434603    .0705237
      ft_empl |   .2289353   .0113705    20.13   0.000     .2066454    .2512252
 commute_time |   .0001998   .0000679     2.94   0.003     .0000667     .000333
        marry |   .0343692   .0070297     4.89   0.000     .0205888    .0481496
   isco_group |  -.0145504    .003519    -4.13   0.000    -.0214487   -.0076521
        _cons |   5.504766   .2068468    26.61   0.000      5.09928    5.910253
--------------+----------------------------------------------------------------
      sigma_u |  .38441371
      sigma_e |  .21556466
          rho |  .76077203   (fraction of variance due to u_i)
-------------------------------------------------------------------------------
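One way the interaction terms could be examined after estimation, as a sketch only (not something run for this post). It assumes the xtreg command above has just been run, and it treats wave 11 (2018/19) as the last pre-COVID wave, which is an assumption. From the table, the point estimate of the difference tested by lincom is .1186338 - .1070736 = .0115602 log points; its standard error depends on the covariance of the two coefficients, which is why lincom is used rather than hand calculation.

```stata
* Sketch only: run directly after the xtreg command shown in the post.

* Joint test that all wave-by-ho_pre interaction terms are zero
testparm i.wave#i.ho_pre

* Change in the ho_pre income gap between 2018/19 (wave 11) and 2020/21 (wave 13).
* Treating wave 11 as the last pre-COVID wave is an assumption.
lincom 13.wave#1.ho_pre - 11.wave#1.ho_pre
```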
Is this the best way to model the effect of pre-COVID home office experience on income trends? Do you have any other recommendations for improving the specification? I am not 100% sure, because individuals changed their working mode over time: more and more people started to work remotely over the 14 years. Some started sooner, some later, and some not at all (ho_pre = 0). Does this lead to bias? How can I control for it? Should I create categories?
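To make the "categories" idea concrete, here is a minimal sketch of what I have in mind. It assumes a time-varying 0/1 indicator of home office use in each wave, called ho here; that variable name is hypothetical and not in the model above. The names first_ho_wave and ho_cohort and the cut-off waves are also illustrative assumptions, not a tested specification.

```stata
* Sketch only. `ho' is a HYPOTHETICAL time-varying 0/1 indicator
* (1 = worked from home in that wave); it is not in the model above.
xtset id wave

* First wave in which each person reports home office (missing if never)
bysort id: egen first_ho_wave = min(cond(ho == 1, wave, .))

* Adoption categories; the cut-off waves are illustrative assumptions
* (wave 11 = 2018/19 is treated as the last clearly pre-COVID wave)
gen byte ho_cohort = 0 if missing(first_ho_wave)
replace  ho_cohort = 1 if first_ho_wave <= 7
replace  ho_cohort = 2 if inrange(first_ho_wave, 8, 11)
replace  ho_cohort = 3 if first_ho_wave >= 12 & !missing(first_ho_wave)

label define ho_cohort 0 "never" 1 "early" 2 "late pre-COVID" 3 "COVID-era"
label values ho_cohort ho_cohort

* ho_cohort is time-invariant, so its main effect is absorbed by the
* individual fixed effects; only the wave interactions are identified
xtreg ln_income i.wave##i.ho_cohort age working_hours overtime yeduc ft_empl ///
    commute_time isco_group marry, fe cluster(id)
```

With this coding the "never" group is the reference category, and each interaction term compares a cohort's income path with that group's path relative to wave 1.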
I’d really appreciate any insights on how to improve the approach.