Hi, I have a question about using two panel datasets simultaneously.
My data consist of two cohorts (a 2005 cohort and a 2015 cohort).
The first cohort runs from 2005 to 2007.
The second cohort runs from 2015 to 2017.
I appended these two panel datasets, and part of the result is listed below.
In the listing, pid is the personal id, which is greater than 10000 if the person is in the 2015 cohort; peducost is the private education cost; and male is a dummy variable equal to one if the person is male.
In the regression further down, ctrlvar holds the control variables and urbrur is an urban/rural area variable.
Code:
. list pid year peducost male cohort if 6904 <= pid & pid <= 10005, sep(15)

        +-----------------------------------------+
        |   pid   year   peducost   male   cohort |
        |-----------------------------------------|
 20710. |  6904   2005          0      1     2005 |
 20711. |  6904   2006          0      1     2005 |
 20712. |  6904   2007          0      1     2005 |
 20713. |  6905   2005         30      1     2005 |
 20714. |  6905   2006         50      1     2005 |
 20715. |  6905   2007         58      1     2005 |
 20716. |  6906   2005         12      1     2005 |
 20717. |  6906   2006         27      1     2005 |
 20718. |  6906   2007         22      1     2005 |
 20719. |  6907   2005         18      1     2005 |
 20720. |  6907   2006         27      1     2005 |
 20721. |  6907   2007         18      1     2005 |
 20722. |  6908   2005          0      1     2005 |
 20723. |  6908   2006         75      1     2005 |
 20724. |  6908   2007         26      1     2005 |
        |-----------------------------------------|
 20725. | 10001   2015          0      0     2015 |
 20726. | 10001   2016          0      0     2015 |
 20727. | 10001   2017          0      0     2015 |
 20728. | 10002   2015          9      0     2015 |
 20729. | 10002   2016          0      0     2015 |
 20730. | 10002   2017          0      0     2015 |
 20731. | 10003   2015          0      0     2015 |
 20732. | 10003   2016          0      0     2015 |
 20733. | 10003   2017         34      0     2015 |
 20734. | 10004   2015          0      1     2015 |
 20735. | 10004   2016          0      1     2015 |
 20736. | 10004   2017          0      1     2015 |
 20737. | 10005   2015          0      0     2015 |
 20738. | 10005   2016          0      0     2015 |
 20739. | 10005   2017          0      0     2015 |
        +-----------------------------------------+
That is, I am using two panel datasets simultaneously (the 2005 cohort set and the 2015 cohort set).
Here, I want to know whether the partial effect of gender on private education cost differs between the two cohorts.
So I ran a random-effects regression with an interaction term, as shown below.
Code:
. xtset pid year
       panel variable:  pid (unbalanced)
        time variable:  year, 2005 to 2017, but with gaps
                delta:  1 unit

. global ctrlvar "dadage dadagesq momage momagesq i.dadedu i.momedu"

. gen dummy_2015 = (cohort == 2015)

. xtreg peducost 1.male#1.dummy_2015 male $ctrlvar i.urbrur b2005.year i.dummy_2015, re vce(cl pid)
note: 1.dummy_2015 omitted because of collinearity

Random-effects GLS regression                   Number of obs     =     31,735
Group variable: pid                             Number of groups  =      6,836

R-sq:                                           Obs per group:
     within  = 0.1524                                         min =          1
     between = 0.2032                                         avg =        4.6
     overall = 0.1707                                         max =          6

                                                Wald chi2(18)     =    3477.54
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                                   (Std. Err. adjusted for 6,836 clusters in pid)
---------------------------------------------------------------------------------
                |               Robust
       peducost |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
male#dummy_2015 |
            1 1 |   -.720143    .891734    -0.81   0.419     -2.46791    1.027624
                |
           male |   1.362174   .5742116     2.37   0.018     .2367403    2.487609
         dadage |   1.108523   .4977059     2.23   0.026     .1330374    2.084009
       dadagesq |  -.0124439   .0051745    -2.40   0.016    -.0225856   -.0023022
         momage |   1.653206   .4077752     4.05   0.000     .8539813    2.452431
       momagesq |  -.0164883   .0043175    -3.82   0.000    -.0249505   -.0080262
                |
         dadedu |
    high_school |   2.382997   .8907487     2.68   0.007     .6371611    4.128832
     university |   10.57592   .9752616    10.84   0.000     8.664444     12.4874
                |
         momedu |
    high_school |   3.838404   .8931592     4.30   0.000     2.087844    5.588964
     university |   12.82762   1.061476    12.08   0.000     10.74717    14.90807
                |
         urbrur |
       big_city |  -8.737641   .8099345   -10.79   0.000    -10.32508   -7.150199
           city |    -9.8842   .7329264   -13.49   0.000    -11.32071   -8.447691
          rural |  -15.79478   .8363283   -18.89   0.000    -17.43395   -14.15561
                |
           year |
           2006 |   3.182023   .2959997    10.75   0.000     2.601874    3.762171
           2007 |   11.30476    .491985    22.98   0.000     10.34049    12.26903
           2015 |   11.79945   .6385644    18.48   0.000     10.54789    13.05101
           2016 |   13.24131   .6740927    19.64   0.000     11.92011    14.56251
           2017 |   15.41846   .7222268    21.35   0.000     14.00292      16.834
                |
   1.dummy_2015 |          0  (omitted)
          _cons |  -54.29597   10.39154    -5.23   0.000    -74.66301   -33.92893
----------------+----------------------------------------------------------------
        sigma_u |  12.830406
        sigma_e |  23.443425
            rho |  .23049036   (fraction of variance due to u_i)
---------------------------------------------------------------------------------
Here, the problem is that the dummy_2015 variable (equal to one if a person is in the 2015 cohort) is omitted.
I think dummy_2015 and the year dummies cannot be used together because of perfect multicollinearity.
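To spell out why I suspect this, here is a small sketch. It assumes, consistent with the listing above, that everyone in the 2015 cohort is observed only in 2015, 2016, and 2017, and that nobody from the 2005 cohort is observed in those years. The indicator notation is mine, for illustration only.

```latex
\text{dummy\_2015}_{it}
  = \mathbf{1}[\text{year}_{t} = 2015]
  + \mathbf{1}[\text{year}_{t} = 2016]
  + \mathbf{1}[\text{year}_{t} = 2017]
```

If this holds for every observation, the cohort dummy is an exact sum of three of the year dummies already entered through b2005.year, so its coefficient cannot be identified separately from theirs.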
One solution would be to use only cross-sectional data (for example, combining the 2005 and 2015 waves).
However, for my own reasons, I want to use the two panel datasets simultaneously.
In this case, how can I test whether the partial effect of gender differs between the two cohorts?
Thank you for taking the time to read this question.