  • Using two panel datasets simultaneously: time dummies and cohort dummy

    Hi, I have a question about using two panel datasets simultaneously.

    My data consist of two cohorts (a 2005 cohort and a 2015 cohort).
    The first cohort starts in 2005 and ends in 2007.
    The second cohort starts in 2015 and ends in 2017.
    I appended these two panel datasets; a sample of the result is below.

    Code:
    . list pid year peducost male cohort if 6904 <= pid & pid <= 10005, sep(15)
    
           +-----------------------------------------+
           |   pid   year   peducost   male   cohort |
           |-----------------------------------------|
    20710. |  6904   2005          0      1     2005 |
    20711. |  6904   2006          0      1     2005 |
    20712. |  6904   2007          0      1     2005 |
    20713. |  6905   2005         30      1     2005 |
    20714. |  6905   2006         50      1     2005 |
    20715. |  6905   2007         58      1     2005 |
    20716. |  6906   2005         12      1     2005 |
    20717. |  6906   2006         27      1     2005 |
    20718. |  6906   2007         22      1     2005 |
    20719. |  6907   2005         18      1     2005 |
    20720. |  6907   2006         27      1     2005 |
    20721. |  6907   2007         18      1     2005 |
    20722. |  6908   2005          0      1     2005 |
    20723. |  6908   2006         75      1     2005 |
    20724. |  6908   2007         26      1     2005 |
           |-----------------------------------------|
    20725. | 10001   2015          0      0     2015 |
    20726. | 10001   2016          0      0     2015 |
    20727. | 10001   2017          0      0     2015 |
    20728. | 10002   2015          9      0     2015 |
    20729. | 10002   2016          0      0     2015 |
    20730. | 10002   2017          0      0     2015 |
    20731. | 10003   2015          0      0     2015 |
    20732. | 10003   2016          0      0     2015 |
    20733. | 10003   2017         34      0     2015 |
    20734. | 10004   2015          0      1     2015 |
    20735. | 10004   2016          0      1     2015 |
    20736. | 10004   2017          0      1     2015 |
    20737. | 10005   2015          0      0     2015 |
    20738. | 10005   2016          0      0     2015 |
    20739. | 10005   2017          0      0     2015 |
           +-----------------------------------------+
    where pid is the personal ID (greater than 10000 if the person is in the 2015 cohort), peducost is the private education cost, and male is a dummy variable equal to one if the person is male.
    That is, I am using two panel datasets simultaneously (the 2005 cohort and the 2015 cohort).

    Here, I want to know whether the partial effect of gender on private education cost differs between the two cohorts.
    So, I ran a regression with an interaction term, as shown below.

    Code:
    . xtset pid year
           panel variable:  pid (unbalanced)
            time variable:  year, 2005 to 2017, but with gaps
                    delta:  1 unit
    
    . global ctrlvar "dadage dadagesq momage momagesq i.dadedu i.momedu"
    
    . 
    . gen dummy_2015 = (cohort == 2015)
    
    . xtreg peducost 1.male#1.dummy_2015 male $ctrlvar i.urbrur b2005.year i.dummy_2015, re vce(cl pid)
    note: 1.dummy_2015 omitted because of collinearity
    
    Random-effects GLS regression                   Number of obs     =     31,735
    Group variable: pid                             Number of groups  =      6,836
    
    R-sq:                                           Obs per group:
         within  = 0.1524                                         min =          1
         between = 0.2032                                         avg =        4.6
         overall = 0.1707                                         max =          6
    
                                                    Wald chi2(18)     =    3477.54
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
    
                                       (Std. Err. adjusted for 6,836 clusters in pid)
    ---------------------------------------------------------------------------------
                    |               Robust
           peducost |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ----------------+----------------------------------------------------------------
    male#dummy_2015 |
               1 1  |   -.720143    .891734    -0.81   0.419     -2.46791    1.027624
                    |
               male |   1.362174   .5742116     2.37   0.018     .2367403    2.487609
             dadage |   1.108523   .4977059     2.23   0.026     .1330374    2.084009
           dadagesq |  -.0124439   .0051745    -2.40   0.016    -.0225856   -.0023022
             momage |   1.653206   .4077752     4.05   0.000     .8539813    2.452431
           momagesq |  -.0164883   .0043175    -3.82   0.000    -.0249505   -.0080262
                    |
             dadedu |
       high_school  |   2.382997   .8907487     2.68   0.007     .6371611    4.128832
        university  |   10.57592   .9752616    10.84   0.000     8.664444     12.4874
                    |
             momedu |
       high_school  |   3.838404   .8931592     4.30   0.000     2.087844    5.588964
        university  |   12.82762   1.061476    12.08   0.000     10.74717    14.90807
                    |
             urbrur |
          big_city  |  -8.737641   .8099345   -10.79   0.000    -10.32508   -7.150199
              city  |    -9.8842   .7329264   -13.49   0.000    -11.32071   -8.447691
             rural  |  -15.79478   .8363283   -18.89   0.000    -17.43395   -14.15561
                    |
               year |
              2006  |   3.182023   .2959997    10.75   0.000     2.601874    3.762171
              2007  |   11.30476    .491985    22.98   0.000     10.34049    12.26903
              2015  |   11.79945   .6385644    18.48   0.000     10.54789    13.05101
              2016  |   13.24131   .6740927    19.64   0.000     11.92011    14.56251
              2017  |   15.41846   .7222268    21.35   0.000     14.00292      16.834
                    |
       1.dummy_2015 |          0  (omitted)
              _cons |  -54.29597   10.39154    -5.23   0.000    -74.66301   -33.92893
    ----------------+----------------------------------------------------------------
            sigma_u |  12.830406
            sigma_e |  23.443425
                rho |  .23049036   (fraction of variance due to u_i)
    ---------------------------------------------------------------------------------
    where ctrlvar holds the control variables and urbrur is the urban/rural area variable.

    The problem is that dummy_2015 (equal to one if a person is in the 2015 cohort) is omitted.
    I think dummy_2015 and the year dummies cannot be used together because of multicollinearity.
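
    The collinearity can be checked directly. The following is a minimal sketch, assuming the appended dataset and the variable names shown above (year, cohort, dummy_2015); it is an illustration and was not run on the original data.

    ```stata
    * Cross-tabulate calendar year against cohort to see how the
    * survey years line up with the two cohorts.
    tabulate year cohort

    * If the collinearity is exact, dummy_2015 equals the sum of the
    * 2015, 2016, and 2017 year indicators. assert is silent when the
    * claim holds and stops with an error when it does not.
    assert dummy_2015 == inlist(year, 2015, 2016, 2017)
    ```

    If the assert passes, dummy_2015 adds no column to the design matrix beyond what the year indicators already supply, which is why Stata drops it.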

    One solution would be to use cross-sectional data only (for example, combining the 2005 and 2015 data).
    But, for my own reasons, I want to use the two panel datasets simultaneously.

    In this case, how can I test whether the partial effect of gender differs between the two cohorts?

    Thank you for taking the time to read this question.

  • #2
    In this case, how can I test whether the partial effect of gender differs between the two cohorts?
    You have already done it. The analysis shown is fine. The omission of dummy_2015 due to collinearity with the year variables is not a problem. The information that would be carried by dummy_2015 is still in the model: it is just spread out over the year indicators instead. But it's there, and that's all that matters. The estimate of the difference between the two cohorts in the partial effect of gender is given by the coefficient of male#dummy_2015, and it is exactly the same as it would have been if dummy_2015 had not been collinear with the year variables.
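
    As a minimal sketch of how the test and the per-cohort effects could be read off, assuming the dataset and variable names from #1: the xtreg line is copied from #1, and the lincom lines are an added illustration, not part of the original analysis. Coefficient names follow the output table in #1.

    ```stata
    * Model from #1. Stata will again note that 1.dummy_2015 is omitted;
    * that is expected and harmless.
    global ctrlvar "dadage dadagesq momage momagesq i.dadedu i.momedu"
    xtreg peducost 1.male#1.dummy_2015 male $ctrlvar i.urbrur b2005.year i.dummy_2015, re vce(cl pid)

    * Difference in the male effect between cohorts.
    * This repeats the "1 1" row of the coefficient table.
    lincom 1.male#1.dummy_2015

    * Male effect in the 2005 cohort (the coefficient on male).
    lincom male

    * Male effect in the 2015 cohort. From the table in #1 the point
    * estimate is 1.362174 - .720143 = .642031; lincom adds the
    * standard error and confidence interval.
    lincom male + 1.male#1.dummy_2015
    ```

    The first lincom should reproduce the z statistic and p-value already shown for male#dummy_2015 in #1 (z = -0.81, p = 0.419), so on these results the difference between cohorts is not statistically distinguishable from zero.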


    • #3
      Clyde Schechter, I really appreciate your detailed answer. The omission problem had been bothering me for a long time, but I now understand that it is not in fact a problem. Thank you again for your contribution!
