  • Clustered standard errors or just robust? Panel data

    Hi there,

    I have to use an event-study methodology to estimate the impact of an event (childbirth) on income, separately for men and women. These are the regressions; the reference category is event time -1:

    Code:
    regress income ib2.eventtime i.age i.year if sex==1, robust cluster(id)
    regress income ib2.eventtime i.age i.year if sex==2, robust cluster(id)

    I have 316 people in my sample, with the variable -id- identifying each person. Event time runs from -2 to +5 (in years).
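Because Stata factor variables (`i.`, `ib#.`) accept only non-negative integers, an event time running from -2 to +5 cannot enter the model as is and has to be shifted first. The sketch below assumes a shift of +3, under which code 2 corresponds to event time -1, consistent with the `ib2.eventtime` base used above. The variable `reltime` and the value label `etlab` are hypothetical, and the 8-observation dataset is only a stand-in for the real panel.

```stata
* Minimal sketch (assumed coding): shift event time so that i./ib#. accept it
clear
set obs 8
generate reltime = _n - 3            // hypothetical event time in years: -2, -1, 0, ..., 5
generate eventtime = reltime + 3     // shifted codes 1, 2, ..., 8; code 2 = event time -1
* Value labels let regression output display the original event times
label define etlab 1 "-2" 2 "-1" 3 "0" 4 "1" 5 "2" 6 "3" 7 "4" 8 "5"
label values eventtime etlab
list reltime eventtime, noobs nolabel
```

With this coding, `ib2.eventtime` omits event time -1, and the remaining coefficients are read relative to the year before the birth.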

    The paper that I'm following uses only robust standard errors, not clustered standard errors. However, they have population-based data.

    Does it matter for the standard errors that I have a sample rather than population data? And if it does, is there a way to know (or test) whether I need clustered standard errors?


    Thank you. I've always struggled with standard errors, especially with panel data.

  • #2
    Sandra:
    some comments about your post:
    1) it rarely makes sense to split the regression model by a categorical variable such as gender. I would stick with a single regression model, using gender as a categorical predictor;
    2) if you have repeated measures on the same panels over the years, you should use -cluster()- standard errors (-robust cluster()- is just redundant);
    3) if you have panel data with a continuous regressand, you should start off with -xtreg-.
    Kind regards,
    Carlo
    (Stata 19.0)
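One informal way to see what point 2) means in practice is to fit the same model twice, once with heteroskedasticity-robust and once with cluster-robust standard errors, and compare the two. The sketch below uses the `nlswork` example data (the same dataset as the toy example later in this thread) with `idcode` as the person identifier, not Sandra's data; it is a side-by-side comparison, not a formal test of whether clustering is needed.

```stata
* Minimal sketch: robust vs cluster-robust SEs when each woman is observed repeatedly
webuse nlswork, clear

* Heteroskedasticity-robust: treats every person-year as independent
regress ln_wage c.age##c.age, vce(robust)
scalar se_rob = _se[age]

* Cluster-robust: allows errors of the same woman to be correlated across years
regress ln_wage c.age##c.age, vce(cluster idcode)
scalar se_cl = _se[age]

display "robust SE = " se_rob "   cluster SE = " se_cl "   ratio = " se_cl/se_rob
```

The coefficients are identical in both runs; only the standard errors change, because the two options make different assumptions about independence across observations of the same person.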



    • #3
      Carlo Lazzaro
      Thank you for taking the time to help me.

      1) The article that I'm following ran two separate regressions, one for men and one for women. That's why I'm doing the same. If I do what you suggest, should the model look like the first or the second command?

      Code:
      regress income ib2.eventtime i.age i.year i.female, cluster(id)
      
      regress income ib2.eventtime i.age i.year i.female ib2.eventtime#i.female i.age#i.female i.year#i.female, cluster(id)
      2) Do you mean, for example, observations in every year for all individuals? My panel is unbalanced; would this change anything? All individuals in the sample are observed at least 5 times over the years.

      3) Again, I'm only following the paper's do-files, which contain the regressions from my first post. I just realised that I don't see them using -xtset- in their do-file. Could that be why they don't use -xtreg-? In the paper they do state that they have a balanced panel dataset.

      I'm sorry for all these questions. I've been trying to figure this out for about 2 weeks now, so hopefully I will get it now.
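On question 1): a single pooled model reproduces the point estimates of separate regressions only when every regressor is interacted with the group indicator, as in the second command above. The sketch below checks this on the `nlswork` example data, with `collgrad` standing in for `female` purely for illustration. Standard errors generally will not match exactly, because the pooled model uses a common residual variance and different degrees of freedom. The leaner specification suggested later in the thread interacts only event time with sex, so it deliberately constrains the age and year effects to be the same for men and women.

```stata
* Minimal sketch: fully interacted pooled model vs separate regressions
* (collgrad is only a stand-in for the hypothetical female indicator)
webuse nlswork, clear

* Separate regressions, one per group
regress ln_wage c.age##c.age if collgrad == 0
scalar b_split = _b[age]
regress ln_wage c.age##c.age if collgrad == 1
scalar b_split1 = _b[age]

* One pooled model in which every term is interacted with the group indicator
regress ln_wage i.collgrad##c.age##c.age
scalar b_pool  = _b[age]                          // age slope for collgrad == 0 (base)
scalar b_pool1 = _b[age] + _b[1.collgrad#c.age]   // age slope for collgrad == 1

display "collgrad==0: split " b_split  "   pooled " b_pool
display "collgrad==1: split " b_split1 "   pooled " b_pool1
```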



      • #4
        Sandra:
        1) I would change your code a bit, since some interactions can be coded more efficiently and others do not make much sense. For instance, -age- should be entered as continuous, with both linear and squared terms, to investigate possible turning points:
        Code:
        regress income ib2.eventtime##i.female c.age##c.age i.year, cluster(id)
        2) Yes, this is the main meaning of panel data. That said, do not worry about the unbalancedness of your dataset, as Stata can handle both balanced and unbalanced panel datasets with no problem.
        3) It may well be that the authors decided not to use -xtreg, fe- (a questionable choice, as -xtreg- was developed for panel data regression) and used -regress- with a categorical indicator for -panelid- instead. As you can see in the following toy example, the coefficients and standard errors for -c.age##c.age- are the same either way.
        Code:
        . use "https://www.stata-press.com/data/r16/nlswork.dta"
        (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
        
        . xtreg ln_wage c.age##c.age if idcode<=3, fe
        
        Fixed-effects (within) regression               Number of obs     =         39
        Group variable: idcode                          Number of groups  =          3
        
        R-sq:                                           Obs per group:
             within  = 0.6382                                         min =         12
             between = 0.8744                                         avg =       13.0
             overall = 0.2765                                         max =         15
        
                                                        F(2,34)           =      29.99
        corr(u_i, Xb)  = -0.2473                        Prob > F          =     0.0000
        
        ------------------------------------------------------------------------------
             ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 age |   .2512762   .0450106     5.58   0.000     .1598037    .3427487
                     |
         c.age#c.age |  -.0037603   .0007625    -4.93   0.000    -.0053098   -.0022107
                     |
               _cons |  -2.189815   .6402959    -3.42   0.002    -3.491053   -.8885773
        -------------+----------------------------------------------------------------
             sigma_u |  .31366066
             sigma_e |  .19867104
                 rho |  .71367959   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        F test that all u_i=0: F(2, 34) = 29.72                      Prob > F = 0.0000
        
        . regress ln_wage c.age##c.age i.idcode if idcode<=3
        
              Source |       SS           df       MS      Number of obs   =        39
        -------------+----------------------------------   F(4, 34)        =     24.28
               Model |  3.83375281         4  .958438203   Prob > F        =    0.0000
            Residual |  1.34198615        34  .039470181   R-squared       =    0.7407
        -------------+----------------------------------   Adj R-squared   =    0.7102
               Total |  5.17573896        38  .136203657   Root MSE        =    .19867
        
        ------------------------------------------------------------------------------
             ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 age |   .2512762   .0450106     5.58   0.000     .1598037    .3427487
                     |
         c.age#c.age |  -.0037603   .0007625    -4.93   0.000    -.0053098   -.0022107
                     |
              idcode |
                  2  |  -.4231615   .0816747    -5.18   0.000    -.5891444   -.2571786
                  3  |  -.6126416   .0809386    -7.57   0.000    -.7771285   -.4481546
                     |
               _cons |   -1.82398   .6366167    -2.87   0.007    -3.117741   -.5302195
        ------------------------------------------------------------------------------
        
        .
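Point 1) motivates the squared age term by possible turning points. With a quadratic b_age*age + b_age2*age^2, the fitted profile turns at age = -b_age / (2*b_age2); with the toy estimates above (.2512762 and -.0037603) that is roughly 33.4 years, a maximum because the squared term is negative. A minimal sketch that recomputes it on the same toy subsample and uses -nlcom- for a delta-method standard error:

```stata
* Minimal sketch: turning point of the quadratic age profile (toy subsample above)
use "https://www.stata-press.com/data/r16/nlswork.dta", clear
xtset idcode year
xtreg ln_wage c.age##c.age if idcode<=3, fe

* Point estimate: -b_age / (2 * b_age^2)
scalar tp = -_b[age] / (2 * _b[c.age#c.age])
display "turning point (years of age): " tp

* Delta-method standard error and confidence interval for the same quantity
nlcom -_b[age] / (2 * _b[c.age#c.age])
```

A turning point is only informative if it falls inside the observed age range of the sample.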
        Kind regards,
        Carlo
        (Stata 19.0)
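Since the event-study model clusters on the person, it may help to know that the equivalence in the toy example carries over to the coefficients but not exactly to clustered standard errors: when panels are nested within clusters, -xtreg, fe- does not count the absorbed fixed effects in its small-sample adjustment, whereas -regress- with person dummies does, so the latter reports somewhat larger SEs. The sketch below keeps the first 50 women in `nlswork`, an arbitrary cut chosen only to keep the number of dummies small.

```stata
* Minimal sketch: xtreg, fe vs regress with person dummies, both clustered by person
use "https://www.stata-press.com/data/r16/nlswork.dta", clear
keep if idcode <= 50                 // small subsample keeps the dummy count low
xtset idcode year                    // xtreg needs the panel identifier declared

xtreg ln_wage c.age##c.age, fe vce(cluster idcode)
scalar b_xt  = _b[age]
scalar se_xt = _se[age]

regress ln_wage c.age##c.age i.idcode, vce(cluster idcode)
scalar b_reg  = _b[age]
scalar se_reg = _se[age]

* Same point estimate; SEs differ only through the small-sample adjustment
display "coef:  xtreg " b_xt  "   regress " b_reg
display "SE:    xtreg " se_xt "   regress " se_reg
```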



        • #5
          Carlo Lazzaro

          I think I finally get it! 😃 You've helped me so much, thank you.



          • #6
            Sandra:
            that's what listers are for!
            I wish you and your research all the best.
            Kind regards,
            Carlo
            (Stata 19.0)

