  • Testing difference in means across groups in panel data

    Hello all,

    Could you help me test for a difference in means across groups in panel data? After reading online, I found two options: (1) run a mixed-effects regression and then test the group parameters; (2) collapse the data to one observation per country (removing the panel dimension) and then run a one-way ANOVA. Here is a data sample and my results so far:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str24 countryname int year float value byte group long ID
    "Albania"                  2018 9.621069 2  1
    "Albania"                  2019 9.562582 2  1
    "Albania"                  2020 9.668357 2  1
    "Albania"                  2021 9.583389 2  1
    "Albania"                  2022 9.226106 2  1
    "Algeria"                  2018 1.134864 3  2
    "Algeria"                  2019 1.039727 3  2
    "Algeria"                  2020 1.166164 3  2
    "Algeria"                  2021 1.096307 3  2
    "Algeria"                  2022 .8507636 3  2
    "Argentina"                2018 .0995468 2  3
    "Argentina"                2019 .1253808 2  3
    "Argentina"                2020 .1680119 2  3
    "Argentina"                2021 .1846933 2  3
    "Argentina"                2022 .2008211 2  3
    "Armenia"                  2018  11.9427 2  4
    "Armenia"                  2019 11.21892 2  4
    "Armenia"                  2020 10.49706 2  4
    "Armenia"                  2021 11.21718 2  4
    "Armenia"                  2022 10.42774 2  4
    "Azerbaijan"               2018 2.601838 2  5
    "Azerbaijan"               2019 2.646993 2  5
    "Azerbaijan"               2020 3.286457 2  5
    "Azerbaijan"               2021 2.784541 2  5
    "Azerbaijan"               2022 5.017754 2  5
    "Bahamas, The"             2018        0 1  6
    "Bahamas, The"             2019        0 1  6
    "Bahamas, The"             2020 .5208385 1  6
    "Bahamas, The"             2021 .4601183 1  6
    "Bahamas, The"             2022 .4488347 1  6
    "Barbados"                 2018 1.666766 1  7
    "Barbados"                 2019 1.594726 1  7
    "Barbados"                 2020 1.797415 1  7
    "Barbados"                 2021 1.730127 1  7
    "Barbados"                 2022 1.494327 1  7
    "Burundi"                  2018 1.811973 2  8
    "Burundi"                  2019 1.875734 2  8
    "Burundi"                  2020 1.823942 2  8
    "Burundi"                  2021 1.741071 2  8
    "Burundi"                  2022 1.447519 2  8
    "Central African Republic" 2018        0 1  9
    "Central African Republic" 2019        0 1  9
    "Central African Republic" 2020        0 1  9
    "Central African Republic" 2021        0 1  9
    "Central African Republic" 2022        0 1  9
    "China"                    2018 .1749246 3 10
    "China"                    2019 .1281117 3 10
    "China"                    2020  .128517 3 10
    "China"                    2021 .1261466 3 10
    "China"                    2022 .1453294 3 10
    "Comoros"                  2018 14.52057 1 11
    "Comoros"                  2019 14.10411 1 11
    "Comoros"                  2020 18.50192 1 11
    "Comoros"                  2021 22.21997 1 11
    "Comoros"                  2022 22.68288 1 11
    "Egypt, Arab Rep"          2018 9.716986 1 12
    "Egypt, Arab Rep"          2019 8.403885 1 12
    "Egypt, Arab Rep"          2020 7.712747 1 12
    "Egypt, Arab Rep"          2021 7.414432 1 12
    "Egypt, Arab Rep"          2022 5.942934 1 12
    "Gabon"                    2018 .1094308 3 13
    "Gabon"                    2019 .1093848 3 13
    "Gabon"                    2020  .120526 3 13
    "Gabon"                    2021 .0912953 3 13
    "Gabon"                    2022 .0875962 3 13
    "Mauritania"               2018 .8080745 3 14
    "Mauritania"               2019 .8164766 3 14
    "Mauritania"               2020 2.041765 3 14
    "Mauritania"               2021  .142859 3 14
    "Mauritania"               2022 1.117382 3 14
    "Seychelles"               2018 1.430288 3 15
    "Seychelles"               2019 1.437503 3 15
    "Seychelles"               2020 .8432241 3 15
    "Seychelles"               2021 .7392818 3 15
    "Seychelles"               2022  .632084 3 15
    end
    label values ID ID
    label def ID 1 "Albania", modify
    label def ID 2 "Algeria", modify
    label def ID 3 "Argentina", modify
    label def ID 4 "Armenia", modify
    label def ID 5 "Azerbaijan", modify
    label def ID 6 "Bahamas, The", modify
    label def ID 7 "Barbados", modify
    label def ID 8 "Burundi", modify
    label def ID 9 "Central African Republic", modify
    label def ID 10 "China", modify
    label def ID 11 "Comoros", modify
    label def ID 12 "Egypt, Arab Rep", modify
    label def ID 13 "Gabon", modify
    label def ID 14 "Mauritania", modify
    label def ID 15 "Seychelles", modify

    This is the result using the mixed-effects model:

    Code:
    mixed value i.group || ID: // Mixed effect model
    
    Performing EM optimization ...
    
    Performing gradient-based optimization: 
    Iteration 0:  Log likelihood = -151.58532  
    Iteration 1:  Log likelihood = -151.58532  
    
    Computing standard errors ...
    
    Mixed-effects ML regression                          Number of obs    =     75
    Group variable: ID                                   Number of groups =     15
                                                         Obs per group:
                                                                      min =      5
                                                                      avg =    5.0
                                                                      max =      5
                                                         Wald chi2(2)     =   3.33
    Log likelihood = -151.58532                          Prob > chi2      = 0.1894
    
    ------------------------------------------------------------------------------
           value | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           group |
              2  |  -.4860883   3.008294    -0.16   0.872    -6.382236    5.410059
              3  |  -4.976622   3.008294    -1.65   0.098    -10.87277     .919525
                 |
           _cons |   5.637344   2.127185     2.65   0.008     1.468138    9.806549
    ------------------------------------------------------------------------------
    
    ------------------------------------------------------------------------------
      Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
    -----------------------------+------------------------------------------------
    ID: Identity                 |
                      var(_cons) |   22.34822   8.261486      10.82876    46.12192
    -----------------------------+------------------------------------------------
                   var(Residual) |   1.381789    .252279      .9661273    1.976283
    ------------------------------------------------------------------------------
    LR test vs. linear model: chibar2(01) = 147.18        Prob >= chibar2 = 0.0000
    
    . testparm i.group
    
     ( 1)  [value]2.group = 0
     ( 2)  [value]3.group = 0
    
               chi2(  2) =    3.33
             Prob > chi2 =    0.1894

    And this is the result using one-way ANOVA:
    Code:
    collapse (mean) value, by(countryname group)
    
    . oneway value group, tabulate
    
                |       Summary of (mean) value
          group |        Mean   Std. dev.       Freq.
    ------------+------------------------------------
              1 |   5.6373434   7.8125257           5
              2 |   5.1512553   4.8542716           5
              3 |   .66072102   .49250825           5
    ------------+------------------------------------
          Total |   3.8164399   5.4422174          15
    
                            Analysis of variance
        Source              SS         df      MS            F     Prob > F
    ------------------------------------------------------------------------
    Between groups      75.2799162      2   37.6399581      1.33     0.3006
     Within groups      339.368301     12   28.2806917
    ------------------------------------------------------------------------
        Total           414.648217     14   29.6177298
    
    Bartlett's equal-variances test: chi2(2) =  15.0187    Prob>chi2 = 0.001
    
    . pwmean value, over(group) mcompare(tukey) effects
    
    Pairwise comparisons of means with equal variances
    
    Over: group
    
    ---------------------------
                 |    Number of
                 |  comparisons
    -------------+-------------
           group |            3
    ---------------------------
    
    ------------------------------------------------------------------------------
                 |                              Tukey                Tukey
           value |   Contrast   Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           group |
         2 vs 1  |   -.486088   3.363373    -0.14   0.989    -9.459108    8.486932
         3 vs 1  |  -4.976622   3.363373    -1.48   0.334    -13.94964    3.996398
         3 vs 2  |  -4.490534   3.363373    -1.34   0.404    -13.46355    4.482486
    ------------------------------------------------------------------------------

    My goal is to compare group 1 with group 3. The ANOVA approach seems to answer this through the 3 vs 1 pairwise comparison (p-value 0.334). Or should I simply look at the coefficient and significance of
    Code:
     [value]3.group
    (p-value 0.098) in the mixed effect regression above? Thank you for your help.


    Best,

    Abdan
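
    For readers comparing the two options in #1, the group contrasts can also be requested directly after -mixed-, without collapsing the data. The following is only a minimal sketch, assuming the -dataex- example above is in memory and has not yet been collapsed. The Bonferroni adjustment is an illustrative choice, not something proposed in the thread.

    ```stata
    * Assumes the -dataex- example from #1 is in memory (75 obs, before -collapse-)
    mixed value i.group || ID:

    * Group 3 vs group 1 (the base level): reproduces the 3.group row of the table
    lincom [value]3.group

    * Group 3 vs group 2, which the coefficient table does not report directly
    lincom [value]3.group - [value]2.group

    * All pairwise differences, here with a Bonferroni adjustment
    pwcompare group, effects mcompare(bonferroni)
    ```

    The first -lincom- line is the same contrast as the 3.group coefficient in the -mixed- output, so it carries the same standard error and p-value.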





  • #2
    Abdan:
    do you mean something along the following lines?
    Code:
    . xtset ID year
    . xtreg value c.year##c.year i.group , re
    note: c.year#c.year omitted because of collinearity.
    
    Random-effects GLS regression                   Number of obs     =         75
    Group variable: ID                              Number of groups  =         15
    
    R-squared:                                      Obs per group:
         Within  = 0.0172                                         min =          5
         Between = 0.1816                                         avg =        5.0
         Overall = 0.1752                                         max =          5
    
                                                    Wald chi2(3)      =       3.70
    corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.2963
    
    -------------------------------------------------------------------------------
            value | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    --------------+----------------------------------------------------------------
             year |   .0975597   .0959518     1.02   0.309    -.0905023    .2856218
                  |
    c.year#c.year |          0  (omitted)
                  |
            group |
               2  |  -.4860883   3.363373    -0.14   0.885    -7.078178    6.106001
               3  |  -4.976622   3.363373    -1.48   0.139    -11.56871    1.615467
                  |
            _cons |  -191.4333   193.8372    -0.99   0.323    -571.3472    188.4806
    --------------+----------------------------------------------------------------
          sigma_u |  5.2919269
          sigma_e |  1.1751645
              rho |  .95300364   (fraction of variance due to u_i)
    -------------------------------------------------------------------------------
    
    . predict fitted, xb
    
    . ttest fitted if group!=2, by( group) unequal
    
    Two-sample t test with unequal variances
    ------------------------------------------------------------------------------
       Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
    ---------+--------------------------------------------------------------------
           1 |      25    5.637344    .0281631    .1408153    5.579218    5.695469
           3 |      25     .660721    .0281631    .1408153    .6025953    .7188468
    ---------+--------------------------------------------------------------------
    Combined |      50    3.149032     .356019    2.517435    2.433585    3.864479
    ---------+--------------------------------------------------------------------
        diff |            4.976622    .0398286                4.896542    5.056703
    ------------------------------------------------------------------------------
        diff = mean(1) - mean(3)                                      t = 124.9510
    H0: diff = 0                     Satterthwaite's degrees of freedom =       48
    
        Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
     Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000
    
    .
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Thank you very much, Carlo. Yes, I think this is what I need. Do you know of any reference stating that this method (i.e., estimating an FE/RE model and then obtaining the predicted values) is an appropriate way to test differences in means across groups in panel data? I would like to cite it. Thanks!



      • #4
        Abdan:
        not that I know.
        All in all, this is a routine postestimation exercise.
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Sorry Carlo, just a quick question. Why did you use c.year##c.year instead of i.year in #2? I am testing the difference in means for several variables, and some of the results are very different if I use c.year##c.year instead of i.year. Thank you.

          Best regards,

          Abdan



          • #6
            Abdan:
            the toy example included an interaction of -year-, treated as a continuous variable, with itself, as you pointed out. The aim of including a linear plus a squared term is to investigate whether there is a non-linear relationship between this regressor and the dependent variable.
            That said, you can run your code without interaction, following the very same steps.
            Kind regards,
            Carlo
            (Stata 19.0)
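
            A side note on the "omitted because of collinearity" message in #2: over 2018 to 2022, year and its square are almost perfectly correlated, which is probably why Stata dropped the squared term. A minimal sketch of one common workaround, centering the time variable, is given below. It assumes the -dataex- example from #1 is in memory, and yearc is a hypothetical helper variable that does not appear in the thread.

            ```stata
            * Assumes the -dataex- example from #1 is in memory
            xtset ID year

            * yearc is a hypothetical helper: year centered at the middle wave (2020)
            generate yearc = year - 2020

            * Linear and squared terms in centered year; the square is no longer
            * nearly collinear with the linear term
            xtreg value c.yearc##c.yearc i.group, re

            * Joint test of the linear and squared time terms
            testparm c.yearc c.yearc#c.yearc
            ```

            Centering only rescales the time trend. The group coefficients keep the same interpretation as in #2.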



            • #7
              Code:
              reghdfe value i.group , absorb(year)
              margins, over(group) post
              test 1.group = 2.group
              test 1.group = 3.group
              test 2.group = 3.group



              • #8
                Abdan:
                making #6 quantitative:
                Code:
                . xtset ID year
                
                Panel variable: ID (strongly balanced)
                 Time variable: year, 2018 to 2022
                         Delta: 1 unit
                
                . xtreg value year i.group , re
                
                Random-effects GLS regression                   Number of obs     =         75
                Group variable: ID                              Number of groups  =         15
                
                R-squared:                                      Obs per group:
                     Within  = 0.0172                                         min =          5
                     Between = 0.1816                                         avg =        5.0
                     Overall = 0.1752                                         max =          5
                
                                                                Wald chi2(3)      =       3.70
                corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.2963
                
                ------------------------------------------------------------------------------
                       value | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                -------------+----------------------------------------------------------------
                        year |   .0975597   .0959518     1.02   0.309    -.0905023    .2856218
                             |
                       group |
                          2  |  -.4860883   3.363373    -0.14   0.885    -7.078178    6.106001
                          3  |  -4.976622   3.363373    -1.48   0.139    -11.56871    1.615467
                             |
                       _cons |  -191.4333   193.8372    -0.99   0.323    -571.3472    188.4806
                -------------+----------------------------------------------------------------
                     sigma_u |  5.2919269
                     sigma_e |  1.1751645
                         rho |  .95300364   (fraction of variance due to u_i)
                ------------------------------------------------------------------------------
                
                . predict fitted, xb
                
                . ttest fitted if group!=2, by( group) unequal
                
                Two-sample t test with unequal variances
                ------------------------------------------------------------------------------
                   Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
                ---------+--------------------------------------------------------------------
                       1 |      25    5.637344    .0281631    .1408153    5.579218    5.695469
                       3 |      25     .660721    .0281631    .1408153    .6025953    .7188468
                ---------+--------------------------------------------------------------------
                Combined |      50    3.149032     .356019    2.517435    2.433585    3.864479
                ---------+--------------------------------------------------------------------
                    diff |            4.976622    .0398286                4.896542    5.056703
                ------------------------------------------------------------------------------
                    diff = mean(1) - mean(3)                                      t = 124.9510
                H0: diff = 0                     Satterthwaite's degrees of freedom =       48
                
                    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
                 Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000
                
                .
                Kind regards,
                Carlo
                (Stata 19.0)



                • #9
                  Hello Carlo, George, and all,

                  May I ask for your advice again? (1) What is the difference between the xtreg and reghdfe commands below? Is reghdfe also estimating a random-effects model here? (2) Should I cluster the standard errors on panel ID? If I don't cluster on panel ID, both xtreg and reghdfe show a statistically significant difference between groups 1 and 3. If I do cluster, xtreg still shows a statistically significant difference, but reghdfe does not. The output follows:

                  A. Without clustering at panel ID
                  A.1. Using xtreg re
                  Code:
                  xtreg value i.year i.group , re vce(robust)
                  predict fitted1, xb
                  ttest fitted1 if group!=2, by(group) unequal
                  
                  
                  Two-sample t test with unequal variances
                  ------------------------------------------------------------------------------
                     Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
                  ---------+--------------------------------------------------------------------
                         1 |      25    5.637343    .0346903    .1734515    5.565746    5.708941
                         3 |      25     .660721    .0346903    .1734515    .5891238    .7323183
                  ---------+--------------------------------------------------------------------
                  Combined |      50    3.149032    .3563011     2.51943    2.433018    3.865046
                  ---------+--------------------------------------------------------------------
                      diff |            4.976622    .0490595                4.877982    5.075263
                  ------------------------------------------------------------------------------
                      diff = mean(1) - mean(3)                                      t = 101.4406
                  H0: diff = 0                     Satterthwaite's degrees of freedom =       48
                  
                      Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
                   Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000

                  A.2. Using reghdfe
                  Code:
                  reghdfe value i.group , absorb(year) vce(robust)
                  margins, over(group) post
                  test 1.group = 3.group
                  
                  
                   reghdfe value i.group , absorb(year) vce(robust)
                  (MWFE estimator converged in 1 iterations)
                  
                  HDFE Linear regression                            Number of obs   =         75
                  Absorbing 1 HDFE group                            F(   2,     68) =      16.97
                                                                    Prob > F        =     0.0000
                                                                    R-squared       =     0.1756
                                                                    Adj R-squared   =     0.1028
                                                                    Within R-sq.    =     0.1747
                                                                    Root MSE        =     5.1128
                  
                  ------------------------------------------------------------------------------
                               |               Robust
                         value | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                  -------------+----------------------------------------------------------------
                         group |
                            2  |  -.4860883   1.766792    -0.28   0.784    -4.011666    3.039489
                            3  |  -4.976622   1.514545    -3.29   0.002    -7.998849   -1.954396
                               |
                         _cons |   5.637344   1.509466     3.73   0.000     2.625252    8.649435
                  ------------------------------------------------------------------------------
                  
                  
                  
                   margins, over(group) post
                  
                  Predictive margins                                          Number of obs = 75
                  Model VCE: Robust
                  
                  Expression: Linear prediction, predict()
                  Over:       group
                  
                  ------------------------------------------------------------------------------
                               |            Delta-method
                               |     Margin   std. err.      z    P>|z|     [95% conf. interval]
                  -------------+----------------------------------------------------------------
                         group |
                            1  |   5.637344   1.509466     3.73   0.000     2.678845    8.595842
                            2  |   5.151255    .918186     5.61   0.000     3.351644    6.950867
                            3  |    .660721    .123929     5.33   0.000     .4178246    .9036175
                  ------------------------------------------------------------------------------
                  
                  . test 1.group = 3.group
                  
                   ( 1)  1bn.group - 3.group = 0
                  
                             chi2(  1) =   10.80
                           Prob > chi2 =    0.0010

                  B. With clustering at panel ID
                  B.1. Using xtreg re
                  Code:
                  xtset ID year
                  xtreg value i.year i.group , re vce(cluster ID)
                  predict fitted2, xb
                  ttest fitted2 if group!=2, by(group) unequal
                  
                  
                  Two-sample t test with unequal variances
                  ------------------------------------------------------------------------------
                     Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
                  ---------+--------------------------------------------------------------------
                         1 |      25    5.637343    .0346903    .1734515    5.565746    5.708941
                         3 |      25     .660721    .0346903    .1734515    .5891238    .7323183
                  ---------+--------------------------------------------------------------------
                  Combined |      50    3.149032    .3563011     2.51943    2.433018    3.865046
                  ---------+--------------------------------------------------------------------
                      diff |            4.976622    .0490595                4.877982    5.075263
                  ------------------------------------------------------------------------------
                      diff = mean(1) - mean(3)                                      t = 101.4406
                  H0: diff = 0                     Satterthwaite's degrees of freedom =       48
                  
                      Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
                   Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000
                  B.2. Using reghdfe
                  Code:
                  reghdfe value i.group, absorb(year) vce(cluster ID)
                  margins, over(group) post
                  test 1.group = 3.group
                  
                  
                   margins, over(group) post
                  
                  Predictive margins                                          Number of obs = 75
                  Model VCE: Robust
                  
                  Expression: Linear prediction, predict()
                  Over:       group
                  
                  ------------------------------------------------------------------------------
                               |            Delta-method
                               |     Margin   std. err.      z    P>|z|     [95% conf. interval]
                  -------------+----------------------------------------------------------------
                         group |
                            1  |   5.637344   3.374384     1.67   0.095    -.9763273    12.25101
                            2  |   5.151255   2.096655     2.46   0.014     1.041886    9.260624
                            3  |    .660721    .212724     3.11   0.002     .2437896    1.077652
                  ------------------------------------------------------------------------------
                  
                  test 1.group = 3.group
                  
                   ( 1)  1bn.group - 3.group = 0
                  
                             chi2(  1) =    2.17
                           Prob > chi2 =    0.1410

                  Thank you!
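
                  A note on why A.1 and B.1 above are identical: -predict, xb- returns point predictions, and these depend only on the estimated coefficients, which the vce() option does not change. A minimal sketch, assuming the -dataex- example from #1 is in memory, that checks this and requests the group contrasts directly from the model so that the chosen standard errors are actually used. The variable names xb_cluster and xb_default are hypothetical.

                  ```stata
                  * Assumes the -dataex- example from #1 is in memory
                  xtset ID year

                  * Random-effects model with standard errors clustered on the panel ID
                  xtreg value i.year i.group, re vce(cluster ID)
                  predict double xb_cluster, xb

                  * Group 3 vs group 1, using the cluster-robust standard error
                  lincom 3.group

                  * Group 3 vs group 2
                  lincom 3.group - 2.group

                  * Same model with the default standard errors
                  xtreg value i.year i.group, re
                  predict double xb_default, xb

                  * The fitted values agree up to rounding, so a t test on them
                  * cannot reflect the choice of vce()
                  assert reldif(xb_cluster, xb_default) < 1e-8
                  ```

                  In -xtreg, re-, vce(robust) is treated as clustering on the panel variable, which is a second reason the two -xtreg- runs above cannot differ.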



                  • #10
                    Abdan:
                    1) -xtreg, re- uses the -re- estimator, whereas the community-contributed command -reghdfe- uses the -fe- one;
                    2) with 3 clusters only, stick with the default standard error (that said, statistical significance is not a scientific criterion for choosing between different statistics);
                    If I run -xtreg, re- and then -reghdfe-, the results differ:
                    Code:
                    . use "https://www.stata-press.com/data/r18/nlswork.dta"
                    (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
                    
                    . xtreg ln_wage i.year i.nev_mar, re rob
                    
                    Random-effects GLS regression                   Number of obs     =     28,518
                    Group variable: idcode                          Number of groups  =      4,711
                    
                    R-squared:                                      Obs per group:
                         Within  = 0.1071                                         min =          1
                         Between = 0.0769                                         avg =        6.1
                         Overall = 0.0710                                         max =         15
                    
                                                                    Wald chi2(15)     =    1253.23
                    corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000
                    
                                                 (Std. err. adjusted for 4,711 clusters in idcode)
                    ------------------------------------------------------------------------------
                                 |               Robust
                         ln_wage | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
                    -------------+----------------------------------------------------------------
                            year |
                             69  |   .0839846   .0102013     8.23   0.000     .0639906    .1039787
                             70  |   .0662482   .0104014     6.37   0.000     .0458617    .0866346
                             71  |   .1147831   .0110154    10.42   0.000     .0931933     .136373
                             72  |   .1269705   .0119536    10.62   0.000      .103542    .1503991
                             73  |   .1402849   .0120654    11.63   0.000     .1166371    .1639326
                             75  |   .1514261   .0122272    12.38   0.000     .1274613     .175391
                             77  |   .2110967   .0126637    16.67   0.000     .1862763    .2359172
                             78  |   .2493533   .0131938    18.90   0.000     .2234938    .2752127
                             80  |     .25596   .0135462    18.90   0.000       .22941      .28251
                             82  |   .2730926   .0136069    20.07   0.000     .2464237    .2997616
                             83  |   .3004199   .0141012    21.30   0.000      .272782    .3280578
                             85  |   .3526673    .013747    25.65   0.000     .3257237    .3796108
                             87  |   .3684121   .0141554    26.03   0.000      .340668    .3961561
                             88  |   .4240029   .0152481    27.81   0.000     .3941171    .4538887
                                 |
                       1.nev_mar |  -.0302251   .0090356    -3.35   0.001    -.0479345   -.0125158
                           _cons |   1.442927    .011602   124.37   0.000     1.420188    1.465666
                    -------------+----------------------------------------------------------------
                         sigma_u |  .36922697
                         sigma_e |  .30264631
                             rho |  .59813335   (fraction of variance due to u_i)
                    ------------------------------------------------------------------------------
                    
                    . margins, over(nev_ma) post
                    
                    Predictive margins                                      Number of obs = 28,518
                    Model VCE: Robust
                    
                    Expression: Linear prediction, predict()
                    Over:       nev_mar
                    
                    ------------------------------------------------------------------------------
                                 |            Delta-method
                                 |     Margin   std. err.      z    P>|z|     [95% conf. interval]
                    -------------+----------------------------------------------------------------
                         nev_mar |
                              0  |   1.680716   .0062094   270.67   0.000     1.668546    1.692886
                              1  |   1.580663   .0084181   187.77   0.000     1.564164    1.597162
                    ------------------------------------------------------------------------------
                    
                    . reghdfe ln_wage i.nev_ma , absorb(year) vce(robust)
                    (MWFE estimator converged in 1 iterations)
                    
                    HDFE Linear regression                            Number of obs   =     28,518
                    Absorbing 1 HDFE group                            F(   1,  28502) =       4.41
                                                                      Prob > F        =     0.0357
                                                                      R-squared       =     0.0731
                                                                      Adj R-squared   =     0.0726
                                                                      Within R-sq.    =     0.0002
                                                                      Root MSE        =     0.4605
                    
                    ------------------------------------------------------------------------------
                                 |               Robust
                         ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                    -------------+----------------------------------------------------------------
                       1.nev_mar |    .014005   .0066661     2.10   0.036     .0009392    .0270709
                           _cons |   1.671799   .0030797   542.84   0.000     1.665763    1.677835
                    ------------------------------------------------------------------------------
                    
                    Absorbed degrees of freedom:
                    -----------------------------------------------------+
                     Absorbed FE | Categories  - Redundant  = Num. Coefs |
                    -------------+---------------------------------------|
                            year |        15           0          15     |
                    -----------------------------------------------------+
                    
                    . margins, over(nev_ma) post
                    
                    Predictive margins                                      Number of obs = 28,518
                    Model VCE: Robust
                    
                    Expression: Linear prediction, predict()
                    Over:       nev_mar
                    
                    ------------------------------------------------------------------------------
                                 |            Delta-method
                                 |     Margin   std. err.      z    P>|z|     [95% conf. interval]
                    -------------+----------------------------------------------------------------
                         nev_mar |
                              0  |   1.671799   .0030797   542.84   0.000     1.665763    1.677835
                              1  |   1.685804   .0058983   285.81   0.000     1.674244    1.697364
                    ------------------------------------------------------------------------------
                    
                    . 
                    
                    However, the way -reghdfe- was performed is wrong, as it assumes that the -panelid- is -year-, as the following -xtreg, fe- results show:
                    
                    . xtset year
                    
                    Panel variable: year (unbalanced)
                    
                    . xtreg ln_wage i.year i.nev_mar, fe rob
                    note: 69.year omitted because of collinearity.
                    note: 70.year omitted because of collinearity.
                    note: 71.year omitted because of collinearity.
                    note: 72.year omitted because of collinearity.
                    note: 73.year omitted because of collinearity.
                    note: 75.year omitted because of collinearity.
                    note: 77.year omitted because of collinearity.
                    note: 78.year omitted because of collinearity.
                    note: 80.year omitted because of collinearity.
                    note: 82.year omitted because of collinearity.
                    note: 83.year omitted because of collinearity.
                    note: 85.year omitted because of collinearity.
                    note: 87.year omitted because of collinearity.
                    note: 88.year omitted because of collinearity.
                    
                    Fixed-effects (within) regression               Number of obs     =     28,518
                    Group variable: year                            Number of groups  =         15
                    
                    R-squared:                                      Obs per group:
                         Within  = 0.0002                                         min =      1,232
                         Between = 0.8963                                         avg =    1,901.2
                         Overall = 0.0032                                         max =      2,272
                    
                                                                    F(1, 14)          =       0.85
                    corr(u_i, Xb) = -0.2508                         Prob > F          =     0.3718
                    
                                                      (Std. err. adjusted for 15 clusters in year)
                    ------------------------------------------------------------------------------
                                 |               Robust
                         ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                    -------------+----------------------------------------------------------------
                            year |
                             69  |          0  (omitted)
                             70  |          0  (omitted)
                             71  |          0  (omitted)
                             72  |          0  (omitted)
                             73  |          0  (omitted)
                             75  |          0  (omitted)
                             77  |          0  (omitted)
                             78  |          0  (omitted)
                             80  |          0  (omitted)
                             82  |          0  (omitted)
                             83  |          0  (omitted)
                             85  |          0  (omitted)
                             87  |          0  (omitted)
                             88  |          0  (omitted)
                                 |
                       1.nev_mar |    .014005   .0151776     0.92   0.372    -.0185477    .0465578
                           _cons |   1.671799    .003486   479.58   0.000     1.664322    1.679276
                    -------------+----------------------------------------------------------------
                         sigma_u |  .13727056
                         sigma_e |  .46046151
                             rho |  .08161896   (fraction of variance due to u_i)
                    ------------------------------------------------------------------------------
                    
                    .
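If the aim is for -reghdfe- to respect the panel structure, one possibility is to absorb the panel identifier together with -year-. The lines below are only a minimal sketch, assuming that the output above comes from the -nlswork- dataset (loaded here via -webuse-), that -idcode- is the panel identifier, and that the community-contributed -reghdfe- is installed. Under these assumptions the coefficient on -1.nev_mar- should agree with the one from -xtreg, fe-:

```stata
* Assumption: nlswork is the dataset behind the output above
webuse nlswork, clear
xtset idcode year

* Within estimator with year dummies, SEs clustered on the panel id
xtreg ln_wage i.year i.nev_mar, fe vce(cluster idcode)
scalar b_fe = _b[1.nev_mar]

* Same fixed effects absorbed by -reghdfe- (requires: ssc install reghdfe)
reghdfe ln_wage i.nev_mar, absorb(idcode year) vce(cluster idcode)
scalar b_hdfe = _b[1.nev_mar]

display "xtreg, fe: " b_fe "   reghdfe: " b_hdfe
```

The point estimates should coincide; the reported number of observations may differ because -reghdfe- drops singleton groups.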
                    Kind regards,
                    Carlo
                    (Stata 19.0)



                    • #11
                      Hello Carlo,

                      Thank you for your reply. I think I figured it out now (using my original sample code):

                      1. reghdfe value i.group, absorb(year) vce(robust) (as suggested in #7 by George): this is basically pooled OLS controlling for i.year. I think I shouldn't use it because it does not take the panel structure into account.
                      2. xtreg value i.year i.group, re vce(robust): this takes the panel structure of the data into account, assuming RE.
                      3. xtreg value i.year i.group, re vce(cluster ID): this is identical to number 2.
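For the test of equal means across groups itself, a joint Wald test on the group indicators after the random-effects model may be enough. This is a rough sketch only: it assumes -nlswork- as a stand-in for the original data, with the three-level variable -race- playing the role of -group- and -idcode- the role of -ID-.

```stata
* Assumption: nlswork stands in for the original data;
* race (3 levels) plays the role of -group-, idcode that of -ID-
webuse nlswork, clear
xtset idcode year

* Random-effects model with year dummies, SEs clustered on the panel id
xtreg ln_wage i.year i.race, re vce(cluster idcode)

* Joint Wald test that all group coefficients are zero,
* i.e. no difference in year-adjusted means across groups
testparm i.race
display "chi2(" r(df) ") = " r(chi2) "   p = " r(p)
```

With the original variables the analogous calls would be xtreg value i.year i.group, re vce(cluster ID) followed by testparm i.group.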

                      In your example above, the following two commands give identical results:
                      Code:
                      xtreg ln_wage i.year i.nev_mar, re vce(robust)
                      xtreg ln_wage i.year i.nev_mar, re vce(cluster idcode)
                      Similarly, the following two commands give identical results:
                      Code:
                      reg ln_wage i.year i.nev_mar, rob
                      reghdfe ln_wage i.nev_ma , absorb(year) vce(robust)
                      Thank you!


                      Abdan



                      • #12
                        Abdan:
                        Under -xtreg-, standard errors with the options -robust- and -vce(cluster idcode)- are identical because both options invoke cluster-robust standard errors. Please note that this does not hold for -regress-.
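This can be checked directly. The sketch below assumes the -nlswork- dataset with -idcode- as the panel identifier (as in the commands quoted in #11) and simply stores the standard error of -1.nev_mar- under each option:

```stata
* Assumption: nlswork with idcode as panel id, as in the commands quoted in #11
webuse nlswork, clear
xtset idcode year

* -xtreg-: vce(robust) is treated as clustering on the panel variable
xtreg ln_wage i.year i.nev_mar, re vce(robust)
scalar se_xt_rob = _se[1.nev_mar]
xtreg ln_wage i.year i.nev_mar, re vce(cluster idcode)
scalar se_xt_cl = _se[1.nev_mar]

* -regress-: vce(robust) is heteroskedasticity-robust only, not clustered
regress ln_wage i.year i.nev_mar, vce(robust)
scalar se_reg_rob = _se[1.nev_mar]
regress ln_wage i.year i.nev_mar, vce(cluster idcode)
scalar se_reg_cl = _se[1.nev_mar]

display "xtreg:   robust = " se_xt_rob "   cluster = " se_xt_cl
display "regress: robust = " se_reg_rob "   cluster = " se_reg_cl
```

The two -xtreg- standard errors should match, whereas the two -regress- standard errors should differ.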
                        Kind regards,
                        Carlo
                        (Stata 19.0)
