Testing difference in means across groups in panel data

Abdan Syakura

Join Date: Nov 2018
Posts: 58


22 Jun 2024, 16:49

Hello all,

Can you help me test the difference in means across groups in panel data? From reading online, I found two options: (1) run a mixed-effects regression and then test the parameters; (2) collapse the data (to remove the panel dimension) and then run a one-way ANOVA. Here are a data sample and my results so far:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str24 countryname int year float value byte group long ID
"Albania"                  2018 9.621069 2  1
"Albania"                  2019 9.562582 2  1
"Albania"                  2020 9.668357 2  1
"Albania"                  2021 9.583389 2  1
"Albania"                  2022 9.226106 2  1
"Algeria"                  2018 1.134864 3  2
"Algeria"                  2019 1.039727 3  2
"Algeria"                  2020 1.166164 3  2
"Algeria"                  2021 1.096307 3  2
"Algeria"                  2022 .8507636 3  2
"Argentina"                2018 .0995468 2  3
"Argentina"                2019 .1253808 2  3
"Argentina"                2020 .1680119 2  3
"Argentina"                2021 .1846933 2  3
"Argentina"                2022 .2008211 2  3
"Armenia"                  2018  11.9427 2  4
"Armenia"                  2019 11.21892 2  4
"Armenia"                  2020 10.49706 2  4
"Armenia"                  2021 11.21718 2  4
"Armenia"                  2022 10.42774 2  4
"Azerbaijan"               2018 2.601838 2  5
"Azerbaijan"               2019 2.646993 2  5
"Azerbaijan"               2020 3.286457 2  5
"Azerbaijan"               2021 2.784541 2  5
"Azerbaijan"               2022 5.017754 2  5
"Bahamas, The"             2018        0 1  6
"Bahamas, The"             2019        0 1  6
"Bahamas, The"             2020 .5208385 1  6
"Bahamas, The"             2021 .4601183 1  6
"Bahamas, The"             2022 .4488347 1  6
"Barbados"                 2018 1.666766 1  7
"Barbados"                 2019 1.594726 1  7
"Barbados"                 2020 1.797415 1  7
"Barbados"                 2021 1.730127 1  7
"Barbados"                 2022 1.494327 1  7
"Burundi"                  2018 1.811973 2  8
"Burundi"                  2019 1.875734 2  8
"Burundi"                  2020 1.823942 2  8
"Burundi"                  2021 1.741071 2  8
"Burundi"                  2022 1.447519 2  8
"Central African Republic" 2018        0 1  9
"Central African Republic" 2019        0 1  9
"Central African Republic" 2020        0 1  9
"Central African Republic" 2021        0 1  9
"Central African Republic" 2022        0 1  9
"China"                    2018 .1749246 3 10
"China"                    2019 .1281117 3 10
"China"                    2020  .128517 3 10
"China"                    2021 .1261466 3 10
"China"                    2022 .1453294 3 10
"Comoros"                  2018 14.52057 1 11
"Comoros"                  2019 14.10411 1 11
"Comoros"                  2020 18.50192 1 11
"Comoros"                  2021 22.21997 1 11
"Comoros"                  2022 22.68288 1 11
"Egypt, Arab Rep"          2018 9.716986 1 12
"Egypt, Arab Rep"          2019 8.403885 1 12
"Egypt, Arab Rep"          2020 7.712747 1 12
"Egypt, Arab Rep"          2021 7.414432 1 12
"Egypt, Arab Rep"          2022 5.942934 1 12
"Gabon"                    2018 .1094308 3 13
"Gabon"                    2019 .1093848 3 13
"Gabon"                    2020  .120526 3 13
"Gabon"                    2021 .0912953 3 13
"Gabon"                    2022 .0875962 3 13
"Mauritania"               2018 .8080745 3 14
"Mauritania"               2019 .8164766 3 14
"Mauritania"               2020 2.041765 3 14
"Mauritania"               2021  .142859 3 14
"Mauritania"               2022 1.117382 3 14
"Seychelles"               2018 1.430288 3 15
"Seychelles"               2019 1.437503 3 15
"Seychelles"               2020 .8432241 3 15
"Seychelles"               2021 .7392818 3 15
"Seychelles"               2022  .632084 3 15
end
label values ID ID
label def ID 1 "Albania", modify
label def ID 2 "Algeria", modify
label def ID 3 "Argentina", modify
label def ID 4 "Armenia", modify
label def ID 5 "Azerbaijan", modify
label def ID 6 "Bahamas, The", modify
label def ID 7 "Barbados", modify
label def ID 8 "Burundi", modify
label def ID 9 "Central African Republic", modify
label def ID 10 "China", modify
label def ID 11 "Comoros", modify
label def ID 12 "Egypt, Arab Rep", modify
label def ID 13 "Gabon", modify
label def ID 14 "Mauritania", modify
label def ID 15 "Seychelles", modify

This is the result using a mixed-effects model:

Code:

mixed value i.group || ID: // Mixed effect model

Performing EM optimization ...

Performing gradient-based optimization: 
Iteration 0:  Log likelihood = -151.58532  
Iteration 1:  Log likelihood = -151.58532  

Computing standard errors ...

Mixed-effects ML regression                          Number of obs    =     75
Group variable: ID                                   Number of groups =     15
                                                     Obs per group:
                                                                  min =      5
                                                                  avg =    5.0
                                                                  max =      5
                                                     Wald chi2(2)     =   3.33
Log likelihood = -151.58532                          Prob > chi2      = 0.1894

------------------------------------------------------------------------------
       value | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       group |
          2  |  -.4860883   3.008294    -0.16   0.872    -6.382236    5.410059
          3  |  -4.976622   3.008294    -1.65   0.098    -10.87277     .919525
             |
       _cons |   5.637344   2.127185     2.65   0.008     1.468138    9.806549
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
-----------------------------+------------------------------------------------
ID: Identity                 |
                  var(_cons) |   22.34822   8.261486      10.82876    46.12192
-----------------------------+------------------------------------------------
               var(Residual) |   1.381789    .252279      .9661273    1.976283
------------------------------------------------------------------------------
LR test vs. linear model: chibar2(01) = 147.18        Prob >= chibar2 = 0.0000

. testparm i.group

 ( 1)  [value]2.group = 0
 ( 2)  [value]3.group = 0

           chi2(  2) =    3.33
         Prob > chi2 =    0.1894

And this is the result using a one-way ANOVA:

Code:

collapse (mean) value, by(countryname group)

. oneway value group, tabulate

            |       Summary of (mean) value
      group |        Mean   Std. dev.       Freq.
------------+------------------------------------
          1 |   5.6373434   7.8125257           5
          2 |   5.1512553   4.8542716           5
          3 |   .66072102   .49250825           5
------------+------------------------------------
      Total |   3.8164399   5.4422174          15

                        Analysis of variance
    Source              SS         df      MS            F     Prob > F
------------------------------------------------------------------------
Between groups      75.2799162      2   37.6399581      1.33     0.3006
 Within groups      339.368301     12   28.2806917
------------------------------------------------------------------------
    Total           414.648217     14   29.6177298

Bartlett's equal-variances test: chi2(2) =  15.0187    Prob>chi2 = 0.001

. pwmean value, over(group) mcompare(tukey) effects

Pairwise comparisons of means with equal variances

Over: group

---------------------------
             |    Number of
             |  comparisons
-------------+-------------
       group |            3
---------------------------

------------------------------------------------------------------------------
             |                              Tukey                Tukey
       value |   Contrast   Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       group |
     2 vs 1  |   -.486088   3.363373    -0.14   0.989    -9.459108    8.486932
     3 vs 1  |  -4.976622   3.363373    -1.48   0.334    -13.94964    3.996398
     3 vs 2  |  -4.490534   3.363373    -1.34   0.404    -13.46355    4.482486
------------------------------------------------------------------------------

My goal is to compare group 1 vs 3, so it seems the ANOVA approach answers this by testing 3 vs 1 (p-value 0.334). Or should I simply look at the coefficient and significance of [value]3.group (p-value 0.098) in the mixed-effects regression above? Thank you for your help.

Best,

Abdan
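A hedged sketch of why the two p-values differ. Both approaches estimate the same 3 vs 1 difference: -4.976622 is exactly .66072102 - 5.6373434 from the collapsed means. The discrepancy comes from three other places. First, -mixed- defaults to maximum likelihood, whose estimate of the between-country variance divides by the number of countries (15) rather than by the 12 residual degrees of freedom that -oneway- uses on the collapsed means (339.368301/15 = 22.62, matching 22.34822 + 1.381789/5 from the -mixed- output). Second, -mixed- uses normal rather than t reference distributions. Third, -pwmean ..., mcompare(tukey)- adjusts for multiple comparisons. The sketch below assumes the -dataex- data from this post are in memory. It refits the model by REML and asks for unadjusted comparisons. In this balanced design, the REML standard error of the 3 vs 1 contrast should coincide with the -oneway-/-pwmean- one (3.363373), but please verify this on your own data.

```stata
* Sketch: assumes the -dataex- example above is in memory (75 obs, not collapsed)

* Unadjusted pairwise comparisons on the collapsed country means
preserve
collapse (mean) value, by(ID group)
pwmean value, over(group) effects      // default: no multiple-comparison adjustment
restore

* Random-intercept model fitted by REML instead of the default ML
mixed value i.group || ID:, reml

* Unadjusted pairwise differences between the group means (std. errors, p-values)
pwcompare group, effects

* The 3 vs 1 contrast on its own (group 1 is the base level)
lincom 3.group
```

The REML fit is a design choice here. ML variance estimates are biased downward in small samples, which is one reason the ML p-value (0.098) is smaller than the ANOVA-based ones.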


Carlo Lazzaro

Join Date: Apr 2014
Posts: 17702

#2

23 Jun 2024, 02:00

Abdan:
do you mean something along the following lines?

Code:

. xtset ID year
. xtreg value c.year##c.year i.group , re
note: c.year#c.year omitted because of collinearity.

Random-effects GLS regression                   Number of obs     =         75
Group variable: ID                              Number of groups  =         15

R-squared:                                      Obs per group:
     Within  = 0.0172                                         min =          5
     Between = 0.1816                                         avg =        5.0
     Overall = 0.1752                                         max =          5

                                                Wald chi2(3)      =       3.70
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.2963

-------------------------------------------------------------------------------
        value | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
--------------+----------------------------------------------------------------
         year |   .0975597   .0959518     1.02   0.309    -.0905023    .2856218
              |
c.year#c.year |          0  (omitted)
              |
        group |
           2  |  -.4860883   3.363373    -0.14   0.885    -7.078178    6.106001
           3  |  -4.976622   3.363373    -1.48   0.139    -11.56871    1.615467
              |
        _cons |  -191.4333   193.8372    -0.99   0.323    -571.3472    188.4806
--------------+----------------------------------------------------------------
      sigma_u |  5.2919269
      sigma_e |  1.1751645
          rho |  .95300364   (fraction of variance due to u_i)
-------------------------------------------------------------------------------

. predict fitted, xb

. ttest fitted if group!=2, by( group) unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
       1 |      25    5.637344    .0281631    .1408153    5.579218    5.695469
       3 |      25     .660721    .0281631    .1408153    .6025953    .7188468
---------+--------------------------------------------------------------------
Combined |      50    3.149032     .356019    2.517435    2.433585    3.864479
---------+--------------------------------------------------------------------
    diff |            4.976622    .0398286                4.896542    5.056703
------------------------------------------------------------------------------
    diff = mean(1) - mean(3)                                      t = 124.9510
H0: diff = 0                     Satterthwaite's degrees of freedom =       48

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000

.

Kind regards,
Carlo
(Stata 19.0)
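A hedged complement to the steps above. The two-sample -ttest- on -fitted- treats the 25 fitted values per group as independent observations. Its standard error (.0398286) therefore reflects only the year-to-year spread of the predictions, not the sampling uncertainty of the estimated group effect (3.363373 in the -xtreg- table). The sketch below tests the same 1 vs 3 difference with the model's own variance matrix. It assumes the same data and uses -year- linearly, because the squared term was omitted in #2 anyway.

```stata
* Sketch: assumes the -dataex- example from the first post is in memory
xtset ID year
xtreg value year i.group, re

* Predictive margins for each group and their pairwise differences,
* using the model-based (random-effects) variance-covariance matrix
margins group, pwcompare(effects)

* Equivalent Wald test of group 3 vs group 1 (group 1 is the base level)
test 3.group = 0
```

With a balanced panel like this one, the 3 vs 1 difference reported by -margins- should equal the -group- coefficient (-4.976622, std. err. 3.363373, p = 0.139).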


Abdan Syakura

Join Date: Nov 2018

Posts: 58
#3

23 Jun 2024, 10:21

Thank you very much, Carlo. Yes, I think this is what I need. Do you have any references stating that this method (i.e., estimating an FE/RE model and then obtaining the predicted values) is the way to test differences in means across groups in panel data? I would like to cite it. Thanks!
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17702
#4

23 Jun 2024, 23:40

Abdan:
not that I know of.
All in all, this is a routine postestimation exercise.

Kind regards,
Carlo
(Stata 19.0)
Abdan Syakura

Join Date: Nov 2018

Posts: 58
#5

24 Jun 2024, 12:43

Sorry Carlo, just a quick question. Why do you use c.year##c.year instead of i.year in #2? I am testing the difference in means for several variables, and some of the results are very different if I use c.year##c.year instead of i.year. Thank you.

Best regards,

Abdan
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17702
#6

24 Jun 2024, 13:20

Abdan:
the toy example included an interaction of -year-, treated as a continuous variable, with itself, as you highlighted. The aim of a linear plus a squared term is to investigate whether there is a non-linear relationship between this regressor and the dependent variable.
That said, you can run your code without the interaction, following the very same steps.

Kind regards,
Carlo
(Stata 19.0)
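A possible sketch for actually estimating the non-linear specification. In #2 the squared term was dropped ("c.year#c.year omitted because of collinearity") because, with -year- running only from 2018 to 2022, year and year squared are almost perfectly collinear numerically. Centering year first keeps both terms. The variable -yr_c- below is hypothetical, created only for this illustration. -i.year- instead gives each year its own shift without imposing any functional form. The sketch assumes the -dataex- data from the first post.

```stata
* Sketch: assumes the -dataex- example from the first post is in memory
xtset ID year
generate yr_c = year - 2020              // hypothetical centered year: -2,...,2

* Linear plus quadratic trend: the squared term is no longer dropped
xtreg value c.yr_c##c.yr_c i.group, re
estimates store quad

* One dummy per year: no functional form imposed on the time pattern
xtreg value i.year i.group, re
estimates store yeardum

* Compare the group coefficients across the two specifications
estimates table quad yeardum, keep(2.group 3.group) b se
```

In this balanced toy panel, the group coefficients should not depend on how -year- is modelled. With other variables (for example, with gaps in the panel) they can differ, which may explain the differences you observed.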

George Ford

Join Date: Aug 2014
Posts: 3138

#7

24 Jun 2024, 14:24

Code:

reghdfe value i.group , absorb(year)
margins, over(group) post
test 1.group = 2.group
test 1.group = 3.group
test 2.group = 3.group
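For readers without the community-contributed -reghdfe- (available from SSC), here is a sketch of what the commands above should amount to in official Stata. Absorbing -year- is equivalent to including year dummies, so -regress- with -i.year- yields the same point estimates. As later posts in this thread point out, this is pooled OLS with year effects and does not account for the panel structure. The sketch assumes the -dataex- data from the first post.

```stata
* Sketch: assumes the -dataex- example from the first post is in memory
* Pooled OLS with year dummies (same point estimates as -reghdfe ..., absorb(year)-)
regress value i.group i.year

* Mean linear prediction within each group; -post- lets -test- use the margins
margins, over(group) post

* Pairwise Wald tests of equal group margins (default, non-robust OLS VCE)
test 1.group = 2.group
test 1.group = 3.group
test 2.group = 3.group
```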


Carlo Lazzaro

Join Date: Apr 2014
Posts: 17702

#8

24 Jun 2024, 23:32

Abdan:
making #6 quantitative:

Code:

. xtset ID year

Panel variable: ID (strongly balanced)
 Time variable: year, 2018 to 2022
         Delta: 1 unit

. xtreg value year i.group , re

Random-effects GLS regression                   Number of obs     =         75
Group variable: ID                              Number of groups  =         15

R-squared:                                      Obs per group:
     Within  = 0.0172                                         min =          5
     Between = 0.1816                                         avg =        5.0
     Overall = 0.1752                                         max =          5

                                                Wald chi2(3)      =       3.70
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.2963

------------------------------------------------------------------------------
       value | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        year |   .0975597   .0959518     1.02   0.309    -.0905023    .2856218
             |
       group |
          2  |  -.4860883   3.363373    -0.14   0.885    -7.078178    6.106001
          3  |  -4.976622   3.363373    -1.48   0.139    -11.56871    1.615467
             |
       _cons |  -191.4333   193.8372    -0.99   0.323    -571.3472    188.4806
-------------+----------------------------------------------------------------
     sigma_u |  5.2919269
     sigma_e |  1.1751645
         rho |  .95300364   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. predict fitted, xb

. ttest fitted if group!=2, by( group) unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
       1 |      25    5.637344    .0281631    .1408153    5.579218    5.695469
       3 |      25     .660721    .0281631    .1408153    .6025953    .7188468
---------+--------------------------------------------------------------------
Combined |      50    3.149032     .356019    2.517435    2.433585    3.864479
---------+--------------------------------------------------------------------
    diff |            4.976622    .0398286                4.896542    5.056703
------------------------------------------------------------------------------
    diff = mean(1) - mean(3)                                      t = 124.9510
H0: diff = 0                     Satterthwaite's degrees of freedom =       48

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000

.

Kind regards,
Carlo
(Stata 19.0)


Abdan Syakura

Join Date: Nov 2018
Posts: 58

#9

26 Aug 2024, 06:30

Hello Carlo, George, and all,

May I get your advice again, please? (1) What is the difference between the xtreg and reghdfe commands below? Is reghdfe also estimating a random-effects model here? (2) Should I cluster the standard errors by panel ID? If I don't cluster the standard errors by panel ID, both the xtreg and reghdfe commands show a statistically significant difference between groups 1 and 3. However, if I cluster the SEs, xtreg shows the difference is statistically significant, but reghdfe does not. The output follows:

A. Without clustering at panel ID
A.1. Using xtreg re

Code:

xtreg value i.year i.group , re vce(robust)
predict fitted1, xb
ttest fitted1 if group!=2, by(group) unequal


Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
       1 |      25    5.637343    .0346903    .1734515    5.565746    5.708941
       3 |      25     .660721    .0346903    .1734515    .5891238    .7323183
---------+--------------------------------------------------------------------
Combined |      50    3.149032    .3563011     2.51943    2.433018    3.865046
---------+--------------------------------------------------------------------
    diff |            4.976622    .0490595                4.877982    5.075263
------------------------------------------------------------------------------
    diff = mean(1) - mean(3)                                      t = 101.4406
H0: diff = 0                     Satterthwaite's degrees of freedom =       48

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000

A.2. Using reghdfe

Code:

reghdfe value i.group , absorb(year) vce(robust)
margins, over(group) post
test 1.group = 3.group


. reghdfe value i.group , absorb(year) vce(robust)
(MWFE estimator converged in 1 iterations)

HDFE Linear regression                            Number of obs   =         75
Absorbing 1 HDFE group                            F(   2,     68) =      16.97
                                                  Prob > F        =     0.0000
                                                  R-squared       =     0.1756
                                                  Adj R-squared   =     0.1028
                                                  Within R-sq.    =     0.1747
                                                  Root MSE        =     5.1128

------------------------------------------------------------------------------
             |               Robust
       value | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       group |
          2  |  -.4860883   1.766792    -0.28   0.784    -4.011666    3.039489
          3  |  -4.976622   1.514545    -3.29   0.002    -7.998849   -1.954396
             |
       _cons |   5.637344   1.509466     3.73   0.000     2.625252    8.649435
------------------------------------------------------------------------------



. margins, over(group) post

Predictive margins                                          Number of obs = 75
Model VCE: Robust

Expression: Linear prediction, predict()
Over:       group

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       group |
          1  |   5.637344   1.509466     3.73   0.000     2.678845    8.595842
          2  |   5.151255    .918186     5.61   0.000     3.351644    6.950867
          3  |    .660721    .123929     5.33   0.000     .4178246    .9036175
------------------------------------------------------------------------------

. test 1.group = 3.group

 ( 1)  1bn.group - 3.group = 0

           chi2(  1) =   10.80
         Prob > chi2 =    0.0010

B. With clustering at panel ID
B.1. Using xtreg re

Code:

xtset ID year
xtreg value i.year i.group , re vce(cluster ID)
predict fitted2, xb
ttest fitted2 if group!=2, by(group) unequal


Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
       1 |      25    5.637343    .0346903    .1734515    5.565746    5.708941
       3 |      25     .660721    .0346903    .1734515    .5891238    .7323183
---------+--------------------------------------------------------------------
Combined |      50    3.149032    .3563011     2.51943    2.433018    3.865046
---------+--------------------------------------------------------------------
    diff |            4.976622    .0490595                4.877982    5.075263
------------------------------------------------------------------------------
    diff = mean(1) - mean(3)                                      t = 101.4406
H0: diff = 0                     Satterthwaite's degrees of freedom =       48

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000

B.2. Using reghdfe

Code:

reghdfe value i.group, absorb(year) vce(cluster ID)
margins, over(group) post
test 1.group = 3.group


. margins, over(group) post

Predictive margins                                          Number of obs = 75
Model VCE: Robust

Expression: Linear prediction, predict()
Over:       group

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       group |
          1  |   5.637344   3.374384     1.67   0.095    -.9763273    12.25101
          2  |   5.151255   2.096655     2.46   0.014     1.041886    9.260624
          3  |    .660721    .212724     3.11   0.002     .2437896    1.077652
------------------------------------------------------------------------------

. test 1.group = 3.group

 ( 1)  1bn.group - 3.group = 0

           chi2(  1) =    2.17
         Prob > chi2 =    0.1410

Thank you!


Carlo Lazzaro

Join Date: Apr 2014
Posts: 17702

#10

26 Aug 2024, 08:07

Abdan:
1) -xtreg, re- uses the -re- estimator, whereas the community-contributed module -reghdfe- uses the -fe- one;
2) with only 3 clusters, stick with the default standard errors (that said, statistical significance is not the scientific tool for choosing between different statistics).
If I run -xtreg, re- and then -reghdfe-, the results differ:

Code:

. use "https://www.stata-press.com/data/r18/nlswork.dta"
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)

. xtreg ln_wage i.year i.nev_mar, re rob

Random-effects GLS regression                   Number of obs     =     28,518
Group variable: idcode                          Number of groups  =      4,711

R-squared:                                      Obs per group:
     Within  = 0.1071                                         min =          1
     Between = 0.0769                                         avg =        6.1
     Overall = 0.0710                                         max =         15

                                                Wald chi2(15)     =    1253.23
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

                             (Std. err. adjusted for 4,711 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        year |
         69  |   .0839846   .0102013     8.23   0.000     .0639906    .1039787
         70  |   .0662482   .0104014     6.37   0.000     .0458617    .0866346
         71  |   .1147831   .0110154    10.42   0.000     .0931933     .136373
         72  |   .1269705   .0119536    10.62   0.000      .103542    .1503991
         73  |   .1402849   .0120654    11.63   0.000     .1166371    .1639326
         75  |   .1514261   .0122272    12.38   0.000     .1274613     .175391
         77  |   .2110967   .0126637    16.67   0.000     .1862763    .2359172
         78  |   .2493533   .0131938    18.90   0.000     .2234938    .2752127
         80  |     .25596   .0135462    18.90   0.000       .22941      .28251
         82  |   .2730926   .0136069    20.07   0.000     .2464237    .2997616
         83  |   .3004199   .0141012    21.30   0.000      .272782    .3280578
         85  |   .3526673    .013747    25.65   0.000     .3257237    .3796108
         87  |   .3684121   .0141554    26.03   0.000      .340668    .3961561
         88  |   .4240029   .0152481    27.81   0.000     .3941171    .4538887
             |
   1.nev_mar |  -.0302251   .0090356    -3.35   0.001    -.0479345   -.0125158
       _cons |   1.442927    .011602   124.37   0.000     1.420188    1.465666
-------------+----------------------------------------------------------------
     sigma_u |  .36922697
     sigma_e |  .30264631
         rho |  .59813335   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. margins, over(nev_ma) post

Predictive margins                                      Number of obs = 28,518
Model VCE: Robust

Expression: Linear prediction, predict()
Over:       nev_mar

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
     nev_mar |
          0  |   1.680716   .0062094   270.67   0.000     1.668546    1.692886
          1  |   1.580663   .0084181   187.77   0.000     1.564164    1.597162
------------------------------------------------------------------------------

. reghdfe ln_wage i.nev_ma , absorb(year) vce(robust)
(MWFE estimator converged in 1 iterations)

HDFE Linear regression                            Number of obs   =     28,518
Absorbing 1 HDFE group                            F(   1,  28502) =       4.41
                                                  Prob > F        =     0.0357
                                                  R-squared       =     0.0731
                                                  Adj R-squared   =     0.0726
                                                  Within R-sq.    =     0.0002
                                                  Root MSE        =     0.4605

------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
   1.nev_mar |    .014005   .0066661     2.10   0.036     .0009392    .0270709
       _cons |   1.671799   .0030797   542.84   0.000     1.665763    1.677835
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
        year |        15           0          15     |
-----------------------------------------------------+

. margins, over(nev_ma) post

Predictive margins                                      Number of obs = 28,518
Model VCE: Robust

Expression: Linear prediction, predict()
Over:       nev_mar

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
     nev_mar |
          0  |   1.671799   .0030797   542.84   0.000     1.665763    1.677835
          1  |   1.685804   .0058983   285.81   0.000     1.674244    1.697364
------------------------------------------------------------------------------

. 

However, the way -reghdfe- was run is wrong, because it treats -year- as the panel identifier, as the following -xtreg, fe- results show:

Code:

. xtset year

Panel variable: year (unbalanced)

. xtreg ln_wage i.year i.nev_mar, fe rob
note: 69.year omitted because of collinearity.
note: 70.year omitted because of collinearity.
note: 71.year omitted because of collinearity.
note: 72.year omitted because of collinearity.
note: 73.year omitted because of collinearity.
note: 75.year omitted because of collinearity.
note: 77.year omitted because of collinearity.
note: 78.year omitted because of collinearity.
note: 80.year omitted because of collinearity.
note: 82.year omitted because of collinearity.
note: 83.year omitted because of collinearity.
note: 85.year omitted because of collinearity.
note: 87.year omitted because of collinearity.
note: 88.year omitted because of collinearity.

Fixed-effects (within) regression               Number of obs     =     28,518
Group variable: year                            Number of groups  =         15

R-squared:                                      Obs per group:
     Within  = 0.0002                                         min =      1,232
     Between = 0.8963                                         avg =    1,901.2
     Overall = 0.0032                                         max =      2,272

                                                F(1, 14)          =       0.85
corr(u_i, Xb) = -0.2508                         Prob > F          =     0.3718

                                  (Std. err. adjusted for 15 clusters in year)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        year |
         69  |          0  (omitted)
         70  |          0  (omitted)
         71  |          0  (omitted)
         72  |          0  (omitted)
         73  |          0  (omitted)
         75  |          0  (omitted)
         77  |          0  (omitted)
         78  |          0  (omitted)
         80  |          0  (omitted)
         82  |          0  (omitted)
         83  |          0  (omitted)
         85  |          0  (omitted)
         87  |          0  (omitted)
         88  |          0  (omitted)
             |
   1.nev_mar |    .014005   .0151776     0.92   0.372    -.0185477    .0465578
       _cons |   1.671799    .003486   479.58   0.000     1.664322    1.679276
-------------+----------------------------------------------------------------
     sigma_u |  .13727056
     sigma_e |  .46046151
         rho |  .08161896   (fraction of variance due to u_i)
------------------------------------------------------------------------------

.

Kind regards,
Carlo
(Stata 19.0)
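A related point, offered as a hedged sketch. -reghdfe- could also absorb the country identifier (e.g., -absorb(ID year)-), but -group- does not vary within a country, so its effect is not identified once country fixed effects are included. The same happens with -xtreg, fe-. This is one reason the thread relies on random-effects or mixed models to compare time-invariant groups. The sketch assumes the -dataex- data from the first post. The -reghdfe- line requires the community-contributed package (ssc install reghdfe).

```stata
* Sketch: assumes the -dataex- example from the first post is in memory
xtset ID year

* Country fixed effects: -group- is constant within ID, so Stata omits
* its levels ("omitted because of collinearity")
xtreg value i.year i.group, fe

* -reghdfe- analogue (community-contributed): i.group should likewise be
* dropped once ID is absorbed together with year
reghdfe value i.group, absorb(ID year)

* Random effects keep -group- identified (between-country comparison)
xtreg value i.year i.group, re
```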


Abdan Syakura

Join Date: Nov 2018

Posts: 58
#11

26 Aug 2024, 09:13

Hello Carlo,

Thank you for your reply. I think I figured it out now (using my original sample code):

1. reghdfe value i.group, absorb(year) vce(robust) (as suggested by George in #7) --> this is basically pooled OLS controlling for i.year. I think I shouldn't use it because it does not take the panel structure into account.
2. xtreg value i.year i.group, re vce(robust) --> takes the panel structure of the data into account, under the RE assumption.
3. xtreg value i.year i.group, re vce(cluster ID) --> gives results identical to number 2.

In your example above, the following two commands give identical results:

Code:

xtreg ln_wage i.year i.nev_mar, re vce(robust)
xtreg ln_wage i.year i.nev_mar, re vce(cluster idcode)

Similarly, the following two commands give identical results:

Code:

reg ln_wage i.year i.nev_mar, rob
reghdfe ln_wage i.nev_ma , absorb(year) vce(robust)

Thank you!

Abdan
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17702
#12

26 Aug 2024, 11:25

Abdan:
under -xtreg-, the options -robust- and -vce(cluster idcode)- give identical standard errors, because both request cluster-robust standard errors. Please note that this does not hold for -regress-.

Kind regards,
Carlo
(Stata 19.0)
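A quick way to check the point above on the same -nlswork- data, as a sketch that downloads the Stata example dataset with -webuse-. Under -xtreg-, the two options should produce identical standard errors. Under -regress-, -vce(robust)- is heteroskedasticity-robust only and ignores the within-woman correlation that -vce(cluster idcode)- allows for. The scalar names are arbitrary choices for this illustration.

```stata
* Sketch: the Stata example data used in #10
webuse nlswork, clear
xtset idcode year

* -xtreg, re-: vce(robust) is treated as vce(cluster idcode)
xtreg ln_wage i.year i.nev_mar, re vce(robust)
scalar se_xt_rob = _se[1.nev_mar]
xtreg ln_wage i.year i.nev_mar, re vce(cluster idcode)
scalar se_xt_cl = _se[1.nev_mar]

* -regress-: vce(robust) and vce(cluster idcode) are different estimators
regress ln_wage i.year i.nev_mar, vce(robust)
scalar se_ols_rob = _se[1.nev_mar]
regress ln_wage i.year i.nev_mar, vce(cluster idcode)
scalar se_ols_cl = _se[1.nev_mar]

* Compare the standard errors of the nev_mar coefficient
display "xtreg, re : robust = " se_xt_rob "   cluster = " se_xt_cl
display "regress   : robust = " se_ols_rob "   cluster = " se_ols_cl
```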