Comparing means of supragroups – Which test or testing procedure is applicable?

Roman Vanderson

Join Date: Jan 2017
Posts: 20

25 Apr 2017, 06:47

Dear Statalisters,

I have a panel data set of 4,361 firm-year observations spanning 6 years (2000-2005). Because of several preconditions, data are not available in every year for a large minority of firms. To illustrate, the data set looks similar to this:

Code:

     +---------------------------------------------------------------------+
     | firm   fyear   D_logCOMP       D_RET       D_ROE   D_logSALES   POST|
     |---------------------------------------------------------------------|
  1. | 1004    2000    .2027831     .320521   -.0490993   -.1911497      0 |
  2. | 1004    2001   -.3795638   -.2086571   -.2444508   -.3416142      0 |
  3. | 1004    2002   -.0218005   -.4337687    .1479123   -.0679026      0 |
  4. | 1004    2003     .537168    1.734407    .0536843    .0501533      1 |
  5. | 1004    2004    .0797672   -.4545674    .0374822    .1107502      1 |
  6. | 1004    2005    .2782454   -.1730748    .0340863    .1489391      1 |
  7. | 1013    2000    .8849363   -.2804468    .2278503    .5015869      0 |
  8. | 1013    2003   -.2077956     1.27933    .8640037   -.3262057      1 |
  9. | 1034    2002    .3501801   -.1504953   -.0569549    .2170906      0 |
 10. | 1034    2003   -.0503058    1.248148    .1116977    .0302486      1 |
 11. | 1034    2004    .0227671   -.8530316   -.3684016    .0055389      1 |
 12. | 1034    2005    .4534125    .8522317    .5018871   -.9167976      1 |
 13. | 1075    2001     .325418   -.7101562   -.0019853    .1820459      0 |
 14. | 1075    2002   -.5786009   -.0569109   -.0692787   -.5615525      0 |
 15. | 1075    2003   -.1251016    .3825794    .0293953    .0438361      1 |
 16. | 1075    2004    1.026104   -.0743766   -.0025834    .0021715      1 |
 17. | 1078    2002   -.2709451   -.4388227    .0908261    .0665674      0 |
 18. | 1078    2003    .1654038    .4620089   -.0513451    .0845432      1 |
 19. | 1078    2005    -.032259   -.2257506    .0080471    .0912027      1 |
 20. | 1161    2000    .6840143   -.0432012    .3548735    .4529076      0 |
 21. | 1161    2001   -1.046721    .1935917   -.3269807   -.2044659      0 |
 22. | 1161    2004    .7973237   -.8286491    .1428577    .3250303      1 |
 23. | 1161    2005     .785953   -.0882066     .019087    .1230688      1 |
 24. | 1209    2000    .4295473    .2566491   -.1080915    .0525742      0 |
 25. | 1209    2003   -.1163316   -.0120102   -.0467958    .1311083      1 |
 26. | 1209    2004    .4704585    .1339709    .0308997    .1364298      1 |
 27. | 1209    2005    .1203718   -.1942107    .0196097    .0609665      1 |
 28. | 1230    2000     -.56847    .0531898   -.2257187    .0119858      0 |
 29. | 1230    2001   -.0148058    .1311762    .0305453     -.03898      0 |
 30. | 1230    2002    .1773906    -.234165   -.1298946    .0167122      0 |
 31. | 1230    2005   -.3194795   -.1606025    .0158854    .0550833      1 |
     +---------------------------------------------------------------------+

I am interested in the effect of a policy change, which splits the sample into PRE (years 2000-2002) and POST (years 2003-2005) data, on a proxy for the use of subjectivity in CEO compensation. I obtain this proxy by predicting the residuals of two regressions, one estimated on pre-change data only and one on post-change data only.

Code:

reg D_logCOMP D_RET D_ROE D_logSALES i.year if POST!=1, vce(cl firm) notab
predict UCOMP_PRE, re

reg D_logCOMP D_RET D_ROE D_logSALES i.year if POST==1, vce(cl firm) notab
predict UCOMP_POST, re

As the two periods PRE and POST are not independent of one another, a paired t-test would be the test of choice to compare the means of UCOMP_PRE and UCOMP_POST.

Code:

. ttest UCOMP_PRE==UCOMP_POST

Paired t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
UCOMP_~E |   4,361    .0156696    .0061899    .4087657    .0035343    .0278049
UCOMP_~T |   4,361     .031279    .0062513      .41282    .0190233    .0435346
---------+--------------------------------------------------------------------
    diff |   4,361   -.0156094    .0021381    .1411973   -.0198012   -.0114176
------------------------------------------------------------------------------
     mean(diff) = mean(UCOMP_PRE - UCOMP_POST)                    t =  -7.3005
 Ho: mean(diff) = 0                              degrees of freedom =     4360

 Ha: mean(diff) < 0           Ha: mean(diff) != 0           Ha: mean(diff) > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

This, however, cannot be done directly: UCOMP_PRE and UCOMP_POST are predicted for all years (see "Obs" in the Stata output of the paired t-test above), which I suppose is wrong, since the mean of UCOMP_PRE would then include observations from the POST period and vice versa. If I try to correct this flaw by simply dropping the wrongly predicted observations,

Code:

replace UCOMP_PRE =. if POST!=1
replace UCOMP_POST =. if POST==1

I understandably no longer get any results from the paired t-test command, because no observation has both values anymore: the paired values no longer sit on the same row of the data set.

Code:

     +---------------------------------------------------------------------------------------------+
     | firm   fyear   D_logCOMP       D_RET       D_ROE   D_logSA~S   POST   UCOMP_PRE   UCOMP_POST|
     |---------------------------------------------------------------------------------------------|
  1. | 1004    2000    .2027831     .320521   -.0490993   -.1911497      0           .     .418236 |
  2. | 1004    2001   -.3795638   -.2086571   -.2444508   -.3416142      0           .    .0724417 |
  3. | 1004    2002   -.0218005   -.4337687    .1479123   -.0679026      0           .    .4560174 |
  4. | 1004    2003     .537168    1.734407    .0536843    .0501533      1    .3773581           . |
  5. | 1004    2004    .0797672   -.4545674    .0374822    .1107502      1   -.2227536           . |
  6. | 1004    2005    .2782454   -.1730748    .0340863    .1489391      1   -.0397195           . |
  7. | 1013    2000    .8849363   -.2804468    .2278503    .5015869      0           .    1.124532 |
  8. | 1013    2003   -.2077956     1.27933    .8640037   -.3262057      1   -.8766099           . |
  9. | 1034    2002    .3501801   -.1504953   -.0569549    .2170906      0           .    .6117797 |
 10. | 1034    2003   -.0503058    1.248148    .1116977    .0302486      1   -.2747711           . |
 11. | 1034    2004    .0227671   -.8530316   -.3684016    .0055389      1    .1953407           . |
 12. | 1034    2005    .4534125    .8522317    .5018871   -.9167976      1    .6290789           . |
 13. | 1075    2001     .325418   -.7101562   -.0019853    .1820459      0           .      .81615 |
 14. | 1075    2002   -.5786009   -.0569109   -.0692787   -.5615525      0           .    -.099839 |
 15. | 1075    2003   -.1251016    .3825794    .0293953    .0438361      1   -.3214543           . |
 16. | 1075    2004    1.026104   -.0743766   -.0025834    .0021715      1    .8748527           . |
 17. | 1078    2002   -.2709451   -.4388227    .0908261    .0665674      0           .    .1607376 |
 18. | 1078    2003    .1654038    .4620089   -.0513451    .0845432      1    .0186986           . |
 19. | 1078    2005    -.032259   -.2257506    .0080471    .0912027      1   -.2775374           . |
 20. | 1161    2000    .6840143   -.0432012    .3548735    .4529076      0           .    .8583364 |
 21. | 1161    2001   -1.046721    .1935917   -.3269807   -.2044659      0           .   -.7990856 |
 22. | 1161    2004    .7973237   -.8286491    .1428577    .3250303      1    .1884516           . |
 23. | 1161    2005     .785953   -.0882066     .019087    .1230688      1    .5091883           . |
 24. | 1209    2000    .4295473    .2566491   -.1080915    .0525742      0           .    .5855641 |
 25. | 1209    2003   -.1163316   -.0120102   -.0467958    .1311083      1   -.3305308           . |
 26. | 1209    2004    .4704585    .1339709    .0308997    .1364298      1    .1813791           . |
 27. | 1209    2005    .1203718   -.1942107    .0196097    .0609665      1    -.109077           . |
 28. | 1230    2000     -.56847    .0531898   -.2257187    .0119858      0           .   -.3302831 |
 29. | 1230    2001   -.0148058    .1311762    .0305453     -.03898      0           .    .2292305 |
 30. | 1230    2002    .1773906    -.234165   -.1298946    .0167122      0           .    .5308323 |
 31. | 1230    2005   -.3194795   -.1606025    .0158854    .0550833      1   -.5385462           . |
     +---------------------------------------------------------------------------------------------+



. ttest UCOMP_PRE==UCOMP_POST
no observations
r(2000);
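The "no observations" message can be reproduced on a handful of rows. The following is only a minimal sketch, using four rows copied from the listing above; it illustrates that a paired `ttest` uses only rows where both variables are nonmissing, and after the replacement no such row exists. Note that `clear` drops the data in memory, so this is best run in a separate session.

```stata
* Toy check: four rows copied from the listing above (firms 1004 and 1013).
* WARNING: -clear- drops whatever data are currently in memory.
clear
input firm fyear POST UCOMP_PRE UCOMP_POST
1004 2000 0         .   .418236
1004 2003 1  .3773581         .
1013 2000 0         .  1.124532
1013 2003 1 -.8766099         .
end

* A paired t test needs both values on the same row;
* missing(a, b) is true if either argument is missing.
count if !missing(UCOMP_PRE, UCOMP_POST)
display "rows usable by a paired t test: " r(N)
```

Every row has exactly one of the two residuals, so the count is zero and `ttest` has nothing to pair.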

What can I do, in terms of commands/procedures/tests, to compare the correct means of UCOMP_PRE and UCOMP_POST with each other?

For your help I thank you very much in advance!

Kind regards,
Roman

EDIT: Sorry for the, in retrospect, suboptimal title that I unfortunately cannot edit.

Last edited by Roman Vanderson; 25 Apr 2017, 06:58.

Clyde Schechter

Join Date: Apr 2014

Posts: 29998
#2

25 Apr 2017, 09:36

So I think you want to do something like this after you create the variables UCOMP_PRE and UCOMP_POST:

Code:

gen UCOMP = UCOMP_PRE if POST == 0
replace UCOMP = UCOMP_POST if POST == 1
xtset firm
xtreg UCOMP i.POST, fe

This will give you the closest thing to a paired t-test that makes sense for this data structure.
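The code above needs UCOMP_PRE to be nonmissing in the POST == 0 rows and UCOMP_POST in the POST == 1 rows. The second listing in #1 shows the opposite pattern after the `replace` commands, which would leave UCOMP entirely missing. One way to get the intended pattern is to restrict each `predict` to its estimation sample. This is only a sketch, assuming the variables from #1 are in memory; the names UCOMP_PRE2, UCOMP_POST2 and UCOMP2 are hypothetical and chosen only to avoid overwriting existing variables.

```stata
* Assumes D_logCOMP D_RET D_ROE D_logSALES year POST firm are in memory, as in #1.
* UCOMP_PRE2, UCOMP_POST2, UCOMP2 are hypothetical names, not from the thread.
regress D_logCOMP D_RET D_ROE D_logSALES i.year if POST == 0, vce(cluster firm)
predict double UCOMP_PRE2 if e(sample), residuals

regress D_logCOMP D_RET D_ROE D_logSALES i.year if POST == 1, vce(cluster firm)
predict double UCOMP_POST2 if e(sample), residuals

* No observation should carry both residuals.
count if !missing(UCOMP_PRE2, UCOMP_POST2)
assert r(N) == 0

* Stack the two residuals into one outcome, then run the within-firm comparison.
generate double UCOMP2 = cond(POST == 1, UCOMP_POST2, UCOMP_PRE2)
xtset firm
xtreg UCOMP2 i.POST, fe
```

Because each regression includes a constant, its in-sample residuals average to zero by construction, so a small estimated difference in the raw residual means is to be expected.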

Roman Vanderson

Join Date: Jan 2017
Posts: 20

25 Apr 2017, 13:19

Dear Clyde,

Thank you very much for your reply and the insights that came with it. Much appreciated!

I made use of your code, but in the context of testing for differences of means I have a hard time interpreting the results.

Code:

. xtreg UCOMP i.POST, fe

Fixed-effects (within) regression               Number of obs     =      4,361
Group variable: firm                            Number of groups  =        942

R-sq:                                           Obs per group:
     within  = 0.0000                                         min =          1
     between = 0.0043                                         avg =        4.6
     overall = 0.0000                                         max =          6

                                                F(1,3418)         =       0.04
corr(u_i, Xb)  = -0.0105                        Prob > F          =     0.8474

------------------------------------------------------------------------------
    UCOMP    |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      1.POST |  -.0026244    .013636    -0.19   0.847    -.0293599    .0241112
       _cons |   .0013408   .0095173     0.14   0.888    -.0173195     .020001
-------------+----------------------------------------------------------------
     sigma_u |  .14658014
     sigma_e |  .42821505
         rho |  .10488324   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(941, 3418) = 0.39                   Prob > F = 1.0000

A general model fit seems not given since the F-test is not significant. In consequence, is there anything left to learn from the results?
Moreover, the p-value of the t-test on 1.POST suggests that its coefficient estimate is not statistically significant. If the model fit had been acceptable, would this mean that POST has no significant effect on UCOMP and, consequently, that there is no statistically significant difference between the means of UCOMP_PRE and UCOMP_POST?
Finally, what does the creeping up F-test at the end of the output indicate?

Thank you very much in advance!

Kind regards,
Roman

Last edited by Roman Vanderson; 25 Apr 2017, 13:26.


Clyde Schechter

Join Date: Apr 2014

Posts: 29998
#4

25 Apr 2017, 13:46

A general model fit seems not given since the F-test is not significant.

What's your point? If there is no consequential difference between the pre- and post values of UCOMP (as appears to be the case here, see below) then you would expect a model that is based only on the pre-post distinction to show poor fit. By the way, the model F-test is not a measure of fit anyway.

You asked for a test of whether the pre- and post- values of UCOMP differ. You got it. The answer is that they don't differ by very much. The expected mean difference is given by the coefficient of 1.POST, namely -0.0026, with a 95% CI from -0.029 to +0.024. If you want a significance test, the p-value is 0.847: not "statistically significant."
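To see where those numbers come from, the t statistic, p-value and confidence interval can be recomputed by hand from the coefficient, standard error and residual degrees of freedom printed in the output. This is only an illustrative sketch; the scalar names are hypothetical, and the inputs are the rounded values copied from the table, so the results should agree with the output only up to rounding.

```stata
* Inputs copied from the xtreg output in #3 (rounded as displayed).
* Scalar names are hypothetical and chosen for this illustration only.
scalar b_post  = -.0026244
scalar se_post = .013636
scalar df_r    = 3418

* t statistic, two-sided p-value, and 95% confidence limits
scalar t_post = b_post/se_post
scalar p_post = 2*ttail(df_r, abs(t_post))
scalar ci_lo  = b_post - invttail(df_r, .025)*se_post
scalar ci_hi  = b_post + invttail(df_r, .025)*se_post

display "t = " %6.2f t_post "   p = " %5.3f p_post
display "95% CI: [" %9.6f ci_lo ", " %9.6f ci_hi "]"
```

The interval comfortably includes zero, which is the same information the p-value conveys.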

Finally, what does the creeping up F-test at the end of the output indicate?

"Creeping up?" Do you mean the one that says "F test that all u_i=0: F(941, 3418) = 0.39 Prob > F = 1.0000?" That is just what it says: it is a test of the null hypothesis that all of the fixed effects are zero. In this case, you do not reject that null hypothesis. People sometimes rely on this test to decide whether they could have used pooled OLS instead of the fixed-effects model. But for your present purposes, it has no relevance.
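For anyone who does want to see what the pooled alternative looks like, a rough sketch follows. It assumes UCOMP, POST and firm are in memory as created earlier in the thread; `pooled` and `fixed` are hypothetical names for the stored estimates.

```stata
* Assumes UCOMP, POST and firm are in memory, as created earlier in the thread.
* "pooled" and "fixed" are hypothetical names for the stored results.
regress UCOMP i.POST
estimates store pooled

xtset firm
xtreg UCOMP i.POST, fe
estimates store fixed

* Side-by-side coefficient and standard error for the POST indicator
estimates table pooled fixed, keep(1.POST) b(%9.4f) se(%9.4f)
```

If the fixed effects really are negligible, the two columns should tell much the same story about 1.POST.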

Roman Vanderson

Join Date: Jan 2017
Posts: 20

25 Apr 2017, 16:55

Dear Clyde,

Thank you very much for your reply. It helped me a lot in understanding not only the output but also the underlying fundamentals. Thank you for that!

You are obviously right with both: an F-test not being a measure of model fit and that poor fit is rather expected in the case described here.

After letting your input sink in, I realised that it only makes sense to test the absolute values of UCOMP before and after the policy change. I reran the regression, and this time the results indicate a stronger difference between the means that is statistically significant, with a p-value of .003.

Code:

. xtreg abs_UCOMP i.POST, fe

Fixed-effects (within) regression               Number of obs     =      4,361
Group variable: firm                            Number of groups  =        942

R-sq:                                           Obs per group:
     within  = 0.0026                                         min =          1
     between = 0.0102                                         avg =        4.6
     overall = 0.0028                                         max =          6

                                                F(1,3418)         =       8.80
corr(u_i, Xb)  = 0.0180                         Prob > F          =     0.0030

------------------------------------------------------------------------------
abs_UCOMP    |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      1.POST |   -.024995   .0084275    -2.97   0.003    -.0415185   -.0084715
       _cons |   .2813932   .0058821    47.84   0.000     .2698605    .2929259
-------------+----------------------------------------------------------------
     sigma_u |   .1853353
     sigma_e |  .26465225
         rho |  .32904662   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(941, 3418) = 2.11                   Prob > F = 0.0000

The coefficient estimate of 1.POST implies a difference in means of -.025. This suggests a lower mean absolute value of UCOMP after the policy change, which is in line with my expectations.
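For completeness, a sketch of how such a variable could be built and the two period means displayed. The thread does not show how abs_UCOMP was created, so this assumes it is simply the absolute value of UCOMP; abs_UCOMP2 is a hypothetical name used to avoid overwriting the existing variable.

```stata
* Assumes UCOMP, POST and firm are in memory, as created earlier in the thread.
* abs_UCOMP2 is a hypothetical name; the thread does not show how abs_UCOMP was built.
generate double abs_UCOMP2 = abs(UCOMP)

xtset firm
xtreg abs_UCOMP2 i.POST, fe

* Predicted mean of the absolute residual in each period
margins POST
```

Since the mean of an absolute residual measures how far compensation changes deviate from the fitted model rather than their average level, the POST coefficient here speaks to dispersion, not to a shift in the mean of UCOMP itself.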

Again: Thank you very much!
