
  • Cluster-robust Wald test of linear hypothesis

    Hello,

    I am estimating a DiD model of student GPA on hours worked, with treatment taking place at the industry level in 2021. My dataset spans 2010-2021. To account for autocorrelation within industries, I cluster the standard errors at the industry level (48 clusters). I estimate the following equation:
    Code:
    reghdfe gpa treat2021, absorb(year industry) vce(cluster industry)
    where treat2021 is my treatment dummy. I want to perform a test for parallel trends by estimating a second equation with a (placebo) treatment indicator for every year (leaving one out):
    Code:
    reghdfe gpa treat2011-treat2021, absorb(year industry) vce(cluster industry)
    I then use a Wald test:
    Code:
    testparm treat2011-treat2020
    However, this gives me nonsensical results. I very strongly reject the null hypothesis (p=0.000), even though each parameter is individually far from significant. If I instead use ordinary heteroskedasticity-robust standard errors, the standard errors on the parameters become only slightly smaller, but now I get a p-value of 0.202 in my Wald test. Wouldn't I expect smaller standard errors to lead, if anything, to a lower p-value? And in any case, the change seems very dramatic. Can this be because the standard Wald test is not cluster-robust? And if so, does there exist an equivalent cluster-robust test for linear hypotheses in Stata?

  • #2
    However, this gives me nonsensical results. I very strongly reject the null hypothesis (p=0.000), even though each parameter is individually far from significant.
    There is nothing nonsensical about this. It happens often. If, as so many are, you have been mistaught that "statistically significant vs. not significant" means "effect exists vs. no effect," then you would find this confusing: how can each effect be non-existent and yet jointly there is an effect? But that is not what statistical significance means. And, in particular, a joint test is not like "OR"-ing together the individual tests. As you probably know, an individual test of a coefficient's significance is equivalent to asking whether zero falls outside that coefficient's confidence interval. With a simultaneous test of multiple coefficients, there is a geometric analog in a higher-dimensional space. But the analog is not a rectangular (hyper)solid built from the individual confidence intervals. Rather, it is a (hyper)ellipsoid whose axes are (usually) obliquely oriented in space, and joint statistical significance corresponds to zero lying outside that ellipsoid. Because of this geometry, it is possible for the joint test to be statistically significant when none of the individual coefficients is, and also possible for one or more (or all) of the coefficients to be statistically significant while the joint test is not.
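    To see how this can happen concretely, here is a minimal simulated sketch (invented data, nothing to do with your model): two nearly collinear regressors are each individually insignificant, yet the joint test rejects decisively.
    Code:
    * toy simulation: individually insignificant, jointly significant
    clear
    set seed 12345
    set obs 200
    generate x1 = rnormal()
    generate x2 = x1 + 0.05*rnormal()   // x2 nearly collinear with x1
    generate y  = x1 + x2 + 4*rnormal()
    regress y x1 x2                     // each t-test is typically insignificant
    testparm x1 x2                      // the joint F-test rejects strongly
    The near-collinearity inflates the individual standard errors, but the sum of the two coefficients is precisely estimated and clearly nonzero, which is exactly what the ellipsoid "sees".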

    Can this be because the standard Wald test is not cluster-robust?
    No. The Wald test is computed from whatever variance-covariance estimates were produced by the regression itself. If the original regression was cluster-robust, then so is the Wald test; if not, then neither is the test.
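    You can verify this directly: test and testparm simply read e(b) and e(V) from the last estimation, so the test inherits whatever VCE that estimation used. A sketch with a built-in dataset (the dataset and variables are just for illustration):
    Code:
    sysuse auto, clear
    regress price mpg weight, vce(cluster rep78)
    testparm mpg weight            // F-test built on the cluster-robust e(V)
    * the same statistic computed by hand from the posted results:
    matrix b  = e(b)
    matrix V  = e(V)
    matrix bw = b[1, 1..2]         // coefficients on mpg and weight
    matrix Vw = V[1..2, 1..2]      // their cluster-robust covariance block
    matrix W  = bw * invsym(Vw) * bw'
    display "F = " W[1,1]/2        // Wald form divided by the number of restrictions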



    • #3
      Thank you for the answer!

      It is of course possible for the joint test to contradict what the individual t-tests may imply; that is why I perform the test in the first place. What I find "nonsensical" (perhaps a poor choice of words) is the following:

      1. Using vce(robust) and vce(cluster industry) yields roughly the same standard errors on the estimates (p-values around 0.5-0.8 on all estimates). But the joint test yields very different results (p=0.2 with robust and p=0.000 with clustered).

      2. Clustering increases the standard errors, yielding less significant individual t-tests. But the Wald test becomes more significant, and by a lot. This seems contradictory to me.

      Both may be possible in theory, but it just seemed weird to me. Visually, I also don't see any pre-trend. This is why I thought that perhaps the test is not valid when clustering. For example, I remember being taught that F-tests assume independent observations, which clustering violates.

      I have also found this thread, discussing a similar problem. They conclude that the test for joint significance should be performed with vce(robust), even if the main model uses clustered standard errors. But I am not sure the example from that thread applies to mine, since I am nowhere near testing as many restrictions as I have clusters. They do, however, discuss that the test could be invalid even when testing fewer restrictions.

      Further, even when it is able to test all M-1 linear restrictions (which did not happen in my regressions because I do not have that many parameters), it does not behave well (see last paragraph on p.33 here http://www.stata.com/meeting/13uk/nichols_crse.pdf).
      This leads me to think that I should maybe stick to vce(robust) on the test. But is this a wrong interpretation?
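      For what it is worth, one thing I can check directly is how the number of restrictions compares with the rank of the clustered VCE, which is bounded above by the number of clusters. A sketch with my variable names:
      Code:
      reghdfe gpa treat2011-treat2021, absorb(year industry) vce(cluster industry)
      mata: st_numscalar("rankV", rank(st_matrix("e(V)")))
      display "rank of e(V) = " rankV "   clusters = " e(N_clust)
      testparm treat2011-treat2020    // 10 restrictions against 48 clusters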



      • #4
        Bear in mind that when you jointly test multiple coefficients, their covariances come into play in calculating the test statistic. So intuitions honed on the behavior of single coefficient tests can be quite wrong.

        This leads me to think that I should maybe stick to vce(robust) on the test. But is this a wrong interpretation?
        It is my understanding that because, in your situation, the treatment you are investigating was applied at the cluster level, the use of a cluster-robust VCE is necessary.



        • #5
          As Clyde states, these outcomes of the joint and individual tests are certainly possible, but I do find it peculiar. I've done these sorts of tests a handful of times and they are always pretty consistent (unless a couple of the individual dummies are marginally significant and then the joint test is insignificant).

          Out of curiosity, does the same thing happen when you include treat2010 and omit treat2020, as is common in creating event study plots? Then 2020 acts as the comparison year rather than 2010. The value of the Wald statistic should be the same, and I'm curious to know whether it is, and what the individual t statistics look like.

          Seeing your output could also help.
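          In code, the comparison I have in mind looks like this (a sketch using the variable names from #1; adjust to your actual specification):
          Code:
          * base year 2010, as in #1: test the pre-period dummies
          reghdfe gpa treat2011-treat2021, absorb(year industry) vce(cluster industry)
          testparm treat2011-treat2020
          * base year 2020 instead: include treat2010, omit treat2020
          reghdfe gpa treat2010-treat2019 treat2021, absorb(year industry) vce(cluster industry)
          testparm treat2010-treat2019
          Both versions test the same null (no differences among the pre-treatment years), so the Wald statistic should not change.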



          • #6
            I have now tried to perform the test using 2019 as my base year (2020 is excluded from my analysis for other reasons, but even if I include it in my dataset it does not make a difference). Using 2019 instead of 2010 as the base year actually changes the results somewhat. More of my pre-periods become significant and, perhaps more importantly, clustering now reduces the standard errors on several of the parameters. The results of the Wald test are mostly unchanged after changing the base year.

            I cannot show the output directly from the Stata window due to strict micro-data regulations. Instead, I have exported the results to the tables shown below. All the tables show results with 2019 as the base year:

            First, I run the regression with robust VCE and include controls. The code looks like this:
            Code:
            reghdfe gpa treat2010-treat2018 treat2021 controls, absorb(year industry other_fe) vce(robust)
            
            testparm treat2010-treat2018
            And the output looks like this ("sumtimer" and "log_gns_indk" are controls):
            [Screenshot: regression output (robust VCE) and testparm result, attachment ID 1732505]


            I then run the same regression with clustered errors:
            Code:
            reghdfe gpa treat2010-treat2018 treat2021 controls, absorb(year industry other_fe) vce(cluster industry)
            
            testparm treat2010-treat2018
            And get the following result:
            [Screenshot: regression output (clustered VCE) and testparm result, attachment ID 1732506]



            I have also tried it without the controls and other FEs, and it does not change anything.

            I am actually working on two different datasets: one for high schoolers and one for 9th graders (in the Danish school system). What I have been showing thus far is the high school dataset. But for the 9th grade dataset, interestingly, I get the same results.

            Results for 9th graders, using robust VCE:
            [Screenshot: 9th-grade results with robust VCE, attachment ID 1732507]



            And using clustered VCE:
            [Screenshot: 9th-grade results with clustered VCE, attachment ID 1732508]



            The difference between the Wald tests is not as large as for the high school students, but the pattern is the same. Notice also that clustering reduces the standard errors for some years.

            Should I be worried that clustering reduces some of the standard errors? Normally I would choose the larger of the clustered and robust VCEs.
            Last edited by Anders Gotfredsen; 02 Nov 2023, 15:51.



            • #7
              I also encountered this problem, though not when testing for pre-trends in an event study setting, but when testing the significance of dose-response relationships with a (homogeneous cohort effect) TWFE estimator and a discretized continuous treatment (various temperature indicators).

              I am a bit hesitant to report, as the data is covered by a use agreement, and in fact I should submit the manuscript quite soon. But I'm pretty sure it's not a coding mistake. In my estimates, after stacking regressions, the Wald test rejects the null in panels where fairly neat dose-response relationships emerged from the separately estimated treatment coefficients, and fails to reject in other panels, except in one single case: a sub-group, one of the smallest (a few thousand observations) and likely with few treated observations (probably fewer than a hundred), in which all individual coefficients (risk ratios) are close to 1 (some below, some above) and only one is marginally significant, yet the Wald statistic is very high, significant even at levels Bonferroni-adjusted for the number of panels on which I am testing. The elevation disappears when the VCE is estimated without error clustering. The sub-group in which this aberrant value arises is likely to have, relative to the other sub-groups, particularly small treatment values. I think the test may become inflated for certain patterns of values in the VCE, arising from a combination of (1) a small number of treated observations relative to controls, (2) values of the treatment variable close to zero, and perhaps also something about the frequency and distribution of the outcome values.

              While there is a general case to be made about the intrinsic limits of significance testing decontextualized from other information, this behavior appears to be specific and therefore seems worth investigating. Here, the point is made that inverting the variance-covariance matrix pushes the test statistic up when the original matrix has small values: for any set of coefficient estimates, arbitrarily small variances will still produce a high test statistic. This makes sense to me. If the issue does have something to do with the mechanics of the test, elucidating it may show that corrections are possible.
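              The mechanical point is easy to reproduce in isolation. A toy Mata sketch (arbitrary numbers, no real data): hold the coefficient vector fixed near zero and shrink the VCE, and W = b*inv(V)*b' grows without bound.
              Code:
              mata:
              b = (0.02, -0.01, 0.03)        // coefficients close to the null
              for (s = 0; s <= 4; s++) {
                  V = I(3) * 10^(-s)         // ever smaller (co)variances
                  printf("V scale 1e-%g:  W = %10.4f\n", s, b * invsym(V) * b')
              }
              end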

              In the meantime, a broader perspective may well suggest what to do about it. I don't see any clear justification for abandoning error clustering when it was adopted for good reasons, although such a justification might emerge. I would instead think of supplying complementary information about the "joint effect size". However, I'm uncertain how to proceed: summing the coefficients makes sense when they refer to non-overlapping, ordered categories, such as years in a pre-trend test, but not in my case, where the categories represent increasingly severe exposures that affect, in part, the same observations. Is there a principled way to assess the joint size of relative risks? After all, what is being tested is a linear restriction, so the relevant information is already "properly displayed" in the estimated coefficients; if they are close to zero, one might simply emphasize this fact verbally. Still, I wonder.
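              For the non-overlapping case, at least, lincom already gives a cluster-robust estimate of the summed coefficients. A sketch in the pre-trend setting of #1 (variable names from that post); how to adapt this to overlapping exposure categories is exactly what I am unsure about:
              Code:
              reghdfe gpa treat2011-treat2021, absorb(year industry) vce(cluster industry)
              * one "joint effect size" summary: the summed pre-period coefficients,
              * with a cluster-robust standard error
              lincom treat2011 + treat2012 + treat2013 + treat2014 + treat2015 ///
                   + treat2016 + treat2017 + treat2018 + treat2019 + treat2020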

              Below are the results for this aberrant sub-group. Apologies for not showing more. Basically, in a large number of other, similarly organized panels collecting estimates of similar treatment variables, either dose-response relationships with individually significant coefficients emerge and the joint null is rejected, or the estimates are close to zero and no test is significant.

              Code:
              . * separate regressions
              . forval x=25/29 {
                2. ppmlhdfe present c.age##c.age i.education i.ghs tmean_workhrs`x'_last4w if METclass==3 & sex==0 [pw=weights], eform absorb(i.year i.region i.weekday) d vce(cluster lsoa_masked) nolo
              > g verbose(-1)
                3. }
              
              HDFE PPML regression                              No. of obs      =      6,313
              Absorbing 3 HDFE groups                           Residual df     =      2,197
              Statistics robust to heteroskedasticity           Wald chi2(10)   =     193.96
              Deviance             =  3901.962732               Prob > chi2     =     0.0000
              Log pseudolikelihood = -3111.031128               Pseudo R2       =     0.0850
              
              Number of clusters (lsoa_masked)=     2,198
                                                (Std. err. adjusted for 2,198 clusters in lsoa_masked)
              ----------------------------------------------------------------------------------------
                                     |               Robust
                             present |     exp(b)   std. err.      z    P>|z|     [95% conf. interval]
              -----------------------+----------------------------------------------------------------
                                 age |   .9976453   .0222815    -0.11   0.916     .9549165    1.042286
                                     |
                         c.age#c.age |   1.000126   .0002528     0.50   0.618     .9996309    1.000622
                                     |
                           education |
                 Secondary or other  |   .9280617   .1393286    -0.50   0.619      .691492    1.245565
                 Tertiary education  |   .8641418   .1351589    -0.93   0.351     .6359896     1.17414
                   Higher education  |   .8739808   .1497771    -0.79   0.432     .6246383    1.222856
                                     |
                                 ghs |
                          Very good  |    1.49012   .2242518     2.65   0.008     1.109486    2.001339
                               Good  |   2.185338   .3311162     5.16   0.000     1.623854    2.940968
                               Fair  |   4.170143   .6595001     9.03   0.000     3.058687    5.685476
                               Poor  |   5.767701   1.157334     8.73   0.000     3.892265    8.546789
                                     |
              tmean_workhrs25_last4w |   1.039077   .0196903     2.02   0.043     1.001193    1.078395
                               _cons |   .0764019   .0364884    -5.38   0.000     .0299629    .1948161
              ----------------------------------------------------------------------------------------
              
              Absorbed degrees of freedom:
              -----------------------------------------------------+
               Absorbed FE | Categories  - Redundant  = Num. Coefs |
              -------------+---------------------------------------|
                      year |        10           0          10     |
                    region |       135           1         134     |
                   weekday |         7           1           6    ?|
              -----------------------------------------------------+
              ? = number of redundant parameters may be higher
              
              HDFE PPML regression                              No. of obs      =      6,313
              Absorbing 3 HDFE groups                           Residual df     =      2,197
              Statistics robust to heteroskedasticity           Wald chi2(10)   =     189.62
              Deviance             =  3906.891057               Prob > chi2     =     0.0000
              Log pseudolikelihood = -3113.495291               Pseudo R2       =     0.0842
              
              Number of clusters (lsoa_masked)=     2,198
                                                (Std. err. adjusted for 2,198 clusters in lsoa_masked)
              ----------------------------------------------------------------------------------------
                                     |               Robust
                             present |     exp(b)   std. err.      z    P>|z|     [95% conf. interval]
              -----------------------+----------------------------------------------------------------
                                 age |   .9951294   .0223046    -0.22   0.828     .9523595     1.03982
                                     |
                         c.age#c.age |   1.000156   .0002536     0.62   0.538     .9996593    1.000653
                                     |
                           education |
                 Secondary or other  |   .9253299   .1386632    -0.52   0.605     .6898294    1.241228
                 Tertiary education  |   .8585798   .1339281    -0.98   0.328     .6324169    1.165622
                   Higher education  |   .8700345   .1490771    -0.81   0.416      .621851    1.217269
                                     |
                                 ghs |
                          Very good  |   1.487547   .2252449     2.62   0.009     1.105558     2.00152
                               Good  |   2.181693    .332784     5.11   0.000     1.617916    2.941924
                               Fair  |   4.167307   .6626077     8.98   0.000       3.0515     5.69112
                               Poor  |   5.782383   1.170199     8.67   0.000     3.889077    8.597401
                                     |
              tmean_workhrs26_last4w |    1.02847   .0424031     0.68   0.496     .9486304    1.115028
                               _cons |   .0813889   .0390475    -5.23   0.000     .0317826    .2084208
              ----------------------------------------------------------------------------------------
              
              Absorbed degrees of freedom:
              -----------------------------------------------------+
               Absorbed FE | Categories  - Redundant  = Num. Coefs |
              -------------+---------------------------------------|
                      year |        10           0          10     |
                    region |       135           1         134     |
                   weekday |         7           1           6    ?|
              -----------------------------------------------------+
              ? = number of redundant parameters may be higher
              
              HDFE PPML regression                              No. of obs      =      6,313
              Absorbing 3 HDFE groups                           Residual df     =      2,197
              Statistics robust to heteroskedasticity           Wald chi2(10)   =     191.10
              Deviance             =  3907.197825               Prob > chi2     =     0.0000
              Log pseudolikelihood = -3113.648675               Pseudo R2       =     0.0842
              
              Number of clusters (lsoa_masked)=     2,198
                                                (Std. err. adjusted for 2,198 clusters in lsoa_masked)
              ----------------------------------------------------------------------------------------
                                     |               Robust
                             present |     exp(b)   std. err.      z    P>|z|     [95% conf. interval]
              -----------------------+----------------------------------------------------------------
                                 age |   .9945185   .0223636    -0.24   0.807     .9516386    1.039331
                                     |
                         c.age#c.age |   1.000164   .0002546     0.64   0.520     .9996649    1.000663
                                     |
                           education |
                 Secondary or other  |    .924238   .1383689    -0.53   0.599     .6892063     1.23942
                 Tertiary education  |   .8571226   .1336565    -0.99   0.323     .6314076    1.163526
                   Higher education  |   .8691413   .1488804    -0.82   0.413     .6212737      1.2159
                                     |
                                 ghs |
                          Very good  |   1.487517   .2253442     2.62   0.009     1.105384    2.001754
                               Good  |   2.181081   .3329555     5.11   0.000     1.617077    2.941799
                               Fair  |   4.169492   .6626755     8.98   0.000     3.053501    5.693356
                               Poor  |   5.782411   1.171673     8.66   0.000      3.88716    8.601722
                                     |
              tmean_workhrs27_last4w |   1.034915   .0916295     0.39   0.698     .8700431    1.231029
                               _cons |   .0825661   .0397394    -5.18   0.000     .0321451    .2120747
              ----------------------------------------------------------------------------------------
              
              Absorbed degrees of freedom:
              -----------------------------------------------------+
               Absorbed FE | Categories  - Redundant  = Num. Coefs |
              -------------+---------------------------------------|
                      year |        10           0          10     |
                    region |       135           1         134     |
                   weekday |         7           1           6    ?|
              -----------------------------------------------------+
              ? = number of redundant parameters may be higher
              
              HDFE PPML regression                              No. of obs      =      6,313
              Absorbing 3 HDFE groups                           Residual df     =      2,197
              Statistics robust to heteroskedasticity           Wald chi2(10)   =     189.10
              Deviance             =  3907.272381               Prob > chi2     =     0.0000
              Log pseudolikelihood = -3113.685953               Pseudo R2       =     0.0842
              
              Number of clusters (lsoa_masked)=     2,198
                                                (Std. err. adjusted for 2,198 clusters in lsoa_masked)
              ----------------------------------------------------------------------------------------
                                     |               Robust
                             present |     exp(b)   std. err.      z    P>|z|     [95% conf. interval]
              -----------------------+----------------------------------------------------------------
                                 age |   .9936645   .0222953    -0.28   0.777     .9509134    1.038338
                                     |
                         c.age#c.age |   1.000174    .000254     0.69   0.493     .9996766    1.000672
                                     |
                           education |
                 Secondary or other  |    .924526   .1384076    -0.52   0.600     .6894277    1.239794
                 Tertiary education  |   .8570319    .133643    -0.99   0.322     .6313397    1.163405
                   Higher education  |   .8691721   .1488957    -0.82   0.413     .6212817     1.21597
                                     |
                                 ghs |
                          Very good  |   1.487722   .2252791     2.62   0.009     1.105676    2.001775
                               Good  |   2.180954   .3330291     5.11   0.000     1.616848    2.941874
                               Fair  |   4.176926   .6622035     9.02   0.000     3.061319    5.699083
                               Poor  |   5.773541   1.171274     8.64   0.000     3.879356    8.592606
                                     |
              tmean_workhrs28_last4w |   .9621615   .1734106    -0.21   0.831     .6758274     1.36981
                               _cons |   .0839579   .0403868    -5.15   0.000     .0327041    .2155365
              ----------------------------------------------------------------------------------------
              
              Absorbed degrees of freedom:
              -----------------------------------------------------+
               Absorbed FE | Categories  - Redundant  = Num. Coefs |
              -------------+---------------------------------------|
                      year |        10           0          10     |
                    region |       135           1         134     |
                   weekday |         7           1           6    ?|
              -----------------------------------------------------+
              ? = number of redundant parameters may be higher
              
              HDFE PPML regression                              No. of obs      =      6,313
              Absorbing 3 HDFE groups                           Residual df     =      2,197
              Statistics robust to heteroskedasticity           Wald chi2(10)   =     190.61
              Deviance             =  3907.305885               Prob > chi2     =     0.0000
              Log pseudolikelihood = -3113.702705               Pseudo R2       =     0.0842
              
              Number of clusters (lsoa_masked)=     2,198
                                                (Std. err. adjusted for 2,198 clusters in lsoa_masked)
              ----------------------------------------------------------------------------------------
                                     |               Robust
                             present |     exp(b)   std. err.      z    P>|z|     [95% conf. interval]
              -----------------------+----------------------------------------------------------------
                                 age |   .9938231    .022276    -0.28   0.782     .9511082    1.038456
                                     |
                         c.age#c.age |   1.000172   .0002538     0.68   0.497     .9996749     1.00067
                                     |
                           education |
                 Secondary or other  |   .9245031   .1385852    -0.52   0.601      .689146    1.240239
                 Tertiary education  |   .8569858   .1336681    -0.99   0.322     .6312593    1.163428
                   Higher education  |   .8691166   .1489216    -0.82   0.413     .6211924     1.21599
                                     |
                                 ghs |
                          Very good  |   1.487574   .2253181     2.62   0.009     1.105477    2.001738
                               Good  |   2.180964   .3330416     5.11   0.000     1.616839    2.941916
                               Fair  |   4.175594   .6623515     9.01   0.000     3.059827    5.698226
                               Poor  |   5.775026   1.171286     8.65   0.000     3.880735    8.593973
                                     |
              tmean_workhrs29_last4w |   .9680669   .2620067    -0.12   0.905     .5695452    1.645442
                               _cons |   .0836995   .0401703    -5.17   0.000      .032674    .2144097
              ----------------------------------------------------------------------------------------
              
              Absorbed degrees of freedom:
              -----------------------------------------------------+
               Absorbed FE | Categories  - Redundant  = Num. Coefs |
              -------------+---------------------------------------|
                      year |        10           0          10     |
                    region |       135           1         134     |
                   weekday |         7           1           6    ?|
              -----------------------------------------------------+
              ? = number of redundant parameters may be higher
              
              .
              . *stacked regressions
              . preserve
              
              . qui rename (tmean_workhrs25_last4w tmean_workhrs26_last4w tmean_workhrs27_last4w tmean_workhrs28_last4w tmean_workhrs29_last4w) tempvars=
              
              . qui gen long obs=_n
              
              . qui reshape long tempvars, i(obs) j(stack) string
              
              . qui encoder stack, replace
              
              . qui lab def stack 1 "stack 25°" 2 "stack 26°" 3 "stack 27°" 4 "stack 28°" 5 "stack 29°", modify
              
              . qui lab val stack stack
              
              .
              . ppmlhdfe present c.age##c.age#i.stack ib1.education#i.stack ib1.ghs#i.stack c.tempvars#i.stack if METclass==3 & sex==0 [pw=weights], eform absorb(i.year#i.stack i.region#i.stack i.week
              > day#i.stack) d(stackfesum_3) vce(cluster lsoa_masked) noomitted nolog
              (dropped 1865 observations that are either singletons or separated by a fixed effect)
              note: 8 variables omitted because of collinearity: 3bn.education#2bn.stack 3bn.education#3bn.stack 3bn.education#4bn.stack 3bn.education#5bn.stack 5bn.ghs#2bn.stack 5bn.ghs#3bn.stack 5bn
              > .ghs#4bn.stack 5bn.ghs#5bn.stack
              Converged in 7 iterations and 22 HDFE sub-iterations (tol = 1.0e-08)
              
              HDFE PPML regression                              No. of obs      =     31,565
              Absorbing 3 HDFE groups                           Residual df     =      2,197
              Statistics robust to heteroskedasticity           Wald chi2(50)   =     267.41
              Deviance             =  19530.62988               Prob > chi2     =     0.0000
              Log pseudolikelihood = -15565.56375               Pseudo R2       =     0.0843
              
              Number of clusters (lsoa_masked)=     2,198
                                                       (Std. err. adjusted for 2,198 clusters in lsoa_masked)
              -----------------------------------------------------------------------------------------------
                                            |               Robust
                                    present |     exp(b)   std. err.      z    P>|z|     [95% conf. interval]
              ------------------------------+----------------------------------------------------------------
                                stack#c.age |
                                 stack 25°  |   .9976453   .0222815    -0.11   0.916     .9549165    1.042286
                                 stack 26°  |   .9951294   .0223046    -0.22   0.828     .9523595     1.03982
                                 stack 27°  |   .9945185   .0223636    -0.24   0.807     .9516386    1.039331
                                 stack 28°  |   .9936645   .0222953    -0.28   0.777     .9509134    1.038338
                                 stack 29°  |   .9938231    .022276    -0.28   0.782     .9511082    1.038456
                                            |
                          stack#c.age#c.age |
                                 stack 25°  |   1.000126   .0002528     0.50   0.618     .9996309    1.000622
                                 stack 26°  |   1.000156   .0002536     0.62   0.538     .9996593    1.000653
                                 stack 27°  |   1.000164   .0002546     0.64   0.520     .9996649    1.000663
                                 stack 28°  |   1.000174    .000254     0.69   0.493     .9996766    1.000672
                                 stack 29°  |   1.000172   .0002538     0.68   0.497     .9996749     1.00067
                                            |
                            education#stack |
                       No degree#stack 25°  |   1.077515   .1617658     0.50   0.619     .8028482    1.446148
                       No degree#stack 26°  |    1.14938   .1969419     0.81   0.416     .8215109    1.608102
                       No degree#stack 27°  |   1.150561   .1970864     0.82   0.413     .8224362    1.609596
                       No degree#stack 28°  |    1.15052   .1970928     0.82   0.413     .8223885    1.609576
                       No degree#stack 29°  |   1.150594   .1971522     0.82   0.413     .8223753    1.609807
              Secondary or other#stack 26°  |   1.063555   .1298314     0.50   0.614     .8372427    1.351042
              Secondary or other#stack 27°  |   1.063392   .1298588     0.50   0.615     .8370411    1.350952
              Secondary or other#stack 28°  |   1.063686   .1298163     0.51   0.613     .8373932     1.35113
              Secondary or other#stack 29°  |   1.063727   .1299149     0.51   0.613     .8372816    1.351416
              Tertiary education#stack 25°  |   .9311254   .0958268    -0.69   0.488     .7610388    1.139225
              Tertiary education#stack 26°  |   .9868342   .1239961    -0.11   0.916      .771419    1.262403
              Tertiary education#stack 27°  |   .9861718   .1239032    -0.11   0.912      .770916    1.261531
              Tertiary education#stack 28°  |   .9860324    .123864    -0.11   0.911     .7708403    1.261299
              Tertiary education#stack 29°  |   .9860424   .1238732    -0.11   0.911      .770836    1.261331
                Higher education#stack 25°  |    .941727   .1146242    -0.49   0.622     .7418554    1.195448
                                            |
                                  ghs#stack |
                       Excellent#stack 26°  |   .1729391   .0349982    -8.67   0.000     .1163142    .2571304
                       Excellent#stack 27°  |   .1729383    .035042    -8.66   0.000     .1162558    .2572572
                       Excellent#stack 28°  |   .1732039   .0351378    -8.64   0.000     .1163791    .2577748
                       Excellent#stack 29°  |   .1731594     .03512    -8.65   0.000     .1163606    .2576832
                       Very good#stack 25°  |    1.49012   .2242518     2.65   0.008     1.109486    2.001339
                       Very good#stack 26°  |    .257255   .0410573    -8.51   0.000     .1881542    .3517334
                       Very good#stack 27°  |   .2572486   .0411031    -8.50   0.000     .1880825    .3518502
                       Very good#stack 28°  |   .2576792   .0412076    -8.48   0.000     .1883462    .3525348
                       Very good#stack 29°  |   .2575873   .0411756    -8.49   0.000     .1883038    .3523626
                            Good#stack 25°  |   2.185338   .3311162     5.16   0.000     1.623854    2.940968
                            Good#stack 26°  |      .3773    .060053    -6.12   0.000     .2761883    .5154284
                            Good#stack 27°  |   .3771923   .0601151    -6.12   0.000     .2759957    .5154937
                            Good#stack 28°  |   .3777499   .0602735    -6.10   0.000     .2763041    .5164418
                            Good#stack 29°  |   .3776545   .0602397    -6.10   0.000     .2762609    .5162616
                            Fair#stack 25°  |   4.170143   .6595001     9.03   0.000     3.058687    5.685476
                            Fair#stack 26°  |   .7206903   .1113338    -2.12   0.034     .5324185     .975538
                            Fair#stack 27°  |   .7210647    .111663    -2.11   0.035     .5323025    .9767647
                            Fair#stack 28°  |   .7234601   .1120081    -2.09   0.037     .5341081     .979941
                            Fair#stack 29°  |   .7230433   .1118852    -2.10   0.036     .5338849    .9792217
                            Poor#stack 25°  |   5.767701   1.157334     8.73   0.000     3.892265    8.546789
                                            |
                           stack#c.tempvars |
                                 stack 25°  |   1.039077   .0196903     2.02   0.043     1.001193    1.078395
                                 stack 26°  |    1.02847   .0424031     0.68   0.496     .9486304    1.115028
                                 stack 27°  |   1.034915   .0916295     0.39   0.698     .8700431    1.231029
                                 stack 28°  |   .9621615   .1734106    -0.21   0.831     .6758274     1.36981
                                 stack 29°  |   .9680669   .2620067    -0.12   0.905     .5695452    1.645442
                                            |
                                      _cons |   .2922611   .1447967    -2.48   0.013     .1106764    .7717684
              -----------------------------------------------------------------------------------------------
              
              Absorbed degrees of freedom:
              ---------------------------------------------------------+
                   Absorbed FE | Categories  - Redundant  = Num. Coefs |
              -----------------+---------------------------------------|
                    year#stack |        50           0          50     |
                  region#stack |       675           5         670     |
                 weekday#stack |        35           5          30    ?|
              ---------------------------------------------------------+
              ? = number of redundant parameters may be higher
              
              . etable, keep(stack#c.tempvars) cstat(_r_b) /*cstat(_r_se) cstat(_r_z)*/ cstat(_r_ci, nformat(%6.3f) cidelimiter(",")) replace title("SUR estimates within `strata'-`u'hrs panel ")
              
              SUR estimates within -hrs panel
              ---------------------------------------
                                           present  
              ---------------------------------------
              attack number # tempvars              
                stack 25°                       1.039
                                       [1.001, 1.078]
                stack 26°                       1.028
                                       [0.949, 1.115]
                stack 27°                       1.035
                                       [0.870, 1.231]
                stack 28°                       0.962
                                       [0.676, 1.370]
                stack 29°                       0.968
                                       [0.570, 1.645]
              N                                 31565
              ---------------------------------------
              
              . testparm stack#c.tempvars
              
              ( 1)  1b.stack#c.tempvars = 0
               ( 2)  2.stack#c.tempvars = 0
               ( 3)  3.stack#c.tempvars = 0
               ( 4)  4.stack#c.tempvars = 0
               ( 5)  5.stack#c.tempvars = 0
              
                         chi2(  5) =   17.27
                       Prob > chi2 =    0.0040
              
              . restore
              
              .
              . qui log close
              Last edited by Matteo Pinna Pintor; 14 Apr 2025, 08:56.
              I'm using StataNow/MP 18.5



              • #8
                Update: this is likely to be a particular manifestation of the Hauck-Donner effect. The aberration appears to be most studied when estimates are far from the null, in which case the test under-rejects and the power function becomes locally non-monotonic; that is why I first deemed it unrelated to my problem. However, the following example is mentioned in a short commentary by Mantel (1987):

                I shall go on to considering a situation more similar to ones considered by Vaeth, but first let me say what I found when considering the Poisson distribution. If my null Poisson parameter λ is sufficiently large, then even a single observation would make things asymptotic. But suppose that that observation is x = 0, suggesting some extreme alternative to be true. If my interest is in the logarithmic parameter θ = log λ, then Wald's W turns out to be 0 relative to any θ or λ, however large. (Demonstrating this requires a two-stage use of l'Hôpital's rule in order to cope with indefinite expressions.)

                Curiously, however, if my interest lies with λ, then x = 0 leads to an infinite value for W relative to any finite nonzero value for λ, however small. Thus we are led simultaneously to accept and to reject all possible values for λ when x = 0 if we blindly use Wald's W. The fact is that arithmetically the Poisson variance is smallest when λ approaches 0, but it is then that the logarithmic variance is greatest; hence the anomalous behavior just described.
                I take this to mean that there is a symmetrical case in which the test over-rejects when the estimates are very close to the null, although this manifestation might be more specific.
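                To make the Poisson case concrete (my own back-of-envelope reading of Mantel, so treat it as a sketch): with a single observation x, the MLE is λ̂ = x with estimated variance λ̂ = x. On the λ scale, W = (x − λ0)²/x, which blows up as x → 0 for any fixed λ0 > 0. On the θ = log λ scale, the delta method gives Var(θ̂) ≈ 1/λ̂ = 1/x, so W = x·(log x − log λ0)², and since x·(log x)² → 0 as x → 0, W vanishes. The same data thus yield W = ∞ on one scale and W = 0 on the other, which is the asymmetry described in the quotation.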

                There is a literature (here, here, and here) considering ways to check for power non-monotonicity within a given dataset, and also an R routine. Nothing in Stata, as far as I know, except for some posts on the old Statalist, including one mentioning that Huber-White errors can help (consistent with evidence posted in this discussion).

                I am still not sure how this relates to small values of the treatment variable (more likely with continuous treatments, which can be below 1), but my setting clearly shows that, in my case, this is where the problem comes from. I explored alternative treatment indicators at a constant sample size and number of treated observations. The test is only inflated for indicators that by construction allow positive values below 1. If I then move those values up to 1, which I guess can be seen as a rudimentary type of pre-fit check for power non-monotonicity, the aberration disappears.
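                For concreteness, the rudimentary check looks roughly like this (a hypothetical sketch: tmean_check is an invented name, and tmean_workhrs25_last4w is just one of my indicators):
                Code:
                * push positive treatment values below 1 up to 1; leave zeros and missings alone
                generate tmean_check = cond(tmean_workhrs25_last4w > 0 & tmean_workhrs25_last4w < 1, ///
                    1, tmean_workhrs25_last4w)
                * then rebuild the stacked variables with tmean_check in place of the
                * original indicator, re-run ppmlhdfe and testparm, and compare the Wald statistic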

                I would be interested in knowing what Jeff Wooldridge makes of this.
                Last edited by Matteo Pinna Pintor; 16 Apr 2025, 03:26.
                I'm using StataNow/MP 18.5

