Missing F-statistic and underestimated standard errors using xtreg dummy-variables i.year, fe vce(cluster)

Corne Slob

Join Date: Nov 2020
Posts: 3


18 Nov 2020, 17:21

Dear all,

Stata does not report the F-statistic when I cluster the standard errors at the firm level. I am aware that the missing F-statistic has been discussed in previous threads, but those do not provide the answers I am looking for. For my master's thesis, I am trying to estimate the following model:
xtreg Y $control_variables $blockholderFE i.year, fe i(firmID) vce(cluster firmID)

Here Y is an accounting variable (e.g., investment, leverage), and $blockholderFE are 1185 dummy variables, each equal to one when a given large shareholder is present in firm j in year t. I have an unbalanced panel dataset with 1900 firms and 21,000 firm-year observations over the years 2003-2019. I restrict large shareholders to be present in two or more firms. For those who are interested, the model is based on “Large shareholders and Corporate policies” by Cronqvist and Fahlenbrach (2009).

I am not interested in the F-statistic of the complete model, but I need to test whether the 1185 dummy variables are jointly statistically different from zero. The command test $blockholderFE does report an F-statistic, but it is inflated (F-statistic > 400). In contrast, the F-statistic without vce(cluster) has a value of less than 3. I have compared the conventional standard errors with the cluster-robust standard errors and find that including vce(cluster) reduces the standard errors by 80% on average. I feel that the clustered standard errors are unreliable. Is that correct? Can I use the following as a reason for not clustering the standard errors at the firm level?
According to Austin Nichols and Mark Schaffer: “In a fixed-effect model, where there are a large number of parameters, this often means that test of overall model significance is infeasible. However, testing fewer than M linear constraints is perfectly feasible in these models, though when fixed effects and clustering are specified at the same level, tests that involve the fixed effects themselves are inadvisable (the standard errors on fixed effects are likely to be substantially underestimated, though this will not affect the other variance estimates in general).” https://www.stata.com/meeting/13uk/nichols_crse.pdf

According to help j_robustsingular, the F-statistic is reported as missing when 1) there are more predictors than clusters, or 2) singleton dummies are present.
I don’t think that either applies in my case because:
1) I have more clusters (1900) than parameters (<1200).
2) I restrict large shareholders to be present in two or more firms. Hence, each dummy variable is non-zero in more than one observation and in multiple clusters.
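
A minimal sketch of how the first condition could be checked directly from the stored results after estimation. It uses the nlswork example dataset (fetched with webuse, so it needs an internet connection) as a stand-in for the firm panel; idcode plays the role of firmID, and none of these variable names come from the original data:

```stata
* Sketch only: nlswork stands in for the firm panel; idcode plays the role of firmID
webuse nlswork, clear
xtset idcode year
quietly xtreg ln_wage age tenure ttl_exp i.year, fe vce(cluster idcode)

* Number of clusters used for the cluster-robust VCE
display "clusters = " e(N_clust)

* Model degrees of freedom, i.e. the first number in the header's F(df_m, df_r)
display "model df = " e(df_m)

* Rank of the estimated VCE (constant included), computed in Mata
mata: rank(st_matrix("e(V)"))
```

If the model degrees of freedom or the rank of e(V) come out smaller than the number of regressors actually entered, that would suggest the clustered VCE is not of full rank even though there are more clusters than parameters.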

I found that, with the xtreg command, the F-statistic goes missing when different dummy variables are present in the same firms.
Example:

FirmID	Year	Dummy1	Dummy2	Dummy3	Dummy4
1	t	1	1	0	1
1	t	1	1	1	1
1	t	1	0	1	0
1	t	0	0	0	0
1	t	0	0	0	0
2	t	1	1	0	0
2	t	1	1	0	0
2	t	0	0	1	0
2	t	0	0	1	0
2	t	0	0	0	0
3	t	0	1	0	0
3	t	0	1	1	0
3	t	0	0	1	0
3	t	0	0	0	0
3	t	0	0	0	0
4	t	0	0	1	0
4	t	0	0	1	1
4	t	0	0	1	1
4	t	0	0	0	1
4	t	0	0	0	0

When I run xtreg with the dependent variable, a set of control variables, dummy1-dummy3, firm and year fixed effects, and standard errors clustered at the firm level, Stata does report the F-statistic. However, when I add the fourth dummy, the F-statistic goes missing. For simplicity, I have shown 4 variables, but the same holds with more than 50 dummy variables. Can someone explain why this is happening?

I want to include vce(cluster) because xttest3 shows that there is heteroscedasticity. Also, xtserial suggests that there is serial correlation in a model without firm and year fixed effects. For my analyses, I currently work with a model that does not cluster the standard errors, because clustering seems to underestimate the standard errors. In a model with a set of control variables, 1185 dummy variables, and firm and year fixed effects, I find that the blockholder fixed effects (1185 dummy variables) are jointly statistically different from zero (p-value < 0.001). I would like to rule out that failing to account for serial correlation and heteroscedasticity results in a type 1 error (concluding that the blockholder fixed effects are jointly significant when they actually are not).

Basically, I have the following questions:
(1) Clustering the standard error seems to result in underestimated standard errors. Is it therefore better not to cluster the standard errors? What would be the reasoning?
(2) Why does Stata not report the F-statistic when dummy4 is included in the example above?
(3) Are my concerns for having a type 1 error, by not accounting for heteroscedasticity and serial correlation, well-founded? If so, is there a way to mitigate this issue?

Hopefully, someone can help me.

Thanks in advance!
Corné

Last edited by Corne Slob; 18 Nov 2020, 17:59.

Tags: cluster, f-test, fixed effects, panel data, standard error

Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#2

18 Nov 2020, 18:02

Well, for the example you show with four indicator ("dummy") variables, it is clearly an instance of the number of degrees of freedom exceeding the number of clusters. (The number of observations is not relevant here--it is the number of clusters that limits the number of variables.)

In your real data it is less clear what is going on. My best guess is that in the estimation sample you do not have as many clusters as you think you do. Remember that the estimation sample will only include observations that have non-missing values on every variable mentioned in the regression command. Since you have some "control" variables in the model, it is possible that missing data in those variables has led to more clusters being omitted from the estimation than you are aware of. It's simple enough to know. Just look at the output from -xtreg-, above the coefficient table. Just above the start of the regression table there will be a note (Std. Err. adjusted for # clusters in firmid). The number in that message is the actual count of clusters that survived the creation of the estimation sample.
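
A short sketch of that comparison, assuming the nlswork example dataset as a stand-in (idcode plays the role of firmID; the tag variables are hypothetical names introduced only for this illustration):

```stata
* Sketch only: nlswork stands in for the firm panel
webuse nlswork, clear
xtset idcode year

* Clusters in the full data, before any observations are lost to missing values
egen byte tag_all = tag(idcode)
count if tag_all == 1

quietly xtreg ln_wage age tenure ttl_exp i.year, fe vce(cluster idcode)

* Clusters that survive into the estimation sample
egen byte tag_est = tag(idcode) if e(sample)
count if tag_est == 1

* This should agree with the count reported in the output header
display "e(N_clust) = " e(N_clust)
```

A gap between the first and second counts would show how many clusters were lost to missing values in the regressors.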

If it's not that, then it may be that due to missing data on the "control" variables, some of your clusters may have been reduced to singletons.

In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
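
As a minimal illustration of the command, using the auto dataset shipped with Stata rather than the data from this thread:

```stata
* Sketch only: the auto dataset is a stand-in for the poster's data
sysuse auto, clear

* List the first five observations of three variables in a form
* that can be pasted directly into a forum post
dataex make price mpg in 1/5
```

The output is an input block that others can copy into their own Stata session to recreate the example exactly.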

Corne Slob

Join Date: Nov 2020
Posts: 3

19 Nov 2020, 06:18

Dear Clyde,

Thanks for your fast response.

You are right, from my example, it is not clear that the number of clusters is higher than the number of variables.
Below, I show the example using -dataex- and a small sample of the real data (which has 693 indicator variables and firmID values up to 1815), as suggested.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float firmID double year float(investment lag_lnassets lag_cashflow dummy1 dummy2 dummy3 dummy4) byte(blockholderdummy1 blockholderdummy2 blockholderdummy3 blockholderdummy4 blockholderdummy5 blockholderdummy6 blockholderdummy7 blockholderdummy8)
1 2003 .06168109  6.531783  .14389177 1 1 0 0 0 0 0 0 0 0 0 0
1 2004 .07844731  6.564267  .18100154 1 1 0 0 0 0 0 0 0 0 0 0
1 2005 .11712198  6.596095   .2800219 0 0 0 0 0 0 0 0 0 0 0 0
1 2006 .14008342  6.886347   .4627454 0 0 1 0 0 0 0 0 0 0 0 0
1 2007 .11659434  6.973199   .4294967 0 0 1 0 0 0 0 0 0 0 0 0
1 2008 .08871011  7.216717   .4447028 0 0 1 0 0 0 0 0 0 0 0 0
1 2009 .11749449  7.228034  .39031485 0 0 0 0 0 0 0 0 0 0 0 0
1 2010  .3734085  7.313915   .3402393 0 0 0 0 0 0 0 0 0 0 0 0
1 2011 .21834816  7.440574   .3960022 0 0 0 0 0 0 0 0 0 0 0 0
1 2012 .08245342  7.694235  .35440105 0 0 0 0 0 0 0 0 0 0 0 0
1 2013 .06214822  7.667111   .3587601 0 0 0 0 0 0 0 0 0 0 0 0
1 2014 .11202516  7.695985   .4369137 0 0 0 0 0 0 0 0 0 0 0 0
1 2015   .299661  7.323171    .091459 0 0 0 0 0 0 0 0 0 0 0 0
1 2016 .10704046  7.273856   .3772881 0 0 0 0 0 0 0 0 0 0 0 0
1 2017 .06085754   7.31595  .38611025 0 0 0 0 0 0 0 0 0 0 0 0
1 2018 .05495894  7.329553    .315906 0 0 0 0 0 0 0 0 0 0 0 0
2 2003 .21391813 10.096547  11.384886 1 1 0 0 0 0 0 0 0 0 0 0
2 2004  .2056149 10.192993   .6909986 1 1 0 0 0 0 0 0 0 0 0 0
2 2005 .20098507    10.267    .710709 1 1 1 0 0 0 0 0 0 0 0 0
2 2006   .222853 10.279908   .7874672 0 1 1 0 0 0 0 0 0 0 0 0
2 2007 .23842546  10.49621   .5456318 0 0 1 0 0 0 0 0 0 0 0 0
2 2008 .17128205 10.589458   .7861875 0 0 0 1 0 0 0 0 0 0 0 0
2 2009  .1508551 10.655356   .8742903 0 0 0 1 0 0 0 0 0 0 0 0
2 2010 .13322088  10.86698  1.0853536 0 0 0 1 0 0 0 0 0 0 0 0
2 2011 .18711683 10.993097     .95157 0 0 0 0 0 0 0 0 0 0 0 0
2 2012 .22800346 11.006704   .9750829 0 0 0 0 0 0 0 0 0 0 0 0
2 2013 .14200588  11.11595  1.1107666 0 0 0 0 0 0 0 0 0 0 0 0
2 2014  .1823878 10.667862  .50874066 0 0 0 0 0 0 0 0 0 0 0 0
2 2015  .1870261 10.628013  .55359864 0 0 0 0 0 0 0 0 0 0 0 0
2 2016   .195637 10.627334   .6871104 0 0 0 0 0 0 0 0 0 0 0 0
2 2017  .1989483 10.871725   .4216405 0 0 0 0 0 0 0 0 0 0 0 0
2 2018 .18325227 11.241773  .59141105 0 0 0 0 0 0 0 0 0 0 0 0
2 2019  .2165807 11.115026   .7377415 0 0 0 0 0 0 0 0 0 0 0 0
3 2003  .2425987 4.7510266 .000732724 0 0 0 0 0 0 0 0 0 0 0 0
3 2004 .31032965  4.816395   4.319079 0 0 0 0 0 0 0 0 0 0 0 0
3 2005    1.3474  5.008613   6.235604 0 1 1 0 0 0 0 0 0 0 0 0
3 2006 .09940465  5.004134  4.4898267 0 1 1 0 0 0 0 0 0 0 0 0
3 2007 .14642262  5.115548  1.9141258 0 1 1 0 0 0 0 0 0 0 0 0
3 2008   .271675  5.238981   2.496464 0 0 1 0 0 0 0 0 0 0 0 0
3 2009 .12932435  5.403771   3.597594 0 0 0 0 0 0 0 0 0 0 0 0
3 2010   .931984  5.325271   2.436731 0 0 0 0 0 0 0 0 0 0 0 0
3 2011  .7847533  5.446095  2.2068722 0 0 0 0 0 0 0 0 0 0 0 0
3 2012 .09078132  5.741929  2.0931578 0 0 0 0 0 0 0 0 0 0 0 0
3 2013 .08731312   5.70138  1.9779247 0 0 0 0 0 0 0 0 0 0 0 0
3 2014 .10035057  5.778983  2.5008116 0 0 0 0 0 0 0 0 0 0 0 0
3 2015 .05331375  6.148434  3.2508326 0 0 0 0 0 0 0 0 0 0 0 0
3 2016 .10788064  6.193944   3.917048 0 0 0 0 0 0 0 0 0 0 0 0
3 2017 .18847074  6.293009   4.539403 0 0 0 0 0 0 0 0 0 0 0 0
3 2018 .55341387  6.945229  3.4976106 0 0 0 0 0 0 0 0 0 0 0 0
4 2003  .1979708  8.633942 -17.070185 0 0 0 0 0 0 0 0 0 0 0 0
4 2004  .3742081  8.867053  .25033697 0 0 1 0 0 0 0 0 0 0 0 0
4 2005  .3574024  8.967531   .3417983 0 0 1 0 0 0 0 0 1 0 0 0
4 2006  .6875231  8.893954   .3270879 0 0 1 1 0 0 0 0 1 0 0 0
4 2007  .4226235  9.483949   .2484265 0 0 0 1 0 0 0 0 1 0 0 0
4 2008 .13220339  9.354441  -.5201906 0 0 0 1 0 0 0 0 0 0 0 0
end

I dropped all observations with missing values and included indicator variables only for large shareholders that are present in two or more clusters and are non-zero in more than two observations in each cluster. Hence, I have no singletons.
693 indicator variables and 1830 firms remain in the dataset. I still have the following issues:
1) F-statistic goes missing when estimating:

Code:

xtreg investment lag_lnassets lag_cashflow lag_tobinsQ  blockholderdummy1-blockholderdummy693 i.year, fe i(firmID) vce(cluster firmID)


Fixed-effects (within) regression               Number of obs     =     20,982
Group variable: firmID                          Number of groups  =      1,815

R-sq:                                           Obs per group:
     within  = 0.1887                                         min =          2
     between = 0.2699                                         avg =       11.6
     overall = 0.2195                                         max =         17

                                                F(711,1814)       =          .
corr(u_i, Xb)  = -0.1002                        Prob > F          =          .

                                    (Std. Err. adjusted for 1,815 clusters in firmID)

To show when the F-statistic goes missing, I use the four dummy variables from the earlier example. With these dummy variables, the F-statistic goes missing when dummy4 is added.
I don't know why, but adding an indicator variable that is present in the same clusters as other indicator variables makes the F-statistic go missing.

Code:

* Regression without dummy4

xtreg investment lag_lnassets lag_cashflow lag_tobinsQ dummy1-dummy3 blockholderdummy1-blockholderdummy20 i.year, fe i(firmID) vce(cluster firmID)


Fixed-effects (within) regression               Number of obs     =     20,982
Group variable: firmID                          Number of groups  =      1,815

R-sq:                                           Obs per group:
     within  = 0.1216                                         min =          2
     between = 0.2919                                         avg =       11.6
     overall = 0.1915                                         max =         17

                                                F(42,1814)        =      31.44
corr(u_i, Xb)  = -0.0315                        Prob > F          =     0.0000

                                   (Std. Err. adjusted for 1,815 clusters in firmID)
------------------------------------------------------------------------------------
                   |               Robust
        investment |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
      lag_lnassets |  -.0380031   .0058367    -6.51   0.000    -.0494504   -.0265558
      lag_cashflow |   .0047232   .0007943     5.95   0.000     .0031653    .0062811
       lag_tobinsQ |    .055387   .0035791    15.48   0.000     .0483674    .0624066
            dummy1 |  -.1594272   .0801823    -1.99   0.047    -.3166864   -.0021679
            dummy2 |   .0929349   .0819996     1.13   0.257    -.0678887    .2537584
            dummy3 |   .0037441   .0608523     0.06   0.951    -.1156039    .1230921
 blockholderdummy1 |   .7003422   .2766797     2.53   0.011      .157698    1.242986
 blockholderdummy2 |  -.0928466   .0227137    -4.09   0.000    -.1373943   -.0482989
 blockholderdummy3 |    .014436   .0404178     0.36   0.721    -.0648344    .0937063
 blockholderdummy4 |   .2216758   .0632535     3.50   0.000     .0976185    .3457331
 blockholderdummy5 |  -.0176342   .0098311    -1.79   0.073    -.0369157    .0016473
 blockholderdummy6 |   -.002985   .0210026    -0.14   0.887    -.0441768    .0382067
 blockholderdummy7 |  -.0055695   .0128751    -0.43   0.665     -.030821     .019682
 blockholderdummy8 |  -.0384459    .013929    -2.76   0.006    -.0657644   -.0111273
 blockholderdummy9 |   .0112613   .0413898     0.27   0.786    -.0699154     .092438
blockholderdummy10 |  -.0879261   .0087488   -10.05   0.000    -.1050849   -.0707672
blockholderdummy11 |   .0241237   .0199463     1.21   0.227    -.0149964    .0632437
blockholderdummy12 |  -.0227759   .0209407    -1.09   0.277    -.0638463    .0182946
blockholderdummy13 |  -.0120346   .0099668    -1.21   0.227    -.0315822     .007513
blockholderdummy14 |  -.0094283   .0191017    -0.49   0.622    -.0468919    .0280352
blockholderdummy15 |  -.1965801   .0250201    -7.86   0.000    -.2456514   -.1475088
blockholderdummy16 |   .0247173   .0474506     0.52   0.602    -.0683463    .1177808
blockholderdummy17 |  -.0558941   .0264637    -2.11   0.035    -.1077966   -.0039917
blockholderdummy18 |   .0268469   .0253643     1.06   0.290    -.0228993    .0765931
blockholderdummy19 |   .0147014   .0204066     0.72   0.471    -.0253215    .0547243
blockholderdummy20 |    .001004    .022434     0.04   0.964    -.0429951    .0450031
                   |
              year |
             2004  |   .0403535   .0076667     5.26   0.000      .025317      .05539
             2005  |   .0582069   .0079728     7.30   0.000     .0425701    .0738438
             2006  |   .0813596   .0089158     9.13   0.000     .0638733    .0988459
             2007  |   .0695244   .0090678     7.67   0.000       .05174    .0873087
             2008  |   .0546497   .0093084     5.87   0.000     .0363934    .0729061
             2009  |  -.0123334   .0076495    -1.61   0.107    -.0273361    .0026694
             2010  |   .0194917    .008319     2.34   0.019     .0031759    .0358075
             2011  |   .0529051   .0086588     6.11   0.000     .0359227    .0698874
             2012  |   .0859319   .0105299     8.16   0.000     .0652799    .1065838
             2013  |    .051021   .0087367     5.84   0.000      .033886    .0681561
             2014  |   .0322704   .0090312     3.57   0.000     .0145577    .0499831
             2015  |    .020803   .0088304     2.36   0.019     .0034842    .0381217
             2016  |   .0172798   .0090514     1.91   0.056    -.0004725    .0350321
             2017  |   .0213456   .0096694     2.21   0.027     .0023813    .0403099
             2018  |   .0053292   .0098276     0.54   0.588    -.0139455    .0246038
             2019  |   .0421334   .0110883     3.80   0.000     .0203862    .0638807
                   |
             _cons |   .4057398   .0413197     9.82   0.000     .3247005     .486779
-------------------+----------------------------------------------------------------
           sigma_u |  .15785327
           sigma_e |  .19158825
               rho |  .40435173   (fraction of variance due to u_i)
------------------------------------------------------------------------------------

.

Code:

* Regression including dummy4

xtreg investment lag_lnassets lag_cashflow lag_tobinsQ dummy1-dummy4 blockholderdummy1-blockholderdummy20 i.year, fe i(firmID) vce(cluster firmID)


Fixed-effects (within) regression               Number of obs     =     20,982
Group variable: firmID                          Number of groups  =      1,815

R-sq:                                           Obs per group:
     within  = 0.1217                                         min =          2
     between = 0.2919                                         avg =       11.6
     overall = 0.1915                                         max =         17

                                                F(42,1814)        =          .
corr(u_i, Xb)  = -0.0316                        Prob > F          =          .

                                   (Std. Err. adjusted for 1,815 clusters in firmID)
------------------------------------------------------------------------------------
                   |               Robust
        investment |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
      lag_lnassets |  -.0380248   .0058397    -6.51   0.000    -.0494781   -.0265715
      lag_cashflow |    .004723   .0007943     5.95   0.000      .003165    .0062809
       lag_tobinsQ |   .0553851   .0035792    15.47   0.000     .0483654    .0624049
            dummy1 |  -.1578213   .0798591    -1.98   0.048    -.3144468   -.0011958
            dummy2 |   .0950916   .0802503     1.18   0.236     -.062301    .2524843
            dummy3 |   .0032694   .0588286     0.06   0.956    -.1121095    .1186482
            dummy4 |   .0292371    .049099     0.60   0.552    -.0670595    .1255337
 blockholderdummy1 |   .7003167   .2766718     2.53   0.011      .157688    1.242945
 blockholderdummy2 |  -.0928709   .0227222    -4.09   0.000    -.1374354   -.0483064
 blockholderdummy3 |   .0144192   .0404202     0.36   0.721    -.0648558    .0936941
 blockholderdummy4 |   .2216712   .0632568     3.50   0.000     .0976075    .3457349
 blockholderdummy5 |  -.0177592   .0098165    -1.81   0.071     -.037012    .0014937
 blockholderdummy6 |  -.0029711   .0210047    -0.14   0.888     -.044167    .0382248
 blockholderdummy7 |  -.0055734    .012875    -0.43   0.665    -.0308248    .0196781
 blockholderdummy8 |  -.0384254   .0139248    -2.76   0.006    -.0657358    -.011115
 blockholderdummy9 |   .0112781   .0413858     0.27   0.785    -.0698908     .092447
blockholderdummy10 |  -.0879229   .0087564   -10.04   0.000    -.1050966   -.0707492
blockholderdummy11 |   .0241189   .0199433     1.21   0.227    -.0149954    .0632332
blockholderdummy12 |  -.0227719   .0209411    -1.09   0.277    -.0638431    .0182994
blockholderdummy13 |  -.0120213   .0099673    -1.21   0.228      -.03157    .0075274
blockholderdummy14 |  -.0094408   .0191006    -0.49   0.621    -.0469023    .0280206
blockholderdummy15 |  -.1965847   .0250254    -7.86   0.000    -.2456664   -.1475031
blockholderdummy16 |   .0247063   .0474507     0.52   0.603    -.0683574      .11777
blockholderdummy17 |  -.0559065   .0264645    -2.11   0.035    -.1078107   -.0040023
blockholderdummy18 |   .0268458   .0253633     1.06   0.290    -.0228986    .0765903
blockholderdummy19 |   .0147091   .0204085     0.72   0.471    -.0253176    .0547358
blockholderdummy20 |   .0010197   .0224344     0.05   0.964    -.0429803    .0450196
                   |
              year |
             2004  |   .0403576   .0076668     5.26   0.000     .0253209    .0553943
             2005  |   .0582163   .0079727     7.30   0.000     .0425797     .073853
             2006  |   .0813508   .0089161     9.12   0.000     .0638639    .0988377
             2007  |   .0695163   .0090685     7.67   0.000     .0517306     .087302
             2008  |   .0546149   .0093118     5.87   0.000     .0363519    .0728779
             2009  |  -.0123472   .0076501    -1.61   0.107     -.027351    .0026567
             2010  |   .0194791   .0083196     2.34   0.019     .0031621    .0357961
             2011  |   .0529188   .0086594     6.11   0.000     .0359354    .0699022
             2012  |   .0859478   .0105306     8.16   0.000     .0652944    .1066012
             2013  |   .0510391   .0087375     5.84   0.000     .0339026    .0681756
             2014  |   .0322912   .0090324     3.58   0.000     .0145761    .0500062
             2015  |   .0208262   .0088318     2.36   0.018     .0035046    .0381479
             2016  |   .0173043   .0090536     1.91   0.056    -.0004522    .0350609
             2017  |   .0213723   .0096724     2.21   0.027     .0024021    .0403425
             2018  |   .0053589    .009831     0.55   0.586    -.0139223    .0246402
             2019  |   .0421659   .0110911     3.80   0.000     .0204133    .0639185
                   |
             _cons |   .4058875   .0413389     9.82   0.000     .3248107    .4869643
-------------------+----------------------------------------------------------------
           sigma_u |   .1578506
           sigma_e |   .1915927
               rho |   .4043324   (fraction of variance due to u_i)
------------------------------------------------------------------------------------

.

After adding dummy4 (which is not a singleton), the F-statistic goes missing. I have more clusters than variables, so that should not be the issue:

Code:

(Std. Err. adjusted for 1,815 clusters in firmID)

I can still perform the F-test for the variables of interest. However, including dummy4 in the test inflates the F-statistic:

Code:

qui xtreg investment lag_lnassets lag_cashflow lag_tobinsQ dummy1-dummy4 blockholderdummy1-blockholderdummy50 i.year, fe i(firmID) vce(cluster firmID)
testparm dummy1-dummy3 blockholderdummy1-blockholderdummy50


       F( 53,  1814) =   16.15
            Prob > F =    0.0000

testparm dummy1-dummy4 blockholderdummy1-blockholderdummy50


       F( 54,  1814) =   29.73
            Prob > F =    0.0000

2) Underestimation of the standard errors when clustering.
This is clearer when I use my real data with 693 indicator variables.

Code:

* Clustering the standard errors

qui xtreg investment lag_lnassets lag_cashflow lag_tobinsQ blockholderdummy1-blockholderdummy693 i.year, fe i(firmID) vce(cluster firmID)
testparm blockholderdummy1-blockholderdummy693

       F(693,  1814) =  349.45
            Prob > F =    0.0000

Code:

* Without clustering the standard errors

qui xtreg investment lag_lnassets lag_cashflow lag_tobinsQ blockholderdummy1-blockholderdummy693 i.year, fe i(firmID)

testparm blockholderdummy1-blockholderdummy693

       F(693, 18455) =    2.38
            Prob > F =    0.0000
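
A sketch of how the two sets of standard errors and joint tests could be put side by side. It assumes the nlswork example dataset as a stand-in, with the year indicators playing the role of the blockholder dummies; the stored-estimate names are arbitrary:

```stata
* Sketch only: nlswork stands in for the firm panel
webuse nlswork, clear
xtset idcode year

* Conventional (non-clustered) standard errors
quietly xtreg ln_wage age tenure ttl_exp i.year, fe
estimates store conventional
testparm i.year

* Standard errors clustered on the panel identifier
quietly xtreg ln_wage age tenure ttl_exp i.year, fe vce(cluster idcode)
estimates store clustered
testparm i.year

* Coefficients with standard errors underneath, one column per model
estimates table conventional clustered, keep(age tenure ttl_exp) se
```

The coefficients are identical across the two columns; only the standard errors, and hence the joint tests, differ.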

In contrast to xtreg, reghdfe does not report a missing F-statistic.
However, the F-statistic for testing blockholderdummy1-blockholderdummy693 is exactly the same.

In my dataset, clustering the standard error seems to do more harm than good.
My concern is having a type 1 error by not accounting for heteroscedasticity and serial correlation.
Is there anything I can do to mitigate this concern?

Hopefully, my example is now more helpful.


Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#4

19 Nov 2020, 10:37

The difference in the F statistic doesn't really bother or surprise me. It's a different F statistic, especially in the denominator, so I have no expectation that the results are going to be similar.

Your example is not convincing. In the example data you have 4 distinct firmIDs (clusters). So when you regress on dummy1-dummy4, your number of predictor variables equals the number of clusters, so no F statistic. When you use only 3 predictors, you have more clusters than predictors, so you get your F-statistic.

But the regression outputs you show are a different matter. Clearly you do not have more variables than clusters in those outputs. And the fact that it doesn't happen with -reghdfe- makes me suspicious that there is some bug in -xtreg- that you have somehow stumbled on. Of course, I can't verify that you don't actually have a singleton cluster somehow created when you add in dummy4 due to a missing value in dummy4--but that's simple enough for you to check that I trust you have done that properly. So I'm stumped.

If nobody else comes up with an explanation in a day or two, I recommend you take this up with Stata Tech Support.
Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#5

19 Nov 2020, 18:20

While I said in #4 that I trust you have properly verified that you really don't have any singleton clusters, let me suggest that you check that one more time using the simplest, and most fail-safe method:
[code]
* Run the regression with the variables that gives you the missing F-statistic
keep if e(sample)
by firmID, sort: assert _N > 1
[/code]
If you truly have no singleton clusters in the estimation sample, Stata will produce no output in response to the -assert- command and will await your next instruction. If you do have one or more singleton clusters, Stata will tell you just how many there are, and then you can go hunt them down.
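
For anyone who would rather not drop observations from the data in memory, a non-destructive variant of the same check might look like the sketch below. It assumes the nlswork example dataset as a stand-in (idcode plays the role of firmID), and the helper variable names are hypothetical:

```stata
* Sketch only: nlswork stands in for the firm panel
webuse nlswork, clear
xtset idcode year
quietly xtreg ln_wage age tenure ttl_exp i.year, fe vce(cluster idcode)

* Mark the estimation sample and count usable observations per cluster
generate byte insample = e(sample)
bysort idcode: egen n_in_sample = total(insample)

* Tag one observation per cluster, then count the clusters that
* contribute exactly one observation to the estimation sample
egen byte one_per_cluster = tag(idcode)
count if one_per_cluster == 1 & n_in_sample == 1
```

A count of zero corresponds to the -assert- passing silently.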

Corne Slob

Join Date: Nov 2020
Posts: 3

20 Nov 2020, 03:39

Here are the results:

Code:

xtreg investment lag_lnassets lag_cashflow lag_tobinsQ blockholderdummy1-blockholderdummy693 i.year, fe i(firmID) vce(cluster firmID)

Fixed-effects (within) regression               Number of obs     =     20,982
Group variable: firmID                          Number of groups  =      1,815

R-sq:                                           Obs per group:
     within  = 0.1887                                         min =          2
     between = 0.2699                                         avg =       11.6
     overall = 0.2195                                         max =         17

                                                F(711,1814)       =          .
corr(u_i, Xb)  = -0.1002                        Prob > F          =          .

Code:

keep if e(sample)
(0 observations deleted)

Code:

. by firmID, sort: assert _N > 1

No output produced in response to the assert command

When I add a singleton dummy on purpose, I get the same results:

Code:

gen singleton=0
replace singleton=1 in 1
xtreg investment singleton lag_lnassets lag_cashflow lag_tobinsQ blockholderdummy1-blockholderdummy693 i.year, fe i(firmID) vce(cluster firmID)

 keep if e(sample)
(0 observations deleted)

by firmID, sort: assert _N > 1

I used the following to check for singletons:

Code:

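* For each blockholder dummy, count the firm-years per firm in which it equals one
foreach var of varlist blockholderdummy* {
    bysort firmID: egen total`var' = total(`var')
}

* Drop observations from firms where a dummy equals one in exactly one firm-year
foreach var of varlist totalblockholderdummy* {
    drop if `var' == 1
}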

No observations were dropped.

Last edited by Corne Slob; 20 Nov 2020, 04:10.


Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#7

20 Nov 2020, 17:23

OK. I'm stumped. I think you should pass this along to Stata Tech Support. And when you hear back from them, please post back with the update.