
  • Missing F-statistic and underestimated standard errors using xtreg dummy-variables i.year, fe vce(cluster)

    Dear all,

    Stata does not report the F-statistic when I cluster the standard errors at the firm level. I am aware that the missing F-statistic has been discussed in previous threads, but those do not provide the answers I am looking for. For my master's thesis, I am trying to estimate the following model:
    xtreg Y $control_variables $blockholderFE i.year, fe i(firmID) vce(cluster firmID)

    Here Y is an accounting variable (e.g., investment, leverage), and $blockholderFE are 1185 dummy variables, each equal to one when a given large shareholder is present in firm j in year t. I have an unbalanced panel dataset with 1900 firms and 21,000 firm-year observations over the years 2003-2019. I restrict large shareholders to those present in two or more firms. For those who are interested, the model is based on “Large shareholders and Corporate policies” by Cronqvist and Fahlenbrach (2009).

    I am not interested in the F-statistic of the complete model, but I need to test whether the 1185 dummy variables are jointly statistically different from zero. The command test $blockholderFE does report an F-statistic, but it appears inflated (F-statistic > 400). In contrast, the F-statistic without vce(cluster) has a value of less than 3. I have compared the conventional standard errors with the cluster-robust standard errors and find that including vce(cluster) reduces the standard errors by 80% on average. I feel that the clustered standard errors are unreliable. Is that correct? Can I use the following as a reason for not clustering the standard errors at the firm level?
    According to Austin Nichols and Mark Schaffer, “In a fixed-effect model, where there are a large number of parameters, this often means that test of overall model significance is feasible. However, testing fewer than M linear constraints is perfectly feasible in these models, though when fixed effects and clustering are specified at the same level, tests that involve the fixed effects themselves are inadvisable (the standard errors on fixed effects are likely to be substantially underestimated, though this will not affect the other variance estimates in general)” https://www.stata.com/meeting/13uk/nichols_crse.pdf

    According to help j_robustsingular, the F-statistic is reported as missing when (1) there are more predictors than clusters, or (2) singleton dummies are present.
    I don’t think that either applies in my case because:
    1) I have more clusters (1900), than parameters (<1200)
    2) I restrict large shareholders to be present in more than two firms. Hence, each dummy variable is non-zero in more than 1 observation, and the dummy variable is non-zero in multiple clusters.

    I found that the F-statistic goes missing when different dummy variables are present in the same firms. (using the xtreg command)
    Example:
    FirmID Year Dummy1 Dummy2 Dummy3 Dummy4
    1 t 1 1 0 1
    1 t 1 1 1 1
    1 t 1 0 1 0
    1 t 0 0 0 0
    1 t 0 0 0 0
    2 t 1 1 0 0
    2 t 1 1 0 0
    2 t 0 0 1 0
    2 t 0 0 1 0
    2 t 0 0 0 0
    3 t 0 1 0 0
    3 t 0 1 1 0
    3 t 0 0 1 0
    3 t 0 0 0 0
    3 t 0 0 0 0
    4 t 0 0 1 0
    4 t 0 0 1 1
    4 t 0 0 1 1
    4 t 0 0 0 1
    4 t 0 0 0 0
    When I run xtreg with the dependent variable, a set of control variables, dummy1-dummy3, firm and year fixed effects, and standard errors clustered at the firm level, Stata does report the F-statistic. However, when I add the fourth dummy, the F-statistic goes missing. For simplicity, I have shown 4 variables, but the same holds with more than 50 dummy variables. Can someone explain why this is happening?

    I want to include vce(cluster) because xttest3 shows that there is heteroscedasticity. Also, xtserial suggests that there is serial correlation in a model without firm and year fixed effects. For my analyses, I currently work with a model that does not cluster the standard errors, because clustering seems to underestimate them. In a model with a set of control variables, 1185 dummy variables, and firm and year fixed effects, I find that the blockholder fixed effects (1185 dummy variables) are jointly statistically different from zero (p-value < 0.001). I would like to rule out that not accounting for serial correlation and heteroscedasticity results in a type 1 error (concluding that the blockholder fixed effects are jointly significant when they actually are not).

    Basically, I have the following questions:
    (1) Clustering the standard error seems to result in underestimated standard errors. Is it therefore better not to cluster the standard errors? What would be the reasoning?
    (2) Why does Stata not report the F-statistic when dummy4 is included in the example above?
    (3) Are my concerns for having a type 1 error, by not accounting for heteroscedasticity and serial correlation, well-founded? If so, is there a way to mitigate this issue?

    Hopefully, someone can help me.

    Thanks in advance!
    Corné
    Last edited by Corne Slob; 18 Nov 2020, 17:59.

  • #2
    Well, for the example you show with four indicator ("dummy") variables, it is clearly an instance of the number of degrees of freedom exceeding the number of clusters. (The number of observations is not relevant here--it is the number of clusters that limits the number of variables.)
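
    To make the cluster-count limit concrete, here is a sketch of the standard cluster-robust ("sandwich") variance formula, ignoring the finite-sample scaling factor. The notation is assumed for illustration and does not come from this thread: X is the matrix of (within-transformed) regressors with k columns, G is the number of clusters, X_g and \hat{u}_g are the rows of X and the residuals belonging to cluster g, and s_g is that cluster's score vector.

```latex
\hat{V} = (X'X)^{-1}\Big(\sum_{g=1}^{G} s_g s_g'\Big)(X'X)^{-1},
\qquad s_g = X_g'\hat{u}_g

\sum_{g=1}^{G} s_g = X'\hat{u} = 0
\;\Longrightarrow\;
\operatorname{rank}(\hat{V}) \le \min(k,\; G-1)
```

    The middle term is a sum of G rank-one matrices whose underlying vectors add up to zero, so its rank cannot exceed G-1. A joint F test of q coefficients needs the corresponding q-by-q block of the variance matrix to be invertible, so q can be at most G-1, regardless of how many observations each cluster contains.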

    In your real data it is less clear what is going on. My best guess is that in the estimation sample you do not have as many clusters as you think you do. Remember that the estimation sample will only include observations that have non-missing values on every variable mentioned in the regression command. Since you have some "control" variables in the model, it is possible that missing data in those variables has led to more clusters being omitted from the estimation than you are aware of. It's simple enough to check: just above the coefficient table in the -xtreg- output there will be a note (Std. Err. adjusted for # clusters in firmid). The number in that message is the actual count of clusters that survived the creation of the estimation sample.

    If it's not that, then it may be that due to missing data on the "control" variables, some of your clusters may have been reduced to singletons.
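
    A minimal sketch of how to read these counts from the stored results instead of the output header. It assumes the command just run was -xtreg ..., fe vce(cluster firmID)- and that the cluster variable is called firmID as in the original post; the helper variable one_per_firm is hypothetical.

```stata
* Run immediately after the -xtreg, fe vce(cluster firmID)- command.
display "observations in estimation sample: " e(N)
display "groups (firms):                    " e(N_g)
display "clusters:                          " e(N_clust)
display "model df shown in F(.,.):          " e(df_m)
display "F statistic (. if not computed):   " e(F)

* Cross-check the cluster count directly from the estimation sample.
* one_per_firm is a hypothetical helper: 1 for one observation per firm.
egen byte one_per_firm = tag(firmID) if e(sample)
count if one_per_firm == 1
drop one_per_firm
```

    If the count from the last step is lower than the number of firms you expected, missing values in the regressors have removed whole firms from the estimation sample.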

    In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.





    • #3
      Dear Clyde,

      Thanks for your fast response.

      You are right, from my example, it is not clear that the number of clusters is higher than the number of variables.
      Hereafter, I show the example using -dataex- and a small sample of the real data (with 693 indicator variables and where firmID is up to 1815), as suggested.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float firmID double year float(investment lag_lnassets lag_cashflow dummy1 dummy2 dummy3 dummy4) byte(blockholderdummy1 blockholderdummy2 blockholderdummy3 blockholderdummy4 blockholderdummy5 blockholderdummy6 blockholderdummy7 blockholderdummy8)
      1 2003 .06168109  6.531783  .14389177 1 1 0 0 0 0 0 0 0 0 0 0
      1 2004 .07844731  6.564267  .18100154 1 1 0 0 0 0 0 0 0 0 0 0
      1 2005 .11712198  6.596095   .2800219 0 0 0 0 0 0 0 0 0 0 0 0
      1 2006 .14008342  6.886347   .4627454 0 0 1 0 0 0 0 0 0 0 0 0
      1 2007 .11659434  6.973199   .4294967 0 0 1 0 0 0 0 0 0 0 0 0
      1 2008 .08871011  7.216717   .4447028 0 0 1 0 0 0 0 0 0 0 0 0
      1 2009 .11749449  7.228034  .39031485 0 0 0 0 0 0 0 0 0 0 0 0
      1 2010  .3734085  7.313915   .3402393 0 0 0 0 0 0 0 0 0 0 0 0
      1 2011 .21834816  7.440574   .3960022 0 0 0 0 0 0 0 0 0 0 0 0
      1 2012 .08245342  7.694235  .35440105 0 0 0 0 0 0 0 0 0 0 0 0
      1 2013 .06214822  7.667111   .3587601 0 0 0 0 0 0 0 0 0 0 0 0
      1 2014 .11202516  7.695985   .4369137 0 0 0 0 0 0 0 0 0 0 0 0
      1 2015   .299661  7.323171    .091459 0 0 0 0 0 0 0 0 0 0 0 0
      1 2016 .10704046  7.273856   .3772881 0 0 0 0 0 0 0 0 0 0 0 0
      1 2017 .06085754   7.31595  .38611025 0 0 0 0 0 0 0 0 0 0 0 0
      1 2018 .05495894  7.329553    .315906 0 0 0 0 0 0 0 0 0 0 0 0
      2 2003 .21391813 10.096547  11.384886 1 1 0 0 0 0 0 0 0 0 0 0
      2 2004  .2056149 10.192993   .6909986 1 1 0 0 0 0 0 0 0 0 0 0
      2 2005 .20098507    10.267    .710709 1 1 1 0 0 0 0 0 0 0 0 0
      2 2006   .222853 10.279908   .7874672 0 1 1 0 0 0 0 0 0 0 0 0
      2 2007 .23842546  10.49621   .5456318 0 0 1 0 0 0 0 0 0 0 0 0
      2 2008 .17128205 10.589458   .7861875 0 0 0 1 0 0 0 0 0 0 0 0
      2 2009  .1508551 10.655356   .8742903 0 0 0 1 0 0 0 0 0 0 0 0
      2 2010 .13322088  10.86698  1.0853536 0 0 0 1 0 0 0 0 0 0 0 0
      2 2011 .18711683 10.993097     .95157 0 0 0 0 0 0 0 0 0 0 0 0
      2 2012 .22800346 11.006704   .9750829 0 0 0 0 0 0 0 0 0 0 0 0
      2 2013 .14200588  11.11595  1.1107666 0 0 0 0 0 0 0 0 0 0 0 0
      2 2014  .1823878 10.667862  .50874066 0 0 0 0 0 0 0 0 0 0 0 0
      2 2015  .1870261 10.628013  .55359864 0 0 0 0 0 0 0 0 0 0 0 0
      2 2016   .195637 10.627334   .6871104 0 0 0 0 0 0 0 0 0 0 0 0
      2 2017  .1989483 10.871725   .4216405 0 0 0 0 0 0 0 0 0 0 0 0
      2 2018 .18325227 11.241773  .59141105 0 0 0 0 0 0 0 0 0 0 0 0
      2 2019  .2165807 11.115026   .7377415 0 0 0 0 0 0 0 0 0 0 0 0
      3 2003  .2425987 4.7510266 .000732724 0 0 0 0 0 0 0 0 0 0 0 0
      3 2004 .31032965  4.816395   4.319079 0 0 0 0 0 0 0 0 0 0 0 0
      3 2005    1.3474  5.008613   6.235604 0 1 1 0 0 0 0 0 0 0 0 0
      3 2006 .09940465  5.004134  4.4898267 0 1 1 0 0 0 0 0 0 0 0 0
      3 2007 .14642262  5.115548  1.9141258 0 1 1 0 0 0 0 0 0 0 0 0
      3 2008   .271675  5.238981   2.496464 0 0 1 0 0 0 0 0 0 0 0 0
      3 2009 .12932435  5.403771   3.597594 0 0 0 0 0 0 0 0 0 0 0 0
      3 2010   .931984  5.325271   2.436731 0 0 0 0 0 0 0 0 0 0 0 0
      3 2011  .7847533  5.446095  2.2068722 0 0 0 0 0 0 0 0 0 0 0 0
      3 2012 .09078132  5.741929  2.0931578 0 0 0 0 0 0 0 0 0 0 0 0
      3 2013 .08731312   5.70138  1.9779247 0 0 0 0 0 0 0 0 0 0 0 0
      3 2014 .10035057  5.778983  2.5008116 0 0 0 0 0 0 0 0 0 0 0 0
      3 2015 .05331375  6.148434  3.2508326 0 0 0 0 0 0 0 0 0 0 0 0
      3 2016 .10788064  6.193944   3.917048 0 0 0 0 0 0 0 0 0 0 0 0
      3 2017 .18847074  6.293009   4.539403 0 0 0 0 0 0 0 0 0 0 0 0
      3 2018 .55341387  6.945229  3.4976106 0 0 0 0 0 0 0 0 0 0 0 0
      4 2003  .1979708  8.633942 -17.070185 0 0 0 0 0 0 0 0 0 0 0 0
      4 2004  .3742081  8.867053  .25033697 0 0 1 0 0 0 0 0 0 0 0 0
      4 2005  .3574024  8.967531   .3417983 0 0 1 0 0 0 0 0 1 0 0 0
      4 2006  .6875231  8.893954   .3270879 0 0 1 1 0 0 0 0 1 0 0 0
      4 2007  .4226235  9.483949   .2484265 0 0 0 1 0 0 0 0 1 0 0 0
      4 2008 .13220339  9.354441  -.5201906 0 0 0 1 0 0 0 0 0 0 0 0
      end


      I dropped all the missing values and included only indicator variables for large shareholders that are present in two or more clusters and are non-zero in more than two observations in each cluster. Hence, I have no singletons.
      693 indicator variables and 1830 firms remain in the dataset. I still have the following issues:
      1) F-statistic goes missing when estimating:
      Code:
      xtreg investment lag_lnassets lag_cashflow lag_tobinsQ  blockholderdummy1-blockholderdummy693 i.year, fe i(firmID) vce(cluster firmID)
      
      
      Fixed-effects (within) regression               Number of obs     =     20,982
      Group variable: firmID                          Number of groups  =      1,815
      
      R-sq:                                           Obs per group:
           within  = 0.1887                                         min =          2
           between = 0.2699                                         avg =       11.6
           overall = 0.2195                                         max =         17
      
                                                      F(711,1814)       =          .
      corr(u_i, Xb)  = -0.1002                        Prob > F          =          .
      
                                          (Std. Err. adjusted for 1,815 clusters in firmID)

      To show when the F-statistic goes missing, I use the four dummy variables from the example before. Using these dummy variables, it turns out that the F-statistic goes missing when dummy 4 is added.
      I don't know why, but by adding an indicator variable that is present in the same clusters as other indicator variables, the F-statistic goes missing.

      Code:
      * Regression without dummy4
      
      xtreg investment lag_lnassets lag_cashflow lag_tobinsQ dummy1-dummy3 blockholderdummy1-blockholderdummy20 i.year, fe i(firmID) vce(cluster firmID)
      
      
      Fixed-effects (within) regression               Number of obs     =     20,982
      Group variable: firmID                          Number of groups  =      1,815
      
      R-sq:                                           Obs per group:
           within  = 0.1216                                         min =          2
           between = 0.2919                                         avg =       11.6
           overall = 0.1915                                         max =         17
      
                                                      F(42,1814)        =      31.44
      corr(u_i, Xb)  = -0.0315                        Prob > F          =     0.0000
      
                                         (Std. Err. adjusted for 1,815 clusters in firmID)
      ------------------------------------------------------------------------------------
                         |               Robust
              investment |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------------+----------------------------------------------------------------
            lag_lnassets |  -.0380031   .0058367    -6.51   0.000    -.0494504   -.0265558
            lag_cashflow |   .0047232   .0007943     5.95   0.000     .0031653    .0062811
             lag_tobinsQ |    .055387   .0035791    15.48   0.000     .0483674    .0624066
                  dummy1 |  -.1594272   .0801823    -1.99   0.047    -.3166864   -.0021679
                  dummy2 |   .0929349   .0819996     1.13   0.257    -.0678887    .2537584
                  dummy3 |   .0037441   .0608523     0.06   0.951    -.1156039    .1230921
       blockholderdummy1 |   .7003422   .2766797     2.53   0.011      .157698    1.242986
       blockholderdummy2 |  -.0928466   .0227137    -4.09   0.000    -.1373943   -.0482989
       blockholderdummy3 |    .014436   .0404178     0.36   0.721    -.0648344    .0937063
       blockholderdummy4 |   .2216758   .0632535     3.50   0.000     .0976185    .3457331
       blockholderdummy5 |  -.0176342   .0098311    -1.79   0.073    -.0369157    .0016473
       blockholderdummy6 |   -.002985   .0210026    -0.14   0.887    -.0441768    .0382067
       blockholderdummy7 |  -.0055695   .0128751    -0.43   0.665     -.030821     .019682
       blockholderdummy8 |  -.0384459    .013929    -2.76   0.006    -.0657644   -.0111273
       blockholderdummy9 |   .0112613   .0413898     0.27   0.786    -.0699154     .092438
      blockholderdummy10 |  -.0879261   .0087488   -10.05   0.000    -.1050849   -.0707672
      blockholderdummy11 |   .0241237   .0199463     1.21   0.227    -.0149964    .0632437
      blockholderdummy12 |  -.0227759   .0209407    -1.09   0.277    -.0638463    .0182946
      blockholderdummy13 |  -.0120346   .0099668    -1.21   0.227    -.0315822     .007513
      blockholderdummy14 |  -.0094283   .0191017    -0.49   0.622    -.0468919    .0280352
      blockholderdummy15 |  -.1965801   .0250201    -7.86   0.000    -.2456514   -.1475088
      blockholderdummy16 |   .0247173   .0474506     0.52   0.602    -.0683463    .1177808
      blockholderdummy17 |  -.0558941   .0264637    -2.11   0.035    -.1077966   -.0039917
      blockholderdummy18 |   .0268469   .0253643     1.06   0.290    -.0228993    .0765931
      blockholderdummy19 |   .0147014   .0204066     0.72   0.471    -.0253215    .0547243
      blockholderdummy20 |    .001004    .022434     0.04   0.964    -.0429951    .0450031
                         |
                    year |
                   2004  |   .0403535   .0076667     5.26   0.000      .025317      .05539
                   2005  |   .0582069   .0079728     7.30   0.000     .0425701    .0738438
                   2006  |   .0813596   .0089158     9.13   0.000     .0638733    .0988459
                   2007  |   .0695244   .0090678     7.67   0.000       .05174    .0873087
                   2008  |   .0546497   .0093084     5.87   0.000     .0363934    .0729061
                   2009  |  -.0123334   .0076495    -1.61   0.107    -.0273361    .0026694
                   2010  |   .0194917    .008319     2.34   0.019     .0031759    .0358075
                   2011  |   .0529051   .0086588     6.11   0.000     .0359227    .0698874
                   2012  |   .0859319   .0105299     8.16   0.000     .0652799    .1065838
                   2013  |    .051021   .0087367     5.84   0.000      .033886    .0681561
                   2014  |   .0322704   .0090312     3.57   0.000     .0145577    .0499831
                   2015  |    .020803   .0088304     2.36   0.019     .0034842    .0381217
                   2016  |   .0172798   .0090514     1.91   0.056    -.0004725    .0350321
                   2017  |   .0213456   .0096694     2.21   0.027     .0023813    .0403099
                   2018  |   .0053292   .0098276     0.54   0.588    -.0139455    .0246038
                   2019  |   .0421334   .0110883     3.80   0.000     .0203862    .0638807
                         |
                   _cons |   .4057398   .0413197     9.82   0.000     .3247005     .486779
      -------------------+----------------------------------------------------------------
                 sigma_u |  .15785327
                 sigma_e |  .19158825
                     rho |  .40435173   (fraction of variance due to u_i)
      ------------------------------------------------------------------------------------
      
      .

      Code:
      * Including dummy4
      
      xtreg investment lag_lnassets lag_cashflow lag_tobinsQ dummy1-dummy4 blockholderdummy1-blockholderdummy20 i.year, fe i(firmID) vce(cluster firmID)
      
      
      Fixed-effects (within) regression               Number of obs     =     20,982
      Group variable: firmID                          Number of groups  =      1,815
      
      R-sq:                                           Obs per group:
           within  = 0.1217                                         min =          2
           between = 0.2919                                         avg =       11.6
           overall = 0.1915                                         max =         17
      
                                                      F(42,1814)        =          .
      corr(u_i, Xb)  = -0.0316                        Prob > F          =          .
      
                                         (Std. Err. adjusted for 1,815 clusters in firmID)
      ------------------------------------------------------------------------------------
                         |               Robust
              investment |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------------+----------------------------------------------------------------
            lag_lnassets |  -.0380248   .0058397    -6.51   0.000    -.0494781   -.0265715
            lag_cashflow |    .004723   .0007943     5.95   0.000      .003165    .0062809
             lag_tobinsQ |   .0553851   .0035792    15.47   0.000     .0483654    .0624049
                  dummy1 |  -.1578213   .0798591    -1.98   0.048    -.3144468   -.0011958
                  dummy2 |   .0950916   .0802503     1.18   0.236     -.062301    .2524843
                  dummy3 |   .0032694   .0588286     0.06   0.956    -.1121095    .1186482
                  dummy4 |   .0292371    .049099     0.60   0.552    -.0670595    .1255337
       blockholderdummy1 |   .7003167   .2766718     2.53   0.011      .157688    1.242945
       blockholderdummy2 |  -.0928709   .0227222    -4.09   0.000    -.1374354   -.0483064
       blockholderdummy3 |   .0144192   .0404202     0.36   0.721    -.0648558    .0936941
       blockholderdummy4 |   .2216712   .0632568     3.50   0.000     .0976075    .3457349
       blockholderdummy5 |  -.0177592   .0098165    -1.81   0.071     -.037012    .0014937
       blockholderdummy6 |  -.0029711   .0210047    -0.14   0.888     -.044167    .0382248
       blockholderdummy7 |  -.0055734    .012875    -0.43   0.665    -.0308248    .0196781
       blockholderdummy8 |  -.0384254   .0139248    -2.76   0.006    -.0657358    -.011115
       blockholderdummy9 |   .0112781   .0413858     0.27   0.785    -.0698908     .092447
      blockholderdummy10 |  -.0879229   .0087564   -10.04   0.000    -.1050966   -.0707492
      blockholderdummy11 |   .0241189   .0199433     1.21   0.227    -.0149954    .0632332
      blockholderdummy12 |  -.0227719   .0209411    -1.09   0.277    -.0638431    .0182994
      blockholderdummy13 |  -.0120213   .0099673    -1.21   0.228      -.03157    .0075274
      blockholderdummy14 |  -.0094408   .0191006    -0.49   0.621    -.0469023    .0280206
      blockholderdummy15 |  -.1965847   .0250254    -7.86   0.000    -.2456664   -.1475031
      blockholderdummy16 |   .0247063   .0474507     0.52   0.603    -.0683574      .11777
      blockholderdummy17 |  -.0559065   .0264645    -2.11   0.035    -.1078107   -.0040023
      blockholderdummy18 |   .0268458   .0253633     1.06   0.290    -.0228986    .0765903
      blockholderdummy19 |   .0147091   .0204085     0.72   0.471    -.0253176    .0547358
      blockholderdummy20 |   .0010197   .0224344     0.05   0.964    -.0429803    .0450196
                         |
                    year |
                   2004  |   .0403576   .0076668     5.26   0.000     .0253209    .0553943
                   2005  |   .0582163   .0079727     7.30   0.000     .0425797     .073853
                   2006  |   .0813508   .0089161     9.12   0.000     .0638639    .0988377
                   2007  |   .0695163   .0090685     7.67   0.000     .0517306     .087302
                   2008  |   .0546149   .0093118     5.87   0.000     .0363519    .0728779
                   2009  |  -.0123472   .0076501    -1.61   0.107     -.027351    .0026567
                   2010  |   .0194791   .0083196     2.34   0.019     .0031621    .0357961
                   2011  |   .0529188   .0086594     6.11   0.000     .0359354    .0699022
                   2012  |   .0859478   .0105306     8.16   0.000     .0652944    .1066012
                   2013  |   .0510391   .0087375     5.84   0.000     .0339026    .0681756
                   2014  |   .0322912   .0090324     3.58   0.000     .0145761    .0500062
                   2015  |   .0208262   .0088318     2.36   0.018     .0035046    .0381479
                   2016  |   .0173043   .0090536     1.91   0.056    -.0004522    .0350609
                   2017  |   .0213723   .0096724     2.21   0.027     .0024021    .0403425
                   2018  |   .0053589    .009831     0.55   0.586    -.0139223    .0246402
                   2019  |   .0421659   .0110911     3.80   0.000     .0204133    .0639185
                         |
                   _cons |   .4058875   .0413389     9.82   0.000     .3248107    .4869643
      -------------------+----------------------------------------------------------------
                 sigma_u |   .1578506
                 sigma_e |   .1915927
                     rho |   .4043324   (fraction of variance due to u_i)
      ------------------------------------------------------------------------------------
      
      .
      After adding dummy4 (which is not a singleton), the F-statistic goes missing. I have more clusters than variables, so that should not be the issue:
      Code:
      (Std. Err. adjusted for 1,815 clusters in firmID)

      I can perform the F-test for the variables of interest, however, adding dummy 4 inflates the F-statistic:

      Code:
      qui xtreg investment lag_lnassets lag_cashflow lag_tobinsQ dummy1-dummy4 blockholderdummy1-blockholderdummy50 i.year, fe i(firmID) vce(cluster firmID)
      testparm dummy1-dummy3 blockholderdummy1-blockholderdummy50
      
      
             F( 53,  1814) =   16.15
                  Prob > F =    0.0000
      
      testparm dummy1-dummy4 blockholderdummy1-blockholderdummy50
      
      
             F( 54,  1814) =   29.73
                  Prob > F =    0.0000


      2) Underestimation of the standard errors when clustering.
      This is clearer when I use my real data with 693 indicator variables.

      Code:
      * Clustering the standard errors:
      
      qui xtreg investment lag_lnassets lag_cashflow lag_tobinsQ blockholderdummy1-blockholderdummy693 i.year, fe i(firmID) vce(cluster firmID)
      testparm blockholderdummy1-blockholderdummy693
      
             F(693,  1814) =  349.45
                  Prob > F =    0.0000


      Code:
      * Without clustering the standard errors:
      
      qui xtreg investment lag_lnassets lag_cashflow lag_tobinsQ blockholderdummy1-blockholderdummy693 i.year, fe i(firmID)
      
      testparm blockholderdummy1-blockholderdummy693
      
             F(693, 18455) =    2.38
                  Prob > F =    0.0000

      In contrast to the xtreg command, the F-statistic of the overall model does not go missing when using reghdfe.
      However, the F-statistic for testing blockholderdummy1-blockholderdummy693 is exactly the same under both commands.


      In my dataset, clustering the standard error seems to do more harm than good.
      My concern is having a type 1 error by not accounting for heteroscedasticity and serial correlation.
      Is there anything I can do to mitigate this concern?

      Hopefully, my example is now more helpful.



      • #4
        The difference in the F statistic doesn't really bother or surprise me. It's a different F statistic, especially in the denominator, so I have no expectation that the results are going to be similar.

        Your example is not convincing. In the example data you have 4 distinct firmIDs (clusters). So when you regress on dummy1-dummy4, your number of predictor variables equals the number of clusters, so no F statistic. When you use only 3 predictors, you have more clusters than predictors, so you get your F-statistic.

        But the regression outputs you show are a different matter. Clearly you do not have more variables than clusters in those outputs. And the fact that it doesn't happen with -reghdfe- makes me suspicious that there is some bug in -xtreg- that you have somehow stumbled on. Of course, I can't verify that you don't actually have a singleton cluster somehow created when you add in dummy4 due to a missing value in dummy4--but that's simple enough for you to check that I trust you have done that properly. So I'm stumped.

        If nobody else comes up with an explanation in a day or two, I recommend you take this up with Stata Tech Support.



        • #5
          While I said in #4 that I trust you have properly verified that you really don't have any singleton clusters, let me suggest that you check that one more time using the simplest, and most fail-safe method:
          Code:
          * Run the regression with the variables that give you the missing F-statistic
          keep if e(sample)
          by firmID, sort: assert _N > 1

          If you truly have no singleton clusters in the estimation sample, Stata will produce no output in response to the -assert- command and will await your next instruction. If you do have one or more singleton clusters, Stata will tell you just how many there are, and then you can go hunt them down.
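
          A related check, offered only as a sketch: with firm fixed effects, an indicator enters the estimation only through its variation over time within a firm, so an indicator that changes value within just one firm may behave like a singleton even though it is non-zero in several firms. Whether this explains the missing F-statistic here is an assumption, not something established in this thread. The variable names dummy1-dummy4 follow the earlier posts; tag_firm, lo and hi are hypothetical helpers.

```stata
* Run after -keep if e(sample)- so only the estimation sample is examined.
* tag_firm (hypothetical helper) marks exactly one observation per firm.
egen byte tag_firm = tag(firmID)

foreach var of varlist dummy1-dummy4 {
    tempvar lo hi
    quietly bysort firmID: egen `lo' = min(`var')
    quietly bysort firmID: egen `hi' = max(`var')
    * A firm has within-firm variation if the indicator is both 0 and 1 there.
    quietly count if tag_firm == 1 & `hi' > `lo'
    display "`var' varies over time within " r(N) " firm(s)"
    drop `lo' `hi'
}
drop tag_firm
```

          An indicator reported as varying within only one firm would be worth a closer look before concluding that no singleton-like regressor is present.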



          • #6
            Here are the results:

            Code:
            xtreg investment lag_lnassets lag_cashflow lag_tobinsQ blockholderdummy1-blockholderdummy693 i.year, fe i(firmID) vce(cluster firmID)
            
            Fixed-effects (within) regression               Number of obs     =     20,982
            Group variable: firmID                          Number of groups  =      1,815
            
            R-sq:                                           Obs per group:
                 within  = 0.1887                                         min =          2
                 between = 0.2699                                         avg =       11.6
                 overall = 0.2195                                         max =         17
            
                                                            F(711,1814)       =          .
            corr(u_i, Xb)  = -0.1002                        Prob > F          =          .

            Code:
            keep if e(sample)
            (0 observations deleted)
            Code:
            . by firmID, sort: assert _N > 1

            No output produced in response to the assert command


            When I add a singleton dummy on purpose, I get the same results:

            Code:
            gen singleton=0
            replace singleton=1 in 1
            xtreg investment singleton lag_lnassets lag_cashflow lag_tobinsQ blockholderdummy1-blockholderdummy693 i.year, fe i(firmID) vce(cluster firmID)
            
             keep if e(sample)
            (0 observations deleted)
            
            by firmID, sort: assert _N > 1

            I used the following code to check for singletons:
            Code:
            foreach var of varlist blockholderdummy*{
            bysort firmID:egen total`var'= total(`var')
            }
            
            foreach var of varlist totalblockholderdummy*{
            drop if `var'==1
            }
            No observations were dropped.
            Last edited by Corne Slob; 20 Nov 2020, 04:10.



            • #7
              OK. I'm stumped. I think you should pass this along to Stata Tech Support. And when you hear back from them, please post back with the update.

