Why F-test is missing? Could you please help!

Celine Tran

Join Date: Oct 2018
Posts: 46

13 Jan 2019, 19:07

Hi,

I ran some regressions, but the F-test is missing from the output. I read some previous posts on this issue, but I haven't found an answer for my case. Could you please help?

The first regression:

Code:

regress w2_uhat  PerFD After AfterFD $control i.year i.ind, robust

Code:

Linear regression                                      Number of obs =     487
                                                       F( 33,   452) =       .
                                                       Prob > F      =       .
                                                       R-squared     =  0.2130
                                                       Root MSE      =   .1115

------------------------------------------------------------------------------
             |               Robust
     w2_uhat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       PerFD |  -.0135181    .071665    -0.19   0.850    -.1543561    .1273198
       After |   .0024053   .0176921     0.14   0.892    -.0323637    .0371743
     AfterFD |  -.0413574      .1092    -0.38   0.705    -.2559602    .1732453
      PerInD |    .060242   .0280825     2.15   0.032     .0050536    .1154304
        Dual |  -.0165479   .0160462    -1.03   0.303    -.0480824    .0149866
       bsize |  -.0107742   .0222168    -0.48   0.628    -.0544352    .0328868
        Loss |  -.0273159   .0179308    -1.52   0.128     -.062554    .0079221
    CashOper |  -3.67e-10   8.98e-11    -4.09   0.000    -5.43e-10   -1.91e-10
    Firmsize |    .004892   .0054648     0.90   0.371    -.0058476    .0156316
        blev |   .0426323    .048696     0.88   0.382    -.0530664    .1383311
         ROA |   .2461967   .0673786     3.65   0.000     .1137825    .3786109
          mb |  -.0078885   .0062817    -1.26   0.210    -.0202334    .0044564
             |
        year |
       2001  |  -.0096172   .0599574    -0.16   0.873     -.127447    .1082126
       2002  |  -.0229657   .0646225    -0.36   0.722    -.1499635    .1040321
       2003  |   .0240859    .062535     0.39   0.700    -.0988095    .1469813
       2004  |  -.0127065   .0636559    -0.20   0.842    -.1378047    .1123917
       2005  |  -.0268376   .0602837    -0.45   0.656    -.1453086    .0916335
       2006  |  -.0327154   .0597816    -0.55   0.584    -.1501998    .0847691
       2007  |   -.070418   .0617424    -1.14   0.255    -.1917558    .0509197
       2008  |    -.02034   .0606025    -0.34   0.737    -.1394376    .0987576
       2009  |  -.0192564   .0608335    -0.32   0.752     -.138808    .1002952
       2010  |   -.013746   .0620105    -0.22   0.825    -.1356106    .1081187
       2011  |  -.0190377   .0627831    -0.30   0.762    -.1424208    .1043453
       2012  |  -.0066407   .0626652    -0.11   0.916     -.129792    .1165105
       2013  |   .0156804   .0630179     0.25   0.804    -.1081641    .1395249
       2014  |  -.0057087   .0647463    -0.09   0.930    -.1329498    .1215324
       2015  |  -.0025069    .065502    -0.04   0.969    -.1312332    .1262194
       2016  |  -.0017253    .067764    -0.03   0.980    -.1348968    .1314463
       2017  |  -.0380075   .0643386    -0.59   0.555    -.1644475    .0884325
             |
         ind |
          2  |   .0212005    .020601     1.03   0.304     -.019285    .0616861
          3  |   .0743825   .0270466     2.75   0.006     .0212299    .1275352
          5  |   .0338658   .0205932     1.64   0.101    -.0066045    .0743362
          6  |  -.0557151   .0360526    -1.55   0.123    -.1265665    .0151364
          7  |   .0340026   .0215114     1.58   0.115    -.0082721    .0762773
             |
       _cons |  -.1087547   .1122393    -0.97   0.333    -.3293302    .1118209
------------------------------------------------------------------------------

.
end of do-file

.

The second regression:

Code:

xtreg w2_uhat PerFD After AfterFD $control i.year, cluster(firmid) fe

Code:

Fixed-effects (within) regression               Number of obs      =       487
Group variable: firmid                          Number of groups   =        59

R-sq:  within  = 0.2613                         Obs per group: min =         4
       between = 0.0312                                        avg =       8.3
       overall = 0.1466                                        max =        10

                                                F(28,58)           =         .
corr(u_i, Xb)  = -0.4773                        Prob > F           =         .

                                (Std. Err. adjusted for 59 clusters in firmid)
------------------------------------------------------------------------------
             |               Robust
     w2_uhat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       PerFD |   .0952295   .1127901     0.84   0.402    -.1305445    .3210035
       After |   .0263845   .0243078     1.09   0.282    -.0222729    .0750418
     AfterFD |  -.1804964    .105972    -1.70   0.094    -.3926225    .0316297
      PerInD |   .0051841   .0516282     0.10   0.920    -.0981609    .1085291
        Dual |  -.0806385   .0333862    -2.42   0.019    -.1474683   -.0138087
       bsize |  -.0183543   .0344244    -0.53   0.596    -.0872623    .0505536
        Loss |  -.0224149     .01964    -1.14   0.258    -.0617286    .0168989
    CashOper |  -6.97e-10   3.27e-10    -2.13   0.037    -1.35e-09   -4.19e-11
    Firmsize |   .0204043   .0234019     0.87   0.387    -.0264397    .0672483
        blev |   .0718019   .0857411     0.84   0.406    -.0998276    .2434314
         ROA |   .3226941   .0745929     4.33   0.000     .1733802     .472008
          mb |  -.0041064   .0073946    -0.56   0.581    -.0189083    .0106955
             |
        year |
       2001  |   .0054016   .0674643     0.08   0.936    -.1296429    .1404461
       2002  |  -.0195736   .0554911    -0.35   0.726    -.1306511     .091504
       2003  |   .0321885   .0661514     0.49   0.628    -.1002279    .1646049
       2004  |  -.0044521   .0637201    -0.07   0.945    -.1320017    .1230976
       2005  |  -.0223795   .0583464    -0.38   0.703    -.1391726    .0944135
       2006  |  -.0362498   .0598186    -0.61   0.547    -.1559897      .08349
       2007  |  -.0724873   .0693725    -1.04   0.300    -.2113513    .0663768
       2008  |  -.0343343   .0671898    -0.51   0.611    -.1688293    .1001607
       2009  |   -.021291   .0662511    -0.32   0.749     -.153907    .1113249
       2010  |  -.0052354   .0680241    -0.08   0.939    -.1414004    .1309296
       2011  |  -.0305103    .071615    -0.43   0.672    -.1738632    .1128427
       2012  |  -.0247793   .0744705    -0.33   0.741    -.1738482    .1242895
       2013  |  -.0030818   .0797583    -0.04   0.969    -.1627355    .1565718
       2014  |  -.0238453   .0772591    -0.31   0.759    -.1784961    .1308056
       2015  |  -.0109596   .0744212    -0.15   0.883    -.1599298    .1380106
       2016  |  -.0030587   .0823633    -0.04   0.971    -.1679267    .1618094
       2017  |  -.0261159   .0806758    -0.32   0.747    -.1876062    .1353743
             |
       _cons |   -.332236    .409316    -0.81   0.420    -1.151571    .4870991
-------------+----------------------------------------------------------------
     sigma_u |  .06864986
     sigma_e |  .10383842
         rho |  .30414563   (fraction of variance due to u_i)
------------------------------------------------------------------------------

.
end of do-file


Joro Kolev

Join Date: Aug 2018

Posts: 3050
#2

13 Jan 2019, 21:08

The mechanical answer is that your variance matrix of parameter estimates is not of full rank.

Why this is so is not obvious to me. But it sometimes happens when you have plenty of dummies, as you do in your regressions.
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#3

13 Jan 2019, 21:55

Usually it is a matter of one (or more) of the indicators (dummies) being a singleton. That is, it takes on the same value in all but one observation from the estimation sample. So for each of those variables, tabulate it -if e(sample)- and look for a singleton.
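To make the singleton idea concrete, here is a minimal sketch on Stata's bundled auto data (not the poster's data). The variable name single is hypothetical and is constructed to equal 1 in exactly one observation. With robust standard errors one would expect the overall F to be reported as missing, though this is an illustration under those assumptions rather than a diagnosis of the model above.

```stata
* Illustration only: uses the auto dataset shipped with Stata
sysuse auto, clear

* Hypothetical singleton indicator: 1 in exactly one observation, 0 elsewhere
generate byte single = (_n == 1)
tabulate single

* Fit with robust standard errors; the singleton observation is fit exactly
regress price mpg weight single, vce(robust)

* Inspect the stored rank of e(V) and the overall F statistic
display "rank of e(V) = " e(rank) "   overall F = " e(F)
```

Dropping vce(robust) from the -regress- line gives the ordinary variance estimator for comparison.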

Unsolicited advice: regression coefficients like -3.67e-10 (and the coefficient of the same variable in the other regression) are difficult for people to grasp. From the name of the variable, CashOper, I'm guessing that the variable represents some money. So why not change the units from dollars (or euros or yen or yuan or whatever it is) to billions of dollars (or euros....). That will scale the coefficient up by a factor of 10⁹, which will make it much easier to understand, and also in line with the magnitude of the other model coefficients. Nothing else in the model will be affected by the change.
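A minimal sketch of that rescaling, assuming CashOper is recorded in single currency units and the data are in memory. The name CashOper_bn is hypothetical, and the definition of $control would need to be updated to use it.

```stata
* Hypothetical rescaled variable: CashOper expressed in billions
generate double CashOper_bn = CashOper / 1e9
label variable CashOper_bn "CashOper, in billions"

* Check that only the units changed
summarize CashOper CashOper_bn
```

With this change a coefficient of -3.67e-10 would be reported as about -.367, with the t statistic and p-value unchanged.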
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#4

14 Jan 2019, 00:20

Celine:
see also -help j_robustsingular-.

Kind regards,
Carlo
(Stata 19.0)

Celine Tran

Join Date: Oct 2018
Posts: 46

14 Jan 2019, 19:24

Clyde Schechter Thank you very much for your suggestions. I tried to detect the singleton, but the result looks like this:

Code:

tabstat uhat uhat w2_uhat PerFD After AfterFD $control, s(sum) col(stat)

    variable |       sum
-------------+----------
        uhat | -3.877048
        uhat | -3.877048
     w2_uhat | -4.810651
       PerFD |  49.92943
       After |       247
     AfterFD |  24.25791
      PerInD |  282.0946
        Dual |        57
       bsize |  901.9576
        Loss |       123
    CashOper |  9.93e+09
    Firmsize |  8996.005
        blev |  80.33136
         ROA |  10.86872
          mb |  619.4801
------------------------

Also, if I do not cluster on firmid, the F-test is not missing. So, what should I do?

Code:

xtreg w2_uhat PerFD After AfterFD $control i.year, fe // without cluster(firmid): the F-test is reported

Fixed-effects (within) regression               Number of obs      =       487
Group variable: firmid                          Number of groups   =        59

R-sq:  within  = 0.2613                         Obs per group: min =         4
       between = 0.0312                                        avg =       8.3
       overall = 0.1466                                        max =        10

                                                F(29,399)          =      4.87
corr(u_i, Xb)  = -0.4773                        Prob > F           =    0.0000

------------------------------------------------------------------------------
     w2_uhat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       PerFD |   .0952295    .106757     0.89   0.373     -.114647     .305106
       After |   .0263845   .0256723     1.03   0.305    -.0240855    .0768544
     AfterFD |  -.1804964    .102104    -1.77   0.078    -.3812255    .0202326
      PerInD |   .0051841   .0568521     0.09   0.927    -.1065831    .1169512
        Dual |  -.0806385   .0395031    -2.04   0.042    -.1582987   -.0029783
       bsize |  -.0183543   .0379917    -0.48   0.629    -.0930432    .0563345
        Loss |  -.0224149   .0174635    -1.28   0.200    -.0567468    .0119171
    CashOper |  -6.97e-10   1.25e-10    -5.58   0.000    -9.43e-10   -4.51e-10
    Firmsize |   .0204043   .0164717     1.24   0.216    -.0119778    .0527865
        blev |   .0718019   .0696703     1.03   0.303    -.0651649    .2087686
         ROA |   .3226941   .0478992     6.74   0.000     .2285278    .4168604
          mb |  -.0041064   .0078638    -0.52   0.602    -.0195661    .0113533
             |
        year |
       2001  |   .0054016   .0753653     0.07   0.943    -.1427611    .1535643
       2002  |  -.0195736   .0693735    -0.28   0.778    -.1559568    .1168096
       2003  |   .0321885    .069036     0.47   0.641    -.1035313    .1679084
       2004  |  -.0044521   .0682707    -0.07   0.948    -.1386672    .1297631
       2005  |  -.0223795   .0698613    -0.32   0.749    -.1597218    .1149628
       2006  |  -.0362498   .0692424    -0.52   0.601    -.1723753    .0998757
       2007  |  -.0724873   .0708081    -1.02   0.307    -.2116908    .0667163
       2008  |  -.0343343    .071965    -0.48   0.634    -.1758123    .1071437
       2009  |   -.021291   .0721139    -0.30   0.768    -.1630617    .1204796
       2010  |  -.0052354   .0734579    -0.07   0.943    -.1496483    .1391775
       2011  |  -.0305103   .0771881    -0.40   0.693    -.1822564    .1212359
       2012  |  -.0247793   .0799958    -0.31   0.757    -.1820452    .1324865
       2013  |  -.0030818   .0821192    -0.04   0.970    -.1645222    .1583585
       2014  |  -.0238453   .0828736    -0.29   0.774    -.1867687    .1390782
       2015  |  -.0109596   .0828348    -0.13   0.895    -.1738067    .1518875
       2016  |  -.0030587   .0859331    -0.04   0.972     -.171997    .1658797
       2017  |  -.0261159   .0894776    -0.29   0.771    -.2020225    .1497906
             |
       _cons |   -.332236   .2964296    -1.12   0.263     -.914995     .250523
-------------+----------------------------------------------------------------
     sigma_u |  .06864986
     sigma_e |  .10383842
         rho |  .30414563   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(58, 399) =     2.47             Prob > F = 0.0000

. 
end of do-file

.

Thank you very much in advance!

Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#6

14 Jan 2019, 19:50

The -tabstat- command you ran does not check for singletons. I said to -tab- (not -tabstat-) the "dummy" variables (not the continuous ones), and be sure to use -if e(sample)- in the -tab- command. I think you will find that one of them is a singleton.

And yes, the problem goes away if you abandon the cluster robust standard errors--singleton dummies are not a problem for ordinary variance estimators.

Celine Tran

Join Date: Oct 2018
Posts: 46

14 Jan 2019, 22:00

Clyde Schechter Thank you very much. In my regression, the dummy variables are After, Dual and Loss. Here is the result:

Code:

tab After if e(sample)

      After |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        240       49.28       49.28
          1 |        247       50.72      100.00
------------+-----------------------------------
      Total |        487      100.00

. tab Dual if e(sample)

       Dual |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        430       88.30       88.30
          1 |         57       11.70      100.00
------------+-----------------------------------
      Total |        487      100.00

. tab Loss if e(sample)

       Loss |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        364       74.74       74.74
          1 |        123       25.26      100.00
------------+-----------------------------------
      Total |        487      100.00

As I understand it, there are no singleton variables in my regression. However, the F-test is still missing.

I am really confused. Could something else be causing the missing F-test?

Also, when the F-test is missing (because I must cluster on firmid), are the estimated coefficients biased? I mean, can I still use these results?

Regards,
Anh


Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#8

14 Jan 2019, 22:15

What about the year indicators? Are any of them singletons? Another possibility here is that within some of the firm clusters one of the dummies is a singleton. Since some of your clusters only have 4 observations, that could easily happen just by chance.

In any case, the missing F-test is not anything to be concerned about. Unless your research goals specifically require a test of the omnibus null hypothesis that all of the coefficients in your model are zero, you don't need that F-test. And it is a very unusual research goal that requires that omnibus test.

There is no issue of bias in your coefficients. (In fact, if you look at the results you got when you used the ordinary VCE, the coefficients are exactly the same.) And the standard errors and tests of all the individual coefficients are fine as well. Everything you see there is perfectly usable. The only issue is that your VCE matrix is not of full rank and so the number of coefficients that can be tested simultaneously is smaller than the full number of coefficients. But any tests of groups of coefficients that are small enough to produce a non-missing result are perfectly OK.
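As a sketch of such a smaller joint test, assuming the poster's data are in memory and $control is defined as in #1, one could refit the clustered model and test a handful of coefficients together. Which coefficients to test is an assumption here, chosen only for illustration.

```stata
* Refit the clustered fixed-effects model from #1
xtreg w2_uhat PerFD After AfterFD $control i.year, cluster(firmid) fe

* Joint Wald test of three coefficients (an illustrative choice)
test PerFD After AfterFD
```

If the group is small enough relative to the rank of the VCE, -test- reports an F statistic and p-value as usual.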
Celine Tran

Join Date: Oct 2018

Posts: 46
#9

15 Jan 2019, 00:04

Clyde Schechter Thank you. However, for the year indicators I use i.year, so I cannot run -tab i.year if e(sample)-. Could I generate year dummies and check whether one of them is a singleton?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#10

15 Jan 2019, 00:50

Celine:
quoting -help j_robustsingular-:

Are you using a svy estimator or did you specify the vce(cluster clustvar) option?

The VCE you have just estimated is not of sufficient rank to perform the model test. As discussed in [R] test, the model test with clustered or survey data is distributed as
F(k,d-k+1) or chi2(k), where k is the number of constraints and d=number of clusters or d=number of PSUs minus the number of strata. Because the rank of the VCE is at most d and
the model test reserves 1 degree of freedom for the constant, at most d-1 constraints can be tested, so k must be less than d. The model that you just fit does not meet this
requirement.

To simplify the remaining discussion, let's consider the case of clustered data. This discussion applies to survey estimation in general by substituting, "PSUs - strata" for
"clusters".

There is no mechanical problem with your model, but you need to consider carefully whether any of the reported standard errors mean anything. The theory that justifies the
standard error calculation is asymptotic in the number of clusters, and we have just established that you are estimating at least as many parameters as you have clusters.

That concern aside, the model test statistic issue is that you cannot simultaneously test that all coefficients are zero because there is not enough information. You could test a
subset, but not all, and so Stata refuses to report the overall model test statistic.

Here note the degrees of freedom reported for the chi2 or F. You might see chi2(6) or F(6, 5). If you were to count the number of coefficients that would be constrained to 0 in a
model test in this case, you would find that number to be greater than 6. You could find out what that number is by reestimating the model parameters without the vce(robust) and
vce(cluster clustvar) options (or, for the survey commands, using the corresponding non-svy estimator). In any case, the 6 reported is the maximum number of coefficients that
could be simultaneously tested.
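The comparison described in the help text can be sketched as follows. This assumes the poster's data are in memory and $control is defined as in #1; e(df_m) and e(N_clust) are results stored by -xtreg-.

```stata
* Number of coefficients in the model test without the clustered VCE
quietly xtreg w2_uhat PerFD After AfterFD $control i.year, fe
display "model df, ordinary VCE:  " e(df_m)

* Same model with the cluster-robust VCE
quietly xtreg w2_uhat PerFD After AfterFD $control i.year, cluster(firmid) fe
display "model df, clustered VCE: " e(df_m)
display "number of clusters:      " e(N_clust)
```

In the output posted in this thread the ordinary fit shows F(29,399) and the clustered fit shows F(28,58), so one of the 29 constraints cannot be tested under the clustered VCE.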

Kind regards,
Carlo
(Stata 19.0)
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#11

15 Jan 2019, 06:14

To second what Clyde Schechter said in #8: Celine Tran, there is no problem with the situation that you have encountered. You are spending your time digging into the details of a particular data configuration, and these details are not interesting at all.

This "overall test of regression significance" is an anachronism. The test made a lot of sense when econometricians were running regressions with 10 observations and 2 non-constant regressors. Then you would want to know whether your overall regression explains anything...

But you are running a regression with 500 observations and 30 regressors, so of course your regression explains a lot. Look at your R-squareds: they are on the order of 20%, which, by the way, is not big news either with 500 observations and 30 regressors.

You should focus on testing interesting hypotheses motivated by economic theory, and these hypotheses almost never have anything to do with the "overall significance" of a kitchen-sink regression with a full set of time and industry dummies. For such interesting hypotheses the rank deficiency of the estimated variance matrix is typically not a problem. E.g., as you can see, the t-statistics for the individual significance of your regressors are all fine.
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#12

15 Jan 2019, 21:39

I fully endorse what Joro Kolev says in #11.

While I think it is a waste of your time to pursue this issue any further, for future reference note that while you cannot -tab i.year-, you can do

Code:

forvalues y = 2000/2017 {
    tab `y'.year if e(sample)
}
Celine Tran

Join Date: Oct 2018

Posts: 46
#13

16 Jan 2019, 23:16

Clyde Schechter Joro Kolev Thank you very much for your help. I accept your explanation. However, I am afraid that reviewers may challenge my results if the F-test is reported as missing.

Clyde Schechter When I use the code you suggested, the result is:

Code:

forvalues y=2000/2017{
  2. tab `y'.year if e(sample)
  3. }
factor variables and time-series operators not allowed
r(101);

Could you please help me?

Thank you very much in advance.
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#14

17 Jan 2019, 11:50

Sorry, I thought -tab- accepted factor variable notation. But I'm wrong. You can still get what you need by doing

Code:

tab year if e(sample)

and look for a year where the frequency comes up as 1.
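A minimal sketch of that check, assuming the poster's data are in memory and $control is defined as in #1. e(sample) refers to the most recent estimation command in the session, so the model is refit immediately before tabulating. The variable name insample is hypothetical.

```stata
* Refit the model so that e(sample) marks its estimation sample
xtreg w2_uhat PerFD After AfterFD $control i.year, cluster(firmid) fe

* Year frequencies within the estimation sample; look for a count of 1
tabulate year if e(sample)

* Optionally keep the sample marker for later use (hypothetical name)
generate byte insample = e(sample)
```

If no estimation results are in memory, e(sample) evaluates to 0 for every observation and -tab ... if e(sample)- reports no observations.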

Regarding your concern about reviewers, you can never predict what reviewers will do. Some are very sharp; there are others who are both ignorant and unaware of their ignorance. Suffice it to say that unless the omnibus hypothesis test of all coefficients equaling 0 (a very bizarre hypothesis in your context, I think) is part of your research goal, there is no legitimate reason for a reviewer to challenge it. If you encounter that problem, I would recommend appealing to the editor to either override the reviewer on the matter or get another opinion.

Even assuming you chase down the source of this "problem," how will you fix it? You can omit the offending variable from your model. If it's one of the year indicators then you could combine that year with one of the adjacent years. But clearly both of these involve doing substantive mutilation of your model in order to "solve" what is, in reality, a non-issue. In fact, if I were reviewing a paper that did one of those things, I would criticize it for that!

Celine Tran

Join Date: Oct 2018
Posts: 46

#15

17 Jan 2019, 22:26

Clyde Schechter Thank you very much. I agree with your explanation. However, the code you gave me still does not work. Could you please have a look? I would like to learn this for future research.

Code:

 tab year if e(sample)
no observations

. tab year

       year |      Freq.     Percent        Cum.
------------+-----------------------------------
       2000 |          3        0.56        0.56
       2001 |          6        1.12        1.67
       2002 |         15        2.79        4.46
       2003 |         18        3.35        7.81
       2004 |         23        4.28       12.08
       2005 |         33        6.13       18.22
       2006 |         40        7.43       25.65
       2007 |         51        9.48       35.13
       2008 |         46        8.55       43.68
       2009 |         52        9.67       53.35
       2010 |         54       10.04       63.38
       2011 |         42        7.81       71.19
       2012 |         41        7.62       78.81
       2013 |         34        6.32       85.13
       2014 |         30        5.58       90.71
       2015 |         27        5.02       95.72
       2016 |         14        2.60       98.33
       2017 |          9        1.67      100.00
------------+-----------------------------------
      Total |        538      100.00

Thank you.
