  • Implications of a massive change in a regression's F-test statistic?

    Dear all,

    When running the following regression Stata omits the F-test results.

    HTML Code:
    reg D_ROE_lead1_win c.UCOMP##i.POST D_RET_win D_ROE_win D_logSALES_by2002_win i.sic_Comp_2d i.fyear, vce(cl gvkey) notab
    note: 2005.fyear omitted because of collinearity
    
    Linear regression                               Number of obs     =      4,387
                                                    F(65, 945)        =          .
                                                    Prob > F          =          .
                                                    R-squared         =     0.1559
                                                    Root MSE          =     .18604
    
    
    .
    I suspect this is caused by the singleton dummy problem, so I checked for indicator variables that are nonzero for only one observation or cluster. Among other things, I identified that the POST variable (1 if fyear is 2003-2005; 0 if 2000-2002) in combination with the sic_Comp_2d variable (the industry group; 2-digit SIC codes) might cause the problem, in particular industry groups 1, 22, 78, 83, and 99:

    HTML Code:
    . tab sic_Comp_2d fyear
    
    sic_Comp_2 |                            Fiscal Year
             d |      2000       2001       2002       2003       2004       2005 |     Total
    -----------+------------------------------------------------------------------+----------
             1 |         1          0          0          0          0          1 |         2
            10 |         3          2          2          3          2          2 |        14
            13 |        23         23         20         25         26         25 |       142
            14 |         2          2          2          2          2          2 |        12
            15 |         7          8          8          8          7          8 |        46
            16 |         3          3          3          3          1          1 |        14
            20 |        22         21         21         29         29         25 |       147
            21 |         2          2          1          1          2          2 |        10
            22 |         3          0          0          3          3          3 |        12
            23 |         7          7          6          6          7          8 |        41
            24 |         5          6          6          6          6          5 |        34
            25 |         7          6          7          7          6          8 |        41
            26 |        13         11         15         16         14         13 |        82
            27 |        11         11         11         13         12         10 |        68
            28 |        57         51         49         57         59         55 |       328
            29 |         4          3          5          6          5          5 |        28
            30 |         7          8          8          7          8          7 |        45
            31 |         2          3          5          5          5          5 |        25
            32 |         5          5          5          5          4          4 |        28
            33 |        13         12         13         14         13         14 |        79
            34 |        12          9         10         13         12         12 |        68
            35 |        45         43         51         54         50         46 |       289
            36 |        59         57         55         58         55         51 |       335
            37 |        21         26         26         21         22         21 |       137
            38 |        23         28         34         31         30         32 |       178
            39 |         5          4          5          4          4          5 |        27
            40 |         5          3          3          4          4          5 |        24
            42 |         7          7          8          8          6          6 |        42
            44 |         4          5          4          4          3          4 |        24
            45 |         4          4          4          4          4          4 |        24
            47 |         3          3          2          2          3          2 |        15
            48 |        14         14         14         11         15         15 |        83
            49 |        64         67         65         63         59         57 |       375
            50 |        14         14         17         18         18         16 |        97
            51 |         5          5          7          6          6          6 |        35
            52 |         4          3          3          3          3          3 |        19
            53 |         8          9         11         10          9          9 |        56
            54 |         7          6          6          6          6          6 |        37
            55 |         3          2          2          2          2          2 |        13
            56 |        14         14         13         10         11         13 |        75
            57 |         3          4          4          3          4          5 |        23
            58 |        15         15         17         17         16         11 |        91
            59 |        10         10         10         10         13         12 |        65
            60 |        44         39         41         44         44         45 |       257
            61 |         5          4          5          7          6          5 |        32
            62 |        15         14         13         11         11         13 |        77
            63 |        23         28         34         33         31         31 |       180
            64 |         4          4          5          4          4          3 |        24
            67 |         1          2          1          1          2          2 |         9
            70 |         2          3          3          3          3          2 |        16
            72 |         4          3          3          5          3          2 |        20
            73 |        48         51         50         54         61         57 |       321
            75 |         3          2          2          2          3          3 |        15
            78 |         1          1          1          1          1          1 |         6
            79 |         2          2          1          1          2          3 |        11
            80 |        11          8          8         10          9         10 |        56
            82 |         2          2          3          3          2          2 |        14
            83 |         1          1          1          1          1          1 |         6
            87 |         7          7          8          9          8          5 |        44
            99 |         1          1          0          0          1          1 |         4
    -----------+------------------------------------------------------------------+----------
         Total |       720        708        737        767        758        732 |     4,422
    
    
    .
    If I exclude these industry groups from the regression, the F-statistic appears and the problem seems resolved, especially since I would lose just a couple of observations. There are minor changes in magnitude, but the statistical significance of all regressors remains unchanged, the constant being the only exception (it loses its statistical significance).

    HTML Code:
    . reg D_ROE_lead1_win c.UCOMP##i.POST D_RET_win D_ROE_win D_logSALES_by2002_win i.sic_Comp_2d i.fyear if sic_Comp_2d!=1 & sic_Comp_2d!=22 & sic_Comp_2d!=99 & sic_Comp_2d!=78 & sic_Comp_2d!=83, vce(cl gvkey) notab
    note: 2005.fyear omitted because of collinearity
    
    Linear regression                               Number of obs     =      4,357
                                                    F(64, 938)        =       6.35
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.1555
                                                    Root MSE          =     .18645
    
    
    .
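    A side note on the sample restriction: the chain of -!=- conditions can probably be written more compactly with -inlist()-. The sketch below only checks, on a made-up variable holding the values 1 to 100, that the two conditions select the same observations; keep_long and keep_short are hypothetical names, and nothing here touches the real data.

    ```stata
    clear
    set obs 100
    * toy industry codes 1..100, not the real data
    gen sic_Comp_2d = _n
    * long form, as in the regression command above
    gen byte keep_long = sic_Comp_2d!=1 & sic_Comp_2d!=22 & sic_Comp_2d!=99 & sic_Comp_2d!=78 & sic_Comp_2d!=83
    * compact form
    gen byte keep_short = !inlist(sic_Comp_2d, 1, 22, 78, 83, 99)
    * stops with an error if the two conditions ever disagree
    assert keep_long == keep_short
    count if keep_short
    ```

    If that carries over to the real data, the -if- qualifier of the regression could read -if !inlist(sic_Comp_2d, 1, 22, 78, 83, 99)-.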
    What made me curious, though, is that if I drop the condition sic_Comp_2d!=1 (i.e. include industry group 1 again), the F-statistic increases dramatically.

    HTML Code:
    . reg D_ROE_lead1_win c.UCOMP##i.POST D_RET_win D_ROE_win D_logSALES_by2002_win i.sic_Comp_2d i.fyear if sic_Comp_2d!=22 & sic_Comp_2d!=99 & sic_Comp_2d!=78 & sic_Comp_2d!=83, vce(cl gvkey) notab
    note: 2005.fyear omitted because of collinearity
    
    Linear regression                               Number of obs     =      4,359
                                                    F(65, 939)        =    2516.96
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.1561
                                                    Root MSE          =     .18643
    
    
    .
    I have been taught that the important aspect is the significance of the F-test, and thus I always just check whether -Prob > F- is acceptable, more or less ignoring the F-statistic itself. I found this odd, however, and wonder whether it has any implications for the model's regression results?

    Thank you very much in advance for soothing my inquisitiveness!


    Kind regards,
    Roman



  • #2
    The F-test tests the null hypothesis that all slope coefficients in the model are jointly zero. I would pay more attention to the adjusted R-squared, which indicates the proportion of the dependent variable's variance the model explains.

    Edit: To answer your question more directly, the increased F-statistic implies you can be more confident that you have some coefficients that are statistically significant. It's an indication of improved model fit. If you had only one variable in your model, the F-statistic would be the square of the t-statistic on that variable's coefficient.
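
    The last point can be checked quickly. This is a minimal sketch using Stata's shipped auto dataset (an assumption for illustration, unrelated to the data in this thread) and the default, non-robust VCE:

    ```stata
    sysuse auto, clear
    * one regressor, default (non-robust) standard errors
    regress price weight
    * model F-statistic as stored by regress
    display e(F)
    * squared t-statistic on the single slope coefficient
    display (_b[weight]/_se[weight])^2
    ```

    The two displayed numbers should agree up to rounding.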
    Last edited by Kris Bitney; 23 Mar 2017, 17:00.



    • #3
      There is something very bizarre about these results. If you didn't have the cluster robust VCE, you could calculate the F statistics by hand from R2 and the degrees of freedom: F = (R2/df1) / ((1 - R2)/df2). If you apply that formula, for both regressions you come out with numbers in the 2.5 ballpark. The cluster robust VCE changes that, of course, but I wouldn't expect the change to be all that dramatic. The 6.35 value is certainly plausible under the circumstances. But the 2516.96 boggles my mind. My inference is that by removing that restriction and adding 2 more observations to the estimation sample, something drastic has happened to the clustering structure. Perhaps a doubleton cluster was added, or two singleton clusters? (Actually, I wouldn't really expect a doubleton cluster to make that radical a difference, and with singleton clusters I would expect a missing F-statistic.) So I'm really puzzled by what's going on here.
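
      For reference, the by-hand arithmetic can be sketched as follows, using R2 = 0.1561 and df1 = 65 from the output in #1. Which df2 to plug in is a judgment call here: 939 is the denominator df displayed with the cluster robust VCE, while 4293 is the residual df reported in the non-clustered output in #4. The scalar names are just for illustration.

      ```stata
      * values copied from the regression output in this thread
      scalar R2  = 0.1561
      scalar df1 = 65
      * df2 = 939: denominator df displayed under vce(cluster gvkey)
      scalar F_cluster_df = (R2/df1) / ((1 - R2)/939)
      * df2 = 4293: residual df of the same model without clustering
      scalar F_resid_df   = (R2/df1) / ((1 - R2)/4293)
      display F_cluster_df
      display F_resid_df
      ```

      The first comes out at roughly 2.7 and the second at roughly 12.2, the latter being close to the 12.21 Stata reports for the non-clustered model.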

      Can you shed light on what the clustering of those two added observations is (and whether their cluster also includes other members of the original estimation sample)?



      • #4
        Dear Kris, dear Clyde,

        First of all: Thank you very much for your reply! This forum is amazing!

        Kris Bitney: Thanks for the clarification. While this actually makes total sense, I never realised that the F-statistic itself indicates an improved model fit. Regarding the adj. R2: there's just a slight difference between the two regressions (as the output below shows, the adj. R2 is 0.1432 and 0.1428 respectively), which contributes to the confusion, I assume.



        Clyde Schechter: The effect indeed seems related to the cluster robust VCE. If left out, the vast difference disappears.

        HTML Code:
        . reg D_ROE_lead1_win c.UCOMP##i.POST D_RET_win D_ROE_win D_logSALES_by2002_win i.sic_Comp_2d i.fyear if sic_Comp_2d!=22 & sic_Comp_2d!=99 & sic_Comp_2d!=7
        > 8 & sic_Comp_2d!=83 & fyear>=2000 & fyear<=2005, notab
        note: 2005.fyear omitted because of collinearity
        
              Source |       SS           df       MS      Number of obs   =     4,359
        -------------+----------------------------------   F(65, 4293)     =     12.21
               Model |  27.5810237        65  .424323442   Prob > F        =    0.0000
            Residual |  149.226927     4,293  .034760523   R-squared       =    0.1560
        -------------+----------------------------------   Adj R-squared   =    0.1432
               Total |  176.807951     4,358  .040570893   Root MSE        =    .18644
        
        
        . reg D_ROE_lead1_win c.UCOMP##i.POST D_RET_win D_ROE_win D_logSALES_by2002_win i.sic_Comp_2d i.fyear if sic_Comp_2d!=1 & sic_Comp_2d!=22 & sic_Comp_2d!=99
        >  & sic_Comp_2d!=78 & sic_Comp_2d!=83 & fyear>=2000 & fyear<=2005, notab
        note: 2005.fyear omitted because of collinearity
        
              Source |       SS           df       MS      Number of obs   =     4,357
        -------------+----------------------------------   F(64, 4292)     =     12.34
               Model |   27.458285        64  .429035704   Prob > F        =    0.0000
            Residual |  149.226323     4,292  .034768482   R-squared       =    0.1554
        -------------+----------------------------------   Adj R-squared   =    0.1428
               Total |  176.684608     4,356  .040561205   Root MSE        =    .18646
        
        
        . 
        However, something else is puzzling. If I, as Clyde suggested, use the original sample (i.e. I do not keep only fyears 2000-2005, so the sample consists of fyears 1997-2006; note that this is still a subsample, as fyears are only included when CEO tenure is >= 3 fyears) but restrict the regression to fyears 2000-2005 with an -if- condition, I cannot recreate the output. While I should get the same results as when I keep only the 2000-2005 observations, I get the following instead:

        HTML Code:
        . reg D_ROE_lead1_win c.UCOMP##i.POST D_RET_win D_ROE_win D_logSALES_by2002_win i.sic_Comp_2d i.fyear if sic_Comp_2d!=1 & sic_Comp_2d!=22 & sic_Comp_2d!=99 & sic_Comp
        > _2d!=78 & sic_Comp_2d!=83 & fyear>=2000 & fyear<=2005, vce(cl gvkey) notab
        note: 2005.fyear omitted because of collinearity
        
        Linear regression                               Number of obs     =      4,357
                                                        F(64, 938)        =       6.39
                                                        Prob > F          =     0.0000
                                                        R-squared         =     0.1554
                                                        Root MSE          =     .18646
        
        
        
        . reg D_ROE_lead1_win c.UCOMP##i.POST D_RET_win D_ROE_win D_logSALES_by2002_win i.sic_Comp_2d i.fyear if sic_Comp_2d!=22 & sic_Comp_2d!=99 & sic_Comp_2d!=78 & sic_Com
        > p_2d!=83 & fyear>=2000 & fyear<=2005, vce(cl gvkey) notab
        note: 2005.fyear omitted because of collinearity
        
        Linear regression                               Number of obs     =      4,359
                                                        F(64, 939)        =          .
                                                        Prob > F          =          .
                                                        R-squared         =     0.1560
                                                        Root MSE          =     .18644
        When I run the regressions with the sample covering all fyears (1997-2006) I get similar results:

        HTML Code:
        . reg D_ROE_lead1_win c.UCOMP##i.POST D_RET_win D_ROE_win D_logSALES_by2002_win i.sic_Comp_2d i.fyear if sic_Comp_2d!=1 & sic_Comp_2d!=22 & sic_Comp_2d!=99 & sic_Comp
        > _2d!=78 & sic_Comp_2d!=83, vce(cl gvkey) notab
        note: 2005.fyear omitted because of collinearity
        
        Linear regression                               Number of obs     =      5,045
                                                        F(65, 939)        =       7.30
                                                        Prob > F          =     0.0000
                                                        R-squared         =     0.1492
                                                        Root MSE          =     .18444
        
        
        . 
        
        
        . reg D_ROE_lead1_win c.UCOMP##i.POST D_RET_win D_ROE_win D_logSALES_by2002_win i.sic_Comp_2d i.fyear if sic_Comp_2d!=22 & sic_Comp_2d!=99 & sic_Comp_2d!=78 & sic_Com
        > p_2d!=83, vce(cl gvkey) notab
        note: 2005.fyear omitted because of collinearity
        
        Linear regression                               Number of obs     =      5,048
                                                        F(65, 940)        =          .
                                                        Prob > F          =          .
                                                        R-squared         =     0.1495
                                                        Root MSE          =      .1845
        
        
        .

        Regarding the industry group -1.sic_Comp_2d- with the two observations: there are actually four observations if the time frame isn't limited to the fiscal years I am interested in, i.e. 2000-2005.

        HTML Code:
        . list gvkey fyear D_ROE_lead1_win UCOMP POST D_RET_win D_ROE_win D_logSALES_by2002_win if sic_Comp_2d==1
        
              +---------------------------------------------------------------------------------+
              | gvkey   fyear   D_ROE_le~       UCOMP   POST   D_RET_win   D_ROE_win   D_logS.. |
              |---------------------------------------------------------------------------------|
        5144. | 28524    1999    .4124693    .6136166      0   -.9010378    .0614858   .2821875 |
        5145. | 28524    2000   -.3257052   -.7923633      0    .2058951    .4124693   .1125178 |
        5146. | 28524    2005   -.1437912   -.1056713      1   -.0116722    .2342666   .1174688 |
        5147. | 28524    2006           .     .284418      1    .5846176   -.1437912    .099791 |
              +---------------------------------------------------------------------------------+
        
        . 



        Taking a more detailed look at the data, I realise that the industry IDs (-sic_Comp_2d-) that I identified as affecting whether or not the F-test results are omitted are indeed those that contain just one company each (-gvkey- being the firm ID), except for -22.sic_Comp_2d-. Maybe this contributes to the strong reaction?

        HTML Code:
        . unique gvkey, by(sic_Comp_2d) gen(firms_per_industry)
        Number of unique values of gvkey is  950
        Number of records is  5856
        variable firms_per_industry contains number of unique values of gvkey by sic_Comp_2d
        
          +---------------------+
          | sic_C~2d   firms_~y |
          |---------------------|
          |        1          1 |
          |       10          3 |
          |       13         30 |
          |       14          2 |
          |       15          9 |
          |---------------------|
          |       16          3 |
          |       20         32 |
          |       21          2 |
          |       22          3 |
          |       23          8 |
          |---------------------|
          |       24          7 |
          |       25          9 |
          |       26         18 |
          |       27         14 |
          |       28         70 |
          |---------------------|
          |       29          6 |
          |       30          9 |
          |       31          5 |
          |       32          5 |
          |       33         17 |
          |---------------------|
          |       34         15 |
          |       35         62 |
          |       36         72 |
          |       37         30 |
          |       38         44 |
          |---------------------|
          |       39          7 |
          |       40          5 |
          |       42          8 |
          |       44          5 |
          |       45          5 |
          |---------------------|
          |       47          3 |
          |       48         18 |
          |       49         83 |
          |       50         20 |
          |       51          8 |
          |---------------------|
          |       52          4 |
          |       53         12 |
          |       54          7 |
          |       55          3 |
          |       56         16 |
          |---------------------|
          |       57          6 |
          |       58         19 |
          |       59         15 |
          |       60         51 |
          |       61          7 |
          |---------------------|
          |       62         16 |
          |       63         38 |
          |       64          5 |
          |       67          2 |
          |       70          3 |
          |---------------------|
          |       72          5 |
          |       73         70 |
          |       75          3 |
          |       78          1 |
          |       79          4 |
          |---------------------|
          |       80         11 |
          |       82          3 |
          |       83          1 |
          |       87          9 |
          |       99          1 |
          +---------------------+
        
        . 

        Here's the table showing the number of observations per fiscal year for each industry group, which helps identify singleton effects with respect to the indicator variable -POST- (0 if fyear<=2002; 1 if fyear>2002).

        HTML Code:
        . tab sic_Comp_2d fyear
        
        sic_Comp_2 |                                       Fiscal Year
                 d |      1999       2000       2001       2002       2003       2004       2005       2006 |     Total
        -----------+----------------------------------------------------------------------------------------+----------
                 1 |         1          1          0          0          0          0          1          1 |         4 
                10 |         2          3          2          2          3          2          2          3 |        19 
                13 |        24         23         23         20         25         26         25         26 |       192 
                14 |         2          2          2          2          2          2          2          2 |        16 
                15 |         8          7          8          8          8          7          8          9 |        63 
                16 |         3          3          3          3          3          1          1          2 |        19 
                20 |        18         22         21         21         29         29         25         24 |       189 
                21 |         2          2          2          1          1          2          2          2 |        14 
                22 |         3          3          0          0          3          3          3          2 |        17 
                23 |         8          7          7          6          6          7          8          6 |        55 
                24 |         4          5          6          6          6          6          5          3 |        41 
                25 |         8          7          6          7          7          6          8          7 |        56 
                26 |        13         13         11         15         16         14         13         13 |       108 
                27 |        11         11         11         11         13         12         10          9 |        88 
                28 |        54         57         51         49         57         59         55         58 |       440 
                29 |         6          4          3          5          6          5          5          5 |        39 
                30 |         7          7          8          8          7          8          7          7 |        59 
                31 |         2          2          3          5          5          5          5          5 |        32 
                32 |         4          5          5          5          5          4          4          5 |        37 
                33 |        14         13         12         13         14         13         14         15 |       108 
                34 |        12         12          9         10         13         12         12         12 |        92 
                35 |        45         45         43         51         54         50         46         46 |       380 
                36 |        54         59         57         55         58         55         51         49 |       438 
                37 |        19         21         26         26         21         22         21         23 |       179 
                38 |        30         23         28         34         31         30         32         37 |       245 
                39 |         5          5          4          5          4          4          5          5 |        37 
                40 |         5          5          3          3          4          4          5          3 |        32 
                42 |         7          7          7          8          8          6          6          5 |        54 
                44 |         3          4          5          4          4          3          4          5 |        32 
                45 |         4          4          4          4          4          4          4          4 |        32 
                47 |         3          3          3          2          2          3          2          2 |        20 
                48 |        14         14         14         14         11         15         15         13 |       110 
                49 |        57         64         67         65         63         59         57         65 |       497 
                50 |        14         14         14         17         18         18         16         15 |       126 
                51 |         6          5          5          7          6          6          6          6 |        47 
                52 |         4          4          3          3          3          3          3          3 |        26 
                53 |         9          8          9         11         10          9          9          9 |        74 
                54 |         5          7          6          6          6          6          6          5 |        47 
                55 |         3          3          2          2          2          2          2          0 |        16 
                56 |        14         14         14         13         10         11         13         13 |       102 
                57 |         3          3          4          4          3          4          5          4 |        30 
                58 |        16         15         15         17         17         16         11         11 |       118 
                59 |        12         10         10         10         10         13         12         10 |        87 
                60 |        40         44         39         41         44         44         45         42 |       339 
                61 |         4          5          4          5          7          6          5          5 |        41 
                62 |        14         15         14         13         11         11         13         12 |       103 
                63 |        27         23         28         34         33         31         31         32 |       239 
                64 |         5          4          4          5          4          4          3          3 |        32 
                67 |         1          1          2          1          1          2          2          2 |        12 
                70 |         2          2          3          3          3          3          2          2 |        20 
                72 |         3          4          3          3          5          3          2          3 |        26 
                73 |        46         48         51         50         54         61         57         50 |       417 
                75 |         2          3          2          2          2          3          3          2 |        19 
                78 |         1          1          1          1          1          1          1          1 |         8 
                79 |         3          2          2          1          1          2          3          3 |        17 
                80 |        10         11          8          8         10          9         10         10 |        76 
                82 |         3          2          2          3          3          2          2          3 |        20 
                83 |         0          1          1          1          1          1          1          1 |         7 
                87 |         8          7          7          8          9          8          5          5 |        57 
                99 |         1          1          1          0          0          1          1          1 |         6 
        -----------+----------------------------------------------------------------------------------------+----------
             Total |       708        720        708        737        767        758        732        726 |     5,856 

        Is there any other/better/more efficient way to identify singleton or doubleton clusters?

        What does this mean for my model? I guess leaving out the industry groups that hold only one firm is fair, so I would circumvent the odd problem regarding the F-statistic report. However, the question remains if there are some underlying effects that I should sort out somehow...




        • #5
          Singleton clusters are very problematic for models estimated with cluster robust VCE. Since in your case they only amount to 4 observations in a data set of several thousand, I would not hesitate to exclude them from the analysis. You can, when you write up your results, mention as a limitation that those particular industries could not be included because of inadequate data.

          As for an efficient way of identifying singleton clusters:

          Code:
          by sic_Comp_2d, sort: gen byte singleton = (_N == 1)
          list sic_Comp_2d if singleton
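
          The code above counts observations per industry group. Since the regressions in this thread cluster on gvkey, counting observations per firm may be closer to what was asked. A minimal sketch on a toy dataset (gvkey 28524 and 7022 appear in this thread, 99999 is made up, and n_obs is a hypothetical name):

          ```stata
          clear
          * toy data, not the real sample
          input long gvkey int fyear
          28524 2000
          28524 2005
           7022 2000
           7022 2001
           7022 2002
          99999 2003
          end
          * number of observations in each firm-level cluster
          bysort gvkey: gen int n_obs = _N
          * singleton (n_obs == 1) and doubleton (n_obs == 2) clusters
          list gvkey fyear n_obs if n_obs <= 2, sepby(gvkey)
          ```

          On real data it would probably make sense to restrict the count to the estimation sample, for example by adding -if e(sample)- after running the regression.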



          • #6
            Thank you very much, Clyde!

            I'll do as you suggested and report the exclusion of the observations that cause problems. Note, however, that four firms each make up an industry group on their own, which seems to cause problems in combination with the cluster robust VCE. This amounts to a total of 18 firm-years being excluded, which is still rather marginal: 0.41 per cent of the sample.

            HTML Code:
            . list gvkey fyear POST sic_Comp_2d if sic_Comp_2d==1 | sic_Comp_2d==78 | sic_Comp_2d==83 | sic_Comp_2d==99, sepby(gvkey)
            
                  +---------------------------------+
                  | gvkey   fyear   POST   sic_C~2d |
                  |---------------------------------|
               1. | 28524    2000      0          1 |
               2. | 28524    2005      1          1 |
                  |---------------------------------|
            4282. |  7022    2000      0         78 |
            4283. |  7022    2001      0         78 |
            4284. |  7022    2002      0         78 |
            4285. |  7022    2003      1         78 |
            4286. |  7022    2004      1         78 |
            4287. |  7022    2005      1         78 |
                  |---------------------------------|
            4369. | 62967    2000      0         83 |
            4370. | 62967    2001      0         83 |
            4371. | 62967    2002      0         83 |
            4372. | 62967    2003      1         83 |
            4373. | 62967    2004      1         83 |
            4374. | 62967    2005      1         83 |
                  |---------------------------------|
            4419. |  5047    2000      0         99 |
            4420. |  5047    2001      0         99 |
            4421. |  5047    2004      1         99 |
            4422. |  5047    2005      1         99 |
                  +---------------------------------+
            
            .

            When running the code you provided, no singleton clusters are identified. When I modify the code to include -POST-, the two observations of -1.sic_Comp_2d- are identified.

            Code:
            by sic_Comp_2d POST, sort: gen byte singleton2 = (_N == 1)
            list sic_Comp_2d if singleton2
            HTML Code:
            . list sic_Comp_2d if singleton2
            
                  +----------+
                  | sic_C~2d |
                  |----------|
               1. |        1 |
               2. |        1 |
                  +----------+
            
            .
            The other problematic -sic_Comp_2d- groups (i.e. 78, 83, 99) are, logically, not identified this way, as they are not singleton clusters. I suspect the problems they cause are related to the firm-level (-gvkey-) cluster robust VCE interacting with the industry-level factor variable (-sic_Comp_2d-): for -78.sic_Comp_2d-, -83.sic_Comp_2d-, and -99.sic_Comp_2d- the firm and the industry group consist of the very same observations, as each of these industry IDs consists of just one firm. Does this make sense on any level? Anyhow, excluding these firms, and consequently the industry IDs, results in Stata reporting F-statistics again.
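
            One way to flag industry groups that consist of a single firm might look like the sketch below. It runs on a toy dataset: gvkey 28524 and 7022 are taken from the listing above, while firms 1001-1003 and their rows are made up, and firm_tag and n_firms are hypothetical names.

            ```stata
            clear
            * toy data, not the real sample
            input long gvkey int sic_Comp_2d int fyear
            28524  1 2000
            28524  1 2005
             7022 78 2000
             7022 78 2001
             1001 10 2000
             1002 10 2000
             1003 10 2001
            end
            * tag one row per industry-firm pair, then count firms per industry
            egen byte firm_tag = tag(sic_Comp_2d gvkey)
            egen n_firms = total(firm_tag), by(sic_Comp_2d)
            * industry groups whose dummy is nonzero for only one cluster
            list sic_Comp_2d gvkey fyear if n_firms == 1, sepby(sic_Comp_2d)
            ```

            Going by the firms-per-industry table in #4, this would flag groups 1, 78, 83, and 99 in the real data but not group 22, which holds three firms.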


            I thank you for your kind help!


            Kind regards,
            Roman
            Last edited by Roman Vanderson; 27 Mar 2017, 01:52.
