
  • Clustered standard errors not matching firms

    Hi-

    I have a rather perplexing issue. I am estimating the following equation:

    reg routputpcusd l.routputpcusd fmage worker edurwgt agewgt tenwgt ernbtwgtusd hours, vce(cluster firm)

    This is a panel with 12 years of data for each firm (there are some NA's for the dependent variable).

    The output is telling me that there are 237 groups but according to the data points there are 2976 rows (248 firms) which means my groups are short 11 firms.

    I've gone through by hand several times and made sure that there are no firms with only one data point for the dependent variable.

    Any idea on how I can see which firms are being included in the clustering?

    Below is the output:

    Linear regression Number of obs = 1,198
    F(8, 236) = 54.21
    Prob > F = 0.0000
    R-squared = 0.5196
    Root MSE = 3331.2

    (Std. Err. adjusted for 237 clusters in firm)
    ------------------------------------------------------------------------------
    | Robust
    routputpcusd | Coef. Std. Err. t P>|t| [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    routputpcusd |
    L1. | .5365922 .0639668 8.39 0.000 .4105734 .6626111
    |
    fmage | 1.054583 6.975469 0.15 0.880 -12.68756 14.79672
    worker | 2.520342 2.371748 1.06 0.289 -2.152161 7.192845
    edurwgt | 50.44194 33.81116 1.49 0.137 -16.1683 117.0522
    agewgt | -37.67292 14.36249 -2.62 0.009 -65.96798 -9.377858
    tenwgt | 61.35397 30.43172 2.02 0.045 1.401443 121.3065
    ernbtwgtusd | 34.78622 9.28194 3.75 0.000 16.50018 53.07226
    hours | 7.347066 6.968288 1.05 0.293 -6.380926 21.07506
    _cons | -112.213 541.8919 -0.21 0.836 -1179.776 955.3502
    ------------------------------------------------------------------------------


    Thank you for your help.

    Jason

  • #2
    Well, without seeing your entire data set, it's impossible to be certain what is happening. But the most common cause of this problem is missing values in the regression variables. Remember that if any of the regression variables has a missing value, that observation is excluded from the estimation sample. Once that process has reduced the estimation sample, some clusters may have been eliminated altogether or reduced to a single observation. So verifying that there are no singleton clusters in your entire data set does not tell you what you need to know. You need to examine just those observations that are in the estimation sample. I would probably start with -levelsof firm if !e(sample)- to see which are the firms that didn't make it into the estimation sample. Then I would explore the data in those firms to see why.
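A minimal sketch of how listwise deletion can remove a whole cluster, using a toy panel with made-up values (the variable names y, x, in_est, n_in_est and firm_tag are illustrative assumptions, not from the original data). It counts, for each firm, how many observations the regression actually used:

```stata
* Toy panel with hypothetical values: 4 firms, 3 years each.
clear
input float(firm year y x)
1 2001 1.0 2
1 2002 2.5 3
1 2003 3.1 5
2 2001 0.7 1
2 2002 1.9 .
2 2003 2.2 4
3 2001 .   2
3 2002 .   3
3 2003 .   6
4 2001 1.4 3
4 2002 0.9 2
4 2003 2.8 7
end

regress y x, vce(cluster firm)

* Flag the observations that were actually used in the estimation
generate byte in_est = e(sample)

* Number of used observations per firm
bysort firm: egen n_in_est = total(in_est)

* Show one row per firm; a firm with n_in_est == 0 is not a cluster
egen byte firm_tag = tag(firm)
list firm n_in_est if firm_tag, noobs
```

In this toy data firm 3 has no usable observations, so the regression reports one cluster fewer than the number of firms in the data set.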

    By the way, in the future, please show Stata output by posting it between code delimiters (see FAQ #12 for instructions how) to maximize its readability. In this case, the ragged alignment of the output didn't really matter, but most of the time one really needs to read the output carefully, and that would be very difficult to do in the current format.



    • #3
      Thank you very much for your help and my apologies about not putting the output in the correct format.

      I ran through all of the firm numbers from -levelsof firm if !e(sample)- and they match perfectly to what is in the dataset. If I understood your meaning above, any firm with a missing variable would be dropped (thus excluded as a group), but that firm should also be excluded from e(sample), no? If that's true, then something different is going on here, because I still have the same N but the groups are 11 firms short.

      My apologies again if I didn't understand your explanation correctly.



      • #4
        If I understood your meaning above any firm with a missing variable would be dropped (thus excluded as a group) but that should also be excluded from e(sample), no?
        No, that's not what I meant. Each individual observation in the data set that contains a missing value on any variable in the regression is omitted from the estimation sample. It doesn't mean that the firm as a whole is omitted under that circumstance. But what can happen is that in some firms, every single observation for that firm ends up, on its own, eliminated by this rule because each one contains at least one missing value on some model variable. That's what I'm suggesting you should look for.
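One way to look for such firms before running any regression is sketched below, on a toy panel with made-up values (y, x1, x2, n_missing and n_complete are hypothetical names). It counts the complete observations per firm; a lagged term would first have to be generated as its own variable to be included in the count:

```stata
* Toy panel with hypothetical values: 3 firms, 3 years each.
clear
input float(firm year y x1 x2)
1 2001 1.0 2 5
1 2002 2.5 . 6
1 2003 3.1 5 7
2 2001 .   1 4
2 2002 1.9 . 3
2 2003 2.2 4 .
3 2001 1.2 2 3
3 2002 0.8 3 4
3 2003 1.7 6 5
end

* Number of missing model variables in each observation
egen byte n_missing = rowmiss(y x1 x2)

* Number of complete observations within each firm
bysort firm: egen n_complete = total(n_missing == 0)

* Firms in which every observation is missing something
levelsof firm if n_complete == 0
```

Here firm 2 is listed: each of its observations is missing a different variable, so no single row survives even though no variable is missing throughout.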

        Single-observation firms don't contribute to the coefficient estimates in fixed-effects regression (but you're not even doing fixed-effects regression here anyway), because they have no within-firm variation. But they are still in the estimation sample, and they should appear in the count of clusters reported at the start of the regression output. They can interfere with getting cluster robust VCE, but the number of clusters reported should still include the singletons. So singletons are not the issue here.
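The point about singletons can be checked with a small sketch on made-up data (all values and the names y and x are hypothetical). Firm 3 keeps only one complete observation, yet it is still counted as a cluster:

```stata
* Toy panel with hypothetical values; firm 3 has one complete observation.
clear
input float(firm year y x)
1 2001 1.0 2
1 2002 2.5 3
1 2003 3.1 5
2 2001 0.7 1
2 2002 1.9 3
2 2003 2.2 4
3 2001 1.5 2
3 2002 .   3
3 2003 .   6
4 2001 1.4 3
4 2002 0.9 2
4 2003 2.8 7
end

regress y x, vce(cluster firm)

* e(N) and e(N_clust) are stored by regress with vce(cluster ...)
display "observations: " e(N) "   clusters: " e(N_clust)
```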

        something different is going on here because I still have the same N but the groups are 11 firms short.
        But are you saying that the number of observations reported in the regression, 1,198, is the entire data set, but that there are more than 237 firms among them and Stata is only reporting 237? How do you know you really have 248 firms in the data set?

        Run this:

        Code:
        by firm, sort: gen firm_flag = 1 if _n == 1
        tab firm_flag
        The count of observations with firm_flag = 1 will tell you how many different firms there are in your data set.
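The same count can be compared with the number of firms in the estimation sample. The following is only a sketch on made-up data (y, x, in_est, tag_all and tag_est are hypothetical names); it tags one observation per firm, once in the full data and once among the observations the regression used:

```stata
* Toy panel with hypothetical values: 4 firms, 3 years each.
clear
input float(firm year y x)
1 2001 1.0 2
1 2002 2.5 3
1 2003 3.1 5
2 2001 0.7 1
2 2002 1.9 .
2 2003 2.2 4
3 2001 .   2
3 2002 .   3
3 2003 .   6
4 2001 1.4 3
4 2002 0.9 2
4 2003 2.8 7
end

regress y x, vce(cluster firm)
generate byte in_est = e(sample)

* Tag one observation per firm in the full data
egen byte tag_all = tag(firm)

* Tag one observation per firm among the observations actually used
egen byte tag_est = tag(firm) if in_est

count if tag_all
count if tag_est
```

The first count is the number of firms in the data set; the second should agree with the number of clusters in the regression header.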
        Last edited by Clyde Schechter; 28 Apr 2017, 21:24.



        • #5
          I did that and it comes back and says 248. I previously verified that using the simple tabstat firm, stat(n) and dividing by 12 for the number of years. If this helps, here below is a sample of my data from that regression (I hope it comes out right). Effectively, though, I have the 248 firms, and when I run

          Code:
          reg newprof2 l.newprof2 image worker edurwge agewgt tenwgt ernbtwgtusd hours, vce(cluster firm)
          it clusters only 237 of the firms.

          It is possible that there are 11 firms in which every observation has at least one missing variable, so that the whole firm is being eliminated. Hopefully this helps.

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float(firm year newprof2 fmage worker edurwgt agewgt tenwgt ernbtwgtusd hours)
           1 1991  29547.967 16  31 12.546032  37.59127    8.2209 71.778275 40.72727
           1 1992 -1611.5938 17  28 12.359098  33.39721  7.511809   55.4053       40
           1 1993  119.15405 18  29 11.533333  38.68095   6.44246  49.64608     44.4
           1 1994          . 19   .         .         .         .         .        .
           1 1995          . 20   .         .         .         .         .        .
           1 1996          .  .   .         .         .         .         .        .
           1 1997          .  .   .         .         .         .         .        .
           1 1998          .  .   .         .         .         .         .        .
           1 1999          .  .   .         .         .         .         .        .
           1 2000          .  .   .         .         .         .         .        .
           1 2001          .  .   .         .         .         .         .        .
           1 2002          .  .   .         .         .         .         .        .
           3 1991   12617.12 21  25  9.888889  44.27778  13.38889 111.91695       41
           3 1992   27475.92 22  32 11.064545 37.226696  9.019192  65.71113       40
           3 1993  38332.508 23  40 10.615385  29.95513  6.455662 65.383385       42
           3 1994  26311.254 24  36 10.910076 34.991447 11.937007  29.04348       40
           3 1995  19631.766 25  36 10.910913 35.980404  12.91828  18.31117       40
           3 1996   22649.29 26  49 10.916512  39.85423  12.04869 16.765247       40
           3 1997   12711.49 27  49 10.920675   40.7877 13.004317  12.46151       40
           3 1998   17493.82 28  51     11.82     40.12 12.703333   11.9707       40
           3 1999    11481.1 29  57  11.79643  41.06429 13.677976  9.240684       40
           3 2000          . 30   .         .         .         .         .       40
           3 2001          . 31   .         .         .         .         .       40
           3 2002          . 32   .         .         .         .         .       40
           4 1991          . 43  29  9.552721  48.22789  9.977395   54.4519 41.14286
           4 1992 -445.51715 44  17 10.360745   45.5435 12.703853  57.17431       40
           4 1993          . 45  24  11.08179  37.79576  8.236043  53.40769       40
           4 1994          . 46   .         .         .         .         .        .
           4 1995          . 47   .         .         .         .         .        .
           4 1996          .  .   .         .         .         .         .        .
           4 1997          .  .   .         .         .         .         .        .
           4 1998          .  .   .         .         .         .         .        .
           4 1999          .  .   .         .         .         .         .        .
           4 2000          .  .   .         .         .         .         .        .
           4 2001          .  .   .         .         .         .         .        .
           4 2002          .  .   .         .         .         .         .        .
           5 1991  1446.3203 16  24 11.738095  39.89193 11.991745  79.77148  38.8125
           5 1992   96978.46 17  37  14.15744 36.202717 14.564607   59.6791       40
           5 1993 -18473.496 18  13  11.81212  44.81212  15.11717  59.87683       40
           5 1994          . 19   .         .         .         .         .        .
           5 1995          . 20   .         .         .         .         .        .
           5 1996          .  .   .         .         .         .         .        .
           5 1997          .  .   .         .         .         .         .        .
           5 1998          .  .   .         .         .         .         .        .
           5 1999          .  .   .         .         .         .         .        .
           5 2000          .  .   .         .         .         .         .        .
           5 2001          .  .   .         .         .         .         .        .
           5 2002          .  .   .         .         .         .         .        .
           6 1991 -38061.063 11  34 11.791125   40.1627  4.813372  99.88208     43.8
           6 1992  -41615.83 12  48 10.541832   35.3973  9.313103  61.13635       40
           6 1993  -23089.84 13  31 10.880552  38.68747  7.627473  64.67123 53.71429
           6 1994          . 14   .         .         .         .         .        .
           6 1995          . 15   .         .         .         .         .        .
           6 1996          .  .   .         .         .         .         .        .
           6 1997          .  .   .         .         .         .         .        .
           6 1998          .  .   .         .         .         .         .        .
           6 1999          .  .   .         .         .         .         .        .
           6 2000          .  .   .         .         .         .         .        .
           6 2001          .  .   .         .         .         .         .        .
           6 2002          .  .   .         .         .         .         .        .
           7 1991   9589.297 17 120 12.954382 35.912365  8.947179  108.7541     40.8
           7 1992  33806.844 18  91 11.207092   47.6937 17.281984  50.34266       40
           7 1993          . 19  76      9.88  42.58667 17.668001  39.95833       40
           7 1994   -12674.7 20  81 10.504258  39.15688 17.739515 21.079933       40
           7 1995   11004.76 21  86 10.705798   41.7855 15.157344 16.695282       40
           7 1996          .  .   .         .         .         .         .        .
           7 1997          .  .   .         .         .         .         .        .
           7 1998          .  .   .         .         .         .         .        .
           7 1999          .  .   .         .         .         .         .        .
           7 2000          .  .   .         .         .         .         .        .
           7 2001          .  .   .         .         .         .         .        .
           7 2002          .  .   .         .         .         .         .        .
           8 1991  12233.913 19  96 13.982707   38.4802  5.046356 198.17105 44.33333
           8 1992   91592.96 20  72 12.580282 27.739906  5.489906  83.85126 49.88889
           8 1993   1363.718 21 101 13.424242   31.9798  2.486532  47.14246     47.7
           8 1994  36860.363 22 101        15        31       .75 100.41112 45.44444
           8 1995   49416.77 23 111 11.186363  39.14545  7.030682  29.03164 41.55556
           8 1996   66610.02 24 103 10.620837  40.79272  7.967431  21.01969     50.6
           8 1997    50871.2 25 104  10.88835  41.57281  8.455165  21.59709 50.28571
           8 1998  30795.076 26 117 10.478448  40.80676  10.37021 12.345554 47.85714
           8 1999  19971.895 27 114 10.452803  41.88422 11.412487 13.338804 47.85714
           8 2000  11822.936 28 108         .  42.57632 11.824507  5.980361     41.6
           8 2001   4215.049 29 108         .  39.55841 10.750584  6.143687 41.33333
           8 2002  2192.8323 30 124         .  44.90515 13.984643  7.355155 41.33333
          10 1991  -76.12212 28  11       9.3        31         9  29.40019     55.2
          10 1992   253.5513 29  16 10.033334      34.8 14.633333  94.32981     55.5
          10 1993          . 30  19       8.9      31.4     10.95  41.73991     61.2
          10 1994  113.99525 31  13      9.25 24.333334      6.25  23.28783       66
          10 1995   103.1704 32  16  9.639999  25.41778  7.373333 15.604102       66
          10 1996   4.979248 33   6       8.4        31       8.8 14.747324 58.33333
          10 1997      4.373 34   7       8.4        32       9.8   9.75822 58.33333
          10 1998  34.580616 35  13       7.5     43.75   19.8125  8.183615       48
          10 1999  20.861155 36  10       7.5     44.75   20.8125  9.991505       48
          10 2000  129.64243 37   9         .  51.71429  28.02381  6.984901       58
          10 2001  126.12746 38   9         .  52.71429  29.02381  6.267991       58
          10 2002  112.76437 39   9         .  53.71429  30.02381  5.028958       58
          11 1991  271.86472  4  17    11.125  20.09375       1.5  20.95572     51.6
          11 1992  102.20673  5  27 10.096154 21.173077   1.59375 16.307129       48
          11 1993  24.015545  6  27  9.117806 22.851183 2.9114995  11.87346 47.27273
          11 1994   352.2015  7  31  9.766666 22.333334 2.0916667 10.893758       50
          end
          format %ty year



          • #6
            OK. This helps a lot.

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input float(firm year newprof2 fmage worker edurwgt agewgt tenwgt ernbtwgtusd hours)
             1 1991  29547.967 16  31 12.546032  37.59127    8.2209 71.778275 40.72727
             1 1992 -1611.5938 17  28 12.359098  33.39721  7.511809   55.4053       40
             1 1993  119.15405 18  29 11.533333  38.68095   6.44246  49.64608     44.4
             1 1994          . 19   .         .         .         .         .        .
             1 1995          . 20   .         .         .         .         .        .
             1 1996          .  .   .         .         .         .         .        .
             1 1997          .  .   .         .         .         .         .        .
             1 1998          .  .   .         .         .         .         .        .
             1 1999          .  .   .         .         .         .         .        .
             1 2000          .  .   .         .         .         .         .        .
             1 2001          .  .   .         .         .         .         .        .
             1 2002          .  .   .         .         .         .         .        .
             3 1991   12617.12 21  25  9.888889  44.27778  13.38889 111.91695       41
             3 1992   27475.92 22  32 11.064545 37.226696  9.019192  65.71113       40
             3 1993  38332.508 23  40 10.615385  29.95513  6.455662 65.383385       42
             3 1994  26311.254 24  36 10.910076 34.991447 11.937007  29.04348       40
             3 1995  19631.766 25  36 10.910913 35.980404  12.91828  18.31117       40
             3 1996   22649.29 26  49 10.916512  39.85423  12.04869 16.765247       40
             3 1997   12711.49 27  49 10.920675   40.7877 13.004317  12.46151       40
             3 1998   17493.82 28  51     11.82     40.12 12.703333   11.9707       40
             3 1999    11481.1 29  57  11.79643  41.06429 13.677976  9.240684       40
             3 2000          . 30   .         .         .         .         .       40
             3 2001          . 31   .         .         .         .         .       40
             3 2002          . 32   .         .         .         .         .       40
             4 1991          . 43  29  9.552721  48.22789  9.977395   54.4519 41.14286
             4 1992 -445.51715 44  17 10.360745   45.5435 12.703853  57.17431       40
             4 1993          . 45  24  11.08179  37.79576  8.236043  53.40769       40
             4 1994          . 46   .         .         .         .         .        .
             4 1995          . 47   .         .         .         .         .        .
             4 1996          .  .   .         .         .         .         .        .
             4 1997          .  .   .         .         .         .         .        .
             4 1998          .  .   .         .         .         .         .        .
             4 1999          .  .   .         .         .         .         .        .
             4 2000          .  .   .         .         .         .         .        .
             4 2001          .  .   .         .         .         .         .        .
             4 2002          .  .   .         .         .         .         .        .
             5 1991  1446.3203 16  24 11.738095  39.89193 11.991745  79.77148  38.8125
             5 1992   96978.46 17  37  14.15744 36.202717 14.564607   59.6791       40
             5 1993 -18473.496 18  13  11.81212  44.81212  15.11717  59.87683       40
             5 1994          . 19   .         .         .         .         .        .
             5 1995          . 20   .         .         .         .         .        .
             5 1996          .  .   .         .         .         .         .        .
             5 1997          .  .   .         .         .         .         .        .
             5 1998          .  .   .         .         .         .         .        .
             5 1999          .  .   .         .         .         .         .        .
             5 2000          .  .   .         .         .         .         .        .
             5 2001          .  .   .         .         .         .         .        .
             5 2002          .  .   .         .         .         .         .        .
             6 1991 -38061.063 11  34 11.791125   40.1627  4.813372  99.88208     43.8
             6 1992  -41615.83 12  48 10.541832   35.3973  9.313103  61.13635       40
             6 1993  -23089.84 13  31 10.880552  38.68747  7.627473  64.67123 53.71429
             6 1994          . 14   .         .         .         .         .        .
             6 1995          . 15   .         .         .         .         .        .
             6 1996          .  .   .         .         .         .         .        .
             6 1997          .  .   .         .         .         .         .        .
             6 1998          .  .   .         .         .         .         .        .
             6 1999          .  .   .         .         .         .         .        .
             6 2000          .  .   .         .         .         .         .        .
             6 2001          .  .   .         .         .         .         .        .
             6 2002          .  .   .         .         .         .         .        .
             7 1991   9589.297 17 120 12.954382 35.912365  8.947179  108.7541     40.8
             7 1992  33806.844 18  91 11.207092   47.6937 17.281984  50.34266       40
             7 1993          . 19  76      9.88  42.58667 17.668001  39.95833       40
             7 1994   -12674.7 20  81 10.504258  39.15688 17.739515 21.079933       40
             7 1995   11004.76 21  86 10.705798   41.7855 15.157344 16.695282       40
             7 1996          .  .   .         .         .         .         .        .
             7 1997          .  .   .         .         .         .         .        .
             7 1998          .  .   .         .         .         .         .        .
             7 1999          .  .   .         .         .         .         .        .
             7 2000          .  .   .         .         .         .         .        .
             7 2001          .  .   .         .         .         .         .        .
             7 2002          .  .   .         .         .         .         .        .
             8 1991  12233.913 19  96 13.982707   38.4802  5.046356 198.17105 44.33333
             8 1992   91592.96 20  72 12.580282 27.739906  5.489906  83.85126 49.88889
             8 1993   1363.718 21 101 13.424242   31.9798  2.486532  47.14246     47.7
             8 1994  36860.363 22 101        15        31       .75 100.41112 45.44444
             8 1995   49416.77 23 111 11.186363  39.14545  7.030682  29.03164 41.55556
             8 1996   66610.02 24 103 10.620837  40.79272  7.967431  21.01969     50.6
             8 1997    50871.2 25 104  10.88835  41.57281  8.455165  21.59709 50.28571
             8 1998  30795.076 26 117 10.478448  40.80676  10.37021 12.345554 47.85714
             8 1999  19971.895 27 114 10.452803  41.88422 11.412487 13.338804 47.85714
             8 2000  11822.936 28 108         .  42.57632 11.824507  5.980361     41.6
             8 2001   4215.049 29 108         .  39.55841 10.750584  6.143687 41.33333
             8 2002  2192.8323 30 124         .  44.90515 13.984643  7.355155 41.33333
            10 1991  -76.12212 28  11       9.3        31         9  29.40019     55.2
            10 1992   253.5513 29  16 10.033334      34.8 14.633333  94.32981     55.5
            10 1993          . 30  19       8.9      31.4     10.95  41.73991     61.2
            10 1994  113.99525 31  13      9.25 24.333334      6.25  23.28783       66
            10 1995   103.1704 32  16  9.639999  25.41778  7.373333 15.604102       66
            10 1996   4.979248 33   6       8.4        31       8.8 14.747324 58.33333
            10 1997      4.373 34   7       8.4        32       9.8   9.75822 58.33333
            10 1998  34.580616 35  13       7.5     43.75   19.8125  8.183615       48
            10 1999  20.861155 36  10       7.5     44.75   20.8125  9.991505       48
            10 2000  129.64243 37   9         .  51.71429  28.02381  6.984901       58
            10 2001  126.12746 38   9         .  52.71429  29.02381  6.267991       58
            10 2002  112.76437 39   9         .  53.71429  30.02381  5.028958       58
            11 1991  271.86472  4  17    11.125  20.09375       1.5  20.95572     51.6
            11 1992  102.20673  5  27 10.096154 21.173077   1.59375 16.307129       48
            11 1993  24.015545  6  27  9.117806 22.851183 2.9114995  11.87346 47.27273
            11 1994   352.2015  7  31  9.766666 22.333334 2.0916667 10.893758       50
            end
            format %ty year
            
            //    DECLARE PANEL DATA
            xtset firm year
            
            //    CREATE A LIST OF ALL FIRMS
            levelsof firm, local(all_firms)
            
            //    RUN THE REGRESSION
            reg newprof2 l.newprof2  worker  agewgt tenwgt ernbtwgtusd hours, vce(cluster firm)
            
            //    IDENTIFY THE FIRMS THAT ARE INCLUDED IN THE REGRESSION
            levelsof firm if e(sample), local(estimation_sample)
            
            //    IDENTIFY THE FIRMS THAT WERE EXCLUDED
            local missing_firms: list all_firms - estimation_sample
            display `"`missing_firms'"'
            
            //    LIST OBSERVATIONS FROM EXCLUDED FIRMS
            sort firm year
            foreach m of local missing_firms {
                list firm year newprof2 worker agewgt tenwgt ernbtwgtusd hours ///
                    if firm == `m', noobs clean
            }
            Note: I dropped some variables from your regression equation because they don't appear in your example data, and I needed to run the regression. But the logic is unaffected. (In fact, any firm that is omitted from this smaller regression would automatically also be omitted from the full one, because adding variables can only remove more observations.)

            If you run this, you will see that firm 4 does not make it into the estimation sample. Looking at the observations in firm 4, you can see that all but one are missing at least one of the regression variables. At first glance, the observation for firm 4 in year 1992 looks like it should be included, but your regression includes a lag of newprof2 as one of the predictor variables. For that observation, l.newprof2 is the value of newprof2 in 1991, which is missing. So every one of the observations in firm 4 is excluded. I am confident that if you run this code on your full data, with the full regression command, you will find similar results that account for all 11 missing firms.
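The effect of the lag can be made visible with a short sketch. It uses only the 1991 to 1993 rows of firms 1 and 4 from the example data above; lag_newprof2 and usable are hypothetical helper names introduced for illustration:

```stata
* Rows for firms 1 and 4 (1991-1993) taken from the example data above.
clear
input float(firm year newprof2)
1 1991  29547.967
1 1992 -1611.5938
1 1993  119.15405
4 1991          .
4 1992 -445.51715
4 1993          .
end

xtset firm year

* Materialize the lag so its missing values can be inspected directly
generate lag_newprof2 = L.newprof2

* An observation is usable only if both the outcome and its lag are present
generate byte usable = !missing(newprof2, lag_newprof2)

list firm year newprof2 lag_newprof2 usable, noobs sepby(firm)
```

Firm 1 keeps its 1992 and 1993 observations, while firm 4 keeps none: its only non-missing newprof2 (1992) has a missing lag.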



            • #7
              Clyde-

              Sorry, it was getting late. I had a few typos in my variables. But nonetheless, this worked.

              I am very grateful for your help.

              Thank you so much!

              Jason
