  • #16
    With probit, there are convergence issues, as Carlo Lazzaro points out, because estimation is by ML, as well as the possibility of perfect prediction. When you include state dummies, you increase the model degrees of freedom (and so reduce the residual degrees of freedom). All of this points to a sample-size problem (i.e., your sample is not large enough).



    • #17
      Thank you, Andrew. It gives clarity.



      • #18
        Sorry to bother you again, Andrew. I understood the small sample size issue with -probit-. But why does the -bootstrap- not work in the case of -reg- but work in the case of -reghdfe-? Whether it is i.state in -reg- or a(state) in -reghdfe-, we lose degrees of freedom in both cases.



        • #19
          Varsha:
          does the following toy-example mirror the issue you're experiencing?
          Code:
          . use "https://www.stata-press.com/data/r18/nlswork.dta"
          (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
          
          
          . reg ln_wage c.age##c.age, vce(cl idcode)
          
          Linear regression                               Number of obs     =     28,510
                                                          F(2, 4709)        =     701.39
                                                          Prob > F          =     0.0000
                                                          R-squared         =     0.0882
                                                          Root MSE          =     .45654
          
                                       (Std. err. adjusted for 4,710 clusters in idcode)
          ------------------------------------------------------------------------------
                       |               Robust
               ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                   age |   .0855891   .0046943    18.23   0.000     .0763861    .0947922
                       |
           c.age#c.age |  -.0010982   .0000798   -13.76   0.000    -.0012547   -.0009417
                       |
                 _cons |   .1647917   .0655679     2.51   0.012      .036248    .2933354
          ------------------------------------------------------------------------------
          
          . bootstrap, reps(50): reg ln_wage c.age##c.age, vce(cl idcode )
          (running regress on estimation sample)
          
          Bootstrap replications (50): repeated time values within panel
          the most likely cause for this error is misspecifying the cluster(), idcluster(), or group() option
          r(451);
          
          . xtset idcode year
          
          Panel variable: idcode (unbalanced)
           Time variable: year, 68 to 88, but with gaps
                   Delta: 1 unit
          
          . reghdfe ln_wage c.age##c.age , a(idcode) vce(cl idcode)
          (dropped 551 singleton observations)
          (MWFE estimator converged in 1 iterations)
          
          HDFE Linear regression                            Number of obs   =     27,959
          Absorbing 1 HDFE group                            F(   2,   4158) =     507.41
          Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                            R-squared       =     0.6564
                                                            Adj R-squared   =     0.5963
                                                            Within R-sq.    =     0.1087
          Number of clusters (idcode)  =      4,159         Root MSE        =     0.3025
          
                                       (Std. err. adjusted for 4,159 clusters in idcode)
          ------------------------------------------------------------------------------
                       |               Robust
               ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                   age |   .0539076   .0043071    12.52   0.000     .0454634    .0623519
                       |
           c.age#c.age |  -.0005973    .000072    -8.30   0.000    -.0007384   -.0004562
                       |
                 _cons |    .641693   .0624556    10.27   0.000     .5192465    .7641394
          ------------------------------------------------------------------------------
          
          Absorbed degrees of freedom:
          -----------------------------------------------------+
           Absorbed FE | Categories  - Redundant  = Num. Coefs |
          -------------+---------------------------------------|
                idcode |      4159        4159           0    *|
          -----------------------------------------------------+
          * = FE nested within cluster; treated as redundant for DoF computation
          
          . bootstrap, reps(50): reghdfe ln_wage c.age##c.age , a(idcode)
          (running reghdfe on estimation sample)
          
          Bootstrap replications (50): .........10.........20.........30.........40.........50 done
          
          HDFE Linear regression                            Number of obs   =     27,959
          Absorbing 1 HDFE group                            Wald chi2(2)    =    1952.41
                                                            Prob > chi2     =     0.0000
                                                            R-squared       =     0.6564
                                                            Adj R-squared   =     0.5963
                                                            Within R-sq.    =     0.1087
                                                            Root MSE        =     0.3025
          
          ------------------------------------------------------------------------------
                       |   Observed   Bootstrap                         Normal-based
               ln_wage | coefficient  std. err.      z    P>|z|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                   age |   .0539076   .0032862    16.40   0.000     .0474668    .0603485
                       |
           c.age#c.age |  -.0005973   .0000545   -10.97   0.000    -.0007041   -.0004905
                       |
                 _cons |    .641693   .0480176    13.36   0.000     .5475801    .7358058
          ------------------------------------------------------------------------------
          
          Absorbed degrees of freedom:
          -----------------------------------------------------+
           Absorbed FE | Categories  - Redundant  = Num. Coefs |
          -------------+---------------------------------------|
                idcode |      4159           0        4159     |
          -----------------------------------------------------+
          
          .
          In addition (and probably a bit off-topic), over and above the -bootstrap- issues, your -regress- and -reghdfe- code actually specifies two different regressions. To obtain the same results, the panel fixed effect should be included as a predictor on the right-hand side of the -regress- equation. This way, the sample estimates (but not the standard errors) of the shared coefficients will be the same:
          Code:
          . reg ln_wage c.age##c.age i.idcode i.year if idcode<=3, vce(cl idcode )
          
          Linear regression                               Number of obs     =         39
                                                          F(3, 2)           =          .
                                                          Prob > F          =          .
                                                          R-squared         =     0.8139
                                                          Root MSE          =     .21943
          
                                           (Std. err. adjusted for 3 clusters in idcode)
          ------------------------------------------------------------------------------
                       |               Robust
               ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                   age |   .0773019   .0106911     7.23   0.019     .0313017    .1233021
                       |
           c.age#c.age |  -.0045583    .002264    -2.01   0.182    -.0142995    .0051828
                       |
                idcode |
                    2  |  -.4183815   .0165036   -25.35   0.002    -.4893909   -.3473721
                    3  |   .6579353   .7215294     0.91   0.458    -2.446555    3.762426
                       |
                  year |
                   69  |   .3367906   .0914392     3.68   0.066    -.0566406    .7302218
                   70  |   .2089384   .2867011     0.73   0.542    -1.024637    1.442514
                   71  |   .3144116   .1619035     1.94   0.192     -.382203    1.011026
                   72  |   .5888124   .4958888     1.19   0.357    -1.544825     2.72245
                   73  |   .8912873   .5219448     1.71   0.230     -1.35446    3.137034
                   75  |   1.246958   .6073839     2.05   0.176    -1.366404     3.86032
                   77  |   1.560689   .8626802     1.81   0.212    -2.151125    5.272502
                   78  |   1.941522   1.278416     1.52   0.268    -3.559059    7.442103
                   80  |    2.34498   1.525965     1.54   0.264    -4.220718    8.910678
                   82  |   2.698954   1.663018     1.62   0.246    -4.456435    9.854344
                   83  |   2.994437    1.81452     1.65   0.241    -4.812813    10.80169
                   85  |   3.538578   2.210833     1.60   0.251    -5.973868    13.05102
                   87  |   3.965153   2.460506     1.61   0.248    -6.621548    14.55185
                   88  |    4.40786   2.688929     1.64   0.243    -7.161667    15.97739
                       |
                 _cons |   1.341224   .1489003     9.01   0.012     .7005575     1.98189
          ------------------------------------------------------------------------------
          
          . reghdfe ln_wage c.age##c.age if idcode<=3, a(idcode year) vce(cl idcode )
          (dropped 2 singleton observations)
          (MWFE estimator converged in 4 iterations)
          
          HDFE Linear regression                            Number of obs   =         37
          Absorbing 2 HDFE groups                           F(   2,      2) =     387.46
          Statistics robust to heteroskedasticity           Prob > F        =     0.0026
                                                            R-squared       =     0.8112
                                                            Adj R-squared   =     0.6423
                                                            Within R-sq.    =     0.4300
          Number of clusters (idcode)  =          3         Root MSE        =     0.2251
          
                                           (Std. err. adjusted for 3 clusters in idcode)
          ------------------------------------------------------------------------------
                       |               Robust
               ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                   age |   .0773019   .0099217     7.79   0.016     .0346122    .1199915
                       |
           c.age#c.age |  -.0045583    .002101    -2.17   0.162    -.0135984    .0044817
                       |
                 _cons |   3.523181   1.558736     2.26   0.152     -3.18352    10.22988
          ------------------------------------------------------------------------------
          
          Absorbed degrees of freedom:
          -----------------------------------------------------+
           Absorbed FE | Categories  - Redundant  = Num. Coefs |
          -------------+---------------------------------------|
                idcode |         3           3           0    *|
                  year |        13           0          13     |
          -----------------------------------------------------+
          * = FE nested within cluster; treated as redundant for DoF computation
          Last edited by Carlo Lazzaro; 02 Sep 2024, 01:58.
          Kind regards,
          Carlo
          (Stata 19.0)



          • #20
            Carlo, I have cross-sectional data, so I'm not facing the issue you have tried to show with the example. Also, I have state fixed effects in -reg- as well. I have included them in the macro along with other predictors.



            • #21
              Originally posted by Varsha Vaishnav
              Sorry to bother you again, Andrew. I understood the small sample size issue with -probit-. But why does the -bootstrap- not work in the case of -reg- but work in the case of -reghdfe-? Whether it is i.state in -reg- or a(state) in -reghdfe-, we lose degrees of freedom in both cases.
              I already explained this in #16. The residual degrees of freedom decrease if you include the fixed effects explicitly as dummies, and with your sample size that decrease causes a problem. The formula is \(DF = N-k-1\), where \(N\) is the number of observations and \(k\) is the number of slope parameters. So \(k\) is larger with the dummies included, although technically the estimator still adjusts the standard errors for the indicators when they are absorbed instead.
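
              To make the arithmetic concrete, here is a worked example with made-up numbers (they are assumptions for illustration, not taken from this thread): suppose \(N = 500\) observations, 10 regressors, and 30 states, so that 29 state dummies enter the model.

              \[
              DF_{\text{without dummies}} = 500 - 10 - 1 = 489,
              \qquad
              DF_{\text{with dummies}} = 500 - (10 + 29) - 1 = 460
              \]

              The fewer observations you have per state, the larger the share of the sample these extra parameters use up.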



              The relevant comparisons from my suggested commands in #14 are

              Originally posted by Andrew Musau
              bootstrap, reps(50) seed(09012024): regress y2 y1_hat $controls2 i.state if phase, vce(cl cluster)
              bootstrap, reps(50) seed(09012024): reghdfe y2 y1_hat $controls2 i.state if phase, noabsorb vce(cl cluster)
              Here, you face the same issue with reghdfe as with regress. So the estimation command is not the issue. areg allows you to run linear regression absorbing indicators, and is the analog of reghdfe absorbing indicators.
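
              A minimal sketch of that equivalence, assuming Stata's shipped auto dataset in place of your data (rep78 merely stands in for state, and price, mpg, and weight are placeholders for your outcome and controls). It fits the same linear model with explicit indicators, with areg, and with reghdfe, and checks that the slope on mpg agrees across the three:

              ```stata
              * Illustration on the shipped auto data, not the poster's data;
              * rep78 is a stand-in for the state identifier.
              sysuse auto, clear

              * (1) Fixed effects entered explicitly as indicator variables
              regress price mpg weight i.rep78
              scalar b_reg  = _b[mpg]
              scalar df_reg = e(df_r)

              * (2) Same model with the indicators absorbed by official -areg-
              areg price mpg weight, absorb(rep78)
              scalar b_areg  = _b[mpg]
              scalar df_areg = e(df_r)

              * (3) Same model with community-contributed -reghdfe-
              *     (requires: ssc install ftools, then ssc install reghdfe)
              reghdfe price mpg weight, absorb(rep78)
              scalar b_hdfe = _b[mpg]

              * The slope on mpg should agree up to numerical precision
              assert reldif(b_reg, b_areg) < 1e-6
              assert reldif(b_reg, b_hdfe) < 1e-6

              * Residual degrees of freedom reported by -regress- and -areg-
              display "df_r regress = " df_reg "   df_r areg = " df_areg
              ```

              If the assert lines pass silently, the three commands are estimating the same slope. Whether the indicators are listed or absorbed then changes only how the output is presented and how the degrees of freedom are counted.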
              Last edited by Andrew Musau; 02 Sep 2024, 08:21.



              • #22
                Thank you, Andrew. I didn't get the point before. It's clear now.

