  • fixed effect model in xtivreg2 and singletons

    I am running a fixed-effects model using xtivreg2 in Stata, where the panel variable is the firm and the time variable is the year. The data come from a survey that tracks firms over the years through successive waves. In my original dataset, every value of the panel variable idpanel appears at least twice: no firm is observed for only one year, and idpanel is never missing. The command is:

    xtivreg2 depvar var1 var2 (var3= instrument), fe ro

    What happens is that xtivreg2 gives me a warning:

    Warning - singleton groups detected. 4552 observation(s) not used.

    As a result, the number of observations is 17489 in both the first-stage and the second-stage regressions, so the command is already dropping observations at the first stage. I then tried to replicate the first-stage regression using:

    xtreg var3 var1 var2 instrument, fe ro

    The number of observations, 26,370, is considerably higher than in the first stage reported by xtivreg2, and the coefficients also differ.

    After some exploring, I noticed the reason: in the first stage, xtivreg2 drops every observation that has a missing value in any covariate or in the dependent variable. It then recounts the observations per firm. Some firms have lost all but one of their years to those missing values, so they have become singletons.

    My question is therefore: why? In theory, replicating the first-stage regression with xtreg should give the same coefficients as the first stage computed within xtivreg2, shouldn't it? In my case it does not. Thank you all in advance.

    See also my post on Stack Overflow:
    https://stackoverflow.com/questions/...34317_58990690
    Last edited by Pietro Bomprezzi; 22 Nov 2019, 03:18. Reason: posting Stack Overflow cross-post URL

  • #2
    Please give a URL for your cross-posting on Stack Overflow (see our policy on cross-posting in the FAQ Advice).



    • #3
      Pietro:
      welcome to this forum.
      Stata applies listwise deletion to observations with missing values in any variable.
      Hence, it is strange that you do not see the same reduction in the number of observations when you run -xtreg-.
      To have a more precise idea of what's going on with your estimates, as per FAQ please post (within CODE delimiters) what you typed and what Stata gave you back. Thanks.
      Kind regards,
      Carlo
      (Stata 18.0 SE)
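To see where listwise deletion costs observations, a check along the following lines may help. It is only a sketch: the variable names are the placeholders from the first post, and n_missing is a name made up here for illustration.

```stata
* Sketch: which variables cost observations under listwise deletion?
* (depvar, var1, var2, var3, instrument are the placeholder names from post #1)

* Missing-value counts, variable by variable
misstable summarize depvar var1 var2 var3 instrument

* Number of missing model variables in each observation (n_missing is a hypothetical name)
egen byte n_missing = rowmiss(depvar var1 var2 var3 instrument)
tabulate n_missing

* Observations that survive listwise deletion
count if n_missing == 0
```

If the last count is well below the number of observations in the data, the gap between the two regressions most likely comes from which variables each command includes in its deletion step.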




        • #5
          It appears that Pietro has edited the original post and provided the link to his cross-post at Stack Overflow. As Carlo states, the issue here has to do with listwise deletion of missing values, so

          Code:
          xtivreg2 depvar var1 var2 (var3= instrument), fe ro
          implies the following first-stage fixed effects regression

          Code:
          xtreg var3 instrument var1 var2 if !missing(depvar), fe ro
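The sample that xtivreg2 ends up using can also be rebuilt by hand, which makes the singleton step visible. The following is a minimal sketch under stated assumptions: the time variable is assumed to be called year, and complete, n_used and in_sample are names invented for illustration.

```stata
* Sketch: rebuild the xtivreg2 estimation sample step by step
* (year is an assumed name for the time variable)
xtset idpanel year

* 1. Listwise deletion: flag observations complete on every variable in the model
generate byte complete = !missing(depvar, var1, var2, var3, instrument)

* 2. Count usable observations per firm; a firm left with one is a singleton
bysort idpanel (year): egen n_used = total(complete)
generate byte in_sample = complete & n_used > 1

* 3. How many observations are lost at each step
count if !complete
count if complete & n_used == 1

* 4. First-stage regression on that sample
xtreg var3 instrument var1 var2 if in_sample, fe vce(robust)
```

The coefficients should agree with the first stage of xtivreg2 if the samples coincide. The standard errors need not match, since the two commands may not define robust in the same way.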



          • #6
            Hello Carlo. I agree with you in principle. In fact, it is exactly when I do a manual listwise deletion (drop if missing(myvars)) of all the variables included in my regressions that I get an xtreg that is identical to the first stage of my ivreg2.

            Oddly enough, when I went back to my dataset to reproduce the problem so that I could show the code and output, everything worked as it should and the problem no longer arose. I cannot explain why. I think I need to check again with my coauthor to see whether it is a communication issue.

            As a new member of this forum, would it be more correct for me to delete this thread and post again if I can pinpoint the issue, or should I leave it as is? Thanks everyone.



            • #7
              Pietro:
              thanks for your feedback.
              The rule is to do our best to close the thread we started with all the details we consider useful for others who may come across the same problem in the future.
              Kind regards,
              Carlo
              (Stata 18.0 SE)



              • #8
                Threads can’t be deleted unilaterally. This is explained in the FAQ Advice.



                • #9
                  Dear Carlo,

                  I have resolved the issue. It arose because we were using two different sets of industry-year dummies in the two regressions. In xtivreg2, we used industry-year dummies generated beforehand. In xtreg, we instead used the factor-variable notation:
                  Code:
                  i.industry#i.year
                  These two sets of dummies were different (I don't know why, but that is another question), so listwise deletion removed different observations in the two regressions.

                  Thanks again for the suggestion and for the enlightenment regarding listwise deletion.

                  Regards
                  Pietro
                  Last edited by Pietro Bomprezzi; 22 Nov 2019, 07:19.



                  • #10
                    Pietro:
                    thanks for sharing further details.
                    In all likelihood, the issue hinges on the different years used as the reference category in the two codes (i.e., the category omitted to avoid the dummy-variable trap).
                    Kind regards,
                    Carlo
                    (Stata 18.0 SE)
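One way to check whether the hand-made dummies and the factor-variable notation treat the same observations as usable is sketched below. The names are assumptions: ind_year_* stands for whatever stub the pre-generated dummies carry, and cell and n_dummy_missing are invented for illustration. Whether this reproduces what happened in Pietro's data cannot be told from the thread.

```stata
* Sketch: do the pre-built dummies and i.industry#i.year agree on usable observations?
* (ind_year_* is an assumed stub for the dummies generated beforehand)

* Industry-year cell; missing whenever industry or year is missing,
* which are the observations factor-variable notation would exclude
egen long cell = group(industry year)

* Number of pre-built dummies that are missing in each observation
egen int n_dummy_missing = rowmiss(ind_year_*)

* Observations that only one of the two approaches would drop
count if missing(cell) != (n_dummy_missing > 0)
list idpanel year industry if missing(cell) != (n_dummy_missing > 0)
```

A nonzero count would mean that the two specifications lose different observations to listwise deletion.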



                    • #11
                      Hello all,

                      I have to use lagged explanatory variables in xtivreg2, and this produces singletons that I am not able to identify. When I use the contemporaneous values of the explanatory variables, no singletons are detected: the number of observations and groups in the FE estimation equals the number in the xtivreg2 estimation. This changes when I use the lagged explanatory variables: xtivreg2 drops singletons, while the FE estimation reports more observations and groups. I could check manually, find the firms that lack a lagged value of some variable (and so become singletons), and delete them by hand. But given the large number of variables, I would like to know whether Stata offers a way to detect them. Even without identifying them, I would like to run the FE estimation on the same sample that enters xtivreg2, and I have not been able to recover that sample. Any help would be greatly appreciated.

                      Here are the code and results (shown with only a subset of the variables used):

                      Fixed Effects Estimation:
                      Code:
                      . eststo: xtreg TobinsQ c.L1.BLEV##c.L1.BLEV L1.INV, fe vce (robust)
                      
                      Fixed-effects (within) regression               Number of obs     =      4,682
                      Group variable: Unique_Ide~r                    Number of groups  =        503
                      
                      R-sq:                                           Obs per group:
                           within  = 0.0222                                         min =          1
                           between = 0.0973                                         avg =        9.3
                           overall = 0.0746                                         max =         16
                      
                                                                      F(3,502)          =       6.40
                      corr(u_i, Xb)  = 0.1395                         Prob > F          =     0.0003
                      
                                             (Std. Err. adjusted for 503 clusters in Unique_Identifier)
                      ---------------------------------------------------------------------------------
                                      |               Robust
                              TobinsQ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      ----------------+----------------------------------------------------------------
                                 BLEV |
                                  L1. |  -2.627809   .7501255    -3.50   0.001    -4.101581   -1.154037
                                      |
                      cL.BLEV#cL.BLEV |   2.578923   .8093329     3.19   0.002     .9888254     4.16902
                                      |
                                  INV |
                                  L1. |   .0262735   .0101926     2.58   0.010      .006248     .046299
                                      |
                                _cons |   1.600565   .1375829    11.63   0.000     1.330256    1.870874
                      ----------------+----------------------------------------------------------------
                              sigma_u |  .91934071
                              sigma_e |  .75080102
                                  rho |  .59989611   (fraction of variance due to u_i)
                      ---------------------------------------------------------------------------------
                      (est14 stored)
                      IV Estimation:
                      Code:
                      . eststo: xtivreg2 TobinsQ (L1.BLEV = L1.CVA_G L1.CVA_Square L1.NDTS L1.NDTS_Square) L1.INV , 
                      > fe robust small endog(L1.BLEV)
                      Warning - singleton groups detected.  5 observation(s) not used.
                      
                      FIXED EFFECTS ESTIMATION
                      ------------------------
                      Number of groups =       498                    Obs per group: min =         2
                                                                                     avg =       9.4
                                                                                     max =        16
                      
                      IV (2SLS) estimation
                      --------------------
                      
                      Estimates efficient for homoskedasticity only
                      Statistics robust to heteroskedasticity
                      
                                                                            Number of obs =     4677
                                                                            F(  2,  4177) =    11.43
                                                                            Prob > F      =   0.0000
                      Total (centered) SS     =  2407.372783                Centered R2   =  -0.0949
                      Total (uncentered) SS   =  2407.372783                Uncentered R2 =  -0.0949
                      Residual SS             =  2635.773598                Root MSE      =    .7944
                      
                      ------------------------------------------------------------------------------
                                   |               Robust
                           TobinsQ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                              BLEV |
                               L1. |  -3.346971   .7349992    -4.55   0.000    -4.787961   -1.905982
                                   |
                               INV |
                               L1. |    .037986   .0190524     1.99   0.046     .0006331    .0753388
                      ------------------------------------------------------------------------------
                      Underidentification test (Kleibergen-Paap rk LM statistic):            115.184
                                                                         Chi-sq(4) P-val =    0.0000
                      ------------------------------------------------------------------------------
                      Weak identification test (Cragg-Donald Wald F statistic):               47.622
                                               (Kleibergen-Paap rk Wald F statistic):         33.857
                      Stock-Yogo weak ID test critical values:  5% maximal IV relative bias    16.85
                                                               10% maximal IV relative bias    10.27
                                                               20% maximal IV relative bias     6.71
                                                               30% maximal IV relative bias     5.34
                                                               10% maximal IV size             24.58
                                                               15% maximal IV size             13.96
                                                               20% maximal IV size             10.26
                                                               25% maximal IV size              8.31
                      Source: Stock-Yogo (2005).  Reproduced by permission.
                      NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
                      ------------------------------------------------------------------------------
                      Hansen J statistic (overidentification test of all instruments):        22.444
                                                                         Chi-sq(3) P-val =    0.0001
                      -endog- option:
                      Endogeneity test of endogenous regressors:                               5.306
                                                                         Chi-sq(1) P-val =    0.0213
                      Regressors tested:    L.BLEV
                      ------------------------------------------------------------------------------
                      Instrumented:         L.BLEV
                      Included instruments: L.INV
                      Excluded instruments: L.CVA_G L.CVA_Square L.NDTS L.NDTS_Square
                      ------------------------------------------------------------------------------
                      (est15 stored)
                      Generating sample used for IV
                      Code:
                      gen IV_Sample = e(sample)
                      Using that same sample for the FE estimation, the number of observations still includes the singletons and is the same as in the previous FE estimation:

                      Code:
                      eststo: xtreg TobinsQ c.L1.BLEV##c.L1.BLEV L1.INV if IV_Sample, fe vce (robust)
                      
                      Fixed-effects (within) regression               Number of obs     =      4,682
                      Group variable: Unique_Ide~r                    Number of groups  =        503
                      
                      R-sq:                                           Obs per group:
                           within  = 0.0222                                         min =          1
                           between = 0.0973                                         avg =        9.3
                           overall = 0.0746                                         max =         16
                      
                                                                      F(3,502)          =       6.40
                      corr(u_i, Xb)  = 0.1395                         Prob > F          =     0.0003
                      
                                             (Std. Err. adjusted for 503 clusters in Unique_Identifier)
                      ---------------------------------------------------------------------------------
                                      |               Robust
                              TobinsQ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      ----------------+----------------------------------------------------------------
                                 BLEV |
                                  L1. |  -2.627809   .7501255    -3.50   0.001    -4.101581   -1.154037
                                      |
                      cL.BLEV#cL.BLEV |   2.578923   .8093329     3.19   0.002     .9888254     4.16902
                                      |
                                  INV |
                                  L1. |   .0262735   .0101926     2.58   0.010      .006248     .046299
                                      |
                                _cons |   1.600565   .1375829    11.63   0.000     1.330256    1.870874
                      ----------------+----------------------------------------------------------------
                              sigma_u |  .91934071
                              sigma_e |  .75080102
                                  rho |  .59989611   (fraction of variance due to u_i)
                      ---------------------------------------------------------------------------------
                      (est16 stored)
                      Could you please help me understand how this can be resolved?

                      many thanks and regards,
                      Mohina



                      • #12

                        Here is the dataset:

                        I can see that the variable INV has no first-period value for any firm, since it measures the change in capital expenditure. The problem is identifying exactly which firms become singletons (once lags are taken) so that I can drop them from my FE estimation.

                        Code:
                        * Example generated by -dataex-. To install: ssc install dataex
                        clear
                        input long Unique_Identifier int Year double BLEV float INV double(CVA_G NDTS)
                        1 2001  .055007051676511765           .  .5007052421569824 .014104371890425682
                        1 2002   .10846561193466187   .08465608  .4761904776096344  .02777777798473835
                        1 2003   .08745874464511871  -.01650165  .6303630471229553   .0445544570684433
                        1 2010   .09751037508249283  -.02282158  .5871369242668152 .033195022493600845
                        1 2011   .24752475321292877   .13861386  .8613861203193665   .0445544570684433
                        1 2012     .429175466299057  .019027485  .7906976938247681  .04862579330801964
                        1 2013    .4252873659133911  .011494253  .7394636273384094  .04789271950721741
                        1 2015  .014975041151046753   .03327787  .8186355829238892  .08985024690628052
                        1 2016  .012066365219652653  -.05429864  .7043740749359131  .07390648871660233
                        1 2017   .16565480828285217   .01908066 .45186468958854675  .03035559318959713
                        2 2001   .41645878553390503           .  .6230601668357849 .026731086894869804
                        2 2002    .3669382333755493 -.014112033  .6036447882652283 .027792584151029587
                        2 2003   .30021658539772034 .0034330215  .5661951303482056  .02333993837237358
                        2 2004    .2761489748954773   .06977824  .5933471322059631 .022444024682044983
                        2 2005    .3158489763736725    .1605357  .6698573231697083  .02379419468343258
                        2 2006    .2615233063697815  .065879814  .7014051675796509  .02791675552725792
                        2 2007   .24146167933940887  .031055775  .6141208410263062 .025134121999144554
                        2 2008    .2755413055419922   .03044634  .5260298848152161  .02524818293750286
                        2 2009   .30744194984436035    .1022892  .5899816751480103 .026857752352952957
                        2 2010    .2788902819156647   .04516661  .6195094585418701 .024625148624181747
                        2 2011   .27633315324783325  .031976607  .6047827005386353 .025127828121185303
                        2 2012   .25731614232063293   .09208064  .6753150820732117 .024048035964369774
                        2 2013   .22729633748531342   .05986407   .666262686252594 .026902178302407265
                        2 2014    .1844494342803955   .07157767  .6651003956794739 .027743402868509293
                        2 2015   .24228519201278687   .11656328  .6964685320854187  .04307246208190918
                        2 2016   .22154110670089722   .05725962  .7191405892372131  .03731784597039223
                        2 2017    .1777629256248474 -.007602268  .7002505660057068 .038490671664476395
                        3 2002   .24770642817020416           .   .608562707901001  .03363914415240288
                        3 2003   .11974109709262848  -.03236246  .6504854559898377  .03559870645403862
                        3 2004   .01923076994717121  -.04807692  .5608974099159241  .07051282376050949
                        3 2005  .012944984249770641  -.04854369    .53721684217453  .07119741290807724
                        3 2006   .00872093066573143 -.069767445  .5261628031730652  .06395348906517029
                        3 2008   .02278481051325798 -.005063291 .42784810066223145  .06329113990068436
                        3 2009 .0062500000931322575     -.00625 .34166666865348816   .0520833320915699
                        3 2010   .01149425283074379  -.01313629  .4761904776096344 .031198685988783836
                        3 2011   .04881450533866882   -.0027894  .3235704302787781  .00976290088146925
                        3 2012  .009208102710545063 -.007366483  .3001841604709625 .014732965268194675
                        3 2013   .02380952425301075   .02756892  .3095238208770752 .011278195306658745
                        3 2014  .027272727340459824  .025757575 .35151514410972595   .0181818176060915
                        3 2015  .012912482023239136 -.012912482 .37015780806541443  .02152080275118351
                        3 2016   .13763703405857086 -.015834348 .23142509162425995 .018270401284098625
                        3 2017 .0013908206019550562 -.016689846 .21974965929985046 .019471488893032074
                        4 2001   .05920117720961571   .24214654 .39171770215034485 .024601813405752182
                        4 2002   .07022888213396072   .08706458  .4240590035915375 .023690223693847656
                        4 2003   .05551784858107567   .04031492  .4096986651420593  .02487443946301937
                        4 2004    .1197541207075119   .10734842 .47362393140792847  .02640402317047119
                        4 2005   .15231451392173767  .015018288  .4503306448459625  .02540997415781021
                        4 2006   .18435005843639374 .0003547847  .3896521031856537 .021149108186364174
                        4 2007   .24266663193702698   .09281386 .39094796776771545 .016039978712797165
                        4 2008    .2584107220172882   .04178892   .397095263004303  .02280370332300663
                        end
                        format %ty Year
                        label values Unique_Identifier UI1
                        label def UI1 1 "21_102524", modify
                        label def UI1 2 "21_102576", modify
                        label def UI1 3 "21_102816", modify
                        label def UI1 4 "21_103261", modify

                        Thanks so much!



                        • #13
                          xtivreg2 is essentially ivreghdfe (SSC) and, like reghdfe (SSC), it automatically drops singletons, following findings from Sergio Correia's research. Singletons, as the name implies, are groups with a single observation. While they do not affect coefficient estimates in fixed-effects models, they do affect the (cluster-robust) standard errors. Your data example is not helpful because it omits some of the variables in your estimation command, but it is not difficult to illustrate how to choose a sample that excludes singletons. Below, I use reghdfe.

                          Code:
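                          webuse grunfeld, clear
                          * leave companies 9 and 10 with two years each, so one usable observation after the lag
                          drop if company>8 & time<19
                          * reghdfe drops the singleton groups automatically
                          reghdfe invest mvalue L.kstock, a(company) 
                          xtset company year
                          * xtreg keeps the singletons
                          xtreg invest mvalue L.kstock, fe
                          * count estimation-sample observations per company, keep companies with more than one
                          bys company: egen count= total(e(sample))
                          bys company: egen sample= max(count>1)
                          xtreg  invest mvalue L.kstock if sample, fe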
                          Res.:

                          Code:
                          . reghdfe invest mvalue L.kstock, a(company) 
                          (dropped 2 singleton observations)
                          (MWFE estimator converged in 1 iterations)
                          
                          HDFE Linear regression                            Number of obs   =        152
                          Absorbing 1 HDFE group                            F(   2,    142) =     178.03
                                                                            Prob > F        =     0.0000
                                                                            R-squared       =     0.9290
                                                                            Adj R-squared   =     0.9245
                                                                            Within R-sq.    =     0.7149
                                                                            Root MSE        =    64.9083
                          
                          ------------------------------------------------------------------------------
                                invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                          -------------+----------------------------------------------------------------
                                mvalue |   .1256025   .0152628     8.23   0.000     .0954307    .1557742
                                       |
                                kstock |
                                   L1. |   .3449293   .0251861    13.70   0.000     .2951412    .3947174
                                       |
                                 _cons |   -82.9314    20.0202    -4.14   0.000    -122.5075   -43.35526
                          ------------------------------------------------------------------------------
                          
                          Absorbed degrees of freedom:
                          -----------------------------------------------------+
                           Absorbed FE | Categories  - Redundant  = Num. Coefs |
                          -------------+---------------------------------------|
                               company |         8           0           8     |
                          -----------------------------------------------------+
                          
                          . 
                          
                          . 
                          . xtreg invest mvalue L.kstock, fe
                          
                          Fixed-effects (within) regression               Number of obs     =        154
                          Group variable: company                         Number of groups  =         10
                          
                          R-sq:                                           Obs per group:
                               within  = 0.7149                                         min =          1
                               between = 0.8032                                         avg =       15.4
                               overall = 0.7824                                         max =         19
                          
                                                                          F(2,142)          =     178.03
                          corr(u_i, Xb)  = -0.2566                        Prob > F          =     0.0000
                          
                          ------------------------------------------------------------------------------
                                invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                          -------------+----------------------------------------------------------------
                                mvalue |   .1256025   .0152628     8.23   0.000     .0954307    .1557742
                                       |
                                kstock |
                                   L1. |   .3449293   .0251861    13.70   0.000     .2951412    .3947174
                                       |
                                 _cons |  -82.95353   19.82072    -4.19   0.000    -122.1353   -43.77172
                          -------------+----------------------------------------------------------------
                               sigma_u |  95.327391
                               sigma_e |  64.908289
                                   rho |  .68323609   (fraction of variance due to u_i)
                          ------------------------------------------------------------------------------
                          F test that all u_i=0: F(9, 142) = 32.00                     Prob > F = 0.0000
                          
                          
                          . 
                          . xtreg  invest mvalue L.kstock if sample, fe
                          
                          Fixed-effects (within) regression               Number of obs     =        152
                          Group variable: company                         Number of groups  =          8
                          
                          R-sq:                                           Obs per group:
                               within  = 0.7149                                         min =         19
                               between = 0.8077                                         avg =       19.0
                               overall = 0.7823                                         max =         19
                          
                                                                          F(2,142)          =     178.03
                          corr(u_i, Xb)  = -0.2542                        Prob > F          =     0.0000
                          
                          ------------------------------------------------------------------------------
                                invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                          -------------+----------------------------------------------------------------
                                mvalue |   .1256025   .0152628     8.23   0.000     .0954307    .1557742
                                       |
                                kstock |
                                   L1. |   .3449293   .0251861    13.70   0.000     .2951412    .3947174
                                       |
                                 _cons |   -82.9314    20.0202    -4.14   0.000    -122.5075   -43.35526
                          -------------+----------------------------------------------------------------
                               sigma_u |  99.627653
                               sigma_e |  64.908289
                                   rho |  .70201861   (fraction of variance due to u_i)
                          ------------------------------------------------------------------------------
                          F test that all u_i=0: F(7, 142) = 40.90                     Prob > F = 0.0000
                          
                          .
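The same idea can be carried over to the model in post #11 without relying on e(sample). What follows is a sketch, not a tested solution: the variable names are taken from that post, complete and n_used are invented for illustration, and the singletons it lists need not be the five that xtivreg2 reports.

```stata
* Sketch: find singleton firms once lags are taken, then rerun the FE model
* (run from a do-file, since /// is a line continuation)
xtset Unique_Identifier Year

* Flag observations complete on every variable in the IV model,
* including the lagged regressors and lagged instruments
generate byte complete = !missing(TobinsQ, L1.BLEV, L1.INV, L1.CVA_G, ///
    L1.CVA_Square, L1.NDTS, L1.NDTS_Square)

* Usable observations per firm; a firm with exactly one is a singleton
bysort Unique_Identifier (Year): egen n_used = total(complete)
list Unique_Identifier Year if complete & n_used == 1

* Fixed-effects regression without the singleton firms
xtreg TobinsQ c.L1.BLEV##c.L1.BLEV L1.INV if complete & n_used > 1, ///
    fe vce(robust)
```

Gaps in Year matter here as much as missing values: a lag is missing whenever the previous year is absent for that firm, so a firm with several rows in the data can still be left with a single usable observation.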



                          • #14
                            Many many thanks Andrew for the much needed clarity. Indeed, it was very helpful.

                            regards,
                            Mohina
