fixed effect model in xtivreg2 and singletons

Pietro Bomprezzi

Join Date: Nov 2019

Posts: 6
#1

22 Nov 2019, 02:46

I am running a fixed-effects model using xtivreg2 in Stata, where my panel variable is firms and my time variable is years. It is survey data that tracks firms over the years through different iterations of the survey. Let me begin by saying that in my original dataset every value of the panel variable idpanel appears at least twice. That is, there are no firms which are observed for only one year. There are also no missing values of idpanel. The command is:

xtivreg2 depvar var1 var2 (var3= instrument), fe ro

What happens is that xtivreg2 gives me a warning:

Warning - singleton groups detected. 4552 observation(s) not used.

The number of observations in the first-stage regression is 17489, and the same in the second stage, so the command is already dropping observations at the first stage. I therefore tried to replicate the first-stage regression using:

xtreg var3 var1 var2 instrument, fe ro

And the number of observations, 26,370, is significantly higher than its first-stage regression counterpart in xtivreg2. The coefficients are also different.

After some exploring, I've noticed that the reason is that xtivreg2 drops an observation whenever any of the covariates or the dependent variable is missing. It then recomputes the panel structure and finds firms that have lost observations in some years because of those missing values but not in others; a firm left with a single observation has become a singleton.

My question therefore is: why? From a theoretical point of view, a replication of the first-stage regression using xtreg should give the same results (coefficients) as the first stage calculated within xtivreg2, no? In my case it doesn't. Thank you all in advance.

See also my post on Stack Overflow:
https://stackoverflow.com/questions/...34317_58990690

Last edited by Pietro Bomprezzi; 22 Nov 2019, 03:18. Reason: posting Stack Overflow cross-post URL
Tags: fixed effects, panel data, xtivreg2
Nick Cox

Join Date: Mar 2014

Posts: 35730
#2

22 Nov 2019, 03:15

Please give a URL for your cross-posting on Stack Overflow (see our policy on cross-posting in the FAQ Advice).
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#3

22 Nov 2019, 03:28

Pietro:
welcome to this forum.
Stata applies listwise deletion to observations with missing values in any variable.
Hence, it sounds strange that you do not experience a reduction in the number of observations when you run -xtreg-.
To have a more precise idea of what's going on with your estimates, as per FAQ please post (within CODE delimiters) what you typed and what Stata gave you back. Thanks.

Kind regards,
Carlo
(Stata 19.0)
Nick Cox

Join Date: Mar 2014

Posts: 35730
#4

22 Nov 2019, 03:43

Please give a URL for your cross-posting on Stack Overflow (see our policy on cross-posting in the FAQ Advice).

Sorry; duplicated post.
Andrew Musau

Join Date: Oct 2014

Posts: 10223
#5

22 Nov 2019, 03:52

It appears that Pietro has edited the original post and provided the link to his cross-post at Stack Overflow. As Carlo states, the issue here has to do with listwise deletion of missing values, so

Code:

xtivreg2 depvar var1 var2 (var3= instrument), fe ro

implies the following first-stage fixed effects regression

Code:

xtreg var3 instrument var1 var2 if !missing(depvar), fe ro
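
To see where the singletons come from, one can flag the complete cases and count them per panel unit. What follows is only a sketch: it uses the grunfeld data shipped with Stata as a stand-in for the survey data, the missing values are created artificially, and the names nmiss, complete and n_complete are made up for the illustration.

```stata
* Sketch only: grunfeld stands in for the real panel; the missings are artificial
webuse grunfeld, clear
replace invest = . if company == 10 & year < 1954

* Number of missing values per observation across the model variables
egen nmiss = rowmiss(invest mvalue kstock)
generate byte complete = (nmiss == 0)

* Complete observations per panel unit (what is left after listwise deletion)
bysort company (year): egen n_complete = total(complete)

* Observations that survive listwise deletion but are alone in their group
count if complete & n_complete == 1
list company year if complete & n_complete == 1
```

With the real data, the variable list inside rowmiss() would presumably be depvar var1 var2 var3 instrument, and the group variable would be idpanel.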
Pietro Bomprezzi

Join Date: Nov 2019

Posts: 6
#6

22 Nov 2019, 04:14

Hello Carlo. I agree with you in principle. In fact, it is exactly when I do a manual listwise deletion (drop if missing(myvars)) of all the variables included in my regressions that I get an xtreg that is identical to the first stage of my xtivreg2.

Oddly enough, when I replicated in my dataset what I had described so that I could show the code and output, it worked as it should and the problem no longer arose. I cannot explain why. I think I have to go over it again with my coauthor and see whether it is a communication issue.

As a new member of this forum, would it be more correct for me to delete this thread and post again if I can pinpoint the issue, or should I leave it as is? Thanks everyone.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#7

22 Nov 2019, 04:20

Pietro:
thanks for your feedback.
The rule is to do our best to close the thread we started with all the details we consider useful for others who may come across the same problem in the future.

Kind regards,
Carlo
(Stata 19.0)
Nick Cox

Join Date: Mar 2014

Posts: 35730
#8

22 Nov 2019, 05:27

Threads can’t be deleted unilaterally. This is explained in the FAQ Advice.
Pietro Bomprezzi

Join Date: Nov 2019

Posts: 6
#9

22 Nov 2019, 07:13

Dear Carlo,

I have resolved the issue. It was due to the fact that we were using two different sets of industry*year dummies in the two regressions. In xtivreg2, we used industry-year dummies generated beforehand. In xtreg instead, we were using the:

Code:

i.industry#i.year

factor-variable notation. These two sets of dummies were different (I don't know why, but that is another question), and as a result listwise deletion removed different observations in the two regressions.

Thanks again for the suggestion and the enlightenment regarding listwise deletion.
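
For anyone who runs into the same mismatch, one way to find out which observations differ between two fits is to save e(sample) after each and cross-tabulate. This is a sketch on the grunfeld data rather than the survey data; kstock_gap, in_a and in_b are hypothetical names, and the missing values are introduced artificially to stand in for two regressor sets with different missingness.

```stata
* Sketch only: two fits whose regressors differ in where they are missing
webuse grunfeld, clear
xtset company year
generate kstock_gap = kstock
replace kstock_gap = . if year == 1940

xtreg invest mvalue kstock, fe
generate byte in_a = e(sample)      // estimation sample of the first fit

xtreg invest mvalue kstock_gap, fe
generate byte in_b = e(sample)      // estimation sample of the second fit

* Off-diagonal cells are observations used by one fit but not the other
tabulate in_a in_b
list company year if in_a != in_b
```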

Regards
Pietro

Last edited by Pietro Bomprezzi; 22 Nov 2019, 07:19.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#10

22 Nov 2019, 07:41

Pietro:
thanks for sharing further details.
In all likelihood, the issue hinges on the (different) years that were used as the reference category in the two codes (i.e., the one omitted to avoid the dummy-variable trap).

Kind regards,
Carlo
(Stata 19.0)

mohina saxena

Join Date: Mar 2016
Posts: 61

#11

20 Jan 2022, 23:09

Hello all,

I have to use lagged explanatory variables in xtivreg2, which leads to singletons that I am not able to detect. I should mention that when I use the contemporaneous values of the explanatory variables, no singletons are detected (observations and number of groups in FE estimation = observations and number of groups in xtivreg2 estimation). This changes when I use the lagged explanatory variables: xtivreg2 drops singletons, while in FE estimation the number of observations and groups is higher. I could check manually, find the firms that lack a lagged value of some variable (i.e. the singletons) and delete them by hand. But given the plethora of variables, I am curious whether there is a way to detect them in Stata. Even if I don't detect them and just want to use the sample that enters xtivreg2 for the FE estimation, I am not able to retrieve that sample. Any help would be greatly appreciated.

Here are the code and results (shown with only a subset of the variables used):

Fixed Effects Estimation:

Code:

. eststo: xtreg TobinsQ c.L1.BLEV##c.L1.BLEV L1.INV, fe vce (robust)

Fixed-effects (within) regression               Number of obs     =      4,682
Group variable: Unique_Ide~r                    Number of groups  =        503

R-sq:                                           Obs per group:
     within  = 0.0222                                         min =          1
     between = 0.0973                                         avg =        9.3
     overall = 0.0746                                         max =         16

                                                F(3,502)          =       6.40
corr(u_i, Xb)  = 0.1395                         Prob > F          =     0.0003

                       (Std. Err. adjusted for 503 clusters in Unique_Identifier)
---------------------------------------------------------------------------------
                |               Robust
        TobinsQ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
           BLEV |
            L1. |  -2.627809   .7501255    -3.50   0.001    -4.101581   -1.154037
                |
cL.BLEV#cL.BLEV |   2.578923   .8093329     3.19   0.002     .9888254     4.16902
                |
            INV |
            L1. |   .0262735   .0101926     2.58   0.010      .006248     .046299
                |
          _cons |   1.600565   .1375829    11.63   0.000     1.330256    1.870874
----------------+----------------------------------------------------------------
        sigma_u |  .91934071
        sigma_e |  .75080102
            rho |  .59989611   (fraction of variance due to u_i)
---------------------------------------------------------------------------------
(est14 stored)

IV Estimation:

Code:

. eststo: xtivreg2 TobinsQ (L1.BLEV = L1.CVA_G L1.CVA_Square L1.NDTS L1.NDTS_Square) L1.INV , 
> fe robust small endog(L1.BLEV)
Warning - singleton groups detected.  5 observation(s) not used.

FIXED EFFECTS ESTIMATION
------------------------
Number of groups =       498                    Obs per group: min =         2
                                                               avg =       9.4
                                                               max =        16

IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity

                                                      Number of obs =     4677
                                                      F(  2,  4177) =    11.43
                                                      Prob > F      =   0.0000
Total (centered) SS     =  2407.372783                Centered R2   =  -0.0949
Total (uncentered) SS   =  2407.372783                Uncentered R2 =  -0.0949
Residual SS             =  2635.773598                Root MSE      =    .7944

------------------------------------------------------------------------------
             |               Robust
     TobinsQ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        BLEV |
         L1. |  -3.346971   .7349992    -4.55   0.000    -4.787961   -1.905982
             |
         INV |
         L1. |    .037986   .0190524     1.99   0.046     .0006331    .0753388
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic):            115.184
                                                   Chi-sq(4) P-val =    0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):               47.622
                         (Kleibergen-Paap rk Wald F statistic):         33.857
Stock-Yogo weak ID test critical values:  5% maximal IV relative bias    16.85
                                         10% maximal IV relative bias    10.27
                                         20% maximal IV relative bias     6.71
                                         30% maximal IV relative bias     5.34
                                         10% maximal IV size             24.58
                                         15% maximal IV size             13.96
                                         20% maximal IV size             10.26
                                         25% maximal IV size              8.31
Source: Stock-Yogo (2005).  Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments):        22.444
                                                   Chi-sq(3) P-val =    0.0001
-endog- option:
Endogeneity test of endogenous regressors:                               5.306
                                                   Chi-sq(1) P-val =    0.0213
Regressors tested:    L.BLEV
------------------------------------------------------------------------------
Instrumented:         L.BLEV
Included instruments: L.INV
Excluded instruments: L.CVA_G L.CVA_Square L.NDTS L.NDTS_Square
------------------------------------------------------------------------------
(est15 stored)

Generating sample used for IV

Code:

gen IV_Sample = e(sample)

Using that same sample for FE estimation again, the number of observations still includes the singletons and is the same as in the previous FE estimation:

Code:

eststo: xtreg TobinsQ c.L1.BLEV##c.L1.BLEV L1.INV if IV_Sample, fe vce (robust)

Fixed-effects (within) regression               Number of obs     =      4,682
Group variable: Unique_Ide~r                    Number of groups  =        503

R-sq:                                           Obs per group:
     within  = 0.0222                                         min =          1
     between = 0.0973                                         avg =        9.3
     overall = 0.0746                                         max =         16

                                                F(3,502)          =       6.40
corr(u_i, Xb)  = 0.1395                         Prob > F          =     0.0003

                       (Std. Err. adjusted for 503 clusters in Unique_Identifier)
---------------------------------------------------------------------------------
                |               Robust
        TobinsQ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
           BLEV |
            L1. |  -2.627809   .7501255    -3.50   0.001    -4.101581   -1.154037
                |
cL.BLEV#cL.BLEV |   2.578923   .8093329     3.19   0.002     .9888254     4.16902
                |
            INV |
            L1. |   .0262735   .0101926     2.58   0.010      .006248     .046299
                |
          _cons |   1.600565   .1375829    11.63   0.000     1.330256    1.870874
----------------+----------------------------------------------------------------
        sigma_u |  .91934071
        sigma_e |  .75080102
            rho |  .59989611   (fraction of variance due to u_i)
---------------------------------------------------------------------------------
(est16 stored)

How can this be resolved? I look forward to your replies.

many thanks and regards,
Mohina


mohina saxena

Join Date: Mar 2016
Posts: 61

#12

20 Jan 2022, 23:27

Here is the dataset:

I can see that the variable INV doesn't have a first-period value for each unique firm, since it measures the change in capital expenditure. The problem is identifying exactly those firms that are singletons (once lags are taken into account) so that I can drop them in my FE estimation.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long Unique_Identifier int Year double BLEV float INV double(CVA_G NDTS)
1 2001  .055007051676511765           .  .5007052421569824 .014104371890425682
1 2002   .10846561193466187   .08465608  .4761904776096344  .02777777798473835
1 2003   .08745874464511871  -.01650165  .6303630471229553   .0445544570684433
1 2010   .09751037508249283  -.02282158  .5871369242668152 .033195022493600845
1 2011   .24752475321292877   .13861386  .8613861203193665   .0445544570684433
1 2012     .429175466299057  .019027485  .7906976938247681  .04862579330801964
1 2013    .4252873659133911  .011494253  .7394636273384094  .04789271950721741
1 2015  .014975041151046753   .03327787  .8186355829238892  .08985024690628052
1 2016  .012066365219652653  -.05429864  .7043740749359131  .07390648871660233
1 2017   .16565480828285217   .01908066 .45186468958854675  .03035559318959713
2 2001   .41645878553390503           .  .6230601668357849 .026731086894869804
2 2002    .3669382333755493 -.014112033  .6036447882652283 .027792584151029587
2 2003   .30021658539772034 .0034330215  .5661951303482056  .02333993837237358
2 2004    .2761489748954773   .06977824  .5933471322059631 .022444024682044983
2 2005    .3158489763736725    .1605357  .6698573231697083  .02379419468343258
2 2006    .2615233063697815  .065879814  .7014051675796509  .02791675552725792
2 2007   .24146167933940887  .031055775  .6141208410263062 .025134121999144554
2 2008    .2755413055419922   .03044634  .5260298848152161  .02524818293750286
2 2009   .30744194984436035    .1022892  .5899816751480103 .026857752352952957
2 2010    .2788902819156647   .04516661  .6195094585418701 .024625148624181747
2 2011   .27633315324783325  .031976607  .6047827005386353 .025127828121185303
2 2012   .25731614232063293   .09208064  .6753150820732117 .024048035964369774
2 2013   .22729633748531342   .05986407   .666262686252594 .026902178302407265
2 2014    .1844494342803955   .07157767  .6651003956794739 .027743402868509293
2 2015   .24228519201278687   .11656328  .6964685320854187  .04307246208190918
2 2016   .22154110670089722   .05725962  .7191405892372131  .03731784597039223
2 2017    .1777629256248474 -.007602268  .7002505660057068 .038490671664476395
3 2002   .24770642817020416           .   .608562707901001  .03363914415240288
3 2003   .11974109709262848  -.03236246  .6504854559898377  .03559870645403862
3 2004   .01923076994717121  -.04807692  .5608974099159241  .07051282376050949
3 2005  .012944984249770641  -.04854369    .53721684217453  .07119741290807724
3 2006   .00872093066573143 -.069767445  .5261628031730652  .06395348906517029
3 2008   .02278481051325798 -.005063291 .42784810066223145  .06329113990068436
3 2009 .0062500000931322575     -.00625 .34166666865348816   .0520833320915699
3 2010   .01149425283074379  -.01313629  .4761904776096344 .031198685988783836
3 2011   .04881450533866882   -.0027894  .3235704302787781  .00976290088146925
3 2012  .009208102710545063 -.007366483  .3001841604709625 .014732965268194675
3 2013   .02380952425301075   .02756892  .3095238208770752 .011278195306658745
3 2014  .027272727340459824  .025757575 .35151514410972595   .0181818176060915
3 2015  .012912482023239136 -.012912482 .37015780806541443  .02152080275118351
3 2016   .13763703405857086 -.015834348 .23142509162425995 .018270401284098625
3 2017 .0013908206019550562 -.016689846 .21974965929985046 .019471488893032074
4 2001   .05920117720961571   .24214654 .39171770215034485 .024601813405752182
4 2002   .07022888213396072   .08706458  .4240590035915375 .023690223693847656
4 2003   .05551784858107567   .04031492  .4096986651420593  .02487443946301937
4 2004    .1197541207075119   .10734842 .47362393140792847  .02640402317047119
4 2005   .15231451392173767  .015018288  .4503306448459625  .02540997415781021
4 2006   .18435005843639374 .0003547847  .3896521031856537 .021149108186364174
4 2007   .24266663193702698   .09281386 .39094796776771545 .016039978712797165
4 2008    .2584107220172882   .04178892   .397095263004303  .02280370332300663
end
format %ty Year
label values Unique_Identifier UI1
label def UI1 1 "21_102524", modify
label def UI1 2 "21_102576", modify
label def UI1 3 "21_102816", modify
label def UI1 4 "21_103261", modify
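
As a side note on why lags create singletons here: the data example has gaps in Year (firm 1 jumps from 2003 to 2010), and L1. is missing whenever the previous year is absent, not only in a firm's first year. A toy sketch of counting usable lagged observations per firm follows; the dataset and the names firm, x, has_lag and n_lag are invented for the illustration and are not the variables above.

```stata
* Toy panel with gaps in the time variable (invented values)
clear
input firm year x
1 2001 .5
1 2002 .6
1 2010 .7
2 2005 .3
2 2007 .4
end
xtset firm year

* L1.x exists only when the immediately preceding year is present
generate byte has_lag = !missing(L1.x)

* Observations per firm that can enter a model using L1.x
bysort firm (year): egen n_lag = total(has_lag)

* n_lag == 1 marks a singleton in the lagged model; n_lag == 0 drops out entirely
list firm year x has_lag n_lag, sepby(firm)
```

With several lagged regressors, has_lag would need to require that all of them (and the dependent variable) are non-missing.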

Thanks so much!


Andrew Musau

Join Date: Oct 2014
Posts: 10223

#13

21 Jan 2022, 12:36

xtivreg2 is essentially ivreghdfe (SSC) and, like reghdfe (SSC), it automatically drops singletons, following findings from Sergio Correia's research. Singletons, as the name implies, are groups with a single observation; while they do not affect coefficient estimates in fixed effects models, they do affect the (cluster-robust) standard errors. Your data example is not helpful, as it does not include some of the variables in your estimation command, but it is not difficult to illustrate how to choose a sample that excludes singletons. Below, I use reghdfe.

Code:

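webuse grunfeld, clear
* Leave companies 9 and 10 with two years each, i.e. one usable observation once L.kstock is taken
drop if company>8 & time<19
reghdfe invest mvalue L.kstock, a(company) 
xtset company year
xtreg invest mvalue L.kstock, fe
* Number of observations per company in the xtreg estimation sample
bys company: egen count= total(e(sample))
* Flag companies with more than one observation in that sample
bys company: egen sample= max(count>1)
xtreg  invest mvalue L.kstock if sample, fe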
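
To list which companies end up as singletons, rather than only excluding them, the same per-group count can be tabulated. This is a sketch on the same grunfeld example; n_used is a made-up name.

```stata
* Sketch only: identify the singleton groups in the xtreg estimation sample
webuse grunfeld, clear
drop if company>8 & time<19
xtset company year
xtreg invest mvalue L.kstock, fe

* Observations per company that entered the estimation
bysort company (year): egen n_used = total(e(sample))

* Companies contributing exactly one observation are the singletons
tabulate company if e(sample) & n_used == 1
```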

Results:

Code:

. reghdfe invest mvalue L.kstock, a(company) 
(dropped 2 singleton observations)
(MWFE estimator converged in 1 iterations)

HDFE Linear regression                            Number of obs   =        152
Absorbing 1 HDFE group                            F(   2,    142) =     178.03
                                                  Prob > F        =     0.0000
                                                  R-squared       =     0.9290
                                                  Adj R-squared   =     0.9245
                                                  Within R-sq.    =     0.7149
                                                  Root MSE        =    64.9083

------------------------------------------------------------------------------
      invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      mvalue |   .1256025   .0152628     8.23   0.000     .0954307    .1557742
             |
      kstock |
         L1. |   .3449293   .0251861    13.70   0.000     .2951412    .3947174
             |
       _cons |   -82.9314    20.0202    -4.14   0.000    -122.5075   -43.35526
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
     company |         8           0           8     |
-----------------------------------------------------+

. 

. 
. xtreg invest mvalue L.kstock, fe

Fixed-effects (within) regression               Number of obs     =        154
Group variable: company                         Number of groups  =         10

R-sq:                                           Obs per group:
     within  = 0.7149                                         min =          1
     between = 0.8032                                         avg =       15.4
     overall = 0.7824                                         max =         19

                                                F(2,142)          =     178.03
corr(u_i, Xb)  = -0.2566                        Prob > F          =     0.0000

------------------------------------------------------------------------------
      invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      mvalue |   .1256025   .0152628     8.23   0.000     .0954307    .1557742
             |
      kstock |
         L1. |   .3449293   .0251861    13.70   0.000     .2951412    .3947174
             |
       _cons |  -82.95353   19.82072    -4.19   0.000    -122.1353   -43.77172
-------------+----------------------------------------------------------------
     sigma_u |  95.327391
     sigma_e |  64.908289
         rho |  .68323609   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(9, 142) = 32.00                     Prob > F = 0.0000


. 
. xtreg  invest mvalue L.kstock if sample, fe

Fixed-effects (within) regression               Number of obs     =        152
Group variable: company                         Number of groups  =          8

R-sq:                                           Obs per group:
     within  = 0.7149                                         min =         19
     between = 0.8077                                         avg =       19.0
     overall = 0.7823                                         max =         19

                                                F(2,142)          =     178.03
corr(u_i, Xb)  = -0.2542                        Prob > F          =     0.0000

------------------------------------------------------------------------------
      invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      mvalue |   .1256025   .0152628     8.23   0.000     .0954307    .1557742
             |
      kstock |
         L1. |   .3449293   .0251861    13.70   0.000     .2951412    .3947174
             |
       _cons |   -82.9314    20.0202    -4.14   0.000    -122.5075   -43.35526
-------------+----------------------------------------------------------------
     sigma_u |  99.627653
     sigma_e |  64.908289
         rho |  .70201861   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(7, 142) = 40.90                     Prob > F = 0.0000

.


mohina saxena

Join Date: Mar 2016

Posts: 61
#14

25 Jan 2022, 02:16

Many many thanks Andrew for the much needed clarity. Indeed, it was very helpful.

regards,
Mohina