Potential bug in IVREGRESS: instruments dropped due to collinearity when they should not be dropped

Joro Kolev

Join Date: Aug 2018
Posts: 3050

#16

26 Aug 2020, 01:23

If one does the first stage jointly, some peculiarities of the situation become apparent. The residuals in the two equations are identical:

Code:

. webuse abdata, clear

. mvreg L.n L2.n = DL.n DL2.n

Equation             Obs   Parms        RMSE    "R-sq"          F        P
--------------------------------------------------------------------------
L_n                  611       3    1.329206    0.0178   5.497023   0.0043
L2_n                 611       3    1.329206    0.0075   2.293505   0.1018

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
L_n          |
           n |
         LD. |   .7793204   .4007435     1.94   0.052    -.0076892     1.56633
        L2D. |   .9261032   .4324795     2.14   0.033     .0767682    1.775438
             |
       _cons |   1.101659   .0575023    19.16   0.000     .9887318    1.214586
-------------+----------------------------------------------------------------
L2_n         |
           n |
         LD. |  -.2206796   .4007435    -0.55   0.582    -1.007689    .5663299
        L2D. |   .9261032   .4324795     2.14   0.033     .0767682    1.775438
             |
       _cons |   1.101659   .0575023    19.16   0.000     .9887318    1.214586
------------------------------------------------------------------------------

. matlist e(Sigma)

             |         L.        L2.
             |         n          n 
-------------+----------------------
         L.n |  1.766787            
        L2.n |  1.766787   1.766787 

. predict double Lnres, eq(L_n) resid
(420 missing values generated)

. predict double L2nres, eq(L2_n) resid
(420 missing values generated)

. summ Lnres L2nres

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       Lnres |        611    2.77e-11    1.327025   -3.18583   3.577057
      L2nres |        611    1.66e-11    1.327025   -3.18583   3.577057
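
A short derivation of why the two residual series coincide may help here. The notation is my own shorthand, not something from the thread: y_1 = L.n, y_2 = L2.n, z_1 = DL.n, z_2 = DL2.n, Z = [z_1, z_2, 1], and M_Z is the residual-maker matrix of the first stage. It assumes both equations use the same estimation sample, as they do above (611 observations).

```latex
\begin{aligned}
y_1 - y_2 &= n_{t-1} - n_{t-2} = z_1, \\
M_Z\, y_1 &= M_Z (y_2 + z_1) = M_Z\, y_2 + M_Z\, z_1 = M_Z\, y_2,
  \qquad M_Z = I - Z (Z'Z)^{-1} Z', \\
\hat{\pi}_1 - \hat{\pi}_2 &= (Z'Z)^{-1} Z' z_1 = (1,\, 0,\, 0)'.
\end{aligned}
```

This is consistent with the mvreg output: the coefficients on LD.n differ by exactly one (.7793204 versus -.2206796), while the coefficients on L2D.n and the constant are the same in both equations.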

Now if I want to replicate the IV results by the control function method, I get the same coefficient estimates, but one set of residuals gets dropped due to collinearity:

Code:

. _regress n L.n L2.n  (DL.n DL2.n)

Instrumental variables (2SLS) regression

      Source |       SS           df       MS      Number of obs   =       611
-------------+----------------------------------   F(2, 608)       =    599.49
       Model |   1086.3574         2  543.178702   Prob > F        =    0.0000
    Residual |  12.8860345       608  .021194136   R-squared       =    0.9883
-------------+----------------------------------   Adj R-squared   =    0.9882
       Total |  1099.24344       610  1.80203842   Root MSE        =    .14558

------------------------------------------------------------------------------
           n |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           n |
         L1. |   1.219895   .0426201    28.62   0.000     1.136194    1.303595
         L2. |  -.1969064   .0659825    -2.98   0.003    -.3264876   -.0673251
             |
       _cons |  -.0821593   .0563611    -1.46   0.145    -.1928454    .0285268
------------------------------------------------------------------------------

. reg n L.n L2.n Lnres L2nres
note: L2nres omitted because of collinearity

      Source |       SS           df       MS      Number of obs   =       611
-------------+----------------------------------   F(3, 607)       =  18336.33
       Model |  1087.24616         3  362.415386   Prob > F        =    0.0000
    Residual |  11.9972815       607  .019764879   R-squared       =    0.9891
-------------+----------------------------------   Adj R-squared   =    0.9890
       Total |  1099.24344       610  1.80203842   Root MSE        =    .14059

------------------------------------------------------------------------------
           n |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           n |
         L1. |   1.219895    .041158    29.64   0.000     1.139066    1.300724
         L2. |  -.1969064   .0637188    -3.09   0.002    -.3220425   -.0717702
             |
       Lnres |  -.0287638   .0495784    -0.58   0.562    -.1261299    .0686022
      L2nres |          0  (omitted)
       _cons |  -.0821593   .0544276    -1.51   0.132    -.1890485    .0247298
------------------------------------------------------------------------------
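
One way to confirm where the collinearity comes from is to check the identity between the two endogenous regressors and the first instrument directly. This is only a sketch: it assumes abdata is still in memory with Lnres and L2nres from the predict calls above, and the variable names chk and resdiff are made up for illustration.

```stata
* L.n - L2.n equals DL.n by construction, so chk should be zero
* (up to rounding) wherever all three terms are non-missing
generate double chk = L.n - L2.n - DL.n
summarize chk

* The two first-stage residual series should therefore coincide,
* which is why -regress- omits one of them
generate double resdiff = Lnres - L2nres
summarize resdiff
```

If both summaries show values that are zero up to rounding error, the omitted residual carries no information beyond the one that is kept.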

Joro Kolev

Join Date: Aug 2018

Posts: 3050
#17

26 Aug 2020, 01:32

What to do in a situation like this, where the estimates can be calculated but something funny is going on behind the scenes, is a matter of personal taste.

My personal taste is for people programming routines not to patronise me, and for whatever is calculable to be calculated. Here I appreciate the behaviour of _regress, and I do not appreciate that whoever programmed -ivregress- is trying to do the thinking for me.

JanDitzen

Join Date: Jan 2015

Posts: 350
#18

26 Aug 2020, 05:02

Interesting topic and finding!

I am not familiar with all the details of ivregress, ivreg2 and xtdpdgmm, but I encountered a similar problem when working on xtdcce2. There the problem is that collinearities can occur between the variables added to approximate cross-sectional dependence and the explanatory variables, at both the unit-specific and the global level. I implemented several checks, which, especially for large datasets, can take a while. If the checks fail, the user is notified. I think it is important to be transparent about problems, because otherwise (estimation) commands can produce wrong or questionable results. I agree with Jeff that the user is ultimately responsible for the correct use of the command, but as a developer you should be open and clear about what the command does.

Sebastian Kripfganz

Join Date: May 2014

Posts: 2601
#19

28 Aug 2020, 04:47

I received a response from Stata Tech Support:
The recommendation is to use the perfect option to avoid the problem discussed in my initial post. The perfect option is not the default, in order to protect users who do not account for all the potential problems when using ivregress. They might reconsider this if we deem these checks unnecessary in all cases and can provide a textbook or journal article reference on the matter.

I think we agreed that some of these collinearity checks (at least those that involve a single endogenous variable at a time) are certainly useful, and the question revolves mainly around the situation where the model could be re-parameterized in a way such that it has fewer endogenous variables. I am not aware of any reference that says we should not carry out the collinearity check between both endogenous variables jointly and the instruments. But I am also not aware of any reference that says we should carry out this collinearity check.

So, I guess from the perspective of StataCorp there is not enough evidence to overturn the status quo. It remains the responsibility of the user to check whether ivregress indeed estimates the desired model.
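
For concreteness, here is a minimal sketch of how the perfect option could be tried, using the abdata model from post #16 as a stand-in (the specification in my initial post may differ). Comparing the two sets of output shows whether the default collinearity check changes the model that is actually estimated.

```stata
* Sketch only: model borrowed from post #16, not necessarily the original one
webuse abdata, clear

* Default: ivregress checks for collinearity between the endogenous
* regressors and the excluded instruments
ivregress 2sls n (L.n L2.n = DL.n DL2.n)

* With -perfect-, that check is skipped
ivregress 2sls n (L.n L2.n = DL.n DL2.n), perfect
```

Any notes about omitted variables in the first run indicate that the default check has altered the specification.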

https://www.kripfganz.de/stata/

haiyan lin

Join Date: Aug 2020

Posts: 34
#20

04 Sep 2020, 07:31

Many thanks for this discussion. I learnt a lot.

Mudassira Sarfraz

Join Date: Aug 2019

Posts: 21
#21

04 Jan 2021, 11:32

Dear Statalisters,

I have a similar problem: my instrument is dropped due to collinearity. I am using 'xtivreg2' with fixed effects. My dependent variable is an index, while the endogenous regressor and the instrument are both binary.
Moreover, my instrument is the treatment status of an individual participating in a program.

Code:

xtivreg2 index $x (endog=instrument), fe i(panid) cluster (IDCODE)
equation not identified; must have at least as many instruments not in the regression as there are instrumented variables

But when I run the first-stage regression manually using the following code, 'instrument' is not dropped due to collinearity.

Code:

xtreg endog instrument $x, fe i(panid) vce (cluster IDCODE)

Looking forward to your guidance on how to solve this problem.

Sebastian Kripfganz

Join Date: May 2014

Posts: 2601
#22

05 Jan 2021, 05:42

You could use the nocollin option of xtivreg2 to suppress the collinearity checks if you are sure that these checks are not needed in your situation. You could also replicate the fixed-effects IV regression with my xtdpdgmm command:

Code:

xtdpdgmm index $x endog, iv($x instrument) model(mdev) norescale vce(cluster IDCODE)
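
For the first suggestion, a minimal sketch would be to append nocollin to the command from post #21. The variable names and the $x global are taken from that post, and the sketch assumes xtivreg2 accepts the option as described above.

```stata
* Sketch only: same command as in post #21, with the collinearity checks suppressed
xtivreg2 index $x (endog = instrument), fe i(panid) cluster(IDCODE) nocollin
```

With the checks switched off, it is up to the user to verify that the instrument really varies within panels after the fixed-effects transformation.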

https://www.kripfganz.de/stata/

Huaxin Wanglu

Join Date: Mar 2021

Posts: 33
#23

14 Nov 2022, 17:34

I have removed my report because I found an error in my code.

Last edited by Huaxin Wanglu; 14 Nov 2022, 18:23.