  • #16
    If one estimates the first stage jointly, some peculiarities of the situation become apparent. The residuals in the two equations are identical:

    Code:
    . webuse abdata, clear
    
    . mvreg L.n L2.n = DL.n DL2.n
    
    Equation             Obs   Parms        RMSE    "R-sq"          F        P
    --------------------------------------------------------------------------
    L_n                  611       3    1.329206    0.0178   5.497023   0.0043
    L2_n                 611       3    1.329206    0.0075   2.293505   0.1018
    
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    L_n          |
               n |
             LD. |   .7793204   .4007435     1.94   0.052    -.0076892     1.56633
            L2D. |   .9261032   .4324795     2.14   0.033     .0767682    1.775438
                 |
           _cons |   1.101659   .0575023    19.16   0.000     .9887318    1.214586
    -------------+----------------------------------------------------------------
    L2_n         |
               n |
             LD. |  -.2206796   .4007435    -0.55   0.582    -1.007689    .5663299
            L2D. |   .9261032   .4324795     2.14   0.033     .0767682    1.775438
                 |
           _cons |   1.101659   .0575023    19.16   0.000     .9887318    1.214586
    ------------------------------------------------------------------------------
    
    . matlist e(Sigma)
    
                 |         L.        L2.
                 |         n          n 
    -------------+----------------------
             L.n |  1.766787            
            L2.n |  1.766787   1.766787 
    
    . predict double Lnres, eq(L_n) resid
    (420 missing values generated)
    
    . predict double L2nres, eq(L2_n) resid
    (420 missing values generated)
    
    . summ Lnres L2nres
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
           Lnres |        611    2.77e-11    1.327025   -3.18583   3.577057
          L2nres |        611    1.66e-11    1.327025   -3.18583   3.577057
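The equality is not a coincidence. On every observation L.n − L2.n equals LD.n, which is itself one of the first-stage regressors, so the difference between the two dependent variables is fitted exactly and the two residual series must coincide. A minimal check, assuming a fresh Stata session with internet access for -webuse-; the variable names gap and resdiff are made up for this sketch:

```stata
* Reload the Arellano-Bond data used above (already xtset, so L. and D. work)
webuse abdata, clear

* Joint first stage, as in the post
quietly mvreg L.n L2.n = DL.n DL2.n
predict double Lnres, eq(L_n) resid
predict double L2nres, eq(L2_n) resid

* Identity behind the result: L.n - L2.n - LD.n should be zero on the sample
generate double gap = L.n - L2.n - LD.n if e(sample)
summarize gap

* The two residual series should agree up to floating-point rounding
generate double resdiff = abs(Lnres - L2nres) if e(sample)
summarize resdiff
```

If resdiff is numerically zero, the second residual carries no information beyond the first, which is what -regress- reports as collinearity in the control-function step.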
    If I now want to replicate the IV results with the control-function method, I get the same point estimates, but one set of residuals is dropped because of collinearity:

    Code:
    . _regress n L.n L2.n  (DL.n DL2.n)
    
    Instrumental variables (2SLS) regression
    
          Source |       SS           df       MS      Number of obs   =       611
    -------------+----------------------------------   F(2, 608)       =    599.49
           Model |   1086.3574         2  543.178702   Prob > F        =    0.0000
        Residual |  12.8860345       608  .021194136   R-squared       =    0.9883
    -------------+----------------------------------   Adj R-squared   =    0.9882
           Total |  1099.24344       610  1.80203842   Root MSE        =    .14558
    
    ------------------------------------------------------------------------------
               n |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               n |
             L1. |   1.219895   .0426201    28.62   0.000     1.136194    1.303595
             L2. |  -.1969064   .0659825    -2.98   0.003    -.3264876   -.0673251
                 |
           _cons |  -.0821593   .0563611    -1.46   0.145    -.1928454    .0285268
    ------------------------------------------------------------------------------
    
    . reg n L.n L2.n Lnres L2nres
    note: L2nres omitted because of collinearity
    
          Source |       SS           df       MS      Number of obs   =       611
    -------------+----------------------------------   F(3, 607)       =  18336.33
           Model |  1087.24616         3  362.415386   Prob > F        =    0.0000
        Residual |  11.9972815       607  .019764879   R-squared       =    0.9891
    -------------+----------------------------------   Adj R-squared   =    0.9890
           Total |  1099.24344       610  1.80203842   Root MSE        =    .14059
    
    ------------------------------------------------------------------------------
               n |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               n |
             L1. |   1.219895    .041158    29.64   0.000     1.139066    1.300724
             L2. |  -.1969064   .0637188    -3.09   0.002    -.3220425   -.0717702
                 |
           Lnres |  -.0287638   .0495784    -0.58   0.562    -.1261299    .0686022
          L2nres |          0  (omitted)
           _cons |  -.0821593   .0544276    -1.51   0.132    -.1890485    .0247298
    ------------------------------------------------------------------------------
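Written out, the mechanics behind both outputs are short. The notation is introduced for this sketch and is not from the post: z_t collects the first-stage regressors (_cons, LD.n, L2D.n), v1 and v2 are the first-stage errors (Lnres and L2nres are their estimates), and LD.n is Δn(t−1) = n(t−1) − n(t−2).

```latex
% z_t = (1, \Delta n_{t-1}, \Delta n_{t-2})' collects _cons, LD.n and L2D.n (notation assumed here)
\begin{align*}
n_{t-1} &= z_t'\pi_1 + v_{1t}, \qquad n_{t-2} = z_t'\pi_2 + v_{2t},\\
n_{t-1} - n_{t-2} &= \Delta n_{t-1}
  \;\Longrightarrow\; \hat\pi_1 = \hat\pi_2 + (0,1,0)', \quad \hat v_{1t} = \hat v_{2t},\\[4pt]
n_t &= \alpha + \beta_1 n_{t-1} + \beta_2 n_{t-2} + \rho_1 \hat v_{1t} + \rho_2 \hat v_{2t} + \varepsilon_t\\
    &= \alpha + \beta_1 n_{t-1} + \beta_2 n_{t-2} + (\rho_1 + \rho_2)\,\hat v_{1t} + \varepsilon_t,\\[4pt]
n_t &= \alpha + \beta_1 n_{t-1} + \beta_2 n_{t-2} + u_t
     = \alpha + (\beta_1 + \beta_2)\, n_{t-2} + \beta_1 \Delta n_{t-1} + u_t .
\end{align*}
```

The second line matches the mvreg output (the LD. coefficients differ by exactly 1, .7793204 versus −.2206796, and the other coefficients coincide). The middle block shows why -regress- can estimate only the sum ρ1 + ρ2, which it reports on Lnres. The last line shows that, because Δn(t−1) is among the instruments, the structural equation can be rewritten with a single endogenous regressor, n(t−2).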


    • #17
      What to do in this situation, i.e., when estimates can be calculated but something odd is going on behind the scenes, is a matter of personal taste.

      My personal preference is that programmers of routines do not patronise me and that whatever is calculable gets calculated. Here I appreciate the behaviour of _regress, and I do not appreciate that whoever programmed -ivregress- tries to do the thinking for me.


      • #18
        Interesting topic and finding!

        I am not familiar with all the details of ivregress, ivreg2 and xtdpdgmm, but I encountered a similar problem when working on xtdcce2. There, collinearities can occur between the variables added to approximate cross-sectional dependence and the explanatory variables, at both the unit-specific and the global level. I implemented several checks, which can take a while, especially for large datasets. If a check fails, the user is notified. I think it is important to be transparent about such problems, because otherwise estimation commands can produce wrong or questionable results. I agree with Jeff that in the end the user is responsible for using the command correctly, but as a developer you should be open and clear about what the command does.


        • #19
          I received a response from Stata Tech Support:
          The recommendation is to use the perfect option to avoid the problem discussed in my initial post. The perfect option is not the default, in order to protect users who do not account for all the potential problems when using ivregress. They might reconsider this if we deem these checks unnecessary in all cases and can provide a textbook or journal-article reference on the matter.
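Applied to the example in #16, the recommendation amounts to adding one option. A sketch, assuming official -ivregress- and the abdata example: there the endogenous regressors and the excluded instruments are jointly collinear because L.n − L2.n = LD.n, which is presumably what the default check picks up. Since the model is exactly identified, the 2SLS point estimates should match the _regress output in #16.

```stata
* Same model as the _regress call in #16, estimated with -ivregress-
webuse abdata, clear

* -perfect- asks ivregress not to check for collinearity between the
* endogenous regressors (L.n, L2.n) and the excluded instruments (DL.n, DL2.n)
ivregress 2sls n (L.n L2.n = DL.n DL2.n), perfect
```

Because -perfect- switches off a safeguard, it is still up to the user to confirm that the model is identified before relying on the output.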

          I think we agreed that some of these collinearity checks (at least those that involve a single endogenous variable at a time) are certainly useful, and the question revolves mainly around the situation where the model could be reparameterized so that it has fewer endogenous variables. I am not aware of any reference saying that we should not carry out the collinearity check between both endogenous variables jointly and the instruments. But I am also not aware of any reference saying that we should carry it out.

          So, I guess from the perspective of StataCorp there is not enough evidence to overturn the status quo. It remains the responsibility of the user to check whether ivregress indeed estimates the desired model.
          https://www.kripfganz.de/stata/


          • #20
            Many thanks for this discussion. I learnt a lot.


            • #21
              Dear Statalister

              I have a similar problem: the instrument is dropped because of collinearity. I am using 'xtivreg2' with fixed effects. My dependent variable is an index, whereas the endogenous variable and the instrument are both binary.
              Moreover, my instrument is the treatment status of an individual participating in a program.

              Code:
               xtivreg2 index $x (endog=instrument), fe i(panid) cluster(IDCODE)
              
              equation not identified; must have at least as many instruments
              not in the regression as there are instrumented variables
              However, when I run the first-stage regression manually with the following code, Stata does not report that 'instrument' is dropped because of collinearity.

              Code:
                 xtreg endog instrument $x, fe i(panid) vce(cluster IDCODE)
              I look forward to your guidance on how to solve this problem.


              • #22
                You could use the nocollin option of xtivreg2 to suppress the collinearity checks if you are sure that these checks are not needed in your situation. You could also replicate the fixed-effects IV regression with my xtdpdgmm command:
                Code:
                xtdpdgmm index $x endog, iv($x instrument) model(mdev) norescale vce(cluster IDCODE)
                https://www.kripfganz.de/stata/


                • #23
                  I have removed my report because I found an error in my code.
                  Last edited by Huaxin Wanglu; 14 Nov 2022, 18:23.
