  • Instrumental Variable and Quadratic Term

    Dear All,
    I hope this post finds you well. I would like to test for endogeneity using the ivprobit command. My independent variables are Competition and Competition_square, the dependent variable is Innovation, and the instrumental variable is Politic. My code is as follows:
    Code:
    ivprobit Innovation (Competition Competition_square = Politic Politic), twostep asis
    Although this code runs (with a warning: Politic omitted because of collinearity), I am not sure whether this is the right syntax for instrumenting a quadratic term, and I could not find it in the Stata help file.
    Could you please help me?
    Many thanks for your support!

  • #2
    I don’t see how that could’ve executed because you’re short an instrument. Is Politic a binary variable?

    In any case, you should use a two-step control function method. Then one IV is enough. The method you’re trying to use isn’t consistent even if you have a second IV.



    • #3
      Many thanks for the reply; I highly appreciate it. Politic is a continuous variable. I will use a two-step control function method as you suggested. Thank you very much!



      • #4
        Professor Jeff Wooldridge, there might be a misunderstanding caused by the misleading name that Stata has given to -ivprobit-.

        -ivprobit- does not implement IV probit; -ivprobit, twostep- implements exactly the control function method that you are proposing.

        Sadrac Cenophat, although you do not need a second instrument for the control function approach you are using, for symmetry you might still choose to use Politic and Politic_squared as instruments when instrumenting Competition and Competition_square.
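
        As a sketch of the syntax only, that suggestion might be written as below. The name Politic_square is a hypothetical choice for the squared instrument, and the example says nothing about whether this specification is appropriate for your model.

        ```stata
        * Syntax sketch only. Politic_square is a hypothetical variable name.
        * Assumes Innovation, Competition, Competition_square and Politic are in memory.
        generate double Politic_square = Politic^2

        * Two endogenous regressors and two distinct excluded instruments
        ivprobit Innovation (Competition Competition_square = Politic Politic_square), twostep
        ```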

        The command dropped Politic because you listed it twice in the instrument list. Look here: the first time nothing is dropped; the second time, when I list the instrument twice, one copy is dropped:

        Code:
        .  webuse laborsup
        
        . ivprobit fem_work fem_educ kids (other_inc = male_educ), twostep
        Checking reduced-form model...
        
        Two-step probit with endogenous regressors        Number of obs   =        500
                                                          Wald chi2(3)    =      93.97
                                                          Prob > chi2     =     0.0000
        
        ------------------------------------------------------------------------------
                     |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
           other_inc |   -.058473   .0093364    -6.26   0.000    -.0767719    -.040174
            fem_educ |    .227437   .0281628     8.08   0.000     .1722389     .282635
                kids |  -.1961748   .0496323    -3.95   0.000    -.2934522   -.0988973
               _cons |   .3956061   .4982649     0.79   0.427    -.5809752    1.372187
        ------------------------------------------------------------------------------
        Instrumented:  other_inc
        Instruments:   fem_educ kids male_educ
        ------------------------------------------------------------------------------
        Wald test of exogeneity: chi2(1) = 6.50                   Prob > chi2 = 0.0108
        
        . ivprobit fem_work fem_educ kids (other_inc = male_educ male_educ), twostep
        Checking reduced-form model...
        note: male_educ omitted because of collinearity
        
        Two-step probit with endogenous regressors        Number of obs   =        500
                                                          Wald chi2(3)    =      93.97
                                                          Prob > chi2     =     0.0000
        
        ------------------------------------------------------------------------------
                     |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
           other_inc |   -.058473   .0093364    -6.26   0.000    -.0767719    -.040174
            fem_educ |    .227437   .0281628     8.08   0.000     .1722389     .282635
                kids |  -.1961748   .0496323    -3.95   0.000    -.2934522   -.0988973
               _cons |   .3956061   .4982649     0.79   0.427    -.5809752    1.372187
        ------------------------------------------------------------------------------
        Instrumented:  other_inc
        Instruments:   fem_educ kids male_educ
        ------------------------------------------------------------------------------
        Wald test of exogeneity: chi2(1) = 6.50                   Prob > chi2 = 0.0108



        • #5
          Joro: Even so, the two-step method will result in perfect collinearity if there is only one IV in the way Sadrac proposed. There is a way to do the CF method with only one IV, but I think that has to be done "by hand."

          The general point is that, regardless of what w1 and w2 are -- even if one is a function of the other -- we would need to specify two IVs. I'm definitely not recommending it in Sadrac's case. I'll show below what I think should be done. Here is the generic command that leads to lack of identification:

          Code:
          ivprobit y x1 ... xK (w1 w2 = z), twostep
          In the first stage, w1 and w2 are each regressed on x1 ... xK and z, and then the residuals are obtained -- say, v1h and v2h. In the second stage probit is used. But with only one z, (x1 ... xK w1 w2 v1h v2h) are perfectly collinear -- and so the command should not work.
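
          The collinearity can be written out in a few lines. This is only a sketch: the hatted symbols are notation introduced here for the first-stage OLS estimates, with \hat v_1 and \hat v_2 standing for v1h and v2h, and x standing for the row vector (x1 ... xK) plus the constant.

          ```latex
          % First-stage OLS fits hold exactly in the sample:
          \begin{align*}
          w_1 &= x\hat\pi_1 + \hat\gamma_1 z + \hat v_1, \\
          w_2 &= x\hat\pi_2 + \hat\gamma_2 z + \hat v_2 .
          \end{align*}
          % Multiply the first by \hat\gamma_2, the second by \hat\gamma_1, and subtract to eliminate z:
          \begin{equation*}
          \hat\gamma_2 w_1 - \hat\gamma_1 w_2
            = x\,(\hat\gamma_2\hat\pi_1 - \hat\gamma_1\hat\pi_2)
              + \hat\gamma_2 \hat v_1 - \hat\gamma_1 \hat v_2 .
          \end{equation*}
          ```

          Unless both \hat\gamma_1 and \hat\gamma_2 are zero, this is an exact linear relationship among (x, w1, w2, v1h, v2h), so the second-stage probit cannot separate their coefficients.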

          Here is verification using a data set similar to the one you used:

          Code:
          gen nwifeincsq = nwifeinc^2
          ivprobit inlf educ age exper kidslt6 (nwifeinc nwifeincsq = huseduc), twostep
          Checking reduced-form model...
          equation not identified; must have at least as many instruments 
          not in the regression as there are instrumented variables
          r(481);
          Even if we had an extra IV for nwifeincsq, the above would make no sense. As I said, if nwifeinc follows a linear first stage with an additive error independent of the exogenous variables then nwifeincsq definitely does not. And forget about normality, as the error would include z*v and v^2.
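
          To see where those terms come from, here is a short sketch under the assumption of a linear first stage for w = nwifeinc. The symbols \pi, \gamma and \sigma_v^2 are notation introduced here, and v is assumed independent of (x, z) with mean zero.

          ```latex
          \begin{align*}
          w   &= x\pi + \gamma z + v, \\
          w^2 &= (x\pi + \gamma z)^2 + 2\,(x\pi + \gamma z)\,v + v^2, \\
          E(w^2 \mid x, z) &= (x\pi + \gamma z)^2 + \sigma_v^2 .
          \end{align*}
          ```

          So the conditional mean of w^2 is quadratic rather than linear in (x, z), and the implied error 2(x\pi + \gamma z)v + v^2 - \sigma_v^2 contains z*v and v^2, which is neither independent of z nor normal.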

          From my 2015 Journal of Human Resources paper on control function methods, here is what will work. But one needs to adjust the standard errors for the two-step estimation. I suggested the bootstrap.

          Code:
          . reg nwifeinc age educ kidlt6 huseduc
          
                Source |       SS           df       MS      Number of obs   =     5,634
          -------------+----------------------------------   F(4, 5629)      =    205.98
                 Model |  532573.708         4  133143.427   Prob > F        =    0.0000
              Residual |  3638495.37     5,629  646.383971   R-squared       =    0.1277
          -------------+----------------------------------   Adj R-squared   =    0.1271
                 Total |  4171069.08     5,633  740.470279   Root MSE        =    25.424
          
          ------------------------------------------------------------------------------
              nwifeinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   age |   .4026188   .0391036    10.30   0.000     .3259608    .4792769
                  educ |    .321613   .1620976     1.98   0.047     .0038392    .6393868
                kidlt6 |   .5574093   .8699482     0.64   0.522    -1.148025    2.262843
               huseduc |    2.83534   .1423423    19.92   0.000     2.556295    3.114386
                 _cons |  -27.21475   2.496775   -10.90   0.000    -32.10939   -22.32011
          ------------------------------------------------------------------------------
          
          . predict vh, resid
          
          . probit inlf educ age kidlt6 nwifeinc c.nwifeinc#c.nwifeinc vh
          
          Iteration 0:   log likelihood =  -3826.743  
          Iteration 1:   log likelihood = -3541.1662  
          Iteration 2:   log likelihood = -3540.9946  
          Iteration 3:   log likelihood = -3540.9946  
          
          Probit regression                               Number of obs     =      5,634
                                                          LR chi2(6)        =     571.50
                                                          Prob > chi2       =     0.0000
          Log likelihood = -3540.9946                     Pseudo R2         =     0.0747
          
          ---------------------------------------------------------------------------------------
                           inlf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          ----------------------+----------------------------------------------------------------
                           educ |   .1086459   .0091411    11.89   0.000     .0907295    .1265622
                            age |  -.0213767   .0023138    -9.24   0.000    -.0259117   -.0168418
                         kidlt6 |  -.5283854   .0452607   -11.67   0.000    -.6170947   -.4396761
                       nwifeinc |   .0008796   .0031125     0.28   0.777    -.0052207    .0069799
                                |
          c.nwifeinc#c.nwifeinc |   -.000045   .0000179    -2.52   0.012      -.00008     -.00001
                                |
                             vh |   -.005753   .0026575    -2.16   0.030    -.0109616   -.0005444
                          _cons |  -.1474851   .1356416    -1.09   0.277    -.4133378    .1183676
          ---------------------------------------------------------------------------------------
          
          . margins, dydx(nwifeinc)
          
          Average marginal effects                        Number of obs     =      5,634
          Model VCE    : OIM
          
          Expression   : Pr(inlf), predict()
          dy/dx w.r.t. : nwifeinc
          
          ------------------------------------------------------------------------------
                       |            Delta-method
                       |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
              nwifeinc |  -.0006727   .0009544    -0.70   0.481    -.0025434    .0011979
          ------------------------------------------------------------------------------
          The t statistic on vh shows that the null hypothesis that nwifeinc (and therefore nwifeinc^2) is exogenous is rejected at the 5% level. Because vh then has to stay in the model, the other reported standard errors are not valid: they ignore the fact that vh was estimated in the first stage.
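
          One way the bootstrap adjustment could be set up is sketched below. It assumes the same data set is in memory; cf_probit and vh_cf are hypothetical names chosen for this example, and the number of replications and the seed are arbitrary. The key point is that both steps sit inside the program, so the first stage is re-estimated in every bootstrap sample.

          ```stata
          * Sketch only: bootstrap the two steps together (run from a do-file).
          * cf_probit and vh_cf are hypothetical names, not part of the original post.
          capture program drop cf_probit
          program define cf_probit
              * Remove the residual left over from a previous call
              capture drop vh_cf
              * Step 1: linear first stage for the endogenous regressor
              regress nwifeinc age educ kidlt6 huseduc
              predict double vh_cf, residuals
              * Step 2: probit with the quadratic term and the first-stage residual
              probit inlf educ age kidlt6 nwifeinc c.nwifeinc#c.nwifeinc vh_cf
          end

          * Resample observations and rerun both steps in each replication
          bootstrap b_inc=_b[nwifeinc] b_incsq=_b[c.nwifeinc#c.nwifeinc] b_vh=_b[vh_cf], ///
              reps(500) seed(12345): cf_probit
          ```

          The same pattern could in principle be used for the average marginal effect by computing it inside the program and passing it to bootstrap as an additional expression.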



          • #6
            Professor @Jeff Wooldridge Professor @Joro Kolev
            Thank you so much for the insights. It is much clearer for me now. Have a very nice week!
            Sadrac



            • #7
              Dear Prof. Wooldridge,

              My question is similar to Sadrac's question.

              My panel model is of the following form where x1 is the endogenous regressor and z is the moderator.


              y = f(x1, x1^2, x1z)
              The instrument for x1 is M (based on averages of similar observations, excluding the one under consideration). Following the paper by Haans et al. (2015) https://onlinelibrary.wiley.com/doi/....1002/smj.2399, I am instrumenting both the linear and quadratic terms with M and M^2.
              I am unsure how I should instrument the interaction term. Should the instrument be M*z, or should it be an average of x1z for similar firms, excluding the one under consideration?

              If it is M*z, do I then add M*z to the instruments for x1 and x1^2?

              Kindly advise. Thanks.

