Instrumental Variable and Quadratic Term

Sadrac Cenophat

Join Date: Feb 2019

Posts: 7
#1


08 Sep 2020, 09:18

Dear All,
I hope this post finds you well. I would like to test for endogeneity using the command ivprobit. My independent variables are Competition and Competition_square. The dependent variable is Innovation, and the instrumental variable is Politic. My code is as follows:

Code:

ivprobit Innovation (Competition Competition_square = Politic Politic), twostep asis

Although this code executed successfully (with a warning: Politic omitted because of collinearity), I am not sure whether this is the right syntax for the quadratic instrumented variable, and I could not find it in the Stata help file.
Could you please help me?
Many thanks for your support!
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2175
#2

08 Sep 2020, 20:44

I don’t see how that could’ve executed because you’re short an instrument. Is Politic a binary variable?

In any case, you should use a two-step control function method. Then one IV is enough. The method you’re trying to use isn’t consistent even if you have a second IV.
Sadrac Cenophat

Join Date: Feb 2019

Posts: 7
#3

11 Sep 2020, 05:25

Many thanks for the reply; I highly appreciate it. Politic is a continuous variable. I will use a two-step control function method as you suggested. Thank you very much!

Joro Kolev

Join Date: Aug 2018
Posts: 3050

12 Sep 2020, 02:49

Professor Jeff Wooldridge, there might be a misunderstanding arising from the misleading name that Stata has given to -ivprobit-.

-ivprobit- is not implementing IV probit; -ivprobit, twostep- is implementing exactly the control function method that you are proposing.

Sadrac Cenophat, although you do not need a second instrument for the control function approach you are using, you might still choose, for symmetry, to use Politic and Politic_squared as instruments when instrumenting Competition and Competition_square.

The command dropped Politic because you listed the variable twice in the instrument list. Look here: the first time nothing is dropped; the second time, when I list the instrument twice, it is dropped:

Code:

.  webuse laborsup

. ivprobit fem_work fem_educ kids (other_inc = male_educ), twostep
Checking reduced-form model...

Two-step probit with endogenous regressors        Number of obs   =        500
                                                  Wald chi2(3)    =      93.97
                                                  Prob > chi2     =     0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   other_inc |   -.058473   .0093364    -6.26   0.000    -.0767719    -.040174
    fem_educ |    .227437   .0281628     8.08   0.000     .1722389     .282635
        kids |  -.1961748   .0496323    -3.95   0.000    -.2934522   -.0988973
       _cons |   .3956061   .4982649     0.79   0.427    -.5809752    1.372187
------------------------------------------------------------------------------
Instrumented:  other_inc
Instruments:   fem_educ kids male_educ
------------------------------------------------------------------------------
Wald test of exogeneity: chi2(1) = 6.50                   Prob > chi2 = 0.0108

. ivprobit fem_work fem_educ kids (other_inc = male_educ male_educ), twostep
Checking reduced-form model...
note: male_educ omitted because of collinearity

Two-step probit with endogenous regressors        Number of obs   =        500
                                                  Wald chi2(3)    =      93.97
                                                  Prob > chi2     =     0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   other_inc |   -.058473   .0093364    -6.26   0.000    -.0767719    -.040174
    fem_educ |    .227437   .0281628     8.08   0.000     .1722389     .282635
        kids |  -.1961748   .0496323    -3.95   0.000    -.2934522   -.0988973
       _cons |   .3956061   .4982649     0.79   0.427    -.5809752    1.372187
------------------------------------------------------------------------------
Instrumented:  other_inc
Instruments:   fem_educ kids male_educ
------------------------------------------------------------------------------
Wald test of exogeneity: chi2(1) = 6.50                   Prob > chi2 = 0.0108


Jeff Wooldridge

Join Date: Apr 2014
Posts: 2175

13 Sep 2020, 08:45

Joro: Even so, the two-step method will result in perfect collinearity if there is only one IV in the way Sadrac proposed. There is a way to do the CF method with only one IV, but I think that has to be done "by hand."

The general point is that, regardless of what w1 and w2 are -- even if one is a function of the other -- we would need to specify two IVs. I'm definitely not recommending it in Sadrac's case. I'll show below what I think should be done. Here is the generic command that leads to lack of identification:

Code:

ivprobit y x1 ... xK (w1 w2 = z), twostep

In the first stage, w1 and w2 are each regressed on x1 ... xK and z, and then the residuals are obtained -- say, v1h and v2h. In the second stage probit is used. But with only one z, (x1 ... xK w1 w2 v1h v2h) are perfectly collinear -- and so the command should not work.
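The collinearity described above can be written out in a few lines. This is only a sketch in notation I am assuming for illustration: x is the row vector (1, x1, ..., xK), and the hatted pi's and gamma's are the first-stage OLS coefficients on x and z.

```latex
\begin{align*}
\hat v_1 &= w_1 - x\hat\pi_1 - \hat\gamma_1 z, \\
\hat v_2 &= w_2 - x\hat\pi_2 - \hat\gamma_2 z, \\
\hat\gamma_2 \hat v_1 - \hat\gamma_1 \hat v_2
  &= \hat\gamma_2 w_1 - \hat\gamma_1 w_2
     - x\,(\hat\gamma_2 \hat\pi_1 - \hat\gamma_1 \hat\pi_2).
\end{align*}
```

The single instrument z cancels in the last line, so a linear combination of v1h and v2h is an exact linear function of x, w1 and w2. That is the perfect collinearity in the second-stage probit.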

Here is verification using a data set similar to the one you used:

Code:

gen nwifeincsq = nwifeinc^2
ivprobit inlf educ age exper kidslt6 (nwifeinc nwifeincsq = huseduc), twostep
Checking reduced-form model...
equation not identified; must have at least as many instruments 
not in the regression as there are instrumented variables
r(481);

Even if we had an extra IV for nwifeincsq, the above would make no sense. As I said, if nwifeinc follows a linear first stage with an additive error independent of the exogenous variables then nwifeincsq definitely does not. And forget about normality, as the error would include z*v and v^2.
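To see where the z*v and v^2 terms come from, here is a short derivation under assumptions I am adding for illustration: w stands for nwifeinc, x for the exogenous variables including a constant, and v is independent of (x, z) with mean zero and variance sigma_v^2.

```latex
\begin{align*}
w &= x\pi + \gamma z + v, \\
w^2 &= (x\pi + \gamma z)^2 + 2\,(x\pi + \gamma z)\,v + v^2, \\
E(w^2 \mid x, z) &= (x\pi + \gamma z)^2 + \sigma_v^2, \\
w^2 - E(w^2 \mid x, z) &= 2\,(x\pi)\,v + 2\gamma\, z v + (v^2 - \sigma_v^2).
\end{align*}
```

So the conditional mean of w^2 is quadratic rather than linear in (x, z), and its error depends on x and z through the product terms and contains v^2. It is therefore neither independent of the exogenous variables nor normal, even when v itself is.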

From my 2015 Journal of Human Resources paper on control function methods, here is what will work. But one needs to adjust the standard errors for the two-step estimation. I suggested the bootstrap.

Code:

. reg nwifeinc age educ kidlt6 huseduc

      Source |       SS           df       MS      Number of obs   =     5,634
-------------+----------------------------------   F(4, 5629)      =    205.98
       Model |  532573.708         4  133143.427   Prob > F        =    0.0000
    Residual |  3638495.37     5,629  646.383971   R-squared       =    0.1277
-------------+----------------------------------   Adj R-squared   =    0.1271
       Total |  4171069.08     5,633  740.470279   Root MSE        =    25.424

------------------------------------------------------------------------------
    nwifeinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .4026188   .0391036    10.30   0.000     .3259608    .4792769
        educ |    .321613   .1620976     1.98   0.047     .0038392    .6393868
      kidlt6 |   .5574093   .8699482     0.64   0.522    -1.148025    2.262843
     huseduc |    2.83534   .1423423    19.92   0.000     2.556295    3.114386
       _cons |  -27.21475   2.496775   -10.90   0.000    -32.10939   -22.32011
------------------------------------------------------------------------------

. predict vh, resid

. probit inlf educ age kidlt6 nwifeinc c.nwifeinc#c.nwifeinc vh

Iteration 0:   log likelihood =  -3826.743  
Iteration 1:   log likelihood = -3541.1662  
Iteration 2:   log likelihood = -3540.9946  
Iteration 3:   log likelihood = -3540.9946  

Probit regression                               Number of obs     =      5,634
                                                LR chi2(6)        =     571.50
                                                Prob > chi2       =     0.0000
Log likelihood = -3540.9946                     Pseudo R2         =     0.0747

---------------------------------------------------------------------------------------
                 inlf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------------+----------------------------------------------------------------
                 educ |   .1086459   .0091411    11.89   0.000     .0907295    .1265622
                  age |  -.0213767   .0023138    -9.24   0.000    -.0259117   -.0168418
               kidlt6 |  -.5283854   .0452607   -11.67   0.000    -.6170947   -.4396761
             nwifeinc |   .0008796   .0031125     0.28   0.777    -.0052207    .0069799
                      |
c.nwifeinc#c.nwifeinc |   -.000045   .0000179    -2.52   0.012      -.00008     -.00001
                      |
                   vh |   -.005753   .0026575    -2.16   0.030    -.0109616   -.0005444
                _cons |  -.1474851   .1356416    -1.09   0.277    -.4133378    .1183676
---------------------------------------------------------------------------------------

. margins, dydx(nwifeinc)

Average marginal effects                        Number of obs     =      5,634
Model VCE    : OIM

Expression   : Pr(inlf), predict()
dy/dx w.r.t. : nwifeinc

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    nwifeinc |  -.0006727   .0009544    -0.70   0.481    -.0025434    .0011979
------------------------------------------------------------------------------

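The t statistic on vh (-2.16, p-value 0.030) shows that the null hypothesis that nwifeinc (and therefore nwifeinc^2) is exogenous is rejected at the 5% level. Because exogeneity is rejected, vh stays in the model, and the other reported standard errors are then not valid: they ignore the first-stage estimation of vh.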
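The bootstrap adjustment mentioned above could be sketched roughly as follows. This is a minimal sketch, not the code from the paper. Since the data set used in this post is not named, it uses the laborsup data from the earlier reply as a stand-in. The program name cf_probit, the number of replications and the seed are arbitrary choices of mine.

```stata
* Stand-in data (assumption): laborsup from the earlier reply in this thread
webuse laborsup, clear

capture program drop cf_probit
program define cf_probit, eclass
    * Step 1: linear first stage for the endogenous variable; keep residuals
    capture drop vh
    regress other_inc fem_educ kids male_educ
    predict double vh, residuals
    * Step 2: probit with the quadratic term and the control function vh
    probit fem_work fem_educ kids c.other_inc##c.other_inc vh
end

* Resample observations and rerun BOTH steps in every replication,
* so the standard errors reflect the first-stage estimation of vh
bootstrap _b, reps(200) seed(12345): cf_probit
```

Wrapping both steps in one program is the point of the design: bootstrapping only the probit would treat vh as data and reproduce the invalid standard errors.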


Sadrac Cenophat

Join Date: Feb 2019

Posts: 7
#6

14 Sep 2020, 08:20

Professor @Jeff Wooldridge, Professor @Joro Kolev,
Thank you so much for the insights. It is much clearer to me now. Have a very nice week!
Sadrac

Nitin Jain

Join Date: Apr 2022
Posts: 65

10 Aug 2022, 08:28

Dear Prof. Wooldridge,

My question is similar to Sadrac's question.

My panel model has the following form, where x1 is the endogenous regressor and z is the moderator:

y = f(x1, x1^2, x1*z)

The instrument for x1 is M (based on averages of similar observations, excluding the one under consideration). Following the paper by Haans et al. (2015) https://onlinelibrary.wiley.com/doi/....1002/smj.2399, I am instrumenting the linear and quadratic terms with M and M^2. I am unsure how to instrument the interaction term. Should the instrument be M*z, or should it be an average of x1*z over similar firms, excluding the one under consideration?

If it is M*z, do I then add M*z to the instruments for x1 and x1^2?

Kindly advise. Thanks.
