Instrumental variables regression with proportional (rate) dependent variable

Austin L Wright

Join Date: Jun 2015

Posts: 2
#1

Instrumental variables regression with proportional (rate) dependent variable

25 Jun 2015, 12:01

I am looking for theoretical and practical advice on how to model a particular type of IV regression. The classical case of 2SLS is estimation where y is continuous. 2SLS is inefficient when y is binary or a count variable. Recent applied work has highlighted the value of two stage residual inclusion when y (or x) is binary. In the design of interest, the dependent variable is continuous but bounded from 0 to 1 (it is a proportion). My understanding is that the most appropriate model for rate variables is a GLM with a link function like logit (I've used this approach in a previous, unrelated project). I have not seen documentation of a 2SGLM, however.

Does anyone have advice on how to implement an IV design with a rate outcome variable and continuous X and Z variables?
Richard Williams

Join Date: Apr 2014

Posts: 4983
#2

25 Jun 2015, 12:33

If your dependent variable was a 0/1 dichotomy, would ivprobit do what you want? If so, Wooldridge has noted that ivprobit could be easily modified to work with fractional response variables (see especially slide 17):

http://www.stata.com/meeting/chicago...wooldridge.pdf

Some of the things that Wooldridge was calling for in that presentation were implemented in Stata 14 with the -fracreg- command. But, as far as I know, there is no official Stata command for fractional ivprobit. If you are interested, I do have a fracivp command, in which I hacked ivprobit to work with fractional variables, but use it at your own risk.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam
Austin L Wright

Join Date: Jun 2015

Posts: 2
#3

25 Jun 2015, 14:11

Thanks Richard. This sounds very promising. The description in the slides you posted seems to indicate something along the lines of a residual inclusion model for the second stage. Would you mind sharing your code? I'd really appreciate it.

To complicate the model slightly, my design includes two endogenous regressors as well as unit and time fixed effects.

Last edited by Austin L Wright; 25 Jun 2015, 14:14.
Donovan Pollack

Join Date: Jun 2017

Posts: 44
#4

20 Sep 2021, 08:27

Dear Richard,

Other than using the residuals from a first stage as an instrument, is there any other adjustment that the ivprobit command would need? Do you know of any papers that discuss the econometrics of, or justification for, using a probit model for a fractional dependent variable with these residuals?

thank you!
D
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2153
#5

20 Sep 2021, 10:16

I discuss this method in Section 18.6 of my 2010 MIT Press book "Econometric Analysis of Cross Section and Panel Data." It is also a special case of Papke and Wooldridge (2008, Journal of Econometrics). A two-step method is very easy provided your endogenous explanatory variable is (roughly) continuous. You can use glm or fracreg probit.

Code:

reg w x1 ... xk z1 ... zm
predict vhat, resid
glm y w vhat x1 ... xk, fam(bin) link(probit) vce(robust)

The standard errors from the glm are not correct if you decide to leave vhat in the equation. You should bootstrap the two estimation steps to get proper standard errors. As Richard pointed out several years ago, you can use ivprobit in a single step if you override the check that y is binary. It's the same problem then. I show that this is consistent in my 2014 Journal of Econometrics paper on quasi-MLE with endogenous explanatory variables.
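
The two-step procedure and the bootstrap described above can be sketched as follows. This is only an illustration under stated assumptions: the variable names (y, w, x1, x2, z1, z2), the program name cf_twostep, the number of replications, and the seed are all hypothetical and do not come from the post.

```stata
* Hypothetical names: y (fractional outcome), w (endogenous regressor),
* x1 x2 (exogenous controls), z1 z2 (excluded instruments).
capture program drop cf_twostep
program define cf_twostep
    version 14
    * Step 1: first-stage OLS; vhat is the residual (the control function)
    capture drop vhat
    regress w x1 x2 z1 z2
    predict double vhat, residuals
    * Step 2: fractional probit by quasi-MLE with vhat as an extra regressor
    glm y w vhat x1 x2, family(binomial) link(probit) vce(robust)
end

* Resample observations and rerun BOTH steps in every replication
bootstrap _b, reps(500) seed(12345): cf_twostep
```

Because the first stage is re-estimated inside each replication, the bootstrap standard errors reflect the sampling variation in vhat. With clustered or panel data, bootstrap's cluster() option could be added so that whole clusters are resampled; whether that suits a given design is a judgment call.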
Donovan Pollack

Join Date: Jun 2017

Posts: 44
#6

20 Sep 2021, 11:03

Thank you very much. One thing that I am not super clear about is how to include a second endogenous variable in the control function. If x and x^2 are endogenous, I want to have two instruments for my two endogenous variables: vhat and vhat^2. But wouldn't including both in the glm equation be problematic, since they are correlated?

thank you very much!

D
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#7

23 Sep 2021, 05:11

I think you did not understand the procedure at all.

Residual inclusion (the control function approach) is conceptually different from instrumental variables, and it does not work like IV.

Professor Wooldridge literally told you how you should do it in #5 above.

Here is one more thread that you might find useful: https://www.statalist.org/forums/for...ch-with-probit
and one more thread that you might find useful: https://www.statalist.org/forums/for...quadratic-term

The second thread above explicitly speaks of dealing with a quadratic term.

Donovan Pollack

Join Date: Jun 2017

Posts: 44
#8

23 Sep 2021, 07:09

Dear Joro,

I understand they are different but Prof Wooldridge explained that the ivprobit command, which takes instruments, can do this in one step. If I have x and x^2, it is not clear how to handle this with the ivprobit command. Does this make sense?

thank you,
D
Donovan Pollack

Join Date: Jun 2017

Posts: 44
#9

23 Sep 2021, 07:29

Just to clarify/contrast these procedures:

(1) glm
glm y x c.x#c.x vhat $controls, fam(bin) link(logit) vce(robust)
then bootstrap

vs
(2) ivprobit (with x2 a generated variable equal to x^2)
ivprobit y $controls (x x2 = iv)        but this is not identified
ivprobit y $controls (x x2 = iv iv2)

Is the glm approach correct with two endogenous regressors? Or is ivprobit the better choice?

thanks
best,
D
Donovan Pollack

Join Date: Jun 2017

Posts: 44
#10

23 Sep 2021, 07:44

I have seen some people on here suggest, incorrectly I believe, that the ivprobit command should be: ivprobit y $controls (x x^2 = vhat vhat2). I thought this was not how the twostep option works: we should use the instruments, not the residuals themselves, in this command, right?
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2153
#11

23 Sep 2021, 19:40

Donovan: You can trick Stata into doing this by specifying x as endogenous and x^2 as an exogenous variable. Once the control function has been included for x all functions of x are exogenous.

While you don't need to include vhat^2 in the two-step control function procedure, you can use it to make the functional form more flexible. I recommend this sort of flexibility in my 2015 Journal of Human Resources paper.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2153
#12

23 Sep 2021, 19:44

The following all make sense (and you have to bootstrap the two-step procedures). So vce(robust) is only used for testing the coefficient on vhat or on both vhat and vhat^2. For ivprobit, you have to override the data check or it will make y binary.

Code:

glm y x c.x#c.x vhat $controls, fam(bin) link(logit) vce(robust)
glm y x c.x#c.x vhat c.vhat#c.vhat $controls, fam(bin) link(logit) vce(robust)
ivprobit y $controls c.x#c.x (x = iv)
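
As a rough sketch of how the robust test on the control-function terms mentioned above might be carried out after the second glm specification: the code assumes vhat already holds the first-stage residual of x and that $controls is a global macro listing the exogenous controls. The margins call is an addition for illustration, not something suggested in the post.

```stata
* Assumes vhat = first-stage residual of x, and that $controls is already defined
glm y x c.x#c.x vhat c.vhat#c.vhat $controls, fam(bin) link(logit) vce(robust)

* Robust Wald test that both control-function terms are zero (H0: x is exogenous)
testparm vhat c.vhat#c.vhat

* Average partial effect of x; margins handles c.x#c.x and averages over the sample
margins, dydx(x)
```

The standard errors that margins reports here ignore the first-stage estimation of vhat, just like the glm standard errors, so for inference on the partial effects the whole sequence would need to go inside the bootstrap.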
Lukas Lang

Join Date: Dec 2016

Posts: 42
#14

08 Jun 2022, 17:48

Dear Jeff Wooldridge, I just tried your suggestions in #12 and obtained very different results across the three models.

The key difference between the three models that you suggest is that with the ivprobit command

Code:

ivprobit y $controls c.x#c.x (x = iv)

the term c.x#c.x is included in the first stage.

Would this actually be the right thing to do?

Last edited by Lukas Lang; 08 Jun 2022, 17:51. Reason: correcting typos

------
I use Stata 17

Lyle DA

Join Date: May 2023
Posts: 2

#15

08 Jun 2023, 08:08

Hello Jeff Wooldridge ,

I need your help with the following.

I am running an ivprobit command for a continuous endogenous variable (D) that has a curvilinear effect. I instrumented both the linear and quadratic terms of D with two exogenous variables, BSize (continuous) and FC (binary).

gen D2= D^2

ivprobit SOF ControlVars (D D2= BSize FC) i.Year i.industry, vce(cluster Tickernum)

A summary of the long output is presented below:

                              1st stage                 2nd stage
                              D            D2           SOF
BSize                         0.278*       3.262
                              (0.161)      (8.110)
FC                            6.750***     352.924***
                              (1.463)      (88.784)
D                                                       0.186**
                                                        (0.075)
D2                                                      -0.004**
                                                        (0.002)
Constant                      6.109**      53.321       -6.076***
                              (2.542)      (129.891)    (0.933)
Year controls                 Yes          Yes          Yes
Industry controls             Yes          Yes          Yes
Control Vars                  Yes          Yes          Yes
/athrho2_1: corr(e.D2,e.D)                              -0.023
                                                        (0.370)
/athrho3_1: corr(e.D,e.SOF)                             0.185
                                                        (0.394)
/athrho3_2: corr(e.D2,e.SOF)                            1.740***
                                                        (0.034)
/lnsigma2: SD(D)                                        2.159***
                                                        (0.026)
/lnsigma3: SD(D2)                                       6.129***
                                                        (0.041)
# Observations                2124         2124         2124
Wald χ2                                                 269.48***
Wald test of exogeneity                                 χ2(1): 6.88; Prob > χ2 = 0.0320

I estimated the marginal effects using this command:

margins, dydx(D D2) atmeans

The reported marginal effects for D and D2 are 0.186** and -0.004**, respectively.

But when I tried to verify the inverted U-shape using utest, Stata reports "D not found", r(111).

My question is: can I use utest after the ivprobit command? If not, how can I verify the inverted U-shape here?

Kind Regards

Lyle
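
For reference, here is a hedged sketch of how the suggestion in #11 (treat only the linear term as endogenous and enter the square as a function of it) might look with the variable names from this post. It is not a confirmed answer to the utest question. The nlcom line is an illustrative addition that computes the implied turning point of the probit index; it does not replace a formal U-shape test.

```stata
* Sketch only: D is the single endogenous regressor, its square enters as c.D#c.D
ivprobit SOF ControlVars c.D#c.D i.Year i.industry (D = BSize FC), vce(cluster Tickernum)

* Implied turning point of the index, -b1/(2*b2), with a delta-method standard error.
* Unqualified _b[] refers to the first (main) equation.
nlcom (turning_point: -_b[D] / (2 * _b[c.D#c.D]))
```

An inverted U within the data would require a positive coefficient on D, a negative one on c.D#c.D, and a turning point inside the observed range of D (which summarize D would show). Note that #14 above raises an unanswered question about c.x#c.x entering the first stage under this syntax, so results from this specification should be read with that caveat in mind.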
