Instrumenting for quadratic terms with categorical variables?

Marco D'Alterio

Join Date: Jul 2014

Posts: 11
#1

05 Aug 2014, 13:29

Dear Statalist users, I am trying to estimate two demand systems, namely the Almost Ideal Demand System (AID system) and its quadratic version, the QUAID system, on household budget data. More specifically, I am trying to derive demands for food categories. To deal with the endogeneity of total expenditure, and with no income data available, I had to make do with data on the employment status of the head of the family and the spouse, from which I derived dummies that I use as instrumental variables in place of income. Normally one uses income to address the endogeneity of total expenditure, which allows logincome and logincome^2 to serve as instruments for logexpenditure and logexpenditure^2. The point is that in the quadratic version I clearly cannot square the dummies, since there is no point in squaring zeros and ones, but using the same dummies to instrument both logexp and logexp^2 seems a bit weak to me. Indeed, the results from this second kind of analysis are less convincing. I was wondering whether there is, in general, a way to address such a problem. I would really appreciate any hints or suggestions.

Best regards,

Marco

Matthew J. Baker

Join Date: Mar 2014

Posts: 126
#2

06 Aug 2014, 06:06

Marco --

I'm not certain, but I think your problem is a case of the more general problem that occurs when one (nonlinear) model is nested in another. In your case, you are instrumenting for expenditures, which then enter your QUAID model nonlinearly. The paper to consult on this issue is Murphy and Topel, 1985, Journal of Business and Economic Statistics, "Estimation and Inference in Two-Step Econometric Models." There is also a Stata Journal piece by James Hardin that shows how the basic ideas work in Stata: Hardin, 2002, Stata Journal, "The Robust Variance Estimator for Two-Stage Models." Arne Risa Hole, 2006, Stata Journal, "Calculating Murphy–Topel variance estimates in Stata: A simplified procedure," is also a nice discussion of the general idea.

I might have this wrong, but I think in your setting the idea would be to estimate some instrumental model for your endogenous variable(s), transform this endogenous variable, use the transformation(s) in the second-stage model, and then adjust the covariance matrix of the second-stage model appropriately.

Hope that helps, and if I'm wrong about all this, I hope someone else chimes in!

Best,

Matt Baker

Jeff Wooldridge

Join Date: Apr 2014

Posts: 2199
#3

06 Aug 2014, 07:26

I wouldn't square fitted values and insert them in place of [log(expenditure)]^2; that's the so-called "forbidden regression." However, it is legitimate to use the squared fitted values as instruments (which is not the same thing). So:

1. Estimate a reduced form for lexpenditure using all exogenous variables, including the two omitted employment dummies. Obtain the fitted values, lexpenditureh. You might even include interactions of the dummies with other exogenous variables, if warranted.

2. gen lexpenditurehsq = lexpenditureh^2

3. Use lexpenditureh and lexpenditurehsq in your list of IVs, along with the other exogenous variables. If I were estimating by 2SLS it would be

ivregress 2sls y x1 ... xK (lexpenditure lexpendituresq = lexpenditureh lexpenditurehsq), robust

If the model is correctly specified then you don't have to adjust for the first-stage estimation of the IVs.
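
A minimal sketch of steps 1 to 3 on simulated data may make the mechanics concrete. Everything about the data is an assumption made for illustration: the dummies emp_head and emp_spouse, the covariate x1, the sample size, and all coefficients are hypothetical and do not come from the household survey discussed in this thread.

```stata
* Simulated data: all variable names and coefficients are hypothetical
clear
set seed 12345
set obs 5000
gen byte emp_head   = runiform() < 0.6      // employment dummy, head of family
gen byte emp_spouse = runiform() < 0.4      // employment dummy, spouse
gen double x1 = rnormal()                   // an exogenous covariate
gen double u  = rnormal()                   // unobservable that makes expenditure endogenous
gen double lexpenditure = 1 + 1.0*emp_head + 0.7*emp_spouse + 0.8*x1 + u + 0.5*rnormal()
gen double lexpendituresq = lexpenditure^2
gen double y = 0.2 + 0.10*lexpenditure - 0.02*lexpendituresq + 0.05*x1 + 0.5*u + rnormal()

* Step 1: reduced form on all exogenous variables, then fitted values
regress lexpenditure x1 emp_head emp_spouse
predict double lexpenditureh, xb

* Step 2: square the fitted values
gen double lexpenditurehsq = lexpenditureh^2

* Step 3: the fitted values and their square enter as instruments, never as regressors
ivregress 2sls y x1 (lexpenditure lexpendituresq = lexpenditureh lexpenditurehsq), vce(robust)

* First-stage diagnostics, to see how much the squared fitted value contributes
estat firststage
```

The point of the sketch is only the placement of lexpenditureh and lexpenditurehsq to the right of the equals sign; the endogenous lexpenditure and lexpendituresq stay in the equation as they are.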

I discuss these strategies in Chapters 8 and 9 of my 2010 MIT Press book.

You can also employ a control function approach, which would be to get the residuals, say vh, from the reduced form for lexpenditure. Then add vh and possibly functions, such as the square, to the share equations and estimate by usual methods for exogenous variables. I believe Richard Blundell and coauthors have a paper on this. Sorry, don't have time to look it up. JW
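
The control function idea can be sketched in the same way. This is an illustration on simulated data, not a recommended specification: the variable names other than vh, the data-generating process, and the choice to add only vh and its square are all assumptions.

```stata
* Simulated data: all variable names and coefficients are hypothetical
clear
set seed 2014
set obs 5000
gen byte emp_head   = runiform() < 0.6
gen byte emp_spouse = runiform() < 0.4
gen double x1 = rnormal()
gen double u  = rnormal()
gen double lexpenditure = 1 + 1.0*emp_head + 0.7*emp_spouse + 0.8*x1 + u + 0.5*rnormal()
gen double lexpendituresq = lexpenditure^2
gen double y = 0.2 + 0.10*lexpenditure - 0.02*lexpendituresq + 0.05*x1 + 0.5*u + rnormal()

* Reduced form for lexpenditure; keep the residuals vh
regress lexpenditure x1 emp_head emp_spouse
predict double vh, residuals
gen double vhsq = vh^2

* Add vh (and here its square) to the equation and estimate as if regressors were exogenous
regress y lexpenditure lexpendituresq x1 vh vhsq, vce(robust)

* Joint test on the control function terms; under the null, lexpenditure is exogenous
test vh vhsq
```

One caveat on inference: if the control function terms matter, the standard errors reported by the second regression ignore that vh was estimated. Bootstrapping both steps together is one common way to handle that.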
Marco D'Alterio

Join Date: Jul 2014

Posts: 11
#4

09 Aug 2014, 18:53

I must admit that on this forum I'm starting to feel like a dwarf among giants. The expertise of the respondents to my posts keeps increasing, though I suspect that, unless some psychotic identity usurper (remarkably well versed in econometrics, though) is hiding behind the profile of Prof. Wooldridge (!), I have now hit the upper bound. Apologies for breaking etiquette, but I cannot conceal a certain sense of astonishment.
Back to the proper subject of this post: first of all, many thanks to Prof. Baker for showing an interest in my question and for bringing to my attention the idea of two-step models and the Murphy-Topel variance estimator. However, due to time constraints, I'm afraid I will not be able to explore its implications for the kind of analysis I am conducting.

On the other hand, I found Prof. Wooldridge's suggested strategy directly implementable in my model. Therefore, my deep gratitude and, needless to say, my (humble) admiration to him. I am trying to estimate a food demand system by fitting AID and QUAID systems to data from Household Budget Surveys for the period 1997-2010, using 3SLS. Before, I was instrumenting for total expenditure with dummies for employment status (in the absence of data on income), but this meant that the estimates of the AID and QUAID systems were hardly distinguishable, since I was using the same set of instruments for both. Instead, when I use the squared fitted values from the reduced form equation for logexp as an additional instrument, the QUAID estimates change dramatically and the budget elasticities exhibit a much more plausible pattern! However, there are severe problems with a couple of equations in the system, which predict thousands of negative budget shares and show no significance for the linear term in logexpenditure (while the quadratic term is always significant). I have to pin down the source of these distortions (maybe heteroskedasticity, or endogeneity of the implicit prices that I have constructed, or something wrong in the use of the set of demographic dummies...), unless the procedure I followed is not "safe" for 3SLS. So, a further question arises:

Is there any prescription against using Prof. Wooldridge's strategy with 3SLS? Considering the difference between 2SLS (for which the procedure works perfectly) and 3SLS, I can't see any theoretical obstacle (if my interpretation is correct, 3SLS is 2SLS that additionally takes into account the contemporaneous correlation between the errors of the various equations in the system), but I could well be wrong. I would appreciate further suggestions, since this is the bulk of my analysis!

Many thanks again to my respondents.

Best regards,

Marco

Jeff Wooldridge

Join Date: Apr 2014

Posts: 2199
#5

12 Aug 2014, 08:35

Marco: Thanks for your kind words (even if they are a bit over the top :-)). As you suspect, there is no problem implementing this strategy with 3SLS. The key is to list the fitted values, and their squares, as IVs, rather than inserting them as regressors. You may then use them as IVs in whatever estimation method you like.
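
As a rough sketch of what this could look like with Stata's reg3, here is a two-equation share system on simulated data. All of it is assumed for illustration: the share variables w1 and w2, the log prices lp1 and lp2, the employment dummies, and the coefficients are hypothetical, the equations are a linear stand-in rather than the full nonlinear QUAID system, and a third share equation is taken to have been dropped for adding-up.

```stata
* Simulated data: all variable names and coefficients are hypothetical
clear
set seed 97
set obs 5000
gen byte emp_head   = runiform() < 0.6
gen byte emp_spouse = runiform() < 0.4
gen double lp1 = rnormal()                  // log price, good 1
gen double lp2 = rnormal()                  // log price, good 2
gen double u   = rnormal()                  // shared unobservable: endogeneity and cross-equation correlation
gen double lexpenditure = 1 + 1.0*emp_head + 0.7*emp_spouse + 0.5*lp1 + 0.5*lp2 + u + 0.5*rnormal()
gen double lexpendituresq = lexpenditure^2
gen double w1 = 0.30 + 0.04*lexpenditure - 0.010*lexpendituresq + 0.02*lp1 - 0.02*lp2 + 0.03*u + 0.02*rnormal()
gen double w2 = 0.25 - 0.02*lexpenditure + 0.005*lexpendituresq - 0.01*lp1 + 0.03*lp2 - 0.02*u + 0.02*rnormal()

* Reduced form, fitted values, and their square
regress lexpenditure lp1 lp2 emp_head emp_spouse
predict double lexpenditureh, xb
gen double lexpenditurehsq = lexpenditureh^2

* 3SLS: lexpenditure and its square are declared endogenous;
* the fitted values and their square are added to the instrument list via exog()
reg3 (w1 lexpenditure lexpendituresq lp1 lp2) ///
     (w2 lexpenditure lexpendituresq lp1 lp2), ///
     endog(lexpenditure lexpendituresq) exog(lexpenditureh lexpenditurehsq)
```

In this sketch the employment dummies act only through the fitted values, mirroring the ivregress command earlier in the thread; whether to also list them in exog() is a modelling choice the sketch does not settle.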
Marco D'Alterio

Join Date: Jul 2014

Posts: 11
#6

15 Aug 2014, 12:55

Needless to say, the strategy worked. Thanks again, Prof. Wooldridge, for your help and your modesty.

Idit Raz-Kalisher

Join Date: Dec 2016

Posts: 8
#7

01 Jan 2018, 10:14

Hello,
I'm struggling with a similar problem; however, my second stage has a multinomial dependent variable.
Could the procedure presented here also apply in my case?
Best,
Idit.