Instrumental variables with many missing values in the endogenous regressor

Simon Heß

08 Apr 2019, 08:49

Hi,

I have a data set with 10k observations of Y and an endogenous regressor X that has many missing observations (90%). I also have an instrument Z with no missing values. I know that the values of X are missing at random.

I think the naive approach would be to run

Code:

ivregress 2sls Y (X=Z)

which does not seem optimal to me: ivregress drops every observation with a missing X, so it ignores the information on Y and Z in all of those observations. And since X is missing at random, I can assume that the relationship between X and Z is the same among those observations.

Ideally, I would run the first stage on the subsample with non-missing X and the second stage on the full sample, which should give me more power.

How is this possible in Stata?
Are there issues I neglect?
Are there papers about this?
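The estimator described above can be written compactly as follows. This is a sketch; the notation ($S$, $\pi$, $\gamma$, $\alpha$, $\beta$) is introduced here for illustration and does not come from the thread. It is closely related to what is usually called two-sample 2SLS, except that here the first-stage sample is a subset of the second-stage sample. Written as stacked moment conditions, it is a just-identified (G)MM problem:

```latex
% S = observations with X observed; n = full sample size
\text{Stage 1 (on } S\text{):}\quad
(\hat\pi_0,\hat\pi_1) = \arg\min_{\pi_0,\pi_1} \sum_{i\in S} (X_i - \pi_0 - \pi_1 Z_i)^2,
\qquad \hat X_i = \hat\pi_0 + \hat\pi_1 Z_i \quad (i = 1,\dots,n)

\text{Stage 2 (all } n\text{):}\quad
(\hat\alpha,\hat\beta) = \arg\min_{\alpha,\beta} \sum_{i=1}^{n} (Y_i - \alpha - \beta \hat X_i)^2

\text{Stacked moments:}\quad
\frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{i\in S\}\begin{pmatrix}1\\ Z_i\end{pmatrix}(X_i - \pi_0 - \pi_1 Z_i) = 0,
\qquad
\frac{1}{n}\sum_{i=1}^{n} \begin{pmatrix}1\\ Z_i\end{pmatrix}\bigl(Y_i - \alpha - \beta(\pi_0 + \pi_1 Z_i)\bigr) = 0

\text{Single instrument:}\quad
\hat\beta = \frac{\hat\gamma_1}{\hat\pi_1},
\qquad \hat\gamma_1 = \text{OLS slope of } Y \text{ on } Z \text{ over all } n \text{ observations}
```

Four equations in four unknowns, so (assuming $\hat\pi_1 \neq 0$) the solution of the stacked moments is exactly the two-step estimate, and a sandwich variance for the whole system automatically carries the first-stage uncertainty into the standard error of $\hat\beta$.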

PS: I am cross-posting a similar question here: https://stats.stackexchange.com/ques...ous-regressors

Last edited by Simon Heß; 08 Apr 2019, 08:53.
Phil Bromiley

09 Apr 2019, 11:02

You'll increase your chances of a useful answer by following the FAQ on asking questions: provide Stata code in code delimiters, readable Stata output, and sample data using dataex.

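While you can estimate the first-stage (instrument) equation outside of ivregress and then use the predicted values in a regress, this creates problems with the standard errors, because the second regress treats the predicted values as if they were observed data and ignores the sampling error from the first stage (this has been discussed on this list). But I'm not sure whether this approach is legitimate or not.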
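A minimal sketch of the manual two-step described here, on hypothetical simulated data. The data-generating process, the seed and the variable names u and Xhat are illustrative assumptions, not from the thread. The coefficient on Xhat is the point estimate of interest; the standard error printed by the second regress is not valid, for the reason just given.

```stata
* Hypothetical simulated data: true effect of X on Y is 1
clear
set seed 12345
set obs 10000
generate double Z = rnormal()
generate double u = rnormal()            // unobserved confounder
generate double X = 0.5*Z + u + rnormal()
generate double Y = X + u + rnormal()
replace X = . if runiform() < 0.9        // about 90% of X missing at random

* Stage 1: first stage; rows with missing X are dropped automatically
regress X Z
* Fitted values are computed for ALL rows, since Z is never missing
predict double Xhat, xb

* Stage 2: second stage on the full sample
* (point estimate usable; reported SEs ignore first-stage uncertainty)
regress Y Xhat
```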
Simon Heß

10 Apr 2019, 03:54

Sorry, I don't see where I failed to follow the FAQ on asking questions. Could you elaborate on what I did (or did not) do?

I am aware that I could run the two stages separately, and also that I would then need to correct my standard errors. Assuming that there's nothing else wrong with my approach, my question is how to do this in Stata (i.e., how to estimate it with correct standard errors). In particular, I was hoping there would be a smart way to cast this into a (G)MM framework and let Stata work out the right standard errors without much manual work. And of course my other sub-question (is there previous work, examples, or papers on this?) remains; I couldn't find any.
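One standard-asymptotics route, sketched under the assumption of a single instrument: with one instrument the two-step estimate equals the ratio of the full-sample reduced-form slope (Y on Z) to the subsample first-stage slope (X on Z). `suest` combines the two regressions, even though they use different (overlapping) samples, into one robust covariance matrix, and `nlcom` then gives a delta-method standard error for the ratio that includes the first-stage uncertainty and the covariance between the two regressions. The simulated data and the stored-estimate names rf and fs are hypothetical.

```stata
* Hypothetical simulated data (true beta = 1), ~90% of X missing at random
clear
set seed 12345
set obs 10000
generate double Z = rnormal()
generate double u = rnormal()            // unobserved confounder
generate double X = 0.5*Z + u + rnormal()
generate double Y = X + u + rnormal()
replace X = . if runiform() < 0.9

* Reduced form on the full sample
regress Y Z
estimates store rf
* First stage on the subsample with non-missing X (missing X drop out)
regress X Z
estimates store fs

* Stack both models; suest reports robust SEs including cross-model covariances
suest rf fs
* Ratio (indirect least squares) estimate with delta-method standard error
nlcom (beta: _b[rf_mean:Z] / _b[fs_mean:Z])
```

Fit the models without vce(robust); suest computes the sandwich covariance itself. With several instruments the ratio form no longer applies, and the stacked moments would have to be written out for `gmm`, whose `nocommonesample` option is intended for equations estimated on different samples (check `help gmm` for your version).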

In the meantime I got a fairly useful answer on Stack Exchange (https://stats.stackexchange.com/ques.../401934#401934), which makes me confident that, at least conceptually, I am on the right track. Of course, I am also aware that I could simply bootstrap the SEs, but I'd rather use standard asymptotics.
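For comparison, a sketch of the bootstrap alternative mentioned here. The program name ts2sls, the simulated data, the 500 replications and the seed are all illustrative assumptions. Each replication resamples complete rows (including rows with missing X) and re-runs both stages, so the first-stage uncertainty is reflected in the bootstrap standard error.

```stata
* Hypothetical simulated data (true beta = 1), ~90% of X missing at random
clear
set seed 12345
set obs 10000
generate double Z = rnormal()
generate double u = rnormal()            // unobserved confounder
generate double X = 0.5*Z + u + rnormal()
generate double Y = X + u + rnormal()
replace X = . if runiform() < 0.9

* Hypothetical helper: both stages in one program, returning the slope
capture program drop ts2sls
program define ts2sls, rclass
    tempvar xhat
    quietly regress X Z                  // first stage: missing-X rows drop out
    quietly predict double `xhat', xb    // fitted values for every row
    quietly regress Y `xhat'             // second stage: full sample
    return scalar beta = _b[`xhat']
end

bootstrap beta = r(beta), reps(500) seed(12345): ts2sls
```

Resampling whole rows lets the number of observed X vary across replications, which mimics the missing-at-random sampling of X.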