Comparing IV estimates across independent samples

Adam Markovitz

Join Date: Oct 2015

Posts: 4
#1


13 Sep 2017, 08:37

Hi everyone,

My goal: estimate instrumental variable (IV) models in each of two strata and test whether those estimates differ significantly from one another.

My problem #1: the -suest- command (which allows for comparison between "seemingly unrelated estimates") is not compatible with ivregress or other IV commands. (If it were, I would simply save the estimates from two IV models and then use -suest- to compare estimates.)

My problem #2: it does not appear possible (or at least straightforward) to return bootstrapped estimates from two independent samples in the same routine. (If it were, I would simply bootstrap IV models on two independent samples and then, because both estimates were held in memory, use -lincom- to compare the parameters).

My problem #3: it is not clear whether performing IV models "by hand" on independent samples and using -suest- plus -lincom- within the bootstrap procedure will yield correct SEs. (This is technically feasible but I'm concerned about its validity.)

It seems unlikely that this problem has not been solved, but I have not found a solution online. Thank you in advance for any assistance, and apologies if an answer to this problem has been described previously on Statalist. I illustrate my attempts to date below:

Using a Stata example data set, suppose we want to estimate the effect of housing value ("hsngval", the endogenous treatment) on rent ("rent") and test whether the effect varies with urbanicity ("urban", which we create by splitting "pcturban" into below- and above-median values). We use family income ("faminc") and region ("region") as instruments.

Code:

// use data set
use http://www.stata-press.com/data/r13/hsng, clear
// split into below- and above-median values
xtile urban = pcturban, nq(2)
// estimate instrumental variable model if non-urban
ivregress 2sls rent (hsngval = faminc i.region) if urban == 1
est sto a
// estimate instrumental variable model if urban
ivregress 2sls rent (hsngval = faminc i.region) if urban == 2
est sto b

However, as noted in problem #1, using suest to compare parameter estimates from the two models gives the following error message: "ivregress is not supported by suest."

Code:

// attempt suest
suest a b
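Because the urban == 1 and urban == 2 observations are disjoint, and assuming the observations (states) are independent of one another, the two stratum-specific IV estimates are independent. The variance of their difference is then just the sum of the two variances, so -suest- is not strictly needed for this particular comparison. A minimal sketch, assuming the same specification as above and relying on large-sample normality (a strong assumption with roughly 25 states per stratum); the scalar names are hypothetical:

```stata
use http://www.stata-press.com/data/r13/hsng, clear
xtile urban = pcturban, nq(2)

* 2SLS separately in each stratum, keeping coefficient and standard error
ivregress 2sls rent (hsngval = faminc i.region) if urban == 1
scalar b_a  = _b[hsngval]
scalar se_a = _se[hsngval]
ivregress 2sls rent (hsngval = faminc i.region) if urban == 2
scalar b_b  = _b[hsngval]
scalar se_b = _se[hsngval]

* disjoint, independent samples: Cov(b_a, b_b) = 0, so
* Var(b_b - b_a) = se_a^2 + se_b^2
scalar z_diff = (b_b - b_a) / sqrt(se_a^2 + se_b^2)
display "difference = " b_b - b_a "  z = " z_diff "  p = " 2*normal(-abs(z_diff))
```

The same calculation applies if vce(robust) is added to both ivregress calls to allow heteroskedasticity within each stratum.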

Next, I had the idea that I could perform a bootstrap routine to compare the two estimates, as doing so could allow Stata to hold both estimates in memory, after which I could use -lincom- or -test- to compare the estimates.

Code:

// bootstrap attempt 1
capture program drop boot1
program define boot1, rclass
    * estimate 2sls for stratum i
    ivregress 2sls rent pcturban (hsngval = faminc i.region) if urban == 1
    return scalar b0 = _b[hsngval]
    * estimate 2sls for stratum j
    ivregress 2sls rent pcturban (hsngval = faminc i.region) if urban == 2
    return scalar b1 = _b[hsngval]
end
bootstrap b0=r(b0) b1=r(b1), reps(50): boot1
lincom b1 - b0

However, as noted in problem #2, this gives the following error message: "insufficient observations to compute bootstrap standard errors. no results will be saved." I believe this is because Stata does not allow one to return bootstrapped estimates derived from two independent samples simultaneously. Charlotte Rogers raised this possibility in a previous thread, but others brushed off her concerns, suggesting that the error merely reflects a small sample size. I disagree and try to provide evidence to the contrary below. Additionally, I have encountered this issue in my own data with a much larger sample (n ~ 10,000). This does not mean that no workaround exists, just that I don't know what it is. One trick I tried (unsuccessfully) was to define two outcomes or two treatments, one for "urban" = 1 and one for "urban" = 2, each missing in the other group, to trick Stata into bootstrapping the two independent samples at the same time. This yielded the same error message as before ("insufficient observations..."). For instance:

Code:

// bootstrap attempt 2
gen rent1 = .
replace rent1 = rent if urban == 1
gen rent2 = .
replace rent2 = rent if urban == 2
capture program drop boot2
program define boot2, rclass
    * estimate 2sls for stratum i
    ivregress 2sls rent1 pcturban (hsngval = faminc i.region) if urban == 1
    return scalar b0 = _b[hsngval]
    * estimate 2sls for stratum j
    ivregress 2sls rent2 pcturban (hsngval = faminc i.region) if urban == 2
    return scalar b1 = _b[hsngval]
end
bootstrap b0=r(b0) b1=r(b1), reps(50): boot2

A third, technically feasible bootstrap approach is to (1) estimate the models on the two independent samples; (2) combine the estimates with -suest-; and (3) use -lincom- to test whether the estimates differ from one another, and then bootstrap that entire process. This is based on an approach suggested by Maarten Buis for estimating IV by hand (here), though not for comparing estimates between samples. However, as I note in problem #3, I worry that bootstrapping the -lincom- result rather than the model estimates will underestimate the combined uncertainty from the first stage, the second stage, and the differencing across strata:

Code:

** bootstrap of IV "by hand"
capture program drop boot3
program define boot3, rclass
    * first and second stage for stratum i
    reg hsngval faminc i.region if urban == 1
    predict D_hat, xb
    reg rent D_hat if urban == 1
    est store a
    * first and second stage for stratum j
    reg hsngval faminc i.region if urban == 2
    predict D_hat2, xb
    reg rent D_hat2 if urban == 2
    est store b
    * combine estimates from the two unrelated models
    suest a b
    * test whether the difference is nonzero
    * (the stratum j coefficient is named D_hat2, not D_hat)
    lincom [b_mean]_b[D_hat2] - [a_mean]_b[D_hat]
    drop D_hat D_hat2
    * return the estimated difference for bootstrapping
    return scalar diff = r(estimate)
end
bootstrap diff=r(diff), reps(50): boot3

Compared with boot3, I would prefer to bootstrap only the IV coefficients (not the estimate of their difference) and then use -suest- and -lincom- to compare those bootstrapped estimates. However, when I attempt this, I once again run into the problem of being unable to save scalar estimates from two independent samples in one bootstrap routine.

Code:

// attempt to bootstrap IV estimates from two separate samples
capture program drop boot4
program define boot4, rclass
    * first and second stage if non-urban
    reg hsngval faminc i.region if urban == 1
    predict D_hat, xb
    reg rent D_hat if urban == 1
    return scalar b0 = _b[D_hat]
    drop D_hat
    * first and second stage if urban (second stage restricted to urban == 2)
    reg hsngval faminc i.region if urban == 2
    predict D_hat2, xb
    reg rent D_hat2 if urban == 2
    return scalar b1 = _b[D_hat2]
    drop D_hat2
end
bootstrap b0=r(b0) b1=r(b1), reps(50): boot4

Finally, note that while boot4 gives the same error message as before ("insufficient observations..."), if you remove the second part of the program and bootstrap only the regression on "urban == 1", there is no error. Thus, the problem does NOT appear to be one of sample size (as posters stated on the earlier thread I referenced) but rather an inherent obstacle to returning bootstrapped estimates from two independent samples in the same bootstrap routine.
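One possible explanation, based on my reading of the [R] bootstrap entry and not verified here: unless the nodrop option is specified, bootstrap temporarily drops observations outside any e(sample) left behind by the command. In boot1 and boot4 the last estimation run is on urban == 2, so every replicate would then be drawn only from urban == 2 observations and the urban == 1 regression would have no data. That would also explain why removing the second regression makes the error go away. The sketch below (the program name ivdiff is hypothetical) collects both coefficients into tempnames, returns them at the end, passes nodrop, and uses strata(urban) so that each replicate resamples within each stratum and keeps the two group sizes fixed, matching the two-independent-samples design:

```stata
use http://www.stata-press.com/data/r13/hsng, clear
xtile urban = pcturban, nq(2)

capture program drop ivdiff
program define ivdiff, rclass
    tempname b0 b1
    * 2SLS in the below-median stratum (urban == 1)
    ivregress 2sls rent pcturban (hsngval = faminc i.region) if urban == 1
    scalar `b0' = _b[hsngval]
    * 2SLS in the above-median stratum (urban == 2)
    ivregress 2sls rent pcturban (hsngval = faminc i.region) if urban == 2
    scalar `b1' = _b[hsngval]
    * post all results once both estimates are in hand
    return scalar b0 = `b0'
    return scalar b1 = `b1'
    return scalar diff = `b1' - `b0'
end

* nodrop:        do not restrict resampling to the e(sample) left by the
*                last ivregress inside ivdiff (urban == 2 only)
* strata(urban): resample separately within each stratum
bootstrap b0=r(b0) b1=r(b1) diff=r(diff), reps(200) strata(urban) nodrop seed(2017): ivdiff
```

The diff row of the bootstrap table then gives a direct bootstrap test of equality; -lincom b1 - b0- afterward should reproduce it. Whether bootstrapping the difference in this way is adequate for problem #3 is a separate econometric question.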

I'd be grateful for any feedback on the above problem, either clarifying: (1) a technical solution to the above challenge; or (2) an econometric clarification for why returning bootstrapped estimates on two independent samples is a bad idea.

Thanks,
Adam
Tags: bootstrap, bootstrap SE, IV, ivregress, stratified analysis
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#2

14 Sep 2017, 10:34

You didn't get a quick answer. You'll increase your chances of a helpful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex. You will also increase your chances of a helpful answer with a much (very much) shorter question.

I don't have a full answer, but one approach you might consider is to estimate the two models at the same time. You can use factor-variable notation to allow separate parameters when urban = 1 and urban = 2, and you can allow for different error variances using the cluster or robust options.

Factor variables make this more efficient, but conceptually what you're doing is creating two variables for each instrument: one equals faminc if urban = 1 and 0 otherwise, and the other equals faminc if urban = 2 and 0 otherwise. You can do the same for region. Then you include both sets of variables in the estimation.
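A sketch of that pooled model, written with explicitly generated stratum-specific variables instead of factor notation (the variable names u2, hsng_u1, faminc_u1, reg2_u1, and so on are made up for this example). It assumes the L30 specification without pcturban and also splits the endogenous regressor by stratum. Because the intercept and all instruments are fully interacted with the strata, the point estimates should reproduce the two separate 2SLS estimates, and vce(robust) relaxes the assumption of a common error variance:

```stata
use http://www.stata-press.com/data/r13/hsng, clear
xtile urban = pcturban, nq(2)

* indicator for the above-median stratum
generate byte u2 = (urban == 2)

* stratum-specific copies of the endogenous regressor
generate hsng_u1 = hsngval * (urban == 1)
generate hsng_u2 = hsngval * (urban == 2)

* stratum-specific copies of the instruments: faminc and region dummies
generate faminc_u1 = faminc * (urban == 1)
generate faminc_u2 = faminc * (urban == 2)
local ivs faminc_u1 faminc_u2
quietly tabulate region, generate(reg)
local nreg = r(r)
forvalues k = 2/`nreg' {
    generate reg`k'_u1 = reg`k' * (urban == 1)
    generate reg`k'_u2 = reg`k' * (urban == 2)
    local ivs `ivs' reg`k'_u1 reg`k'_u2
}

* constant + u2 give each stratum its own intercept;
* vce(robust) lets the error variance differ between strata
ivregress 2sls rent u2 (hsng_u1 hsng_u2 = `ivs'), vce(robust)

* Wald test and estimated difference of the two IV coefficients
test hsng_u1 = hsng_u2
lincom hsng_u2 - hsng_u1
```

Because everything is estimated in one call, the coefficient covariance is available directly and -test-/-lincom- work without -suest-.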