Hi everyone,
My goal: estimate instrumental variable (IV) models across two strata and compare whether those estimates significantly differ from one another.
My problem #1: the -suest- command (which allows for comparison between "seemingly unrelated estimates") is not compatible with ivregress or other IV commands. (If it were, I would simply save the estimates from two IV models and then use -suest- to compare estimates.)
My problem #2: it does not appear possible (or at least straightforward) to return bootstrapped estimates from two independent samples in the same routine. (If it were, I would simply bootstrap IV models on two independent samples and then, because both estimates were held in memory, use -lincom- to compare the parameters).
My problem #3: it is not clear whether performing IV models "by hand" on independent samples and using -suest- plus -lincom- within the bootstrap procedure will yield correct SEs. (This is technically feasible but I'm concerned about its validity.)
It seems unlikely that this problem has not been solved yet, but I have not found a solution online. Thank you in advance for any assistance, and apologies if an answer has already been described on Statalist. I illustrate my attempts to date below:
Using a Stata example data set, suppose we want to estimate the effect of housing value ("hsngval", the endogenous treatment) on rent ("rent") and compare whether the effect varies across urbanicity ("urban", which we create by splitting "pcturban" into below- and above-median groups). We use family income ("faminc") and region ("region") as the instrumental variables.
However, as noted in problem #1, using -suest- to compare the parameter estimates from the two models returns the following error message: "ivregress is not supported by suest."
Next, I tried a bootstrap routine that returns both estimates (program boot1), on the idea that Stata would then hold both estimates in memory and I could use -lincom- or -test- to compare them.
However, as noted in problem #2, this returns the following error message: "insufficient observations to compute bootstrap standard errors. no results will be saved." I believe this is because Stata does not allow one to return bootstrapped estimates derived from two independent samples simultaneously. Charlotte Rogers raised this possibility in a previous thread, but others brushed off her concerns, suggesting that the error merely reflects a small sample size. I disagree and attempt to provide evidence to the contrary below. I have also encountered this issue on my own data with a much larger sample (n ~ 10,000). This does not mean that there is no workaround, just that I don't know what it is. One trick I have (unsuccessfully) attempted is defining two outcomes or two treatments (one for urban == 1, one for urban == 2), each set to missing in the other group, to try to trick Stata into bootstrapping two independent samples at the same time. This yielded the same error message as before ("insufficient observations..."). For instance (program boot2):
A third, technically feasible bootstrap approach (program boot3) is one in which I: (1) estimate the models "by hand" on the two independent samples; (2) use -suest- to combine the two sets of estimates; and (3) use -lincom- to test whether those estimates differ from one another, and then bootstrap that entire estimation process. This is based on an approach suggested by Maarten Buis for estimating IV by hand (here), though not necessarily for comparing estimates between samples. However, as I note in problem #3, I worry that bootstrapping the -lincom- result rather than the model estimates will underestimate the combined uncertainty of estimating the first stage, estimating the second stage, and differencing across strata:
In comparison to the boot3 program, I would prefer to bootstrap only the IV coefficients (not the estimate of their difference) and then use -suest- and -lincom- to compare those bootstrapped estimates. However, when I attempt this (program boot4), I once again run into the issue of being unable to save scalar estimates from two independent samples in one bootstrap routine.
Finally, I'd like to note that while boot4 gives the same error message as before ("insufficient observations..."), if I remove the second part of the routine and bootstrap only the regression on urban == 1, there is no error message. Thus, the problem does NOT appear to be one of sample size (as posters stated in the earlier thread I referenced) but rather an inherent obstacle to returning bootstrapped estimates from two independent samples in the same bootstrap routine.
Code:
    // use data set
    use http://www.stata-press.com/data/r13/hsng, clear

    // split into above- and below-median values
    xtile urban = pcturban, nq(2)

    // estimate instrumental variable if non-urban
    ivregress 2sls rent (hsngval = faminc i.region) if urban == 1
    est sto a

    // estimate instrumental variable if urban
    ivregress 2sls rent (hsngval = faminc i.region) if urban == 2
    est sto b
Code:
    // attempt suest
    suest a b
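For comparison, here is a minimal sketch of one way to compare the two coefficients without -suest-. It assumes the two strata are independent samples, so the covariance between the two estimates is zero and the difference can be tested with a Wald z-statistic, z = (b_b - b_a) / sqrt(se_a^2 + se_b^2). The scalar names (b_a, se_a, b_b, se_b, z) are illustrative and not from the original code, and the default ivregress standard errors are used. This is a sketch under those assumptions, not a validated answer to the question.

```stata
* Sketch only: Wald z-test for equality of the hsngval coefficient across strata.
* Assumes the two strata are independent samples (zero covariance between estimates).
use http://www.stata-press.com/data/r13/hsng, clear
xtile urban = pcturban, nq(2)

* non-urban stratum: keep the coefficient and its standard error
ivregress 2sls rent (hsngval = faminc i.region) if urban == 1
scalar b_a  = _b[hsngval]
scalar se_a = _se[hsngval]

* urban stratum: keep the coefficient and its standard error
ivregress 2sls rent (hsngval = faminc i.region) if urban == 2
scalar b_b  = _b[hsngval]
scalar se_b = _se[hsngval]

* difference, z-statistic, and two-sided normal p-value
scalar z = (b_b - b_a) / sqrt(se_a^2 + se_b^2)
display "difference = " b_b - b_a
display "z = " z "   two-sided p = " 2*normal(-abs(z))
```

With only 50 observations in this example data set, the normal approximation is rough, which is one reason a bootstrap comparison may still be of interest.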
Code:
    // bootstrap attempt 1
    capture program drop boot1
    program define boot1, rclass
        * estimate 2sls for region i
        ivregress 2sls rent pcturban (hsngval = faminc i.region) if urban == 1
        return scalar b0 = _b[hsngval]
        * estimate 2sls for region j
        ivregress 2sls rent pcturban (hsngval = faminc i.region) if urban == 2
        return scalar b1 = _b[hsngval]
    end
    bootstrap b0=r(b0) b1=r(b1), reps(50): boot1
    lincom b1 - b0
Code:
    // bootstrap attempt 2
    gen rent1 = .
    replace rent1 = rent if urban == 1
    gen rent2 = .
    replace rent2 = rent if urban == 2

    capture program drop boot2
    program define boot2, rclass
        * estimate 2sls for region i
        ivregress 2sls rent1 pcturban (hsngval = faminc i.region) if urban == 1
        return scalar b0 = _b[hsngval]
        * estimate 2sls for region j
        ivregress 2sls rent2 pcturban (hsngval = faminc i.region) if urban == 2
        return scalar b1 = _b[hsngval]
    end
    bootstrap b0=r(b0) b1=r(b1), reps(50): boot2
Code:
    ** bootstrap of IV "by hand"
    capture program drop boot3
    program define boot3, rclass
        * estimate 2sls for region i
        reg hsngval faminc i.region if urban == 1
        predict D_hat, xb
        reg rent D_hat if urban == 1
        est store a
        * estimate 2sls for baseline j
        reg hsngval faminc i.region if urban == 2
        predict D_hat2, xb
        reg rent D_hat2 if urban == 2
        est store b
        * compare estimates from unrelated models
        suest a b
        * test whether linear combination /= 0 to determine if estimates differ from one another
        * (model b's regressor is D_hat2, so its coefficient is _b[D_hat2])
        lincom [b_mean]_b[D_hat2] - [a_mean]_b[D_hat]
        * return result from linear combination test for bootstrapping (see below);
        * done before -drop- so that no later command can overwrite r(estimate)
        return scalar diff = r(estimate)
        drop D_hat D_hat2
    end
    bootstrap diff=r(diff), reps(50): boot3
Code:
    // attempt to bootstrap IV estimates from two separate samples
    capture program drop boot4
    program define boot4, rclass
        * estimate 2sls if nonurban
        reg hsngval faminc i.region if urban == 1
        predict D_hat, xb
        reg rent D_hat if urban == 1
        return scalar b0 = _b[D_hat]
        drop D_hat
        * estimate 2sls if urban (second stage restricted to urban == 2, matching the first stage)
        reg hsngval faminc i.region if urban == 2
        predict D_hat2, xb
        reg rent D_hat2 if urban == 2
        return scalar b1 = _b[D_hat2]
        drop D_hat2
    end
    bootstrap b0=r(b0) b1=r(b1), reps(50): boot4
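One avenue that may be worth testing is sketched below. It rests on an assumption that has not been confirmed in this thread: that the error arises because -bootstrap- restricts the data to the e(sample) left behind by the last estimation command (here, the urban == 2 model), so the urban == 1 model has no observations in any replication. Under that assumption, the documented -bootstrap- options nodrop (do not drop observations outside e(sample) before resampling) and strata() (resample independently within each stratum) might help. The program name boot_diff and the seed value are hypothetical. The program returns the difference directly, so both stages of both models are re-estimated in every replication.

```stata
* Untested sketch: boot_diff is a hypothetical program name, not from the thread.
use http://www.stata-press.com/data/r13/hsng, clear
xtile urban = pcturban, nq(2)

capture program drop boot_diff
program define boot_diff, rclass
    tempname b0 b1
    * IV estimate for the non-urban stratum
    ivregress 2sls rent (hsngval = faminc i.region) if urban == 1
    scalar `b0' = _b[hsngval]
    * IV estimate for the urban stratum
    ivregress 2sls rent (hsngval = faminc i.region) if urban == 2
    scalar `b1' = _b[hsngval]
    * return both coefficients and their difference
    return scalar b0   = `b0'
    return scalar b1   = `b1'
    return scalar diff = `b1' - `b0'
end

* strata(urban): resample within each stratum, so stratum sizes stay fixed
* nodrop: do not restrict the data to e(sample) before resampling (assumed fix)
bootstrap b0=r(b0) b1=r(b1) diff=r(diff), reps(50) seed(12345) strata(urban) nodrop: boot_diff
```

If this runs, the bootstrap standard error and confidence interval reported for diff would serve as the test of whether the two stratum-specific estimates differ. Whether this is the right resampling scheme for a given design is a separate question.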
I'd be grateful for any feedback on the above problem, either clarifying: (1) a technical solution to the above challenge; or (2) an econometric clarification for why returning bootstrapped estimates on two independent samples is a bad idea.
Thanks,
Adam