Hello, I want to provide a client with a Stata command that takes some varlist of independent / predictor variables, runs a regression (logit, as it happens) with vce(bootstrap, saving(bootfilename)). Then, I want to loop over the individual bootstrap replicates in bootfilename.dta and predict the dependent variable repeatedly, using those coefficient values. The goal is to propagate uncertainty from the regression forward into other code that uses the replicated predicted values.
Now, I am mindful that they could in future use this with some unexpected varlist, maybe including interactions. So, I want the plugging of the bootstrap coefficients into the prediction formula to be automatic, without any hard coding. I need to plug a variable in bootfilename.dta like _b_cons, which is labelled "_b[_cons]" into the predict command for the right kind of regression, and have it treated as the _cons parameter.
I know that bootfilename.dta will contain some metadata telling Stata that the regression was a logit, so I really suspect that there is a simple way to do this, and I just don't know it (yet). I could parse the names but it will be messy if interactions get in there.
Any suggestions would be most gratefully received!
Anonymised code example:
clear all
set seed 9876
set obs 1000
gen sex = rbinomial(1,0.55)
gen ageband = 1 + rbinomial(2, 0.5)
gen true_logit = -1 + 0.1*1.sex + 0.5*2.ageband + 0.7*3.ageband
gen true_risk = 1/(1+exp(-1*true_logit))
gen outcome = rbinomial(1, true_risk)
// run model and store bootstrap replicate coefficients
logit outcome i.sex i.ageband, ///
vce(bootstrap, dots seed(1234) rep(100) ///
saving(bootfilename, replace))
// now how can I loop -predict- over the bootstrap file rows?
Now, I am mindful that they could in future use this with some unexpected varlist, maybe including interactions. So, I want the plugging of the bootstrap coefficients into the prediction formula to be automatic, without any hard coding. I need to plug a variable in bootfilename.dta like _b_cons, which is labelled "_b[_cons]" into the predict command for the right kind of regression, and have it treated as the _cons parameter.
I know that bootfilename.dta will contain some metadata telling Stata that the regression was a logit, so I really suspect that there is a simple way to do this, and I just don't know it (yet). I could parse the names but it will be messy if interactions get in there.
Any suggestions would be most gratefully received!
Anonymised code example:
clear all
set seed 9876
set obs 1000
gen sex = rbinomial(1,0.55)
gen ageband = 1 + rbinomial(2, 0.5)
gen true_logit = -1 + 0.1*1.sex + 0.5*2.ageband + 0.7*3.ageband
gen true_risk = 1/(1+exp(-1*true_logit))
gen outcome = rbinomial(1, true_risk)
// run model and store bootstrap replicate coefficients
logit outcome i.sex i.ageband, ///
vce(bootstrap, dots seed(1234) rep(100) ///
saving(bootfilename, replace))
// now how can I loop -predict- over the bootstrap file rows?
Comment