Dear All,
I am estimating a simple predictive model for the number of opioid-related deaths, and I want to check how the predictions improve as the sample size increases. To do that, I am trying to bootstrap the prediction inside a loop that re-estimates it for sample sizes from n = 500 to n = 2500.
Here is a -dataex- extract of 100 observations from the sample:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double(ICD_Opioids Sex T509) byte(Black age030 Home)
0 1 1 0 0 1
0 1 1 0 0 1
1 1 0 0 0 1
0 1 1 0 0 1
0 1 1 0 0 0
0 1 1 0 0 0
0 1 1 0 0 1
0 1 1 0 0 0
1 1 0 0 0 1
1 1 0 0 1 1
1 1 0 0 0 1
0 1 1 1 0 1
0 1 1 0 0 1
0 0 1 0 1 0
0 1 1 0 0 1
1 1 0 0 1 0
0 0 1 0 0 1
0 0 1 0 1 0
0 1 1 0 0 0
0 1 1 0 0 0
0 0 1 0 0 1
0 0 1 1 0 1
0 1 1 0 0 0
1 1 0 0 0 1
1 0 0 0 0 0
0 0 1 0 0 0
0 1 1 0 0 1
0 0 1 0 0 1
1 1 0 0 1 0
1 0 0 0 0 1
1 0 0 0 0 0
1 1 0 0 1 0
0 0 0 1 0 0
0 1 1 0 0 1
1 1 0 0 0 1
0 1 1 0 0 0
1 0 0 0 0 1
1 1 0 0 0 0
0 0 1 0 0 1
1 1 0 0 0 1
0 0 1 0 1 0
0 0 1 0 0 1
1 1 0 0 1 0
0 1 1 0 0 1
0 0 0 0 0 1
1 0 0 0 1 1
0 1 1 1 0 1
0 1 1 0 0 0
1 0 0 0 1 0
0 1 1 0 1 0
1 1 0 0 1 1
0 1 1 0 0 1
1 1 0 0 0 0
1 0 0 1 0 0
1 1 0 0 1 1
0 1 1 0 1 1
0 1 0 1 0 0
0 1 1 0 0 0
1 0 0 0 0 1
0 1 1 0 0 1
0 1 1 0 0 1
1 1 0 0 0 0
0 1 1 0 1 0
1 1 0 0 0 1
0 0 1 0 0 1
0 1 1 0 0 1
0 1 1 0 0 1
1 1 0 0 1 0
0 0 1 0 1 1
0 1 1 0 1 0
0 1 1 0 0 1
0 1 1 0 0 1
1 1 0 0 1 1
0 0 1 0 0 1
0 1 1 0 0 1
0 1 1 0 1 0
0 1 0 0 0 1
0 1 1 0 1 1
0 0 1 0 0 1
1 0 0 0 0 1
0 0 1 0 0 1
1 0 0 0 1 0
0 1 1 0 0 0
0 1 1 0 0 1
1 1 0 0 0 0
0 1 1 0 1 1
1 1 0 0 0 1
0 1 1 0 1 1
1 1 0 0 0 1
1 0 0 0 0 1
1 0 0 0 1 0
1 0 0 0 0 0
1 1 0 1 0 1
0 1 1 0 0 0
0 0 1 0 0 1
1 1 0 1 0 1
0 1 1 0 0 0
1 1 0 0 1 1
0 0 1 0 0 1
0 1 1 0 1 0
end
label values ICD_Opioids ICD_Opioids
label def ICD_Opioids 0 "Absent", modify
label def ICD_Opioids 1 "Present", modify
label values Sex Sex
label def Sex 0 "Female", modify
label def Sex 1 "Male", modify
label values T509 T509
label def T509 0 "Absent", modify
label def T509 1 "Present", modify
And the loop I have written is as follows:
Code:
g Male = 1 if Sex == 1
replace Male = 0 if Sex == 0
g MalexBlack = Male*Black
forvalues i=500/2500 {
    capture program drop myboot
    program define myboot, rclass
        preserve
        bsample
        probit ICD_Opioids Sex Black Home age030 if (T509~=1)
        predict death_hat if T509==1, pr
        gen deathpredicted=1 if death_hat>=.5
        recode deathpredicted .=0
        replace deathpredicted = . if T509 !=1
        egen numdeaths_ruhm_tmp = sum(deathpredicted) // Calculate estimated number of opioid deaths with Ruhm approach
        sum numdeaths_ruhm_tmp // Summarize this--will extract mean in next line
        return scalar numdeaths_ruhm = r(mean)
        egen numdeaths_tox_tmp = sum(T509_TOXOPIOID) // Calculate estimated number of opioid deaths from toxicology data
        sum numdeaths_tox_tmp // Summarize this--will extract mean in next line
        return scalar numdeaths_tox = r(mean)
        return scalar numdeaths_diff = numdeaths_ruhm - numdeaths_tox // Main statistic of interest. Is it different from 0?
        restore
    end
    bootstrap numdeaths_diff = r(numdeaths_diff) numdeaths_ruhm = r(numdeaths_ruhm) numdeaths_tox = r(numdeaths_tox), size(`i') /*saving(bootstrap10, replace)*/ reps(500) seed(1234): myboot
    save "bootstrap.dta", append
}
It doesn't run at all. I am a somewhat experienced Stata user but have no experience with loops, so I suspect the code has several problems. Because it fails outright, I don't know where to begin. I would be grateful for any help.
Sincerely,
Sumedha.
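Below is a minimal corrected sketch. It is a suggestion that rests on stated assumptions, not a definitive fix. It assumes the full dataset is in memory, with at least 2,500 observations and the toxicology variable T509_TOXOPIOID. The first block only simulates hypothetical data so that the sketch runs on its own; replace it with a -use- of your own file.

The main changes are:
- myboot is defined once, before the loop.
- It no longer calls -bsample- or -preserve-/-restore-, because -bootstrap- already draws each resample, at the size given in size().
- Intermediate variables are tempvars.
- The counts come from -summarize, meanonly- instead of -egen-.
- Because -save- has no append option, each sample size is written with bootstrap's saving() option and the files are stacked at the end.

The loop step of 500 and the file names bootstrap_`i'.dta and bootstrap_all.dta are assumptions. As written, forvalues i=500/2500 steps by 1 and would run 2,001 separate bootstraps.

```stata
* Hypothetical simulated data, ONLY so this sketch runs stand-alone.
* Replace these lines with -use- of your own dataset (N >= 2500).
clear
set seed 12345
set obs 5000
generate byte Sex            = runiform() < .6
generate byte Black          = runiform() < .2
generate byte Home           = runiform() < .5
generate byte age030         = runiform() < .3
generate byte T509           = runiform() < .4
generate byte ICD_Opioids    = (.5*Sex - .3*Black + .4*Home - .4*age030 + rnormal()) > 0
generate byte T509_TOXOPIOID = runiform() < .3   // hypothetical stand-in

* Define the program ONCE, outside the loop. -bootstrap- draws each
* resample itself, so no -bsample- or -preserve-/-restore- is needed.
capture program drop myboot
program define myboot, rclass
    tempvar death_hat deathpredicted
    probit ICD_Opioids Sex Black Home age030 if T509 != 1
    predict double `death_hat' if T509 == 1, pr
    * 1 if predicted Pr >= .5, else 0; missing unless T509 == 1
    generate byte `deathpredicted' = (`death_hat' >= .5) if T509 == 1 & !missing(`death_hat')
    * Estimated number of opioid deaths with the Ruhm approach
    summarize `deathpredicted', meanonly
    local ruhm = r(sum)
    * Number of opioid deaths from toxicology data
    summarize T509_TOXOPIOID, meanonly
    local tox = r(sum)
    return scalar numdeaths_ruhm = `ruhm'
    return scalar numdeaths_tox  = `tox'
    * Main statistic of interest: is it different from 0?
    return scalar numdeaths_diff = `ruhm' - `tox'
end

* One bootstrap per sample size; replicates saved to one file per size
forvalues i = 500(500)2500 {
    bootstrap numdeaths_diff = r(numdeaths_diff)    ///
              numdeaths_ruhm = r(numdeaths_ruhm)    ///
              numdeaths_tox  = r(numdeaths_tox),    ///
              size(`i') reps(500) seed(1234) nodots ///
              saving("bootstrap_`i'", replace): myboot
}

* Stack the per-size files (replaces the invalid -save, append-)
tempfile stacked
clear
save `stacked', emptyok
forvalues i = 500(500)2500 {
    use "bootstrap_`i'", clear
    generate int sample_size = `i'
    append using `stacked'
    save `stacked', replace
}
save "bootstrap_all", replace

* SD of the replicates = bootstrap standard error, by sample size
collapse (mean) mean_diff = numdeaths_diff (sd) se_diff = numdeaths_diff, by(sample_size)
list, noobs
```

For each sample size, the standard deviation of numdeaths_diff across replicates is its bootstrap standard error. The collapsed table therefore shows directly how the precision of the difference changes as n grows.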