Hello all,
I am a first-time poster, so please bear with me.
Context:
I am analyzing data from 13 sites, looking at tuberculosis (TB) as a risk factor for the outcome of COPD, adjusting for just 7 variables. Although the dataset I've been given is quite large (12k participants), the number of reported TB cases at some of the 13 sites is very low, in some cases <10. Therefore, to get site-specific 95% CIs, my advisor (who is an R man, hence this posting) suggested I bootstrap the mixed-effects regression I'm using, with the number of reps equal to the site N. That all sounded great, but when I test my code with even 11 reps, let alone the nearly 2,000 I'll need for some sites, it takes at least 20 minutes. Part of what I think is slowing down the bootstrapping is that, for some sites, some repetitions fail (a small red x is displayed instead of the dot that charts rep progress), I think because there were not enough TB cases in the random sample.
I would ideally like to get the OR and 95% CI for each of the covariates, but if push came to shove, I would settle for the TB variable alone.
I am using Stata/IC 14.2 and the melogit command.
Primary issue: What can I do to speed up the code?
N.B. In the code below, I shortened the variable names so the command fits on one line; site stands for site_city.
Code:
bootstrap if site==6, reps(1748) seed(1989): melogit copd tb age sex bmi pack educ fuel || site: , or
estat bootstrap, percentile
Secondary Issue: What is the best way to pull out the percentile bounds for the ORs that I need (the 2.5th and 97.5th percentiles for a 95% CI)? I'm currently using estat bootstrap, percentile, but that also generates the following error, which stops my do-file.
equation [var(_cons[site_city] not found
r(303);
I think this is because the random intercept for the site variable is not being stored where estat can find it, but I also haven't been able to find where it might be stored in the Stata help file for melogit, if it is stored at all.
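For what it's worth, here is a minimal sketch of what I am considering trying next. It assumes that naming the statistic explicitly makes bootstrap collect only the TB odds ratio rather than every parameter, including the variance component. The name or_tb and the 50 test reps are placeholders of my own, and I have not confirmed that this avoids the r(303) error or the failed reps.

```stata
* List the names under which melogit stores its estimates,
* including the random-intercept variance (run once, outside bootstrap)
melogit copd tb age sex bmi pack educ fuel || site:, coeflegend

* Bootstrap only the TB odds ratio instead of every parameter;
* or_tb is a placeholder name and reps(50) is just a small test run
bootstrap or_tb = exp(_b[tb]), reps(50) seed(1989): ///
    melogit copd tb age sex bmi pack educ fuel || site:

* Percentile-based confidence interval for the bootstrapped statistic
estat bootstrap, percentile
```

If this works, the other covariates could presumably be added to the expression list in the same way, one named statistic per covariate.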
Thank you for your time. I hope to hear from you soon!
Best,
Reuben