Hello everyone,
I have a question about the best way to find 95% confidence intervals around the mean of a set of logistic regression coefficients.
I am using Stata v13 (though I do have access to v15 if it will solve this problem).
I ran logit on 341 binary variables in total across 48 datasets. These binary variables are divided into theoretically derived domains; let's call them A, B, and C (each with, say, about 100 variables in it).
My dependent variables are predicted by independent variables of gender, race (black, asian, hispanic), education (3 categories), income (continuous), and age (continuous). I am interested in the gender, race, and income coefficients (5 coefficients for each of the 341 dependent variables). My overall goal is to get a 95% confidence interval for the global mean of each of these 5 coefficients within each of several "domains" (A, B, and C, separately). Of course, some of these coefficients are themselves significant, and others are not. In a sense, this is a bit like a meta-analysis, but as if each of the studies had been performed identically.
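As a point of comparison for that goal, here is a minimal sketch of the simplest baseline: treating the coefficients themselves as the data and asking for a normal-theory confidence interval for their mean within each domain. The values are toy numbers for illustration only, the variable names follow the example dataset, and the sketch ignores each coefficient's own standard error.

```stata
* Toy data (illustrative values only): log odds for one predictor, two domains
clear
input str17 dem_var byte domain float LogOdds
"dB_fem" 1 -.81
"dB_fem" 1 -.45
"dB_fem" 1 -.30
"dB_fem" 3 -.71
"dB_fem" 3 -.60
"dB_fem" 3 -.52
end

* Mean of the coefficients and its 95% CI, separately for each domain
mean LogOdds if dem_var == "dB_fem", over(domain)
```

This only reflects the spread of the coefficients around their domain mean; it does not use the uncertainty of the individual estimates.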
Resampling at the level of my original surveys (going back to the original 48 datasets) using svy bootstrap seems unnecessary, though I could be convinced otherwise.
So, I believe the best way to approach this (statistically) is a parametric resampling scheme: treat each coefficient as if it were drawn from its own normal distribution centered on the estimated B for that regression, with spread given by its SE. In other words, for each of the (341*5=) 1705 coefficients of interest, repeatedly draw values around the estimated B using its SE, average the draws within a given "domain" (A, B, or C), and use the resulting distribution of means to derive the overall confidence interval for that domain/coefficient pairing.
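To make the idea concrete, here is a minimal sketch of that kind of resampling for a single domain/coefficient pairing. It rests on assumptions that are not in my compiled data: a standard error variable (called `se` here, hypothetical, with made-up values) stored next to each coefficient (`b`), and independence between the coefficients, which may not hold when several come from the same dataset.

```stata
* Toy data: hypothetical coefficients (b) and standard errors (se) for one
* predictor within one domain; all values are made up for illustration
clear
input float(b se)
-.81 .20
-.45 .20
 .02 .19
-.30 .25
-.55 .26
end

set seed 1
gen long coef_id = _n
expand 1000                          // 1000 copies of each coefficient
bysort coef_id: gen int rep = _n     // replication index within coefficient
gen double draw = rnormal(b, se)     // one draw from N(b, se^2) per copy

* Mean of the simulated coefficients within each replication
collapse (mean) sim_mean = draw, by(rep)

* Percentile-based 95% interval for the mean of the coefficients
_pctile sim_mean, percentiles(2.5 97.5)
display "95% interval: " r(r1) " to " r(r2)
```

Under these assumptions the simulated means are themselves normal with standard deviation sqrt(sum of se^2) divided by the number of coefficients, so the simulation mainly serves as a check on that closed-form value.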
Here is what I have so far, using the bootstrap command on data compiling all regression coefficients after controlling for multiple comparisons.
When I run the code below, I receive the error " command not found r(111);"
Example data, based on the compiled dataset I have of all regression coefficients, are included after the code.
Any insight on the most appropriate way to code resampling for SEs / 95% confidence interval for a mean value of a set of logistic regression coefficients? Thank you.
Code:
foreach q of varlist question { // loop through each variable
    bootstrap se_`q' = r(se), ///
        seed(1) ///
        rep(200) /// 200 repetitions
        trace ///
        saving($temp/ki08_vis01_masterTableLogOdds`q'_boot.txt): ///
        summarize LogOdds
} // end of loop through all domains in data set
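One thing worth noting about the loop above: `summarize` stores `r(mean)`, `r(sd)`, `r(N)` and similar results, but not `r(se)`. The sketch below shows one way a nonparametric bootstrap of the mean could be written per domain. It is a sketch under assumptions, not a confirmed fix for the reported error: the data are toy values, the name `mean_lo` is arbitrary, and stratifying on `domain` (so that each resample keeps the same number of coefficients per domain) is a design choice made for this example.

```stata
* Toy data (illustrative values only): coefficients for one predictor
clear
input byte domain float LogOdds
1 -.81
1 -.45
1  .02
1 -.30
2 -.55
2 -.41
2 -.75
2 -.72
end

* Bootstrap the mean log odds separately for each domain
levelsof domain, local(domains)
foreach d of local domains {
    display as text "Domain `d'"
    bootstrap mean_lo = r(mean), reps(200) seed(1) strata(domain) nodots: ///
        summarize LogOdds if domain == `d'
    estat bootstrap, percentile      // percentile-based 95% interval
}
```

Because `summarize` is not an estimation command, `bootstrap` prints a warning about not being able to identify the estimation sample; with no missing values in `LogOdds` that warning should be harmless here.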
Here is some example data, based on the compiled dataset I have of all regression coefficients:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str17 dem_var str13 question byte dataset float LogOdds byte(dem_var_lab domain) float p_value byte pick
"dB_rblack" "c_worde" 1 .0162122 7 1 .9320659 1
"dB_fem" "c_inf14r" 3 -.7050021 10 3 5.86e-06 1
"dB_fem" "c_mortg15u" 5 -.6026705 10 3 .0107125 1
"dB_rblack" "c_ssclaim" 6 -.5507009 7 2 .0327299 1
"dB_rblack" "c_bond12f" 8 -.4074745 7 2 7.56e-10 1
"dB_rhisp" "c_inf15f" 9 -.7526921 8 2 2.65e-35 1
"dV_finc_ln15dl_1k" "c_wincdisc" 12 -.0692395 11 1 .4303051 1
"dB_rhisp" "c_bkmrmn" 17 -.7201893 8 2 .0031061 1
"dB_rhisp" "c_salvat" 17 -.0469491 8 3 .862355 1
"dB_fem" "c_billright" 40 -.8137815 10 1 .0000453 1
"dB_rasian" "c_cheney" 44 -.8320684 6 3 .3538574 1
"dB_rasian" "c_roberts" 44 -.8772344 6 3 .1400249 1
"dV_finc_ln15dl_1k" "c_roberts" 44 .2653895 11 3 .1446916 1
"dB_rhisp" "c_killed11" 68 -.4540319 8 1 .0228193 1
"dV_finc_ln15dl_1k" "c_poverty" 73 .068696 11 2 .5121317 1
end
label values dem_var_lab dem_var_label
label def dem_var_label 6 "Asian / Asian American", modify
label def dem_var_label 7 "Black / African American", modify
label def dem_var_label 8 "Hispanic", modify
label def dem_var_label 10 "Woman", modify
label def dem_var_label 11 "% Change Fam Income", modify
label values domain domains
label def domains 1 "A", modify
label def domains 2 "B", modify
label def domains 3 "C", modify