Confidence interval around mean of a set of regression coefficients

Molly King

Join Date: Feb 2015

Posts: 4
#1

29 Jan 2018, 15:27

Hello everyone,

I have a question about the best way to find 95% confidence intervals around the mean of a set of logistic regression coefficients.

I am using Stata v13 (though I do have access to v15 if it will solve this problem).

I ran logit on 341 binary variables total across 48 datasets. These binary variables are divided into theoretically-derived domains, let's call them A, B, and C (each with, say, about 100 variables in it).

My dependent variables are predicted by the independent variables gender, race (black, asian, hispanic), education (3 categories), income (continuous) and age (continuous). I am interested in the gender, race, and income coefficients (5 coefficients for each of 341 dependent variables). My overall goal is to get a 95% confidence interval for the global mean of each of several "domains" (A, B and C, separately) for each of these 5 coefficients of interest. Of course, some of these coefficients are themselves significant, and others are not. In a sense, this is a bit like meta-analysis, but as if each of the studies had been performed identically.

Resampling at the level of my original surveys (going back to the original 48 datasets) using svy bootstrap seems unnecessary, though I could be convinced otherwise.

So, I believe the best way to approach this (statistically) is to bootstrap each SE as if it were drawn from its own normal distribution around the B coefficient for each regression. In other words, for each of the (341*5=) 1705 coefficients of interest, bootstrap resample the SEs around that mean value (within a given "domain" of A, B, or C) and use those to derive the sampling space for the overall confidence intervals within that domain/coefficient pairing.

Here is what I have so far, using the bootstrap command on data compiling all regression coefficients after controlling for multiple comparisons.

Code:

foreach q of varlist question { // loop through each variable
    bootstrap se_`q' = r(se), ///
        seed(1) ///
        rep(200) /// 200 repetitions
        trace ///
        saving($temp/ki08_vis01_masterTableLogOdds`q'_boot.txt): ///
        summarize LogOdds
} // end of loop through all domains in data set

When I run this code, I receive the error " command not found r(111);"

Here is some example data, based on the compiled dataset I have of all regression coefficients:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str17 dem_var str13 question byte dataset float LogOdds byte(dem_var_lab domain) float p_value byte pick
"dB_rblack"         "c_worde"      1  .0162122  7 1 .9320659 1
"dB_fem"            "c_inf14r"     3 -.7050021 10 3 5.86e-06 1
"dB_fem"            "c_mortg15u"   5 -.6026705 10 3 .0107125 1
"dB_rblack"         "c_ssclaim"    6 -.5507009  7 2 .0327299 1
"dB_rblack"         "c_bond12f"    8 -.4074745  7 2 7.56e-10 1
"dB_rhisp"          "c_inf15f"     9 -.7526921  8 2 2.65e-35 1
"dV_finc_ln15dl_1k" "c_wincdisc"  12 -.0692395 11 1 .4303051 1
"dB_rhisp"          "c_bkmrmn"    17 -.7201893  8 2 .0031061 1
"dB_rhisp"          "c_salvat"    17 -.0469491  8 3  .862355 1
"dB_fem"            "c_billright" 40 -.8137815 10 1 .0000453 1
"dB_rasian"         "c_cheney"    44 -.8320684  6 3 .3538574 1
"dB_rasian"         "c_roberts"   44 -.8772344  6 3 .1400249 1
"dV_finc_ln15dl_1k" "c_roberts"   44  .2653895 11 3 .1446916 1
"dB_rhisp"          "c_killed11"  68 -.4540319  8 1 .0228193 1
"dV_finc_ln15dl_1k" "c_poverty"   73   .068696 11 2 .5121317 1
end
label values dem_var_lab dem_var_label
label def dem_var_label 6 "Asian / Asian American", modify
label def dem_var_label 7 "Black / African American", modify
label def dem_var_label 8 "Hispanic", modify
label def dem_var_label 10 "Woman", modify
label def dem_var_label 11 "% Change Fam Income", modify
label values domain domains
label def domains 1 "A", modify
label def domains 2 "B", modify
label def domains 3 "C", modify

Any insight on the most appropriate way to code resampling for SEs / 95% confidence interval for a mean value of a set of logistic regression coefficients? Thank you.
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#2

30 Jan 2018, 11:00

You didn't get a quick response. You'll increase your chances of a helpful response by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex. Also simplify what you post to the minimum needed to specify your problem. There is almost no possible reason we would care about your labels, for example - they just add clutter.

I don't know about the bootstrap approach, but if you have a bunch of parameters with standard errors, you should be able to simply calculate the mean of the parameters and its standard error - look at an intro statistics text. So, you could just save the parameters and standard errors and work with that.
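As a rough sketch of that direct calculation: if the estimates within a domain can be treated as independent (a strong assumption here, since many coefficients come from the same survey), the standard error of their unweighted mean is the square root of the summed squared standard errors, divided by the number of estimates. The variable se and the numbers in the toy data below are made up purely for illustration; the posted example contains LogOdds and p_value but no standard errors, so those would have to be saved from the original logit runs.

```stata
* Sketch only: toy values, NOT taken from the posted data.
* -se- is a hypothetical variable holding each coefficient's standard error.
clear
input byte domain str6 dem_var float(LogOdds se)
1 "dB_fem" -0.80 0.20
1 "dB_fem" -0.60 0.20
2 "dB_fem" -0.70 0.10
end

* Variance of each estimate; independence across estimates is assumed.
generate double var_b = se^2

* One row per domain/coefficient pair: mean, summed variance, and count.
collapse (mean) mean_b = LogOdds (sum) sum_var = var_b (count) n = LogOdds, ///
    by(domain dem_var)

* SE of an unweighted mean of n independent estimates, then a normal-theory CI.
generate double se_mean = sqrt(sum_var) / n
generate double lb = mean_b - invnormal(0.975) * se_mean
generate double ub = mean_b + invnormal(0.975) * se_mean

list domain dem_var n mean_b se_mean lb ub, sepby(domain)
```

If the estimates are correlated, or if a precision-weighted mean is wanted instead of a simple average, this interval would not be appropriate as written.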

Regarding your actual error (which is a totally different issue), when I run it I get

(running summarize on estimation sample)
'r(se)' evaluated to missing in full sample
r(322); t=0.02 9:53:17

which is not the error you have. Obviously, you've generated an r(se) somewhere else. But without it, I cannot generate the error you're asking about. That makes it hard for me to help you.

The first thing I'd do would be to make sure the statements run properly without the loop. Just put the variable name in as you think it should be.

Next use di " `local macro'" (insert your local macro names for local macro) to see that the macro contains what you intend - put this right before every time you invoke a macro. So I'd put it before the bootstrap and before the saving statements.

If that doesn't show you the problem, I'd make sure the statements run properly without the loop.

Next, I'd put "set trace on" before starting the loop and see what it tells you.
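Those checks might look something like the sketch below, which uses a few rows copied from the example data in post #1. One thing they would show is that foreach q of varlist question runs the loop once with q set to the variable name question, not once per distinct question; levelsof is one standard way to loop over a variable's values instead. The loop bodies here are placeholders for illustration, not a corrected version of the bootstrap call.

```stata
* Debugging sketch using four rows from the -dataex- example in post #1.
clear
input str13 question float LogOdds
"c_worde"    .0162122
"c_inf14r"  -.7050021
"c_roberts" -.8772344
"c_roberts"  .2653895
end

* 1. Display the macro right before it is used.
*    With -of varlist-, q holds the variable NAME, so this runs once.
foreach q of varlist question {
    display "q contains: `q'"
}

* 2. To loop over the distinct VALUES of question, collect them first.
levelsof question, local(qlist)
foreach q of local qlist {
    display "processing question: `q'"
    summarize LogOdds if question == "`q'"
}

* 3. Turn tracing on around the loop to see each command as it executes.
set trace on
set tracedepth 1
foreach q of local qlist {
    display "`q'"
}
set trace off
```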
Molly King

Join Date: Feb 2015

Posts: 4
#3

21 Feb 2018, 10:52

Thank you very much, I appreciate your help, Professor Bromiley.