Bootstrap the difference between means for two groups

Hanna Gyllensten

Join Date: Sep 2016

Posts: 19
#1

07 Jun 2017, 08:59

Hi,

I'm sorry if this question has already been answered and I have simply failed to find it, but I have searched the Stata manual, this list and many other sources for several days without finding an answer that I could adapt to my situation.

I need to find a way to bootstrap the mean and 95% confidence interval for the difference in a (cost) variable between two groups (usual treatment vs intervention group; unmatched groups of different sizes), i.e., a bootstrapped alternative to the difference results reported after a t-test. Thus, I do not want to test the difference, which I think I have already found code for.

I have understood that I need to create an ado-file for this, but so far I am stuck on what code belongs in that ado-file and what code should remain in the do-file. Moreover, I will need to do this for several (many) cost variables. What I am hoping is that it will be possible to create an ado-file that can be re-used for many variables, but so far the suggestions I have found for this type of code (e.g., http://www.stata.com/statalist/archi.../msg00123.html and several other similar examples) have hard-coded the variable names. The only exception is the code in the Stata manual (example 4 under Bootstrap, which bootstraps a ratio, so similar but not exactly what I need; it uses ‘y’ and ‘x’ for the variables, but maybe this is just to clarify what is what). That code, however, is written so differently that I get stuck trying to compare it with the code from the other sources.

* Does anyone have suggestions for how to write code that gives a bootstrapped version of this difference in means and its 95% confidence interval?
* Can I instruct such an ado-program to use alternative y-variables (or do I have to do all of this in a loop that changes my variable names before calling each y-variable and making the calculations)?
* How do I call this program from my do-file (where does the code in the do-file "start")?
(* An additional question is whether this will also work when the dataset contains multiple imputation results, which I will need to handle eventually.)

I would be happy to get any suggestions on how to handle this, since I am really stuck.

Kind regards,
Hanna
Tags: between group difference, bootstrap, confidence intervals, loop, mean difference
Rich Goldstein

Join Date: Mar 2014

Posts: 4466
#2

07 Jun 2017, 09:27

maybe I don't understand but I don't see any reason for an ado-file; why not just treat this as a regression where the only predictor (other than the constant) is a 0/1 coded indicator (dummy) variable and bootstrap that regression? see

Code:

help bootstrap

and the examples in the help file
Hanna Gyllensten

Join Date: Sep 2016

Posts: 19
#3

07 Jun 2017, 09:52

Do you mean something like example 1 in the help text:
"bootstrap, reps(100) seed(1): regress mpg weight gear foreign"

So, my code: bootstrap, reps(1000) seed(56787): regress total_cost ikny_merge
Resulting in:
Linear regression                               Number of obs     =         92
                                                Replications      =      1,000
                                                Wald chi2(1)      =       0.10
                                                Prob > chi2       =     0.7508
                                                R-squared         =     0.0012
                                                Adj R-squared     =    -0.0099
                                                Root MSE          =  2.480e+05

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
  total_cost |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  ikny_merge |  -17274.34   54400.95    -0.32   0.751    -123898.2    89349.56
       _cons |   539820.5   93300.08     5.79   0.000     356955.7    722685.3
------------------------------------------------------------------------------

This should mean that going from group 1 to group 2 means, on average, SEK 17,274 lower cost (not adjusting for other differences between the groups)? Do you think I understand the printout correctly?

Well, that would certainly make things easier. I have been searching for maybe 15 hours for a solution to this, but all the discussions/comments/questions I found were far too complicated for my coding knowledge.

Kind regards,
Hanna
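
Regarding the "several (many) cost variables" mentioned in #1, the bootstrapped regression could be wrapped in a loop rather than an ado-file. The following is only a sketch under stated assumptions: it uses Stata's shipped auto dataset so that it runs as is, with price, mpg and weight standing in for the cost variables and foreign standing in for the 0/1 group indicator. The strata() option is an addition not used in the command above; it resamples within each group so that the group sizes stay fixed across replications.

```stata
* Sketch only: auto.dta variables stand in for the real cost variables
sysuse auto, clear

foreach v of varlist price mpg weight {
    display as text _newline "Outcome: `v'"
    * Bootstrap the coefficient on the group indicator,
    * i.e. mean(foreign == 1) - mean(foreign == 0)
    bootstrap diff=(_b[1.foreign]), reps(1000) seed(56787) ///
        strata(foreign) nodots: regress `v' i.foreign
    * Report percentile and bias-corrected intervals
    * in addition to the normal-based one shown by -bootstrap-
    estat bootstrap, percentile bc
}
```

With the real data, the varlist after "of varlist" would be replaced by the actual cost variables and foreign by the group variable. Note that reusing one seed for every outcome is a design choice made here for reproducibility, not a requirement.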
Rich Goldstein

Join Date: Mar 2014

Posts: 4466
#4

07 Jun 2017, 11:42

Is "ikny_merge" a 0/1 variable? If yes, then I think you have what you want, but your output is hard to read; in the future please post such results within CODE blocks (see the FAQ for more on this).
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#5

07 Jun 2017, 11:46

Hanna:
there's an interesting article about the bootstrap applied to cost differences in healthcare programmes (maybe you're already aware of it): https://www.ncbi.nlm.nih.gov/pubmed/10180748.

In addition to Rich's helpful advice, you may want to estimate a -bootstrap-ped difference of the means (and the related 95% CIs) using -ttest- only as a vehicle to obtain the two group means, without using it to investigate the statistical significance of that difference under the hypothesis of equal means (as per https://www.ncbi.nlm.nih.gov/pubmed/10180748).

Code:

. sysuse auto.dta
(1978 Automobile Data)

. ttest price, by(foreign) unequal

. scalar A=r(mu_1)

. scalar B=r(mu_2)

. bootstrap (r(mu_1)-r(mu_2)), reps(10000) strata(foreign) bca ties nodots : ttest price, by(foreign) unequal

. estat bootstrap, all

Kind regards,
Carlo
(Stata 19.0)
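
On the question in #1 of whether a program can be re-used for different y-variables: one possible approach is a small r-class wrapper around -ttest- that takes the outcome and the group variable as arguments. This is a minimal sketch, not a tested solution for the original data; the program name meandiff and the returned name r(diff) are hypothetical, and auto.dta again stands in for the real dataset. In a do-file the program definition simply comes first, and the -bootstrap- calls follow it.

```stata
* Hypothetical helper: returns mean(group 1) - mean(group 2) in r(diff)
capture program drop meandiff
program define meandiff, rclass
    version 14
    syntax varname(numeric), by(varname)
    * -ttest- is used only to obtain the two group means
    quietly ttest `varlist', by(`by') unequal
    return scalar diff = r(mu_1) - r(mu_2)
end

* The do-file "starts" here: load data, then call the program via -bootstrap-
sysuse auto, clear
bootstrap diff=r(diff), reps(1000) seed(12345) strata(foreign) nodots: ///
    meandiff price, by(foreign)
estat bootstrap, percentile

* The same program can then be reused for another outcome
bootstrap diff=r(diff), reps(1000) seed(12345) strata(foreign) nodots: ///
    meandiff mpg, by(foreign)
```

Because -ttest- labels the groups by the sorted values of the by() variable, this should work whether the groups are coded 0/1 or 1/2.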
Hanna Gyllensten

Join Date: Sep 2016

Posts: 19
#6

07 Jun 2017, 13:04

Thanks a lot for the very helpful advice from you both!

Yes, I have that reference, but thanks for the suggestion.
Sorry for the hard-to-read printout; I'll make sure to learn how to post it correctly before next time. And yes, ikny_merge is such an indicator variable (though coded 1/2 rather than 0/1 for some reason, and I have not bothered to change it).
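
A note on the 1/2 coding: with a two-valued group variable entered as a continuous predictor, the slope is still the difference in means between the groups, but the constant is the fitted value at 0 rather than the mean of the first group. A small sketch with auto.dta, where the variable grp is a hypothetical stand-in for a 1/2-coded variable such as ikny_merge:

```stata
sysuse auto, clear
* Hypothetical 1/2 coding, mimicking the group variable described above
generate byte grp = foreign + 1

* Slope = mean(grp 2) - mean(grp 1); _cons is the extrapolated value at grp = 0
regress price grp

* Factor-variable notation: same difference, and _cons is now the mean of grp 1
regress price i.grp
```

So the bootstrapped difference is unaffected by the coding; only the interpretation of _cons changes.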
Ali Malik

Join Date: Jul 2018

Posts: 23
#7

05 Aug 2018, 11:28

Hi everyone,

I am facing a problem that is in some ways similar to this problem. However, instead of looking at the difference in the mean I am interested in the difference in R2 between two subsamples of my dataset. The ultimate goal is to test for difference in R2 between the subsamples.

Here you can find my post:
https://www.statalist.org/forums/for...oss-subsamples

Any help or suggestions that would help me better understand and solve the problem would be much appreciated.

Many thanks, Ali
ericmelse

Join Date: May 2014

Posts: 434
#8

06 Aug 2018, 01:27

Possibly this 2018 Stata Conference presentation by Phil Ender is (also) helpful. Its abstract reads:

This presentation will use Stata's bayesmh command to perform a two-sample independent t test. We will discuss the advantages of using a Bayesian approach to perform t test-type analyses and compare the output or results with the traditional frequentist t test.

Phil Ender refers to this paper: Kruschke, J. K. (2012). Bayesian Estimation Supersedes the t Test. Journal of Experimental Psychology: General, 142(2), 573-603.
More background was published in Kruschke, J. K. and Liddell, T. M. (2017). The Bayesian New Statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective. Psychonomic Bulletin & Review, 25, 178-206.

http://publicationslist.org/eric.melse