  • Simple Bootstrap Regression: proving not so simple

    Dear StataList Community,

    I have found myself in a peculiar situation while analyzing survey data for my thesis and would love the advice of you, mes amis, on how best to proceed.

    My dataset consists of a survey in which numeric variables track the progress of certain indicators over the course of a treatment. There is a treatment group and a control group. The observations for the treatment group and control group are listed in the same column for Before Treatment and After Treatment.

    In other words, one can compare the numerical differences (e.g., difference of means) between the treatment group and the control group as "before vs. after."

    I ran a series of simple regressions and the results confirmed that over time, only the treatment group experienced a change in each variable outcome.

    However, I am hoping to obtain more robust results by bootstrapping each regression to get an average regression coefficient over 1000 repetitions (e.g., m = 2, i.e., an increase of 2 units for every 1-unit increase in the treatment). In other words: randomly selecting 50 observations with replacement from the treatment group and the control group, respectively, in the Before Treatment and After Treatment columns, and running the regression on those resampled groups 1000 times.

    I am met with an r(199) error, "command _prefix_getmat is unrecognized", every time, both when trying to code the bootstrap regression manually and when using Stata's bootstrap command.

    There is clearly something missing or wrong in my syntax, but I don't understand why Stata can regress these variables with the dummy but not bootstrap them. I have attached a .do file with the code I have developed thus far.

    Please don't hesitate to let me know if further clarification is needed.
    Merci beaucoup!
    Attached Files

  • #2
    First, bootstrapping is not a magic bullet that will make your results "more robust" by default. If some parametric assumptions do not hold, it makes sense to check them with bootstrapping, but is this really the case here? Also, keep in mind that bootstrapping will never alter your point estimates, only your inference.
    Second, simply use the bootstrap vce for regressions like:

    Code:
    regress C1post treatment C1pre, vce(bootstrap, reps(1000))
    Third, why 50 observations? This is probably not the way to go. Use the original sample size (this is the default).
    Best wishes

    (Stata 16.1 MP)



    • #3
      Anonymous Croissant
      1) As you may already know, real first and family names are preferred on this forum (for reasons well explained in the FAQ). Just as you're free to conceal your identity (for tons of legal reasons), others are free to ignore your posts, as you decided not to abide by the rules of this forum;
      2) As per the FAQ again, posters are kindly requested to share what they typed and what Stata gave them back. This good habit is not only more efficient than tons of words aimed at explaining what the issue is, but can also increase the chance of getting (more) helpful replies.
      From your description, I gather that you probably have problems applying the bootstrap to a difference-in-differences regression, and your .do file is not that helpful, since we do not have an excerpt/example of your data (which you can share via -dataex-) to run it on.
      Kind regards,
      Carlo
      (Stata 19.0)



      • #4
        Apart from the very useful guidance by Felix and Carlo, Felix is not quite right: the nonparametric bootstrap that Stata does automatically does result in different estimates on every run.

        The OP can also check whether what he/she/zee wants to do is not -permute- of the treatment variable, and search here on Statalist for keywords like 'placebo test', 'permute', 'difference in difference', 'DID', etc.
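        The placebo/permutation idea can be sketched generically. A minimal Python illustration on simulated data (purely illustrative numbers, not the OP's data and not Stata's -permute- itself): reshuffle the treatment labels and see how often a label-shuffled difference is as large as the observed one.

        ```python
        import random

        random.seed(42)

        # Simulated data: 50 control and 50 treated outcomes, with a
        # true treatment effect of 2 units (illustrative only).
        control = [random.gauss(0, 1) for _ in range(50)]
        treated = [random.gauss(2, 1) for _ in range(50)]

        def mean(xs):
            return sum(xs) / len(xs)

        observed = mean(treated) - mean(control)

        # Permutation ("placebo") test: reshuffle the treatment labels
        # and recompute the difference in means under the null of no
        # treatment effect.
        pooled = control + treated
        reps = 1000
        extreme = 0
        for _ in range(reps):
            random.shuffle(pooled)
            if abs(mean(pooled[50:]) - mean(pooled[:50])) >= abs(observed):
                extreme += 1

        # Share of shuffled differences at least as large as the
        # observed one: an approximate two-sided p-value.
        p_value = extreme / reps
        print(f"observed diff = {observed:.2f}, permutation p = {p_value:.3f}")
        ```

        With a genuine effect this large relative to the noise, essentially no shuffled difference matches the observed one, so the permutation p-value is near zero.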



          • #6
            Originally posted by Joro Kolev View Post
            Apart from the very useful guidance by Felix and Carlo, Felix is not quite right: the nonparametric bootstrap that Stata does automatically does result in different estimates on every run.

            OP can also check whether what he/she/zee wants to do is not -permute- of the treatment variable, and check here on Statalist for keywords like 'placebo test' , 'permute' 'difference in difference', 'DID' etc.
            Maybe I was not clear enough in my response. Yes, each bootstrap sample is completely random, and the point estimates will differ across bootstrap samples. Yet the point estimate from the original sample is simply the best estimate available. No amount of bootstrapping can improve the point estimate (it might only add bias). Bootstrapping is done to assess the inference for the statistic; that is the main aim of the approach.
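            This distinction can be illustrated generically. A minimal Python sketch on toy data (a sample mean standing in for a regression coefficient; none of this is the OP's model): the bootstrap replicates scatter around the original estimate, and it is their spread, the bootstrap standard error, that feeds inference.

            ```python
            import random

            random.seed(1)

            # Toy sample: the point estimate (here, a mean) comes from
            # the original data and is fixed once and for all.
            sample = [random.gauss(10, 3) for _ in range(100)]
            point_estimate = sum(sample) / len(sample)

            # Bootstrap: resample with replacement, same size as the
            # original sample, and recompute the statistic each time.
            reps = 1000
            boot_means = []
            for _ in range(reps):
                resample = [random.choice(sample) for _ in sample]
                boot_means.append(sum(resample) / len(resample))

            # The average of the replicates sits right on top of the
            # original estimate; their standard deviation is the
            # bootstrap standard error, used for CIs and tests, not as
            # a "better" point estimate.
            boot_mean = sum(boot_means) / reps
            boot_se = (sum((b - boot_mean) ** 2 for b in boot_means) / (reps - 1)) ** 0.5
            print(f"original = {point_estimate:.3f}, "
                  f"bootstrap mean = {boot_mean:.3f}, bootstrap SE = {boot_se:.3f}")
            ```

            Averaging the replicates merely recovers (approximately) the original estimate, which is why averaging 1000 bootstrap coefficients, as proposed in #1, gains nothing over the single regression on the full sample.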
            Best wishes

            (Stata 16.1 MP)
