  • Simple Bootstrap Regression: proving not so simple

    Dear StataList Community,

    I have found myself in a peculiar situation while analyzing survey data for my thesis and would love the advice of you, mes amis, on how best to proceed.

    My dataset consists of a survey in which numeric variables track the progress of certain indicators over the course of a treatment. There is a treatment group and a control group. The observations for the treatment group and control group are listed in the same column for Before Treatment and After Treatment.

    In other words, one can compare the numerical differences (e.g., difference of means) between the treatment group and the control group as "before vs. after."

    I ran a series of simple regressions and the results confirmed that over time, only the treatment group experienced a change in each variable outcome.

    However, I am hoping to obtain more robust results by bootstrapping each regression to get an average regression coefficient over 1000 repetitions (e.g., m = 2, i.e., an increase of 2 units for every 1-unit increase in the treatment). In other words: randomly selecting 50 observations with replacement from the treatment group and the control group, respectively, in the Before Treatment and After Treatment columns, and running the regression on those resampled groups 1000 times.

    I am met with an r(199) error, "command _prefix_getmat is unrecognized", every time, both when trying to code the bootstrap regression manually and when using Stata's bootstrap command.

    There is clearly something missing or wrong in my syntax, but I don't understand why Stata can regress these variables with the dummy but not bootstrap them. I have attached a .do file with the code I have developed thus far.

    Please don't hesitate to let me know if further clarification is needed.
    Merci beaucoup!
    Attached Files

  • #2
    First, bootstrapping is not a magic bullet that will make your results "more robust" by default. If some parametric assumptions do not hold, it makes sense to check them with bootstrapping, but is this really the case here? Also, keep in mind that bootstrapping will never alter your point estimates, only your inference.
    Second, simply use the bootstrap vce for regressions like:

    Code:
    regress C1post treatment C1pre, vce(bootstrap, reps(1000))
    Third, why 50 observations? This is probably not the way to go. Use the original sample size (this is the default).
    Best wishes

    (Stata 16.1 MP)



    • #3
      Anonymous Croissant
      1) As you may already know, real first and family names are preferred on this forum (for reasons well explained in the FAQ). Just as you're free to conceal your identity (for tons of legal reasons), others are free to ignore your posts, as you decided not to abide by the rules of this forum;
      2) As per the FAQ again, posters are kindly requested to share what they typed and what Stata gave them back. This good habit is not only more efficient than tons of words aimed at explaining what the issue is, but can also increase the chance of getting (more) helpful replies.
      From your description, I gather that you probably have problems applying the bootstrap to a difference-in-differences regression, and your .do file is not that helpful, since we do not have an excerpt/example of your data (which you can share via -dataex-) to run it on.
      Kind regards,
      Carlo
      (Stata 19.0)



      • #4
        Apart from the very useful guidance by Felix and Carlo, Felix is not quite right: the nonparametric bootstrap that Stata does automatically does result in different estimates on every run.

        The OP can also check whether what he/she/zee wants to do is not -permute- of the treatment variable, and search here on Statalist for keywords like 'placebo test', 'permute', 'difference in difference', 'DID', etc.
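        The placebo/permutation idea can be sketched generically. A minimal Python illustration on simulated data (purely illustrative numbers, not the OP's data and not Stata's -permute- itself): reshuffle the treatment labels and see how often a label-shuffled difference is as large as the observed one.

        ```python
        import random

        random.seed(42)

        # Simulated data: 50 control and 50 treated outcomes, with a
        # true treatment effect of 2 units (illustrative only).
        control = [random.gauss(0, 1) for _ in range(50)]
        treated = [random.gauss(2, 1) for _ in range(50)]

        def mean(xs):
            return sum(xs) / len(xs)

        observed = mean(treated) - mean(control)

        # Permutation ("placebo") test: reshuffle the treatment labels
        # and recompute the difference in means under the null of no
        # treatment effect.
        pooled = control + treated
        reps = 1000
        extreme = 0
        for _ in range(reps):
            random.shuffle(pooled)
            if abs(mean(pooled[50:]) - mean(pooled[:50])) >= abs(observed):
                extreme += 1

        # Share of shuffled differences at least as large as the
        # observed one: an approximate two-sided p-value.
        p_value = extreme / reps
        print(f"observed diff = {observed:.2f}, permutation p = {p_value:.3f}")
        ```

        With a genuine effect this large relative to the noise, essentially no shuffled difference matches the observed one, so the permutation p-value is near zero.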



          • #6
            Originally posted by Joro Kolev View Post
            Apart from the very useful guidance by Felix and Carlo, Felix is not quite right: the nonparametric bootstrap that Stata does automatically does result in different estimates on every run.

            OP can also check whether what he/she/zee wants to do is not -permute- of the treatment variable, and check here on Statalist for keywords like 'placebo test' , 'permute' 'difference in difference', 'DID' etc.
            Maybe I was not clear enough in my response. Yes, each bootstrap sample is completely random, and the point estimates will differ across bootstrap samples. Yet the point estimate from the original sample is simply the best estimate available. No amount of bootstrapping can improve the point estimate (it might only add bias). Bootstrapping is done to assess the inference for the statistic; that is the main aim of the approach.
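            This distinction can be illustrated generically. A minimal Python sketch on toy data (a sample mean standing in for a regression coefficient; none of this is the OP's model): the bootstrap replicates scatter around the original estimate, and it is their spread, the bootstrap standard error, that feeds inference.

            ```python
            import random

            random.seed(1)

            # Toy sample: the point estimate (here, a mean) comes from
            # the original data and is fixed once and for all.
            sample = [random.gauss(10, 3) for _ in range(100)]
            point_estimate = sum(sample) / len(sample)

            # Bootstrap: resample with replacement, same size as the
            # original sample, and recompute the statistic each time.
            reps = 1000
            boot_means = []
            for _ in range(reps):
                resample = [random.choice(sample) for _ in sample]
                boot_means.append(sum(resample) / len(resample))

            # The average of the replicates sits right on top of the
            # original estimate; their standard deviation is the
            # bootstrap standard error, used for CIs and tests, not as
            # a "better" point estimate.
            boot_mean = sum(boot_means) / reps
            boot_se = (sum((b - boot_mean) ** 2 for b in boot_means) / (reps - 1)) ** 0.5
            print(f"original = {point_estimate:.3f}, "
                  f"bootstrap mean = {boot_mean:.3f}, bootstrap SE = {boot_se:.3f}")
            ```

            Averaging the replicates merely recovers (approximately) the original estimate, which is why averaging 1000 bootstrap coefficients, as proposed in #1, gains nothing over the single regression on the full sample.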
            Best wishes

            (Stata 16.1 MP)
