  • Logistic regression with bootstrap

    Good morning everyone

    I'm trying to run a logistic regression. My sample is unbalanced (80-20), so I used the bootstrap prefix with my dependent variable as the strata variable:

    bootstrap, reps(3000) strata(aentrada3) seed(859): logistic aentrada3 i.asexo1 aedad1 aanalfabeta asinactiv azona1 aaño acoca

    Then I tried to check the model with the ROC curve and with the classification table from the estat classification command.

    Now I have four specific questions:

    1) How can I find out the size of the samples drawn by the bootstrap? The documentation says the size is _N by default, but if _N is my total sample, what are the sizes of the subsamples within each stratum? (I have 200 observations with outcome 0 and 800 with outcome 1.)

    2) In the classification table, which estimate is used for the comparison? The mean of the estimates from the 3000 reps?

    3) In the classification table, which sample is the estimate compared against? An average over the 3000 samples?

    4) Given the unbalanced sample, is the default threshold of 0.5 a good choice for the classification table? If not, how can I find a good one?

    I would really appreciate your help.

  • #2
    In your example, the bootstrap approach only influences the SEs of your regression results and does not have an effect on the classification table (you can easily test this yourself by running the command with and without the prefix). As a general note, for regression commands you probably want to use the vce option instead of the prefix anyway, so try: logit ..., vce(bootstrap)
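
    The with-and-without check could be sketched as follows. This is only an illustration on the auto dataset that ships with Stata, not on the poster's data; the outcome foreign and the predictors mpg and weight are stand-ins, and reps(200) is chosen just to keep the run short.

    ```stata
    * Example data shipped with Stata (a stand-in for the poster's data)
    sysuse auto, clear

    * Plain logit, followed by the classification table
    logit foreign mpg weight
    estat classification

    * Same model with stratified bootstrap standard errors
    logit foreign mpg weight, vce(bootstrap, reps(200) strata(foreign) seed(859))
    estat classification
    ```

    If the point above holds, the coefficients and both classification tables should agree, and only the standard errors and confidence intervals should differ between the two fits.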


    1. Each bootstrap sample has the same size as the total sample (or, with strata, the same size as each stratum). So if you have 200 and 800 observations in these strata, Stata first samples 200 from the first group and then 800 from the second group. These samples are taken with replacement (that is what makes it a bootstrap).
    2. & 3. The statistics are computed from the sample you have, using the coefficients estimated on that sample, and are not affected by the bootstrap.
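
    To see the stratum sizes directly, one resample can be drawn by hand with bsample. This is a rough sketch on the shipped auto dataset, with foreign standing in for the poster's outcome variable; it shows a single stratified draw, not what the bootstrap prefix reports.

    ```stata
    * Example data shipped with Stata (a stand-in for the poster's data)
    sysuse auto, clear

    * Group sizes in the original data
    tabulate foreign

    * Draw one bootstrap sample with replacement within each stratum;
    * bsample replaces the data in memory with the resample
    set seed 859
    bsample, strata(foreign)

    * Group sizes in the resample
    tabulate foreign
    ```

    The two tabulations should show the same count in each group, since each stratum is resampled to its own original size.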
    Best wishes

    (Stata 16.1 MP)



    • #3
      Thanks Felix. I understand what you're saying.

      Maybe you know what I can do (or where I can read about it) if I want to use balanced subsamples for my estimation. I can draw the subsamples, but I don't know how to combine them into a single result.

      I need this because I think my results could be biased by the imbalance in the original sample.



      • #4
        I am not sure exactly what you mean here; probably the strata option. I would argue that if your sample is unbalanced you should use this option, but as long as you use a large number of reps it will not matter much. And what do you mean by a single result out of all the subsamples? If you want to, you can save all the individual bootstrap results with the saving option.
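
        A minimal sketch of the saving option, again on the shipped auto dataset rather than the poster's data. The file name bsreps is hypothetical, and reps(200) is only there to keep the run short.

        ```stata
        * Example data shipped with Stata (a stand-in for the poster's data)
        sysuse auto, clear

        * Store the coefficients from every replication in bsreps.dta
        bootstrap _b, reps(200) strata(foreign) seed(859) saving(bsreps, replace): logit foreign mpg weight

        * Load the replications: one observation per bootstrap sample
        use bsreps, clear
        describe
        summarize
        ```

        The saved file holds one row per replication and one variable per coefficient, so the spread of each estimate across the bootstrap samples can be inspected directly.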
        Best wishes

        (Stata 16.1 MP)
