
  • Bootstrap Anova Confidence Intervals F statistic

    Hey everyone,

I have the following issue. To validate my results on a rather small dataset, I wanted to bootstrap an ANOVA with a nominal independent variable comprising three categories (primorg) and a continuous dependent variable (valt_op).

    To do the bootstrap, I used the following command:

    bootstrap f=e(F), reps(10000): anova valt_op primorg

After running this command, however, I get a confidence interval for the F statistic that contains both negative and positive values. Yet, as far as I know, the F value cannot be negative by definition. So now I am wondering how to interpret the confidence interval. My initial idea was to say that if, with 95% probability, the F value lies above 1, there is a significant difference in variance between groups; conversely, if values below 1 are included, this cannot be confirmed. Am I right in this approach? And how do I interpret negative values?

Is there any mistake in my calculation or my reasoning?

Thanks in advance for your help!

    regards

  • #2
The problem arises from the nature of normal-based bootstrap CIs, which are built from the bootstrap standard error and are therefore symmetric around the point estimate. A simple solution is to use percentile or bias-corrected CIs instead. For more details, refer to the help files or the general bootstrap literature.


    Code:
    bootstrap f=e(F), reps(10000) seed(123): anova valt_op primorg
    estat bootstrap, bc
    Best wishes

    (Stata 16.1 MP)



    • #3
      I think the difficulties here go beyond obtaining a CI with negative values for the F statistic, given that hypothesis tests and CIs rest on different logics.

If the goal is to produce a randomization hypothesis test, then I would argue against the bootstrap on logical grounds. The bootstrap simulates the sampling distribution assuming that the population is like the sample. Hypothesis testing, by contrast, involves the sampling distribution assuming *the null is true*, which is generally very different from the distribution obtained by assuming the population is like the sample.

The appropriate randomization analogue to a hypothesis test based on F would be a permutation test, e.g.:
      Code:
permute primorg f=e(F), reps(10000): anova valt_op i.primorg
      (This involves "shuffling" of the "primorg" variable, thereby simulating the distribution of what would be expected to occur if there were no systematic relation between it and "valt_op.") It would be interesting to see how different a result this gives from what the proposed attempt to bootstrap a p-value would give in this particular situation.

      If, on the other hand, a CI rather than a test is of interest (my preference), then I'd define some estimates of interest (e.g., differences in category means), and bootstrap them. I rarely use -anova-, but it appears that by default it puts the sample differences of means vs. the first category mean into the _b matrix, so one could do something like the following:
      Code:
      bootstrap b21 = _b[2.primorg] b31 = _b[3.primorg], reps(10000): anova valt_op i.primorg



      • #4
Felix Bittmann Thank you for the hint!
I checked the help file, and indeed I now get a strictly positive CI. Yet I don't really understand what the bias-corrected CI does and how to interpret it compared to the "normal" CI. Is it more reliable in every case?

        regards



        • #5
Mike Lacy Also thank you, Mike, for your help!
I have now also checked the result of the -permute- command, yet I am not sure what the results tell me.
The F statistic of the initial sample is F = 0.40, p = 0.67.
After permutation, I get an F value (T_obs) = 0.39.
The corresponding lower one-sided p-value is p = 0.31 with a 95% CI of [0.31, 0.32].
The upper one-sided p-value is p = 0.69 with a CI of [0.68, 0.69].
The two-sided p-value is p = 0.63 with a CI of [0.62, 0.64].
So what I understood is that this means, e.g., that if no difference exists in the permuted samples, the F statistic falls below the value of 0.39 in 31% of the cases and above it in 68% of the cases. Am I right?
But what does that tell me exactly? Based on the original sample, I would not be able to reject the null hypothesis (no difference between groups). So how do I confirm this with the permutation results?
Conversely, if we had indeed been able to reject H0, what values would we then expect to see in the permutation?

          Thanks for your response!! Looking forward.
          Last edited by Thrisa ml; 06 Feb 2021, 03:39.



          • #6
            Thrisa:
another approach is a bootstrap ANOVA (focused on r(F)), which is detailed in the following toy example:
            Code:
            . use "C:\Program Files\Stata16\ado\base\a\auto.dta"
            (1978 Automobile Data)
            
            . oneway price rep78 , bonferroni tabulate
            
                 Repair |          Summary of Price
            Record 1978 |        Mean   Std. Dev.       Freq.
            ------------+------------------------------------
                      1 |     4,564.5   522.55191           2
                      2 |   5,967.625   3,579.357           8
                      3 |   6,429.233    3,525.14          30
                      4 |     6,071.5   1,709.608          18
                      5 |       5,913   2,615.763          11
            ------------+------------------------------------
                  Total |   6,146.043    2,912.44          69
            
                                    Analysis of Variance
                Source              SS         df      MS            F     Prob > F
            ------------------------------------------------------------------------
            Between groups      8360542.63      4   2090135.66      0.24     0.9174
             Within groups       568436416     64      8881819
            ------------------------------------------------------------------------
                Total            576796959     68   8482308.22
            
            Bartlett's test for equal variances:  chi2(4) =  11.4252  Prob>chi2 = 0.022
            
                              Comparison of Price by Repair Record 1978
                                            (Bonferroni)
            Row Mean-|
            Col Mean |          1          2          3          4
            ---------+--------------------------------------------
                   2 |    1,403.1
                     |      1.000
                     |
                   3 |    1,864.7    461.608
                     |      1.000      1.000
                     |
                   4 |      1,507    103.875   -357.733
                     |      1.000      1.000      1.000
                     |
                   5 |    1,348.5    -54.625   -516.233     -158.5
                     |      1.000      1.000      1.000      1.000
            
            . scalar Fobs = r(F)
            
            . quietly summarize  price if  rep78==1 , mean
            
            . replace  price =  price-r(mean) + 6146.043 if rep78==1
            
            . quietly summarize  price if  rep78==2 , mean
            
            . replace  price =  price-r(mean) + 6146.043 if rep78==2
            
            . quietly summarize  price if  rep78==3 , mean
            
            . replace  price =  price-r(mean) + 6146.043 if rep78==3
            
            . quietly summarize  price if  rep78==4 , mean
            
            . replace  price =  price-r(mean) + 6146.043 if rep78==4
            
            . quietly summarize  price if  rep78==5 , mean
            
            . replace  price =  price-r(mean) + 6146.043 if rep78==5
            
            
            . bootstrap r(F), reps(1000) strata(rep78) saving(C:\Users\user\Desktop\carlo_F.dta, every(1) double replace) bca ties nodots : oneway p
            > rice rep78 , bonferroni tabulate
            
            warning: Because oneway is not an estimation command or does not set e(sample), bootstrap has no way to determine which observations
                     are used in calculating the statistics and so assumes that all observations are used. This means that no observations will be
                     excluded from the resampling because of missing values or other reasons.
            
                     If the assumption is not true, press Break, save the data, and drop the observations that are to be excluded. Be sure that
                     the dataset in memory contains only the relevant data.
            
            Bootstrap results
            
            Number of strata   =         5                  Number of obs     =         69
                                                            Replications      =      1,000
            
                  command:  oneway price rep78, bonferroni tabulate
                    _bs_1:  r(F)
            
            ------------------------------------------------------------------------------
                         |   Observed   Bootstrap                         Normal-based
                         |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                   _bs_1 |   3.36e-15   .6453927     0.00   1.000    -1.264947    1.264947
            ------------------------------------------------------------------------------
            
            . use "C:\Users\user\Desktop\carlo_F.dta"
            (bootstrap: oneway)
            
. count if _bs_1 > 0.24   // 0.24 = the original r(F)
  810

. di 810/1000
.81   // the bootstrap p-value, which confirms no rejection of the null
      // (null = no difference among the means)
            
            
            .
            Kind regards,
            Carlo
            (Stata 18.0 SE)



            • #7
Carlo Lazzaro thank you for your response!
Could you please elaborate a bit on the underlying idea of that approach, and how it differs from a bootstrap or permutation without the mean adjustment proposed in your code? What are the preconditions or reasons for using your solution instead of another?

              best regards



              • #8
                Thrisa:
first, when it comes to resampling (which is most of the time non-parametric), it is difficult to justify why a given approach outperforms the other options (unless you can contrast it against its parametric counterpart).
Sticking with my previous reply, the underlying reason is to fulfill the same requirements as a bootstrap -ttest- when you have more than two groups.
The basic idea is to focus on the bootstrap p-value for r(F) instead of the CI for the same statistic, and thereby avoid the nuisance of negative values in the 95% CI for r(F), which are meaningless for the F distribution, as it is defined on the [0, +infinity) interval. (Obviously, letting the data speak for themselves, the non-parametric bootstrap does not know the interval on which a given theoretical probability distribution is defined.)
The idea was originally elaborated in the following article: https://pubmed.ncbi.nlm.nih.gov/10180748/.
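For anyone who wants to apply this without typing the centering step group by group, the repeated -summarize-/-replace- lines from #6 can be condensed into a -levelsof- loop. This is only a sketch of the same idea (shift each group mean to the grand mean so that the null of equal means holds, then bootstrap r(F) under that null and count exceedances); the filename null_F is arbitrary:
Code:
sysuse auto, clear
oneway price rep78
scalar Fobs = r(F)                       // observed F on the original data
quietly summarize price if !missing(rep78)
scalar grand = r(mean)                   // grand mean over the analysis sample
* impose the null: shift every group mean to the grand mean
levelsof rep78, local(groups)
foreach g of local groups {
    quietly summarize price if rep78 == `g', meanonly
    quietly replace price = price - r(mean) + grand if rep78 == `g'
}
* bootstrap F under the null and save the replicates
bootstrap F = r(F), reps(1000) seed(123) strata(rep78) ///
    saving(null_F, replace) nodots: oneway price rep78
use null_F, clear
quietly count if F > Fobs
display r(N)/1000                        // bootstrap p-value for the observed F
As in #6, -bootstrap- will warn that -oneway- is not an estimation command; that warning is harmless here as long as the dataset in memory contains only the relevant observations.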
                Kind regards,
                Carlo
                (Stata 18.0 SE)



                • #9
Originally posted by Thrisa ml
Felix Bittmann Thank you for the hint!
I checked the help file, and indeed I now get a strictly positive CI. Yet I don't really understand what the bias-corrected CI does and how to interpret it compared to the "normal" CI. Is it more reliable in every case?

                  regards
There is no simple answer to this question; four different kinds are implemented in Stata for good reasons, and some work better than others in some situations. However, we can rule out the normal-based (standard-error) CIs, since they produce negative and hence impossible values, which should of course be avoided. I suggest computing them all and comparing. If the other three are fairly similar to each other, you can either report all of them or select one.
                  Like:

                  Code:
                  bootstrap f=e(F), reps(10000) seed(123) bca: anova valt_op primorg
                  estat bootstrap, all
However, as the other very interesting comments in this thread have pointed out, you are faced with various other questions, and this goes far beyond simple CI selection. I would suggest digging deeper into the theory so you get a better understanding of what resampling techniques are and which one might be best for your purpose.
                  Best wishes

                  (Stata 16.1 MP)



                  • #10
                    Thrisa ml , referring back to #5 here: If you would show the -permute- command you used, and the actual output, I could comment on the meaning and your interpretation. From your description, I'm uncertain what you actually got as results, but I would say that there seems to be some misunderstanding or mis-emphasis in your interpretation as you describe it here.



                    • #11
                      Mike Lacy Thank you for your reply!

Below is an example of the code and the respective output I get. I was wondering what the Monte Carlo permutation results, especially the p-values and CIs, finally tell me.
So how can I confirm the result I get when doing the t-test on the initial dataset?

                      Hope, it helps to better understand my issue.

                      Thanks in advance


                      T-test result on initial dataset:
[attached screenshot: t-test output (Unbenannt.PNG)]


                      Input for permutation:
Code:
set seed 1234
permute aum_cat t=r(t), reps(10000): ttest cap_valt, by(aum_cat)

                      Output:
[attached screenshot: permutation output (Unbenannt2.PNG)]



                      • #12
                        I would regard the permutation tests as the "true" result that the t-test attempts to approximate, under asymptotic conditions.

Strictly speaking, you could say this: "If observations were randomly shuffled across categories of aum_cat, and the t-value calculated on each such shuffled sample, 21.84% of the time a t-value would occur that is as big as or bigger than the one that occurred in the original sample." The conclusion would then be that your observed data only weakly contradict the null hypothesis of no population difference between the means of the categories of aum_cat, relative to the alternative hypothesis that the population mean in the "small" category exceeds that in the "large" category. A similar interpretation would apply if you wanted a two-sided test.

It's worth noting here that the conventional t-test gives a p-value of 0.2186, very close to that given by the permutation test. With reasonably sized samples, as you have here, the t-test often --but not always-- gives valid results even when its usual assumptions aren't met.

