
  • Bootstrap Anova Confidence Intervals F statistic

    Hey everyone,

I have the following issue. To validate my results on a rather small dataset, I wanted to bootstrap an ANOVA with a nominal independent variable comprising three categories (primorg) and a continuous dependent variable (valt_op).

    To do the bootstrap, I used the following command:

    bootstrap f=e(F), reps(10000): anova valt_op primorg

After running this command, however, I get a confidence interval for the F statistic that contains both negative and positive values. Yet, as far as I know, the F value cannot be negative by definition. So now I am wondering how to interpret the confidence interval. My initial idea was to say that if, with 95% probability, the F value lies above 1, there is a significant difference in variance between groups; conversely, if values below 1 are included, this cannot be confirmed. Am I right in this approach? And how do I interpret negative values?

Is there any mistake in my calculation or my reasoning?

Thanks in advance for your help!

    regards

  • #2
The problem arises from the nature of normal-based bootstrap CIs, which are built from the bootstrap standard error and are therefore symmetric around the point estimate. A simple solution is to use percentile or bias-corrected CIs instead. For more details, refer to the help files or the general bootstrap literature.


    Code:
    bootstrap f=e(F), reps(10000) seed(123): anova valt_op primorg
    estat bootstrap, bc
    Best wishes

    (Stata 16.1 MP)



    • #3
      I think the difficulties here go beyond obtaining a CI with negative values for the F statistic, given that hypothesis tests and CIs rest on different logics.

If the goal is to produce a randomization hypothesis test, then I would argue against the bootstrap on logical grounds. The bootstrap simulates the sampling distribution assuming that the population is like the sample. Hypothesis testing, by contrast, involves the sampling distribution assuming *the null is true*, which is generally very different from the distribution obtained by assuming the population is like the sample.

The appropriate randomization analogue to a hypothesis test based on F would be a permutation test, e.g.:
      Code:
permute primorg f=e(F), reps(10000): anova valt_op i.primorg
      (This involves "shuffling" of the "primorg" variable, thereby simulating the distribution of what would be expected to occur if there were no systematic relation between it and "valt_op.") It would be interesting to see how different a result this gives from what the proposed attempt to bootstrap a p-value would give in this particular situation.

      If, on the other hand, a CI rather than a test is of interest (my preference), then I'd define some estimates of interest (e.g., differences in category means), and bootstrap them. I rarely use -anova-, but it appears that by default it puts the sample differences of means vs. the first category mean into the _b matrix, so one could do something like the following:
      Code:
      bootstrap b21 = _b[2.primorg] b31 = _b[3.primorg], reps(10000): anova valt_op i.primorg



      • #4
Felix Bittmann Thank you for the hint!
I checked the help file, and indeed I now get a strictly positive CI. Yet I don't really understand what the bias-corrected CI does and how to interpret it compared to the "normal" CI. Is it more reliable in every case?

        regards



        • #5
Mike Lacy Also thank you, Mike, for your help!
I have now also checked the result of the -permute- command, yet I am not sure what the results tell me.
The F statistic of the initial sample is F = 0.40, p = 0.67.
After permutation, I get an F value (T_obs) = 0.39.
The corresponding lower one-sided p-value is p = 0.31 with a 95% CI of [0.31, 0.32].
The upper one-sided p-value is p = 0.69 with a CI of [0.68, 0.69].
The two-sided p-value is p = 0.63 with a CI of [0.62, 0.64].
So what I understood is that this means, e.g., that if no difference exists in the permuted samples, the F statistic falls below the value of 0.39 in 31% of the cases and above it in 68% of the cases. Am I right?
But what does that tell me exactly? Based on the original sample, I would not be able to reject the null hypothesis (no difference between groups). So how do I confirm this with the permutation results?
Conversely, if we had indeed been able to reject H0, what values would we then expect to see in the permutation?

          Thanks for your response!! Looking forward.
          Last edited by Thrisa ml; 06 Feb 2021, 03:39.



          • #6
            Thrisa:
another approach is a bootstrap ANOVA (focused on r(F)), which is detailed in the following toy example:
            Code:
            . use "C:\Program Files\Stata16\ado\base\a\auto.dta"
            (1978 Automobile Data)
            
            . oneway price rep78 , bonferroni tabulate
            
                 Repair |          Summary of Price
            Record 1978 |        Mean   Std. Dev.       Freq.
            ------------+------------------------------------
                      1 |     4,564.5   522.55191           2
                      2 |   5,967.625   3,579.357           8
                      3 |   6,429.233    3,525.14          30
                      4 |     6,071.5   1,709.608          18
                      5 |       5,913   2,615.763          11
            ------------+------------------------------------
                  Total |   6,146.043    2,912.44          69
            
                                    Analysis of Variance
                Source              SS         df      MS            F     Prob > F
            ------------------------------------------------------------------------
            Between groups      8360542.63      4   2090135.66      0.24     0.9174
             Within groups       568436416     64      8881819
            ------------------------------------------------------------------------
                Total            576796959     68   8482308.22
            
            Bartlett's test for equal variances:  chi2(4) =  11.4252  Prob>chi2 = 0.022
            
                              Comparison of Price by Repair Record 1978
                                            (Bonferroni)
            Row Mean-|
            Col Mean |          1          2          3          4
            ---------+--------------------------------------------
                   2 |    1,403.1
                     |      1.000
                     |
                   3 |    1,864.7    461.608
                     |      1.000      1.000
                     |
                   4 |      1,507    103.875   -357.733
                     |      1.000      1.000      1.000
                     |
                   5 |    1,348.5    -54.625   -516.233     -158.5
                     |      1.000      1.000      1.000      1.000
            
            . scalar Fobs = r(F)
            
            . quietly summarize  price if  rep78==1 , mean
            
            . replace  price =  price-r(mean) + 6146.043 if rep78==1
            
            . quietly summarize  price if  rep78==2 , mean
            
            . replace  price =  price-r(mean) + 6146.043 if rep78==2
            
            . quietly summarize  price if  rep78==3 , mean
            
            . replace  price =  price-r(mean) + 6146.043 if rep78==3
            
            . quietly summarize  price if  rep78==4 , mean
            
            . replace  price =  price-r(mean) + 6146.043 if rep78==4
            
            . quietly summarize  price if  rep78==5 , mean
            
            . replace  price =  price-r(mean) + 6146.043 if rep78==5
            
            
            . bootstrap r(F), reps(1000) strata(rep78) saving(C:\Users\user\Desktop\carlo_F.dta, every(1) double replace) bca ties nodots : oneway p
            > rice rep78 , bonferroni tabulate
            
            warning: Because oneway is not an estimation command or does not set e(sample), bootstrap has no way to determine which observations
                     are used in calculating the statistics and so assumes that all observations are used. This means that no observations will be
                     excluded from the resampling because of missing values or other reasons.
            
                     If the assumption is not true, press Break, save the data, and drop the observations that are to be excluded. Be sure that
                     the dataset in memory contains only the relevant data.
            
            Bootstrap results
            
            Number of strata   =         5                  Number of obs     =         69
                                                            Replications      =      1,000
            
                  command:  oneway price rep78, bonferroni tabulate
                    _bs_1:  r(F)
            
            ------------------------------------------------------------------------------
                         |   Observed   Bootstrap                         Normal-based
                         |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                   _bs_1 |   3.36e-15   .6453927     0.00   1.000    -1.264947    1.264947
            ------------------------------------------------------------------------------
            
            . use "C:\Users\user\Desktop\carlo_F.dta"
            (bootstrap: oneway)
            
. count if _bs_1 > 0.24   // 0.24 = the original r(F)
  810

. di 810/1000
.81   // the bootstrap p-value, which confirms no rejection of the null
      // (null = no difference among the means)
            
            
            .
            Kind regards,
            Carlo
            (Stata 18.0 SE)



            • #7
Carlo Lazzaro thank you for your response!
Could you please elaborate a bit on the underlying idea of that approach, and how it differs from a bootstrap or permutation without the mean adjustment proposed in your code? What are the preconditions or reasons for using your solution instead of another?

              best regards



              • #8
                Thrisa:
first, when it comes to resampling (which is most of the time non-parametric), it is difficult to justify why a given approach outperforms the other options (unless you can contrast it against its parametric counterpart).
Sticking with my previous reply, the underlying reason is to fulfill the same requirements as a bootstrap -ttest- when you have more than two groups.
The basic idea is to focus on the bootstrap p-value for r(F) instead of the CI for the same statistic, and thereby avoid the nuisance of negative values in the 95% CI for r(F), which are meaningless for the F distribution, as it is defined on the [0, +infinity) interval. (Obviously, letting the data speak for themselves, the non-parametric bootstrap does not know the interval on which a given theoretical probability distribution is defined.)
The idea was originally elaborated in the following article: https://pubmed.ncbi.nlm.nih.gov/10180748/.
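For anyone who wants to apply this without typing the centering step group by group, the repeated -summarize-/-replace- lines from #6 can be condensed into a -levelsof- loop. This is only a sketch of the same idea (shift each group mean to the grand mean so that the null of equal means holds, then bootstrap r(F) under that null and count exceedances); the filename null_F is arbitrary:
Code:
sysuse auto, clear
oneway price rep78
scalar Fobs = r(F)                       // observed F on the original data
quietly summarize price if !missing(rep78)
scalar grand = r(mean)                   // grand mean over the analysis sample
* impose the null: shift every group mean to the grand mean
levelsof rep78, local(groups)
foreach g of local groups {
    quietly summarize price if rep78 == `g', meanonly
    quietly replace price = price - r(mean) + grand if rep78 == `g'
}
* bootstrap F under the null and save the replicates
bootstrap F = r(F), reps(1000) seed(123) strata(rep78) ///
    saving(null_F, replace) nodots: oneway price rep78
use null_F, clear
quietly count if F > Fobs
display r(N)/1000                        // bootstrap p-value for the observed F
As in #6, -bootstrap- will warn that -oneway- is not an estimation command; that warning is harmless here as long as the dataset in memory contains only the relevant observations.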
                Kind regards,
                Carlo
                (Stata 18.0 SE)



                • #9
Originally posted by Thrisa ml
Felix Bittmann Thank you for the hint!
I checked the help file, and indeed I now get a strictly positive CI. Yet I don't really understand what the bias-corrected CI does and how to interpret it compared to the "normal" CI. Is it more reliable in every case?

                  regards
There is no simple answer to this question; four different kinds are implemented in Stata for good reasons, and some work better than others in some situations. However, we can rule out the normal-based (standard-error) CIs, since they produce negative and hence impossible values, which should of course be avoided. I suggest computing them all and comparing. If the other three are fairly similar to each other, you can either report all of them or select one.
                  Like:

                  Code:
                  bootstrap f=e(F), reps(10000) seed(123) bca: anova valt_op primorg
                  estat bootstrap, all
However, as the other very interesting comments in this thread have pointed out, you are faced with various other questions, and this goes far beyond simple CI selection. I would suggest digging deeper into the theory so you get a better understanding of what resampling techniques are and which one might be best for your purpose.
                  Best wishes

                  (Stata 16.1 MP)



                  • #10
                    Thrisa ml , referring back to #5 here: If you would show the -permute- command you used, and the actual output, I could comment on the meaning and your interpretation. From your description, I'm uncertain what you actually got as results, but I would say that there seems to be some misunderstanding or mis-emphasis in your interpretation as you describe it here.



                    • #11
                      Mike Lacy Thank you for your reply!

Below is an example of the code and the respective output I get. I was wondering what the Monte Carlo permutation results, especially the p-values and CIs, finally tell me.
So how can I confirm the result I get when doing the t-test on the initial dataset?

                      Hope, it helps to better understand my issue.

                      Thanks in advance


                      T-test result on initial dataset:
[attached screenshot: t-test output (Unbenannt.PNG)]


                      Input for permutation:
Code:
set seed 1234
permute aum_cat t=r(t), reps(10000): ttest cap_valt, by(aum_cat)

                      Output:
[attached screenshot: permutation output (Unbenannt2.PNG)]



                      • #12
                        I would regard the permutation tests as the "true" result that the t-test attempts to approximate, under asymptotic conditions.

Strictly speaking, you could say this: "If observations were randomly shuffled across categories of aum_cat, and the t-value calculated on each such shuffled sample, 21.84% of the time a t-value would occur that is as big as or bigger than the one that occurred in the original sample." The conclusion would then be that your observed data only weakly contradict the null hypothesis of no population difference between the means of the categories of aum_cat, relative to the alternative hypothesis that the population mean in the "small" category exceeds that in the "large" category. A similar interpretation would apply if you wanted a two-sided test.

It's worth noting here that the conventional t-test gives a p-value of 0.2186, very close to that given by the permutation test. With reasonably sized samples, as you have here, the t-test often --but not always-- gives valid results even when its usual assumptions aren't met.

