Random Tie-Breaking in sort() with Bootstrap Methods

Elliot Paschal

Join Date: May 2025

Posts: 1
#1

Random Tie-Breaking in sort() with Bootstrap Methods

12 May 2025, 17:18

I recently discovered an interesting behavior while replicating a Stata package in R for multiple hypothesis testing that uses bootstrap resampling.

When using a small number of bootstrap samples (B=20) for testing purposes, I noticed slight inconsistencies in results between Stata and R outputs with the same data. After investigating, I found that Stata's sort() function appears to randomize the order of tied values, which can affect p-value calculations in bootstrap procedures when using:

sortmaxstats = sort(maxstats', -1)'
indx = find(sortmaxstats :<= observed_stat)
p_value = indx/B

With B=20, this randomization could shift p-values by ~0.05, but when I increased to B=3000 the difference becomes much less.

Is there an accepted standard in statistical practice for handling ties in bootstrap procedures - random or deterministic tie-breaking?

Thanks!

Last edited by Elliot Paschal; 12 May 2025, 17:39.
Tags: None
Felix Bittmann

Join Date: Aug 2018

Posts: 691
#2

13 May 2025, 00:10

As you say, in Stata, the sort() function is not stable. This not only concerns bootstrapping but any application of this function. So this becomes a more general question and I am not aware of any general rule. In any case, for bootstrapping, this should not have much relevant consequences as the application of using only like 20 resamples is unrealistic.

Best wishes

Stata 18.0 MP | ORCID | Google Scholar
1 like
Comment
Maarten Buis

Join Date: Mar 2014

Posts: 3455
#3

13 May 2025, 04:08

What is the correct sort order of the numbers 2, 2, and 2? There is no information that would let us answer that question. So random seems to me to make the most sense.

you seem to be using Mata for your bootstrap function. I would do that differently:

Code:

Nextreme = sum(abs(maxstats) :>= abs(observed_stat)) pvalue = Nextreme/B

If you do this in Stata, I would use count. I would have stored the observed_stat as a scalar with a tempname and B as a local, and than:

Code:

count if maxstats >= `observed_stat' local pvalue = r(N)/`B'

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Comment

Announcement

Random Tie-Breaking in sort() with Bootstrap Methods

Comment

Comment