I recently discovered an interesting behavior while replicating a Stata package in R for multiple hypothesis testing that uses bootstrap resampling.
When using a small number of bootstrap samples (B=20) for testing purposes, I noticed slight inconsistencies between the Stata and R results on the same data. After investigating, I found that Stata's sort() function appears to order tied values randomly, which can affect p-value calculations in bootstrap procedures when computing the p-value with:
sortmaxstats = sort(maxstats', -1)'           // bootstrap max statistics, descending
indx = find(sortmaxstats :<= observed_stat)   // first position at or below the observed stat
p_value = indx/B
With B=20, this randomization could shift p-values by ~0.05, but when I increased B to 3000 the discrepancy became much smaller.
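For what it's worth, I noticed that a count-based formulation avoids sorting altogether, so the order in which ties land cannot matter. Here is a minimal sketch in Python (variable names and the simulated data are mine, and the (1 + count)/(B + 1) convention is the one commonly recommended for bootstrap p-values, not necessarily what the Stata package uses):

```python
import numpy as np

rng = np.random.default_rng(42)
B = 20
observed_stat = 1.5

# Hypothetical bootstrap max statistics, with ties forced at the observed value
maxstats = rng.normal(size=B)
maxstats[:3] = observed_stat

# Count-based p-value: counting stats >= observed needs no sort,
# so any randomized ordering of tied values is irrelevant
p_value = (1 + np.sum(maxstats >= observed_stat)) / (B + 1)

# Reordering the bootstrap draws (as a randomized sort of ties might)
# leaves the count, and hence the p-value, unchanged
rng.shuffle(maxstats)
p_shuffled = (1 + np.sum(maxstats >= observed_stat)) / (B + 1)
```

Under this formulation p_value == p_shuffled by construction, which is why I'm puzzled that the index-based version is sensitive to tie order at all.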
Is there an accepted standard in statistical practice for handling ties in bootstrap procedures, i.e. random or deterministic tie-breaking?
Thanks!