
  • Random Tie-Breaking in sort() with Bootstrap Methods

    I recently discovered an interesting behavior while replicating a Stata package in R for multiple hypothesis testing that uses bootstrap resampling.

    When using a small number of bootstrap samples (B=20) for testing purposes, I noticed slight inconsistencies in results between Stata and R outputs with the same data. After investigating, I found that Stata's sort() function appears to randomize the order of tied values, which can affect p-value calculations in bootstrap procedures when using:

    sortmaxstats = sort(maxstats', -1)'
    indx = find(sortmaxstats :<= observed_stat)
    p_value = indx/B

    With B=20, this randomization could shift p-values by about 0.05; when I increased to B=3000, the difference became much smaller.
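To put rough numbers on this: if the p-value is computed as a rank divided by B, then moving the observed statistic by a single position changes the p-value by 1/B. The sketch below is in Python purely for illustration (not Mata or R), and `p_value_step` is a hypothetical helper, not part of any package.

```python
def p_value_step(B):
    """Smallest possible change in a rank-based p-value computed as index / B."""
    return 1 / B

# Compare the resolution for the two bootstrap sizes mentioned above.
for B in (20, 3000):
    print(f"B = {B}: one rank position moves the p-value by {p_value_step(B):.5f}")
```

So a one-position change is worth 0.05 at B=20 but only about 0.0003 at B=3000, which is consistent with the difference shrinking as B grows.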

    Is there an accepted standard in statistical practice for handling ties in bootstrap procedures: random or deterministic tie-breaking?

    Thanks!
    Last edited by Elliot Paschal; 12 May 2025, 17:39.

  • #2
    As you say, sort() in Stata is not stable. This concerns any use of the function, not just bootstrapping, so the question is a more general one, and I am not aware of any general rule. For bootstrapping, in any case, this should not have much practical consequence, because using only about 20 resamples is unrealistic.
    Best wishes

    Stata 18.0 MP | ORCID | Google Scholar


    • #3
      What is the correct sort order of the numbers 2, 2, and 2? There is no information that would let us answer that question. So random seems to me to make the most sense.

      You seem to be using Mata for your bootstrap function. I would do that differently:

      Code:
      Nextreme = sum(abs(maxstats) :>= abs(observed_stat))
      pvalue = Nextreme/B
      If you do this in Stata, I would use count. I would have stored the observed_stat as a scalar with a tempname and B as a local, and then:

      Code:
      count if maxstats >= `observed_stat'
      local pvalue = r(N)/`B'
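The point of counting rather than sorting is that a count does not depend on the order of the draws, so tie-breaking cannot affect it. A minimal sketch of the same idea, in Python for illustration only: it mirrors the Mata version with abs(), the data are made up, and `count_p_value` is a hypothetical helper name.

```python
import random

def count_p_value(maxstats, observed_stat):
    """Share of bootstrap statistics at least as extreme as the observed one."""
    n_extreme = sum(abs(s) >= abs(observed_stat) for s in maxstats)
    return n_extreme / len(maxstats)

# Made-up bootstrap statistics with several ties (B = 20), for illustration only.
maxstats = [2.0, 2.0, 2.0, 1.5, 1.5, 3.1, 0.4, 0.4, 2.7, 1.1,
            0.9, 2.0, 1.5, 0.2, 3.1, 1.8, 0.7, 2.7, 1.1, 0.4]
observed_stat = 2.0

p_original = count_p_value(maxstats, observed_stat)

# Reordering the draws, ties included, cannot change a count.
rng = random.Random(12345)
shuffled = maxstats[:]
rng.shuffle(shuffled)
p_shuffled = count_p_value(shuffled, observed_stat)

print(p_original, p_shuffled)
```

Both calls return the same value however the ties are ordered. Whether ties with the observed statistic should count as extreme (>= versus >) is a separate choice that this sketch does not settle.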
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------
