  • Possible bug in the "permute" command?

    After observing some unexpected results from permutation tests using the Stata command "permute", I tried running the following code:

    ------------------------------------------------------------------------------------------------------------------------
    clear

    set obs 200
    gen x = uniform()
    gen treat = (x>.5)

    capture:program drop sum2
    program define sum2, rclass
    quietly{
    tempname mean1

    summarize x if treat==1
    scalar `mean1'=r(mean)

    return scalar sum2mean = `mean1'
    }
    end

    set more off
    permute treat mean=r(sum2mean) , ///
    seed(123456) reps(10000) : sum2

    set more off
    permute treat mean=r(mean) , ///
    seed(123456) reps(10000): summarize x if treat==1
    ------------------------------------------------------------------------------------------------------------------------

    As I understand the command, the two calls to it in the code above should be equivalent. Nevertheless, they give opposite results: the first reports a p-value of zero (which makes sense), while the second reports a p-value of one (which makes no sense at all to me).

    I experience this problem with the following versions of the command:
    *! version 2.6.1 05feb2014
    *! version 2.5.1 09jun2011


    Am I using the command incorrectly?

    Thanks for any answer,
    Tommaso

  • #2
    Tommaso,

    I'm not terribly familiar with the permute command, but I think I can see where the problem might be. First, note that in the first instance the sample size is given as 200 (the original sample size), while in the second instance it is given as 88 (the number of observations for which treat==1). Accordingly, I think that in the first instance permute is using all of the data, permuting treat, and running the summarize command on the permuted values of treat. It sounds like this is what you want. In the second instance, permute is only using data for which treat==1 (in the original data set), so treat is constant and permuting it has no effect.

    This should not be considered a bug, since StataCorp has a very good reason for not permuting a variable found in the if condition of the original command: how is Stata supposed to know whether you really want to do something like that, or whether you want to restrict the permute command to a subset of the data? That said, the documentation could be more explicit about the consequences of permuting a variable found in an if condition.

    In any case, it looks like you found an appropriate solution. I don't see any other way to accomplish what you want.
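One way to see why the second call returns a p-value of one is to mimic the restriction described above by hand. This is only a sketch: it assumes permute keeps just the observations selected by the if qualifier before reshuffling. The `set seed` line and the use of runiform() in place of the older uniform() are additions that are not in the original code.

```stata
clear
set seed 123456              // assumption: added here for replicability
set obs 200
gen x = runiform()
gen treat = (x > .5)

* Mimic the default restriction: keep only the observations
* selected by the if qualifier of the command after the colon
keep if treat == 1

* treat is now constant, so reshuffling it cannot change the statistic
tabulate treat
summarize x if treat == 1
```

With treat constant, every permuted mean equals the observed mean, which would be consistent with a p-value of one.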

    Regards,
    Joe

    • #3
      I'm not familiar with -permute- but I notice with the first command c = 0 and in the second c = 10,000.

      You also get a message that says "Warning: Because summarize is not an estimation command or does not set e(sample), permute has no way to determine which observations are used in calculating the statistics and so assumes that all observations are used. This means that no observations will be excluded from the resampling because of missing values or other reasons. If the assumption is not true, press Break, save the data, and drop the observations that are to be excluded. Be sure that the dataset in memory contains only the relevant data."

      I am not sure if that means the if qualifier won't work right; but if you add the line

      keep if treat == 1

      right after you generate the treat variable, then both sets of commands give the same results. But perhaps that means they are both wrong now.
      -------------------------------------------
      Richard Williams
      Professor Emeritus of Sociology
      University of Notre Dame
      StataNow Version: 19.5 MP (2 processor)

      WWW: https://academicweb.nd.edu/~rwilliam/

      • #4
        Incidentally, Joe gets a different N (88) than I do. But that is because the seed is not set at the beginning of the original program. If you want the results to be perfectly replicable by anyone (including yourself), then right after the clear statement add something like

        set seed 123456
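In context, the start of the program might then look like the sketch below. The `count` line is an addition for checking N and is not part of the original code, and runiform() stands in for the older uniform(). The draws should only be expected to match across machines when the Stata version and random-number generator are the same.

```stata
clear
set seed 123456        // set before any random numbers are drawn
set obs 200
gen x = runiform()
gen treat = (x > .5)
count if treat == 1    // added check: number of treated observations
```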
        -------------------------------------------
        Richard Williams

        • #5
          Thanx!

          I had misunderstood how permute treats the if qualifier of the command after the colon. I thought it was clear that such a qualifier applies to the summarize command and not to the whole permute command, since in the latter case the if qualifier is generally placed before the comma.

          After reading your comments, I noticed that Stata provides the nodrop option to deal with situations like this one, and with situations where <command> is an estimation command. Indeed, if I add this option to the second instance of permute in my code, it works just fine.
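For reference, the second call with nodrop added might look like the sketch below. The data setup repeats the original example, with a `set seed` line added and runiform() in place of uniform(). Both are assumptions for replicability and were not part of the code that was actually run.

```stata
clear
set seed 123456              // assumption: not in the original code
set obs 200
gen x = runiform()
gen treat = (x > .5)

* nodrop asks permute not to drop observations outside the if qualifier,
* so treat is reshuffled across all 200 observations
permute treat mean=r(mean), ///
    seed(123456) reps(10000) nodrop: summarize x if treat==1
```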

          Thanks again,

          Tom
