
  • Permute in Dif-in-Dif setting with multiple periods

    Hello everyone,

    I am trying to conduct a permutation test for inference in a difference-in-differences setting with multiple periods. I have 19 clusters (provinces) that received treatment in different years. There are 7 years in my sample; the treatment arrives in one province first, in another province second, and then in the rest of the country. Once a province is treated, it remains treated for all subsequent periods.

    My plan is to shuffle the treatment assignment across provinces subject to two restrictions: in each year the number of treated units must remain fixed, and once a province is treated it must remain treated in all following years. I could not manage to implement this with Stata's -permute- command, so I wrote my own program:

    gen N = _n
    gen beta = .
    reg Y treatment i.year i.prov // this is the "original beta"
    replace beta = _b[treatment] if N==1001

    forvalues i = 1(1)1000 {
        gen a = runiform() // runiform() supersedes the old uniform()
        gen N2 = _n
        replace a = . if _n > 19 // draws only for the 19 province rows
        sort a
        gen a1 = N2 if _n==1 // first province to adopt
        gen a2 = N2 if _n==2 // second province to adopt
        egen b = mean(a1)
        egen c = mean(a2)
        replace treatment = 0
        replace treatment = 1 if year>=3 & prov==b
        replace treatment = 1 if year>=5 & prov==c
        replace treatment = 1 if year>=6 & prov!=b & prov!=c

        qui reg Y treatment i.year i.prov

        replace beta = _b[treatment] if N==`i'
        drop a a1 a2 b c N2
    }

    sum beta if N==1001
    gen bigbeta = (beta >= `r(mean)') if beta < . // this is because my original "beta" is > 0
    sum bigbeta if N>=1 & N<=1000
    global p1 = round(`r(mean)', 0.001)
    di $p1
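    For readers outside Stata: the step inside the loop draws two distinct provinces at random, treats the first from year 3 on, the second from year 5 on, and the rest from year 6 on. A minimal Python sketch of the same draw (the function name and seed are illustrative, not from the post; `random.sample` plays the role of sorting on `runiform()`):

    ```python
    import random

    def draw_placebo_assignment(n_prov=19, seed=None):
        """Pick the first and second adopting provinces at random; all other
        provinces adopt later. Mirrors the sort-on-runiform() trick above."""
        rng = random.Random(seed)
        # sample() draws without replacement, so the two adopters are distinct
        first, second = rng.sample(range(1, n_prov + 1), 2)

        def treated(prov, year):
            if prov == first:
                return year >= 3
            if prov == second:
                return year >= 5
            return year >= 6

        return first, second, treated

    first, second, treated = draw_placebo_assignment(seed=1)
    print(first != second)  # always True: the two early adopters differ
    ```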

    The problem with my current program is that it produces a different p-value every time I run it, and I don't think it is something that would be solved efficiently just by increasing the number of replications. I believe the problem is that there are only 19*18 = 342 ways to shuffle the treatment, and every run draws a different mix of those combinations. Instead, I would like to conduct exactly 342 repetitions, each different from the others, so that I have exactly one version of each of the 342 possible placebos. Then I would compute my p-value as the number of betas larger than my original beta, divided by 342. Is there an existing command that would do this for me? Does anyone have ideas on how to update my program to do this?
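    The count in the paragraph above can be checked mechanically: the placebos are exactly the ordered pairs of distinct provinces (who adopts first, who adopts second). A quick Python check, purely illustrative:

    ```python
    from itertools import permutations

    n_prov = 19
    # Ordered pairs (first adopter, second adopter) of distinct provinces
    placebos = list(permutations(range(1, n_prov + 1), 2))
    print(len(placebos))  # 19 * 18 = 342
    ```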


    Thanks a lot in advance
    MLY




  • #2
    You need to set the random number generator seed before you use random numbers in order to get reproducible results. See -help set seed-.



    • #3
      Thanks Clyde! The -set seed- command is indeed very useful for replicating exact results. The problem is that, since the p-value changes substantially across seeds, I can't rely on a result that depends on the seed. I was looking for a method that is robust to the choice of seed.

      Fortunately, I think I managed to get what I wanted. I share the code here in case someone is interested:

      gen N = _n
      gen rep = 0
      gen beta = .

      reg Y treatment i.year i.prov [pw=wgt]
      replace beta = _b[treatment] if N==1001

      forvalues j = 1/19 {
          forvalues k = 1/19 {
              if `j'==`k' continue // the two early adopters must differ
              replace rep = rep + 1
              replace treatment = 0
              replace treatment = 1 if year>=2 & prov==`j'
              replace treatment = 1 if year>=3 & prov==`k'
              replace treatment = 1 if year>=4 & prov!=`j' & prov!=`k'

              qui reg Y treatment i.year i.prov [pw=wgt]

              replace beta = _b[treatment] if N==rep
          }
      }

      sum beta if N==1001
      gen bigbeta = (beta >= `r(mean)') if beta != .
      sum bigbeta if N>=1 & N<=rep
      global p1 = round(`r(mean)', 0.0001)
      di $p1
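      The final step is the usual one-sided permutation p-value: the share of placebo estimates at least as large as the original one. A language-neutral sketch in Python, with made-up numbers purely for illustration:

      ```python
      def permutation_p_value(beta_original, placebo_betas):
          """One-sided p-value: share of placebo estimates >= the original."""
          extreme = sum(1 for b in placebo_betas if b >= beta_original)
          return extreme / len(placebo_betas)

      # Hypothetical placebo estimates, not from the thread
      placebos = [0.10, -0.05, 0.30, 0.02, -0.12, 0.25]
      print(permutation_p_value(0.25, placebos))  # 2 of 6 placebos >= 0.25
      ```

      Some references instead report (extreme + 1) / (B + 1), counting the observed statistic as one of the permutations; with only 342 placebos that correction is not negligible.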



      • #4
        Well, then you are just sweeping the problem under the rug, not fixing it. If your goal here is to calculate a significance probability, and if the result is strongly seed-dependent, then the only valid conclusion is that 342 permutations are insufficient to allow you to reach any conclusion, and you must go on to a larger number of permutations. By relying on a fixed set of permutations rather than one generated randomly you have not overcome that limitation: you have just stipulated that you are going to rely on one particular, arbitrary set of 342 permutations.

        Now, I'm not sure what you are saying when you say the result is seed-dependent. In any randomization procedure, permutation tests being no exception, the result is not deterministic and there will be some degree of sampling variation in the result. If your p-value is varying within a narrow range depending on the seed, that is not a problem. You can't take any of those p-values literally; they are just estimates. And if you are obsessing about the distinction between < 0.05 vs >= 0.05, you are "barking up the wrong tree." If, on the other hand, you are getting p-values that are running all over the place, then you have a problem, but that, in turn, requires a larger number of permutations to produce stability, not fixing on one arbitrary set of permutations.
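        The sampling variation described above can be quantified: a p-value estimated from B random permutations is a binomial proportion, with Monte Carlo standard error roughly sqrt(p(1-p)/B). A small Python illustration (the numbers are hypothetical):

        ```python
        import math

        def mc_standard_error(p, n_permutations):
            """Approximate Monte Carlo standard error of an estimated p-value."""
            return math.sqrt(p * (1 - p) / n_permutations)

        # With 1000 permutations and a true p near 0.05, the estimate is
        # accurate to about +/- 0.014 (two standard errors).
        print(round(mc_standard_error(0.05, 1000), 4))  # 0.0069
        ```

        If the observed p-values bounce around across seeds by much more than this, something other than ordinary Monte Carlo noise is at work.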
