  • Estimating quantile treatment effects using rifhdreg

    Dear Stata community,

    If possible, I'd like to direct this enquiry to Professor Fernando Rios-Avila (FernandoRios), but I'd be happy if anyone could help.

    I am studying the effect of a policy implemented in 2012 on students' scores. My data are pooled repeated cross-sections from standardised exams from 2010 to 2018, at the individual level. I have two variables, "treat" (D=1) and "policy" (T=1 after the treatment, 0 otherwise): the former indicates whether the individual is eligible for treatment (there are observations before and after the treatment); the latter indicates whether the period is before or after the treatment. The outcome variable is the students' scores (standardised by year, with mean=0 and sd=1).

    My TWFE DiD estimator should be the coefficient on treat#policy. I managed to calculate the ATET with DRDID and CSDID and to generate my gvar, which is great after struggling for a long time! As expected, the average effects were not different from zero. Although I only have two years of pre-treatment data, my pre-trends test failed to reject the null; thus, I can assume that parallel trends (PT) holds (though we can never know for sure).

    However, the effects are heterogeneous and vary considerably along the distribution of the outcome variable, so I am using rifhdreg to estimate QTE, but I'm still struggling with the option "over". I tried the following code, which I found in one of the previous threads:

    Code:
    egen dd=group(treat policy2) // so that it indicates what is pre-treatment and what is post-treatment

    Then I use it in over():

    Code:
    rifhdreg scores treat##i.policy2##i.black  $xvar, rif(q(50)) over(dd) abs(stateid) vce(cluster stateid) rwlogit($xvar) trim(0.01,0.98) att

    Then I get the error message:
    "More than 2 groups detected. Only 2 groups allowed for the estimator"

    When I substitute "treat" for "dd", it works fine. But I still doubt whether it is right.

    My questions:

    QUESTION #1: How is the reweighting calculated? I know it uses IPW, but if I don't indicate the periods before and after, it will calculate the weights for treated vs. untreated over the whole period, won't it? How do I know that the weighting is done only on the pre-treatment period? I understand that trimming imposes common support, but how is it done? If it isn't, should I adjust my variable? Sorry if I'm missing something here; I just want to make sure that my design is correct.


    QUESTION #2: How do I know how many observations from my sample are being used? I also notice that after reweighting most of my estimates are not significant. I'm working with practically the entire population (N = 10 million), and I find those large CIs strange. It could be because of the previous problem, or could it be because I'm using a triple interaction? This happens with nearly all the variables I use, mostly for triple interactions but also for some single interactions. I notice that the more controls I use, the larger the CIs, which makes sense, but is there a way to know which variables are best to include?

    Anyway, the graph below illustrates this problem, still with the triple interaction. Here is my code:

    Code:
    qui rifhdreg scores treat##i.policy2##i.black  $xvar, rif(q(50)) over(treat) abs(stateid) vce(cluster stateid) rwlogit($xvar) trim(0.01,0.98) att 
     qregplot 1.treat#1.policy , q(5(5)95)  ols raopt( color(black%5))

    [Figure: QTE.png]

    From the graph above, I can say that the effect for blacks was positive only at higher quantiles. That also makes sense, but it is still striking that the CIs are so large given the amount of data I have. Unfortunately, I cannot run QREG or BSQREG with my data (I can only do that without covariates) to compare. Of course, I'm aware that it wouldn't give me the effects on the whole distribution, which is what I am interested in, considering that my outcome variable is a relative rank.

    QUESTION #3: Bootstrapping: due to the size of my dataset (N = 10 million), I can't use the bootstrap to calculate my SEs. Is it possible to identify QTE just by clustering my errors? Is there another way of doing it with large datasets?

    QUESTION #4: Suppose I use only the double interaction, as in the code below. I'm a bit confused about robustness checks for identifying QTE. Could anyone help me with that?

    Code:
    qui rifhdreg scores treat##i.policy2  $xvar, rif(q(50)) over(treat) abs(stateid) vce(cluster stateid) rwlogit($xvar) trim(0.01,0.98) att
    I'm sorry for the long thread and so many questions.
    Thanks a lot for the help!

    Sandra

  • #2
    Sorry, I forgot my data:

    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float scores int year byte(policy treat sex maeduc black age location)
       .8320931 2010 0 0 2 4 0 17 1
       1.521267 2010 0 0 2 3 0 17 1
     -1.5315053 2010 0 0 2 3 0 18 1
       .7247713 2010 0 0 1 5 0 19 1
      .29095936 2010 0 0 2 6 0 17 1
      1.8560517 2010 0 0 2 4 1 19 1
       .6624395 2010 0 0 2 4 0 17 1
      2.1880713 2010 0 0 2 4 0 17 1
       .6898354 2010 0 0 1 4 0 17 1
       1.604305 2017 1 0 1 5 0 18 1
      1.8774157 2010 0 0 2 5 0 17 1
      .05997732 2010 0 0 2 3 0 17 1
       .6790272 2010 0 0 2 3 0 18 1
      2.1643803 2017 1 0 2 4 0 18 1
       .2278345 2015 1 1 2 3 0 17 1
      .25283846 2013 1 1 1 2 0 17 1
       2.432053 2016 1 1 1 6 0 17 1
      -.5717257 2017 1 1 2 4 1 18 1
       .8760782 2010 0 0 2 4 0 19 1
     -1.3276983 2018 1 1 2 3 0 23 1
       .4561282 2018 1 0 2 6 0 16 1
      -.8694832 2014 1 1 1 2 0 21 1
       1.174584 2018 1 1 2 3 0 17 1
      1.6289072 2016 1 1 1 4 0 18 1
       .6161923 2010 0 0 1 6 0 18 1
      -.5441809 2017 1 1 1 4 0 18 1
     -1.3034487 2013 1 1 2 4 0 18 1
      .58383375 2013 1 1 2 4 0 17 1
      -.8152862 2016 1 1 2 6 0 16 1
       .5783804 2014 1 0 1 4 0 17 1
       .7800663 2010 0 0 1 6 0 17 1
      -.6734315 2010 0 0 1 5 0 18 1
      -.3557064 2015 1 1 1 3 0 18 1
     .001299482 2018 1 1 1 2 0 17 1
      2.3512013 2014 1 1 1 5 0 17 1
      .13095658 2014 1 0 1 4 0 18 1
      1.2302914 2017 1 1 2 4 0 18 1
       .7307063 2017 1 1 1 4 0 16 1
      -.7883023 2017 1 1 1 5 0 17 1
     -1.1688735 2014 1 1 1 4 0 18 1
      -.6819463 2013 1 1 2 4 0 17 1
      -.5144381 2015 1 1 1 4 0 18 1
      1.4828734 2018 1 1 2 4 1 18 1
      .20754145 2013 1 1 2 4 0 19 1
       .3618076 2012 1 1 2 4 0 17 2
      1.9815017 2014 1 0 2 6 0 17 1
       .9056387 2014 1 1 1 5 0 19 1
      -.4618928 2012 1 1 2 3 1 29 1
      -.8234084 2017 1 1 1 3 0 18 1
     -1.0425493 2013 1 1 2 1 0 17 1
      1.3576448 2012 1 0 2 5 0 17 1
      .10809034 2013 1 0 2 3 0 18 1
     -1.1648142 2011 0 1 2 4 1 16 1
       1.287451 2011 0 1 2 4 0 16 1
       .5967888 2014 1 1 1 5 0 17 1
       .9599905 2011 0 0 1 4 0 17 1
     -.13123797 2018 1 1 2 2 1 17 1
      -.4003771 2016 1 1 1 3 1 19 1
     -1.2680684 2018 1 1 2 4 0 20 1
     -.56337726 2016 1 1 2 4 1 18 1
     -.17352413 2016 1 1 1 4 0 17 1
      -.6207638 2016 1 0 1 4 0 18 1
      .08984858 2012 1 1 2 3 0 17 1
         .78215 2014 1 1 1 4 0 17 1
      -.2257615 2015 1 1 2 2 0 16 1
       .9117269 2015 1 1 1 3 0 18 1
       .5517904 2014 1 0 2 4 0 17 1
     -1.8610117 2010 0 1 2 3 0 17 1
       .3615459 2015 1 1 2 4 1 17 1
    -.013528896 2012 1 1 2 3 0 17 1
       1.957257 2018 1 0 2 4 0 17 1
       -.599749 2016 1 1 1 3 0 20 1
       -1.36215 2017 1 1 2 4 0 17 1
      .10172886 2018 1 1 1 3 0 18 1
        .320066 2011 0 1 2 4 0 16 1
      1.6837468 2011 0 1 1 6 0 18 1
      -.5043347 2012 1 1 2 3 0 17 1
     .016434357 2017 1 0 2 5 0 17 1
      -.8085562 2017 1 1 2 4 0 18 1
      -.4228181 2013 1 1 2 1 0 17 1
      -.4731587 2017 1 1 2 3 0 17 1
      -.5042143 2017 1 1 1 4 0 17 1
     .010418868 2012 1 1 2 4 0 16 1
        .641746 2016 1 0 2 6 0 18 1
      -.9403694 2012 1 1 1 4 0 17 1
      -.4721707 2014 1 1 1 3 1 20 1
      .15140992 2014 1 1 2 4 0 18 1
       .3226664 2017 1 1 2 2 0 17 1
      -.8297302 2013 1 1 2 4 0 18 1
      -.9539801 2013 1 1 2 2 0 18 1
       .8690779 2014 1 0 2 6 0 17 1
       1.350337 2013 1 1 1 6 0 18 1
      -.7104812 2016 1 1 2 2 0 18 1
      1.7767092 2014 1 0 2 4 0 18 1
      -.3402585 2012 1 0 1 4 0 18 1
       .6838526 2015 1 0 1 4 0 17 1
      -.4897343 2011 0 0 2 4 0 17 1
       .9448453 2016 1 1 1 6 0 17 1
       .3808841 2013 1 1 2 3 0 17 1
      -.5811592 2016 1 1 2 5 0 20 1
    end
    label values sex sex
    label def sex 1 "M", modify
    label def sex 2 "F", modify
    label values maeduc maeduc
    label def maeduc 1 "No education", modify
    label def maeduc 2 "Primary", modify
    label def maeduc 3 "Middle school", modify
    label def maeduc 4 "High school", modify
    label def maeduc 5 "University", modify
    label def maeduc 6 "Postgrad", modify
    label values location location
    label def location 1 "Urban", modify
    label def location 2 "rural", modify
    Last edited by Sandra Macedo; 22 Apr 2022, 07:09.



    • #3
      Hi Sandra,
      Here are some answers.
      Q1: When you use rwlogit, the program assumes your over() variable is binary (0 or 1). However, for a DiD you will have, at the very least, 4 groups: treated and untreated, before and after. (I would choose a different name from policy here; it confused me in your other thread.)
      When dealing with 3 or more groups, one could use rwmlogit. However, in that case one can only estimate something similar to ATEs.

      Regarding reweighting:
      The command doesn't differentiate between pre and post periods. It simply creates IPWs based on whether you request ATT, ATU, or ATE.
      For ATE, for example, w = 1/p(x) for treated and w = 1/(1-p(x)) for untreated; with rwmlogit or rwmprobit, w = 1/p_k(x).
      For ATT, w = p(x)/(1-p(x)) if untreated and w = 1 if treated. There is no equivalent for mprobit or mlogit.

      Trimming doesn't deal with common support either. It only uses the subsample with predicted probabilities within the provided range.
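
      To make the weighting formulas above concrete, here is a minimal sketch on the oaxaca example data (the same file used further down this thread). The variable names pr, w_ate, w_att and intrim are made up for illustration, and the 0.01/0.98 bounds simply mirror the trim() values in the original post; that rifhdreg applies the bounds in exactly this way is an assumption, not something confirmed here.

      ```stata
      use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear
      drop if lnwage==.
      * propensity score: P(female==1 | x)
      logit female educ exper tenure
      predict double pr, pr
      * ATE weights: 1/p(x) for treated, 1/(1-p(x)) for untreated
      gen double w_ate = cond(female==1, 1/pr, 1/(1-pr)) if !missing(pr)
      * ATT weights: 1 for treated, p(x)/(1-p(x)) for untreated
      gen double w_att = cond(female==1, 1, pr/(1-pr)) if !missing(pr)
      * flag the subsample whose predicted probability lies inside the bounds
      * (hypothetical bounds, copied from trim(0.01,0.98) in the question)
      gen byte intrim = inrange(pr, 0.01, 0.98)
      tabulate female intrim
      ```

      Note that nothing in this construction looks at the time period: the weights depend only on the binary group indicator and the covariates.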

      Q2: The command should report the total number of observations (N=...), and the e(sample) function will identify the estimation sample after trimming as well.
      As for the loss of significance: that is odd, but you could try running the logit yourself and applying the reweighting manually, to make sure it's not a bug in the program.
      The other factors you mention could also explain it.
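
      As a rough illustration of the e(sample) check, the sketch below counts the observations that remain after reweighting and trimming. It uses the oaxaca example data rather than the poster's data, and the rifhdreg options are copied from the calls shown elsewhere in this thread; that e(sample) reflects the trimmed sample is taken from the statement above rather than verified independently.

      ```stata
      use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear
      drop if lnwage==.
      rifhdreg lnwage i.female educ exper tenure, over(female) rif(q(50)) rwlogit(educ exper tenure) trim(0.01,0.98) att
      * observations actually used in the estimation
      count if e(sample)
      * how they split between the two groups
      tabulate female if e(sample)
      ```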

      Q3: You can still bootstrap here. It would just take a very long time. As for the problem with reweighting, though, I would still try doing it by hand and check what is going on.
      Given the sample size, you are right: I would say it should be possible to do it without the bootstrap and still identify the QTE. I don't think there are many other options with large datasets other than run and wait.

      Q4: This is the specification I would have started with.
      Here, you are saying you want ALL people in the untreated group to look like those in the treated group (with the ATT weights, treated observations keep w = 1 and untreated ones are reweighted), and report ATT.
      I wonder, how many treated groups do you have, and how many observations are left after trimming? You should be able to run the logit on your own and look at the distribution of the pscore.
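
      One possible way to look at the distribution of the pscore is sketched below. It assumes the binary variable treat and the covariate list $xvar from the original post are already in memory; psc is a made-up variable name, and the 0.01/0.98 bounds again just mirror the trim() values used in the question.

      ```stata
      * assumes treat (0/1) and the global $xvar are defined as in the original post
      logit treat $xvar
      predict double psc, pr
      * summary of the propensity score by treatment status
      tabstat psc, by(treat) statistics(n mean min p1 p50 p99 max)
      * observations that fall outside the trimming bounds
      count if !inrange(psc, 0.01, 0.98) & !missing(psc)
      * group sizes inside the bounds
      tabulate treat if inrange(psc, 0.01, 0.98)
      * visual check of overlap between the two groups
      twoway (histogram psc if treat==1, color(gs4%40)) (histogram psc if treat==0, color(gs12%40)), legend(order(1 "Treated" 2 "Untreated")) xtitle("Propensity score")
      ```

      If most of the mass for one group sits near 0 or 1, a few untreated observations can receive very large weights, which is one candidate explanation for wide confidence intervals.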


      As for robustness checks, I don't think there are formal checks. It is more a matter of understanding which approach is being used for the QTE estimation.

      HTH
      F



      • #4
        Oh Fernando (FernandoRios), a million thanks for your time, answers and advice.
        I thought something was odd about the over variable.

        Regarding reweighting, I think you're right. The best idea is to calculate the propensity scores by hand. [The other option (rwmlogit), I am sure, won't work, because with DiD I can only estimate ATT, unless I mention that it is an approximation. Not sure how it will look.]

        However, how could I apply those weights in my code with rifhdreg (I guess after trimming for common support on 0<p(x)<1)?
        Would it be like the one below?

        Code:
         
         qui rifhdreg scores treat##i.policy2 WEIGHT , rif(q(50))  abs(stateid) vce(cluster stateid)
        I tried bootstrapping: I left it running for 12 hours and nothing, so I guess it's unfeasible (same as bsqreg with covariates).

        And yes, I'm doing my analyses using the binary treatment variable for the moment, but the policy considers 4 different treated groups. Unfortunately, there are not many options for working with a multinomial treatment (it's not a dose), though I was thinking of using the weights for the binary variable (assuming most treated are somewhat similar) and running the quantile regressions with the i.eligible##i.policy interaction. [Yes, I will rename my time variable; yes, it's confusing.]

        So, once you confirm the above specification is the way to go, I will move on!

        Thanks again!
        Sandra



        • #5
          You can apply the weights as you would in a standard regression.
          For example:
          Code:
          use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear
          drop if lnwage==.
          logit female educ exper tenure
          predict pr
          *for ATT
          gen w=1 if female==1
          replace w=pr/(1-pr) if female==0
          ** Check that it does what you want
          tabstat educ exper tenure, by(female)
          tabstat educ exper tenure [aw=w], by(female)
          ** using weights in rifhdreg
          rifhdreg lnwage i.female educ exper tenure [w=w], over(female) rif(q(10)) robust
          ** Should be the same as
          rifhdreg lnwage i.female educ exper tenure , over(female) rif(q(10)) rwlogit(educ exper tenure) att



          • #6
            Wow, perfect!! Thank you so much!!



            • #7
              Hi FernandoRios, I hope you don't mind my returning to this thread. I have another question and wonder if you can help.
              Here is the code I used to calculate the IPW by hand, following your instructions.

              Code:
              logit treat $var if post==0
              predict psc
              sum psc

              * because I want the ATET

              gen weight =1 if treat==1
              replace weight=psc/(1-psc) if treat==0

              * but I'm using a DiD setup, so I had to generate the 4 groups

              egen dd=group(treat post)

              rifhdreg  z_scores i.treat##i.post $var [aw=weight], over(dd) rif(q(10)) abs(i.stateid#i.year) cluster(stateid)

              * also, I'm using the same weights for the categorical eligible variable; probably not QTE, but still relevant to look at (or I might use mlogit and apply it to treated vs. not treated)

              rifhdreg  z_scores i.eligible##i.post $var [aw=weight], over(dd) rif(q(10)) abs(i.stateid#i.year) cluster(stateid)

              Q1: I am a bit confused about the interpretation. This will give me the partial QTE over the whole distribution of the treated population, am I right? I mean, if the DiD assumptions hold, can I now identify distributional (DiD) treatment effects with this code? The RIF regression confused me, as it requires an unconditional quantile interpretation. Could you please clarify?

              GRAPHS:
              About the graphs I asked you about, I found your code in one of the threads

              Code:
              clear all
              use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear

              capture matrix drop rbs rll rul
              forvalues i=10(5)95 {
                  qui:rifhdreg lnwage educ exper tenure age , rif(q(`i')) robust
                  matrix rtb=r(table)
                  matrix rbs=nullmat(rbs)\[`i',rtb["b",.]]
                  matrix rll=nullmat(rll)\[`i',rtb["ll",.]]
                  matrix rul=nullmat(rul)\[`i',rtb["ul",.]]
              }

              svmat rbs rll rul

              I wonder if there's a way of saving only the coefficients of the variables I need. As it is, svmat creates a variable for each coefficient, and that overloads my dataset, which is too large (10M).
              Apart from that, I found this code very useful for plotting more than one quantile model.

              Well, thanks once again!!

              Sandra







              • #8
                Hi Sandra,
                OK, so RIF regressions require an unconditional interpretation as long as the RIF is unconditional. The way you are using it, however, it is conditional on "dd", because that is your "over" variable.
                At least that is what I am advocating.
                Regarding the graph, you have two options:
                1. You can use qregplot (which I think you already tried).
                2. Given your large set of variables, you can apply the code you cite with a twist:

                1) Estimate all models as above.
                2) If you are using Stata 16+, create a new frame and "svmat" the matrices there.
                2b) If you are using Stata 15 or lower, just clear the data from memory and svmat the matrices.
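
                The frame-based option could look roughly like the sketch below (Stata 16+). It also keeps only one coefficient per model, which is an addition on top of the steps above rather than part of them. The matrix name res, the frame name qres and the choice of educ as the coefficient of interest are all made up for illustration.

                ```stata
                use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear
                capture matrix drop res
                forvalues i = 10(5)95 {
                    quietly rifhdreg lnwage educ exper tenure age, rif(q(`i')) robust
                    matrix rtb = r(table)
                    * pick out the point estimate and CI bounds for educ only
                    local c  = colnumb(rtb, "educ")
                    local b  = rtb[rownumb(rtb, "b"),  `c']
                    local ll = rtb[rownumb(rtb, "ll"), `c']
                    local ul = rtb[rownumb(rtb, "ul"), `c']
                    matrix res = nullmat(res) \ (`i', `b', `ll', `ul')
                }
                matrix colnames res = q b ll ul
                * store and plot the results in a separate frame,
                * leaving the main dataset untouched
                capture frame drop qres
                frame create qres
                frame qres {
                    svmat double res, names(col)
                    twoway (rarea ll ul q, color(gs12)) (line b q), xtitle("Quantile") ytitle("Coefficient on educ")
                }
                ```

                Because the matrix has one row per quantile and only four columns, the new frame holds 18 observations regardless of how large the main dataset is.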

                HTH
                F



                • #9
                  Hi Fernando, thanks so much for the quick answer!! I thought I had understood that, then I got confused. It's clear now!! Thanks for clarifying. Regarding the matrices, I am using Stata 16+, but I thought there was an easier way. I'll leave them as they are. Many thanks!
