  • How does hdidregress estimate the ATET for repeated cross sections with only one observation per unit?

    The point estimates from xthdidregress can be obtained by subtracting each unit's base-period outcome from its outcome in the treated period and applying teffects estimators to this transformed outcome. (At least, the IPW results can be replicated this way.) However, suppose we have only one observation per unit across a number of years. There is a treatment-status indicator for observations in the years before treatment, but we do not observe those pre-treatment units again in the post-treatment period. How does hdidregress ipw work in this context? Can anyone give me an example that replicates the point estimate, using logit to calculate propensity scores and regress with IPW weights?

    Let me give a concrete example. Say we are trying to estimate the effect of a scholarship on university attendance of students within a district by comparing students who are eligible for the scholarship, or would have been had it been in place, with ineligible students, before and after the program began. Say it began in 2023. The treatment is the scholarship program going into effect; each unit of observation is an individual high school graduate's college attendance outcome within six months of graduation, for the graduating cohorts of 2016 to 2024; and treated and untreated observations are grouped by whether graduating students were eligible post-treatment or would have been eligible pre-treatment had the program been in effect. I tried replicating the hdidregress ATET for the class of 2023 using the following code:



    Code:
    preserve
    
    gen treat_2023 = inlist(graduate_year, 2022, 2023) & eligible_dummy == 1
    gen control_2022 = inlist(graduate_year, 2022, 2023) & eligible_dummy == 0
    
    keep if treat_2023 | control_2022 
    
    
    gen treated = 0
    replace treated = 1 if graduate_year == 2023
    
    
    logit eligible_dummy i.FRPL i.sex i.racex if graduate_year == 2022
    predict pscore, pr
    
    gen ipw = .
    replace ipw = 1/pscore if treat_2023 == 1
    replace ipw = 1/(1 - pscore) if treat_2023 == 0
    replace ipw = 1 if treat_2023 == 1
    
    drop if graduate_year < 2022
    
    reg attend_uni i.treat_2023##treated [pw = ipw], vce(robust)

    But this isn't quite correct. I actually get a result even closer to hdidregress if I include both years of the 2x2 comparison in the logit, but the estimates still do not match exactly.

    Ignore the fact that this is not really a staggered treatment timing case. I am using the command as a convenience tool to incorporate a selection model into the regression. I want to do this replication so that I can check covariate balance diagnostics for each ATET estimate. Can the ATETs be replicated using basic commands here? What am I missing?

    Sorry that I cannot give a data example for confidentiality reasons.

  • #2
    Please let me know if I can provide any additional details to clarify my question. I feel like I am missing something simple here. I've reviewed the estimator equations in the documentation but can't figure out exactly what I am doing wrong here. https://www.stata.com/manuals/causalhdidregress.pdf


    • #3
      I don't do this kind of thing, but just underline that our longstanding FAQ Advice covers the problem of confidentiality. The advice then is to pose your question in terms of a realistic fake data example that can be used to show your problem or in terms of a dataset anyone can use that does.

      Your question very likely makes perfect sense to people who do do this, but among the many reasons people don't get answers here is that they're phrased in terms of results for data we can't access. In addition, it is the weekend, and so on.
      Last edited by Nick Cox; 25 May 2025, 05:50.


      • #4
        https://friosavila.github.io/app_met..._metrics2.html
        In the appendix of these slides I give the exact formulas I use for csdid.
        Bottom line: your logit should use data from both the pre- and post-treatment periods.


        • #5
          Apologies, Nick. I did not mean to rush anyone on a holiday weekend! Here is some code to generate a fake version of my dataset:

          clear
          * 1800 observations = 200 graduates per cohort, 2016 through 2024
          set obs 1800
          gen graduate_year = 2016 + floor((_n - 1) / 200)
          gen attend_uni = runiform() > 0.5
          gen eligible_dummy = runiform() > 0.5
          gen school_dummy = 1 + floor(2 * runiform())
          gen FRPL = runiform() > 0.5
          gen sex = runiform() > 0.5
          gen racex = ceil(3 * runiform())
          gen student_id = _n

          Note that I can manually replicate ATETs for panel data settings using the following code:

          Code:
          ****************** xthdidregress replication *************************
          clear all
          use "\mpdta.dta" 
          
          
          gen treatx = treat 
          replace treatx = 0 if first_treat > year
          xtset countyreal year
          xthdidregress ipw (lemp) (treatx lpop), group(first_treat) 
          
          
          keep if first_treat == 0 | first_treat ==2004
          *keep if year == 2003 | year == 2004
          * Can switch to 2004, 05, 06, 07, etc.
          keep if year == 2003 | year == 2007
          
          frame copy default frame,replace
          frame copy default frame2,replace
          frame change frame
          keep if year == 2003
          frame change frame2
          keep if year == 2007
          rename lemp lemp_2007
          rename lpop lpop_2007
          frame change frame
          frlink 1:1 countyreal,frame(frame2)
          frget lemp_2007 lpop_2007,from(frame2)
          gen lemp_diff = lemp_2007 - lemp
          
          * Replication of xthdidregress ipw
          teffects ipw (lemp_diff) (treat lpop), atet

          An R version of the dataset can be found here:
          https://github.com/bcallaway11/did/tree/master/data
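
          For reference, the panel replication above can be written out as a formula. This is only a sketch: it assumes that teffects ipw with the atet option fits a logit propensity score and uses normalized odds weights for the comparison group, and the notation (b for the base year, t for the target year, D_i for treat, X_i for lpop in the base year) is introduced here for illustration rather than taken from the manual.

          ```latex
          % Long difference of the outcome and estimated odds weight
          \Delta Y_i = Y_{i,t} - Y_{i,b}, \qquad
          \hat{\omega}_i = \frac{\hat{p}(X_i)}{1 - \hat{p}(X_i)}

          % Treated mean of the difference minus odds-weighted comparison mean
          \widehat{ATET} = \frac{1}{N_1} \sum_{i:\, D_i = 1} \Delta Y_i
            \;-\; \frac{\sum_{i:\, D_i = 0} \hat{\omega}_i \, \Delta Y_i}
                       {\sum_{i:\, D_i = 0} \hat{\omega}_i}
          ```

          In the code above, b = 2003, t = 2007, and \Delta Y_i is lemp_diff.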

          Based on the Stata documentation, it seems that hdidregress actually uses the post-treatment period to calculate propensity scores, so I think that's one problem with the original replication attempt I gave, but I still don't get the same results when I correct it.

          Let me know what additional details would be helpful.


          • #6
            Thanks, Fernando! When I set the logit model in my original replication code to include both periods, my ATET is still a few decimals off from what I get from hdidregress.

            Code:
            preserve
            
            gen treat_2023 = inlist(graduate_year, 2022, 2023) & eligible_dummy == 1
            gen control_2022 = inlist(graduate_year, 2022, 2023) & eligible_dummy == 0
            
            keep if treat_2023 | control_2022 
            
            
            gen treated = 0
            replace treated = 1 if graduate_year == 2023
            
            
            logit eligible_dummy i.FRPL i.sex i.racex 
            predict pscore, pr
            
            gen ipw = .
            replace ipw = 1/pscore if treat_2023 == 1
            replace ipw = 1/(1 - pscore) if treat_2023 == 0
            replace ipw = 1 if treat_2023 == 1
            
            
            reg attend_uni i.treat_2023##treated i.graduate_year [pw = ipw], vce(robust)
            
            bysort eligible_dummy graduate_year: sum attend_uni [aw = ipw]
            
            restore

            I will dig into the slides to see where I am still going wrong and post the corrected code here for future reference.


            • #7
              This code does the trick:

              preserve

              * Step 0: Keep only obs for 2022 and 2023
              keep if inlist(graduate_year, 2022, 2023) & inlist(eligible_dummy, 0, 1)

              * Step 1: Define treatment and time indicators
              gen D = eligible_dummy == 1
              gen T = graduate_year == 2023

              * Step 2: Estimate propensity score on pooled sample
              logit D i.FRPL i.sex i.racex
              predict pscore, pr
              gen omega = pscore / (1 - pscore)

              * Step 3: Define group indicators
              gen D1_T1 = (D == 1 & T == 1)
              gen D1_T0 = (D == 1 & T == 0)
              gen D0_T1 = (D == 0 & T == 1)
              gen D0_T0 = (D == 0 & T == 0)

              * Step 4: Compute unweighted treated means
              summarize attend_uni if D1_T1 == 1, meanonly
              local E11 = r(mean)
              summarize attend_uni if D1_T0 == 1, meanonly
              local E10 = r(mean)

              * Step 5: Compute weighted control means
              gen wy_01 = attend_uni * omega if D0_T1 == 1
              gen wy_00 = attend_uni * omega if D0_T0 == 1

              gen omega_01 = omega if D0_T1 == 1
              gen omega_00 = omega if D0_T0 == 1

              summarize wy_01, meanonly
              local num01 = r(sum)
              summarize omega_01, meanonly
              local den01 = r(sum)
              local E01 = `num01' / `den01'

              summarize wy_00, meanonly
              local num00 = r(sum)
              summarize omega_00, meanonly
              local den00 = r(sum)
              local E00 = `num00' / `den00'

              * Step 6: Compute IPW ATET
              local ipw_atet = (`E11' - `E10') - (`E01' - `E00')
              display "IPW ATET(2023, 2023) = `ipw_atet'"

              restore
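
              Written out, the steps above compute the following, where D is eligibility, T marks the 2023 cohort, and \hat{p} is the fitted probability from the pooled logit. The notation is introduced here for illustration; it restates the code and is not claimed to match the manual's formula term by term.

              ```latex
              % Odds weight from the pooled logit (Step 2)
              \hat{\omega}_i = \frac{\hat{p}(X_i)}{1 - \hat{p}(X_i)}

              % Unweighted treated means (Step 4) and odds-weighted comparison means (Step 5)
              \bar{Y}_{1t} = \frac{1}{N_{1t}} \sum_{i:\, D_i = 1,\, T_i = t} Y_i, \qquad
              \bar{Y}^{\omega}_{0t} = \frac{\sum_{i:\, D_i = 0,\, T_i = t} \hat{\omega}_i \, Y_i}
                                           {\sum_{i:\, D_i = 0,\, T_i = t} \hat{\omega}_i}

              % Difference in differences of the four cell means (Step 6)
              \widehat{ATET} = \left( \bar{Y}_{11} - \bar{Y}_{10} \right)
                             - \left( \bar{Y}^{\omega}_{01} - \bar{Y}^{\omega}_{00} \right)
              ```

              Here \bar{Y}_{11}, \bar{Y}_{10}, \bar{Y}^{\omega}_{01}, and \bar{Y}^{\omega}_{00} correspond to the locals E11, E10, E01, and E00. Under this reading, eligible observations carry weight 1 and ineligible observations carry weight \hat{\omega}_i within each year, which are presumably the weights to use for the covariate balance checks mentioned in the original question.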


              Thanks for the helpful slides.


              • #8
                Also, Stata's hdidregress does something different from csdid when using regression adjustment.
                In my code I run only one regression; Stata runs four: pre and post, treated and not treated.


                • #9
                  My very small point about the weekend was that answers are less frequent at weekends, not at all that questions are out of order then.
