  • sdid, sdid_event and covariates using optimized method

    Hi,

    I am working with a panel dataset in which the panel identifier is at the region × occupational sector level, covering multiple years. The treatment is staggered: there are 2 adoption times, and the remaining units stay untreated throughout. My outcome of interest is Y.
    I would like to estimate an event study using the sdid_event command from the sdid_event package (Ciccia, Clarke & Pailañir), and include a set of covariates that are essentially Occupation × Year fixed effects, to control for time-varying occupational sector shocks.

    From my understanding of both the sdid and the sdid_event package:
    • When I use the covariates(...) option in sdid, without the projected keyword, the algorithm performs a joint optimization. That is, it estimates the covariate coefficients β̂ simultaneously with the unit and time weights, so as to minimize the weighted imbalance in pre-treatment residual outcomes. This is the “optimized” covariate adjustment approach described in Arkhangelsky et al. (2021).
    • When I use the covariates(...), projected option, it instead runs a two-way fixed effects regression of the outcome on covariates only using untreated observations, computes residuals, and then applies SDID to those residuals. This corresponds to the approach proposed by Kranz (2022).
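
    To make the projected logic concrete, here is a minimal hand-rolled sketch of the three steps as I understand them from the description above. It is not the package's internal code. The simulated panel and the variable names (unit, t, d, x, y, y_adj) are made up for illustration, and it assumes sdid is installed (ssc install sdid). I would expect the last two commands to report similar ATTs, though small numerical differences are possible.

    ```stata
    * Illustrative sketch only: all names and the data-generating process are assumptions
    clear
    set seed 2025
    set obs 200                              // 20 units x 10 periods, balanced
    gen unit = ceil(_n / 10)
    bysort unit: gen t = _n
    gen d = unit > 15 & t >= 7               // units 16-20 treated from t = 7 onward
    gen x = runiform()                       // one time-varying covariate
    gen y = 0.5 * x + 0.1 * unit + 0.2 * t + d + rnormal()

    * Step 1: two-way FE regression of y on x, untreated observations only
    areg y x i.t if d == 0, absorb(unit)

    * Step 2: remove the estimated covariate contribution from every observation
    gen double y_adj = y - _b[x] * x

    * Step 3: plain SDID on the adjusted outcome
    sdid y_adj unit t d, vce(noinference)

    * Built-in projected adjustment, for comparison
    sdid y unit t d, vce(noinference) covariates(x, projected)
    ```

    Only the covariate part x*b is subtracted in step 2; the unit and time effects stay in y_adj, since SDID handles those itself.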
    Now, for my event study, I would like to use the sdid_event command and include the covariates described above. However, I notice that sdid_event currently only allows the projected covariate adjustment method.

    My question is:
    Is it theoretically or computationally possible to implement the "optimized" covariate adjustment within sdid_event? That is, to jointly estimate the covariate coefficients and the weights in the event study setting, as is done in the regular sdid command?
    If anyone has extended or can help in extending sdid_event in this direction or knows whether this functionality already exists, I would really benefit from some help.
    I'm attaching code that simulates a dataset quite similar to mine. If possible, please help me with an event study estimation for staggered treatment, using the 'optimized' method for including covariates.
    Code:
    clear
    set obs 144  // 8 regions * 3 occupational sectors * 6 years/time periods = 144 obs
    
    // Generate panel identifiers
    gen region_id = ceil(_n / (3 * 6))       // 1 to 8
    gen occ_id = mod(ceil(_n / 6) - 1, 3) + 1  // 1 to 3 (loops every 6 obs)
    gen year = mod(_n - 1, 6) + 1           // 1 to 6
    
    // Label regions for interpretability
    gen region = ""
    replace region = "a" if region_id == 1
    replace region = "b" if region_id == 2
    replace region = "c" if region_id == 3
    replace region = "d" if region_id == 4
    replace region = "e" if region_id == 5
    replace region = "f" if region_id == 6
    replace region = "g" if region_id == 7
    replace region = "h" if region_id == 8
    
    // Create Occupational Sector var as string
    gen occupation = "occ" + string(occ_id)
    
    // Create unique panel id: Region * Occupational Sector
    egen panel_id = group(region_id occ_id)
    
    // Outcome - random 
    set seed 12345
    gen outcome = runiform()
    
    // Generate treatment indicator (Staggered treatment adoption timing- differs between region "e" and region "c", all others are always not exposed to treatment)
    gen post = 0
    replace post = 1 if region == "e" & year >= 5
    replace post = 1 if region == "c" & year >= 3
    egen occtrends = group(occupation year) // occupational sector specific time trends
    tab occtrends, gen(occtrends_) // dummies for occupational sector specific time trends
    
    ***SDID** 
    *No covariates
    sdid outcome panel_id year post, vce(noinference)
    // ATT = -0.10393
    *Optimized method for covariates
    sdid outcome panel_id year post, vce(noinference) covariates(occtrends_*, optimized)
    // ATT = -0.13821
    *Projected method for covariates 
    sdid outcome panel_id year post, vce(noinference) covariates(occtrends_*, projected)
    // ATT =-0.08656
    
    **SDID- EVENT STUDY *** 
    *No covariates
    sdid_event outcome panel_id year post, disag
    // ATT = -0.10393 (identical to sdid w/o covariates)
    *Projected method for covariates (default, no other option available as part of package)
    sdid_event outcome panel_id year post, covariates(occtrends_1 occtrends_2 occtrends_3 occtrends_4 occtrends_5 occtrends_6 occtrends_7 occtrends_8 occtrends_9 occtrends_10 occtrends_11 occtrends_12 occtrends_13 occtrends_14 occtrends_15 occtrends_16 occtrends_17 occtrends_18) disag 
    //ATT = -0.08657 (identical to sdid with projected covariates)

    Thanks in Advance,

    Warmly,
    Aadya

  • #2
    I would like to bump this and also point out that sdid_event and sdid give the same overall ATT with no covariates, but sometimes give slightly (though noticeably) different ATTs with covariates, even when sdid uses the projected option.

    In other words, this:

    Code:
    sdid_event rate id year after, vce(placebo)
    sdid rate id year after, vce(placebo) graph
    produces the same ATT

    but this:

    Code:
    sdid_event rate id year after, vce(placebo) covariates(frpl non_white)
    sdid rate id year after, vce(placebo) covariates(frpl non_white,projected) graph
    produces an ATT of 0.0970592 using sdid_event and 0.09649 using sdid.

    The difference is not large enough to substantively affect an analysis, but I am wondering if anyone can shed light on where this discrepancy might arise in my case.
    Last edited by Kyle Huisman; 08 Jul 2025, 03:58.


    • #3
      This is Diego Ciccia, maintainer of sdid_event. Thanks for your interest in sdid_event!

      We have recently updated both sdid and sdid_event. Among the new features, we have included (a) cluster-robust inference for bootstrap, placebo and jackknife inference methods (both in sdid and sdid_event), and (b) optimized method for covariate adjustment in sdid_event.
      You can re-install the packages directly from the GitHub repository. Both updates will soon be pushed to SSC.

      As for the first question: the optimized method for covariate adjustment now also works in sdid_event. It is the default (as in sdid) and returns the same ATT estimate as sdid.
      Here's an example:
      Code:
      clear
      
      set seed 123456
      local GG = 50
      local TT = 10
      set obs `=`GG'*`TT''
      gen G = mod(_n-1, `GG') + 1
      bys G: gen T = _n
      sort G T
      
      gen D = G >= `GG'/2 & T >= `TT'/2 + 4*(G/`GG')
      gen C = uniform() * (1 + D)
      gen Y = uniform() * (1 + D) + 0.5 * C
      
      sdid_event Y G T D, covariates(C) vce(off)
      sdid Y G T D, covariates(C) vce(noinference)
      As for the second question, the small discrepancy arises because sdid computes the residuals explicitly via matrix algebra, while sdid_event uses Stata's built-in predict function.

      I hope this helps! Let me know if you have any other questions and/or comments!


      • #4
        Dear all,

        I have two small follow-up questions regarding Aadya Swaminathan's post above.

        1. If I'm not mistaken, the sdid and sdid_event packages were originally developed for typical country-time (or state-time, region-time, firm-time) settings. Do they also work as intended in the region-sector-time setting described above, when you manually define the unit as a region-sector combination? I guess that in that case other sectors of the same region could be combined to create a control unit for the treated region-sector, since the package sees them as separate units with nothing in common. Is this desirable or undesirable?
        2. Intuitively, can adding sector-year dummies as covariates to sdid / sdid_event be interpreted in the same way as adding them to a "regular" panel regression (e.g. reghdfe)? Do they serve the same purpose of taking sector-year variation out of the data?

        Thank you in advance for your clarifications!

        Best,

        Mathieu
