  • Looking for Methods to Improve Pre-treatment Matching in a Generalised Synthetic Control

    Hi all. To preface, I'm currently using R rather than Stata, although my questions are primarily methodological. Please redirect me if this is the wrong place, and apologies if it is inappropriate!

    I'm using a generalised synthetic control (GSC) to estimate the impact on employment of lowering the eligibility age for the living wage from 25 to 23 in 2021.

    Aggregating from individual-level APS data, I have 26 treatment groups (one for each combination of 13 regions, 2 sexes and 1 age category, 23-24) and 104 control groups (13 regions, 2 sexes and 4 age categories spanning 25-64). I have 60 time periods of quarterly data from 2010 to 2025, of which 45 are pre-treatment. My covariates are employment rate (%), log average hourly wage, and bite (the share of workers affected by minimum wage increases, calculated yearly, in %).
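A minimal base R sketch of the panel layout described above may help make the dimensions concrete. The group labels, the column names (`unit`, `time`, `D`, `ever_treated`) and the assumption that treatment starts in period 46 are illustrative only; the post gives the counts, not the actual variable names.

```r
# Hypothetical labels: the post states the counts, not the category names.
regions     <- paste0("region_", 1:13)
sexes       <- c("F", "M")
age_treated <- "23-24"
age_control <- paste0("age_band_", 1:4)   # four bands spanning 25-64

treated <- expand.grid(region = regions, sex = sexes, age = age_treated,
                       stringsAsFactors = FALSE)
control <- expand.grid(region = regions, sex = sexes, age = age_control,
                       stringsAsFactors = FALSE)

units <- rbind(cbind(treated, ever_treated = 1L),
               cbind(control, ever_treated = 0L))
units$unit <- paste(units$region, units$sex, units$age, sep = "_")

# Assumption: 60 quarters, the first 45 of which are pre-treatment.
T0      <- 45
periods <- data.frame(time = 1:60)

# merge() with no shared columns returns the Cartesian product (unit x time).
panel   <- merge(units, periods)
panel$D <- as.integer(panel$ever_treated == 1L & panel$time > T0)

# Quick dimension checks against the numbers quoted in the post.
c(n_treated = nrow(treated), n_control = nrow(control), n_rows = nrow(panel))
```

The treatment indicator `D` is 1 only for the 23-24 groups after period 45, which is the form of indicator that a formula such as `Y ~ D + X1 + X2` expects.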

    When I run gsynth with cross validation it says:

    "Cross validation cannot be performed since available pre-treatment records of treated units are too few. So set r.cv = 0. Parametric Bootstrap"

    and I get a counterfactual plot that looks like this:
    [Image: Rplot(counterfactual).png]


    I've tried manually setting the number of factors to each value from 1 to 5 and the fit doesn't look noticeably different. As far as I can tell, my treatment groups are not at the extremes of the trend data:
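One way to put a number on "doesn't look different" is to compute the pre-treatment RMSE between observed and counterfactual outcomes for each fixed number of factors. The sketch below is illustrative only: `pre_rmse` is a hypothetical helper, and the example runs on the `simdata` dataset that ships with gsynth (assumed to have its first treated period at 21), not on the APS panel. The gsynth part is skipped if the package is not installed.

```r
# Hypothetical helper: RMSE of treated-minus-counterfactual gaps over the
# first T0 rows of two (time x treated unit) matrices.
pre_rmse <- function(Y_tr, Y_ct, T0) {
  rows <- seq_len(T0)
  gap  <- Y_tr[rows, , drop = FALSE] - Y_ct[rows, , drop = FALSE]
  sqrt(mean(gap^2, na.rm = TRUE))
}

if (requireNamespace("gsynth", quietly = TRUE)) {
  data(gsynth, package = "gsynth")   # loads the example data frame `simdata`
  T0 <- 20                           # assumption: treatment in simdata starts at period 21

  fits <- sapply(0:5, function(r) {
    out <- gsynth::gsynth(Y ~ D + X1 + X2, data = simdata,
                          index = c("id", "time"), force = "two-way",
                          CV = FALSE, r = r, se = FALSE)
    # Y.tr holds observed treated outcomes, Y.ct the estimated counterfactuals.
    pre_rmse(out$Y.tr, out$Y.ct, T0)
  })
  print(data.frame(r = 0:5, pre_rmse = fits))
}
```

Because the treated units' loadings are fitted on their own pre-treatment outcomes, this in-sample RMSE tends to fall as `r` grows, so it is a descriptive check on the fit and not a substitute for cross-validation.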
    [Image: Rplot(trends).png]



    My questions are:

    1) What methods can I use to improve this pre-treatment matching and why has cross validation failed?
    2) Is this a weak application of the synthetic control method?

    I have read the 2023 Statalist thread "Unparallel pre-intervention trends in synthetic control", but I don't think I share any of the same concerns. I'm interested in the method proposed by Jeff Wooldridge but would need more information on how to actually implement it.

    I'd be grateful for any help or ideas at all, as I'm fairly new to econometrics.
    Thanks, Moses

  • #2
    In case it helps anyone down the line, this is a response I got from xiqing xu:

    What's happening: gsynth estimates latent factors from never-treated units only, then projects them onto treated units via estimated factor loadings. When CV selects r = 0, it means adding factors estimated from controls doesn't improve out-of-sample prediction for treated units' pre-treatment outcomes — in other words, the control group's factor structure doesn't extrapolate well to the treated group. The "No factors are included in the model" error when plotting loadings is expected in this case (there are no factors to plot).

    The fact that manually setting r still gives poor pre-treatment matching confirms this: the treated and control groups likely load differently on the latent factors, so the counterfactual constructed from control-group factors doesn't track the treated units well.

    Suggestions:
    1. Try method = "mc" (matrix completion) or method = "ife" (interactive fixed effects) — both use all units for estimation rather than only never-treated controls, which often produces better pre-treatment fit when the treated and control groups differ in their factor structure.
    2. If you have time-varying covariates, include them (e.g., Y ~ D + X1 + X2). With Y ~ D alone, the model relies entirely on the outcome series to identify the factor structure.
    3. The poor pre-treatment fit is itself informative — it suggests the control units may not serve as a good donor pool for these treated units under the factor model assumption. This is worth discussing substantively in your analysis.
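Suggestions 1 and 2 can be sketched roughly as follows. This assumes the reply refers to the fect package, whose `fect()` function takes a `method` argument with values such as `"ife"` and `"mc"`; in gsynth itself the corresponding argument is, as far as I know, `estimator`. The simulated panel, its column names and the true effect of 2 are all made up for illustration, and the fect calls are skipped if the package is not installed.

```r
set.seed(1)

# Simulated panel with one latent factor; treated units load more heavily
# on it than controls, mimicking a donor pool that extrapolates poorly.
N <- 40; TT <- 30; T0 <- 20; n_tr <- 8
f   <- cumsum(rnorm(TT))             # latent factor (random walk)
lam <- rnorm(N, mean = 1)            # unit-specific loadings
lam[1:n_tr] <- lam[1:n_tr] + 1       # units 1..8 are the treated ones

sim    <- expand.grid(time = 1:TT, id = 1:N)
sim$X1 <- rnorm(nrow(sim))           # one time-varying covariate
sim$D  <- as.integer(sim$id <= n_tr & sim$time > T0)
sim$Y  <- 0.5 * sim$X1 + lam[sim$id] * f[sim$time] +
          2 * sim$D +                # true treatment effect set to 2
          rnorm(nrow(sim), sd = 0.5)

if (requireNamespace("fect", quietly = TRUE)) {
  # Interactive fixed effects, number of factors chosen by CV over 0..5.
  out_ife <- fect::fect(Y ~ D + X1, data = sim, index = c("id", "time"),
                        force = "two-way", method = "ife",
                        CV = TRUE, r = c(0, 5), se = FALSE)
  # Matrix completion, tuning parameter chosen by CV.
  out_mc  <- fect::fect(Y ~ D + X1, data = sim, index = c("id", "time"),
                        force = "two-way", method = "mc",
                        CV = TRUE, se = FALSE)
  cat("ife ATT:", out_ife$att.avg, " mc ATT:", out_mc$att.avg, "\n")
}
```

On real data, comparing the pre-treatment gaps from these fits with those from the original gsynth run would show whether using all untreated observations actually improves the match for the 23-24 groups.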
