  • How to prevent duplicates with Coarsened Exact Matching (CEM) on yearly panel data

    Problem: When applying Coarsened Exact Matching (CEM) to yearly panel data (2005-2011), two different treated persons can be matched to two different years of the same control person. I have many control individuals and want to force every treated individual to be matched to a different control individual, so that each control individual is used only once.

    Data: I observe every individual over 7 years (500 treated persons, 14,000 control persons, over 100,000 person-year observations), and treatment occurs in different years for different treated individuals.

    Aim and approach so far: I created a new dataset in which I kept only the year prior to treatment for the treated, but the full observation period (so 7 rows) for each control person. I want each treated person to be matched to a control in the same calendar year, so the full span of control observations has to be available to find the match.

    Code:
    cem sex(#0) age(18 25 45 45 55 65 100) employed(#0) year(#0), treatment(treated) k2k
    keep if cem_matched==1
    duplicates tag ID, gen(dup)
    tab dup
    With year(#0) (plus the other covariates) and the k2k option, I made sure that treated individuals are matched to control observations with the same coarsened pre-treatment characteristics in the same calendar year. But while treated individual A (treatment in 2007, pre-treatment year 2006) can be matched to individual C in 2006, treated individual B (treatment in 2010, pre-treatment year 2009) can also be matched to individual C in 2009. The problem is substantial: of the 500 matches made, over 100 involve a control individual who is used more than once (I tagged duplicates on ID to find this out).

    I understand that the problem occurs because each control individual contributes multiple observations (one per year) to the matching pool. A straightforward solution would be to match in wide format with the k2k option, but that is not possible here, as I want to force a match in the same calendar year and need the full span of control observations. Unlike psmatch2, where you can identify pairs, CEM sorts individuals into strata, so I have trouble seeing how to identify the treated individuals that are matched to the same control individual, and how to avoid this.

    Any suggestions on how to prevent the duplicates? (Any other comments and suggestions on my approach are very welcome, thanks!)
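
    One possible workaround, sketched here under assumptions rather than offered as an established CEM feature: run the cem call from the post without k2k so that all candidate controls stay in their strata, then build the 1:1 pairs by hand, marking a control person as used across all of his or her years as soon as one year has been matched. The sketch assumes ID is numeric and that each treated person has exactly one row; the names u, ord, used and pair_id are made up for illustration.

    ```stata
    * Sketch only. Run the cem call from the post first, minus the k2k option,
    * so that cem_strata and cem_matched exist. Assumes ID is numeric.
    set seed 12345                      // makes the random draw reproducible
    keep if cem_matched == 1

    gen double u = runiform()
    sort u                              // random row order; do not re-sort below
    gen long ord = _n                   // integer position of each row in that order
    gen byte used = 0                   // 1 once a control person has been taken
    gen long pair_id = .                // ID of the treated person in the pair

    levelsof ID if treated == 1, local(treated_ids)
    foreach t of local treated_ids {
        * stratum (covariate bins x calendar year) of this treated person
        quietly summarize cem_strata if ID == `t' & treated == 1, meanonly
        local s = r(min)
        * first unused control row in the same stratum, in random order
        quietly summarize ord if treated == 0 & used == 0 & cem_strata == `s', meanonly
        if r(N) > 0 {
            local pick = r(min)
            local c = ID[`pick']
            quietly replace pair_id = `t' if ord == `pick' | (ID == `t' & treated == 1)
            * block every other year of this control person from later matches
            quietly replace used = 1 if ID == `c' & treated == 0
        }
    }

    keep if !missing(pair_id)           // one treated row and one control row per pair
    isid ID                             // stops with an error if any person appears twice
    ```

    This is a greedy assignment, so a treated person processed late may find no unused control left in the stratum and ends up unmatched; counting treated rows before and after the loop shows how many pairs are lost that way.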

    P.S. My eventual aim is to merge the matches back into the original panel dataset (keeping the CEM strata identifier), select only the pre- and post-treatment year for the treated individuals and the same years for the control group, and apply a difference-in-differences design.
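
    A rough sketch of that last step, under several assumptions: the file names panel.dta and matched.dta, the variable match_year (the calendar year in which the match was made) and the outcome y are hypothetical. It also assumes treated is a person-level indicator stored in matched.dta and not in the panel, and that the post-treatment year is the year after the treatment year.

    ```stata
    * Sketch only. matched.dta holds one row per matched person with
    * ID, treated, cem_strata and match_year (all but ID and cem_strata are assumed names).
    use panel, clear

    * m:1 requires ID to be unique in matched.dta, so this merge fails
    * if a control person was used more than once
    merge m:1 ID using matched, keep(match) nogenerate keepusing(treated cem_strata match_year)

    * keep the year before and the year after treatment (treatment year = match_year + 1)
    keep if inlist(year, match_year, match_year + 2)
    gen byte post = year > match_year

    * difference-in-differences: the treated#post interaction is the estimate of interest
    regress y i.treated##i.post, vce(cluster ID)
    ```

    Because control years are selected relative to match_year, each control contributes the same two calendar years as the treated person in its stratum.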

  • #2
    I am curious whether this was resolved. I have a similar problem: a panel dataset that includes people who change jobs (treated) and people who do not. My objective is to look at the change in compensation after a job change.

    I am trying to match job changers (individuals) with those who do not change jobs, preferably on similar characteristics (e.g. age, gender, tenure...) in the pre-job-change years of the treated individuals. Treatment is the job change, so it occurs in different years for different treated individuals.

    I want to keep all observations of the job changers (before and after treatment) so that I can compare the compensation of job changers at t+1, t+2 and t+3 with that of people who stay at their firms.
