Diff in diff: Fix data for one-to-one matching propensity score

Andrea Bravo

Join Date: Oct 2021

Posts: 3
#1

31 Oct 2021, 20:51

I have a data set of companies with the date they were acquired by a business group and a control group in which the companies have been in their business group since their creation. Each company has as many observations as the years between the creation date of the company and 2018 and a dummy variable that is 1 between the acquisition date and 2018. For the control group the dummy variable is 0 in all observations.

I also used one-to-one propensity score matching to pair companies in the treatment group with companies in the control group, matching on industry and company creation date. Using the pscore, how can I change the dummy variable for control observations so that it is 1 from the date their pair in the treatment group was acquired?

Thanks!
Fei Wang

Join Date: Oct 2021

Posts: 726
#2

31 Oct 2021, 21:20

Andrea, I understand your question. But using dataex to show a data example, as in many other posts, would help you receive suggestions on specific code.

Last edited by Fei Wang; 31 Oct 2021, 21:23.
Andrea Bravo

Join Date: Oct 2021

Posts: 3
#3

01 Nov 2021, 13:03

This is an example of my data. Ruc is the company ID, Date_Acq is the date the company was acquired, dummy is 1 if the company is part of the treatment group, and trat is 1 in the periods after the company is acquired. _Pscore is the matching score. As you can see, "trat" is always 0 for the companies that are not part of the treatment group, but I would like it to be 1 from the year their treatment pair is acquired.
Fei Wang

Join Date: Oct 2021

Posts: 726
#4

01 Nov 2021, 22:26

Andrea, it seems you used psmatch2 to match treated and control companies. Besides _pscore, this command also creates several other variables, such as _id and _n1. _id is a new identifier for each company, and _n1 is the _id of the company matched to it. Using this link, you can replace the values of a variable x for a control company with the x values of its matched company in the treated group. I don't have panel data like yours, so I will just show a simple example using cross-sectional data.

Code:

webuse cattaneo2, clear

* Manipulate a variable x
gen x = runiform()

* PSM
psmatch2 mbsmoke mmarried c.mage##c.mage fbaby medu, ate

* Replace x values of control units with the x values of their matched units in the treated group
sort _id
replace x = x[_n1] if !_treated

Of course, your case is much more complicated: a staggered DiD with PSM. I am not sure your procedure is correct. It seems you'd like to do one-to-one matching for the ATE rather than the ATET (for the former, each treated or control unit has one matched unit in the other group; for the latter, only treated units have matched units from the control group). Matching before DiD can only use pre-acquisition years. As companies may be acquired in different years, I would pick the years that precede acquisition for all companies. In your data example in #3, I would use the variables in 2014 and 2015 for matching. (Information could be used more efficiently when matching for the ATET, where I can match treated units with their control counterparts by the timing of acquisition.)

I would first reshape the data from long form to wide form and then do the matching, as in my example code above. After finishing everything, including replacing the trat values of control units with those of their treated counterparts, I would reshape the data back to long form for the DiD analysis.
Fei Wang

Join Date: Oct 2021

Posts: 726
#5

01 Nov 2021, 23:49

Personally, I would match for the ATET and then do DiD, because DiD essentially estimates an ATET. For example, suppose companies 1-5 are treated and companies 6-10 are controls. After matching for the ATET, the matched counterparts for companies 1-5 are companies 6, 7, 6, 8, 7, respectively, so companies 6 and 7 are each used twice. The matched data should then contain a second copy of companies 6 and 7: replace trat of the first copy of company 6 with that of company 1, and trat of the second copy of company 6 with that of company 3. There are different ways of doing it, but I'd stop for now unless I can find similar panel data to specify the code.
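
For what it's worth, the duplication step alone can be sketched on toy data. This is only an illustration under assumptions: the crosswalk variables tid (treated id) and cid (matched control id) are hypothetical names, the panel is made up, and joinby is just one of several ways to get one copy of each control per treated partner.

```stata
* Hypothetical crosswalk: treated firms 1-5 and their matched controls
clear
input tid cid
1 6
2 7
3 6
4 8
5 7
end
tempfile xwalk
save `xwalk'

* Made-up long panel: ten firms, three periods each
clear
set obs 10
gen id = _n
expand 3
bysort id: gen t = _n

* Treat every firm as a potential control and join on the control id.
* joinby forms all pairwise combinations within cid, so firms 6 and 7
* appear twice (once per treated partner); unmatched firms are dropped.
rename id cid
joinby cid using `xwalk'

* One new id per (control, treated partner) copy
egen newid = group(cid tid)
sort newid t
```

After this, each control copy carries tid, so the trat path of the treated partner could be merged in by tid and t.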

Fei Wang

Join Date: Oct 2021

Posts: 726
#6

02 Nov 2021, 02:05

Ultimately, I managed to go through the whole process with simulated data. It is not necessarily a good solution to PSM with staggered DiD, just code that implements the algorithm in #5.

Code:

* DGP for PSM-Staggered DID
    clear
    set obs 100
    tempfile data1 data2
    gen id = _n        // unit id
    gen d = id <= 50    // d = 1 for treated units, = 0 for control units
    gen tt = runiformint(1,2) if d    // treatment type
    expand 3
    bys id: gen t = _n    // time
    bys id (t): gen w = _n > tt        // treatment
    gen x = rnormal(d,1) + rnormal(t,1) + rnormal(w,1)    // covariate
    gen y = 1 + 2*d*(tt==1) + 3*d*(tt==2) + 4*(t==2) + 5*(t==3) + 6*w*(tt==1) + 7*w*(tt==2) + 8*x + rnormal()
    
* PSM
    reshape wide d tt x w y, i(id) j(t)        // reshape to cross-section
    psmatch2 d1 x1 if (tt1 == 1) | (tt1 == .)    // match for treatment type-1 firms
    sort _id
    gen cid = id[_n1] if tt1 == 1    // id of matched control units
    drop _*

    psmatch2 d1 x1 x2 if (tt1 == 2) | (tt1 == .)    // match for treatment type-2 firms
    sort _id
    replace cid = id[_n1] if tt1 == 2    // id of matched control units
    drop _*
    save `data1', replace
    
    keep if d1 == 0
    ren * c=.
    save `data2', replace    // subsample for control units
    
    use `data1', clear
    merge m:1 cid using `data2', nogen keep(3)

    * replace values of control unit with counterparts from treated units
    forvalues i = 1(1)3 {
        replace cw`i' = w`i'
        replace ctt`i' = tt`i'
    }

    save `data1', replace
    keep c*
    ren c* *
    append using `data1'    // matched data
    drop c*
    
    sort id
    gen newid = _n
    reshape long d tt x w y, i(newid) j(t)    // reshape back to long form
    
* Staggered DiD
    gen d1 = d*(tt==1)
    gen d2 = d*(tt==2)
    gen w1 = d1*w
    gen w2 = d2*w
    
    reg y d1 d2 i.t w1 w2 x, vce(cl id)
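
If the data stay in long form, as in #3, the original request (set trat to 1 for a control from the year its treated pair was acquired) might be sketched as below. This is a sketch on made-up data, not a tested solution for the real file: ruc, dummy and trat follow the description in #3, while acq_year, pair (the ruc of the matched treated firm, which would have to be built from the psmatch2 link) and pair_acq_year are hypothetical names.

```stata
* Made-up long panel: firms 1-2 treated, firms 3-4 controls, 2014-2018
clear
input ruc dummy acq_year pair
1 1 2016 .
2 1 2017 .
3 0 .    1
4 0 .    2
end
expand 5
bysort ruc: gen year = 2013 + _n
gen trat = dummy & year >= acq_year    // 0 for all control rows

* Lookup table: one row per treated firm with its acquisition year
preserve
keep if dummy
keep ruc acq_year
duplicates drop
rename (ruc acq_year) (pair pair_acq_year)
tempfile partners
save `partners'
restore

* Attach the partner's acquisition year to each control row
merge m:1 pair using `partners', keep(master match) nogen

* Switch trat on for controls from the partner's acquisition year onward
replace trat = year >= pair_acq_year if !dummy & !missing(pair_acq_year)
sort ruc year
```

This assumes each control is matched to one treated firm only; controls used for several treated firms would first need one copy per partner, as discussed in #5.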
