
  • Diff in diff: Fix data for one-to-one matching propensity score

    I have a data set of companies with the date they were acquired by a business group, plus a control group of companies that have belonged to their business group since their creation. Each company has one observation per year from its creation date through 2018, and a dummy variable that equals 1 from the acquisition date through 2018. For the control group, the dummy variable is 0 in all observations.

    I also used one-to-one propensity score matching to pair companies in the treatment group with companies in the control group, based on industry and company creation date. Using the pscore, how can I change the dummy variable for control observations to be 1 on the date their matched treated company was acquired?

    Thanks!

  • #2
    Andrea, I understand your question, but posting a data example with dataex, as in many other posts, would make it easier to suggest specific code.
    Last edited by Fei Wang; 31 Oct 2021, 21:23.



    • #3
      This is an example of my data. Ruc is the company ID, Date_Acq is the date the company was acquired, dummy is 1 if the company is in the treatment group, and trat is 1 in the periods after the company is acquired. _Pscore is the matching score. As you can see, trat is always 0 for the companies that are not in the treatment group, but I would like it to be 1 in the year their matched treated company is acquired.
      [Image: imagen_2021-11-01_135933.png]



      • #4
        Andrea, it seems you used psmatch2 to match treated and control companies. Besides _pscore, this command creates some other new variables, such as _id and _n1. _id is a new identifier for each company, and _n1 is the _id of the company matched to it. Using this link, you can replace the values of a variable x for a control company with the x values of its matched company in the treated group. I don't have panel data like yours, so I will just show a simple example with cross-sectional data.

        Code:
            webuse cattaneo2, clear
        
        * Manipulate a variable x
            gen x = runiform()
            
        * PSM
            psmatch2 mbsmoke mmarried c.mage##c.mage fbaby medu, ate
            
        * Replace x values of control units with the x values of their matched units in the treated group
            sort _id
            replace x = x[_n1] if !_treated
        Of course, your case is much more complicated: a staggered DiD with PSM. I am not sure your procedure is correct. It seems you want one-to-one matching for the ATE rather than the ATET (for the former, each treated or control unit has one matched unit in the other group; for the latter, only treated units have matched units from the control group). The matching before DiD can only use pre-acquisition years. Because companies may be acquired in different years, I would match on the years that come before every acquisition. In your data example in #3, I would use variables from 2014 and 2015 for matching. (Information could be used more efficiently when matching for the ATET, where treated units can be matched with their control counterparts by the timing of acquisition.) I would first reshape the data from long form to wide form and then run the matching, as in my example code above. After finishing everything, including replacing the trat values of control units with those of their treated counterparts, I would reshape the data back to long form for the DiD analysis.
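The same linkage idea can be sketched at the firm level without psmatch2, using only built-in commands. This is a minimal sketch under stated assumptions, not a tested solution for the real data: the toy values, the variable names acq_year, pair_ruc and pair_acq_year, and the 2014-2018 window are all hypothetical, and pair_ruc stands in for whatever link the matching step produces (for example, one built from _id and _n1).

```stata
* Toy firm-level data: firms 1-2 treated, firms 3-4 controls (all values hypothetical)
clear
input ruc dummy acq_year pair_ruc
1 1 2016 .
2 1 2017 .
3 0 .    1
4 0 .    2
end
tempfile firms pairs
save `firms'

* Lookup table: treated firm id -> its acquisition year
keep if dummy == 1
keep ruc acq_year
rename (ruc acq_year) (pair_ruc pair_acq_year)
save `pairs'

* Attach the matched treated firm's acquisition year to each control
use `firms', clear
merge m:1 pair_ruc using `pairs', keep(master match) nogenerate
replace acq_year = pair_acq_year if dummy == 0

* Expand to a long panel (assumed years 2014-2018) and build trat
expand 5
bysort ruc: gen year = 2013 + _n
gen trat = year >= acq_year
```

In this sketch, control firm 3 inherits 2016 from firm 1, so its trat switches to 1 in the same year as its matched treated firm.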



        • #5
          Personally, I would match for the ATET and then do DiD, because DiD is essentially an ATET estimator. For example, suppose companies 1-5 are treated and 6-10 are controls. After matching for the ATET, the matched counterparts of companies 1-5 are companies 6, 7, 6, 8, 7, respectively, so companies 6 and 7 are each used twice. The matched data should then contain duplicate copies of companies 6 and 7: replace trat of the first copy of company 6 with that of company 1, and trat of the second copy of company 6 with that of company 3. There are different ways of doing it, but I will stop here unless I can find similar panel data to write specific code.
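The duplication step in this example can be sketched as follows. It is only an illustration under assumptions: the matched ids 6, 7, 6, 8, 7 come from the example above, while acq_year, pair and newid are hypothetical names and the acquisition years are made up. Each treated company produces one control copy that inherits its acquisition year, so a control used twice appears twice.

```stata
* Treated firms 1-5 with their matched control ids; acq_year values are hypothetical
clear
input id cid acq_year
1 6 2015
2 7 2016
3 6 2017
4 8 2016
5 7 2015
end
gen d = 1
gen pair = _n                  // one pair per treated firm
tempfile treated
save `treated'

* Turn each row into the control copy of its pair, keeping the pair's acq_year
replace id = cid
replace d = 0

* Stack the treated firms back on and give every row its own identifier
append using `treated'
drop cid
sort pair d
egen newid = group(pair d)
```

Here company 6 appears as two separate controls, one with acquisition year 2015 (pair 1) and one with 2017 (pair 3). The controls' own covariates and outcomes would still need to be merged in by id before reshaping to long form.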



          • #6
            Ultimately, I managed to go through the whole process with simulated data. This is not necessarily a good solution to PSM with staggered DiD, just code that implements my algorithm in #5.

            Code:
            * DGP for PSM-Staggered DID
                clear
                set obs 100
                tempfile data1 data2
                gen id = _n        // unit id
                gen d = id <= 50    // d = 1 for treated units, = 0 for control units
                gen tt = runiformint(1,2) if d    // treatment type
                expand 3
                bys id: gen t = _n    // time
                bys id (t): gen w = _n > tt        // treatment
                gen x = rnormal(d,1) + rnormal(t,1) + rnormal(w,1)    // covariate
                gen y = 1 + 2*d*(tt==1) + 3*d*(tt==2) + 4*(t==2) + 5*(t==3) + 6*w*(tt==1) + 7*w*(tt==2) + 8*x + rnormal()
                
            * PSM
                reshape wide d tt x w y, i(id) j(t)        // reshape to cross-section
                psmatch2 d1 x1 if (tt1 == 1) | (tt1 == .)    // match for treatment type-1 firms
                sort _id
                gen cid = id[_n1] if tt1 == 1    // id of matched control units
                drop _*
            
                psmatch2 d1 x1 x2 if (tt1 == 2) | (tt1 == .)    // match for treatment type-2 firms
                sort _id
                replace cid = id[_n1] if tt1 == 2    // id of matched control units
                drop _*
                save `data1', replace
                
                keep if d1 == 0
                ren * c=.
                save `data2', replace    // subsample for control units
                
                use `data1', clear
                merge m:1 cid using `data2', nogen keep(3)
            
                * replace values of control unit with counterparts from treated units
                forvalues i = 1(1)3 {
                    replace cw`i' = w`i'
                    replace ctt`i' = tt`i'
                }
            
                save `data1', replace
                keep c*
                ren c* *
                append using `data1'    // matched data
                drop c*
                
                sort id
                gen newid = _n
                reshape long d tt x w y, i(newid) j(t)    // reshape back to long form
                
            * Staggered DiD
                gen d1 = d*(tt==1)
                gen d2 = d*(tt==2)
                gen w1 = d1*w
                gen w2 = d2*w
                
                reg y d1 d2 i.t w1 w2 x, vce(cl id)

