Dear all,
I am using Stata 14 (the current version) on Windows.
First, I want to give a short explanation of my analysis:
I have an unbalanced panel of firm data for the years 2000-2014. I investigate the consequences of successions in family firms for firm performance, using a difference-in-differences approach on a matched sample. My initial sample contains around 1,600 firms, 235 of which experienced a succession in one of the sample years. To create a matched sample of control firms similar to the treated firms, I use propensity score matching with the Stata command psmatch2. Firms that experienced a succession are treated; firms that never experienced a succession are untreated.
After the matching, I run a diff-in-diff panel regression (xtreg, re) to evaluate whether performance in the years after a succession differs between firms with a succession and firms without one. As performance measures I use several outcomes (from survey answers or balance-sheet information), such as the expected development of business, the expected development of employment, credit allocation, capital expenditures, debt, cash flow, ROA, etc.
In the first step, I run a logit regression to obtain propensity scores (pscores). For the logit, I collapse the dataset to the firm level and estimate the model on the cross-section, regressing the treatment dummy (succession yes or no) on several firm characteristics: firm age, firm age squared, and legal-form, industry, and employment-size dummies.
Here is the code for that step:
* collapse data to firm level
collapse succession_yes state industry year_of_incorporation legal_form employment employment_size l_employment firm_age firm_age_cat state_business exp_business exp_employment orders diff_finan credit_alloc debt capex total_assets size_assets total_equity tangible_assets cash_flow cash_cash_equivalent roa sales operating_revenue gross_profit_loss, by(IDNUM_ZAEHLER)
*logit
logit succession_yes firm_age firm_age_2 i.r_legal_form i.r_employment_size i.industry
est store model1
predict pscore1
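A self-contained sketch of the firm-level propensity-score step, on simulated data (the values and the treatment process are made up; only the variable names follow the post). It keeps the panel with preserve, collapses to one row per firm, estimates the logit, and merges pscore1 back to every firm-year with merge m:1, since the matching loop uses pscore1 at the firm-year level. Using (max) for the treatment dummy is an assumption that keeps it 0/1 at the firm level.

```stata
clear
set seed 123456
set obs 200
gen IDNUM_ZAEHLER = _n
gen firm_age = 1 + floor(runiform()*60)
* hypothetical treatment process, for illustration only
gen byte succession_yes = runiform() < invlogit(-2 + 0.03*firm_age)
* turn the cross-section into a 5-year panel (2010-2014)
expand 5
bysort IDNUM_ZAEHLER: gen year = 2009 + _n
* keep the panel while working on the firm-level cross-section
preserve
* (max) keeps the treatment indicator 0/1 at the firm level
collapse (max) succession_yes (mean) firm_age, by(IDNUM_ZAEHLER)
gen firm_age_2 = firm_age^2
logit succession_yes firm_age firm_age_2
predict pscore1, pr
keep IDNUM_ZAEHLER pscore1
tempfile ps
save `ps'
restore
* m:1 attaches each firm's score to every year of that firm
merge m:1 IDNUM_ZAEHLER using `ps', nogenerate
```

Only built-in commands are used, so the sketch runs without SSC packages.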
In the next step, I apply the matching algorithm with psmatch2. For my baseline I use 1-to-1 nearest-neighbor matching without replacement, with a caliper of 0.05 and the common-support option. I had to modify the matching procedure because of the following problems:
1) I loop over all years to guarantee that treated and control firms are always taken from the same year.
2) Before matching, I need to exclude firms that are treated in a year other than i, so that they can't be used as controls in year i (because in the diff-in-diff I later look at performance in the years after treatment).
3) I need to exclude firms that were used as controls in year i, so that they can't be used again as controls in other years.
4) I re-run the matching for every outcome, because some outcomes have much worse data availability (many missing values) and I want each match to produce as large a sample as possible.
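Step 2 can be sketched without the SSC command carryforward, which the loop uses: egen max() within firm spreads a single-year flag to all years of the firm in one step, independent of sort order. Toy data only; treat_other and treatnot_final are hypothetical names, and the flag here is 0/1 rather than 1/missing.

```stata
clear
input IDNUM_ZAEHLER year succession
1 2005 0
1 2006 1
1 2007 0
2 2005 1
2 2006 0
2 2007 0
3 2005 0
3 2006 0
3 2007 0
end
local i = 2006
* 1 in a succession year that is not `i', 0 otherwise
gen byte treat_other = succession == 1 & year != `i'
* spread the flag to all years of the firm
bysort IDNUM_ZAEHLER: egen byte treatnot_final = max(treat_other)
* in the loop, these firms would then be set aside with: drop if treatnot_final == 1
list, sepby(IDNUM_ZAEHLER)
```

Here only firm 2 (succession in 2005) is flagged when matching in 2006; firm 1, treated in 2006 itself, stays in.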
Here is the code:
* loop over possible outcomes
foreach o in $outcomes_survey $outcomes_bs {
*go to folder
cd "${root}/${succession}/results/analysis/1NN-caliper0-05/`o'"
* loop over all years to guarantee that treatment and controls are taken always from same year
* replace outcome here
capture drop outcome
gen outcome = `o'
label variable outcome "`o'"
*1 nearest neighbor without replacement, caliper 0.05
capture drop ident treated control pscore treated2 support weight2 id_2 nn n1 pdif
capture drop _pscore _treated _support _weight _id _n1 _nn _pdif _outcome
foreach var in ident treated control pscore treated2 support weight2 id_2 nn n1 pdif {
gen `var' = .
}
local start = 2000
local end = 2014
forvalue i = `start'(1)`end' {
qui count if year == `i' & succession == 1 & pscore1 != .
local decideon = 0
local decideon = r(N)
if `decideon' > 0 {
capture drop _pscore _treated _weight _id _n1 _nn _pdif
set seed 123456
*DEALING WITH TREATED
*before matching I need to somehow exclude firms that are treated in a year other than i, so that those can't be used as controls in year i
*tagging firms treated in year other than i
sort IDNUM_ZAEHLER year
bysort IDNUM_ZAEHLER (year): gen treatnot`i'=1 if succession==1 & year!=`i'
count if treatnot`i'==1
bysort IDNUM_ZAEHLER: carryforward treatnot`i', gen(treatnot`i'2)
gsort IDNUM_ZAEHLER - year
bysort IDNUM_ZAEHLER: carryforward treatnot`i'2, gen(treatnot`i'final)
cap drop treatnot`i' treatnot`i'2
xtsum treatnot`i'final
sort IDNUM_ZAEHLER year
*save dataset containing firms treated in year other than i
preserve
by IDNUM_ZAEHLER (year): keep if treatnot`i'final==1
save data/treatnot`i'dataset.dta, replace
restore
*drop firms treated in year other than i
sort IDNUM_ZAEHLER year
by IDNUM_ZAEHLER (year): drop if treatnot`i'final==1
*MATCH
capture psmatch2 succession if year == `i' & pscore1 != .,out(`o') p(pscore1) neighbor(1) common caliper(.05) noreplacement
capture replace year_dummy = 1 if _treated!=. & year == `i'
capture replace ident = 1 if _weight != . & year == `i'
capture replace treated = 1 if _treated == 1 & _support == 1 & year == `i'
capture replace control = 1 if _treated == 0 & _support == 1 & year == `i'
capture replace pscore = _pscore if year == `i'
capture replace treated2 = _treated if year == `i'
capture replace support = _support if year == `i'
capture replace weight2 = _weight if year == `i'
capture replace id_2 = _id if year == `i'
capture replace n1 = _n1 if year == `i'
capture replace nn = _nn if year == `i'
capture replace pdif = _pdif if year == `i'
qui count if succession == 1 & year == `i'
di r(N) " treated firms exist in year = `i' "
qui count if _treated == 1 & year == `i'
di r(N) " treated firms are identified by the command in year = `i' "
qui count if _treated == 1 & _support == 0 & year == `i'
di r(N) " treated firms were off support in year = `i' "
*drop variable treatnot i
cap drop treatnot`i'final
*append dataset containing firms treated in year other than i
merge 1:1 IDNUM_ZAEHLER year using data/treatnot`i'dataset.dta
drop _merge
drop treatnot*final
*DEALING WITH CONTROLS
**drop firms that were used as controls in year i (so they can't be used again as controls in other years)
*tag controls
sort IDNUM_ZAEHLER year
bysort IDNUM_ZAEHLER (year): gen control`i'=1 if _treated == 0 & _weight == 1 & year == `i'
count if control`i'==1
bysort IDNUM_ZAEHLER: carryforward control`i', gen(control`i'2)
gsort IDNUM_ZAEHLER - year
bysort IDNUM_ZAEHLER: carryforward control`i'2, gen(control`i'final)
cap drop control`i' control`i'2
xtsum control`i'final
*problem now, as all control firms are dropped, we need to save them and add back in the end
preserve
sort IDNUM_ZAEHLER year
by IDNUM_ZAEHLER (year): keep if control`i'final!=.
if `i' == `start' {
save data/controldataset.dta, replace
}
else {
append using data/controldataset.dta
}
save data/controldataset.dta, replace
restore
*drop controls in i
sort IDNUM_ZAEHLER year
by IDNUM_ZAEHLER (year): drop if control`i'final!=.
cap drop control`i'final
}
}
*merge back controls
merge 1:1 IDNUM_ZAEHLER year using data/controldataset.dta
drop _merge
drop control*final
}
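A quick check, sketched on toy data, that the year-by-year matching does what steps 2 and 3 intend: no firm is a matched control in more than one year, and no matched control ever has a succession. It assumes, as in the loop, that treated2 holds _treated and weight2 holds _weight for the matching year, so a matched control is an observation with treated2 == 0 and nonmissing weight2; n_control_years and ever_treated are hypothetical names.

```stata
clear
input IDNUM_ZAEHLER year succession treated2 weight2
1 2005 1 1 1
1 2006 0 . .
2 2005 0 0 1
2 2006 0 . .
3 2005 0 0 .
3 2006 0 0 1
4 2005 0 . .
4 2006 1 1 1
end
* number of years in which the firm was used as a matched control
bysort IDNUM_ZAEHLER: egen n_control_years = total(treated2 == 0 & weight2 < .)
* 1 if the firm has a succession in any year
bysort IDNUM_ZAEHLER: egen ever_treated = max(succession)
assert n_control_years <= 1
assert ever_treated == 0 if n_control_years >= 1
```

On the real data, a failing assert would point to firms that slipped through the exclusion steps.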
After that, I checked the quality of the match (balancing properties and a density plot of the pscore). I will not post this part here.
As my last step, I now want to run the difference-in-differences estimation on the matched sample produced by psmatch2.
For the estimation, I regress each outcome (firm performance) on a dummy indicating succession (yes/no), a dummy indicating the post-succession years (post = 1 in the years after succession, 0 otherwise), and their interaction, whose coefficient is the treatment effect. As further controls, I include the firm characteristics used in the logit regression for the pscores.
To run this regression, I first need to define the post variable for the matched control firms. For that, I use the succession year of the treated firms to assign a counterfactual succession year to the matched control firms.
* generate post_c with a fake succession event for control group
gen post_c=1 if ident==1 & treated2==0 & weight2==1
* post_c for all years after fake succession
sort IDNUM_ZAEHLER year
forvalues i = 1/15 {
bysort IDNUM_ZAEHLER: replace post_c=1 if ident[_n-`i']==1 & treated2[_n-`i']==0 & weight2[_n-`i']==1
}
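The counterfactual event year can also be assigned in one step, sketched here on toy data: egen max() picks the year in which the firm enters the match (weight2 nonmissing), and post is then defined the same way for treated and control firms. event_year is a hypothetical helper; this version codes the event year itself as post = 0, so replace > with >= to count it as post, as the post_c code above does.

```stata
clear
input IDNUM_ZAEHLER year treated2 weight2
1 2005 . .
1 2006 1 1
1 2007 . .
1 2008 . .
2 2005 . .
2 2006 0 1
2 2007 . .
2 2008 . .
end
* year in which the firm enters the match (actual or counterfactual succession)
bysort IDNUM_ZAEHLER: egen event_year = max(cond(weight2 < ., year, .))
* post = 1 in the years after the event, for treated and controls alike
gen byte post = year > event_year if event_year < .
list, sepby(IDNUM_ZAEHLER)
```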
The next problem I encountered was that weight2 is nonmissing only in the year of succession, but the whole firm should be included; otherwise I can't follow the development of performance after succession. So I created a variable that flags all years of each matched firm.
* extend weight variable to whole idnum instead of just one year
sort IDNUM_ZAEHLER year
cap drop inmatch
bysort IDNUM_ZAEHLER (year): gen inmatch=1 if weight2 == 1
count if inmatch==1
cap drop inmatch2
bysort IDNUM_ZAEHLER: carryforward inmatch, gen(inmatch2)
gsort IDNUM_ZAEHLER - year
cap drop inmatchfinal
bysort IDNUM_ZAEHLER: carryforward inmatch2, gen(inmatchfinal)
cap drop inmatch inmatch2
xtsum inmatchfinal
sort IDNUM_ZAEHLER year
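A sketch, on toy data, of extending the matching weight itself (not just an indicator) to all years of the firm, so that it also covers the non-unit weights from kNN or radius matching. It assumes, as in this design, that each firm enters the match in at most one year; firm_weight is a hypothetical name. The resulting weight is constant within firm, which is the condition xtreg, pa imposes on weights.

```stata
clear
input IDNUM_ZAEHLER year weight2
1 2005 .
1 2006 1
1 2007 .
2 2005 .
2 2006 .5
2 2007 .
3 2005 .
3 2006 .
3 2007 .
end
* copy the matching-year weight to every year of the firm
bysort IDNUM_ZAEHLER: egen firm_weight = max(weight2)
* constant within firm; firms never matched stay missing
bysort IDNUM_ZAEHLER (year): assert firm_weight == firm_weight[1]
```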
So now I can finally run my diff-in-diff estimation, using the weights from psmatch2 extended to all years of each matched firm.
I first run pooled OLS:
* DiD treatment effect
reg outcome succession_yes##post firm_age firm_age_2 i.legal_form i.employment_size i.industry i.year [aw=inmatchfinal], cluster(IDNUM_ZAEHLER)
estimates store didatt1`v'
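A toy check of the 2x2 logic behind the interaction term, on simulated, balanced data with a known effect of 5. Because the model with i.succession_yes##i.post is saturated in the four group-period cells, the interaction coefficient equals the difference in differences of cell means, so the firm-specific levels and the common year shock cancel. It also shows that factor-variable notation works without the xi: prefix. The data-generating process is an assumption for illustration only.

```stata
clear
set seed 123456
set obs 400
* 100 firms observed 2005-2008; firms 1-50 have a succession (hypothetical)
gen IDNUM_ZAEHLER = ceil(_n/4)
bysort IDNUM_ZAEHLER: gen year = 2004 + _n
gen byte succession_yes = IDNUM_ZAEHLER <= 50
gen byte post = year >= 2007
* firm-specific level, constant within firm
bysort IDNUM_ZAEHLER (year): gen double firm_effect = rnormal() if _n == 1
by IDNUM_ZAEHLER: replace firm_effect = firm_effect[1]
* true effect 5, plus firm levels and a common year shock
gen double outcome = 1 + 2*succession_yes + 3*post + 5*succession_yes*post ///
    + firm_effect + 0.5*(year - 2005)^2
reg outcome i.succession_yes##i.post, vce(cluster IDNUM_ZAEHLER)
display _b[1.succession_yes#1.post]
```

With real, unbalanced data and additional controls, the interaction is no longer a simple contrast of cell means.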
But to account for the panel structure, I actually want to run a random-effects panel regression:
xtreg outcome succession_yes##post firm_age firm_age_2 i.legal_form i.employment_size i.industry i.year if inmatchfinal!=., re vce(robust)
estimates store didatt2`v'
The problem is that xtreg, re does not allow aweights.
With 1-to-1 matching all weights equal 1, so this should not matter, and I simply run xtreg, re on all observations with nonmissing inmatchfinal.
As robustness tests, however, I use other matching algorithms (2NN, 5NN, radius, and caliper matching). With these, the weights differ across firms and are smaller than 1. As far as I understand, running the diff-in-diff on the matched sample would then require using the weights in the xtreg, re regression as well, but xtreg, re does not accept them. I read that the population-averaged estimator (xtreg, pa) is supposed to be similar to xtreg, re, so I tried xtreg, pa rob with the weights as pweights instead. This does not work either, because the weights are not constant within panel.
So how can I run a random-effects panel regression (diff-in-diff) that includes the matching weights?
I hope my procedure and estimations are clear. Your help is greatly appreciated.
I have the following questions:
- Is my Stata code for the matching correct, given my research question and data structure?
- Is my understanding of the matching procedure, and of how I apply it to the diff-in-diff estimation, correct? To run the regression on the matched sample, is it enough to use the weights from psmatch2, or do I need to account in some other way for the pairs created by the match? As it stands, I simply run the regression on a smaller sample than the full one, without accounting for which controls are matched to which treated firms, correct? Or do the weights take care of that?
- Especially important for matching algorithms other than 1NN: how can I run xtreg, re with weights?
Thank you in advance,
Marina