
  • Diff-in-Diff regression with panel data using weights from psmatch2: How to use weights with xtreg re?

    Dear all,

    I am using Stata 14 on Windows.
    First, I want to provide a short explanation of what my analysis is:
    I have an unbalanced panel of firm data for the years 2000-2014. I investigate the consequences of successions in family firms on firm performance using a difference-in-differences estimation approach on a matched sample. In my initial sample I have around 1600 firms out of which 235 firms experienced a succession in one year. To create a matched sample with control firms similar to the treated firms I use propensity score matching applying the Stata psmatch2 command. I consider firms that experienced a succession in one year as treated and firms that never experienced a succession as untreated.
    After the matching procedure I run a diff-in-diff panel regression (using xtreg, re) to evaluate whether the performance of firms with a succession, in the years after the succession, differs from that of firms that did not experience a succession. As performance measures I look at several outcomes (from survey answers or balance sheet information) such as the expected development of business, the expected development of employment, credit allocation, capital expenditures, debt, cash flow, ROA, etc.

    So in my first step I run a logit regression and obtain propensity scores. For the logit regression I collapse my dataset to the firm level and estimate the logit in the cross-section. I regress the treatment dummy (succession yes or no) on several firm characteristics such as firm age, firm age squared, legal form, industry and employment size dummies.
    Here is the code for that step:

    * collapse data to firm level
    collapse succession_yes state industry year_of_incorporation legal_form employment employment_size l_employment firm_age firm_age_cat state_business exp_business exp_employment orders diff_finan credit_alloc debt capex total_assets size_assets total_equity tangible_assets cash_flow cash_cash_equivalent roa sales operating_revenue gross_profit_loss, by(IDNUM_ZAEHLER)

    *logit
    logit succession_yes firm_age firm_age_2 i.r_legal_form i.r_employment_size i.industry
    est store model1
    predict pscore1


    In the next step I apply the matching algorithm using psmatch2. For my baseline I use nearest-neighbor matching (1-to-1) without replacement imposing a caliper of 0.05 and common support option. I had to modify the matching procedure because of the following problems I encountered:
    1) I looped over all years to guarantee that treatment and controls are taken always from same year
    2) before matching I need to exclude firms that are treated in a year other than i, so that those can't be used as controls in year i (because later in the diff-in-diff I look at performance in the following years after treatment)
    3) I need to exclude firms that were used as controls in year i (so they can't be used again as controls in other years)
    4) I re-run the matching for every outcome as some of the outcomes have a lot worse data availability (many missing) and I wanted each match to create a sample as big as possible
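
    Steps 2) and 3) both come down to flagging every year of a firm once a condition holds in any one year. A minimal sketch of that flagging, assuming the variable names used in this post; treat_other and used_control are hypothetical names, and the example year is only illustrative:

    ```stata
    * Sketch only: treat_other and used_control are hypothetical variable names
    local i = 2005    // illustrative matching year

    * 1 in every year of a firm whose succession happened in a year other than `i'
    bysort IDNUM_ZAEHLER: egen byte treat_other = max(succession == 1 & year != `i')

    * 1 in every year of a firm picked as a control in year `i'
    * (assumes psmatch2 has just been run, so _treated and _weight exist)
    bysort IDNUM_ZAEHLER: egen byte used_control = max(_treated == 0 & _weight == 1 & year == `i')
    ```

    Because egen's max() is evaluated over all observations of the firm, the flag is filled in for earlier and later years in one pass, which may avoid the forward and backward carryforward calls.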

    Here is the code:
    * loop over possible outcomes

    foreach o in $outcomes_survey $outcomes_bs {


    *go to folder
    cd "${root}/${succession}/results/analysis/1NN-caliper0-05/`o'"


    * loop over all years to guarantee that treatment and controls are taken always from same year

    * replace outcome here
    capture drop outcome
    gen outcome = `o'
    label variable outcome "`o'"
    *1 nearest neighbor without replacement, caliper 0.05
    capture drop ident treated control pscore treated2 support weight2 id_2 nn n1 pdif
    capture drop _pscore _treated _support _weight _id _n1 _nn _pdif _outcome
    foreach var in ident treated control pscore treated2 support weight2 id_2 nn n1 pdif {
    gen `var' = .
    }
    local start = 2000
    local end = 2014
    forvalue i = `start'(1)`end' {
    qui count if year == `i' & succession == 1 & pscore1 != .
    local decideon = 0
    local decideon = r(N)
    if `decideon' > 0 {
    capture drop _pscore _treated _weight _id _n1 _nn _pdif
    set seed 123456
    *DEALING WITH TREATED
    *before matching I need to somehow exclude firms that are treated in a year other than i, so that those can't be used as controls in year i
    *tagging firms treated in year other than i
    sort IDNUM_ZAEHLER year
    bysort IDNUM_ZAEHLER (year): gen treatnot`i'=1 if succession==1 & year!=`i'
    count if treatnot`i'==1
    bysort IDNUM_ZAEHLER: carryforward treatnot`i', gen(treatnot`i'2)
    gsort IDNUM_ZAEHLER - year
    bysort IDNUM_ZAEHLER: carryforward treatnot`i'2, gen(treatnot`i'final)
    cap drop treatnot`i' treatnot`i'2
    xtsum treatnot`i'final
    sort IDNUM_ZAEHLER year
    *save dataset containing firms treated in year other than i
    preserve
    by IDNUM_ZAEHLER (year): keep if treatnot`i'final==1
    save data/treatnot`i'dataset.dta, replace
    restore
    *drop firms treated in year other than i
    sort IDNUM_ZAEHLER year
    by IDNUM_ZAEHLER (year): drop if treatnot`i'final==1
    *MATCH
    capture psmatch2 succession if year == `i' & pscore1 != .,out(`o') p(pscore1) neighbor(1) common caliper(.05) noreplacement
    capture replace year_dummy = 1 if _treated!=. & year == `i'
    capture replace ident = 1 if _weight != . & year == `i'
    capture replace treated = 1 if _treated == 1 & _support == 1 & year == `i'
    capture replace control = 1 if _treated == 0 & _support == 1 & year == `i'
    capture replace pscore = _pscore if year == `i'
    capture replace treated2 = _treated if year == `i'
    capture replace support = _support if year == `i'
    capture replace weight2 = _weight if year == `i'
    capture replace id_2 = _id if year == `i'
    capture replace n1 = _n1 if year == `i'
    capture replace nn = _nn if year == `i'
    capture replace pdif = _pdif if year == `i'
    qui count if succession == 1 & year == `i'
    di r(N) " treated firms exist in year = `i' "
    qui count if _treated == 1 & year == `i'
    di r(N) " treated firms are identified by the command in year = `i' "
    qui count if _treated == 1 & _support == 0 & year == `i'
    di r(N) " treated firms were off support in year = `i' "
    *drop variable treatnot i
    cap drop treatnot`i'final
    *append dataset containing firms treated in year other than i
    merge 1:1 IDNUM_ZAEHLER year using data/treatnot`i'dataset.dta
    drop _merge
    drop treatnot*final
    *DEALING WITH CONTROLS
    **drop firms that were used as controls in year i (so they can't be used again as controls in other years)
    *tag controls
    sort IDNUM_ZAEHLER year
    bysort IDNUM_ZAEHLER (year): gen control`i'=1 if _treated == 0 & _weight == 1 & year == `i'
    count if control`i'==1
    bysort IDNUM_ZAEHLER: carryforward control`i', gen(control`i'2)
    gsort IDNUM_ZAEHLER - year
    bysort IDNUM_ZAEHLER: carryforward control`i'2, gen(control`i'final)
    cap drop control`i' control`i'2
    xtsum control`i'final
    *problem now, as all control firms are dropped, we need to save them and add back in the end
    preserve
    sort IDNUM_ZAEHLER year
    by IDNUM_ZAEHLER (year): keep if control`i'final!=.
    if `i' == `start' {
    save data/controldataset.dta, replace
    }
    else {
    append using data/controldataset.dta
    }
    save data/controldataset.dta, replace
    restore
    *drop controls in i
    sort IDNUM_ZAEHLER year
    by IDNUM_ZAEHLER (year): drop if control`i'final!=.
    cap drop control`i'final
    }
    }
    *merge back controls
    merge 1:1 IDNUM_ZAEHLER year using data/controldataset.dta
    drop _merge
    drop control*final
    }


    After that I looked at the quality of the match (balancing properties and graph pscore density). I will not post this part here.

    As my last step I now want to run the difference-in-differences estimation using the matched sample given by the psmatch2 routine.
    For the estimation I want to regress my outcomes (= firm performance) on a dummy indicating succession (yes, no) and a dummy indicating the post-succession years (post = 1 in years after succession, 0 otherwise); the treatment effect is then the coefficient on the interaction of the succession and post variables. As further controls I include the firm characteristics I used in the logit regression when I calculated the pscores.

    In order to run this regression I first need to define the post variable for the matched control firms. For that I use the year of succession for treated firms to compute the counterfactual year also for the matched control group.

    * generate post_c with a fake succession event for control group
    gen post_c=1 if ident==1 & treated2==0 & weight2==1
    * post_c for all years after fake succession
    sort IDNUM_ZAEHLER year
    forvalues i = 1/15 {
    bysort IDNUM_ZAEHLER: replace post_c=1 if ident[_n-`i']==1 & treated2[_n-`i']==0 & weight2[_n-`i']==1
    }
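
    The same post indicator could also be built from the matching year directly, without looping over lags. This is only a sketch: match_year and post_all are hypothetical names, and it assumes weight2 is non-missing in exactly one year (the matching year) for each matched firm:

    ```stata
    * Sketch only: match_year and post_all are hypothetical variable names
    * year in which the firm entered the match (treated or control)
    bysort IDNUM_ZAEHLER: egen match_year = max(cond(weight2 < ., year, .))

    * post indicator: 1 from the matching year onwards, 0 before,
    * missing for firms that were never matched
    gen byte post_all = year >= match_year if match_year < .
    ```

    The >= mirrors the loop above, which also sets post_c to 1 in the matching year itself; use > instead if the matching year should count as pre-period. This version does not depend on the number of lags or on gaps in the unbalanced panel.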


    The next problem I encountered was that the weight2 variable is non-missing only in the year of succession, but all years of a matched firm should be included; otherwise I can't look at the development of performance after succession. So I created a variable that flags every observation of a matched firm.

    * extend weight variable to whole idnum instead of just one year
    sort IDNUM_ZAEHLER year
    cap drop inmatch
    bysort IDNUM_ZAEHLER (year): gen inmatch=1 if weight2 == 1
    count if inmatch==1
    cap drop inmatch2
    bysort IDNUM_ZAEHLER: carryforward inmatch, gen(inmatch2)
    gsort IDNUM_ZAEHLER - year
    cap drop inmatchfinal
    bysort IDNUM_ZAEHLER: carryforward inmatch2, gen(inmatchfinal)
    cap drop inmatch inmatch2
    xtsum inmatchfinal
    sort IDNUM_ZAEHLER year


    So now I can finally run my diff-in-diff estimation using the weights from psmatch2, which I extended to cover all years of each matched firm:

    I first run pooled OLS:
    * DiD treatment effect
    xi: reg outcome succession_yes##post firm_age firm_age_2 i.legal_form i.employment_size i.industry i.year [aw=inmatchfinal], cluster(IDNUM_ZAEHLER)
    estimates store didatt1`v'

    But to account for my panel data I actually want to run panel OLS using random effects.

    xi: xtreg outcome succession_yes##post firm_age firm_age_2 i.legal_form i.employment_size i.industry i.year if inmatchfinal!=., re rob
    estimates store didatt2`v'

    My problem here is that aweights are not allowed with xtreg, re.
    Since my weight with 1-to-1 matching is always 1, it should not matter, and I just run xtreg, re on all observations with a non-missing weight.
    But as robustness tests I run different matching algorithms (2NN, 5NN, radius and caliper). With those matching techniques the weights differ by firm and are smaller than 1. As far as I understand how the diff-in-diff should be run on the matched sample, I would have to use the weights in the xtreg, re regression as well. But weights are not allowed for the Stata command xtreg, re. I read that the population-averaged xtreg is supposed to be similar to xtreg, re. So I tried to run xtreg, pa with robust standard errors instead and include the weights as pweights. But this does not work either, because the weights are not constant within the panel.
    So how can I run a panel random-effects OLS regression (diff-in-diff) including the weights from matching?



    I hope my procedure and estimations are clear. Your help is greatly appreciated.

    I have the following questions:
    - Is the Stata code I use for the matching correct given my research question and data structure?
    - Is my understanding of the matching procedure and how I apply it to the diff-in-diff estimation later correct? To run the regression on the matched sample, is it enough to use the weights from psmatch2, or do I need to account in some other way for the pairs created by the match? The way it is now, I just run the regression on a smaller sample than the full sample, but I do not account for which controls are matched to which treated firms, correct? Or do the weights take care of that?
    - And especially important for matching algorithms other than 1NN: how can I run a panel OLS with xtreg, re including weights?


    Thank you in advance,
    Marina







  • #2
    I think your post is too long to get a good response. You can try splitting your questions into several readable pieces. I couldn't read the whole post but I can say that there is an underlying econometric reason why xtreg ..., re doesn't allow for weights. aweights, fweights, and pweights are allowed for the fixed-effects model. iweights, fweights, and pweights are allowed for the population-averaged model. iweights are allowed for the maximum-likelihood random-effects (MLE) model. Weights must be constant within panel. You can try looking at [U] 11.1.6 and [U] 20.23. You may need to write a new program based on iweights and designed specifically for your model to run a weighted estimation.

    help weight
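
    As an illustration of the point about weights, here is a hedged sketch of a fixed-effects version, which does accept pweights as long as they are constant within panel. It assumes a hypothetical variable w_firm that holds one matching weight per firm, repeated in every year of that firm; the other names are taken from the original post:

    ```stata
    * Sketch only: w_firm is a hypothetical firm-level weight, constant across a firm's years
    xtset IDNUM_ZAEHLER year

    * fixed-effects DiD with matching weights and firm-clustered standard errors;
    * the time-invariant main effect of succession_yes is absorbed by the firm effect
    * and will be reported as omitted, while the interaction remains identified
    xtreg outcome i.succession_yes##i.post i.year [pweight = w_firm], fe vce(cluster IDNUM_ZAEHLER)
    ```

    This is not the random-effects model asked about; it only shows a panel estimator that takes the weights once they are constant within firm.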



    • #3
      In addition to shortening the post, show all code and results between code delimiters, described in FAQ 12, which we ask you to read in its entirety.
      Steve Samuels
      Statistical Consulting

      Stata 14.2



      • #4
        Thank you for your hints. I am sorry for not posting my question correctly. This is the first time I posted here and I thought if I explain in detail what I do it would be easier to answer. So I will try again.
        I would like to pick up just the last part of my question.

        For my analysis I want to run a diff-in-diff estimation on a matched sample. The matching is done using propensity score matching with the command psmatch2.
        I have panel data. Yearly firm data. Treatment occurs in different years.

        Running a pooled OLS regression using the weights from psmatch2 looks like this in my case:

        Code:
        xi: reg outcome succession_yes##post firm_age firm_age_2 i.legal_form i.employment_size i.industry i.year [aw=inmatchfinal], cluster(IDNUM_ZAEHLER)
        But shouldn't I account for the panel structure of my data and rather use panel OLS?

        This is what I would like to run to account for the panel structure:

        Code:
        xi: xtreg outcome succession_yes##post firm_age firm_age_2 i.legal_form i.employment_size i.industry i.year [aw=inmatchfinal], re rob
        But this model is not possible: xtreg, re does not accept weights, and the xtreg estimators that do accept them require weights that are constant within panel. The weights from matching are not constant for the whole panel.

        So how can I run a diff-in-diff on the matched sample that uses weights from matching and accounts for the panel structure of the data?
        Or do scholars who test diff-in-diff models on a matched sample usually just run pooled OLS?

        Thank you.



        • #5
          Sorry for this late response. I know little about psmatch2 and nothing about how to use panel data in a DID. However, the units of "treatment" are firms, and therefore it seems to me that 1) you should study only firms that have both pre-treatment and post-treatment observations; 2) you should be matching firms in the pre-treatment period only; 3) the propensity score analysis should apply only to firms pre-treatment, and each firm retains its weight in the post-treatment period. Therefore each firm should be getting a constant weight. Why yours are not, I cannot tell. It is up to you to research DID and panel data. Have you tried a Google search? You can also ask on Statalist for examples.

          Best of luck




          Steve Samuels
          Statistical Consulting

          Stata 14.2



          • #6
            - Is my understanding of the matching procedure and how I apply it to the diff-in-diff estimation later correct? To run the regression on the matched sample, is it enough to use the weights from psmatch2, or do I need to account in some other way for the pairs created by the match? The way it is now, I just run the regression on a smaller sample than the full sample, but I do not account for which controls are matched to which treated firms, correct? Or do the weights take care of that?
            In your case, you are using the matching only to create a matched sample, and then you use xtreg to estimate the coefficient. The -xtreg- command (and its underlying econometric procedure) is not designed to compare just the matches. Since the coefficients are essentially mean effects, the effects should be comparable to those obtained from a "pure" matching analysis, at least to my understanding.

            Beyond that, I share Steve's remark that you should estimate the propensity score for the baseline observations only. However, I'm not so sure it is a good idea to exclude those observations that do not have valid information in the follow-up period, because this may be a source of bias if the panel attrition is systematic.
            After performing the matching only for the baseline period you can spread the estimated weights to your follow-up observations for each firm. This should yield constant weights for all firms.
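
            Spreading the weights could look roughly like this. It is a sketch under the assumption that weight2 (from the original post) is non-missing in only one year per firm; w_firm is a hypothetical name:

            ```stata
            * Sketch only: w_firm is a hypothetical variable name
            * copy the weight from the matching year to every year of the same firm
            bysort IDNUM_ZAEHLER: egen double w_firm = max(weight2)

            * sanity check: the weight is now constant within firm
            bysort IDNUM_ZAEHLER (w_firm): assert w_firm[1] == w_firm[_N]
            ```

            Firms that never entered the match keep a missing w_firm and drop out of any weighted estimation automatically.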



            • #7
              Marina:
              smallprint: please note that -xi- is redundant with the most recent Stata releases.
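
              For example, the pooled regression from the original post can be written without the prefix, relying on factor-variable notation alone (same variables and weight as posted; nothing else is changed):

              ```stata
              * same model as in the original post, without the xi: prefix
              regress outcome i.succession_yes##i.post firm_age firm_age_2 ///
                  i.legal_form i.employment_size i.industry i.year         ///
                  [aweight = inmatchfinal], vce(cluster IDNUM_ZAEHLER)
              ```

              The /// line continuation works in a do-file; in the Command window the command has to go on one line.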
              Kind regards,
              Carlo
              (Stata 19.0)



              • #8
                I don't think you should be using PS weights at all after PS matching. Stata doesn't do it in any of its -teffects- weighting examples. Peter Austin (2011) makes no reference to this strategy. I found one panel study of DID following PS matching (Allen & Allnutt, 2013): no weighting. Ferraro and Miranda (2014) also don't mention it.

                Wooldridge (2012, p. 98) takes a different view of first-period matching in panel studies:
                ∙ Sometimes see PS matching done based only on first-period variables. Avoid this. Much less convincing than a fixed effects analysis – which allows unobserved time-constant covariates – or a sequential analysis that allows the propensity score to depend on the recent past.
                I'm way out of my areas of expertise here; so I'm going to stop trying to give you substantive advice. If your analysis is for a thesis or dissertation, you can always mention that there are disagreements about methodology. You are better off replicating a published design/analysis than trying to come up with one of your own.


                References:

                Austin, P. C. (2011). An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behavioral Research, 46(3): 399–424.
                http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3144483/

                Allen, Rebecca, and Jay Allnutt (2013). Matched panel data estimates of the impact of Teach First on school and departmental performance. DoQSS Working Paper No. 13-11, September 2013.
                http://repec.ioe.ac.uk/REPEc/pdf/qsswp1311.pdf

                Ferraro, Paul J., and Juan Jose Miranda (2014). Panel data designs and estimators as alternatives for randomized controlled trials in the evaluation of social programs. Working Paper.
                http://www2.gsu.edu/~wwwcec/docs/Fer...Rep%20POST.pdf

                Wooldridge, Jeffrey (2012). Treatment effect estimation with unconfounded assignment. Michigan State University. FARS Workshop, Chicago, January 6, 2012.
                http://www.ruf.rice.edu/~kr10/slides_fars_2012.pdf
                Last edited by Steve Samuels; 02 Jun 2016, 08:34.
                Steve Samuels
                Statistical Consulting

                Stata 14.2



                • #9
                  Steve:

                  How do you then conduct a matched DiD without weights? Doesn't that assume matching without replacement, i.e. each control is used only once, which could in turn lead to bias due to a lack of common support?



                  • #10
                    Steve Samuels

                    Hi

                    I would greatly appreciate it if you could let me know how I can get access to the dataset and do-file used in the following material that you referred to: http://www.ruf.rice.edu/~kr10/slides_fars_2012.pdf

                    Thanks in advance.
                    Best regards,
