  • Propensity Score Matching with Cross Sectional Data

    Hi,
    Let me start with a simple summary of my understanding of matching with Panel data:
• Usually when we use propensity score matching, we have an event that happens at time T and we want to estimate the effect of that event (a new law, a training program, or whatever) on the participants. To do so, we compare the people who receive the treatment (the treatment group) with those who do not (the control group), both before and after the event. The difference between the change in the treated group and the change in the control group gives a good estimate of the effect of the event/treatment. However, due to selection bias this estimate might be incorrect. So, in order to better compare the treated group with the control group, we match the observations in each group based on the probability of participating given some PRE-treatment characteristics. This works better with panel data.
    In my case I have pooled cross-sectional data, and the treatment condition is being a US firm (treatment group), while the control group consists of non-US firms. In order to look at the effect of my X on a firm's Y, I need to find firms that are otherwise identical. Someone suggested using propensity score matching (where being a US firm is the treatment). However, since I don't have panel data and can't observe firms before and after a "becoming a US firm" event, I don't understand how matching would be any different from a regression where I control for some firm-specific characteristics.

    Can anyone help?
    Thank you

    Marco


  • #2
    • "Usually when we use propensity score matching we have an event that happens at time T and we want to estimate the effect of that event ... This works better with panel data."
    It seems to me that you are confusing two different methodologies that are often used together, but are distinct and can each be used separately.

    The comparison of two groups both before and after an event is difference-in-differences estimation (DID). The use of DID itself is intended to reduce the bias that would result from a simple comparison of a treated group and concurrent controls not randomly assigned. The ability of DID estimation to reduce bias is best when the pre-event differences are small. A separate matter altogether is propensity score matching. Propensity score matching is a particular way of forming matched pairs, in which one matches on an overall score rather than jointly on several traits. It is commonly used along with DID estimation, although it can also be used in other contexts, and DID estimation can certainly be used without propensity score (or any other) matching.
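To make the distinction concrete, here is a minimal sketch (in Python, with made-up numbers) of what the DID estimator computes: simply the difference between the two before/after changes.

```python
# Difference-in-differences on hypothetical group means.
# All numbers below are invented for illustration only.

def did(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """DID = (change in treated group) - (change in control group)."""
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)

# Suppose the treated group's mean outcome goes 10 -> 18
# while the controls go 9 -> 12.
effect = did(10.0, 18.0, 9.0, 12.0)
print(effect)  # 5.0
```

The control group's change (3) serves as the counterfactual trend; subtracting it from the treated group's change (8) is what removes time-constant group differences.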

    The data you describe is not suitable for DID estimation because, as you note, there is no before-event condition for the "treated" group. So you simply have to do a direct contrast between outcomes in the treated entities and outcomes in the control entities. But you can apply matching techniques to this, and it will reduce bias if done well. Propensity score matching is one way to do that--though I would encourage you to read the new paper at http://gking.harvard.edu/publication...ed-Formatching, which provides some (to me) convincing evidence that propensity score matching is one of the least effective ways to do this.
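As a deliberately minimal sketch of the matching step (Python, invented numbers): greedy 1:1 nearest-neighbour matching on a propensity score that is assumed to have been estimated already, e.g. from a logit of the US-firm indicator on pre-treatment firm characteristics, followed by a direct contrast of the matched outcomes.

```python
# Greedy 1:1 nearest-neighbour matching on a propensity score, then a
# direct contrast of matched outcomes.  The scores are assumed to have
# been estimated beforehand (e.g. a logit of "US firm" on pre-treatment
# characteristics); every number below is invented for illustration.

def match_and_contrast(treated, controls):
    """treated, controls: lists of (propensity_score, outcome) pairs.
    Matches each treated unit to the closest unmatched control and
    returns the average treated-minus-control outcome difference."""
    pool = list(controls)        # controls still available for matching
    diffs = []
    for score, outcome in treated:
        # index of the unmatched control with the nearest score
        j = min(range(len(pool)), key=lambda k: abs(pool[k][0] - score))
        diffs.append(outcome - pool.pop(j)[1])
    return sum(diffs) / len(diffs)

treated  = [(0.80, 5.0), (0.60, 4.0)]
controls = [(0.79, 3.0), (0.61, 3.5), (0.20, 1.0)]
print(match_and_contrast(treated, controls))  # 1.25
```

Note that the third control (score 0.20) is never used: that is the "discarded information" problem with matching that comes up later in this thread.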



    • #3
      Thank you for your response Dr. Schechter. In the first part of my question I was referring to DID as you correctly pointed out. I looked at the paper that you suggested and it seems very interesting. I still have a couple of questions regarding your comment:
      1. "So you simply have to do a direct contrast between outcomes in the treated entities and outcomes in the control entities". I should use PSM (or MDM, according to the paper) with the variables that could characterize US firms and non-US firms (like P/E ratio, total assets, or whatever the literature suggests). This will match companies that are identical except for one characteristic: being a US firm or not. After I have matched identical firms, I compare the outcomes. Is that what you meant?
      2. How is matching with cross-sectional data better than a simple OLS that controls for the variables I would have used for the matching?
      Thank you so much for your help.

      Marco



      • #4
        With regard to question 1, at that point in the post I was referring to an unmatched comparison. You know, something like -reg outcome i.treat_vs_control-, a non-DID analysis. But one way of reducing the potential bias in that analysis is to first create matched pairs and then do a matched-pairs contrast. This both reduces omitted-variable (confounding) bias and reduces outcome variance, so you have a more powerful test.
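To illustrate the unmatched comparison: with no covariates, the coefficient on the treatment dummy in that regression is just the difference in group mean outcomes. A tiny sketch with made-up numbers:

```python
# Unmatched comparison: with no covariates, the regression coefficient
# on the treatment dummy equals the treated-minus-control difference
# in mean outcomes.  Numbers are invented for illustration.

def unmatched_contrast(treated_y, control_y):
    """Difference in group means (what -reg outcome i.treat- estimates)."""
    return sum(treated_y) / len(treated_y) - sum(control_y) / len(control_y)

print(unmatched_contrast([5.0, 4.0, 6.0], [3.0, 3.5, 2.5]))  # 2.0
```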

        As between that and simply including covariates in the analysis to adjust for their effects, matching is the more powerful way to reduce bias and variance, and has the advantage of not being as model dependent. The problem with matching, though, is that it is often difficult or impossible to find good matches for each case, so you end up having to exclude cases or controls that don't get matched. This discards information, and perhaps introduces bias if matchability itself is informative about the outcome. If you can get a good match without excluding lots of data, that is preferable to just adding covariates to the model. In the real world, though, good matching often proves unfeasible, and adding covariates is the fallback position one often ends up adopting.
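For completeness, a toy sketch (Python, hypothetical numbers) of the covariate-adjustment fallback: OLS of the outcome on a treatment dummy plus one covariate, solved directly from the normal equations. The data are generated to be exactly linear with a true treatment effect of 3, so the regression recovers it.

```python
# Covariate-adjustment fallback: OLS of y on a constant, a treatment
# dummy t, and one covariate x, solved from the 3x3 normal equations
# via Cramer's rule.  The data below are hypothetical and exactly
# linear, so the recovered treatment coefficient is the true value, 3.

def ols_treat_coef(t, x, y):
    """Coefficient on t in the regression y ~ const + t + x."""
    n = len(y)
    St, Sx, Sy = sum(t), sum(x), sum(y)
    Stt = sum(a * a for a in t)
    Sxx = sum(a * a for a in x)
    Stx = sum(a * b for a, b in zip(t, x))
    Sty = sum(a * b for a, b in zip(t, y))
    Sxy = sum(a * b for a, b in zip(x, y))
    A = [[n,  St,  Sx],   # X'X for columns [1, t, x]
         [St, Stt, Stx],
         [Sx, Stx, Sxx]]
    c = [Sy, Sty, Sxy]    # X'y

    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
              - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
              + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    At = [row[:] for row in A]
    for i in range(3):
        At[i][1] = c[i]   # replace the t column with X'y (Cramer's rule)
    return det3(At) / det3(A)

t = [0, 0, 0, 1, 1, 1]
x = [1.0, 2.0, 3.0, 1.5, 2.5, 3.5]
y = [2 + 3 * a + 0.5 * b for a, b in zip(t, x)]   # true t-effect = 3
print(round(ols_treat_coef(t, x, y), 6))  # 3.0
```

Unlike matching, this uses every observation, but the answer now depends on the linear model being (approximately) right, which is the model dependence mentioned above.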
