  • Propensity score matching

    Dear Stata Users,
    I am using PSM to create a control sample for my main sample. My main group is 100 firms, and I want to create a control group of 100 firms, starting from a much larger list of firms. That way I would have 100 firms in the main group and 100 firms with similar characteristics in the control group. I am working with panel data. I have worked as follows:

    1.
    logistic Main_Control ln_emp DebtEq_100_ BoardSize BoardIndep BoardGD CEO_sep CEOcomp SustComm CapexSales, iterate (100)
    predict double pscore if e(sample), pr
    2. *To create consistency across sector year:
    egen Sector_year = group ( SectorCode Period )
    br Code1 Period SectorCode Sector_year
    3.
    gen double pscore2= Sector_year*1000+pscore if pscore!=.
    4.
    gmatch Main_Control pscore2, cal (0.008)
    The procedure works. Unfortunately, I noticed that within the panel each firm is matched with a different benchmark depending on the year. Instead, I want just 100 similar firms as the benchmark for the 100 firms in my main group. I do not want the benchmark to change across the time dimension of the panel. Is there a way to obtain this?

    Any help would be really appreciated,
    Best,
    nr

  • #2
    -gmatch- is not an official Stata command, and I am not familiar with it. So I will speak in general terms about propensity score matching. There may be specific aspects of using -gmatch- that facilitate or make things harder, I wouldn't know. I'll leave it to you to figure out how to use -gmatch- in this general context (or perhaps you will choose some other method of doing the matching.)

    Propensity score matching in panel data is a complicated problem. The specific calculations you did led you to get different matches for the same case firms in different years. Let me start by saying that this is not entirely unusable. It is possible to analyze the data in this way, although it at least partly defeats matching's goal of reducing variance. So I don't recommend it.

    Before you start writing code, you need to think carefully about what you want to base your matching score on. (I'm using the term matching score generically here--my remarks apply to propensity scores as much as they do to other forms of matching such as Mahalanobis distance, or caliper matching, or even exact matching.) The point of a matching score is to identify for each case a control firm that is as similar as possible to the case in all relevant ways. The word relevant here is doing a lot of work, and requires a lot of thought. In a single cross-section you have many variables to consider as a start. In panel data it is even more complicated because you have all of these variables available at multiple time periods.

    Sometimes there is a particular time period that you can single out as a baseline period. It might be the first year that a panel appears in the data set, or the final year before some important event occurs in the life of the panel. In this situation, one approach is to reduce the case data set to just one observation per case, the one in its baseline period. Then you can do your matching, but restricting potential control observations for any case to those for the same period. If the definition of the baseline period is applicable to the controls as well as the cases, then you might restrict potential control matches to those controls having the same baseline period as the case.
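
    As a minimal sketch of the baseline-period reduction, assuming the baseline is simply the first period in which each firm appears: the toy data below are made up for illustration, the variable names Code1, Period and Main_Control are borrowed from the original post, and BaselinePeriod is a hypothetical name.

    ```stata
    * Hypothetical toy panel: 6 firms, periods 2000-2002
    clear
    set obs 18
    gen long Code1        = ceil(_n/3)
    gen int  Period       = 2000 + mod(_n - 1, 3)
    gen byte Main_Control = Code1 <= 3

    * Make the panel unbalanced: firms 5 and 6 first appear in 2001
    drop if Code1 >= 5 & Period == 2000

    * Assumption: baseline = first period in which the firm appears
    bysort Code1 (Period): keep if _n == 1
    rename Period BaselinePeriod

    * Verify that exactly one observation per firm remains
    isid Code1
    list Code1 Main_Control BaselinePeriod, noobs
    ```

    With one row per firm, the matching can then be run separately within each value of BaselinePeriod, so that a case is only ever paired with a control that has the same baseline period.
    
    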

    Of course, this approach discards potentially useful information from other times for the controls and cases. So you could take the alternative approach of -reshape-ing the data set to wide layout and using all of the periodic values of the matching variables to calculate your score. Some caution needs to be used with this approach, however. If you are studying the effect of a change in some particular variable (like a non-randomized intervention) you generally should not include data that is subsequent to the intervention in the matching process, as you might inadvertently be matching on a mediating variable that way. So one might have to restrict this approach to use only sufficiently early periods. It might prove difficult or impossible to set this up, however, because you need non-missing values on the variables for all of the periods in all the observations--but the times of observation for different panels may not pair up that well across different panels, especially in unbalanced data.
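
    A rough sketch of the wide-layout alternative, under the assumption (made up for this example) that the intervention happens after period 2, so only periods 1 and 2 enter the score. The data are simulated, and only ln_emp and BoardSize from the original variable list are used to keep it short.

    ```stata
    * Hypothetical toy panel: 100 firms x 3 periods, firms 1-30 are the main group
    clear
    set seed 12345
    set obs 300
    gen long   Code1        = ceil(_n/3)
    gen int    Period       = 1 + mod(_n - 1, 3)
    gen byte   Main_Control = Code1 <= 30
    gen double ln_emp       = rnormal()
    gen double BoardSize    = 5 + 10*runiform()

    * Assumption: the intervention occurs after period 2, so drop later data
    keep if Period <= 2

    * One row per firm, one column per variable and period
    reshape wide ln_emp BoardSize, i(Code1) j(Period)

    * Score based on every pre-intervention value of the matching variables
    logit Main_Control ln_emp1 ln_emp2 BoardSize1 BoardSize2
    predict double pscore if e(sample), pr
    ```

    In real, unbalanced data, a firm with no observation in one of the periods gets missing values in the corresponding wide variables and drops out of the estimation sample, which is the practical difficulty described above.
    
    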

    In any case, you need to decide on an appropriate set of variables and one or more time periods, and reduce your data set to one observation per panel containing those variables as measured at that (those) time periods. Then do your matching on that. Once you have identified a matched control for each case (or as many as possible) then you can merge the original panel data into that to do the matched analysis.
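
    The workflow just described might be sketched as follows. This is only an illustration on simulated data: it does not use -gmatch-, but substitutes a hand-written greedy 1:1 nearest-neighbour match without replacement, takes period 1 as the baseline, and introduces match_id and pair_id as hypothetical variable names.

    ```stata
    * Hypothetical toy panel: 60 firms x 3 periods, firms 1-15 are the main group
    clear
    set seed 2024
    set obs 180
    gen long   Code1        = ceil(_n/3)
    gen int    Period       = 1 + mod(_n - 1, 3)
    gen byte   Main_Control = Code1 <= 15
    gen double ln_emp       = rnormal()
    tempfile panel
    save `panel'

    * One observation per firm (assumption: period 1 is the baseline)
    keep if Period == 1
    logit Main_Control ln_emp
    predict double pscore if e(sample), pr

    * Greedy 1:1 nearest-neighbour match on pscore, without replacement
    gen long match_id = .
    gen byte used     = 0
    gen long obsno    = _n
    forvalues i = 1/`=_N' {
        if Main_Control[`i'] == 1 {
            quietly gen double dist = abs(pscore - pscore[`i']) if Main_Control == 0 & !used
            quietly summarize dist, meanonly
            if r(N) > 0 {
                scalar dmin = r(min)
                quietly summarize obsno if dist == scalar(dmin), meanonly
                local j = r(min)
                quietly replace match_id = Code1[`j'] in `i'
                quietly replace used = 1 in `j'
            }
            drop dist
        }
    }

    * Build a firm-level file: each main firm and its control share a pair_id
    keep if Main_Control == 1 & !missing(match_id)
    keep Code1 match_id
    gen long pair_id = _n
    rename (Code1 match_id) (firm1 firm2)
    reshape long firm, i(pair_id) j(role)
    rename firm Code1
    keep Code1 pair_id
    tempfile pairs
    save `pairs'

    * Bring the full panel back in, keeping only matched firms
    use `panel', clear
    merge m:1 Code1 using `pairs', keep(match) nogenerate
    ```

    Because the pairing is done once at the firm level, each main firm keeps the same control in every period of the merged panel. A caliper could be added by skipping a match whenever the smallest distance exceeds some threshold.
    
    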

    Hope this helps.


    • #3
      Thank you very much Clyde for your accurate observations
