Clarification on Propensity Score Matching

Francisco Leal Augusto

Join Date: Jul 2023

Posts: 7
#1

16 May 2025, 08:27

Dear Statalisters,

I am currently working on a project with a panel of units that were affected by an event in 2019, and I want to evaluate the effect of that treatment in 2019, 2020, and 2021. I observe these units over the 2017-2022 window.

My initial approach was Diff-in-Diff, from which I concluded that the treatment had an effect in 2019 but not in subsequent years.

When presenting this work, I received a comment that it might be useful to try a Propensity Score Matching approach. This is the first time I am using this technique, for which I am considering psmatch2 in Stata 18, and I have some doubts on which I believe your expertise would be of great help.

The event that I am studying is rare by definition, which leaves me with a control group much larger than the treatment group: I have a large number of units under study (around 8000), but only around 1% received treatment. To do the matching, I estimate the propensity score from the units' characteristics in 2018, with the treatment referring to 2019. I then took the matched units back to the panel data Diff-in-Diff setting to check the new results.

My first question is related to the regression step in psmatch2: I am running a logistic regression so that it identifies units in the control group similar to the ones in the treatment group. Still, of the covariates I have available, few are statistically significant, which to me would be a sign that something is wrong. However, moving on with the procedure, pstest shows that the means of those covariates are in fact similar across the two groups. Does this mean that I am matching correctly under a poorly specified model? Does this invalidate the results?

My second question relates to the matching criteria: given the small number of units in the treatment group, I estimated the impact of the treatment with several numbers of nearest neighbors: 1, 5, and 10. The results approach those of the original Diff-in-Diff as I increase the number of neighbors, which makes sense to me, as the matched sample moves closer to the original one. Still, I am not sure how to identify the best number of neighbors, or whether this is the best procedure in this case (e.g., would kernel matching be more suitable?).
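
For reference, the workflow described above might be sketched roughly as follows. This is only an illustration on simulated data: the variable names (treat, x1, x2, y), the coefficients, and the data-generating process are all hypothetical stand-ins for the 2018 cross-section, not the actual data. psmatch2 and pstest are community-contributed commands (ssc install psmatch2).

```stata
* Simulated stand-in for the 2018 cross-section (all names hypothetical)
clear
set seed 12345
set obs 8000
generate x1 = rnormal()
generate x2 = rnormal()
* rare treatment: on the order of 1% of units treated
generate treat = runiform() < invlogit(-5 + 0.8*x1 + 0.5*x2)
generate y = 1 + 0.5*x1 + 0.3*x2 + 0.2*treat + rnormal()

* ATT with 1, 5 and 10 nearest neighbors on a logit propensity score
foreach k of numlist 1 5 10 {
    psmatch2 treat x1 x2, outcome(y) logit neighbor(`k') common
    * covariate balance before and after matching
    pstest x1 x2, both
}
```

Comparing the pstest output across the three runs shows how balance changes as more neighbors are used for each treated unit.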

I thank you for your time and any help regarding these questions. I am available for any additional information you might need.

As this is not strictly a Stata technical question, please tell me if you think I should delete this thread.
Tags: panel data, propensity score matching
Felix Bittmann

Join Date: Aug 2018

Posts: 752
#2

16 May 2025, 09:09

I suggest you use kmatch, a more modern implementation that uses kernel matching. This should resolve question 2. There is no general rule for how many matches you should choose.
Regarding question 1, if your variables do not explain group assignment, the entire PSM approach might not be very useful or add much to your DiD approach. Having a larger control group is, in general, not a problem. For further robustness checks, kmatch also offers alternatives, such as entropy balancing.
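
A minimal sketch of what this could look like, on simulated data. The variable names (treat, x1, x2, y) and the data-generating process are hypothetical and only stand in for a pre-treatment cross-section; kmatch is community-contributed (ssc install kmatch) and, as far as I know, also needs moremata and kdens installed.

```stata
* Simulated stand-in data (all names and coefficients hypothetical)
clear
set seed 12345
set obs 8000
generate x1 = rnormal()
generate x2 = rnormal()
generate treat = runiform() < invlogit(-5 + 0.8*x1 + 0.5*x2)
generate y = 1 + 0.5*x1 + 0.3*x2 + 0.2*treat + rnormal()

* kernel matching on the propensity score, effect on the treated
kmatch ps treat x1 x2 (y), att
* balance table for the raw and the matched sample
kmatch summarize

* entropy balancing as a robustness check
kmatch eb treat x1 x2 (y), att
```

With kernel matching the bandwidth takes the place of the number of neighbors, so the 1/5/10 comparison is no longer needed.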

Best wishes

Stata 18.0 MP | ORCID | Google Scholar
Francisco Leal Augusto

Join Date: Jul 2023

Posts: 7
#3

21 May 2025, 04:49

Dear Felix,

Thank you very much for your answer! I will follow your suggestions and evaluate the results.

Best, Francisco