Using Propensity Score Matching (PSM) to identify comparable firms in a specific year

Nicola Carta

Join Date: Aug 2021
Posts: 13

17 Aug 2021, 03:46

Dear Statalist,
I would like to use the nearest-neighbour propensity score matching (PSM) method to find, for each company that received Venture Capital (VC) investment in a certain year, a group of non-VC-backed companies (i.e. 10 control group companies for each sample company) that had the most similar probability of receiving capital resources from a venture capitalist.
I am not sure about the methodology I am following and would like feedback from experts. Before going into detail, I will briefly explain each variable in the dataset.

-------------------------------------------------------------------------------------------------------------------------------------------------------------
VARIABLES:
-> treatment: dummy equal to 1 if the company is in the treatment group and 0 if the company belongs to the set of potential comparable firms. The dataset contains roughly 200 treated firms and 50,000 potential comparable firms.
-> id: identifier of the company
-> year
-> T: timeline variable equal to 0 in the event year (the year in which a company in the treatment group received the VC investment)
-> Industry: industry of the company
-> GeographicalArea: geographical area in which the company is located
-> ln_Firm_age: logarithm of firm age
-> Intangible_ratio: total intangibles over total assets
-> ln_Total_assets: logarithm of total assets
-> ln_Revenues: logarithm of revenues
-> Revenues_growth: growth rate of revenues
-> profitloss: profit or loss of the company
-> Employees: number of employees

At the end of this post, I also include a very small extract from my dataset.
-------------------------------------------------------------------------------------------------------------------------------------------------------------

The procedure I used to identify comparable firms is the following:

1. I estimated the following model on my treatment group only (treatment==1). This step aims to find a model able to estimate the probability of receiving VC support from some proxy variables (Employees ln_Revenues Intangible_ratio ln_Firm_age), based on what actually happened.

Code:

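* Flag the event year (T==0) of each treated firm; all other firm-years stay 0
gen VCsupport=0
replace VCsupport=1 if T==0
* Logit of the event-year flag on the proxy variables, treated firms only
logit VCsupport Employees ln_Revenues Intangible_ratio ln_Firm_age if treatment==1
* Classification table (default cutoff 0.5)
estat classification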

The model runs, although the post-estimation classification table shows that it correctly predicts only 25% of the support cases.

2. I computed the probability of receiving a VC-investment for all the companies in my dataset (treatment=1 and treatment=0) using the coefficients estimated in the previous step

Code:

gen ProbabilityOfSupport=-0.1426623*Employees-0.0245476*ln_Revenues+0.0089229*Intangible_ratio-1.026105*ln_Firm_age
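
Note that the expression above contains no constant term and no logistic transformation, so it yields a linear index rather than a probability between 0 and 1. A minimal sketch of an alternative, assuming the logit from step 1 is still the active estimation result: -predict- scores every observation with non-missing covariates, treated or not. The names ProbabilityOfSupport2, xb_hat and p_by_hand are hypothetical and only serve the illustration.

```stata
* Run directly after the logit in step 1 (its results must still be in memory)
* -predict- is not restricted to the estimation sample unless told so
predict double ProbabilityOfSupport2, pr

* Equivalent by hand: linear index including _cons, then the logistic transform
predict double xb_hat, xb
generate double p_by_hand = invlogit(xb_hat)

* The two versions should agree up to rounding
summarize ProbabilityOfSupport2 p_by_hand
```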

3. I performed the nearest-neighbour propensity score matching procedure, using:
-> as dependent variable: the probability estimated in the previous step, i.e. ProbabilityOfSupport
-> as treatment variable: treatment
-> as matching variables: ln_Firm_age ln_Revenues Revenues_growth ln_Total_assets i.Industry i.GeographicalArea

Code:

teffects nnmatch (ProbabilityOfSupport ln_Firm_age ln_Revenues Revenues_growth ln_Total_assets i.Industry i.GeographicalArea) (treatment), nneighbor(10) gen(match) dmvariables
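
For comparison, -teffects nnmatch- matches on the covariates themselves (Mahalanobis distance by default), whereas -teffects psmatch- estimates a propensity score from a model of the treatment and matches on that score. A minimal sketch of the latter, under the loud assumption that profitloss serves as the outcome (the command requires one, and this choice is purely illustrative) and with psm_match as a hypothetical stub name. It does not yet restrict matching to the event year.

```stata
* Sketch only, to be run from a do-file (/// continues a line)
* ASSUMPTION: profitloss as the outcome is illustrative, not taken from the procedure above
* The propensity score comes from a logit of treatment on the covariates
teffects psmatch (profitloss) ///
    (treatment ln_Firm_age ln_Revenues Revenues_growth ln_Total_assets ///
     i.Industry i.GeographicalArea, logit), ///
    atet nneighbor(10) generate(psm_match)
```

The generated variables psm_match1, psm_match2, ... hold the observation numbers of each firm's nearest neighbours, which is the information needed to list the comparable firms.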

In conclusion, I want to use PSM only to find comparable firms, but I am not sure about the procedure above, since this is the first time I have used it. In addition, the final result shows that some companies in the treatment group are not matched to any firm, despite the large number of carefully pre-selected potential comparable firms.
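
One candidate explanation, offered as a guess rather than a diagnosis, is missing covariates: in the extract below, firm 2 has a missing Revenues_growth in its event year (T = 0), and observations with a missing value in any model variable are normally excluded from estimation. A quick check could look like this, where n_missing is a hypothetical helper variable:

```stata
* Count missing values across the matching covariates, row by row
egen n_missing = rowmiss(ln_Firm_age ln_Revenues Revenues_growth ln_Total_assets Industry GeographicalArea)

* How many treated and control observations have complete covariates?
tabulate treatment if n_missing == 0

* Treated firms whose event-year observation has at least one missing covariate
list id year n_missing if treatment == 1 & T == 0 & n_missing > 0
```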

I would absolutely appreciate any kind of feedback or suggestions for improvement. Thanks in advance for your help and patience.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(treatment id) int year byte T long(Industry GeographicalArea) double ln_Firm_age float(Intangible_ratio ln_Total_assets ln_Revenues Revenues_growth) double profitloss int Employees
1 1 2008 -2  3 19  3.091042453358316  .007965114  8.077967  7.384061          .    73.156 15
1 1 2009 -1  3 19 3.1354942159291497  .015642287  8.234834  7.423266  .04000895    12.846 13
1 1 2010  0  3 19 3.1780538303479458   .02827282  8.415235  7.350791 -.06995244    11.849 15
1 1 2011  1  3 19 3.2188758248682006  .021038927  8.599017  7.630339   .3227375    10.533 14
1 1 2012  2  3 19  3.258096538021482           .         .         .          .         .  .
1 1 2013  3  3 19  3.295836866004329           .         .         .          .         .  .
1 2 2007 -2 10 15 1.3862943611198906           .         .         .          .         .  .
1 2 2008 -1 10 15 1.6094379124341003   .27431533  7.874802         0          .  -321.237  .
1 2 2009  0 10 15  1.791759469228055   .26194736  8.569549         0          .   -438.63 11
1 2 2010  1 10 15 1.9459101490553132     .187449  8.315349         0          . -1770.359 15
1 2 2011  2 10 15 2.0794415416798357           .         .         .          .         .  .
1 2 2012  3 10 15 2.1972245773362196           .         .         .          .         .  .
0 3 2009  .  8  1  .6931471805599453           .         .         .          .         .  .
0 3 2010  .  8  1 1.0986122886681098    .4749049  8.047207  8.943937          .  -205.595  .
0 3 2011  .  8  1 1.3862943611198906    .4340765  8.078743  8.798356 -.13549794   -44.256  .
0 3 2012  .  8  1 1.6094379124341003    .3699694  8.196082  8.880292  .08539934   -446.88 23
0 3 2013  .  8  1  1.791759469228055           0  6.953561 2.8003254  -.9978505   -26.622  0
0 3 2014  .  8  1 1.9459101490553132 .0041904794  7.023394 3.5101414  1.1005177  -142.908  0
0 4 2007  .  1 15                  .           .         .         .          .         .  .
0 4 2008  .  1 15                  0   .08861052  3.571418         0          .     9.566  0
0 4 2009  .  1 15  .6931471805599453    .6654887  3.834602         0          .     -8.97  0
0 4 2010  .  1 15 1.0986122886681098    .5810004  4.150646         0          .   -14.747  0
0 4 2011  .  1 15 1.3862943611198906    .5515633  3.913921  1.252763          .   -15.963  0
0 4 2012  .  1 15 1.6094379124341003   .24565923 4.2385316 1.0986123        -.2     8.916  0
0 5 2012  .  1 15 1.0986122886681098    .3649698   5.02932  5.088633          .     1.534  0
0 5 2013  .  1 15 1.3862943611198906    .3247113  5.509064  3.979308   -.674377   -59.257  0
0 5 2014  .  1 15 1.6094379124341003    .4130051   5.22494         0         -1  -168.915  0
0 5 2015  .  1 15  1.791759469228055           .         .         .          .         .  .
0 5 2016  .  1 15 1.9459101490553132           .         .         .          .         .  .
0 5 2017  .  1 15 2.0794415416798357           .         .         .          .         .  .
end
label values Industry Industry
label def Industry 1 "C", modify
label def Industry 3 "F", modify
label def Industry 8 "K", modify
label def Industry 10 "M", modify
label values GeographicalArea NUTS2
label def NUTS2 1 "ITC1", modify
label def NUTS2 15 "ITH4", modify
label def NUTS2 19 "ITI3", modify
