Dear Statalist,
I would like to use the nearest-neighbour propensity score matching (PSM) method to find, for each company that received Venture Capital (VC) investment in a certain year, a group of non-VC-backed companies (i.e. 10 control group companies for each sample company) that had the most similar probability of receiving capital resources from a venture capitalist.
The problem is that I am not sure about the methodology I am following, and I would like feedback from experts. Before going into detail, I will briefly explain each variable in the dataset.
-------------------------------------------------------------------------------------------------------------------------------------------------------------
VARIABLES:
-> treatment: dummy equal to 1 if the company is in the treatment group, and equal to 0 if the company belongs to the set of potential comparable firms. The dataset contains roughly 200 treated firms and 50,000 potential comparable firms.
-> id: identifier of the company
-> year
-> T: timeline variable equal to 0 in the event-year (year in which the company in the treatment group received the investment by the VC)
-> Industry: industry of the company
-> GeographicalArea: geographical area in which the company is located
-> ln_Firm_age: logarithm of firm age
-> Intangible_ratio: total intangibles over total assets
-> ln_Total_assets: logarithm of total assets
-> ln_Revenues: logarithm of revenues
-> Revenues_growth: growth rate of revenues
-> profitloss: profit or loss of the company
-> Employees: number of employees
In the last part of this post, I also include a very small extract from my very large dataset.
-------------------------------------------------------------------------------------------------------------------------------------------------------------
The procedure I used to identify comparable firms is the following:
1. I estimated a logit model (the code is reported further down in this post) only on my treatment group (treatment=1). This step aims to find a model able to estimate the probability of receiving VC support from some proxy variables (Employees ln_Revenues Intangible_ratio ln_Firm_age), according to what actually happened.
The model runs, although the post-estimation classification table (estat classification) shows that it correctly predicts only 25% of the VC-support observations.
2. I computed the probability of receiving a VC-investment for all the companies in my dataset (treatment=1 and treatment=0) using the coefficients estimated in the previous step
3. I performed the nearest-neighbour propensity score matching procedure, using:
-> as dependent variable: the probability estimated in the previous step, i.e. ProbabilityOfSupport
-> as treatment variable: treatment
-> as matching variables: ln_Firm_age ln_Revenues Revenues_growth ln_Total_assets i.Industry i.GeographicalArea
In conclusion, I want to use PSM only to find comparable firms, but I am not sure about the procedure above since this is the first time I am using it. In addition, the final result shows that some companies in the treatment group are not matched to any firm, despite the large number of potential comparable firms, which had been carefully pre-selected.
I would greatly appreciate any feedback or suggestions for improvement. Thanks in advance for your help and patience.
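Code for step 1 (logit estimated on the treatment group only):
* VCsupport = 1 in the event year (T == 0), 0 otherwise
gen VCsupport = 0
replace VCsupport = 1 if T == 0
* logit restricted to the treated firms
logit VCsupport Employees ln_Revenues Intangible_ratio ln_Firm_age if treatment == 1
* classification table for the fitted model
estat classification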
Code for step 2 (probability of receiving a VC investment for all companies, treatment=1 and treatment=0, using the coefficients estimated in step 1):
* coefficients typed in by hand from the logit output of step 1
gen ProbabilityOfSupport = -0.1426623*Employees - 0.0245476*ln_Revenues + 0.0089229*Intangible_ratio - 1.026105*ln_Firm_age
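For comparison, here is a minimal sketch of how the score in step 2 could be obtained with predict instead of typing the coefficients by hand. It assumes the logit from step 1 is still the active estimation result; ProbabilityOfSupport_pr and IndexOfSupport_xb are hypothetical variable names that are not part of my do-file. Note that the hand-typed expression above contains no constant and no logistic transformation, so it is a linear index rather than a probability between 0 and 1.

```stata
* Sketch only: assumes the logit from step 1 is the most recent estimation command.
* ProbabilityOfSupport_pr and IndexOfSupport_xb are hypothetical names.
predict double ProbabilityOfSupport_pr, pr    // predicted probability from the logit
predict double IndexOfSupport_xb, xb          // linear index, including the constant

* Compare the two versions with the hand-computed variable from step 2
summarize ProbabilityOfSupport ProbabilityOfSupport_pr IndexOfSupport_xb
```

By default predict fills in values for every observation with non-missing covariates, so the untreated firms (treatment==0) get a score as well, even though they were excluded from the estimation.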
Code for step 3 (nearest-neighbour matching with ProbabilityOfSupport as dependent variable, treatment as treatment variable, and 10 neighbours per firm):
teffects nnmatch (ProbabilityOfSupport ln_Firm_age ln_Revenues Revenues_growth ln_Total_assets i.Industry i.GeographicalArea) (treatment), nneighbor(10) gen(match) dmvariables
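One possible check for the unmatched treated firms, offered only as a sketch: as far as I understand, teffects leaves out observations with a missing value in any variable of the model, and several firm-years in my data extract have missing ln_Revenues or Revenues_growth. The snippet assumes the variables from steps 1 to 3 are in memory and that teffects nnmatch was run with gen(match); n_missing is a hypothetical helper variable.

```stata
* Sketch only: n_missing is a hypothetical helper variable.
* Count missing values per observation across the variables used in step 3.
egen byte n_missing = rowmiss(ProbabilityOfSupport ln_Firm_age ln_Revenues ///
    Revenues_growth ln_Total_assets Industry GeographicalArea)

* Observations with n_missing > 0 are presumably excluded from the matching
tabulate n_missing treatment

* Treated observations without a recorded first neighbour
count if treatment == 1 & missing(match1)
```

If most of the unmatched treated observations also have n_missing > 0, missing data rather than a lack of comparable firms might explain the result.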
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(treatment id) int year byte T long(Industry GeographicalArea) double ln_Firm_age float(Intangible_ratio ln_Total_assets ln_Revenues Revenues_growth) double profitloss int Employees
1 1 2008 -2  3 19  3.091042453358316   .007965114  8.077967  7.384061          .    73.156 15
1 1 2009 -1  3 19 3.1354942159291497   .015642287  8.234834  7.423266  .04000895    12.846 13
1 1 2010  0  3 19 3.1780538303479458    .02827282  8.415235  7.350791 -.06995244    11.849 15
1 1 2011  1  3 19 3.2188758248682006   .021038927  8.599017  7.630339   .3227375    10.533 14
1 1 2012  2  3 19  3.258096538021482            .         .         .          .         .  .
1 1 2013  3  3 19  3.295836866004329            .         .         .          .         .  .
1 2 2007 -2 10 15 1.3862943611198906            .         .         .          .         .  .
1 2 2008 -1 10 15 1.6094379124341003    .27431533  7.874802         0          .  -321.237  .
1 2 2009  0 10 15  1.791759469228055    .26194736  8.569549         0          .   -438.63 11
1 2 2010  1 10 15 1.9459101490553132      .187449  8.315349         0          . -1770.359 15
1 2 2011  2 10 15 2.0794415416798357            .         .         .          .         .  .
1 2 2012  3 10 15 2.1972245773362196            .         .         .          .         .  .
0 3 2009  .  8  1  .6931471805599453            .         .         .          .         .  .
0 3 2010  .  8  1 1.0986122886681098     .4749049  8.047207  8.943937          .  -205.595  .
0 3 2011  .  8  1 1.3862943611198906     .4340765  8.078743  8.798356 -.13549794   -44.256  .
0 3 2012  .  8  1 1.6094379124341003     .3699694  8.196082  8.880292  .08539934   -446.88 23
0 3 2013  .  8  1  1.791759469228055            0  6.953561 2.8003254  -.9978505   -26.622  0
0 3 2014  .  8  1 1.9459101490553132 .0041904794  7.023394 3.5101414  1.1005177  -142.908  0
0 4 2007  .  1 15                  .            .         .         .          .         .  .
0 4 2008  .  1 15                  0    .08861052  3.571418         0          .     9.566  0
0 4 2009  .  1 15  .6931471805599453     .6654887  3.834602         0          .     -8.97  0
0 4 2010  .  1 15 1.0986122886681098     .5810004  4.150646         0          .   -14.747  0
0 4 2011  .  1 15 1.3862943611198906     .5515633  3.913921  1.252763          .   -15.963  0
0 4 2012  .  1 15 1.6094379124341003    .24565923 4.2385316 1.0986123        -.2     8.916  0
0 5 2012  .  1 15 1.0986122886681098     .3649698   5.02932  5.088633          .     1.534  0
0 5 2013  .  1 15 1.3862943611198906     .3247113  5.509064  3.979308   -.674377   -59.257  0
0 5 2014  .  1 15 1.6094379124341003     .4130051   5.22494         0         -1  -168.915  0
0 5 2015  .  1 15  1.791759469228055            .         .         .          .         .  .
0 5 2016  .  1 15 1.9459101490553132            .         .         .          .         .  .
0 5 2017  .  1 15 2.0794415416798357            .         .         .          .         .  .
end
label values Industry Industry
label def Industry 1 "C", modify
label def Industry 3 "F", modify
label def Industry 8 "K", modify
label def Industry 10 "M", modify
label values GeographicalArea NUTS2
label def NUTS2 1 "ITC1", modify
label def NUTS2 15 "ITH4", modify
label def NUTS2 19 "ITI3", modify