First, I have a dataset (call it "count.dta") that looks like this:
Then I have another dataset (call it "fulldata.dta") that looks like this:
For each id in "count.dta", num_treat and num_control give the number of treatment and control gvkeys that I want to select at random (without replacement) from "fulldata.dta".
Let me give an example. Look at the first row of "count.dta". We have id=1 and num_treat=2, which means that from the id=1 rows of "fulldata.dta" I want to randomly select two gvkeys with treatment=1. There are three gvkeys to choose from (7544, 8922, and 10334), and I want to pick two of them at random without replacement. Say I pick gvkeys 8922 and 10334. Similarly, num_control=2 means I want to randomly select, without replacement, two gvkeys with treatment=0 from "fulldata.dta". There are four gvkeys to choose from (1004, 1008, 2013, and 4055). Say I pick 1004 and 2013.
For id=6, we follow exactly the same procedure: num_treat=1 means we pick ONE gvkey at random from "fulldata.dta" with treatment=1, and num_control=2 means we pick TWO gvkeys at random (without replacement) with treatment=0. Say I randomly picked gvkey=80892 for treatment=1 and gvkeys 2742 and 6342 for treatment=0.
This means the random sample I have constructed should look as follows (call this dataset "randomselect.dta"):
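A minimal sketch of how this selection step might be written in Stata, assuming "count.dta" and "fulldata.dta" have been saved in the working directory. The seed value and the helper variables u and pick are illustrative choices, not anything the procedure requires:

```stata
* Sketch only: assumes count.dta and fulldata.dta are in the working directory
set seed 12345                       // hypothetical seed, for reproducibility

* Build a firm-level list: one row per id-gvkey pair
use fulldata, clear
keep id gvkey treatment
duplicates drop

* Attach the requested counts (num_treat, num_control) for each id
merge m:1 id using count, keep(match) nogenerate

* Fix the row order so the random draws are reproducible
sort id gvkey

* One uniform draw per firm; sorting on it puts firms in random order
gen double u = runiform()
bysort id treatment (u gvkey): gen long pick = _n

* Keep the first k firms in each id-by-treatment cell
keep if pick <= cond(treatment == 1, num_treat, num_control)

* Bring back all yearly rows for the selected firms
keep id gvkey
merge 1:m id gvkey using fulldata, keep(match) nogenerate
sort id gvkey year
save randomselect, replace
```

Putting the firms in random order and keeping the first k within each id-by-treatment cell gives a simple random sample without replacement, because no firm can occupy more than one position in the ordering.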
Then I want to run the following regression on the above randomly selected sample:
One should get the following output:

What I want to do is save the coefficient on treatment#post, which is 0.1613656, along with its t-statistic and p-value, which are 0.43 and 0.685.
Then I want to repeat the above process 1000 times and store the coefficient, t-statistic, and p-value from each random sample in a dataset. Could someone help me code up this process? I am very new to Stata. Thank you.
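One possible way to structure the repetition is sketched here, assuming "count.dta" and "fulldata.dta" are saved in the working directory and that the community-contributed command reghdfe is installed (ssc install reghdfe). The seed, the output file name results.dta, and the names rep, b, t, p, s_b, s_t, and s_p are hypothetical. Computing the p-value from the t distribution with e(df_r) degrees of freedom is my assumption about how to reproduce the reported value:

```stata
* Sketch only: file names, seed, and variable names are assumptions
set seed 12345

* Firm-level list with the requested counts attached, built once
use fulldata, clear
keep id gvkey treatment
duplicates drop
merge m:1 id using count, keep(match) nogenerate
sort id gvkey
tempfile firms
save `firms'

* Results file: one row per replication
tempname memhold
postfile `memhold' int rep double b double t double p using results, replace

forvalues r = 1/1000 {
    * Draw the random sample of firms without replacement
    use `firms', clear
    gen double u = runiform()
    bysort id treatment (u gvkey): gen long pick = _n
    keep if pick <= cond(treatment == 1, num_treat, num_control)
    keep id gvkey
    merge 1:m id gvkey using fulldata, keep(match) nogenerate

    * Same regression as in the question, run quietly
    egen firm_evt = group(gvkey event_year)
    egen year_evt = group(year event_year)
    capture reghdfe outcome_var ib0.treatment##ib0.post control_var, ///
        absorb(firm_evt year_evt) vce(cluster gvkey)
    if _rc continue                  // skip a replication that fails

    * Coefficient, t-statistic, and two-sided p-value for treatment#post
    scalar s_b = _b[1.treatment#1.post]
    scalar s_t = s_b / _se[1.treatment#1.post]
    scalar s_p = 2 * ttail(e(df_r), abs(s_t))
    post `memhold' (`r') (s_b) (s_t) (s_p)
}
postclose `memhold'

use results, clear
summarize b t p
```

Each replication adds one row to results.dta, so the saved dataset can later be used to examine the distribution of the coefficient. If the interaction term is dropped for collinearity in a particular draw, t and p will be missing for that row.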
Code:
* Contents of count.dta
clear
input id num_treat num_control
1 2 2
6 1 2
end
Code:
* Contents of fulldata.dta
clear
input id gvkey treatment outcome_var event_year year post control_var
1 1004 0 4.20 2007 2005 0 1.55
1 1004 0 1.25 2007 2006 0 1.41
1 1004 0 1.38 2007 2007 1 1.47
1 1004 0 1.38 2007 2008 1 1.65
1 1004 0 2.24 2007 2009 1 1.73
1 1008 0 1.47 2007 2005 0 1.72
1 1008 0 2.04 2007 2006 0 0.58
1 1008 0 2.03 2007 2007 1 0.82
1 1008 0 1.35 2007 2008 1 0.98
1 1008 0 1.90 2007 2009 1 0.79
1 2013 0 1.12 2007 2005 0 0.95
1 2013 0 0.00 2007 2006 0 0.79
1 2013 0 0.00 2007 2007 1 0.75
1 2013 0 0.00 2007 2008 1 1.06
1 2013 0 0.00 2007 2009 1 1.03
1 4055 0 0.00 2007 2005 0 0.75
1 4055 0 0.00 2007 2006 0 0.77
1 4055 0 0.00 2007 2007 1 0.78
1 4055 0 0.00 2007 2008 1 0.74
1 4055 0 0.00 2007 2009 1 0.56
1 7544 1 1.82 2007 2005 0 0.62
1 7544 1 1.67 2007 2006 0 0.63
1 7544 1 4.00 2007 2007 1 0.73
1 7544 1 3.17 2007 2008 1 0.78
1 7544 1 3.85 2007 2009 1 0.98
1 8922 1 2.70 2007 2005 0 0.77
1 8922 1 1.89 2007 2006 0 0.62
1 8922 1 1.25 2007 2007 1 0.95
1 8922 1 1.28 2007 2008 1 1.28
1 8922 1 1.56 2007 2009 1 0.81
1 10334 1 1.52 2007 2005 0 1.09
1 10334 1 1.72 2007 2006 0 1.52
1 10334 1 2.00 2007 2007 1 0.30
1 10334 1 1.85 2007 2008 1 0.19
1 10334 1 1.96 2007 2009 1 0.88
6 1008 0 0.98 2014 2012 0 0.87
6 1008 0 1.45 2014 2013 0 0.36
6 1008 0 1.41 2014 2014 1 1.02
6 1008 0 1.49 2014 2015 1 1.26
6 1008 0 1.56 2014 2016 1 1.44
6 2742 0 1.64 2014 2012 0 0.18
6 2742 0 1.39 2014 2013 0 0.14
6 2742 0 1.12 2014 2014 1 0.18
6 2742 0 1.09 2014 2015 1 0.12
6 2742 0 1.37 2014 2016 1 0.10
6 6342 0 1.35 2014 2012 0 0.15
6 6342 0 2.63 2014 2013 0 0.06
6 6342 0 2.67 2014 2014 1 0.05
6 6342 0 2.67 2014 2015 1 0.07
6 6342 0 2.56 2014 2016 1 0.94
6 10334 1 2.63 2014 2012 0 0.94
6 10334 1 2.60 2014 2013 0 0.97
6 10334 1 1.39 2014 2014 1 0.99
6 10334 1 0.00 2014 2015 1 0.95
6 10334 1 0.00 2014 2016 1 1.09
6 74232 1 0.00 2014 2012 0 1.01
6 74232 1 2.78 2014 2013 0 1.04
6 74232 1 0.00 2014 2014 1 1.08
6 74232 1 0.00 2014 2015 1 0.11
6 74232 1 0.00 2014 2016 1 0.62
6 80892 1 0.00 2014 2012 0 0.29
6 80892 1 1.89 2014 2013 0 0.10
6 80892 1 1.89 2014 2014 1 0.15
6 80892 1 0.46 2014 2015 1 0.06
6 80892 1 0.51 2014 2016 1 0.05
end
Code:
* Contents of randomselect.dta (one possible random draw)
clear
input id gvkey treatment outcome_var event_year year post control_var
1 1004 0 4.20 2007 2005 0 1.55
1 1004 0 1.25 2007 2006 0 1.41
1 1004 0 1.38 2007 2007 1 1.47
1 1004 0 1.38 2007 2008 1 1.65
1 1004 0 2.24 2007 2009 1 1.73
1 2013 0 1.12 2007 2005 0 0.95
1 2013 0 0.00 2007 2006 0 0.79
1 2013 0 0.00 2007 2007 1 0.75
1 2013 0 0.00 2007 2008 1 1.06
1 2013 0 0.00 2007 2009 1 1.03
1 8922 1 2.70 2007 2005 0 0.77
1 8922 1 1.89 2007 2006 0 0.62
1 8922 1 1.25 2007 2007 1 0.95
1 8922 1 1.28 2007 2008 1 1.28
1 8922 1 1.56 2007 2009 1 0.81
1 10334 1 1.52 2007 2005 0 1.09
1 10334 1 1.72 2007 2006 0 1.52
1 10334 1 2.00 2007 2007 1 0.30
1 10334 1 1.85 2007 2008 1 0.19
1 10334 1 1.96 2007 2009 1 0.88
6 2742 0 1.64 2014 2012 0 0.18
6 2742 0 1.39 2014 2013 0 0.14
6 2742 0 1.12 2014 2014 1 0.18
6 2742 0 1.09 2014 2015 1 0.12
6 2742 0 1.37 2014 2016 1 0.10
6 6342 0 1.35 2014 2012 0 0.15
6 6342 0 2.63 2014 2013 0 0.06
6 6342 0 2.67 2014 2014 1 0.05
6 6342 0 2.67 2014 2015 1 0.07
6 6342 0 2.56 2014 2016 1 0.94
6 80892 1 0.00 2014 2012 0 0.29
6 80892 1 1.89 2014 2013 0 0.10
6 80892 1 1.89 2014 2014 1 0.15
6 80892 1 0.46 2014 2015 1 0.06
6 80892 1 0.51 2014 2016 1 0.05
end
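Code:
* Group identifiers for the firm-by-event and year-by-event fixed effects
egen firm_evt = group(gvkey event_year)
egen year_evt = group(year event_year)
* Regression with treatment#post interaction, standard errors clustered by gvkey
reghdfe outcome_var ib0.treatment##ib0.post control_var, absorb(firm_evt year_evt) vce(cluster gvkey)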