Hello all,
I'm working on a first-year summer paper and am having trouble creating a matched sample. I am using data from the COMPUSTAT database. I want to look at US firms that have recently moved their headquarters, and I created an indicator variable for these firms by "gvkey" (a unique firm identifier) called "usmoveind". It equals one for the firm-years of firms that moved their headquarters, within the window I wish to observe. I have also created an indicator called "apremoveind" for the 1-3 years prior to the year in which the firm moved its headquarters. Note that "apremoveind" depends on both the individual firm (captured by "gvkey") and that firm's fiscal year (variable name "fyear").
For example, here is how I created the indicator for one gvkey (5959) to flag the 3 years before and after its move, which was completed in 2001:
Code:
gen usmoveind=1 if gvkey==5959 & fyear>1997 & fyear<2005   // fyear 1998-2004
And this is how I created the indicator for the 3 years before the move for this particular firm:
Code:
gen apremoveind=1 if gvkey==5959 & fyear>1997 & fyear<2001   // fyear 1998-2000
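Since the window is always three years either side of the move, one way to avoid hand-coding every gvkey (and the "variable already defined" error once a second `gen` is run for another firm) is to store each mover's move year once and derive both indicators from it. This is a minimal sketch, assuming gvkey is numeric as in the code above; `moveyear` is a hypothetical helper variable and the toy panel is made up for illustration. Unlike the `gen ... =1 if` lines above, it codes non-window years and non-movers as 0 rather than missing.

```stata
* Toy panel (made up): firm 5959 moves in 2001, firm 1234 never moves
clear
set obs 10
gen long gvkey = cond(_n <= 8, 5959, 1234)
bysort gvkey: gen int fyear = 1996 + _n

* Hypothetical helper: fiscal year in which each firm completed its HQ move
* (missing for firms that never move)
gen int moveyear = cond(gvkey == 5959, 2001, .)

* 0/1 coding: the !missing() guard keeps non-movers at 0
gen byte usmoveind   = !missing(moveyear) & inrange(fyear, moveyear - 3, moveyear + 3)
gen byte apremoveind = !missing(moveyear) & inrange(fyear, moveyear - 3, moveyear - 1)

list gvkey fyear usmoveind apremoveind, sepby(gvkey)
```

On the real data, `gen int moveyear = .` followed by one `replace moveyear = 2001 if gvkey == 5959` line per mover would take the place of the toy `cond()` line.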
I also created a variable called "sic2" to identify the industry in which each firm operates. This variable is defined for the entire dataset, not just the US firms that moved. Finally, I created a measure of GAAP effective tax rates called "gaap_etr", which is my primary variable of interest.
What I would like to do is identify a matched sample of firms (by "gvkey") with respect to "gaap_etr" in the "apremoveind" period of each firm with "usmoveind" == 1. It is absolutely critical that each matched firm have exactly the same value of "sic2". That is, for each US move firm I would like to find comparable firms in the same industry whose "gaap_etr" is within a certain margin (say, +/- 1%) of the move firm's "gaap_etr" DURING the move firm's "apremoveind" period. That period spans 1-3 years and depends on both the firm ("gvkey") and the fiscal year ("fyear"). Ideally, the matched firm's three years should be the same fiscal years as the US move firm's, so "fyear" should cover the same 3-year period for both.
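The exact-match-plus-margin step can be sketched with `joinby`, which forms all pairwise combinations within groups: split the data into the movers' pre-move firm-years and the never-movers' firm-years, pair them on identical `sic2` and `fyear`, keep pairs within the `gaap_etr` margin, and then keep only controls that pass in every one of the mover's pre-move years. This is a sketch under stated assumptions: `gaap_etr` is stored as a fraction, so +/- 1% is read as +/- 0.01 (one percentage point; a relative 1% band would use `abs(c_etr - t_etr) <= 0.01 * abs(t_etr)` instead), gvkey is numeric, and the names `ever_mover`, `t_gvkey`, `c_gvkey`, `t_etr`, `c_etr`, `n_pre` and `n_hit` are hypothetical. The toy data are made up, and the block replaces whatever is in memory.

```stata
* Toy data (made up); gaap_etr is a fraction, so 0.01 = 1 percentage point
clear
input long gvkey int fyear byte sic2 double gaap_etr byte usmoveind byte apremoveind
5959 1998 28 .350 1 1
5959 1999 28 .340 1 1
5959 2000 28 .330 1 1
1111 1998 28 .355 0 0
1111 1999 28 .345 0 0
1111 2000 28 .325 0 0
2222 1998 28 .352 0 0
2222 1999 28 .300 0 0
2222 2000 28 .331 0 0
3333 1998 35 .350 0 0
3333 1999 35 .340 0 0
3333 2000 35 .330 0 0
end

* Flag every firm that moves at any point, so movers never serve as controls
bysort gvkey: egen byte ever_mover = max(usmoveind == 1)

* 1) Treated side: each mover's pre-move firm-years
preserve
keep if apremoveind == 1
keep gvkey fyear sic2 gaap_etr
rename (gvkey gaap_etr) (t_gvkey t_etr)
bysort t_gvkey: gen byte n_pre = _N    // number of pre-move years to match
tempfile treated
save `treated'
restore

* 2) Control side: all firm-years of never-movers
keep if ever_mover == 0
keep gvkey fyear sic2 gaap_etr
rename (gvkey gaap_etr) (c_gvkey c_etr)

* 3) Every pairing with the same sic2 AND the same fyear
joinby sic2 fyear using `treated'

* 4) Margin on gaap_etr (a missing ETR on either side fails this test)
keep if abs(c_etr - t_etr) <= 0.01

* 5) Keep a control only if it passes in EVERY pre-move year of the mover
bysort t_gvkey c_gvkey: gen byte n_hit = _N
keep if n_hit == n_pre

sort t_gvkey c_gvkey fyear
list t_gvkey c_gvkey fyear sic2 t_etr c_etr, sepby(c_gvkey)
```

In the toy data, firm 2222 misses the margin in 1999 and firm 3333 is in a different two-digit industry, so only 1111 survives as a candidate for 5959. On the full sample, save the panel first, since `keep` discards observations.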
I would also ideally like to match on two other variables, total assets (variable name "at") and revenues (variable name "revt"). Matching exactly on "sic2" while finding a comfortable range for "gaap_etr" (the most important matching characteristic), "at", and "revt", all in the same 3 years that make up the "apremoveind" period of my US move firms, is giving me a bit of a headache. I would then like to get a list of these control firms, create an indicator for them (call it "controlind"), and run some analyses comparing them to my "usmoveind" firms during the 3 years before the headquarters move.
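For "at" and "revt", one option (a design choice, not the only one) is to rank the surviving candidates by size rather than impose a second hard range: compute the absolute log gap in each variable for each year, average over the pre-move window, and keep the nearest control per mover. The sketch below assumes the candidate pairs from the "gaap_etr" step already carry "at" and "revt" for both firms (the hypothetical `t_`/`c_` prefixed columns), uses made-up toy values, and matches with replacement, so one control can be picked for several movers. It then builds a firm-level list of selected controls and merges "controlind" onto a stand-in panel.

```stata
* Toy candidate pairs (made up) that already passed the sic2/fyear/gaap_etr step
clear
input long t_gvkey long c_gvkey int fyear double t_at double c_at double t_revt double c_revt
5959 1111 1998 500  450 300  310
5959 1111 1999 520  470 320  300
5959 1111 2000 540  600 330  360
5959 4444 1998 500 2000 300 1500
5959 4444 1999 520 2100 320 1600
5959 4444 2000 540 2200 330 1700
end

* Size distance: absolute log gaps in at and revt (ln() is missing if a value <= 0)
gen double dist = abs(ln(c_at / t_at)) + abs(ln(c_revt / t_revt))

* Average the yearly distance over the pre-move window for each pair
collapse (mean) dist, by(t_gvkey c_gvkey)

* Nearest neighbour: smallest average distance (ties go to the lower c_gvkey)
bysort t_gvkey (dist c_gvkey): keep if _n == 1
list t_gvkey c_gvkey dist

* Firm-level list of selected controls
keep c_gvkey
rename c_gvkey gvkey
duplicates drop
gen byte controlind = 1
tempfile controls
save `controls'

* Toy main panel (made up) standing in for the full Compustat file
clear
input long gvkey int fyear
5959 1998
1111 1998
4444 1998
end
merge m:1 gvkey using `controls', nogenerate keep(master match)
replace controlind = 0 if missing(controlind)
list gvkey fyear controlind
```

If a hard range on size is preferred, a line such as `keep if dist <= 0.5` (an arbitrary threshold) right after the `collapse` would enforce it before the nearest control is picked.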
Any help on this matter would be greatly appreciated! I hope this is clear, and would be happy to clarify!
Thanks,
Erik