Hi Statalisters,
I would appreciate your help with another challenge I am trying to tackle before I can (hopefully) finalize my data and start doing the 'real work' (creating variables and regressions) required for my thesis. Again, I found a post: https://www.statalist.org/forums/for...ill-in-trouble that kind of resembles my problem. At very least that I think that the code provided can be used in my case too but I do not really have an idea on how I could apply it to my data (-dataex- below).
I have two sets of data, one from CRSP with daily stock return data and the other with Compustat yearly (fiscal) data that I want to merge. I have unique identifiers and date variables that make the merge possible but my one issues is that in Compustat the firms in my dataset start and end their fiscal years at different points (e.g. from December till December, from March till March etc.).
I cannot ignore this characteristic of my dataset because I calculate a number of variables using the CRSP data, over differing windows of historical data. So not being careful would bias my results. My goal is to have CRSP daily data matched to the correct fiscal years when I merge it with my Compustat yearly data.
I would merge the two databases on unique combinations of gvkey and date, bdate is a Stata recognized business calendar date variable but obviously trading does not always start on the first and last calendar day of the month.
I realize that even if I think I am being perfectly clear it might not be the case, so I am happy to provide any additional elaboration. In the Compustat data I added gender and at (total assets) to show that I have many other accounting variables ( based fiscal yearly obs) in that dataset. fyr = represents the month in which the firm's fiscal year ends. This also determines how fyear corresponds to the year of the datadate in the Compustat dataset, because if fyr is <= 5, fyear = year t-1 and if fyr is >5, year = t.
CRSP Data:
Compustat Data:
I would appreciate your help with another challenge I am trying to tackle before I can (hopefully) finalize my data and start doing the 'real work' (creating variables and regressions) required for my thesis. Again, I found a post: https://www.statalist.org/forums/for...ill-in-trouble that kind of resembles my problem. At very least that I think that the code provided can be used in my case too but I do not really have an idea on how I could apply it to my data (-dataex- below).
I have two sets of data, one from CRSP with daily stock return data and the other with Compustat yearly (fiscal) data that I want to merge. I have unique identifiers and date variables that make the merge possible but my one issues is that in Compustat the firms in my dataset start and end their fiscal years at different points (e.g. from December till December, from March till March etc.).
I cannot ignore this characteristic of my dataset because I calculate a number of variables using the CRSP data, over differing windows of historical data. So not being careful would bias my results. My goal is to have CRSP daily data matched to the correct fiscal years when I merge it with my Compustat yearly data.
I would merge the two databases on unique combinations of gvkey and date, bdate is a Stata recognized business calendar date variable but obviously trading does not always start on the first and last calendar day of the month.
I realize that even if I think I am being perfectly clear it might not be the case, so I am happy to provide any additional elaboration. In the Compustat data I added gender and at (total assets) to show that I have many other accounting variables ( based fiscal yearly obs) in that dataset. fyr = represents the month in which the firm's fiscal year ends. This also determines how fyear corresponds to the year of the datadate in the Compustat dataset, because if fyr is <= 5, fyear = year t-1 and if fyr is >5, year = t.
CRSP Data:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input long(gvkey lpermco lpermno) float year str8 date float(bdate annualized_sd annualized_sd_resid) 1690 7 14593 2002 "20020102" 0 48.58944 . 1690 7 14593 2002 "20020103" 1 48.58944 . 1690 7 14593 2002 "20020104" 2 48.58944 . 1690 7 14593 2002 "20020107" 3 48.58944 . 1690 7 14593 2002 "20020108" 4 48.58944 . 1690 7 14593 2002 "20020109" 5 48.58944 . 1690 7 14593 2002 "20020110" 6 48.58944 . 1690 7 14593 2002 "20020111" 7 48.58944 . 1690 7 14593 2002 "20020114" 8 48.58944 . 1690 7 14593 2002 "20020115" 9 48.58944 . 1690 7 14593 2002 "20020116" 10 48.58944 . 1690 7 14593 2002 "20020117" 11 48.58944 . 1690 7 14593 2002 "20020118" 12 48.58944 . 1690 7 14593 2002 "20020122" 13 48.58944 . 1690 7 14593 2002 "20020123" 14 48.58944 . 1690 7 14593 2002 "20020124" 15 48.58944 . 1690 7 14593 2002 "20020125" 16 48.58944 . 1690 7 14593 2002 "20020128" 17 48.58944 . 1690 7 14593 2002 "20020129" 18 48.58944 . 1690 7 14593 2002 "20020130" 19 48.58944 . 1690 7 14593 2002 "20020131" 20 48.58944 . 1690 7 14593 2002 "20020201" 21 48.58944 . 1690 7 14593 2002 "20020204" 22 48.58944 . 1690 7 14593 2002 "20020205" 23 48.58944 . 1690 7 14593 2002 "20020206" 24 48.58944 . 1690 7 14593 2002 "20020207" 25 48.58944 . 1690 7 14593 2002 "20020208" 26 48.58944 . 1690 7 14593 2002 "20020211" 27 48.58944 . 1690 7 14593 2002 "20020212" 28 48.58944 . 1690 7 14593 2002 "20020213" 29 48.58944 . 1690 7 14593 2002 "20020214" 30 48.58944 . 1690 7 14593 2002 "20020215" 31 48.58944 . 1690 7 14593 2002 "20020219" 32 48.58944 . 1690 7 14593 2002 "20020220" 33 48.58944 . 1690 7 14593 2002 "20020221" 34 48.58944 . 1690 7 14593 2002 "20020222" 35 48.58944 . 1690 7 14593 2002 "20020225" 36 48.58944 . 1690 7 14593 2002 "20020226" 37 48.58944 . 1690 7 14593 2002 "20020227" 38 48.58944 . 1690 7 14593 2002 "20020228" 39 48.58944 . 1690 7 14593 2002 "20020301" 40 48.58944 . 1690 7 14593 2002 "20020304" 41 48.58944 . 1690 7 14593 2002 "20020305" 42 48.58944 . 1690 7 14593 2002 "20020306" 43 48.58944 . 1690 7 14593 2002 "20020307" 44 48.58944 . 1690 7 14593 2002 "20020308" 45 48.58944 . 1690 7 14593 2002 "20020311" 46 48.58944 . 1690 7 14593 2002 "20020312" 47 48.58944 . 1690 7 14593 2002 "20020313" 48 48.58944 . 1690 7 14593 2002 "20020314" 49 48.58944 . end format %tbtrade_day2 bdate
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input long datadate int fyear byte fyr long gvkey float at str6 gender 20021231 2002 12 100004 2378.857 "" 20031231 2003 12 100004 3189.38 "" 20041231 2004 12 100004 3653.951 "" 20051231 2005 12 100004 4301.948 "" 20061231 2006 12 100004 8367.235 "" 20071231 2007 12 100004 2518.692 "" 20081231 2008 12 100004 2435.242 "" 20091231 2009 12 100004 2447.668 "" 20011231 2001 12 10000 509.429 "" 20021231 2002 12 10000 490.758 "" 20031231 2003 12 10000 694.525 "" 20041231 2004 12 10000 656.569 "" 20051231 2005 12 10000 653.044 "" 20061231 2006 12 10000 640.092 "MALE" 20071231 2007 12 10000 678.092 "MALE" 20081231 2008 12 10000 575.027 "MALE" 20091231 2009 12 10000 484.459 "MALE" 20101231 2010 12 10000 492.801 "MALE" 20111231 2011 12 10000 550.722 "MALE" 20121231 2012 12 10000 576.594 "MALE" 20131231 2013 12 10000 615.523 "MALE" 20141231 2014 12 10000 673.551 "MALE" 20151231 2015 12 10000 681.064 "MALE" 20161231 2016 12 10000 768.697 "MALE" 20171231 2017 12 10000 787.567 "MALE" 20010831 2001 8 100011 5595.056 "" 20020831 2002 8 100011 7628.12 "" 20030831 2003 8 100011 7998.488 "" 20040831 2004 8 100011 7856.107 "" 20011231 2001 12 10002 1366.301 "" 20021231 2002 12 10002 1792.126 "" 20031231 2003 12 10002 2460.703 "" 20041231 2004 12 10002 3013.233 "" 20051231 2005 12 10002 4280.842 "" 20061231 2006 12 10002 4502.941 "MALE" 20071231 2007 12 10002 3400.726 "MALE" 20081231 2008 12 10002 2249.854 "MALE" 20091231 2009 12 10002 1861.011 "MALE" 20101231 2010 12 10002 2133.123 "MALE" 20111231 2011 12 10002 2200.383 "MALE" 20121231 2012 12 10002 3113.074 "MALE" 20131231 2013 12 10002 3662.105 "MALE" 20141231 2014 12 10002 4174.42 "MALE" 20151231 2015 12 10002 8345.505 "MALE" 20161231 2016 12 10002 8709.044 "MALE" 20010331 2000 3 100039 23752.21 "" 20020331 2001 3 100039 20674.504 "" 20030331 2002 3 100039 22871.05 "" 20040331 2003 3 100039 24939.877 "" 20050331 2004 3 100039 24305.39 "" end
Comment