Problems to match annual fundamental data from Compustat with monthly data from "CCM security monthly"

Johannes Kobe

Join Date: Aug 2018

Posts: 1
#1

Problems to match annual fundamental data from Compustat with monthly data from "CCM security monthly"

22 Aug 2018, 04:39

Hello All,
As you can see in the title I have some issues to merge the two datasets correctly. I basically have a similar problem as “Shaquille” who postet a similar question a few days ago. My question is a bit different since I try to merge CRSP/Compustat Merged (CCM) monthly (not daily as in the recent post) stock returns with Compustat annual fundamental data. However, after studying the recent post I got stuck and I hoping that you can give me some advice.
First of all: I did manage to merge these two datasets on the identifiers DATADATE and GVKEY code. Afterwards I dropped all duplicates which had been in the dataset.
But: I can´t determine if I did it correctly. The code it posted below. Any Advice on that? It worked, but I don’t know if it makes sense. Or have I done bullshit?

The second problem is to match each annual fundamental observation with the correct monthly return. The returns are monthly, but I want them to be annual, so they match with the fundamental data. Do you have any advice on that?
There some information about the datasets.

If you need any further information on the problem, just let me know!
THANKYOU!

The merged data sets looks like the following:
acronyms:
fyear: data date fiscal year
prcc_c: Price Close - Annual – Calendar
prcc_f: Price Close - Annual – fiscal
prccm: Price Close – monthly
trt1m: monthly total return
csho: Common Shares Outstanding

Gvkey datadate fyear company name cusip prcc_f prcc_c trt1m csho
Xxx 31may1972 1971 AAR Corp. xxx 22 32 -5.7 1000

Stata Code to merge the datasets and to eliminate duplicates:

use "CCM_monthlyse_0822_LPERMNO.dta"

sort gvkey datadate
save, replace
clear

*Merge with CCM und CT*
use "CT_0822_GVKEY.dta"

sort gvkey datadate
merge gvkey datadate using "CCM_monthlyse_0822_LPERMNO.dta"
tab _merge
drop if _merge==1
drop if _merge==2

destring gvkey, replace
xtset gvkey datadate

sort gvkey datadate

*delete duplikates*
desc
summ datadate, format

duplicates tag gvkey datadate, gen(dup)

tab dup

preserve

drop if dup==1
restore

order gvkey datadate dup
preserve

gen keeper=0
levelsof gvkey if dup==1, local(levels)
foreach x of local levels {
replace keeper=1 if ´x' == gvkey
}

tab dup
keep if dup==0
tab dup

xtset gvkey datadate
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

22 Aug 2018, 10:42

First, a tip about posting. If you read the Forum FAQ (which all members are asked to do before posting) you will learn, among other things, that it is almost always a good idea to include sample data when asking for help with code. Many people will just ignore your question if you don't post example data. Others who might be more adventurous, might post a possible solution, but there is a good chance it won't work with your actual data, because coding almost always depends on details of the data.

Your question, as I understand it, however, isn't so much about coding as figuring out what you want the code to do in the first place. It is common in data management to have to pair up data series that are on different time scales. Here it's yearly data with monthly data. There are several approaches to this, and which one makes sense is a question of the underlying science--it's not a statistical issue or a coding problem. Here is a list of some approaches to the general problem of merging data sets on different time scales:

1. Reduce the more frequent data to lower frequency by
1a. Selecting a "representative" observation for each "year" from the "monthly" data. It might be the month in the middle of the year, or the first, or the last, or the highest, or the lowest, or something else.
1b. Combining all of the monthly observations into a single one by taking the average or median, or some other statistic.

2. Expand the lower frequency data to higher by using some form of imputation such as linear interpolation or some other model of the data suggests.

3. Join each observation in the lower frequency data set to every observation in the higher frequency data set where the time periods referred to concur. Thus you might pair up each observation in the yearly data set with every observation from the monthly data set that comes from that year. (This could be viewed as a special case of #2 where the model is to just carry the one yearly observation unchanged to all of the months.)

So you need to decide which of these overall approaches you want to take, and then decide on the particular implementation details. Both of those decisions depends on what makes sense from the perspective of finance; they are not statistical questions.

Once you've made those decisions, if it isn't clear how to code your solution, do post back explaining precisely what you want to do and showing example data. To show your example data (from both data sets) be sure to use the -dataex- command. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example data. When showing example data, always use -dataex-.
Comment

Announcement

Problems to match annual fundamental data from Compustat with monthly data from "CCM security monthly"

Comment