CRSP linktypes

Simon Vasquez

Join Date: May 2017

Posts: 8
#1

19 May 2017, 05:26

This is somewhat off-topic, as it is not a specific Stata question, but since many threads here refer to Compustat or CRSP data, I was wondering whether anyone has experience with the following:

I use Stata and WRDS. I want to merge monthly CRSP data on stock returns with yearly accounting information from CRSP/Compustat/Merged (CCM).

I have read through the various posts and documentation about linktypes (LC, LU, LS) in the CCM database, and I believe I understand the principles. However, I find it surprising that there does not seem to be a general guideline on how to properly merge CRSP and CRSP/Compustat/Merged (CCM).

My question regards the procedure:
When downloading the CCM data from WRDS, is it sufficient to choose the "correct" filters under Step 3?
To explain: in Step 2, choose PERMNO; in Step 3, "Linking Options", choose LC, LU, LS (as suggested by WRDS). Then download the CCM data.

Next, download the CRSP data on stock returns with company identifier PERMNO.

Now I would have CCM data with PERMNO and CRSP data with PERMNO. But is the link type correctly specified if I want to merge the two databases? Or is a further step necessary? (I read something about downloading a link table, but those posts were mostly from 2014.)
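For readers who want to see the mechanics, here is a minimal sketch of the merge step described above. It rests on strong assumptions: both extracts already carry PERMNO, the CCM extract has one row per PERMNO and fiscal year, and the calendar year of the return date is used as a crude stand-in for the fiscal year. The toy values and the variable names permno, fyear, at, ret, and date are illustrative and may not match a given WRDS download.

```stata
* Toy annual CCM extract: one row per permno and fiscal year (values invented)
clear
input long permno int fyear double at
10001 2010 500
10001 2011 550
10002 2010 80
end
tempfile ccm
save `ccm'

* Toy monthly CRSP extract: one row per permno and month (values invented)
clear
input long permno str10 d double ret
10001 "29jan2010" 0.02
10001 "26feb2010" -0.01
10001 "31jan2011" 0.03
10002 "29jan2010" 0.05
10003 "29jan2010" 0.01
end
gen date = date(d, "DMY")
format date %td
drop d

* Assumption: calendar year stands in for fiscal year
gen int fyear = year(date)

* Many monthly rows map to one annual row per permno-year
merge m:1 permno fyear using `ccm'
tab _merge
keep if _merge == 3
drop _merge
```

Whether the link types chosen in the web query are appropriate for a given study is a separate question that this sketch does not answer.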

Simon
Nick Cox

Join Date: Mar 2014

Posts: 35697
#2

19 May 2017, 05:35

I agree with you personally. This seems off-topic. If there is a dedicated forum for discussion of this kind of data, please use it. That's just a personal opinion and I can't and don't want to (appear to) forbid questions or answers, but I won't encourage this kind of question either.
Mike Guo

Join Date: May 2017

Posts: 1
#3

20 May 2017, 11:10

Hello Simon,

I currently have exactly the same problem with merging data. Unfortunately, even though I have read many posts online, I have not fully understood how to merge the data via STATA.

So far I have downloaded the data from CCM, with variables such as "total assets", "retained earnings", etc. There are 290,343 observations. Following what was suggested online, I chose "LPERMCO" in Step 2.

On the other hand, I need the variable "Share Code (shrcd)", which can only be found in the CRSP monthly/daily files. I have also downloaded the share code from CRSP and chose "PERMCO" in Step 2. There are 3,430,687 observations, as it is monthly data.

So do you mean that "LPERMCO" from CCM and "PERMCO" from CRSP are identical, so that I can use them to merge the two datasets?

I renamed "LPERMCO" and "PERMCO" to "lpermco" in both datasets. And I have tried the 1-to-1 merge, 1-to-many, many-to-1 merge function, however it does not work at all and shows "variable lpermco does not uniquely identify observations in the master data".
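The error message suggests that lpermco repeats within the dataset, which is what one would expect when a company appears once per year or once per month. A small sketch of how to check this, using invented toy data and the assumed variable names lpermco, fyear, and at:

```stata
* Toy annual data: company 500 appears in two fiscal years (values invented)
clear
input long lpermco int fyear double at
500 2010 120
500 2011 130
600 2010 40
end

* lpermco alone repeats across years, so this check fails
capture isid lpermco
display "isid lpermco return code: " _rc

* Count how many observations share the same lpermco
duplicates report lpermco

* Adding the year makes the key unique in this toy data
isid lpermco fyear
```

If the key only becomes unique once a time variable is added, the merge key presumably needs that time variable as well. Real extracts may still contain duplicates on the combined key and would need to be inspected.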

So I am really struggling with this problem and don't know how to continue.

It would be great if you could contact me and explain the merge procedure in more detail. My e-mail address is [email protected]. Please contact me if you see my post.

PS: I agree with Nick that this is off-topic, but I really cannot find another forum or helpful source to deal with this problem. I leave my contact address here just because I hope Simon can contact me so that we can discuss this problem in some individual chats. Sorry for the inconvenience.

Best regards,
Mike
Robson Glasscock

Join Date: Apr 2014

Posts: 25
#4

21 May 2017, 08:21

There are dedicated forums for WRDS users here:

http://www.wrds.us/

and

https://groups.google.com/forum/#!forum/wrdssas

I'm not sure if the first one is still maintained. I quickly looked at it and it appears that the most recent posts are from 2016. I also vaguely remember an email from WRDS about them no longer hosting a forum under Support -> WRDS Community. The first link actually links to the Google group on the second link, and this group has very recent posts.

It makes sense that Statalisters see posts like this as being off-topic. I'd like to provide some context for why questions like these are appearing on the Statalist. This is not a defense of these questions or an argument that questions of this nature belong here. It is merely some context.

The WRDS platform is an integrated platform with a variety of individual databases used by accounting and finance researchers. By accessing WRDS, researchers gain access to multiple databases from different data providers (i.e., companies). Some examples are: IBES, which contains equity analysts' forecasts; Audit Analytics, which contains a variety of information about financial statement auditors; COMPUSTAT, which contains historical financial accounting data; and CRSP, which contains stock market price and return data.

These databases often have different primary keys for the publicly traded companies they follow. This is because various unrelated data providers (i.e., companies) have their own individual databases that are then accessed by researchers via the WRDS umbrella. There are also a variety of time variables, many with different names and definitions across the various databases, and these differences often cause confusion for new researchers.

Step one for finance and accounting researchers is pulling all this data together and merging it from these various, non-related sources. A given research question may result in merging across three, four, or five of these unrelated databases.

This next bit is based on my personal experiences and interactions with others. Others within accounting and finance may disagree with me. Having said that, my opinion is that SAS has been the dominant platform used by accounting and finance. If one logs into WRDS and looks at example code provided by the WRDS support team, programs in FORTRAN, C, and SAS are all available. Example Stata code is not provided. Also, the WRDS platform capitalizes "Stata," so if one were to download a .dta file, it will be shown as "STATA" format on the web query. I think this contributes to accounting and finance people capitalizing "Stata" when they first post here, but I realize that is not a valid reason for them to ignore the FAQ prior to posting.

I think that the growing number of posts about using data from the WRDS suite is evidence that Stata is actually making inroads in accounting and finance. This is really good news. I've met many accounting and finance people who say the following: "I use SAS to pull all my data together, but then I use Stata for all my estimations." That always struck me as odd, but I think it is because the shared code of various PhD students is often in SAS. Older accounting and finance professors often use SAS. WRDS itself has example programs in SAS, and researchers can actually download data from WRDS directly in SAS.

I started using Stata in 2009 in my PhD program because of an econometrics course with Dave Harless. I decided to use Stata for data management as well, but this resulted in an extra time commitment because I couldn't simply cut and paste code templates from WRDS in SAS that someone else had already programmed for me. I am not advocating at all for people blindly relying on code from the internet, but new accounting and finance researchers using Stata don't have the same support system that SAS users have. To further drive home the point, the above links also mention SAS and not Stata.

I'd like to reiterate that I am not arguing for what should be posted here.
Simon Vasquez

Join Date: May 2017

Posts: 8
#5

23 May 2017, 05:40

Mike Guo

I doubt that I can be a reliable source for you. I started this topic because I am not sure if what I am doing is correct.

A "complication" could arise from the fact that there was a change in the way Compustat and CRSP can be merged (keywords: "linklist", "linking dataset", "usedflag"). Maybe this partly explains why there is so much conflicting advice on how to merge the databases.

First of all, I would clarify the difference between PERMNO and PERMCO. Which one you use depends on your research question.
Whether you use PERMNO or PERMCO, it is important to understand the different criteria used for matching. You could read up on "linktypes", especially "LC", "LU" and "LS".
I can only suggest reading up on these topics. I can make no recommendation on how to proceed because I am not sure and do not want to send you down the wrong track.
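For orientation only, here is a sketch of how link-type and date criteria could be applied when working from a separate link table. It is not a recommendation: the variable names gvkey, lpermno, linktype, linkdt, linkenddt, and datadate are assumed to match the WRDS link table and Compustat extract, the toy rows are invented, and treating a missing linkenddt as a still-active link is an assumption.

```stata
* Toy Compustat annual extract: gvkey and fiscal year-end date (values invented)
clear
input long gvkey str10 dd
1001 "31dec2010"
1001 "31dec2011"
1002 "31dec2010"
end
gen datadate = date(dd, "DMY")
format datadate %td
drop dd
tempfile funda
save `funda'

* Toy link table: each row links a gvkey to a permno over a date range
clear
input long gvkey long lpermno str2 linktype str10 start str10 stop
1001 10001 "LC" "01jan2005" "30jun2011"
1001 10002 "LU" "01jul2011" ""
1002 20001 "NU" "01jan2005" ""
end
gen linkdt = date(start, "DMY")
gen linkenddt = date(stop, "DMY")
format linkdt linkenddt %td
drop start stop

* Form all link x fiscal-year pairs within gvkey, then filter
joinby gvkey using `funda'
keep if inlist(linktype, "LC", "LU", "LS")
* Assumption: a missing linkenddt means the link is still active
keep if datadate >= linkdt & (datadate <= linkenddt | missing(linkenddt))

* Each gvkey-datadate should now map to at most one permno
isid gvkey datadate
sort gvkey datadate
list gvkey datadate lpermno linktype
```

In this toy example the result carries lpermno, which could then be renamed to permno and used as the key against CRSP.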

I have tried the 1-to-1 merge, 1-to-many, many-to-1 merge function, however it does not work at all and shows "variable lpermco does not uniquely identify observations in the master data".

It's difficult to help with the - merge - command without further details on your data (e.g. which is the "using" data set). Generally speaking, there is a lot of coverage of the different - merge - commands on this forum, so you might want to go back to that first or state your question more specifically.

With regard to the problem that arises from duplicates in your dataset: I encountered a similar problem. However, I use permno to merge, so things could be different in your case. Maybe you could try the following:

Code:

* dup = 0 for unique permco-year pairs; 1, 2, ... within groups of duplicates
bysort permco year: gen dup = cond(_N==1, 0, _n)
tab dup

Then you can look at the specific cases and decide on what to do. Maybe:

Code:

* Note: this drops every copy in a duplicated permco-year group; use dup>1 to keep the first copy
drop if dup>0
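An alternative sketch of the same idea uses Stata's built-in - duplicates - commands, which allow the duplicated rows to be inspected before anything is dropped. The toy data and the variable names permco, year, and at are invented for illustration. Which copy is the right one to keep is a judgment call that the code cannot make.

```stata
* Toy data: permco 500 appears twice in 2010 (values invented)
clear
input long permco int year double at
500 2010 120
500 2010 125
600 2010 40
end

* ndup = number of other observations sharing the same permco-year
duplicates tag permco year, generate(ndup)

* Inspect the duplicated groups before deciding what to do
list if ndup > 0, sepby(permco year)

* Keep the first observation of each permco-year group (current sort order)
duplicates drop permco year, force
isid permco year
```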

Robson Glasscock
Thank you for the links.

http://www.wrds.us/

is indeed not supported anymore.

The same goes for the Google Group, which so far has only 3 posts in May. I posted a (different) question there (I found the link via www.wrds.us) a couple of weeks ago and never received an answer.

Nick Cox
I don't want to defend the question. As I said in my initial post, I am aware that the topic does not fit properly.
The reasons I still posted it are that
- this forum has already covered this topic in earlier posts and years
- help seems to be rare, according to Robson's statement and my own "research" on this topic (at least 20 hours of working time, probably far more over the last weeks)
- some Statalist users definitely make use of databases like Compustat and CRSP, among others. So my thought was that if we discuss how to properly merge and work with data, why not also discuss how to correctly specify the variables needed, so that not only the "technical" merge is correct but also the specification and alignment of variables.

However, I want to express that this forum is a truly rich source of in-depth knowledge, especially thanks to people like you and others. And I guess a foundation for its high quality is also to keep the content "on topic".

In that sense: I trust your judgement and I won't pose such questions in the future.

Last edited by Simon Vasquez; 23 May 2017, 05:45.
Nick Cox

Join Date: Mar 2014

Posts: 35697
#6

23 May 2017, 06:25

Thanks for the several positives. I can't really add much to #2 and want to underline that it's just personal opinion.

But I am puzzled by any argument that many Stata users also do X, so can't we discuss X here?

Many Stata users are interested in clinical trials, how to recognise successful or failing firms, rainfall data (now that really is interesting), what the heck is happening in this crazy world, and much, much else, but it's best to keep discussion to questions where someone is asking about using Stata.

Disclosure: I can be as naughty as anyone else in slipping in occasional odd humour and asides that may seem off-topic.
David Benson

Join Date: Oct 2018

Posts: 489
#7

24 Oct 2018, 13:49

I know this is an older thread, but given that merging daily / monthly stock return data from CRSP with annual accounting data in COMPUSTAT is a common task, I thought I would pull together some helpful links.

If you have access to the Compustat-CRSP merged (CCM) database, they have done all the hard work of matching CRSP's PERMNOs to Compustat's GVKEYs (and matching up the dates). No merging needed.

If, like many of us, you don't have access (so you have to match them manually), Robson Glasscock's blog "Stata and Accounting Research" here has a number of useful entries. In particular see:
Merging CRSP and Compustat data (which itself points to a number of other helpful CRSP / COMPUSTAT resources)

Dealing with GVKEY and DATADATE (FYEAR) duplicates in Compustat in Stata

Rui Dai has a very helpful tutorial about CUSIP, GVKEY, PERMNO, ticker, etc as well as linking CRSP / Compustat data at http://www.ruidaiwrds.info/data/link...-and-compustat

Kai Chen has a number of posts on using Stata in finance & accounting research (Calculate CFO tenure with Execucomp in Stata, Handy Stata command to create Fama-French industry classifications based on SIC codes, etc). The relevant post on linking CRSP and Compustat is here

Luis Palacios's excellent tutorial (as a PDF) is here

Ian Gow has some good insight on the Compustat-CRSP linktables here

And when all else fails, you can read the manual