Problem with merging of datasets

Gianno van Well

Join Date: May 2020

Posts: 2
#1


04 May 2020, 06:27

Hello,

For a research project I want to combine three datasets. First I want to combine the AuditAnalytics dataset with Compustat (North America Daily, Fundamentals Annual), both from WRDS-Wharton. I took the years 2006 through 2009 and selected "CIK" as the company code. After selecting all the available variables, I wanted to merge these two datasets into one, using "CIK" as the key variable for the merge. However, when I try to merge, I get the message: "variable cik does not uniquely identify observations in the master data". How can I solve this?

After merging AuditAnalytics and Compustat, I want to merge in the BoardEx dataset as well. However, the first step is already going wrong.
I hope somebody can help me.

Thanks in advance for any help!
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

04 May 2020, 08:23

Without knowing your data - and most members of Statalist have no familiarity with the datasets you describe - we are limited to guessing what might be the problem.

Here is my guess. You seem to suggest that the two datasets from WRDS-Wharton each have observations for 2006, 2007, 2008, and 2009 for each company code. If that is the case, you should be merging by CIK and year. If just one of them has an observation for each year, then instead of merge 1:1 you need merge 1:m or merge m:1.
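To illustrate that guess, here is a minimal sketch on made-up data. The variable names (cik, year, total_assets, audit_fees) and all values are hypothetical stand-ins, not the actual WRDS layout, and it assumes both files really do have exactly one observation per company and year.

```stata
* Toy stand-in for the Compustat extract (hypothetical names and values)
clear
input long cik int year double total_assets
1001 2006 50.2
1001 2007 55.0
1002 2006 12.4
1002 2007 13.1
end
tempfile compustat
save `compustat'

* Toy stand-in for the AuditAnalytics extract (hypothetical names and values)
clear
input long cik int year double audit_fees
1001 2006 1.1
1001 2007 1.3
1002 2006 0.4
1002 2007 0.5
end

* isid stops with an error if cik and year together do not identify observations
isid cik year

* One observation per cik-year in both files, so merge 1:1 on both keys
merge 1:1 cik year using `compustat'
tab _merge
```

If instead the using file had only one observation per company, the corresponding call would be merge m:1 cik using that file. Because the sketch relies on a tempfile, it is meant to be run as a do-file in one go.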

But the first step is to better understand the merging of data by reading the complete documentation for the merge command found in the Stata Data Management Reference Manual PDF included in your Stata installation and accessible from Stata's Help menu.
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#3

05 May 2020, 11:16

To add to William's helpful comment, it is possible in Compustat to have multiple entries for the same firm-year. The way to check this is with the duplicates command. If you have such duplicates and the observations are identical on all of the variables, you can simply drop the extra copies. Otherwise, you have to figure out which are the right ones for your analysis.
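A rough sketch of that check, again on invented data with hypothetical variable names. It assumes the merge key is cik and year, and it only drops observations that are exact copies on every variable.

```stata
* Toy data: one exact duplicate (1001, 2006) and one conflicting pair (1002, 2006)
clear
input long cik int year double total_assets
1001 2006 50.2
1001 2006 50.2
1001 2007 55.0
1002 2006 12.4
1002 2006 12.9
end

* How many cik-year combinations occur more than once?
duplicates report cik year

* Tag the repeated combinations so they can be inspected side by side
duplicates tag cik year, generate(dup)
list if dup > 0, sepby(cik year)

* Drop only observations that are identical on every variable
drop dup
duplicates drop

* Whatever is still reported here differs on some variable and needs a judgment call
duplicates report cik year
```

In this toy example the two 1002/2006 rows survive duplicates drop because they disagree on total_assets, which is the case where you have to decide which one belongs in the analysis.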

You can also get the duplicates problem if one of the variables on which you plan to merge has missing values: all the observations with a missing value are treated as sharing a single value, which creates duplicates.
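A small sketch of the missing-key case, with hypothetical names and values. Dropping the observations with a missing CIK is an assumption made for illustration; whether that is appropriate depends on why the identifier is missing in your data.

```stata
* Toy data: two observations with a missing cik in the same year
clear
input long cik int year double audit_fees
1001 2006 1.1
.    2006 0.7
.    2006 0.9
1002 2006 0.4
end

* The two missing-cik rows count as the same cik-year and show up as duplicates
duplicates report cik year

* Count observations with a missing merge key
count if missing(cik)

* Assumption: rows without a CIK cannot be matched anyway, so drop them
drop if missing(cik)

* After that, cik and year identify the observations
isid cik year
```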

I'm not sure if there is an option that eliminates the redundancy when you originally pull the data from WRDS.