Help – How to drop observations depending on multiple variables in 2 datasets?

Bam Bovitzki

Join Date: Jan 2023

Posts: 13
#1

Help – How to drop observations depending on multiple variables in 2 datasets?

17 Jan 2023, 10:40

Dear Members,

I currently have two datasets "SDC - Mergers and Acquisitions" (A) and "CRSP/Compustat Merged - Fundamentals Annual" (B).
In the dataset A, there are two variables "acusip" (buyer) and "master_cusip" (seller) each showing a 6-digit code which identifies the individual company in a M&A deal. In the dataset B there is a variable "cusip" which shows a 9-digit code (the last 3 are irrelevant) - I have already shortened this to 6 digits so that it is comparable with the other dataset.

Now to my problem: I would like to drop ALL observations in the dataset "SDC - Mergers and Acquisitions" where the two variables "acusip" AND "master_cusip" do not match any value of the variable "cusip" in the 2nd dataset. It is important that both, "acusip" & "master_cusip" occur somewhere in the variable "cusip".

My best guess was the following:

Code:

append using "Input/CRSP/Compustat Merged - Fundamentals Annual.dta", keep(cusip) gen cusip_adj = substr(cusip,1,length(cusip)-3) duplicates tag cusip_adj acusip, generate(dupl_acquirer) duplicates tag cusip_adj master_cusip, generate(dupl_target) drop if dupl_acquirer == 0 drop if dupl_target == 0 t

Unfortunately, what seems to happen with this attempt is that the duplicates for the first dataset (where after append everything is empty at the column "cusip") are just duplicates that are present in dataset A anyway. Related to the inserted "cusip" column, the values begin only after the actual dataset A and there again the duplicates are calculated only for these values. Thus far too many observations are dropped... how do I get that best fixed?

Ps.: Merge would probably also be a very useful idea. But unfortunately I am not sure about the application. In dataset A the codes can occur multiple times (i.e. not unique) and in dataset B the codes occur multiple times as well.

I hope someone can help me with this problem! Thanks in advance!

Best regards
Bam
Tags: data, drop, merge

Hemanshu Kumar

Join Date: Mar 2015
Posts: 1385

17 Jan 2023, 10:51

Perhaps something like this? (warning: untested code; also, you will need to fix file names and paths)

Code:

use dataset_A, clear
keep acusip
duplicates drop acusip, force
rename acusip cusip
tempfile acusips
save `acusips'

use dataset_A, clear
keep master_cusip
duplicates drop master_cusip, force
rename master_cusip cusip
tempfile master_cusips
save `master_cusips'

use dataset_B, clear
merge m:1 cusip using `acusips', keep(3) nogen
merge m:1 cusip using `master_cusips', keep(3) nogen

Last edited by Hemanshu Kumar; 17 Jan 2023, 11:08.

Comment

Bam Bovitzki

Join Date: Jan 2023

Posts: 13
#3

18 Jan 2023, 04:04

Thanks for the fast reply! This actually solved my problem :-)
Comment

Announcement

Help – How to drop observations depending on multiple variables in 2 datasets?

Comment

Comment