Dear Members,
I currently have two datasets "SDC - Mergers and Acquisitions" (A) and "CRSP/Compustat Merged - Fundamentals Annual" (B).
In the dataset A, there are two variables "acusip" (buyer) and "master_cusip" (seller) each showing a 6-digit code which identifies the individual company in a M&A deal. In the dataset B there is a variable "cusip" which shows a 9-digit code (the last 3 are irrelevant) - I have already shortened this to 6 digits so that it is comparable with the other dataset.
Now to my problem: I would like to drop ALL observations in the dataset "SDC - Mergers and Acquisitions" where the two variables "acusip" AND "master_cusip" do not match any value of the variable "cusip" in the 2nd dataset. It is important that both, "acusip" & "master_cusip" occur somewhere in the variable "cusip".
My best guess was the following:
Unfortunately, what seems to happen with this attempt is that the duplicates for the first dataset (where after append everything is empty at the column "cusip") are just duplicates that are present in dataset A anyway. Related to the inserted "cusip" column, the values begin only after the actual dataset A and there again the duplicates are calculated only for these values. Thus far too many observations are dropped... how do I get that best fixed?
Ps.: Merge would probably also be a very useful idea. But unfortunately I am not sure about the application. In dataset A the codes can occur multiple times (i.e. not unique) and in dataset B the codes occur multiple times as well.
I hope someone can help me with this problem! Thanks in advance!
Best regards
Bam
I currently have two datasets "SDC - Mergers and Acquisitions" (A) and "CRSP/Compustat Merged - Fundamentals Annual" (B).
In the dataset A, there are two variables "acusip" (buyer) and "master_cusip" (seller) each showing a 6-digit code which identifies the individual company in a M&A deal. In the dataset B there is a variable "cusip" which shows a 9-digit code (the last 3 are irrelevant) - I have already shortened this to 6 digits so that it is comparable with the other dataset.
Now to my problem: I would like to drop ALL observations in the dataset "SDC - Mergers and Acquisitions" where the two variables "acusip" AND "master_cusip" do not match any value of the variable "cusip" in the 2nd dataset. It is important that both, "acusip" & "master_cusip" occur somewhere in the variable "cusip".
My best guess was the following:
Code:
append using "Input/CRSP/Compustat Merged - Fundamentals Annual.dta", keep(cusip) gen cusip_adj = substr(cusip,1,length(cusip)-3) duplicates tag cusip_adj acusip, generate(dupl_acquirer) duplicates tag cusip_adj master_cusip, generate(dupl_target) drop if dupl_acquirer == 0 drop if dupl_target == 0 t
Ps.: Merge would probably also be a very useful idea. But unfortunately I am not sure about the application. In dataset A the codes can occur multiple times (i.e. not unique) and in dataset B the codes occur multiple times as well.
I hope someone can help me with this problem! Thanks in advance!
Best regards
Bam
Comment