Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Help – How to drop observations depending on multiple variables in 2 datasets?

    Dear Members,

    I currently have two datasets "SDC - Mergers and Acquisitions" (A) and "CRSP/Compustat Merged - Fundamentals Annual" (B).
    In the dataset A, there are two variables "acusip" (buyer) and "master_cusip" (seller) each showing a 6-digit code which identifies the individual company in a M&A deal. In the dataset B there is a variable "cusip" which shows a 9-digit code (the last 3 are irrelevant) - I have already shortened this to 6 digits so that it is comparable with the other dataset.

    Now to my problem: I would like to drop ALL observations in the dataset "SDC - Mergers and Acquisitions" where the two variables "acusip" AND "master_cusip" do not match any value of the variable "cusip" in the 2nd dataset. It is important that both, "acusip" & "master_cusip" occur somewhere in the variable "cusip".

    My best guess was the following:

    Code:
    append using "Input/CRSP/Compustat Merged - Fundamentals Annual.dta", keep(cusip)
    
    gen cusip_adj = substr(cusip,1,length(cusip)-3)
    
    duplicates tag cusip_adj acusip, generate(dupl_acquirer)
    
    duplicates tag cusip_adj master_cusip, generate(dupl_target)
    
    drop if dupl_acquirer == 0
    
    drop if dupl_target == 0
    t
    Unfortunately, what seems to happen with this attempt is that the duplicates for the first dataset (where after append everything is empty at the column "cusip") are just duplicates that are present in dataset A anyway. Related to the inserted "cusip" column, the values begin only after the actual dataset A and there again the duplicates are calculated only for these values. Thus far too many observations are dropped... how do I get that best fixed?

    Ps.: Merge would probably also be a very useful idea. But unfortunately I am not sure about the application. In dataset A the codes can occur multiple times (i.e. not unique) and in dataset B the codes occur multiple times as well.

    I hope someone can help me with this problem! Thanks in advance!

    Best regards
    Bam

  • #2
    Perhaps something like this? (warning: untested code; also, you will need to fix file names and paths)

    Code:
    use dataset_A, clear
    keep acusip
    duplicates drop acusip, force
    rename acusip cusip
    tempfile acusips
    save `acusips'
    
    use dataset_A, clear
    keep master_cusip
    duplicates drop master_cusip, force
    rename master_cusip cusip
    tempfile master_cusips
    save `master_cusips'
    
    use dataset_B, clear
    merge m:1 cusip using `acusips', keep(3) nogen
    merge m:1 cusip using `master_cusips', keep(3) nogen
    Last edited by Hemanshu Kumar; 17 Jan 2023, 11:08.

    Comment


    • #3
      Thanks for the fast reply! This actually solved my problem :-)

      Comment

      Working...
      X