Merging data by matching

Chris Parfitt

Join Date: Jun 2018
Posts: 24

Merging data by matching

23 Jul 2018, 13:12

Hi all,

I currently have 3 data sets.
1) Deal level data on M&A's.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id_deal id_acquiror) str30 acquirorname_compu long acquirorass float id_target str46 targetname long targetass
  1   1 "ABBOTT LABORATORIES"                735   1 "KOS PHARMACEUTICALS INC"         695573
  2   1 "ABBOTT LABORATORIES"                735   2 "Solvay Pharmaceuticals SA"       734307
  3   3 "ASTRAZENECA PLC"                 770646   3 "Novexel SA"                      896419
  4   3 "ASTRAZENECA PLC"                 770646   4 "KuDOS Pharmaceuticals"           783835
  5   3 "ASTRAZENECA PLC"                 770646   5 "Cambridge Antibody Tech Grp"     739243
  6   3 "ASTRAZENECA PLC"                 770646   6 "NPS PHARMACEUTICALS INC"         719353
  7   3 "ASTRAZENECA PLC"                 770646   7 "MEDIMMUNE INC"                   773774
  8   3 "ASTRAZENECA PLC"                 770646   8 "Arrow Therapeutics Ltd"          801361
  9   9 "Abraxis BioScience Inc"          918063   9 "Shimoda Biotech (Pty) Ltd"       862380
 10  10 "Actavis Group hf"                933768  10 "Zhejiang Chiral Medicine Chem"   835576
 11  10 "Actavis Group hf"                933768  11 "Sindan"                         1026564
 12  12 "Aeterna Laboratories Inc"        729290  12 "Zentaris AG"                     791707
 13  13 "Akela Pharma Inc"               1033688  13 "Nventa Biopharmaceuticals Corp"  893508
 14  14 "Allergan Inc"                     19170  14 "EndoArt SA"                      856456
 15  15 "Alliance Pharmaceutical Corp"     19355  15 "Molecular Biosystems Inc"        382425
 16  16 "American Pharm Partners Inc"     823833  16 "American BioScience Inc"         762274
 17  17 "Amgen Inc"                       651581  17 "Abgenix Inc"                     755139
 18  17 "Amgen Inc"                       651581  18 "TULARIK INC"                     714795
 19  17 "Amgen Inc"                       651581  19 "Alantos Pharm Hldg Inc"          909253
 20  17 "Amgen Inc"                       651581  20 "IMMUNEX CORP"                    267870
end

2) Patent data for the target firms.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double patent float asscode str30 targetname
5714160 22715 "ALZA CORP"
5997527 22715 ""         
6465008 22715 ""         
7390780 22715 ""         
6276512 22715 "ALZA CORP"
6153678 22715 "ALZA CORP"
6195582 22715 ""         
6283953 22715 ""         
7338663 22715 ""         
5698119 22715 "ALZA CORP"
6975902 22715 ""         
6749575 22715 ""         
8383149 22715 ""         
6066619 22715 ""         
6508808 22715 "ALZA CORP"
6057374 22715 "ALZA CORP"
6764697 22715 "ALZA CORP"
5985324 22715 "ALZA CORP"
6224908 22715 "ALZA CORP"
6974589 22715 ""         
end

3) Patent data for the acquiring firms in the dataset.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double patent float asscode str30 acquirorname_compu
7365193 735 ""                   
6693136 735 "ABBOTT LABORATORIES"
8213001 735 "ABBOTT LABORATORIES"
6207380 735 "ABBOTT LABORATORIES"
7037927 735 "ABBOTT LABORATORIES"
7897332 735 ""                   
8071145 735 ""                   
7108710 735 "ABBOTT LABORATORIES"
6579882 735 "ABBOTT LABORATORIES"
7014864 735 ""                   
8455212 735 ""                   
8338466 735 ""                   
5883236 735 "ABBOTT LABORATORIES"
6379671 735 ""                   
6093179 735 "ABBOTT LABORATORIES"
6521790 735 "ABBOTT LABORATORIES"
6596302 735 "ABBOTT LABORATORIES"
8137618 735 "ABBOTT LABORATORIES"
8480691 735 ""                   
6437106 735 ""                   
end

The datasets 2 and 3 are just snippets of the patent data. Each target and acquiring firm have a compustat name and unique assignee code (asscode). I guess I have to do a merge twice, once by ass code, and once by name, then I want to remove the duplicates.

My aim is to have the M&A deal dataset (1) with all the patents of the acquiring and target firm (after that I plan to exploit the duplicates to see the overlaps). For example, all the patents corresponding to acquirorname = "ABBOT LABORATORIES" as well as all the patents acquirorass = 735 from dataset 3), as well as all the patents from dataset 2) corresponding to targetname = "KOS PHARMACEUTICALS INC" and targetass = "695573". There are more variables not in the dataex snippet from dataset 2) and 3) (such as number of forward citations, date, etc) but I left those out for simplicity sake.

I generated an id_deal, and also a unique ID for each acquiror (id_acquiror) as well as an id for the target id_target. I wasn't sure how to proceed from here.

Any help would be greatly appreciated!

Tags: None

Chris Parfitt

Join Date: Jun 2018

Posts: 24
#2

23 Jul 2018, 19:03

If there are any issues with my question, please let me know
Comment
Carole J. Wilson

Join Date: Jan 2015

Posts: 932
#3

23 Jul 2018, 19:14

I'm having a hard time conceptualizing what the final dataset would look like. For example, how would it be organized for deal==1 (which variables would you have and how would it be organized)?

Stata/MP 14.1 (64-bit x86-64)
Revision 19 May 2016
Win 8.1
1 like
Comment

Daniel Bela

Join Date: Apr 2014
Posts: 246

24 Jul 2018, 02:52

Hi,

I second Carole's point: There is several possible variants for the final data layout; my preferred one would be a long layout (and, from what you write, I assume that this also is what you're interested in) reflecting one observation per target-patent/acquiror-patent combination per deal.

If this assumption is correct, you can achieve this faster with -joinby- instead of -merge-:

Code:

/* create temporary dataset with the example data */
tempfile deals targets acquirors
clear
input float(id_deal id_acquiror) str30 acquirorname_compu long acquirorass float id_target str46 targetname long targetass
  1   1 "ABBOTT LABORATORIES"                735   1 "KOS PHARMACEUTICALS INC"         695573
  2   1 "ABBOTT LABORATORIES"                735   2 "Solvay Pharmaceuticals SA"       734307
  3   3 "ASTRAZENECA PLC"                 770646   3 "Novexel SA"                      896419
  4   3 "ASTRAZENECA PLC"                 770646   4 "KuDOS Pharmaceuticals"           783835
  5   3 "ASTRAZENECA PLC"                 770646   5 "Cambridge Antibody Tech Grp"     739243
  6   3 "ASTRAZENECA PLC"                 770646   6 "NPS PHARMACEUTICALS INC"         719353
  7   3 "ASTRAZENECA PLC"                 770646   7 "MEDIMMUNE INC"                   773774
  8   3 "ASTRAZENECA PLC"                 770646   8 "Arrow Therapeutics Ltd"          801361
  9   9 "Abraxis BioScience Inc"          918063   9 "Shimoda Biotech (Pty) Ltd"       862380
 10  10 "Actavis Group hf"                933768  10 "Zhejiang Chiral Medicine Chem"   835576
 11  10 "Actavis Group hf"                933768  11 "Sindan"                         1026564
 12  12 "Aeterna Laboratories Inc"        729290  12 "Zentaris AG"                     791707
 13  13 "Akela Pharma Inc"               1033688  13 "Nventa Biopharmaceuticals Corp"  893508
 14  14 "Allergan Inc"                     19170  14 "EndoArt SA"                      856456
 15  15 "Alliance Pharmaceutical Corp"     19355  15 "Molecular Biosystems Inc"        382425
 16  16 "American Pharm Partners Inc"     823833  16 "American BioScience Inc"         762274
 17  17 "Amgen Inc"                       651581  17 "Abgenix Inc"                     755139
 18  17 "Amgen Inc"                       651581  18 "TULARIK INC"                     714795
 19  17 "Amgen Inc"                       651581  19 "Alantos Pharm Hldg Inc"          909253
 20  17 "Amgen Inc"                       651581  20 "IMMUNEX CORP"                    267870
end
save `deals'

clear
input double patent float asscode str30 targetname
5714160 22715 "ALZA CORP"
5997527 22715 ""        
6465008 22715 ""        
7390780 22715 ""        
6276512 22715 "ALZA CORP"
6153678 22715 "ALZA CORP"
6195582 22715 ""        
6283953 22715 ""        
7338663 22715 ""        
5698119 22715 "ALZA CORP"
6975902 22715 ""        
6749575 22715 ""        
8383149 22715 ""        
6066619 22715 ""        
6508808 22715 "ALZA CORP"
6057374 22715 "ALZA CORP"
6764697 22715 "ALZA CORP"
5985324 22715 "ALZA CORP"
6224908 22715 "ALZA CORP"
6974589 22715 ""        
end
save `targets'

clear
input double patent float asscode str30 acquirorname_compu
7365193 735 ""                  
6693136 735 "ABBOTT LABORATORIES"
8213001 735 "ABBOTT LABORATORIES"
6207380 735 "ABBOTT LABORATORIES"
7037927 735 "ABBOTT LABORATORIES"
7897332 735 ""                  
8071145 735 ""                  
7108710 735 "ABBOTT LABORATORIES"
6579882 735 "ABBOTT LABORATORIES"
7014864 735 ""                  
8455212 735 ""                  
8338466 735 ""                  
5883236 735 "ABBOTT LABORATORIES"
6379671 735 ""                  
6093179 735 "ABBOTT LABORATORIES"
6521790 735 "ABBOTT LABORATORIES"
6596302 735 "ABBOTT LABORATORIES"
8137618 735 "ABBOTT LABORATORIES"
8480691 735 ""                  
6437106 735 ""                  
end
save `acquirors'

* open deals
use `deals'

* rename id variable
rename acquirorass asscode
* join with acquirors' patent data
joinby asscode using `acquirors' , unmatched(master)
drop _merge
* re-rename id variable and merged patent number
rename (patent asscode) (acquiror_patent acquirorass)

* rename id variable
rename targetass asscode
* join with targets' patent data
joinby asscode using `targets' , unmatched(master)
drop _merge
* re-rename id variable and merged patent number
rename (patent asscode) (target_patent targetass)

Note that your sample data does not contain a match for the targets' id variable; thus, no target patent information can be joined. If you search and replace the id value 22715 with 695573 within the sample data, you can inspect how a matched result looks like for deal 1.

Secondly, my code assumes that the identifier for a firm is its asscode – despite the fact that the sample data does not contain consistent firm names for all of the ids (because of empty firm names). I don't know if my approach is appropriate for the full dataset.

Kind regards
Bela

Last edited by Daniel Bela; 24 Jul 2018, 03:00. Reason: formatting, typos

Comment

Chris Parfitt

Join Date: Jun 2018
Posts: 24

24 Jul 2018, 05:50

Hi Carole and Daniel,

Thanks for your responses. To clarify, I would ideally like a dataset formatted as follows.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(id_deal id_acquiror) str19 acquirorname int acquirorass long patent_acq str19 acquirorname_compu byte id_target str23 targetname long(targetass patent_tar) str23 targetname_compu
1 1 "ABBOTT LABORATORIES" 735 7365193 ""                    1 "KOS PHARMACEUTICALS INC" 695573 6491950 "KOS PHARMACEUTICALS INC"
1 1 "ABBOTT LABORATORIES" 735 6693136 "ABBOTT LABORATORIES" 1 "KOS PHARMACEUTICALS INC" 695573 6596308 "KOS PHARMACEUTICALS INC"
1 1 "ABBOTT LABORATORIES" 735 8213001 "ABBOTT LABORATORIES" 1 "KOS PHARMACEUTICALS INC" 695573 6406715 "KOS PHARMACEUTICALS INC"
1 1 "ABBOTT LABORATORIES" 735 6207380 "ABBOTT LABORATORIES" 1 "KOS PHARMACEUTICALS INC" 695573 7011848 "KOS PHARMACEUTICALS INC"
1 1 "ABBOTT LABORATORIES" 735 7037927 "ABBOTT LABORATORIES" 1 "KOS PHARMACEUTICALS INC" 695573 7056957 ""                       
1 1 "ABBOTT LABORATORIES" 735 7897332 ""                    1 "KOS PHARMACEUTICALS INC" 695573 6746691 "KOS PHARMACEUTICALS INC"
1 1 "ABBOTT LABORATORIES" 735 8071145 ""                    1 "KOS PHARMACEUTICALS INC" 695573 6524615 "KOS PHARMACEUTICALS INC"
1 1 "ABBOTT LABORATORIES" 735 7108710 "ABBOTT LABORATORIES" 1 "KOS PHARMACEUTICALS INC" 695573 6155251 ""                       
1 1 "ABBOTT LABORATORIES" 735 6579882 "ABBOTT LABORATORIES" 1 "KOS PHARMACEUTICALS INC" 695573 6676967 "KOS PHARMACEUTICALS INC"
1 1 "ABBOTT LABORATORIES" 735 7014864 ""                    1 "KOS PHARMACEUTICALS INC" 695573 7056494 ""                       
1 1 "ABBOTT LABORATORIES" 735 8455212 ""                    1 "KOS PHARMACEUTICALS INC" 695573 6818229 "KOS PHARMACEUTICALS INC"
1 1 "ABBOTT LABORATORIES" 735 8338466 ""                    1 "KOS PHARMACEUTICALS INC" 695573 6020305 "KOS PHARMACEUTICALS INC"
1 1 "ABBOTT LABORATORIES" 735 5883236 "ABBOTT LABORATORIES" 1 "KOS PHARMACEUTICALS INC" 695573 5855884 "KOS PHARMACEUTICALS INC"
1 1 "ABBOTT LABORATORIES" 735 6379671 ""                    . "."                            .       . ""                       
1 1 "ABBOTT LABORATORIES" 735 6093179 "ABBOTT LABORATORIES" . "."                            .       . ""                       
1 1 "ABBOTT LABORATORIES" 735 6521790 "ABBOTT LABORATORIES" . "."                            .       . ""                       
1 1 "ABBOTT LABORATORIES" 735 6596302 "ABBOTT LABORATORIES" . "."                            .       . ""                       
1 1 "ABBOTT LABORATORIES" 735 8137618 "ABBOTT LABORATORIES" . "."                            .       . ""                       
1 1 "ABBOTT LABORATORIES" 735 8480691 ""                    . "."                            .       . ""                       
1 1 "ABBOTT LABORATORIES" 735 6437106 ""                    . "."                            .       . ""                       
1 1 "ABBOTT LABORATORIES" 735 6471436 ""                    . "."                            .       . ""                       
1 1 "ABBOTT LABORATORIES" 735 6723844 "ABBOTT LABORATORIES" . "."                            .       . ""                       
1 1 "ABBOTT LABORATORIES" 735 7939544 "ABBOTT LABORATORIES" . "."                            .       . ""                       
1 1 "ABBOTT LABORATORIES" 735 6377894 ""                    . "."                            .       . ""                       
1 1 "ABBOTT LABORATORIES" 735 7786153 ""                    . "."                            .       . ""                       
1 1 "ABBOTT LABORATORIES" 735 5998593 "ABBOTT LABORATORIES" . "."                            .       . ""                       
1 1 "ABBOTT LABORATORIES" 735 5990135 "ABBOTT LABORATORIES" . "."                            .       . ""                       
1 1 "ABBOTT LABORATORIES" 735 8509870 ""                    . "."                            .       . ""                       
1 1 "ABBOTT LABORATORIES" 735 7015233 "ABBOTT LABORATORIES" . "."                            .       . ""                       
1 1 "ABBOTT LABORATORIES" 735 5698522 "ABBOTT LABORATORIES" . "."                            .       . ""                       
end

That would be the example for one deal (ABBOT AND KOS PHARMACEUTICAL.) I renamed some variables to uniquely identify patent_acquirer and patent_target.

Does that help to clarify?

Comment

Daniel Bela

Join Date: Apr 2014

Posts: 246
#6

24 Jul 2018, 06:14

Originally posted by Chris Parfitt View Post

Hi Carole and Daniel,
[…]
Does that help to clarify?

Not to me, no. I'm sorry if I'm dull here.

Again: Your sample data does not contain a single patent for "KOS PHARMACEUTICALS INC" (targetass==695573). Where should the information come from?

And why are there empty strings for some of the observations in targetname_compu and acquirorname_compu?

Are the "unmatched" observations (in rows 14 through 30) in your results data the patents that originate from the acquirors' patents dataset? If so, why are there only 17 of them, despite 20 patents for "ABBOTT LABORATORIES" in the acquirors' patents sample data?

And finally (this, I guess, is the most crucial question to solve my confusion): What's the matching logic between the patents? Why, in the first observation, is acquiror's patent 7365193 matched to target's patent 6491950, and not to any other of the patents?

Kind regards
Bela
Comment

Chris Parfitt

Join Date: Jun 2018
Posts: 24

24 Jul 2018, 07:06

Hi Bela,

Sorry if I explained poorly.

KOS PHARMACEUTICALS is a target firm, and it's corresponding variables (targetass, name, etc) are in the 2) dataset, just like ALZA CORP.

There are empty strings because before obtaining the dataset, I merged two similar datasets by assignee code, but one didn't have the acquirorname_compu and targetname_compu, while the other did. The empty cells are when there is a patent which was in the dataset without the acquirorname_compu and targetname_compu variables. After removing the duplicates, I got dataset 2)

KOS PHARMACEUTICALS has unmatched patents because they only have 13 patents recorded (matched by name/assignee code) while ABBOTT have a lot more (thousands, but I didn't list them all)

As to answer your last question, I don't neccessarily want to match each patent to each other. I just want a list of all patents of ABBOT, and a list of all patents of KOS PHARMACEUTICALS. From here, the next stage in my research is to consider things like the quantity of overlapping patent by class.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(id_deal id_acquiror) str19 acquirorname int acquirorass long patent_acq int class_acq str19 acquirorname_compu byte id_target str23 targetname long(targetass patent_tar) int class_tar str23 targetname_compu
1 1 "ABBOTT LABORATORIES" 735 7365193 540 ""                    1 "KOS PHARMACEUTICALS INC" 695573 6491950 424 "KOS PHARMACEUTICALS INC"
1 1 "ABBOTT LABORATORIES" 735 6693136 514 "ABBOTT LABORATORIES" 1 "KOS PHARMACEUTICALS INC" 695573 6596308 424 "KOS PHARMACEUTICALS INC"
1 1 "ABBOTT LABORATORIES" 735 8213001 356 "ABBOTT LABORATORIES" 1 "KOS PHARMACEUTICALS INC" 695573 6406715 424 "KOS PHARMACEUTICALS INC"
1 1 "ABBOTT LABORATORIES" 735 6207380 435 "ABBOTT LABORATORIES" 1 "KOS PHARMACEUTICALS INC" 695573 7011848 424 "KOS PHARMACEUTICALS INC"
1 1 "ABBOTT LABORATORIES" 735 7037927 514 "ABBOTT LABORATORIES" 1 "KOS PHARMACEUTICALS INC" 695573 7056957 521 ""                       
1 1 "ABBOTT LABORATORIES" 735 7897332 435 ""                    1 "KOS PHARMACEUTICALS INC" 695573 6746691 424 "KOS PHARMACEUTICALS INC"
1 1 "ABBOTT LABORATORIES" 735 8071145 426 ""                    1 "KOS PHARMACEUTICALS INC" 695573 6524615 424 "KOS PHARMACEUTICALS INC"
1 1 "ABBOTT LABORATORIES" 735 7108710 606 "ABBOTT LABORATORIES" 1 "KOS PHARMACEUTICALS INC" 695573 6155251 128 ""                       
1 1 "ABBOTT LABORATORIES" 735 6579882 514 "ABBOTT LABORATORIES" 1 "KOS PHARMACEUTICALS INC" 695573 6676967 424 "KOS PHARMACEUTICALS INC"
1 1 "ABBOTT LABORATORIES" 735 7014864 424 ""                    1 "KOS PHARMACEUTICALS INC" 695573 7056494 424 ""                       
1 1 "ABBOTT LABORATORIES" 735 8455212 435 ""                    1 "KOS PHARMACEUTICALS INC" 695573 6818229 424 "KOS PHARMACEUTICALS INC"
1 1 "ABBOTT LABORATORIES" 735 8338466 514 ""                    1 "KOS PHARMACEUTICALS INC" 695573 6020305 424 "KOS PHARMACEUTICALS INC"
1 1 "ABBOTT LABORATORIES" 735 5883236 536 "ABBOTT LABORATORIES" 1 "KOS PHARMACEUTICALS INC" 695573 5855884 424 "KOS PHARMACEUTICALS INC"
1 1 "ABBOTT LABORATORIES" 735 6379671 424 ""                    . "."                            .       .   . ""                       
1 1 "ABBOTT LABORATORIES" 735 6093179 604 "ABBOTT LABORATORIES" . "."                            .       .   . ""                       
1 1 "ABBOTT LABORATORIES" 735 6521790 564 "ABBOTT LABORATORIES" . "."                            .       .   . ""                       
1 1 "ABBOTT LABORATORIES" 735 6596302 424 "ABBOTT LABORATORIES" . "."                            .       .   . ""                       
1 1 "ABBOTT LABORATORIES" 735 8137618 422 "ABBOTT LABORATORIES" . "."                            .       .   . ""                       
1 1 "ABBOTT LABORATORIES" 735 8480691 606 ""                    . "."                            .       .   . ""                       
1 1 "ABBOTT LABORATORIES" 735 6437106 536 ""                    . "."                            .       .   . ""                       
1 1 "ABBOTT LABORATORIES" 735 6471436 403 ""                    . "."                            .       .   . ""                       
1 1 "ABBOTT LABORATORIES" 735 6723844 540 "ABBOTT LABORATORIES" . "."                            .       .   . ""                       
1 1 "ABBOTT LABORATORIES" 735 7939544 514 "ABBOTT LABORATORIES" . "."                            .       .   . ""                       
1 1 "ABBOTT LABORATORIES" 735 6377894 702 ""                    . "."                            .       .   . ""                       
1 1 "ABBOTT LABORATORIES" 735 7786153 514 ""                    . "."                            .       .   . ""                       
1 1 "ABBOTT LABORATORIES" 735 5998593 536 "ABBOTT LABORATORIES" . "."                            .       .   . ""                       
1 1 "ABBOTT LABORATORIES" 735 5990135 514 "ABBOTT LABORATORIES" . "."                            .       .   . ""                       
1 1 "ABBOTT LABORATORIES" 735 8509870 600 ""                    . "."                            .       .   . ""                       
1 1 "ABBOTT LABORATORIES" 735 7015233 514 "ABBOTT LABORATORIES" . "."                            .       .   . ""                       
1 1 "ABBOTT LABORATORIES" 735 5698522 514 "ABBOTT LABORATORIES" . "."                            .       .   . ""                       
end

I included the class of the patent (taken from my patent datasets). The next step of my research is to consider the number of patents KOS has in this example which have the same class as ABBOTT (I would look at the number of duplicating).

Example:

KOS PHARMACEUTICALS has
11 patents where class_tar = 424
1 patent where class_tar = 521
1 patent where class_tar = 128

Then I check if ABBOTT has any patents where class_acq = 424 or 521 or 128.
I see that they do indeed have at least one patent where class_acq = 424 (the number that the acquirer has is not important). However, they don't have a patent where class_acq = 521 or class_acq = 128.

If there is an "overlap" in the class_tar and class_acq, then I say there is a "knowledge overlap". If not, there is a "knowledge nonoverlap"
Hence, 11 patents of KOS are knowledge overlapped, and 2 patents are knowledge nonoverlapped.

Does this clear things up?
Thanks!
Chris

Comment

Daniel Bela

Join Date: Apr 2014
Posts: 246

26 Jul 2018, 10:02

I don't see where the data structure you're presenting helps you calculating what you want to know. Matching observations "horizontally" is not the usual way Stata performs best. A long data layout seems more appropriate, I guess.

Additionally, writing this took me some time, as you (even upon request) did not present data examples that help to produce the structure you're up to. Also, the late appearance of the variables you're interested in (the patents' classification variables) made this whole effort a little cumbersome. This (or my resulting lack of motivation) is why it took me so long to write this answer, and, supposedly, explains why no other answers have been given to your inquiry. Sorry about that.

Thus said, I created a minimal example that contains all features of your data that I (by now) am aware of; for ease of presentation, it contains two deals only, with three involved firms. Each of the firms owns five patents. The first acquired firm contains two patents with overlapping patent classes to the acquiror's portfolio, the second one contains four of them.

Given my data sample's structure matches your real data, this is how to calculate the share you're interested in:

Code:

/* create temporary datasets with the example data */
tempfile deals targets acquirors joined_acquirors
clear
input float(id_deal id_acquiror) str30 acquirorname_compu long acquirorass float id_target str46 targetname long targetass
 1 1 "ABBOTT LABORATORIES" 735 1 "KOS PHARMACEUTICALS INC" 695573
 2 1 "ABBOTT LABORATORIES" 735 2 "Solvay Pharmaceuticals SA" 734307
end
save `deals'

clear
input double patent float asscode str30 targetname class
5714160 695573 "KOS PHARMACEUTICALS INC" 1
5997527 695573 "" 1
6465008 695573 "" 6
7390780 695573 "" 7
6276512 695573 "KOS PHARMACEUTICALS INC" 8
6714160 734307 "Solvay Pharmaceuticals SA" 3
6997527 734307 "" 3
7465008 734307 "Solvay Pharmaceuticals SA" 3
8390780 734307 "" 9
7276512 734307 "Solvay Pharmaceuticals SA" 10
end
save `targets'

clear
input double patent float asscode str30 acquirorname_compu class
7365193 735 "" 1
6693136 735 "ABBOTT LABORATORIES" 1
8213001 735 "ABBOTT LABORATORIES" 3
6207380 735 "ABBOTT LABORATORIES" 10
7037927 735 "ABBOTT LABORATORIES" 3
end
save `acquirors'

/* procedure starts here */
clear
* open the deals daaset
use `deals'
* generate an (empty) indicator to reflect the patent type (pre-owned vs acquired)
generate byte patenttype=.
label define type_lbl 1 "acquiror's patent" 2 "target's patent"
label values patenttype type_lbl

* join all acquirors' patents from the aquired-patents-file , temporarily save
preserve
rename acquirorass asscode
joinby asscode using `acquirors' , unmatched(master)
rename asscode acquirorass
replace patenttype=1 if (_merge==3)
drop _merge
save `joined_acquirors'
restore

* join all targets' patents from the target-patents-file
rename targetass asscode
joinby asscode using `targets' , unmatched(master)
rename asscode targetass
replace patenttype=2 if (_merge==3)
drop _merge

* append both joined files together
append using `joined_acquirors'

* calculate number of overlapping patent classes per id_deal
tempvar acquired_num overlapping overlaps overlapshare
bysort id_deal (class patenttype) :  generate `overlapping'=(class==class[_n-1]) if (patenttype==2)
bysort id_deal patenttype : generate `acquired_num'=_N if (patenttype==2)
bysort id_deal : egen `overlaps'=total(`overlapping') if (patenttype==2)
generate `overlapshare'=`overlaps'/`acquired_num' if (patenttype==2)
bysort id_deal : egen deal_overlap=mode(`overlapshare')
drop `acquired_num' `overlapping' `overlaps' `overlapshare'

* re-sort and view (better not to do this on the full data, maybe?)
sort id_deal id_acquiror id_target patenttype
compress
list , sepby(id_deal)

* if you want to reduce your data back to the information on deals, eventually:
keep id_deal id_acquiror acquirorname_compu acquirorass id_target targetname targetass deal_overlap
duplicates drop
list

Especially the part that calculates the share of "overlapping" patents is probably not written in the most efficient way. Depending on the size of your data, it may need some time to execute, if used unmodified.

I have the feeling that all of this has to be expanded to include some time indicator. I assume the deals that are documented in your data did not take place simultaneously. This would mean that the set of patents in possession of acquiror 1 will increase after deal 1, changing the base line portfolio for calculation overlaps in deal 2. If this is true, there has to be some kind of time or date measure that should be taken into account. This makes the whole -joinby- operation a little (but not necessarily much) more complex.

Kind regards
Bela

Announcement

Merging data by matching

Comment

Comment

Comment

Comment

Comment

Comment

Comment