Question on merge results not consistent

Salem Alsanousi

Join Date: Apr 2021

Posts: 11
#1

Question on merge results not consistent

28 Apr 2021, 18:15

Hi Statalist community,

I am trying to merge, but every time I run it using the same do.file I get different results for the matched observations.

Code:

merge 1:1 permno event event_count using "04. Cleanfiles\04A. Eventus_CARs_clean.dta"

where event is the date of announcement of the merge/acquisition and event_count is basically a variable created to capture if there is more than one observation in events.

Basically, this is the code for event_count

Code:

bysort permno event: gen event_count = _n

Code:

* Example generated by -dataex-. For more info, type help dataex clear input double permno float(event event_count) 12236 17356 1 12308 19396 1 12308 20178 1 12308 20234 1 12330 18638 1 12330 19758 1 12431 14304 1 12431 14522 1 12431 14522 2 12431 14742 1 12431 17517 1 12448 19299 1 12449 19947 1 12477 19152 1 12490 13921 1 12490 14437 1 end format %d event

This is the first time I get inconsistent results from the merge command, and I am not sure how to troubleshoot it.

I am using Stata 16.1
Tags: None
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

28 Apr 2021, 18:39

event_count is basically a variable created to capture if there is more than one observation in events

However,

Code:

bysort permno event: gen event_count = _n

doesn't guarantee that for two observations with the same permno and event, the same observation will be 1 each time you run the command. The data is sorted, and ties are broken randomly.

What you are telling us is that each of your datasets may have multiple observations for a given combination of permno and event. Numbering them randomly to make the distinct is not a very meaningful way of matching the events.

Without knowing more about the problem you are working on, it's difficult to advise, but perhaps you want the joinby command.
1 like
Comment
Salem Alsanousi

Join Date: Apr 2021

Posts: 11
#3

28 Apr 2021, 23:55

Hi William Lisowski ,

Thank you for your response. I believe I found the culprit that is causing the problem. It started earlier in my merging, but now I have to figure out what is causing the problem.
Comment

Salem Alsanousi

Join Date: Apr 2021
Posts: 11

04 May 2021, 14:30

I found the code that is causing the merge to be inconsistent but I am not able to solve it.

Step1

Code:

gen ncusip6 = AcquirorCUSIP
gen fyear=year(DateAnnounced) 
gen USbased = 1 if AcquirorNation=="United States"
replace USbased = 0 if AcquirorNation!="United States"
drop if USbased ==0

sort ncusip6 fyear
bysort ncusip6 fyear: egen total_acq_num=count(USbased)
label variable total_acq_num "total number of acquistions for USbased acquiror"
bysort ncusip6 fyear: egen total_acq_mean=mean(ValueofTransactionmil)
label variable total_acq_mean "Mean value of Acquisition "
bysort ncusip6 fyear: egen total_acq_min=min(ValueofTransactionmil)
label variable total_acq_min "Min value of Acquisition "
bysort ncusip6 fyear: egen total_acq_max=max(ValueofTransactionmil)
label variable total_acq_max "Max value of Acquisition "
tempfile sdc123
save `sdc123', replace
clear

Step2:

Code:

use "01. Raw Data\stocknames_new.dta" 
gen ncusip6 = substr(ncusip,1,6)
bysort ncusip6: gen copies1 = _n
drop if copies1 >1
sort ncusip6
merge 1:m ncusip6 using `sdc123' // firm-event. Here no issues whatsoever on merge results. They are consistent
keep if _merge==3
gen cusip6 = substr(cusip,1,6)
drop _merge
sort cusip6 fyear
tempfile stock_SDC
save `stock_SDC', replace
clear

Step3:

Code:

use "04. Cleanfiles\03. Compustat-ExecuComp-TR-Bushee-Merge.dta" // firm-year
gen cusip6=substr(cusip,1,6)
label variable cusip6 "6 digit CUSIP"
sort cusip6 fyear
sort cusip6 cusip8 fyear 
by cusip6: gen cusip6_rep = _n
sort cusip8 cusip6 fyear
by cusip8: gen cusip8_rep = _n
drop if cusip_identical ==.
sort cusip6 fyear
merge 1:m cusip6 fyear using `stock_SDC'

What I mean by inconsistent is that the _merge==3 results are not the same every time I run the code.

I would appreciate any help to overcome this

Thank you

Comment

William Lisowski

Join Date: Dec 2014

Posts: 10150
#5

04 May 2021, 14:52

In step 2

Code:

bysort ncusip6: gen copies1 = _n drop if copies1 >1

results in dropping all but the first observation for each value of ncusip6, However, the output of help sort tells us

the ordering of observations with equal values of varlist is randomized

so on two successive runs, the retained observation for a given ncusip6 will perhaps be different. Apparently, for observations in "01. Raw Data\stocknames_new.dta" with the same value of ncusip6, some of the variables may have different values. This would seem to me to be a problem.

If you want to drop all but the earliest observation in the dataset with each value of ncusip6, change the above lines to

Code:

sort ncusip6, stable by ncusip6: gen copies1 = _n drop if copies1 >1
1 like
Comment
Salem Alsanousi

Join Date: Apr 2021

Posts: 11
#6

04 May 2021, 15:25

William Lisowski

Thank you! that works like a charm! I really appreciate it!
Comment

Announcement

Question on merge results not consistent

Comment

Comment

Comment

Comment

Comment