Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Question on merge results not consistent

    Hi Statalist community,

    I am trying to merge, but every time I run it using the same do.file I get different results for the matched observations.

    Code:
    merge 1:1 permno event event_count using "04. Cleanfiles\04A. Eventus_CARs_clean.dta"
    where event is the date of announcement of the merge/acquisition and event_count is basically a variable created to capture if there is more than one observation in events.

    Basically, this is the code for event_count
    Code:
    bysort permno event: gen event_count = _n
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double permno float(event event_count)
    12236 17356 1
    12308 19396 1
    12308 20178 1
    12308 20234 1
    12330 18638 1
    12330 19758 1
    12431 14304 1
    12431 14522 1
    12431 14522 2
    12431 14742 1
    12431 17517 1
    12448 19299 1
    12449 19947 1
    12477 19152 1
    12490 13921 1
    12490 14437 1
    end
    format %d event

    This is the first time I get inconsistent results from the merge command, and I am not sure how to troubleshoot it.

    I am using Stata 16.1

  • #2
    event_count is basically a variable created to capture if there is more than one observation in events
    However,
    Code:
    bysort permno event: gen event_count = _n
    doesn't guarantee that for two observations with the same permno and event, the same observation will be 1 each time you run the command. The data is sorted, and ties are broken randomly.

    What you are telling us is that each of your datasets may have multiple observations for a given combination of permno and event. Numbering them randomly to make the distinct is not a very meaningful way of matching the events.

    Without knowing more about the problem you are working on, it's difficult to advise, but perhaps you want the joinby command.

    Comment


    • #3
      Hi William Lisowski ,

      Thank you for your response. I believe I found the culprit that is causing the problem. It started earlier in my merging, but now I have to figure out what is causing the problem.

      Comment


      • #4
        I found the code that is causing the merge to be inconsistent but I am not able to solve it.

        Step1
        Code:
        gen ncusip6 = AcquirorCUSIP
        gen fyear=year(DateAnnounced) 
        gen USbased = 1 if AcquirorNation=="United States"
        replace USbased = 0 if AcquirorNation!="United States"
        drop if USbased ==0
        
        sort ncusip6 fyear
        bysort ncusip6 fyear: egen total_acq_num=count(USbased)
        label variable total_acq_num "total number of acquistions for USbased acquiror"
        bysort ncusip6 fyear: egen total_acq_mean=mean(ValueofTransactionmil)
        label variable total_acq_mean "Mean value of Acquisition "
        bysort ncusip6 fyear: egen total_acq_min=min(ValueofTransactionmil)
        label variable total_acq_min "Min value of Acquisition "
        bysort ncusip6 fyear: egen total_acq_max=max(ValueofTransactionmil)
        label variable total_acq_max "Max value of Acquisition "
        tempfile sdc123
        save `sdc123', replace
        clear
        Step2:
        Code:
        use "01. Raw Data\stocknames_new.dta" 
        gen ncusip6 = substr(ncusip,1,6)
        bysort ncusip6: gen copies1 = _n
        drop if copies1 >1
        sort ncusip6
        merge 1:m ncusip6 using `sdc123' // firm-event. Here no issues whatsoever on merge results. They are consistent
        keep if _merge==3
        gen cusip6 = substr(cusip,1,6)
        drop _merge
        sort cusip6 fyear
        tempfile stock_SDC
        save `stock_SDC', replace
        clear
        Step3:
        Code:
        use "04. Cleanfiles\03. Compustat-ExecuComp-TR-Bushee-Merge.dta" // firm-year
        gen cusip6=substr(cusip,1,6)
        label variable cusip6 "6 digit CUSIP"
        sort cusip6 fyear
        sort cusip6 cusip8 fyear 
        by cusip6: gen cusip6_rep = _n
        sort cusip8 cusip6 fyear
        by cusip8: gen cusip8_rep = _n
        drop if cusip_identical ==.
        sort cusip6 fyear
        merge 1:m cusip6 fyear using `stock_SDC'
        What I mean by inconsistent is that the _merge==3 results are not the same every time I run the code.

        I would appreciate any help to overcome this

        Thank you

        Comment


        • #5
          In step 2
          Code:
          bysort ncusip6: gen copies1 = _n
          drop if copies1 >1
          results in dropping all but the first observation for each value of ncusip6, However, the output of help sort tells us

          the ordering of observations with equal values of varlist is randomized
          so on two successive runs, the retained observation for a given ncusip6 will perhaps be different. Apparently, for observations in "01. Raw Data\stocknames_new.dta" with the same value of ncusip6, some of the variables may have different values. This would seem to me to be a problem.

          If you want to drop all but the earliest observation in the dataset with each value of ncusip6, change the above lines to
          Code:
          sort ncusip6, stable
          by ncusip6: gen copies1 = _n
          drop if copies1 >1

          Comment


          • #6
            William Lisowski

            Thank you! that works like a charm! I really appreciate it!

            Comment

            Working...
            X