Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Transforming Datastream outputs to panel data

    Dear all,

    I am researching M&A, in order to do so I need to calculate abnormal returns based on stock returns. I retrieved the M&A transactions from SDC Platinum and the corresponding stock prices from Datastream. The identifier used is "Datastream Code". My data sets to merge into panel data are shaped as follows:
    SDC output (example):
    Date announced Target Name Datastream Code Deal Value Target Advisor
    1/1/1992 ABC 123456A 1000 BAML
    5/2/1995 DEF 123456B 1000 Credit Suisse
    10/5/1997 GHI 123456C 1000 JPMorgan
    6/20/2001 JKL 123456D 1000 Houlihan Lokey
    Datastream output (example):
    ABC DEF GHI JKL
    Date 123456A 123456B 123456C 123456D
    1/1/1992 1 2.25 1.17 23.57
    1/2/1992 1.12 2.23 1.25 22.52
    x x x x x
    20/5/2018 2.68 12.13 NA 38.59
    Is any one aware of a way to merge the two data sets into on panel data set? For each company, I wish to retrieve the stock prices for the 200 days prior to the announcement and 20 days thereafter.

    Thanks in advance!

    Best,
    Stijn

  • #2
    The examples you show are not possible Stata data sets because the variable names would be illegal. It is premature to ask for help with code when you have not even taken the initial step of importing your data into Stata, as the solution you require may well depend on the details of how you import it. That said, I will assume that when you bring the second data set the first row of that table will come in as the names of the variables, except in the first column which I will assume is called var1. The task then is to convert this into a panel data set that you can merge with the first data set. The following code will more or less work:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str9 var1 str7(ABC DEF GHI JKL)
    "Date"      "123456A" "123456B" "123456C" "123456D"
    "1/1/1992"  "1"       "2.25"    "1.17"    "23.57"  
    "1/2/1992"  "1.12"    "2.23"    "1.25"    "22.52"  
    "20/5/2018" "2.68"    "12.13"   "NA"      "38.59"  
    end
    
    //    PUT THE DATASTREAM CODE FOUND IN OBS 1
    //    INTO THE VARIABLE NAME AND MAKE THE VARIABLE
    //    NAMES SUITABLE FOR RESHAPE
    ds var1, not
    rename (`r(varlist)') _=
    foreach v of varlist _* {
        local newname `v'___`=`v'[1]'
        rename `v' `newname'
    }
    drop in 1
    
    //    CONVERT STRING DATA TO NUMERIC
    destring _*, replace ignore("NA")
    gen date = daily(var1, "DMY")
    assert !missing(date)
    format date %td
    drop var1
    isid date
    
    //    NOW RESHAPE
    reshape long _, i(date) j(target_code) string
    rename _ return
    
    //    SORT OUT TARGET NAME AND DATASTREAM CODE
    gen int location = strpos(target_code, "___")
    gen target_name = substr(target_code, 1, location-1)
    gen datastream_code = substr(target_code, location+3, .)
    drop target_code
    I say more or less because it is likely that the values of target code are not really "ABC" etc. More likely they are actual names of firms that contain embedded spaces and other illegal characters, like "Microsoft, Inc." In the course of importing that second dataset, those will be transformed into legal Stata variable names, like "Microsoft__Inc_" In the above code, I have made the assumption that the real names will never include three or more consecutive characters that are not legal for variable names, so I use ___ to separate the target name from the datastream code (in two places in the code, shown in red. If that is not true, then you may need to replace ___ by a longer sequence of underscores.

    The other issue you will face is that when you import the first data set, because it has a more civilized layout, the target names will still contain the characters that would be illegal in variable names, so they will not match with the values of the target_name variable created in the second data set by the above code. The fix for that is to apply the strtoname() function to the target name variable in the first data set before trying to -merge-. See -help strtoname()- if you are not familiar with it.

    In the future, when you want help with code, please show examples from actual Stata data sets (unless you are specifically asking for help on how to import a non-Stata data set into Stata). Just "seeing" the data is often insufficient: the correct code often depends on meta-data that cannot be gleaned from a visual. To show examples of Stata data, please use the -dataex- command, as I have done in my code. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example Stata data. When showing example data, always use -dataex-.

    Comment


    • #3
      Dear Mr. Schechter,

      Thank you for the response. I put the tables there as an indication of the data set. Thank you for the advise on how to ask questions in the future.

      Best,
      Stijn

      Comment

      Working...
      X