Transforming Datastream outputs to panel data

Stijn Groeningen

Join Date: Nov 2015
Posts: 29

Transforming Datastream outputs to panel data

29 May 2018, 00:40

Dear all,

I am researching M&A, in order to do so I need to calculate abnormal returns based on stock returns. I retrieved the M&A transactions from SDC Platinum and the corresponding stock prices from Datastream. The identifier used is "Datastream Code". My data sets to merge into panel data are shaped as follows:
SDC output (example):

Date announced	Target Name	Datastream Code	Deal Value	Target Advisor
1/1/1992	ABC	123456A	1000	BAML
5/2/1995	DEF	123456B	1000	Credit Suisse
10/5/1997	GHI	123456C	1000	JPMorgan
6/20/2001	JKL	123456D	1000	Houlihan Lokey

Datastream output (example):

	ABC	DEF	GHI	JKL
Date	123456A	123456B	123456C	123456D
1/1/1992	1	2.25	1.17	23.57
1/2/1992	1.12	2.23	1.25	22.52
x	x	x	x	x
20/5/2018	2.68	12.13	NA	38.59

Is any one aware of a way to merge the two data sets into on panel data set? For each company, I wish to retrieve the stock prices for the 200 days prior to the announcement and 20 days thereafter.

Thanks in advance!

Best,
Stijn

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 30063
#2

29 May 2018, 09:12

The examples you show are not possible Stata data sets because the variable names would be illegal. It is premature to ask for help with code when you have not even taken the initial step of importing your data into Stata, as the solution you require may well depend on the details of how you import it. That said, I will assume that when you bring the second data set the first row of that table will come in as the names of the variables, except in the first column which I will assume is called var1. The task then is to convert this into a panel data set that you can merge with the first data set. The following code will more or less work:

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input str9 var1 str7(ABC DEF GHI JKL) "Date" "123456A" "123456B" "123456C" "123456D" "1/1/1992" "1" "2.25" "1.17" "23.57" "1/2/1992" "1.12" "2.23" "1.25" "22.52" "20/5/2018" "2.68" "12.13" "NA" "38.59" end // PUT THE DATASTREAM CODE FOUND IN OBS 1 // INTO THE VARIABLE NAME AND MAKE THE VARIABLE // NAMES SUITABLE FOR RESHAPE ds var1, not rename (`r(varlist)') _= foreach v of varlist _* { local newname `v'___`=`v'[1]' rename `v' `newname' } drop in 1 // CONVERT STRING DATA TO NUMERIC destring _*, replace ignore("NA") gen date = daily(var1, "DMY") assert !missing(date) format date %td drop var1 isid date // NOW RESHAPE reshape long _, i(date) j(target_code) string rename _ return // SORT OUT TARGET NAME AND DATASTREAM CODE gen int location = strpos(target_code, "___") gen target_name = substr(target_code, 1, location-1) gen datastream_code = substr(target_code, location+3, .) drop target_code

I say more or less because it is likely that the values of target code are not really "ABC" etc. More likely they are actual names of firms that contain embedded spaces and other illegal characters, like "Microsoft, Inc." In the course of importing that second dataset, those will be transformed into legal Stata variable names, like "Microsoft__Inc_" In the above code, I have made the assumption that the real names will never include three or more consecutive characters that are not legal for variable names, so I use ___ to separate the target name from the datastream code (in two places in the code, shown in red. If that is not true, then you may need to replace ___ by a longer sequence of underscores.

The other issue you will face is that when you import the first data set, because it has a more civilized layout, the target names will still contain the characters that would be illegal in variable names, so they will not match with the values of the target_name variable created in the second data set by the above code. The fix for that is to apply the strtoname() function to the target name variable in the first data set before trying to -merge-. See -help strtoname()- if you are not familiar with it.

In the future, when you want help with code, please show examples from actual Stata data sets (unless you are specifically asking for help on how to import a non-Stata data set into Stata). Just "seeing" the data is often insufficient: the correct code often depends on meta-data that cannot be gleaned from a visual. To show examples of Stata data, please use the -dataex- command, as I have done in my code. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example Stata data. When showing example data, always use -dataex-.
Comment
Stijn Groeningen

Join Date: Nov 2015

Posts: 29
#3

30 May 2018, 02:53

Dear Mr. Schechter,

Thank you for the response. I put the tables there as an indication of the data set. Thank you for the advise on how to ask questions in the future.

Best,
Stijn
Comment

Announcement

Transforming Datastream outputs to panel data

Comment

Comment