Binary variable

Sebastian Grein

Join Date: Oct 2023

Posts: 28
#1

Binary variable

18 Oct 2023, 07:55

Hi,
my sample consists of M&A deals and peer companies which are not filtered yet. For each observation there is a unique identifier for the target company (Target6digitCUSIP) and the acquirer (a_cusip_six) of the deal as well as the year. The variable deal determines if the observation is a deal (1) or not (0). If it is not a deal there variable Target6digitCUSIP is empty.Since I want to proceed with Propensity Score Matching to create my final sample, I want to follow Beta and Li (2014) and exclude every company in year t which was part in a deal in the last 3 years, either as an acquirer or a target company.
I tried many formulas but none of them led to the correct result for the variable participated which should be 1 if the company participated in a deal and 0 otherwise.

Here is an excerpt of my data.

Target6digitCUSIP deal Year a_cusip_six participated
4C8250 1 2014 000307
7C7919 1 2015 000307
7C1485 1 2015 000307
5C7040 1 2015 000307
*empty* 0 2018 000307
*empty* 0 2021 00032Q
*empty* 0 2022 00032Q

Excuse me if I provided something incorrectly, it's my first time using the forum. Many thanks for your support in advance.

Best,
Sebastian

Last edited by Sebastian Grein; 18 Oct 2023, 08:02.
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30154
#2

18 Oct 2023, 09:10

Code:

* Example generated by -dataex-. For more info, type help dataex clear input str7 target6digitcusip byte deal int year str6 a_cusip_six "4C8250" 1 2014 "000307" "7C7919" 1 2015 "000307" "7C1485" 1 2015 "000307" "5C7040" 1 2015 "000307" "*empty*" 0 2018 "000307" "*empty*" 0 2021 "00032Q" "*empty*" 0 2022 "00032Q" end preserve rename target6digitcusip cusip1 rename a_cusip_six cusip2 gen `c(obs_t)' obs_no = _n reshape long cusip, i(obs_no) drop obs_no _j duplicates drop tempfile deals save `deals' restore gen low = year - 3 gen high = year - 1 clonevar cusip = target6digitcusip rangejoin year low high using `deals', by(cusip) drop if !missing(year_U) drop cusip *_U clonevar cusip = a_cusip_six rangejoin year low high using `deals', by(cusip) drop if !missing(year_U) drop cusip *_U

-rangejoin- is written by Robert Picard and is available from SSC. To use it, you must also install -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.

You do not define what "last three years" means: it could mean the current year and the two preceding, or the preceding year and the two years preceding that. I've assumed the latter. You will need to change the code if you meant otherwise.

In the future, when showing example data, please use the -dataex- command, as I have done here. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
Comment
Sebastian Grein

Join Date: Oct 2023

Posts: 28
#3

19 Oct 2023, 02:09

Hi Clyde, many thanks for your fast response. You assumed the right regarding the 3 years, I meant the 3 years prior to the deal.

Unfortunately, I get the error message 'no xij variables found' when trying to use the reshape command. I tried to understand by studying the command but I couldn't find an explanation.

Last edited by Sebastian Grein; 19 Oct 2023, 02:33.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30154
#4

19 Oct 2023, 08:59

I tested the code on your example data before posting it and it ran with no difficulty and produced a correct solution. So your example data is in some way not representative of your real data set. Please post a new data example that reproduces the problem you are having and I will troubleshoot.
Comment
Sebastian Grein

Join Date: Oct 2023

Posts: 28
#5

19 Oct 2023, 15:27

I want to provide a representative dataset but did not find a way to start the example data with a certain observation with the command dataex. Is there any way to start with a certain observation from which on you can specify the count() option? The first e.g. 100 observations are for example not representative and therefore I need to start from an observation which is representative.

Last edited by Sebastian Grein; 19 Oct 2023, 15:37.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30154
#6

19 Oct 2023, 16:08

Yes. Let's say you want to start from observation #249 and show 100 observations. You would issue the command:

Code:

dataex in 249/L, count(100)
Comment

Sebastian Grein

Join Date: Oct 2023
Posts: 28

20 Oct 2023, 02:46

This is a representative example, especially due to company 531229 being an acquirer and a target.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float Year str6(a_cusip_six Target6digitCUSIP) float deal
1999 "530370" ""       0
2000 "530370" "17881X" 1
1999 "530436" ""       0
2000 "530436" ""       0
2016 "53046P" ""       0
2017 "53046P" ""       0
2018 "53046P" ""       0
1999 "530582" ""       0
2000 "530582" ""       0
2001 "530582" ""       0
2002 "530582" ""       0
2003 "530582" ""       0
2008 "53071M" "15100A" 1
2018 "53115L" ""       0
2019 "53115L" ""       0
2020 "53115L" ""       0
2021 "53115L" ""       0
2022 "53115L" ""       0
2013 "531229" ""       0
2014 "531229" ""       0
2015 "531229" "531229" 1
2015 "531229" "531229" 1
2015 "531229" "531229" 1
2016 "531229" "24776M" 1
2016 "531229" "24776M" 1
2017 "531229" ""       0
2017 "531229" ""       0
2018 "531229" ""       0
2018 "531229" ""       0
2019 "531229" ""       0
2019 "531229" ""       0
2020 "531229" ""       0
2020 "531229" ""       0
2021 "531229" ""       0
2021 "531229" ""       0
2022 "531229" ""       0
2022 "531229" ""       0
2014 "531465" ""       0
2015 "531465" ""       0
2016 "531465" ""       0
2017 "531465" ""       0
2018 "531465" ""       0
2019 "531465" ""       0
2020 "531465" ""       0
2021 "531465" ""       0
2022 "531465" ""       0
2021 "53190C" ""       0
2022 "53190C" ""       0
1999 "531914" ""       0
2000 "531914" ""       0
2001 "531914" ""       0
2002 "531914" ""       0
2003 "531914" ""       0
2004 "531914" ""       0
2005 "531914" ""       0
2006 "531914" "42853H" 1
2007 "531914" ""       0
2008 "531914" ""       0
2009 "531914" "35800E" 1
2010 "531914" ""       0
2011 "531914" ""       0
2012 "531914" ""       0
2013 "531914" ""       0
2014 "531914" ""       0
2015 "531914" ""       0
2016 "531914" ""       0
2017 "531914" ""       0
2018 "531914" ""       0
2019 "531914" ""       0
2020 "531914" ""       0
2021 "531914" ""       0
2022 "531914" ""       0
1999 "531925" ""       0
2000 "531925" ""       0
1999 "531927" ""       0
2000 "531927" ""       0
2001 "531927" ""       0
2002 "531927" ""       0
2003 "531927" ""       0
2004 "531927" ""       0
end

Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30154

20 Oct 2023, 09:42

The only substantial difference between this example and the previous one that I can see is that the Year and Target6digitCUSIP variables are no longer in lower case. When I adapt the code accordingly (Stata variable names are case sensitive) it runs properly:

Code:

. * Example generated by -dataex-. For more info, type help dataex
. clear

. input float Year str6(a_cusip_six Target6digitCUSIP) float deal

          Year  a_cusip~x  Target6~P       deal
  1. 1999 "530370" ""       0
  2. 2000 "530370" "17881X" 1
  3. 1999 "530436" ""       0
  4. 2000 "530436" ""       0
  5. 2016 "53046P" ""       0
  6. 2017 "53046P" ""       0
  7. 2018 "53046P" ""       0
  8. 1999 "530582" ""       0
  9. 2000 "530582" ""       0
 10. 2001 "530582" ""       0
 11. 2002 "530582" ""       0
 12. 2003 "530582" ""       0
 13. 2008 "53071M" "15100A" 1
 14. 2018 "53115L" ""       0
 15. 2019 "53115L" ""       0
 16. 2020 "53115L" ""       0
 17. 2021 "53115L" ""       0
 18. 2022 "53115L" ""       0
 19. 2013 "531229" ""       0
 20. 2014 "531229" ""       0
 21. 2015 "531229" "531229" 1
 22. 2015 "531229" "531229" 1
 23. 2015 "531229" "531229" 1
 24. 2016 "531229" "24776M" 1
 25. 2016 "531229" "24776M" 1
 26. 2017 "531229" ""       0
 27. 2017 "531229" ""       0
 28. 2018 "531229" ""       0
 29. 2018 "531229" ""       0
 30. 2019 "531229" ""       0
 31. 2019 "531229" ""       0
 32. 2020 "531229" ""       0
 33. 2020 "531229" ""       0
 34. 2021 "531229" ""       0
 35. 2021 "531229" ""       0
 36. 2022 "531229" ""       0
 37. 2022 "531229" ""       0
 38. 2014 "531465" ""       0
 39. 2015 "531465" ""       0
 40. 2016 "531465" ""       0
 41. 2017 "531465" ""       0
 42. 2018 "531465" ""       0
 43. 2019 "531465" ""       0
 44. 2020 "531465" ""       0
 45. 2021 "531465" ""       0
 46. 2022 "531465" ""       0
 47. 2021 "53190C" ""       0
 48. 2022 "53190C" ""       0
 49. 1999 "531914" ""       0
 50. 2000 "531914" ""       0
 51. 2001 "531914" ""       0
 52. 2002 "531914" ""       0
 53. 2003 "531914" ""       0
 54. 2004 "531914" ""       0
 55. 2005 "531914" ""       0
 56. 2006 "531914" "42853H" 1
 57. 2007 "531914" ""       0
 58. 2008 "531914" ""       0
 59. 2009 "531914" "35800E" 1
 60. 2010 "531914" ""       0
 61. 2011 "531914" ""       0
 62. 2012 "531914" ""       0
 63. 2013 "531914" ""       0
 64. 2014 "531914" ""       0
 65. 2015 "531914" ""       0
 66. 2016 "531914" ""       0
 67. 2017 "531914" ""       0
 68. 2018 "531914" ""       0
 69. 2019 "531914" ""       0
 70. 2020 "531914" ""       0
 71. 2021 "531914" ""       0
 72. 2022 "531914" ""       0
 73. 1999 "531925" ""       0
 74. 2000 "531925" ""       0
 75. 1999 "531927" ""       0
 76. 2000 "531927" ""       0
 77. 2001 "531927" ""       0
 78. 2002 "531927" ""       0
 79. 2003 "531927" ""       0
 80. 2004 "531927" ""       0
 81. end

.
. preserve

. rename Target6digitCUSIP cusip1

. rename a_cusip_six cusip2

. gen `c(obs_t)' obs_no = _n

. reshape long cusip, i(obs_no)
(j = 1 2)

Data                               Wide   ->   Long
-----------------------------------------------------------------------------
Number of observations               80   ->   160         
Number of variables                   5   ->   5           
j variable (2 values)                     ->   _j
xij variables:
                          cusip1 cusip2   ->   cusip
-----------------------------------------------------------------------------

. drop obs_no _j

. duplicates drop

Duplicates in terms of all variables

(62 observations deleted)

. tempfile deals

. save `deals'
file C:\Users\clyde\AppData\Local\Temp\ST_5594_000002.tmp saved as .dta format

.
. restore

. gen low = Year - 3

. gen high = Year - 1

. clonevar cusip = Target6digitCUSIP
(71 missing values generated)

. rangejoin Year low high using `deals', by(cusip)
  (using rangestat version 1.1.1)

. drop if !missing(Year_U)
(183 observations deleted)

. drop cusip *_U

. clonevar cusip = a_cusip_six

. rangejoin Year low high using `deals', by(cusip)
  (using rangestat version 1.1.1)

. drop if !missing(Year_U)
(13 observations deleted)

. drop cusip *_U

The adapted code is

Code:

preserve
rename Target6digitCUSIP cusip1
rename a_cusip_six cusip2
gen `c(obs_t)' obs_no = _n
reshape long cusip, i(obs_no)
drop obs_no _j
duplicates drop
tempfile deals
save `deals'

restore
gen low = Year - 3
gen high = Year - 1
clonevar cusip = Target6digitCUSIP
rangejoin Year low high using `deals', by(cusip)
drop if !missing(Year_U)
drop cusip *_U
clonevar cusip = a_cusip_six
rangejoin Year low high using `deals', by(cusip)
drop if !missing(Year_U)
drop cusip *_U

though, again, the only change from before is to conform the case of the variable names to that of the data set.

Comment

Sebastian Grein

Join Date: Oct 2023

Posts: 28
#9

21 Oct 2023, 02:57

Unfortunately, I still get the same error message 'no xij variables found' when trying to execute the reshape command. Could that be bcs my dataset contains more variables than only the one's listed above?
There are many more but I only listed these because these are the important variables to be considered for the output of dropping the certain observations.

Last edited by Sebastian Grein; 21 Oct 2023, 03:02.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30154
#10

21 Oct 2023, 08:40

I still get the same error message 'no xij variables found' when trying to execute the reshape command. Could that be bcs my dataset contains more variables than only the one's listed above?

No. Stata is not complaining about excess variables that it can't handle, it's complaining about a lack of variables that match the pattern spelled out in the -reshape-. The xij variables that Stata is looking for in this instance are cusip1 and cusip2 (well, it doesn't know at that point that it's only 1 and 2, it's looking for variables with names of the form cusip# where # is any number). In that code cusip1 and cusip2 must exist because they were brought into being by the -rename- commands a couple of lines earlier in the code.

I suggest that you run the code again, and then post back copy/pasting everything that shows in the Results screen (echoed commands, results, and messages) for everything from the first line through the -reshape- command into your post so we can get a closer look at what's going wrong.
Comment
Sebastian Grein

Join Date: Oct 2023

Posts: 28
#11

22 Oct 2023, 05:47

The reshape commands works now. Unfortunately, my Stata always collapses when I am executing the rangejoin command. Is there any way way to address this issue or is it (probably) just too much data for my RAM (I already closed everything else than Stata)?
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30154
#12

22 Oct 2023, 09:56

What do you mean when you say Stata "collapses?" It is possible that the demands on RAM exceed what is possible in your setup and Stata is truly hung. But it is also true that in similar situations, -rangejoin- can take a very long time to put together two very large data sets, and perhaps you just need to be patient. Have you opened Task Manager to see what Stata is really doing? (I'm assuming you are on a Windows machine. I imagine Mac has something analogous.)

Assuming that you really are exceeding the available memory, try breaking up your data set into chunks, where each chunk contains all the observations involving some subset of the cusips, the chunks do not overlap each other, and all the chunks combined cover everything. You can run this code on each chunk separately, save the results, and then -append- them all together at the end.
Comment
Sebastian Grein

Join Date: Oct 2023

Posts: 28
#13

28 Oct 2023, 04:06

By "collapses" I mean my Mac tells me to close Stata because the app Stata is not reacting anymore. Unfortunately, I can not break up my data by acquirer cusips because it could be that an acquirer cusip from another chunk has a target with a cusip which matches to an acquirer cusip which wouldn't be considered then.
Comment

Sebastian Grein

Join Date: Oct 2023
Posts: 28

#14

28 Oct 2023, 06:52

I tried something else now:
a_participated checks if the respective firm (a_cusip_six) participated in a deal as an acquirer (a_cusip_six) in the last 3 years. The variable deal is 1 if the observation is an M&A deal and 0 otherwise. I used 1/99 since I checked before that there's a maximum of 99 observations per company.
t_participated checks if the respective firm (a_cusip_six) participated in a deal as a target (Target6digitCUSIP) is the last 3 years. After creating the 3 lists, the command "foreach cusip in `t_list1'...." does not lead to replacements. Can you support on why this might be the case and generally advise on my procedure? Many thanks!

Code:

gen a_participated=0

forval i=1/99 {
    bysort a_cusip_six (Year):replace a_participated = a_participated + deal[_n-`i'] if Year==Year[_n-`i']+1
}
forval i=1/99 {
    bysort a_cusip_six (Year):replace a_participated = a_participated + deal[_n-`i'] if Year==Year[_n-`i']+2
}
forval i=1/99 {
    bysort a_cusip_six (Year):replace a_participated = a_participated + deal[_n-`i'] if Year==Year[_n-`i']+3
}


// Do the same for targets.
tostring Year, gen (Year_s)
order Year_s
gen  a_combined = Year_s + a_cusip_six
order a_combined deal a_cusip_six Target6digitCUSIP


gen Year1 = Year +1
gen Year2 = Year +2
gen Year3 = Year +3
order Year1 Year2 Year3

tostring Year1, replace
tostring Year2, replace
tostring Year3, replace

gen  t_combined1 = Year1 + Target6digitCUSIP
replace t_combined1 = "0" if deal == 0

gen  t_combined2 = Year2 + Target6digitCUSIP
replace t_combined2 = "0" if deal == 0

gen  t_combined3 = Year3 + Target6digitCUSIP
replace t_combined3 = "0" if deal == 0

drop Year1 Year2 Year3

order a_combined t_combined1 t_combined2 t_combined3 deal Target6digitCUSIP

levelsof t_combined1, local(t_list1)
levelsof t_combined2, local(t_list2)
levelsof t_combined3, local(t_list3)

gen t_participated = 0
foreach cusip in `t_list1' {
    replace t_participated = 1 if a_combined = `cusip'
}

foreach cusip in `t_list2' {
    replace t_participated = 1 if a_combined = `cusip'
}

foreach cusip in `t_list3' {
    replace t_participated = 1 if a_combined = `cusip'
}

that's the outcome for the last command:
r(110);

. do "/var/folders/n1/h6bdd40s7yd0qnd7g38vl3tr0000gn/T//SD01012.000000"

. foreach cusip in `t_list1' {
2. replace t_participated = 1 if a_combined = `cusip'
3. }

.
. foreach cusip in `t_list2' {
2. replace t_participated = 1 if a_combined = `cusip'
3. }

.
. foreach cusip in `t_list3' {
2. replace t_participated = 1 if a_combined = `cusip'
3. }

.
end of do-file

Last edited by Sebastian Grein; 28 Oct 2023, 07:17.

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30154
#15

28 Oct 2023, 10:53

Try this. It is simpler than the approach I previously suggested, and it uses less memory (though whether sufficiently less that it will work with your massive data set, I can't be sure.)

Code:

rename a_cusip_six cusipa rename Target6digitCUSIP cusipt gen `c(obs_t)' obs_no = _n reshape long cusip, i(obs_no) j(role) string rangestat (sum) deal, by(cusip) interval(Year -3 -1) by obs_no (deal_sum), sort: drop if inrange(deal_sum[_N], 1, .) drop deal_sum reshape wide rename cusipa a_cusip_six rename cusipt Target6digitCUSIP

Having already tried using -rangejoin-, you already have -rangestat- installed.

It may be that you will encounter the same memory limitations with -rangestat- that you encountered with -rangejoin-, but I don't know.

The -reshape- and -rangestat- commands are going to take a long time. Be patient. That said, if you are running a fully updated version 18, you can use -reshape-'s -favor(speed)- option to make it faster. However, the price of -favor(speed)- is that more memory is used, so this might not be viable in your situation.

If this approach also exceeds memory limitations, then we will probably have to resort to some complicated code with explicit iteration over observations. It will be difficult to write and debug, and ugly to read. And it will be even slower than anything we have tried so far. But if we cannot avoid it, such an approach is possible as a last resort.
Comment

Announcement

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment