How to randomly pair Firm A and Firm B in the year of deal announcement?

Jae Li

Join Date: May 2017
Posts: 184

How to randomly pair Firm A and Firm B in the year of deal announcement?

01 Jun 2022, 14:34

Dear Statalist,

I would like to randomly pair one sample of firms with another sample of firms drawn from the first sample firms' industry in the year of deal announcement. And then I want to bootstrap the process 500 times with replacements for each deal and report the average ownership for these simulated pairings. What I want to examine is whether two firms are more likely to merge together in the presence of higher ownership. The following is my trial codes but I'm not sure if I am on the right track. Can anyone help me with that, please?

Code:

use "data1.dta", clear

keep id fyear
duplicates drop
save "holding.dta"

forvalues i = 1/500 {

    use "target1.dta", clear
    keep id fyear dateanno
    save "working_data.dta"

  
    use "holding.dta", clear
    bsample
   merge 1:m id fyear using "working_data.dta", assert (match using) keep(match)
gen Target = 1
replace Target = 0 if dateanno ==.
 save "MA_bootstrap`i'.dta", replace
}

I tried to use -bootstrap- but don't know how to program it for randomly pairing datasets...

Here is my sample data: For missing observations in dataanno, they are no deals announced on that year.

----------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long id double fyear long dateanno double ownership float industry
1004 1999     . .11129193007946014 50
1004 2000     .                  0 50
1004 2013     .                  0 50
1004 2014     .                  0 50
1004 2015     .                  0 50
1004 2016     .                  0 50
1009 1992     .  .6800046563148499 34
1009 1993     .  .6899944543838501 34
1009 1994     .  .6399991512298584 34
1013 2003     .  .4449999928474426 36
1013 2004     .  .4630001187324524 36
1013 2005     .  .4267875552177429 36
1013 2006     .   .439971923828125 36
1013 2007     .  .4560580849647522 36
1013 2008     . .42502060532569885 36
1013 2009 18456  .3802548348903656 36
1013 2010     . .38561299443244934 36
1050 2008     .                  0 35
1055 1990     .                  0 35
1055 1991     .                  0 35
1056 1998     . .15499617159366608 36
1056 2001     . .10799886286258698 36
1056 2002     . .10799767076969147 36
1072 2013     .                  0 36
1072 2014     .                  0 36
1072 2015     .                  0 36
1072 2016     .                  0 36
1072 2017     .                  0 36
1073 1990     .  .6103059649467468 73
1073 1991     .  .7399349808692932 73
1073 1992     .  .6599476933479309 73
1073 1995     .  .5800233483314514 73
1082 1989     .  .4799616038799286 15
1082 1990     .    .66282719373703 15
1082 1991     .  .7242097854614258 15
1082 1996     .  .5123208165168762 15
1094 2003     .                  0 51
1094 2011     .                  0 51
1094 2012     .                  0 51
1094 2013     .                  0 51
1094 2014     .                  0 51
1094 2015     . .13000068068504334 51
1094 2016     .  .1399993598461151 51
1094 2017     . .22999978065490723 51
1109 1995     .                  0 20
1109 1996     .                  0 39
1109 2005     .   .360088586807251 39
1109 2006     . .34005647897720337 39
1109 2007     . .23000000417232513 39
1109 2008     .  .2499537467956543 39
1111 1989     . .18999235332012177 73
1111 1994     . .14999139308929443 73
1111 1995     . .13999967277050018 73
1111 2001     . .28000059723854065 73
1111 2002     .  .1600005030632019 73
1111 2003     . .19999979436397552 73
1111 2004     . .22999991476535797 73
1111 2005     .  .3199999928474426 73
1111 2006     .  .2200002372264862 73
1121 1990     .  .4368971586227417 51
1121 1991     .  .4887270927429199 51
1121 1992     . .12298694998025894 51
1121 1993     .                  0 51
1121 1997     .   .268926739692688 51
1121 1998     . .16158509254455566 51
1121 1999     . .14140990376472473 51
1121 2000     . .14753295481204987 51
1121 2003     .  .1027916967868805 51
1121 2006     .  .4178103506565094 51
1121 2007     .  .6908037066459656 51
1121 2008     .  .5660001635551453 51
1121 2009     .  .7279999256134033 51
1121 2010     .  .9790001511573792 51
1121 2011     .  .5829998254776001 51
1121 2012     .  .5490002036094666 51
1121 2013     .   .624000072479248 51
1121 2014     .  .3430001735687256 51
1121 2015     .   .382000207901001 51
1121 2016     .  .6119995713233948 51
1121 2017     .  .6139993667602539 51
1121 2018     . .41399991512298584 51
1121 2019     . .48699983954429626 51
1137 1990     .  .1469968557357788 36
1137 1991     .                  0 36
1137 1992     . .10199470818042755 36
1137 1993     .    .25900799036026 36
1137 1994     .  .1749972105026245 36
1151 1989     .                  0 38
1151 1990     .                  0 38
1151 1991     .                  0 38
1151 1992     .                  0 38
1151 1993     .                  0 38
1151 1994     .                  0 38
1151 1995 13402                  0 38
1161 1990     .                  0 36
1161 1996     . .12999975681304932 36
1161 1997     . .11999957263469696 36
1161 1998     . .12000003457069397 36
1161 2004     .                  0 36
1161 2005     .  .2659221291542053 36
end
format %d dateanno

------------------ copy up to and including the previous line ------------------

Listed 100 out of 39343 observations
Use the count() option to list more

.

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 30356
#2

01 Jun 2022, 15:07

I don't think your code will do what you want. In particular, the use of -merge 1:m id fyear- results in every firm being matched only with itself in every single iteration. All that changes, due to bsample, is the number of times it appears in the resulting data set. Also, as you use it, the -merge- command will ignore all of the variables in working_data.dta except for the ones it is matching on. So there seems to be no point to the pairing when done this way.

I think it is simpler to do this, by the way, if we match without replacement here, each id in each fyear being matched with an id from the same industry in that year. The following code will do this. It also results in data sets that contain the values of the variables from both members of each matched pair it creates. I did not create the variable Target here because I could not be sure which member of the matched pair you want it to be based on. You can add that in yourself.

Code:

sort industry fyear id by industry fyear (id): gen long priority = _n preserve ds industry fyear priority, not rename (`r(varlist)') =_match tempfile source save `source' set seed 1234 forvalues i = 1/500 { restore, preserve gen double shuffle = runiform() by industry fyear (shuffle), sort: replace priority = _n merge 1:1 industry fyear priority using `source', assert(match) nogenerate drop priority tempfile matchup`i' save `matchup`i'', replace } restore

I don't understand "What I want to examine is whether two firms are more likely to merge together in the presence of higher ownership." If there is any connection between the variables in your example data and this goal, I do not see it. In any case, the code above will deal with real data sets that contain additional variables. I have not included any code to And it will run about as quickly as you can get given the inherent size of the task.

I notice that in your example data, most combinations of industry and fyear are instantiated by only a single firm (id). So the matching process has no choice but to match that firm with itself in those years all the time. Presumably that's not the case in your real data set.

I also note that there is an id, 1109, which appears to be in different industries in different years. Is that a data error, or is that correct?
Comment

Jae Li

Join Date: May 2017
Posts: 184

02 Jun 2022, 08:07

Clyde Schechter Hi, Clyde, thank you for your helpful response! I should have sorted my sample data by industry and fyear so there won't only a single firm (id) for combinations of industry and fyear. Please see the following resorted sample data. The sample data includes the final merge of actual firms' pairing based on their firm id and fyear and that's why they are some firms don't have any deal announcements. Is that right to do random paring from this step? or should I start from the beginning to randomly pair two datasets? Can I ask you a few questions about your codes?

1. which

Code:

 `matchup`i''

dataset is my resulting data? the last one `matchup500'?
2. what does this

Code:

gvkey_match

mean? does it mean firm ids are randomly drawn from the sample?
3. what does the

Code:

restore, present

imply in the first line of the loop?

4. If I want to have the replacement for the bootstrapping, do you know how to achieve that based on your code?

The purpose of such random pairings is to construct the dummy variable of Target (whether the firms are likely to be acquired at that date), so the correct member of the matched pair that needs to be based on is dateanno_match? is that right? Since I need to do such bootstrapping in the year of the deal announcement (deal_year). However, deal_year is different from fyear. Shall I still do the merge based on -industry fyear priority-?

Many thanks!

----------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long id double fyear float deal_year long dateanno double ownership float industry
 3107 1990    .     . .25999999046325684 10
 5140 1990    .     .  .8653791546821594 10
12750 1990    .     .  .1599999964237213 10
 1856 1990    .     .                  1 10
 5686 1990    .     .  .5318354964256287 10
 5560 1990    .     .  .6285449862480164 10
 7920 1990    .     . .45842039585113525 10
14590 1990    .     . .36000165343284607 10
14465 1990    .     .                  1 10
11985 1990    .     .  .6899950504302979 10
12342 1990    .     .  .6182301044464111 10
 7424 1990 1991 11350                  1 10
 1856 1991    .     .                  1 10
11985 1991    .     .  .7683180570602417 10
 7881 1991    .     . .45891666412353516 10
 3107 1991    .     .   .370998740196228 10
13566 1991    .     . .44539782404899597 10
 7920 1991    .     .  .4525596499443054 10
 4878 1991    .     .                  1 10
15240 1991    .     .                  1 10
 5686 1991    .     . .39877909421920776 10
14465 1991    .     .                  0 10
12342 1991    .     .  .4515058994293213 10
14590 1991    .     . .27999967336654663 10
14002 1991    .     .   .821772038936615 10
 5560 1991    .     .  .3191429674625397 10
21233 1991    .     .                  1 10
21957 1991    .     .  .6397821307182312 10
 1266 1991    .     .   .574764609336853 10
 5140 1991    .     .  .6797562837600708 10
11985 1992    .     .  .8256189823150635 10
 3107 1992    .     .  .3799999952316284 10
 7920 1992    .     . .42512667179107666 10
 5686 1992    .     .  .1396203488111496 10
14002 1992    .     .   .810495913028717 10
 1856 1992    .     .  .9963542819023132 10
 5560 1992    .     . .12150897830724716 10
11993 1992    .     .                  0 10
14465 1992    .     .                  0 10
 7881 1992    .     . .31327664852142334 10
14590 1992    .     .  .1300000697374344 10
13468 1992    .     .                  0 10
13551 1992    .     .                  1 10
 1266 1992    .     .  .5579315423965454 10
21957 1992    .     . .43236061930656433 10
 4878 1992    .     .                  1 10
13566 1992    .     .  .4653961956501007 10
12342 1992    .     . .34789323806762695 10
 5686 1993    .     .  .6041179895401001 10
 3107 1993    .     .  .3700000047683716 10
12342 1993    .     . .46625587344169617 10
11993 1993    .     .                  0 10
13468 1993    .     .                  0 10
 7881 1993    .     .  .4423816204071045 10
 4878 1993    .     .                  1 10
13566 1993    .     . .46398046612739563 10
 1266 1993    .     .  .5552690625190735 10
13551 1993    .     .                  0 10
11985 1993    .     .  .6038828492164612 10
14465 1993    .     .                  0 10
 7920 1993    .     .  .4113018214702606 10
 3600 1993    .     .                  0 10
14590 1993    .     .                  0 10
10390 1994    .     . .34999674558639526 10
14465 1994    .     .                  0 10
14590 1994    .     .                  0 10
 5686 1994    .     .  .5289182662963867 10
13468 1994    .     .                  0 10
13401 1994    .     .  .9910208582878113 10
 3107 1994    .     .  .4000000059604645 10
 3702 1994    .     .                  0 10
12342 1994    .     .  .5244655609130859 10
21233 1994    .     .  .9886655807495117 10
11993 1994    .     .                  0 10
 7881 1994    .     .  .5244655609130859 10
11985 1994    .     .  .7306134700775146 10
13566 1994    .     .  .6871035695075989 10
12342 1995    .     .  .5501250624656677 10
13401 1995    .     .  .9888923764228821 10
13468 1995    .     .                  0 10
 3429 1995    .     . .12295220047235489 10
14465 1995    .     .                  0 10
 7881 1995    .     .  .5501250624656677 10
11993 1995    .     .                  0 10
20488 1995    .     .                  1 10
30344 1995 1996 13492  .6604371070861816 10
14590 1995    .     . .25000014901161194 10
13566 1995    .     .  .6004140973091125 10
29406 1995    .     .   .448012113571167 10
13550 1995    .     .                  1 10
 3107 1995    .     .  .3799999952316284 10
 5560 1995    .     . .32461076974868774 10
28524 1995    .     . .27079254388809204 10
10390 1995    .     .  .4399977922439575 10
 9726 1995    .     .                  0 10
 5686 1995    .     .  .5392251014709473 10
10390 1996    .     .  .3900039792060852 10
 3429 1996    .     .  .1857708990573883 10
31203 1996    .     .  .9800049662590027 10
20488 1996    .     .                  1 10
end
format %d dateanno

------------------ copy up to and including the previous line ------------------

Listed 100 out of 39343 observations
Use the count() option to list more

Last edited by Jae Li; 02 Jun 2022, 08:15.

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30356
#4

02 Jun 2022, 09:59

1. which Code:

`matchup`i''
dataset is my resulting data? the last one `matchup500'?

There is no one resulting data set. Your original code implied that you wanted to save 500 data sets. That's what this code does. It saves 500 data sets, named `matchup1' through `matchup500'. Now, it saves them as tempfiles. You may want to save them as permanent files; that is a simple change to the code that I think you can do yourself if that is what you want.

2. what does this Code:

gvkey_match
mean? does it mean firm ids are randomly drawn from the sample?

There is nothing called gvkey_match in my code, so I'm not sure what you are asking. If you are wondering what the _match suffix means, it is this. The code produces 500 data sets. Each data set contains observations representing a pair of id's that have been randomly matched to each other subject to having the same value for industry and fyear. Within the observation, one id's information has the same variable names as used in the original data set; the other id's information is in variables with those names suffixed by _match. Since industry and fyear are necessarily the same for both, there are no separate industry_match and fyear_match variables.

3. what does the Code:

restore, present
imply in the first line of the loop?

It's -restore, preserve-. It means that the previously -preserve-d data set is brought back into memory, but the -preserve-d copy is not erased and is still available to be re-used. The result is that at the top of the loop, the original data (plus the variable priority, which was created before the first -preserve-) is brought back into memory. In other words, the loop always starts from the same data. (Added: In my code in #2, this wasn't really necessary--I could have built each new set of matched pairs from the preceding one. But in the new code shown below, which uses bootstrap sampling [sampling with replacement], it is absolutely required. If you keep repeatedly sampling with replacement from the previous sample, a little of the data gets lost in each iteration, and by the end of 500 you might have only one id in each industry#fyear group.)

4. If I want to have the replacement for the bootstrapping, do you know how to achieve that based on your code?

I'm not entirely sure I understand the question, but I think you are asking whether that code can be modified to sample with replacement. If so,

Code:

sort industry fyear id by industry fyear (id): gen long priority = _n preserve ds industry fyear priority, not rename (`r(varlist)') =_match tempfile source save `source' set seed 1234 forvalues i = 1/500 { restore, preserve bsample, strata(industry fyear) sort industry fyear, stable by industry fyear, sort: replace priority = _n merge 1:1 industry fyear priority using `source', assert(match) nogenerate drop priority tempfile matchup`i' save `matchup`i'', replace } restore

Again, this code saves 500 tempfiles. If you want permanent files instead, eliminate the -tempfile matchup`i'- command and change the save command to -save matchup`i', replace-.

The purpose of such random pairings is to construct the dummy variable of Target (whether the firms are likely to be acquired at that date), so the correct member of the matched pair that needs to be based on is dateanno_match? is that right? Since I need to do such bootstrapping in the year of the deal announcement (deal_year). However, deal_year is different from fyear. Shall I still do the merge based on -industry fyear priority-?

I don't understand any of this. I have no idea what any of this data has to do with being acquired, and I don't know whether it is more sensible to match on deal_year or fyear. These are substantive issues, not data management questions, and they are beyond my domain of knowledge. I can't help you with these. Perhaps other Forum members who work in finance understand this will join in and help you. If not, I suggest you consult a colleague.

Last edited by Clyde Schechter; 02 Jun 2022, 10:02.
Comment

Announcement

How to randomly pair Firm A and Firm B in the year of deal announcement?

Comment

Comment

Comment