Matching Firms Based on Industry and Total Assets

Alex Conju

Join Date: Aug 2024

Posts: 17
#31

15 Aug 2024, 17:53

Wow, yes, apologies! I cannot tell you how sure I was that the code syntax was correct; that is a touch poor on my end. Yes, making that adjustment has solved things, thank you! I imagine this code (much like the brace pairing code) should work for any data set size in terms of firms and years?

Also, purely as a learning exercise, I have tried to adapt the triplet code into quad-matching code to see if my logic and understanding held up. Unfortunately, they did not. The error I get is "double not allowed" after coding drop double shuffle in line 50 (all code below). I then tried triple shuffle, in case the problem was that a triplet had already been formed, but Stata reported that the variable triple was not found since, of course, it had never been defined; I just figured it was worth a try in case it was the next stage above a double shuffle, as it were. I am unsure where the logic in my code fails for quad matching. N.B. The fourth stock exchange code here is 37.

// SEPARATE INTO FOUR DATA SETS
preserve
keep if StockExchangeCode == 11
rename (GlobalCompanyKey AssetsTotal) =_11 // SUFFIX IN RENAME MUST MATCH STOCK EXCHANGE
drop StockExchangeCode
tempfile SE11 // DECLARATION IN TEMPFILE MUST MATCH SUBSEQUENT USE OF THE FILE
save `SE11' // N.B. NO .dta

restore, preserve
keep if StockExchangeCode == 120
rename (GlobalCompanyKey AssetsTotal) =_120
drop StockExchangeCode
tempfile SE120
save `SE120'

restore, preserve
keep if StockExchangeCode == 90
rename (GlobalCompanyKey AssetsTotal) =_90
drop StockExchangeCode
tempfile SE90
save `SE90'

restore
keep if StockExchangeCode == 37
rename (GlobalCompanyKey AssetsTotal) =_37
drop StockExchangeCode
tempfile SE37
save `SE37'

// COMBINE POTENTIAL MATCHES & SELECT BEST 4, BREAKING TIES AT RANDOM
use `SE120', clear
joinby NAICS DataYearFiscal using `SE11.dta'
gen double shuffle = runiform()
gen delta = AssetsTotal_120/AssetsTotal_11
keep if inrange(delta, 0.75, 1.25)
replace delta = abs(log(delta))

joinby NAICS DataYearFiscal using `SE90' // gives error when .dta
drop shuffle
gen double shuffle = runiform()
gen delta1 = AssetsTotal_90/AssetsTotal_120
gen delta2 = AssetsTotal_90/AssetsTotal_11
keep if inrange(delta1, 0.75, 1.25) & inrange(delta2, 0.75, 1.25)
replace delta1 = abs(log(delta1))
replace delta2 = abs(log(delta2))
gen delta3 = max(delta1, delta2)

use `SE37', clear
joinby NAICS DataYearFiscal using `SE90' // gives error when .dta
drop double shuffle
gen doubele shuffle = runiform()
gen delta4 = AssetsTotal_37/AssetsTotal_90
gen delta5 = AssetsTotal_37/AssetsTotal_120
keep if inrange (delta3, 0.75, 1.25) & inrange(delta4, 0.75, 1.25) & inrange(delta5, 0.75, 1.25)
replace delta4 = abs(log(delta4))
replace delta5 = abs(log(delta5))
gen delta6 = max(delta3, delta4, delta5)

// MATCHING WITHOUT REPLACEMENT
local allocation_ratio 1
local current 1

sort GlobalCompanyKey_120 DataYearFiscal (delta shuffle)
while `current' < _N {
    local end_current = `current' + `allocation_ratio' - 1
    while GlobalCompanyKey_120[`end_current'] != GlobalCompanyKey_120[`current'] ///
        & DataYearFiscal[`end_current'] != DataYearFiscal[`current'] {
        local end_current = `end_current' - 1
    }
    // KEEP REQUIRED # OF MATCHES FOR THE CURRENT CASE
    drop if GlobalCompanyKey_120 == GlobalCompanyKey_120[`current'] & DataYearFiscal == DataYearFiscal[`current'] in `=`end_current'+1'/L
    // REMOVE THE SELECTED MATCHES FROM FURTHER CONSIDERATION
    forvalues i = 0/`=`allocation_ratio'-1' {
        drop if GlobalCompanyKey_37 == GlobalCompanyKey_37[`current'+`i'] & DataYearFiscal == DataYearFiscal[`current' + `i'] & _n > `end_current'
    }
    local current = `end_current' + 1
}
Clyde Schechter

Join Date: Apr 2014

Posts: 30170
#32

15 Aug 2024, 20:00

The -drop- command does not care about the storage type of the variable(s) being dropped. So when you write -drop double shuffle- it thinks you have a variable named double that you want to drop. But, of course, you don't. And hence the error message you got. So, it's just -drop shuffle-.
By the way, watch out for the line after that: here you do need the word -double- because a -float- is not long enough for the quantity of distinct random numbers you need to create. But you misspelled it as doubele. So fix that.
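To make the distinction concrete, here is a minimal illustration. It uses Stata's built-in auto data set purely as a stand-in (it is not the data from this thread), and the seed is arbitrary:

```stata
sysuse auto, clear
set seed 12345                    // arbitrary seed, only for reproducibility

gen double shuffle = runiform()   // storage type is given when the variable is created
drop shuffle                      // -drop- takes variable names only, no storage type
gen double shuffle = runiform()   // re-create it, again in double precision

describe shuffle                  // storage type should be listed as double
```

Writing -drop double shuffle- in place of the -drop shuffle- line would ask Stata to drop two variables, double and shuffle, which is why it fails.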

Now, at a higher level of the logic, the code you show here tries to join all four stock exchanges together and then select matches. That will almost certainly explode your memory problem. Also, even if it doesn't, the code you use for selecting the matches without replacement only checks for re-use on stock exchange 37, not on stock exchanges 11 and 90.

Now, one could modify that code to do it this way. But it will be very cumbersome code, and hard to read and understand. The better way to do this is to iterate one stock exchange at a time.

You can create the four data sets and store them as tempfiles. That's fine. Then get matched pairs from exchanges 11 and 120. Then join that with exchange 90 and select the matched triplets along the lines of #28. Once you have those triplets, you can join in exchange 37 and select from those to form quadruples. The code for selecting the fourth member of the quadruple from exchange 37 is exactly like that for selecting from exchange 90 to make the triples, except 90 is replaced by 37 everywhere in that "paragraph."
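As a rough sketch of that last step only, on made-up toy data: the tempfile name triplets, the delta_a/delta_b/delta_c/delta_max variable names, and all the numbers are hypothetical placeholders, not taken from #28. It shows the join with exchange 37 and the 0.75 to 1.25 caliper against all three existing members; the without-replacement selection loop would still have to be run afterwards.

```stata
clear
set seed 2024                     // arbitrary seed, only for reproducibility

// TOY TRIPLETS (HYPOTHETICAL): ONE ROW PER ALREADY-MATCHED 11/120/90 TRIPLET
input NAICS DataYearFiscal GlobalCompanyKey_11 GlobalCompanyKey_120 GlobalCompanyKey_90 AssetsTotal_11 AssetsTotal_120 AssetsTotal_90
1111 2020 1 101 201 100 105  95
1111 2020 2 102 202 200 210 190
end
tempfile triplets
save `triplets'

// TOY EXCHANGE-37 CANDIDATES (HYPOTHETICAL)
clear
input NAICS DataYearFiscal GlobalCompanyKey_37 AssetsTotal_37
1111 2020 301  102
1111 2020 302  205
1111 2020 303 1000
end
tempfile SE37
save `SE37'

// JOIN EACH TRIPLET WITH EVERY EXCHANGE-37 FIRM IN THE SAME INDUSTRY AND YEAR
use `triplets', clear
joinby NAICS DataYearFiscal using `SE37'
gen double shuffle = runiform()   // random tie-breaker for the later sort

// ASSET RATIOS OF THE CANDIDATE AGAINST EACH EXISTING MEMBER
gen delta_a = AssetsTotal_37/AssetsTotal_11
gen delta_b = AssetsTotal_37/AssetsTotal_120
gen delta_c = AssetsTotal_37/AssetsTotal_90
keep if inrange(delta_a, 0.75, 1.25) & inrange(delta_b, 0.75, 1.25) & inrange(delta_c, 0.75, 1.25)

// CONVERT TO ABSOLUTE LOG RATIOS AND KEEP THE WORST OF THE THREE
foreach v of varlist delta_a delta_b delta_c {
    replace `v' = abs(log(`v'))
}
gen delta_max = max(delta_a, delta_b, delta_c)

list GlobalCompanyKey_120 GlobalCompanyKey_37 delta_max, noobs
```

From here, the selection would sort on the triplet's identifier and DataYearFiscal, then (delta_max shuffle), and drop re-used GlobalCompanyKey_37 values as in the matching-without-replacement loop.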
Alex Conju

Join Date: Aug 2024

Posts: 17
#33

16 Aug 2024, 07:00

Yes, that seems to have worked. Thank you for the guidance in getting there! That should be (I hope) the end of my queries regarding matching. Thank you for all your help!