Hi there,
I have a dataset on ownership concetration for various firms with roughly 490000 observations . As i am interested in ownership concentrations for each particular firm, the problem i have with my dataset is that, ownership concentrations are not given collectively for the firm but rather independently for each individual within a firm. Hence, my questions is how i can aggregate the observations for the variables sh_direct1 and sh_total1 to one observations for each firm? Below you can find an example of my data
input str16 idnr double idnr1 str2 sh_type str6(sh_direct sh_total) long sh_date str24 repbas_header long sh_type1 float(sh_type2 firmid) long(sh_direct1 sh_total1) float sh_direct1temp
"DKFO3" 3 "I" ">5.00" "" 21598 "Consolidated data" 8 8 1 8877 . 26631
"DKFO3" 3 "I" ">5.00" "" 21598 "Consolidated data" 8 8 1 8877 . 26631
"DKFO3" 3 "I" ">5.00" "" 21598 "Consolidated data" 8 8 1 8877 . 26631
"RO19" 19 "I" "-" "" 21579 "Unconsolidated data" 8 8 2 1 . 809
"RO19" 19 "I" "16.05" "" 21365 "Unconsolidated data" 8 8 2 808 . 809
"GBSE000027" 27 "I" "10.27" "" 21419 "Unconsolidated data" 8 8 3 229 . 21159
"GBSE000027" 27 "I" "9.82" "" 21419 "Unconsolidated data" 8 8 3 8015 . 21159
"GBSE000027" 27 "I" "9.82" "" 21419 "Unconsolidated data" 8 8 3 8015 . 21159
"GBSE000027" 27 "I" "10.27" "" 21419 "Unconsolidated data" 8 8 3 229 . 21159
"GBSE000027" 27 "I" "19.64" "" 21419 "Unconsolidated data" 8 8 3 1167 . 21159
"GBSE000027" 27 "I" "40.18" "" 21419 "Unconsolidated data" 8 8 3 3504 . 21159
"MTC35" 35 "I" "2.23" "" 20939 "Unconsolidated data" 8 8 4 1226 . 8459
"MTC35" 35 "I" "0.48" "" 20939 "Unconsolidated data" 8 8 4 50 . 8459
"RO35" 35 "I" "23.71" "" 21579 "Unconsolidated data" 8 8 4 1674 . 8459
"MTC35" 35 "I" "6.83" "" 20939 "Unconsolidated data" 8 8 4 5509 . 8459
"GBJE67" 67 "I" "NG" "" 21166 "Consolidated data" 8 8 5 8892 . 44460
"GBJE67" 67 "I" "NG" "" 21166 "Consolidated data" 8 8 5 8892 . 44460
"GBJE67" 67 "I" "NG" "" 21166 "Consolidated data" 8 8 5 8892 . 44460
"GBJE67" 67 "I" "NG" "" 21166 "Consolidated data" 8 8 5 8892 . 44460
"GBJE67" 67 "I" "NG" "" 21166 "Consolidated data" 8 8 5 8892 . 44460
"MTC68" 68 "I" "0.36" "" 21605 "Unconsolidated data" 8 8 6 38 . 76
"MTC68" 68 "I" "0.36" "" 21605 "Unconsolidated data" 8 8 6 38 . 76
"GBSE000075" 75 "I" "19.11" "" 21317 "Consolidated data" 8 8 7 1114 . 15307
"GBSE000075" 75 "I" "20.25" "" 21317 "Consolidated data" 8 8 7 1328 . 15307
"GBSE000075" 75 "I" "NG" "" 21317 "Consolidated data" 8 8 7 8892 . 15307
"GBSE000075" 75 "I" "20.21" "" 21317 "Consolidated data" 8 8 7 1324 . 15307
"GBSE000075" 75 "I" "20.18" "" 21317 "Consolidated data" 8 8 7 1321 . 15307
"GBSE000075" 75 "I" "20.24" "" 21317 "Consolidated data" 8 8 7 1327 . 15307
"GBSE000075" 75 "I" "-" "MO" 21564 "Consolidated data" 8 8 7 1 1128 15307
"GBSE000078" 78 "I" "-" "3.93" 21586 "Consolidated data" 8 8 8 1 604 4782
"GBSE000078" 78 "I" "-" "10.22" 21184 "Consolidated data" 8 8 8 1 175 4782
"GBSE000078" 78 "I" "0.01" "" 21184 "Consolidated data" 8 8 8 3 . 4782
"GBSE000078" 78 "I" "4.82" "" 21264 "Consolidated data" 8 8 8 3468 . 4782
"GBSE000078" 78 "I" "0.02" "" 21184 "Consolidated data" 8 8 8 4 . 4782
"GBSE000078" 78 "I" "-" "1.05" 21457 "Consolidated data" 8 8 8 1 100 4782
"GBSE000078" 78 "I" "0.01" "" 21184 "Consolidated data" 8 8 8 3 . 4782
For example for the firm MTC35 i would like to have just one observation with the total amount of 2.71.
I have already grouped the firms by giving them their individual firm IDs, and used the following 2 codes trying to summarize the observations for each individual observation.
However as can be seen above the summation doesn't add up, not sure if this is due to the format as sh_direct1 and sh_total1 are supposed to be percentages of ownership concentration.
Thus to re-emphasise i would like have for each individual firm one observation with the total sh_direct1 and sh_total1 (even though in this case this firm doesn't have a percentage for sh_total1) instead of three like is the case with DKF03 or firmid 1 in this case.
Any help would be greatly appreciated, Thank you!
Regards,
Shariff
I have a dataset on ownership concetration for various firms with roughly 490000 observations . As i am interested in ownership concentrations for each particular firm, the problem i have with my dataset is that, ownership concentrations are not given collectively for the firm but rather independently for each individual within a firm. Hence, my questions is how i can aggregate the observations for the variables sh_direct1 and sh_total1 to one observations for each firm? Below you can find an example of my data
input str16 idnr double idnr1 str2 sh_type str6(sh_direct sh_total) long sh_date str24 repbas_header long sh_type1 float(sh_type2 firmid) long(sh_direct1 sh_total1) float sh_direct1temp
"DKFO3" 3 "I" ">5.00" "" 21598 "Consolidated data" 8 8 1 8877 . 26631
"DKFO3" 3 "I" ">5.00" "" 21598 "Consolidated data" 8 8 1 8877 . 26631
"DKFO3" 3 "I" ">5.00" "" 21598 "Consolidated data" 8 8 1 8877 . 26631
"RO19" 19 "I" "-" "" 21579 "Unconsolidated data" 8 8 2 1 . 809
"RO19" 19 "I" "16.05" "" 21365 "Unconsolidated data" 8 8 2 808 . 809
"GBSE000027" 27 "I" "10.27" "" 21419 "Unconsolidated data" 8 8 3 229 . 21159
"GBSE000027" 27 "I" "9.82" "" 21419 "Unconsolidated data" 8 8 3 8015 . 21159
"GBSE000027" 27 "I" "9.82" "" 21419 "Unconsolidated data" 8 8 3 8015 . 21159
"GBSE000027" 27 "I" "10.27" "" 21419 "Unconsolidated data" 8 8 3 229 . 21159
"GBSE000027" 27 "I" "19.64" "" 21419 "Unconsolidated data" 8 8 3 1167 . 21159
"GBSE000027" 27 "I" "40.18" "" 21419 "Unconsolidated data" 8 8 3 3504 . 21159
"MTC35" 35 "I" "2.23" "" 20939 "Unconsolidated data" 8 8 4 1226 . 8459
"MTC35" 35 "I" "0.48" "" 20939 "Unconsolidated data" 8 8 4 50 . 8459
"RO35" 35 "I" "23.71" "" 21579 "Unconsolidated data" 8 8 4 1674 . 8459
"MTC35" 35 "I" "6.83" "" 20939 "Unconsolidated data" 8 8 4 5509 . 8459
"GBJE67" 67 "I" "NG" "" 21166 "Consolidated data" 8 8 5 8892 . 44460
"GBJE67" 67 "I" "NG" "" 21166 "Consolidated data" 8 8 5 8892 . 44460
"GBJE67" 67 "I" "NG" "" 21166 "Consolidated data" 8 8 5 8892 . 44460
"GBJE67" 67 "I" "NG" "" 21166 "Consolidated data" 8 8 5 8892 . 44460
"GBJE67" 67 "I" "NG" "" 21166 "Consolidated data" 8 8 5 8892 . 44460
"MTC68" 68 "I" "0.36" "" 21605 "Unconsolidated data" 8 8 6 38 . 76
"MTC68" 68 "I" "0.36" "" 21605 "Unconsolidated data" 8 8 6 38 . 76
"GBSE000075" 75 "I" "19.11" "" 21317 "Consolidated data" 8 8 7 1114 . 15307
"GBSE000075" 75 "I" "20.25" "" 21317 "Consolidated data" 8 8 7 1328 . 15307
"GBSE000075" 75 "I" "NG" "" 21317 "Consolidated data" 8 8 7 8892 . 15307
"GBSE000075" 75 "I" "20.21" "" 21317 "Consolidated data" 8 8 7 1324 . 15307
"GBSE000075" 75 "I" "20.18" "" 21317 "Consolidated data" 8 8 7 1321 . 15307
"GBSE000075" 75 "I" "20.24" "" 21317 "Consolidated data" 8 8 7 1327 . 15307
"GBSE000075" 75 "I" "-" "MO" 21564 "Consolidated data" 8 8 7 1 1128 15307
"GBSE000078" 78 "I" "-" "3.93" 21586 "Consolidated data" 8 8 8 1 604 4782
"GBSE000078" 78 "I" "-" "10.22" 21184 "Consolidated data" 8 8 8 1 175 4782
"GBSE000078" 78 "I" "0.01" "" 21184 "Consolidated data" 8 8 8 3 . 4782
"GBSE000078" 78 "I" "4.82" "" 21264 "Consolidated data" 8 8 8 3468 . 4782
"GBSE000078" 78 "I" "0.02" "" 21184 "Consolidated data" 8 8 8 4 . 4782
"GBSE000078" 78 "I" "-" "1.05" 21457 "Consolidated data" 8 8 8 1 100 4782
"GBSE000078" 78 "I" "0.01" "" 21184 "Consolidated data" 8 8 8 3 . 4782
For example for the firm MTC35 i would like to have just one observation with the total amount of 2.71.
I have already grouped the firms by giving them their individual firm IDs, and used the following 2 codes trying to summarize the observations for each individual observation.
Code:
egen sh_direct1temp = sum(sh_direct1), by(firmid) collapse (sum) sh_direct1, by (firmid)
Thus to re-emphasise i would like have for each individual firm one observation with the total sh_direct1 and sh_total1 (even though in this case this firm doesn't have a percentage for sh_total1) instead of three like is the case with DKF03 or firmid 1 in this case.
Any help would be greatly appreciated, Thank you!
Regards,
Shariff
Comment