Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Problems with a string variable.

    Hi all,

    As im trying to calculate the CEO pay slice I'm encoutering two problems. (CEO pay slice is is the fraction of total compensation of the top-five executives that is received by the CEO)

    First, since the value (pceo) is missing in some cases I would like to fill those gaps. If the variable (pceo) is missing, I would like to fill it in with the value CEO, for the employee with the highest (tdc1). I know it's not ideal, but this might be the best alternative for now. Is there a command for this? Because I can't find anything for this.

    Second, as I will run the command collapse (max) ''CEOpay'' (sum) tdc1, by (cusip) to calculate the CEO pay slice, I will need a distinct variable the the CEOpay. However, as I tried to run gen CEOpay = tdc1 if pceo==CEO it says that CEO is not found.

    Regards,

    Geert
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(execdir co_per_rol execrankann tdc1) str6 gvkey double year str4 pceo str8 cusip str50 sicdesc double sic
    0 38458 2   202.549 "021542" 2011 ""    "00036020" "AIR-COND,HEATING,REFRIG EQ"   3585
    1 38455 1    447.51 "021542" 2011 "CEO" "00036020" "AIR-COND,HEATING,REFRIG EQ"   3585
    0 38456 5   182.568 "021542" 2011 ""    "00036020" "AIR-COND,HEATING,REFRIG EQ"   3585
    0 38457 4   188.492 "021542" 2011 ""    "00036020" "AIR-COND,HEATING,REFRIG EQ"   3585
    0 38459 3   194.505 "021542" 2011 ""    "00036020" "AIR-COND,HEATING,REFRIG EQ"   3585
    0 44350 3  1727.069 "001004" 2011 ""    "00036110" "MACHINERY AND EQUIPMENT-WHSL" 5080
    1  5623 1    5786.4 "001004" 2011 "CEO" "00036110" "MACHINERY AND EQUIPMENT-WHSL" 5080
    0 35684 4  1696.431 "001004" 2011 ""    "00036110" "MACHINERY AND EQUIPMENT-WHSL" 5080
    0 38212 5  1081.963 "001004" 2011 ""    "00036110" "MACHINERY AND EQUIPMENT-WHSL" 5080
    1  5626 2  2781.156 "001004" 2011 ""    "00036110" "MACHINERY AND EQUIPMENT-WHSL" 5080
    0 26645 4  1258.209 "001410" 2011 ""    "00095710" "SVCS TO DWELLINGS, OTH BLDGS" 7340
    1 15843 1  3326.191 "001410" 2011 ""    "00095710" "SVCS TO DWELLINGS, OTH BLDGS" 7340
    0 24909 3  1705.794 "001410" 2011 ""    "00095710" "SVCS TO DWELLINGS, OTH BLDGS" 7340
    0 36693 5  1324.973 "001410" 2011 ""    "00095710" "SVCS TO DWELLINGS, OTH BLDGS" 7340
    0 45039 2  2832.258 "001410" 2011 ""    "00095710" "SVCS TO DWELLINGS, OTH BLDGS" 7340
    0 37517 5   424.991 "179288" 2011 ""    "00128210" "NEWSPAPER:PUBG, PUBG & PRINT" 2711
    0 44054 3   891.788 "179288" 2011 ""    "00128210" "NEWSPAPER:PUBG, PUBG & PRINT" 2711
    0 37516 4   626.091 "179288" 2011 ""    "00128210" "NEWSPAPER:PUBG, PUBG & PRINT" 2711
    1 36139 1  1609.912 "179288" 2011 ""    "00128210" "NEWSPAPER:PUBG, PUBG & PRINT" 2711
    1 37514 2  1125.081 "179288" 2011 "CEO" "00128210" "NEWSPAPER:PUBG, PUBG & PRINT" 2711
    1 21200 1 20827.217 "024216" 2011 ""    "00130H10" "COGENERATN-SM POWER PRODUCER" 4991
    0 43468 5   3189.95 "024216" 2011 ""    "00130H10" "COGENERATN-SM POWER PRODUCER" 4991
    0 42704 6  3290.074 "024216" 2011 ""    "00130H10" "COGENERATN-SM POWER PRODUCER" 4991
    1 32523 2  6916.265 "024216" 2011 "CEO" "00130H10" "COGENERATN-SM POWER PRODUCER" 4991
    0 32522 3  3764.291 "024216" 2011 ""    "00130H10" "COGENERATN-SM POWER PRODUCER" 4991
    0 43914 5    837.24 "029968" 2011 ""    "00154710" "STEEL WORKS & BLAST FURNACES" 3312
    0 29356 4  1308.651 "029968" 2011 ""    "00154710" "STEEL WORKS & BLAST FURNACES" 3312
    0 24062 2  1652.464 "029968" 2011 ""    "00154710" "STEEL WORKS & BLAST FURNACES" 3312
    0 29355 3  1551.895 "029968" 2011 ""    "00154710" "STEEL WORKS & BLAST FURNACES" 3312
    1 21573 1  5704.842 "029968" 2011 "CEO" "00154710" "STEEL WORKS & BLAST FURNACES" 3312
    1 52120 1  5365.978 "012250" 2011 ""    "00163U10" "IN VITRO,IN VIVO DIAGNOSTICS" 2835
    0 52119 7   905.285 "012250" 2011 ""    "00163U10" "IN VITRO,IN VIVO DIAGNOSTICS" 2835
    0 52116 3  1286.573 "012250" 2011 ""    "00163U10" "IN VITRO,IN VIVO DIAGNOSTICS" 2835
    0 52109 8  1356.074 "012250" 2011 ""    "00163U10" "IN VITRO,IN VIVO DIAGNOSTICS" 2835
    0 52107 2  1041.162 "012250" 2011 ""    "00163U10" "IN VITRO,IN VIVO DIAGNOSTICS" 2835
    1 46076 5   627.388 "187406" 2011 ""    "00164V10" "TELEVISION BROADCAST STATION" 4833
    0 46077 3  1534.533 "187406" 2011 ""    "00164V10" "TELEVISION BROADCAST STATION" 4833
    0 46078 2   3736.23 "187406" 2011 ""    "00164V10" "TELEVISION BROADCAST STATION" 4833
    0 46075 4   1229.09 "187406" 2011 ""    "00164V10" "TELEVISION BROADCAST STATION" 4833
    0 46079 1 11514.114 "187406" 2011 "CEO" "00164V10" "TELEVISION BROADCAST STATION" 4833
    0 31797 3   1036.33 "144520" 2011 ""    "00174410" "EMPLOYMENT AGENCIES"          7361
    0 45744 4   911.391 "144520" 2011 ""    "00174410" "EMPLOYMENT AGENCIES"          7361
    0 39828 5    36.764 "144520" 2011 ""    "00174410" "EMPLOYMENT AGENCIES"          7361
    0 33991 2  1144.797 "144520" 2011 ""    "00174410" "EMPLOYMENT AGENCIES"          7361
    1 31795 1  2554.568 "144520" 2011 "CEO" "00174410" "EMPLOYMENT AGENCIES"          7361
    0 34027 2  7016.059 "009899" 2011 ""    "00206R10" "RADIOTELEPHONE COMMUNICATION" 4812
    1 28491 1 18690.824 "009899" 2011 "CEO" "00206R10" "RADIOTELEPHONE COMMUNICATION" 4812
    0 43386 4  5384.793 "009899" 2011 ""    "00206R10" "RADIOTELEPHONE COMMUNICATION" 4812
    0 32289 6  5426.549 "009899" 2011 ""    "00206R10" "RADIOTELEPHONE COMMUNICATION" 4812
    0 34026 3   7065.78 "009899" 2011 ""    "00206R10" "RADIOTELEPHONE COMMUNICATION" 4812
    0 44168 5   278.514 "024708" 2011 ""    "00215F10" "RADIOTELEPHONE COMMUNICATION" 4812
    0 44169 4   390.019 "024708" 2011 ""    "00215F10" "RADIOTELEPHONE COMMUNICATION" 4812
    0 42956 2   904.138 "024708" 2011 ""    "00215F10" "RADIOTELEPHONE COMMUNICATION" 4812
    0 42955 3   465.275 "024708" 2011 ""    "00215F10" "RADIOTELEPHONE COMMUNICATION" 4812
    1 42959 1   1739.36 "024708" 2011 "CEO" "00215F10" "RADIOTELEPHONE COMMUNICATION" 4812
    1 45024 6   578.464 "001076" 2011 ""    "00253530" "EQUIP RENTAL & LEASING, NEC"  7359
    1 17978 4  1655.792 "001076" 2011 ""    "00253530" "EQUIP RENTAL & LEASING, NEC"  7359
    1 17979 3  2805.984 "001076" 2011 ""    "00253530" "EQUIP RENTAL & LEASING, NEC"  7359
    1 17981 1  2846.024 "001076" 2011 ""    "00253530" "EQUIP RENTAL & LEASING, NEC"  7359
    1 17980 2  2724.679 "001076" 2011 ""    "00253530" "EQUIP RENTAL & LEASING, NEC"  7359
    0 35842 2  1152.341 "024888" 2011 ""    "00256710" "ELECTROMEDICAL APPARATUS"     3845
    1 35839 1  2302.419 "024888" 2011 "CEO" "00256710" "ELECTROMEDICAL APPARATUS"     3845
    0 35840 5  1129.373 "024888" 2011 ""    "00256710" "ELECTROMEDICAL APPARATUS"     3845
    0 42705 4  1137.099 "024888" 2011 ""    "00256710" "ELECTROMEDICAL APPARATUS"     3845
    0 35843 3  1147.458 "024888" 2011 ""    "00256710" "ELECTROMEDICAL APPARATUS"     3845
    0 24994 2  5736.462 "001078" 2011 ""    "00282410" "PHARMACEUTICAL PREPARATIONS"  2834
    0 18545 4  4669.851 "001078" 2011 ""    "00282410" "PHARMACEUTICAL PREPARATIONS"  2834
    0 37037 3   4429.37 "001078" 2011 ""    "00282410" "PHARMACEUTICAL PREPARATIONS"  2834
    1 13896 1 18591.822 "001078" 2011 "CEO" "00282410" "PHARMACEUTICAL PREPARATIONS"  2834
    0 45025 5  3828.118 "001078" 2011 ""    "00282410" "PHARMACEUTICAL PREPARATIONS"  2834
    1 46494 .  4669.851 "016101" 2011 "CEO" "00287Y10" "BIOLOGICAL PDS,EX DIAGNSTICS" 2836
    0 46493 .  1503.002 "016101" 2011 ""    "00287Y10" "BIOLOGICAL PDS,EX DIAGNSTICS" 2836
    0 46495 .  2481.799 "016101" 2011 ""    "00287Y10" "BIOLOGICAL PDS,EX DIAGNSTICS" 2836
    0 46492 .   3117.98 "016101" 2011 ""    "00287Y10" "BIOLOGICAL PDS,EX DIAGNSTICS" 2836
    0 46491 .   4429.37 "016101" 2011 ""    "00287Y10" "BIOLOGICAL PDS,EX DIAGNSTICS" 2836
    0 44211 5  1108.523 "063643" 2011 ""    "00289620" "FAMILY CLOTHING STORES"       5651
    1 15624 1  46756.16 "063643" 2011 ""    "00289620" "FAMILY CLOTHING STORES"       5651
    0 37617 4  4043.167 "063643" 2011 "CEO" "00289620" "FAMILY CLOTHING STORES"       5651
    0 16943 2  4542.037 "063643" 2011 ""    "00289620" "FAMILY CLOTHING STORES"       5651
    0 19491 3   4555.59 "063643" 2011 ""    "00289620" "FAMILY CLOTHING STORES"       5651
    0 46483 2  1078.367 "013619" 2011 ""    "00365410" "SURGICAL,MED INSTR,APPARATUS" 3841
    1 46485 1  4193.467 "013619" 2011 "CEO" "00365410" "SURGICAL,MED INSTR,APPARATUS" 3841
    0 46484 4   963.864 "013619" 2011 ""    "00365410" "SURGICAL,MED INSTR,APPARATUS" 3841
    0 46479 5   838.186 "013619" 2011 ""    "00365410" "SURGICAL,MED INSTR,APPARATUS" 3841
    0 46481 3   902.099 "013619" 2011 ""    "00365410" "SURGICAL,MED INSTR,APPARATUS" 3841
    0 39692 .  1988.328 "065706" 2011 ""    "00374N10" "COMMUNICATIONS SERVICES, NEC" 4899
    0 39695 3  3024.537 "065706" 2011 ""    "00374N10" "COMMUNICATIONS SERVICES, NEC" 4899
    1 39696 1  9586.366 "065706" 2011 "CEO" "00374N10" "COMMUNICATIONS SERVICES, NEC" 4899
    0 39693 2  6886.393 "065706" 2011 ""    "00374N10" "COMMUNICATIONS SERVICES, NEC" 4899
    0 39694 4  3020.955 "065706" 2011 ""    "00374N10" "COMMUNICATIONS SERVICES, NEC" 4899
    0 42364 3   1331.88 "031564" 2011 ""    "00449810" "CMP PROGRAMMING,DATA PROCESS" 7370
    1 30595 1  3378.608 "031564" 2011 "CEO" "00449810" "CMP PROGRAMMING,DATA PROCESS" 7370
    0 34290 4   1011.97 "031564" 2011 ""    "00449810" "CMP PROGRAMMING,DATA PROCESS" 7370
    0 34292 5  1151.144 "031564" 2011 ""    "00449810" "CMP PROGRAMMING,DATA PROCESS" 7370
    0 37863 2  1307.653 "031564" 2011 ""    "00449810" "CMP PROGRAMMING,DATA PROCESS" 7370
    0 46386 5  1114.166 "156617" 2011 ""    "00484M10" "BIOLOGICAL PDS,EX DIAGNSTICS" 2836
    0 46382 4  1501.247 "156617" 2011 ""    "00484M10" "BIOLOGICAL PDS,EX DIAGNSTICS" 2836
    0 46380 3   1645.47 "156617" 2011 ""    "00484M10" "BIOLOGICAL PDS,EX DIAGNSTICS" 2836
    0 46383 6   1082.63 "156617" 2011 ""    "00484M10" "BIOLOGICAL PDS,EX DIAGNSTICS" 2836
    1 46385 1  3770.583 "156617" 2011 "CEO" "00484M10" "BIOLOGICAL PDS,EX DIAGNSTICS" 2836
    end

  • #2
    There are a few ways to solve this problem (this is my first post on statalist -- apologies for any conventions that I break). I would flag companies where CEO is populated and replace the pceo with "CEO":
    Code:
    gen flag_ceo = pceo == "CEO"
    egen mflag_ceo = max(flag_ceo), by(cusip)
    egen max_pay = max(tdc1), by(cusip)
    replace pceo = "CEO" if max_pay == tdc1 & !mflag_ceo
    Now the second part will provide the desired fraction for ceo pay slice:
    Code:
    collapse (sum) tdc1, by(cusip max_pay)
    gen ceo_pay_slice = tdc1/max_pay
    Last edited by Josh Clothiaux; 01 May 2017, 15:38.

    Comment


    • #3
      I tried to find a solution to your query. But there is a problem with the data. The variable "pceo" is a string variable, and has no "values".

      That said, I believe you may need to fiddle with - egen - plus the max() function.

      Something like:

      Code:
      . egen myvar = max(tdc1) if missing(pceo)
      Hope that helps.
      Best regards,

      Marcos

      Comment


      • #4
        Originally posted by Josh Clothiaux View Post
        There are a few ways to solve this problem (this is my first post on statalist -- apologies for any conventions that I break). I would flag companies where CEO is populated and replace the pceo with "CEO":
        Code:
        gen flag_ceo = pceo == "CEO"
        egen mflag_ceo = max(flag_ceo), by(cusip)
        egen max_pay = max(tdc1), by(cusip)
        replace pceo = "CEO" if max_pay == tdc1 & !mflag_ceo
        Now the second part will provide the desired fraction for ceo pay slice:
        Code:
        collapse (sum) tdc1, by(cusip max_pay)
        gen ceo_pay_slice = tdc1/max_pay
        Thanks for all your help! Unfortunately this didn't work as this took the max tdc1 per cusip for all firms, whereas I only want this to be the case if the variable (pceo) is missing for a firm.

        Comment


        • #5
          If we think about the way that values will be sorted, then

          1. Non-missing strings sort after missing strings.

          2. Highest numeric values sort to the end.

          Putting the two together,

          Code:
          . bysort cusip (pceo tdc1) : gen CEO1 = tdc1[_N] if !missing(pceo[_N]) 
          (15 missing values generated)
          
          . by cusip (pceo tdc1) : gen CEO2 = tdc1[_N] 
          
          . egen max = max(tdc1), by(cusip)  
          
          . tabdisp cusip, c(CEO? max) 
          
          ----------------------------------------------
              cusip |       CEO1        CEO2         max
          ----------+-----------------------------------
           00036020 |     447.51      447.51      447.51
           00036110 |     5786.4      5786.4      5786.4
           00095710 |               3326.191    3326.191
           00128210 |   1125.081    1125.081    1609.912
           00130H10 |   6916.265    6916.265    20827.22
           00154710 |   5704.842    5704.842    5704.842
           00163U10 |               5365.978    5365.978
           00164V10 |   11514.11    11514.11    11514.11
           00174410 |   2554.568    2554.568    2554.568
           00206R10 |   18690.82    18690.82    18690.82
           00215F10 |    1739.36     1739.36     1739.36
           00253530 |               2846.024    2846.024
           00256710 |   2302.419    2302.419    2302.419
           00282410 |   18591.82    18591.82    18591.82
           00287Y10 |   4669.851    4669.851    4669.851
           00289620 |   4043.167    4043.167    46756.16
           00365410 |   4193.467    4193.467    4193.467
           00374N10 |   9586.366    9586.366    9586.366
           00449810 |   3378.608    3378.608    3378.608
           00484M10 |   3770.583    3770.583    3770.583
          ----------------------------------------------

          Comment


          • #6
            A combination of Josh Clothiaux's solution and the original post works for me:
            Code:
            * fill in pceo
            generate flag_ceo = pceo == "CEO"
            egen mflag_ceo = max(flag_ceo), by(cusip)
            egen double max_pay = max(tdc1), by(cusip)
            replace pceo = "CEO" if max_pay == tdc1 & !mflag_ceo
            
            * collapse and generate ceo_pay_slice
            generate CEOpay = tdc1 if pceo=="CEO"
            collapse (max) "CEOpay" (sum) tdc1, by (cusip)
            generate ceo_pay_slice=CEOpay/tdc1
            Note that Josh's original code contained a precision issue (at least with the example data), because it did not calculate 'max_pay' as double.

            Regards
            Bela

            Comment


            • #7
              Thanks Nick Cox! That's excellent work!

              Comment

              Working...
              X