Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Compare two observations in dataset within same firm and take highest value of other variable with if condition

    Hi everyone,

    I am new to Stata, using version 16.

    I would like to take:

    1) either an average of the total compensation (total_comp) of the two observations (within one year by one firm) if ceoann_dup is 1 (here the two observations in 2013)
    2) or take the highest of the respective two total compensations within one year by one firm of the two indicated observations

    I tagged all duplicates (ceoann_dup) and my dataset looks following:

    gvkey fyear execid exec_fullname ceoann cfoann ceoann_dup total_comp
    001004 2012 19999 Michael J. Sharp CFO 0 841.204
    001004 2013 09249 David P. Storch CEO 0 5247.779
    001004 2013 19999 Michael J. Sharp CFO 1 1003.249
    001004 2013 46404 John C. Fortson CFO 1 1411.156
    001004 2014 09249 David P. Storch CEO 0 5234.648


    How could I operationalize this in Stata?

    Thank you so much for you help!

  • #2
    Something like this?
    Code:
    egen avg_comp = mean(total_comp), by(gvkey fyear)
    egen max_comp = max(total_comp), by(gvkey fyear)
    
    gen wanted = cond(ceoann_dup == 1, avg_comp, max_comp)

    Comment


    • #3
      Thank you very much for the quick reply !

      This did the magic:

      Code:
      gen total_comp_new = cond(ceoann_dup == 1, avg_comp,total_comp)

      Short follow-up question:
      Once that the first observation row is adjusted with the "new total compensation": Which code could help to delete the second observation from the dataset?
      In the example from above, I would like to delete the following row:

      gvkey fyear execid exec_fullname ceoann cfoann ceoann_dup total_comp total_comp_new
      001004 2013 46404 John C. Fortson CFO 1 1411.156 1207.203

      Thank you again so much for your help!

      Comment


      • #4
        Consider this:
        Code:
        gen `c(obs_t)' n = _n
        bysort gvkey fyear ceoann_dup (n): drop if ceoann_dup == 1 & _n > 1
        I am generating n here to sort the data on, so that it remains in the current order you showed. If you have another variable that orders the rows, please use that in the second line instead, and omit the first line.

        Comment


        • #5
          Thank you so much! Worked perfectly!!

          Comment

          Working...
          X