Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Combining the "max" function of one column with a value of another column

    Hi Statalist users

    I have a panel data set in which respondents (persnr) did state their annual income. I was interested in the highest income they ever obtained shown in the observations. For this purpose I used the max function (egen max_inc= max(inc)). But of course now I am interested at which age they actually earned the most.

    here is my data:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str6 A str10 B str3 C str7 D
    "persnr" "survey_age" "inc" "max_inc"
    "2402"   "46"         "256" "665"    
    "2402"   "47"         "488" ""      
    "2402"   "48"         "665" ""      
    "2403"   "20"         "716" "716"    
    "2403"   "21"         "716" ""      
    end
    So for the person 2402 the max_age should be 48 and for the case of 2403 I would prefer the first year in which the person earned 716 - so it would be 20
    Do you have any idea which code would suit best?

    Thanks a lot

    Jessica
    Last edited by Jessica Smith; 15 Apr 2019, 08:40.

  • #2
    All your variables are string according to this report. It looks as if you have copied and pasted from some spreadsheet and left metadata in observation 1.

    But egen's max() function couldn't possibly work with such data. So please use dataex (FAQ Advice #12) to show us a sample of your real data.

    Comment


    • #3
      Hi Nick , thanks for your help!

      Sorry for the bad example.

      Here is an example of the original data:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long persnr float survey_age long pglabnet float max_inc
       103 24 1099 1099
       103 23 1023    .
       103 21  818    .
       103 22 1023    .
       203 36  869 2300
       203 42 1667    .
       203 40 1644    .
       203 38  486    .
       203 48 1400    .
       203 32  818    .
       203 41 1790    .
       203 46 2300    .
       203 28  110    .
       203 30  921    .
       203 35 1213    .
       203 31  791    .
       203 45 1750    .
       203 47 1200    .
       203 43 1978    .
       901 49 1149 1252
       901 53 1100    .
       901 39 1114    .
       901 60 1200    .
       901 42 1161    .
       901 46 1032    .
       901 54 1178    .
       901 44 1088    .
       901 35  971    .
       901 55 1252    .
       901 33  818    .
       901 57 1200    .
       901 52 1250    .
       901 48 1176    .
       901 40 1125    .
       901 37  926    .
       901 43 1074    .
       901 59 1200    .
       901 45 1137    .
       901 61 1226    .
       901 47 1080    .
       901 50 1176    .
       901 34  929    .
       901 56 1150    .
       901 62 1200    .
       901 51 1110    .
       901 41 1074    .
       901 63 1240    .
       901 58 1100    .
       901 38  948    .
       901 36 1015    .
      1301 44  767 1279
      1301 49 1279    .
      1301 47  869    .
      1301 50  920    .
      1301 46  869    .
      1301 48  890    .
      1301 45  869    .
      1301 53 1023    .
      1301 54 1023    .
      1301 51 1023    .
      1301 43  879    .
      1301 42  767    .
      1301 52 1278    .
      1302 20  251 1227
      1302 21  716    .
      1302 24 1227    .
      1302 22  792    .
      1302 23 1074    .
      1601 51 1723 1842
      1601 57 1309    .
      1601 46 1176    .
      1601 54 1457    .
      1601 49 1565    .
      1601 50 1595    .
      1601 58 1842    .
      1601 56 1329    .
      1601 44 1329    .
      1601 53 1595    .
      1601 47 1329    .
      1601 55 1227    .
      1704 49 2297 2368
      1704 31  818    .
      1704 29  716    .
      1704 42 1648    .
      1704 32  767    .
      1704 47 2173    .
      1704 43 1778    .
      1704 51 2337    .
      1704 48 2254    .
      1704 35 1452    .
      1704 34 1380    .
      1704 30  716    .
      1704 36 1483    .
      1704 50 2368    .
      1704 40 1643    .
      1704 37 1647    .
      1704 39 1439    .
      1704 45 1738    .
      1704 41 1665    .
      1704 38 1636    .
      end

      Here you can see the person id (persnr) in the first column, then the age at which a person had a certain income (survey_age) and to the right the annual income (pglabnet). The variable max_inc shows the highest income a person ever had and now I need a variable that shows the according age at which a person had the highest income...

      So for person 103 it would be 24, for person 203 it would be 46, for person 901 it would be 55 etc. Sometimes there is the case that over 5 years income has never changed. So there I would need a code that shows the youngest age at which a person earned this certain amount the first time.

      Thanks a lot

      Jessica

      Comment


      • #4
        Thanks for your data example. These both work for that:

        Code:
        . bysort persnr (max_inc survey_age) : gen wanted = survey_age[1] 
        
        . by persnr : egen wanted2 = min(cond(max_inc < ., survey_age, .)) 
        
        . list if !missing(max_inc), sepby(persnr) 
        
             +-----------------------------------------------------------+
             | persnr   survey~e   pglabnet   max_inc   wanted   wanted2 |
             |-----------------------------------------------------------|
          1. |    103         24       1099      1099       24        24 |
             |-----------------------------------------------------------|
          5. |    203         36        869      2300       36        36 |
             |-----------------------------------------------------------|
         20. |    901         49       1149      1252       49        49 |
             |-----------------------------------------------------------|
         51. |   1301         44        767      1279       44        44 |
             |-----------------------------------------------------------|
         64. |   1302         20        251      1227       20        20 |
             |-----------------------------------------------------------|
         69. |   1601         51       1723      1842       51        51 |
             |-----------------------------------------------------------|
         81. |   1704         49       2297      2368       49        49 |
             +-----------------------------------------------------------+

        Comment


        • #5
          Hi Nick,

          thanks for the quick respond. Unfortunately, there is still something wrong about it. For example, person nr 203 did not earn the most at the age of 36 but at the age of 46. I guess it is my fault as I created max_inc the wrong way. It is based on a variable that equals 1 if the person id changes to the next id number (y). And max_inc replaces the one with the highest income of all observations for one person. See the example:

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input long persnr float survey_age long pglabnet float(y max_inc)
           103 24 1099 1 1099
           103 21  818 0    .
           103 22 1023 0    .
           103 23 1023 0    .
           203 36  869 1 2300
           203 28  110 0    .
           203 30  921 0    .
           203 31  791 0    .
           203 32  818 0    .
           203 35 1213 0    .
           203 38  486 0    .
           203 40 1644 0    .
           203 41 1790 0    .
           203 42 1667 0    .
           203 43 1978 0    .
           203 45 1750 0    .
           203 46 2300 0    .
           203 47 1200 0    .
           203 48 1400 0    .
           901 49 1149 1 1252
           901 33  818 0    .
           901 34  929 0    .
           901 35  971 0    .
           901 36 1015 0    .
           901 37  926 0    .
           901 38  948 0    .
           901 39 1114 0    .
           901 40 1125 0    .
           901 41 1074 0    .
           901 42 1161 0    .
           901 43 1074 0    .
           901 44 1088 0    .
           901 45 1137 0    .
           901 46 1032 0    .
           901 47 1080 0    .
           901 48 1176 0    .
           901 50 1176 0    .
           901 51 1110 0    .
           901 52 1250 0    .
           901 53 1100 0    .
           901 54 1178 0    .
           901 55 1252 0    .
           901 56 1150 0    .
           901 57 1200 0    .
           901 58 1100 0    .
           901 59 1200 0    .
           901 60 1200 0    .
           901 61 1226 0    .
           901 62 1200 0    .
           901 63 1240 0    .
          1301 44  767 1 1279
          1301 42  767 0    .
          1301 43  879 0    .
          1301 45  869 0    .
          1301 46  869 0    .
          1301 47  869 0    .
          1301 48  890 0    .
          1301 49 1279 0    .
          1301 50  920 0    .
          1301 51 1023 0    .
          1301 52 1278 0    .
          1301 53 1023 0    .
          1301 54 1023 0    .
          1302 20  251 1 1227
          1302 21  716 0    .
          1302 22  792 0    .
          1302 23 1074 0    .
          1302 24 1227 0    .
          1601 51 1723 1 1842
          1601 44 1329 0    .
          1601 46 1176 0    .
          1601 47 1329 0    .
          1601 49 1565 0    .
          1601 50 1595 0    .
          1601 53 1595 0    .
          1601 54 1457 0    .
          1601 55 1227 0    .
          1601 56 1329 0    .
          1601 57 1309 0    .
          1601 58 1842 0    .
          1704 49 2297 1 2368
          1704 29  716 0    .
          1704 30  716 0    .
          1704 31  818 0    .
          1704 32  767 0    .
          1704 33  969 0    .
          1704 34 1380 0    .
          1704 35 1452 0    .
          1704 36 1483 0    .
          1704 37 1647 0    .
          1704 38 1636 0    .
          1704 39 1439 0    .
          1704 40 1643 0    .
          1704 41 1665 0    .
          1704 42 1648 0    .
          1704 43 1778 0    .
          1704 44 1838 0    .
          1704 45 1738 0    .
          1704 46 2000 0    .
          1704 47 2173 0    .
          end

          So the variable wanted just takes the age next to the value where y==1, even though it is not always the age next to y==1.

          I hope this makes sense...

          Thanks again.

          Jessica

          Comment


          • #6
            You'll have to forgive me for not understanding your data structure and in particular why the max income is held where it is and why income is called pglabnet.

            So, you want the minimum age at which the income was maximum. With your example, I now suggest

            Code:
            . egen max_wanted = max(pglabnet), by(persnr) 
            
            . egen age_wanted = min(cond(pglabnet == max_wanted, survey_age, .)), by(persnr) 
            
            . egen check = max(max_inc), by(persnr) 
            
            . tabdisp persnr, c(*wanted check) 
            
            ----------------------------------------------
               persnr | max_wanted  age_wanted       check
            ----------+-----------------------------------
                  103 |       1099          24        1099
                  203 |       2300          46        2300
                  901 |       1252          55        1252
                 1301 |       1279          49        1279
                 1302 |       1227          24        1227
                 1601 |       1842          58        1842
                 1704 |       2297          49        2368
            ----------------------------------------------
            https://journals.sagepub.com/doi/pdf...867X1101100210 may prove helpful, especially Sections 9 and 10.

            Comment


            • #7
              Hi Nick! No problem at all - it worked with the last code you posted ! Thanks a lot!

              Comment

              Working...
              X