  • Create variable by age groups

    Hello,

    I have the dataset below where dvage is the age of an individual.
    I would like to create a variable "new_age" coded 1 "age between 16 - 25" 2 "age between 26 - 35" 3 "age between 36 - 45" 4 "age between 46 - 55" 5 "age 56 or over".
    Thank you very much in advance.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long pidp byte sex int dvage float wave
     750735647 1 42 5
    1293676887 1 23 1
    1293131527 1 40 2
     680919371 1 48 2
    1225819755 1 38 5
    1564975135 1 35 2
     693774765 1 60 4
    1156796372 1 30 4
     680186327 1 51 2
      68539251 1 53 2
     341650371 1 65 4
     612422979 1 22 3
     954257611 1 65 5
    1224665735 1 21 1
     544770451 1 45 2
    1360687491 1 42 3
    1293872735 1 18 1
    1088719451 1 56 2
     613878171 1 23 5
     477216527 1 57 4
     612022451 1 55 1
     447338762 1 46 2
    1224460367 1 64 4
     681536813 1 16 2
     340968327 1 54 2
     544562367 1 68 4
    1497221287 1 50 4
     816085687 1 69 2
     340428411 1 52 3
     748520207 1 51 3
     884494371 1 38 2
    1031845730 1 24 5
     819300733 1 46 1
      68506611 1 47 5
     205262765 1 52 4
     482290136 1 36 5
     156591089 1 45 2
     272175447 1 53 1
    1157130855 1 18 4
     748240735 1 58 5
    1566057691 1 52 5
     818635687 1 84 1
    1496729655 1 32 3
    1496288327 1 59 2
     273183207 1 51 4
     952058495 1 18 1
    1428748011 1 68 4
    1571541402 1 21 2
     272974447 1 67 5
    1360032647 1 38 5
    1020622211 1 87 5
     837228253 1 19 3
     817299503 1 17 2
    1090284543 1 32 5
     632606729 1 62 2
    1157276367 1 58 3
    1564266567 1 64 3
    1235369690 1 30 4
      68063251 1 54 5
     476497771 1 59 4
    1361089367 1 65 5
     817676887 1 77 1
    1429545647 1 27 4
     272741887 1 46 4
     694222217 1 38 3
    1632444727 1 59 3
     748724215 1 22 2
     544629687 1 61 3
     340686127 1 79 3
    1224688167 1 78 2
    1224457647 1 31 3
    1292561011 1 73 3
     503683485 1 53 1
     751561165 1 47 1
     816533807 1 76 3
     748309405 1 76 5
     817908085 1 53 3
     952072767 1 64 2
     272028567 1 78 5
    1225100255 1 22 3
     340281527 1 48 3
    1292235975 1 20 4
     750747207 1 78 4
     885081211 1 50 4
     571882725 1 63 3
     367745365 1 57 3
    1564852047 1 31 4
    1360333887 1 69 1
     272798327 1 48 3
    1633012539 1 18 2
     748809205 1 69 1
     817761964 1 31 3
     952644651 1 57 1
     411059329 1 25 5
     750754687 1 50 2
    1428518187 1 47 5
     693774765 1 54 1
    1030526450 1 80 5
     682502407 1 33 4
     137305692 1 34 4
    end
    label values sex c_sex
    label def c_sex 1 "male", modify
    label values dvage c_dvage

  • #2
    Code:
    gen wanted= ceil((dvage-15)/10)
    replace wanted = 5 if wanted>5 & !missing(wanted)
    See https://journals.sagepub.com/doi/pdf...867X0400300413 for an explanation.
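To see how the ceil() recipe in #2 treats the bin limits, a quick check on a handful of boundary ages may help. This is only a sketch: the toy ages and the label name new_age_lbl are illustrative assumptions, not part of the original data.

```stata
* Toy data: ages at or near each bin limit (illustrative values only)
clear
input int dvage
16
25
26
35
36
45
46
55
56
87
.
end

* Same two lines as in #2
gen wanted = ceil((dvage - 15)/10)
replace wanted = 5 if wanted > 5 & !missing(wanted)

* Hypothetical value label matching the categories requested in #1
label define new_age_lbl 1 "16-25" 2 "26-35" 3 "36-45" 4 "46-55" 5 "56+"
label values wanted new_age_lbl

list, sepby(wanted)
```

Ages of 15 or below would get a code of 0 or less from this formula, so it is worth checking the range of dvage with summarize before relying on it.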


    • #3
      As #2 indicates I am a fan of floor() and ceil() when they make code concise and unambiguous. See also https://www.stata-journal.com/articl...article=dm0095

      There are many ways to do this and a top concern is often clarity -- clarity in (1) your understanding the code now, (2) your understanding it when you revisit, e.g. some months down the line when reviewers have questions or comments, and (3) anybody's understanding if they need to understand your code, including if they know little or no Stata.

      I dislike using the official Stata function cut() in egen because I don't want the burden of remembering or uncovering which way values on bin limits are assigned and what happens to missing values, and so forth.
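For readers who do want to try egen's cut(), the following sketch shows the conventions that have to be remembered. It assumes the documented behaviour of at() and icodes; the toy ages and the upper limit of 200 are arbitrary choices made for illustration.

```stata
* Toy data including values outside the bins and a missing value
clear
input int dvage
15
16
25
26
55
56
87
.
end

* at() gives the lower limit of each bin; each interval is closed on the left
* and open on the right, so 26 goes into the second bin, not the first.
* The final limit of 200 is an arbitrary assumption, chosen to exceed any age.
egen wanted = cut(dvage), at(16, 26, 36, 46, 56, 200) icodes

* icodes numbers the bins 0, 1, 2, ..., so add 1 to get codes 1 to 5
replace wanted = wanted + 1

* Ages below 16 and missing ages come back as missing
list
```

Three conventions are in play here (left-closed bins, codes starting at 0, and out-of-range values set to missing), which is the kind of detail the paragraph above has in mind.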

      Some people may prefer variations on


      Code:
      gen wanted = 1 if inrange(dvage, 16, 25)
      replace wanted = 2 if inrange(dvage, 26, 35)
      where inrange() is, I suggest, fairly explicit, although nothing stops use of >= <= > < as desired.
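Carried through to all five groups, the inrange() approach might look like the sketch below. The toy ages are illustrative, and the open-ended last group is an assumption based on the request in #1, which gives no upper limit.

```stata
* Toy data (illustrative values only)
clear
input int dvage
16
25
26
40
50
56
87
.
end

gen wanted = 1 if inrange(dvage, 16, 25)
replace wanted = 2 if inrange(dvage, 26, 35)
replace wanted = 3 if inrange(dvage, 36, 45)
replace wanted = 4 if inrange(dvage, 46, 55)

* The last group is open-ended; the !missing() guard matters because
* Stata treats missing as larger than any number
replace wanted = 5 if dvage >= 56 & !missing(dvage)

tabulate dvage wanted, missing
```

Any age not covered by one of the conditions, including a missing age, is left as missing in wanted, which makes gaps in the coding easy to spot in the tabulation.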

      I like reaching for

      Code:
      * /// continues the command on the next line in a do-file
      gen wanted = cond(dvage > 55, 5, ///
                             cond(dvage > 45, 4, ///
                             cond(dvage > 35, 3, ///
                             cond(dvage > 25, 2, ///
                             cond(dvage > 15, 1, 0))))) if dvage < .
      which, if you like it at all, is best laid out on multiple lines in a script; is best thought of as each ( being a promise to supply a matching ) later; and can be read aloud as

      if dvage > 55 assign 5
      otherwise if dvage > 45 assign 4
      ...
      so long as dvage is not missing

      Many people hate that too.
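Whichever style is preferred, it is cheap to confirm that two recipes agree before trusting either. A minimal sketch, assuming integer ages from 16 to 100 plus one missing value (the variable names w_ceil and w_cond are made up for the comparison):

```stata
* Every integer age from 16 to 100, plus one missing value
clear
set obs 86
gen int dvage = _n + 15 if _n <= 85

* The ceil() recipe from #2
gen w_ceil = ceil((dvage - 15)/10)
replace w_ceil = 5 if w_ceil > 5 & !missing(w_ceil)

* The nested cond() recipe, with /// continuing the command across lines
gen w_cond = cond(dvage > 55, 5, ///
             cond(dvage > 45, 4, ///
             cond(dvage > 35, 3, ///
             cond(dvage > 25, 2, ///
             cond(dvage > 15, 1, 0))))) if dvage < .

* assert stops with an error if the two ever differ
assert w_ceil == w_cond
```

The check says nothing about ages below 16, which neither recipe was written for and which do not occur in the example data.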


      • #4
        Thank you very much!
