
  • How to deal with age group variable as a string variable?

    Dear Statalisters,

    I have a dataset in which age is recorded in groups from the age of 20 onward, like 20-24, 25-29.
    For those younger than 20, however, I can see the exact age.
    My problem arises when I want to generate a new dummy variable for these age groups.
    I use the command below:
    . gen agegroup=1 if age<18

    and I get this error message:
    type mismatch
    r(109)

    Age is a string variable and I cannot format it. Any help will be appreciated. Thank you!
    Last edited by Gizem Cetin; 19 May 2017, 02:11.

  • #2
    Data example, please (see http://www.statalist.org/forums/help#stata) or, minimally, the results of

    Code:
    tabulate age, missing


    • #3
      Here it is:

      YAS_HESAPLA |
              NAN |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |        315        0.02        0.02
                1 |        481        0.03        0.06
               10 |     13,806        0.99        1.04
               11 |     23,262        1.66        2.70
               12 |     24,645        1.76        4.46
               13 |     28,516        2.04        6.50
               14 |     29,359        2.10        8.60
               15 |     24,369        1.74       10.34
               16 |     24,754        1.77       12.11
               17 |     27,772        1.98       14.09
               18 |     25,236        1.80       15.89
               19 |     19,406        1.39       17.28
                2 |        425        0.03       17.31
            20-24 |     96,244        6.87       24.18
            25-29 |    108,535        7.75       31.93
                3 |        477        0.03       31.97
            30-34 |    131,055        9.36       41.33
            35-39 |    133,274        9.52       50.85
                4 |        484        0.03       50.88
            40-44 |    126,908        9.06       59.95
            45-49 |    112,511        8.04       67.98
                5 |        525        0.04       68.02
            50-54 |    113,049        8.07       76.09
            55-59 |     95,699        6.83       82.93
                6 |        492        0.04       82.96
            60-64 |     77,107        5.51       88.47
            65-69 |     58,110        4.15       92.62
                7 |        530        0.04       92.66
            70-74 |     41,855        2.99       95.65
            75-79 |     29,433        2.10       97.75
                8 |        581        0.04       97.79
              80+ |     30,171        2.15       99.95
                9 |        749        0.05      100.00
      ------------+-----------------------------------
            Total |  1,400,135      100.00


      • #4
        Gizem:
        please post what you typed and what Stata gave you back via CODE delimiters. Thanks.
        Nick asked for an example/excerpt of your dataset that you can easily provide via -dataex- (type -search dataex- to install).
        That said, you were probably trying something like
        Code:
        gen agegroup=1 if age<"19" // indeed, some of your ids are aged 19
        Then you can replicate the same approach via -replace- for all the remaining age groups, e.g.:
        Code:
        replace agegroup=2 if age>="20" & age<="24"
        Eventually, you can -label- them (e.g.):
        Code:
        label define agegroup 1 "<20" 2 "20-24"
        Kind regards,
        Carlo
        (Stata 19.0)


        • #5
          I don't think Carlo's approach will work. For example, in string terms, "7" is not less than "20" as the tabulate result already shows.

          I think you need something more like this. Note that a data example would have saved me time on crude mechanical engineering.

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str5 age_group float freq
          "0"        315
          "1"        481
          "10"     13806
          "11"     23262
          "12"     24645
          "13"     28516
          "14"     29359
          "15"     24369
          "16"     24754
          "17"     27772
          "18"     25236
          "19"     19406
          "2"        425
          "20-24"  96244
          "25-29" 108535
          "3"        477
          "30-34" 131055
          "35-39" 133274
          "4"        484
          "40-44" 126908
          "45-49" 112511
          "5"        525
          "50-54" 113049
          "55-59"  95699
          "6"        492
          "60-64"  77107
          "65-69"  58110
          "7"        530
          "70-74"  41855
          "75-79"  29433
          "8"        581
          "80+"    30171
          "9"        749
          end
          
          split age_group, parse(- +) gen(age) limit(1) destring
          tab age1
          gen age2 = cond(age1 < 20, age1, age1 + 2)
          labmask age1, values(age_group)
          groups age1 age_group age2  [w=freq]
          
            +--------------------------------------------+
            |  age1   age_gr~p   age2    Freq.   Percent |
            |--------------------------------------------|
            |     0          0      0      315      0.02 |
            |     1          1      1      481      0.03 |
            |     2          2      2      425      0.03 |
            |     3          3      3      477      0.03 |
            |     4          4      4      484      0.03 |
            |--------------------------------------------|
            |     5          5      5      525      0.04 |
            |     6          6      6      492      0.04 |
            |     7          7      7      530      0.04 |
            |     8          8      8      581      0.04 |
            |     9          9      9      749      0.05 |
            |--------------------------------------------|
            |    10         10     10    13806      0.99 |
            |    11         11     11    23262      1.66 |
            |    12         12     12    24645      1.76 |
            |    13         13     13    28516      2.04 |
            |    14         14     14    29359      2.10 |
            |--------------------------------------------|
            |    15         15     15    24369      1.74 |
            |    16         16     16    24754      1.77 |
            |    17         17     17    27772      1.98 |
            |    18         18     18    25236      1.80 |
            |    19         19     19    19406      1.39 |
            |--------------------------------------------|
            | 20-24      20-24     22    96244      6.87 |
            | 25-29      25-29     27   108535      7.75 |
            | 30-34      30-34     32   131055      9.36 |
            | 35-39      35-39     37   133274      9.52 |
            | 40-44      40-44     42   126908      9.06 |
            |--------------------------------------------|
            | 45-49      45-49     47   112511      8.04 |
            | 50-54      50-54     52   113049      8.07 |
            | 55-59      55-59     57    95699      6.83 |
            | 60-64      60-64     62    77107      5.51 |
            | 65-69      65-69     67    58110      4.15 |
            |--------------------------------------------|
            | 70-74      70-74     72    41855      2.99 |
            | 75-79      75-79     77    29433      2.10 |
            |   80+        80+     82    30171      2.15 |
            +--------------------------------------------+
          Here labmask should be installed from Stata Journal files and groups from SSC.

          Indicators (you say dummies) are now easy.
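
          As a minimal sketch of that last step, assuming a numeric age1 built with the same -split- call as above (the five observations and the name under18 are illustrative, not from the original data):

          ```stata
          clear
          input str5 age_group
          "7"
          "17"
          "18"
          "20-24"
          "80+"
          end

          * Numeric lower bound of each group, as in the -split- call above
          split age_group, parse(- +) gen(age) limit(1) destring

          * 1 if under 18, 0 otherwise; left missing only where age1 is missing
          gen byte under18 = age1 < 18 if !missing(age1)

          list age_group age1 under18
          ```

          Writing the comparison as the expression itself gives a 0/1 variable, whereas -gen x = 1 if ...- leaves everyone else missing.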


          • #6
            Nick is correct.
            I mistook _n for the numerical meaning of the value.
            My mistake is replicated in this toy-example:
            Code:
            . set obs 4
            
            . g age="10" in 1
            
            . replace age="11" in 2
            
            . replace age="12" in 3
            
            . replace age="7" in 4
            
            . g integer=1 if age<="7"
            
            . list
            
                 +---------------+
                 | age   integer |
                 |---------------|
              1. |  10         1 |
              2. |  11         1 |
              3. |  12         1 |
              4. |   7         1 |
                 +---------------+
            which is obviously absurd.
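
            The result follows from strings being compared character by character. A small sketch contrasting that with a numeric comparison via real() (the local macro names are illustrative; run it from a do-file so the // comments are accepted):

            ```stata
            * Strings are compared character by character, not by numeric value
            local s1 = ("10" <= "7")               // 1: "1" sorts before "7"
            local s2 = ("7" < "20")                // 0: "7" sorts after "2"

            * real() converts a string that holds a number to its numeric value
            local n1 = (real("10") <= real("7"))   // 0
            local n2 = (real("7") < real("20"))    // 1

            * real() returns missing when the string is not a plain number
            local n3 = real("20-24")               // . (missing)

            display "as strings: `s1' `s2'   as numbers: `n1' `n2'   20-24: `n3'"
            ```

            Because real() gives missing for grouped values such as "20-24", it is not enough on its own for this dataset, which is why the -split- approach in #5 extracts the lower bound first.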

            Lesson for Carlo: do not ever use -string- when you can use numerical format.
            Kind regards,
            Carlo
            (Stata 19.0)


            • #7
              Nick,

              What you suggest worked! I'm so grateful!

              Carlo, thank you for your comments!


              • #8
                Nick,

                What you have suggested seems to work at first but actually it does not.
                Here is the example.

                input str8 prsnnr long birimno byte fertno str5 age byte age1 float age2
                "1001011" 100101 1 "60-64" 60 62
                "1001012" 100101 2 "55-59" 55 57
                "1001013" 100101 3 "35-39" 35 37
                "1001014" 100101 4 "30-34" 30 32
                "1001015" 100101 5 "25-29" 25 27
                "1001031" 100103 1 "70-74" 70 72
                "1001032" 100103 2 "75-79" 75 77
                "1001041" 100104 1 "70-74" 70 72
                "1001042" 100104 2 "75-79" 75 77
                "1001061" 100106 1 "35-39" 35 37
                "1001062" 100106 2 "30-34" 30 32
                "1001063" 100106 3 "13" 13 13
                "1001064" 100106 4 "1" 1 1
                "1001071" 100107 1 "70-74" 70 72
                "1001072" 100107 2 "60-64" 60 62
                "1001073" 100107 3 "35-39" 35 37
                "1001101" 100110 1 "55-59" 55 57
                "1001102" 100110 2 "45-49" 45 47
                "1001103" 100110 3 "25-29" 25 27
                "1001104" 100110 4 "20-24" 20 22
                "1001105" 100110 5 "20-24" 20 22
                "1001106" 100110 6 "4" 4 4
                "1002011" 100201 1 "65-69" 65 67
                "1002021" 100202 1 "35-39" 35 37
                "1002022" 100202 2 "40-44" 40 42
                "1002023" 100202 3 "6" 6 6
                "1002024" 100202 4 "2" 2 2
                "1002031" 100203 1 "80+" 80 82
                "1002032" 100203 2 "35-39" 35 37
                "1002041" 100204 1 "30-34" 30 32
                "1002042" 100204 2 "30-34" 30 32
                "1002051" 100205 1 "35-39" 35 37
                "1002052" 100205 2 "35-39" 35 37
                "1002053" 100205 3 "4" 4 4
                "1002061" 100206 1 "60-64" 60 62
                "1002062" 100206 2 "55-59" 55 57
                "1002081" 100208 1 "75-79" 75 77
                "1002082" 100208 2 "55-59" 55 57
                "1002091" 100209 1 "55-59" 55 57
                "1002092" 100209 2 "60-64" 60 62
                "1002093" 100209 3 "25-29" 25 27
                "1002101" 100210 1 "50-54" 50 52
                "1003011" 100301 1 "55-59" 55 57
                "1003012" 100301 2 "55-59" 55 57
                "1003013" 100301 3 "80+" 80 82
                "1003014" 100301 4 "13" 13 13
                "1003021" 100302 1 "30-34" 30 32
                "1003022" 100302 2 "30-34" 30 32
                "1003031" 100303 1 "50-54" 50 52
                "1003032" 100303 2 "35-39" 35 37
                "1003051" 100305 1 "35-39" 35 37
                "1003052" 100305 2 "25-29" 25 27
                "1003061" 100306 1 "25-29" 25 27
                "1003062" 100306 2 "20-24" 20 22
                "1003071" 100307 1 "25-29" 25 27
                "1003072" 100307 2 "20-24" 20 22
                "1003073" 100307 3 "1" 1 1
                "1004011" 100401 1 "50-54" 50 52
                "1004012" 100401 2 "45-49" 45 47
                "1004013" 100401 3 "25-29" 25 27
                "1004014" 100401 4 "16" 16 16
                "1004021" 100402 1 "55-59" 55 57
                "1004022" 100402 2 "35-39" 35 37
                "1004031" 100403 1 "55-59" 55 57
                "1004032" 100403 2 "55-59" 55 57
                "1004041" 100404 1 "35-39" 35 37
                "1004042" 100404 2 "35-39" 35 37
                "1004043" 100404 3 "15" 15 15
                "1004044" 100404 4 "12" 12 12
                "1004051" 100405 1 "60-64" 60 62
                "1004052" 100405 2 "45-49" 45 47
                "1004071" 100407 1 "65-69" 65 67
                "1004072" 100407 2 "60-64" 60 62
                "1004081" 100408 1 "35-39" 35 37
                "1004082" 100408 2 "30-34" 30 32
                "1004083" 100408 3 "8" 8 8
                "1004084" 100408 4 "3" 3 3
                end

                When I use the command below:
                . gen agegroup=1 if age<18
                I still get 1 for everyone


                • #9
                  Your variable names aren't all the same as in my example. So, don't expect that the code will necessarily be the same.

                  That code should not even work at all, as you're telling us that age is a string variable. I think you should be working with age1.
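
                  A hedged sketch of the difference, using four rows modelled on the example in #8 (age as str5, age1 as byte); the local macro rc is only there to show the return code:

                  ```stata
                  clear
                  input str5 age byte age1
                  "60-64" 60
                  "13"    13
                  "1"      1
                  "20-24" 20
                  end

                  * age is a string, so comparing it with a number is a type mismatch
                  capture gen agegroup = 1 if age < 18
                  local rc = _rc
                  display "return code: `rc'"     // 109 = type mismatch

                  * the numeric variable is the one to compare
                  gen byte agegroup = 1 if age1 < 18
                  list age age1 agegroup
                  ```

                  Here agegroup is 1 only for the rows with age1 of 13 and 1, and missing elsewhere.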
