
  • Replacing non-numeric values with numeric values in a dataset

    I have a dataset that contains the educational level (educ) of individuals in a certain industry (indnaics) for the years 2000-2014.
    The educ variable contains non-numeric values; see below:

    Code:
     
    year  educ      indnaics
    2000  5+ years  6214
    2000  4 years   6111
    2000  5+ years  3399ZM
    2000  2 years   622
    2000  Grade 11  446Z
    2000  1 year o  611M1
    2000  5+ years  611M1
    2000  Grade 12  493
    2000  Grade 12  4451
    2000  5+ years  6211
    2000  4 years   712
    2000  Grade 12  6111
    2000  Grade 12  722Z
    2000  Grade 11  6231
    2000  Grade 12  813M
    2000  Grade 12  5133Z
    2000  1 year o  6231
    2000  1 year o  334M2
    2000  Grade 10  334M2
    2000  4 years   5416
    2000  5+ years  611M1
    2000  4 years   611M1
    2000  Grade 9   6241
    2000  4 years   722Z
    2000  Grade 12  23
    2000  Grade 11  3MS
    2000  4 years   713Z
    2000  Grade 12  6212
    etc.
    I would like to replace these non-numeric values (educ) with numeric values (eduyears) according to the following scheme:
    Code:
     
    Educational attainment        eduYears
    N/A or no schooling           0
    Nursery school to grade 4     4
    Grade 5, 6, 7, or 8           8
    Grade 9                       9
    Grade 10                      10
    Grade 11                      11
    Grade 12                      12
    1 year of college             13
    2 years of college            14
    3 years of college            15
    4 years of college            16
    5+ years of college           17
    Is there an efficient way to make this replacement?

    I have looked at the replace command and at loops, but I cannot work out how to apply them here. It could also be that I am looking in the wrong direction.
    Could anyone help me out?

    Kind regards,
    Tom

  • #2
    Code:
    help encode
    Specifically, define a set of value labels first and then apply them through the label() option of an encode statement.
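    A minimal sketch of this advice, assuming educ is stored as a string whose text matches the scheme in the first post exactly (the listing there shows truncated text such as "1 year o", so the real strings may differ and should be checked first). The label name eduyrs is hypothetical. Run it from a do-file, because /// continuation lines do not work at the interactive prompt.

    ```stata
    * Hypothetical label name -eduyrs-: each code is the years of schooling
    * from the scheme; each text must match a string in educ exactly
    label define eduyrs 0 "N/A or no schooling" 4 "Nursery school to grade 4" ///
        8 "Grade 5, 6, 7, or 8" 9 "Grade 9" 10 "Grade 10" 11 "Grade 11"      ///
        12 "Grade 12" 13 "1 year of college" 14 "2 years of college"           ///
        15 "3 years of college" 16 "4 years of college" 17 "5+ years of college"

    * encode looks each string up in -eduyrs- and stores the matching code;
    * -noextend- makes it refuse to encode if some string has no match,
    * instead of silently adding new codes to the label
    encode educ, generate(EduYears) label(eduyrs) noextend
    ```

    Because the codes are the years themselves, EduYears can be used directly in calculations while still displaying the original text.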



    • #3
      Thank you very much for your help.
      However, I cannot seem to get the right result. In my previous post I wrongly stated that educ was a non-numeric variable; I have just noticed that Stata treats it as numeric (it is shown in blue in the Data Editor).

      I tried to use the recode command to replace the values, but it did not work.
      I am not sure what I did wrong.

      These are the steps I took:
      Code:
      generate EduYears = .
      (21992180 missing values generated)
      
      recode EduYears = 9 if educ=="Grade 9"
      I then got the following error:

      Code:
      [P]     error . . . . . . . . . . . . . . . . . . . . . . . .  Return code 109
              type mismatch;
              In an expression, you attempted to combine a string and numeric
              subexpression in a logically impossible way.  For instance, you
              attempted to subtract a string from a number or you attempted
              to take the substring of a number.
      I know that I'm doing something seriously wrong here, but I cannot figure out what.
      Could you help me?

      Kind regards,
      Tom





      • #4
        The first problem is with the -if educ == "Grade 9"- clause. You now say that educ is in fact a numeric variable. It may carry a value label that displays "Grade 9", but "Grade 9" is not a possible value of the underlying numeric variable: what Stata actually stores in active memory, and what it compares in each observation, is the number that the label maps to "Grade 9". Comparing that number with a string is what triggers the type mismatch error. So you need to identify the numeric value that is labeled "Grade 9". To do this, you first need to know the name of the value label associated with educ. That will show up in the fourth column of Stata's output if you run -des educ-. Then you can -label list- whatever that label is, and see what the numeric value for "Grade 9" is.
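        The lookup steps above, written out as commands. This assumes only that educ is a numeric variable with a value label attached. Note that -label list- without an argument prints every value label in memory, which can be long in a dataset with many labeled variables; passing the label name shown by -describe- limits it to that one label.

        ```stata
        * The "value label" column of this output names the label attached to educ
        describe educ

        * With no argument, -label list- prints every value label in memory,
        * so the numeric code stored for "Grade 9" can be read off directly
        label list

        * A second view of the same mapping: frequencies shown with the
        * label text, then with the underlying numeric codes
        tabulate educ
        tabulate educ, nolabel
        ```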

        (If you want to get fancier, you don't have to actually run -label list- and find the value; you can also change the -if- clause to -if educ == "Grade 9":lblname- where you replace lblname by the actual name of the label. And yes, you can even side-step that with a macro extended function, but I think you ought to stick with the basics here until you are more comfortable with Stata coding.)
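        A sketch of the complete recoding that combines the -"text":lblname- form with the macro extended function mentioned above. It assumes the texts in educ's value label match the scheme in the first post word for word (check them against -label list- first), and it writes each replacement out explicitly rather than in a loop so that every line can be checked by eye.

        ```stata
        * Store the name of educ's value label in a local macro
        * (empty if educ has no value label, which would break the lines below)
        local lbl : value label educ

        * Start from an all-missing variable, dropping any earlier attempt
        capture drop EduYears
        generate byte EduYears = .

        * "text":`lbl' turns label text into its numeric code, so each
        * condition compares numbers with numbers (no type mismatch)
        replace EduYears = 0  if educ == "N/A or no schooling":`lbl'
        replace EduYears = 4  if educ == "Nursery school to grade 4":`lbl'
        replace EduYears = 8  if educ == "Grade 5, 6, 7, or 8":`lbl'
        replace EduYears = 9  if educ == "Grade 9":`lbl'
        replace EduYears = 10 if educ == "Grade 10":`lbl'
        replace EduYears = 11 if educ == "Grade 11":`lbl'
        replace EduYears = 12 if educ == "Grade 12":`lbl'
        replace EduYears = 13 if educ == "1 year of college":`lbl'
        replace EduYears = 14 if educ == "2 years of college":`lbl'
        replace EduYears = 15 if educ == "3 years of college":`lbl'
        replace EduYears = 16 if educ == "4 years of college":`lbl'
        replace EduYears = 17 if educ == "5+ years of college":`lbl'

        * A nonzero count means some label text above does not match
        * the actual label exactly
        count if missing(EduYears) & !missing(educ)
        ```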

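        Also, you want -replace-, not -recode- here. They are related but distinct operations: -recode- changes the existing values of a variable according to rules such as (3=9), whereas -replace- assigns a value to the observations selected by an -if- condition, which is exactly what you are trying to do, so -replace- is more congenial here.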



        • #5
          Thank you very much, this has helped me a lot!
          Kind regards,
          Tom

