
  • ISCO 88 fast recode from isco88-4 to isco88-3 and 2 with English labels

    Dear Statalist community!

    Since I have profited so much from your knowledge, I now want to give something back. During my work I wrote a script that recodes four-digit ISCO-88 codes to three- and two-digit codes, and it does so very fast, even for huge panel datasets like the GSOEP. It then constructs English value labels and applies them to the recoded variables.
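
To illustrate the labelling step, here is a minimal sketch of attaching English value labels to a two-digit variable. It is not the attached script: the variable name isco88_2 and the label name isco88_2_en are assumptions made for this example, and only two ISCO-88 sub-major groups are labelled.

```stata
* Illustrative data: an already recoded two-digit variable (hypothetical name)
clear
input isco88_2
11
12
11
end

* Define a value label for two ISCO-88 sub-major groups and attach it
label define isco88_2_en 11 "Legislators and senior officials" 12 "Corporate managers"
label values isco88_2 isco88_2_en

* The table now shows the English labels instead of the bare codes
tabulate isco88_2
```

A full label set would list every two- and three-digit group in the same way.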

    How does it do that? Instead of checking every ISCO-88 code with recode, I exploit the hierarchical structure of the ISCO-88 scheme: I convert the codes to a string, delete the last digit, and convert the result back to a number with real(). Deleting one digit gives the three-digit version; deleting two gives the two-digit version.
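
The string idea described above can be sketched as follows. This is a reconstruction under stated assumptions, not the attached script: the variable names isco88_4, isco88_3 and isco88_2 are hypothetical, and the four example codes are made up for the demonstration.

```stata
* Illustrative data: a numeric variable with four-digit ISCO-88 codes
clear
input isco88_4
1110
2411
7136
9330
end

* Convert to string, drop the last digit, convert back to a number
generate isco88_3 = real(substr(string(isco88_4), 1, strlen(string(isco88_4)) - 1))

* Repeat on the three-digit result to get the two-digit version
generate isco88_2 = real(substr(string(isco88_3), 1, strlen(string(isco88_3)) - 1))

list
```

Because no code is looked up individually, the same two lines work for any code that follows the hierarchical scheme.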

    BEWARE: Some datasets (e.g. the SOEPlong) put the armed forces into group 100 rather than into group 0, where Elias and Birch (1994) place them. Therefore check the particularities of your dataset before running this script; small adjustments might be necessary.
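
One way to run such a check is sketched below. Everything here is an assumption for illustration: the variable name isco88_4, the example values, and in particular the decision to move code 100 to 110 (the ISCO-88 unit group 0110, armed forces). Inspect your own data and codebook before adapting it.

```stata
* Illustrative data: one record coded 100, one coded 110, one ordinary code
clear
input isco88_4
100
110
1110
end

* Inspect every code below 1000: the armed forces end up here either way
tabulate isco88_4 if inrange(isco88_4, 0, 999)

* ASSUMPTION: in this dataset 100 means "armed forces"; move it to 0110
replace isco88_4 = 110 if isco88_4 == 100
```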

    Credits: I just combined several pieces. I used Clyde Schechter's label creator from Statalist to construct the value labels from the list published by the Warwick Institute for Employment Research, and Iversen et al.'s (2011) "traditional" recoding script (slow for large survey data) to verify the results of my script.


    References:

    Elias, P., & Birch, M. (1994). Establishment of Community-Wide Occupational Statistics:
    ISCO 88 (COM), A Guide for Users. Institute for Employment Research. University
    of Warwick.

    Iversen, T., Cusack, T., & Rehm, P. (2011). Replication data for: Risks at Work:
    The Demand and Supply Sides of Government Redistribution. hdl:1902.1/16430,
    Harvard Dataverse, V1.
    https://dataverse.harvard.edu/datase...l:1902.1/16430

    Last edited by Daniel Paierl; 25 Sep 2017, 07:59.

  • #2
    Thanks for giving something back. If you want an even quicker approach, code

    Code:
    generate isco8_3digits = int(isco88_4digits/10)
    and

    Code:
    generate isco8_2digits = int(isco88_4digits/100)
    instead of converting to string and back.
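
To confirm that the two approaches agree on valid (positive) codes, a quick check along these lines may help. The example codes and the helper names num3 and str3 are assumptions for this sketch.

```stata
* Illustrative data: positive four-digit codes only
clear
input isco88_4digits
1110
2411
7136
9330
end

* Integer division versus the string round trip
generate num3 = int(isco88_4digits/10)
generate str3 = real(substr(string(isco88_4digits), 1, strlen(string(isco88_4digits)) - 1))

* assert stops with an error if the two versions ever differ
assert num3 == str3
```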

    Best
    Daniel



    • #3
      Thanks for the hint! I compared the two approaches and ran into one problem: unconverted missing-value codes (e.g. -2, -3, -4 from the GSOEP) are turned into 0 by int(isco88_4 / 10).
      This is not really a problem if you recode those codes to Stata missing values first (which I strongly recommend), but for safety's sake I still favour the string approach.
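
The effect is easy to reproduce on a toy example. The variable names and the choice of .a as the missing code are assumptions for this sketch; which negative codes exist and what they mean depends on your dataset.

```stata
* Illustrative data: two negative missing-value codes and one valid code
clear
input isco88_4
-2
-3
2411
end

* Naive integer division: -2 and -3 both collapse to 0
generate naive3 = int(isco88_4/10)

* Recode negative codes to an extended missing value first (assumed mapping)
generate isco88_4_clean = isco88_4
replace isco88_4_clean = .a if isco88_4 < 0

* Now the missing-value codes stay missing instead of becoming 0
generate clean3 = int(isco88_4_clean/10)

list
```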



      • #4
        Originally posted by Daniel Paierl:
        I compared the two approaches, ending up with the problem that unconverted missing value codes (e.g. -2, -3, -4 from the gsoep) are converted to 0 with int(isco88_4 / 10).
        Well, if you are worried about this, then

        Code:
        generate isco8_3digits = cond(isco88_4digits > 0, int(isco88_4digits/10), isco88_4digits)
        will do, and it still seems to be about 7 times faster than the string approach.
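
If you want to time the two approaches on your own machine, a rough sketch with Stata's timer command could look like this. The data are synthetic random numbers, not real ISCO-88 codes, and the variable names num3 and str3 are assumptions; actual timings will vary with hardware and dataset size.

```stata
* Synthetic data: one million four-digit numbers plus some -2 codes
clear
set obs 1000000
set seed 12345
generate isco88_4digits = 1000 + floor(8999*runiform())
replace isco88_4digits = -2 in 1/1000

timer clear

* Timer 1: guarded integer division
timer on 1
generate num3 = cond(isco88_4digits > 0, int(isco88_4digits/10), isco88_4digits)
timer off 1

* Timer 2: string round trip
timer on 2
generate str3 = real(substr(string(isco88_4digits), 1, strlen(string(isco88_4digits)) - 1))
timer off 2

* Elapsed seconds for each timer
timer list
```

The cond() version also leaves the negative codes untouched, so they can still be recoded afterwards.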

        Best
        Daniel



        • #5
          Please send me an ISCO-88 file and a video or document that explains how Stata works.

