
  • Separating numeric and character parts of a string variable

    Hello all,

    I have a variable, s4mainocc, which is constructed in the following way (I provide a number of examples below). The s4mainocc variable is a string variable.

    Code:
    2412. Personnel and careers professionals
    3433. Bookkeepers
    4115. Secretaries
    Now I'd like to separate the occupational code (the first four digits - note ALL occupations are four digits) from the label into two separate variables. This example https://www.statalist.org/forums/forum/general-stata-discussion/general/1612510-split-a-string-variable-into-character-and-numeric-parts seemed to be exactly what I was looking for, so I tried the following:

    Code:
    gen numeric = substr(s4mainocc, 4, 4)
    However, I received the following error:

    Code:
    type mismatch
    r(109);
    Last edited by Chris Rooney; 18 Jan 2023, 06:02.

  • #2

    numeric is a reserved word.
    Code:
    help [M-2] reswords
    The error message is confusing, but I believe that to be the error if s4mainocc is a string variable. You need a different name.



    • #3
      Firstly, please use -dataex- to provide a data example, as there seems to be a storage type problem with your target variable.
      Code:
      dataex s4mainocc
      Secondly, if you want to extract the four-digit occupation code and the occupation details, you should modify your command like this:
      Code:
      gen numeric=substr(s4mainocc, 1, 4)
      gen occupation=substr(s4mainocc, 7, .)
      Crossed with #2: I can generate a variable named numeric in Stata 16 (why?)
      Last edited by Chen Samulsion; 18 Jan 2023, 06:15.
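For the case where s4mainocc really is a string, the split suggested in #3 can be sketched as follows. This is a minimal illustration, not tested against the poster's dataset: the toy data are the three example strings from #1, the layout "NNNN. Title" is assumed for every value, and the variable names occ_code, occ_num and occ_title are hypothetical.

```stata
* Toy data: the three example strings from #1 (assumed layout "NNNN. Title")
clear
input str60 s4mainocc
"2412. Personnel and careers professionals"
"3433. Bookkeepers"
"4115. Secretaries"
end

* Characters 1-4 hold the occupation code
generate occ_code = substr(s4mainocc, 1, 4)

* Position 5 is the full stop and position 6 the space,
* so the title starts at position 7
generate occ_title = substr(s4mainocc, 7, .)

* real() converts the code to a number if a numeric variable is wanted
generate occ_num = real(occ_code)

list occ_code occ_num occ_title, noobs
```

Starting the title at position 7 rather than 6 avoids carrying the separator space into the new variable.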



      • #4
        Hi Chen Samulsion

        Here is the dataex example - looks a bit weird to me. I should probably mention I used "numlabel, add" before doing this. Ah, I was looking at the variable above by mistake. I see this is an integer variable! Sorry, my mistake!

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input int s4mainocc
        9211
        7122
        9131
        9211
        8334
        9322
        9211
        9141
           .
        5133
        1314
           .
           .
        9132
        8290
        8212
        7136
        9131
        5220
           .
           .
           .
        1315
        9322
           .
        9211
        9211
        3433
        9211
        4211
           .
        4131
        7136
           .
           .
        9131
        1311
           .
        1311
           .
        6112
           .
        9211
        9211
        9211
        9211
        1227
           .
           .
        7137
        7213
           .
        7245
        3152
           .
           .
        4211
        1314
        1314
           .
           .
           .
           .
        9322
        9322
           .
        5169
           .
           .
        9322
           .
           .
        9333
           .
           .
        7122
           .
           .
        9322
        8152
           .
           .
           .
        9322
        9322
        9322
           .
        9131
        9211
           .
           .
        9322
        9322
           .
        9322
           .
        9322
           .
           .
           .
        end
        label values s4mainocc Q42OCCUPATION
        label def Q42OCCUPATION 1227 "1227. Production and operations managers/department managers in business services", modify
        label def Q42OCCUPATION 1311 "1311. General managers in agriculture, hunting, forestry and fishing", modify
        label def Q42OCCUPATION 1314 "1314. General managers in wholesale and retail trade", modify
        label def Q42OCCUPATION 1315 "1315. General managers of hotels, restaurants and other catering or accommodation services", modify
        label def Q42OCCUPATION 3152 "3152. Safety, health and quality inspectors, Inspectors, safety and health", modify
        label def Q42OCCUPATION 3433 "3433. Bookkeepers", modify
        label def Q42OCCUPATION 4131 "4131. Stock clerks", modify
        label def Q42OCCUPATION 4211 "4211. Cashiers and ticket clerks", modify
        label def Q42OCCUPATION 5133 "5133. Home-based personal care workers", modify
        label def Q42OCCUPATION 5169 "5169. Protective services workers not elsewhere classified, Rangers and game wardens", modify
        label def Q42OCCUPATION 5220 "5220. Shop salespersons and demonstrators, Salespersons, Petrol pump and filling station attendants", modify
        label def Q42OCCUPATION 6112 "6112. Tree and shrub crop growers (farm owners and skilled farm workers)", modify
        label def Q42OCCUPATION 7122 "7122. Bricklayers and stonemasons (including apprentices/trainees)", modify
        label def Q42OCCUPATION 7136 "7136. Plumbers and pipe fitters (including apprentices/trainees)", modify
        label def Q42OCCUPATION 7137 "7137. Building and related electricians (including apprentices/trainees)", modify
        label def Q42OCCUPATION 7213 "7213. Sheet-metal workers (including apprentices/trainees)", modify
        label def Q42OCCUPATION 7245 "7245. Electrical line installers, repairers and cable jointers (including apprentices/trainees)", modify
        label def Q42OCCUPATION 8152 "8152. Chemical heat-treating plant operators", modify
        label def Q42OCCUPATION 8212 "8212. Cement and other mineral products machine operators", modify
        label def Q42OCCUPATION 8290 "8290. Other machine operators and assemblers not elsewhere classified", modify
        label def Q42OCCUPATION 8334 "8334. Lifting-truck operators", modify
        label def Q42OCCUPATION 9131 "9131. Domestic helpers and cleaners", modify
        label def Q42OCCUPATION 9132 "9132. Helpers and cleaners in offices, hotels and other establishments", modify
        label def Q42OCCUPATION 9141 "9141. Building caretakers", modify
        label def Q42OCCUPATION 9211 "9211. Farmhands and labourers", modify
        label def Q42OCCUPATION 9322 "9322. Hand-packers and other manufacturing labourers", modify
        label def Q42OCCUPATION 9333 "9333. Freight handlers", modify
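A quick way to spot this kind of mix-up before calling a string function is to check the storage type first. The sketch below uses a hypothetical three-row miniature of the data above, with only two of the labels defined; it is an illustration rather than part of the original exchange.

```stata
* Hypothetical miniature of the posted data: an int variable with a value label
clear
input int s4mainocc
9211
3433
.
end
label define Q42OCCUPATION 3433 "3433. Bookkeepers" 9211 "9211. Farmhands and labourers"
label values s4mainocc Q42OCCUPATION

* describe reports the storage type (int) and the attached value label
describe s4mainocc

* confirm returns a nonzero code when the variable is not of the stated type
capture confirm string variable s4mainocc
display "confirm string variable: rc = " _rc

* substr() applied to a numeric variable is what produces "type mismatch"
capture generate code = substr(s4mainocc, 1, 4)
display "substr() on a numeric variable: rc = " _rc
```

The labelled values look like strings in the Data Browser and in -list-, which is why -describe- is the more reliable check.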



        • #5
          Aha, there it is. The variable s4mainocc is an integer variable, so you cannot manipulate it with -substr()-, which works only on strings. If you still want what you asked for in #1, that is, two variables, one storing the occupation code and one storing the occupation title, I recommend the following code:
          Code:
          decode s4mainocc, gen(occupation)
          generate occ_code=substr(occupation, 1, 4)
          generate occ_titles=substr(occupation, 7, .)
          Code:
          help decode
          Last edited by Chen Samulsion; 18 Jan 2023, 06:36. Reason: decode creates a new string variable named newvar based on the "encoded" numeric variable varname and its value label
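Since the stored integers in #4 are themselves the occupation codes, a numeric code variable can also be copied directly, leaving -decode- for the title only. The sketch below is an illustration under assumptions: the data are a hypothetical three-row miniature, the labels start without a numeric prefix and -numlabel, add- is applied as described in #4, and the names occ_num and occ_title are made up for the example.

```stata
* Hypothetical miniature: labels defined without a prefix, then numlabel adds "NNNN. "
clear
input int s4mainocc
9211
3433
.
end
label define Q42OCCUPATION 3433 "Bookkeepers" 9211 "Farmhands and labourers"
label values s4mainocc Q42OCCUPATION
numlabel Q42OCCUPATION, add

* The underlying values are the codes, so a plain copy gives an unlabelled number
generate int occ_num = s4mainocc

* decode copies the label text; missing values become empty strings
decode s4mainocc, generate(occupation)

* With four-digit codes the title starts at position 7, after "NNNN. "
generate occ_title = substr(occupation, 7, .)

list s4mainocc occ_num occ_title, noobs
```

If the prefix was only added with -numlabel, add-, running -numlabel, remove- before -decode- should make the -substr()- step unnecessary.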
