  • Manipulating strings: delete everything after the first space

    Code:
    +--------------------------------------------------------+
    |        sex           age                     education |
    |--------------------------------------------------------|
    | Male (Sex)     25+ (Age)   Advanced (Aggregate levels) |
    | Male (Sex)     25+ (Age)   Advanced (Aggregate levels) |
    | Male (Sex)     25+ (Age)   Advanced (Aggregate levels) |
    | Male (Sex)     15+ (Age)   Advanced (Aggregate levels) |
    | Male (Sex)   15-64 (Age)   Advanced (Aggregate levels) |
    +--------------------------------------------------------+
    I would like to delete everything after the first space. For example, in the first observation, sex becomes Male, age becomes 25+, and education becomes Advanced.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str244 sex str11 age str31 education
    "Male (Sex)"   "25+ (Age)"   "Advanced (Aggregate levels)"    
    "Male (Sex)"   "25+ (Age)"   "Advanced (Aggregate levels)"    
    "Male (Sex)"   "25+ (Age)"   "Advanced (Aggregate levels)"    
    "Male (Sex)"   "15+ (Age)"   "Advanced (Aggregate levels)"    
    "Male (Sex)"   "15-64 (Age)" "Advanced (Aggregate levels)"    
    "Male (Sex)"   "25+ (Age)"   "Advanced (Aggregate levels)"    
    "Male (Sex)"   "15-64 (Age)" "Advanced (Aggregate levels)"    
    "Male (Sex)"   "15+ (Age)"   "Advanced (Aggregate levels)"    
    "Male (Sex)"   "25+ (Age)"   "Advanced (Aggregate levels)"    
    "Male (Sex)"   "25+ (Age)"   "Advanced (Aggregate levels)"    
    "Male (Sex)"   "25+ (Age)"   "Intermediate (Aggregate levels)"
    "Female (Sex)" "25+ (Age)"   "Advanced (Aggregate levels)"    
    "Male (Sex)"   "25+ (Age)"   "Advanced (Aggregate levels)"    
    "Male (Sex)"   "15+ (Age)"   "Advanced (Aggregate levels)"    
    "Male (Sex)"   "15-64 (Age)" "Advanced (Aggregate levels)"    
    end

  • #2
    The code below puts the first word of each variable into a new variable. If you don't want new variables, just use "replace" instead of "gen":
    Code:
    foreach var of sex age education {
        gen `var'_reduced = word(`var',1)
    }



    • #3
      Sven-Kristjan Bormann, thanks for the code. Unfortunately, it returns an "invalid syntax" error. Please check.



      • #4
        You are right. The correct code is:
        Code:
        foreach var in sex age education {
            gen `var'_reduced = word(`var',1)
        }
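For reference: the loop in #2 failed because `foreach var of ...` must be followed by a list type such as `varlist`, whereas `foreach var in ...` accepts any list of words. The sketch below shows the `of varlist` form, which has the small advantage that Stata checks that every name is an existing variable before the loop runs. It uses a two-row excerpt of the example data; the `_reduced` suffix is just the one used in this thread.

```stata
* Two-row excerpt of the example data posted above
clear
input str12 sex str11 age str31 education
"Male (Sex)"   "25+ (Age)"   "Advanced (Aggregate levels)"
"Female (Sex)" "15-64 (Age)" "Intermediate (Aggregate levels)"
end

* -of varlist- expects existing variable names; -in- would accept any words
foreach var of varlist sex age education {
    * word(s, 1) returns the first space-delimited word of s
    generate `var'_reduced = word(`var', 1)
}
list
```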



        • #5
          Many thanks, Sven-Kristjan Bormann! The code works now. I opted for the "replace" version. For reference, here is the code:
          Code:
          foreach var in sex age education {
              replace `var' = word(`var',1)
          }



          • #6
            Budu Gulo,

            This looks like the data-producing system has merged the value labels and the variable labels into the values themselves: Male/Female are value labels of the categorical variable Sex, and "(Sex)" is its variable label. If that is the case, perhaps you could re-export the data from the original system (what was it?) with settings that require no further cleanup.

            If this hypothesis is true, there is no reason to expect those labels to be single words. That happens to hold for the three variables listed, but it may not hold for other variables or for future downloads, and any multi-word label would be cut down to its first word. If you care, a more robust strategy is to keep everything before " (" rather than delete everything after the first space.

            Code:
            foreach var in sex age education {
                 replace `var' = substr(`var',1,strpos(`var'," (")-1)
            }
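One caveat about the loop above, offered as an untested sketch: if a value contains no " (", `strpos()` returns 0 and the expression becomes `substr(`var', 1, -1)`, which I would expect to return an empty string and so blank out that value. Restricting the replacement to values that actually contain " (" leaves such values untouched. The small dataset below, including the unlabelled "Female" row, is made up purely for illustration.

```stata
* Hypothetical data: the second row has no " (...)" suffix
clear
input str12 sex str11 age
"Male (Sex)" "25+ (Age)"
"Female"     "15-64"
end

foreach var in sex age {
    * strpos() returns 0 when " (" is absent; the -if- skips those values
    replace `var' = substr(`var', 1, strpos(`var', " (") - 1) if strpos(`var', " (") > 0
}
list
```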
            In any case, storing categorical attributes (such as sex or education) as categorical (numeric, value-labelled) variables will make your work more productive and enjoyable, especially if you are working with large data files.
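A minimal sketch of that suggestion, assuming the string variables have already been cleaned to bare labels as above: `encode` creates a numeric variable whose value labels are the original strings. The names `sex_num` and `education_num` are illustrative only. By default `encode` numbers the distinct strings in alphabetical order.

```stata
* Hypothetical cleaned data
clear
input str6 sex str12 education
"Male"   "Advanced"
"Female" "Advanced"
"Male"   "Intermediate"
end

* Numeric codes with value labels; codes follow alphabetical order of the strings
encode sex, generate(sex_num)
encode education, generate(education_num)
tabulate education_num
```

If a specific coding order matters, `encode` also accepts a `label()` option naming a value label you have defined beforehand.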

            Best, Sergiy Radyakin
            Last edited by Sergiy Radyakin; 12 Apr 2019, 12:35. Reason: added formatting



            • #7
              Sergiy Radyakin: Sorry for the late reply. The data (unemployment rates) are from the ILO, downloaded in Excel format. Regarding the strategy, I find your suggestion to cut at " (" very helpful. It is indeed more robust. Thanks a lot!
