  • Trimming labels of categorical variables

    Hi,

    I want to remove the parentheses, together with their contents, from all value labels in the dataset. Here is an example:

    educational level
    - no education (below primary)
    - has education

    to

    educational level
    - no education
    - has education

    Please let me know if this is possible to do.

    Thanks in advance!


  • #2
    Regex will help. See below. Please do consider using -dataex- to post your raw data. Your example above seemed easy enough, but I had to spend two minutes copy/pasting your data into Stata and finagling it to make sure my code works.

    Code:
    ds, has(type string)
    foreach v in `r(varlist)'{
        gen _`v' = ustrregexra(`v', "\(.*\)", "")
    }
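
    To see what the pattern does before running the loop, it can be tried on a single literal string. This is only a minimal sketch, assuming Stata 14 or later (needed for ustrregexra()). The strtrim() call is an addition that is not part of the code above; it is there because the pattern leaves behind the blank that preceded the opening parenthesis. Note that "\(.*\)" is greedy, so it removes everything from the first "(" to the last ")" in the string.

    ```stata
    * Display only: nothing in the dataset is changed by these two lines.
    display ustrregexra("no education (below primary)", "\(.*\)", "")

    * Same call wrapped in strtrim() to drop leading and trailing blanks.
    display strtrim(ustrregexra("no education (below primary)", "\(.*\)", ""))
    ```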



    • #3
      Code:
      . // example dataset
      . clear
      
      . input ed
      
                  ed
        1. 1
        2. 0
        3. end
      
      . label define ed_lb 0 "no education (below primary)" ///
      >                    1 "has education"
      
      . label values ed ed_lb
      
      .
      . // what to do
      . local labname : value label ed
      
      . levelsof ed
      0 1
      
      . foreach lev in `r(levels)' {
        2.         local lab : label `labname' `lev'
        3.         gettoken lab rest : lab, parse("(")
        4.         label define `labname' `lev' `"`lab'"', modify
        5. }
      
      .
      . codebook ed
      
      -----------------------------------------------------------------------------------------------------------------------
      ed                                                                                                          (unlabeled)
      -----------------------------------------------------------------------------------------------------------------------
      
                        Type: Numeric (float)
                       Label: ed_lb
      
                       Range: [0,1]                         Units: 1
               Unique values: 2                         Missing .: 0/2
      
                  Tabulation: Freq.   Numeric  Label
                                  1         0  no education
                                  1         1  has education
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------



      • #4
        My bad. Did not read the title properly in the OP. Glad to see Maarten Buis's excellent solution.



        • #5
          Originally posted by Girish Venkataraman in #2
          Thank you Girish!



          • #6
            Originally posted by Maarten Buis in #3

            Marvellous! Thanks so much, Maarten.

            One more thing, please: is it possible to apply this to the entire dataset, i.e., without specifying a variable?
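
            One possible way to do this is to wrap the loop from #3 in an outer loop over all variables. The following is only a sketch under assumptions that do not come from the thread: Stata 14 or later, integer values in every labelled variable, and labels that should lose everything between the first "(" and the last ")". The local macro names (labname, levels, lab, new) are illustrative. It uses the pattern from #2 instead of gettoken so that the leftover blank can be trimmed with strtrim().

            ```stata
            * Sketch: strip "( ... )" from the value labels of every variable.
            foreach v of varlist _all {
                * Name of the value label attached to this variable, if any.
                local labname : value label `v'
                * Skip variables (including string variables) without a value label.
                if "`labname'" == "" continue

                * Only values that occur in the data are visited;
                * missing values are left out by levelsof by default.
                quietly levelsof `v', local(levels)
                foreach lev of local levels {
                    local lab : label `labname' `lev'
                    local new = strtrim(ustrregexra(`"`lab'"', "\(.*\)", ""))
                    * Redefine only when the text changed and is not empty, so that
                    * unlabelled values stay unlabelled and no label is deleted.
                    if `"`new'"' != `"`lab'"' & `"`new'"' != "" {
                        label define `labname' `lev' `"`new'"', modify
                    }
                }
            }
            ```

            Because the loop goes through variables rather than label names, a value label shared by several variables is processed more than once. That is harmless here, since a second pass finds nothing left to remove. Labels for values that never occur in the data are not touched.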
