Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Variable label cleaning

    Hi Everyone,

    I have a small question regarding any method there might be to help clean up variable labels with the following text:

    Code:
    'ABCDEFG`"'
    From this I would like to remove all apostrophe, back quote, and apostrophe.

    I tried using subinstr by saving the variable label in a local but unforntuately that didnt work for me, unless I didnt use the right command.

  • #2
    Originally posted by Fahad Mirza View Post
    I tried using subinstr by saving the variable label in a local but unforntuately that didnt work for me . . .
    It might be just as easy to use Mata for this. See below, beginning at the "Begin here" comment; the stuff above is just to create the example for illustration.
    Code:
    version 19
    
    clear *
    
    set obs 1
    generate byte var = 0
    
    mata {
        bad = char(39) + "ABCDEFG" + char(96) + char(34) + char(39)
        st_varlabel("var", bad)
    } 
    
    describe
    
    *
    * Begin here
    *
    mata {
        gud = subinstr(subinstr(subinstr(st_varlabel("var"), char(39), ""), 
            char(96), ""), char(34), "")
        st_varlabel("var", gud)
    }
    
    describe
    
    exit
    You can put the code into one line if desired, for example,
    Code:
    mata: st_varlabel("var", subinstr(subinstr(subinstr(st_varlabel("var"), char(39), ""), char(96), ""), char(34), ""))
    at the risk of being even less readable.

    You can also put it into a Mata function that takes a list of variable names as an argument and have it loop over those variables.

    Comment


    • #3
      You can use regular expressions to avoid adding an extra -subinstr()- function for every additional character you need to remove. Additionally, if you're interested in retaining only letters, numbers, and spaces, there's a direct way to do that as well. Taking Joseph's example:

      Code:
      
      clear *
      
      set obs 1
      generate byte var = 0
      
      mata {
          bad = char(39) + "ABCDEFG" + char(96) + char(34) + char(39)
          st_varlabel("var", bad)
      } 
      
      describe
      
      
      foreach var of varlist *{
          mata: st_varlabel("`var'", ustrregexra(st_varlabel("`var'"), `"[\' \" \`]"', ""))
      }
      
      describe
      For retaining letters, numbers and spaces only, you want:

      Code:
      foreach var of varlist *{
          mata: st_varlabel("`var'", ustrregexra(st_varlabel("`var'"), `"[^a-zA-Z0-9 ]"', ""))
      }
      Res.:

      Code:
      . describe
      
      Contains data
       Observations:             1                  
          Variables:             1                  
      -------------------------------------------------------------------------------------------------------------------------------------------
      Variable      Storage   Display    Value
          name         type    format    label      Variable label
      -------------------------------------------------------------------------------------------------------------------------------------------
      var             byte    %8.0g                 'ABCDEFG`"'
      -------------------------------------------------------------------------------------------------------------------------------------------
      Sorted by: 
           Note: Dataset has changed since last saved.
      
      . 
      . 
      . 
      . 
      . 
      . foreach var of varlist *{
        2. 
      .     mata: st_varlabel("`var'", ustrregexra(st_varlabel("`var'"), `"[^a-zA-Z0-9 ]"', ""))
        3. 
      . }
      
      . 
      . 
      . 
      . describe
      
      Contains data
       Observations:             1                  
          Variables:             1                  
      -------------------------------------------------------------------------------------------------------------------------------------------
      Variable      Storage   Display    Value
          name         type    format    label      Variable label
      -------------------------------------------------------------------------------------------------------------------------------------------
      var             byte    %8.0g                 ABCDEFG
      -------------------------------------------------------------------------------------------------------------------------------------------
      Sorted by: 
           Note: Dataset has changed since last saved.
      
      .

      Comment


      • #4
        I'll make a small edit to the first suggestion in #3 above. As it stands, Andrew Musau 's code will remove spaces, in addition to the characters you specified. This modification addresses that problem:

        Code:
        clear *
        
        set obs 1
        generate byte var = 0
        
        * notice the introduction of a space in the label below, which will illustrate the problem if you use the original code
        mata {
            bad = char(39) + "ABC DEFG" + char(96) + char(34) + char(39)
            st_varlabel("var", bad)
        }
        
        describe
        
        * notice I remove the spaces in the regular expression below
        foreach var of varlist * {
            mata: st_varlabel("`var'", ustrregexra(st_varlabel("`var'"), `"[\'\"\`]"', ""))
        }
        
        describe
        Last edited by Hemanshu Kumar; 08 Jul 2025, 10:34.

        Comment


        • #5
          Yes, thanks for the correction. The spaces here are interpreted literally.

          Comment


          • #6
            Thanks everyone for the useful tips! These worked as intended

            Comment

            Working...
            X