Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Extract numbers (with varying lengths) from variable label

    Hi friends,

    I now face the following problem:

    Code:
    webuse auto,clear
    lab var make "STATA#1, 90"
    lab var price "STATA#11, 90"
    lab var mpg "STATA#22, 91"
    lab var rep78 "STATA#2, 91"
    
    lookfor make price mpg rep78
    local varlist= r(varlist)
    local counter=0
    foreach var in `varlist' {
        local varlab: var lab `var'  //extract the whole variable label string
        //di "`varlab'"
        local varlaby = real(reverse(substr(reverse("`varlab'"),1,2))) + 1900  //extract the last two digits in the label and add it by 1900
        di "`varlaby'"
    
        local varlabm = real(substr("`varlab'"),7,2)   //extract the number after #.
        //This command fails in the cases of make and rep78 and succeeds for price and mpg.
        //I am looking for a general solution to this problem
        di "`varlabm'"
        }
    So my job is to extract the numbers in each variable's label, including both the last two digits (90 and 91 in the example) and the first number after # (1, 11, 22, 2 in the example). I know how to do the former, but can't figure out how to extract the number after #, because they vary in length and are always surrounded by nonnumeric symbols.

    Thank you!
    Last edited by shem shen; 19 Dec 2020, 14:37.

  • #2
    I think this demonstrates technique that does what you want.
    Code:
    webuse auto,clear
    lab var make "STATA#1, 90"
    lab var price "STATA#11, 90"
    lab var mpg "STATA#22, 91"
    lab var rep78 "STATA#2, 91"
    
    lookfor #
    local varlist `r(varlist)'
    
    foreach var in `varlist' {
        local varlab: var lab `var'  //extract the whole variable label string
        local vl = ustrregexrf("`varlab'", "^.*#" ,"")
        local vl = ustrregexrf("`vl'", ",", " ")
        macro list _var _varlab _vl
        local varlabm : word 1 of `vl'
        local varlaby : word 2 of `vl'
        local varlaby 19`varlaby'
        macro list _varlabm _varlaby
    }
    Code:
    . webuse auto,clear
    (1978 Automobile Data)
    
    . lab var make "STATA#1, 90"
    
    . lab var price "STATA#11, 90"
    
    . lab var mpg "STATA#22, 91"
    
    . lab var rep78 "STATA#2, 91"
    
    . 
    . lookfor #
    
                  storage   display    value
    variable name   type    format     label      variable label
    -------------------------------------------------------------------------------------------------
    make            str18   %-18s                 STATA#1, 90
    price           int     %8.0gc                STATA#11, 90
    mpg             int     %8.0g                 STATA#22, 91
    rep78           int     %8.0g                 STATA#2, 91
    
    . local varlist `r(varlist)'
    
    . 
    . foreach var in `varlist' {
      2.     local varlab: var lab `var'  //extract the whole variable label string
      3.     local vl = ustrregexrf("`varlab'", "^.*#" ,"")
      4.     local vl = ustrregexrf("`vl'", ",", " ")
      5.     macro list _var _varlab _vl
      6.     local varlabm : word 1 of `vl'
      7.     local varlaby : word 2 of `vl'
      8.     local varlaby 19`varlaby'
      9.     macro list _varlabm _varlaby
     10. }
    _var:           make
    _varlab:        STATA#1, 90
    _vl:            1 90
    _varlabm:       1
    _varlaby:       1990
    _var:           price
    _varlab:        STATA#11, 90
    _vl:            11 90
    _varlabm:       11
    _varlaby:       1990
    _var:           mpg
    _varlab:        STATA#22, 91
    _vl:            22 91
    _varlabm:       22
    _varlaby:       1991
    _var:           rep78
    _varlab:        STATA#2, 91
    _vl:            2 91
    _varlabm:       2
    _varlaby:       1991
    
    .

    Comment


    • #3
      Here's a version that avoids using regular expression functions, the results are identical to those shown in post #2. Sorry, I've been answering regular expression questions recently and I was still on autopilot.
      Code:
      webuse auto,clear
      lab var make "STATA#1, 90"
      lab var price "STATA#11, 90"
      lab var mpg "STATA#22, 91"
      lab var rep78 "STATA#2, 91"
      
      lookfor #
      local varlist `r(varlist)'
      
      foreach var in `varlist' {
          local varlab: var lab `var'  //extract the whole variable label string
          local hloc = strpos("`varlab'","#")
          local vl = substr("`varlab'",`hloc'+1,.)
          local vl = subinstr("`vl'", ",", " ", .)
          macro list _var _varlab _vl
          local varlabm : word 1 of `vl'
          local varlaby : word 2 of `vl'
          local varlaby 19`varlaby'
          macro list _varlabm _varlaby
      }

      Comment


      • #4
        Adding two regex variants extracting both elements:
        Code:
        qui lookfor #
        
        foreach v of varlist `r(varlist)' {
            
            local d1 = word(ustrregexra("`:var lab `v''", "[^\d\s]",""), 1)  
            local d2 = word(ustrregexra("`:var lab `v''", "[^\d\s]",""), 2)
            
            di "`d1' `d2'"
        }
        Code:
        qui lookfor #
        
        foreach v of varlist `r(varlist)' {
            
            if ustrregexm("`:var lab `v''", "#(\d+),\s(\d+)" ) {
            
                local d1 = ustrregexs(1) 
                local d2 = ustrregexs(2)
            }
            
             di "`d1' `d2'"
        }

        Comment


        • #5
          Thank you William! Your code works for me!

          Also thank you Bjarte! May I ask what "#(\d+),\s(\d+)" means? I guess #(\d+) means to search for numbers after #. But I can't figure out what \s means. I googled "\s" and got no relevant results. Similarly, could you also tell me what [^\d\s] means? Google result says it is dominance and submission.

          Thank you!

          Comment


          • #6
            \s Match a white space character. White space is defined as [\t\n\f\r\p{Z}].

            The ICU regular expressions are documented at https://unicode-org.github.io/icu/us...gs/regexp.html

            The regular expressions can be explained using https://regex101.com/

            Comment


            • #7
              Originally posted by Bjarte Aagnes View Post
              \s Match a white space character. White space is defined as [\t\n\f\r\p{Z}].

              The ICU regular expressions are documented at https://unicode-org.github.io/icu/us...gs/regexp.html

              The regular expressions can be explained using https://regex101.com/
              Thank you so much! Have a good Sunday!

              Comment

              Working...
              X