  • Variable labelling, help with regular expression

    How can I change variable labels like census2011_age16to17 and census2011_age85to89 to something like Total Population, Age 16 - 17 (Census '11)? In practice, I want to extract the figures and then add them to the new variable label with some text around them.
    Kind regards,
    Konrad
    Version: Stata/IC 13.1

  • #2
    Code:
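    foreach v of var <whatever> {
        * start from the current label, e.g. census2011_age16to17
        local label : var label `v'
        * strip the prefix, leaving e.g. 16to17
        local label : subinstr local label "census2011_age" "", all
        * split the two ages, leaving e.g. 16 17
        local label : subinstr local label "to" " ", all
        * tokenize stores the two ages in local macros 1 and 2
        tokenize "`label'"
        label var `v' "Total Population, Age `1' - `2' (Census '11)"
    }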
    You could, naturally, do it with regular expressions, in which case you would extract the numbers such as 2011, 16 and 17. I am more fluent with the kind of stuff above, and others more experienced with regular expressions can indicate the details. Each kind of solution will be quicker in programmer time to people more familiar with it.
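
To see what each step of this approach does, here is a minimal sketch that runs the same substitutions on a single literal string instead of a variable label. It assumes the label has the form census2011_age16to17 given in the question (no underscore after "census"); the literal string is only for illustration.

```stata
* illustrative literal label, taken from the question
local label "census2011_age16to17"
* remove the prefix
local label : subinstr local label "census2011_age" "", all
display "`label'"
* replace the separator between the two ages with a space
local label : subinstr local label "to" " ", all
* tokenize splits on spaces into local macros 1 and 2
tokenize "`label'"
display "Total Population, Age `1' - `2' (Census '11)"
```

The first display shows 16to17 and the second shows Total Population, Age 16 - 17 (Census '11). If the prefix passed to subinstr does not match the label exactly, nothing is stripped and the first token still carries the prefix.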



    • #3
      Nick,

      Thank you very much for your help. As a matter of fact, I only have basic skills in regex and think I should develop them further. There are some interactive tutorials available on the Internet, so I may work through some of those.
      Kind regards,
      Konrad
      Version: Stata/IC 13.1



      • #4
        I don't know why, but it doesn't work:
        Code:
        . * Make backup snapshot.
        . snapshot save, label("Befor making new variable labels")
        snapshot 20 (Befor making new variable labels) created at 27 Jun 2014 15:53
        
        .
        . * labels
        . foreach var of varlist  census2011_age* {
          2.         local label : var label `v'
          3.         local label : subinstr local label "census_2011_age" "", all
          4.         local label : subinstr local label "to" " ", all
          5.         tokenize "`label'"
          6.         label var `v' "Total Population, Age `1' - `2' (Census '11)"
          7.         }
        nothing found where name expected
        r(198);
        
        end of do-file
        
        r(198);
        Kind regards,
        Konrad
        Version: Stata/IC 13.1



        • #5
          Because you didn't define local v, `v' expands to an empty string. Tracing the loop (set trace on) should make this obvious. Best, Sergiy



          • #6
            As Sergiy implies, you changed my code from

            Code:
            foreach v of var
            to

            Code:
            foreach var of var
            without changing inside the loop too.
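
A minimal sketch of the loop with the macro name used consistently, here keeping var as the loop macro. The two toy variables and their labels are assumptions made only so the example runs on its own; the labels are assumed to equal the variable names, as in the question, with no underscore after "census".

```stata
* toy data: two variables whose labels equal their names (assumed)
clear
gen census2011_age16to17 = 0
gen census2011_age85to89 = 0
label var census2011_age16to17 "census2011_age16to17"
label var census2011_age85to89 "census2011_age85to89"

* the loop macro is var, so every reference inside the loop is to var as well
foreach var of varlist census2011_age* {
    local label : var label `var'
    local label : subinstr local label "census2011_age" "", all
    local label : subinstr local label "to" " ", all
    tokenize "`label'"
    label var `var' "Total Population, Age `1' - `2' (Census '11)"
}
describe
```

Either name works for the loop macro; what matters is that the name after foreach and the name referenced inside the braces are the same.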



            • #7
              It is possible to use Stata's regular expression functions with string literals. For example, to extract the year from "Census 2011 data", you could type

              Code:
              . if regexm("Census 2011 data","([0-9]+)") local year = regexs(1)
              
              . dis "year = `year'"
              year = 2011
              Note that the single-line form of the if programming command can handle both the regexm() call and the use of the matched subexpressions via regexs(), but you must not use any form of macro expansion for the regexs() calls. Here is what happens if you run the following just after the preceding example:

              Code:
              . if regexm("Census 1999 data","([0-9]+)") local s "the year is `=regexs(1)'"
              
              . dis "`s'"
              the year is 2011
              This happens because `=regexs(1)' is expanded before regexm("Census 1999 data","([0-9]+)") is evaluated, so regexs(1) still returns the match from the previous regexm() call.
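
A minimal sketch of one way around this, assuming the commands are run from a do-file: putting the local command inside a braces block means its line is expanded only when the block executes, after regexm() has been evaluated. The macro name s is carried over from the example above.

```stata
* block form: the line inside the braces is expanded only after regexm() has run
if regexm("Census 1999 data","([0-9]+)") {
    local s "the year is `=regexs(1)'"
}
dis "`s'"
```

Assigning with local s = regexs(1), without inline macro expansion, avoids the problem in the single-line form as well, as in the first example.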

              I'm not sure that the exercise is worth the effort, but here's how you could change the labels using regular expressions:

              Code:
              clear
              gen cage1 = 0
              gen cage2 = 0
              gen cage3 = 0
              label var cage1 "census2011_age16to17"
              label var cage2 "census2001_age21to45"
              label var cage3 "census1991_age85to89"
              
              foreach v of varlist cage* {
                 local l : var label `v'
                 if regexm("`l'","census(19|20)([0-9][0-9])_age([0-9][0-9])to([0-9][0-9])") {
                   local yy = regexs(2)
                   local age1 = regexs(3)
                   local age2 = regexs(4)
                   local lfix = "Total Population, Age `age1' - `age2', (Census '`yy')"
                   label var `v' "`lfix'"
                 }
              }
              des
              
              * redo using the inline macro expansion of the regexs() function, just do
              * not use the single-line form of the if command
              label var cage1 "census2011_age16to17"
              label var cage2 "census2001_age21to45"
              label var cage3 "census1991_age85to89"
              foreach v of varlist cage* {
                 local l : var label `v'
                 if regexm("`l'","census(19|20)([0-9][0-9])_age([0-9][0-9])to([0-9][0-9])") {
                   local lfix "Total Population, Age `=regexs(3)' - `=regexs(4)', (Census '`=regexs(2)')"
                   label var `v' "`lfix'"
                 }
              }
              des
              
              
              * Do not use the single-line form of the if programming command with the
              * inline macro expansion of regexs() as the expansion will occur before 
              * the regexm() function is evaluated.
              if regexm("census2011_age16to17","census(19|20)([0-9][0-9])_age([0-9][0-9])to([0-9][0-9])") ///
                 local lfix = "Total Population, Age `=regexs(3)' - `=regexs(4)', (Census '`=regexs(2)')"
              dis "`lfix'"



              • #8
                Brilliant, thank you all for helpful replies.
                Kind regards,
                Konrad
                Version: Stata/IC 13.1

