  • label define error

    Dear list,

    Happy Friday. I downloaded a data set from the U.S. Bureau of Labor Statistics and was using the do-file they provide to label all the variables and their values. However, their do-file contains a lot of syntax like this:
    Code:
    label define vlR0026100 1 "RESPONDENT IS IN LABOR FORCE GROUP "A""  2 "RESPONDENT IS IN LABOR FORCE GROUP "B""  3 "ALL OTHERS"
    which turns out to be invalid syntax on my end. The problem is the quotes within quotes. I can remove the inner quotes, so that 1 "RESPONDENT IS IN LABOR FORCE GROUP "A"" becomes 1 "RESPONDENT IS IN LABOR FORCE GROUP A", and then everything works fine. However, since they provided this do-file, I guess it must have worked on their side (at least when they created such do-files with an older version of Stata). Also, there are too many such cases in the do-file, with various texts affected, for me to fix them all manually with find and replace. Is there a trick I can apply, such as temporarily setting my Stata version, which is 15, to a lower version?

    Thanks!

  • #2
    I don't think this was ever valid syntax, but it seems these files were created by some program whose author was familiar with a different statistical package. The invalid syntax can be fixed either by removing all internal double quotes within the intended label, or by wrapping the label in Stata compound double quotes (`" and "'), which here would amount to adding the missing left and right ticks.

    I suppose this could be done with regular expressions with some effort (and lookahead groups), but that's beyond my skill. You would also need to restrict the edits to lines that begin with -label define-.

    There may be better/easier solutions for mass editing using programming text editors, too.
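
The suggestion in #2 (add the missing ticks, only on -label define- lines) could also be carried out outside Stata with a small script. The following is only a sketch in Python and is not from the original posts. It assumes each -label define- command sits on a single line, that every value is an integer followed by a double-quoted label, and that no label text itself contains a digit followed by whitespace and a quote. The names `fix_label_define` and `rewrite_do_file` are made up for illustration.

```python
import re

# Assumed layout of each pair on a -label define- line:  <integer> "<text>"
# Closing quote: a " followed by the next  <integer> "  or by the end of the line.
CLOSE_QUOTE = re.compile(r'"(?=\s+-?\d+\s+"|\s*$)')
# Opening quote: a " preceded by a digit and whitespace (the value before it).
OPEN_QUOTE = re.compile(r'(?<=\d)(\s+)"')


def fix_label_define(line):
    """Wrap each label on a -label define- line in compound double quotes."""
    if not re.match(r'\s*label\s+define\b', line):
        return line  # leave every other command untouched
    # Closing quotes first: the lookahead relies on the next opening quote
    # still being a plain " at this point.
    line = CLOSE_QUOTE.sub('"\'', line)
    return OPEN_QUOTE.sub(r'\1`"', line)


def rewrite_do_file(src_path, dst_path):
    """Write a fixed copy of src_path to dst_path (assumes UTF-8 text)."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(fix_label_define(line.rstrip("\n")) + "\n")


if __name__ == "__main__":
    sample = ('label define vlR0026100 '
              '1 "RESPONDENT IS IN LABOR FORCE GROUP "A""  '
              '2 "RESPONDENT IS IN LABOR FORCE GROUP "B""  '
              '3 "ALL OTHERS"')
    print(fix_label_define(sample))
```

Calling `rewrite_do_file("labels.do", "labels_fixed.do")` (both file names are placeholders) would leave the original untouched and write the fixed copy next to it. Commands continued over several lines with `///` are not handled by this sketch.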



    • #3
      Originally posted by Leonardo Guizzetti
      Thank you anyway!



      • #4
        Here is a quick and dirty program; minimal testing; virtually no error checking; might work with commands exactly as the one given in #1; no unquoted labels; no compound double quotes; no extended missing values; might choke on single quotes; ...

        Code:
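        capture program drop label_define
        capture mata : mata drop label_define()
        
        // Stata wrapper: peel off the label name, pass the rest of the line to Mata
        program label_define
            version 14
            gettoken lblname 0 : 0
            mata : label_define("`lblname'", st_local("0"))
        end
        
        version 14
        
        mata :
        
        void label_define(string scalar lblname, string scalar zero)
        {
            real   scalar value
            string scalar label
            
            // consume one value/label pair per iteration;
            // the label part ends at the next digit or hyphen
            while ( ustrregexm(zero, "^[ ]*(-*\d+)([^-\d]+)") ) {
                value = strtoreal(ustrregexs(1))
                label = strtrim(ustrregexs(2))
                // drop the outer pair of double quotes, keep any inner ones
                st_vlmodify(lblname, value, substr(label, 2, strlen(label)-2))
                // remove the pair just processed from the front of the line
                zero = ustrregexra(zero, "^[ ]*(-*\d+)([^-\d]+)", "")
            }
        }
        
        end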
        Copy the code into a do-file and define the program. In your do-file from the Bureau of Labor Statistics, replace all occurrences of

        Code:
        label define
        with

        Code:
        label_define
        Run that do-file.
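
To see what the regular expression in the Mata function does, the loop can be mirrored in a few lines of Python. This is an illustrative rendering, not part of the original program: `parse_pairs` is a hypothetical name, and a plain dictionary stands in for `st_vlmodify()`.

```python
import re

# Same pattern as in the Mata code: optional blanks, an integer value,
# then everything up to (not including) the next digit or hyphen.
PAIR = re.compile(r'^[ ]*(-*\d+)([^-\d]+)')


def parse_pairs(zero):
    """Return {value: label} for the text that follows the label name."""
    labels = {}
    match = PAIR.match(zero)
    while match:
        # int() raises on input such as "--1"; Mata's strtoreal() would
        # return missing instead. Not relevant for well-formed values.
        value = int(match.group(1))
        label = match.group(2).strip(" ")
        labels[value] = label[1:-1]        # drop only the outer quotes
        zero = zero[match.end():]          # consume the pair just read
        match = PAIR.match(zero)
    return labels


if __name__ == "__main__":
    rest = (' 1 "RESPONDENT IS IN LABOR FORCE GROUP "A""'
            '  2 "RESPONDENT IS IN LABOR FORCE GROUP "B""'
            '  3 "ALL OTHERS"')
    for value, label in parse_pairs(rest).items():
        print(value, label)
```

On the command from #1 this yields the three labels with their inner quotes intact. It also makes the caveats listed above concrete: because the label part of the pattern stops at the first digit or hyphen, a label such as "AGE 18-24" would be cut short (at least in this Python rendering), so it is worth checking the do-file for such labels before relying on the program.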
        Last edited by daniel klein; 11 Dec 2020, 18:59.



        • #5
          Originally posted by daniel klein
          Your code works like a champ! Thank you so much for your time, Daniel!
