  • variables with invalid names

    I have some Stata datasets which were written by R; R was saving arrays, and so it gave many of the variables names that include the array index, eg

    Code:
    . ds
    beta[1,1]  beta[3,1]  beta[5,1]  beta[1,2]  beta[3,2]  beta[5,2]  beta0[1]   beta0[3]   beta0[5]
    beta[2,1]  beta[4,1]  beta[6,1]  beta[2,2]  beta[4,2]  beta[6,2]  beta0[2]   beta0[4]   beta0[6]
    However, these names are not valid Stata names, so while I can -summarize- _all, I can't refer to any of them individually:

    Code:
    . rename beta[1,1] beta11
    syntax error
        Syntax is
            rename  oldname    newname   [, renumber[(#)] addnumber[(#)] sort ...]
            rename (oldnames) (newnames) [, renumber[(#)] addnumber[(#)] sort ...]
            rename  oldnames              , {upper|lower|proper}
    r(198);
    Obviously, this is a problem R created, and if necessary I can go back for an updated dataset, but it would be a lot easier for me if I could just use these files I have. Is there a way to refer to a variable other than by name - eg, by number?

    Jeph



  • #2
    Code:
    mata : st_varrename(42, "foo")

    Edit: Here is a program with some error checking

    Code:
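    program rename_varindex
        
        version 16.1
        
        // reject a third argument; only an index and a new name are expected
        if (`"`3'"' != "") error 198
        
        // split off the first token, which must be an integer variable index
        gettoken varindex 0 : 0 , quotes
        confirm integer number `varindex'
        
        // parse the remainder as a new variable name (stored in local macro varlist)
        syntax newvarname
        
        // the dataset must contain at least one variable
        if ( !c(k) ) {
            display as err "no variables defined"
            exit 111
        }
        
        // the index must lie between 1 and the number of variables, c(k)
        if ( !inrange(`varindex', 1, c(k)) ) {
            display as err "variable index out of range"
            exit 198
        }
        
        // rename by position, so the old (invalid) name is never parsed
        mata : st_varrename(`varindex', "`varlist'")
        
    end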
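
    A usage sketch, assuming the program above has been defined and the dataset with the invalid names is in memory. That beta[1,1] sits in position 1 is an assumption based on the -ds- listing in #1; check the order with -describe- before relying on it.

    ```stata
    * assumption: beta[1,1] is the first variable in the dataset
    rename_varindex 1 beta11

    * the variable can now be referred to by its new, valid name
    describe beta11
    ```

    Because the old name is never typed, Stata's parser never sees the brackets and comma.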
    Last edited by daniel klein; 07 Dec 2022, 07:26.



    • #3
      It is also a Stata bug if Stata allowed import of a dataset which by its own rules includes illegal variable names. Please send a reproducible example to Stata technical support.



      • #4
        Similar problems have occurred before (e.g., here and there).



        • #5
          Thanks Daniel, I didn't think of trying -mata- commands.

          Nick - the files were written using the R package -foreign-, which writes Stata version 12 format files. Seems like the bug is in that package, not in Stata?

          cheers,
          Jeph



          • #6
            daniel klein: I searched the forum, but apparently my search terms were too generic.



            • #7
              Jeph: As I understand it, you're telling us in #1 that ds shows variable names but you can't change them with rename. So an improper dataset got inside the door.

              What am I missing here?



              • #8
                Let me try to restate Nick's point, and enlarge upon it, because I have been where he is now; I may even have posted similar thoughts here. In the case I seem to remember, the Stata dataset was created not by Stata but by a source of survey data, which apparently produced its Stata-format version of the data without actually using Stata.

                Nick points out that Stata accepted as a Stata dataset a file which contained as Stata variable names entries which do not meet the requirements to be Stata variable names.

                We are used to Stata rejecting inappropriate variable names when using the import command to convert a file into a Stata dataset.

                The problem here is that the use command is reading the file which ostensibly contains a Stata dataset.

                I think StataCorp has walked down this path before with the _datasignature command.

                I think I concluded that validating variable names every time a command reads a file on the assumption that it contains a Stata dataset (use, append, merge using, ...) would impose a substantial cost that isn't worth the benefit.

                What might be helpful is a Stata command that validates an ostensible Stata dataset and "cleans up" whatever problems it finds (invalid variable names, invalid display formats, ...)

                Or a Stata "fixnames" command that replaces invalid variable names and stows the original name in the variable label.
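
                A minimal sketch of what such a "fixnames" step might look like, written in Mata so that variables are addressed by position rather than by name. The function name fixnames() is hypothetical, not an official command. The sketch assumes the original name should be kept in the variable label (overwriting any label already there) and that the cleaned-up name plus any counter still fits within Stata's name-length limit; if it does not, st_varrename() will stop with an error.

                ```stata
                * hypothetical sketch: fixnames() is not an official Stata or Mata function
                mata :
                void fixnames()
                {
                    real scalar   j, k
                    string scalar old, base, new

                    for (j = 1; j <= st_nvar(); j++) {
                        old = st_varname(j)
                        if (st_isname(old)) continue        // name is already valid

                        // strtoname() replaces characters that are not allowed with "_"
                        base = strtoname(old)
                        new  = base
                        k    = 0
                        // append a counter until the candidate does not clash with an existing name
                        while (_st_varindex(new, 0) < .) {
                            k++
                            new = base + strofreal(k)
                        }

                        st_varlabel(j, old)                 // keep the original name in the variable label
                        st_varrename(j, new)                // rename by position, not by name
                    }
                }
                end

                mata : fixnames()
                ```

                With names like those in #1, a variable such as beta[1,1] would be expected to end up with underscores in place of the brackets and comma, and -describe- would show the original name as its label.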

