Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Advanced methods to rename variables

    Dear Statalist User,

    since I have imported my data from excel to Stata, all variables got named A, B, C (and so on) after the usual excel convention while the "real" (as in meaningful) variable names appear in the first observation. I have found a useful addon (renvars) as well as some code originally written by Nick Cox in another thread:

    renvars , map(word(@[1], 1))

    that allows me to use the first word in the first observation as a new variable name.

    However, I have two problems.

    First, in my excel data some column headers contain characters that are not allowed as variable names in Stata (such as ./-). So if I run the code above, I get the error message "/ invalid name". I found the addon cleanchars that removes all such characters in the data set. However, I would like them to be removed only in the first column. How could I do this?

    Secondly, some of the names do start with the same name. For example, the two variables "firm name" and "firm number of employees", so if I run the code as above, I will get the error message "new variable name entered more than once". How can I adress the issue? Is there an option renvars or any other command/addon that allows me to use all words as variable name, so each variable is uniquely identified?

    Ideally, I am looking for an option that solves both problems by running the command "by force", i.e. deleting characters when they are not possible as names and adding identifiers (eg. numbers: firm1, firm2, ...) if variable names occur more than once.

    Many thanks,
    Milan Quentel

    Last edited by Milan Quentel; 01 Nov 2016, 02:32. Reason: Added tags

  • #2
    Try

    Code:
    renvars , map(strtoname(word(@[1], 1)))
    Best
    Daniel

    Comment


    • #3
      Thanks for mentioning renvars but it was jointly written with Jeroen Weesie and should be cited as published in the Stata Journal (FAQ Advice #12).

      Best solution: I would read the data in again with import excel, firstrow which has the functionality for this problem. Importing metadata as data is best avoided at source. This is a case of Don't do that then! http://www.catb.org/jargon/html/D/Do...hat-then-.html

      Otherwise (second best solution at best):

      Code:
      renvars , map(strtoname(word(@[1], 1)))
      would solve part of your problem but it can do nothing about duplicate names. The problem of duplicate names obtained in this way is undoubtedly programmable, but Stata has done it already in import excel and it's likely to be a waste of anybody's time to try to do it for themselves.

      Comment


      • #4
        Hi Daniel, Hi Nick, and everyone else who is reading along,

        thanks for your answers. Your suggestion, Daniel, is a good way to adress the first part of the problem. This being said, I did not now about the import excel, firstrow option and it is a straightforward way to deal with both problems at the same time. So, thank you Nick.

        [As for correct citations: Sorry I missed that advice. Will cite correctly in my next post.]

        Comment

        Working...
        X