  • rename variable by removing a "midfix"

    Hello,

    I am trying to remove a midfix from a variable name. I have combined three data sets that represent three different years. Some, though not all, of the variables have names like s07rate, s12rate, and s16rate. These variables all measure the same concept in each data set. I am pooling the cross sections, but of course need the variable names to be the same across years. I want to keep the prefix "s" as well as the word "rate", and drop only the numbers to create a variable named "srate". I have tried the following syntax:

    rename s## s*

    as well as:

    rename s(##) s*

    My rationale is that the ## indicates that the two numbers (07,12,16) from the old variable name are to be dropped and replaced with whatever comes after, indicated by the wildcard * after the s.

    Each time I get the following error message:

    ## not allowed in oldname
    Syntax is rename oldname newname
    # in oldname means 1 or more digits go here. # is greedy, e.g., a# matches a1 and it matches a923. ## would have no meaning because the second # would always
    match 0 characters and thus be an error.

    If you want to match one-digit numbers, code (#); code (##) for two-digit numbers; and so on.
    r(198);

    The Stata manual has an extensive discussion of renaming groups of variables, but there is only one line dedicated to midfixes (this is the language used in the manual). Specifically, under renaming a group of variables, the example says:

    8. rename *jan* **: Removes prefix, midfix, and suffix jan, for example, janstat to stat, injanstat to instat, and subjan to sub.

    This is a rather simple solution but I don't seem to have any luck with the syntax that seems most reasonable. I would appreciate any advice.

  • #2
    You can't use one new name for three existing variables. But the good news is that the problem doesn't appear to be about renaming at all.

    If s07rate s12rate s16rate come from different datasets then at most one should be non-missing in any observation, so you need (e.g.)

    Code:
    gen srate = max(s07rate, s12rate, s16rate)
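    to combine them; max() ignores missing arguments and returns missing only when all of them are missing. If that works, you can drop the ingredients. A careful check would be to count non-missing values across those three variables, e.g. with the appropriate egen function. That count should be 0 or 1 in every observation.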
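    As a minimal sketch of that check and the combining step, assuming the "appropriate egen function" is rownonmiss(): the toy data below are made up for illustration, and n_nonmiss is a hypothetical helper variable name, not something from the original data sets.

```stata
* Toy pooled data (made-up values): each row comes from one survey year,
* so at most one of the three rate variables is non-missing.
clear
input year s07rate s12rate s16rate
2007 1.5 .   .
2012 .   2.5 .
2016 .   .   3.5
2016 .   .   .
end

* Count non-missing values across the three variables in each observation.
egen n_nonmiss = rownonmiss(s07rate s12rate s16rate)

* Stop with an error if any observation has more than one non-missing value.
assert inrange(n_nonmiss, 0, 1)

* max() ignores missing arguments; it is missing only if all three are missing.
generate srate = max(s07rate, s12rate, s16rate)

* Once srate has been verified, drop the ingredients and the helper.
drop s07rate s12rate s16rate n_nonmiss
list
```

    If the assert fails, some observation has values from more than one year, and max() would silently pick the largest of them, so it is worth resolving that before dropping anything.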


    • #3
      Wow! Thank you so much! Never would have thought to use the generate function for that.
