Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Long variables names

    Hi everybody,

    I'm performing a simple task of importing xls data into stata. But the long name in the raw data is really giving me a hard time.

    For illustration purpose, suppose I have three excel files CBA2001.xls, CBA2002.xls, CBA2003.xls with the same column name "Deal Value (Local Currency) Current". I'm using stata12.1. The code is simply

    forval i = 2001/2003 {
    import excel CBA`i', sheet("Sheet1") firstrow allstring clear
    save CBA`i'
    }

    Taking a look at the dta files, i find the variable in CBA2001 is named DealValueLocalCurrencyCurre, which is 27 characters long. Ideally I'd like it goes up to the 32 characters limit. I have many similar variables in each data set. It is more likely to distinguish among these similar variables with more characters included in the name .

    In CBA2002, the variable is called DealValueLocalCurrencyCurreP. In CBA2003, it is DealValueLocalCurrencyCurre0b. Why is the same code naming the same variable differently? This is causing trouble in my later append process.

    I understand people normally don't work with such long variable names. But having hundreds of variables makes change the variable names before importing to stata a lot of labor. Anybody have any idea about this, please help. Thanks in advance.

    Cheers,
    Daisy Huang

  • #2
    I can't explain the seemingly odd behavior you describe. However, I can suggest one workaround, which relies on the fact that the entire first row string is captured and assigned to a variable label, even though Stata truncates the variable's name when it imports. You could try something like this inside your loop:
    Code:
    import excel CBA`i', sheet("Sheet1") firstrow allstring clear
    foreach v of varlist _all {
       local wholename: variable label `v'
      // do something to modify wholename as you choose, then assign it to
      // the variable with -rename-.  For example, taking the first and
      // last 10 characters of the long varname might work
       local first10 = substr("`wholename'", 1, 10)
       local last10 = substr("`wholename'", -10, 10)
       rename `v' `first10'`last10'
    }
    Regards, Mike

    Comment


    • #3
      Alternatively, *if* they all have the same variables in the same columns, you can use the "cellrange" option to skip the first row entirely, then apply your rename commands to the column names it automatically assigns. Variable names "A, B, C..." aren't very useful or descriptive, but at least they'd be consistent across worksheets and easy to work with until you get them renamed.

      Comment


      • #4
        Hi Mike, Hi Ben,

        After reading Mike's post, I realized I don't really care about the variable names because the variables labels are exactly the same as the column names in xls. So I'm now doing what Ben suggests. By

        ren * _`j'v#, addnumber

        I rename all the variables to the same format. Thank you very much.

        Comment


        • #5
          Not precisely the answer, but a work around. I like short var names to quickly do analyses in Stata. I have an excel sheet with all long variable names in one column, shortened name in the next. In the database (which is also in excel format for data compilation), I add a column next to the one with long variable names and input the short variable names using either a Vlookup or Index(match,match) function.

          Comment


          • #6
            Hi Jorrit,

            Thank you. But to do it in your way, i will have to first generate the short name list in excel, right? This is not a desirable option for my situation because I have hundreds of excels. But it is a good work around in other cases.

            Comment


            • #7
              Hi Mike,

              I find ren * _`j'v#, addnumber is still not enough. So I want to include your idea as well. Then I find a strange thing.

              I test the code on two variables first, I simplify them to extract only the first 9 strings of the variable label.

              foreach v of varlist AnnouncementDate ArrangerParent {
              local wholename: variable label `v'
              local first9 = substr("`wholename'", 1, 9)
              ren `v' `first9'#, addnumber
              }

              The above code return the following error message:

              syntax error
              Syntax is
              rename oldname newname [, renumber[(#)] addnumber[(#)] sort ...]
              rename (oldnames) (newnames) [, renumber[(#)] addnumber[(#)] sort ...]
              rename oldnames , {upper|lower|proper}
              r(198);


              But if I change to extract the first 8 strings:

              foreach v of varlist AnnouncementDate ArrangerParent {
              local wholename: variable label `v'
              local first8 = substr("`wholename'", 1, 8)
              ren `v' `first8'#, addnumber
              }

              The same code works just fine if i reduce one string.

              This is a problem because when I expand the varlist, the strings can be extracted become less and less. In one data set I've tested, only 3 strings can be kept when varlist is _all. Then comes the problem of existing variable name during the loop over the varlist and the loop is stopped altogether.


              Originally posted by Mike Lacy View Post
              I can't explain the seemingly odd behavior you describe. However, I can suggest one workaround, which relies on the fact that the entire first row string is captured and assigned to a variable label, even though Stata truncates the variable's name when it imports. You could try something like this inside your loop:
              Code:
              import excel CBA`i', sheet("Sheet1") firstrow allstring clear
              foreach v of varlist _all {
              local wholename: variable label `v'
              // do something to modify wholename as you choose, then assign it to
              // the variable with -rename-. For example, taking the first and
              // last 10 characters of the long varname might work
              local first10 = substr("`wholename'", 1, 10)
              local last10 = substr("`wholename'", -10, 10)
              rename `v' `first10'`last10'
              }
              Regards, Mike

              Comment


              • #8
                And Mike, thank you. I'm new to the forum. I didn't know the quote and reply function. So you may not see my previous thank-you post.

                Originally posted by Mike Lacy View Post
                I can't explain the seemingly odd behavior you describe. However, I can suggest one workaround, which relies on the fact that the entire first row string is captured and assigned to a variable label, even though Stata truncates the variable's name when it imports. You could try something like this inside your loop:
                Code:
                import excel CBA`i', sheet("Sheet1") firstrow allstring clear
                foreach v of varlist _all {
                local wholename: variable label `v'
                // do something to modify wholename as you choose, then assign it to
                // the variable with -rename-. For example, taking the first and
                // last 10 characters of the long varname might work
                local first10 = substr("`wholename'", 1, 10)
                local last10 = substr("`wholename'", -10, 10)
                rename `v' `first10'`last10'
                }
                Regards, Mike

                Comment


                • #9
                  Hi Stata Users,

                  I have found the best work around for this problem, i think. I post it here FYI in case you need to work with long var names one day. It's inspired by Mike's idea of using macro extended function. But use PERMNAME to work on Stata imported varname instead of variable labels because there may be special characters in xls column names.

                  foreach v of varlist _all {
                  local nname : permname `v', length(23)
                  ren `v' `nname'
                  }
                  ren * *_#, addnumber

                  If there are really a huge number of variables, one extra line of rename group can make the variables names more distinguishable.

                  And I have also figured out why only 27-character DealValueLocalCurrencyCurre is read and why the parameter of 8 but not 9 works when i try to rename AnnouncementDate and ArrangerParent. The problem occurs in ArrangerParent.
                  Stata imported name xls column name
                  DealValueLocalCurrencyCurre Deal Value (LocalCurrency) Current
                  ArrangerParent Arranger Parent




                  Stata does try to use up to 32 characters as varname, but it implicitly counting space and special characters as well. In the case of Arranger Parent, it extracts "Arranger " with a space when I told Stata to get the first 9 characters. Since space is not allowed in varname, there is the error message.



                  Originally posted by Daisy Huang View Post
                  Hi Mike,

                  I find ren * _`j'v#, addnumber is still not enough. So I want to include your idea as well. Then I find a strange thing.

                  I test the code on two variables first, I simplify them to extract only the first 9 strings of the variable label.

                  foreach v of varlist AnnouncementDate ArrangerParent {
                  local wholename: variable label `v'
                  local first9 = substr("`wholename'", 1, 9)
                  ren `v' `first9'#, addnumber
                  }

                  The above code return the following error message:

                  syntax error
                  Syntax is
                  rename oldname newname [, renumber[(#)] addnumber[(#)] sort ...]
                  rename (oldnames) (newnames) [, renumber[(#)] addnumber[(#)] sort ...]
                  rename oldnames , {upper|lower|proper}
                  r(198);


                  But if I change to extract the first 8 strings:

                  foreach v of varlist AnnouncementDate ArrangerParent {
                  local wholename: variable label `v'
                  local first8 = substr("`wholename'", 1, 8)
                  ren `v' `first8'#, addnumber
                  }

                  The same code works just fine if i reduce one string.

                  This is a problem because when I expand the varlist, the strings can be extracted become less and less. In one data set I've tested, only 3 strings can be kept when varlist is _all. Then comes the problem of existing variable name during the loop over the varlist and the loop is stopped altogether.


                  Comment

                  Working...
                  X