Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Rename variable names before import

    I have a collection of xls files that I want to append. Each has a set of indicators for a group of countries and years.
    The layout of each is in wide format with years (numerical) as column headers (i.e., Country, indicator name, and then years e.g., 1990, 1991 etc.; see also
    If I import from excel, using first row as var names, Stata replaces the numerical 1970 with capital letters.

    I cannot append ignoring the first row or with the option 'extvarlist', because some of these files range 1960-2010, or 1980-2014, etc.

    What I would like is Stata to rename all variables X to yrX before importing, or something similar.
    It doesnt really matter if none year variables are also included; there's only a few, which I could rename with a few simple lines.


    Any thoughts?
    I will need my data long at some point anyway, so it might be okay to reshape individual files, save them, then merge.
    Problem is I don't know how to reshape with data as below (I dont know whta to use as 'j').
    I did notice Stata sticks the year numbers into the value labels when importing with 'firstrow' specified, so maybe there is a way to batch rename variables using 'yr+label'?


    Hope this is clear enough.
    Thanks for your help.

    using option firstrow when importing:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str30 Country str3 CTRYCODE str34 Indicator double(E F G)
    "Afghanistan"    "AFG" "TechPot_Wind_MWh" 3653718937.2448316 3653718937.2448316 3653718937.2448316
    "Albania"        "ALB" "TechPot_Wind_MWh"  222818627.4862292  222818627.4862292  222818627.4862292
    "Algeria"        "DZA" "TechPot_Wind_MWh" 20192854481.295692 20192854481.295692 20192854481.295692
    "American Samoa" "ASM" "TechPot_Wind_MWh"                  .                  .                  .
    "Andorra"        "ADO" "TechPot_Wind_MWh"                  .                  .                  .
    end
    label var Country "Country" 
    label var CTRYCODE "CTRY CODE" 
    label var Indicator "Indicator" 
    label var E "1970" 
    label var F "1971" 
    label var G "1972"
    Or without firstrow specified

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str30 A str9 B str36 C str34 D double(E F G)
    "Country"        "CTRY CODE" "Region"                             "Indicator"                      1970               1971               1972
    "Afghanistan"    "AFG"       "South, East and South-Eastern Asia" "TechPot_Wind_MWh" 3653718937.2448316 3653718937.2448316 3653718937.2448316
    "Albania"        "ALB"       "Europe"                             "TechPot_Wind_MWh"  222818627.4862292  222818627.4862292  222818627.4862292
    "Algeria"        "DZA"       "Africa"                             "TechPot_Wind_MWh" 20192854481.295692 20192854481.295692 20192854481.295692
    "American Samoa" "ASM"       "Oceania"                            "TechPot_Wind_MWh"                  .                  .                  .
    end

  • #2
    At first glance this looks like the kind of data structure discussed by Kit Baum and myself in http://www.stata-journal.com/sjpdf.h...iclenum=dm0031

    Comment


    • #3
      Right. That would almost get me there.
      If doing ti via this route, the problem remains that the first year of datasets may differ, ie., some run 1960-2010, or 1980-2014, etc.
      That reference gives below loop. Is there a way to replace '1956' in that with the label of the variable E (as in the dataex examples in post #1)
      Even better, would there be a way to identify the first numerical variable label? The above example has the first year saved under variable name E, but some files might miss the column 'Region', so the first year of data would be saved in column D.



      [CODE]
      forv i=4/48 {
      rename v‘i’ d‘=1956+‘i’’
      }
      Last edited by Jorrit Gosens; 07 Apr 2016, 05:51.

      Comment


      • #4
        I am sure you could do that but right now I don't have access to enough files to work out more general syntax. (Please don't send me them, least of all lots of spreadsheets!) But why not just create a bunch of files and let append and/or merge handle any differences in coverage?

        Comment


        • #5
          The problem is append wont know where to stick the data if the variable names are not the same, right?
          I.e., the first file will have var name E for 1970; how will Stata know where to append the data from the next file if that one has 1960 in the fifth column?

          Comment


          • #6
            You can't have it both ways, the first row indicates the contents of the column but these are not valid Stata variable names. So you must import the files without the firstrow option and then rename the variables. A simple loop should do the trick:

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input str30 A str9 B str36 C str34 D double(E F G)
            "Country"        "CTRY CODE" "Region"                             "Indicator"                      1970               1971               1972
            "Afghanistan"    "AFG"       "South, East and South-Eastern Asia" "TechPot_Wind_MWh" 3653718937.2448316 3653718937.2448316 3653718937.2448316
            "Albania"        "ALB"       "Europe"                             "TechPot_Wind_MWh"  222818627.4862292  222818627.4862292  222818627.4862292
            "Algeria"        "DZA"       "Africa"                             "TechPot_Wind_MWh" 20192854481.295692 20192854481.295692 20192854481.295692
            "American Samoa" "ASM"       "Oceania"                            "TechPot_Wind_MWh"                  .                  .                  .
            end
            
            foreach v of varlist * {
                
                local vname = `v'[1]
                capture confirm number `vname'
                if _rc == 0 local vname yr`vname'
                local vname = strtoname("`vname'")
                rename `v' `vname'
                
            }

            Comment


            • #7
              Right. Thanks for that. Very helpful.

              Comment

              Working...
              X