Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Append on Variable Labels

    Dear All,

    I am trying to append several datasets that have variables which describe the same things, but with the variables named something different across the datasets - think a variable named Age in one dataset, and time since birth in years in another.

    I can go through manually renaming the variables to match, and then append the datasets, a time-consuming process. However, the variables often share identical labels across the datasets. Does anyone know if there is a way to append data based on label names rather than variable names?

    Thanks for your help

  • #2
    I know of no way to append on variable labels rather than variable names. But, if every variable in each data set has a label, and if all of those labels are distinctive after being truncated to a maximum of 32 characters and all characters disallowed in variable names replaced with underscore characters ("_"), you can do this:
    Code:
    //    CREATE DATA SETS TO DEMONSTRATE THE CODE
    clear
    set obs 10
    set seed 1234
    gen x = runiform(1, 50)
    gen y = rnormal(0,1)
    label var x "First Variable"
    label var y "Second Variable"
    save dataset1, replace
    
    clear
    set obs 20
    gen a = runiformint(1, 50)
    gen b = rnormal(0, 1)
    label var a "First Variable"
    label var b "Second Variable"
    save dataset2, replace
    
    clear
    set obs 50
    gen j = runiformint(1, 50)
    gen k = rnormal(0, 1)
    label var j "First Variable"
    label var k "Second Variable"
    save dataset3, replace
    
    local filelist dataset1 dataset2 dataset3
    
    clear
    tempfile building
    save `building', emptyok
    
    foreach f of local filelist {
        use `"`f'"', clear
        ds _all
        foreach v of varlist `r(varlist)' {
            rename `v' `=strtoname(`"`:var label `v''"')'
        }
        append using `building'
        save `"`building'"', replace
    }
    des
    This code, in effect, automates the process of renaming the variables to the corresponding labels. Evidently it only works under pretty restrictive conditions. And it is fairly brittle: if in some of the data sets the labels on corresponding variables differ slightly (say the use of an upper or lower case letter somewhere) it will fail.

    If this approach won't work with your data sets, it is possible to write something more general, but it is lengthy and complicated, and probably more effort than just going through the data sets and doing the renaming in each data set before appending.

    Comment


    • #3
      I'd just bring in both datasets separately, rename all variables based on variable labels, and then append. That should do the trick. Perhaps code like this will help:

      Code:
      **Bring in dataset 1
      use "....dta", clear //or import excel, or import delimited, etc.
      ds
      foreach var in `r(varlist)'{
          rename `var' `=strtoname("`:variable label `var''")'
      }
      tempfile dataset1
      save `dataset1'
      
      **Bring in dataset 2
      use "....dta", clear //or import excel, or import delimited, etc.
      ds
      foreach var in `r(varlist)'{
          rename `var' `=strtoname("`:variable label `var''")'
      }
      append using `dataset1'
      Of course you can put the above into a forvalues loop if you choose

      Comment


      • #4
        Crossed with Clyde's post.

        Reading it makes me think that nesting a lower() or upper() function into the strtoname() function would help make the code a bit more versatile (and possibly some subinstr() functions too).

        Comment

        Working...
        X