Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • merging survey data waves with inconsistent variable names over time

    I use several waves of national labor force survey data. Each wave is a separate .dta file. Each wave contains more or less the same variables, but the variable names differ across years. For example, AGE is called a1 in year 1999, c1 in year 2000, b2 in year2001, etc. I have generated an excel sheet in which I track all the name changes, it looks about as follows:

    1999 2000 2001 2002 2003 .....

    AGE a1 c1 b2 a3 f2

    EDUC c1 a1 v1 b3 f5

    EXPER g1 r3 j2 l8 o6

    .
    .

    A variable name can stand for a specific variable in year t and another in year t+1 or t+2 or any other year; for example, above, a1 is the variable name for AGE in 1999 but stands for EDUC in 2000. My question is, how can I create a single .dta file with the variables AGE, EDUC, EXPER,... that includes all waves (years), and takes the information above into account in an automatized way (i.e. using the excel sheet)? Of course I could do it manually, but that would just take too long, since I have many variables and many waves (years).

    I hope for and look forward to helpful replies.

  • #2
    Unfortunately, the variables names overlap (e.g. c1 being sometimes AGE, sometimes EDUC), which complicates the task. We can't do a loop which renames a1, c1, b2, etc... to "AGE" because we should at each time also notify the year it applies, so I'm not sure it will be easier than to rename by hand (open each wave, properly rename the variables, save the file, and once all done, merge.). When I say by hand, I mean with a do-file, but one variable at the time.
    Code:
    use wave1.dta
    rename a1 age
    ....
    save wave1_harmonized,replace
    
    use wave2.dta
    ...
    How much waves and variables do you have?
    Ps : Also contact your data supplier and ask them to pick one name per variable and stick to it!

    Comment


    • #3
      Hi, thanks for your reply! Your way of doing it was the simple one I had in mind, with a dofile of course. I have about 20 waves and around 20 variables that are useful for me, so it would be a lot of work.

      Comment


      • #4
        If you make both the wave files and the excel sheet with indicator names doubly long (i.e., both years and variable names long), you could merge each wave with the excel sheet 1:1 on both year and varname. If the waves are one for each year, this requires introducing a year variable in each wave file

        Edit:
        if you'd like help in Stata syntax, please upload a .dat version of the excel sheet plus a snippet of one of the waves files, using dataex, by doing:
        Code:
        ssc install dataex
        dataex in 1/20
        Last edited by Jorrit Gosens; 18 Aug 2016, 08:57.

        Comment


        • #5
          That is a very good idea, thank you very much Jorrit.

          Comment

          Working...
          X