  • Append multiple datasets

    Dears,

    I am trying to append different datasets. I found this program and would like to use it:
    Code:
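    capture program drop apx
    program define apx
        // Load the first file, then append files 2 through 10
        // (starting the loop at 1 would append database1 to itself).
        use database1, clear
        forvalues i = 2/10 {
            append using database`i'
        }
    end
    apx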
    My problem is that in each dataset I have some variables coded with the same numbers but labelled differently, and when I append, every variable that shares a code takes on a single label. Let me explain with an example. Suppose that in dataset1 the variable q1 has the following values: 1 "A", 2 "B", 3 "C", while in dataset2 the same variable, q1, has 1 "C", 2 "D", 3 "E". Is there a way to tell Stata to keep the different labels when I append? (I would like to avoid converting all the variables back to strings.)

    Thanks

    Federica

  • #2
    First, I see no reason for a program; append allows you to append multiple files in one use of the command.

    Second, append has a nolabel option. It is not clear to me what you want the result to be, so this may not be the solution; if not, please clarify what end result you want.
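
    A minimal sketch of both points, assuming the files are named database1.dta through database10.dta as in the original post and sit in the working directory. The variable name source is made up for illustration. Note that nolabel only stops value-label definitions being copied from the files on disk; it does not let one code carry two labels.

    ```stata
    * Run from a do-file: /// continues a command onto the next line.
    use database1, clear

    * One append call for all remaining files.
    * nolabel: do not copy value-label definitions from the files on disk.
    * generate(source): hypothetical name; records which file each row came from
    * (0 for the data already in memory, 1, 2, ... for the appended files).
    append using database2 database3 database4 database5 ///
        database6 database7 database8 database9 database10, ///
        nolabel generate(source)
    ```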


    • #3
      Value labels are attached at the variable level, not to specific observations, so a value of 1 in q1 cannot mean both "A" and "C". Unless there is some command out there I don't know of, I recommend that you first define a consistent value label*, apply it to all datasets, and then append.

      * You might want to read up on "label save [lblname [lblname...]] using filename [, replace]" to make your life easier (see help label). It stores your labels as commands in a do-file; with some minor alterations you should be able to combine those do-files in a consistent way. If you have hundreds of labels, you could even do that with Stata's string functions or, more easily, with some careful search and replace.
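
      A rough sketch of that workflow for the q1 example, assuming dataset1 codes q1 as 1 "A", 2 "B", 3 "C", dataset2 codes it as 1 "C", 2 "D", 3 "E", and "C" is the same category in both. The names labels_db1.do, q1_all, database1_clean and database2_clean are made up for illustration.

      ```stata
      * Write the existing label definitions to a do-file for inspection.
      use database1, clear
      label save using labels_db1.do, replace

      * One consistent label covering every category seen in either file.
      label define q1_all 1 "A" 2 "B" 3 "C" 4 "D" 5 "E", replace
      label values q1 q1_all
      save database1_clean, replace

      * In dataset2, move the codes onto the common scheme first.
      * recode applies all rules to the original values, so 1->3 is not re-mapped to 5.
      use database2, clear
      recode q1 (1=3) (2=4) (3=5)
      label define q1_all 1 "A" 2 "B" 3 "C" 4 "D" 5 "E", replace
      label values q1 q1_all
      save database2_clean, replace

      * Only now append.
      use database1_clean, clear
      append using database2_clean
      ```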


      • #4
        I would add that, more generally, doing a mass -append- of "fresh out of the box" data sets typically ends in tears. Rare is the data source that has been curated with such obsessive consistency and meticulous care that you don't encounter inconsistent labels, clashes of data types (string vs. numeric), inconsistent names for what should be the same variable, etc. Appending a bunch of data sets together should always be the last step in combining them. First, each data set separately needs to be carefully examined, cleaned, and transformed as necessary to a single consistent standard. Then these carefully scrubbed data sets can be -append-ed at the end.
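
        As a starting point for that examination, something along these lines lists the storage type and value labels of q1 in every file. It is only a sketch and assumes the database1 to database10 naming from the original post.

        ```stata
        forvalues i = 1/10 {
            use database`i', clear
            display as text "=== database`i' ==="
            * capture noisily: show the output, but keep looping if q1 is missing.
            capture noisily describe q1   // storage type and attached value-label name
            label list                    // every value label defined in this file
        }
        ```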


        • #5
          When you follow Clyde's excellent advice and work your way through each of your input files resolving inconsistencies, you may find the following tool useful:
          Code:
          decode q1, generate(q1_s)
          drop q1
          which will create a new variable q1_s which in dataset1 will contain values "A", "B", and "C" while in dataset2 it will contain values "C", "D", and "E". Then, after you have appended all the files,
          Code:
          encode q1_s, generate(q1) label(Q1)
          which will create a numeric variable q1, and the value label Q1 that matches the generated numeric values to the strings.

          For more detail, see help encode or help decode - both take you to the same place.
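
          Put together for all ten files, the steps might look like the sketch below. It assumes the database1 to database10 naming from the original post; the _s suffix on the saved files is a made-up convention. Keep in mind that encode numbers the strings in alphabetical order, so the new codes will not match the old ones.

          ```stata
          * Step 1: in each file, replace the labelled numeric q1 by its label text.
          forvalues i = 1/10 {
              use database`i', clear
              decode q1, generate(q1_s)
              drop q1
              save database`i'_s, replace   // hypothetical output name
          }

          * Step 2: append the string versions.
          use database1_s, clear
          forvalues i = 2/10 {
              append using database`i'_s
          }

          * Step 3: build one numeric variable and one value label from all strings.
          encode q1_s, generate(q1) label(Q1)
          ```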


          • #6
            Dear All,

            Thank you for your answers, and sorry if I was not clear. As Clyde suggested, I first cleaned all the datasets; my problem arises because in each dataset I encoded the string variables. To clarify, I am cleaning datasets containing information on different livelihood zones, which is why I have variables with the same name but different labels. So the solution is to go back to my do-files and avoid the encode command before appending.

            Thanks again
            Federica


            • #7
              Not necessarily: using encode may well be fine, but use its label() option (with a predefined label) to ensure consistency.
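
              For example, something like the following at the top of each cleaning do-file. This is a sketch: the label name Q1, the string variable name q1_s and the five categories are assumptions based on the q1 example earlier in the thread.

              ```stata
              * Same definition in every do-file, so every dataset uses the same codes.
              capture label drop Q1
              label define Q1 1 "A" 2 "B" 3 "C" 4 "D" 5 "E"

              * q1_s stands for the original string variable (hypothetical name).
              * noextend: refuse to encode if q1_s holds a string that Q1 does not
              * list, instead of silently adding a new code to the label.
              encode q1_s, generate(q1) label(Q1) noextend
              ```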


              • #8
                Ok thanks
