Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Appending datasets of different shapes

    Hi,

    I have datasets from 3 surveys with the first one containing variables like unique ID, age, BMI plus some health-related questions. The other two datasets (round 2 and 3) have all the variables but age and sex, as they are the same as in the first round. I have to append the 3 datasets together such that the data on age and BMI get filled for the last two rounds as well.

    Thank you for your suggestions!

    Here is sample data for round 1:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float ID int(age bmi) byte(we1 we2 we3 we4)
    1 14 48 0 1 0 0
    2 19 43 0 0 1 0
    3 21 38 0 0 0 1
    4 35 36 0 0 0 1
    5 25 36 0 0 1 0
    6 29 34 0 0 0 1
    7 20 40 0 0 1 0
    end



    Data for round 2:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float ID byte(we1 we2 we3 we4)
    1 1 0 0 0
    2 0 0 0 1
    3 0 1 0 0
    4 0 0 0 1
    5 0 0 0 1
    6 0 0 0 1
    7 0 0 0 1
    end

    Data for round 3:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float ID byte(we1 we2 we3 we4)
    1 0 0 1 0
    2 0 1 0 0
    3 0 0 1 0
    4 0 0 1 0
    5 0 1 0 0
    6 0 0 1 0
    7 0 0 1 0
    end
    Last edited by Sonnen Blume; 05 Nov 2022, 17:14.

  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float ID int(age bmi) byte(we1 we2 we3 we4)
    1 14 48 0 1 0 0
    2 19 43 0 0 1 0
    3 21 38 0 0 0 1
    4 35 36 0 0 0 1
    5 25 36 0 0 1 0
    6 29 34 0 0 0 1
    7 20 40 0 0 1 0
    end
    tempfile one
    save `one'
    
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float ID byte(we1 we2 we3 we4)
    1 1 0 0 0
    2 0 0 0 1
    3 0 1 0 0
    4 0 0 0 1
    5 0 0 0 1
    6 0 0 0 1
    7 0 0 0 1
    end
    tempfile two
    save `two'
    
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float ID byte(we1 we2 we3 we4)
    1 0 0 1 0
    2 0 1 0 0
    3 0 0 1 0
    4 0 0 1 0
    5 0 1 0 0
    6 0 0 1 0
    7 0 0 1 0
    end
    tempfile three
    save `three'
    
    clear
    append using `one' `two' `three', generate(round)
    order round, after(ID)
    label values round .
    bysort ID (round): replace age = age[1]
    bysort ID (round): replace bmi = bmi[1]
    list, sepby(ID) noobs
    Code:
    . list, sepby(ID) noobs
    
      +------------------------------------------------+
      | ID   round   age   bmi   we1   we2   we3   we4 |
      |------------------------------------------------|
      |  1       1    14    48     0     1     0     0 |
      |  1       2    14    48     1     0     0     0 |
      |  1       3    14    48     0     0     1     0 |
      |------------------------------------------------|
      |  2       1    19    43     0     0     1     0 |
      |  2       2    19    43     0     0     0     1 |
      |  2       3    19    43     0     1     0     0 |
      |------------------------------------------------|
      |  3       1    21    38     0     0     0     1 |
      |  3       2    21    38     0     1     0     0 |
      |  3       3    21    38     0     0     1     0 |
      |------------------------------------------------|
      |  4       1    35    36     0     0     0     1 |
      |  4       2    35    36     0     0     0     1 |
      |  4       3    35    36     0     0     1     0 |
      |------------------------------------------------|
      |  5       1    25    36     0     0     1     0 |
      |  5       2    25    36     0     0     0     1 |
      |  5       3    25    36     0     1     0     0 |
      |------------------------------------------------|
      |  6       1    29    34     0     0     0     1 |
      |  6       2    29    34     0     0     0     1 |
      |  6       3    29    34     0     0     1     0 |
      |------------------------------------------------|
      |  7       1    20    40     0     0     1     0 |
      |  7       2    20    40     0     0     0     1 |
      |  7       3    20    40     0     0     1     0 |
      +------------------------------------------------+

    Comment


    • #3
      Originally posted by William Lisowski View Post
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float ID int(age bmi) byte(we1 we2 we3 we4)
      1 14 48 0 1 0 0
      2 19 43 0 0 1 0
      3 21 38 0 0 0 1
      4 35 36 0 0 0 1
      5 25 36 0 0 1 0
      6 29 34 0 0 0 1
      7 20 40 0 0 1 0
      end
      tempfile one
      save `one'
      
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float ID byte(we1 we2 we3 we4)
      1 1 0 0 0
      2 0 0 0 1
      3 0 1 0 0
      4 0 0 0 1
      5 0 0 0 1
      6 0 0 0 1
      7 0 0 0 1
      end
      tempfile two
      save `two'
      
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float ID byte(we1 we2 we3 we4)
      1 0 0 1 0
      2 0 1 0 0
      3 0 0 1 0
      4 0 0 1 0
      5 0 1 0 0
      6 0 0 1 0
      7 0 0 1 0
      end
      tempfile three
      save `three'
      
      clear
      append using `one' `two' `three', generate(round)
      order round, after(ID)
      label values round .
      bysort ID (round): replace age = age[1]
      bysort ID (round): replace bmi = bmi[1]
      list, sepby(ID) noobs
      Code:
      . list, sepby(ID) noobs
      
      +------------------------------------------------+
      | ID round age bmi we1 we2 we3 we4 |
      |------------------------------------------------|
      | 1 1 14 48 0 1 0 0 |
      | 1 2 14 48 1 0 0 0 |
      | 1 3 14 48 0 0 1 0 |
      |------------------------------------------------|
      | 2 1 19 43 0 0 1 0 |
      | 2 2 19 43 0 0 0 1 |
      | 2 3 19 43 0 1 0 0 |
      |------------------------------------------------|
      | 3 1 21 38 0 0 0 1 |
      | 3 2 21 38 0 1 0 0 |
      | 3 3 21 38 0 0 1 0 |
      |------------------------------------------------|
      | 4 1 35 36 0 0 0 1 |
      | 4 2 35 36 0 0 0 1 |
      | 4 3 35 36 0 0 1 0 |
      |------------------------------------------------|
      | 5 1 25 36 0 0 1 0 |
      | 5 2 25 36 0 0 0 1 |
      | 5 3 25 36 0 1 0 0 |
      |------------------------------------------------|
      | 6 1 29 34 0 0 0 1 |
      | 6 2 29 34 0 0 0 1 |
      | 6 3 29 34 0 0 1 0 |
      |------------------------------------------------|
      | 7 1 20 40 0 0 1 0 |
      | 7 2 20 40 0 0 0 1 |
      | 7 3 20 40 0 0 1 0 |
      +------------------------------------------------+
      Hi William,

      Thanks so much for the example.

      This is working with the sample data, but when I tried this with the original data I end up having 4 rounds (0-4), and some variables remained missing for the other rounds. I am wondering if this is because the sample size is different for the 3 rounds in the original data, unlike in the sample data. Please let me know if this can be an issue.

      Thank you.

      Comment


      • #4
        You should be able to figure this out by (a) looking in the appended data and finding the ID and round for observations you question, and then (b) looking for that ID in the original dataset for the round.

        By the way, I do not understand how you have four rounds numbered, apparently, 0-4.

        But then, your post #3 shows me the code I wrote in post #2, which I already knew, rather than the code you ran that produced the results you describe, which is left to my imagination.

        Added in edit: you may find this discussion from Clyde Schechter helpful:

        https://www.statalist.org/forums/for...48#post1684848
        Last edited by William Lisowski; 06 Nov 2022, 18:04.

        Comment


        • #5
          Hi William,

          Sorry I accidentally copied your own codes. Here is the one I used:


          Code:
          clear
          import excel "/Users/training1.xlsx", sheet("Sheet 1") firstrow
          ds, has(type string)
          tempfile one
          save `one'
             
          clear
          import excel "/Users/training2.xlsx", sheet("Sheet 1") firstrow
          ds, has(type string)
          tempfile two
          save `two'
          
          clear
          import excel "/Users/training3.xlsx", sheet("Sheet 1") firstrow
          ds, has(type string) 
          tempfile three
          save `three'
              
              
          append using `one' `two' `three', generate(round)  
          order round, after(univid)
          label values round .
          bysort univid (round): replace age = age[1]
          bysort univid (round): replace sex = sex[1]
          bysort univid (round): replace university = university[1]
          bysort univid (round): replace education = education[1]
          bysort univid (round): replace maritalStatus = maritalStatus[1]
          list, sepby(univid) noobs
          Thanks for sharing the post by Clyde. I have checked for string variables at every step and converted to numeric as necessary. It looks like the data for round 3 is getting duplicated.

          Comment


          • #6
            You neglected the clear command before the append command.

            Comment


            • #7
              Thank you so much William! It worked now.

              Comment

              Working...
              X