Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Combining Datasets and Reformatting Data

    Hello everyone.

    I have a number of datasets (excel files) with data formatted as

    SheetName: Year1,VarX
    Name Cat1 Cat2 Cat3 Cat4
    Name1 x1 x2 x3 x4
    Name2 x5 x6x x7 x8
    SheetName: Year2,VarX
    Name Cat1 Cat2 Cat3 Cat4
    Name1 x9 x10 x11 x12
    Name2 x13 x14 x15 x16
    SheetName: Year1,VarY
    Name Cat1 Cat2 Cat3 Cat4
    Name1 y1 y2 y3 y4
    Name2 y5 y6 y7 y8
    I'm trying to reformat the data so it looks like:
    Year Name Cat X Y
    1 Name1 Cat1 x1 y1
    1 Name1 Cat2 x2 y2
    1 Name1 Cat3 x3 y3
    [...]
    2 Name2 Cat2 x14 y14
    2 Name2 Cat3 x15 y15
    2 Name2 Cat4 x16 y16
    I've looked at reshape and stack respectively but I don't think either of them do quite what is needed. While I could do this manually in Excel, I don't have the time to go through this for the thousands of data observations I'm working with. This feels like there should be a Stata command to do so, but I don't know what it would be.

    Thanks


  • #2
    Create a variable "year" and an indicator "x" in each of the initial datasets. So for the first, year=1 and x=1, the second, year=2 and x=1 and the third, year=1 and x=0. Then append these datasets. See

    Code:
    help append
    You should have something similar to below after appending.

    Code:
    +----------------------------------------------+
      | year    name   x   cat1   cat2   cat3   cat4 |
      |----------------------------------------------|
      |    1   Name1   1     x1     x2     x3     x4 |
      |    1   Name2   1     x5     x6     x7     x8 |
      |    2   Name1   1     x9    x10    x11    x12 |
      |    2   Name2   1    x13    x14    x15    x16 |
      |    1   Name1   0     y1     y2     y3     y4 |
      |    1   Name2   0     y5     y6     y7     y8 |
      +----------------------------------------------+
    Then, you can do the following:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float year str5 name float x str5 cat1 str3(cat2 cat3 cat4)
    1 "Name1" 1 "x1"  "x2"  "x3"  "x4"
    1 "Name2" 1 "x5"  "x6"  "x7"  "x8"
    2 "Name1" 1 "x9"  "x10" "x11" "x12"
    2 "Name2" 1 "x13" "x14" "x15" "x16"
    1 "Name1" 0 "y1"  "y2"  "y3"  "y4"
    1 "Name2" 0 "y5"  "y6"  "y7"  "y8"
    end
    
    reshape long cat, i(year name x) j(which)
    reshape wide cat, i(year name which) j(x)
    rename (which cat0 cat1) (cat  y x)
    Res.:

    Code:
    . l, sepby(year name)
    
         +-------------------------------+
         | year    name   cat    y     x |
         |-------------------------------|
      1. |    1   Name1     1   y1    x1 |
      2. |    1   Name1     2   y2    x2 |
      3. |    1   Name1     3   y3    x3 |
      4. |    1   Name1     4   y4    x4 |
         |-------------------------------|
      5. |    1   Name2     1   y5    x5 |
      6. |    1   Name2     2   y6    x6 |
      7. |    1   Name2     3   y7    x7 |
      8. |    1   Name2     4   y8    x8 |
         |-------------------------------|
      9. |    2   Name1     1         x9 |
     10. |    2   Name1     2        x10 |
     11. |    2   Name1     3        x11 |
     12. |    2   Name1     4        x12 |
         |-------------------------------|
     13. |    2   Name2     1        x13 |
     14. |    2   Name2     2        x14 |
     15. |    2   Name2     3        x15 |
     16. |    2   Name2     4        x16 |
         +-------------------------------+
    
    .
    Last edited by Andrew Musau; 06 Jun 2021, 15:34.

    Comment

    Working...
    X