Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Data Reshaping/Transformation

    Dear All,

    I would like some assistance cleaning a really messy dataset I have. For the sake of confidentiality, I'll use an example of a bakery. Let's say I have the data below:

    ID Date Product Collected
    001 1 milk 1
    001 1 eggs 1
    001 1 flour 1
    001 1 cake
    002 1 milk 0
    003 2 eggs 1
    003 2 milk 1
    004 3 milk 1
    004 3 eggs 1
    004 3 flour 1
    004 3 sugar 1
    004 3 scones
    001 4 milk 1


    A bakery receives orders for cakes and scones (parent orders) but every once in a while some people request milk or eggs (child orders) on their own. What I would like to do is generate a variable that determines the product a customer purchased given the presence of a 'parent order'. I would like to get data in the format below:

    ID Date Product Collected Sales
    001 1 milk 1 cake
    001 1 eggs 1 cake
    001 1 flour 1 cake
    001 1 cake cake
    002 1 milk 0 milk
    003 2 eggs 1 eggs
    003 2 milk 1 milk
    004 3 milk 1 scones
    004 3 eggs 1 scones
    004 3 flour 1 scones
    004 3 sugar 1 scones
    004 3 scones scones
    001 4 milk 1 milk



    Can anybody advise on how to go about doing this or have suggestions on how I can reshape the data?

    Thanks,

    Bernard.
    Last edited by Bernard Phiri; 10 Jun 2022, 05:36.

  • #2
    Code:
    clear
    input str3 ID     Date     str10 Product     Collected
    001     1     milk     1
    001     1     eggs     1
    001     1     flour     1
    001     1     cake     .
    002     1     milk     0
    003     2     eggs     1
    003     2     milk     1
    004     3     milk     1
    004     3     eggs     1
    004     3     flour     1
    004     3     sugar     1
    004     3     scones     .
    001     4     milk     1
    end
    
    gen seq = _n
    
    gen wanted = Product if Collected == .
    gsort ID Date -wanted
    bysort ID Date: replace wanted = wanted[1] if wanted == ""
    replace wanted = Product if wanted == ""
    
    sort seq
    
    list, sepby(ID Date)
    Results:

    Code:
         +------------------------------------------------+
         |  ID   Date   Product   Collec~d   seq   wanted |
         |------------------------------------------------|
      1. | 001      1      milk          1     1     cake |
      2. | 001      1      eggs          1     2     cake |
      3. | 001      1     flour          1     3     cake |
      4. | 001      1      cake          .     4     cake |
         |------------------------------------------------|
      5. | 002      1      milk          0     5     milk |
         |------------------------------------------------|
      6. | 003      2      eggs          1     6     eggs |
      7. | 003      2      milk          1     7     milk |
         |------------------------------------------------|
      8. | 004      3      milk          1     8   scones |
      9. | 004      3      eggs          1     9   scones |
     10. | 004      3     flour          1    10   scones |
     11. | 004      3     sugar          1    11   scones |
     12. | 004      3    scones          .    12   scones |
         |------------------------------------------------|
     13. | 001      4      milk          1    13     milk |
         +------------------------------------------------+

    Comment


    • #3
      Dear Ken,

      Thank you for the feedback. Your solution solved part of the problem. I should have used a more complex example. Please see below:

      Code:
      clear
      
      
      input str3 ID     Date     str10 Product     Collected
      001        1     milk      1
      001        1     eggs      1
      001        1     flour     1
      001        1     cake      .
      002        1     milk      0
      003        2     eggs      1
      003        2     milk      1
      004        3     milk      1
      004        3     eggs      1
      004     3     flour     1
      004     3     sugar     1
      004     3     scones    .
      001     4     milk      1
      002     4     water     1    
      002     4     butter    1
      002     4     cookies   .
      002        4     milk      1
      002        4     eggs      1
      002        4     flour     1
      002        4     cake      .
      end
      
      gen seq = _n
      
      gen wanted = Product if Collected == .
      gsort ID Date -wanted
      bysort ID Date: replace wanted = wanted[1] if wanted == ""
      replace wanted = Product if wanted == ""
      
      sort seq
      
      list, sepby(ID Date)

      Results:

      Code:
          +-------------------------------------------------+
           |  ID   Date   Product   Collec~d   seq    wanted |
           |-------------------------------------------------|
        1. | 001      1      milk          1     1      cake |
        2. | 001      1      eggs          1     2      cake |
        3. | 001      1     flour          1     3      cake |
        4. | 001      1      cake          .     4      cake |
           |-------------------------------------------------|
        5. | 002      1      milk          0     5      milk |
           |-------------------------------------------------|
        6. | 003      2      eggs          1     6      eggs |
        7. | 003      2      milk          1     7      milk |
           |-------------------------------------------------|
        8. | 004      3      milk          1     8    scones |
        9. | 004      3      eggs          1     9    scones |
       10. | 004      3     flour          1    10    scones |
       11. | 004      3     sugar          1    11    scones |
       12. | 004      3    scones          .    12    scones |
           |-------------------------------------------------|
       13. | 001      4      milk          1    13      milk |
           |-------------------------------------------------|
       14. | 002      4     water          1    14   cookies |
       15. | 002      4    butter          1    15   cookies |
       16. | 002      4   cookies          .    16   cookies |
       17. | 002      4      milk          1    17   cookies |
       18. | 002      4      eggs          1    18   cookies |
       19. | 002      4     flour          1    19   cookies |
       20. | 002      4      cake          .    20      cake |
           +-------------------------------------------------+
      As you can see, a customer on date 4 ordered a cake but that isn't reflected in the 'wanted' variable. How can this be overcome? Is it possible to have STATA search for 'products' and then generate an observation based on the combination of 'products' per id?

      Comment


      • #4
        As you can see, a customer on date 4 ordered a cake but that isn't reflected in the 'wanted' variable. How can this be overcome? Is it possible to have STATA search for 'products' and then generate an observation based on the combination of 'products' per id?
        That really depends on how you'd like to use this information. And given these are all imaginary data, I don't know enough to give any comment. They could be a series of "wanted#" variables, or the products could be connected and contained inside one cell, etc. It's more important to convey what is the desired variable. (And if not, at least provide enough analytical details so that users here may have a chance to help.)

        Your solution solved part of the problem. I should have used a more complex example.
        Then perhaps do not try too hard simplifying it. Keep it as authentic as it is. I don't particularly mind this kind of "And there is more!!" Stata quizzes; it's nice to practice. But do know that oversimplification tends to get useless answers which tend to waste everyone's time.

        <><><><><>

        Here is a suggestion if you wish to chain them together:

        Code:
        clear
        input str3 ID     Date     str10 Product     Collected
        001        1     milk      1
        001        1     eggs      1
        001        1     flour     1
        001        1     cake      .
        002        1     milk      0
        003        2     eggs      1
        003        2     milk      1
        004        3     milk      1
        004        3     eggs      1
        004     3     flour     1
        004     3     sugar     1
        004     3     scones    .
        001     4     milk      1
        002     4     water     1    
        002     4     butter    1
        002     4     cookies   .
        002        4     milk      1
        002        4     eggs      1
        002        4     flour     1
        002        4     cake      .
        end
        
        preserve
        keep if Collected == .
        bysort ID Date: gen ind = _n
        reshape wide Product, i(ID) j(ind)
        egen wanted = concat(Product*), punct(" ")
        keep ID Date wanted
        save temp, replace
        restore
        
        merge m:1 ID Date using temp
        drop _merge
        
        replace wanted = Product if wanted == ""
        
        list, sepby(ID Date)
        Results:

        Code:
             +------------------------------------------------+
             |  ID   Date   Product   Collec~d         wanted |
             |------------------------------------------------|
          1. | 001      1      milk          1           cake |
          2. | 001      1      cake          .           cake |
          3. | 001      1     flour          1           cake |
          4. | 001      1      eggs          1           cake |
             |------------------------------------------------|
          5. | 001      4      milk          1           milk |
             |------------------------------------------------|
          6. | 002      1      milk          0           milk |
             |------------------------------------------------|
          7. | 002      4   cookies          .   cookies cake |
          8. | 002      4     flour          1   cookies cake |
          9. | 002      4    butter          1   cookies cake |
         10. | 002      4      eggs          1   cookies cake |
         11. | 002      4      cake          .   cookies cake |
         12. | 002      4      milk          1   cookies cake |
         13. | 002      4     water          1   cookies cake |
             |------------------------------------------------|
         14. | 003      2      eggs          1           eggs |
         15. | 003      2      milk          1           milk |
             |------------------------------------------------|
         16. | 004      3      milk          1         scones |
         17. | 004      3      eggs          1         scones |
         18. | 004      3     flour          1         scones |
         19. | 004      3     sugar          1         scones |
         20. | 004      3    scones          .         scones |
             +------------------------------------------------+

        Comment


        • #5
          Thank you, Ken.

          This is exactly what I was looking to do.

          Best,

          Bernard.

          Comment

          Working...
          X