Dear All,
I would like some assistance cleaning a really messy dataset I have. For the sake of confidentiality, I'll use an example of a bakery. Let's say I have the data below:
A bakery receives orders for cakes and scones (parent orders) but every once in a while some people request milk or eggs (child orders) on their own. What I would like to do is generate a variable that determines the product a customer purchased given the presence of a 'parent order'. I would like to get data in the format below:
Can anybody advise on how to go about doing this or have suggestions on how I can reshape the data?
Thanks,
Bernard.
I would like some assistance cleaning a really messy dataset I have. For the sake of confidentiality, I'll use an example of a bakery. Let's say I have the data below:
| ID | Date | Product | Collected |
| 001 | 1 | milk | 1 |
| 001 | 1 | eggs | 1 |
| 001 | 1 | flour | 1 |
| 001 | 1 | cake | |
| 002 | 1 | milk | 0 |
| 003 | 2 | eggs | 1 |
| 003 | 2 | milk | 1 |
| 004 | 3 | milk | 1 |
| 004 | 3 | eggs | 1 |
| 004 | 3 | flour | 1 |
| 004 | 3 | sugar | 1 |
| 004 | 3 | scones | |
| 001 | 4 | milk | 1 |
A bakery receives orders for cakes and scones (parent orders) but every once in a while some people request milk or eggs (child orders) on their own. What I would like to do is generate a variable that determines the product a customer purchased given the presence of a 'parent order'. I would like to get data in the format below:
| ID | Date | Product | Collected | Sales |
| 001 | 1 | milk | 1 | cake |
| 001 | 1 | eggs | 1 | cake |
| 001 | 1 | flour | 1 | cake |
| 001 | 1 | cake | cake | |
| 002 | 1 | milk | 0 | milk |
| 003 | 2 | eggs | 1 | eggs |
| 003 | 2 | milk | 1 | milk |
| 004 | 3 | milk | 1 | scones |
| 004 | 3 | eggs | 1 | scones |
| 004 | 3 | flour | 1 | scones |
| 004 | 3 | sugar | 1 | scones |
| 004 | 3 | scones | scones | |
| 001 | 4 | milk | 1 | milk |
Can anybody advise on how to go about doing this or have suggestions on how I can reshape the data?
Thanks,
Bernard.

Comment