Use Collapse and Reshape to arrange the data

Shailaja Tiwari

Join Date: Dec 2017
Posts: 73

Use Collapse and Reshape to arrange the data

10 Nov 2021, 23:05

Dear Statalisters,

I have a cross section dataset which contains information on household level food consumption. Below is an example of the dataset.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double Hhs_no byte Time_of_day int menu double Menu_wht int Ingredient_1 double Ingredient_wht_1 int Ingredient_2 double Ingredient_wht_2 int Ingredient_3 double Ingredient_wht_3 int Ingredient_4 double Ingredient_wht_4
1 3  315   10  315   10   .   .   .   .   .  .
1 3  316    8  316    8   .   .   .   .   .  .
1 3 2522    9 2522    9   .   .   .   .   .  .
1 1 2771 1250    3  500   .   .   .   .   .  .
1 2 2771 2500    3 1000   .   .   .   .   .  .
1 2 2891 1404   97  250 198 500  61 300  64 40
1 1 2901  579   61  300  64  42  65  19 248  4
2 3 2522   14 2522   14   .   .   .   .   .  .
2 1 2771 1200    1  400   .   .   .   .   .  .
2 1 2772 1500    1  500   .   .   .   .   .  .
2 2 2773 6000    1 2000   .   .   .   .   .  .
2 1 2901  999   47  500 228 200  64  48 248  3
2 2 2902 3027   50 2500 214 250  64  10 248  4
3 1 2771 1000    3  400   .   .   .   .   .  .
3 1 2901  328   77   45  61  60 176  20  64 20
1 3 2771 1250    3  500   .   .   .   .   .  .
end
label values Time_of_day x1_03
label def x1_03 1 "Morning", modify
label def x1_03 2 "Noon", modify
label def x1_03 3 "Night", modify
label values menu x1_05
label def x1_05 315 "Betel Leaf", modify
label def x1_05 316 "Supari", modify
label def x1_05 2522 "Salt (Iodine)", modify
label def x1_05 2771 "Rice/Jao", modify
label def x1_05 2772 "Rice/Jao", modify
label def x1_05 2773 "Rice/Jao", modify
label def x1_05 2891 "Jhol curry", modify
label def x1_05 2901 "Bhuna curry", modify
label def x1_05 2902 "Bhuna curry", modify
label values Ingredient_1 x1_07_01
label def x1_07_01 1 "Parboiled rice (coarse)", modify
label def x1_07_01 3 "Fine rice", modify
label def x1_07_01 47 "Sweet gourd", modify
label def x1_07_01 50 "Water gourd", modify
label def x1_07_01 61 "Potato", modify
label def x1_07_01 77 "Beher gura", modify
label def x1_07_01 97 "Lau Shak", modify
label def x1_07_01 315 "Betel Leaf", modify
label def x1_07_01 316 "Supari", modify
label def x1_07_01 2522 "Salt (Iodine)", modify
label values Ingredient_2 x1_07_02
label def x1_07_02 61 "Potato", modify
label def x1_07_02 64 "Onion", modify
label def x1_07_02 198 "Koi", modify
label def x1_07_02 214 "Tengra", modify
label def x1_07_02 228 "Bele", modify
label values Ingredient_3 x1_07_03
label def x1_07_03 61 "Potato", modify
label def x1_07_03 64 "Onion", modify
label def x1_07_03 65 "Garlic", modify
label def x1_07_03 176 "Rui", modify
label values Ingredient_4 x1_07_04
label def x1_07_04 64 "Onion", modify
label def x1_07_04 248 "Turmeric (dried)", modify

Where Hhs_no refers to the household identification number, Time refers to the time of day that the meal was consumed, menu refers to the composite food item, Menu_wht refers to the weight of the cooked food ingredients_1-ingredient_4 refers to the ingredients used in preparing the food item, Ingredient_wht_1- Ingredient_wht_4 refers to the weight of the ingredients used in preparing the food. Some households consumed the same food items more than once a day. So my data set is unique by Hhs_no, Menu and Time_of_day but not by Hhs_no and Menu
I need to arrange this data in a way that there is no duplication for Hhs_no and menu. An example of how I need the data to look is below:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double Hhs_no byte Time_of_day int menu double Menu_wht int Ingredient_1 double Ingredient_wht_1 int Ingredient_2 double Ingredient_wht_2 int Ingredient_3 double Ingredient_wht_3 int Ingredient_4 double Ingredient_wht_4
1 3  315   10  315   10   .   .   .   .   .  .
1 3  316    8  316    8   .   .   .   .   .  .
1 3 2522    9 2522    9   .   .   .   .   .  .
1 1 2771 1250    3  500   .   .   .   .   .  .
1 2 2891 1404   97  250 198 500  61 300  64 40
1 1 2901  579   61  300  64  42  65  19 248  4
2 3 2522   14 2522   14   .   .   .   .   .  .
2 1 2771 1200    1  400   .   .   .   .   .  .
2 1 2772 1500    1  500   .   .   .   .   .  .
2 2 2773 6000    1 2000   .   .   .   .   .  .
2 1 2901  999   47  500 228 200  64  48 248  3
2 2 2902 3027   50 2500 214 250  64  10 248  4
3 1 2771 1000    3  400   .   .   .   .   .  .
3 1 2901  328   77   45  61  60 176  20  64 20
end
label values Time_of_day x1_03
label def x1_03 1 "Morning", modify
label def x1_03 2 "Noon", modify
label def x1_03 3 "Night", modify
label values menu x1_05
label def x1_05 315 "Betel Leaf", modify
label def x1_05 316 "Supari", modify
label def x1_05 2522 "Salt (Iodine)", modify
label def x1_05 2771 "Rice/Jao", modify
label def x1_05 2772 "Rice/Jao", modify
label def x1_05 2773 "Rice/Jao", modify
label def x1_05 2891 "Jhol curry", modify
label def x1_05 2901 "Bhuna curry", modify
label def x1_05 2902 "Bhuna curry", modify
label values Ingredient_1 x1_07_01
label def x1_07_01 1 "Parboiled rice (coarse)", modify
label def x1_07_01 3 "Fine rice", modify
label def x1_07_01 47 "Sweet gourd", modify
label def x1_07_01 50 "Water gourd", modify
label def x1_07_01 61 "Potato", modify
label def x1_07_01 77 "Beher gura", modify
label def x1_07_01 97 "Lau Shak", modify
label def x1_07_01 315 "Betel Leaf", modify
label def x1_07_01 316 "Supari", modify
label def x1_07_01 2522 "Salt (Iodine)", modify
label values Ingredient_2 x1_07_02
label def x1_07_02 61 "Potato", modify
label def x1_07_02 64 "Onion", modify
label def x1_07_02 198 "Koi", modify
label def x1_07_02 214 "Tengra", modify
label def x1_07_02 228 "Bele", modify
label values Ingredient_3 x1_07_03
label def x1_07_03 61 "Potato", modify
label def x1_07_03 64 "Onion", modify
label def x1_07_03 65 "Garlic", modify
label def x1_07_03 176 "Rui", modify
label values Ingredient_4 x1_07_04
label def x1_07_04 64 "Onion", modify
label def x1_07_04 248 "Turmeric (dried)", modify

I guess that I need to use the collapse and reshape commands but am not sure how. Any help would be appreciated.
Thanks

Tags: collapse, reshape

Fei Wang

Join Date: Oct 2021

Posts: 726
#2

11 Nov 2021, 00:17

Shailaja, if the procedure is simply to keep the first observation (with the smallest value of Time_of_day) within each Hhs_no-menu pair, just like deleting "1 2 2771" and "1 3 2771" in your example, then the code below should work.

Code:

bys Hhs_no menu (Time_of_day): keep if _n == 1
Comment
Shailaja Tiwari

Join Date: Dec 2017

Posts: 73
#3

11 Nov 2021, 00:35

Thank you for your reply. But I do not need to drop any observation. I need the total weight of the food and the ingredients consumed. I guess that I need to collapse and reshape the data. However cant get how to.
Comment
Fei Wang

Join Date: Oct 2021

Posts: 726
#4

11 Nov 2021, 00:48

Still didn't get your point. In #1, the upper data simply drops two observations and becomes the lower data, without changing anything for the remaining observations.
Comment
Shailaja Tiwari

Join Date: Dec 2017

Posts: 73
#5

11 Nov 2021, 01:27

My data is currently in the form where i have records for households (Hhs_no) and the meals they consumed and its weight (Menu and Menu_wht) at different times of the day (Time_of_day). The ingredients and its weight ( ingredients_1-ingredient_4 and ingredients_wht_1-ingredient_wht_4) are arranged as separate columns.
However I do not need information about the time of the day the meals were consumed. I need aggregate information about the composite meals consumed by a given households and the ingredients used in those meals. So I need my data arranged such that there is no duplication for Hhs_no and menu. However I am not sure how to do this. Any suggestions?
Comment
Fei Wang

Join Date: Oct 2021

Posts: 726
#6

11 Nov 2021, 06:03

Shailaja, I understand that you'd like to aggregate your variables in some way. But what I haven't understood is how they should be aggregated (this has nothing to do with Stata, but solely depends on your research design). For example, in the upper data of #1, you have three lines "1 1 2771...", "1 2 2771", "1 3 2771", you want to aggregate the three lines into one. But what should we do for Menu_wht? Calculate the average or the summation of the three lines? How about other variables that may differ within the three lines? I need to know your idea on this part before I can give specific codes.
Comment
Shailaja Tiwari

Join Date: Dec 2017

Posts: 73
#7

11 Nov 2021, 06:22

Thank you for your reply.
My objective is to know how much of food consumption was there in each household.
So I want to sum Menu_wht by household and Menu. Specifically, I require the first column to give the Hhs_id, the second to give the menu code the the third to give the weight of the consumed food. The rest of the columns i need to arrange such that i can see the ingredients used in each menu dish along with their weights.

Last edited by Shailaja Tiwari; 11 Nov 2021, 06:26.
Comment
Fei Wang

Join Date: Oct 2021

Posts: 726
#8

11 Nov 2021, 06:30

Then the code below may be what your need.

Code:

collapse (sum) Menu_wht, by(Hhs_id menu)

This would give a data with only three variables: Hhs_id, menu and the summation of Menu_wht within each Hhs_id-menu pair. If you need to sum up other variables, jut put them after "Menu_wht".
Comment
Shailaja Tiwari

Join Date: Dec 2017

Posts: 73
#9

11 Nov 2021, 07:25

The code would eliminate information on ingredients (and their weights) used in the Menu dishes. While collapsing I need to retain information on the ingredients used to prepare the menu dishes.
Comment
Fei Wang

Join Date: Oct 2021

Posts: 726
#10

11 Nov 2021, 08:10

Yes, that's what I meant in #8: You may add as many variables as you want after "Menu_wht" to sum them up cross lines. If you'd like to sum up ingredients and their weights as well, the code becomes

Code:

collapse (sum) Menu_wht Ingredient_1 Ingredient_wht_1 other_variables, by(Hhs_id menu)
Comment
Shailaja Tiwari

Join Date: Dec 2017

Posts: 73
#11

11 Nov 2021, 08:23

The above code does not retain information in the ingredient names. Ingredient_1 to Ingredient_4 are the names of the ingredients used to prepare the dishes.
Comment

Fei Wang

Join Date: Oct 2021
Posts: 726

#12

11 Nov 2021, 08:59

Code:

collapse (sum) *wht* (mean) Ingredient_1 Ingredient_2 Ingredient_3 Ingredient_4, by(Hhs_no menu)

Comment

Shailaja Tiwari

Join Date: Dec 2017

Posts: 73
#13

11 Nov 2021, 10:30

This code also does not retain information about the ingredients. While the ingredient weights are summed by the menu the ingredient names (which are encoded) are not retained.
Comment
Fei Wang

Join Date: Oct 2021

Posts: 726
#14

11 Nov 2021, 23:26

As ingredient names are encoded, the collapsed data would store the average of ingredient names within each Hhs_no-menu pair. By definition, your ingredient names should be the same within each Hhs_no-menu pair -- a specific menu in a specific household would have a fixed ingredient, no matter for breakfast, lunch or dinner. Then the average would retain the ingredient names. If the names are different within each Hhs_no-menu pair, then you need to let me know how you would handle this issue -- which name should be retained (in this case, average or summation will make no sense)? Or you may need to check if this issue is a data error.
Comment

Announcement

Use Collapse and Reshape to arrange the data

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment