Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Use Collapse and Reshape to arrange the data

    Dear Statalisters,


    I have a cross section dataset which contains information on household level food consumption. Below is an example of the dataset.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double Hhs_no byte Time_of_day int menu double Menu_wht int Ingredient_1 double Ingredient_wht_1 int Ingredient_2 double Ingredient_wht_2 int Ingredient_3 double Ingredient_wht_3 int Ingredient_4 double Ingredient_wht_4
    1 3  315   10  315   10   .   .   .   .   .  .
    1 3  316    8  316    8   .   .   .   .   .  .
    1 3 2522    9 2522    9   .   .   .   .   .  .
    1 1 2771 1250    3  500   .   .   .   .   .  .
    1 2 2771 2500    3 1000   .   .   .   .   .  .
    1 2 2891 1404   97  250 198 500  61 300  64 40
    1 1 2901  579   61  300  64  42  65  19 248  4
    2 3 2522   14 2522   14   .   .   .   .   .  .
    2 1 2771 1200    1  400   .   .   .   .   .  .
    2 1 2772 1500    1  500   .   .   .   .   .  .
    2 2 2773 6000    1 2000   .   .   .   .   .  .
    2 1 2901  999   47  500 228 200  64  48 248  3
    2 2 2902 3027   50 2500 214 250  64  10 248  4
    3 1 2771 1000    3  400   .   .   .   .   .  .
    3 1 2901  328   77   45  61  60 176  20  64 20
    1 3 2771 1250    3  500   .   .   .   .   .  .
    end
    label values Time_of_day x1_03
    label def x1_03 1 "Morning", modify
    label def x1_03 2 "Noon", modify
    label def x1_03 3 "Night", modify
    label values menu x1_05
    label def x1_05 315 "Betel Leaf", modify
    label def x1_05 316 "Supari", modify
    label def x1_05 2522 "Salt (Iodine)", modify
    label def x1_05 2771 "Rice/Jao", modify
    label def x1_05 2772 "Rice/Jao", modify
    label def x1_05 2773 "Rice/Jao", modify
    label def x1_05 2891 "Jhol curry", modify
    label def x1_05 2901 "Bhuna curry", modify
    label def x1_05 2902 "Bhuna curry", modify
    label values Ingredient_1 x1_07_01
    label def x1_07_01 1 "Parboiled rice (coarse)", modify
    label def x1_07_01 3 "Fine rice", modify
    label def x1_07_01 47 "Sweet gourd", modify
    label def x1_07_01 50 "Water gourd", modify
    label def x1_07_01 61 "Potato", modify
    label def x1_07_01 77 "Beher gura", modify
    label def x1_07_01 97 "Lau Shak", modify
    label def x1_07_01 315 "Betel Leaf", modify
    label def x1_07_01 316 "Supari", modify
    label def x1_07_01 2522 "Salt (Iodine)", modify
    label values Ingredient_2 x1_07_02
    label def x1_07_02 61 "Potato", modify
    label def x1_07_02 64 "Onion", modify
    label def x1_07_02 198 "Koi", modify
    label def x1_07_02 214 "Tengra", modify
    label def x1_07_02 228 "Bele", modify
    label values Ingredient_3 x1_07_03
    label def x1_07_03 61 "Potato", modify
    label def x1_07_03 64 "Onion", modify
    label def x1_07_03 65 "Garlic", modify
    label def x1_07_03 176 "Rui", modify
    label values Ingredient_4 x1_07_04
    label def x1_07_04 64 "Onion", modify
    label def x1_07_04 248 "Turmeric (dried)", modify
    Where Hhs_no refers to the household identification number, Time refers to the time of day that the meal was consumed, menu refers to the composite food item, Menu_wht refers to the weight of the cooked food ingredients_1-ingredient_4 refers to the ingredients used in preparing the food item, Ingredient_wht_1- Ingredient_wht_4 refers to the weight of the ingredients used in preparing the food. Some households consumed the same food items more than once a day. So my data set is unique by Hhs_no, Menu and Time_of_day but not by Hhs_no and Menu
    I need to arrange this data in a way that there is no duplication for Hhs_no and menu. An example of how I need the data to look is below:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double Hhs_no byte Time_of_day int menu double Menu_wht int Ingredient_1 double Ingredient_wht_1 int Ingredient_2 double Ingredient_wht_2 int Ingredient_3 double Ingredient_wht_3 int Ingredient_4 double Ingredient_wht_4
    1 3  315   10  315   10   .   .   .   .   .  .
    1 3  316    8  316    8   .   .   .   .   .  .
    1 3 2522    9 2522    9   .   .   .   .   .  .
    1 1 2771 1250    3  500   .   .   .   .   .  .
    1 2 2891 1404   97  250 198 500  61 300  64 40
    1 1 2901  579   61  300  64  42  65  19 248  4
    2 3 2522   14 2522   14   .   .   .   .   .  .
    2 1 2771 1200    1  400   .   .   .   .   .  .
    2 1 2772 1500    1  500   .   .   .   .   .  .
    2 2 2773 6000    1 2000   .   .   .   .   .  .
    2 1 2901  999   47  500 228 200  64  48 248  3
    2 2 2902 3027   50 2500 214 250  64  10 248  4
    3 1 2771 1000    3  400   .   .   .   .   .  .
    3 1 2901  328   77   45  61  60 176  20  64 20
    end
    label values Time_of_day x1_03
    label def x1_03 1 "Morning", modify
    label def x1_03 2 "Noon", modify
    label def x1_03 3 "Night", modify
    label values menu x1_05
    label def x1_05 315 "Betel Leaf", modify
    label def x1_05 316 "Supari", modify
    label def x1_05 2522 "Salt (Iodine)", modify
    label def x1_05 2771 "Rice/Jao", modify
    label def x1_05 2772 "Rice/Jao", modify
    label def x1_05 2773 "Rice/Jao", modify
    label def x1_05 2891 "Jhol curry", modify
    label def x1_05 2901 "Bhuna curry", modify
    label def x1_05 2902 "Bhuna curry", modify
    label values Ingredient_1 x1_07_01
    label def x1_07_01 1 "Parboiled rice (coarse)", modify
    label def x1_07_01 3 "Fine rice", modify
    label def x1_07_01 47 "Sweet gourd", modify
    label def x1_07_01 50 "Water gourd", modify
    label def x1_07_01 61 "Potato", modify
    label def x1_07_01 77 "Beher gura", modify
    label def x1_07_01 97 "Lau Shak", modify
    label def x1_07_01 315 "Betel Leaf", modify
    label def x1_07_01 316 "Supari", modify
    label def x1_07_01 2522 "Salt (Iodine)", modify
    label values Ingredient_2 x1_07_02
    label def x1_07_02 61 "Potato", modify
    label def x1_07_02 64 "Onion", modify
    label def x1_07_02 198 "Koi", modify
    label def x1_07_02 214 "Tengra", modify
    label def x1_07_02 228 "Bele", modify
    label values Ingredient_3 x1_07_03
    label def x1_07_03 61 "Potato", modify
    label def x1_07_03 64 "Onion", modify
    label def x1_07_03 65 "Garlic", modify
    label def x1_07_03 176 "Rui", modify
    label values Ingredient_4 x1_07_04
    label def x1_07_04 64 "Onion", modify
    label def x1_07_04 248 "Turmeric (dried)", modify
    I guess that I need to use the collapse and reshape commands but am not sure how. Any help would be appreciated.
    Thanks





  • #2
    Shailaja, if the procedure is simply to keep the first observation (with the smallest value of Time_of_day) within each Hhs_no-menu pair, just like deleting "1 2 2771" and "1 3 2771" in your example, then the code below should work.

    Code:
    bys Hhs_no menu (Time_of_day): keep if _n == 1

    Comment


    • #3
      Thank you for your reply. But I do not need to drop any observation. I need the total weight of the food and the ingredients consumed. I guess that I need to collapse and reshape the data. However cant get how to.

      Comment


      • #4
        Still didn't get your point. In #1, the upper data simply drops two observations and becomes the lower data, without changing anything for the remaining observations.

        Comment


        • #5
          My data is currently in the form where i have records for households (Hhs_no) and the meals they consumed and its weight (Menu and Menu_wht) at different times of the day (Time_of_day). The ingredients and its weight ( ingredients_1-ingredient_4 and ingredients_wht_1-ingredient_wht_4) are arranged as separate columns.
          However I do not need information about the time of the day the meals were consumed. I need aggregate information about the composite meals consumed by a given households and the ingredients used in those meals. So I need my data arranged such that there is no duplication for Hhs_no and menu. However I am not sure how to do this. Any suggestions?

          Comment


          • #6
            Shailaja, I understand that you'd like to aggregate your variables in some way. But what I haven't understood is how they should be aggregated (this has nothing to do with Stata, but solely depends on your research design). For example, in the upper data of #1, you have three lines "1 1 2771...", "1 2 2771", "1 3 2771", you want to aggregate the three lines into one. But what should we do for Menu_wht? Calculate the average or the summation of the three lines? How about other variables that may differ within the three lines? I need to know your idea on this part before I can give specific codes.

            Comment


            • #7
              Thank you for your reply.
              My objective is to know how much of food consumption was there in each household.
              So I want to sum Menu_wht by household and Menu. Specifically, I require the first column to give the Hhs_id, the second to give the menu code the the third to give the weight of the consumed food. The rest of the columns i need to arrange such that i can see the ingredients used in each menu dish along with their weights.
              Last edited by Shailaja Tiwari; 11 Nov 2021, 06:26.

              Comment


              • #8
                Then the code below may be what your need.

                Code:
                collapse (sum) Menu_wht, by(Hhs_id menu)
                This would give a data with only three variables: Hhs_id, menu and the summation of Menu_wht within each Hhs_id-menu pair. If you need to sum up other variables, jut put them after "Menu_wht".

                Comment


                • #9
                  The code would eliminate information on ingredients (and their weights) used in the Menu dishes. While collapsing I need to retain information on the ingredients used to prepare the menu dishes.

                  Comment


                  • #10
                    Yes, that's what I meant in #8: You may add as many variables as you want after "Menu_wht" to sum them up cross lines. If you'd like to sum up ingredients and their weights as well, the code becomes

                    Code:
                    collapse (sum) Menu_wht Ingredient_1 Ingredient_wht_1 other_variables, by(Hhs_id menu)

                    Comment


                    • #11
                      The above code does not retain information in the ingredient names. Ingredient_1 to Ingredient_4 are the names of the ingredients used to prepare the dishes.

                      Comment


                      • #12
                        Code:
                        collapse (sum) *wht* (mean) Ingredient_1 Ingredient_2 Ingredient_3 Ingredient_4, by(Hhs_no menu)

                        Comment


                        • #13
                          This code also does not retain information about the ingredients. While the ingredient weights are summed by the menu the ingredient names (which are encoded) are not retained.

                          Comment


                          • #14
                            As ingredient names are encoded, the collapsed data would store the average of ingredient names within each Hhs_no-menu pair. By definition, your ingredient names should be the same within each Hhs_no-menu pair -- a specific menu in a specific household would have a fixed ingredient, no matter for breakfast, lunch or dinner. Then the average would retain the ingredient names. If the names are different within each Hhs_no-menu pair, then you need to let me know how you would handle this issue -- which name should be retained (in this case, average or summation will make no sense)? Or you may need to check if this issue is a data error.

                            Comment

                            Working...
                            X