Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Matching households food consumption with individual

    Dear Stata list,

    I have household level 24 hours food consumption data (by Menu) including ingredients in one data set.

    In another data set, for the same household, I also have individual level food consumption data only by Menu (no ingredients )

    How can I match two data sets to find out individual level ingredients consumed by each person.

    Below example data sets are given. I also do not know why example data sets are not working (very sorry about that)
    Household level data:

    clear
    input double hh_id int Menu_1 int ingre_1 int ingre_2 int ingre_3 int ingre_4 int ingre_5

    1 "Salt"
    1 "Rice/Jao" "Fine Rice"
    1 "Jhol curry" "Lau Shak" "Koi" "Potato" "Turmeric"
    1 "Bhuna curry" "Potato" "Onion" "Garloic" "Coriander"
    1 "Supari"
    1 "Betel Leaf"
    2 "Rice/Jao"
    2 "Bhuna corry" "Water gourd" "Tengra" "Onion" "Green chilli"
    2 "Salt"
    end

    Individual level data:

    input double hh_id double Person_id double age int Menu_2
    1 1 . "Salt"
    1 1 57 "Rice/Jao"
    1 1 . "Bhuna curry"
    1 2 50 "Rice/Jao"
    1 2 . "Bhuna curry"
    1 2 . "Jhol curry"
    2 1 . "Salt"
    2 1 51 "Rice/Jao"
    2 1 . "Bhuna curry"
    end

    Any support would highly appreciated !

    Thank you so much!

  • #2
    The only real obstacle here is that the variable containing the menu is named differently in the two data sets.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(hh_id Person_id age) str11 Menu_2
    1 1  . "Salt"       
    1 1 57 "Rice/Jao"   
    1 1  . "Bhuna curry"
    1 2 50 "Rice/Jao"   
    1 2  . "Bhuna curry"
    1 2  . "Jhol curry" 
    2 1  . "Salt"       
    2 1 51 "Rice/Jao"   
    2 1  . "Bhuna curry"
    end
    tempfile consumption
    save `consumption'
    
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte hh_id str11(Menu_1 ingre_1) str6 ingre_2 str7 ingre_3 str12 ingre_4 str1 ingre_5
    1 "Salt"        ""            ""       ""        ""             ""
    1 "Rice/Jao"    "Fine Rice"   ""       ""        ""             ""
    1 "Jhol curry"  "Lau Shak"    "Koi"    "Potato"  "Turmeric"     ""
    1 "Bhuna curry" "Potato"      "Onion"  "Garloic" "Coriander"    ""
    1 "Supari"      ""            ""       ""        ""             ""
    1 "Betel Leaf"  ""            ""       ""        ""             ""
    2 "Rice/Jao"    ""            ""       ""        ""             ""
    2 "Bhuna corry" "Water gourd" "Tengra" "Onion"   "Green chilli" ""
    2 "Salt"        ""            ""       ""        ""             ""
    end
    tempfile recipes
    save `recipes'
    
    use `consumption', clear
    clonevar Menu_1 = Menu_2
    merge m:1 hh_id Menu_1 using `recipes'
    drop Menu_1
    Please re-read the help file for the -dataex- command and follow those instructions when posting example data in the future. If you don't find those instructions helpful, David Benson has a nice video at https://youtu.be/bXfaRCAOPbI.

    Comment


    • #3
      Dear Clyde,

      Thank you so much for the details input!

      I do not know why the syntex (use `consumption', clear) does not work in my data set whether due to data type. it gives a message as below
      invalid file specification.

      May I know what can I do with that.

      Thank you

      Comment


      • #4
        Thank you so much Clyde! It works now!

        Comment


        • #5
          Sorry Clyde to ask about this previous work (post 32)! when I run

          merge m:1 hh_id Menu_1 using `recipes'

          I get error message as below

          variables a01 Menu_1 do not uniquely identify observations in the using dataif I use m:m, then it works. merge m:m hh_id Menu_1 using `recipes'

          My question whether m:m would be correct if I do!

          Thank you so much!

          here comes the previous suggestion from you below..

          Code:
          * Example generated by -dataex-. To install: ssc install dataex clear input byte(hh_id Person_id age) str11 Menu_2 1 1 . "Salt" 1 1 57 "Rice/Jao" 1 1 . "Bhuna curry" 1 2 50 "Rice/Jao" 1 2 . "Bhuna curry" 1 2 . "Jhol curry" 2 1 . "Salt" 2 1 51 "Rice/Jao" 2 1 . "Bhuna curry" end tempfile consumption save `consumption' * Example generated by -dataex-. To install: ssc install dataex clear input byte hh_id str11(Menu_1 ingre_1) str6 ingre_2 str7 ingre_3 str12 ingre_4 str1 ingre_5 1 "Salt" "" "" "" "" "" 1 "Rice/Jao" "Fine Rice" "" "" "" "" 1 "Jhol curry" "Lau Shak" "Koi" "Potato" "Turmeric" "" 1 "Bhuna curry" "Potato" "Onion" "Garloic" "Coriander" "" 1 "Supari" "" "" "" "" "" 1 "Betel Leaf" "" "" "" "" "" 2 "Rice/Jao" "" "" "" "" "" 2 "Bhuna corry" "Water gourd" "Tengra" "Onion" "Green chilli" "" 2 "Salt" "" "" "" "" "" end tempfile recipes save `recipes' use `consumption', clear clonevar Menu_1 = Menu_2 merge m:1 hh_id Menu_1 using `recipes' drop Menu_1

          Comment


          • #6
            I must add that this merge (merge m:1 hh_id Menu_1 using `recipes') works for the sample data here but it does not work in my real data.

            Please suggest me what to do. Thank you

            Comment


            • #7
              No, no, no, no, no! -merge m:m- just produces data salad. It is definitely not the correct thing to do. And any analyses you would produce after using it would just be garbage.

              Most likely there is a problem with the data in your recipes file, one that did not show up in the example you gave. So you have to find out where your data set has gone wrong and how to fix it. Based on your description of the problem hh_id and Menu_1 should uniquely identify observations in the -recipes- data set. That is, for any given hh_id there should be only one observation for each Menu_1. But Stata is telling you that in your data there are at least some combinations of hh_id and Menu_1 that occur repeatedly. So the first step is to find what these are:

              Code:
              duplicates tag hh_id Menu_1, gen(flag)
              browse if flag
              will show them to you. Now it may turn out that some or all of these observations are pure duplicates--that is, they are identical on all variables, not just on hh_id and Menu_1. In that case, you will not lose any information if you just throw away one member of the hh_id Menu_1 group, as they are all the same. You can do that with -duplicates drop-. But, before you just apply that simple fix, be warned that the presence of these surplus observations may well indicate that something was done wrong when the `recipes' data set was built in the first place and you should revisit that process as there may well be other errors in the data that you need to attend to.

              Now, you may find that your pairs (or triplets or whatever) of hh_id Menu_1 observations are, in some cases, not exact duplicates--they disagree on other variables. Now you have a different problem. You have two inconsistent recipes for the same menu in the same household. Is one of them correct? Or is the true recipe somehow gotten from combining them? Or is the true recipe something else? The possibilities here are numerous and you will have to pursue each of these situations--different pairs may be problems for different reasons. You will just have to track down, in each case, what the correct recipe is, and either keep that one only (if it is already in the data set) or drop all the ones you have and add the correct one to your data set.

              Now, there is one other, less likely, possibility here. It may be that it is perfectly correct for a single household to have two or more different, contradictory versions of the same menu. In that case you have to either come to a decision about which one you want to use for your purposes, or, perhaps what you really want is to pair up each household's consumption data with every one of those different recipes. If the latter is true, then you cannot use -merge-. This becomes a task for joinby:

              Code:
              joinby hh_id Menu1 using `recipes'
              Before running that code, read -help joinby-, and decide what you want to do with unmatched observations, and then specify -joinby-'s -unmatched()- option accordingly.

              Comment


              • #8
                Thank you so much Clyde for the so details explanation. running the syntax
                duplicates tag hh_id Menu_1, gen(flag) browse if flag I can see huge number of duplicates like 30605 out of 320832 observations.

                Let me try to identify the error (with my limited knowledge of data! set)! I highly appreciate your continuous support!


                Sincerely,
                Rumana

                Comment

                Working...
                X