Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging with lookup table

    Hi everyone,

    I would really like some advice on how to use convert my data to kcal using both sex and age of the individual in a household (HHID). For instance, there are 6 individuals in HHID 1013000201 with different sex-age categories, I want to use the information given by the variable sex and age for kcal conversion.

    Do I create a look up table in this case or would it be too complicated since there are a lot of individuals with different age and sex combinations? Thank you for any advice!


    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str12 HHID str14 PID byte(sex hh_age) float(hh_agecat age_sexcat)
    "1013000201" "101300020102" 2 28 11 6
    "1013000201" "101300020103" 1 17 10 6
    "1013000201" "101300020106" 1 6 5 6
    "1013000201" "101300020104" 1 11 7 6
    "1013000201" "101300020105" 1 9 6 6
    "1013000201" "101300020101" 2 44 12 6
    "1013000204" "101300020403" 2 6 5 5
    "1013000204" "101300020405" 2 0 1 5
    "1013000204" "101300020404" 2 2 2 5
    "1013000204" "101300020402" 2 32 12 5
    "1013000204" "101300020401" 1 34 12 5
    "1013000206" "101300020601" 1 29 11 1
    "1013000210" "101300021001" 1 38 12 1
    "1013000213" "101300021301" 1 31 12 1
    "101300021302" "1013000213030" 1 5 4 4
    "101300021302" "10130002130201" 1 35 12 4
    "101300021302" "10130002130204" 1 0 1 4
    "101300021302" "1013000213020" 2 28 11 4
    "1021000102" "102100010205" 2 19 11 5
    "1021000102" "102100010203" 2 10 6 5
    "1021000102" "102100010201" 1 39 12 5
    "1021000102" "102100010204" 2 2 2 5
    "1021000102" "102100010206" 2 19 11 5
    "1021000102" "102100010202" 2 37 12 5
    "1021000108" "102100010804" 1 23 11 5
    "1021000108" "102100010803" 2 25 11 5
    "1021000108" "102100010806" 2 17 10 5
    "1021000108" "102100010805" 1 20 11 5
    "1021000108" "102100010802" 2 59 12 5
    "1021000108" "102100010801" 1 70 13 5
    "1021000109" "102100010903" 2 16 9 6
    "1021000109" "102100010906" 2 1 1 6
    "1021000109" "102100010902" 2 22 11 6
    "1021000109" "102100010904" 1 12 7 6
    "1021000109" "102100010905" 1 19 11 6
    "1021000109" "102100010901" 2 39 12 6
    "1021000110" "102100011002" 2 44 12 4
    "1021000110" "102100011003" 1 25 11 4
    "1021000110" "102100011006" 2 19 11 4
    "1021000110" "102100011004" 1 23 11 4
    "1021000110" "102100011005" 1 21 11 4
    "1021000110" "102100011001" 1 53 12 4
    "1021000111" "102100011103" 1 15 9 8
    "1021000111" "102100011102" 2 17 10 8
    "1021000111" "102100011108" 1 18 10 8
    "1021000111" "102100011110" 1 16 9 8
    "1021000111" "102100011109" 1 15 9 8
    "1021000111" "102100011107" 2 20 11 8
    "1021000111" "102100011106" 1 25 11 8
    "1021000111" "102100011105" 2 5 4 8
    "1021000111" "102100011104" 1 13 8 8
    "1021000111" "102100011101" 1 47 12 8
    "1021000113" "102100011308" 2 17 10 8
    "1021000113" "102100011302" 2 43 12 8
    "1021000113" "102100011307" 1 4 4 8
    "1021000113" "102100011304" 1 14 8 8
    "1021000113" "102100011306" 1 10 6 8
    "1021000113" "102100011301" 1 47 12 8
    "1021000113" "102100011305" 2 12 7 8
    "1021000113" "102100011309" 2 32 12 8
    "1021000113" "102100011303" 1 20 11 8
    "1021000201" "102100020103" 1 14 8 5
    "1021000201" "102100020101" 1 41 12 5
    "1021000201" "102100020105" 2 16 9 5
    "1021000201" "102100020102" 2 38 12 5
    "1021000201" "102100020104" 2 9 6 5
    "1021000202" "102100020201" 1 43 12 3
    "1021000202" "102100020203" 2 14 8 3
    "1021000202" "102100020202" 2 20 11 3
    "1021000203" "102100020309" 1 7 5 10
    "1021000203" "102100020306" 1 12 7 10
    "1021000203" "102100020307" 1 14 8 10
    "1021000203" "102100020305" 2 17 10 10
    "1021000203" "102100020304" 1 20 11 10
    "1021000203" "102100020303" 2 27 11 10
    "1021000203" "102100020302" 1 37 12 10
    "1021000203" "102100020308" 2 4 4 10
    "1021000203" "102100020310" 1 2 2 10
    "1021000203" "102100020301" 2 58 12 10
    "102100020304" "10210002030401" 1 27 11 4
    "102100020304" "10210002030404" 1 3 3 4
    "102100020304" "10210002030403" 1 6 5 4
    "102100020304" "1021000203040" 2 24 11 4
    "1021000207" "102100020703" 1 13 8 4
    "1021000207" "102100020702" 2 38 12 4
    "1021000207" "102100020704" 2 17 10 4
    "1021000207" "102100020701" 1 41 12 4
    "1021000209" "102100020901" 1 59 12 1
    "1021000210" "102100021001" 1 35 12 1
    "1021000212" "102100021202" 2 17 10 4
    "1021000212" "102100021204" 1 4 4 4
    "1021000212" "102100021203" 1 15 9 4
    "1021000212" "102100021201" 2 48 12 4
    "1021000213" "102100021309" 2 5 4 8
    "1021000213" "102100021307" 2 20 11 8
    "1021000213" "102100021306" 2 17 10 8
    "1021000213" "102100021303" 1 25 11 8
    "1021000213" "102100021302" 2 42 12 8
    "1021000213" "102100021305" 2 21 11 8
    "1021000213" "102100021304" 1 22 11 8
    end
    label values sex H2Q3
    label def H2Q3 1 "MALE", modify
    label def H2Q3 2 "FEMALE", modify
    [/CODE]
    ------------------ copy up to and including the previous line ------------------

    Listed 100 out of 17673 observations
    Use the count() option to list more

    .
    Last edited by Mangji Zo; 06 Aug 2018, 00:53.

  • #2
    Unless there is some clear formula for deriving kcal by e.g., multiplying with age and sex, then yes, a lookup table would be the most useful.
    If there are many different values, with each age/sex category requiring another kcal value, then yes, this could be a relatively large amount of work. The only short-cut that could exist, is if this type of analysis has been done by others before, and you could find an existing lookup table, rather than compile one yourself.
    I imagine you are not going to come up with these values yourself, so most likely you are drawing from some existing source. Getting to your answer would be asking the question: how do I use the existing source with kcal values to create the lookup table I need. So describe your source or method of determining the kcal values, and people will be able to transform it into code.

    Comment


    • #3
      Hi Jorrit,

      Thank you for your advice! I really appreciate your help and explanation! I have the following table with all the kcal values, there are around 16 categories (assuming only the moderate exercise column for both adult men and women) each using sex and age that I am trying to convert using my data. Unfortunately, I could not find any existing look up table for the data I have. Thank you for any advice on how to do this!



      Attached Files

      Comment


      • #4
        So it seems there are roughly 25 categories of age/sex/activity with corresponding kcal values. In that case, creating a dta file with 2 variables for 'category' and 'kcal' by hand seems the quickest and easiest way to go about it.

        Comment


        • #5
          Hi Jorrit,

          Thank you for your help! I will try it. If I create the dta file, how do I merge it with the data that I have to match the HHID (household identifier) and PID (individual identifier)? Do I need to include another variable to indicate this in the dta file I create? Sorry for the questions, I am still learning how to use STATA.

          Comment


          • #6
            Merge it on groups, for example something like the age_sexcat variable you have in your data example in post #1.
            Your list in post #3 suggest sex, age_cat and activity level defines the kcal value. So if your database also incudes activity level you can create a new group that includes all the characteristics provided in the list from post #3.

            Comment


            • #7
              Hi Jorrit,

              I have the following code for the lookup table. I am not sure if I am on the right track? Thank you for any advice on this!

              CODE:

              ***Create lookup table with two variables, roughly 25 categories****
              ***Male=1, female=2***

              input int category kcal sex

              ***young children***
              1 820
              2 1150
              3 1350
              4 1550

              **older children boys**
              5 1850 1
              6 2100 1
              7 2200 1
              8 2400 1
              9 2650 1
              10 2850 1

              **older children girls**
              5 1750 2
              6 1800 2
              7 1950 2
              8 2100 2
              9 2150 2
              10 2150 2

              ***Adults male, moderate activity**
              11 2600 1
              12 2500 1
              13 2100 1

              ***Adult female, moderate activity**
              11 2000 2
              12 2050 2
              13 1850 2

              end

              Comment


              • #8
                The format is what I suggested, but now you have some categories that are the same for male and female (categories 11, 12, and 13).
                Each group (age, sex, activity level) should have its own category number

                Comment


                • #9
                  Hi Jorrit,

                  Thank you for your help! Is there any way in which I can merge a look up table without numerical numbers? Stata is telling me that it can't be read as numerical? Can you let me know how to fix this? Thank you!

                  Code:

                  input int Crop_code Crop_name
                  111 Wheat
                  120 Rice
                  130 Maize
                  141 Finger millet
                  150 Sorghum
                  210 Beans
                  221 Field peas
                  222 Cow peas
                  223 Pigeon peas
                  310 Chick peas
                  320 Soya beans
                  330 Sunflower
                  340 Simsim
                  410 Cabbage
                  420 Tomatoes
                  440 Onions
                  450 Pumpkins
                  460 Dodo
                  470 Eggplants
                  510 Sugarcane

                  Comment


                  • #10
                    Did you write this piece of input code yourself?
                    If so, make sure to understand the different types of variables in Stata: https://www.stata.com/manuals13/ddatatypes.pdf
                    And read over the guide on input as well: https://www.stata.com/manuals13/dinput.pdf

                    The problem is you are asking Stata to input both Crop_code Crop_name variables as integer (int). Values of Crop_code can be read in as integers, but not those for Crop_name. You have to tell Stata Crop_name should be read in as a string variable.


                    Comment


                    • #11
                      Thank you, Jorrit! I have made the changes to my code and it worked! I will make sure that I read more about it in the pdf files you suggested! Thank you again for your help!

                      Comment

                      Working...
                      X