Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to categorize non-numerical variables

    Hi,

    I am having trouble coding these variables into two categories: edible foods and non-edible foods. I am not sure if the coding is the same way as continuous variables? I am relatively new to STATA, any advice is greatly appreciated!


    Here are my codes:

    tab Crop_name, mi nol
    encode Crop_name, gen(Crop_namecodes)


    tab Crop_namecodes, miss
    gen Crop_namecodescat = Crop_namecodes
    recode Crop_namecodescat bananafood beans maize =1 coffee cocao =2
    label define Crop_namecodescatlab 1 "edible food" 2 "non-edible food"





    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str20 Crop_name
    ""              
    "Banana food"   
    "Cassava"       
    "Beans"         
    "Dodo"          
    "Dodo"          
    "Maize"         
    "Yam"           
    "Banana food"   
    "Beans"         
    "Maize"         
    "Banana food"   
    "Beans"         
    "Maize"         
    "Coffee all"    
    ""              
    "Banana food"   
    "Banana food"   
    "Banana food"   
    "Cassava"       
    "Banana food"   
    "Coffee all"    
    "Beans"         
    "Maize"         
    "Groundnuts"    
    "Maize"         
    "Cassava"       
    "Maize"         
    "Sweet potatoes"
    ""              
    "Beans"         
    ""              
    "Cocoa"         
    "Coffee all"    
    "Groundnuts"    
    "Groundnuts"    
    "Beans"         
    "Cassava"       
    ""              
    "Sweet potatoes"
    "Banana food"   
    "Cassava"       
    "Maize"         
    "Yam"           
    "Sugarcane"     
    ""              
    "Banana food"   
    "Beans"         
    "Beans"         
    "Cassava"       
    "Maize"         
    "Banana food"   
    ""              
    "Banana food"   
    "Coffee all"    
    "Sweet potatoes"
    "Cassava"       
    "Yam"           
    "Groundnuts"    
    "Maize"         
    "Sweet potatoes"
    "Maize"         
    "Maize"         
    "Beans"         
    "Maize"         
    ""              
    "Banana food"   
    "Maize"         
    "Beans"         
    "Cassava"       
    "Banana food"   
    "Beans"         
    "Coffee all"    
    "Fallow"        
    "Sweet potatoes"
    "Beans"         
    "Cassava"       
    "Maize"         
    "Banana food"   
    ""              
    "Fallow"        
    "Banana food"   
    "Coffee all"    
    "Banana beer"   
    "Sweet potatoes"
    "Maize"         
    "Beans"         
    "Maize"         
    "Banana food"   
    "Coffee all"    
    "Banana food"   
    "Beans"         
    "Cassava"       
    "Maize"         
    ""              
    ""              
    ""              
    ""              
    "Banana beer"   
    "Banana beer"   
    end
    ------------------ copy up to and including the previous line ------------------

    Listed 100 out of 15403 observations
    Use the count() option to list more


  • #2
    You need to refer to the numeric values you created with encode in your recode command: You probably don't want to copy Crop_namecodes directly into your new variable Crop_namecodescat for fear of using the same codes in the recode command and confusing the categories.

    Code:
    encode Crop_name, gen(Crop_namecodes)
    lab list Crop_namecodes
    gen Crop_namecodescat=.
    replace Crop_namecodescat=1 if inlist(Crop_namecodes, 2, 3, 10)
    replace Crop_namecodescat=2 if inlist(Crop_namecodes, 6, 5)
    label define Crop_namecodescatlab 1 "edible food" 2 "non-edible food" , modify
    lab value Crop_namecodescat Crop_namecodescatlab
    Stata/MP 14.1 (64-bit x86-64)
    Revision 19 May 2016
    Win 8.1

    Comment


    • #3
      Hello,

      Thank you for the code! It worked really well! I have another question with regard to calculating the total quantity of crops produced by each household if it is possible? HIDD is the variable for the identification of each household. For instance, I want to add up the all the quan_harvested for HHID -1021000113 and subtract the Quan_sold. I have tried collapsing the data but I am not sure if I am on the right track? Thank you for any advice!

      input str14 HHID float Parcel_ID str20 Crop_name double(Quan_harvested Quan_sold)
      "1013000204" 21 "" . .
      "1021000108" 1 "Banana food" 48 0
      "1021000108" 1 "Cassava" 3 0
      "1021000113" 1 "Beans" 3 0
      "1021000113" 1 "Dodo" 2 0
      "1021000113" 1 "Dodo" 10 0
      "1021000113" 1 "Maize" 1 0
      "1021000113" 1 "Yam" 99999 .
      "1021000408" 1 "Banana food" 99999 .
      "1021000408" 1 "Beans" 2 0
      "1021000408" 1 "Maize" 1 0
      "1021000710" 1 "Banana food" 99999 .
      "1021000710" 1 "Beans" 3.5 0
      "1021000710" 1 "Maize" 60 0
      "1021000807" 1 "Coffee all" 7 7
      "1021000807" 1 "" . .
      "1021000807" 2 "Banana food" 180 80
      "1021000807" 2 "Banana food" 30 0
      "1021000807" 2 "Banana food" 90 90
      "1021000807" 2 "Cassava" 17 17
      "102100080803" 1 "Banana food" 20 0
      "102100080803" 1 "Coffee all" 0 .
      "102100080803" 2 "Beans" 1 0
      "102100080803" 2 "Maize" 1 1
      "102100080803" 2 "Groundnuts" 2 0
      "102100080803" 2 "Maize" 1 1
      "102100080803" 21 "Cassava" 2 0
      "102100080803" 21 "Maize" 2.5 1
      "102100080803" 21 "Sweet potatoes" 1 0
      "102100080803" 21 "" . .
      "102100110201" 1 "Beans" 1 0
      "102100110201" 1 "" . .
      "102100110201" 2 "Cocoa" 4 4
      "102100110201" 2 "Coffee all" 1 1
      "102100110201" 2 "Groundnuts" 3 0
      "102100110201" 3 "Groundnuts" 4 2
      "102100110201" 3 "Beans" 1 0
      "102100110201" 3 "Cassava" 99999 .
      "102100110201" 4 "" . .
      "1021001109" 1 "Sweet potatoes" 32 0
      "1021001109" 1 "Banana food" 12 0
      "1021001109" 1 "Cassava" 2 0
      "1021001109" 1 "Maize" .5 0
      "1021001109" 2 "Yam" 10 6
      "1021001109" 2 "Sugarcane" 120 0
      "1021001109" 3 "" . .
      "1021001304" 1 "Banana food" 4 0
      "1021001304" 1 "Beans" 4 0
      "1021001304" 1 "Beans" 10 0
      "1021001304" 1 "Cassava" 5 0
      "1021001304" 1 "Maize" 1 0
      "1021001910" 1 "Banana food" 99999 .
      "1021002501" 1 "" . .
      "1021002610" 1 "Banana food" 10 0
      "1021002610" 1 "Coffee all" 99999 .
      "1021002610" 1 "Sweet potatoes" 8 3
      "1021002610" 1 "Cassava" 99999 .
      "1021002610" 1 "Yam" 5 0
      "1021002610" 1 "Groundnuts" 1 0
      "1021002610" 1 "Maize" 1 0
      "1021002610" 21 "Sweet potatoes" 2 0
      "1021002611" 1 "Maize" 1 0
      "1021002611" 1 "Maize" 2 2
      "1021002810" 1 "Beans" .5 0
      "1021002810" 1 "Maize" 2 0
      "1021003309" 21 "" . .
      "1033000301" 1 "Banana food" 60 0
      "1033000301" 1 "Maize" 15 0
      "1033000302" 1 "Beans" 3 1
      "1033000302" 1 "Cassava" 0 .
      "1033000302" 1 "Banana food" 30 20
      "1033000302" 1 "Beans" 1 0
      "1033000302" 1 "Coffee all" 0 .
      "1033000302" 1 "Fallow" 99999 .
      "1033000303" 1 "Sweet potatoes" 0 .
      "1033000303" 1 "Beans" 0 .
      "1033000303" 1 "Cassava" 0 .
      "1033000303" 1 "Maize" 3 2
      "1033000303" 1 "Banana food" 50 15
      "1033000303" 2 "" . .
      "1033000303" . "Fallow" 99999 .
      "1033000304" 1 "Banana food" 40 0
      "1033000304" 1 "Coffee all" 0 .
      "1033000304" 1 "Banana beer" 30 0
      "1033000304" 21 "Sweet potatoes" 0 .
      "1033000304" 22 "Maize" 0 .
      "1033000304" 22 "Beans" 1 0
      "1033000304" 22 "Maize" 20 0
      "103300030403" 1 "Banana food" 5 0
      "103300030403" 1 "Coffee all" 2 2
      "103300030403" 2 "Banana food" 6 0
      "103300030403" 21 "Beans" 1 0
      "103300030403" 21 "Cassava" 0 .
      "103300030403" 21 "Maize" 7.5 7
      "1033000305" 1 "" . .
      "1033000305" 2 "" . .
      "1033000305" 3 "" . .
      "1033000305" 21 "" . .
      "1033000307" 1 "Banana beer" 30 0
      "1033000307" 2 "Banana beer" 10 0
      end
      [/CODE]
      ------------------ copy up to and including the previous line ------------------

      Listed 100 out of 15403 observations
      Use the count() option to list more



      Comment


      • #4
        Mangji:
        you may want to try (I assume that -99999- codes missing values):
        Code:
        bysort HHID: egen tot_harv=total( Quan_harvested) if Quan_harvested!=99999
        Kind regards,
        Carlo
        (Stata 19.0)

        Comment


        • #5
          Hi Carlo,

          Thank you! It worked! I used the same code to get the sum of the quan_sold for each household, but I want to subtract tot_sold from tot_harv. Is the following code correct? I am not too sure. Thanks for any advice!
          Code: bysort HHID: egen tot_sold=total(Quan_sold) if Quan_sold!=99999
          gen diff_harvest = cond(tot_harv) - cond(tot_sold)

          Comment


          • #6
            Mangji:
            I would go:
            Code:
            . bysort HHID: egen tot_sold=total( Quan_sold ) if Quan_sold!=99999
            
            . g tot_sold_2= tot_harv-tot_sold
            Kind regards,
            Carlo
            (Stata 19.0)

            Comment


            • #7
              Hi Carlo,

              Thank you for your help! It worked! Can I collapse the data so that each household (HHID) number is matched to one tot_sold_2 value? Also, does collapsing affect the missing data in the dataset? Thanks again!

              code:

              collapse tot_sold_2, by (HHID)


              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input str14 HHID double tot_sold_2
              "1013000204"       0
              "1021000108"      51
              "1021000108"      51
              "1021000113"      16
              "1021000113"      16
              "1021000113"      16
              "1021000113"      16
              "1021000113"       .
              "1021000408"       .
              "1021000408"       3
              "1021000408"       3
              "1021000710"       .
              "1021000710"    63.5
              "1021000710"    63.5
              "1021000807"     130
              "1021000807"     130
              "1021000807"     130
              "1021000807"     130
              "1021000807"     130
              "1021000807"     130
              "102100080803"  27.5
              "102100080803"  27.5
              "102100080803"  27.5
              "102100080803"  27.5
              "102100080803"  27.5
              "102100080803"  27.5
              "102100080803"  27.5
              "102100080803"  27.5
              "102100080803"  27.5
              "102100080803"  27.5
              "102100110201"     7
              "102100110201"     7
              "102100110201"     7
              "102100110201"     7
              "102100110201"     7
              "102100110201"     7
              "102100110201"     7
              "102100110201"     .
              "102100110201"     7
              "1021001109"   170.5
              "1021001109"   170.5
              "1021001109"   170.5
              "1021001109"   170.5
              "1021001109"   170.5
              "1021001109"   170.5
              "1021001109"   170.5
              "1021001304"      24
              "1021001304"      24
              "1021001304"      24
              "1021001304"      24
              "1021001304"      24
              "1021001910"       .
              "1021002501"       0
              "1021002610"      24
              "1021002610"       .
              "1021002610"      24
              "1021002610"       .
              "1021002610"      24
              "1021002610"      24
              "1021002610"      24
              "1021002610"      24
              "1021002611"       1
              "1021002611"       1
              "1021002810"     2.5
              "1021002810"     2.5
              "1021003309"       0
              "1033000301"      75
              "1033000301"      75
              "1033000302"      13
              "1033000302"      13
              "1033000302"      13
              "1033000302"      13
              "1033000302"      13
              "1033000302"       .
              "1033000303"      36
              "1033000303"      36
              "1033000303"      36
              "1033000303"      36
              "1033000303"      36
              "1033000303"      36
              "1033000303"       .
              "1033000304"      91
              "1033000304"      91
              "1033000304"      91
              "1033000304"      91
              "1033000304"      91
              "1033000304"      91
              "1033000304"      91
              "103300030403"  12.5
              "103300030403"  12.5
              "103300030403"  12.5
              "103300030403"  12.5
              "103300030403"  12.5
              "103300030403"  12.5
              "1033000305"       0
              "1033000305"       0
              "1033000305"       0
              "1033000305"       0
              "1033000307"     170
              "1033000307"     170
              end
              ------------------ copy up to and including the previous line ------------------

              Listed 100 out of 15403 observations
              Use the count() option to list more

              .



              Comment


              • #8
                Mangji:
                instead of -collapse- (which comes with the cautionary advice of saving your original dataset beforehand), you can operate on the original dataset:
                Code:
                 bysort HHID: gen flag=1 if _n==1
                (77 missing values generated)
                
                . tabstat tot_sold_2 if flag==1, stat(count mean sd p50 min max)
                
                    variable |         N      mean        sd       p50       min       max
                -------------+------------------------------------------------------------
                  tot_sold_2 |        20     36.05  46.97533        20         0     170.5
                --------------------------------------------------------------------------
                Kind regards,
                Carlo
                (Stata 19.0)

                Comment


                • #9
                  Hi Carlo,

                  Thanks for the advice! I learned a lot! The reason I want to collapse the data according to HHID is to merge the dataset with another dataset using HHID. Do I have the correct code to do so? Many thanks!

                  Code:
                  collapse tot_sold_2, by (HHID)

                  Comment


                  • #10
                    Mangji:
                    yes, your code is correct (I assume that you want the -mean- of -tot-sold-2- by HHID).
                    IHowever, it's better to make explicit what you want to get from -collapse-:
                    Code:
                    collapse (mean) tot_sold_2, by (HHID)
                    Missing values will disappear after -collapse-: be sure that it does not matter for your analysis.
                    At the risk of being pedantic, save your original dataset before -collapse-.
                    Last edited by Carlo Lazzaro; 28 Jul 2018, 07:29.
                    Kind regards,
                    Carlo
                    (Stata 19.0)

                    Comment


                    • #11
                      Hi Carlo,

                      Thank you! I will save my dataset before collapsing!

                      Comment


                      • #12
                        Hi Carlo,

                        I have collapse my data to match on HHID for each household and by the type of crops (food and cash crops). I am only interested in the food crop data and not sure how to delete all the cash crops for each households.

                        Here is my code:

                        collapse tot_sold_2, by (HHID Crop_namecodescat)

                        Code:
                        * Example generated by -dataex-. To install: ssc install dataex
                        clear
                        input str14 HHID double(Crop_namecodescat tot_sold_2)
                        "1013000204"   .                  0
                        "1021000108"   1                 51
                        "1021000113"   1                 16
                        "1021000408"   1                  3
                        "1021000710"   1               63.5
                        "1021000807"   1                130
                        "1021000807"   2                  0
                        "1021000807"   .                  0
                        "102100080803" 1               27.5
                        "102100080803" 2                  0
                        "102100080803" .                  0
                        "102100110201" 1                  7
                        "102100110201" 2                  0
                        "102100110201" .                  0
                        "1021001109"   1              170.5
                        "1021001109"   .                  0
                        "1021001304"   1                 24
                        "1021001910"   1                  .
                        "1021002501"   .                  0
                        "1021002610"   1                 24
                        "1021002610"   2                  .
                        "1021002611"   1                  1
                        "1021002810"   1                2.5
                        "1021003309"   .                  0
                        "1033000301"   1                 75
                        "1033000302"   1                 13
                        "1033000302"   2                  0
                        "1033000303"   1                 36
                        "1033000303"   .                  0
                        "1033000304"   1                 91
                        "1033000304"   2                  0
                        "103300030403" 1               12.5
                        "103300030403" 2                  0
                        "1033000305"   .                  0
                        "1033000307"   1                170
                        "1033000307"   .                  0
                        "103300030702" .                  0
                        "1033000308"   1                  6
                        "1033000308"   .                  0
                        "1033000309"   1                 37
                        "1033000310"   1                 20
                        "1033000310"   2                  0
                        "1033000310"   .                  0
                        "1033000502"   .                  0
                        "1033000504"   1                 51
                        "1033000504"   2                  0
                        "1033000504"   .                  0
                        "1033000505"   1                 31
                        "1033000505"   2                  0
                        "1033000506"   1                  9
                        "1033000507"   1                  8
                        "1033000509"   1                  3
                        "1033000510"   1                 17
                        "1033000510"   2                  0
                        "1033000511"   1                8.5
                        "103300051102" 1                 48
                        "103300051102" 2                  0
                        "103300051102" .                  0
                        "1041000207"   1                  0
                        "1041000210"   1                 15
                        "1043000402"   1                  8
                        "1043000402"   2                  0
                        "1043000403"   1               83.5
                        "1043000403"   2                  0
                        "1043000403"   .                  0
                        "1043000406"   1                1.5
                        "1043000407"   1                 54
                        "1043000408"   1                  5
                        "1043000408"   .                  0
                        "1043000409"   1 22.200000000000003
                        "104300040906" 1                503
                        "104300040906" 2                  0
                        "104300040908" 1                 72
                        "104300040908" 2                  0
                        "1043000410"   1                355
                        "1043000601"   1                  4
                        "1043000601"   .                  0
                        "1043000602"   1                 98
                        "1043000602"   2                  0
                        "1043000603"   1              152.5
                        "1043000603"   2                  0
                        "1043000603"   .                  0
                        "1043000604"   1               11.5
                        "1043000605"   1                  2
                        "1043000607"   1                 25
                        "1043000607"   2                  0
                        "1043000610"   1                  4
                        "1043000610"   2                  0
                        "1043000611"   1                 16
                        "1043000611"   .                  0
                        "1043000804"   1                 20
                        "1043000804"   .                  0
                        "1043000805"   1                  0
                        "1043000806"   1                 31
                        "1043000806"   2                  0
                        "1043000806"   .                  0
                        "104300080608" 1                 29
                        "1043000807"   1                 52
                        "1043000807"   2                  0
                        "1043000808"   1                 44
                        end
                        label values Crop_namecodescat Crop_namecodescatlab
                        label def Crop_namecodescatlab 1 "Food Crop", modify
                        label def Crop_namecodescatlab 2 "Cash Crop", modify
                        ------------------ copy up to and including the previous line ------------------

                        Listed 100 out of 3935 observations
                        Use the count() option to list more

                        Comment


                        • #13
                          If you want them gone from the dataset:

                          Code:
                          drop if Crop_namecodescat==2
                          If you also want those with missing values gone,
                          Code:
                          drop if Crop_namecodescat==.
                          Stata/MP 14.1 (64-bit x86-64)
                          Revision 19 May 2016
                          Win 8.1

                          Comment


                          • #14
                            Hi Carole,

                            Thanks for your help! I want to count how many different types of crops each household produce (HHID). For instance, HHID 1021000108 produced 2 types of crops and 1021000113 with 4 types. Not sure what the code should be in this case. Thank you for any advice!


                            Code:
                            * Example generated by -dataex-. To install: ssc install dataex
                            clear
                            input str14 HHID str20 cropname
                            "1013000204"   ""                
                            "1021000108"   "Cassava"         
                            "1021000108"   "Banana food"     
                            "1021000113"   "Maize"           
                            "1021000113"   "Beans"           
                            "1021000113"   "Dodo"            
                            "1021000113"   "Yam"             
                            "1021000408"   "Banana food"     
                            "1021000408"   "Maize"           
                            "1021000408"   "Beans"           
                            "1021000710"   "Maize"           
                            "1021000710"   "Beans"           
                            "1021000710"   "Banana food"     
                            "1021000807"   "Coffee all"      
                            "1021000807"   "Natural pastures"
                            "1021000807"   "Banana food"     
                            "1021000807"   "Cassava"         
                            "102100080803" "Banana food"     
                            "102100080803" "Coffee all"      
                            "102100080803" "Maize"           
                            "102100080803" "Beans"           
                            "102100080803" "Maize"           
                            "102100080803" "Groundnuts"      
                            "102100080803" "Maize"           
                            "102100080803" "Cassava"         
                            "102100080803" "Sweet potatoes"  
                            "102100080803" ""                
                            "102100110201" ""                
                            "102100110201" ""                
                            "102100110201" "Groundnuts"      
                            "102100110201" "Coffee all"      
                            "102100110201" "Cocoa"           
                            "102100110201" "Groundnuts"      
                            "102100110201" "Beans"           
                            "102100110201" "Cassava"         
                            "102100110201" ""                
                            "1021001109"   "Sweet potatoes"  
                            "1021001109"   "Cassava"         
                            "1021001109"   "Banana food"     
                            "1021001109"   "Maize"           
                            "1021001109"   "Yam"             
                            "1021001109"   "Sugarcane"       
                            "1021001109"   ""                
                            "1021001304"   "Maize"           
                            "1021001304"   "Beans"           
                            "1021001304"   "Cassava"         
                            "1021001304"   "Banana food"     
                            "1021001910"   "Banana food"     
                            "1021002501"   ""                
                            "1021002610"   "Sweet potatoes"  
                            "1021002610"   "Banana food"     
                            "1021002610"   "Coffee all"      
                            "1021002610"   "Cassava"         
                            "1021002610"   "Yam"             
                            "1021002610"   "Maize"           
                            "1021002610"   "Groundnuts"      
                            "1021002610"   "Sweet potatoes"  
                            "1021002611"   "Maize"           
                            "1021002810"   "Maize"           
                            "1021002810"   "Beans"           
                            "1021003309"   ""                
                            "1033000301"   "Maize"           
                            "1033000301"   "Banana food"     
                            "1033000302"   "Beans"           
                            "1033000302"   "Cassava"         
                            "1033000302"   "Beans"           
                            "1033000302"   "Banana food"     
                            "1033000302"   "Coffee all"      
                            "1033000302"   ""                
                            "1033000303"   "Sweet potatoes"  
                            "1033000303"   "Beans"           
                            "1033000303"   "Cassava"         
                            "1033000303"   "Maize"           
                            "1033000303"   "Banana food"     
                            "1033000303"   ""                
                            "1033000303"   ""                
                            "1033000304"   "Banana food"     
                            "1033000304"   "Coffee all"      
                            "1033000304"   "Banana beer"     
                            "1033000304"   "Sweet potatoes"  
                            "1033000304"   "Maize"           
                            "1033000304"   "Maize"           
                            "1033000304"   "Beans"           
                            "103300030403" "Banana food"     
                            "103300030403" "Coffee all"      
                            "103300030403" "Banana food"     
                            "103300030403" "Maize"           
                            "103300030403" "Beans"           
                            "103300030403" "Cassava"         
                            "1033000305"   ""                
                            "1033000305"   ""                
                            "1033000305"   ""                
                            "1033000305"   ""                
                            "1033000307"   "Banana beer"     
                            "1033000307"   "Beans"           
                            "1033000307"   "Banana beer"     
                            "1033000307"   "Cassava"         
                            "1033000307"   "Banana food"     
                            "1033000307"   ""                
                            "1033000307"   ""                
                            end
                            ------------------ copy up to and including the previous line ------------------

                            Comment


                            • #15
                              For this solution, you need to install the package -egenmore-. To install:
                              Code:
                              ssc install egenmore
                              Then you can use the command:
                              Code:
                              bysort HHID: egen num_crop=nvals( cropname)
                              When there are multiple cases of the same crop in the household (banana beer in household 1033000307 occurs twice) this calculation just counts it once (so 4 crops in that household).
                              Stata/MP 14.1 (64-bit x86-64)
                              Revision 19 May 2016
                              Win 8.1

                              Comment

                              Working...
                              X