
  • generating variable for specific responses in multiple observations

    Hello

    I have a dataset that lists the crops grown (variable crop_a) on different plots by farming households (variable a01). One household can have multiple plots (variable plotid) and therefore multiple crops. I am trying to calculate the number of farming households
    1. who grow only one crop (say, maize) and do not grow any other crop
    2. who grow both maize and betel leaf (two different crops) but no other crops.
    3. who grow both maize and betel leaf (two different crops) while they grow other crops too.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(a01 plotid) int crop_a
       1    3  19
       1    3 701
       1    2 603
       1  3.1  19
       2    .   .
       3  2.1  19
       3    5 701
       3    3  19
       3    2  19
       4    2 603
       4    2 603
       4    5  19
       4    4 603
       4    7 701
       5    3 603
       5    2  19
       5  3.1  19
       5    2 701
       5  2.1  19
       6    2  19
       6    .   .
       7    2 603
       7  2.1 603
       8    .   .
       9    2  19
       9    3 603
      10    7 701
      10    3  19
    10.1    9 603
    
    end
    label values crop_a crop_a
    label def crop_a 16 "aman  (hyv)", modify
    label def crop_a 19 "boro (hyv)", modify
    label def crop_a 20 "boro (hybrid)", modify
    label def crop_a 23 "maize", modify
    label def crop_a 603 "betel leaf", modify
    label def crop_a 701 "paddy seedling", modify
    For the first point, I compute the maximum and minimum of crop_a within each household and then create a conditional dummy variable:
    bys a01: egen max_crop = max(crop_a)
    bys a01: egen min_crop = min(crop_a)
    gen count_dummy = cond(max_crop == 23 & min_crop == 23, 1, 0)


    But for the other two, I can't find a way to do it. Please help with this. (Sorry for the ambiguous title.)

    Smriti


  • #2
    You were on the right track thinking about the role of -egen, max()- and -egen, min()-. But I think the better way to do this is to start with separate indicators for growing maize, growing betel leaf, and growing anything else. Then they can be combined with simple boolean operators to get the variables you want.

    The matter is complicated by the existence of some observations where crop_a has a missing value. As you do not explain what this means, I am interpreting it as "nothing grown on this plot."

    With those considerations:
    Code:
    by a01, sort: egen byte grow_maize = max(crop_a == 23)
    by a01: egen byte grow_betel_leaf = max(crop_a == 603)
    by a01: egen byte grow_other = max(!inlist(crop_a, 23, 603, .))
    
    gen byte grow_only_maize = grow_maize & !grow_betel_leaf & !grow_other
    gen byte grow_maize_and_betel_leaf_only = grow_maize & grow_betel_leaf & !grow_other
    gen byte grow_maize_betel_leaf_and_other = grow_maize & grow_betel_leaf & grow_other
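
    The three indicators above are constant within each household but repeated on every plot-level observation, so to get the number of households you need to count one observation per household. A minimal sketch, assuming the variables created above already exist; hh_tag is a hypothetical name introduced here for illustration:

    ```stata
    * Flag exactly one observation per household (hh_tag is an assumed name)
    egen byte hh_tag = tag(a01)

    * Count households, not plot-level observations, in each category
    count if hh_tag & grow_only_maize
    count if hh_tag & grow_maize_and_betel_leaf_only
    count if hh_tag & grow_maize_betel_leaf_and_other
    ```

    Note that crop code 23 (maize) does not appear in the posted example data, so all three counts would be 0 there; the full dataset should give more informative results.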
