
  • Differentiating between different types of farmers - how can I find out exactly how many?

    Hi Statalist!

    I was wondering if anyone could help me. I am new to Statalist and I can’t seem to find the answers I am looking for.

    I have survey data at the individual level and I would like to differentiate between coffee farmers and maize farmers.
    Some farmers are engaged in more than one type of farming (e.g. coffee, bananas and potatoes), and that's okay as long as a farmer is not engaged in both coffee AND maize farming.
    I would like to separate the effects of, and identify the differences between, being a coffee farmer and being a maize farmer.

    I have 8 variables that describe up to 8 different crops that a farmer grows (firstcrop, secondcrop, thirdcrop, fourthcrop etc.), and each of these variables can take up to 50 different numeric values.
    For example:
    iid  firstcrop  secondcrop  thirdcrop  fourthcrop  fifthcrop  sixthcrop  seventhcrop  eighthcrop
    01   01         24          19         40          -          -          -            -
    02   17         01          -          -           -          -          -            -
    03   19         02          -          -           -          -          -            -
    So the first farmer (with iid 1) grows coffee (01) as their first crop, rice (24) as their second crop, maize (19) as their third crop and pineapples (40) as their fourth.
    The second farmer grows sweet potatoes (17) as their first crop and coffee (01) as their second crop.
    The third farmer grows maize (19) as their first crop and tea (02) as their second crop.

    So what I would like to find out is:
    1. How many farmers grow coffee but do not grow maize at all (so how many farmers are like the second farmer in the table above)
    1. How many farmers grow maize but do not grow coffee at all (so how many farmers are like the third farmer in the table above)
    1. And then how many farmers grow both coffee and maize (so how many farmers are like the first farmer in the table above)
    Please note - some farmers actually grow 5/6/7 different crops, this is just a simplified example.

    I’m not quite sure how to go about this and I would appreciate any help I can get.

    Thank you in advance!

    Kevin

  • #2
    Sorry about the three 1s; they should be 1., 2. and 3., of course.



    • #3
      While this can be done with the wide data layout you have, it is simpler if we first go to long layout. Moreover, most if not all of what you will want to do subsequently is also easier in long layout. (Almost all things in Stata are.) If you have a compelling reason to return to wide layout for further work, -reshape wide- will take you back.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input byte(iid firstcrop secondcrop thirdcrop fourthcrop fifthcrop sixthcrop seventhcrop eighthcrop)
      1 1 24 19 40 . . . .
      2 17 1 . . . . . .
      3 19 2 . . . . . .
      end
      
      reshape long @crop, i(iid) j(seq) string
      
      by iid, sort: egen byte grows_maize = max(crop == 19)
      by iid: egen byte grows_coffee = max(crop == 1)
      gen byte coffee_but_not_maize = grows_coffee & !grows_maize
      gen byte maize_but_not_coffee = grows_maize & !grows_coffee
      gen byte grows_both = grows_maize & grows_coffee
      This will get you new variables in the data set that identify which farmers grow one crop but not the other, and which grow both. You said you want to count how many such farmers there are, but you don't say in what form you want that. You might just want the counts displayed in the Results window and your log file. Or maybe you want new variables in the data set. I'll assume the former here:
      Code:
      egen iid_flag = tag(iid) // AVOID REPEAT COUNTING OF SAME FARMER
      foreach v of varlist grows_maize-grows_both {
          display `"`v'"'
          count if `v' & iid_flag
      }
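
      For the other case mentioned above, where the counts are wanted as new variables rather than displayed output, one possible sketch follows. It is not part of the original reply: the variable names n_coffee_only, n_maize_only and n_both are hypothetical, and the example data are simply rebuilt so that the block runs on its own.

      ```stata
      * Rebuild the three-farmer example and go to long layout, as in the code above
      clear
      input byte(iid firstcrop secondcrop thirdcrop fourthcrop fifthcrop sixthcrop seventhcrop eighthcrop)
      1 1 24 19 40 . . . .
      2 17 1 . . . . . .
      3 19 2 . . . . . .
      end
      reshape long @crop, i(iid) j(seq) string

      by iid, sort: egen byte grows_maize = max(crop == 19)
      by iid: egen byte grows_coffee = max(crop == 1)
      egen byte iid_flag = tag(iid)   // marks exactly one observation per farmer

      * total() sums the 0/1 expression over all observations;
      * including iid_flag means each farmer is counted once
      egen n_coffee_only = total(grows_coffee & !grows_maize & iid_flag)
      egen n_maize_only  = total(grows_maize & !grows_coffee & iid_flag)
      egen n_both        = total(grows_coffee & grows_maize & iid_flag)

      * A 2x2 table of farmers shows the same counts in one display
      tabulate grows_coffee grows_maize if iid_flag
      ```

      Each new variable holds the same constant in every observation, so it survives a later -reshape wide- or a -keep if iid_flag-.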

      In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

      Last edited by Clyde Schechter; 18 Jan 2020, 16:38.



      • #4
        Clyde Schechter thank you so much for your help! The commands work and I definitely would not have figured that out myself.

        Just curious, if I wanted to create a variable that identified farmers that grew coffee as either their firstcrop, secondcrop or thirdcrop, what command would I have to run then?



        • #5
          Working with the long data layout created by the earlier solutions:
          Code:
          by iid, sort: egen coffee_123 = max(crop == 1 & inlist(seq, "first", "second", "third"))
          However, if this were the only such indicator to be made, this particular one would be a bit simpler in the original wide data:

          Code:
          gen byte coffee_123 = inlist(1, firstcrop, secondcrop, thirdcrop)
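
          The same -inlist()- idea can be extended to the original question without reshaping. This is only a sketch under the thread's coding (coffee = 1, maize = 19), using the three-farmer example data; it is not code from the original replies.

          ```stata
          * Wide-layout sketch: flag each crop by checking all eight crop variables
          clear
          input byte(iid firstcrop secondcrop thirdcrop fourthcrop fifthcrop sixthcrop seventhcrop eighthcrop)
          1 1 24 19 40 . . . .
          2 17 1 . . . . . .
          3 19 2 . . . . . .
          end

          * inlist(1, ...) is true if any listed variable equals 1; missing values never match
          gen byte grows_coffee = inlist(1, firstcrop, secondcrop, thirdcrop, fourthcrop, ///
              fifthcrop, sixthcrop, seventhcrop, eighthcrop)
          gen byte grows_maize = inlist(19, firstcrop, secondcrop, thirdcrop, fourthcrop, ///
              fifthcrop, sixthcrop, seventhcrop, eighthcrop)

          * One row per farmer in wide layout, so no tagging is needed before counting
          count if grows_coffee & !grows_maize
          count if grows_maize & !grows_coffee
          count if grows_coffee & grows_maize
          ```

          Because the wide data have one observation per farmer, each -count- is already a count of farmers. The long layout is still the more convenient choice when many such indicators are needed.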



          • #6
            Amazing, thank you once again Clyde! I appreciate your help.
