Differentiating between different types of farmers - how can I find out exactly how many?

Kevin Marian

Join Date: Jan 2020
Posts: 14


18 Jan 2020, 16:10

Hi Statalist!

I was wondering if anyone could help me. I am new to Statalist and I can’t seem to find the answers I am looking for.

I have survey data at the individual level and I would like to differentiate between coffee farmers and maize farmers.
Some farmers are engaged in more than one type of farming (e.g. coffee, bananas and potatoes), and that’s okay as long as a farmer is not engaged in both coffee AND maize farming.
I would like to separate the effects of, or identify the differences between, being a coffee farmer and being a maize farmer.

I have 8 variables that describe up to 8 different crops that a farmer grows (firstcrop, secondcrop, thirdcrop, fourthcrop etc.), and each of these variables can take up to 50 different values (crop codes).
For example:

iid	firstcrop	secondcrop	thirdcrop	fourthcrop	fifthcrop	sixthcrop	seventhcrop	eighthcrop
01	01	24	19	40	-	-	-	-
02	17	01	-	-	-	-	-	-
03	19	02	-	-	-	-	-	-

So the first farmer (with iid 1) grows coffee (01) as their first crop, rice (24) as their second crop, maize (19) as their third crop and pineapples (40) as their fourth.
The second farmer grows sweet potatoes (17) as their first crop and coffee (01) as their second crop.
The third farmer grows maize (19) as their first crop and tea (02) as their second crop.

So what I would like to find out is:

How many farmers grow coffee but do not grow maize at all (so how many farmers are like the second farmer in the table above)

How many farmers grow maize but do not grow coffee at all (so how many farmers are like the third farmer in the table above)

And then how many farmers grow both coffee and maize (so how many farmers are like the first farmer in the table above)

Please note - some farmers actually grow 5/6/7 different crops, this is just a simplified example.

I’m not quite sure how to go about this and I would appreciate any help I can get.

Thank you in advance!

Kevin

Tags: cross tabulation, frequency, statistics, summary, survey

Kevin Marian

Join Date: Jan 2020

Posts: 14
#2

18 Jan 2020, 16:13

Sorry, about the three 1s, should be 1. 2. and 3. of course.
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#3

18 Jan 2020, 16:35

While this can be done with the wide data layout you have, it is simpler if we first go to long layout. Moreover, most if not all of what you will want to do subsequently is also easier in long layout. (Almost all things in Stata are.) If you have a compelling reason to return to wide layout for further work, -reshape wide- will take you back.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(iid firstcrop secondcrop thirdcrop fourthcrop fifthcrop sixthcrop seventhcrop eighthcrop)
1 1 24 19 40 . . . .
2 17 1 . . . . . .
3 19 2 . . . . . .
end
reshape long @crop, i(iid) j(seq) string
by iid, sort: egen byte grows_maize = max(crop == 19)
by iid: egen byte grows_coffee = max(crop == 1)
gen byte coffee_but_not_maize = grows_coffee & !grows_maize
gen byte maize_but_not_coffee = grows_maize & !grows_coffee
gen byte grows_both = grows_maize & grows_coffee

will get you new variables in the data set which identify which farmers grow one crop but not the other, and which grow both. You said you want to count how many such farmers there are, but you don't say in what form you want that. You might just want the counts displayed in the Results window and your log file. Or maybe you want new variables in the data set. I'll assume the former here:

Code:

egen iid_flag = tag(iid) // AVOID REPEAT COUNTING OF SAME FARMER
foreach v of varlist grows_maize-grows_both {
    display `"`v'"'
    count if `v' & iid_flag
}
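If a single table is more convenient than separate counts, one possible variation is to collapse the indicators into one categorical variable and tabulate it. This is only a sketch: it rebuilds the three-farmer example so that it runs on its own, and the variable name farmer_type and its value labels are made up here for illustration, not part of the original data.

```stata
* Sketch only: rebuild the thread's three-farmer example in long layout
clear
input byte(iid firstcrop secondcrop thirdcrop fourthcrop fifthcrop sixthcrop seventhcrop eighthcrop)
1 1 24 19 40 . . . .
2 17 1 . . . . . .
3 19 2 . . . . . .
end
reshape long @crop, i(iid) j(seq) string
by iid, sort: egen byte grows_maize = max(crop == 19)
by iid: egen byte grows_coffee = max(crop == 1)
egen iid_flag = tag(iid)   // one flagged row per farmer

* farmer_type (hypothetical name): 0 neither, 1 coffee only, 2 maize only, 3 both
gen byte farmer_type = grows_coffee + 2*grows_maize
label define farmer_type 0 "neither" 1 "coffee only" 2 "maize only" 3 "both"
label values farmer_type farmer_type

* Restrict to the flagged row so each farmer is counted once
tabulate farmer_type if iid_flag
```

A single categorical variable like this may also be handy later as a grouping variable when comparing coffee-only and maize-only farmers.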

In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

Last edited by Clyde Schechter; 18 Jan 2020, 16:38.
Kevin Marian

Join Date: Jan 2020

Posts: 14
#4

19 Jan 2020, 16:14

Clyde Schechter thank you so much for your help! The commands work and I definitely would not have figured that out myself.

Just curious, if I wanted to create a variable that identified farmers that grew coffee as either their firstcrop, secondcrop or thirdcrop, what command would I have to run then?
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#5

19 Jan 2020, 16:43

Working with the long data layout created by the earlier solution:

Code:

by iid, sort: egen coffee_123 = max(crop == 1 & inlist(seq, "first", "second", "third"))

However, if this were the only such indicator to be made, this particular one would be a bit simpler in the original wide data:

Code:

gen byte coffee_123 = inlist(1, firstcrop, secondcrop, thirdcrop)
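In the same wide-layout spirit, the coffee/maize indicators from earlier in the thread could probably also be built without reshaping, using -egen-'s anymatch() function. The following is a sketch under the assumptions of the example data (1 = coffee, 19 = maize, and the eight crop variables stored contiguously from firstcrop to eighthcrop); it is an alternative to, not a replacement for, the long-layout approach.

```stata
* Sketch only: wide-layout version of the coffee/maize indicators,
* using the three-farmer example from earlier in the thread
clear
input byte(iid firstcrop secondcrop thirdcrop fourthcrop fifthcrop sixthcrop seventhcrop eighthcrop)
1 1 24 19 40 . . . .
2 17 1 . . . . . .
3 19 2 . . . . . .
end

* anymatch() is 1 if any variable in the list equals a listed value, else 0
egen byte grows_coffee = anymatch(firstcrop-eighthcrop), values(1)
egen byte grows_maize = anymatch(firstcrop-eighthcrop), values(19)

gen byte coffee_but_not_maize = grows_coffee & !grows_maize
gen byte maize_but_not_coffee = grows_maize & !grows_coffee
gen byte grows_both = grows_coffee & grows_maize

* Wide layout has one row per farmer, so no tag() flag is needed when counting
count if coffee_but_not_maize
count if maize_but_not_coffee
count if grows_both
```

The trade-off is the one already noted above: this is compact for a handful of indicators, but most follow-up work tends to be easier in long layout.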
Kevin Marian

Join Date: Jan 2020

Posts: 14
#6

19 Jan 2020, 17:17

Amazing, thank you once again Clyde! I appreciate your help.