Calculating proportions of several categories

Kamala Kaghoma

Join Date: Dec 2019

Posts: 26
#1


07 Dec 2019, 18:41

Hello dear all,
I am having serious trouble with the -proportion- command. I have a series of neighborhoods (psu), each within a milieu, which is in turn within a province. I am trying, in vain, to get the proportion of people of the same tribe in each neighborhood. In my dataset, tribe is coded as a three-digit number. Whenever I try
by prov milieu psu, sort: prop tribe
Stata refuses to apply prop and responds that proportion does not work with "b". When I change this to
proportion (tribe), over (psu)
Stata says I have too many options, while the same command works with a variable like "sex", which has only two categories, instead of "psu", which has 660. I have tried to write loops for this, but I still cannot get it right. I would very much appreciate your help.
Thanks in advance
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#2

07 Dec 2019, 21:58

Well, if -proportion- could be combined with -by-, think what you would get: you would get 661 outputs of proportion. If you have, say, a total of 5 tribes, that's over 3300 individual proportions listed out in your Results window and log file. It would be nearly impossible to make any use of that data in that form.

You didn't provide example data, so I'm emulating a tiny toy version of your data set by changing the names of some variables in the auto.dta that comes with your Stata installation. This only has 2 "psu"s and 5 "tribe"s, but it will work no matter how many of these things you have:

Code:

sysuse auto, clear
rename rep78 tribe
rename foreign psu
keep psu tribe
levelsof tribe, local(tribes)
foreach t of local tribes {
    by psu, sort: egen tribe`t'_prop = mean(`t'.tribe)
}

The code assumes that tribe is a numeric variable. If it is a string, then -encode- it to create a numeric equivalent and use that instead.

At the end of this code, your data set will contain new variables, one for each tribe. The value of a tribe's variable in any observation will be that tribe's proportion in that observation's psu. Evidently this information is very repetitious, and if what you ultimately want is one observation per psu containing the psu identifier and the tribe proportions, use the -collapse- command at the end.
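
A minimal sketch of that final -collapse- step, reusing the toy auto.dta setup above. The choice of (first) is an assumption on my part: since each proportion is constant within a psu, (mean) should give the same result.

```stata
* Toy setup: auto.dta with rep78 standing in for tribe and foreign for psu
sysuse auto, clear
rename rep78 tribe
rename foreign psu
keep psu tribe

* One proportion variable per tribe, constant within each psu
levelsof tribe, local(tribes)
foreach t of local tribes {
    by psu, sort: egen tribe`t'_prop = mean(`t'.tribe)
}

* Reduce to one observation per psu, keeping the psu id and the proportions
collapse (first) tribe*_prop, by(psu)
list
```

This leaves one row per psu and one column per tribe.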

Kamala Kaghoma

Join Date: Dec 2019
Posts: 26
#3

08 Dec 2019, 04:59

Many many thanks Clyde. I do appreciate your prompt reaction to my request.

I am sorry, my request was not very clear, whereas the advice you provided is really clear. However, it does not yet give me a single variable containing the computed proportion for each tribe. In fact, my final aim is a variable such that, in any psu, all individuals from the same tribe have the same value: the proportion of members of their own tribe in that psu. I have tried to mimic the structure of my data, using the same coding for tribe and psu as in the dataset.

id	prov	milieu	psu	tribe
1	lemba	city	1	11
2	lemba	city	1	33
3	lemba	city	1	11
4	lemba	village	1	36
5	lemba	village	1	39
6	lemba	village	1	39
7	lemba	village	1	39
8	lemba	Town	2	11
9	lemba	Town	2	11
10	lemba	Town	3	11
11	lemba	city	4	11
12	lemba	city	4	36
13	lemba	city	4	36
14	lemba	Town	3	39
15	lemba	Town	3	39
16	lemba	Town	3	33
17	lemba	Town	3	38
18	lemba	Town	2	40
19	Kazozo	Town	6	68
20	Kazozo	Town	6	68
21	Kazozo	Town	6	68
22	Kazozo	Town	6	39
23	Kazozo	Town	6	38
24	Kazozo	village	6	39
25	Kazozo	village	7	11
26	Kazozo	village	7	11
27	Kazozo	village	7	11
28	Kazozo	village	7	11
29	Kazozo	village	7	11
30	Kazozo	village	7	36
31	Kazozo	village	7	36
32	Kazozo	city	8	36
33	Kazozo	city	8	36
34	Kazozo	city	8	36
35	Kazozo	city	8	39
36	Kazozo	city	8	39
37	Kazozo	city	8	39
38	Kazozo	city	8	38
39	Kazozo	village	9	38
40	Kazozo	village	9	11
41	Kazozo	village	10	68
42	Kazozo	village	10	68
43	Kazozo	Town	10	11

With this, the idea would be to have an additional column containing, for each individual, the proportion of members of his or her tribe in the same PSU.
Many thanks again


Kamala Kaghoma

Join Date: Dec 2019
Posts: 26
#4

08 Dec 2019, 05:01

id	prov	milieu	psu	tribe
1	lemba	city	1	11
2	lemba	city	1	33
3	lemba	city	1	11
4	lemba	village	1	36
5	lemba	village	1	39
6	lemba	village	1	39
7	lemba	village	1	39
8	lemba	Town	2	11
9	lemba	Town	2	11
10	lemba	Town	3	11
11	lemba	city	4	11
12	lemba	city	4	36
13	lemba	city	4	36
14	lemba	Town	3	39
15	lemba	Town	3	39
16	lemba	Town	3	33
17	lemba	Town	3	38
18	lemba	Town	2	40
19	Kazozo	Town	6	68
20	Kazozo	Town	6	68
21	Kazozo	Town	6	68
22	Kazozo	Town	6	39
23	Kazozo	Town	6	38
24	Kazozo	village	6	39
25	Kazozo	village	7	11
26	Kazozo	village	7	11
27	Kazozo	village	7	11
28	Kazozo	village	7	11
29	Kazozo	village	7	11
30	Kazozo	village	7	36
31	Kazozo	village	7	36
32	Kazozo	city	8	36
33	Kazozo	city	8	36
34	Kazozo	city	8	36
35	Kazozo	city	8	39
36	Kazozo	city	8	39
37	Kazozo	city	8	39
38	Kazozo	city	8	38
39	Kazozo	village	9	38
40	Kazozo	village	9	11
41	Kazozo	village	10	68
42	Kazozo	village	10	68
43	Kazozo	Town	10	11


Kamala Kaghoma

Join Date: Dec 2019

Posts: 26
#5

08 Dec 2019, 05:01

Please consider the second table. Thanks.

Clyde Schechter

Join Date: Apr 2014
Posts: 30119
#6

08 Dec 2019, 14:58

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id str6 prov str7 milieu byte(psu tribe)
 1 "lemba"  "city"     1 11
 2 "lemba"  "city"     1 33
 3 "lemba"  "city"     1 11
 4 "lemba"  "village"  1 36
 5 "lemba"  "village"  1 39
 6 "lemba"  "village"  1 39
 7 "lemba"  "village"  1 39
 8 "lemba"  "Town"     2 11
 9 "lemba"  "Town"     2 11
10 "lemba"  "Town"     3 11
11 "lemba"  "city"     4 11
12 "lemba"  "city"     4 36
13 "lemba"  "city"     4 36
14 "lemba"  "Town"     3 39
15 "lemba"  "Town"     3 39
16 "lemba"  "Town"     3 33
17 "lemba"  "Town"     3 38
18 "lemba"  "Town"     2 40
19 "Kazozo" "Town"     6 68
20 "Kazozo" "Town"     6 68
21 "Kazozo" "Town"     6 68
22 "Kazozo" "Town"     6 39
23 "Kazozo" "Town"     6 38
24 "Kazozo" "village"  6 39
25 "Kazozo" "village"  7 11
26 "Kazozo" "village"  7 11
27 "Kazozo" "village"  7 11
28 "Kazozo" "village"  7 11
29 "Kazozo" "village"  7 11
30 "Kazozo" "village"  7 36
31 "Kazozo" "village"  7 36
32 "Kazozo" "city"     8 36
33 "Kazozo" "city"     8 36
34 "Kazozo" "city"     8 36
35 "Kazozo" "city"     8 39
36 "Kazozo" "city"     8 39
37 "Kazozo" "city"     8 39
38 "Kazozo" "city"     8 38
39 "Kazozo" "village"  9 38
40 "Kazozo" "village"  9 11
41 "Kazozo" "village" 10 68
42 "Kazozo" "village" 10 68
43 "Kazozo" "Town"    10 11
end

gen proportion = .
levelsof tribe, local(tribes)
foreach t of local tribes {
   by psu, sort: egen tribe`t'_prop = mean(`t'.tribe)
   replace proportion = tribe`t'_prop if tribe == `t'
}
drop tribe*_prop
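
A loop-free sketch of the same calculation, which may be convenient when there are many tribes: count the people in each psu-tribe cell and divide by the size of the psu. The names n_tribe and n_psu are hypothetical, the data are a small made-up example, and the sketch assumes tribe has no missing values.

```stata
* Small made-up example: two psus, a few tribes
clear
input byte(id psu tribe)
1 1 11
2 1 33
3 1 11
4 1 36
5 2 11
6 2 11
7 2 40
end

* Number of people of the same tribe in the same psu
bysort psu tribe: gen n_tribe = _N
* Number of people in the psu
bysort psu: gen n_psu = _N
* Each person's own-tribe share within the psu
gen proportion = n_tribe / n_psu

sort id
list, sepby(psu)
```

Every member of a given tribe in a given psu ends up with the same value of proportion, which is the layout asked for in this thread.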

In the future, when showing data examples, please use the -dataex- command to do so, as I have in this example. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.


Kamala Kaghoma

Join Date: Dec 2019

Posts: 26
#7

10 Dec 2019, 04:01

Dear Clyde,
Many thanks for the guidance and the advice on the use of -dataex-. This is much appreciated. Many thanks.
Kamala Kaghoma

Join Date: Dec 2019

Posts: 26
#8

26 Dec 2019, 14:43

Dear all,

I am sorry to come back with a request for help. I want to build a variable which, for each observation, gives the number of migrants who arrived before the observed individual. I have the duration of stay in the migrant's destination place (durstay) in the last column of the data set below. The idea would thus be to have, for each observation, the number of other individuals with a duration of stay greater than its own. I’ve tried several things without success.

I would very much appreciate your help.

clear
input byte id str6 prov str7 milieu byte(psu tribe durstay)
1 "lemba" "city" 1 11 2
2 "lemba" "city" 1 33 4
3 "lemba" "city" 1 11 4
4 "lemba" "village" 1 36 2
5 "lemba" "village" 1 39 1
6 "lemba" "village" 1 39 2
7 "lemba" "village" 1 39 3
8 "lemba" "Town" 2 11 5
9 "lemba" "Town" 2 11 10
10 "lemba" "Town" 3 11 11
11 "lemba" "city" 4 11 1
12 "lemba" "city" 4 36 4
13 "lemba" "city" 4 36 4
14 "lemba" "Town" 3 39 7
15 "lemba" "Town" 3 39 8
16 "lemba" "Town" 3 33 3
17 "lemba" "Town" 3 38 1
18 "lemba" "Town" 2 40 1
19 "Kazozo" "Town" 6 68 2
20 "Kazozo" "Town" 6 68 3
21 "Kazozo" "Town" 6 68 4
22 "Kazozo" "Town" 6 39 2
23 "Kazozo" "Town" 6 38 4
24 "Kazozo" "village" 6 39 5
25 "Kazozo" "village" 7 11 2
26 "Kazozo" "village" 7 11 2
27 "Kazozo" "village" 7 11 3
28 "Kazozo" "village" 7 11 2
29 "Kazozo" "village" 7 11 3
30 "Kazozo" "village" 7 36 1
31 "Kazozo" "village" 7 36 5
32 "Kazozo" "city" 8 36 2
33 "Kazozo" "city" 8 36 6
34 "Kazozo" "city" 8 36 8
35 "Kazozo" "city" 8 39 7
36 "Kazozo" "city" 8 39 3
37 "Kazozo" "city" 8 39 4
38 "Kazozo" "city" 8 38 9
39 "Kazozo" "village" 9 38 1
40 "Kazozo" "village" 9 11 2
41 "Kazozo" "village" 10 68 3
42 "Kazozo" "village" 10 68 2
43 "Kazozo" "Town" 10 11 5
end
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#9

26 Dec 2019, 15:09

Code:

rangestat (count) num_with_longer_stay = durstay, interval(durstay 1 .)

Note, you do not say you want to do this separately by prov, so the code above does a count over the entire data set. If you want it separately by prov, just add a -by()- option to the command.

-rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer, and is available from SSC.
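
A small sketch of the -by()- variant on made-up data, assuming -rangestat- has been installed from SSC. As far as I know, -rangestat- leaves the count missing when no observation falls in the interval, so the -replace- line is an assumption-driven convenience for turning those into zeros.

```stata
* One-time install, if needed:
* ssc install rangestat

* Made-up example: two provinces, integer durations of stay
clear
input byte id str6 prov byte durstay
1 "lemba"  2
2 "lemba"  4
3 "lemba"  4
4 "Kazozo" 1
5 "Kazozo" 3
end

* Within each prov, count observations whose durstay is at least own durstay + 1
rangestat (count) num_with_longer_stay = durstay, interval(durstay 1 .) by(prov)

* Assumption: empty intervals give missing, so recode them as zero
replace num_with_longer_stay = 0 if missing(num_with_longer_stay)

list, sepby(prov)
```

Because the durations are integers, an interval starting at durstay + 1 is the same as "strictly longer than my own stay"; with fractional durations the lower offset would need rethinking.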

This question is not really relevant to the topic that started this thread. In the future, when switching topic, please start a new thread. While it is tempting to think of these threads as a dialog between a questioner and an answerer, in fact, other people read along, and still others come and search this Forum for previous answers to their questions. Their ability to do that relies upon the titles of the threads being correct. So please always choose a descriptive title for your thread, and start a new one if you want to go off topic.
Kamala Kaghoma

Join Date: Dec 2019

Posts: 26
#10

27 Dec 2019, 00:52

Dear Clyde, Many thanks for all. I really appreciate your help.
Kamala Kaghoma

Join Date: Dec 2019

Posts: 26
#12

31 Dec 2019, 15:52

Dear Clyde,
Sorry to come back to you on the same issue. I have the impression that rangestat does not work with more than one condition, or cannot be combined with some other standard Stata syntax. In fact, based on my earlier example, I wanted to consider two conditions: in each psu, associating with each individual the number of people of the same tribe who have a longer stay than her/himself. I’ve tried several things without success, such as:

Code:

rangestat (count) num_longer_stay = durstay, interval(durstay 1 .) & tribe== tribe[`i'] by(psu)

or

Code:

keep if num_longer_stay!=.

The first one does not work. While the second works as a Stata command, it does not really provide what I want. Is there any way of combining other standard commands with rangestat?

Many thanks once more for your help.
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#13

31 Dec 2019, 17:11

-rangestat- does not allow the kind of syntax you proposed. BUT if I grasp what you want here, you can get it very simply with:

Code:

rangestat (count) num_longer_stay = durstay, interval(durstay 1 .) by(psu tribe)

That should do what I think you are getting at in #12.
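
One way to cross-check the -rangestat- call with by(psu tribe) using only built-in commands is sketched below on made-up data. The variable names rank_hi and n_longer are hypothetical. Note that this version gives 0, not missing, for people with nobody ahead of them, and it assumes durstay has no missing values.

```stata
* Made-up example: ties in durstay within a psu-tribe cell
clear
input byte(id psu tribe durstay)
1 1 11 2
2 1 11 4
3 1 11 4
4 1 39 1
5 1 39 2
6 1 39 3
7 2 11 5
end

* Position of each person within the psu-tribe cell, ordered by durstay
bysort psu tribe (durstay): gen long rank_hi = _n
* Tied durations all take the highest position among the ties
bysort psu tribe durstay (rank_hi): replace rank_hi = rank_hi[_N]
* Everyone positioned after that has a strictly longer stay
bysort psu tribe: gen n_longer = _N - rank_hi

sort id
list, sepby(psu tribe)
```

If the nonmissing values of num_longer_stay from -rangestat- agree with n_longer, the by(psu tribe) call is doing what was intended.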

As for -keep if num_longer_stay != .- not doing what you want, what is it that you do want? I'm sure there is some way to get it if you make it clear.
Kamala Kaghoma

Join Date: Dec 2019

Posts: 26
#15

01 Jan 2020, 00:54

Dear Clyde, Many thanks.
I tried

Code:

rangestat (count) num_longer_stay = durstay, interval(durstay 1 .) by(psu tribe)

several times and was misreading the results. I've just tried it again and see that it really gives the figures I need. Many thanks once more and Happy new year 2020.