Selection by zone and rank

Harish Kumar

Join Date: Apr 2019
Posts: 13

27 Nov 2019, 07:56

Hi,

I want to select 10 of the 24 districts in my data, using the zone and rank variables. There are 5 zones. Within each zone I want to select 2 districts: the ones with the lowest and highest rank values (the poorest and best performing districts). A sample of the data is pasted below. Could you please suggest some code? I would be grateful.

With Thanks
Harish

district	children(6-59 months)	children(5-9 years)	adolescents(10-19 years)	pregnant_women	average_value	rank	zone	zonecode
Pashchimi Singhbhum	5.4	12.5	16.3	86.2	30.1	13	Kolhan division	1
Purbi Singhbhum	1.0	3.3	23.2	88.0	28.9	15	Kolhan division	1
Saraikela	3.2	3.0	24.1	81.0	27.8	17	Kolhan division	1
Dhanbad	1.4	13.4	95.0	67.5	44.3	2	North Chotanagpur division	2
Kodarma	4.5	10.3	46.4	73.5	33.7	7	North Chotanagpur division	2
Bokaro	1.4	11.9	44.5	71.5	32.3	8	North Chotanagpur division	2
Giridih	1.7	10.3	11.6	90.6	28.6	16	North Chotanagpur division	2
Ramgarh	4.0	0.0	18.1	86.3	27.1	19	North Chotanagpur division	2
Chatra	2.2	0.0	9.0	95.0	26.6	20	North Chotanagpur division	2
Hazaribagh	3.1	0.0	0.0	95.0	24.5	22	North Chotanagpur division	2
Palamu	0.4	0.0	39.9	87.0	31.8	10	Palamu division	3
Latehar	3.1	0.6	10.6	95.0	27.3	18	Palamu division	3
Garhwa	1.8	0.2	4.2	95.0	25.3	21	Palamu division	3
Dumka	8.4	30.7	39.2	86.8	41.3	3	Santhal Pargana division	4
Deoghar	1.4	5.1	64.6	69.3	35.1	4	Santhal Pargana division	4
Godda	1.5	5.1	46.3	87.0	35.0	5	Santhal Pargana division	4
Pakur	1.1	16.6	23.6	79.5	30.2	12	Santhal Pargana division	4
Jamtara	2.4	0.0	0.0	77.6	20.0	23	Santhal Pargana division	4
Sahibganj	1.9	0.0	0.0	70.8	18.2	24	Santhal Pargana division	4
Lohardaga	5.8	35.7	76.4	92.6	52.6	1	South Chotanagpur division	5
Simdega	0.9	1.9	38.1	95.0	34.0	6	South Chotanagpur division	5
Khunti	7.7	3.3	36.0	81.4	32.1	9	South Chotanagpur division	5
Ranchi	0.7	18.4	52.2	55.0	31.6	11	South Chotanagpur division	5
Gumla	3.2	9.2	56.8	49.2	29.6	14	South Chotanagpur division	5

Lakshman Balaji

Join Date: Oct 2019

Posts: 4
#2

27 Nov 2019, 09:30

Hi Harish,

If I understand you correctly, you would like to find out what the highest and lowest ranked district within each zone are.

Try this.

Code:

bysort zone: egen group_rank = rank(rank)
encode zone, gen(zone_coded)
levelsof zone_coded, local(divs)

foreach d of local divs {
    di "Rankings of districts in Zone" " " `d'
    list zone district group_rank if zone_coded == `d'
}

Should give you a printout for the districts in every zone, with a group rank attached.

Best,
Lakshman

Last edited by Lakshman Balaji; 27 Nov 2019, 09:34.

Lakshman Balaji

Join Date: Oct 2019
Posts: 4

27 Nov 2019, 12:02

Another way to do this would be:

Code:


* Rank districts within each zone, based on the existing rank variable
bysort zone: egen group_rank = rank(rank)
bysort zone: egen Minimum = min(group_rank)
bysort zone: egen Maximum = max(group_rank)

* Flag the lowest- and highest-ranked district in each zone
gen samemin = (group_rank == Minimum)
gen samemax = (group_rank == Maximum)

* Create a numeric zone code to loop over
encode zone, gen(zone_coded)
levelsof zone_coded, local(divs)

foreach d of local divs {
    di "Minimum ranked district in Zone" " " `d'
    list zone district group_rank if zone_coded == `d' & samemin == 1
    di "Maximum ranked district in Zone" " " `d'
    list zone district group_rank if zone_coded == `d' & samemax == 1
}
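
The same two districts per zone could probably also be picked out without the loop, by sorting within zone and flagging the first and last observation. This is only a sketch: it assumes the variables are named zone, district and rank as in the sample data, and that rank has no ties or missing values within a zone. The flag name selected is made up for illustration, and the input block just re-enters six rows of the posted sample so the example runs on its own.

```stata
clear
input str20 district byte rank str30 zone
"Pashchimi Singhbhum" 13 "Kolhan division"
"Purbi Singhbhum"     15 "Kolhan division"
"Saraikela"           17 "Kolhan division"
"Palamu"              10 "Palamu division"
"Latehar"             18 "Palamu division"
"Garhwa"              21 "Palamu division"
end

* Sort by rank within each zone, then flag the first and last district
* (selected is a hypothetical name, not a variable from the thread)
bysort zone (rank): gen byte selected = (_n == 1 | _n == _N)

* Show only the flagged districts, with a separator line between zones
list zone district rank if selected, sepby(zone) noobs
```

With ties in rank, the order within the tied values is arbitrary, so the egen approach above would be the safer choice in that case.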

Best,
Lakshman

Harish Kumar

Join Date: Apr 2019

Posts: 13
#4

28 Nov 2019, 03:37

Thanks Lakshman!

The second set of code works for my purpose. I am not generating any new rank variable, only using the one that is already in my data.
When I used the 'sample' command in Stata, only the selected districts were left in my data browser. I was wondering whether the same can be done here.

Thanks
Harish
Lakshman Balaji

Join Date: Oct 2019

Posts: 4
#5

28 Nov 2019, 09:44

Harish,

I also used the rank variable that was already in your data. The group_rank variable that I created only helps to identify the first- and last-ranked districts within each zone.

Yes, this code should work on the entire dataset as well. The printed output with the messages might be too large, though. In that case, I would recommend subsetting the data using the samemin column to get a dataset of all the minimum ranked districts, using the samemax column to get a dataset of all the maximum ranked districts, and then joining the two datasets.
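
A minimal sketch of that subsetting step, assuming the samemin and samemax flags are built as in the earlier code. Rather than creating two datasets and appending them, it uses a single keep if on either flag, which should leave the same rows. The input block only re-enters eight rows of the posted sample so the example runs on its own.

```stata
clear
input str20 district byte rank str30 zone
"Palamu"    10 "Palamu division"
"Latehar"   18 "Palamu division"
"Garhwa"    21 "Palamu division"
"Lohardaga"  1 "South Chotanagpur division"
"Simdega"    6 "South Chotanagpur division"
"Khunti"     9 "South Chotanagpur division"
"Ranchi"    11 "South Chotanagpur division"
"Gumla"     14 "South Chotanagpur division"
end

* Rebuild the flags from the earlier code
bysort zone: egen group_rank = rank(rank)
bysort zone: egen Minimum = min(group_rank)
bysort zone: egen Maximum = max(group_rank)
gen samemin = (group_rank == Minimum)
gen samemax = (group_rank == Maximum)

* Keep only the flagged districts; this drops all other rows from memory
keep if samemin == 1 | samemax == 1

* The data browser now shows only the selected districts
sort zone rank
list zone district rank, sepby(zone) noobs
```

Because keep discards the other districts, it may be worth saving the full dataset first, or wrapping the step in preserve and restore.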

Best,
Lakshman

Last edited by Lakshman Balaji; 28 Nov 2019, 09:50.
Harish Kumar

Join Date: Apr 2019

Posts: 13
#6

02 Dec 2019, 04:02

Thanks Lakshman!