Coding loop for share of caste per village variable

Mol Nic

Join Date: Sep 2017

Posts: 11
#1

12 Jan 2018, 15:42

Hello,

I need help with a simple loop.

I need to create a variable giving the proportion of each caste group per village. At the moment the caste and share variables are stored separately, and there can be up to 7 caste groups per village (which may repeat because they are aggregated from subgroups). Overall there are 5 major groups.

For example, the caste groups are recorded in variables v2A-v2G, and v4A-v4G hold the proportion of the corresponding caste in each village.

I need to sum the A-G shares for each caste, and there are 5 types of caste group.

If done manually, I guess it would look something like this: egen caste1=rowtotal(v4A v4B.....etc) if v2A==1 & v2B==1 & ... (for caste 1, and the same for castes 2-5)

I cannot figure out what a more efficient loop would be.

Thanks for helping out!
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#2

12 Jan 2018, 15:58

If you want help with code, it is better to post an example of your data, using the -dataex- command, than to try to describe it. Personally, I don't understand from your description what your data look like. Perhaps somebody else does and will respond. But if you don't get a helpful response shortly, I would repost showing an example of the data. Instructions for getting and using the -dataex- command can be found in FAQ #12 and the -dataex- help file.
Mol Nic

Join Date: Sep 2017

Posts: 11
#3

13 Jan 2018, 12:52

OK, my apologies. It is a simple thing but hard to explain.
Here is the dataex example: the VJ2* variables are caste codes that range from 1 to 5, and the VJ4* variables are the proportions of the corresponding caste. Each line is a village, so the data record the proportions of each caste in separate variables. The reason castes repeat (as in lines 1-3) is that they are aggregated from subcastes.

input int(VJ2A VJ2B VJ2C VJ2D VJ2E VJ2F VJ2G VJ2H VJ2I VJ2J VJ4A VJ4B VJ4C VJ4D VJ4E VJ4F VJ4G VJ4H VJ4I VJ4J)
5 5 5 5 5 . . . . . . . . . . . . . . .
5 5 5 5 . . . . . . 23 15 17 45 . . . . . .
5 5 . . . . . . . . . . . . . . . . . .
5 5 5 . . . . . . . 60 20 20 . . . . . . .
5 5 . . . . . . . . 36 64 . . . . . . . .
2 2 2 2 2 . . . . . 40 20 15 10 4 11 . . . .
2 2 2 2 . . . . . . 37 27 14 6 16 . . . . .
5 5 5 5 5 . . . . . 50 10 9 8 6 17 . . . .
3 5 5 5 5 . . . . . 15 3 31 11 3 37 . . . .
5 5 5 5 5 . . . . . 20 20 18 17 15 10 . . . .
end
label values VJ2A VJ2A
label def VJ2A 2 "OBC", modify
label def VJ2A 3 "SC", modify
label def VJ2A 5 "Other", modify
label values VJ2B VJ2B
label def VJ2B 2 "OBC", modify
label def VJ2B 5 "Other", modify
label values VJ2C VJ2C
label def VJ2C 2 "OBC", modify
label def VJ2C 5 "Other", modify
label values VJ2D VJ2D
label def VJ2D 2 "OBC", modify
label def VJ2D 5 "Other", modify
label values VJ2E VJ2E
label def VJ2E 2 "OBC", modify
label def VJ2E 5 "Other", modify
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#4

13 Jan 2018, 14:22

I sort of understand what you have now. There are some things that confuse me, like situations where a VJ4 has a non-missing value but the corresponding VJ2 has none: see, for example, observations 6 and 7 in your example. But perhaps a proportion of the population in a village is of unknown caste and you are using a missing value to represent unknown caste.
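One way to see how often this pattern occurs is to flag the affected villages before reshaping. The following is only a sketch, meant to be run on the original wide data; the variable name orphan_share is made up for illustration, and it assumes the suffixes run A through J as in the posted example.

```stata
* Flag villages where a share (VJ4) is recorded without a caste code (VJ2).
* orphan_share is a hypothetical name used only for this check.
gen byte orphan_share = 0
foreach s in A B C D E F G H I J {
    replace orphan_share = 1 if missing(VJ2`s') & !missing(VJ4`s')
}

* Inspect the flagged villages
list VJ2* VJ4* if orphan_share
```

If the flagged rows turn out to be data-entry problems rather than a genuinely unknown caste, they may need to be fixed before any totals are computed.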

The next steps would be:

Code:

gen long village = _n
reshape long VJ2 VJ4, i(village) j(_j) string
collapse (sum) VJ4, by(village VJ2)
label values VJ2 VJ2A

This gives you the proportion corresponding to each caste within each village. The data is in long layout, which is almost certainly better than the wide layout you started with for analysis in Stata. If, however, you have a compelling reason to put the data back to wide layout (one obs per village with separate variables for each caste), see -help reshape wide-.
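If the wide layout really is needed, a minimal sketch of the return trip might look like the following. It assumes the data are already in the long layout with the variables village, VJ2 and VJ4. The share_caste names are made up for illustration, and dropping the rows with a missing caste code is an assumption about how unknown caste should be handled, not something settled in this thread.

```stata
* -reshape wide- does not accept missing values in the j() variable,
* so rows with an unknown caste code are dropped here (an assumption).
drop if missing(VJ2)

* One observation per village, one VJ4 variable per caste code
reshape wide VJ4, i(village) j(VJ2)

* Hypothetical, more readable names: VJ42 -> share_caste2, and so on
rename VJ4* share_caste*
```

Villages in which a given caste does not appear will have a missing value, not zero, in the corresponding variable.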

Note: In the example data you show, the various VJ2* labels are all consistent with each other. The code shown here relies critically on that. If it is not true in your full data set, you will get nonsense from this code. It is always somewhat hazardous to rely on assumptions like this. So you might want to consider -decode-ing the VJ2 variables back to strings before running this code. No modifications to the code would be required for this.
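A simple, if manual, way to check that assumption is to list the value label attached to each VJ2* variable and compare the mappings by eye. This sketch is meant to be run on the original wide data, before the reshape, and it does nothing beyond displaying what is already stored in the dataset.

```stata
* Show which value label each caste-code variable uses, and its contents
foreach v of varlist VJ2* {
    local lbl : value label `v'
    if "`lbl'" == "" {
        display as text "`v': no value label attached"
    }
    else {
        display as text "`v': value label `lbl'"
        label list `lbl'
    }
}
```

If any code maps to different text in different labels, the caste codes are not comparable across variables and the collapse would mix groups.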
Mol Nic

Join Date: Sep 2017

Posts: 11
#5

15 Jan 2018, 14:03

Thank you so much! This was super helpful and it worked! Very grateful indeed for your quick responses!!