  • Coding loop for share of caste per village variable

    Hello,

    I need help with a simple loop.

    I need to create a variable giving the share of each caste group per village. At the moment I have the caste and the share in separate variables, and there can be up to 7 caste groups per village (which may repeat because they are aggregated from subgroups). Overall there are 5 major groups.

    For example, the caste groups are recorded in variables v2A-v2G (one per caste listed in a village), and v4A-v4G hold the corresponding proportion of each of those castes in the village.

    I need to sum these proportions across A-G for each caste, and there are 5 types of caste groups.

    If done manually, I guess it would look something like this: egen caste1=rowtotal(v2A v2B.....etc) if V4A==1 & V4B==1 & (for caste1 and the same for caste 4-5)

    I cannot figure out what a more efficient loop would be.

    Thanks for helping out!

  • #2
    If you want help with code, it is better to post an example of your data, using the -dataex- command, than to try to describe it. Personally, I don't understand from your description what your data look like. Perhaps somebody else does and will respond. But if you don't get a helpful response shortly, I would repost showing an example of the data. Instructions for getting and using the -dataex- command can be found in FAQ #12 and the -dataex- help file.

    • #3
      OK, my apologies. It is a simple thing but hard to explain.
      Here is the dataex example: VJ2* are caste codes that range from 1-5 and VJ4* are the proportions of those castes. Each line is a village. So the data record the proportion of each caste in separate variables. The reason why castes repeat (as in lines 1-3) is that they are aggregated from subcastes.

      input int(VJ2A VJ2B VJ2C VJ2D VJ2E VJ2F VJ2G VJ2H VJ2I VJ2J VJ4A VJ4B VJ4C VJ4D VJ4E VJ4F VJ4G VJ4H VJ4I VJ4J)
      5 5 5 5 5 . . . . . . . . . . . . . . .
      5 5 5 5 . . . . . . 23 15 17 45 . . . . . .
      5 5 . . . . . . . . . . . . . . . . . .
      5 5 5 . . . . . . . 60 20 20 . . . . . . .
      5 5 . . . . . . . . 36 64 . . . . . . . .
      2 2 2 2 2 . . . . . 40 20 15 10 4 11 . . . .
      2 2 2 2 . . . . . . 37 27 14 6 16 . . . . .
      5 5 5 5 5 . . . . . 50 10 9 8 6 17 . . . .
      3 5 5 5 5 . . . . . 15 3 31 11 3 37 . . . .
      5 5 5 5 5 . . . . . 20 20 18 17 15 10 . . . .
      end
      label values VJ2A VJ2A
      label def VJ2A 2 "OBC", modify
      label def VJ2A 3 "SC", modify
      label def VJ2A 5 "Other", modify
      label values VJ2B VJ2B
      label def VJ2B 2 "OBC", modify
      label def VJ2B 5 "Other", modify
      label values VJ2C VJ2C
      label def VJ2C 2 "OBC", modify
      label def VJ2C 5 "Other", modify
      label values VJ2D VJ2D
      label def VJ2D 2 "OBC", modify
      label def VJ2D 5 "Other", modify
      label values VJ2E VJ2E
      label def VJ2E 2 "OBC", modify
      label def VJ2E 5 "Other", modify

      • #4
        I sort of understand what you have now. There are some things that confuse me, like situations where a VJ4 has a non-missing value but the corresponding VJ2 has none: see, for example, observations 6 and 7 in your example. But perhaps a proportion of the population in a village is of unknown caste and you are using a missing value to represent unknown caste.
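
If you want to see every such case before reshaping, a minimal sketch along the following lines might help. It assumes the wide example data from #3 are in memory with suffixes A through J; the variable name orphan is hypothetical and used only for illustration.

```stata
* Sketch: flag villages where a share is recorded without a caste code.
* Assumes VJ2A-VJ2J and VJ4A-VJ4J exist as in the example data.
* "orphan" is a hypothetical name, not part of the original data.
gen byte orphan = 0
foreach s in A B C D E F G H I J {
    * a non-missing proportion paired with a missing caste code
    replace orphan = 1 if missing(VJ2`s') & !missing(VJ4`s')
}

* Inspect the flagged villages, then drop the helper variable
list if orphan == 1
drop orphan
```

Dropping the helper variable at the end keeps the data ready for the -reshape- step, which would otherwise carry orphan along as an extra variable.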

        The next steps would be:

        Code:
        gen long village = _n
        
        reshape long VJ2 VJ4, i(village) j(_j) string
        
        collapse (sum) VJ4, by(village VJ2)
        
        label values VJ2 VJ2A
        This gives you the proportion corresponding to each caste within each village. The data is in long layout, which is almost certainly better than the wide layout you started with for analysis in Stata. If, however, you have a compelling reason to put the data back to wide layout (one obs per village with separate variables for each caste), see -help reshape wide-.
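
If you do need the wide layout, the step back might look roughly like the sketch below. It assumes the collapsed long data (village, VJ2, VJ4) are in memory. The stub name share is hypothetical, and dropping the rows with a missing caste code is an assumption about what you want, since -reshape wide- does not accept missing values in the j() variable.

```stata
* Sketch: return to one observation per village with one share variable per caste.
* Assumes the long data left by the -collapse- step above are in memory.

* Assumption: shares with no caste code are not wanted in the wide data.
* (Alternatively, recode them to an unused caste code instead of dropping.)
drop if missing(VJ2)

* "share" is a hypothetical, more readable stub name
rename VJ4 share

* Creates share2, share3, share5, ... named after the caste codes present
reshape wide share, i(village) j(VJ2)
```

Note that -collapse (sum)- returns 0, not missing, when every proportion in a group is missing, so a share of 0 in the result can mean either "no such proportion recorded" or a true zero.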

        Note: In the example data you show, the various VJ2* labels are all consistent with each other. The code shown here relies critically on that. If it is not true in your full data set, you will get nonsense from this code. It is always somewhat hazardous to rely on assumptions like this. So you might want to consider -decode-ing the VJ2 variables back to strings before running this code. No modifications to the code would be required for this.

        • #5
          Thank you so much! This was super helpful and it worked! Very grateful indeed for your quick responses!!
