
  • Using forloop to cycle through multiple variables and create a count variable

    Hello Statalisters,

    I am working with election data covering six years per individual. I want to create a single variable that counts all the reasons why an individual did not vote. The survey records the reason for each of six electoral cycles (v15_1 to v15_6), with around 13 possible reasons. So if the respondent picks the response "booth too far" in, say, year 1 of the data, then v15_1 holds the code for that response. I want to create a variable that reduces the number of options to about 4 and counts across all six election cycles. I wrote a loop for that, but I feel that rather than adding up the reasons, it just replaces the value for each respondent.

    Code:
    gen reasonnovote = 0 
    forval i = 1/6{
        replace reasonnovote = 1 if v15_`i' == 20 | v15_`i' == 21 | v15_`i' == 22 | v15_`i' == 23 | v15_`i' == 24  
        replace reasonnovote = 2 if v15_`i' == 30 | v15_`i' == 31 | v15_`i' == 32 
        replace reasonnovote = 3 if v15_`i' == 10 | v15_`i' == 11 | v15_`i' == 12 | v15_`i' == 13
        replace reasonnovote = 4 if v15_`i' == 33 | v15_`i' == 96
    }
    What I want to achieve is this: if the respondent reported not voting because the booth was too far in 3 election cycles, I want to count all of those under reasonnovote == 1. How should I write the loop to reflect this? Or should I just create one variable per election cycle, so reasonnovote1 for election cycle 1 using v15_1, reasonnovote2 for election cycle 2 using v15_2, etc.?
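
    A minimal sketch of the overwriting described above, using made-up values for a single respondent and only two cycles (both the values and the shortened loop are assumptions for illustration). Each pass through the loop replaces whatever an earlier cycle stored, so only the last matching cycle survives:

    ```stata
    * Toy data (hypothetical): code 21 in cycle 1, code 31 in cycle 2
    clear
    input byte(v15_1 v15_2)
    21 31
    end

    gen reasonnovote = 0
    forval i = 1/2 {
        * cycle 1 sets the value to 1; cycle 2 then overwrites it with 2
        replace reasonnovote = 1 if inrange(v15_`i', 20, 24)
        replace reasonnovote = 2 if inrange(v15_`i', 30, 32)
    }
    list
    ```
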
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(hhno v15_1 v15_2 v15_3 v15_4 v15_5 v15_6)
    12 21 21 21 21  .  .
    24 11 11 24 11  .  .
    49 24 24 24 24  .  .
    39 11 11 11 11  .  .
     1 21 21 21 24  .  .
    53 11 11 11 11  .  .
    71 31 24 24 24  .  .
    46 21 21 21 21  .  .
    42 21 31 31 31  .  .
    27 31 31 31 24 24  .
    29 11 11 11 11  .  .
    93 11 11 11 11  .  .
    43 11 11 11 11  .  .
     5 11 11 11 11  .  .
    29 11 11 11 11  .  .
    76 25 25 25 25  .  .
    18 11 11 11 11  .  .
    97 11 11 11 11  .  .
    38 11 11 11 11  .  .
    60 11 11 11 11  .  .
    67 24 11 11 11  .  .
    20 24 24 11 11  .  .
    34 11 11 11 24  .  .
    67 11 11 24 24  .  .
    33 13 24 31 24  .  .
    53 24 24 24 24  .  .
    49 11 11 11 11 11  .
    31 11 13 11 13  .  .
     6 24 24 24 24 24  .
    17 24 24 11 11  .  .
    75 11 11 11 11  .  .
    77 25 25 25 31  .  .
    63 25 25 25 25  .  .
     5 25 25 25 25  .  .
    end

    Thank you.


  • #2
    It seems to me that you might be better off with 4 variables, say

    Code:
    forval j = 1/4 {    
        gen reasonnovote`j' = 0  
    }  
    
    forval i = 1/6 {    
        replace reasonnovote1 = reasonnovote1 + inrange(v15_`i', 20, 24)        
        replace reasonnovote2 = reasonnovote2 + inrange(v15_`i', 30, 32)      
        replace reasonnovote3 = reasonnovote3 + inrange(v15_`i', 10, 13)    
        replace reasonnovote4 = reasonnovote4 + inlist(v15_`i', 33, 96)  
    }
    Note the scope for inrange() and inlist(). See also http://www.stata-journal.com/article...article=dm0026 and https://www.stata-journal.com/articl...article=dm0058
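
    As a rough check on how inrange() and inlist() behave here, the sketch below uses a few made-up values, including a missing one. It assumes the codes are integers, in which case inrange(x, 20, 24) matches the chain of == comparisons. Both functions return 0 for a missing argument, so unanswered cycles add nothing to a count:

    ```stata
    * Toy values (hypothetical), including a missing value
    clear
    input byte x
    20
    24
    25
    33
    .
    end

    gen byte a = inrange(x, 20, 24)
    gen byte b = x == 20 | x == 21 | x == 22 | x == 23 | x == 24
    gen byte c = inlist(x, 33, 96)

    * same result as the long chain of comparisons, for integer codes
    assert a == b
    list
    ```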

    Counting instances opens the door to summary variables of the form never (count 0), ever (count 1 to 6) and always (count 6).
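
    A sketch of such summary variables for the first category, using three rows from the data example. The names count1, nanswered, never1, ever1 and always1 are made up for illustration. Because most respondents in the example answered only four cycles, this variant defines "always" as "in every answered cycle", which is an assumption and not the same as count 6:

    ```stata
    clear
    input byte(v15_1 v15_2 v15_3 v15_4 v15_5 v15_6)
    21 21 21 21  .  .
    11 11 24 11  .  .
    11 11 11 11  .  .
    end

    * count of cycles with a code in 20-24
    gen byte count1 = 0
    forval i = 1/6 {
        replace count1 = count1 + inrange(v15_`i', 20, 24)
    }

    * number of cycles with any answer at all
    egen byte nanswered = rownonmiss(v15_1-v15_6)

    gen byte never1  = count1 == 0
    gen byte ever1   = count1 > 0
    gen byte always1 = count1 == nanswered & nanswered > 0
    list
    ```
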
    Last edited by Nick Cox; 06 Sep 2021, 06:16.



    • #3
      Thanks Nick. You're right, I will get more info with 4 variables. However, would the following make more sense if what I wanted was to count all reasons and categorize them into four dummy variables?

      Code:
      gen dist = 0
      gen cost = 0
      gen lossinc = 0
      gen illness = 0
      forval i =1/6 {
          replace dist = 1 if v15_`i' == 20 | v15_`i' == 21 | v15_`i' == 22 | v15_`i' == 23 | v15_`i' == 24 
          replace cost = 1 if v15_`i' == 30 | v15_`i' == 31 | v15_`i' == 32 
          replace lossinc = 1 if v15_`i' == 10 | v15_`i' == 11 | v15_`i' == 12 | v15_`i' == 13
          replace illness = 1 if v15_`i' == 33 | v15_`i' == 96
      }
      So your code creates a count variable that runs from 0 (never) to 6 (always), whereas the loop above creates a dummy. Then the only remaining issue would be to mark missing values in each variable.
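
      One possible way to mark missing values, sketched for the dist dummy only: start the dummy at 0 only for respondents who answered at least one cycle, so that those with no answers stay missing. The helper variable nanswered and the all-missing third row are made up for illustration:

      ```stata
      clear
      input byte(v15_1 v15_2 v15_3 v15_4 v15_5 v15_6)
      21 21 21 21  .  .
      11 11 24 11  .  .
       .  .  .  .  .  .
      end

      * number of cycles with any answer
      egen byte nanswered = rownonmiss(v15_1-v15_6)

      * 0 if at least one cycle was answered, missing otherwise
      gen byte dist = 0 if nanswered > 0
      forval i = 1/6 {
          replace dist = 1 if inrange(v15_`i', 20, 24)
      }
      list
      ```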



      • #4
        As already implied, you can map to indicator variables by a command of the form

        Code:
        gen ever = count > 0
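
        One caveat, sketched on made-up values: Stata treats numeric missing as larger than any number, so count > 0 is true when count is missing. If count can be missing, an if qualifier keeps the indicator missing too. The name ever_naive is hypothetical:

        ```stata
        clear
        input byte count
        0
        3
        .
        end

        * missing count is treated as greater than 0 here
        gen byte ever_naive = count > 0

        * indicator stays missing where count is missing
        gen byte ever = count > 0 if !missing(count)
        list
        ```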
