
  • Count frequency of multiple response with missing values

    I have a check-all-that-apply question that has been imported into Stata as a string variable. The question had 14 options in total. I have several datasets containing the same question, and I want to apply the same code to each. I need to create a series of indicator variables that show how many people chose each option and how many did not, while keeping missing values for respondents who did not answer the question.

    I tried to use the following code:
    forvalues 5 = 1/14 {
        capture assert strpos(h2, "`5'") == 0
        if _rc {
            generate hp_`5' = strpos(h2, "`5'") > 0
        }
    }


    However, this does not account for missing values: respondents who did not answer are coded as 0 in the indicator variables.

    I also tried splitting the string variable at the commas with split h2, p(","), but the number of variables this creates varies with the responses in each dataset.


    This is how the data appears in Stata:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str29 h2
    "4,12"                       
    "14"                         
    "14"                         
    "1,3,6,5,9,10,12"            
    "7,2,3,9"                    
    "14"                         
    "14"                         
    "14"                         
    "1,7,5,8,9,10,11,12,13"      
    "1,7,2,6,5,4,8,9,10,11,12,13"
    "1,2,6,5,4,8,9,10,11,12"     
    "10,13"                      
    "1,7,3,6,9,10,11"            
    "14"                         
    "2,3,11"                     
    "14"                         
    "14"                         
    "3,4"                        
    "14"                         
    "14"                         
    "1,7,10,11"                  
    "14"                         
    "14"                         
    "14"                         
    "14"                         
    "14"                         
    "14"                         
    "14"                         
    "14"                         
    "2,5,8,9,10,13"              
    "14"                         
    "14"                         
    "1,2,6,5,4,9,11,12,13"       
    "14"                         
    "14"                         
    "1,8,9,13"                   
    "14"                         
    "14"                         
    "14"                         
    "14"                         
    "7,3,6,5,4,9,10,11,12,13"    
    "2,3,10"                     
    "7,2,3,4"                    
    "14"                         
    "14"                         
    "14"                         
    "3,5,4,8"                    
    "14"                         
    "14"                         
    "14"                         
    "8"                          
    "14"                         
    "14"                         
    ""                           
    "3,4,8"                      
    "14"                         
    ""                           
    "14"                         
    "2,11"                       
    "14"                         
    "10,11,12"                   
    "7,13"                       
    "1,6,5,8,9,12,13"            
    ""                           
    "14"                         
    "9,11"                       
    "4,10"                       
    "14"                         
    "14"                         
    "14"                         
    "14"                         
    "1,2,3,6,5,8,9,10,11"        
    "14"                         
    "14"                         
    "1,7,5,10,12"                
    "1,3"                        
    "1,2,3,6,5,4,8,9,10,11"      
    "7,3,6,8,11,12,13"           
    "1,7,2,6,5,4,8,9,10,12"      
    "1,2,6,8,11,13"              
    "1,8,9,10,11"                
    "7,2,5,10"                   
    "7,3,6,4,8,9,10,13"          
    "14"                         
    "1,7,3,6,5,4,8,10,11,12"     
    "14"                         
    "1,2,6,5,4,8,9,10,11"        
    "14"                         
    "1,7,2,3,5,4,8,10,12"        
    "7,11"                       
    "14"                         
    "14"                         
    "14"                         
    "7,2,3,6,5,11"               
    "2,6"                        
    "1,7,8,10,11,12"             
    "1,6,5,9,10,11,12,13"        
    "14"                         
    "7,5,4,8,12,13"              
    "14"                         
    end


  • #2
    Code:
    split h2, gen(response) parse(",") destring
    forvalues i = 1/14 {
        egen hp_`i' = anymatch(response*), values(`i')
    }
    drop response*
    Added:

    I was surprised to see that you could use a number (5) as the loop macro name in -forvalues-. It does work, and doesn't technically violate any Stata syntax rules, so perhaps I should not have been surprised. But I think it's bad programming practice: it's bound to cause confusion at some point.

    The code proposed in #1, in addition to not handling missing values, will also give incorrect results. For example, it will report a match for response 1 if any of responses 10 through 14 has been selected, whether or not response 1 itself was selected, because strpos() finds the character "1" inside "10", "11", and so on.
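
    The false match can be checked by calling strpos() directly. The lines below are only an illustrative sketch using values from the example data in #1; wrapping both strings in commas is one common workaround and is not part of the code posted above.

```stata
* strpos(s1, s2) returns the position of s2 in s1, or 0 if s2 is absent.

* Returns 1: the "1" inside "10" counts as a match, although option 1
* was not selected.
display strpos("10,13", "1")

* Returns 0: with commas on both sides, only a standalone 1 can match.
display strpos(",10,13,", ",1,")

* Returns 1: a genuine selection of option 1 is still found.
display strpos(",1,3,", ",1,")
```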
    Last edited by Clyde Schechter; 14 Apr 2022, 13:50.



    • #3
      Thank you, Clyde! I agree, using a number as the iterator probably wasn't the best idea. It was a placeholder while I tried various options to get the code to work. I appreciate your help.



      • #4
        Hi Clyde,

        I've been using the code from #2, but I just realized it does not handle the missing data the way I need. It codes the missing values as 0 rather than keeping them as missing. Is there a way to maintain the missing values?



        • #5
          You have never explained what "maintain the missing values" means, and it isn't obvious to me. Here's one possibility:

          Condition: if there is no response* variable matching value i, but one or more of the response* variables is missing, then set hp_i to missing value.

          Code:
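          split h2, gen(response) parse(",") destring
          * Count the missing response* variables once, before the loop;
          * creating mcount inside the loop would fail on the second pass.
          egen mcount = rowmiss(response*)
          forvalues i = 1/14 {
              egen hp_`i' = anymatch(response*), values(`i')
              replace hp_`i' = . if hp_`i' == 0 & mcount > 0
          }
          drop response* mcount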
          If that isn't what you want, you need to spell out exactly what you mean.
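
          Another possible reading, and this is only an assumption about what is wanted, is that hp_i should be missing only when h2 itself is empty, that is, when the respondent skipped the question entirely. A minimal sketch of that interpretation follows. It uses comma-wrapped strpos() instead of -split-, so it does not depend on how many response* variables a given dataset produces. The four-row dataset is a toy sample taken from the example in #1; note that -clear- discards the data in memory.

```stata
* Toy data: four rows copied from the example in #1 (replaces data in memory).
clear
input str29 h2
"4,12"
"14"
""
"1,7,10,11"
end

forvalues i = 1/14 {
    * Wrap both strings in commas so that "1" cannot match inside "10".
    * The if qualifier leaves hp_`i' missing wherever h2 is empty.
    generate byte hp_`i' = strpos("," + h2 + ",", ",`i',") > 0 if !missing(h2)
}

* Tabulate chosen (1), not chosen (0), and missing (.) for each option.
tab1 hp_*, missing
```

          This assumes h2 contains no spaces around the commas; if it might, removing them first with subinstr() would be a reasonable precaution.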
