Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Issue with multiple entries in the same column

    The data below is a survey where the respondents were asked to select all that apply. So you have one responded with multiple entries. I have to tabulate the frequencies and percentages. But i need to first label the responses as follows: 1--a, 2 -- b, 3 --c, 4-- d, 5--e, 6--f, 7--g. NA is supposed to be excluded from the tabulation and the percentages are computed by dividing the frequency for each response by the total number of observations (excluding the missing values). In this this the total frequency would be more than 100%. The final table should look like the table below --

    Values Frequency Percentage
    a
    b
    c
    d
    e
    f
    g

    I tried using split Values , gen(Variable) parse(,)
    bys MMID: gen n =_n
    reshape long Variable, i(AC n)
    drop if V==""
    tab Variable AC

    Is there a way to do the tabulation without reshaping the data? need help please.

    ---------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str13 Values
    "4,6"          
    "1,2,3,6"      
    "NA"           
    "2,3,4,5,6,7"  
    "1,2,3,4,5,6"  
    "1,2,3,4,5,6"  
    "6"            
    "1,2,3,4,5,6"  
    "3,6"          
    "NA"           
    "3,6"          
    "1,2,3,4,5,6"  
    "1,2,3,4,5"    
    "2,3,4,5,6,7"  
    "2,3,4,5,6"    
    "1,2,3,4,5,6,7"
    "1,2,3,5,6"    
    "2,4,6"        
    "1,2,3,4,5,6"  
    "1,2,3,4,5,6"  
    "1,3,4,6"      
    "2,3,6"        
    "1,3"          
    "1,2,3,4,5,6"  
    "1,2,3,4,5,6"  
    "1,2,3,4,5,6"  
    "1,2,3,4,5,6,7"
    "3,6"          
    "7"            
    "6"            
    "5,6"          
    "1,2,3,4,5,6"  
    "6,7"          
    "1,2,3,5,7"    
    "1,2,3,4,5,7"  
    "4,5,6"        
    "6"            
    "1,2,3,4,5,6,7"
    "1,2,4"        
    "1,2,3,4,5,6"  
    "1,2,3,4,5,6"  
    "1,2,3,4,5,6"  
    "2,3,4,5,6,7"  
    "1,2,3,4,5,6"  
    "3,5,6"        
    "1,2,3,4,5,6"  
    "1,2,3,4,5,6,7"
    end
    ------------------ copy up to and including the previous line ------------------


  • #2
    Thanks for the data example. I used tabsplit from tab_chi on SSC.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str13 Values
    "4,6"          
    "1,2,3,6"      
    "NA"           
    "2,3,4,5,6,7"  
    "1,2,3,4,5,6"  
    "1,2,3,4,5,6"  
    "6"            
    "1,2,3,4,5,6"  
    "3,6"          
    "NA"           
    "3,6"          
    "1,2,3,4,5,6"  
    "1,2,3,4,5"    
    "2,3,4,5,6,7"  
    "2,3,4,5,6"    
    "1,2,3,4,5,6,7"
    "1,2,3,5,6"    
    "2,4,6"        
    "1,2,3,4,5,6"  
    "1,2,3,4,5,6"  
    "1,3,4,6"      
    "2,3,6"        
    "1,3"          
    "1,2,3,4,5,6"  
    "1,2,3,4,5,6"  
    "1,2,3,4,5,6"  
    "1,2,3,4,5,6,7"
    "3,6"          
    "7"            
    "6"            
    "5,6"          
    "1,2,3,4,5,6"  
    "6,7"          
    "1,2,3,5,7"    
    "1,2,3,4,5,7"  
    "4,5,6"        
    "6"            
    "1,2,3,4,5,6,7"
    "1,2,4"        
    "1,2,3,4,5,6"  
    "1,2,3,4,5,6"  
    "1,2,3,4,5,6"  
    "2,3,4,5,6,7"  
    "1,2,3,4,5,6"  
    "3,5,6"        
    "1,2,3,4,5,6"  
    "1,2,3,4,5,6,7"
    end 
    
    tokenize "`c(alpha)'"
    clonevar Work = Values 
    
    forval j = 1/7 { 
        replace Work = subinstr(Work, "`j'", "``j''", .) 
    }
    
    tabsplit Work if Work != "NA",  parse(",")
    
           Work |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              a |         27       13.30       13.30
              b |         31       15.27       28.57
              c |         35       17.24       45.81
              d |         30       14.78       60.59
              e |         30       14.78       75.37
              f |         39       19.21       94.58
              g |         11        5.42      100.00
    ------------+-----------------------------------
          Total |        203      100.00

    Comment


    • #3
      Thanks,

      Please, how would the coding look like if the responses have the following labels: 1-- Context, 2--plan information, 3-- External Procedures, 4-- Internal Procedures, 5-- Storage and Maintenance, 6-- Data Transfer and 7-- Goals


      Comment


      • #4
        Nick Cox

        Thank you


        Please, how would the coding look like if the responses have the following labels: 1-- Context, 2--plan information, 3-- External Procedures, 4-- Internal Procedures, 5-- Storage and Maintenance, 6-- Data Transfer and 7-- Goals. Also, the percentage for each category should be calculated by dividing the frequency for each category by the total number of observations (45 observations ie 47 minus the two missing observations). The final table should look like this
        Freq Percentage
        Context 27 60.00
        Plan information 31 68.89
        External Procedures 35 77.78
        Internal Procedures 30 66.67
        Storage and Maintenance 30 66.67
        Data Transfer 39 86.67
        Goal 11 24.44

        Comment


        • #5
          You have your desired table already if I understand correctly, so you don’t need Stata code to get it again.

          Comment


          • #6
            Nick Cox , please, I created the table using stata, just for the purpose of demonstration -- need help on how to do it in stata.

            Comment


            • #7
              #6 is a perfect circle. Why not share your code?

              Comment


              • #8
                I am still puzzled at what you're asking -- because you have worked out what you want.

                This is how I would do it. This wouldn't work if answers 10 11 or higher were possible. Starting with your data example,

                Code:
                count if Values != "NA"
                local N = r(N)
                
                gen which = _n in 1/7
                gen frequency = .
                gen percent = ""
                
                
                quietly forval i = 1/7 {
                    count if strpos(Values, "`i'")
                    replace frequency = r(N) in `i'
                    replace percent = strofreal(100 * frequency / `N', "%3.2f") in `i'
                }
                
                tabdisp which in 1/7, c(freq percent)
                
                list which freq percent in 1/7
                Results:

                .
                Code:
                 tabdisp which in 1/7, c(freq percent)
                
                ----------------------------------
                    which |  frequency     percent
                ----------+-----------------------
                        1 |         27       60.00
                        2 |         31       68.89
                        3 |         35       77.78
                        4 |         30       66.67
                        5 |         30       66.67
                        6 |         39       86.67
                        7 |         11       24.44
                ----------------------------------
                
                .
                . list which freq percent in 1/7
                
                     +----------------------------+
                     | which   freque~y   percent |
                     |----------------------------|
                  1. |     1         27     60.00 |
                  2. |     2         31     68.89 |
                  3. |     3         35     77.78 |
                  4. |     4         30     66.67 |
                  5. |     5         30     66.67 |
                     |----------------------------|
                  6. |     6         39     86.67 |
                  7. |     7         11     24.44 |
                     +----------------------------+
                .
                Clearly with this approach you need to assign "Context" and so on as value labels for which.
                Last edited by Nick Cox; 12 Aug 2022, 04:11.

                Comment


                • #9
                  That's an interesting way of doing it, Nick!

                  This wouldn't work if answers 10 11 or higher were possible.
                  @kotey And if higher numbers were possible, the main substantive change that would make the code still work is to change the count command within the loop like so:

                  Code:
                      count if strpos(","+Values+",", ",`i',")

                  Comment


                  • #10
                    #9 is fine as a generalization to searching for words (e.g. searching for 1 or 11 so that a search for 1 finds only 1) -- so long as there are no extraneous spaces, which is true of the data example.

                    Searching for words in strings is the topic of a piece in press for Stata Journal 22(4).

                    Comment

                    Working...
                    X