Issue with multiple entries in the same column

kotey dzanie

Join Date: Jul 2022

Posts: 26
#1

Issue with multiple entries in the same column

11 Aug 2022, 14:09

The data below is a survey where the respondents were asked to select all that apply. So you have one responded with multiple entries. I have to tabulate the frequencies and percentages. But i need to first label the responses as follows: 1--a, 2 -- b, 3 --c, 4-- d, 5--e, 6--f, 7--g. NA is supposed to be excluded from the tabulation and the percentages are computed by dividing the frequency for each response by the total number of observations (excluding the missing values). In this this the total frequency would be more than 100%. The final table should look like the table below --

Values Frequency Percentage

a

b

c

d

e

f

g

I tried using split Values , gen(Variable) parse(,)
bys MMID: gen n =_n
reshape long Variable, i(AC n)
drop if V==""
tab Variable AC

Is there a way to do the tabulation without reshaping the data? need help please.

---------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. For more info, type help dataex clear input str13 Values "4,6" "1,2,3,6" "NA" "2,3,4,5,6,7" "1,2,3,4,5,6" "1,2,3,4,5,6" "6" "1,2,3,4,5,6" "3,6" "NA" "3,6" "1,2,3,4,5,6" "1,2,3,4,5" "2,3,4,5,6,7" "2,3,4,5,6" "1,2,3,4,5,6,7" "1,2,3,5,6" "2,4,6" "1,2,3,4,5,6" "1,2,3,4,5,6" "1,3,4,6" "2,3,6" "1,3" "1,2,3,4,5,6" "1,2,3,4,5,6" "1,2,3,4,5,6" "1,2,3,4,5,6,7" "3,6" "7" "6" "5,6" "1,2,3,4,5,6" "6,7" "1,2,3,5,7" "1,2,3,4,5,7" "4,5,6" "6" "1,2,3,4,5,6,7" "1,2,4" "1,2,3,4,5,6" "1,2,3,4,5,6" "1,2,3,4,5,6" "2,3,4,5,6,7" "1,2,3,4,5,6" "3,5,6" "1,2,3,4,5,6" "1,2,3,4,5,6,7" end

------------------ copy up to and including the previous line ------------------
Tags: None

Nick Cox

Join Date: Mar 2014
Posts: 36054

11 Aug 2022, 18:15

Thanks for the data example. I used tabsplit from tab_chi on SSC.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str13 Values
"4,6"          
"1,2,3,6"      
"NA"           
"2,3,4,5,6,7"  
"1,2,3,4,5,6"  
"1,2,3,4,5,6"  
"6"            
"1,2,3,4,5,6"  
"3,6"          
"NA"           
"3,6"          
"1,2,3,4,5,6"  
"1,2,3,4,5"    
"2,3,4,5,6,7"  
"2,3,4,5,6"    
"1,2,3,4,5,6,7"
"1,2,3,5,6"    
"2,4,6"        
"1,2,3,4,5,6"  
"1,2,3,4,5,6"  
"1,3,4,6"      
"2,3,6"        
"1,3"          
"1,2,3,4,5,6"  
"1,2,3,4,5,6"  
"1,2,3,4,5,6"  
"1,2,3,4,5,6,7"
"3,6"          
"7"            
"6"            
"5,6"          
"1,2,3,4,5,6"  
"6,7"          
"1,2,3,5,7"    
"1,2,3,4,5,7"  
"4,5,6"        
"6"            
"1,2,3,4,5,6,7"
"1,2,4"        
"1,2,3,4,5,6"  
"1,2,3,4,5,6"  
"1,2,3,4,5,6"  
"2,3,4,5,6,7"  
"1,2,3,4,5,6"  
"3,5,6"        
"1,2,3,4,5,6"  
"1,2,3,4,5,6,7"
end 

tokenize "`c(alpha)'"
clonevar Work = Values 

forval j = 1/7 { 
    replace Work = subinstr(Work, "`j'", "``j''", .) 
}

tabsplit Work if Work != "NA",  parse(",")

       Work |      Freq.     Percent        Cum.
------------+-----------------------------------
          a |         27       13.30       13.30
          b |         31       15.27       28.57
          c |         35       17.24       45.81
          d |         30       14.78       60.59
          e |         30       14.78       75.37
          f |         39       19.21       94.58
          g |         11        5.42      100.00
------------+-----------------------------------
      Total |        203      100.00

Comment

kotey dzanie

Join Date: Jul 2022

Posts: 26
#3

11 Aug 2022, 18:34

Thanks,

Please, how would the coding look like if the responses have the following labels: 1-- Context, 2--plan information, 3-- External Procedures, 4-- Internal Procedures, 5-- Storage and Maintenance, 6-- Data Transfer and 7-- Goals
Comment
kotey dzanie

Join Date: Jul 2022

Posts: 26
#4

11 Aug 2022, 18:56

Nick Cox

Thank you

Please, how would the coding look like if the responses have the following labels: 1-- Context, 2--plan information, 3-- External Procedures, 4-- Internal Procedures, 5-- Storage and Maintenance, 6-- Data Transfer and 7-- Goals. Also, the percentage for each category should be calculated by dividing the frequency for each category by the total number of observations (45 observations ie 47 minus the two missing observations). The final table should look like this

Freq Percentage

Context 27 60.00

Plan information 31 68.89

External Procedures 35 77.78

Internal Procedures 30 66.67

Storage and Maintenance 30 66.67

Data Transfer 39 86.67

Goal 11 24.44
Comment
Nick Cox

Join Date: Mar 2014

Posts: 36054
#5

11 Aug 2022, 19:57

You have your desired table already if I understand correctly, so you don’t need Stata code to get it again.
Comment
kotey dzanie

Join Date: Jul 2022

Posts: 26
#6

11 Aug 2022, 20:08

Nick Cox , please, I created the table using stata, just for the purpose of demonstration -- need help on how to do it in stata.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 36054
#7

11 Aug 2022, 20:12

#6 is a perfect circle. Why not share your code?
Comment

Nick Cox

Join Date: Mar 2014
Posts: 36054

12 Aug 2022, 03:40

I am still puzzled at what you're asking -- because you have worked out what you want.

This is how I would do it. This wouldn't work if answers 10 11 or higher were possible. Starting with your data example,

Code:

count if Values != "NA"
local N = r(N)

gen which = _n in 1/7
gen frequency = .
gen percent = ""


quietly forval i = 1/7 {
    count if strpos(Values, "`i'")
    replace frequency = r(N) in `i'
    replace percent = strofreal(100 * frequency / `N', "%3.2f") in `i'
}

tabdisp which in 1/7, c(freq percent)

list which freq percent in 1/7

Results:

.

Code:

 tabdisp which in 1/7, c(freq percent)

----------------------------------
    which |  frequency     percent
----------+-----------------------
        1 |         27       60.00
        2 |         31       68.89
        3 |         35       77.78
        4 |         30       66.67
        5 |         30       66.67
        6 |         39       86.67
        7 |         11       24.44
----------------------------------

.
. list which freq percent in 1/7

     +----------------------------+
     | which   freque~y   percent |
     |----------------------------|
  1. |     1         27     60.00 |
  2. |     2         31     68.89 |
  3. |     3         35     77.78 |
  4. |     4         30     66.67 |
  5. |     5         30     66.67 |
     |----------------------------|
  6. |     6         39     86.67 |
  7. |     7         11     24.44 |
     +----------------------------+

.
Clearly with this approach you need to assign "Context" and so on as value labels for which.

Last edited by Nick Cox; 12 Aug 2022, 04:11.

Comment

Hemanshu Kumar

Join Date: Mar 2015

Posts: 1548
#9

12 Aug 2022, 05:27

That's an interesting way of doing it, Nick!

This wouldn't work if answers 10 11 or higher were possible.

@kotey And if higher numbers were possible, the main substantive change that would make the code still work is to change the count command within the loop like so:

Code:

count if strpos(","+Values+",", ",`i',")
Comment
Nick Cox

Join Date: Mar 2014

Posts: 36054
#10

12 Aug 2022, 06:35

#9 is fine as a generalization to searching for words (e.g. searching for 1 or 11 so that a search for 1 finds only 1) -- so long as there are no extraneous spaces, which is true of the data example.

Searching for words in strings is the topic of a piece in press for Stata Journal 22(4).
Comment

Values	Frequency	Percentage
a
b
c
d
e
f
g

	Freq	Percentage
Context	27	60.00
Plan information	31	68.89
External Procedures	35	77.78
Internal Procedures	30	66.67
Storage and Maintenance	30	66.67
Data Transfer	39	86.67
Goal	11	24.44

Announcement

Issue with multiple entries in the same column

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment