Dear All,
I would like to ask a question that is more about terminology than about Stata, but I hope it is all right to draw on the collective wisdom of Stata users.
The question concerns the analysis of data collected with multiple-select categorical questions.
The following web page (cited here not for any particular reputation or significance, but because it is relevant and uses simple terms and illustrations) mentions two approaches, which it calls:
- percent of respondents;
- percent of answers.
https://resources.pollfish.com/pollf...ion-questions/
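To make sure we are talking about the same two numbers, here is how I read them, in my own notation (not taken from the page above). Let $x_{ik}=1$ if respondent $i$ ($i=1,\dots,n$) selected option $k$ ($k=1,\dots,K$) and 0 otherwise, and let $m_i=\sum_{k} x_{ik}$ be the number of options respondent $i$ selected:

```latex
R_k = 100 \cdot \frac{1}{n}\sum_{i=1}^{n} x_{ik}
\quad\text{(percent of respondents)},
\qquad
A_k = 100 \cdot \frac{\sum_{i=1}^{n} x_{ik}}{\sum_{i=1}^{n}\sum_{j=1}^{K} x_{ij}}
= \frac{R_k}{\bar m},
\quad \bar m = \frac{1}{n}\sum_{i=1}^{n} m_i
\quad\text{(percent of answers)}
```

So the $A_k$ always sum to 100 across options, while the $R_k$ sum to $100\,\bar m$, which exceeds 100 whenever respondents select more than one option on average.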
1. My first question is whether this terminology is standard, established, or at least intuitive, or whether there are more common (or historical) names for the same indicators.
2. My second question is whether a researcher can derive any other indicators from a multiple-select categorical question; if so, please suggest them.
3. My main (third) question: although I agree that both indicators can make sense and be useful in SOME situations, I want to question whether both are ALWAYS valid.
Consider the following first example:
A company sells widgets, and customers can buy [otherwise identical] widgets painted in the color of their choice, with each customer buying no more than one widget of any given color. From the results of the multiple-select question "What colors are your widgets?" we can then answer BOTH what percentage of respondents own a widget of a specific color (for example, to decide which color will resonate most in an ad) and what percentage of the widgets is painted in each color (for example, to trim the range of colors offered and reduce the number of paint buckets).
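A sketch of why both readings work in this example, in my own notation: let $x_{ik}=1$ if customer $i$ ($i=1,\dots,n$) reports a widget of color $k$ ($k=1,\dots,K$), and 0 otherwise, and let $W_k$ and $W$ (my labels) be the number of color-$k$ widgets and of all widgets owned by the respondents. Because no customer owns two widgets of the same color, $x_{ik}$ is exactly the number of color-$k$ widgets customer $i$ owns, so the totals count widgets rather than mere mentions:

```latex
\sum_{i=1}^{n} x_{ik} = W_k,
\qquad
\sum_{i=1}^{n}\sum_{j=1}^{K} x_{ij} = W,
\qquad\Longrightarrow\qquad
A_k = 100 \cdot \frac{\sum_{i=1}^{n} x_{ik}}{\sum_{i=1}^{n}\sum_{j=1}^{K} x_{ij}}
    = 100 \cdot \frac{W_k}{W}
```

The identity $\sum_i x_{ik} = W_k$ is what the at-most-one-per-color condition buys; without it, $x_{ik}$ records only presence.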
But consider the following second example:
A household is asked which appliances it owns: say, a refrigerator, a microwave, a radio, etc. I have a feeling that it is incorrect to add all appliances together and say that 10% of all appliances are refrigerators, since refrigerators are very different from, e.g., radios (in cost, in use, or possibly in the utility derived from their use). Furthermore, if the number of devices is not asked, only their presence (as is common with such multiple-select questions), then I may be drawing invalid inferences from the second indicator (percent of answers), or at least I should be cautious with the wording around it, as it quickly loses interpretability.
Is my intuition correct here? If so, what are the requirements for the second approach to make sense? Is that something I can deduce [mechanically] from the question and/or its options, or does it necessarily require an understanding of the underlying subject (in layman's terms: knowing and understanding what exactly the widgets are)?
Thank you and have a good weekend, everybody!
PS: regarding my second question above, I would like to account somehow for the concentration of the answers. For example, if 5 households mentioned CRIME as their only problem, but 6 mentioned ACCESS TO SCHOOLS and ACCESS TO HOSPITALS (selecting these two items together), I would like to be sensitive to that and give a whole point to an item selected alone, and perhaps half a point to each item when two items are selected. Is there a formal name for this approach/algorithm? (See the last column of the matrix displayed below.)
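Written out in my own notation, the half-point rule generalizes to: each respondent has one point, split equally among the items he or she selected. With $x_{ik}=1$ if respondent $i$ selected item $k$ and $m_i=\sum_k x_{ik}$, the average runs over the $n^{+}$ respondents with $m_i>0$, because in the code below $x_{ik}/m_i$ is missing when $m_i=0$ and -summarize- ignores missing values:

```latex
F_k = 100 \cdot \frac{1}{n^{+}} \sum_{i:\, m_i > 0} \frac{x_{ik}}{m_i},
\qquad
\sum_{k=1}^{K} F_k = 100
```

The shares sum to 100 because each contributing respondent's weights $x_{ik}/m_i$ add up to exactly 1. A respondent selecting a single item gives it a whole point, one selecting two items gives each half a point, and so on.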
Code:
clear all
input crime school hospital
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
0 1 1
0 1 1
0 1 1
0 1 1
0 1 1
0 1 1
end

program define mselect, rclass
    version 18.0
    // two or more dummy variables: 1 = YES, 0 = NO
    syntax varlist(numeric min=2)
    foreach v in `varlist' {
        assert inlist(`v', 0, 1)
    }
    local n : word count `varlist'
    // columns: 1 = percent of respondents, 2 = percent of answers,
    //          3 = one point per respondent, split equally over selected items
    matrix M = J(`n', 3, .)
    matrix rownames M = `varlist'

    // column 1: mean of each dummy; column 2 temporarily holds the count
    local i = 1
    local s = 0
    foreach v in `varlist' {
        summarize `v', meanonly
        matrix M[`i', 1] = r(mean)*100
        matrix M[`i++', 2] = r(sum)
        local s = `s' + r(sum)
    }
    // column 2: count as a share of all answers given
    forvalues i = 1/`n' {
        matrix M[`i', 2] = M[`i', 2]/`s'*100
    }
    // column 3: weight each selection by 1/(number of items selected);
    // respondents who selected nothing get missing weights and are ignored
    tempvar tmp
    egen `tmp' = rowtotal(`varlist')
    local i = 1
    foreach v in `varlist' {
        tempvar tmpw
        generate `tmpw' = `v'/`tmp'
        summarize `tmpw', meanonly
        matrix M[`i++', 3] = r(mean)*100
        drop `tmpw'
    }
    return matrix M = M
end

mselect crime school hospital
return list
matrix list r(M), format(%6.2f)
Code:
r(M)[3,3]
              c1      c2      c3
   crime   45.45   29.41   45.45
  school   54.55   35.29   27.27
hospital   54.55   35.29   27.27
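As a check on the three columns (c1 = percent of respondents, c2 = percent of answers, c3 = one point per respondent split equally), here is the arithmetic for the example data: 11 respondents, all of whom selected something, with 5 selecting only crime and 6 selecting both school and hospital, for 5 + 6 + 6 = 17 answers in total:

```latex
\begin{aligned}
\text{c1:}\quad & \text{crime } 100 \cdot \tfrac{5}{11} = 45.45,
  \qquad \text{school, hospital } 100 \cdot \tfrac{6}{11} = 54.55 \\
\text{c2:}\quad & \text{crime } 100 \cdot \tfrac{5}{17} = 29.41,
  \qquad \text{school, hospital } 100 \cdot \tfrac{6}{17} = 35.29 \\
\text{c3:}\quad & \text{crime } 100 \cdot \tfrac{5 \cdot 1}{11} = 45.45,
  \qquad \text{school, hospital } 100 \cdot \tfrac{6 \cdot \frac{1}{2}}{11} = 27.27
\end{aligned}
```

Columns c2 and c3 each sum to 100 up to rounding, whereas c1 sums to 154.55 because the 11 respondents gave 17 answers. Crime has the same value in c1 and c3 because everyone who chose it chose it alone.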