How to tell if all items in a group are equal?

Madison Smith

Join Date: Jun 2019
Posts: 8


13 Jun 2019, 14:08

The data below is a simplified example to illustrate the problem I am working on. I am trying to identify, and then count, the families in which all members have the same favorite color. In this example, I would want a list of two families (Smith and Silver). For the purpose of this example, we can assume that each last name is unique, so there are not multiple Smith families.

I thought about creating a binary variable for each color and then taking the average of each one by family. If any of those averages isn't exactly 0 or 1, then there must be some differences within the family. However, I'd end up with a lot of extra variables, and this doesn't seem like the most efficient way to solve the problem. Any other suggestions?

Thanks in advance!

Family (last name)	First Name	Favorite Color
Smith	Sarah	Blue
Smith	Sally	Blue
Smith	Sue	Blue
Smith	John	Blue
Doe	Megan	Red
Doe	Jack	Purple
Johnson	Michael	Red
Johnson	Mary	Orange
Johnson	Tom	Green
Johnson	Richard	Blue
Johnson	Joe	Red
Johnson	Jimmy	Blue
Silver	Susan	Purple
Silver	James	Purple
Silver	Josh	Purple

Tags: categorical, data, label, loop, Suggestion

Nick Cox

Join Date: Mar 2014

Posts: 36061

13 Jun 2019, 14:22

This FAQ may help: https://www.stata.com/support/faqs/d...ions-in-group/

Bruce Weaver

Join Date: May 2014
Posts: 1166

14 Jun 2019, 15:48

Thanks Nick. That's much more straightforward than what I had in mind. I was thinking of using egen to compute the SD within families, and then (by family) setting variable same = SD==0. That approach works, but is far clunkier than necessary. My colleague in a discussion forum for another stats package would describe it as Rubish (in honor of Rube Goldberg).

Madison, here's an example using the data you posted.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str12(last first favcol)
"Smith"   "Sarah"   "Blue"  
"Smith"   "Sally"   "Blue"  
"Smith"   "Sue"     "Blue"  
"Smith"   "John"    "Blue"  
"Doe"     "Megan"   "Red"  
"Doe"     "Jack"    "Purple"
"Johnson" "Michael" "Red"  
"Johnson" "Mary"    "Orange"
"Johnson" "Tom"     "Green"
"Johnson" "Richard" "Blue"  
"Johnson" "Joe"     "Red"  
"Johnson" "Jimmy"   "Blue"  
"Silver"  "Susan"   "Purple"
"Silver"  "James"   "Purple"
"Silver"  "Josh"    "Purple"
end

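generate order1 = _n // preserve original order of observations
* Within each family, sort on favcol; if the first and last sorted values
* match, then every value in between must match as well
by last (favcol), sort: gen same = favcol[1] == favcol[_N]
sort order1 // restore original order of observations
list last first favcol same, sepby(last)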

Output from the -list- command:

Code:

. list last first favcol same, sepby(last)

     +-----------------------------------+
     |    last     first   favcol   same |
     |-----------------------------------|
  1. |   Smith     Sarah     Blue      1 |
  2. |   Smith     Sally     Blue      1 |
  3. |   Smith       Sue     Blue      1 |
  4. |   Smith      John     Blue      1 |
     |-----------------------------------|
  5. |     Doe     Megan      Red      0 |
  6. |     Doe      Jack   Purple      0 |
     |-----------------------------------|
  7. | Johnson   Michael      Red      0 |
  8. | Johnson      Mary   Orange      0 |
  9. | Johnson       Tom    Green      0 |
 10. | Johnson   Richard     Blue      0 |
 11. | Johnson       Joe      Red      0 |
 12. | Johnson     Jimmy     Blue      0 |
     |-----------------------------------|
 13. |  Silver     Susan   Purple      1 |
 14. |  Silver     James   Purple      1 |
 15. |  Silver      Josh   Purple      1 |
     +-----------------------------------+
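
The original question also asked for the number of such families and a list of them. One possible way to get both is sketched below; it rebuilds the posted data (first names omitted, since they are not needed here) so that it can be run on its own. The variable name famtag is hypothetical, and this is just one way to do it, not necessarily what the FAQ recommends.

```stata
* Sketch: count and list families whose members all share one favorite color
clear
input str12(last favcol)
"Smith"   "Blue"
"Smith"   "Blue"
"Smith"   "Blue"
"Smith"   "Blue"
"Doe"     "Red"
"Doe"     "Purple"
"Johnson" "Red"
"Johnson" "Orange"
"Johnson" "Green"
"Johnson" "Blue"
"Johnson" "Red"
"Johnson" "Blue"
"Silver"  "Purple"
"Silver"  "Purple"
"Silver"  "Purple"
end

* Same within-family test as in the code above
bysort last (favcol): generate byte same = favcol[1] == favcol[_N]

* famtag (hypothetical name) flags exactly one observation per family
egen byte famtag = tag(last)

* Number of families in which everyone agrees (left in r(N))
count if famtag & same

* One line per qualifying family
list last if famtag & same, noobs
```

With the posted data, -count- should report 2, and the listing should show Smith and Silver, which matches what the original post expected.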

PS- Re the Rubish approach described above, I forgot to say that one would have to convert the string variable for color into a numeric variable before computing the SD. E.g.,

Code:

// Generate numeric version of favorite color variable
encode favcol, generate(color)
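
For completeness, here is a minimal sketch of how the SD-based approach might be finished off. It rebuilds the posted data so that it runs on its own, and the names color_num, sd_color and same_sd are hypothetical. It assumes every family has at least two members: -egen-'s sd() returns missing for a group with a single observation, so a one-person family would be flagged 0 here even though it is trivially "all the same".

```stata
* Sketch of the SD-based approach (clunkier than comparing first and last values)
clear
input str12(last favcol)
"Smith"   "Blue"
"Smith"   "Blue"
"Smith"   "Blue"
"Smith"   "Blue"
"Doe"     "Red"
"Doe"     "Purple"
"Johnson" "Red"
"Johnson" "Orange"
"Johnson" "Green"
"Johnson" "Blue"
"Johnson" "Red"
"Johnson" "Blue"
"Silver"  "Purple"
"Silver"  "Purple"
"Silver"  "Purple"
end

* Numeric version of the string variable (codes assigned alphabetically)
encode favcol, generate(color_num)

* Within-family standard deviation of the numeric codes
egen double sd_color = sd(color_num), by(last)

* SD of exactly 0 means every member has the same code;
* a missing SD (single-member family) evaluates to 0 here
generate byte same_sd = (sd_color == 0)

list last favcol same_sd, sepby(last)
```

The numeric codes carry no meaning of their own; the SD is used only as a check on whether they vary within a family.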

Last edited by Bruce Weaver; 14 Jun 2019, 15:58. Reason: Added the postscript.

--
Bruce Weaver
Version: Stata/MP 19.5 (Windows)
