The data below is a simplified example to illustrate the problem I am working on. I am trying to identify, and count, the families in which every member has the same favorite color. In this example, I would want a list of two families (Smith and Silver). For the purposes of this example, assume that each last name is unique, so there is only one Smith family.
I thought about creating a binary variable for each color and then taking the average by family. If any of those averages is neither 0 nor 1, then there must be some differences within the family. However, I'd end up with a lot of extra variables, and this doesn't seem like the most efficient way to solve the problem. Any other suggestions?
Thanks in advance!
Family (last name) | First Name | Favorite Color |
Smith | Sarah | Blue |
Smith | Sally | Blue |
Smith | Sue | Blue |
Smith | John | Blue |
Doe | Megan | Red |
Doe | Jack | Purple |
Johnson | Michael | Red |
Johnson | Mary | Orange |
Johnson | Tom | Green |
Johnson | Richard | Blue |
Johnson | Joe | Red |
Johnson | Jimmy | Blue |
Silver | Susan | Purple |
Silver | James | Purple |
Silver | Josh | Purple |
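
To make the desired result concrete, here is a minimal sketch in Python (standard library only) of one alternative to the binary-variable idea: collect the distinct colors per family and keep the families that have exactly one. The choice of Python, the `rows` list, and the function name `uniform_families` are assumptions made for illustration; the same "count distinct values by group" logic should carry over to other tools.

```python
from collections import defaultdict

# Rows copied from the example table: (family, first name, favorite color).
rows = [
    ("Smith", "Sarah", "Blue"),
    ("Smith", "Sally", "Blue"),
    ("Smith", "Sue", "Blue"),
    ("Smith", "John", "Blue"),
    ("Doe", "Megan", "Red"),
    ("Doe", "Jack", "Purple"),
    ("Johnson", "Michael", "Red"),
    ("Johnson", "Mary", "Orange"),
    ("Johnson", "Tom", "Green"),
    ("Johnson", "Richard", "Blue"),
    ("Johnson", "Joe", "Red"),
    ("Johnson", "Jimmy", "Blue"),
    ("Silver", "Susan", "Purple"),
    ("Silver", "James", "Purple"),
    ("Silver", "Josh", "Purple"),
]


def uniform_families(records):
    """Return the families whose members all share one favorite color."""
    # Hypothetical helper: map each family to the set of colors seen in it.
    colors_by_family = defaultdict(set)
    for family, _first_name, color in records:
        colors_by_family[family].add(color)
    # A family qualifies when its set holds exactly one distinct color.
    return [family for family, colors in colors_by_family.items()
            if len(colors) == 1]


matches = uniform_families(rows)
print(matches)       # ['Smith', 'Silver']
print(len(matches))  # 2
```

This avoids creating one indicator variable per color: the only intermediate result is one set of colors per family.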