The data below is a simplified example to illustrate the problem I am working on. I am trying to identify, and then count, the families in which all members have the same favorite color. In this example, I would want a list of two families (Smith and Silver). For the purposes of this example, assume that each last name is unique, so there are not multiple Smith families.
I thought about creating a binary variable for each color and then taking the average by family. If the average isn't 0 or 1, then there must be some differences within the family. I'd end up with a lot of extra variables and this doesn't seem like the most efficient way to solve the problem. Any other suggestions?
Thanks in advance!
| Family (last name) | First Name | Favorite Color |
| Smith | Sarah | Blue |
| Smith | Sally | Blue |
| Smith | Sue | Blue |
| Smith | John | Blue |
| Doe | Megan | Red |
| Doe | Jack | Purple |
| Johnson | Michael | Red |
| Johnson | Mary | Orange |
| Johnson | Tom | Green |
| Johnson | Richard | Blue |
| Johnson | Joe | Red |
| Johnson | Jimmy | Blue |
| Silver | Susan | Purple |
| Silver | James | Purple |
| Silver | Josh | Purple |
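
For reference, one possible alternative to the per-color indicator idea is to collect the distinct favorite colors per family and keep the families that have exactly one. The sketch below is in plain Python purely for illustration; the post does not say which tool is in use, and the names `rows` and `families_with_one_color` are hypothetical. It assumes, as stated above, that a last name identifies a single family.

```python
from collections import defaultdict

# Rows copied from the example table: (family, first name, favorite color).
rows = [
    ("Smith", "Sarah", "Blue"),
    ("Smith", "Sally", "Blue"),
    ("Smith", "Sue", "Blue"),
    ("Smith", "John", "Blue"),
    ("Doe", "Megan", "Red"),
    ("Doe", "Jack", "Purple"),
    ("Johnson", "Michael", "Red"),
    ("Johnson", "Mary", "Orange"),
    ("Johnson", "Tom", "Green"),
    ("Johnson", "Richard", "Blue"),
    ("Johnson", "Joe", "Red"),
    ("Johnson", "Jimmy", "Blue"),
    ("Silver", "Susan", "Purple"),
    ("Silver", "James", "Purple"),
    ("Silver", "Josh", "Purple"),
]


def families_with_one_color(records):
    """Return the families whose members all share one favorite color."""
    # Hypothetical helper: gather the set of distinct colors per family.
    colors_by_family = defaultdict(set)
    for family, _first_name, color in records:
        colors_by_family[family].add(color)
    # A family qualifies when its set holds exactly one color.
    return [
        family
        for family, colors in colors_by_family.items()
        if len(colors) == 1
    ]


uniform = families_with_one_color(rows)
print(uniform)       # ['Smith', 'Silver']
print(len(uniform))  # 2
```

This avoids creating one extra variable per color: the only intermediate result is a count of distinct values per family, which most data tools can compute with a group-by and a distinct count.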

