Listing observations in a group that differ on non-missing values of a given string variable

tig som

Join Date: Sep 2022

Posts: 58
#1

06 Oct 2022, 07:57

I am working on a command similar to the one in this FAQ:
Stata | FAQ: Listing observations in a group that differ on a variable
However, what if egenotype is missing for some observations and I don’t want Stata to report those cases as differing? How can I adapt the FAQ's commands?
To make my question clear, I have changed the observations in the STATA example above as follows:

eid egenotype
0 vv
0 .
1 vv
1 ww
2 ww
2 vv
2 .

I want STATA to list only those samples that differ in non-missing values of the variable egenotype for each individual.
If I use the commands from the STATA link above, that is:
by eid (egenotype), sort: gen diff = egenotype[1] != egenotype[_N]
list eid egenotype if diff
then STATA reports eid 0, 1, and 2 as having differing genotypes. However, I don’t want STATA to count the difference observed within eid 0 as a difference, because it is only a difference between “vv” and “.”. How do I rearrange the above commands to list only samples that differ in non-missing values of egenotype?
Thank you
Nick Cox

Join Date: Mar 2014

Posts: 36053
#2

06 Oct 2022, 09:43

Segregate the missings so that they are ignored.

Code:

gen ismissing = missing(egenotype)
bysort ismissing eid (egenotype) : gen diff = egenotype[1] != egenotype[_N]
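
For anyone who wants to try this on the example data from #1, here is a minimal sketch. It assumes egenotype is a string variable whose missing values are empty strings (""), which is what missing() detects for strings; a literal "." stored as text would not count as missing. The variable anydiff and the final egen step are illustrative additions, not part of the code above.

```stata
* Rebuild the example data from post #1.
* Assumption: missing genotypes are stored as empty strings.
clear
input byte eid str2 egenotype
0 "vv"
0 ""
1 "vv"
1 "ww"
2 "ww"
2 "vv"
2 ""
end

* Flag missing genotypes so that they fall into their own groups.
gen ismissing = missing(egenotype)

* Within each (ismissing, eid) group, sort on genotype and
* compare the first and last values.
bysort ismissing eid (egenotype) : gen diff = egenotype[1] != egenotype[_N]

* Optional (hypothetical extra step): copy the flag to every
* observation of an eid, including those with missing genotype.
bysort eid : egen anydiff = max(diff)

sort eid egenotype
list eid egenotype diff anydiff, sepby(eid)
```

Under those assumptions, diff should be 1 only for the non-missing observations of eid 1 and eid 2, and 0 for everything in eid 0. Observations with missing egenotype get diff equal to 0 because they are compared only with each other, so list eid egenotype if diff omits them; use anydiff instead if the whole group should be listed.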

Please see also https://www.statalist.org/forums/help#spelling

Last edited by Nick Cox; 06 Oct 2022, 10:25.
tig som

Join Date: Sep 2022

Posts: 58
#3

06 Oct 2022, 10:24

Thank you very much, Nick Cox!!