  • Listing observations in a group that differ on non-missing values of a given string variable

    I am working on a command similar to the one in this FAQ:
    Stata | FAQ: Listing observations in a group that differ on a variable
    However, what if egenotype is missing for some observations and I don’t want Stata to report those cases as differing? How can I adapt the commands?
    To make my question clear, I have changed the observations in the FAQ's example as follows:


    eid egenotype
    0 vv
    0 .
    1 vv
    1 ww
    2 ww
    2 vv
    2 .

    I want STATA to list only those samples that differ in non-missing values of the variable egenotype for each individual.
    If I use the commands from the FAQ linked above, that is:
    by eid (egenotype), sort: gen diff = egenotype[1] != egenotype[_N]
    . list eid egenotype if diff
    Then STATA reports eid 0, 1, and 2 as having differing genotypes. However, I don’t want STATA to treat eid 0 as differing, because the only difference there is between “vv” and “.”. How do I modify the above command to list only samples that differ in non-missing values of egenotype?
    Thank you

  • #2
    Segregate the missings so that they are ignored.

    Code:
    gen ismissing = missing(egenotype)
    bysort ismissing eid (egenotype) : gen diff = egenotype[1] != egenotype[_N]
    Please see also https://www.statalist.org/forums/help#spelling
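
For anyone who wants to try this on the example data, here is a minimal self-contained sketch. It assumes that egenotype is a string variable and that the missing genotypes are stored as empty strings (""), which is what missing() detects for strings; if they were stored as the literal text ".", missing() would not flag them. The byte storage types and the sepby() option are illustrative choices, not part of the original answer.

```stata
* Example data from the question, assuming missing genotypes are empty strings
clear
input byte eid str2 egenotype
0 "vv"
0 ""
1 "vv"
1 "ww"
2 "ww"
2 "vv"
2 ""
end

* 1 for observations with a missing (empty-string) genotype, 0 otherwise
gen byte ismissing = missing(egenotype)

* Missing and non-missing observations form separate groups for each eid;
* within each group, sort on egenotype and compare the first value with the last
bysort ismissing eid (egenotype) : gen byte diff = egenotype[1] != egenotype[_N]

* Only non-missing observations of eids with differing genotypes have diff == 1
list eid egenotype if diff, sepby(eid)
```

With these data the listing should show only the non-missing observations for eid 1 and eid 2. The missing observations end up in groups of their own, where the first and last values are identical, so diff is 0 for them and eid 0 is no longer reported.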
    Last edited by Nick Cox; 06 Oct 2022, 10:25.


    • #3
      Thank you very much, Nick Cox!!
