I am working on a command similar to the one in this FAQ:
Stata | FAQ: Listing observations in a group that differ on a variable
However, what if egenotype is missing for some observations, and I don’t want Stata to report this case as similar?
How can I use such commands?
To make my question clear, I have changed the observations in the Stata example above as follows:
. dataex
----------------------- copy starting from the next line -----------------------
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte eid str2 egenotype
0 "vv"
0 ""
0 ""
1 "ww"
1 "ww"
1 ""
1 ""
2 "vv"
2 "ww"
2 ""
3 "ww"
3 "ww"
end
------------------ copy up to and including the previous line ------------------
Listed 12 out of 12 observations
I want Stata to list only those individuals (eid) whose non-missing values of egenotype are all the same.
If I use the command from the FAQ linked above, that is:
by eid (egenotype), sort: gen same = egenotype[1] == egenotype[_N]
. list eid egenotype if same
     +----------------+
     | eid   egenot~e |
     |----------------|
 11. |   3         ww |
 12. |   3         ww |
     +----------------+
----------------------- copy starting from the next line -----------------------
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte eid str2 egenotype float same
0 ""   0
0 ""   0
0 "vv" 0
1 ""   0
1 ""   0
1 "ww" 0
1 "ww" 0
2 ""   0
2 "vv" 0
2 "ww" 0
3 "ww" 1
3 "ww" 1
end
------------------ copy up to and including the previous line ------------------
Listed 12 out of 12 observations
As you can see from the result above, Stata reports only eid 3 as having the same genotype in all of its observations. However, I also want eid 1 to be flagged, because its non-missing values of egenotype are all the same; it differs only in that some of its observations are missing.
How can I modify the command above so that it lists the individuals whose non-missing values of egenotype are all the same?
It would be great to have your tips.
Thank you.
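One possible approach, offered as a sketch rather than a definitive answer: sort the missing values into their own subgroup within each eid, compare the first and last values only inside the non-missing subgroup, and then copy the result to every observation of that eid. The variable names ismiss and nvalid are my own additions for illustration. The sketch assumes that "missing" means the empty string "" in egenotype, and that an individual should have at least two non-missing values to count as "the same" (otherwise eid 0, with a single non-missing value, would qualify trivially).

```stata
* Example data as posted above
clear
input byte eid str2 egenotype
0 "vv"
0 ""
0 ""
1 "ww"
1 "ww"
1 ""
1 ""
2 "vv"
2 "ww"
2 ""
3 "ww"
3 "ww"
end

* Flag missing values; missing() is 1 for an empty string
gen byte ismiss = missing(egenotype)

* Compare first and last values among the non-missing observations only
bysort eid ismiss (egenotype): gen byte same = egenotype[1] == egenotype[_N] if !ismiss

* Spread the result to all observations of each eid
* (numeric missing sorts last, so same[1] is the non-missing result if there is one)
bysort eid (same): replace same = same[1]

* Count non-missing values per eid (assumption: require at least two)
egen byte nvalid = total(!ismiss), by(eid)

* Test same == 1 explicitly: a bare "if same" is also true when same is missing
list eid egenotype if same == 1 & nvalid >= 2, sepby(eid)
```

With the example data this should list the observations of eid 1 and eid 3. Dropping the nvalid >= 2 condition would also list eid 0, and adding & !ismiss would hide the observations with missing egenotype from the listing.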