Count distinct observations per row for several variables

William Fogarty

Join Date: May 2016

Posts: 2
#1

Count distinct observations per row for several variables

29 May 2016, 08:18

Dear Users,

I have a dataset that contains many variables, but in this instance 84 variables of interest (cal_MeX where X=11-17, 21-27, 31-37, 41-47; cal_MiX where X=11-17, 21-27, 31-37, 41-47; cal_DisX where X=11-17, 21-27, 31-37, 41-47 - in other words (cal_Me11-cal_Dis47) (representing 84 variables). Each participant (row) has an observation for each of these variables (although there are some missing within each row).

Each variable represents a depth measurement (integer) at a separate site, and the integers range from -3 to 6.

I am trying to write syntax to count the number of observations that ==2. I am hoping to generate a variable (nsites2) that when tabulated will give a frequency distribution of the number of observations that ==2 per row e.g.

tab nsites2 // would hopefully give:

==2 Freq.
0 x
1 y
2 z

and so on, with x, y, z indicating how many observations ==2 there are per row in the dataset.

I have tried egen anycount; egen count; egen rowtotal, (followed by one of the if statements below), but it appears that a cumulative total/sum results, not a count of the observations.

Also, I am having some trouble with how to indicate the range of variables of interest. I have tried:

if (cal_Me11-cal_Dis47) ==2

and also

if inlist(2, cal_Me11, cal_Me12, cal_Me13, cal_Me14, cal_Me15, cal_Me16, cal_Me17, cal_Me21, cal_Me22, cal_Me23, cal_Me24, cal_Me25, cal_Me26, cal_Me27, cal_Me31, cal_Me32, cal_Me33, cal_Me34, cal_Me35, cal_Me36, cal_Me37, cal_Me41, cal_Me42, cal_Me43, cal_Me44, cal_Me45, cal_Me46, cal_Me47, cal_Mi11, cal_Mi12, cal_Mi13, cal_Mi14, cal_Mi15, cal_Mi16, cal_Mi17, cal_Mi21, cal_Mi22, cal_Mi23, cal_Mi24, cal_Mi25, cal_Mi26, cal_Mi27, cal_Mi31, cal_Mi32, cal_Mi33, cal_Mi34, cal_Mi35, cal_Mi36, cal_Mi37, cal_Mi41, cal_Mi42, cal_Mi43, cal_Mi44, cal_Mi45, cal_Mi46, cal_Mi47, cal_Dis11, cal_Dis12, cal_Dis13, cal_Dis14, cal_Dis15, cal_Dis16, cal_Dis17, cal_Dis21, cal_Dis22, cal_Dis23, cal_Dis24, cal_Dis25, cal_Dis26, cal_Dis27, cal_Dis31, cal_Dis32, cal_Dis33, cal_Dis34, cal_Dis35, cal_Dis36, cal_Dis37, cal_Dis41, cal_Dis42, cal_Dis43, cal_Dis44, cal_Dis45, cal_Dis46, cal_Dis47)

but these each give different results (or an invalid syntax or a code 198 error); therefore, I am not sure which syntax to use.

I apologise if this question has been asked before (I have searched for hours and could not find a solution that fits my particular problem with such a long lost of variables of interest).
Please let me know if I need to provide more information.

Thank you for your time and help, it is much appreciated.
Tags: count, observations, several variables
Nick Cox

Join Date: Mar 2014

Posts: 35698
#2

29 May 2016, 08:37

Please do study http://www.statalist.org/forums/help#stata on the strong importance of concrete data examples and showing exactly what code you tried. In each case you've given part of a command only. You'd need to give a simplified data example, naturally.

As I understand it you want to count occurrences of 2 in values that are in each observation across a set of variables.

Note that in Stata an observation is an entire row of the dataset, not the value of a single variable.

Hence this appears to be similar to your problem:

Code:

. clear . input id y1 y2 y3 id y1 y2 y3 1. 1 1 2 3 2. 2 2 2 3 3. 3 2 2 2 4. end . . egen count2 = anycount(y1-y3), value(2) . . list +----------------------------+ | id y1 y2 y3 count2 | |----------------------------| 1. | 1 1 2 3 1 | 2. | 2 2 2 3 2 | 3. | 3 2 2 2 3 | +----------------------------+

Note that

Code:

if (cal_Me11-cal_Dis47) ==2

as an if command means

Code:

if (cal_Me11[1]-cal_Dis4[1]) ==2 <whatever>

and as an if qualifier means

Code:

<whatever> if (cal_Me11-cal_Dis4) ==2

and either way pivots on the difference, cal_Me11 minus calDis4, because either form of if takes an expression as argument, not a varlist

Your inlist() call could only tell you whether there are any occurrences of 2 in an observation; it wouldn't count them for you.

In your case, I think you want something like

Code:

egen count2 = anycount(ca*), value(2)

and in this context a varlist is not only allowed, but compulsory.

Last edited by Nick Cox; 29 May 2016, 09:22.
Comment
Cyrus Levy

Join Date: Nov 2014

Posts: 99
#3

29 May 2016, 08:39

You can try something like this:

gen count = 0
foreach var of varlist yourfirstvariable - yourlastvariable {
replace count = count + 1 if `var' == 2
}
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35698
#4

29 May 2016, 09:24

Cyrus's approach is what egen, anycount() does, reconstructed from first principles. In practice you would want to slap quietly on the loop.
Comment
William Fogarty

Join Date: May 2016

Posts: 2
#5

29 May 2016, 15:48

Hi Nick & Cyrus

Thank you very much for your help.

Sorry for my poorly written post.

It appears that Nick's latest suggestion did the trick:

egen count2 = anycount(ca*), value(2)

Thanks again.
Comment

Announcement

Count distinct observations per row for several variables

Comment

Comment

Comment

Comment