Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Count distinct observations per row for several variables

    Dear Users,

    I have a dataset that contains many variables, but in this instance 84 variables of interest (cal_MeX where X=11-17, 21-27, 31-37, 41-47; cal_MiX where X=11-17, 21-27, 31-37, 41-47; cal_DisX where X=11-17, 21-27, 31-37, 41-47 - in other words (cal_Me11-cal_Dis47) (representing 84 variables). Each participant (row) has an observation for each of these variables (although there are some missing within each row).

    Each variable represents a depth measurement (integer) at a separate site, and the integers range from -3 to 6.

    I am trying to write syntax to count the number of observations that ==2. I am hoping to generate a variable (nsites2) that when tabulated will give a frequency distribution of the number of observations that ==2 per row e.g.

    tab nsites2 // would hopefully give:

    ==2 Freq.
    0 x
    1 y
    2 z

    and so on, with x, y, z indicating how many observations ==2 there are per row in the dataset.

    I have tried egen anycount; egen count; egen rowtotal, (followed by one of the if statements below), but it appears that a cumulative total/sum results, not a count of the observations.

    Also, I am having some trouble with how to indicate the range of variables of interest. I have tried:

    if (cal_Me11-cal_Dis47) ==2

    and also

    if inlist(2, cal_Me11, cal_Me12, cal_Me13, cal_Me14, cal_Me15, cal_Me16, cal_Me17, cal_Me21, cal_Me22, cal_Me23, cal_Me24, cal_Me25, cal_Me26, cal_Me27, cal_Me31, cal_Me32, cal_Me33, cal_Me34, cal_Me35, cal_Me36, cal_Me37, cal_Me41, cal_Me42, cal_Me43, cal_Me44, cal_Me45, cal_Me46, cal_Me47, cal_Mi11, cal_Mi12, cal_Mi13, cal_Mi14, cal_Mi15, cal_Mi16, cal_Mi17, cal_Mi21, cal_Mi22, cal_Mi23, cal_Mi24, cal_Mi25, cal_Mi26, cal_Mi27, cal_Mi31, cal_Mi32, cal_Mi33, cal_Mi34, cal_Mi35, cal_Mi36, cal_Mi37, cal_Mi41, cal_Mi42, cal_Mi43, cal_Mi44, cal_Mi45, cal_Mi46, cal_Mi47, cal_Dis11, cal_Dis12, cal_Dis13, cal_Dis14, cal_Dis15, cal_Dis16, cal_Dis17, cal_Dis21, cal_Dis22, cal_Dis23, cal_Dis24, cal_Dis25, cal_Dis26, cal_Dis27, cal_Dis31, cal_Dis32, cal_Dis33, cal_Dis34, cal_Dis35, cal_Dis36, cal_Dis37, cal_Dis41, cal_Dis42, cal_Dis43, cal_Dis44, cal_Dis45, cal_Dis46, cal_Dis47)

    but these each give different results (or an invalid syntax or a code 198 error); therefore, I am not sure which syntax to use.

    I apologise if this question has been asked before (I have searched for hours and could not find a solution that fits my particular problem with such a long lost of variables of interest).
    Please let me know if I need to provide more information.

    Thank you for your time and help, it is much appreciated.

  • #2
    Please do study http://www.statalist.org/forums/help#stata on the strong importance of concrete data examples and showing exactly what code you tried. In each case you've given part of a command only. You'd need to give a simplified data example, naturally.

    As I understand it you want to count occurrences of 2 in values that are in each observation across a set of variables.

    Note that in Stata an observation is an entire row of the dataset, not the value of a single variable.

    Hence this appears to be similar to your problem:

    Code:
    . clear
    
    . input id y1 y2 y3
    
                id         y1         y2         y3
      1. 1  1 2 3
      2. 2  2 2 3
      3. 3  2 2 2
      4. end
    
    .
    . egen count2 = anycount(y1-y3), value(2)
    
    .
    . list
    
         +----------------------------+
         | id   y1   y2   y3   count2 |
         |----------------------------|
      1. |  1    1    2    3        1 |
      2. |  2    2    2    3        2 |
      3. |  3    2    2    2        3 |
         +----------------------------+
    Note that

    Code:
    if (cal_Me11-cal_Dis47) ==2
    as an if command means

    Code:
    if (cal_Me11[1]-cal_Dis4[1]) ==2  <whatever>
    and as an if qualifier means

    Code:
    <whatever> if (cal_Me11-cal_Dis4) ==2
    and either way pivots on the difference, cal_Me11 minus calDis4, because either form of if takes an expression as argument, not a varlist

    Your inlist() call could only tell you whether there are any occurrences of 2 in an observation; it wouldn't count them for you.

    In your case, I think you want something like


    Code:
    egen count2 = anycount(ca*), value(2)
    and in this context a varlist is not only allowed, but compulsory.
    Last edited by Nick Cox; 29 May 2016, 09:22.

    Comment


    • #3
      You can try something like this:

      gen count = 0
      foreach var of varlist yourfirstvariable - yourlastvariable {
      replace count = count + 1 if `var' == 2
      }

      Comment


      • #4
        Cyrus's approach is what egen, anycount() does, reconstructed from first principles. In practice you would want to slap quietly on the loop.

        Comment


        • #5
          Hi Nick & Cyrus

          Thank you very much for your help.

          Sorry for my poorly written post.

          It appears that Nick's latest suggestion did the trick:

          egen count2 = anycount(ca*), value(2)


          Thanks again.

          Comment

          Working...
          X