
  • _n not working as a running counter within a by-group

    Dear Statalist,

    I normally use _n within a by-group as the running counter within the group. Options 1 and 2 below are examples of this. However, it does not seem to work as it should in option 3. It seems to me that option 3 should give the same results as option 1, but the max() function appears to use the running counter of the complete dataset instead. I include option 2 to show that this is not the case for other egen functions.

    I'm using Stata version 14.2.

    Any ideas?

    Code:
    clear
    set obs 10
    gen group=ceil(_n*2/_N)
    
    gen n=_n
    sort group n
    
    *Option 1
    by group: gen n1=_n //Works as running counter within a by-group
    by group: egen o1=max(n1)
    
    *Option 2:
    by group: egen o2=mean(_n) //Works as running counter within a by-group
    
    *Option 3:
    by group: egen o3=max(_n) //Works as running counter within the complete dataset

    Output:
         +--------------------------------+
         | group    n   n1   o1   o2   o3 |
         |--------------------------------|
      1. |     1    1    1    5    3    5 |
      2. |     1    2    2    5    3    5 |
      3. |     1    3    3    5    3    5 |
      4. |     1    4    4    5    3    5 |
      5. |     1    5    5    5    3    5 |
      6. |     2    6    1    5    3   10 |
      7. |     2    7    2    5    3   10 |
      8. |     2    8    3    5    3   10 |
      9. |     2    9    4    5    3   10 |
     10. |     2   10    5    5    3   10 |
         +--------------------------------+
        

    Thanks!

    Francisco Pardo

  • #2
    The problem with

    Code:
     
     by group: egen o3=max(_n) //Works as running counter within the complete dataset
    is twofold. First, and more important, running egen on subscripts is contrary to explicit advice in the help. Here that is:

    Explicit subscripting (using _N and _n), which is commonly used with generate, should not be used with egen.
    Second, if you look at what max() does with what you feed it (viewsource _gmax.ado), it boils down to a first step that uses the observation number within the entire dataset:

    Code:
    gen double `x' = _n
    and then -- given your by group: -- it finds the maximum of that variable within each group.
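The two steps just described can be mimicked by hand. What follows is an illustrative sketch, not the actual code of _gmax.ado: it recreates the data from the first post, runs Option 3, then generates the whole-dataset observation number itself and takes its maximum within each group. The variable names whole_n and o3_check are hypothetical. If the description above is right, the final assert should pass.

```stata
* Recreate the example data from the first post
clear
set obs 10
gen group = ceil(_n*2/_N)
sort group

* Option 3 as posted: egen is fed the subscript _n
by group: egen o3 = max(_n)

* Step 1 by hand: generate evaluates _n without any by-group,
* so it holds the observation number within the whole dataset
gen double whole_n = _n

* Step 2: take the maximum of that variable within each group
by group: egen o3_check = max(whole_n)

* The hand-made version reproduces Option 3 (5 in group 1, 10 in group 2)
assert o3_check == o3
```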

    Hence in effect Stata warns you against doing what you did. Being surprised at what it does if you insist is secondary.
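Following the advice quoted from the help, a minimal sketch keeps _n and _N inside generate, where the by: prefix makes them group-relative, and passes egen only ordinary variables. The data are again those of the first post; the names n1, size and o1 are illustrative. Since the running counter within a group ends at the group size, by group: gen size = _N gives the same number as Option 1 without needing egen at all.

```stata
* Recreate the example data from the first post
clear
set obs 10
gen group = ceil(_n*2/_N)
sort group

* Under by:, generate evaluates _n and _N within each group
by group: gen n1 = _n
by group: gen size = _N

* egen sees only an ordinary variable, never a subscript
by group: egen o1 = max(n1)

* The largest running counter in a group is the group size
assert o1 == size
```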




    • #3
      Thanks, Nick. You are right; that clears things up.
