  • Counter of unique observations contingent on a second variable

    Hi all,

    I have a question regarding building a counter as a new variable. If we consider the following table (I tried to use dataex, is this a correct use of it?):

    input int(A B C)
    1 1 1
    1 2 2
    1 2 2
    1 3 3
    1 3 3
    1 3 3
    1 4 4
    1 4 4
    2 5 1
    2 5 1
    2 6 2
    2 5 2
    2 7 3
    end

    The first two variables are as given in my dataset. I added the third variable manually; it is the variable I would like to create. It should count the unique values of variable B encountered so far within each value of variable A. For example, variable A takes the value 1 in the first eight observations. For these observations I would like a counter that starts at 1 and increases by one for each new unique value of variable B. The same process starts again when A takes the value 2: the counter restarts at 1 and increases by one for each new unique value of B.

    I would appreciate any help. Thank you.

  • #2
    Code:
    gen uniqueCounter = 1
    bysort A (B): replace uniqueCounter = uniqueCounter[_n-1] + 1 * (B != B[_n-1]) if _n != 1
    Does that do what you want? The second line operates within each value of A and sorts the data by B within those groups. The counter takes the value in the previous row (uniqueCounter[_n-1]) and adds one if the current value of B differs from the value of B in the previous row (B[_n-1]). The second line does nothing for the first observation within each A-group, because there B[_n-1] would be missing (that's the if _n != 1 part).
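
    To see why the first observation in each group needs special handling, here is a minimal sketch on a shortened version of the example data. The variable names pos and prevB are illustrative and not part of the thread.

    ```stata
    clear
    input int(A B)
    1 1
    1 2
    1 2
    2 5
    2 6
    end

    * Under by:, _n counts observations within each A group
    bysort A (B): gen pos = _n
    * B[_n-1] refers to the previous row of the same group;
    * for the first row of a group it evaluates to missing
    bysort A (B): gen prevB = B[_n-1]

    list, sepby(A)
    ```

    prevB should be missing wherever pos is 1, which is the case the if _n != 1 qualifier skips.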

    • #3
      This is longer but may be a little easier to think through than Jesse's code.

      The trick is to tag the first occurrence of each distinct B within A with 1 (other observations being tagged with 0) and then to add them up.

      The trickery is to get the sort order right. This wasn't my first attempt.

      In a real problem you may have an identifier already, say a time variable. Given the example, I had to invent one.

      Code:
      clear
      
      input int(A B C)
      1 1 1
      1 2 2
      1 2 2
      1 3 3
      1 3 3
      1 3 3
      1 4 4
      1 4 4
      2 5 1
      2 5 1
      2 6 2
      2 5 2
      2 7 3
      end
      
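      * Record the original order so it can be restored after sorting
      gen long id = _n
      * Tag the first occurrence of each distinct B within A (1 if first, 0 otherwise)
      bysort A B (id): gen wanted = _n == 1
      * Running sum of the tags within A, in the original order
      bysort A (id): replace wanted = sum(wanted)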
      
      list, sepby(A)
      
      
           +-------------------------+
           | A   B   C   id   wanted |
           |-------------------------|
        1. | 1   1   1    1        1 |
        2. | 1   2   2    2        2 |
        3. | 1   2   2    3        2 |
        4. | 1   3   3    4        3 |
        5. | 1   3   3    5        3 |
        6. | 1   3   3    6        3 |
        7. | 1   4   4    7        4 |
        8. | 1   4   4    8        4 |
           |-------------------------|
        9. | 2   5   1    9        1 |
       10. | 2   5   1   10        1 |
       11. | 2   6   2   11        2 |
       12. | 2   5   2   12        2 |
       13. | 2   7   3   13        3 |
           +-------------------------+
      See also the FAQ "How do I calculate the number of distinct values seen so far?"
      https://www.stata.com/support/faqs/d...stinct-values/
      Last edited by Nick Cox; 05 Jul 2018, 06:37.
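
      Since the example data include the hand-built counter C, one way to confirm the result without reading the listing is to compare the two variables directly. This is a sketch that repeats the data and code above so it can be run on its own; the assert step is an addition, not part of the original post.

      ```stata
      clear
      input int(A B C)
      1 1 1
      1 2 2
      1 2 2
      1 3 3
      1 3 3
      1 3 3
      1 4 4
      1 4 4
      2 5 1
      2 5 1
      2 6 2
      2 5 2
      2 7 3
      end

      gen long id = _n
      bysort A B (id): gen wanted = _n == 1
      bysort A (id): replace wanted = sum(wanted)

      * assert exits with an error if the condition fails for any observation
      assert wanted == C
      ```

      On these data the assertion passes silently, in line with the listing above.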

      • #4
        1. Jesse's solution can be written in one line, which may also make the logic easier to explain.
        Code:
        bys A (B): gen D = sum(B != B[_n-1])
        2. The output of this solution differs from Nick's at observation 12, which leaves me unsure what Filipp is looking for.
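
        The difference can be isolated by computing both counters side by side on the example data. This is a sketch only: the names first and nick are illustrative, id follows #3, and D follows the one-line code above.

        ```stata
        clear
        input int(A B)
        1 1
        1 2
        1 2
        1 3
        1 3
        1 3
        1 4
        1 4
        2 5
        2 5
        2 6
        2 5
        2 7
        end

        * Keep the original order as an identifier, as in #3
        gen long id = _n

        * Approach in #3: distinct values of B seen so far, in the original order
        bysort A B (id): gen byte first = _n == 1
        bysort A (id): gen nick = sum(first)

        * One-line approach: rank of each distinct B value after sorting on B
        bysort A (B): gen D = sum(B != B[_n-1])

        * Restore the original order and show where the two counters disagree
        sort id
        list id A B nick D if nick != D
        ```

        Only the observation with id 12 (A = 2, B = 5) is listed: B = 5 has already been seen at that point, so nick stays at 2, whereas sorting on B places it with the first distinct value, so D is 1.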

        • #5
          For why people should (usually) say "distinct", not "unique", see https://www.stata-journal.com/sjpdf....iclenum=dm0042 p.558

          • #6
            Very “enjoyable” is your paper, Nick sensei.

            And your solution at #3, given Fillip’s description at #1, is not merely the easier one, as you said, but exactly the right solution for Fillip. Jesse’s and mine go in a different (if not wrong) direction.

            • #7
              Romalpa: Thanks for your very nice remarks.
