  • alternative to using by and egen functions to group variables

    Hi
    I am trying to delete rows in my dataset where the patid and obsdate are the same. I have attempted to group them using this syntax:

    by patid: egen group1 = group(patid obsdate)

    but I get this error message:
    egen ... group() may not be combined with by
    r(190);

    If I use this syntax instead:

    egen group1 = group(patid obsdate)

    it just numbers all of the obsdates, but the numbering doesn't restart within each patid.

    What I would like is for the numbering to restart with each new patid.



    patid obsdate
    7320018 19may2018
    7320018 19may2018
    7320018 04jun2018
    7320018 04jun2018
    16220018 01jan1984
    16220018 01jan1984
    16220018 16may2001
    16220018 16may2001
    16220018 21dec2001
    16220018 21dec2001
    16220018 14feb2003
    16220018 14feb2003


    Thanks

    Jennifer

  • #2
    Code:
    by patid (obsdate), sort: gen wanted = _n


    • #3
      Jennifer:
      See -help duplicates-.
      Kind regards,
      Carlo
      (Stata 19.0)
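
      A minimal sketch of where that help file leads, assuming the aim is to keep one row per patid and obsdate pair. The data are the example rows from this thread, entered as numeric daily dates; nothing here comes from the real dataset.

      ```stata
      * Example rows from this thread (obsdate as numeric daily dates)
      clear
      input long patid float obsdate
       7320018 21323
       7320018 21323
       7320018 21339
       7320018 21339
      16220018  8766
      16220018  8766
      16220018 15111
      16220018 15111
      16220018 15330
      16220018 15330
      16220018 15750
      16220018 15750
      end
      format %td obsdate

      * Report how many surplus copies exist for each patid-obsdate pair
      duplicates report patid obsdate

      * Keep the first row of each pair; force is required when a varlist is given
      duplicates drop patid obsdate, force
      ```

      The force option discards whatever the dropped rows hold in other variables, so it may be worth running duplicates report first.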


      • #4
        Hi Clyde,

        Thanks for your reply. This code generates the wanted variable, but it doesn't group the same obsdates together. What I am trying to achieve is for rows of a patid with the same obsdate to have the same number assigned to them.

        Thanks
        Jennifer


        • #5
          Code:
          by patid (obsdate), sort: gen wanted = sum(obsdate != obsdate[_n-1])


          • #6
            This is still hard to follow. Perhaps you want

            Code:
            bysort patid (obsdate) : gen wanted = sum(obsdate != obsdate[_n-1]) 
            If this isn't the answer, please back up and give a data example using dataex including an explicit example of what the new variable should look like. Please see https://www.statalist.org/forums/help#stata in which it is explicit that a data example as requested is especially helpful, indeed often essential, when dates are present.

            Like this:

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input long patid float obsdate
             7320018 21323
             7320018 21323
             7320018 21339
             7320018 21339
            16220018  8766
            16220018  8766
            16220018 15111
            16220018 15111
            16220018 15330
            16220018 15330
            16220018 15750
            16220018 15750
            end
            format %td obsdate
            
            bysort patid (obsdate) : gen wanted = sum(obsdate != obsdate[_n-1]) 
            
            list, sepby(patid obsdate)
            
                 +-------------------------------+
                 |    patid     obsdate   wanted |
                 |-------------------------------|
              1. |  7320018   19may2018        1 |
              2. |  7320018   19may2018        1 |
                 |-------------------------------|
              3. |  7320018   04jun2018        2 |
              4. |  7320018   04jun2018        2 |
                 |-------------------------------|
              5. | 16220018   01jan1984        1 |
              6. | 16220018   01jan1984        1 |
                 |-------------------------------|
              7. | 16220018   16may2001        2 |
              8. | 16220018   16may2001        2 |
                 |-------------------------------|
              9. | 16220018   21dec2001        3 |
             10. | 16220018   21dec2001        3 |
                 |-------------------------------|
             11. | 16220018   14feb2003        4 |
             12. | 16220018   14feb2003        4 |
                 +-------------------------------+


            • #7
              bysort patid obsdate: gen dups = _n

              This got me what I wanted!

              Thanks for all your replies.
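
              For the original aim of deleting the repeated rows, a possible follow-up is sketched here on the example rows from this thread. It assumes that rows sharing a patid and obsdate are interchangeable, so it does not matter which copy is kept.

              ```stata
              * Example rows from this thread (obsdate as numeric daily dates)
              clear
              input long patid float obsdate
               7320018 21323
               7320018 21323
               7320018 21339
               7320018 21339
              16220018  8766
              16220018  8766
              16220018 15111
              16220018 15111
              16220018 15330
              16220018 15330
              16220018 15750
              16220018 15750
              end
              format %td obsdate

              * Number the rows within each patid-obsdate pair: 1, 2, ...
              bysort patid obsdate: gen dups = _n

              * Keep only the first row of each pair
              drop if dups > 1
              ```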
