Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Question about generating a unique identifier by rows of 4

    Hello,

    Here is my issue:

    I need to create a unique identifier for every 4 rows in my data.

    It's current lay out has the following columns:

    ID EncounterNumber Date systolic diastolic source and position

    Systolic, diastolic, source, and position are given in alternating rows. For every blood pressure measured, it takes 4 rows to give the systolic, diastolic, source, and position. If I do the mod (n_,4) command it perfectly creates that a 0,1,2,3 sequence that matches consistently with systolic, diastolic, source, and position, but I want it all to merged in the same row.

    ID and EncounterNumber are not unique to these values, a person may have had multiple blood pressures taken on one day, so I need to create a new identifier for every four rows so I can make into one row.

    When I tried collapsing by date, ID, and EncounterNumber I lost many observations since there is overlap.

    Thanks for any guidance on this issue.

  • #2
    Welcome to the Stata Forum / Statalist.

    You may wish to read this FAQ.

    Also you may wish to check whether the code applies to your needs:

    Code:
    . set obs 12
    number of observations (_N) was 0, now 12
    
    . egen myseq = seq(), f(1) t(4)
    
    . list
    
    
         +-------+
         | myseq |
         |-------|
      1. |     1 |
      2. |     2 |
      3. |     3 |
      4. |     4 |
      5. |     1 |
         |-------|
      6. |     2 |
      7. |     3 |
      8. |     4 |
      9. |     1 |
     10. |     2 |
         |-------|
     11. |     3 |
     12. |     4 |
         +-------+
    Best regards,

    Marcos

    Comment


    • #3
      On second thoughts, I see you wish something different.
      Best regards,

      Marcos

      Comment


      • #4
        Therese:
        welcome to the list.
        With such a scant handful of details, interested listers can only support their temptative replies their guess-work (as Marcos did).
        Please note that posting an example/excerpt of your dataset via -dataex- (type -search dataex- from within Stata to install it) will increase your chances of getting (more) helpful replies (as reminded by the FAQ). Thanks.
        Kind regards,
        Carlo
        (Stata 19.0)

        Comment


        • #5
          I endorse all previous comments. If your date are like this, collapse should suffice.

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float(ID Date systolic diastolic source position)
          1 21001 98  .  .  .
          1 21001  . 76  .  .
          1 21001  .  . 54  .
          1 21001  .  .  . 32
          end
          format %td Date
          
          collapse syst-pos, by(ID Date)
          
          list 
          
               +----------------------------------------------------------+
               | ID        Date   systolic   diasto~c   source   position |
               |----------------------------------------------------------|
            1. |  1   01jul2017         98         76       54         32 |
               +----------------------------------------------------------+

          Comment


          • #6
            Hi-
            Thanks so much for the responses. My dataset/workspace is in a protected vpn that doesn't allow for easy transfer of logs, my apologies.

            This is an example of my problem, there's multiple areas in my data where many blood pressures were taken on the same date, during the same encounter, on the same person. When I collapse by date, or by encounter, it just gives me the last one taken so I lose data. I want to preserve all the data, but there is no other unique identifier to each blood pressure observation other than that every 4 hours it restarts.

            Click image for larger version

Name:	Screen Shot 2017-07-01 at 8.12.01 AM.png
Views:	1
Size:	19.4 KB
ID:	1400009


            Thanks so much, I really appreciate the help and guidance.

            Comment


            • #7
              Threrese:
              please note that the way you posted your example makes importing it in Stata impossible.
              Please, as per previous replies, for the future use -dataex-, instead (type -search dataex- from within Stata to install it). Thanks.
              That said, taking shameless advantage from Nick's extremely helpful code, in case you had (as it seems) multiple observations per patient on the same visit, you may want to consider something along the following lines:
              Code:
              . input float(ID Date systolic diastolic source position)
              
                          ID       Date   systolic  diastolic     source   position
                1. 1 21001 98  .  .  .
                2. 1 21001  . 76  .  .
                3. 1 21001  .  . 54  .
                4. 1 21001  .  .  . 32
                5. end
              
              . format %td Date
              
              . expand 2
              (4 observations created)
              
              . bysort Date: g assessment_n=1 if _n<=4
              (4 missing values generated)
              
              . bysort Date: replace assessment_n=2 if _n>=5
              (4 real changes made)
              
              . collapse systolic- position, by( ID Date visit_n)
              Kind regards,
              Carlo
              (Stata 19.0)

              Comment


              • #8
                I need to create a unique identifier for every 4 rows in my data.
                Just hazarding a guess:

                Code:
                . set obs 16
                number of observations (_N) was 0, now 16
                
                . egen float myid = seq(), f(1) t(4) b(4)
                
                . list
                
                     +------+
                     | myid |
                     |------|
                  1. |    1 |
                  2. |    1 |
                  3. |    1 |
                  4. |    1 |
                  5. |    2 |
                     |------|
                  6. |    2 |
                  7. |    2 |
                  8. |    2 |
                  9. |    3 |
                 10. |    3 |
                     |------|
                 11. |    3 |
                 12. |    3 |
                 13. |    4 |
                 14. |    4 |
                 15. |    4 |
                     |------|
                 16. |    4 |
                     +------+
                Best regards,

                Marcos

                Comment


                • #9
                  Thanks so much Marcos, Carlo, and Nick, problem solved thanks to you three. I'm incredibly thankful.

                  Comment


                  • #10
                    If - collapse - eventually fails to provide what you want, you may as well fill in (most of) your missing data (by using the - replace - comman, having sorted according to "myid") before trimming the (newly-created) duplicates.

                    Thank you for informing the query reached a satisfactory result to you.
                    Best regards,

                    Marcos

                    Comment

                    Working...
                    X