Question about generating a unique identifier by rows of 4

Therese Gleason

Join Date: Jun 2017

Posts: 3
#1

Question about generating a unique identifier by rows of 4

30 Jun 2017, 15:06

Hello,

Here is my issue:

I need to create a unique identifier for every 4 rows in my data.

It's current lay out has the following columns:

ID EncounterNumber Date systolic diastolic source and position

Systolic, diastolic, source, and position are given in alternating rows. For every blood pressure measured, it takes 4 rows to give the systolic, diastolic, source, and position. If I do the mod (n_,4) command it perfectly creates that a 0,1,2,3 sequence that matches consistently with systolic, diastolic, source, and position, but I want it all to merged in the same row.

ID and EncounterNumber are not unique to these values, a person may have had multiple blood pressures taken on one day, so I need to create a new identifier for every four rows so I can make into one row.

When I tried collapsing by date, ID, and EncounterNumber I lost many observations since there is overlap.

Thanks for any guidance on this issue.
Tags: None

Marcos Almeida

Join Date: Apr 2014
Posts: 4047

30 Jun 2017, 15:48

Welcome to the Stata Forum / Statalist.

You may wish to read this FAQ.

Also you may wish to check whether the code applies to your needs:

Code:

. set obs 12
number of observations (_N) was 0, now 12

. egen myseq = seq(), f(1) t(4)

. list


     +-------+
     | myseq |
     |-------|
  1. |     1 |
  2. |     2 |
  3. |     3 |
  4. |     4 |
  5. |     1 |
     |-------|
  6. |     2 |
  7. |     3 |
  8. |     4 |
  9. |     1 |
 10. |     2 |
     |-------|
 11. |     3 |
 12. |     4 |
     +-------+

Best regards,

Marcos

Comment

Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#3

30 Jun 2017, 16:13

On second thoughts, I see you wish something different.

Best regards,

Marcos
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17704
#4

01 Jul 2017, 02:34

Therese:
welcome to the list.
With such a scant handful of details, interested listers can only support their temptative replies their guess-work (as Marcos did).
Please note that posting an example/excerpt of your dataset via -dataex- (type -search dataex- from within Stata to install it) will increase your chances of getting (more) helpful replies (as reminded by the FAQ). Thanks.

Kind regards,
Carlo
(Stata 19.0)
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35580

01 Jul 2017, 02:54

I endorse all previous comments. If your date are like this, collapse should suffice.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(ID Date systolic diastolic source position)
1 21001 98  .  .  .
1 21001  . 76  .  .
1 21001  .  . 54  .
1 21001  .  .  . 32
end
format %td Date

collapse syst-pos, by(ID Date)

list 

     +----------------------------------------------------------+
     | ID        Date   systolic   diasto~c   source   position |
     |----------------------------------------------------------|
  1. |  1   01jul2017         98         76       54         32 |
     +----------------------------------------------------------+

Comment

Therese Gleason

Join Date: Jun 2017

Posts: 3
#6

01 Jul 2017, 06:16

Hi-
Thanks so much for the responses. My dataset/workspace is in a protected vpn that doesn't allow for easy transfer of logs, my apologies.

This is an example of my problem, there's multiple areas in my data where many blood pressures were taken on the same date, during the same encounter, on the same person. When I collapse by date, or by encounter, it just gives me the last one taken so I lose data. I want to preserve all the data, but there is no other unique identifier to each blood pressure observation other than that every 4 hours it restarts.

Thanks so much, I really appreciate the help and guidance.
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17704
#7

01 Jul 2017, 06:59

Threrese:
please note that the way you posted your example makes importing it in Stata impossible.
Please, as per previous replies, for the future use -dataex-, instead (type -search dataex- from within Stata to install it). Thanks.
That said, taking shameless advantage from Nick's extremely helpful code, in case you had (as it seems) multiple observations per patient on the same visit, you may want to consider something along the following lines:

Code:

. input float(ID Date systolic diastolic source position) ID Date systolic diastolic source position 1. 1 21001 98 . . . 2. 1 21001 . 76 . . 3. 1 21001 . . 54 . 4. 1 21001 . . . 32 5. end . format %td Date . expand 2 (4 observations created) . bysort Date: g assessment_n=1 if _n<=4 (4 missing values generated) . bysort Date: replace assessment_n=2 if _n>=5 (4 real changes made) . collapse systolic- position, by( ID Date visit_n)

Kind regards,
Carlo
(Stata 19.0)
Comment

Marcos Almeida

Join Date: Apr 2014
Posts: 4047

01 Jul 2017, 13:32

I need to create a unique identifier for every 4 rows in my data.

Just hazarding a guess:

Code:

. set obs 16
number of observations (_N) was 0, now 16

. egen float myid = seq(), f(1) t(4) b(4)

. list

     +------+
     | myid |
     |------|
  1. |    1 |
  2. |    1 |
  3. |    1 |
  4. |    1 |
  5. |    2 |
     |------|
  6. |    2 |
  7. |    2 |
  8. |    2 |
  9. |    3 |
 10. |    3 |
     |------|
 11. |    3 |
 12. |    3 |
 13. |    4 |
 14. |    4 |
 15. |    4 |
     |------|
 16. |    4 |
     +------+

Best regards,

Marcos

Comment

Therese Gleason

Join Date: Jun 2017

Posts: 3
#9

01 Jul 2017, 19:18

Thanks so much Marcos, Carlo, and Nick, problem solved thanks to you three. I'm incredibly thankful.
Comment
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#10

02 Jul 2017, 07:27

If - collapse - eventually fails to provide what you want, you may as well fill in (most of) your missing data (by using the - replace - comman, having sorted according to "myid") before trimming the (newly-created) duplicates.

Thank you for informing the query reached a satisfactory result to you.

Best regards,

Marcos
Comment

Announcement

Question about generating a unique identifier by rows of 4

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment