Creating a sequence using two variables

Nick Cox

Join Date: Mar 2014
Posts: 35809

Creating a sequence using two variables

26 Jan 2016, 11:40

Tom Guidry posted this on Sandbox:

I am using Stata 12 on Windows 8.1. I would like to create the variable seq in the sample data below. For each date, the sequence should be the same, 1 for the first date, 2 for the second date, etc for each ptid. The sequence should start over with each new ptid. i have tried generating seq with egen and grouping ptid and date, but the sequence does not start over with ptid and by can't be used with egen in version 12. What would be the best way to create seq?

ptid date seq
1 01/01/2016 1
1 01/01/2016 1
1 01/02/2016 2
1 01/02/2016 2
2 02/01/2016 1
2 02/01/2016 1
2 02/02/2016 2
2 02/02/2016 2
2 03/01/2016 3
2 03/01/2016 3
2 03/01/2016 3
3 04/01/2016 1
3 04/02/2016 1

Some small points:

1. The example data are helpful, but use dataex (SSC) to make it even better. Dates are a pain to re-create unless you do that.

2. Below I use numdate (SSC) to create the daily dates.

Code:

clear

input ptid str10(s_date) wanted
1 "01/01/2016" 1
1 "01/02/2016" 2
1 "01/02/2016" 2
2 "02/01/2016" 1
2 "02/01/2016" 1
2 "02/02/2016" 2
2 "02/02/2016" 2
2 "03/01/2016" 3
2 "03/01/2016" 3
2 "03/01/2016" 3
3 "04/01/2016" 1
3 "04/02/2016" 1
end

numdate daily date = s_date, pattern(MDY)

bysort ptid (date) : gen seq = sum(date != date[_n-1])
capture noisily assert wanted == seq

list, sepby(ptid)

     +----------------------------------------------+
     | ptid       s_date   wanted        date   seq |
     |----------------------------------------------|
  1. |    1   01/01/2016        1   01jan2016     1 |
  2. |    1   01/02/2016        2   02jan2016     2 |
  3. |    1   01/02/2016        2   02jan2016     2 |
     |----------------------------------------------|
  4. |    2   02/01/2016        1   01feb2016     1 |
  5. |    2   02/01/2016        1   01feb2016     1 |
  6. |    2   02/02/2016        2   02feb2016     2 |
  7. |    2   02/02/2016        2   02feb2016     2 |
  8. |    2   03/01/2016        3   01mar2016     3 |
  9. |    2   03/01/2016        3   01mar2016     3 |
 10. |    2   03/01/2016        3   01mar2016     3 |
     |----------------------------------------------|
 11. |    3   04/01/2016        1   01apr2016     1 |
 12. |    3   04/02/2016        1   02apr2016     2 |
     +----------------------------------------------+

The big idea is to create a running tally of how often you hit a new date for each identifier. Score 1 for a date unlike that in the previous observation and then get a cumulative sum.

The calculation reproduces what you want, modulo a small slip in your example.

Tags: None

Tom Guidry

Join Date: Jan 2016

Posts: 2
#2

26 Jan 2016, 12:18

Thanks very much Nick for the solution and the suggestions for posting!
Comment

Announcement

Creating a sequence using two variables

Comment