Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating a sequence using two variables

    Tom Guidry posted this on Sandbox:

    I am using Stata 12 on Windows 8.1. I would like to create the variable seq in the sample data below. For each date, the sequence should be the same, 1 for the first date, 2 for the second date, etc for each ptid. The sequence should start over with each new ptid. i have tried generating seq with egen and grouping ptid and date, but the sequence does not start over with ptid and by can't be used with egen in version 12. What would be the best way to create seq?

    ptid date seq
    1 01/01/2016 1
    1 01/01/2016 1
    1 01/02/2016 2
    1 01/02/2016 2
    2 02/01/2016 1
    2 02/01/2016 1
    2 02/02/2016 2
    2 02/02/2016 2
    2 03/01/2016 3
    2 03/01/2016 3
    2 03/01/2016 3
    3 04/01/2016 1
    3 04/02/2016 1
    Some small points:

    1. The example data are helpful, but use dataex (SSC) to make it even better. Dates are a pain to re-create unless you do that.

    2. Below I use numdate (SSC) to create the daily dates.

    Code:
    clear
    
    input ptid str10(s_date) wanted
    1 "01/01/2016" 1
    1 "01/02/2016" 2
    1 "01/02/2016" 2
    2 "02/01/2016" 1
    2 "02/01/2016" 1
    2 "02/02/2016" 2
    2 "02/02/2016" 2
    2 "03/01/2016" 3
    2 "03/01/2016" 3
    2 "03/01/2016" 3
    3 "04/01/2016" 1
    3 "04/02/2016" 1
    end
    
    numdate daily date = s_date, pattern(MDY)
    
    bysort ptid (date) : gen seq = sum(date != date[_n-1])
    capture noisily assert wanted == seq
    
    list, sepby(ptid)
    
         +----------------------------------------------+
         | ptid       s_date   wanted        date   seq |
         |----------------------------------------------|
      1. |    1   01/01/2016        1   01jan2016     1 |
      2. |    1   01/02/2016        2   02jan2016     2 |
      3. |    1   01/02/2016        2   02jan2016     2 |
         |----------------------------------------------|
      4. |    2   02/01/2016        1   01feb2016     1 |
      5. |    2   02/01/2016        1   01feb2016     1 |
      6. |    2   02/02/2016        2   02feb2016     2 |
      7. |    2   02/02/2016        2   02feb2016     2 |
      8. |    2   03/01/2016        3   01mar2016     3 |
      9. |    2   03/01/2016        3   01mar2016     3 |
     10. |    2   03/01/2016        3   01mar2016     3 |
         |----------------------------------------------|
     11. |    3   04/01/2016        1   01apr2016     1 |
     12. |    3   04/02/2016        1   02apr2016     2 |
         +----------------------------------------------+
    The big idea is to create a running tally of how often you hit a new date for each identifier. Score 1 for a date unlike that in the previous observation and then get a cumulative sum.

    The calculation reproduces what you want, modulo a small slip in your example.

  • #2
    Thanks very much Nick for the solution and the suggestions for posting!

    Comment

    Working...
    X