Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • tsfill for time series data, ending fill at specific dates by subgroups

    I have panel data, with an id variable, two time variables: t0 (first date); t1(end date), and a spell variable that subsets time periods into groups within an id.
    I want to use tsfill to generate daily time between each spell from the first date (t0) and ending at the end date (t1) of the spell.

    I can't seem to find an option for tsfill to end at the last date for each spell, the command just fills daily dates until the next id.
    For instance, for id=1 and spell=1 I want to have dates generated from 06feb2012 and end at 27sep2012, and for id=1 spell=2 to have dates filled only from 10oct2012 to 09dec2012...
    is this possible with tsfill?


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int id float(spell t0 t1)
    1  1 19029     .
    1  1     .     .
    1  1     .     .
    1  1     .     .
    1  1     . 19263
    1  2 19276 19336
    1  3 19343     .
    1  3     .     .
    1  3     .     .
    1  3     . 19493
    1  4 19495     .
    1  4     . 19612
    2  1 19036 19096
    2  2 19121 19181
    2  3 19185     .
    2  3     .     .
    2  3     .     .
    2  3     .     .
    2  3     . 19387
    2  4 19394     .
    2  4     .     .
    2  4     . 19474
    2  5 19482     .
    2  5     . 19567
    2  6 19588 19648
    2  7 19674     .
    2  7     .     .
    2  7     . 19778
    2  8 19804     .
    2  8     . 19875
    2  9 19887 19947
    2 10 19967     .
    2 10     . 20033
    2 11 20063     .
    2 11     .     .
    2 11     . 20174
    2 12 20202     .
    2 12     .     .
    2 12     . 20316
    2 13 20346     .
    2 13     . 20453
    2 14 20462     .
    2 14     . 20525
    2 15 20537     .
    2 15     .     .
    2 15     . 20652
    3  1 19455     .
    3  1     . 19516
    3  2 19543 19603
    3  3 19631 19691
    3  4 19706     .
    3  4     . 19802
    3  5 20008     .
    3  5     .     .
    3  5     . 20118
    3  6 20309 20369
    3  7 20371 20431
    3  8 20449     .
    3  8     .     .
    3  8     .     .
    3  8     .     .
    3  8     .     .
    3  8     .     .
    3  8     . 20747
    4  1 20620     .
    4  1     .     .
    4  1     . 20723
    4  2 20746 20806
    5  1 19016     .
    7  1 19015     .
    7  1     .     .
    7  1     . 19141
    7  2 19151 19211
    7  3 19221     .
    7  3     . 19308
    7  4 19328     .
    7  4     .     .
    7  4     .     .
    7  4     .     .
    7  4     .     .
    7  4     .     .
    7  4     .     .
    7  4     .     .
    7  4     .     .
    7  4     .     .
    7  4     .     .
    7  4     .     .
    7  4     . 19757
    7  5 19763     .
    7  5     .     .
    7  5     .     .
    7  5     .     .
    7  5     .     .
    7  5     .     .
    7  5     .     .
    7  5     . 20099
    7  6 20123 20183
    7  7 20209 20269
    7  8 20270 20330
    9  1 19068 19128
    end
    format %td t0
    format %td t1

  • #2
    I am having trouble understanding your data. Looking at id==1 we have
    Code:
         +------------------------------------+
         | id   spell          t0          t1 |
         |------------------------------------|
      1. |  1       1   06feb2012           . |
      2. |  1       1           .           . |
      3. |  1       1           .           . |
      4. |  1       1           .           . |
      5. |  1       1           .   27sep2012 |
         |------------------------------------|
      6. |  1       2   10oct2012   09dec2012 |
         |------------------------------------|
      7. |  1       3   16dec2012           . |
      8. |  1       3           .           . |
      9. |  1       3           .           . |
     10. |  1       3           .   15may2013 |
         |------------------------------------|
     11. |  1       4   17may2013           . |
     12. |  1       4           .   11sep2013 |
         +------------------------------------+
    Variables t0 and t1 seem to separately identify the start date and end date of the spells. I'd be willing to guess that those are the dates associated with observations 1 and 5, but then, what dates are associated with observations 2, 3, and 4? And what is the deal with observation 6 - one observation with a start date and an end date?

    Is there some third date variable that indicates the date associated with each observation?

    What happens to the observations like 2, 3, and 4 that have no date?

    Comment


    • #3
      Originally posted by William Lisowski View Post
      I am having trouble understanding your data. Looking at id==1 we have
      Code:
      +------------------------------------+
      | id spell t0 t1 |
      |------------------------------------|
      1. | 1 1 06feb2012 . |
      2. | 1 1 . . |
      3. | 1 1 . . |
      4. | 1 1 . . |
      5. | 1 1 . 27sep2012 |
      |------------------------------------|
      6. | 1 2 10oct2012 09dec2012 |
      |------------------------------------|
      7. | 1 3 16dec2012 . |
      8. | 1 3 . . |
      9. | 1 3 . . |
      10. | 1 3 . 15may2013 |
      |------------------------------------|
      11. | 1 4 17may2013 . |
      12. | 1 4 . 11sep2013 |
      +------------------------------------+
      Variables t0 and t1 seem to separately identify the start date and end date of the spells. I'd be willing to guess that those are the dates associated with observations 1 and 5, but then, what dates are associated with observations 2, 3, and 4? And what is the deal with observation 6 - one observation with a start date and an end date?

      Is there some third date variable that indicates the date associated with each observation?

      What happens to the observations like 2, 3, and 4 that have no date?
      The issue is: one spell must consist of at least 60 days, so for spell 2, there really only was one observation for t0, but for t1 it is just 60 days past t0 (as this is the last date observed for that spell).All t1 dates = the last t0 date (in a spell) plus 60 days.

      There are dates for each missing value '.' but I just took the first and last occurrence for each period because this is the range I need to use tsfill on. Would it be clearer with dataex at the end?
      Below, for spell=2, t0 is the only date in the 'time' variable, but I still need to fill dates for 60 days past 10oct2012 after using tsfill.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input int id float(spell t0 t1 time)
      1  1 19029     . 19029
      1  1     .     . 19075
      1  1     .     . 19116
      1  1     .     . 19176
      1  1     . 19263 19263
      1  2 19276 19336 19276
      1  3 19343     . 19343
      1  3     .     . 19387
      1  3     .     . 19408
      1  3     . 19493 19493
      1  4 19495     . 19495
      1  4     . 19612 19612
      2  1 19036 19096 19036
      2  2 19121 19181 19121
      2  3 19185     . 19185
      2  3     .     . 19218
      2  3     .     . 19278
      2  3     .     . 19287
      2  3     . 19387 19387
      2  4 19394     . 19394
      2  4     .     . 19407
      2  4     . 19474 19474
      2  5 19482     . 19482
      2  5     . 19567 19567
      2  6 19588 19648 19588
      2  7 19674     . 19674
      2  7     .     . 19692
      2  7     . 19778 19778
      2  8 19804     . 19804
      2  8     . 19875 19875
      2  9 19887 19947 19887
      2 10 19967     . 19967
      2 10     . 20033 20033
      2 11 20063     . 20063
      2 11     .     . 20071
      2 11     . 20174 20174
      2 12 20202     . 20202
      2 12     .     . 20220
      2 12     . 20316 20316
      2 13 20346     . 20346
      2 13     . 20453 20453
      2 14 20462     . 20462
      2 14     . 20525 20525
      2 15 20537     . 20537
      2 15     .     . 20547
      2 15     . 20652 20652
      3  1 19455     . 19455
      3  1     . 19516 19516
      3  2 19543 19603 19543
      3  3 19631 19691 19631
      3  4 19706     . 19706
      3  4     . 19802 19802
      3  5 20008     . 20008
      3  5     .     . 20036
      3  5     . 20118 20118
      3  6 20309 20369 20309
      3  7 20371 20431 20371
      3  8 20449     . 20449
      3  8     .     . 20496
      3  8     .     . 20536
      3  8     .     . 20585
      3  8     .     . 20629
      3  8     .     . 20637
      3  8     . 20747 20747
      4  1 20620     . 20620
      4  1     .     . 20644
      4  1     . 20723 20723
      4  2 20746 20806 20746
      5  1 19016     . 19016
      7  1 19015     . 19015
      7  1     .     . 19036
      7  1     . 19141 19141
      7  2 19151 19211 19151
      7  3 19221     . 19221
      7  3     . 19308 19308
      7  4 19328     . 19328
      7  4     .     . 19355
      7  4     .     . 19397
      7  4     .     . 19411
      7  4     .     . 19450
      7  4     .     . 19455
      7  4     .     . 19472
      7  4     .     . 19526
      7  4     .     . 19565
      7  4     .     . 19590
      7  4     .     . 19601
      7  4     .     . 19641
      7  4     . 19757 19757
      7  5 19763     . 19763
      7  5     .     . 19818
      7  5     .     . 19834
      7  5     .     . 19888
      7  5     .     . 19915
      7  5     .     . 19955
      7  5     .     . 20006
      7  5     . 20099 20099
      7  6 20123 20183 20123
      7  7 20209 20269 20209
      7  8 20270 20330 20270
      9  1 19068 19128 19068
      end
      format %td t0
      format %td t1
      format %td time

      Comment


      • #4
        In the code below, which I believe does what I understand you to want, you will see the importance of including the time variable. Without it, tsfill would have created new observations where there already were observations.
        Code:
        // confirm an assumption this code depends on
        assert t0==time if t0!=.
        
        // indicator variable for newly-created observations
        generate byte newobs = 0
        
        // create ending observation for spells with only a single observation
        generate toexpand = missing(t0,t1)==0
        expand 2 if toexpand==1, generate(new)
        replace t1   = .   if toexpand==1 & new==0
        replace t0   = .   if toexpand==1 & new==1
        replace time = t1  if toexpand==1 & new==1
        sort id spell time
        list if id==1, sepby(spell)
        drop toexpand new
        
        // fill in the missing observations within each spell
        egen idspell = group(id spell)
        xtset idspell time
        tsfill
        replace newobs = 1 if newobs==.
        bysort idspell (time): replace id    = id[1]
        bysort idspell (time): replace spell = spell[1]
        xtset, clear
        drop idspell
        describe
        table id spell, contents(n newobs sum newobs)
        Code:
        . describe
        
        Contains data
          obs:         4,204                          
         vars:             6                          
         size:        79,876                          
        -----------------------------------------------------------------------------------------------
                      storage   display    value
        variable name   type    format     label      variable label
        -----------------------------------------------------------------------------------------------
        id              int     %8.0g                 
        spell           float   %9.0g                 
        t0              float   %td                   
        t1              float   %td                   
        time            float   %td                   
        newobs          byte    %8.0g                 
        -----------------------------------------------------------------------------------------------
        Sorted by: 
             Note: Dataset has changed since last saved.
        
        . table id spell, contents(n newobs sum newobs)
        
        -----------------------------------------------------------------------------------------------
             |                                          spell                                          
          id |    1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
        -----+-----------------------------------------------------------------------------------------
           1 |  235    61   151   118                                                                  
             |  230    59   147   116                                                                  
             | 
           2 |   61    61   203    81    86    61   105    72    61    67   112   115   108    64   116
             |   59    59   198    78    84    59   102    70    59    65   109   112   106    62   113
             | 
           3 |   62    61    61    97   111    61    61   299                                          
             |   60    59    59    95   108    59    59   292                                          
             | 
           4 |  104    61                                                                              
             |  101    59                                                                              
             | 
           5 |    1                                                                                    
             |    0                                                                                    
             | 
           7 |  127    61    88   430   337    61    61    61                                          
             |  124    59    86   417   329    59    59    59                                          
             | 
           9 |   61                                                                                    
             |   59                                                                                    
        -----------------------------------------------------------------------------------------------

        Comment


        • #5
          Yes William, that is what I needed. thank you!

          Comment

          Working...
          X