Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Working with Stata dates

    Hello,

    I would like to create a variable, called "outcome", measured as a dummy indicator equal to one if Plan B was implemented (variable Plan B = 1) within 30 days from the implementation of Plan A (variable Plan A = 1). Note that within an id, there could be multiple plans A and B. I send an example of the dataset below. Thank you very much in advance.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte id str9 date byte(plan_A plan_B outcome)
    1 "21-Jan-14" 0 0 0
    1 "8-Jul-14"  0 1 0
    1 "5-Aug-14"  1 0 0
    1 "1-Sep-14"  0 1 1
    1 "5-Mar-15"  0 0 0
    1 "28-Nov-15" 0 0 0
    1 "5-Sep-17"  1 0 0
    1 "2-Nov-17"  0 1 0
    2 "21-Mar-15" 0 0 0
    2 "10-Apr-15" 1 0 0
    2 "20-Apr-15" 0 1 1
    2 "19-Dec-16" 0 1 0
    2 "11-Jan-17" 0 0 0
    2 "3-Jul-17"  0 0 0
    end

  • #2
    Anh:
    probably not that efficient, but...:
    Code:
    .
    . generate new_date=date(date, "DM20Y")
    
    . format new_date %td
    
    . bysort id: gen wanted=new_date[_n]-new_date[_n-1]
    
    . g overall_plans=1 if plan_A==1
    
    . replace overall_plans=2 if plan_B==1
    
    . replace overall_plans=0 if overall_plans==.
    
    . bysort id: gen wanted2=new_date[_n]-new_date[_n-1] if overall_plans[_n]!=0 & overall_plans[_n-1]!=0
    
    
    . list
    
         +--------------------------------------------------------------------------------------+
         | id        date   plan_A   plan_B   outcome    new_date   wanted   overal~s   wanted2 |
         |--------------------------------------------------------------------------------------|
      1. |  1   21-Jan-14        0        0         0   21jan2014        .          0         . |
      2. |  1    8-Jul-14        0        1         0   08jul2014      168          2         . |
      3. |  1    5-Aug-14        1        0         0   05aug2014       28          1        28 |
      4. |  1    1-Sep-14        0        1         1   01sep2014       27          2        27 |
      5. |  1    5-Mar-15        0        0         0   05mar2015      185          0         . |
         |--------------------------------------------------------------------------------------|
      6. |  1   28-Nov-15        0        0         0   28nov2015      268          0         . |
      7. |  1    5-Sep-17        1        0         0   05sep2017      647          1         . |
      8. |  1    2-Nov-17        0        1         0   02nov2017       58          2        58 |
      9. |  2   21-Mar-15        0        0         0   21mar2015        .          0         . |
     10. |  2   10-Apr-15        1        0         0   10apr2015       20          1         . |
         |--------------------------------------------------------------------------------------|
     11. |  2   20-Apr-15        0        1         1   20apr2015       10          2        10 |
     12. |  2   19-Dec-16        0        1         0   19dec2016      609          2       609 |
     13. |  2   11-Jan-17        0        0         0   11jan2017       23          0         . |
     14. |  2    3-Jul-17        0        0         0   03jul2017      173          0         . |
         +--------------------------------------------------------------------------------------+
    
    .
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #3
      Here is a twist on @Carlo Lazzaro's helpful approach. I use rangestat from SSC.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input byte id str9 date byte(plan_A plan_B outcome)
      1 "21-Jan-14" 0 0 0
      1 "8-Jul-14"  0 1 0
      1 "5-Aug-14"  1 0 0
      1 "1-Sep-14"  0 1 1
      1 "5-Mar-15"  0 0 0
      1 "28-Nov-15" 0 0 0
      1 "5-Sep-17"  1 0 0
      1 "2-Nov-17"  0 1 0
      2 "21-Mar-15" 0 0 0
      2 "10-Apr-15" 1 0 0
      2 "20-Apr-15" 0 1 1
      2 "19-Dec-16" 0 1 0
      2 "11-Jan-17" 0 0 0
      2 "3-Jul-17"  0 0 0
      end
      
      generate new_date=daily(date, "DM20Y")
      
      format new_date %td
      
      rangestat (max) prev_A=plan_A, int(new_date -30 -1) by(id)
      
      gen B_prev_A = prev_A==1 & plan_B == 1 
      
      list, sepby(id)
      
           +----------------------------------------------------------------------------+
           | id        date   plan_A   plan_B   outcome    new_date   prev_A   B_prev_A |
           |----------------------------------------------------------------------------|
        1. |  1   21-Jan-14        0        0         0   21jan2014        .          0 |
        2. |  1    8-Jul-14        0        1         0   08jul2014        .          0 |
        3. |  1    5-Aug-14        1        0         0   05aug2014        0          0 |
        4. |  1    1-Sep-14        0        1         1   01sep2014        1          1 |
        5. |  1    5-Mar-15        0        0         0   05mar2015        .          0 |
        6. |  1   28-Nov-15        0        0         0   28nov2015        .          0 |
        7. |  1    5-Sep-17        1        0         0   05sep2017        .          0 |
        8. |  1    2-Nov-17        0        1         0   02nov2017        .          0 |
           |----------------------------------------------------------------------------|
        9. |  2   21-Mar-15        0        0         0   21mar2015        .          0 |
       10. |  2   10-Apr-15        1        0         0   10apr2015        0          0 |
       11. |  2   20-Apr-15        0        1         1   20apr2015        1          1 |
       12. |  2   19-Dec-16        0        1         0   19dec2016        .          0 |
       13. |  2   11-Jan-17        0        0         0   11jan2017        0          0 |
       14. |  2    3-Jul-17        0        0         0   03jul2017        .          0 |
           +----------------------------------------------------------------------------+

      Comment


      • #4
        Both approaches work well. Thanks, Carlo and Nick.

        Comment


        • #5
          And here's another way:
          Code:
          generate new_date=date(date, "DM20Y")
          format new_date %td
          
          gen last_plan_A_date = .
          by id (new_date), sort: replace last_plan_A_date = cond(plan_A, new_date, last_plan_A_date[_n-1])
          format last_plan_A_date %td
          
          gen byte wanted = plan_B & inrange(new_date-last_plan_A_date, 1, 30)
          By the way, I should point out that the question posed is ambiguous. What does "within 30 days" mean? Both Nick's solution and mean assume it means "starting the day after plan A and extending out 30 days beyond that." But it could also mean "starting the day plan A began and extending 29 days beyond that." Or, slightly stretching the abuse of language, it could also perhaps mean "starting the day plan A began and extending 30 days beyond that." The numbers 1 and 30 (or -1 and -30 in #2) would have to be modified accordingly if one of these other meanings were intended.

          Added: in fact, "within 30 days" could also include within (one of the above versions) of the 30 days before plan A. In that case, the code in #2 could be modified by changing the numbers in interval. The code in this post, however, cannot be so simply adapted, and I would not use this approach in that case.
          Last edited by Clyde Schechter; 25 Feb 2023, 16:00.

          Comment


          • #6
            Dear Clyde,

            Please accept my apologies for any confusion caused. My intention when referring to "within 30 days" was to indicate a period starting from the day after the implementation of plan A and extending for a further 30 days beyond that. I would like to take this one step further by retaining only those observations where "plan_A=1," but with the outcome variable (B_prev_A) taking a value of 1 if plan B was implemented within the aforementioned 30-day window (as shown in the example below). Would you happen to have any suggestions?

            Thank you for your assistance.

            Code:
            . list if plan_A==1
            
                 +----------------------------------------------------------------------------+
                 | id        date   plan_A   plan_B   outcome    new_date   prev_A   B_prev_A |
                 |----------------------------------------------------------------------------|
              3. |  1    5-Aug-14        1        0         0   05aug2014        0          1 |
              7. |  1    5-Sep-17        1        0         0   05sep2017        .          0 |
             10. |  2   10-Apr-15        1        0         0   10apr2015        0          1 |
                 +----------------------------------------------------------------------------+

            Comment


            • #7
              For this result, a slight variation of what Nick Cox showed in #3 will be best:
              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input byte id str9 date byte(plan_A plan_B outcome)
              1 "21-Jan-14" 0 0 0
              1 "8-Jul-14"  0 1 0
              1 "5-Aug-14"  1 0 0
              1 "1-Sep-14"  0 1 1
              1 "5-Mar-15"  0 0 0
              1 "28-Nov-15" 0 0 0
              1 "5-Sep-17"  1 0 0
              1 "2-Nov-17"  0 1 0
              2 "21-Mar-15" 0 0 0
              2 "10-Apr-15" 1 0 0
              2 "20-Apr-15" 0 1 1
              2 "19-Dec-16" 0 1 0
              2 "11-Jan-17" 0 0 0
              2 "3-Jul-17"  0 0 0
              end
              
              generate new_date=daily(date, "DM20Y")
              format new_date %td
              
              rangestat (max) B_prev_A = plan_B, by(id) interval(new_date 1 30)
              keep if plan_A
              mvencode B_prev_A, mv(0)

              Comment

              Working...
              X