Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Finding closest date in second column to date in first column by ID

    Goodmorning Statalisters!

    I have a tough problem.

    Data Description
    • Pidm : a unique ID that is assigned to each individual in the dataset
    • final_decided_date : the date that a solicitation was sent to an individual (each Pidm can have MANY solicitation dates)
    • event_date : the date that an individual attended an event (each Pidm can have a couple event dates)
    I appended rows with just the Pidm, event ID, and event_date to the existing dataset I had which held all of the individuals' solicitations, so any Pidms with event_dates are appended at the bottom of their solicitations, with a new row for each event_date.

    Here is a good example of one person in the dataset:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long Pidm float final_decided_date int event_date
    70004813 17721     .
    70004813 19275     .
    70004813 19304     .
    70004813 19464     .
    70004813 19765     .
    70004813 19765     .
    70004813 19790     .
    70004813 19790     .
    70004813 19796     .
    70004813 19796     .
    70004813 19832     .
    70004813 19832     .
    70004813 20028     .
    70004813 20049     .
    70004813 20074     .
    70004813 20079     .
    70004813 20133     .
    70004813 20210     .
    70004813 20350     .
    70004813 20399     .
    70004813     . 20523
    70004813     . 20573
    end
    format %td final_decided_date
    format %td event_date


    As you can see, the person with Pidm 70004813 has been solicited on 20 dates, and attended an event on 2 different dates (hence total of 22 rows for that Pidm).

    *My goal is to sort through the solicitation dates (variable: final_decided_date), and for each event_date a Pidm has, I want to generate a copy of the closest final_decided_date after that event_date if there is one.

    I would also like to generate another indicator variable which just marks if a final_decided_date was after any of their event dates (so basically just check if it was after their first event_date if they have any event_dates.)

    I know how to check for the closest solicitation_date after an event_date if a Pidm only has ONE event_date, but not multiple.

    Thank you!


  • #2
    I tried to get it to work for you using the rangestat command. You can install the command from ssc using:

    Code:
    ssc install rangestat
    Then, run this code and it should work. It sort of seems like an abuse of the rangestat command to me, because this is not what it was intended for I believe. But it works nicely
    Code:
    sort Pidm
    gen notmissing = 0
    replace notmissing = 1 if !missing(final_decided_date)
    rangestat (lastnm) date=final_decided_date, interval(notmissing 0 1) by(Pidm)
    replace date =. if event_date==.
    drop notmissing
    After running that code, the variable 'date' contains the value of the last non-missing 'final_decided_date' before it.
    Last edited by Jesse Tielens; 27 Jul 2018, 11:35.

    Comment


    • #3
      Hi Jesse Tielens ,

      I don't think I was clear enough in my description. For each event_date a person has, I want the new variable to copy the value of the closest final_decided_date in time, rather than in proximity in the database. So instead of copying the date of the record above the event date, I want to sort through every final_decided_date a Pidm has and then copy the final_decided_date which is closest to and after each event_date.

      So let's say Pidm 0000 has two event dates, listed in 2 rows, one event date is 1/1/2010 and the other is 8/1/2015. And let's say this Pidm 0000 has 20 final_decided_dates (which are solicitations to this person). I want to sort through all 20 of the final_decided_dates for that Pidm, and then copy the closest value to 1/1/2015 but after 1/1/2015 into a new variable called "closest_solicitation", and the closest final_decided_date to 8/1/2015 but after 8/1/2015, into closest_solicitation.

      Comment


      • #4
        What is the maximum number of event dates that you would have per PIDM? I have an idea for an approach but want to see if it is wildly inefficient.
        Code:
        bysort Pidm:egen count=count( event_date)
        sum count
        Stata/MP 14.1 (64-bit x86-64)
        Revision 19 May 2016
        Win 8.1

        Comment


        • #5
          Carole J. Wilson 107

          Comment


          • #6
            Actually, I think this might do the trick. I added a couple of instance of event dates that have final_decided_dates after the event date.

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input long Pidm float final_decided_date int event_date
            70004813 17721     .
            70004813     . 19000
            70004813 19275     .
            70004813 19304     .
            70004813 19464     .
            70004813 19765     .
            70004813 19765     .
            70004813 19790     .
            70004813 19790     .
            70004813 19796     .
            70004813 19796     .
            70004813 19832     .
            70004813 19832     .
            70004813 20028     .
            70004813 20049     .
            70004813 20074     .
            70004813 20079     .
            70004813     . 20081
            70004813 20133     .
            70004813 20210     .
            70004813 20350     .
            70004813 20399     .
            70004813     . 20523
            70004813     . 20573
            end
            format %td final_decided_date
            format %td event_date
            
            clonevar all_dates=final_decided_date
            replace all_dates=event_date if !mi(event_date)
            format %td all_dates
            bysort Pidm (all_dates): gen x=final_decided_date[_n+1] if !mi(event_date)
            format %td x
            list, clean
            Stata/MP 14.1 (64-bit x86-64)
            Revision 19 May 2016
            Win 8.1

            Comment


            • #7
              Hello Caroline Wilson ,

              Your solution is very close! The only error I noticed is if there are two event_dates which occur before any final_decided_date, the first_final_decided_date will only be copied to the new x variable for the second event_date. I will try to figure it out, thank you so much for your help again!

              Here is one example:

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input long Pidm float final_decided_date int event_date float x
              70040272     . 16743     .
              70040272     . 17083 18154
              70040272     . 18747 18840
              70040272     . 20572     .
              70040272     . 21153 21160
              70040272     . 21239     .
              70040272     . 21302     .
              70040272 15876     .     .
              70040272 18154     .     .
              70040272 18526     .     .
              70040272 18840     .     .
              70040272 18989     .     .
              70040272 19022     .     .
              70040272 19179     .     .
              70040272 19211     .     .
              70040272 19275     .     .
              70040272 19304     .     .
              70040272 19334     .     .
              70040272 19334     .     .
              70040272 19334     .     .
              70040272 19334     .     .
              70040272 19334     .     .
              70040272 19334     .     .
              70040272 19334     .     .
              70040272 19367     .     .
              70040272 19375     .     .
              70040272 19400     .     .
              70040272 19429     .     .
              70040272 19463     .     .
              70040272 19464     .     .
              70040272 19548     .     .
              70040272 19548     .     .
              70040272 19573     .     .
              70040272 19573     .     .
              70040272 19678     .     .
              70040272 19678     .     .
              70040272 19699     .     .
              70040272 19699     .     .
              70040272 19706     .     .
              70040272 19706     .     .
              70040272 19712     .     .
              70040272 19712     .     .
              70040272 19719     .     .
              70040272 19719     .     .
              70040272 19765     .     .
              70040272 19765     .     .
              70040272 19783     .     .
              70040272 19783     .     .
              70040272 19790     .     .
              70040272 19790     .     .
              70040272 19796     .     .
              70040272 19796     .     .
              70040272 19832     .     .
              70040272 19832     .     .
              70040272 19846     .     .
              70040272 19846     .     .
              70040272 19915     .     .
              70040272 19915     .     .
              70040272 20028     .     .
              70040272 20049     .     .
              70040272 20074     .     .
              70040272 20079     .     .
              70040272 20133     .     .
              70040272 20210     .     .
              70040272 20350     .     .
              70040272 20399     .     .
              70040272 20420     .     .
              70040272 20439     .     .
              70040272 20444     .     .
              70040272 20452     .     .
              70040272 21160     .     .
              70040272 21160     .     .
              end
              format %td final_decided_date
              format %td event_date
              format %td x

              Comment


              • #8
                Caroline Wilson I just ran this command about 100 times until amount of real changes made = 0. Seemed to work!

                by Pidm, sort: replace x = x[_n+1] if x == . & event_date != .

                Comment


                • #9
                  Ok, here's a very circuitous solution (I think) for future reference:

                  New code in red

                  Code:
                  clonevar all_dates=final_decided_date
                  replace all_dates=event_date if !mi(event_date)
                  format %td all_dates
                  bysort Pidm (all_dates): gen x=final_decided_date[_n+1] if !mi(event_date)
                  format %td x
                  
                  bysort Pidm (all_dates): gen n=_n if !mi(event_date)
                  tsset Pidm n
                  gen group=.
                  bysort Pidm (n):  replace group = cond(L.group == ., _n, L.group ) if !mi(event_date)
                  bysort Pidm group: egen xx=min(x)
                  format %td xx
                  sort Pidm all_dates
                  list, clean sepby(Pidm)
                  Stata/MP 14.1 (64-bit x86-64)
                  Revision 19 May 2016
                  Win 8.1

                  Comment


                  • #10
                    Do you use dataex to print your code so nicely?

                    Comment


                    • #11
                      You can use code blocks either by writing them manually (those code commands at the beginning and end of the dataex outputs) or on the editor, you can click on the # symbol to input them automatically.
                      Stata/MP 14.1 (64-bit x86-64)
                      Revision 19 May 2016
                      Win 8.1

                      Comment


                      • #12
                        The below code is an improvement for the solution of Carole in #6. It would provide you the target with 1 time running instead of 100 times as mentioned in #8. The trick is sorting by "minus" date to make "after" temporarily become "before".

                        It should be noted that, also like the solution in #6, it is assumed that in each observation (the same "row") there is only one non-missing value between even_date and final_decided_date (like your example). Otherwise, a modification would be needed.
                        Code:
                        gen x=cond(event_date !=., -event_date, -final_decided_date)
                        bys Pidm (x): replace x = x[_n-1] if event_date !=.
                        gsort -x
                        replace x=cond(event_date !=.,-x,.)
                        format %td x

                        Comment

                        Working...
                        X