Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Grouping by date

    I have data that looks like the following:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double id float visit_date
    25 21091
    25 21091
    25 21091
    25 21091
    25 21091
    25 21091
    25 21104
    25 21104
    25 21104
    25 21122
    25 21122
    25 21122
    25 21235
    25 21325
    25 21325
    25 21325
    25 21355
    25 21355
    25 21355
    25 21388
    25 21629
    25 21629
    25 21629
    25 21839
    25 21839
    25 21875
    25 21875
    25 21941
    25 21941
    25 21941
    end
    format %td visit_date
    I would like to group by visit_date within id. So that all visits within id 25 that are 21091 are grouped as 1, and all those that are 21104 are grouped as 2 and so on. Thank you so much for your time and help!

  • #2
    not sure I completely understand, and your data example includes only one id so I couldn't test this, but I think the following is what you want:
    Code:
     egen group2=group(id visit)
    you should replace "group2" with whatever variable name you want

    Comment


    • #3
      If you have more identifiers (likely) and you want visit numbering to start at 1 for each (possible) that might be


      Code:
      bysort id (visit_date) : gen wanted = sum(visit_date != visit_date[_n-1])

      Comment


      • #4
        Thanks Nick, that worked perfectly! I am hoping you might be able to help me get at what I am trying to do. This is my dataset for one ID:

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input double id float(visit_date first_visit retention time_to_visit_years same_visit)
        25 21091 21091 .0001 1  1
        25 21091 21091 .0001 1  1
        25 21091 21091 .0001 1  1
        25 21091 21091 .0001 1  1
        25 21091 21091 .0001 1  1
        25 21091 21091 .0001 1  1
        25 21104 21091    13 1  2
        25 21104 21091    13 1  2
        25 21104 21091    13 1  2
        25 21122 21091    31 1  3
        25 21122 21091    31 1  3
        25 21122 21091    31 1  3
        25 21235 21091   144 1  4
        25 21325 21091   234 1  5
        25 21325 21091   234 1  5
        25 21325 21091   234 1  5
        25 21355 21091   264 1  6
        25 21355 21091   264 1  6
        25 21355 21091   264 1  6
        25 21388 21091   297 1  7
        25 21629 21091   538 2  8
        25 21629 21091   538 2  8
        25 21629 21091   538 2  8
        25 21839 21091   748 3  9
        25 21839 21091   748 3  9
        25 21875 21091   784 3 10
        25 21875 21091   784 3 10
        25 21941 21091   850 3 11
        25 21941 21091   850 3 11
        25 21941 21091   850 3 11
        end
        format %td visit_date
        format %td first_visit
        label values time_to_visit_years time_to_visit_years
        label def time_to_visit_years 1 "1 year", modify
        label def time_to_visit_years 2 "2 years", modify
        label def time_to_visit_years 3 "3 years", modify

        What I am trying to do is check for at least 2 visits in one year (time_to_visit_years) that must be 90 days apart. And then take the maximum date before this does not occur. I am trying to look at the loss to retention over time, and my retention definition is at least 2 visits in one year, which must be 90 days apart. Retention is calculated as visit_date - first_visit.

        Comment


        • #5
          I am not fully clear on what you want but this may help. Pull a year out of the daily date with the year() function. Then check the first and last date in each year with egen min() and max(). One visit only means first and last are the same in each year. The difference between first and last can be compared with 90 days.

          Comment


          • #6
            Hi Nick,
            Sorry, I fully realize I have not done a great job explaining what I am looking for. My first_visit variable is the first visit of a patient after diagnosis, and what I have defined as the start of treatment after diagnosis. From then on the patient comes back at periodic intervals to do medical checks, pick up drugs etc (during this time they were getting 3-month drug supplies). I want to look at the retention of patients in care. Retention is defined as at least 2 visits in one year, that are at least 90 days apart. The year starts when the patient initiates treatment, so first visit, and not calendar year. From that first visit, I want to see if they had at least two visits that were 90 days apart every year and record the date at which this did not occur. So for ID 25, (looking at retention) they had seven visits in the first year after they initiated treatment (0, 13, 31, 144, 234, 264, and 297) and 31 and 144 are more than 90 days apart, and 144 and 234 are exactly 90 days apart therefore in this first year ID 25 was retained. In the second year, they only had one visit (538), therefore they were not retained in that year and the date of falling out of care would be 21629, the last date they were "retained". If a patient is retained every year then it would just be the last visit date. Is that helpful?

            Comment


            • #7
              Thanks for the further detail. This may help:

              Code:
              gen year = ceil((visit_date - first_visit) / 365.25) 
              
              bysort id year (visit_date) : gen gap = visit_date - visit_date[_n-1] 
              by id year : egen biggest_gap = max(gap) 
              
              tabdisp year id, c(biggest_gap)

              Comment

              Working...
              X