  • Calculating annual visit using visit date

    Hi there!

    I am a rather novice Stata user doing an MPH. I am using Stata 15 IC on Windows 10; my data is in long format.

    I am trying to classify visits to a clinic using the date of the visit. I am only interested in annual visits, but all visits were captured, including interim ones. Visit 1 for all participants is the enrollment visit; for some participants visit 2 is 12 months after visit 1 (example 294 below), whilst for others visit 2 is 6 months after visit 1 (example 295 below). I have looked for an answer on forums and in videos, but can't find quite the right approach. I thought of calculating the months since visit 1 and then creating categories based on that, but can't figure out how to do that either. Please see the example of the data below. Any assistance would be greatly appreciated.

    [Attached screenshot: Stata data.PNG]

  • #2
    See FAQ Advice #12 on why screenshots are not as useful as you think. You can copy and paste the result of the following to increase your chances of making progress with your problem.

    Code:
    dataex in 1/20



    • #3
      Andrew Musau is bang on.

      That said, this is an interesting problem. I suggest that if we measure time since the first visit as multiples of 365.25 days (more precision seems spurious) then annual visits are most plausibly those with values close to integers and least plausibly those close to half-integers. With the dates for #294 painfully transcribed by hand (and modulo any copying errors) I get plausibility on a scale from 0 to 1 as follows:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(id date)
      294 18954
      294 19306
      294 19670
      294 20038
      294 20222
      294 20403
      294 20570
      294 20711
      294 20895
      294 21157
      end
      format %td date
      
      bysort id (date) : gen distance = (date - date[1]) / 365.25
      gen score = 1 - 2 * abs(distance - round(distance))
      format distance score %9.3f
      
      list
      
           +------------------------------------+
           |  id        date   distance   score |
           |------------------------------------|
        1. | 294   23nov2011      0.000   1.000 |
        2. | 294   09nov2012      0.964   0.927 |
        3. | 294   08nov2013      1.960   0.921 |
        4. | 294   11nov2014      2.968   0.936 |
        5. | 294   14may2015      3.472   0.057 |
           |------------------------------------|
        6. | 294   11nov2015      3.967   0.934 |
        7. | 294   26apr2016      4.424   0.151 |
        8. | 294   14sep2016      4.810   0.621 |
        9. | 294   17mar2017      5.314   0.372 |
       10. | 294   04dec2017      6.031   0.937 |
           +------------------------------------+
      In this example choosing scores above about 0.9 seems about right. No doubt the process could be made fancier, or more rigorous, or both.



      • #4
        Andrew Musau Thank you for the advice. I tried it on my original data set but it said "input statement exceeds linesize limit. Try specifying fewer variables". I have created a dummy data set with the ID, visit, date, and BMI and that seems to have worked.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input int record_id float(Visit date_of_visit BMI)
        269  1 20118  27.70083
        294  1 18954 32.098766
        294  2 19306  30.93044
        294  3 19670 31.481483
        294  4 20038  30.55556
        294  5 20222 31.481483
        294  6 20403  31.28092
        294  7 20570  31.28092
        294  8 20711  30.96173
        294  9 20895 31.600115
        294 10 21157  31.91931
        295  1 18023 29.387754
        295  2 18183   27.7551
        295  3 18339 29.714285
        295  4 18514 29.061224
        295  5 18694  29.75274
        295  6 18928 30.436714
        295  7 19193 31.120686
        295  8 19379 31.804657
        295  9 19547 30.436714
        end
        format %dM_d,_CY date_of_visit
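
        A possible way around the linesize error, sketched under the assumption that the original data set uses the same variable names as the dummy data set above, is to pass dataex a variable list so that only those variables are exported:

        Code:
        * export only the named variables for the first 20 observations
        dataex record_id Visit date_of_visit BMI in 1/20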



        • #5
          Nick Cox I am very grateful for your transcribing. I like your idea and see where you are coming from. How would I use this to create a new variable that I could use for my tests? The study aims to look at changes in cardiovascular indicators over time.



          • #6
            I like the approach taken by Nick Cox to implement the suggestion in post #1.

            But when I think of how my "annual" medical visits work, I have some doubts. I'd say that in general they tend to slide over time, due to scheduling conflicts, etc. And when an annual visit occurs after 13 months, it's not like the next one is scheduled for 11 months later. (This is in part due to the vagaries of the health insurance complex here in the USA, where some procedures are only reimbursed once in any span of 12 months.) So in 2010 it was in June, in 2011 the doctor was on vacation in June and the visit was in July, in 2012 my vacation was in June and the visit was in August, for example.

            Defining annual relative to the inception of the series may be problematic depending on the reality of how visits are scheduled. But if you are looking at clinical trial data where an effort is made to gather data at predefined points in time (after 12 months, after 24 months, etc.) then this approach may work.



            • #7
              I thought I addressed that. A suggestion of a criterion is

              Code:
              gen annual = score > 0.9
              but that could easily be frustrated, for example whenever an annual visit is followed shortly afterwards by another visit as a matter of urgency.
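
              One conceivable guard against that, as a sketch only: keep at most one visit per nearest whole year, namely the one with the highest score. It assumes the variables distance and score from #3; the names nearest and annual_best are made up for illustration.

              Code:
              * nearest whole number of years since the first visit
              gen nearest = round(distance)
              * within each person and year, sort by score and flag the last (highest) one
              bysort id nearest (score) : gen byte annual_best = _n == _N & score > 0.9

              Ties in score within the same year would be broken arbitrarily by the sort.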

              The incidental or accidental inclusion of BMI values around 30 suggests perhaps the treatment issues are rather more slowly changing.

              I love the idea that BMI can be reported to 6 decimal places. Even thinking about chocolate probably changes my BMI within that kind of resolution.

              (The issue of slippage raised by William Lisowski did enter my head and leave it again, thus again changing my BMI. But this might be addressed empirically in terms of modal spacings. If the typical slippage was say 380 days that might be used instead.)
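
              A rough sketch of that empirical check, assuming the id and date variables from #3. The names gap, period, distance2 and score2 are illustrative, and 380 is just the hypothetical figure mentioned above, not an estimate from any data.

              Code:
              * gaps in days between successive visits within each person
              bysort id (date) : gen gap = date - date[_n-1]
              * inspect the distribution of gaps to judge a typical annual spacing
              summarize gap, detail
              * redo the score with the chosen period in place of 365.25
              local period = 380
              bysort id (date) : gen distance2 = (date - date[1]) / `period'
              gen score2 = 1 - 2 * abs(distance2 - round(distance2))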

              Last edited by Nick Cox; 28 May 2020, 10:25.



              • #8
                Out of curiosity I tried panelthin from SSC (see https://www.statalist.org/forums/forum/general-stata-discussion/general/1555093-sorting-and-creation-of-variables for a recent discussion):

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input float(id date)
                294 18954
                294 19306
                294 19670
                294 20038
                294 20222
                294 20403
                294 20570
                294 20711
                294 20895
                294 21157
                end
                format %td date
                
                tsset id date
                
                panelthin, min(360) generate(select)
                
                bysort id : gen spell = sum(select)
                
                list, sepby(id spell)
                
                     +----------------------------------+
                     |  id        date   select   spell |
                     |----------------------------------|
                  1. | 294   23nov2011        1       1 |
                  2. | 294   09nov2012        0       1 |
                     |----------------------------------|
                  3. | 294   08nov2013        1       2 |
                     |----------------------------------|
                  4. | 294   11nov2014        1       3 |
                  5. | 294   14may2015        0       3 |
                     |----------------------------------|
                  6. | 294   11nov2015        1       4 |
                  7. | 294   26apr2016        0       4 |
                  8. | 294   14sep2016        0       4 |
                     |----------------------------------|
                  9. | 294   17mar2017        1       5 |
                 10. | 294   04dec2017        0       5 |
                     +----------------------------------+




                • #9
                  Nick Cox the above looks very impressive. As William Lisowski mentions, visits are often a bit variable, and these are occupational medicals and not a clinical trial, so sadly more variable than is ideal. I have been playing around with the above suggestions and realize that, because visits move over time, they do not really work so well on my data. So, based on your first approach, I used the code below to calculate the number of days from the first visit, which worked like a charm on my data. I will then create categories based on the number of days since the first visit. Thank you for your assistance!
                  Code:
                  bysort record_id (date_of_visit) : gen distance = date_of_visit - date_of_visit[1]
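
                  One possible way to turn distance into categories, as a sketch only: the 60-day window and the names year_index and annual are assumptions for illustration, not something settled in this thread.

                  Code:
                  * nearest whole year since the first visit (distance is in days)
                  gen year_index = round(distance / 365.25)
                  * flag visits within an assumed 60 days of an anniversary
                  gen byte annual = abs(distance - 365.25 * year_index) <= 60

                  The enrollment visit gets year_index 0 and is flagged too; the window would need adjusting to how far visits actually drift.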
