Calculating annual visit using visit date

Sue Aitken

Join Date: May 2020

Posts: 5
#1

28 May 2020, 07:45

Hi there!

I am a rather novice Stata user doing an MPH. I am using Stata 15 IC on Windows 10; my data is in long format.

I am trying to classify visits to a clinic using the date of the visit. I am only interested in annual visits, but all visits were captured, including interim ones. Visit 1 for all participants is the enrollment visit; for some participants visit 2 is 12 months after visit 1 (example 294 below), whilst for others visit 2 is 6 months after visit 1 (example 295 below). I have looked for an answer on forums and in videos, but can't find quite the right approach. I thought to calculate the months since visit 1 and then create categories based on that, but can't figure out how to do that either. Please see the example of data below. Any assistance would be greatly appreciated.
Andrew Musau

Join Date: Oct 2014

Posts: 10483
#2

28 May 2020, 09:17

See FAQ Advice #12 on why screenshots are not as useful as you think. You can copy and paste the result of the following to increase your chances of making progress with your problem.

Code:

dataex in 1/20

Nick Cox

Join Date: Mar 2014
Posts: 36058
#3

28 May 2020, 09:52

Andrew Musau is bang on.

That said, this is an interesting problem. I suggest that if we measure time since the first visit as multiples of 365.25 days (more precision seems spurious) then annual visits are most plausibly those with values close to integers and least plausibly those close to half-integers. With the dates for #294 painfully transcribed by hand (and modulo any copying errors) I get plausibility on a scale from 0 to 1 as follows:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id date)
294 18954
294 19306
294 19670
294 20038
294 20222
294 20403
294 20570
294 20711
294 20895
294 21157
end
format %td date

bysort id (date) : gen distance = (date - date[1]) / 365.25
gen score = 1 - 2 * abs(distance - round(distance))
format distance score %9.3f

list

     +------------------------------------+
     |  id        date   distance   score |
     |------------------------------------|
  1. | 294   23nov2011      0.000   1.000 |
  2. | 294   09nov2012      0.964   0.927 |
  3. | 294   08nov2013      1.960   0.921 |
  4. | 294   11nov2014      2.968   0.936 |
  5. | 294   14may2015      3.472   0.057 |
     |------------------------------------|
  6. | 294   11nov2015      3.967   0.934 |
  7. | 294   26apr2016      4.424   0.151 |
  8. | 294   14sep2016      4.810   0.621 |
  9. | 294   17mar2017      5.314   0.372 |
 10. | 294   04dec2017      6.031   0.937 |
     +------------------------------------+

In this example choosing scores above about 0.9 seems about right. No doubt the process could be made fancier, or more rigorous, or both.


Sue Aitken

Join Date: May 2020
Posts: 5
#4

28 May 2020, 09:57

Andrew Musau Thank you for the advice. I tried on my original data set, but it said "input statement exceeds linesize limit. Try specifying fewer variables". I have created a dummy data set with the ID, visit, date, and BMI, and that seems to have worked.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int record_id float(Visit date_of_visit BMI)
269  1 20118  27.70083
294  1 18954 32.098766
294  2 19306  30.93044
294  3 19670 31.481483
294  4 20038  30.55556
294  5 20222 31.481483
294  6 20403  31.28092
294  7 20570  31.28092
294  8 20711  30.96173
294  9 20895 31.600115
294 10 21157  31.91931
295  1 18023 29.387754
295  2 18183   27.7551
295  3 18339 29.714285
295  4 18514 29.061224
295  5 18694  29.75274
295  6 18928 30.436714
295  7 19193 31.120686
295  8 19379 31.804657
295  9 19547 30.436714
end
format %dM_d,_CY date_of_visit


Sue Aitken

Join Date: May 2020

Posts: 5
#5

28 May 2020, 10:08

Nick Cox I am very grateful for your transcribing. I like your idea and see where you are coming from. How would I use this to create a new variable that I could use for my tests? The study aims to look at changes in cardiovascular indicators over time.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#6

28 May 2020, 10:11

I like the approach taken by Nick Cox to implement the suggestion in post #1.

But when I think of how my "annual" medical visits work, I have some doubts. I'd say that in general they tend to slide over time, due to scheduling conflicts, etc. And when an annual visit occurs after 13 months, it's not like the next one is scheduled for 11 months later. (This is in part due to the vagaries of the health insurance complex here in the USA, where some procedures are only reimbursed once in any span of 12 months.) So in 2010 it was in June, in 2011 the doctor was on vacation in June and the visit was in July, in 2012 my vacation was in June and the visit was in August, for example.

Defining annual relative to the inception of the series may be problematic depending on the reality of how visits are scheduled. But if you are looking at clinical trial data where an effort is made to gather data at predefined points in time (after 12 months, after 24 months, etc.) then this approach may work.
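
One way to see how far visits slide is to look at the gap since the previous visit rather than the time since enrollment. The following is only a minimal sketch, assuming the variable names from the dataex listing in #4; the variable gap is introduced here purely for illustration.

```stata
* Two participants, copied from the -dataex- listing earlier in the thread
clear
input int record_id float(Visit date_of_visit)
294  1 18954
294  2 19306
294  3 19670
294  4 20038
294  5 20222
294  6 20403
294  7 20570
294  8 20711
294  9 20895
294 10 21157
295  1 18023
295  2 18183
295  3 18339
295  4 18514
295  5 18694
295  6 18928
295  7 19193
295  8 19379
295  9 19547
end
format %td date_of_visit

* Days since the same participant's previous visit
* (missing for each participant's first visit)
bysort record_id (date_of_visit) : gen gap = date_of_visit - date_of_visit[_n-1]

list record_id Visit date_of_visit gap, sepby(record_id)
```

If most gaps cluster near 365 days the schedule is anchored to enrollment; if they drift, a rule based on the previous visit may suit the data better.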
Nick Cox

Join Date: Mar 2014

Posts: 36058
#7

28 May 2020, 10:16

I thought I addressed that. A suggestion of a criterion is

Code:

gen annual = score > 0.9

but that could easily be frustrated, for example whenever an annual visit was followed shortly afterwards by another visit as a matter of urgency.

The incidental or accidental inclusion of BMI values around 30 suggests perhaps the treatment issues are rather more slowly changing.

I love the idea that BMI can be reported to 6 decimal places. Even thinking about chocolate probably changes my BMI within that kind of resolution.

(The issue of slippage raised by William Lisowski did enter my head and leave it again, thus again changing my BMI. But this might be addressed empirically in terms of modal spacings. If the typical spacing was say 380 days that might be used instead.)
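
A minimal sketch of that variation, assuming the same example data for #294: the divisor is held in a local macro (here called cycle, a name chosen for illustration) so that 365.25 can be swapped for another spacing. The value 380 is only the illustrative figure mentioned above, not an estimate from these data.

```stata
clear
input float(id date)
294 18954
294 19306
294 19670
294 20038
294 20222
294 20403
294 20570
294 20711
294 20895
294 21157
end
format %td date

* Assumed typical spacing between annual visits, in days (illustrative value)
local cycle 380

* Time since first visit in units of the assumed cycle
bysort id (date) : gen distance = (date - date[1]) / `cycle'

* 1 when exactly on a multiple of the cycle, 0 when exactly halfway between
gen score = 1 - 2 * abs(distance - round(distance))
gen byte annual = score > 0.9
format distance score %9.3f

list
```

The value of cycle would need to come from the data, for example from the distribution of gaps between consecutive visits.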

Last edited by Nick Cox; 28 May 2020, 10:25.

Nick Cox

Join Date: Mar 2014
Posts: 36058
#8

28 May 2020, 12:45

Out of curiosity I tried panelthin from SSC (see https://www.statalist.org/forums/forum/general-stata-discussion/general/1555093-sorting-and-creation-of-variables for a recent discussion):

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id date)
294 18954
294 19306
294 19670
294 20038
294 20222
294 20403
294 20570
294 20711
294 20895
294 21157
end
format %td date

tsset id date

panelthin, min(360) generate(select)

bysort id (date) : gen spell = sum(select)

list, sepby(id spell)

     +----------------------------------+
     |  id        date   select   spell |
     |----------------------------------|
  1. | 294   23nov2011        1       1 |
  2. | 294   09nov2012        0       1 |
     |----------------------------------|
  3. | 294   08nov2013        1       2 |
     |----------------------------------|
  4. | 294   11nov2014        1       3 |
  5. | 294   14may2015        0       3 |
     |----------------------------------|
  6. | 294   11nov2015        1       4 |
  7. | 294   26apr2016        0       4 |
  8. | 294   14sep2016        0       4 |
     |----------------------------------|
  9. | 294   17mar2017        1       5 |
 10. | 294   04dec2017        0       5 |
     +----------------------------------+


Sue Aitken

Join Date: May 2020

Posts: 5
#9

29 May 2020, 09:41

Nick Cox the above looks very impressive. As William Lisowski mentions, visits are often a bit variable, and these are occupational medicals and not a clinical trial, so sadly more variable than is ideal. I have been playing around with the above suggestions and realize the movement of visits over time does not really work so well. So, based on your first approach, I used the below to calculate the number of days from first visit, which worked like a charm on my data. I will then create categories based on the number of days since first visit. Thank you for your assistance!

Code:

bysort record_id (date_of_visit) : gen distance = date_of_visit - date_of_visit[1]
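
One possible way to turn the days since first visit into categories is sketched below. It assigns each visit to its nearest whole year and flags it as annual when it falls within a tolerance of that anniversary. The 60-day tolerance and the names visit_year and annual are assumptions made for illustration, not part of the thread.

```stata
clear
input int record_id float(Visit date_of_visit)
294  1 18954
294  2 19306
294  3 19670
294  4 20038
294  5 20222
294  6 20403
294  7 20570
294  8 20711
294  9 20895
294 10 21157
295  1 18023
295  2 18183
295  3 18339
295  4 18514
295  5 18694
295  6 18928
295  7 19193
295  8 19379
295  9 19547
end
format %td date_of_visit

* Days since each participant's first visit
bysort record_id (date_of_visit) : gen distance = date_of_visit - date_of_visit[1]

* Nearest whole year since the first visit (0 = enrollment)
gen visit_year = round(distance / 365.25)

* Hypothetical tolerance: within 60 days of that anniversary
gen byte annual = abs(distance - 365.25 * visit_year) <= 60

list, sepby(record_id)
```

With a wide tolerance two visits could be flagged for the same year, so it may be worth checking for duplicates within record_id and visit_year before analysis.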