Creating a variable: TRUE or FALSE based on date within two other dates for each participant of of which there are multiple ID values

Li Huang

Join Date: Jul 2019
Posts: 9

Creating a variable: TRUE or FALSE based on date within two other dates for each participant of of which there are multiple ID values

11 Aug 2019, 22:23

Hello, my apologies for not using dataex. I am using healthcare data so I have created an example below.
I am looking to create a variable that will satisfy this equality:

visit_date (for visit_type==1) <= visit_date (for visit_type==2) <= visit_date (for visit_type==1 + 365 days)

here is the structure I have:

study_id	unique_id	visit_type	visit_date
1	165423	1	01/01/2000
1	164651	2	06/07/2000
2	949628	1	03/05/2001
2	489461	2	04/05/2002
3	984665	1	02/20/2002
3	894861	2	01/06/2003
4	894156	1	10/10/2002
4	876464	3	10/02/2003
4	786386	2	11/05/2003

// all date values of Type: double and Format: tc%
// unique_id is unique to every occurrence of "activity" i.e. unique in the entirety of the dataset
// I have given an example where I have more than one value for what is supposed to be the follow-up measurement but the actual measurement was erroneous in some way (e.g. three study_id values, and visit_type==3 instead of 1 or 2).

study_id	unique_id	visit_type	visit_date	within_window
1	165423	1	01/01/2000	.
1	164651	2	06/07/2000	1
2	949628	1	03/05/2001	.
2	489461	2	04/05/2002	0
3	984665	1	02/20/2002	.
3	894861	2	01/06/2003	1
4	894156	1	10/10/2002	.
4	876464	3	10/02/2003	.
4	786386	2	11/05/2003	0

// you can see that I desire within_window==. if visit_type!=2
// I don't think -reshape wide- will help because as it stands there are > 100 variables and > 10,000 observations in the dataset

I have tried something very simple and non-elegant:

Code:

     
* visit_date for visit_type==1
gen double FirstVisitDate=vis_date if visit_type==1                      
  format FirstVisitDate %tc
     
* visit_date for visit_type==2
gen FollowUpVisitDate=visit_date if visit_type==2
  format FollowUpVisitDate %tc

* visit_datefor visit_type==1 + 365 days
gen FirstVisitDate_plus365=visit_date if visit_type==1
  format FirstVisitDate_plus365 %tc
  replace FirstVisitDate_plus365=FirstVisitDate_plus365+3.1536*10^10
  // 3.1536*10^10 = 1 year in milliseconds (non-leap since %tc)

* var returned within the time-window
gen within_window=.                                               
  replace within_window=1 if FirstVisitDate < FollowUpVisitDate < FirstVisitDate_plus365          
  replace within_window=0 if missing(FollowUpVisitDate) | FollowUpVisitDate > FirstVisitDate_plus365

This solution seems somewhat bizarre though.
I end up with the dataset looking something like this:

study_id	unique_id	visit_type	visit_date	FirstVisitDate	FollowUpVisitDate	FirstVisitDate_plus365	within_window
1	165423	1	01/01/2000	01/01/2000	.	01/01/2001	0
1	164651	2	06/07/2000	.	06/07/2000	.	1

Clearly the value within_window==1 is correct, however, I am confused about how Stata is reading this given that the observations are on different lines, so would Stata not be evaluating the calculation of within_window as:

Code:

FirstVisitDate==01/01/2000 < FollowUpVisitDate==. < FirstVisitDate_plus365==01/01/2001

FirstVisitDate==. < FollowUpVisitDate==06/09/2000 < FirstVisitDate_plus365==.

I have checked multiple values manually which appear to be correct but there are too many to sort through by simply observing value by value etc.

Questions:

Can someone lend a more elegant solution, point out my errors, or have a better way to quality check this solution?
Also, I would like to have within_window==. for the observations that have visit_type==1 but I suppose when I aggregate I can take only values of 0 or 1 where visit_type==2 if my data stay in the form I have above.

Thanks for the help as always.

Tags: panel, panel data, Suggestion, syntax, Time Series

Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#2

11 Aug 2019, 22:45

From your example data and your code, I'm inferring that in your data, for each id there is always one and only one observation with visit_type = 1. If that is true, you can get what you want with:

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input byte study_id long unique_id byte visit_type double visit_date 1 165423 1 1.262304e+12 1 164651 2 1.2759552e+12 2 949628 1 1.2993696e+12 2 489461 2 1.333584e+12 3 984665 1 1.3297824e+12 3 894861 2 1.3574304e+12 4 894156 1 1.3498272e+12 4 876464 3 1.380672e+12 4 786386 2 1.3836096e+12 end format %tc visit_date local one_year = 365*msofhours(24) by study_id (visit_type), sort: gen byte within_window = /// inrange(visit_date, visit_date[1], visit_date[1] + `one_year') /// if visit_type == 2

I have not carefully reviewed your code. But I can tell you that a construction like x < y < z in Stata does not mean x < y & y < z. I won't go into what it does mean because what it means is something that you are unlikely to ever actually want. Suffice it to say that Stata code is not algebra, though there is a strong family resemblance.

Hello, my apologies for not using dataex. I am using healthcare data so I have created an example below.

Non sequitur. You can use -dataex- with any kind of data. If the concern is that your data are confidential and you could not share the real data, you could still have used -dataex-. Instead of drawing a table here in the Forum editor, you could have launched Stata, opened the Data Editor and created your data table there. Then run -dataex-. It would have actually been faster for you and would have saved me time as well. Whenever you ask for help with code, always use -dataex-.
1 like
Comment
Li Huang

Join Date: Jul 2019

Posts: 9
#3

19 Aug 2019, 00:01

This worked and thank you Clyde. I will also use dataex next time.
Comment

Announcement

Creating a variable: TRUE or FALSE based on date within two other dates for each participant of of which there are multiple ID values

Comment

Comment