Identifying a positive condition & date when a variable is coded as positive 2 or more times over a certain time period

Kevin Marks

Join Date: Jun 2021

Posts: 24
#1

17 Jun 2022, 11:13

Hi there. My goal is to write code so that, if a urinary lab test (the albumin/creatinine ratio) is coded as positive two or more times within a 6-month period, the observation and its corresponding date are red-flagged as positive for a suspected diagnosis of diabetic nephropathy in children and adolescents with type 1 diabetes. The main variables for this code are "ID_nr" (numeric format, 12 digits), alb_creat_ratio, and test_date. Based on a set of rules I have already written, alb_creat_ratio = 0 means a negative screening test and alb_creat_ratio = 1 means a positive screening test. What code should I write so that, if there are two or more positive tests over a 6-month period, a new variable called nephropathy_status is generated, where 0 = none and 1 = positive for diabetic nephropathy? My apologies for asking such a "oh, he's obviously a beginner" type of Stata question.

Example generated by -dataex-
clear
input double(ID_nr raw_data_alb_creat_ratio) long test_date float(alb_creat_ratio albu_creat_screen_date)
121212121212 .3 18245 0 18245
131313131313 7 18245 1 18245
141414141414 2 18249 0 18249
151515151515 6 18273 1 18273
161616161616 46.2 18288 1 18288
171717171717 1.3 18338 0 18338
end
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

17 Jun 2022, 12:09

My apologies for asking such a "oh, he's obviously a beginner" type of Stata question.

Actually, this is not a beginner-level question. And even if it were, beginner-level questions are quite welcome in this Forum. We were all beginners once!

Code:

local 6_months 183
rangestat (sum) nephropathy_status = alb_creat_ratio, by(ID_nr) interval(test_date -`6_months' 0)
replace nephropathy_status = inrange(nephropathy_status, 2, .)

To run this code you must install -rangestat-, written by Robert Picard, Nick Cox, and Roberto Ferrer, and available from SSC.

I have reinterpreted 6 months to refer to a period of 183 days, not basing it on the same date 6 calendar months later, which would be much more complicated and clinically pointless.

I notice you have two date variables, test_date and albu_creat_screen_date, both of which, in the example data shown, have the same values. Not clear why you would need two date variables that are always the same, but perhaps in the full data set they sometimes differ. Bearing that in mind, I have based the 6 month interval on the test date, not the albu_creat_screen_date. You can change that if I guessed wrong.

By the way, the example data contains only a single observation for each ID_nr, so in the example data, the result is always 0 because there are never two tests on the same person.
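
To see the window logic at work, here is a minimal sketch on made-up data with several tests per person. The IDs, dates, and results are invented purely for illustration and are not from the original post; the helper names date_str and n_pos are likewise hypothetical. It assumes -rangestat- is already installed from SSC.

```stata
* Toy data (hypothetical): two patients, three tests each
clear
input double ID_nr str9 date_str byte alb_creat_ratio
1 "01jan2015" 1
1 "01mar2015" 1
1 "01dec2015" 1
2 "01jan2015" 1
2 "01feb2015" 0
2 "01oct2015" 1
end
gen long test_date = daily(date_str, "DMY")
format test_date %td

* For each test, count positives in the 183 days up to and including that date
local 6_months 183
rangestat (sum) n_pos = alb_creat_ratio, by(ID_nr) interval(test_date -`6_months' 0)

* Flag the observation when the window holds two or more positives
gen byte nephropathy_status = inrange(n_pos, 2, .)

list ID_nr test_date alb_creat_ratio n_pos nephropathy_status, sepby(ID_nr) noobs
```

With these made-up values, only patient 1's test on 01mar2015 should be flagged: it is the only test with a second positive in the preceding 183 days. Because the window looks backward, the flag lands on the later of the two positives, not the earlier one.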

Last edited by Clyde Schechter; 17 Jun 2022, 12:12.
Kevin Marks

Join Date: Jun 2021

Posts: 24
#3

18 Jun 2022, 09:37

Thanks so much, Clyde, for this pithy reply. In my real dataset, there are typically multiple lab values (alb_creat_ratio) per ID number over an 11-year study period. In my example, I should have made clearer that there are multiple lab values and test dates per participant. You are correct that it's silly to have two test date variables with the exact same data, so I will drop one of them. It should work fine that you based the 6-month interval on the test_date variable. Thanks!!!
Kevin Marks

Join Date: Jun 2021

Posts: 24
#4

04 Aug 2022, 10:55

Oh crap. Now I have been told there has been a change in my rules for how diabetic nephropathy is diagnosed in children... at least here in Denmark. My new rule = a pediatric patient with type 1 diabetes must have a positive albumin/creatinine ratio (alb_creat_ratio) test (coded as 1 = positive, 0 = negative) within a 2-year period AND can NOT have any negative albumin/creatinine ratio tests during that 2-year time span. Btw, the population-based dataset I am working with has multiple observations for each ID_nr from 2010 to 2020. My variable nephropathy_status will describe whether a child has a positive or negative diagnosis.

1. How should I tweak the code below to add on the condition that there can be no negative test or tests (alb_creat_ratio = 0) during the 2 year time range?
2. For each ID_nr, I also need to only keep the first positive diagnosis (albu_creat_screen_date) moving forward in time. What would you recommend?
local 24_months 732
rangestat (sum) nephropathy_status = alb_creat_ratio, by(ID_nr) interval(test_date -`24_months' 0)
replace nephropathy_status = inrange(nephropathy_status, 2, .)
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#5

04 Aug 2022, 11:10

1. How should I tweak the code below to add on the condition that there can be no negative test or tests (alb_creat_ratio = 0) during the 2 year time range?

Code:

local 2_years = 365*2
assert !missing(alb_creat_ratio)
gen byte negative_test = !alb_creat_ratio
rangestat (sum) positive_tests = alb_creat_ratio ///
    negative_tests = negative_test, ///
    interval(test_date -`2_years' 0) by(ID_nr)
gen byte nephropathy_status = inrange(positive_tests, 2, .) & negative_tests == 0

I have taken the liberty of assuming that 24 months really is intended as a paraphrase of 2 years, which would be 730 days, not 732.

Note: the example data in #1 does not provide a thorough test of this code because it does not even extend over a two year period, nor does it offer multiple ID_nr's, some of which do and some of which do not meet criteria. Nevertheless, I think this code is correct.
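
For a somewhat fuller check, the following sketch runs the same logic on made-up data that spans more than two years and includes patients who do and do not meet the criteria. All IDs, dates, and results are invented for illustration, and date_str is a hypothetical helper variable; it assumes -rangestat- is installed.

```stata
* Toy data (hypothetical): three patients followed over more than two years
clear
input double ID_nr str9 date_str byte alb_creat_ratio
1 "01jan2015" 1
1 "01jul2016" 1
2 "01jan2015" 1
2 "01jan2016" 0
2 "01jul2016" 1
3 "01jan2015" 1
3 "01jun2017" 1
end
gen long test_date = daily(date_str, "DMY")
format test_date %td

* Count positive and negative tests in the 730 days up to and including each test
local 2_years = 365*2
assert !missing(alb_creat_ratio)
gen byte negative_test = !alb_creat_ratio
rangestat (sum) positive_tests = alb_creat_ratio ///
    negative_tests = negative_test, ///
    interval(test_date -`2_years' 0) by(ID_nr)

* Positive only with two or more positives and no negatives in the window
gen byte nephropathy_status = inrange(positive_tests, 2, .) & negative_tests == 0

list ID_nr test_date alb_creat_ratio positive_tests negative_tests nephropathy_status, sepby(ID_nr) noobs
```

In this toy data, patient 1 should be flagged on 01jul2016 (two positives, no negatives within 730 days). Patient 2 should not be flagged because of the intervening negative test, and patient 3 should not be flagged because the two positives are more than 730 days apart.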

For each ID_nr, I also need to only keep the first positive diagnosis (albu_creat_screen_date) moving forward in time.

I do not know what this means. Please explain, or, better, show, what you want.