looping to pull a single observation based on age from unbalanced longitudinal data

Lynne Peterson

Join Date: Mar 2020

Posts: 3
#1

03 Mar 2020, 17:04

Hi.

I have repeated measures for various midlife health conditions (e.g. diabetes, hypertension) that were captured by a health care system over a period of 30 years. Some participants have measures for 1 or 2 timepoints; others for 20-30. The measures are not evenly spaced; some may be within the same calendar year/participant age and some may be a decade apart.

I am trying to pull the first observation for each person in midlife (ages 45-55) to identify their disease status at that point. Coded the painstaking way in wide format, it would look something like this:

gen diabetes_age45_55=.
*capture those with/without diabetes in age range at visit 1:
replace diabetes_age45_55=1 if diabetes1==1 & visit_age1>=45 & visit_age1<=55
replace diabetes_age45_55=0 if diabetes1==0 & visit_age1>=45 & visit_age1<=55

*capture those at visit 2 that weren't captured in visit 1
replace diabetes_age45_55=1 if diabetes2==1 & visit_age2>=45 & visit_age2<=55 & diabetes_age45_55==.
replace diabetes_age45_55=0 if diabetes2==0 & visit_age2>=45 & visit_age2<=55 & diabetes_age45_55==.

... and this pattern would repeat 28 more times (and for 3 other conditions) to capture all 30 visits.
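
For what it's worth, the repeated pattern above could probably be condensed into a single loop. This is only a minimal sketch: it assumes the wide variables follow the diabetes1, diabetes2, ... and visit_age1, visit_age2, ... naming used above, and that the visit numbers reflect true chronological order. The three-visit toy data set and the id variable are hypothetical, made up purely so the snippet runs on its own.

```stata
* Toy wide-format data (hypothetical values): 3 visits per participant
clear
input id visit_age1 diabetes1 visit_age2 diabetes2 visit_age3 diabetes3
1 40 0 46 1 52 0
2 47 . 50 0 54 1
3 30 0 35 0 60 1
end

gen diabetes_age45_55 = .

* Walk the visits in order; only fill participants not yet assigned,
* and skip visits where the diabetes measure itself is missing
forvalues v = 1/3 {
    replace diabetes_age45_55 = diabetes`v' ///
        if inrange(visit_age`v', 45, 55) & !missing(diabetes`v') ///
        & missing(diabetes_age45_55)
}

list id diabetes_age45_55
```

With the real data the loop would run over 1/30 instead of 1/3, and wrapping it in a foreach over the condition names would cover the other three conditions.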

I've also attempted the following code in long format. Unfortunately, this fails to isolate the first visit per participant in that age range, resulting in some participants having repeated measures counted in the new variable.

gen diabetes_age45_55=.
set trace on
foreach i of varlist mhc_exam_age {
replace diabetes_age45_55=diabetes if visit_age>=45 & visit_age<=55
}

I would appreciate any tips for coding this in either wide or long format to get a single observation from each participant in the given age range.
Thanks!
Tags: foreach, looping, panel data
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#2

03 Mar 2020, 19:27

You really should post example data when asking for help with code.

Before you do anything else, -reshape- the data to long; this task is very simple in long layout and difficult in wide. (That is true of most tasks in Stata: keep your data sets in long layout unless there is a clear and compelling reason to use wide, as nearly everything is easier in long.) The code below presumes you have done this.

With some assumptions about how your data are organized, and with guessed variable names where you did not specify them:

Code:

sort patient_id visit_date
rangestat (first) diabetes_age_45_55 = diabetes, by(patient_id) interval(age, 45, 55)

-rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer. It is available from SSC.

Lynne Peterson

Join Date: Mar 2020
Posts: 3

04 Mar 2020, 10:46

Thanks for the tip. I am able to run your suggested code without errors, but no values are entered. I don't have the visit date, but have created the visit number using Stata's _n. Below is the code I used and example data (100 out of 6,000 observations).

Code:

sort studyid exam_age
by studyid exam_age: gen visit=_n
sort studyid visit
rangestat (firstnm) diabetesage45_55=midlife_dm, by (studyid) interval (exam_age, 45, 55)

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long studyid byte exam_age float(midlife_dm visit) double diabetesage45_55
 11 40 0 1 .
 11 43 0 1 .
 31 32 1 1 .
 31 35 . 1 .
 41 37 0 1 .
 41 42 1 1 .
 41 44 0 1 .
 41 46 . 1 .
111 17 0 1 .
151 41 0 1 .
201 31 . 1 .
201 33 . 1 .
201 35 0 1 .
201 40 0 1 .
201 45 0 1 .
201 48 0 1 .
221 36 0 1 .
301 34 0 1 .
301 36 1 1 .
301 37 1 1 .
301 38 1 1 .
301 43 . 1 .
301 47 0 1 .
301 50 0 1 .
301 53 . 1 .
301 56 . 1 .
301 58 0 1 .
301 59 . 1 .
301 53 . 2 .
311 32 0 1 .
351 34 0 1 .
351 36 0 1 .
351 39 0 1 .
351 42 0 1 .
351 44 0 1 .
351 46 0 1 .
361 37 . 1 .
361 39 0 1 .
361 40 0 1 .
361 41 0 1 .
381 34 1 1 .
381 35 0 1 .
381 37 1 1 .
381 38 . 1 .
381 39 1 1 .
381 40 1 1 .
381 41 1 1 .
381 42 0 1 .
381 43 . 1 .
381 44 . 1 .
381 45 . 1 .
381 47 . 1 .
381 48 0 1 .
381 49 . 1 .
381 51 0 1 .
381 54 0 1 .
381 56 0 1 .
381 57 0 1 .
381 59 0 1 .
381 54 0 2 .
381 57 . 2 .
401 27 1 1 .
401 32 . 1 .
401 35 0 1 .
472 30 1 1 .
472 36 . 1 .
491 32 0 1 .
491 35 0 1 .
511 35 0 1 .
511 36 . 1 .
511 37 . 1 .
511 41 0 1 .
511 43 0 1 .
511 50 . 1 .
511 53 . 1 .
511 55 0 1 .
511 56 0 1 .
521 32 0 1 .
521 35 1 1 .
521 38 1 1 .
541 32 1 1 .
541 33 1 1 .
541 35 . 1 .
541 36 . 1 .
631 22 1 1 .
661 36 0 1 .
661 39 0 1 .
661 43 0 1 .
661 45 0 1 .
661 47 0 1 .
771 31 1 1 .
771 36 . 1 .
771 39 0 1 .
771 41 0 1 .
771 44 0 1 .
771 50 0 1 .
771 51 0 1 .
781 25 0 1 .
781 27 . 1 .
781 40 0 1 .
end

Last edited by Lynne Peterson; 04 Mar 2020, 10:56.


Clyde Schechter

Join Date: Apr 2014
Posts: 30111

04 Mar 2020, 11:19

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long studyid byte exam_age float(midlife_dm visit)
 11 40 0 1
 11 43 0 1
 31 32 1 1
 31 35 . 1
 41 37 0 1
 41 42 1 1
 41 44 0 1
 41 46 . 1
111 17 0 1
151 41 0 1
201 31 . 1
201 33 . 1
201 35 0 1
201 40 0 1
201 45 0 1
201 48 0 1
221 36 0 1
301 34 0 1
301 36 1 1
301 37 1 1
301 38 1 1
301 43 . 1
301 47 0 1
301 50 0 1
301 53 . 1
301 56 . 1
301 58 0 1
301 59 . 1
301 53 . 2
311 32 0 1
351 34 0 1
351 36 0 1
351 39 0 1
351 42 0 1
351 44 0 1
351 46 0 1
361 37 . 1
361 39 0 1
361 40 0 1
361 41 0 1
381 34 1 1
381 35 0 1
381 37 1 1
381 38 . 1
381 39 1 1
381 40 1 1
381 41 1 1
381 42 0 1
381 43 . 1
381 44 . 1
381 45 . 1
381 47 . 1
381 48 0 1
381 49 . 1
381 51 0 1
381 54 0 1
381 56 0 1
381 57 0 1
381 59 0 1
381 54 0 2
381 57 . 2
401 27 1 1
401 32 . 1
401 35 0 1
472 30 1 1
472 36 . 1
491 32 0 1
491 35 0 1
511 35 0 1
511 36 . 1
511 37 . 1
511 41 0 1
511 43 0 1
511 50 . 1
511 53 . 1
511 55 0 1
511 56 0 1
521 32 0 1
521 35 1 1
521 38 1 1
541 32 1 1
541 33 1 1
541 35 . 1
541 36 . 1
631 22 1 1
661 36 0 1
661 39 0 1
661 43 0 1
661 45 0 1
661 47 0 1
771 31 1 1
771 36 . 1
771 39 0 1
771 41 0 1
771 44 0 1
771 50 0 1
771 51 0 1
781 25 0 1
781 27 . 1
781 40 0 1
end

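* Order visits within participant (ties at the same age fall back on visit)
sort studyid exam_age visit

preserve
* Keep only midlife visits, then take the first non-missing value per participant
keep if inrange(exam_age, 45, 55)
sort studyid exam_age visit
collapse (firstnm) midlife_dm, by(studyid)
rename midlife_dm diabetesage45_55
keep studyid diabetesage45_55
tempfile midlife_dm
save `midlife_dm'

* Back to the full data; attach the midlife value to every row of each participant
restore
merge m:1 studyid using `midlife_dm', assert(match master) nogenerate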

Note: Your code appears to generate visit as just an arbitrary ordering where the same patient has repeated visits at the same age. If you don't have an actual date, or an actual visit number that corresponds to real-world chronological order of the visits, then your data are incapable of answering your question. You state that you want the first value of the diabetes variable between ages 45 and 55--but if the data can't tell you which visit is first, then there is no hope of getting that.
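
To gauge how much the ordering problem matters in practice, one could flag the participants whose earliest in-range age has several records with conflicting non-missing values. The following is a rough sketch only: the variable names inwin, first_age, lo, hi, and ambiguous are made up for illustration, and the small data set is hypothetical (studyid 900 in particular is invented to show a conflict).

```stata
* Toy long-format data (hypothetical values)
clear
input long studyid byte exam_age float(midlife_dm visit)
 11 40 0 1
301 47 0 1
301 53 . 1
301 53 . 2
900 46 0 1
900 46 1 2
end

* Records that fall in the 45-55 window and have a non-missing measure
gen byte inwin = inrange(exam_age, 45, 55) & !missing(midlife_dm)

* Earliest in-window age per participant (missing if none)
egen first_age = min(cond(inwin, exam_age, .)), by(studyid)

* Range of values recorded at that earliest age
egen lo = min(cond(inwin & exam_age == first_age, midlife_dm, .)), by(studyid)
egen hi = max(cond(inwin & exam_age == first_age, midlife_dm, .)), by(studyid)

* 1 = tied records disagree, so "first" cannot be determined from age alone
gen byte ambiguous = (lo != hi) if !missing(first_age)

list studyid exam_age midlife_dm first_age ambiguous, sepby(studyid)
```

Participants with ambiguous == 0 get the same answer whichever tied record is treated as first; only those with ambiguous == 1 truly depend on a real visit date or visit number.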


Lynne Peterson

Join Date: Mar 2020

Posts: 3
#5

04 Mar 2020, 13:05

Thanks!
I will go back to the data managers and have them generate the order of visits based on visit date, which will resolve the issue of repeated visits at the same age in my study. (Visit date isn't distributed outside of the health system.) Thank you for the help!