Need help with splitting longitudinal data into episodes of equal length

David Etoori

Join Date: Oct 2020

Posts: 6
#1

29 Aug 2025, 10:45

Hi Statalisters,

I'd appreciate your help with a data formatting problem I'm trying to solve.

I'm working with longitudinal data (multiple rows per subject). I want to split each subject's follow-up into intervals of equal length (30 days), from the study entry date to the study exit date, taking into account the multiple rows per subject.

My data looks like this:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(myid DoE DoX dob admidate adm_ind)
1 18263 23375 5395 18281 1
1 18263 23375 5395 19511 1
1 18263 23375 5395 19554 1
1 18263 23375 5395 19762 1
1 18263 23375 5395 19764 1
1 18263 23375 5395 20049 1
1 18263 23375 5395 20251 1
1 18263 23375 5395 21682 1
end
format %d DoE
format %d DoX
format %d dob
format %d admidate

DoE is the date of entry into the study.
DoX is the date of exit from the study.
dob is the date of birth.
admidate is the date of admission.
adm_ind is an admission indicator (1 for each admission).

Each individual can have multiple admissions. I'd like to split the data so that an admission flag appears in any time interval where an admission occurred.
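To make the target concrete, here is a minimal sketch of how an admission date maps to a 30-day interval, assuming intervals are numbered from 1 starting on DoE, so that interval k runs from DoE + 30*(k-1) through DoE + 30*k - 1. It uses four rows of the -dataex- example above; the variable names interval, ep_start and ep_end are hypothetical.

```stata
* Sketch: which 30-day interval does each admission fall in?
* (four rows taken from the -dataex- example above)
clear
input float(myid DoE admidate)
1 18263 18281
1 18263 19511
1 18263 19762
1 18263 19764
end
format %td DoE admidate

* Interval k covers DoE + 30*(k-1) through DoE + 30*k - 1
gen int interval = floor((admidate - DoE)/30) + 1
gen ep_start = DoE + (interval - 1)*30
gen ep_end = ep_start + 29
format %td ep_start ep_end
list myid admidate interval ep_start ep_end, noobs
```

The admissions on 19762 and 19764 are only two days apart, but they fall in intervals 50 and 51 because 19762 is the last day of interval 50.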

Could someone please advise on the best way to do this?

Any help would be much appreciated.

Thanks.

Clyde Schechter

Join Date: Apr 2014
Posts: 30170

29 Aug 2025, 11:01

Code:

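* Save the full admissions data so it can be joined back later
tempfile admissions
save `admissions'

* One row per subject, then one row per 30-day interval
keep myid DoE DoX
duplicates drop
gen `c(obs_t)' obs_no = _n
expand ceil((DoX - DoE)/30)
by obs_no, sort: gen start = DoE + (_n-1)*30
gen end = start + 29
format start end %d
drop obs_no

* Attach every admission whose date falls within [start, end]
rangejoin admidate start end using `admissions', by(myid) keepusing(adm_ind)
replace adm_ind = 0 if missing(adm_ind)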

-rangejoin- is written by Robert Picard and is available from SSC. To use it, you must also install -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.
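Both packages can be installed with -ssc install rangejoin- and -ssc install rangestat-. One thing worth checking: -rangejoin- returns one row per matching admission, so an interval that contains two admissions appears twice in the result. The sketch below runs the code from #2 on made-up data (hypothetical values, not from the thread) and then adds a -collapse- step, which is not part of #2, to get one row per interval.

```stata
* Install the user-written commands if they are missing (needs internet access)
capture which rangejoin
if _rc {
    ssc install rangejoin
}
capture which rangestat
if _rc {
    ssc install rangestat
}

* Hypothetical data: one subject followed for 90 days, with two admissions
* in the first 30-day interval and one in the third
clear
input float(myid DoE DoX admidate adm_ind)
1 18263 18352 18265 1
1 18263 18352 18270 1
1 18263 18352 18330 1
end
format %td DoE DoX admidate

* Code from #2
tempfile admissions
save `admissions'
keep myid DoE DoX
duplicates drop
gen `c(obs_t)' obs_no = _n
expand ceil((DoX - DoE)/30)
by obs_no, sort: gen start = DoE + (_n-1)*30
gen end = start + 29
format start end %td
drop obs_no
rangejoin admidate start end using `admissions', by(myid) keepusing(adm_ind)
replace adm_ind = 0 if missing(adm_ind)

* Added step: interval 1 matched two admissions and so appears twice;
* collapse to one row per interval, keeping the admission flag
collapse (max) adm_ind, by(myid DoE DoX start end)
list, noobs
```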

David Etoori

Join Date: Oct 2020

Posts: 6
#3

29 Aug 2025, 11:10

Thanks, Clyde! Much appreciated.

Hemanshu Kumar

Join Date: Mar 2015
Posts: 1485

Yesterday, 01:08

Here is an alternative that uses only Stata's built-in commands (I borrow some chunks of code from Clyde's answer in #2):

Code:

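* Save the full admissions data so it can be merged back later
tempfile admissions
save `admissions'

keep myid DoE DoX
duplicates drop

* One row per subject-day, from DoE through DoX - 1
gen `c(obs_t)' obs_no = _n
expand (DoX - DoE)
bysort myid (obs_no): gen admidate = DoE + _n - 1

* Days with an admission get _merge == 3
merge 1:1 myid admidate using `admissions', keep(master match)

* Number the 30-day periods from 1 and collapse to one row per period
gen period = ceil((admidate + 1 - DoE)/30)
collapse (max) _merge , by(myid DoE DoX period)
gen byte wanted = (_merge == 3)

gen start = DoE + (period - 1)*30
gen end = start + 29
format start end %d
drop _merge period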

Note that internally, rangejoin uses the same expand-and-merge technique, so this method should be about as fast (and may even be faster, since it avoids some overhead of the command that is unnecessary here).
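Two details of this approach may matter on real data. First, -merge 1:1- requires myid and admidate to uniquely identify observations in the admissions file, so two admissions by the same subject on the same day would make it stop with an error; dropping same-day duplicates before saving (an added step, not part of the code above) avoids that. Second, -expand (DoX - DoE)- creates days DoE through DoX - 1, so an admission on the exit date itself is not flagged; if DoX should count, -expand (DoX - DoE + 1)- would include it. A sketch on made-up data (hypothetical values, not from the thread):

```stata
* Hypothetical data: 90 days of follow-up, with two admissions on the same day (18270)
clear
input float(myid DoE DoX admidate adm_ind)
1 18263 18352 18265 1
1 18263 18352 18270 1
1 18263 18352 18270 1
1 18263 18352 18330 1
end
format %td DoE DoX admidate

* Added step: -merge 1:1- needs myid admidate to be unique in the using file
duplicates drop myid admidate, force

* Code from above
tempfile admissions
save `admissions'
keep myid DoE DoX
duplicates drop
gen `c(obs_t)' obs_no = _n
expand (DoX - DoE)
bysort myid (obs_no): gen admidate = DoE + _n - 1
merge 1:1 myid admidate using `admissions', keep(master match)
gen period = ceil((admidate + 1 - DoE)/30)
collapse (max) _merge, by(myid DoE DoX period)
gen byte wanted = (_merge == 3)
gen start = DoE + (period - 1)*30
gen end = start + 29
format start end %td
drop _merge period
list, noobs
```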

Last edited by Hemanshu Kumar; Yesterday, 01:37.