Subset or flag observations in panel data by id and a second variable

Anne Thackeray

Join Date: Sep 2023

Posts: 3
#1

11 Oct 2023, 16:40

Hello and thanks in advance for your help.
I have longitudinal data by id, visit number, and episode of treatment. Visit numbers are based on episodes of treatment. I can get duplicates on id and visit number because people can have more than one episode of treatment (see ids 1, 3, and 4 below). What I would like to do is flag different episodes of treatment, but I can't work out how to recognize a change in status by id. Any help is appreciated. A brief example is shown below:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(id visit episode)
1 1  1
1 2  1
1 1  2
2 1  3
2 2  3
3 1  4
3 2  4
3 1  8
3 2  8
3 3  8
4 1 10
4 1 15
4 2 15
5 1  7
5 2  7
end
label values episode episode
label def episode 1 "LBP", modify
label def episode 2 "Neck pain", modify
label def episode 3 "RCR", modify
label def episode 4 "HS injury", modify
label def episode 7 "Shoulder pain", modify
label def episode 8 "weakness", modify
label def episode 10 "ACLR", modify
label def episode 15 "Knee OA", modify

Clyde Schechter

Join Date: Apr 2014

Posts: 30163
#2

11 Oct 2023, 18:15

Thank you for using -dataex- with your very first post.

I don't know what you mean by "flag different episodes of treatment" (beyond the fact that they are already identified in the data by the episode variable) and "recognize a change in status by id" seems even more vague. What is status and how does it relate to anything in your data? Perhaps it would be best if you hand worked a brief example so you can show what the results you want would look like.
Anne Thackeray

Join Date: Sep 2023

Posts: 3
#3

11 Oct 2023, 19:47

Hi Clyde, sorry for the confusion. Ultimately I need to sort out which participants had more than one episode of care. My analysis is based on changes over a course of care using visit numbers. Visits are numbered sequentially within each episode of care, so an individual can be seen for a series of 2 visits for one diagnosis and then for 3 visits for another diagnosis. (For example, id 3 had 2 visits for episode 4 (HS injury) and 3 visits for episode 8 ("weakness").) Changes in our outcomes are expected to be different for each diagnosis. The challenge is that visit is my time variable, and it will be duplicated within id if the episode changes. It seems I could sort this out by creating a new variable that indicates a different episode within id, something like:
id visit episode newvar
1 1 1 0
1 2 1 0
1 1 2 1
2 1 3 0
2 2 3 0
3 1 4 0
3 2 4 0
3 1 8 1
3 2 8 1
3 3 8 1

It seems I want to identify a "change in state" as described here: https://www.stata.com/support/faqs/d...t-occurrences/
Using:
by id (time), sort: gen byte first = sum(inrange(value, 42,.)) == 1 & sum(inrange(value[_n - 1],42,.)) == 0
Where I get stuck is that I am not looking for just one value or range of values, but rather for any change in value (again using id 3 above: episode changes from 4 to 8). I am guessing the logic is that the value in the current row is not equal to the lagged value.

Does that help?
I am open to clarifying any terms I am using incorrectly.
Thanks
Anne
Clyde Schechter

Join Date: Apr 2014

Posts: 30163
#4

11 Oct 2023, 20:45

So that's:

Code:

by id (episode visit), sort: gen newvar = sum(episode != episode[_n-1]) - 1
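
For readers following the logic, here is a minimal sketch that rebuilds the rows from #1 and checks the one-liner against the hand-worked table in #3. The variable multi is a hypothetical name added for illustration, on the assumption that the end goal from #3 (finding participants with more than one episode of care) is wanted as a per-id flag.

```stata
* Sketch only: rebuild the example rows from post #1
clear
input float(id visit episode)
1 1  1
1 2  1
1 1  2
2 1  3
2 2  3
3 1  4
3 2  4
3 1  8
3 2  8
3 3  8
4 1 10
4 1 15
4 2 15
5 1  7
5 2  7
end

* Within each id, sorted by episode then visit, episode[_n-1] is missing on
* the first row, so the comparison is true there and the running sum starts
* at 1. Subtracting 1 makes the first episode 0, the next one 1, and so on.
by id (episode visit), sort: gen newvar = sum(episode != episode[_n-1]) - 1

* Spot checks against the hand-worked table in post #3
assert newvar == 0 if id == 3 & episode == 4
assert newvar == 1 if id == 3 & episode == 8
assert newvar == 0 if id == 2

* Hypothetical helper (the name multi is an assumption): 1 on every row of
* an id that has more than one episode, 0 otherwise. The running sum never
* decreases, so the last row of each id holds its largest newvar.
by id: gen byte multi = newvar[_N] > 0

list id visit episode newvar multi, sepby(id) noobs
```

Note that sorting on episode orders the episodes within an id by their numeric code, which need not be the order in which they occurred; the example data contain no date variable to sort on instead.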
Anne Thackeray

Join Date: Sep 2023

Posts: 3
#5

12 Oct 2023, 07:08

Thank you Clyde. That is exactly what I needed, and it helps me understand the logic.
Anne
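
On the point raised in #3 that visit repeats within id when the episode changes: one possible approach, sketched here under the assumption that each id-by-episode combination is to be treated as its own course of care, is to build a panel identifier from the pair and declare that as the panel variable. The name course is hypothetical, and the sketch assumes the example data from #1 are in memory.

```stata
* Sketch only; assumes the example data from post #1 are in memory.
* Confirm that visit is unique within each id-episode pair; -isid- stops
* with an error if it is not.
isid id episode visit

* course is a hypothetical name: one integer code per id-episode pair
egen long course = group(id episode)

* visit no longer repeats within the panel variable, so -xtset- accepts it
xtset course visit
```

Whether separate courses of care for the same person should be modeled as separate panels is an analytic decision this sketch does not settle.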
