
  • Subset or flag observations in panel data by id and a second variable

    Hello and thanks in advance for your help.
    I have longitudinal data identified by id, visit number, and episode of treatment. Visit numbers are assigned within episodes of treatment, so I can get duplicates on id and visit number when a person has more than one episode of treatment (see ids 1, 3, and 4 below). What I would like to do is flag different episodes of treatment, but I can't work out how to recognize a change in status by id. Any help is appreciated. A brief example is shown below:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(id visit episode)
    1 1  1
    1 2  1
    1 1  2
    2 1  3
    2 2  3
    3 1  4
    3 2  4
    3 1  8
    3 2  8
    3 3  8
    4 1 10
    4 1 15
    4 2 15
    5 1  7
    5 2  7
    end
    label values episode episode
    label def episode 1 "LBP", modify
    label def episode 2 "Neck pain", modify
    label def episode 3 "RCR", modify
    label def episode 4 "HS injury", modify
    label def episode 7 "Shoulder pain", modify
    label def episode 8 "weakness", modify
    label def episode 10 "ACLR", modify
    label def episode 15 "Knee OA", modify
    Anne Thackeray

  • #2
    Thank you for using -dataex- with your very first post.

    I don't know what you mean by "flag different episodes of treatment" (beyond the fact that they are already identified in the data by the episode variable) and "recognize a change in status by id" seems even more vague. What is status and how does it relate to anything in your data? Perhaps it would be best if you hand worked a brief example so you can show what the results you want would look like.



    • #3
      Hi Clyde, sorry for the confusion. Ultimately I need to sort out which participants had more than one episode of care. My analysis is based on changes over a course of care using visit numbers. Visits are numbered sequentially within each episode of care, so an individual can be seen for a series of 2 visits for one diagnosis and then for 3 visits for another. (For example, id 3 had 2 visits for episode 4, "HS injury", and 3 visits for episode 8, "weakness".) Changes in our outcomes are expected to differ by diagnosis. The challenge is that visit is my time variable and will be duplicated whenever the episode changes. It seems I could sort this out by creating a new variable that indicates a different episode within id, something like:
      id visit episode newvar
      1 1 1 0
      1 2 1 0
      1 1 2 1
      2 1 3 0
      2 2 3 0
      3 1 4 0
      3 2 4 0
      3 1 8 1
      3 2 8 1
      3 3 8 1

      It seems I want to identify a "change in state" as described here: https://www.stata.com/support/faqs/d...t-occurrences/
      Using:
      by id (time), sort: gen byte first = sum(inrange(value, 42,.)) == 1 & sum(inrange(value[_n - 1],42,.)) == 0
      Where I get stuck is that I am not looking for a single value or range of values, but rather for any change in value (again using id 3 above: episode changes from 4 to 8). I am guessing the logic is to test whether the value in a row differs from the value in the previous row.

      Does that help?
      I am open to clarifying any terms I am using incorrectly.
      Thanks
      Anne



      • #4
        So that's:
        Code:
        by id (episode visit), sort: gen newvar = sum(episode != episode[_n-1]) - 1
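
        A sketch of how the one-liner above works, broken into steps and run on the example data from #1 (labels omitted). The variable names changed, check, n_episodes, multi, and course are made up for illustration and are not part of the original answer. The last two steps are one possible way to use the result, not the only one.

        ```stata
        * Example data from #1, without value labels
        clear
        input float(id visit episode)
        1 1  1
        1 2  1
        1 1  2
        2 1  3
        2 2  3
        3 1  4
        3 2  4
        3 1  8
        3 2  8
        3 3  8
        4 1 10
        4 1 15
        4 2 15
        5 1  7
        5 2  7
        end

        * Step 1: within each id, sorted by episode then visit, mark rows where
        * episode differs from the previous row. On the first row of each id,
        * episode[_n-1] is missing, so the comparison is true and changed is 1.
        by id (episode visit), sort: gen byte changed = episode != episode[_n-1]

        * Step 2: the running sum counts episodes 1, 2, ... within id;
        * subtracting 1 makes the first episode 0, as in the one-line version.
        by id (episode visit): gen check = sum(changed) - 1

        * Hypothetical follow-up: flag participants with more than one episode.
        by id (episode visit): gen n_episodes = check[_N] + 1
        gen byte multi = n_episodes > 1

        * Hypothetical follow-up: one panel per id-episode pair, so that visit
        * is no longer duplicated within a panel.
        egen course = group(id episode)
        xtset course visit
        ```

        Because the data are sorted by episode within id, the episodes are counted in the numeric order of the episode codes, which need not be the order in which they occurred. If a date variable were available, sorting on it instead would be the safer choice.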



        • #5
          Thank you, Clyde. That is exactly what I needed, and it helps me understand the logic.
          Anne
