  • Diary data and symptom severity scores

    Hi Statalist,

    I am looking at clinical event data. For each participant the data contain an ID, the symptom (e.g. cough), the symptom score at baseline from 0 (no problem) to 3 (severe problem), the daily diary scores for that symptom (days 1-28, although in the dummy data below I have used only days 1-10) and the dates of the daily diaries. From these data I am trying to work out the start date of the clinical event, which is the first day on which the symptom score increased above baseline. The data I am working with contain only participants for whom this has happened (which helps). I would then like to find the date on which the score returned to the baseline score or lower, which gives the end date of the clinical event. I have experience with Stata, but I do not even know where to start with this. I have created some dummy data below.


    ID Symptom Baseline Diaryday1 Diaryday2 Diaryday3 Diaryday4 Diaryday5 Diaryday6 Diaryday7 Diaryday8 Diaryday9 Diaryday10 Diarydate1 Diarydate2 Diarydate3 Diarydate4 Diarydate5 Diarydate6 Diarydate7 Diarydate8 Diarydate9 Diarydate10
    A Cough 0 0 0 0 1 1 1 2 2 1 0 01/12/2020 02/12/2020 03/12/2020 04/12/2020 05/12/2020 06/12/2020 07/12/2020 08/12/2020 09/12/2020 10/12/2020
    B Cough 0 0 0 0 0 0 1 0 0 0 0 03/05/2021 04/05/2021 05/05/2021 06/05/2021 07/05/2021 08/05/2021 09/05/2021 10/05/2021 11/05/2021 12/05/2021
    C Cough 1 1 1 1 2 2 2 2 1 0 0 06/06/2021 07/06/2021 08/06/2021 09/06/2021 10/06/2021 11/06/2021 12/06/2021 13/06/2021 14/06/2021 15/06/2021
    D Cough 2 2 2 2 2 2 3 3 3 3 3 07/08/2021 08/08/2021 09/08/2021 10/08/2021 11/08/2021 12/08/2021 13/08/2021 14/08/2021 15/08/2021 16/08/2021
    E Cough 2 2 2 2 2 2 2 3 3 3 2 09/08/2021 10/08/2021 11/08/2021 12/08/2021 13/08/2021 14/08/2021 15/08/2021 16/08/2021 17/08/2021 18/08/2021
    F Cough 0 0 0 1 0 0 2 2 2 0 0 12/12/2021 13/12/2021 14/12/2021 15/12/2021 16/12/2021 17/12/2021 18/12/2021 19/12/2021 20/12/2021 21/12/2021
    G Cough 2 1 1 1 1 1 2 2 3 3 2 01/02/2022 02/02/2022 03/02/2022 04/02/2022 05/02/2022 06/02/2022 07/02/2022 08/02/2022 09/02/2022 10/02/2022


    A clinical event starts on day 4 and ends on day 10
    B clinical event starts on day 6 and ends on day 7
    C clinical event starts on day 4 and ends on day 8
    D clinical event starts on day 6 and does not end/resolve
    E clinical event starts on day 7 and ends on day 10
    F clinical event starts on day 3 and ends on day 4, but starts again on day 6 and then ends on day 9
    G clinical event starts on day 8 and ends on day 10

    Any help for this would be much appreciated.

    Many thanks,
    Jenna

  • #2
    First, reshape your data to long layout, then give a data example using -dataex-, explaining what you want.

    Code:
    reshape long Diaryday Diarydate, i(ID) j(which)
    sort ID which
    dataex
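A minimal sketch of the suggestion in #2, using only the first three diary days of participants A and B from the tableau in #1. It assumes the variables are named exactly as in that tableau (Stata names are case-sensitive, so ID rather than id) and, as an assumption, reads the dates in as strings, since a tableau cannot show their storage type.

```stata
* Example data: first three diary days of A and B, copied from the tableau in #1
clear
input str1 ID str5 Symptom byte(Baseline Diaryday1 Diaryday2 Diaryday3) str10 Diarydate1 str10 Diarydate2 str10 Diarydate3
"A" "Cough" 0 0 0 0 "01/12/2020" "02/12/2020" "03/12/2020"
"B" "Cough" 0 0 0 0 "03/05/2021" "04/05/2021" "05/05/2021"
end

* One observation per participant per diary day; -which- holds the day number
reshape long Diaryday Diarydate, i(ID) j(which)
sort ID which
list, sepby(ID) noobs
```

After -reshape long- there is one observation per participant per diary day, and this long layout is what the -dataex- output should show.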



    • #3
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str2 id str6 symptom byte(baseline diaryday1 diaryday2 diaryday3 diaryday4 diaryday5 diaryday6 diaryday7 diaryday8 diaryday9) float(diarydate1 diarydate2 diarydate3 diarydate4 diarydate5 diarydate6 diarydate7 diarydate8 diarydate9)
      "A " "Cough " 0 0 0 0 1 1 1 2 2 1 22250 22251 22252 22253 22254 22255 22256 22257 22258
      "B " "Cough " 0 0 0 0 0 0 1 0 0 0 22403 22404 22405 22406 22407 22408 22409 22410 22411
      "C " "Cough " 1 1 1 1 2 2 2 2 1 0 22437 22438 22439 22440 22441 22442 22443 22444 22445
      "D " "Cough " 2 2 2 2 2 2 3 3 3 3 22499 22500 22501 22502 22503 22504 22505 22506 22507
      "E " "Cough " 2 2 2 2 2 2 2 3 3 3 22501 22502 22503 22504 22505 22506 22507 22508 22509
      "F " "Cough " 0 0 0 1 0 0 2 2 2 0 22626 22627 22628 22629 22630 22631 22632 22633 22634
      "G " "Cough " 2 1 1 1 1 1 2 2 3 3 22677 22678 22679 22680 22681 22682 22683 22684 22685
      end
      format %td diarydate1
      format %td diarydate2
      format %td diarydate3
      format %td diarydate4
      format %td diarydate5
      format %td diarydate6
      format %td diarydate7
      format %td diarydate8
      format %td diarydate9
      
      //  GO TO LONG LAYOUT
      reshape long diaryday diarydate, i(id symptom) j(day)
      
      // by id symptom (day): egen episode_start = min(cond(diaryday > baseline, day, .))
      // by id symptom (day): egen episode_end = ///
      //     min(cond(day > episode_start & diaryday <= baseline, day, .))
          
      //  NUMBER THE SPELLS: A NEW SPELL BEGINS WHENEVER THE SCORE CROSSES BASELINE
      egen condition = group(id symptom)
      xtset condition day
      by condition (day): gen spell_num = ///
          sum(diaryday > baseline & L1.diaryday <= baseline ///
              | diaryday <= baseline & L1.diaryday > baseline | _n == 1)
      //  FIRST DAY OF AN ABOVE-BASELINE SPELL = START
      //  FIRST DAY OF A SPELL BACK AT OR BELOW BASELINE = END
      by condition spell_num (day), sort: gen byte event = 1 if _n == 1 & diaryday > baseline
      by condition spell_num (day): replace event = 2 if _n == 1 & diaryday <= baseline
      //  AN INITIAL SPELL AT OR BELOW BASELINE DOES NOT END ANY EVENT
      by condition (spell_num day): replace event = . if spell_num == 1 & event == 2
      label define event  1   "Start" 2   "End"
      label values event event
      drop condition spell_num

      Notes:
      1. Due to limitations of output line size in -dataex-, I dropped variables diaryday10 and diarydate10 in constructing the example data set used above. Nevertheless, the code will work no matter how many diary days there are.
      2. This problem is essentially unsolvable in the wide layout of the original data set. So the first step is to -reshape- it to long. From there it becomes more or less a standard spells problem.
      3. If there were only one event per ID/symptom, you could, if you really wanted to, go back to wide layout at the end of this code. But it is almost guaranteed that whatever else you want to do with this data will be easier, and likely only possible, if you stay in long layout.
      4. Given that there can be multiple events per person/symptom, a wide layout becomes even more unwieldy because you would need to set up multiple start-day and end-day variables to correspond to the number of events. The existing long layout accommodates this gracefully and compactly.
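For the single-event case in note 3, the commented-out -egen- lines in the code above can be made to work. A minimal sketch, using rows B, D and F of the example data (days 1-9 only); episode_start, episode_end, start_date and end_date are illustrative names. It finds only the first event for each id/symptom, so for F it reports days 3 to 4 and ignores the second event, which is why the spell-based code above is the more general approach.

```stata
* Rows B, D and F of the -dataex- example above
clear
input str2 id str6 symptom byte(baseline diaryday1 diaryday2 diaryday3 diaryday4 diaryday5 diaryday6 diaryday7 diaryday8 diaryday9) float(diarydate1 diarydate2 diarydate3 diarydate4 diarydate5 diarydate6 diarydate7 diarydate8 diarydate9)
"B " "Cough " 0 0 0 0 0 0 1 0 0 0 22403 22404 22405 22406 22407 22408 22409 22410 22411
"D " "Cough " 2 2 2 2 2 2 3 3 3 3 22499 22500 22501 22502 22503 22504 22505 22506 22507
"F " "Cough " 0 0 0 1 0 0 2 2 2 0 22626 22627 22628 22629 22630 22631 22632 22633 22634
end
format %td diarydate*
reshape long diaryday diarydate, i(id symptom) j(day)

* First day above baseline, then the first later day back at or below baseline
bysort id symptom: egen episode_start = min(cond(diaryday > baseline, day, .))
bysort id symptom: egen episode_end = ///
    min(cond(day > episode_start & diaryday <= baseline, day, .))

* Look up the calendar dates of those days (missing if the event never ends)
bysort id symptom: egen start_date = min(cond(day == episode_start, diarydate, .))
bysort id symptom: egen end_date = min(cond(day == episode_end, diarydate, .))
format start_date end_date %td

* One row per id/symptom: only meaningful if there is at most one event each
keep if day == 1
keep id symptom baseline episode_start episode_end start_date end_date
list, noobs
```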

      Added:
      Crossed with #2.



      • #4
        Clyde Schechter, thank you very much for your response; that is really helpful. This is only part of the project, so I may need to revisit this, but I'll see how I go first.



        • #5
          Clyde Schechter I have added the code you wrote me into my program. I'm getting the error:

          [Screenshot of the Stata error message: Stata.PNG]

          Both Baseline and Diaryday are string variables, so I am unsure why this is happening. All I have changed are the variable names, which now start with a capital letter to match my dataset. Any advice on this would be much appreciated.

          BW,
          Jenna



          • #6
            Please ignore my message above; I have now sorted this out. Many thanks!



            • #7
              Just 2 quick lessons to draw from #5, for those following along.

              In #1, the example data are presented as a tableau rather than with -dataex-. My response in #3 was based on my interpretation of that tableau, and I assumed that the date variables were true Stata internal format daily date variables. That turns out not to be the case; of course, you can't tell that from a tableau. That is why it is so important to always use -dataex- to show example data: those who try to find a solution to the problem then start from a correct representation of it, and when you get your answer, it will actually work and you don't waste your time fixing the fix.

              Second lesson: few things in Stata are more useless than dates stored as string variables. You cannot calculate anything with them. And, unless they are formatted as YMD, you cannot even correctly sort them into chronological order. So whenever you are creating a Stata data set that contains dates, one of the very first things you should do is change any string date variables to Stata internal format date variables using the appropriate conversion functions (see -help datetime functions- if you don't know what those are or how to use them). Do that before you do any analysis that involves those variables.
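A minimal sketch of that conversion, assuming, as the tableau in #1 suggests, that the dates are strings in day/month/year order and, as #5 reports, that the scores were also imported as strings. The three rows are copied from #1; the name Diarydate_str is hypothetical.

```stata
clear
input str1 ID str1 Baseline str1 Diaryday str10 Diarydate_str
"A" "0" "0" "01/12/2020"
"A" "0" "1" "04/12/2020"
"F" "0" "2" "18/12/2021"
end

* Scores stored as text cannot be compared numerically, so make them numeric
destring Baseline Diaryday, replace

* daily() with mask "DMY" reads day/month/year strings into Stata daily dates
gen Diarydate = daily(Diarydate_str, "DMY")
format Diarydate %td
assert !missing(Diarydate)   // a failure here flags an unreadable date string

* Real dates sort chronologically and support arithmetic
sort ID Diarydate
list, noobs
```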



              • #8
                Hi Clyde Schechter, I've been working on my project some more and have got it to the following state (dummy data), so it is getting there. I only like to ask for help when I am truly stuck.

                My aim now is to put this back into wide format, so that each ParticipantID appears in a single row, with StartDate and EndDate variables for each event. An example of the variable names I would like is as follows:

                ParticipantID AERPTDAT_1 AERPTIME_1 AETERM_1 AESEV_1 StartDate_1 EndDate_1 AESPID_1 AEINIT_1 AERPTDAT_2 AERPTIME_2 AETERM_2 AESEV_2 StartDate_2 EndDate_2 AESPID_2 AEINIT_2

                I've tried a few things to get this to work, the main one being the code below. I've read the manual entry on reshaping data and the FAQs, but now that j is unique within ParticipantID (i), I am unsure why the data have not been reshaped. I keep getting the error message invalid 'j'.

                Code:
                by ParticipantID: gen ID2 = _n
                sort ParticipantID
                reshape wide Diarydate event, i(ParticipantID), j(ID2)
                Code:
                * Example generated by -dataex-. For more info, type help dataex
                clear
                input str1 ParticipantID int AERPTDAT double AERPTIME str18 AETERM byte AESEV int Diarydate str5 event int AESPID str2 AEINIT byte ID2
                "A" 22834 -1893409463000 "High BP"            2 22638 "Start" 402 "JG"  1
                "A" 22834 -1893409463000 "High BP"            2 22640 "End"   404 "JG"  2
                "A" 22834 -1893409463000 "Headache"           2 22639 "Start" 431 "JG"  3
                "A" 22834 -1893409463000 "Headache"           2 22640 "End"   432 "JG"  4
                "A" 22834 -1893409463000 "Increased appetite" 2 22639 "Start" 459 "JG"  5
                "A" 22834 -1893409463000 "Increased appetite" 2 22641 "End"   461 "JG"  6
                "A" 22834 -1893409463000 "Abdominal pain"     2 22639 "Start" 487 "JG"  7
                "A" 22834 -1893409463000 "Abdominal pain"     2 22640 "End"   488 "JG"  8
                "A" 22834 -1893409463000 "Abdominal pain"     2 22643 "Start" 491 "JG"  9
                "A" 22834 -1893409463000 "Abdominal pain"     2 22644 "End"   492 "JG" 10
                "B" 22834 -1893409463000 "Headache"           3 22719 "Start" 569 "JG"  1
                "B" 22834 -1893409463000 "Headache"           3 22720 "End"   570 "JG"  2
                "B" 22834 -1893409463000 "Headache"           3 22721 "Start" 571 "JG"  3
                "C" 22834 -1893409463000 "Night sweats"       2 22600 "Start" 799 "JG"  1
                "C" 22834 -1893409463000 "Night sweats"       2 22605 "End"   804 "JG"  2
                "C" 22834 -1893409463000 "Night sweats"       2 22606 "Start" 805 "JG"  3
                "C" 22834 -1893409463000 "Headache"           3 22600 "Start" 827 "JG"  4
                "C" 22834 -1893409463000 "Headache"           3 22605 "End"   832 "JG"  5
                "C" 22834 -1893409463000 "Headache"           3 22606 "Start" 833 "JG"  6
                "C" 22834 -1893409463000 "Decreased appetite" 3 22600 "Start" 883 "JG"  7
                "C" 22834 -1893409463000 "Decreased appetite" 3 22605 "End"   888 "JG"  8
                "C" 22834 -1893409463000 "Decreased appetite" 3 22606 "Start" 889 "JG"  9
                "C" 22834 -1893409463000 "Nausea vomiting"    2 22594 "Start" 905 "JG" 10
                "C" 22834 -1893409463000 "Nausea vomiting"    2 22596 "End"   907 "JG" 11
                "C" 22834 -1893409463000 "Nausea vomiting"    2 22597 "Start" 908 "JG" 12
                "C" 22834 -1893409463000 "Nausea vomiting"    2 22598 "End"   909 "JG" 13
                "C" 22834 -1893409463000 "Nausea vomiting"    2 22600 "Start" 911 "JG" 14
                "C" 22834 -1893409463000 "Increased appetite" 3 22600 "Start" 939 "JG" 15
                "C" 22834 -1893409463000 "Increased appetite" 3 22605 "End"   944 "JG" 16
                "C" 22834 -1893409463000 "Increased appetite" 3 22606 "Start" 945 "JG" 17
                end
                format %tddd-Mon-YY AERPTDAT
                format %tcHH:MM:SS AERPTIME
                format %tddd-Mon-YY Diarydate

                I really appreciate your help with this.

                Many thanks,
                Jenna



                • #9
                  There shouldn't be a comma between the i(ParticipantID) and j(ID2) options.
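Removing the comma fixes the syntax, but -reshape wide- will still stop if a variable not listed as a stub varies within ParticipantID, and in #8 AETERM, AESEV and AESPID do. The StartDate_k/EndDate_k layout also needs each Start row paired with its End row first. A sketch of one way to do both, on a few rows copied from #8. It assumes each End row directly follows its Start row once rows are sorted by date within participant and term, as in the example, and keeps the Start row's AESPID; EndDate and event_num are hypothetical names.

```stata
* A few rows copied from the example data in #8
clear
input str1 ParticipantID int AERPTDAT double AERPTIME str18 AETERM byte AESEV int Diarydate str5 event int AESPID str2 AEINIT byte ID2
"A" 22834 -1893409463000 "High BP"        2 22638 "Start" 402 "JG"  1
"A" 22834 -1893409463000 "High BP"        2 22640 "End"   404 "JG"  2
"A" 22834 -1893409463000 "Abdominal pain" 2 22639 "Start" 487 "JG"  7
"A" 22834 -1893409463000 "Abdominal pain" 2 22640 "End"   488 "JG"  8
"A" 22834 -1893409463000 "Abdominal pain" 2 22643 "Start" 491 "JG"  9
"A" 22834 -1893409463000 "Abdominal pain" 2 22644 "End"   492 "JG" 10
"B" 22834 -1893409463000 "Headache"       3 22719 "Start" 569 "JG"  1
"B" 22834 -1893409463000 "Headache"       3 22720 "End"   570 "JG"  2
"B" 22834 -1893409463000 "Headache"       3 22721 "Start" 571 "JG"  3
end
format %tddd-Mon-YY AERPTDAT Diarydate
format %tcHH:MM:SS AERPTIME

* Pair each Start row with the End row that immediately follows it
* within the same participant and adverse-event term
sort ParticipantID AETERM Diarydate
by ParticipantID AETERM: gen EndDate = Diarydate[_n+1] ///
    if event == "Start" & event[_n+1] == "End"
format EndDate %tddd-Mon-YY

* One row per event; an event with no End row keeps EndDate missing
keep if event == "Start"
rename Diarydate StartDate
drop event ID2

* Number the events within participant; this becomes the j() variable
by ParticipantID (AETERM StartDate), sort: gen byte event_num = _n

* Trailing underscores give names such as StartDate_1 after -reshape wide-
rename (AERPTDAT AERPTIME AETERM AESEV StartDate EndDate AESPID AEINIT) ///
    (AERPTDAT_ AERPTIME_ AETERM_ AESEV_ StartDate_ EndDate_ AESPID_ AEINIT_)
reshape wide AERPTDAT_ AERPTIME_ AETERM_ AESEV_ StartDate_ EndDate_ AESPID_ AEINIT_, ///
    i(ParticipantID) j(event_num)
```

-reshape wide- groups the new variables by stub (all the AERPTDAT_ variables first), so if the event-by-event order listed in #8 matters, -order- can rearrange them afterwards.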



                  • #10
                    Thank you!

