Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Configuring Wide Data to Time-Series -- First Time, Need Help

    Hi everyone,

    I've Googled till kingdom come, as well as searched through this forum for the last few hours, and despite my best tries, I can't seem to find a solution to my problem. Though I'll be the first to admit that it is probably fairly easy and right in front of me. Apologies from the outset!

    I have a bunch of panel data needing to be converted to Time-Series, but for this example, I'm going to use a single observation in wide format to get the simplest form of a solution. Here's my data:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long studyid str10 med float(start duration day_of_seizure_stop end)
    100001 "Tylenol" 11 4 12 14
    end

    I would like to turn this into the following:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long studyid str10 med float(day seizure_stop days_on_med)
    100001 "No Rx"    0 0  1
    100001 "No Rx"    1 0  2
    100001 "No Rx"    2 0  3
    100001 "No Rx"    3 0  4
    100001 "No Rx"    4 0  5
    100001 "No Rx"    5 0  6
    100001 "No Rx"    6 0  7
    100001 "No Rx"    7 0  8
    100001 "No Rx"    8 0  9
    100001 "No Rx"    9 0 10
    100001 "No Rx"   10 0 11
    100001 "Tylenol" 11 0  1
    100001 "Tylenol" 12 1  2
    100001 "Tylenol" 13 0  3
    100001 "Tylenol" 14 0  4
    100001 "No Rx"   15 0  1
    100001 "No Rx"   16 0  2
    end
    For clarity:
    studyid = Identifier
    med = Medication
    start = The day when medication began relative to when a diagnosis was given
    duration = How many days between when the medication was started and when it was ended
    day_of_seizure_stop = The day seizures stopped relative to when a diagnosis was given
    end = The day when medication was stopped relative to when a diagnosis was given

    day = Any given day relative to when a diagnosis was given
    seizure_stop = Did the seizure stop on the current day? 0/1 = "No/Yes"
    days_on_med = Given the current day, how many total days has the patient been on the current medication?


    Any help would be appreciated!

    Thanks,
    Kevin

  • #2
    Hi,
    I don't know if it is the most efficient way to do it but I believe it is sufficiently flexible for any additions you may want to implement.
    The reason is that you can "augment" this code by using a for loop for each drug quite easily and use tempfile + append to create the final time series you need.
    I really hope this helps you.

    Code:
    clear
    input long studyid str10 med float(start duration day_of_seizure_stop end)
    100001 "Tylenol" 11 4 12 14
    end
    
    expand 30 // I chose 30 days but you can modify it depending on the days you need
    
    generate day = _n -1
    
    gen seizure_stop = 0
    replace seizure_stop = 1 if day == day_of_seizure_stop
    
    gen drug = "No Rx" if day < start
    replace drug = "Tylenol" if day >= start & day <= end
    replace drug = "Post Tylenol" if day > end 
    
    bys drug (day): gen days_on_med = _n
    sort day

    And gives the following:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str12 drug float(day seizure_stop days_on_med)
    "No Rx"         0 0  1
    "No Rx"         1 0  2
    "No Rx"         2 0  3
    "No Rx"         3 0  4
    "No Rx"         4 0  5
    "No Rx"         5 0  6
    "No Rx"         6 0  7
    "No Rx"         7 0  8
    "No Rx"         8 0  9
    "No Rx"         9 0 10
    "No Rx"        10 0 11
    "Tylenol"      11 0  1
    "Tylenol"      12 1  2
    "Tylenol"      13 0  3
    "Tylenol"      14 0  4
    "Post Tylenol" 15 0  1
    "Post Tylenol" 16 0  2
    "Post Tylenol" 17 0  3
    "Post Tylenol" 18 0  4
    "Post Tylenol" 19 0  5
    "Post Tylenol" 20 0  6
    "Post Tylenol" 21 0  7
    "Post Tylenol" 22 0  8
    "Post Tylenol" 23 0  9
    "Post Tylenol" 24 0 10
    "Post Tylenol" 25 0 11
    "Post Tylenol" 26 0 12
    "Post Tylenol" 27 0 13
    "Post Tylenol" 28 0 14
    "Post Tylenol" 29 0 15
    end

    Comment


    • #3
      Thank you so much, Luca. How would this be implemented in the larger scenario, say with this dataset?


      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input long studyid str5 med float(start duration days_to_censor end)
      100001 "vgb"  11 180 12 191
      100005 "acth"  0  56  3  56
      130003 "pred"  2  32  8  34
      end

      Also, for context, I'm trying to identify the effect that specific medications have on stopping seizures, taking into account when they were started relative to diagnosis, how long they were on them, and (eventually) which medication made the most impact if given in concert with another medication. For example, this next snippet is a slice of how things really look (previous snippets are simplifications of the dataset):

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input long studyid str5 med float(start duration days_to_censor end)
      100001 "vgb"   11 180  12 191
      100002 "acth"   1  56   5  57
      100002 "acth" 111  28   5 139
      100002 "vgb"  126 180   5 306
      100003 "acth"   0  56 180  56
      100003 "vgb"   43 180 180 223
      100005 "acth"   0  56   3  56
      130003 "pred"   2  32   8  34
      end

      You can see it gets complicated quickly, especially when there is overlap in prescribing practices.
      Last edited by Kevin Blaine; 17 Oct 2023, 09:52.

      Comment


      • #4
        Hi Kevin,
        for sure there is something more efficient and appropriate but this should do the job

        Code:
        clear
        input long studyid str5 med float(start duration days_to_censor end)
        100001 "vgb"   11 180  12 191
        100002 "acth"   1  56   5  57
        100002 "acth" 111  28   5 139
        100002 "vgb"  126 180   5 306
        100003 "acth"   0  56 180  56
        100003 "vgb"   43 180 180 223
        100005 "acth"   0  56   3  56
        130003 "pred"   2  32   8  34
        end
        
        
        levelsof med, local(med)
        levelsof studyid, local(id)
        
        
        summ end
        local max_days = r(max)
        
        ** Check how many cases there are of individuals that take repeatedly the same medicine
        
        duplicates list studyid med
        
        * Solution is to create new variables for each med cycle
        duplicates tag studyid med, gen(temp_tag)
        summ temp_tag
        local repeated_tag = r(max)
        
        forvalues n = 1(1)`repeated_tag'{
            
            foreach var of varlist start duration days_to_censor end{
                
                gen `var'`n' = .
                bys studyid med: replace `var'`n' = `var'[_n+1]
                
                local variables `variables' `var' `var'`n' 
                
            }
        }
        
        bys studyid med:  replace temp_tag = 0 if temp_tag[1]
        drop if temp_tag >0
        drop temp_tag
        
        
        *** I reshape the dataset now to have a row for every indvidual
        reshape wide `variables', i(studyid) j(med) string
        
        tempfile temp
        save `temp'
        
        tempfile simulation
        save `simulation'
        
        
        * Now I expand the dataset in order to have a time series with individuals study
        
        
        
        
        foreach l of local id {
            use `temp.dta', clear
                
            keep if studyid == `l'
            expand `max_days' 
        
            generate day = _n - 1
            foreach drug of local med {
                gen `drug' = 0
            }    
        
            foreach drug of local med {
        
                gen censor`drug' = 0
                replace censor`drug' = 1 if day == days_to_censor`drug'
                replace censor`drug' = 1 if day == days_to_censor1`drug'
                
                replace `drug' = 1 if day >= start`drug' & day <= end`drug'
                replace `drug' = 1 if day >= start1`drug' & day <= end1`drug'
                
                bys `drug' (day): gen days_on_med`drug' = _n
                replace days_on_med`drug' = 0 if `drug' ==0
                sort day
        
                }
        
            if `l'==1 {
                save `"`simulation.dta'"', replace
            }
            else {
                    
                append using `simulation'
                save `"`simulation.dta'"', replace
            }
        
        }
        
        
        
        
        
        
        use `simulation', clear

        Comment


        • #5
          Luca -- this is excellent! It's a lot to take it, but I'm going to incorporate the code into my large DO file and see how it works. Mind if I ping you for additional troubleshooting as needed?

          Thanks!

          Comment

          Working...
          X