Configuring Wide Data to Time-Series -- First Time, Need Help

Kevin Blaine

Join Date: Jul 2017

Posts: 66
#1

Configuring Wide Data to Time-Series -- First Time, Need Help

16 Oct 2023, 21:01

Hi everyone,

I've Googled till kingdom come, as well as searched through this forum for the last few hours, and despite my best tries, I can't seem to find a solution to my problem. Though I'll be the first to admit that it is probably fairly easy and right in front of me. Apologies from the outset!

I have a bunch of panel data needing to be converted to Time-Series, but for this example, I'm going to use a single observation in wide format to get the simplest form of a solution. Here's my data:

Code:

* Example generated by -dataex-. For more info, type help dataex clear input long studyid str10 med float(start duration day_of_seizure_stop end) 100001 "Tylenol" 11 4 12 14 end

I would like to turn this into the following:

Code:

* Example generated by -dataex-. For more info, type help dataex clear input long studyid str10 med float(day seizure_stop days_on_med) 100001 "No Rx" 0 0 1 100001 "No Rx" 1 0 2 100001 "No Rx" 2 0 3 100001 "No Rx" 3 0 4 100001 "No Rx" 4 0 5 100001 "No Rx" 5 0 6 100001 "No Rx" 6 0 7 100001 "No Rx" 7 0 8 100001 "No Rx" 8 0 9 100001 "No Rx" 9 0 10 100001 "No Rx" 10 0 11 100001 "Tylenol" 11 0 1 100001 "Tylenol" 12 1 2 100001 "Tylenol" 13 0 3 100001 "Tylenol" 14 0 4 100001 "No Rx" 15 0 1 100001 "No Rx" 16 0 2 end

For clarity:
studyid = Identifier
med = Medication
start = The day when medication began relative to when a diagnosis was given
duration = How many days between when the medication was started and when it was ended
day_of_seizure_stop = The day seizures stopped relative to when a diagnosis was given
end = The day when medication was stopped relative to when a diagnosis was given

day = Any given day relative to when a diagnosis was given
seizure_stop = Did the seizure stop on the current day? 0/1 = "No/Yes"
days_on_med = Given the current day, how many total days has the patient been on the current medication?

Any help would be appreciated!

Thanks,
Kevin
Tags: None

Luca Calianno

Join Date: Sep 2023
Posts: 31

17 Oct 2023, 02:16

Hi,
I don't know if it is the most efficient way to do it but I believe it is sufficiently flexible for any additions you may want to implement.
The reason is that you can "augment" this code by using a for loop for each drug quite easily and use tempfile + append to create the final time series you need.
I really hope this helps you.

Code:

clear
input long studyid str10 med float(start duration day_of_seizure_stop end)
100001 "Tylenol" 11 4 12 14
end

expand 30 // I chose 30 days but you can modify it depending on the days you need

generate day = _n -1

gen seizure_stop = 0
replace seizure_stop = 1 if day == day_of_seizure_stop

gen drug = "No Rx" if day < start
replace drug = "Tylenol" if day >= start & day <= end
replace drug = "Post Tylenol" if day > end 

bys drug (day): gen days_on_med = _n
sort day

And gives the following:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str12 drug float(day seizure_stop days_on_med)
"No Rx"         0 0  1
"No Rx"         1 0  2
"No Rx"         2 0  3
"No Rx"         3 0  4
"No Rx"         4 0  5
"No Rx"         5 0  6
"No Rx"         6 0  7
"No Rx"         7 0  8
"No Rx"         8 0  9
"No Rx"         9 0 10
"No Rx"        10 0 11
"Tylenol"      11 0  1
"Tylenol"      12 1  2
"Tylenol"      13 0  3
"Tylenol"      14 0  4
"Post Tylenol" 15 0  1
"Post Tylenol" 16 0  2
"Post Tylenol" 17 0  3
"Post Tylenol" 18 0  4
"Post Tylenol" 19 0  5
"Post Tylenol" 20 0  6
"Post Tylenol" 21 0  7
"Post Tylenol" 22 0  8
"Post Tylenol" 23 0  9
"Post Tylenol" 24 0 10
"Post Tylenol" 25 0 11
"Post Tylenol" 26 0 12
"Post Tylenol" 27 0 13
"Post Tylenol" 28 0 14
"Post Tylenol" 29 0 15
end

Comment

Kevin Blaine

Join Date: Jul 2017

Posts: 66
#3

17 Oct 2023, 09:44

Thank you so much, Luca. How would this be implemented in the larger scenario, say with this dataset?

Code:

* Example generated by -dataex-. For more info, type help dataex clear input long studyid str5 med float(start duration days_to_censor end) 100001 "vgb" 11 180 12 191 100005 "acth" 0 56 3 56 130003 "pred" 2 32 8 34 end

Also, for context, I'm trying to identify the effect that specific medications have on stopping seizures, taking into account when they were started relative to diagnosis, how long they were on them, and (eventually) which medication made the most impact if given in concert with another medication. For example, this next snippet is a slice of how things really look (previous snippets are simplifications of the dataset):

Code:

* Example generated by -dataex-. For more info, type help dataex clear input long studyid str5 med float(start duration days_to_censor end) 100001 "vgb" 11 180 12 191 100002 "acth" 1 56 5 57 100002 "acth" 111 28 5 139 100002 "vgb" 126 180 5 306 100003 "acth" 0 56 180 56 100003 "vgb" 43 180 180 223 100005 "acth" 0 56 3 56 130003 "pred" 2 32 8 34 end

You can see it gets complicated quickly, especially when there is overlap in prescribing practices.

Last edited by Kevin Blaine; 17 Oct 2023, 09:52.
Comment

Luca Calianno

Join Date: Sep 2023
Posts: 31

18 Oct 2023, 04:20

Hi Kevin,
for sure there is something more efficient and appropriate but this should do the job

Code:

clear
input long studyid str5 med float(start duration days_to_censor end)
100001 "vgb"   11 180  12 191
100002 "acth"   1  56   5  57
100002 "acth" 111  28   5 139
100002 "vgb"  126 180   5 306
100003 "acth"   0  56 180  56
100003 "vgb"   43 180 180 223
100005 "acth"   0  56   3  56
130003 "pred"   2  32   8  34
end


levelsof med, local(med)
levelsof studyid, local(id)


summ end
local max_days = r(max)

** Check how many cases there are of individuals that take repeatedly the same medicine

duplicates list studyid med

* Solution is to create new variables for each med cycle
duplicates tag studyid med, gen(temp_tag)
summ temp_tag
local repeated_tag = r(max)

forvalues n = 1(1)`repeated_tag'{
    
    foreach var of varlist start duration days_to_censor end{
        
        gen `var'`n' = .
        bys studyid med: replace `var'`n' = `var'[_n+1]
        
        local variables `variables' `var' `var'`n' 
        
    }
}

bys studyid med:  replace temp_tag = 0 if temp_tag[1]
drop if temp_tag >0
drop temp_tag


*** I reshape the dataset now to have a row for every indvidual
reshape wide `variables', i(studyid) j(med) string

tempfile temp
save `temp'

tempfile simulation
save `simulation'


* Now I expand the dataset in order to have a time series with individuals study




foreach l of local id {
    use `temp.dta', clear
        
    keep if studyid == `l'
    expand `max_days' 

    generate day = _n - 1
    foreach drug of local med {
        gen `drug' = 0
    }    

    foreach drug of local med {

        gen censor`drug' = 0
        replace censor`drug' = 1 if day == days_to_censor`drug'
        replace censor`drug' = 1 if day == days_to_censor1`drug'
        
        replace `drug' = 1 if day >= start`drug' & day <= end`drug'
        replace `drug' = 1 if day >= start1`drug' & day <= end1`drug'
        
        bys `drug' (day): gen days_on_med`drug' = _n
        replace days_on_med`drug' = 0 if `drug' ==0
        sort day

        }

    if `l'==1 {
        save `"`simulation.dta'"', replace
    }
    else {
            
        append using `simulation'
        save `"`simulation.dta'"', replace
    }

}






use `simulation', clear

Comment

Kevin Blaine

Join Date: Jul 2017

Posts: 66
#5

18 Oct 2023, 15:17

Luca -- this is excellent! It's a lot to take it, but I'm going to incorporate the code into my large DO file and see how it works. Mind if I ping you for additional troubleshooting as needed?

Thanks!
1 like
Comment

Announcement