Hello,
I am using Stata 16 (have access to 17 also) in a closed system lab. I am trying to conduct a study to see the effects of living near a "factory" on various health outcomes. I am trying to see if I am using the time variant covariate option correctly on a single observation data. An example of the dataset is given below, which is longitudinal in nature and quite large.
Since the dataset is large, I chose to bring it down to one observation per id using these codes.
The data with single output looks like this:
And then I declared the data as survival-time data to conduct Cox proportional hazards model.
Initially, I censored events before exposure date (code commented out) e.g., id#1. However, I have been advised to take into account events before factory opens as contribution to unexposed time using time varying option in Stata.
So, I have used "tvc" and chose "exposure" as time varying variable. In actual dataset the code works and the output changes (I know example has no SEs and CIs). But I am not sure if it is correct. In a single observation dataset would Stata recognize "exposure" as time varying? I tried to use generate pre-exposure time and include that in texp, but quite sure I am using it wrong.
I would appreciate any comments on the methodology and codes. Apologies if the post is too long and if something is unclear.
Thanks in advance.
I am using Stata 16 (have access to 17 also) in a closed system lab. I am trying to conduct a study to see the effects of living near a "factory" on various health outcomes. I am trying to see if I am using the time variant covariate option correctly on a single observation data. An example of the dataset is given below, which is longitudinal in nature and quite large.
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input float id str30(start_date end_date) float(start_dt end_dt) int(exposure diag) float diag_dt 1 "1oct1981" "1oct1986" 7944 9770 0 0 . 1 "2oct1986" "2oct1991" 9771 11597 1 1 9060 2 "15nov1991" "15nov1993" 11641 12372 0 1 15724 3 "20dec1995" "20dec2001" 13137 15329 0 0 . 3 "21dec2001" "21dec2006" 15330 17156 0 1 16501 4 "15mar2005" "15mar2011" 16510 18701 1 0 . 4 "16mar2011" "16mar2016" 18702 20529 0 0 . 5 "7feb1995" "7feb2000" 12821 14647 0 0 . 5 "8feb2000" "8feb2005" 14648 16475 0 1 16363 5 "9feb2005" "9feb2010" 16476 18302 1 1 . 6 "2jun2003" "2jun2005" 15858 16589 0 0 . 7 "16aug1995" "10march1997" 13011 13583 0 0 . 8 "2sep2001" "10aug2005" 15220 16658 0 1 18128 9 "8jan2014" "9jul2017" 19731 21009 0 0 . 10 "14mar2005" "14mar2010" 16509 18335 0 0 . 10 "15mar2010" "15mar2015" 18336 20162 1 1 17583 11 "1feb2013" "15mar2014" 19390 19797 0 0 . 12 "15may1997" "15may2002" 13649 15475 0 0 . 13 "25nov2004" "19jan2007" 16400 17185 1 1 18351 14 "1jan2016" "1jan2017" 20454 20820 1 0 . 15 "21feb2012" "21feb2017" 19044 20871 0 0 . 16 "17jul2001" "17jul2016" 15173 20652 0 1 18938 17 "8jan2014" "9jun2016" 19731 20614 1 0 . 18 "1may2008" "31jan2011" 17653 18658 0 0 . end
Code:
format start_dt %td format end_dt %td format diag_dt %td *Exposure date, if not exposure then earliest start date* gen exp_dt=start_dt if exposure==1 sort id exp_dt bys id: replace exp_dt=exp_dt[_n-1] if exp_dt==. format exp_dt %td label var exp_dt "Exposure date" *Flagging exposed individual bys id: replace exposure=1 if exp_dt!=. & exposure==0 label var exposure "Exposed to factories" *Diagnosis date sort id diag_dt bys id: replace diag_dt=diag_dt[_n-1] if diag_dt==. label var diag_dt "Diagnosis Date" *Earliest start date (living in the area) sort id start_dt end_dt bys id: gen seq=_n gen org_dt=start_dt if seq==1 sort id org_dt bys id: replace org_dt=org_dt[_n-1] if org_dt==. format org_dt %td label var org_dt "Time since living in the area" *End of study date gen xyz="31mar2019" gen study_end=date(xyz, "DMY") format study_end %td keep if seq==1 drop seq xyz start_date end_date start_dt end_dt *If exposure date is blank replace with earliest start date replace exp_dt=org_dt if exp_dt==.
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input float id int(exposure diag) float(diag_dt exp_dt org_dt study_end) 1 1 0 9060 9771 7944 21639 2 0 1 15724 11641 11641 21639 3 0 0 16501 13137 13137 21639 4 1 0 . 16510 16510 21639 5 1 0 16363 16476 12821 21639 6 0 0 . 15858 15858 21639 7 0 0 . 13011 13011 21639 8 0 1 18128 15220 15220 21639 9 0 0 . 19731 19731 21639 10 1 0 17583 18336 16509 21639 11 0 0 . 19390 19390 21639 12 0 0 . 13649 13649 21639 13 1 1 18351 16400 16400 21639 14 1 0 . 20454 20454 21639 15 0 0 . 19044 19044 21639 16 0 1 18938 15173 15173 21639 17 1 0 . 19731 19731 21639 18 0 0 . 17653 17653 21639 end format %td diag_dt format %td exp_dt format %td org_dt format %td study_end
Code:
**STSET & EVENT** gen event=[diag_dt!=.] replace diag_dt=study_end if event==0 *replace event=0 if diag_dt<exp_dt /*censoring*/ stset diag_dt, origin(time org_dt) enter(time exp_dt) id(id) failure(event)
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input float id int(exposure diag) float(diag_dt exp_dt org_dt study_end event) byte(_st _d) int(_origin _t) byte _t0 1 1 0 9060 9771 7944 21639 1 0 . 7944 . . 2 0 1 15724 11641 11641 21639 1 1 1 11641 4083 0 3 0 0 16501 13137 13137 21639 1 1 1 13137 3364 0 4 1 0 21639 16510 16510 21639 0 1 0 16510 5129 0 5 1 0 16363 16476 12821 21639 1 0 . 12821 . . 6 0 0 21639 15858 15858 21639 0 1 0 15858 5781 0 7 0 0 21639 13011 13011 21639 0 1 0 13011 8628 0 8 0 1 18128 15220 15220 21639 1 1 1 15220 2908 0 9 0 0 21639 19731 19731 21639 0 1 0 19731 1908 0 10 1 0 17583 18336 16509 21639 1 0 . 16509 . . 11 0 0 21639 19390 19390 21639 0 1 0 19390 2249 0 12 0 0 21639 13649 13649 21639 0 1 0 13649 7990 0 13 1 1 18351 16400 16400 21639 1 1 1 16400 1951 0 14 1 0 21639 20454 20454 21639 0 1 0 20454 1185 0 15 0 0 21639 19044 19044 21639 0 1 0 19044 2595 0 16 0 1 18938 15173 15173 21639 1 1 1 15173 3765 0 17 1 0 21639 19731 19731 21639 0 1 0 19731 1908 0 18 0 0 21639 17653 17653 21639 0 1 0 17653 3986 0 end format %td diag_dt format %td exp_dt format %td org_dt format %td study_end
So, I have used "tvc" and chose "exposure" as time varying variable. In actual dataset the code works and the output changes (I know example has no SEs and CIs). But I am not sure if it is correct. In a single observation dataset would Stata recognize "exposure" as time varying? I tried to use generate pre-exposure time and include that in texp, but quite sure I am using it wrong.
Code:
*Cox proportional hazard model stcox i.exposure *Time-varient model* stcox i.exposure, tvc(exposure) *Duration before exposure gen pre_exp=exp_dt - org_dt *Time-varient model with pre-duration* stcox i.exposure, tvc(exposure) texp(pre_duration)
Thanks in advance.
Comment