Hi,
I'm performing a discrete-time survival analysis (DTSA) on longitudinal panel data collected every 2 years from 1994/95 through 2010/11. My primary interest is in examining arthritis as a risk factor for developing heart disease. I created a person-period dataset and performed all of my initial analyses with a non-parametric specification of calendar time as my time scale, so I created 8 indicator variables (d1-d8), one for each 2-year wave of data collection. I then extended the model to include the exposure of interest (x) and covariates (z), which include age.
The model looks something like this:
logit Y d1-d8 x z, nocons or
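A minimal Stata sketch of that setup, using simulated data in place of the survey (the variables id, wave and cumY, and the simulated values of x, z and Y, are all hypothetical). Each person contributes one record per 2-year wave up to and including the first report of heart disease, and tabulate with generate() builds the wave indicators d1-d8:

```stata
* Hypothetical simulated data standing in for the person-period survey file
clear
set seed 12345
set obs 500
generate id = _n
generate x = runiform() < 0.3          // hypothetical arthritis indicator
generate z = rnormal()                 // hypothetical covariate
* One record per person per 2-year wave (8 follow-up waves)
expand 8
bysort id: generate wave = _n
generate Y = runiform() < invlogit(-3 + 0.5*x + 0.2*z)
* Keep records up to and including the first event only
bysort id (wave): generate cumY = sum(Y)
drop if cumY > 1 | (cumY == 1 & Y == 0)
* Wave indicators d1-d8 (non-parametric baseline hazard in calendar time)
tabulate wave, generate(d)
logit Y d1-d8 x z, nocons or
```

Because all 8 wave indicators are included, the constant is dropped (nocons) so that each indicator estimates the baseline log-odds for its wave.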
Upon further consideration, I think attained age is a better choice of time scale, given that my primary research question concerns predicting chronic disease onset in a longitudinal population-based survey. Defining the risk sets in terms of age is more relevant in this context than calendar time, and age is an important confounder of heart disease onset that needs to be carefully controlled for. This reasoning is summarized in the excerpt below from Thiebaut & Benichou, Stat Med, 2004 (link for reference: http://www.ncbi.nlm.nih.gov/pubmed/?...mulation+study):
"In most epidemiologic cohort studies, subjects are followed up prospectively for the occurrence of a given disease. Upon analysing such data, the effect of age needs to be tightly controlled because the incidence of most diseases, especially chronic diseases, is strongly determined by age. The natural time-scale W is then (attained) age. Using time-on-study as the time-scale would generally not be relevant, especially when the inclusion into the cohort coincides with an interview, which is not supposed to modify one’s risk. Indeed in epidemiologic cohort studies, contrary to clinical studies, the time when a subject comes under observation usually does not coincide with the time when the subject becomes at risk for the disease of interest."
Question 1
My first question is: what is the preferred way to specify age as the time scale in DTSA? The age range in my sample is 18-105 years, so specifying a dummy indicator for each year of age is both cumbersome and problematic, because events do not occur at every year of age. So far I have specified attained age as the time scale in 2 ways:
1. Indicators for attained-age categories (18-44, 45-49, 50-54, 55-59, 60-64, 65-69, 70-74, 75+). Note: the 18-44 category was collapsed because there are few events in this age range. This grouping is also useful for comparison with other studies.
2. A polynomial specification of age, centered at the mean age at baseline (45 years). I therefore created two centered age variables:
- cage = age-45
- cage2 = cage^2
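Both specifications can be built in Stata along these lines. This is a sketch on simulated ages; the variable name agecat and the value label agecat_lbl are assumptions. With egen's cut() and icodes, [18,45) is coded 0, [45,50) is coded 1, and so on up to [75,106) coded 7:

```stata
* Hypothetical simulated attained ages (integers 18-105), one per person-period record
clear
set seed 2024
set obs 1000
generate age = 18 + floor(88*runiform())
* Specification 1: attained-age categories; at() lists the lower cut points
* plus a final upper bound above the maximum age
egen agecat = cut(age), at(18 45 50 55 60 65 70 75 106) icodes
label define agecat_lbl 0 "18-44" 1 "45-49" 2 "50-54" 3 "55-59"
label define agecat_lbl 4 "60-64" 5 "65-69" 6 "70-74" 7 "75+", add
label values agecat agecat_lbl
* Specification 2: quadratic in age centred at the baseline mean of 45
generate cage  = age - 45
generate cage2 = cage^2
tabulate agecat
* In the model these would enter as, e.g.:
*   logit Y i.agecat x z, or
*   logit Y cage cage2 x z, or
```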
How do I go about choosing between the two specifications? I was reviewing Stephen Jenkins's Lesson 6 - Estimation: (ii) discrete time models (logistic and cloglog), and while there are examples of various specifications of time, I did not see how to select the most appropriate one (apologies if I just missed this in the lecture notes). I cannot use a likelihood ratio test because these are non-nested models. Can I choose based on the AIC/BIC criteria? Note: in practice, the estimated effect of my exposure changes little whichever age specification I use, but I feel I should have a clear decision-making process, as these analyses are part of my doctoral thesis.
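If AIC/BIC is used, one way to put both criteria side by side in Stata is estimates store followed by estimates stats. The sketch below uses simulated data (all values hypothetical, with x standing in for the exposure and the stored-model names m_agecat and m_quad chosen for illustration). One caveat worth checking: by default BIC uses the number of observations, which in a person-period file is the number of person-period records rather than the number of people; estimates stats accepts an n() option if a different N is preferred.

```stata
* Hypothetical simulated person-period records
clear
set seed 31
set obs 3000
generate age = 18 + floor(88*runiform())
generate x = runiform() < 0.3
generate Y = runiform() < invlogit(-6 + 0.06*age + 0.4*x)
egen agecat = cut(age), at(18 45 50 55 60 65 70 75 106) icodes
generate cage  = age - 45
generate cage2 = cage^2
* Fit each age specification on the same records and store the results
logit Y i.agecat x, or
estimates store m_agecat
logit Y cage cage2 x, or
estimates store m_quad
* Side-by-side AIC and BIC (smaller is better for both)
estimates stats m_agecat m_quad
```

Both models must be fitted on exactly the same records for the comparison to be meaningful.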
Question 2
My second question also relates to specifying the time scale in DTSA. If I go with age as the time scale, I still want to account for potential calendar period effects over the follow-up period from 1994/95 to 2010/11. The publication I cite below discusses the choice of time scale in longitudinal surveys using continuous-time survival analysis with the Cox proportional hazards model, but I think the discussion is relevant to DTSA: Korn et al. AJE 1997 (link for reference: http://www.ncbi.nlm.nih.gov/pubmed/8982025). Here is the relevant excerpt:
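The recommended continuous-time proportional hazards model, which controls for period effects as well as age and cohort effects, is given as:
$$\lambda(a \mid z, b_0) = \lambda_{0j}(a)\,\exp(\beta' z), \qquad b_0 \in B_j$$
where
A = a is the age of the individual during the follow-up period
b0 is the birth cohort of the individual, and Bj (j = 1, 2, ...) are the birth-cohort intervals, e.g., 1906-1910, 1911-1915, etc.
λ0j(a) is the baseline hazard at age a for birth-cohort interval Bj
β is a vector of regression parameters for the covariates z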
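For reference, one possible discrete-time analogue of the Korn et al. model, assuming a logit link and the quadratic age specification from Question 1 (cage = a - 45), gives each birth-cohort interval Bj its own intercept and its own quadratic in attained age, while the covariate effects β (for covariates z) are shared across cohorts. Here h is the discrete-time hazard, i.e. the probability of onset in the age interval starting at a given no earlier onset. This is a sketch of how a cohort-stratified baseline hazard might translate, not a form taken from the paper:

$$
\operatorname{logit} h(a \mid z, b_0 \in B_j) = \alpha_{0j} + \alpha_{1j}\,(a - 45) + \alpha_{2j}\,(a - 45)^2 + \beta' z
$$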
If I use the polynomial specification above (cage and cage2) for my age time scale, how would the DTSA model statement with a logit link look in Stata?
Suppose I create indicators for 2 birth cohorts:
BC190610
BC191115
Would I create interactions with cage and cage2, as follows?
cageBC190610=cage*BC190610
cageBC191115=cage*BC191115
cage2BC190610=cage2*BC190610
cage2BC191115=cage2*BC191115
Would I then specify the 4 interaction terms and the 2 birth cohort indicators, along with the exposure and other covariates, as follows?
logit Y cageBC190610 cageBC191115 cage2BC190610 cage2BC191115 BC190610 BC191115 x z, or
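A sketch of the same idea in Stata on simulated data (the variable bcohort, coded 1 for 1906-1910 and 2 for 1911-1915, and all simulated values are hypothetical). Assuming these two cohorts are the only ones in the data, the two cohort indicators together are collinear with the constant, so this sketch fits the model with nocons (the alternative is to keep the constant and drop one cohort indicator as the reference). It also checks that Stata's factor-variable notation (ibn. for no base level, c. for continuous) reproduces the hand-made interaction terms:

```stata
* Hypothetical simulated person-period records for two birth cohorts
clear
set seed 7
set obs 4000
generate bcohort = cond(runiform() < 0.5, 1, 2)   // 1 = 1906-1910, 2 = 1911-1915
generate age = 80 + floor(25*runiform())          // attained ages 80-104
generate x = runiform() < 0.3
generate z = rnormal()
generate cage  = age - 45
generate cage2 = cage^2
generate Y = runiform() < invlogit(-3 + 0.03*cage + 0.4*x + 0.2*z)

* Hand-made cohort indicators and cohort-by-age interactions
generate BC190610 = bcohort == 1
generate BC191115 = bcohort == 2
generate cageBC190610  = cage*BC190610
generate cageBC191115  = cage*BC191115
generate cage2BC190610 = cage2*BC190610
generate cage2BC191115 = cage2*BC191115

* Cohort-specific intercepts and quadratics in age; no common constant
logit Y BC190610 BC191115 cageBC190610 cageBC191115 ///
    cage2BC190610 cage2BC191115 x z, nocons or
scalar ll_manual = e(ll)

* Equivalent specification using factor-variable notation
logit Y ibn.bcohort ibn.bcohort#c.cage ibn.bcohort#c.cage#c.cage x z, nocons or
scalar ll_factor = e(ll)
```

The two fits should give the same log likelihood and odds ratios; the factor-variable version avoids creating the interaction variables by hand as more cohorts are added.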
Thanks in advance
Orit