Hi,
I am new to this forum and hope that my post follows proper etiquette (please let me know if it does not and I will be happy to adjust). I am working with longitudinal complex survey data. All variables are updated every 2 years, providing up to 8 waves/time points for analysis per person. I am interested in estimating the effect of a time-varying exposure (arthritis) on the time to first occurrence of heart disease, controlling for both time-invariant covariates (e.g. sex, race, education) and time-varying covariates (chronic comorbidities, medication use, etc.). My primary analytic approach is discrete-time survival analysis. I have prepared the person-period dataset with 8 time dummy variables reflecting calendar time from the beginning of the survey (1994/95 through 2010/11) and use a logit link for the model. The baseline hazard in the model with only the time dummies as predictors is relatively flat, since the probability of developing incident heart disease in a given 2-year period is fairly constant; however, I keep time as dummies because I see some fluctuations in the models that include covariates.
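For reference, the person-period setup was done roughly along these lines (a simplified sketch; "wave" is a placeholder name for my calendar-time variable, not the actual variable in the file):

* one record per person per 2-year interval, up to the first heart disease event,
* death, or censoring
tab wave, gen(_d)      // creates the calendar-time dummies _d1-_d8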
The data contain roughly 12,000 observations, weighted to represent the Canadian population in 1994. Thus far I have been performing all analyses including the 8 time-dummy variables [_d1-_d8], my time-invariant (TI) and time-varying covariates (TVC), a variable reflecting the survey weights [pweight], and cluster-robust standard errors via vce(cluster id). My Stata code looks something like this:
logit _Y _d1-_d8 TI TVC [pweight=WT64LS], nocons vce(cluster REALUKEY)
The approach recommended by Statistics Canada for analysis of their surveys is to use survey commands with bootstrap standard errors. Stata of course has these options through the svy commands; however, svy: logit does not permit clustered standard errors.
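The svy setup I have in mind is something like this (BSW1-BSW500 stand in for whatever bootstrap replicate weight variables Statistics Canada supplies; the exact svyset options would follow their documentation):

svyset [pweight=WT64LS], bsrweight(BSW1-BSW500) vce(bootstrap)
svy: logit _Y _d1-_d8 TI TVC, nocons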
Here are my questions:
My intuition is that although the baseline hazard for the outcome heart disease is relatively constant, my time-varying covariates are fairly strongly correlated within person. In particular, my time-varying exposure arthritis is an "absorbing state": once a person reports having arthritis they are treated as having arthritis until the outcome, death, or censoring, so its future values depend on its prior values.
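Concretely, the absorbing-state coding looks something like this (arthritis and wave are placeholder names for my own variables):

bysort REALUKEY (wave): gen byte arth_abs = sum(arthritis) > 0   // stays 1 from the first report onward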
1. Would you recommend using the svy: logit procedure with no clustered standard errors, or proceeding as I did above, specifying the pweight and clustering on id?
2. My other issue: following Dr. Jenkins's lesson plans, I would like to test for unobserved heterogeneity using a multilevel model, but I have not been successful at generating results, particularly with survey weights. Should I treat the time dummy variables as fixed or random effects? Do I specify the survey weights in both the fixed and random parts? My experience thus far is that I can estimate a model with the time dummies as fixed effects and id as a random effect with no survey weights, but estimation breaks down when I try to specify the time dummies as random effects with pweights. Any guidance, particularly with Stata code, would be greatly appreciated.
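For reference, the unweighted random-intercept specification that does converge for me looks like the first line below; the second is my guess at a weighted version using melogit's level-specific pweight() option (placing the survey weight at the person level and equal weights within person), so please correct me if that is not the right way to pass the weights:

* unweighted random-intercept (frailty) model: time dummies fixed, person intercept random
melogit _Y _d1-_d8 TI TVC, noconstant || REALUKEY:

* attempted weighted version: survey weight at the person level, equal weights within person
gen double one = 1
melogit _Y _d1-_d8 TI TVC [pweight=one], noconstant || REALUKEY:, pweight(WT64LS)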
Thanking you in advance
Orit