Note: this question is also posted here without the Stata specifics.

I am estimating a Cox proportional hazards model with and without explanatory variables. Without explanatory variables, the hazard rate at time t is just the proportion of individuals who failed at time t out of all individuals who lasted at least until time t.

After estimating the model with covariates, I calculate the predicted hazard for each observation in the estimation sample by multiplying the baseline hazard by exp(z_i′β). I then average these predicted hazards, for each unit of time, across all individuals who have not yet failed at that time.
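In symbols, a sketch of the two quantities being compared. The notation is not in the original code and is introduced only for illustration: $R(t)$ is assumed to be the set of individuals still under observation at time $t$ (including those who fail at $t$), $n_t$ its size, $d_t$ the number of failures at $t$, and $\hat h_0(t)$ the baseline hazard contribution from the fitted model.

$$
\hat h_{\text{simple}}(t) = \frac{d_t}{n_t},
\qquad
\hat h_i(t) = \hat h_0(t)\,\exp(z_i'\hat\beta),
\qquad
\bar h(t) = \frac{1}{n_t}\sum_{i \in R(t)} \hat h_0(t)\,\exp(z_i'\hat\beta).
$$

The question is why $\bar h(t)$ exceeds $\hat h_{\text{simple}}(t)$ at every $t$.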

My problem is that this averaged hazard rate is always larger than the hazard rate from the model without any explanatory variables, and I do not understand why. I would not be surprised if the two differed at individual time points, but I would expect them to agree on average rather than differ in the same direction at every time. If it helps, below is the Stata code I am using:

Specifically, the average of individual_hazard_rate is always larger than simple_hazard_rate_estimate. See the figure below (the hazard from the model with the explanatory variable is always larger, but the difference decreases over time):

Code:

    /* My data is in the form, one observation per individual for each month
       they have not yet failed. So if an individual survives for three months
       and then fails the data looks like : */

    **** Stsets the data
    stset spell_duration, failure(failure_indicator) id(individual_id)

    stcox explanatory_variable

    **** Predicts the baseline hazard that is the same for each person at each time t
    predict baseline_hazard_rate, basehc

    **** Predicts the hazard ratio for the individual that varies across individuals
    predict hazard_ratio_for_individual, hr

    **** baseline_hazard_rate estimate only appears for observations that failed
    **** at the specific time so this replaces the missing values and then
    **** redefines the baseline hazard
    bysort spell_duration: egen temp = max(baseline_hazard_rate)
    replace baseline_hazard_rate = temp
    drop temp

    **** This should be the unique hazard for each individual at time t
    gen individual_hazard_rate = baseline_hazard_rate*hazard_ratio_for_individual

    **** Estimates the simple model
    /* Note: this estimate is the same as simply taking the proportion of
       individuals that failed at period t out of all individuals that lasted
       until at least time t. In stata I could do this as:
       collapse failure_indicator, by(spell_duration) */
    stcox, estimate

    **** Predicts the simple hazard rate estimate that is the same for all individuals at time t
    predict simple_hazard_rate_estimate, basehc

    collapse simple_hazard_rate_estimate individual_hazard_rate, by(spell_duration)

I have also tried this with and without sample weights, made sure that none of the explanatory variables have missing values, and made sure that none of the explanatory variables change over the duration of the survey, and I still see the same problem.

Thank you for any help you can provide.

## Comment