Can you include time fixed effects in survival analysis (e.g., stcox)?

Bo Zhao

Join Date: Apr 2016

Posts: 10
#1

26 Apr 2017, 16:16

I have a balanced panel dataset with yearly observations, which I converted into a dataset for survival analysis. A previously published paper in my research area claims to include year fixed effects in a Cox proportional hazards model. I was skeptical, because I thought year fixed effects are simply a nonlinear function of time and therefore of the duration. I tried including year fixed effects in stcox anyway. This is what I got.

stcox X i.year, nohr vce(robust)

                                                Wald chi2(1)     =      32.09
Log pseudolikelihood = -1199.478                Prob > chi2      =     0.0000

                                (Std. Err. adjusted for 351 clusters in town2)
------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           X |   2.228435   .3934043     5.66   0.000     1.457377    2.999494
             |
        year |
       2011  |   2.995783          .        .       .            .           .
       2012  |   4.893169          .        .       .            .           .
       2013  |    3.70227          .        .       .            .           .
       2014  |   3.991488          .        .       .            .           .
       2015  |   4.319502          .        .       .            .           .
       2016  |   4.085234          .        .       .            .           .
       2017  |   5.017047          .        .       .            .           .
------------------------------------------------------------------------------

The coefficient on X was exactly the same as from a stcox without year fixed effects. No standard errors were estimated for these year fixed effects. Does it mean they were estimated as random effects? Can these year fixed effects be included in the first place?
zoe zhang

Join Date: May 2016

Posts: 1
#2

20 Nov 2017, 01:08

I am interested in this topic. Have you got the answer?
Agnieszka Nowinska

Join Date: Mar 2019

Posts: 8
#3

15 Oct 2019, 01:57

I am following as well...
I found this on the topic: http://people.stern.nyu.edu/wgreene/...xedeffects.pdf,
but, at the same time, I realize that some papers do include fixed effects in such specifications.
Leon Schmidt

Join Date: Apr 2018

Posts: 98
#4

15 Feb 2022, 09:33

This is an old discussion but I would also be interested if someone has an answer to it.

I am fitting a Cox model using panel data and the stcox command. I am wondering two things:

1) Is it possible to include year dummies? I don't think so because it results in a "flat region", but it would be nice to know why it isn't possible.
2) How does a panel Cox regression relate to including individual fixed effects? Are they included or not? If not, should I see Cox results more akin to simple OLS than FE estimates?

This is how I set my data up

Code:

stset year, id(firm_id) failure(firm_exit==1)

Thank you very much!
Paul Dickman

Join Date: Apr 2014

Posts: 294
#5

15 Feb 2022, 10:38

I work with medical statistics, and we use different terminology, so please take that into consideration.

First a comment on your data set up (because it's related to your question). Your setup assumes firms are "at risk of failure" from year zero until the value of the variable year. If the variable year contains calendar year then your code is probably not what you want. If year is calendar year, then you can solve this by including the option enter(start_year). If the variable year is "years from when the firm was incorporated until exit" or "years from when we started studying the firm" then all is good. Similarly, if your data consist of annual surveys and year is numbered 1,2,3,4 then all is good (although you need to be careful how the baseline is coded).

This gets to the concept of "what's the underlying timescale". In medicine there are a number of timescales to choose from (patient age, calendar time, time since diagnosis). You can think of the Cox model as adjusting for the chosen timescale, where the choice of timescale is defined in stset.

If your chosen timescale is calendar time (i.e., year represents calendar year) then including calendar year dummies is not advisable because you are effectively including calendar year in the model twice and introducing collinearity. It looks like that's what the OP did. One exception to this is if you wish to estimate interactions with the timescale (e.g., to relax the proportional hazards assumption). Then you can include year dummies in order to estimate the interaction effects.

If, however, year represents "time since the firm was under study" then you can include dummies for calendar year since they represent different timescales.
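
A minimal sketch of the two setups described above, not taken from anyone's actual data. It assumes hypothetical variables firm_id, year (calendar year at the end of each record), start_year (first calendar year in which the firm is at risk), firm_exit, and a single covariate x.

```stata
* Setup 1: timescale = calendar year, with delayed entry at start_year.
* Calendar-year dummies would duplicate the timescale, so they are left out.
stset year, id(firm_id) failure(firm_exit==1) enter(time start_year)
stcox x, vce(cluster firm_id)

* Setup 2: timescale = years since start_year (time zero differs by firm).
* Calendar year is now a different timescale, so year dummies can be
* included as ordinary (time-varying) covariates.
stset year, id(firm_id) failure(firm_exit==1) origin(time start_year)
stcox x i.year, vce(cluster firm_id)
```

Which setup is appropriate depends on which timescale the research question treats as the underlying one.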
Leon Schmidt

Join Date: Apr 2018

Posts: 98
#6

15 Feb 2022, 12:41

Thank you very much Paul Dickman for your detailed explanations! As I am teaching myself this material, this is very helpful!

Indeed, I have a panel of firms that I observe in every year and I want to estimate which factors drive their survival. So I want to estimate regressions like

Code:

stcox size_of_firm age quality_management ..., vce(cluster firm_id)

Year is indeed calendar year. I have yearly data from 1850 to 1890, and firms enter and fail throughout this period or survive until some unknown year. I understand that I should modify my stset command to the following (thank you very much for spotting this!):

Code:

stset year, id(firm_id) failure(firm_exit==1) enter(start_year)

However, this excludes firms that are active before 1850 but for which their foundation year is not observed. Is there any solution to this? Also, should I include the origin() and exit() options?

My biggest wonder is still how I should interpret the results. I am used to fixed effects models. Using a Cox model with a panel structure, are such fixed effects implicitly included? Or is it a stretch to say that the results can be plausibly interpreted as causal?

Thank you very much again!
Paul Dickman

Join Date: Apr 2014

Posts: 294
#7

16 Feb 2022, 01:00

"However, this excludes firms that are active before 1850 but for which their foundation year is not observed. Is there any solution to this?"

This is known as "left truncation". Google terms such as "left truncated survival analysis".

Note that Stata does not allow a subject to enter and die at the same time (i.e., during the same year). If you have a firm that enters in 1851 and exits in 1851 then it will be excluded. See https://www.stata.com/support/faqs/s...and-cox-model/ for an explanation and a solution.

"Also, should I include the origin() and exit() options?"

No, I don't believe so. origin() defines time zero. By default it is year zero, which is what you want when calendar year is the timescale. Have a look at the variables _t0 (start time) and _t (end time). You will find you have values such as _t0=1854 and _t=1860 for a firm that is active between those years.

exit() is used, for example, if you wish to force all firms to exit (i.e., censor the survival times) at a specified year.
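
As a rough illustration of how to check this, assuming the data have already been stset with id(firm_id) and that firm_id, firm_exit, and start_year are the (hypothetical) variable names in use:

```stata
* Inspect the variables that stset creates:
*   _t0 = start of each record, _t = end of each record,
*   _d  = failure indicator,    _st = 1 if the record is used in the analysis.
sort firm_id _t
list firm_id _t0 _t _d _st in 1/10, sepby(firm_id)

* Summarize entry times, exit times, gaps, and failures per subject.
stdescribe

* Example of exit(): censor all firms still active at the end of 1885
* (the cutoff year is purely illustrative).
stset year, id(firm_id) failure(firm_exit==1) enter(time start_year) exit(time 1885)
```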

"My biggest wonder is still how I should interpret the results. I am used to fixed effects models. Using a Cox model with a panel structure, are such fixed effects implicitly included?"

I assume you mean the fixed effects of year. The other covariates in your Cox model have the usual fixed effects interpretation. The timescale (year in your example) is special in the Cox model. The Cox model adjusts for the timescale (as a fixed effect), but does not estimate the effect of the timescale. If you are interested in estimating the effect of the timescale (year) then you can estimate it using postestimation commands or you can use a parametric model that does estimate the effect of the timescale.
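
A hedged sketch of the two routes mentioned above, assuming stset data and a hypothetical covariate x. The choice of a Weibull distribution is only an example of a parametric model, not a recommendation.

```stata
* Cox model: adjusts for the timescale without estimating its effect.
stcox x, vce(cluster firm_id)

* Postestimation: recover the estimated cumulative baseline hazard ...
predict double H0, basechazard
* ... or plot a smoothed hazard function over the timescale.
stcurve, hazard

* Parametric alternative: the effect of the timescale is part of the model
* (here through the Weibull shape parameter).
streg x, distribution(weibull) vce(cluster firm_id)
```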
Leon Schmidt

Join Date: Apr 2018

Posts: 98
#8

16 Feb 2022, 02:35

Thank you again very much, Paul Dickman!

Regarding the origin() vs enter() options: Then I misunderstood the help file in Stata 16. The help file contains the following examples:

Code:

Subjects first become at risk at time 0 and come under observation at date of entry into the study recorded in variable doe
. stset dox, id(id) failure(fail) enter(time doe)

Subjects first become at risk and come under observation at date of birth recorded in variable dob
. stset dox, id(id) failure(fail) origin(time dob)

I thought I should use origin() because firms enter in some year and then become at risk (being at risk before entry sounds not intuitive to me). But from your explanation it seems origin() does not work this way but specifies the time zero.

Regarding the fixed effects: I meant the firm-level fixed effects. Is the Cox model with a panel structure sort of including such effects or is there a way to do this?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#9

16 Feb 2022, 02:36

Leon:
as an aside to Paul's helpful replies, one of the issues you're interested in (fixed effects in -stcox-) is covered in
https://www.stata.com/bookstore/survival-analysis-stata-introduction, page 201.

Kind regards,
Carlo
(Stata 19.0)
Paul Dickman

Join Date: Apr 2014

Posts: 294
#10

16 Feb 2022, 05:56

Originally posted by Leon Schmidt

Regarding the origin() vs enter() options: Then I misunderstood the help file in Stata 16. The help file contains the following examples:

Code:

Subjects first become at risk at time 0 and come under observation at date of entry into the study recorded in variable doe
. stset dox, id(id) failure(fail) enter(time doe)

Subjects first become at risk and come under observation at date of birth recorded in variable dob
. stset dox, id(id) failure(fail) origin(time dob)

I thought I should use origin() because firms enter in some year and then become at risk (being at risk before entry sounds not intuitive to me). But from your explanation it seems origin() does not work this way but specifies the time zero.

I think the help file is misleading. In epidemiology and biostatistics, "at risk" and "under observation" are synonymous. I'd appreciate hearing from others who can explain why this sentence "Subjects first become at risk at time 0 and come under observation at date of entry" makes sense. I maintain that the simplest explanation for origin() is that it's the definition of time zero. Often, when teaching, I ask the class the following questions:

1. How old are you?
2. What's the time?
3. For how long have you lived at your current address?

The time origins for these timescales are (1) date of birth, (2) midnight (assuming we use 24h time), and (3) the date you took up residence at your current address. For (1) and (3) individuals have potentially different origin times. There are plenty of other examples, but the main point is that choosing the timescale is done by defining the origin.

Implicit in the two examples in the Stata help file is (1) if the origin is not specified then it is zero and (2) if enter() is not specified then subjects become at risk from time zero.

That is, the second example is

Code:

. stset dox, id(id) failure(fail) origin(time dob) enter(time dob)

A common application of this is when the analyst pre-calculates the survival time (survtime=exit-entry) but it's also applicable if subjects are followed-up from birth.
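
To illustrate that point, here is a small sketch using the help file's variable names (dox, dob, id, fail) for single-record data; survtime is a hypothetical name for the pre-calculated survival time. Both options should lead to the same analysis time _t.

```stata
* Option A: let stset define time zero from the date of birth.
stset dox, id(id) failure(fail) origin(time dob)

* Option B: pre-calculate the survival time and stset it directly,
* relying on the default origin of zero.
generate double survtime = dox - dob
stset survtime, id(id) failure(fail)
```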

Regarding the fixed effects: I meant the firm-level fixed effects. Is the Cox model with a panel structure sort of including such effects or is there a way to do this?

I'd appreciate if others more familiar with econometrics and panel data could confirm or refute, but I believe this is just:

Code:

stcox i.firm

Adding to my previous comment on left truncation. Such data are common in medicine, such as when we enroll adults in a study and use attained age as the timescale. Patients are followed from the age at entry to age at exit and there are potential issues in that patients must survive until entry in order to be in the study. Usually we ignore the left truncation unless there are features of the design and research question that make it necessary to account for the left truncation in the analysis.
Leon Schmidt

Join Date: Apr 2018

Posts: 98
#11

16 Feb 2022, 07:11

Thanks again for the detailed explanation!

Regarding the fixed effects: I tried including a dummy for each firm but then Stata just slows down and does not respond anymore, it seems too much for Stata. In a twoway fixed effects setting such as with xtreg ... i.year, fe Stata demeans the data instead of actually including individual fixed effects. This is much quicker. I guess that this is not possible with stcox (?).

Regarding the enter() and origin() options: I did not know about them until you suggested including enter(), so thank you very much for that!

The help file contains the additional sentence:

Code:

Do not confuse enter() and origin(). origin() specifies when a subject first becomes at risk. In many datasets, becoming at risk and coming under observation are coincident. Then it is sufficient to specify origin().

My interpretation is that

Code:

stset year, id(firm_id) failure(firm_exit) origin(start_year)

implies that firms can have different start years in which they enter and then are at risk. I think this comes closer to my situation.

In contrast,

Code:

stset year, id(firm_id) failure(firm_exit) enter(start_year)

means that all firms are implicitly at risk from year 0 but start existing and thus enter the study only later.

Regarding the truncation: My data is basically a census of firms and it was first collected in year t. So there are some firms which are active in year t and afterwards but for which I do not know the year (before t) when they were started. Right now I exclude such firms since I found no option in stset that would include them.
Stephen Jenkins

Join Date: Apr 2014

Posts: 1435
#12

16 Feb 2022, 13:56

Leon Schmidt (1) You can't include year indicator variables ('dummies') in your survival time regression, most likely because calendar time (measured in years) and time at risk of experiencing the event are effectively collinear. (2) You can't include 'individual fixed effects' (meaning firm fixed effects) because you do not have repeated observations per firm. You have, I think, only one spell per firm. Models for single-spell data can take "random effects" approaches to accounting for heterogeneity ('frailty'), but estimates are identified in effect by functional form. (There are very few papers that have used "fixed effects" in the way you're talking about. Google Paul Allison's publication list for an example, I recall.) Meanwhile, I recommend some reading of a basic survival analysis text.
Leon Schmidt

Join Date: Apr 2018

Posts: 98
#13

18 Feb 2022, 02:08

Thank you very much, Stephen Jenkins, for the explanations and references! I do have multiple observations per firm. It is a panel dataset in which I follow firms over their life course, and I have time-varying variables assessing their performance. The firms either fail or survive the period of analysis. That is still single-spell data though, since firms can only fail once, right?
Priver JM

Join Date: Feb 2019

Posts: 30
#14

20 Apr 2022, 12:42

Leon Schmidt I am wondering whether you included the firm-level fixed effects in the analysis. In the book "Introduction to survival analysis" (page 199 in an older edition), "i.hospital" is used, so I assume you might be able to do the same.