Problems in declaring survival-time data in STATA 14

chiara piccardo

Join Date: Mar 2015

Posts: 15
#1

28 Jun 2016, 10:50

Dear all,
I would like to apply survival analysis techniques to my dataset, but I am having trouble declaring survival-time data in Stata 14.
I have an unbalanced panel of annual financial data for firms and a variable (exit) equal to 1 in the year a firm exits the market and 0 in earlier years.
Most firms in my data never exit the market, and for some firms the variable "exit" is missing in all years.

I tried to run the following command:
stset year, id(ID) failure(exit=1)

I have some doubts about this command and the four variables it generates:
1) _st (=1 if record is to be used; 0 otherwise). "_st" is equal to 1 for all observations (I had 0 exclusions). How is that possible if my "exit" variable is missing for some firms? How can Stata use these observations?
2) _d (=1 if failure; 0 if censored). "_d" is equal to 1 when a firm exits (that is, when "exit" is equal to 1) and 0 otherwise. I cannot understand why it is not missing for firms whose "exit" variable is missing. Does Stata treat firms without information on their exit as censored observations?
3) _t (analysis time when record ends). This variable is equal to "year", as I expected.
4) _t0 (analysis time when record begins). This variable is equal to "year"-1, except in the first year in which each firm appears in the dataset, where "_t0" is equal to 0. I do not know whether this is correct or whether it could cause errors in the analysis.

Thanks a lot in advance for your help.
Best wishes, Chiara
Clyde Schechter

Join Date: Apr 2014

Posts: 30058
#2

28 Jun 2016, 12:02

What you're doing is probably incorrect. The usual panel data set up provides a series of "snapshots" in time for each panel. This is not the correct layout for multiple-record survival data. You have to first change the data to the -span- layout using the -snapspan- command. Then you can -stset- the data as multiple record survival data in the usual way.
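
For readers following along, here is a minimal sketch of what -snapspan- does to snapshot data. The toy values and the variable name revenue are made up for illustration and are not from the poster's dataset. Variables listed after the time variable are treated as event (instantaneous) variables and keep the value recorded at the end of each span; unlisted variables are treated as enduring and take their value from the previous snapshot.

```stata
* Toy snapshot data: one row per firm-year (all values are hypothetical)
clear
input id year exit revenue
1 2004 0 10
1 2005 0 12
1 2006 1  9
2 2004 0 20
2 2005 0 21
end

* exit is listed, so it is handled as an event (instantaneous) variable.
* revenue is not listed, so it is handled as an enduring variable and is
* carried forward from the previous snapshot of the same firm.
snapspan id year exit, generate(year0) replace

* year0 holds the start of each span; it is missing on each firm's first record
sort id year
list id year0 year exit revenue, sepby(id)
```

Listing the result side by side with the original snapshots is usually the quickest way to check that each variable was classified the way you intended.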
chiara piccardo

Join Date: Mar 2015

Posts: 15
#3

29 Jun 2016, 08:08

Thank you Clyde.

I ran the following commands (as you suggested):

snapspan ID year exit lntot_revenue, generate(year0) replace
stset year, id(ID) failure(exit=1)

When I applied the "snapspan" command, Stata gave me the following message:
"note: 55699 obs. (4.3%) have only a single record; they will be ignored". I think this is correct because my data contain 55699 firms that survive only one year.

In your opinion, is it correct to treat "lntot_revenue" (log of annual total revenue) as an instantaneous variable? If I treat it as an instantaneous variable, it takes its value from the current recorded snapshot. If instead I treat it as an enduring variable, it takes its value from the previous recorded snapshot (and is missing in the first snapshot).

When I applied the "stset" command, Stata returned:
id: ID
failure event: exit == 1
obs. time interval: (year[_n-1], year]
exit on or before: failure

------------------------------------------------------------------------------
10209464 total observations
0 exclusions
------------------------------------------------------------------------------
10209464 observations remaining, representing
1291548 subjects
92852 failures in single-failure-per-subject data
2.6008e+09 total analysis time at risk and under observation
at risk from t = 0
earliest observed entry t = 0
last observed exit t = 2014

The four variables (_st, _d, _t, _t0) generated by the "stset" command behave as I described in my previous post.

Here is an example of the data for three firms included in my dataset:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str13 ID int year float(exit lntot_revenue) byte(_st _d) int(_t _t0)
"1" 2004 0 . 1 0 2004 0
"1" 2005 0 4.6821313 1 0 2005 2004
"1" 2006 0 4.564348 1 0 2006 2005
"1" 2007 0 4.875197 1 0 2007 2006
"1" 2008 0 4.6151204 1 0 2008 2007
"1" 2009 0 4.5325994 1 0 2009 2008
"1" 2010 0 3.8286414 1 0 2010 2009
"1" 2011 0 4.189655 1 0 2011 2010
"1" 2012 0 4.248495 1 0 2012 2011
"1" 2013 0 4.204693 1 0 2013 2012
"1" 2014 0 3.637586 1 0 2014 2013
"2" 2004 . . 1 0 2004 0
"2" 2005 . 7.496098 1 0 2005 2004
"2" 2006 . 7.619724 1 0 2006 2005
"2" 2007 . 7.511525 1 0 2007 2006
"2" 2008 . 7.892452 1 0 2008 2007
"2" 2009 . 7.294377 1 0 2009 2008
"2" 2010 . 7.410952 1 0 2010 2009
"2" 2011 . 7.867488 1 0 2011 2010
"2" 2012 . 7.707962 1 0 2012 2011
"2" 2013 . 7.672758 1 0 2013 2012
"2" 2014 . . 1 0 2014 2013
"3" 2004 0 . 1 0 2004 0
"3" 2005 0 9.466145 1 0 2005 2004
"3" 2006 0 9.558105 1 0 2006 2005
"3" 2007 0 9.695294 1 0 2007 2006
"3" 2008 0 9.52843 1 0 2008 2007
"3" 2009 0 9.451403 1 0 2009 2008
"3" 2010 0 9.245804 1 0 2010 2009
"3" 2011 0 9.049702 1 0 2011 2010
"3" 2012 1 . 1 1 2012 2011
end

This is exactly what happened before, when I used only the "stset" command; the only difference is the variable "year0" (created by the "snapspan" command). So what I'm doing is still incorrect, and I do not know how to proceed.

Thanks a lot in advance for your suggestions
Chiara.

Last edited by chiara piccardo; 29 Jun 2016, 08:18.
Clyde Schechter

Join Date: Apr 2014

Posts: 30058
#4

29 Jun 2016, 09:41

"In your opinion, is it correct to treat "lntot_revenue" (log of annual total revenue) as an instantaneous variable? If I treat it as an instantaneous variable, it takes its value from the current recorded snapshot. If instead I treat it as an enduring variable, it takes its value from the previous recorded snapshot (and is missing in the first snapshot)."

That is a substantive question in your discipline. I have no expertise in your area and can't comment on this. I suggest you ask a colleague in your field.

As for the rest of your concerns, I think the problem is in your -stset- command. Because you specify year as the time variable but don't give an -origin()- option, you are telling Stata that your various firms go 2000+ years before they exit. So I think you just need to specify an -origin()- variable. The variable you show above as _t0 would work, provided you change the 0 values to missing.
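
To make the effect of -origin()- concrete, here is a minimal sketch on toy data. The variable birth_year and all values are hypothetical, chosen only for illustration. Without -origin()-, analysis time is the calendar year itself, so the first record spans (0, 2004]. With an origin, analysis time is measured from the origin, that is, _t = year - birth_year.

```stata
* Toy span data for one firm (values are hypothetical)
clear
input id year exit
1 2004 0
1 2005 0
1 2006 1
end

* Hypothetical onset-of-risk year for the firm
generate birth_year = 2003

* Analysis time now counts years since birth_year rather than calendar years
stset year, id(id) failure(exit==1) origin(time birth_year)

list id year _t0 _t _d _st
```

Comparing "total analysis time at risk" in the -stset- output with and without the -origin()- option is a quick sanity check; a total in the billions of firm-years, as in the output in #3, suggests that time is still being counted from year 0.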
chiara piccardo

Join Date: Mar 2015

Posts: 15
#5

01 Jul 2016, 07:51

Thank you for your response!
I specified the year of firm birth (year of incorporation) as origin; thus I declared that the subject (firm) becomes at risk in its incorporation year. Here are the commands I used:

snapspan ID year exit incorporation_year solv_ratio lntot_revenue lnLP lnLEV age lnAGE lnworkers, generate(year0) replace
stset year, id(ID) failure(exit=1) origin(incorporation_year)

However, I still have some doubts:
1) Our study covers the period 2004-2014, and the data (revenues, employees, etc.) are collected only for these years. However, we know the year of birth of each firm in the sample, even when it is before 2004. Some firms entered the market before 2004 (birth year<2004) and others during the sample period, that is, in 2004 or later.
I would like to know whether I can include in the study firms born before 2004 (that is, outside the 2004-2014 period in which data are collected). Do you think I have to specify both the origin (when a subject becomes at risk = incorporation year) and the entry into the study (when the subject first enters the study = 2004) in order to include these firms in the analysis?
By contrast, for firms with incorporation year>=2004, origin and entry into the study should coincide. Do you agree?

2) The _st variable created by the stset command is equal to 0 in the first year for firms with incorporation year>=2004 (we observe firms between 2004 and 2014). Thus Stata excludes firms in their incorporation year if they start their activity during our study period (2004-2014). Is this correct?

Chiara
Clyde Schechter

Join Date: Apr 2014

Posts: 30058
#6

01 Jul 2016, 07:59

Yes, to both questions.
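
As a rough sketch of the setup discussed in #5, the toy example below combines -origin()- with -enter()-. The variable name inc_year and all values are hypothetical, and using 2004 as the entry time is an assumption based on the study period described in the thread. Firm 1 is incorporated before the study starts, so it enters late (delayed entry); firm 2 is incorporated during the study, so its entry coincides with its origin.

```stata
* Toy span data (values are hypothetical)
clear
input id year exit inc_year
1 2004 0 1998
1 2005 0 1998
1 2006 1 1998
2 2005 0 2005
2 2006 0 2005
end

* origin(): each firm becomes at risk in its incorporation year
* enter():  no firm is under observation before 2004
stset year, id(id) failure(exit==1) origin(time inc_year) enter(time 2004)

* Records that end at or before the entry time contribute no time at risk,
* so stset marks them with _st == 0
list id year inc_year _t0 _t _d _st, sepby(id)
```

In this sketch the first record of each firm has zero length after truncation, which is consistent with the _st = 0 values reported in #5 for the incorporation year.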