  • Is stcox enough? Survival Analysis (EHA), Time-Varying Covariates, Discrete, Binary DV

    I am planning to run a survival analysis (EHA, Cox model) to predict the adoption of one particular policy, and I do not know whether my data set is set up appropriately to handle both time-varying and time-invariant covariates (TVCs and TICs). If I run "stcox [varlist]" on a data set that has both TVCs and TICs, I am afraid that Stata will fit the model as if all covariates were TICs. Am I right?
    I have declared my data as survival-time data: the time variable corresponds to the years 1996-2011, "multiple record id variable" (corresponding to the 50 states) is checked, and the failure variable records whether or not the state adopted the policy.

    For my data set I collected annual data (1996-2011) on the independent variables (IVs) for each of the 50 US states. Most of the IVs vary annually, and I guess this makes them time-varying covariates (TVCs). My DV is binary (1 = state adopted the policy, 0 otherwise; only one failure is possible).
    34 states did not adopt the policy in the time span and are, thus, right censored.
    The strategy that I followed to create this first data set (I had no previous experience with Stata or EHA) was to make it as similar as possible to data sets that authors who investigated the adoption of other policies shared with me.
    Victor Cruz

  • #2
    I advise you to spell out abbreviations. I guess that "EHA" is "event history analysis" (but why should I have to guess? And others may not have a clue and be disinclined to read your post.)
    I am puzzled why you are using a continuous time approach (you refer to stset, and to the Cox model fitted by stcox) when you apparently have interval-censored (or grouped or discrete) survival times -- a country may adopt at any time within a year, but you only know the year. Moreover, you appear to have left-censored survival time data -- you say you start observations for countries in 1996. But some countries may have become at risk of adoption (how do you define that?) in 1996, some in 1994, or 1960, or some other year. With left-censored data, estimation is problematic. It is common to either (i) drop left-censored obs, or (ii) assume there is no duration dependence (hazard of adoption doesn't vary with time at risk). On the face of it, you should concentrate on these fundamental issues first of all. For some free resources, see my Survival Analysis Using Stata website: http://www.iser.essex.ac.uk/survival-analysis



    • #3
      Thanks a lot for the remarks and the link.
      EHA: Event History Analysis. DV/IV: Dependent/Independent Variables. TVC/TIC: Time-Varying/Invariant Covariates.

      I do have discrete survival times as you point out, given that a state may adopt at any time within a year, but I only know the year.
      So "stset" and "stcox" are for continuous-time models only? I am now confused about which commands to use for discrete-time models. I will have a closer look at your materials.
      Risk of adoption starts in 1996, when the first state adopted the policy for the first time. This is the convention for defining the beginning of risk in the literature.
      Hence, the data are not left-censored, only right-censored (6 additional states adopted after 2011, which is the final year of my study period).
      Victor Cruz



      • #4
        "Risk of adoption starts in 1996, when the first state adopted the policy for the first time."
        So, the state that adopted first is not included in your analysis (because the event has occurred)? All the other 49 states became at risk of adoption because this state adopted in 1996? All this is rather confusing to me at least, and I -- like many other Forum readers, I suspect -- will be unfamiliar with "the literature" [of which you have given no detailed references].
        "'stset' and 'stcox' are for continuous models only?"
        Yes!



        • #5
          Thanks again for the remarks.

          The failure event is adoption of one policy. One year after the first adoption of the policy by one state, the remaining 49 are at risk of adopting the policy, e.g., 49 states in 1997. Only one observation for the state that adopted the policy for the first time (1996) is included, i.e., the first event occurred in 1996.
          The hazard rate of policy adoption is hypothesized to vary according to time-varying and time-invariant covariates.

          Berry and Berry implemented Event History Analysis to analyze policy adoption/diffusion, though with a logit model. Box-Steffensmeier and Jones emphasized the appropriateness of a Cox model for this type of research. Since 1990, dozens of investigations have used EHA to predict policy adoption.

          I collected my data in a spreadsheet, trying to resemble the data sets that I obtained from Berry and Berry and from Doyle; the latter used a Cox model.
          Honestly, I thought that the next step would be to "stset" my data and "stcox" it. But I am afraid that "stcox [varlist]" is not the last step, especially if the values of some covariates change annually. This is my first quantitative research, by the way.

          Berry and Berry (1990). State Lottery Adoptions as Policy Innovations: An Event History Analysis.
          Box-Steffensmeier and Jones (2007). Event History Modeling: A Guide for Social Scientists.
          Doyle (2006). Adoption of Merit-Based Student Grant Programs: An Event History Analysis.
          Victor Cruz



          • #6
            Just in case someone else also reviews this post, the point is to use discrete-time survival analysis, which is described in detail at Prof. Jenkins's link and in many other publications. It's a lot easier than one might think. The data usually need to be put in country-year form for the example by the OP. Then it's straightforward to do a logistic regression to estimate the discrete-time hazard. Merging and estimating with time-series data is trivial in this case. A proportional hazards model analogous to Cox can be fitted as well, with a complementary log-log link using discrete-time analysis, also described in lectures available online.
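
            A minimal sketch of the logistic approach, assuming a state-year layout like the OP's. The data are simulated and every variable name (state, year, adopt, duration, income, south) is hypothetical. The linear duration term is only one possible way to summarise duration dependence.

            ```stata
            * Hypothetical toy state-year panel; all names and values are made up
            clear
            set seed 2011
            set obs 50
            generate state = _n
            * time-invariant covariate
            generate south = mod(state, 4) == 0
            * adoption year; values after 2011 mean the state is right censored
            generate adoptyear = 1996 + floor(30*runiform())
            * one row per state-year, 1996 to 2011
            expand 16
            bysort state: generate year = 1995 + _n
            * time-varying covariate
            generate income = rnormal() + 0.1*(year - 1996)
            * a state leaves the risk set once it has adopted
            drop if year > adoptyear
            * event indicator: 1 only in the adoption year
            generate adopt = (year == adoptyear)
            * years at risk: 1, 2, ..., 16
            generate duration = year - 1995

            * Discrete-time hazard: logit on the state-year rows, clustered by state
            logit adopt income south c.duration, vce(cluster state)
            ```

            Each state contributes one row per year at risk, so time-varying and time-invariant covariates enter the model in the same way.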



            • #7
              I second Richard's suggestions (but I would, wouldn't I?!) For the record, I would make a distinction between "the Cox model" and other proportional hazard models. To me, the Cox model is (a) applied to continuous time data, (b) has the proportional hazards property, and (c) one can't identify (in the statistical sense) the baseline hazard function -- where, remember, this is a function defined over continuous time. You can fit a discrete time (interval censored) proportional hazards model easily, using data reorganisation combined with a cloglog regression, as shown on my pages. The coefficients on the covariates (the covariates that are not used to summarise duration dependence) are exactly the same as the corresponding parameters in the underlying continuous time model. The duration dependence parameters in the cloglog model refer, however, to duration dependence in the interval (discrete) hazard (not the continuous time one). You can only get estimates of duration dependence in continuous time when fitting a discrete time model if you make additional assumptions about how the hazard varies within intervals (unobserved!), e.g. that it's Weibull. intcens on SSC provides a way to estimate the latter type of models for the case when you do not have any time-varying covariates.
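
              As a rough illustration of the cloglog route, here is a sketch on simulated state-year data. All variable names and values are hypothetical, and using the log of duration to summarise duration dependence is an assumption made for the example, not a recommendation.

              ```stata
              * Hypothetical toy state-year panel; all names and values are made up
              clear
              set seed 2011
              set obs 50
              generate state = _n
              generate south = mod(state, 4) == 0
              generate adoptyear = 1996 + floor(30*runiform())
              expand 16
              bysort state: generate year = 1995 + _n
              generate income = rnormal() + 0.1*(year - 1996)
              * keep only the years in which the state is still at risk
              drop if year > adoptyear
              generate adopt = (year == adoptyear)
              generate duration = year - 1995
              * one possible summary of duration dependence in the interval hazard
              generate lndur = ln(duration)

              * Discrete-time proportional hazards model; eform reports exp(b)
              cloglog adopt income south lndur, vce(cluster state) eform
              ```

              The exponentiated coefficients on income and south can be read as hazard ratios; the coefficient on lndur describes the interval hazard only.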



              • #8
                Thank you two for the comments.
                Prof. Jenkins' materials have been very useful, indeed.
                Richard, what do you mean by "the example by the OP?"
                I did as follows, for I was told that this way Stata would handle both the variables whose values changed annually and those that remained static, using the Breslow method for ties. I would appreciate thoughts on this.

                stset duration, id(state) failure(adopt==1)
                stcox [varlist], cluster(state)

                The reason for this is that an author (Crowley) kindly shared with me the data from her peer-reviewed article and discussed by e-mail, in greater detail, how she worked with it in Stata. I successfully replicated her results following her instructions. Our data sets were organized similarly: state-year observations, with covariates measured at discrete time points (annually).
                My data set has 4 ties, each consisting of at most three states failing in the same year. According to Allison (2014), if the number of ties at any time point is no more than 15% of the risk set at that time, then the Breslow method, which is an approximation to marginal likelihood estimation, is appropriate; hence I proceeded accordingly.

                The content of the variable duration is the same as Stata's created "_t". I conducted an analysis by stsetting my variable "year" instead, and the results were identical, except for the "total analysis time at risk and under observation" (increased to 100423, instead of 673). I will keep using "duration", as Crowley did.
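
                To make the two points above concrete, here is a sketch on simulated state-year data (all names and values are hypothetical, not my real data). It compares Breslow and Efron handling of ties, leaving out cluster(state) because the variance option does not change the point estimates, and then stsets on year with an explicit origin. The jump in total time at risk appears consistent with each state's first record being counted from time 0 when year is used without an origin: 50 x 1995 = 99750 = 100423 - 673.

                ```stata
                * Hypothetical toy state-year panel; all names and values are made up
                clear
                set seed 2011
                set obs 50
                generate state = _n
                generate south = mod(state, 4) == 0
                generate adoptyear = 1996 + floor(30*runiform())
                expand 16
                bysort state: generate year = 1995 + _n
                generate income = rnormal() + 0.1*(year - 1996)
                drop if year > adoptyear
                generate adopt = (year == adoptyear)
                generate duration = year - 1995

                * Same declaration as above, then two ways of handling ties
                stset duration, id(state) failure(adopt==1)
                stcox income south, breslow
                estimates store m_breslow
                stcox income south, efron
                estimates store m_efron
                estimates table m_breslow m_efron, se

                * Calendar year as the time variable, with risk starting at end of 1995
                stset year, id(state) failure(adopt==1) origin(time 1995)
                * analysis time now matches the hand-made duration variable
                assert _t == duration
                ```

                With few ties the two columns of the table should be close; larger differences would suggest that the tie-handling method matters for the data at hand.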


                Allison, Paul (2014). Event History and Survival Analysis.
                Crowley, Jocelyn Elise (2006). Moving Beyond Tokenism: Ratification of the Equal Rights Amendment and the Election of Women to State Legislatures.
                Victor Cruz



                • #9
                  OP: the original poster of the thread.

                  I believe, but stand to be helpfully corrected, that while you can use stcox with time-varying regressors, the assumption is that the effect is the same over time for the same values. I don't know your use case well enough to comment further, however.
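
                  One way to probe that assumption is sketched below on simulated state-year data; every variable name and value is hypothetical. The sketch runs a Schoenfeld-residual test after stcox and, as an alternative, adds a hand-made interaction between a covariate and log time. The interaction relies on each record ending at a whole year, which is an assumption about the data layout.

                  ```stata
                  * Hypothetical toy state-year panel; all names and values are made up
                  clear
                  set seed 2011
                  set obs 50
                  generate state = _n
                  generate south = mod(state, 4) == 0
                  generate adoptyear = 1996 + floor(30*runiform())
                  expand 16
                  bysort state: generate year = 1995 + _n
                  generate income = rnormal() + 0.1*(year - 1996)
                  drop if year > adoptyear
                  generate adopt = (year == adoptyear)
                  generate duration = year - 1995

                  stset duration, id(state) failure(adopt==1)

                  * Cox model, then a test of proportional hazards by covariate
                  stcox income south
                  estat phtest, detail

                  * Alternative: let the effect of income change with log analysis time
                  generate income_lnt = income*ln(duration)
                  stcox income south income_lnt
                  ```

                  A small p-value in the test, or a clearly non-zero coefficient on income_lnt, would suggest that the effect of income is not constant over time.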

