  • Cox model with panel data: stset setup and PH violations

    I have country–year panel data and I am modeling the timing of a binary event (first occurrence at the country level). The dataset contains multiple observations per country.

    My independent variables include measures such as resource exports, foreign reserves, and external balances.

    Question 1: stset with multiple records

    Is it appropriate to use stset with multiple observations per country in this context?

    My setup is:
    stset time_year if Is_It_country==1, id(countrycode_A) failure(event) enter(UN_Join)

    The output indicates:
    • multiple observations per subject
    • single-failure-per-subject structure
    • some observations dropped due to late entry and failure timing
    I want to confirm whether this setup is correct for modeling time-to-first-event using country-year panel data.

    Question 2: Combining variables with different PH properties

    I tested proportional hazards using Schoenfeld residuals. Some covariates (e.g., resource exports) satisfy the PH assumption, while others (e.g., foreign reserves) do not.
    Test of proportional-hazards assumption

    Time function: Analysis time
    --------------------------------------------------------
                 |       rho     chi2       df    Prob>chi2
    -------------+------------------------------------------
    L_c_fuel_e~o |  -0.05301     0.17        1       0.6816
    L_c_ex_bal~e |  -0.02052     0.05        1       0.8159
    L_c_for_re~p |  -0.50975    29.76        1       0.0000
    -------------+------------------------------------------
     Global test |              32.89        3       0.0000
    --------------------------------------------------------
    Note: Robust variance–covariance matrix used.


    Is it appropriate to include both types of variables in the same stcox model, or does violation by one covariate invalidate the model? If it does, what is the recommended approach (e.g., tvc(), stratification)?

    Question 3: Rare event modeling

    The event is relatively rare. Would it be more appropriate to use a complementary log-log model (cloglog / xtcloglog) instead of Cox, or does the survival framework remain preferable in this setting?

    Any guidance would be appreciated.

  • #2
    Re: Question 1
    Code:
    stset time_year if Is_It_country==1, id(countrycode_A) failure(event) enter(UN_Join)
    Mostly, perhaps entirely, right. The only question I raise is the use of the -enter()- option. It is important to distinguish -enter()- from -origin()-, and I cannot tell from your description which of these is appropriate here. -enter()- identifies the date when the subject (country) enters the study and comes under observation. -origin()- identifies the date when the subject first becomes at risk for the failure event. These may be different. If the event is only possible for countries that have joined the UN, and all UN member countries become at risk for the event upon joining, then UN_Join would be both the -enter()- and the -origin()- for your data. But if the event could happen to a non-UN-member country (but would not be recognized in your data because countries enter your study only when they join the UN), then UN_Join would be the -enter()- variable, and the -origin()- option would be some other date defining their first risk for the event (if you have that information).
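    A minimal sketch of the two cases, using the variable names from the original post and assuming that time_year and UN_Join are both measured in calendar years. risk_start is a hypothetical variable standing in for whatever date marks the start of risk, if you have one. As I read the -stset- syntax, the time keyword makes -stset- treat UN_Join as a time value rather than as an event indicator. Run it from a do-file so the /// continuations work.

```stata
* Case 1: countries become at risk only when they join the UN, so the
* join date is both the origin and the entry time. enter() is redundant
* here (by default subjects enter at analysis time 0, the origin), but
* stating it does no harm.
stset time_year if Is_It_country==1, id(countrycode_A) failure(event) ///
    origin(time UN_Join) enter(time UN_Join)

* Case 2: countries are at risk from an earlier date (risk_start is a
* hypothetical variable) but only come under observation when they join.
stset time_year if Is_It_country==1, id(countrycode_A) failure(event) ///
    origin(time risk_start) enter(time UN_Join)

* Summarize subjects, records, time at risk, and failures.
stdescribe
```

    Comparing the -stdescribe- output under the two setups shows how much time at risk and how many failures each definition keeps.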

    Re: Question 2
    The proportional hazards assumption should be tested on the variables exactly as they are used in the model. If you plan to use two variables, X and Y, in the model, then you should run that model and then do the PH test. If the global PH test turns out OK, then you can use that model, even if one of the individual variables' tests indicates a deviation. BUT you cannot then run tests on just that one variable: all your model tests must involve all of the variables.
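    A sketch of that workflow, using the covariate names as they appear (abbreviated) in the posted -estat phtest- output. In a Stata varlist, ~ works like * except that only one variable may match, so these abbreviations should work as long as each pattern picks out a single variable in your dataset. The clustered VCE on countrycode_A is an assumption about how the posted robust model was fit. If the global test rejects, tvc() is one common remedy: it adds an interaction of the covariate with a function of analysis time. strata() is a less natural fit here, because it treats each distinct value as its own stratum, which suits a categorical variable rather than a continuous one like foreign reserves.

```stata
* Fit the full model exactly as it will be reported (all covariates,
* same VCE), then run the PH test on that same model.
stcox L_c_fuel_e~o L_c_ex_bal~e L_c_for_re~p, vce(cluster countrycode_A)
estat phtest, detail

* If the global test rejects: keep all three covariates and let the
* log hazard ratio of L_c_for_re~p change with analysis time.
* texp() defaults to _t; texp(ln(_t)) is a common alternative.
stcox L_c_fuel_e~o L_c_ex_bal~e L_c_for_re~p, ///
    tvc(L_c_for_re~p) vce(cluster countrycode_A)
```

    The tvc() model reports the time-varying part in a separate equation, so the reserves effect is read as a main effect plus its change per unit of analysis time.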

    Re: Question 3
    I wouldn't use a -cloglog- model. Being fully parametric, it is less flexible than a Cox model, and the adjustment it makes to account for different time at risk among different countries is not as useful as a true survival analysis. And rare events are a problem in any model: your results can only reflect the amount of information in the data. If your data set isn't large enough to give you an adequate number of events, your analysis is going to lead to indeterminate results regardless. -cloglog- is often better at handling rare outcomes than ordinary -logit- or -probit-, but I wouldn't say it offers any advantage over a bona fide survival analysis.
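    One quick way to see how much information the data actually contain is to count the failures in the analysis sample after -stset-. _st and _d are the in-sample and failure indicators that -stset- creates; the divisor 3 below is simply the number of covariates in the posted model.

```stata
* Failures among the records that stset kept in the analysis sample.
count if _st == 1 & _d == 1
local nfail = r(N)

* Failures per covariate in the posted three-covariate model.
display "failures: `nfail'   failures per covariate: " `nfail'/3

* Subjects, records, time at risk, and failures in one table.
stdescribe
```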


    • #3
      Thank you, Clyde, for the clarification. It helps a lot.
