I have country–year panel data and I am modeling the timing of a binary event (first occurrence at the country level). The dataset contains multiple observations per country.
My independent variables include measures such as resource exports, foreign reserves, and external balances.
Question 1: stset with multiple records
Is it appropriate to use stset with multiple observations per country in this context?
My setup is:
stset time_year if Is_It_country==1, id(countrycode_A) failure(event) enter(UN_Join)
The output indicates:
- multiple observations per subject
- single-failure-per-subject structure
- some observations dropped due to late entry and failure timing
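To see which records stset excluded, I would run something like the following. This is only a sketch: it relies on the _st indicator that stset creates (1 for records kept, 0 for records excluded) and on the variable names from my stset call above.

```stata
* Summarize the per-country record structure (records per subject, entry, exit, gaps)
stdescribe

* Count the records that stset excluded within my estimation sample
count if _st == 0 & Is_It_country == 1

* List them by country to check whether the exclusions are what I expect
list countrycode_A time_year UN_Join event if _st == 0 & Is_It_country == 1, sepby(countrycode_A)
```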
Question 2: Combining variables with different PH properties
I tested proportional hazards using Schoenfeld residuals. Some covariates (e.g., resource exports) satisfy the PH assumption, while others (e.g., foreign reserves) do not.
Test of proportional-hazards assumption
Time function: Analysis time
--------------------------------------------------------
             |        rho     chi2       df    Prob>chi2
-------------+------------------------------------------
L_c_fuel_e~o |   -0.05301     0.17        1       0.6816
L_c_ex_bal~e |   -0.02052     0.05        1       0.8159
L_c_for_re~p |   -0.50975    29.76        1       0.0000
-------------+------------------------------------------
 Global test |               32.89        3       0.0000
--------------------------------------------------------
Note: Robust variance–covariance matrix used.
Is it appropriate to include both types of variables in the same stcox model, or does violation by one covariate invalidate the model? If not, what is the recommended approach (e.g., tvc(), stratification)?
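For concreteness, the tvc() variant I have in mind would look roughly like this. It is a sketch, not something I have validated: the variable names are the abbreviated forms printed by estat phtest (the full names would be substituted), and interacting with linear analysis time via texp(_t) is my assumption about the functional form, not something the test output tells me.

```stata
* Foreign reserves enters twice: once as a main effect and once
* interacted with analysis time through tvc(); the other two
* covariates are kept as ordinary proportional-hazards terms.
stcox L_c_fuel_e~o L_c_ex_bal~e L_c_for_re~p, ///
    tvc(L_c_for_re~p) texp(_t) vce(robust)
```

The coefficient reported in the tvc equation would then describe how the log hazard ratio for foreign reserves changes per unit of analysis time.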
Question 3: Rare event modeling
The event is relatively rare. Would it be more appropriate to use a complementary log-log model (cloglog / xtcloglog) instead of Cox, or does the survival framework remain preferable in this setting?
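The discrete-time alternative I am considering would be set up roughly as follows. This is a sketch under assumptions: country_id and dur are hypothetical helper variables I would create, the quadratic in dur is just one possible baseline-hazard specification, and I reuse the _st and _d variables from stset so that the risk set matches the Cox model.

```stata
* Numeric panel identifier (hypothetical name), needed by xtset
egen country_id = group(countrycode_A)

* Years at risk since entry (hypothetical name); assumes time_year
* and UN_Join are measured on the same calendar-year scale
gen dur = time_year - UN_Join if _st == 1

* Pooled complementary log-log with a quadratic in duration
cloglog _d L_c_fuel_e~o L_c_ex_bal~e L_c_for_re~p c.dur##c.dur ///
    if _st == 1, vce(cluster country_id)

* Random-effects version with a country-level intercept
xtset country_id
xtcloglog _d L_c_fuel_e~o L_c_ex_bal~e L_c_for_re~p c.dur##c.dur ///
    if _st == 1, re
```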
Any guidance would be appreciated.
