First time posting, please let me know if I'm breaking any rules or if more information is required.
I am using stcox in Stata/BE 17.0 to run a Cox Proportional-Hazards model on some data. The dataset is a panel of mid- to large-size U.S. cities from 1977 to 2023, with 7,821 subjects, 46 time periods, and 127 failures.
Following a reviewer comment, I changed the model specification to log-transform two of my control variables: population and the number of program units ("vouchers"). Unexpectedly, this drastically changed the hazard ratios for my independent variable of interest. I am looking for insight into why this might have occurred and whether there is any way to determine if the log transformation is appropriate. I recognize that results this sensitive to such a seemingly innocuous change in specification may simply indicate that the original findings were spurious.
Relevant code below:
gen log_population = ln(total_pop_i + 1)
gen log_vouchers = ln(total_units_i + 1)
stset year, origin(time 1977) enter(time 1977) id(jurisdiction) failure(soi_enacted == 1)
stcox i.LPHA_principal black_diff other_diff total_pop_i total_units_i vacancy partisan_lean_i i.prior_state_law i.prior_state_preemption i.prior_county_law cumlocallaws_unitspct i.jurisdictiontype
stcox i.LPHA_principal black_diff other_diff log_population log_vouchers vacancy partisan_lean_i i.prior_state_law i.prior_state_preemption i.prior_county_law cumlocallaws_unitspct i.jurisdictiontype
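For reference, here is a minimal sketch of one way the two specifications could be compared side by side. It assumes the data are already stset as above and that both models are fit on the same estimation sample. The local macro name controls and the stored-estimate names levels and logs are placeholders I made up for illustration. Because the two models are not nested, the sketch compares AIC/BIC rather than using a likelihood-ratio test, and it is not a definitive diagnostic.

```stata
* Sketch only: run from a do-file (the /// line continuations need one).
* "controls", "levels", and "logs" are hypothetical names for illustration.
local controls black_diff other_diff vacancy partisan_lean_i ///
    i.prior_state_law i.prior_state_preemption i.prior_county_law ///
    cumlocallaws_unitspct i.jurisdictiontype

* Specification 1: population and vouchers in levels
stcox i.LPHA_principal total_pop_i total_units_i `controls'
estimates store levels
estat phtest, detail      // Schoenfeld-residual test of proportional hazards

* Specification 2: population and vouchers in logs
stcox i.LPHA_principal log_population log_vouchers `controls'
estimates store logs
estat phtest, detail

* AIC/BIC for both models in one table (only comparable on the same sample)
estimates stats levels logs

* How the raw and logged controls are distributed across LPHA_principal levels
tabstat total_pop_i log_population total_units_i log_vouchers, ///
    by(LPHA_principal) statistics(mean p50 sd)
```

The tabstat line is there because, if population and voucher counts differ sharply across levels of LPHA_principal, the logged controls may absorb variation that the highly skewed raw counts did not, which could be one reason the hazard ratios move.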
The first stcox specification returns the following:
The second returns:
As you can see, the hazard ratios on the levels for "LPHA_principal" drop substantially and are no longer statistically different.
Any insight into why this occurred and how to determine the correct specification would be greatly appreciated. I would be happy to supply additional data, code, etc.