  • Missing values in Cox regression

    Hi everybody,

    I would appreciate your help, if possible. I have stset the data for survival analysis, and I have multiple records of the variables for each subject over time. When I compute the Kaplan-Meier estimate, it shows the true number of failures in the dataset. The problem is that I have a lot of missing values in the variables at the time the default occurred. So when I run a Cox regression, the number of failures it reports covers only the subjects whose variables are observed at the time of failure. That causes a lot of problems in estimation and postestimation procedures.

    In the picture you can see how my data are declared. Event=9 means failure, and you can see that the values of the variables are not known at that point.

    I would appreciate any advice.

    Thanks in advance

  • #2
    Vojtech:
    your problem is probably due to the listwise deletion that Stata applies whenever an observation has a missing value in any of the variables included in the model.
    That said, you may want to impute the missing values via -mi-.
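
A minimal sketch of how the effect of listwise deletion could be checked before fitting anything. The toy data are hypothetical; only the names `event` and `t_a` and the code `event == 9` for failure come from this thread, while `id`, `year`, and the scalar names are assumptions.

```stata
* Toy panel (hypothetical values): one record per company and year-end
clear
input id year event t_a
1 2005 0 100
1 2006 0 120
1 2007 9 .
2 2005 0 80
2 2006 0 .
2 2007 0 90
3 2005 0 50
3 2006 9 55
end

* All failure records in the data (what Kaplan-Meier would count)
count if event == 9
scalar n_fail = r(N)

* Failure records that a model containing t_a would drop
count if event == 9 & missing(t_a)
scalar n_lost = r(N)

display "failures: " n_fail "   lost to missing t_a: " n_lost

* Overall pattern of missing values in the covariate
misstable summarize t_a
```

After a real stcox fit, `count if _d == 1 & e(sample)` should give the number of failures that actually entered the estimation, since stset creates `_d` and stcox sets `e(sample)`.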
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      As an aside, please do not post screenshots. They are often unreadable. Yours happens to be readable, at least on my setup, but it fills the entire screen when I open it, and copying and pasting from it into Stata or another application is impossible. Please post code, data, or Stata output in a code block. See the FAQ for how to do that. Thanks.


      • #4
        OK, I will do that next time. Yes, I could impute the missing values, but I would rather assign the latest known value of each variable to the time of failure.

        Thanks

        Vojtech


        • #5
          I think that I can see a solution but I need to know more to be sure.

          The same FAQ that Clyde quoted also asks that you show all pertinent statements and the output from those statements. So 1) show us the stset and stcox statements and the output from them; 2) show again a listing for one person with all of that person's records, including the ID variable, the variables mentioned in the stset statement, and the "missing" covariates.

          Then explain the study design and what every variable is. It looks like you have a lot of year-end dates, for example.
          Steve Samuels
          Statistical Consulting

          Stata 14.2


          • #6
            Well, here you can see the stset and stcox output for one variable: univariate.smcl. There is also information for one company. I am trying to find out the relationship between the time of foundation and the time of bankruptcy of a company. I have financial statement information at the end of each year from 2005 to 2013; for example, t_a means total assets. Overall I have about 25 variables in my dataset. But many values of the variables are not known, especially for the failed companies.

            Thanks, Vojtech


            • #7
              Clyde and I did not make clear that code and results should go between CODE delimiters, described in FAQ 12. Please do it so that others can easily follow the discussion.
              Steve Samuels
              Statistical Consulting

              Stata 14.2


              • #8
                I am not really sure what you mean, but I am trying this one. Does anyone know how to solve the problem?

                Thanks


                • #9
                  Well, there is no way to force stcox to include observations where a model variable has a missing value. The ideal solution, often impractical or impossible in reality, is to obtain the actual values for the missing observations. When that can't be done, you either settle for what you have, or you impute missing values. You indicated in #4 that if you have to impute, you want to do it by carrying forward the last observation. I think some people would question whether that is really an appropriate approach to imputation for this kind of data. But that's up to you. The way to carry forward the last observation is:

                  Code:
                  by id (date), sort: replace t_a = t_a[_n-1] if missing(t_a)
                  With regard to what was meant in #7, we had hoped that you would copy the contents of your log file univariate.smcl, and paste them into the code block.
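
The carry-forward line above can be tried on a toy panel first. This is only a sketch: the values are hypothetical, and the flag variable `t_a_locf` is an assumption added here so that imputed records remain identifiable afterwards; `id`, `date`, and `t_a` follow the names used in this thread.

```stata
* Toy panel (hypothetical values); date holds the year of the year-end statement
clear
input id date t_a
1 2005 100
1 2006 120
1 2007 .
1 2008 .
2 2005 .
2 2006 80
2 2007 .
end

* Flag records that are missing before they are overwritten
generate byte t_a_locf = missing(t_a)

* Within each company, in date order, copy the previous value into a gap.
* replace works record by record, so runs of gaps are filled one after another.
by id (date), sort: replace t_a = t_a[_n-1] if missing(t_a)

list, sepby(id)
```

A gap at the start of a company's history stays missing, because there is no earlier value to carry forward; such records would still be dropped by stcox. With about 25 variables, the same replace could be wrapped in a `foreach` loop over the variable list.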


                  • #10
                    If your analysis is intended for publication, then you simply don't have enough information. If you are doing the analysis for a class project or for a thesis that won't be published, then use Clyde's suggestion to move forward (with your advisor's approval!). You should also devote a section to this problem and to the possible biases and consequences of your solution. Try some extrapolation techniques and a sensitivity analysis, and next time get better data.


                    You can do a perfectly valid predictive analysis if you use "missing financial statement" as a time-varying predictor, along with baseline covariates. Obviously "missingness" doesn't cause failure, but it is a sign of trouble. You can also do a valid analysis with just baseline covariates and, perhaps, those known for, say, the first two years, if available for almost all companies.
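
One way the two predictors described here might be constructed, as a sketch only. The toy values and the names `fs_missing` and `t_a_base` are assumptions; taking the first recorded year as "baseline" is also an assumption, and only `t_a` comes from this thread.

```stata
* Toy panel (hypothetical values): year-end total assets per company
clear
input id year t_a
1 2005 100
1 2006 120
1 2007 .
2 2005 80
2 2006 .
2 2007 90
end

* Time-varying indicator: 1 when the year-end statement value is missing
generate byte fs_missing = missing(t_a)

* Baseline covariate: the value from each company's first record,
* repeated on all of its records
by id (year), sort: generate t_a_base = t_a[1]

tabulate fs_missing
list, sepby(id)
```

Once the data are stset, both variables are never missing for companies with a known first-year value, so a model along the lines of `stcox fs_missing t_a_base` would keep the failure records that the original model dropped.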

                    Good luck!
                    Steve Samuels
                    Statistical Consulting

                    Stata 14.2


                    • #11
                      As an aside to previous excellent insights, there's an interesting website on dealing with missing values at:
                      http://missingdata.lshtm.ac.uk/index...isms&Itemid=96
                      Kind regards,
                      Carlo
                      (Stata 19.0)
