Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Automating an stcox command in a loop and outputting the result to another file.

    I am fairly new to Stata.
    I have survival data for 60 patients. I have asked a large number of physicians to evaluate each of the patients and say whether they think the patient has a killer disease - so, a binary evaluation of 1 = yes, has the disease, 0 = no, does not have the disease. Each doctor is labeled var1.....varN). Here is what the data looks like for the first 10 patients and for 3 doctors (var1-var3):

    Code:
    input byte(patient dead var1 var2 var3)
    1 0 0 0 0
    2 0 0 0 0
    3 0 1 1 1
    4 1 0 0 0
    5 1 0 1 1
    6 0 0 0 0
    7 1 1 1 1
    8 1 0 0 0
    9 1 0 0 0
    10 1 0 0 1
    end
    I can easily perform a stcox var1 to see the prognostic significance of the doctor's ( in this case, labeled var1) diagnosis. The issue is, that I may have up to 1000 doctors which makes performing a stcox for each very cumbersome. Ideally I could write the results to another file. This what I have tried - posting the result to another file called test.dta.

    Code:
    tempname memhold
    postfile `memhold' survival using "test.dta" , replace
    foreach var in varlist var1-var10 {
    post `memhold' (`stcox(var)')
    }
    postclose `memhold'
    I can see at least one reason why this doesn't work - I think that the line

    Code:
    post ` memhold ' (` stcox (var)')
    Isn't specifying one value but instead a set of results. The ideal situation would be to "post" the HR, p value and CI to another file. That file would look like this:

    Code:
    doctor  hr       p       CIlwr       CIupper
    "var1"  4.341    .003   1.672851 11.26876
    "var2"  2.595   .031  1.092362 6.165455
    "var3"  6.160    0      2.430396 15.61521
    I am sure there is an easy way to do this. May I ask for some assistance?

  • #2
    Well,there are two issues here. The first is that I don't understand how you are going to use -stcox-. The data you show has an indicator for who is dead and who is alive, but there is no variable showing the duration of survival, so I don't see how you can specify the time variable in your -stset- command. For now I'll just assume that the example you show for some reason omits that variable and that you have a variable, called survival_time, that will serve the purpose.

    The second matter is that you don't need to do anything nearly this complicated. The -statsby- command automates this entire process for you if you put your data in long layout. This follows the general rule, so often preached on this Forum, that almost everything is easier in Stata when the data are in long layout.

    Code:
    reshape long var, i(patient) j(doctor_num)
    stset survival_time, failure(dead = 1)
    statsby b =_b[1.var] _se = _se[1.var], by(doctor_num) saving(results, replace): stcox i.var
    At the end of that you will have the Cox regression coefficients and their standard errors in the file results. From there you can get what you want with:

    Code:
    gen hr = exp(b)
    local z95 = -invnorm(0.025)
    gen ci_lower = exp(b - `z95'*se)
    gen ci_upper = exp(b + `z95'*se)
    gen p = 2*normal(-abs(b/se))

    Comment


    • #3
      Professor Schechter,
      This is very promising and I very much appreciate the time you have spent. I must ask one thing - I am going through the statsby literature for Stata and looking through your code. I keep getting the following error "expressions must be bound in parentheses" when I run line 3. I do apologise for asking (I have tried various permutations of parentheses around the expressions to no avail). Are there parentheses missing in this? I fully appreciate this question is probably bordering on not being "appropriate". You are right about survival time and my failure variable - I left them out thinking it would make the data look clearer.

      Comment


      • #4
        Your question is not inappropriate at all, not even nearly so!

        I don't know why you are getting that error message. No parentheses are needed in that -statsby- command. I suggest you post the exact code you ran for line 3 (since your data was a simplification, I'm guessing you had to adapt my code to your actual situation as well) and show the complete output you got from Stata. The "expressions must be bound in parentheses" error meesage is something commonly seen with the -post- command. Perhaps I did not make clear that the code I showed is instead of your loop and your -postfile- code. If you are using -statsby- you should not have any loop, nor should you be using the -post- command: -statsby- takes care of all of that for you automatically.

        Comment


        • #5
          This is most kind of you. So I ran the reshape command and then sts and got my data in long form as suggested. It now looks like this (futime = follow up time). I am obviously just diplaying the first 4 rows (out of 16500)


          Code:
          patient    doctor_num    dead    futime    _st    _d    _t    _t0    var
          1             1             0    4.692    1    0    4.6919999    0    0
          1             2             0    4.692    1    0    4.6919999    0    0
          1             3             0    4.692    1    0    4.6919999    0    0
          1             4             0    4.692    1    0    4.6919999    0    0
          I immediately get stuck on line 3. (no loops)

          Code:
          statsby b =_b[1.var] _se = _se[1.var], by(doctor_num) saving(results, replace): stcox i.var
          expressions must be bound in parentheses
          r(198);
          As an aside (while trying to get to the bottom of this) - If I leave out the second variable and do
          Code:
          statsby b =_b[1.var], by(doctor_num) saving(results, replace): stcox i.var
          I get
          Code:
          (running stcox on estimation sample)
          
                command:  stcox i.var
                      b:  _b[1.var]
                     by:  doctor_num
          
          Statsby groups
          ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
          ..................................................    50
          ..................................................   100
          ..................................................   150
          ..................................................   200
          ..................................................   250
          .........................
          But if I run the first line of your next code block
          Code:
          gen hr = exp(b)
          I get
          Code:
          b not found
          Many thanks for any input you can offer.

          Simon
          Last edited by Simon Walsh; 30 Oct 2016, 14:38.

          Comment


          • #6
            Second issue first. My apologies, here. You need to -use results, clear- before you go on to that second block of code so as to bring the results into memory.

            As for the first, I am puzzled. This syntax works for me (it doesn't matter that the example is with -regress- instead of -stcox-:
            Code:
            . clear*
            
            . sysuse auto
            (1978 Automobile Data)
            
            . statsby b = _b[mpg] se = _se[mpg], saving(results, replace) by(rep78): ///
            >         regress price mpg headroom
            (running regress on estimation sample)
            
                  command:  regress price mpg headroom
                        b:  _b[mpg]
                       se:  _se[mpg]
                       by:  rep78
            
            Statsby groups
            ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
            .....
            
            .
            . use results, clear
            (statsby: regress)
            CORRECTION: In my original code in #2, I wrote
            Code:
            _se = _se[1.var]
            I meant to type
            Code:
            se = _se[1.var] // NO INITIAL UNDERSCORE ON THE LEFT SIDE
            This turns out to be the source of the problem you are having. I think this is some kind of bug in -statsby-. I think that -statsby- is misunderstanding the first occurrence of _se in the original code as a reference to a non-existent local macro se, and that is gumming up the works here. But underscores are legal as initial characters in variable names, so -statsby- should have been able to handle this.

            Be that as it may, just eliminate that first underscore and you should be fine. (Don't eliminate the underscore on the right hand side, though.)
            Last edited by Clyde Schechter; 30 Oct 2016, 15:23.

            Comment


            • #7
              Professor,
              That worked! Yes I am using Stata 13 and yes, I should have mentioned that. My apologies. Your use of parentheses worked....I am extremely grateful for this.

              Comment


              • #8
                Professor Schechter,
                Many thanks for your help with this problem as outlined above. May I ask, is it possible to perform an

                estat concordance

                to calculate Harrell's C index for each of these analyses?

                Again, many thanks for any advice you may have.

                Comment


                • #9
                  -statsby- will only accommodate a single command. To do this you have to write a short program that wraps the Cox regression and the Harrell's C index command and returns things. So something like this:

                  Code:
                  capture program drop cox_and_harrell
                  program define cox_and_harrell, rclass
                      stcox whatever
                      estat concordance
                      return scalar harrell_c = r(C)
                      exit
                  end
                  
                  use my_data_set, clear
                  tempfile results
                  statsby _b r(harrell_c), by(group_var) saving(`results')
                  
                  // perhaps other stuff
                  
                  use `results', clear
                  Last edited by Clyde Schechter; 19 Dec 2016, 14:31. Reason: Somehow my post spontaneously posted itself before I was done typing it. Edited just to finish it.

                  Comment


                  • #10
                    Professor,
                    Forgive my novice questions.
                    Could you elaborate this code a little bit more? The program is run as a do-file (but is ALL of this code run as a do-file?). Where you have stcox whatever, how does the variable for each analysis get inserted into this stcox. Also, in the statsby command, where is the custom code be run in this? Again, my knowledge of running single stats commands in Stata is ok, but I am completely new to the programming of it.
                    Many thanks and seasons greetings.

                    Comment


                    • #11
                      The "code" shown in #9 is pseudocode; it is an outline of how you might go about writing the code. Up to this point I have not understood that there are different variables in differenet analyses. I have understood that we are doing the same -stcox- command, but each time on a different subset of the data, defined by levels of a grouping variable. So whatever was just a shorthand for saying "whatever variables you are using in your -stcox- command." So it gets put in when you write it there: the whatever is not used literally. (When I use italics within code that is intended as: substitute the variables or other relevant terms relating to your project here.) Now, if your problem actually requires doing this for different cox models, then additional code is required to handle that.

                      As for the -statsby- command at the bottom, that was sloppy on my part. Apologies. I wrote the -statsby- prefix part and left out the command. It should read:

                      Code:
                      statsby _b r(harrell_c), by(group_var) saving(`results'): cox_and_harrell
                      The entire thing is intended to be run as a single do-file.

                      So if you change that -statsby- line as noted just above, and if you substitue the actual variables in your Cox regression for "whatever" in the code (and also include any options that are relevant to what you are doing) you can convert this pseudocode into an actual do-file you can run. If it is necessary to do this for different Cox models, then the entire problem is more complicated than I had understood it to be and you will need to explain it in more detail in order for me to point you in a different direction.

                      Comment

                      Working...
                      X