  • Test Graph Export


    Except for the fact that both functions increase, the cumulative hazard estimate is nothing like the estimate \(\widehat{F} = 1 - \widehat{S}\). In fact, the cumulative hazard estimate can exceed 1.0.

    Below I plot the estimated failure function and two different estimates of the cumulative hazard function. The first is the one Stata generates. It is the Nelson-Aalen estimate, shown on page 300 of the Manual Entry for sts. The N-A estimate is the finite-sample version of the definition:
    \[
    \Lambda_1(t) = \int_0^t \lambda(u)\,du
    \]
    A second estimate can be based on the mathematical relationship of the cumulative hazard function to the Survival curve
    \[
    \Lambda_2(t)= -\textrm{log}(S(t))
    \]
    where the Kaplan-Meier estimate \(\widehat{S}\) is substituted for \(S\). The graph below shows that the two estimates are very close.
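
    One way to see why the two estimates are close, sketched in notation that is not part of the original post: let \(t_i\) be the distinct failure times, \(d_i\) the number of failures at \(t_i\), and \(n_i\) the number at risk just before \(t_i\). Then
    \[
    \widehat{\Lambda}_1(t) = \sum_{t_i \le t} \frac{d_i}{n_i}, \qquad
    \widehat{\Lambda}_2(t) = -\log \widehat{S}(t) = -\sum_{t_i \le t} \log\left(1 - \frac{d_i}{n_i}\right)
    \]
    Because
    \[
    -\log(1 - x) = x + \frac{x^2}{2} + \frac{x^3}{3} + \cdots \ge x \quad \text{for } 0 \le x < 1,
    \]
    each term of \(\widehat{\Lambda}_2\) is at least as large as the matching term of \(\widehat{\Lambda}_1\), and the two differ little as long as \(d_i/n_i\) is small. They can be expected to separate in the right tail, where few subjects remain at risk.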

    Code:
    webuse catheter, clear
    stset time infect
    * Nelson-Aalen cumulative hazard and Kaplan-Meier survivor function
    sts gen cumhaz1 = na km = s
    label var cumhaz1 "Cum Haz:Nelson-Aalen"
    * Second cumulative hazard estimate: -log of the KM survivor function
    gen cumhaz2 = -log(km)
    label var cumhaz2 "Cum Haz:-log(s)"
    gen cumfail = 1 - km
    label var cumfail "Cumulative Failure Probability"
    sort _t
    #delimit ;
    twoway connected cumhaz1 cumhaz2 cumfail _t,
      connect(stairstep stairstep stairstep)
      title("KM Failure & Two Cum Hazard Estimates")
      saving(g01, replace);
    #delimit cr
    graph use g01
    graph export graph.png, replace
    [Figure: graph.png, KM failure function and the two cumulative hazard estimates plotted against _t]

    Last edited by Steve Samuels; 03 Apr 2016, 10:42.
    Steve Samuels
    Statistical Consulting

    Stata 14.2

  • #2
    You've asked two versions of this question, the other being here, but the details change in each. In this post the population size is 100,000; in the other it is 10,000 and 20,000. I don't believe that any of these numbers is the real one. In future posts, please do not use fake numbers; they just confuse the issue.

    In this post, you say that you collect all cases of deaths in 10 clusters, but you don't say that they are the same ones that were selected with PPS; I'll assume that they were.

    Here is how to compute the weights:

    1. Let \(N\) be the population size and \(N_j\) be the size of the j-th selected cluster. Then the probability of selecting the j-th cluster is \(f_1 = N_j/N\).
    2. In each sampled cluster, the probability that (the record of) a dead person is selected for study is \(f_2 = 1\). The probability of selecting (the record of?) a living person for study is \(f_2 = 300/N_j\).

    The overall probability of selecting a person is \(f = f_1 \times f_2\). For a living person, \(f = (N_j/N)\times (300/N_j) = 300/N\); for a dead person, the probability of selection is \(f = (N_j/N) \times 1 = N_j/N\). The design weight is \(W = 1/f\).
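
    As a purely illustrative check of the arithmetic, take the population size of 100,000 quoted in this post and a hypothetical cluster size of 5,000 (a number assumed here, not taken from the question). For a living person,
    \[
    f = \frac{300}{100{,}000} = 0.003, \qquad W = \frac{1}{0.003} \approx 333.3
    \]
    and for a dead person in that cluster,
    \[
    f = \frac{5{,}000}{100{,}000} = 0.05, \qquad W = \frac{1}{0.05} = 20
    \]
    Each sampled living person would then stand for roughly 333 people, and each sampled death for 20.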
    Steve Samuels
    Statistical Consulting

    Stata 14.2



    • #3

      My data (recurrent exacerbations of chronic obstructive pulmonary disease) are attached. Which frailty model (shared, joint, ...) should I use, and how can I get Stata code for it? Please help.
      thanks..

      time :COPD exacerbation recurrent time
      status :failure/



      • #4
        This is a forum for making test posts. Ask on the General Forum; before you do, read the FAQs and follow the directions in FAQ 12.
        Steve Samuels
        Statistical Consulting

        Stata 14.2



        • #5
          Correction: I now understand where the "10" in your equation came from. For estimating totals, the correct factor to use in the first stage is indeed:

          \[
          f_{1j} = n \frac{N_j}{N}
          \]
          or with n = 10
          \[
          f_{1j} = 10 \frac{N_j}{N}
          \]

          This is the probability that cluster j will be selected by a proper PPS sampling algorithm. If, however, \(\frac{N_j}{N}>1/10\) for any cluster, you would have been forced to treat that cluster as a certainty unit and restart the algorithm with the reduced n.

          One thing unclear from your description is whether deaths were included in the population \(N_j\) and \(N\) totals. If so, then for living people

          \[
          f_{2j} = \frac{300}{N_j - D_j}
          \]
          where \(D_j\) is the number of deaths in cluster j.

          Then form the design weight

          \[
          W_j = (1/f_{1j})(1/f_{2j})
          \]
          as before.
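
          The corrected weights can be sketched in Stata as follows. This is only a minimal illustration under stated assumptions: the variable names (cluster, Nj, Dj, dead), the four data rows, and the cluster sizes and death counts are all hypothetical; N = 100,000 is the population size quoted earlier in this thread; and deaths are assumed to be included in the \(N_j\) and \(N\) totals.

          ```stata
          * Hypothetical data: one row per sampled person
          * Nj = cluster size, Dj = deaths in cluster, dead = 1 if the person died
          clear
          input cluster Nj Dj dead
          1 5000 40 0
          1 5000 40 1
          2 8000 65 0
          2 8000 65 1
          end

          * Population size quoted in the thread and number of PPS clusters
          local N = 100000
          local n = 10

          * First-stage selection probability: n * N_j / N
          gen double f1 = `n' * Nj / `N'
          * A value above 1 would mark a certainty unit and require restarting PPS
          assert f1 <= 1

          * Second stage: all deaths taken; 300 of the N_j - D_j living sampled
          gen double f2 = cond(dead == 1, 1, 300 / (Nj - Dj))

          * Design weight W_j = (1/f1)(1/f2)
          gen double w = 1 / (f1 * f2)
          list cluster dead f1 f2 w

          * Declare the design: clusters as PSUs, w as the probability weight
          svyset cluster [pweight = w]
          ```

          Locals are used for the two constants rather than scalars named N and n, because in an expression Stata would read a bare N as an abbreviation of the variable Nj.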

          The design weights as I specified them would be okay for estimating statistics other than totals. I apologize for the error.
          Last edited by Steve Samuels; 19 Jul 2016, 13:42.
          Steve Samuels
          Statistical Consulting

          Stata 14.2

