
  • #16
    \[
    \Delta y_{i,t} = \alpha + \beta \times \Delta X_{i,t} + \epsilon_{i,t}
    \]
    // ivica_rubil //



    • #17
      $$
      \text{
      \def\sym#1{\ifmmode^{#1}\else\(^{#1}\)\fi}
      \begin{tabular}{l*{5}{c}}
      &\multicolumn{5}{c}{} \\
      & age& grade& black& union& south\\
      Age & 1.000& & & & \\
      Grade & 0.114& 1.000& & & \\
      Black & -0.016& -0.167& 1.000& & \\
      Union & 0.032& 0.069& 0.096& 1.000& \\
      south & 0.030& -0.115& 0.277& -0.135& 1.000\\
      \end{tabular} }
      $$
      Last edited by Konrad Zdeb; 20 Sep 2014, 04:50.
      Kind regards,
      Konrad
      Version: Stata/IC 13.1



      • #18

        Joseph, the paper itself was confined to 2 x 2 tables and says nothing about other dimensions. The simulation was confined to generating tables for 100 variables from each simulated configuration, so that the true kappa was identical for all variables. That said, with weights for i categories, kappa in Stata generates weighted observed and expected proportions. So there's no practical barrier to estimating the pooled kappa.


        In the i-th table, with

        $$
        \kappa_i= \frac{ p_{o,i}-p_{e,i}}{1-p_{e,i}}
        $$
        The pooled summary
        $$
        \kappa_p= \frac{ \sum (p_{o,i}-p_{e,i})}{\sum (1-p_{e,i})}
        $$
        can be viewed as a weighted average of the individual \(\kappa_i\):

        $$
        \kappa_p= \sum w_i \kappa_i
        $$

        where each weight \(w_i\) is proportional to \(1-p_{e,i}\) and the weights sum to 1. Thus variables with the least chance agreement are weighted most heavily.
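        As a small illustration with made-up numbers (not taken from the paper), suppose there are two tables with \(p_{o,1}=0.80,\ p_{e,1}=0.50\) and \(p_{o,2}=0.90,\ p_{e,2}=0.80\). Writing the weights explicitly as \(w_i = (1-p_{e,i})/\sum_j (1-p_{e,j})\), the two routes to \(\kappa_p\) agree:

        $$
        \begin{aligned}
        \kappa_1 &= \frac{0.80-0.50}{1-0.50} = 0.60, \qquad \kappa_2 = \frac{0.90-0.80}{1-0.80} = 0.50 \\
        \kappa_p &= \frac{0.30+0.10}{0.50+0.20} = \frac{0.40}{0.70} \approx 0.571 \\
        w_1 &= \frac{0.50}{0.70} \approx 0.714, \qquad w_2 = \frac{0.20}{0.70} \approx 0.286 \\
        \sum_i w_i \kappa_i &\approx 0.714(0.60) + 0.286(0.50) \approx 0.571
        \end{aligned}
        $$
        In this example the first table, which has less chance agreement, dominates the pooled value.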
        Steve Samuels
        Statistical Consulting
        [email protected]

        Stata 14.2



        • #19
          You are asked in the FAQ to state where you got a contributed program. stlh, by David Hosmer and Patrick Royston, can be located by findit and installed by clicking on the "st0024" link. Be sure to read the accompanying Stata Journal article that can be downloaded from http://ageconsearch.umn.edu/bitstrea...art_st0024.pdf.

          The model most analogous to a stratified Cox model is one in which there is a non-parametric baseline hazard function in each cohort-gender category and a linear model with constant coefficients for the risk factors.

          In a Cox model with stratification, there is a different baseline hazard function in each stratum. Call it \(h_i(t)\). With covariates, the model for the log hazard in stratum i can be written:

          \[
          \text{log}(h_i(t, x)) = \text{log}(h_i(t)) + \beta_1 x_1 + \beta_2 x_2 + \dots +\beta_p x_p
          \]

          For comparison, the general form of Aalen's linear hazard model is:

          $$
          h(t, x) = \beta_0(t) + \beta_1(t)x_1 + \beta_2(t)x_2 + \dots + \beta_p(t)x_p
          $$

          Note that every coefficient is a function of t. This makes it easy to study possible non-proportionality of hazard ratios.

          Define \(Z_i(t)\) to be a 0-1 indicator variable for stratum i. Then the model with a separate baseline hazard in each stratum can be written:

          $$
          h(t, x) = \beta_0(t) + \sum_{i=2}^I \alpha_i(t) Z_i + \beta_1(t)x_1 + \beta_2(t)x_2 + \dots + \beta_p(t)x_p
          $$

          The baseline hazard for stratum \(i\) is \(h_i(t)= \beta_0(t) + \alpha_i(t)\).

          Here is suggested Stata code to fit this model. Add the options you require. I use the old "xi" prefix because stlh was written before Stata introduced factor variables.

          Code:
          egen stratum = group(sex cohort)
          xi: stlh i.stratum x1 x2 x3
          or
          Code:
          xi: stlh i.cohort*i.sex x1 x2 x3
          stlh has limits; for instance, you cannot constrain some coefficients to be constant; nor can you model data in which hazard ratios are proportional over part of a time range and non-proportional elsewhere. To fit such models you need the timereg package in R (Martinussen and Scheike, 2002, 2006; Scheike, 2014).

          References:

          Hosmer, David, and Patrick Royston. 2002. Using Aalen’s linear hazards model to investigate time-varying effects in the proportional hazards regression model. The Stata Journal 2, 331-350

          Martinussen, T., & Scheike, T. H. (2002). A flexible additive multiplicative hazard model. Biometrika, 89(2), 283-298.

          Martinussen T, Scheike T. (2006) Dynamic regression models for survival data. New York: Springer.

          T Scheike (2014) http://cran.r-project.org/web/packag...eg/timereg.pdf
          Last edited by Steve Samuels; 21 Oct 2014, 15:36.
          Steve Samuels
          Statistical Consulting
          [email protected]

          Stata 14.2



          • #20
            \[
            \begin{aligned}
            h_i(t) &= \beta_0(t), & i &= 1 \\
                   &= \beta_0(t) + \alpha_i(t), & i &> 1
            \end{aligned}
            \]
            Last edited by Steve Samuels; 21 Oct 2014, 16:29.
            Steve Samuels
            Statistical Consulting
            [email protected]

            Stata 14.2



            • #21
              \[
              \bar{R}^{2}=R^{2}-(1-R^{2})\frac{p}{n-p-1}
              \]



              • #22
                According to the manual, stcrreg uses the Breslow method for ties. It looks like you are analyzing time to stillbirth (stillbirth=1) so that observations are censored if stillbirth \(\neq 1\)
                Steve Samuels
                Statistical Consulting
                [email protected]

                Stata 14.2



                • #23
                  Originally posted by Aspen Chen
                  \bar{R}^{2}=R^{2}-(1-R^{2})\frac{p}{n-p-1}
                  test
                  Kind regards,
                  Konrad
                  Version: Stata/IC 13.1



                  • #24
                    $$
                    \left\{\begin{array}{l l} 0 & \text{if }m = 0\\ \frac{1}{3}\left(\frac{m}{|s_1|} + \frac{m}{|s_2|} + \frac{m-t}{m}\right) & \text{otherwise} \end{array} \right.
                    $$
                    Kind regards,
                    Konrad
                    Version: Stata/IC 13.1



                    • #25
                      There is no way to stset your data to do what you want, and for a good reason: the denominator you want to create is not person-years. It is instead the total of potential observation time, a very different thing. A "rate" calculated this way does not estimate any well-defined parameter of a survival distribution.

                      Suppose that potential age-at-end of observation for ID 3 in your example had been 100 instead of 80. Your computed rate would change from 6/20 to 6/40. But the "real" data hadn't changed. Or, suppose that you didn't know the maximum date of observation when you analyzed the data (it was missing or wrong from the data set). According to your notion, you could not estimate "person-years".

                      Now consider single-record data with an exponential distribution, in which the hazard function \(\lambda(t)\) is a constant \(\lambda\). Suppose \(D\) is the number of observed failures and \(t_i\) is the recorded failure time or censoring time, in years, for observation \(i\). Then \(T = \sum_i t_i\) is the total of person-years that Stata displays in stsum.

                      It is proved in every survival book that the maximum likelihood estimate of \(\lambda\) is:

                      \[
                      \hat{\lambda} = \frac{D}{T}
                      \]
                      Notice that potential observation time after \(t_i\) has no effect on the estimate.
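
                      For readers who want the step behind this result, here is a sketch under the exponential assumption stated above. The indicator \(d_i\) is notation introduced only for this illustration: it equals 1 if observation \(i\) failed and 0 if it was censored, so that \(D = \sum_i d_i\). Each observation contributes \(\lambda^{d_i} e^{-\lambda t_i}\) to the likelihood, which gives:

                      \[
                      \begin{aligned}
                      \ell(\lambda) &= \sum_i \left( d_i \log\lambda - \lambda t_i \right) = D\log\lambda - \lambda T \\
                      \frac{d\ell}{d\lambda} &= \frac{D}{\lambda} - T = 0 \quad\Longrightarrow\quad \hat{\lambda} = \frac{D}{T}
                      \end{aligned}
                      \]
                      For instance, with hypothetical follow-up times of 2, 5, and 3 years and failures for the first and third observations, \(D = 2\), \(T = 10\), and \(\hat{\lambda} = 0.2\) failures per person-year. Only the recorded \(t_i\) enter the calculation.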
                      Steve Samuels
                      Statistical Consulting
                      [email protected]

                      Stata 14.2



                      • #26


                        The method of random groups was originated by Mahalanobis and appears under many names: "interpenetrating samples", "ultimate clusters", "replicated samples", and "random groups". The method is discussed in many sampling texts (see the references). I give the basic idea below. For a comprehensive exposition, consult Chapter 2 of Wolter, 2007.

                        Suppose the goal is to estimate a parameter \(\theta\). The method creates k subsamples, each of which is a random sample of the original population. In sample \(\alpha\), the estimate of \(\theta\) is \(\widehat{\theta}_\alpha\). The overall estimate of \(\theta\) is the mean of the k subsample estimates:

                        \[
                        \widehat{\overline{\theta}} = \sum_{\alpha=1}^k \widehat{\theta}_\alpha/k
                        \]

                        The advantage of this is a greatly simplified estimate of variance, even for complicated designs:
                        \[
                        v(\widehat{\overline{\theta}}) = \frac{\sum_{\alpha=1}^k (\widehat{\overline{\theta}}-\widehat{\theta}_\alpha)^2}{k(k-1)}
                        \]

                        This is just the elementary estimate of the variance of a mean, applied to the \(\widehat{\theta}_\alpha\). There is another advantage to the random groups approach: a look at the \(\widehat{\theta}_\alpha\) gives a better idea of variability than display of the confidence interval alone.
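
                        A worked example with invented numbers may help. Suppose \(k = 4\) random groups give the hypothetical estimates 10, 12, 11, and 15. Using squared deviations from the mean, as in the elementary variance of a mean:

                        \[
                        \begin{aligned}
                        \widehat{\overline{\theta}} &= \frac{10+12+11+15}{4} = 12 \\
                        v(\widehat{\overline{\theta}}) &= \frac{(-2)^2 + 0^2 + (-1)^2 + 3^2}{4 \times 3} = \frac{14}{12} \approx 1.167 \\
                        \text{se}(\widehat{\overline{\theta}}) &= \sqrt{1.167} \approx 1.08
                        \end{aligned}
                        \]
                        Listing the four estimates alongside the standard error shows at a glance that one group (15) sits well above the others.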

                        How random groups are created

                        1. Draw the original sample according to the complex multistage design, then group the primary sampling units (enumeration units) into subsamples which are themselves stratified. This is used in the Nigerian Household Survey and, apparently, in many other IPUM surveys.

                        2. Draw the random groups as independent systematic samples. This is the method popularized by Deming (1960) and I've used it myself. It's the only way to get an unbiased estimate of variance with systematic sampling.

                        These approaches are very flexible. For example, certainty units are analyzed by placing them in every replicate.


                        Analysis choices when the data consist of random groups

                        1. Implement the random group formulas with statsby: estimate parameters for each group; save them; use summarize to get means and standard errors. This approach works well if the main goal is to estimate descriptive parameters. For regression models, statsby is less attractive, because post-estimation commands like margins and predict are not available.

                        2. svyset with replicates as PSUs; then apply Stata's survey commands. This is the approach I suggest above.
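
                        The two analysis choices can be sketched in Stata roughly as follows. This is only an illustration under assumptions: the variable names y (outcome), rgroup (random-group identifier), and wt (sampling weight) are hypothetical, and a real svyset statement should also reflect any strata in the design.

                        Code:
                        * Choice 1: one estimate per random group, then summarize the k estimates
                        preserve
                        statsby theta=r(mean), by(rgroup) clear: summarize y [aweight=wt]
                        * the standard error reported by -mean- is sd/sqrt(k), as in the random groups formula
                        mean theta
                        restore

                        * Choice 2: declare the random groups as PSUs and use the survey commands
                        svyset rgroup [pweight=wt]
                        svy: mean y
                        The two point estimates need not coincide exactly: the first is an unweighted mean of the group estimates, while the second pools the weighted observations across groups.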

                        References:

                        Deming WE, 1960, Sample Design in Business Research, New York: Wiley

                        Hansen, MH, WN Hurwitz, and W Madow. 1953. Sample Survey Methods and Theory. Volume I Methods and Applications. New York: Wiley.

                        Heeringa, Steven, Brady T. West, and Patricia A. Berglund. 2010. Applied survey data analysis. Boca Raton, FL: Chapman & Hall/CRC.

                        Kish, Leslie, 1965 Survey Sampling, New York: Wiley

                        Wolter, Kirk M. 2007. Introduction to variance estimation. New York: Springer.
                        Last edited by Steve Samuels; 17 Feb 2016, 08:21.
                        Steve Samuels
                        Statistical Consulting
                        [email protected]

                        Stata 14.2



                        • #27
                          Neither of your scores is correct. The risk score you want is the linear predictor:
                          Code:
                          predict riskscore, xb
                          Let me explain:
                          In a Cox model, two individuals have the same "risk" if, at every time point \(t\), they have the same hazard function. Suppose the vectors of risk predictors for the individuals are \(x_1\) and \(x_2\).

                          \[
                          h_1(t) = \text{exp}(x_1 \beta)h_0(t) = h_2(t) = \text{exp}(x_2 \beta)h_0(t)
                          \]

                          Here \(h_0(t)\) is the baseline hazard function. In other words,

                          \[
                          x_1\beta = x_2\beta
                          \]

                          This is equivalent to saying that their predicted survival curves at the same time \(t\), \(S_1(t)\) and \(S_2(t)\), are also equal.

                          \[
                          S_1(t) = S_0(t)^{\text{exp}(x_1 \beta)} = S_0(t)^{\text{exp}(x_2 \beta)}
                          \]
                          So, if you know \(S_0(t)\) you can compare the survival curves or the failure curves \(F(t) =1 -S(t)\) at the same point.

                          However, your "risk score" does not reproduce \(S_1(t)\) or \(S_2(t)\).

                          For individual \(i\), your score is:

                          \[
                          \hat{S}_i(t_i)= \hat{S}_0(t_i)^{\text{exp}(x_i \hat{\beta})}
                          \]
                          evaluated at \(t_i\), which is the observed time of failure or censoring for individual \(i\). This time by itself conveys nothing about risk.

                          Your code for creating a risk score at 3 years is not correct, as you are averaging the estimates of \(S_0(t)\) at \(t = 3\). But because \(S(t)\) decreases, the average will be greater than \(S_0(3)\).
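
                          One way to get a predicted risk that is comparable across individuals at a fixed time is sketched below. It is a sketch under assumptions, not a correction of the original code: x1 x2 x3 are placeholder covariate names, the data are assumed to be stset with time in years, and lp is a new name for the same linear predictor as riskscore. Because the estimated baseline survivor function is a non-increasing step function, its value at \(t = 3\) is taken here as the smallest baseline value among observations with _t <= 3.

                          Code:
                          stcox x1 x2 x3
                          * linear predictor x*b (the same quantity as riskscore)
                          predict double lp, xb
                          * estimated baseline survivor function at each observation's _t
                          predict double s0, basesurv
                          * baseline survival at 3 years
                          summarize s0 if _t <= 3, meanonly
                          scalar S0_3 = r(min)
                          * predicted survival and failure probability at 3 years for every individual
                          generate double surv3 = scalar(S0_3)^exp(lp)
                          generate double risk3 = 1 - surv3
                          Because surv3 is a decreasing function of lp, ranking individuals by risk3 gives the same order as ranking them by the linear predictor.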



                          Last edited by Steve Samuels; 13 Mar 2016, 10:00.
                          Steve Samuels
                          Statistical Consulting
                          [email protected]

                          Stata 14.2
