  • Calculate Annual Percentage Change using joinplot/nl hockey

    Hello,
    I have data on years and corresponding cancer cases.

    year cases
    2010 7.07143
    2011 7.09581
    2012 7.12348
    2013 7.15027
    2014 7.17207
    2015 7.1936
    2016 7.21309
    2017 7.23115
    2018 7.24734
    2019 7.26185
    2020 7.27493
    2021 7.28676
    2022 7.29735
    2023 7.30616
    2024 7.3131
    2025 7.31808
    2026 7.32111
    2027 7.32244
    2028 7.32209
    2029 7.31986
    2030 7.3155
    2031 7.30892
    2032 7.30047
    2033 7.29041
    2034 7.27849
    2035 7.26443
    2036 7.24815
    2037 7.22982
    2038 7.20979
    2039 7.18799
    2040 7.16415
    2041 7.13816
    2042 7.11005
    2043 7.07997
    2044 7.04794
    2045 7.01377
    2046 6.97755
    2047 6.93938
    2048 6.89907
    2049 6.85672
    2050 6.81231

    I want to calculate the annual percentage change (APC) over the 40 years. The recommended way to do that is to identify joints, or segments of varying slopes, and then use the APC formula APCi = { exp(bi) - 1 } x 100, where bi is the slope coefficient for the ith segment, with i indexing the segments in the desired range of years.
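For a single segment, the APC formula can be sketched in Stata as below. This is only an illustration under assumptions: it uses the 2010 to 2020 rows of the posted data as one segment, and it takes logs of cases before fitting a straight line. If the cases column is already on a log scale (the post does not say), the ln() step should be skipped and cases regressed directly.

```stata
clear
input year cases
2010 7.07143
2011 7.09581
2012 7.12348
2013 7.15027
2014 7.17207
2015 7.1936
2016 7.21309
2017 7.23115
2018 7.24734
2019 7.26185
2020 7.27493
end

* Log-linear trend within one segment (assumption: cases is not yet logged)
generate lncases = ln(cases)
regress lncases year

* APC = {exp(b) - 1} x 100, with a delta-method standard error
nlcom (apc: 100*(exp(_b[year]) - 1))
```

nlcom is used instead of a plain display so that the APC comes with a confidence interval.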

    #My first question is what command can I use to identify the number of segments in the data? That way I can use piecewise linear regression.

    I have read in other posts about the nl hockey program. It identifies 2 segments only. I have tried using it. The result looks like this:

    Code:
    . nl hockey cases year
    (obs = 41)
    
    Iteration 0:   residual SS =  1.19e+10
    Iteration 1:   residual SS =  .0401542
    Iteration 2:   residual SS =  .0333771
    Iteration 3:   residual SS =  .0330889
    Iteration 4:   residual SS =  .0330671
    
          Source |       SS       df       MS            Number of obs =        41
    -------------+------------------------------         F(  3,    37) =    275.09
           Model |  .737541997     3  .245847332         Prob > F      =    0.0000
        Residual |  .033067058    37  .000893704         R-squared     =    0.9571
    -------------+------------------------------         Adj R-squared =    0.9536
           Total |  .770609056    40  .019265226         Root MSE      =  .0298949
                                                         Res. dev.     = -175.6814
    (hockey)
    ------------------------------------------------------------------------------
           cases | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
      breakpoint |   2030.589   .4860736  4177.53   0.000     2029.604    2031.574
         slope_l |   .0123611   .0010773    11.47   0.000     .0101782     .014544
         slope_r |  -.0260939   .0011593   -22.51   0.000    -.0284428   -.0237449
            cons |  -17.72384    2.17623    -8.14   0.000     -22.1333   -13.31438
    ------------------------------------------------------------------------------
    * Parameter cons taken as constant term in model & ANOVA table
     (SEs, P values, CIs, and correlations are asymptotic approximations)
    Code:
    . predict cases_hat
    (option yhat assumed; fitted values)
    . graph twoway scatter cases year || line cases_hat year

    [Graph: jointplot.png, scatter of the data against year with the fitted values cases_hat as a line]

    #I now want to use the coefficients slope_l and slope_r to calculate the APC for the entire range. So the formula would be {exp((years in segment 1 x slope 1 + years in segment 2 x slope 2) / total years) - 1} x 100. How can I implement that? I am not sure how to save the slope coefficients and calculate the number of years in each segment without manually looking them up in the results and graphs.
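The length-weighted formula above can be sketched with Stata scalars. The numbers are copied from the nl hockey output in this post; the scalar names (brk, b_left, b_right and so on) are made up for the illustration. The formula only gives a percentage change if the dependent variable is on a log scale, which is an assumption here.

```stata
* Estimates copied from the nl hockey output above
* brk = breakpoint, b_left = slope_l, b_right = slope_r
scalar brk     = 2030.589
scalar b_left  = .0123611
scalar b_right = -.0260939

* First and last year in the data
scalar yr_first = 2010
scalar yr_last  = 2050

* Number of years spent in each segment
scalar w_left  = brk - yr_first
scalar w_right = yr_last - brk

* Segment-specific APCs
display "APC, left segment:  " %6.3f 100*(exp(b_left) - 1)
display "APC, right segment: " %6.3f 100*(exp(b_right) - 1)

* Average APC: slopes weighted by segment length, then exponentiated
scalar aapc = 100*(exp((w_left*b_left + w_right*b_right)/(w_left + w_right)) - 1)
display "Average APC:        " %6.3f aapc
```

To avoid copying numbers by hand, the first and last year are available from summarize year as r(min) and r(max). The estimates can probably be read after nl with matrix b = e(b) and then b[1,1], b[1,2], b[1,3], assuming e(b) keeps the order of the output table; matrix list e(b) shows what is actually stored.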

    Thanks
    Josna

  • #2
    The first thing you should always do is look at the data. The first thing to notice is that this is not observed or empirical data (2050 hasn't happened yet). So we are dealing with a projection, which means that there is already an underlying model. Ideally, you already have that model and its coefficients, and you can derive the growth rate from them. If that is not possible, then you can plot the projection:

    Code:
    clear
    input year cases
    2010 7.07143
    2011 7.09581
    2012 7.12348
    2013 7.15027
    2014 7.17207
    2015 7.1936
    2016 7.21309
    2017 7.23115
    2018 7.24734
    2019 7.26185
    2020 7.27493
    2021 7.28676
    2022 7.29735
    2023 7.30616
    2024 7.3131
    2025 7.31808
    2026 7.32111
    2027 7.32244
    2028 7.32209
    2029 7.31986
    2030 7.3155
    2031 7.30892
    2032 7.30047
    2033 7.29041
    2034 7.27849
    2035 7.26443
    2036 7.24815
    2037 7.22982
    2038 7.20979
    2039 7.18799
    2040 7.16415
    2041 7.13816
    2042 7.11005
    2043 7.07997
    2044 7.04794
    2045 7.01377
    2046 6.97755
    2047 6.93938
    2048 6.89907
    2049 6.85672
    2050 6.81231
    end
    
    twoway line cases year
    [Graph: line plot of cases against year]

    This screams polynomial (probably quadratic?) to me (and makes me extremely suspicious about the validity of that projection, but that is another story). So we can try to recover that model with its parameters.

    Code:
    //with quadratics it is often easier to first center the variable
    gen yearc = year - 2030
    
    poisson cases c.yearc##c.yearc, vce(robust)
    predict mu1
    twoway scatter cases year ||       ///
           line mu1 year,              ///
           lpattern(solid) legend(off)
    [Graph: scatter of cases against year with the quadratic fit mu1 as a line]

    Close, but not quite. It is, however, close enough to suspect that this projection is indeed based on a polynomial. So let's add a cubic term.

    Code:
    poisson cases c.yearc##c.yearc##c.yearc, vce(robust)
    predict mu2
    twoway scatter cases year ||       ///
           line mu2 year,              ///
           lpattern(solid) legend(off)
    [Graph: scatter of cases against year with the cubic fit mu2 as a line]

    I think we found our model. Now the growth rate. (Again, I am really really really suspicious about the validity of this projection.) The formula for the growth rate you gave does not hold in general. In general, you want the first derivative with respect to year. For a polynomial that is not too hard to work out by hand, but you can also just use margins:

    Code:
    margins, dydx(yearc) over(year)
    marginsplot, plotopts(msymbol(i)) ///
        recastci(rarea) ciopts(astyle(ci))
    [Graph: marginal effect of yearc by year, with confidence band]
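If the growth rate is wanted as a relative change rather than an absolute one, one possible variation is the eydx() option of margins. For a Poisson model with a cubic in yearc, the derivative of the log of the expected value with respect to yearc is b1 + 2*b2*yearc + 3*b3*yearc^2, and eydx() reports that quantity. The sketch below is an illustration only: to keep it short it refits the cubic on the posted values at five-year intervals, so the estimates will differ somewhat from the fit on all 41 rows.

```stata
clear
input year cases
2010 7.07143
2015 7.1936
2020 7.27493
2025 7.31808
2030 7.3155
2035 7.26443
2040 7.16415
2045 7.01377
2050 6.81231
end

* Same centering and cubic specification as in the post
generate yearc = year - 2030
poisson cases c.yearc##c.yearc##c.yearc, vce(robust)

* eydx() gives d ln(E[cases]) / d yearc: the relative change per year
margins, eydx(yearc) over(year)
```

Multiplying a value g from this table by 100 gives an approximate annual percentage change; 100*(exp(g) - 1) is the corresponding discrete version, in the spirit of the APC formula from the original question.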




    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      Thanks Maarten for such a detailed response.
      You're right. It is a projection of cancer cases based on age-specific rates and a population projection. Since the age-specific rates are assumed to be constant, the case projection is driven largely by the projected population growth. The population size decreases after 2030. I take your concern on board and will double-check my calculations.
      The formula for the growth rate I shared is generally used for age-standardized incidence rates. I wanted to try it out with the number of cases, as I need to calculate the annual percentage change for this.
      Thanks again for your help!
