  • Yearly Cumulative Distribution

    Dear Statalist, I want to construct a cumulative distribution column. I tried the two commands below, but the results are not correct.

    Code:
    . bys year: gen cummat = _n * pcmat
    
    . bys year: egen cummat1 = max(cummat)
    
    . 
    . tabdisp year, c(pcmat cummat cummat1)
    
    ----------------------------------------------
         year |      pcmat      cummat     cummat1
    ----------+-----------------------------------
         2003 |   .0781001    .0781001    2.343003
         2004 |   .1447923    .1447923    7.384408
         2005 |   .1610266    .1610266     7.08517
         2006 |   .3227845    .3227845    24.53162
         2007 |   .5714908    .5714908    60.57803
         2008 |   1.051938    1.051938    239.8419
         2009 |   1.352345    1.352345    408.4083
         2010 |   2.676464    2.676464     1536.29
         2011 |   2.998298    2.998298    1757.002
         2012 |   5.357974    5.357974    5550.861
         2013 |   5.217203    5.217203     6417.16
         2014 |   5.526898    5.526898    7997.421
         2015 |   8.223252    8.223252    17227.71
         2016 |   13.52689    13.52689    37550.65
         2017 |   10.69372    10.69372    35267.88
         2018 |   12.48629    12.48629    47348.01
         2019 |   13.48302    13.48302     53743.3
         2020 |   16.12752    16.12752    77412.11
    ----------------------------------------------

    Below is my sample data:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float pcmat
    .0781001
    .1447923
    .1610266
    .3227845
    .5714908
    1.051938
    1.352345
    2.676464
    2.998298
    5.357974
    5.217203
    5.526898
    8.223252
    13.52689
    10.69372
    12.48629
    13.48302
    16.12752 
    end

  • #2
    Your example data are almost sorted, so a cumulative distribution function can be calculated either directly, after sorting first, or with the dedicated cumul command.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float pcmat
    .0781001
    .1447923
    .1610266
    .3227845
    .5714908
    1.051938
    1.352345
    2.676464
    2.998298
    5.357974
    5.217203
    5.526898
    8.223252
    13.52689
    10.69372
    12.48629
    13.48302
    16.12752 
    end
    
    sort pcmat 
    gen wanted = _n/_N 
    cumul pcmat, gen(WANTED)
    
    assert wanted == WANTED 
    
    list 
         +--------------------------------+
         |    pcmat     wanted     WANTED |
         |--------------------------------|
      1. | .0781001   .0555556   .0555556 |
      2. | .1447923   .1111111   .1111111 |
      3. | .1610266   .1666667   .1666667 |
      4. | .3227845   .2222222   .2222222 |
      5. | .5714908   .2777778   .2777778 |
         |--------------------------------|
      6. | 1.051938   .3333333   .3333333 |
      7. | 1.352345   .3888889   .3888889 |
      8. | 2.676464   .4444444   .4444444 |
      9. | 2.998298         .5         .5 |
     10. | 5.217203   .5555556   .5555556 |
         |--------------------------------|
     11. | 5.357974   .6111111   .6111111 |
     12. | 5.526898   .6666667   .6666667 |
     13. | 8.223252   .7222222   .7222222 |
     14. | 10.69372   .7777778   .7777778 |
     15. | 12.48629   .8333333   .8333333 |
         |--------------------------------|
     16. | 13.48302   .8888889   .8888889 |
     17. | 13.52689   .9444444   .9444444 |
     18. | 16.12752          1          1 |
         +--------------------------------+
    
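    To check the sort-then-`_n/_N` logic outside Stata, here is a minimal Python sketch (not Stata, and not part of the original reply) using the 18 `pcmat` values from the -dataex- example. The function names `ecdf_by_rank` and `ecdf_at_most` are hypothetical. The second function computes the share of observations less than or equal to each value, which coincides with rank/N only when all values are distinct, as they are here.

```python
# Cross-check of "sort pcmat" + "gen wanted = _n/_N" outside Stata.
# The 18 values are copied from the -dataex- example in this thread.
pcmat = [
    0.0781001, 0.1447923, 0.1610266, 0.3227845, 0.5714908, 1.051938,
    1.352345, 2.676464, 2.998298, 5.357974, 5.217203, 5.526898,
    8.223252, 13.52689, 10.69372, 12.48629, 13.48302, 16.12752,
]


def ecdf_by_rank(values):
    """Sort, then pair each value with its rank divided by N (like _n/_N)."""
    n = len(values)
    return [(v, (i + 1) / n) for i, v in enumerate(sorted(values))]


def ecdf_at_most(values):
    """Pair each sorted value with the share of observations <= that value.

    Tied values get the same share here, whereas ecdf_by_rank gives them
    different ranks; the two agree whenever all values are distinct.
    """
    n = len(values)
    s = sorted(values)
    return [(v, sum(1 for w in s if w <= v) / n) for v in s]


for value, share in ecdf_by_rank(pcmat):
    print(f"{value:10.7g}  {share:.7f}")
```

    The printed shares should line up with the `wanted` column in the listing above, for example 0.5 at 2.998298 and 1 at 16.12752.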
    Alternatively, the cumulative total over time is just

    Code:
    sort year 
    gen cum_total = sum(pcmat)
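    A minimal Python sketch (not Stata) of what `sort year` followed by `gen cum_total = sum(pcmat)` produces. The -dataex- example has no `year` variable, so this assumes one observation per year, pairing 2003-2020 with the `pcmat` values in the order of the tabdisp output; the names `years`, `rows` and `cum_total` are illustrative only.

```python
from itertools import accumulate

# Assumption: one observation per year, with years 2003-2020 paired with the
# pcmat values in the order of the tabdisp output (the -dataex- example
# itself has no year variable).
years = list(range(2003, 2021))
pcmat = [
    0.0781001, 0.1447923, 0.1610266, 0.3227845, 0.5714908, 1.051938,
    1.352345, 2.676464, 2.998298, 5.357974, 5.217203, 5.526898,
    8.223252, 13.52689, 10.69372, 12.48629, 13.48302, 16.12752,
]

# Mirror "sort year" then "gen cum_total = sum(pcmat)": each entry is the
# running total of pcmat up to and including that year.
rows = sorted(zip(years, pcmat))
cum_total = list(accumulate(p for _, p in rows))

for (year, p), total in zip(rows, cum_total):
    print(year, p, round(total, 6))
```

    Stata's sum() treats missing values as zero; this sketch assumes there are no missing values.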
    If you want something else, please give a definition.



    • #3
      Thanks Nick, the cumul command worked perfectly.
