  • Calculating summary median (or other quantiles) for panel data

    Hi all,
    I have unbalanced panel data and wish to calculate p50, p25 & p75 summary measures at each time point in the panel.
    The data thin out with increasing time, and the last time points have only one or two subjects still remaining.
    Using egen & pctile to calculate summary measures at each time t produces quantile values even when only one or two subjects remain in the panel.

    I have replicated this below

    I presume my code is incorrect and would appreciate any corrective input

    many thanks in advance
    Richard Hiscock

    clear
    input id t bob
    1 0 6
    1 1 7
    1 2 10
    1 3 7
    2 0 10
    2 1 6
    2 2 8
    2 3 9
    2 4 4
    2 5 4
    2 6 3
    2 7 4
    3 0 16
    3 1 6
    3 2 10
    3 3 8
    3 4 7
    3 5 6
    4 0 6
    4 1 7
    5 0 6
    5 1 10
    5 2 4
    end

    xtset id t
    xtline bob , overlay xlab(0(1)7)
    sort id t
    egen p50 = median(bob), by(t)
    egen p25 = pctile(bob), by(t) p(25)

    list id t bob p50 p25, sepby(id)





  • #2
    There is nothing wrong with your code, and there is nothing wrong with these results.



    • #3
      For further background on how percentiles are calculated by Stata, a good source is the Methods and formulas section of the documentation for the summarize command in the Stata Base Reference Manual PDF included in your Stata installation.

      For a reasonably thorough and approachable discussion of the calculation of percentiles in general, I recommend the Wikipedia article Percentile.
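
      As a hedged illustration of the default rule described in that Methods and formulas section (unweighted case): sort the n values in ascending order, let P = n*p/100, and find the first rank i with i > P. The pth percentile is x(i), unless P equals i-1 exactly, in which case it is the average of x(i-1) and x(i). With a single observation and p = 25, P = 0.25, so i = 1 and the 25th percentile is that observation itself. The sketch below uses three values copied from the example data in post #1 (t = 5 and t = 7) rather than the full dataset.

      ```stata
      * Three observations copied from post #1: two subjects at t == 5, one at t == 7
      clear
      input t bob
      5 4
      5 6
      7 4
      end

      * Two observations: P = 0.5, 1, 1.5 for p = 25, 50, 75
      quietly summarize bob if t == 5, detail
      display "t=5: p25 = " r(p25) "  p50 = " r(p50) "  p75 = " r(p75)

      * One observation: every percentile is that single value
      quietly summarize bob if t == 7, detail
      display "t=7: p25 = " r(p25) "  p50 = " r(p50) "  p75 = " r(p75)
      ```

      Under that rule, the t == 5 group (values 4 and 6) gives 4, 5 and 6 for p25, p50 and p75, and the single value at t == 7 gives 4 for all three.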



      • #4
        Clyde

        Thanks for the quick feedback
        I am still confused, at the risk of being declared very foolish.

        My understanding was that the 25th percentile is the observation for which 25% of the observations, ranked in ascending order, lie below and 75% lie above (or the average of adjacent values should no such observation exist).

        In my data, id 2 at t = 7 is the only observation, yet Stata computes p25 as equal to this value. I'm unclear how this is calculated. Shouldn't the 25th percentile be undefined if there are <= 2 observations?

        any clarification would be very helpful

        Cheers Richard Hiscock
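
        If quantiles based on very few subjects are unwanted, one possible approach is to count the nonmissing values at each time and set the summary to missing below a chosen threshold. This is a minimal sketch, not something proposed in the thread: the variable name n_t and the cutoff of 3 are illustrative assumptions, and the six observations are copied from post #1 (t = 3, 5 and 7).

        ```stata
        * Six observations copied from post #1: three subjects at t == 3, two at t == 5, one at t == 7
        clear
        input id t bob
        1 3 7
        2 3 9
        3 3 8
        2 5 4
        3 5 6
        2 7 4
        end

        * n_t (hypothetical name) holds the number of nonmissing bob values at each t
        egen n_t = count(bob), by(t)
        egen p25 = pctile(bob), by(t) p(25)

        * Blank out the quantile where fewer than 3 subjects remain (cutoff is an assumption)
        replace p25 = . if n_t < 3
        list, sepby(t)
        ```

        The same replace step could be repeated for p50 and p75; the appropriate cutoff depends on the analysis.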



        • #5
          William,
          our posts crossed.
          It is clear now, using the nearest method.

          thanks Richard Hiscock
