  • How to use collapse on a data set to get the confidence interval of a median?

    Dear Statalisters,

    Can anyone suggest code for using the collapse command on a data set, by category, to obtain the confidence interval of the median of a variable?
    I suspect this has to be done twice, i.e. separately for the lower bound and the upper bound.
    That would suit my purpose, as I could merge the resulting data sets with the collapsed median values for further processing.
    Code:
    * help file example modified for the median
    webuse college, clear
    collapse (p50) gpa [fw=number], by(year)
    Any suggestion is much appreciated.
    http://publicationslist.org/eric.melse

  • #2
    Hi Eric,

    I'm not sure how you might arrive at the desired confidence interval using collapse (though I'm interested to learn if that's possible).

    One non-parametric approach to the confidence interval would be to gather jackknife samples:

    Code:
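    set seed 17
    sysuse auto, clear
    * r(p50) is the median stored by -summarize, detail-; jackknife re-computes it leaving out one observation at a time
    jackknife median=r(p50) : summarize price, detail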
    To apply it over a by-group, you could wrap this up in a custom program and use -runby- or a loop, depending on your specific needs.
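A minimal sketch of the loop variant, assuming the auto data with -foreign- standing in for the category variable and -price- for the variable of interest (the names median, lb and ub are illustrative). For each group, the jackknife median and a 95% CI built from _b, _se and the t distribution with e(df_r) degrees of freedom are posted to a results file with -postfile-, leaving one observation per group.

```stata
sysuse auto, clear
tempname h
tempfile results
* One row per group: group value, median, lower and upper 95% bounds
postfile `h' foreign median lb ub using `results'
levelsof foreign, local(levels)
foreach l of local levels {
    preserve
    * Keep only the current group so jackknife leaves out its observations one at a time
    quietly keep if foreign == `l'
    quietly jackknife median=r(p50) : summarize price, detail
    * jackknife CIs use the t distribution with e(df_r) = N - 1 degrees of freedom
    local t = invttail(e(df_r), 0.025)
    post `h' (`l') (_b[median]) (_b[median] - `t'*_se[median]) (_b[median] + `t'*_se[median])
    restore
}
postclose `h'
use `results', clear
list
```

The result can then be merged on the group variable with other collapsed results. Bear in mind that the delete-one jackknife is known to give an inconsistent variance estimate for the median, so treat these intervals with some caution.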



    • #3
      I don't know how to use -collapse- to get a CI for a median either; I would use -bootstrap-, as on p. 174 of Efron, B and Tibshirani, RJ (1993), An Introduction to the Bootstrap, Chapman & Hall. Fuller details are in Efron, B (1979), "Bootstrap methods: another look at the jackknife," Annals of Statistics, 7: 1-26.
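A sketch of the bootstrap route on the auto data, assuming 1,000 replications, seed 17 and 95% intervals are adequate (all three are illustrative choices, not taken from Efron and Tibshirani). -bootstrap- reruns -summarize, detail- on resampled data, and -estat bootstrap, percentile- reports the percentile interval for the median.

```stata
sysuse auto, clear
* Resample the data 1,000 times and record the median each time;
* seed() makes the replications reproducible
bootstrap median=r(p50), reps(1000) seed(17) nodots : summarize price, detail
* Display the percentile-based bootstrap confidence interval
estat bootstrap, percentile
```

For a by-group version, the same two commands can replace -jackknife- inside a loop over the category levels.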



      • #4
        The only other thought I have is that the documentation for quantile regression (-qreg-) lists bootstrapping as one of the techniques for estimating the standard errors. Otherwise, people much smarter than I have apparently proposed an estimator for the variance of whatever quantile you're estimating (equation 3 in the Methods and formulas section). So whatever method you use, it does seem that -collapse- just isn't programmed to estimate the SE of any quantile, even though it's a tractable problem.
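A sketch of the quantile-regression route, assuming the auto data: a constant-only -qreg- estimates the median as _cons and reports a CI from its default analytic variance estimator, while -bsqreg- fits the same model with bootstrapped standard errors (1,000 replications and seed 17 are illustrative choices). With an even number of observations, the _cons estimate may not coincide exactly with the p50 from -summarize-, which averages the two middle values.

```stata
sysuse auto, clear
* Constant-only median regression: _cons is the estimated median of price,
* and the reported CI uses qreg's default (non-bootstrap) variance estimator
qreg price
* Same model, with standard errors from 1,000 bootstrap replications
set seed 17
bsqreg price, quantile(.5) reps(1000)
```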
        Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

        When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



        • #5
          The -centile- command returns a confidence interval. So you can loop over the category levels, run centile for its observations, then store r(c_1), r(lb_1), and r(ub_1) as variables in the data set and retain just one observation per level. I don't know offhand how it calculates the confidence interval, but you can look that up in the documentation.
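A minimal sketch of that loop, assuming the auto data with -foreign- as the category and -price- as the variable (median, lb and ub are illustrative names). By default -centile- computes the 50th percentile with a binomial-based, distribution-free 95% CI. The last three lines show -statsby-, which runs the same command per group and replaces the data with one row per group, much as -collapse- would.

```stata
sysuse auto, clear
generate median = .
generate lb = .
generate ub = .
levelsof foreign, local(levels)
foreach l of local levels {
    * centile defaults to the 50th percentile and stores r(c_1), r(lb_1), r(ub_1)
    quietly centile price if foreign == `l'
    quietly replace median = r(c_1) if foreign == `l'
    quietly replace lb = r(lb_1) if foreign == `l'
    quietly replace ub = r(ub_1) if foreign == `l'
}
* Retain just one observation per level
bysort foreign: keep if _n == 1
keep foreign median lb ub
list

* Alternative: statsby does the loop and the reduction in one step
sysuse auto, clear
statsby median=r(c_1) lb=r(lb_1) ub=r(ub_1), by(foreign) clear : centile price
list
```

Either result has one row per category and can be merged with other collapsed data on the category variable.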

          (Or, if your data set is large enough to make this too slow, use -runby- instead of a loop. -runby- is written by Robert Picard and me, and is available from SSC.)
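For the -runby- variant, a sketch under the same assumptions (auto data, -foreign- as the category; the program name median_ci is hypothetical). -runby- calls the program once per group with only that group's observations in memory and stacks whatever each call leaves behind.

```stata
* runby is community-contributed; install it once with:  ssc install runby
sysuse auto, clear
capture program drop median_ci
program define median_ci
    * Only the current group's observations are in memory here
    quietly centile price
    generate median = r(c_1)
    generate lb = r(lb_1)
    generate ub = r(ub_1)
    * Whatever remains in memory becomes this group's contribution to the results
    keep in 1
    keep foreign median lb ub
end
runby median_ci, by(foreign)
list
```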
          Last edited by Clyde Schechter; 12 Oct 2021, 14:10.

