  • Center around the Median

    Good afternoon,

    Is there a command that allows you to center around the median of a continuous variable?

    Best,
    Tess

  • #2
    Tess:
    what springs to my mind is:
    Code:
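    * load the auto dataset shipped with Stata
    sysuse auto, clear
    * the detail option leaves the median behind in r(p50)
    quietly summarize price, detail
    * subtract the median from each value
    generate center_median = price - r(p50)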
    Kind regards,
    Carlo
    (Stata 19.0)

    • #3
      Hi Carlo,

      So, price being the variable? Also, what does the p50 within the parentheses represent?

      Best,
      Tess
      Last edited by Tess Gecha; 21 Oct 2021, 12:08.

      • #4
        I couldn't find one, but the essence is simple: 1. Find the median. 2. Subtract it. (@Carlo Lazzaro got straight to it while I was writing this.)

        This sample program goes a bit further. The most obvious next step, from my point of view, would be to allow a different variable type to be specified.

        Code:
        program centermed
        *! 1.0.0 NJC 21 Oct 2021
        version 8.2
        syntax varlist [if] [in]
        
        quietly {
            marksample touse, novarlist
            count if `touse'
            if r(N) == 0 error 2000
        
            ds `varlist', has(type numeric)
            if "`r(varlist)'" == "" error 102
            
            foreach v in `r(varlist)'  {
                centile `v' if `touse'
                gen `v'_med = `v' - r(c_1) if `touse'
                local lbl : var label  `v'
                if `"`lbl'"' == "" local lbl "`v'"
                if length(`"`lbl'"') > 71 label var `v'_med "`v' - median"
                else label var `v'_med `"`lbl' - median"'
            }
        }
        
        end
        Sample usage:

        Code:
        sysuse auto, clear
        centermed *
        I'd say center on ... and much discussion points to that being more widely accepted than center around. We could circle around that small vexed question and not converge on universal agreement.

        Strictly, I'd say centre but there you go. Last I heard the language is still called English.
        Last edited by Nick Cox; 21 Oct 2021, 12:17.

        • #5
          #3. r(p50) is the median, or 50% point. If that's not a familiar definition, the help and manual entry for summarize say more.
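
          As a minimal sketch of where r(p50) comes from, assuming the auto dataset used earlier in the thread: summarize with the detail option leaves its results behind, and return list shows them by name.

          ```stata
          sysuse auto, clear
          * detail is needed: without it, summarize does not leave percentiles behind
          quietly summarize price, detail
          * list everything summarize saved, including r(p25), r(p50) and r(p75)
          return list
          * show the median on its own
          display r(p50)
          ```

          The other r(p...) results follow the same naming pattern, so r(p25) and r(p75) are the lower and upper quartiles.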

          • #6
            Got it! A few short follow-up questions:

            1. In Carlo's example code, the 50 represents the median for that variable?
            2. Would someone explain what the p before the 50 represents?
            3. The values below the median would then become negative - is there some way to counteract that?

            • #7
              p could mean percentile or probability. It's conveniently ambiguous. I guess Stata developers had the percentile meaning uppermost in naming things.

              If you want to center on the median, usually that means some negative values. (A simple exception here is exemplified by foreign in the auto dataset which is 0 and 1 with a majority of 0. Hence the median is 0 and the centered values are exactly the same as the original. But indicator variables, and even graded variables, are examples where the median is not often especially useful. In fact, the mean of indicator variables is immensely more useful and easier to tie to concepts.)

              I don't follow the motivation here. If you want to center on the median, that exception aside, there will be some negative values. Why would you want to counteract that? It can be counteracted: you just add any number big enough to make all values positive, but what would you have gained?
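
              To make that concrete, here is a minimal sketch with the auto data. The variable names price_med and price_shift are illustrative only. Shifting the centered values so that the minimum is zero changes the origin and nothing else, which is why it is hard to see a gain.

              ```stata
              sysuse auto, clear
              * center on the median
              quietly summarize price, detail
              generate double price_med = price - r(p50)
              * one possible shift: subtract the minimum of the centered values,
              * so that the smallest value becomes 0
              quietly summarize price_med
              generate double price_shift = price_med - r(min)
              * compare the three versions: the standard deviations agree,
              * only the location differs
              summarize price price_med price_shift
              * the exception mentioned above: foreign has median 0,
              * so centering on the median leaves it unchanged
              quietly summarize foreign, detail
              display r(p50)
              ```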

              • #8
                Ok, got it - thank you so much for your help!

                • #9
                  A general statement is that (value MINUS median) will always be zero or positive if and only if the lowest value and the median are identical. This happens when the lowest value occurs repeatedly in a strict majority of observations. Another example is -2 -2 -1, where -2 is the median and the centered values are thus 0 0 1.
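
                  That small example can be checked directly. This is just a sketch with a toy dataset typed in by hand; x and x_med are arbitrary names.

                  ```stata
                  clear
                  input x
                  -2
                  -2
                  -1
                  end
                  * the lowest value, -2, is also the median
                  quietly summarize x, detail
                  generate x_med = x - r(p50)
                  list
                  * no centered value is negative
                  assert x_med >= 0
                  ```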

                  I've occasionally seen (value MINUS median) / IQR but not often. Two good reasons for that lack:

                  1. It is easy to find variables with IQR zero or at least very small and then you have indeterminacy or an explosion. So, the idea can't be general across measured or counted variables.

                  2. While the median and IQR are robust or resistant, this measure will not be. Even if #1 doesn't happen, many outliers will be several IQRs away from the median. Further, as it is a linear transformation of the original, it has identical skewness and kurtosis to the original.

                  Naturally the main motive for any standardization is not change of distribution shape but just rescaling to comparable values.
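
                  Both reasons can be seen in a minimal sketch with the auto data. The names s_med, s_iqr, price_rob and rare are made up for illustration, and rare is a deliberately lopsided indicator constructed only to show a zero IQR.

                  ```stata
                  sysuse auto, clear
                  * (value MINUS median) / IQR for price
                  quietly summarize price, detail
                  scalar s_med = r(p50)
                  scalar s_iqr = r(p75) - r(p25)
                  generate double price_rob = (price - s_med) / s_iqr
                  * skewness and kurtosis are the same before and after rescaling
                  summarize price price_rob, detail
                  * an indicator that is 0 for well over three quarters of the cars:
                  * both quartiles are 0, so the IQR is 0 and division breaks down
                  generate byte rare = price > 12000
                  quietly summarize rare, detail
                  display r(p75) - r(p25)
                  ```

                  Dividing by that zero IQR would return missing values throughout in Stata, which is the indeterminacy in #1.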
