
  • Sorting data to compute the median income

    Hello,

    I am new to Stata and am working with the Canadian SLID data, based on the Census data. I performed all my calculations in Excel as Stata for Mac was not available at my University until very recently. I would like to double-check my results with Stata.

    I am using an income variable that I want to sort in order to find the median value. Is there a built-in command to do that in Stata?

    In addition, my dataset covers the 10 Canadian provinces. Is there a way to perform a "filter&sort"?

    1) Filter the data pertaining to a given province (for example, keep only the rows that have "1" in the Province column)
    2) Find the median income for that province

    Thanks in advance for your help.

    Francois

  • #2
    check the "detail" option for the -summarize- command to get medians (and much other descriptive summary info)

    I'm not sure what you mean by filter, but there are several ways to get this information for particular provinces; you might use "if" as in -su var if province==1, d-; another option would be to use -forval- if your values are numbered: e.g., if numbered 1-10, you might do:

    forval i=1/10 {
        su var if province==`i', d
    }

    there are other ways also, but you need to be clearer about what you want

    Rich



    • #3
      Or perhaps:

      bysort province: summ income, detail



      • #4
        Thanks Rich for your quick answer. Could you be a bit more specific about the median command? What I see in the top menu is: Statistics / Summaries, Tables and tests /... but Median is not in the list.

        Province: my dataset for year 2010 is composed of several income variables (columns). One of the columns is "province", in which a number is assigned to each province. For example, Quebec = 1, Ontario = 2, and so on.

        What I want is to find the median income for Quebec only. In Excel, I would then "filter" that column to isolate the "1" and then compute the median on that subset, yielding the value of the median income for Quebec in 2010. I would like to perform something similar in Stata.

        Hope this is a bit clearer.

        Thanks a lot!

        Francois



        • #5
          There is a -median- command, but it is not what you want: it tests whether groups share the same median rather than reporting the median. As Richard Goldstein and Joe Canner already explained, there is the -summarize- command. Also check out -egen-.
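
          A minimal sketch of the egen route, using the nlsw88 example data shipped with Stata as a stand-in (wage and race play the roles of your income and province variables; med_wage is a name made up for illustration). Note that egen does not accept weights, so this gives unweighted medians only.

          ```stata
          sysuse nlsw88, clear
          * store each group's unweighted median in a new variable
          egen med_wage = median(wage), by(race)
          * display one median per group
          tabdisp race, cellvar(med_wage)
          ```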



          • #6
            Code:
            sysuse nlsw88, clear
            _pctile wage [pw=tenure] if race==1
            return list
            In the code above, substitute your income variable for wage, your weight variable for tenure, and your province variable for race.
            Last edited by Sergiy Radyakin; 24 Apr 2014, 15:10. Reason: Explained variables
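
            As a hedged follow-up to the code above: _pctile computes the 50th percentile by default and leaves it in r(r1), so the weighted median can be picked up directly. The sketch below again uses nlsw88 with tenure standing in for a survey weight, which is purely illustrative.

            ```stata
            sysuse nlsw88, clear
            * weighted median (p(50) is the default, stated here for clarity)
            _pctile wage [pw=tenure] if race==1, p(50)
            display "weighted median = " r(r1)
            * several percentiles at once: results land in r(r1), r(r2), r(r3)
            _pctile wage [pw=tenure] if race==1, p(25 50 75)
            display r(r1) "  " r(r2) "  " r(r3)
            ```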



            • #7
              For some reason the Statistics->Summaries menu item does not provide the option required to get medians. Rich was referring to the command -summarize income, detail-, which you can type in the command window. Typing the command there will also make it easier for you to stratify by province, as both Rich and I describe above ("su" and "summ" are permitted abbreviations for "summarize"; I prefer the latter as being more informative).
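
              A small sketch of what this looks like in practice, run on the nlsw88 example data because the SLID file is not available here (wage stands in for income and race for province). After summarize with the detail option, the median is also left behind in r(p50).

              ```stata
              sysuse nlsw88, clear
              * median for a single group, analogous to "province == 1"
              summarize wage if race == 1, detail
              display "median = " r(p50)
              * the same summary for every group in turn
              bysort race: summarize wage, detail
              ```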



              • #8
                Francois,
                1) Be aware that different software computes percentiles differently. You might not get exactly the same results when you start cross-checking.
                2) When working with survey data, you are not forgetting to use the weights, right? A weighted median is not a standard function in Excel, but have a look at this discussion.
                Best, Sergiy Radyakin



                • #9
                  In addition, there is tabstat

                  Code:
                  tabstat income, by(province) stat(p50)



                  • #10
                    The tabstat command uses summarize internally and hence does not support pweights, though numerically the result should be the same as with aweights, right?
                    Survey pweights are supported by table:

                    Code:
                    sysuse nlsw88, clear
                    table race [pw=tenure], c(p50 wage)
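
                    One way to check the aweights question for yourself is to compute the median both ways and compare. This is only a sketch on the nlsw88 example data, with tenure again standing in for a real survey weight; no particular output is claimed here.

                    ```stata
                    sysuse nlsw88, clear
                    * median with aweights via summarize
                    summarize wage [aw=tenure] if race==1, detail
                    display "aweight median = " r(p50)
                    * median with pweights via _pctile
                    _pctile wage [pw=tenure] if race==1
                    display "pweight median = " r(r1)
                    ```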

