  • Boxplot/quantiles for frequency

    Hi everyone, I'm struggling with my dataset and I don't know how I can fix it. Here you see a sample of the data:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte Inkomensgroep int Allehuishoudens
     1  48
     3  57
     5  58
     7  65
     9  71
    11 102
    13 206
    end
    The first column is income, while the second one shows the frequency. I would like to calculate the mean and the first and third quantiles (and after that draw a box plot).
    However, how can I tell Stata that the second column is a frequency, so that it does not plot income alone and ignore the different frequencies? Or how can I make one dataset with the value 1 repeated 48 times, the value 3 repeated 57 times, etc.?

    Thank you very much!

  • #2
    Originally posted by Harina Peternella
    Hi everyone, I'm struggling with my dataset ... how can I make one dataset with 48 times 1 as value, and 57 times 3 etc ?
    Code:
    expand Allehuishoudens


    To verify:
    Code:
    tabulate Inkomensgroep 
    
    Inkomensgro |
             ep |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |         48        7.91        7.91
              3 |         57        9.39       17.30
              5 |         58        9.56       26.85
              7 |         65       10.71       37.56
              9 |         71       11.70       49.26
             11 |        102       16.80       66.06
             13 |        206       33.94      100.00
    ------------+-----------------------------------
          Total |        607      100.00
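
    As an alternative that leaves the dataset unchanged (a sketch added here, not part of the original answer): summarize and tabstat both accept frequency weights, so the statistics can be computed from the 7-row table directly. This assumes the original, unexpanded data; after expand, every copy still carries its original Allehuishoudens value, so weighting by it would count each household many times.

    ```stata
    clear
    input byte Inkomensgroep int Allehuishoudens
     1  48
     3  57
     5  58
     7  65
     9  71
    11 102
    13 206
    end

    * compact table: each row is weighted by its frequency
    tabstat Inkomensgroep [fweight = Allehuishoudens], statistics(n mean p25 p50 p75)

    * the same statistics in full detail; results are left in r()
    summarize Inkomensgroep [fweight = Allehuishoudens], detail
    display "mean = " %5.2f r(mean) "   Q1 = " r(p25) "   Q3 = " r(p75)
    ```

    The quartiles should agree with the cumulative percentages in the tabulate output above: 26.85% of the observations are at 5 or below, so the first quartile is 5, and only 66.06% are below 13, so the third quartile is 13.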


    • #3
      That was the command I was looking for, thank you!!!


      • #4
        I guess quantile here means quartile. Note that box plots can look very odd with discrete variables. Your example is probably weirder than your full dataset, but with stripplot from SSC (ssc install stripplot) I get this:


        Code:
        clear
        input byte Inkomensgroep int Allehuishoudens
         1  48
         3  57
         5  58
         7  65
         9  71
        11 102
        13 206
        end
        
        expand Allehuishoudens
        stripplot Inkomensgroep , vertical box cumul refline centre yla(, ang(h))
        [Figure: weirdboxplot.png (output of the stripplot command above)]



        This has to be decoded: more than 25% of the observations (about 34%) are in the top group, so the same value, 13, is reported as both the upper (third) quartile and the maximum, and there is no whisker above the box.

        The refline option in stripplot defaults to showing the mean as an extra horizontal line (in this case).
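
        The reading of the plot above can be checked numerically. The following is only a sketch, assuming the same seven-row example and the official graph box command rather than stripplot; it draws a box plot of the expanded data and then displays the third quartile, the maximum, and the mean.

        ```stata
        clear
        input byte Inkomensgroep int Allehuishoudens
         1  48
         3  57
         5  58
         7  65
         9  71
        11 102
        13 206
        end

        * one observation per household
        expand Allehuishoudens

        * official Stata box plot of the expanded variable, for comparison
        graph box Inkomensgroep, ylabel(, angle(horizontal))

        * third quartile and maximum coincide because more than 25% are in the top group
        summarize Inkomensgroep, detail
        display "Q3 = " r(p75) "   max = " r(max) "   mean = " %5.2f r(mean)
        ```

        Here Q3 and the maximum are both 13, which is why the box reaches the top of the plot, and the mean is about 8.90.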

        A histogram might be clearer.
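
        One way to draw such a histogram, given as a sketch rather than a prescribed method: histogram accepts frequency weights, so the unexpanded table can be used as it is. The discrete option gives each income group its own bar, and frequency puts counts rather than densities on the vertical axis. The xlabel() choice is an assumption that suits this example's values 1, 3, ..., 13.

        ```stata
        clear
        input byte Inkomensgroep int Allehuishoudens
         1  48
         3  57
         5  58
         7  65
         9  71
        11 102
        13 206
        end

        * one bar per income group; the frequency weights set the bar heights
        histogram Inkomensgroep [fweight = Allehuishoudens], discrete frequency xlabel(1(2)13)
        ```

        Run on the expanded data instead, the same command without the weight gives the same picture.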
