  • Graphs with a large number of observations

    Dear friends,

    I am trying to make a graph, but failing to do so. I think the problem is that I have too many observations, but I do not know.


    What I want to do is the following:

    I have two variables.
    - wealth (has positive and negative values)
    - dummy variable D=0,1

    Each variable has about 6.5 million observations.

    1) Firstly, I want a graph that shows the distribution of wealth in the population.

    Now, most of my observations have D=0. There are only about 900 D=1 observations.

    2) And now most importantly, I want to know where in the wealth distribution my 900 D=1 observations are?

    Are they in the top ten, top five, or top one percent of the wealth distribution?


    If I cannot do this with a graph, can I do it with a table?


    Is there anyone out there willing to help?

    I have been trying for quite some time and not getting anywhere. I feel like Sisyphus, rolling my rock up to the top of the mountain, but not getting anywhere.

    Best wishes,
    Silje


  • #2
    Hi Silje
    Originally posted by Silje Smith
    I am trying to make a graph, but failing to do so.
    What does this mean? Which error does Stata provide?
    What was your code?



    • #3
      Good morning,

      I understand that it is important to show code because it means that one has tried to solve the problem oneself. However, I did not include code because I do not really know if I am looking in the right place.

      My latest code is simply "tab wealth", which gives the error "too many values", r(134).

      I have also tried "graph bar (p10) wealth (p25) wealth (p50)..", but that gives me the cumulative probability, which is not what I want.
      I have tried histogram, pctile, and many many more.

      In addition, I need to overlay graphs for D=1 and D=0.

      Any help is appreciated, even if it is just to point me in the right direction.

      Best wishes,
      Silje



      • #4
        What about this? It requires the user-written qplot, which you can find by typing search qplot in Stata.

        Code:
        // open example data
        sysuse nlsw88, clear
        
        // prepare the data
        gen byte black = race == 2 if !missing(race)
        label variable black "respondent's race"
        label define black 0 "not black" ///
                           1 "black"
        label value black black
        
        qplot wage, over(black)            /// main graph
        scheme(s1color) ylabel(,angle(0))   // additional options to make it look pretty
        (For more on examples I sent to the Statalist see: http://www.maartenbuis.nl/example_faq )

        [Image: Graph.png]


        ---------------------------------
        Maarten L. Buis
        University of Konstanz
        Department of history and sociology
        box 40
        78457 Konstanz
        Germany
        http://www.maartenbuis.nl
        ---------------------------------



        • #5
          I like Maarten's suggestion. For his example data I'd add logarithmic scales, which surely ease comparison. Although Silje is specifically asking for superimposed displays, I don't feel constrained from suggesting something else. stripplot (SSC) is also user-written.

          Code:
          * Maarten's example
          sysuse nlsw88, clear
          
          set scheme s1color
          
          gen byte black = race == 2 if !missing(race)
          label variable black "respondent's race"
          label define black 0 "not black" ///
                             1 "black"
          label value black black
          
          * uses log scale
          qplot wage, over(black)            ///
          ylabel(,angle(0)) ysc(log) yla(40 30 20 10 5 2 1)
          
          * new in this post
          stripplot wage, over(black) vertical box cumul cumprob centre ///
          ylabel(,angle(0)) ysc(log) yla(40 30 20 10 5 2 1) refline(lcolor(red)) xla(, noticks)
          [Image: silje.png]


          The extra graph is a quantile-box plot with reference lines for means. (Geometric means could be a good idea.)
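
          On Silje's question about a table: a minimal sketch, not tested on her data, is to assign every observation to a percentile group with the official command xtile and then compare what share of each group lies in the top 10, 5 and 1 percent. Here wage and black from Maarten's example stand in for wealth and D, and the variable names pctgroup, top10, top5 and top1 are made up for illustration.

          ```stata
          * sketch only: wage and black stand in for wealth and D
          sysuse nlsw88, clear
          gen byte black = race == 2 if !missing(race)

          * percentile group from the whole sample (1 = bottom, 100 = top)
          xtile pctgroup = wage, nquantiles(100)

          * indicators for being in the top 10, 5 and 1 percent
          gen byte top10 = pctgroup > 90 if !missing(pctgroup)
          gen byte top5  = pctgroup > 95 if !missing(pctgroup)
          gen byte top1  = pctgroup > 99 if !missing(pctgroup)

          * share of each group in each top bracket, with group sizes
          tabstat top10 top5 top1, by(black) statistics(mean n)
          ```

          With many tied values the groups produced by xtile need not be exactly equal in size, so the shares are approximate.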
          Last edited by Nick Cox; 20 Apr 2017, 03:11.



          • #6
            Thank you for useful comments.

            A quantile plot did the job, and the box plot with means makes it easier to see the pattern.

            Much appreciated, and I will read up on quantile plots for the future.

            S;
