Graphs with a large number of observations

Silje Smith

Join Date: Apr 2017

Posts: 3
#1

19 Apr 2017, 14:26

Dear friends,

I am trying to make a graph, but failing to do so. I think the problem is that I have too many observations, but I am not sure.

What I want to do is the following:

I have two variables.
- wealth (has positive and negative values)
- dummy variable D=0,1

Each variable has about 6.5 million observations.

1) Firstly, I want a graph that shows the distribution of wealth in the population.

Now, most of my observations have D=0. There are only about 900 D=1 observations.

2) And now, most importantly, I want to know where in the wealth distribution my 900 D=1 observations are.

Are they in the top ten percent, the top five percent, or the top 1 percent of the wealth distribution?

If I cannot do this with a graph, can I do it with a table?

Is there anyone out there willing to help?

I have been trying for quite some time and not getting anywhere. I feel like Sisyphus, rolling my rock up to the top of the mountain, but never arriving.

Best wishes,
Silje
Guillaume Geri

Join Date: Sep 2014

Posts: 55
#2

19 Apr 2017, 14:32

Hi Silje

Originally posted by Silje Smith

I am trying to make a graph, but failing to do so.

What does this mean? Which error does Stata provide?
What was your code?
Silje Smith

Join Date: Apr 2017

Posts: 3
#3

19 Apr 2017, 23:28

Good morning,

I understand that it is important to show code because it means that one has tried to solve the problem oneself. However, I did not include code because I do not really know if I am looking in the right place.

My latest code is simply "tab wealth",
which gives the error "too many values r(134)".

I have also tried "graph bar (p10) wealth (p25) wealth (p50)..", but that gives me the cumulative probability, which is not what I want.
I have tried histogram, pctile, and many many more.

In addition, I need to overlay graphs for D=1 and D=0.

Any help is appreciated, even if it is just to point me in the right direction.

Best wishes,
Silje
Maarten Buis

Join Date: Mar 2014

Posts: 3458
#4

20 Apr 2017, 02:30

What about this? It requires the user-written qplot, which you can find by typing search qplot in Stata.

Code:

// open example data
sysuse nlsw88, clear

// prepare the data
gen byte black = race == 2 if !missing(race)
label variable black "respondent's race"
label define black 0 "not black" ///
                   1 "black"
label value black black

qplot wage, over(black)                /// main graph
    scheme(s1color) ylabel(,angle(0))  // additional options to make it look pretty

(For more on examples I sent to the Statalist see: http://www.maartenbuis.nl/example_faq )
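For readers who do not have qplot installed, here is a rough sketch that uses official commands only. It is not from the thread, and it assumes the same example data, with wage standing in for wealth and black standing in for D. summarize with the detail option reports percentiles without listing every distinct value, which is what makes tabulate stop with r(134) on a continuous variable, and twoway kdensity overlays density estimates for the two groups.

```stata
* sketch only: wage stands in for wealth, black for D (assumptions, not Silje's data)
sysuse nlsw88, clear
gen byte black = race == 2 if !missing(race)

* percentiles of the whole distribution, without tabulating each distinct value
summarize wage, detail

* overlaid kernel density estimates for the two groups
twoway (kdensity wage if black == 0)           ///
       (kdensity wage if black == 1),          ///
       legend(order(1 "not black" 2 "black"))  ///
       ytitle("density") xtitle("hourly wage")
```

With a very small group next to a very large one, as with about 900 observations against millions, the density for the small group will be much noisier than the other, so a quantile plot may still be the easier comparison.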

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------

Nick Cox

Join Date: Mar 2014
Posts: 35711

20 Apr 2017, 03:08

I like Maarten's suggestion. I'd add for his example data logarithmic scales as surely easing comparison. Although Silje is specifically asking for superimposed displays, I don't feel constrained from suggesting something else. stripplot (SSC) is also user-written.

Code:

* Maarten's example
sysuse nlsw88, clear

set scheme s1color

gen byte black = race == 2 if !missing(race)
label variable black "respondent's race"
label define black 0 "not black" ///
                   1 "black"
label value black black

* uses log scale
qplot wage, over(black)            ///
ylabel(,angle(0)) ysc(log) yla(40 30 20 10 5 2 1)

* new in this post
stripplot wage, over(black) vertical box cumul cumprob centre ///
ylabel(,angle(0)) ysc(log) yla(40 30 20 10 5 2 1) refline(lcolor(red)) xla(, noticks)

[Image: silje.png]

The extra graph is a quantile-box plot with reference lines for means. (Geometric means could be a good idea.)
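On the original question of whether a table could show where the D=1 observations sit, one possible sketch follows. It is not from the thread and makes several assumptions: the same example data, wage in place of wealth, black in place of D, and hypothetical variable names wage_pct and band. It assigns every observation to one of 100 percentile groups with xtile, collapses those into a few bands, and cross-tabulates the bands against the indicator.

```stata
* sketch only: wage stands in for wealth, black for D (assumptions, not Silje's data)
sysuse nlsw88, clear
gen byte black = race == 2 if !missing(race)

* percentile group of each observation in the whole sample (1 = lowest, 100 = highest)
xtile wage_pct = wage, nquantiles(100)

* collapse into bands; groups 91-95, 96-99 and 100 make up the top 10 percent
gen byte band = cond(wage_pct > 99, 4, cond(wage_pct > 95, 3, ///
                cond(wage_pct > 90, 2, 1))) if !missing(wage_pct)
label define band 1 "bottom 90%" 2 "90th to 95th pct" ///
                  3 "95th to 99th pct" 4 "top 1%"
label values band band

* column percentages: the share of each group that falls in each band
tabulate band black, column
```

Read down the column for the group of interest: if its percentages in the upper bands are well above 5, 4 and 1, that group is over-represented at the top of the distribution. Ties in the variable can make the bands slightly uneven in size.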

Last edited by Nick Cox; 20 Apr 2017, 03:11.


Silje Smith

Join Date: Apr 2017

Posts: 3
#6

20 Apr 2017, 14:20

Thank you for useful comments.

A quantile plot did the job, and the box plot with reference lines for means makes it easier to see the pattern.

Much appreciated and I will read up on quantile plots for the future.

S;