plotting ecdf graph - too many lines

Tara Boyle

Join Date: Nov 2022

Posts: 142
#1


29 Aug 2023, 09:08

Hi, I want to plot a histogram together with an ECDF line graph.

Code used:

Code:

cum xvarm gen(ecdf)
twoway histogram xvarm, width (10) yaxis(1) || line cdf xvarm, c(J) yaxis(2)

My xvarm is a continuous variable ranging from 1 to 498.

I got the following graph, which was not what I was expecting: too many lines!
Nick Cox

Join Date: Mar 2014

Posts: 35724
#2

29 Aug 2023, 11:49

My guess is that you typed something more like

Code:

cumul xvarm. gen(ecdf)

The main problem with your command is sort order. You're specifying implicitly that values are connected in the current order of the dataset, which is not what you want for a cumulative distribution function.

This example code shows some technique, as typically density units and cumulative probability are not commensurate.
Using frequency and cumulative frequency is just one choice among several.

Code:

sysuse auto, clear
cumul mpg, gen(ecdf) freq equal
twoway histogram mpg, width(1) freq || line ecdf mpg, sort
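
As a hedged sketch, the same fix might be applied to the command in #1 as follows. This assumes a dataset in memory with a numeric variable named xvarm (the data are not available here, so it is untested on the real variable). It keeps the bin width of 10 and the two y axes from #1 and adds sort so that the ECDF is connected in order of xvarm. The axis title is an illustrative addition.

```stata
* Assumes a dataset in memory containing the numeric variable xvarm
cumul xvarm, gen(ecdf)

* sort connects points in order of xvarm rather than in dataset order;
* connect(J) draws the ECDF as a step function on the second y axis
twoway histogram xvarm, width(10) yaxis(1)          ///
    || line ecdf xvarm, sort connect(J) yaxis(2)    ///
    ytitle("Cumulative probability", axis(2))
```

The /// line continuation works in a do-file; in the Command window the twoway command would need to be typed on a single line.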
Tara Boyle

Join Date: Nov 2022

Posts: 142
#3

13 Sep 2023, 12:04

Perhaps a bit late, but I just wanted to double check.
I managed to get this graph.
However, because the graph uses frequencies, my histogram values look very small, with 1.0e+04 appearing at the top of my Y axis.

I want to confirm that I cannot change the Y axis, as this would change the meaning of my frequencies and give a false impression. Is this correct?
Nick Cox

Join Date: Mar 2014

Posts: 35724
#4

13 Sep 2023, 12:19

In #2 the code should have been

Code:

cumul xvarm, gen(ecdf)

The total frequency is evidently about 16000, and the average frequency is of the order of 16000/500, or 32, which is about 0.2% of the total frequency.

I was being a bit mischievous in stating that this choice is one of several. Calculations like that above underline that it is not a good choice.

Some people would be happy if you plotted cumulative and bin on different scales so long as you declared what you are doing. For my part I recommend using a transformation and choosing one of histogram, ECDF or quantile plot.
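
A minimal sketch of what "a transformation and one of histogram, ECDF or quantile plot" could look like. The thread does not say which transformation is meant, so the logarithm here is an assumption, and price from the auto dataset stands in for xvarm. Each option is meant to be used on its own.

```stata
sysuse auto, clear

* Option 1: histogram of a log-transformed variable
generate log_price = ln(price)
histogram log_price, frequency

* Option 2: ECDF of the original variable, drawn on a logarithmic x axis
cumul price, gen(ecdf) equal
line ecdf price, sort connect(J) xscale(log)

* Option 3: quantile plot of the log-transformed variable
quantile log_price
```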

See also my suggestions at #4 in https://www.statalist.org/forums/for...ging-bin-width

Last edited by Nick Cox; 13 Sep 2023, 12:22.
Tara Boyle

Join Date: Nov 2022

Posts: 142
#5

13 Sep 2023, 13:08

Hi, I did try histogram... to no avail.
Do you mean plotting an ECDF and superimposing it on the histogram using different scales?

Perhaps I didn't understand you clearly in your post 4 here
https://www.statalist.org/forums/for...ging-bin-width

Why did you advise using preserve, contract and restore? How will this help?

Code:

preserve
contract xvarm
dataex, count(498)
restore

Last edited by Tara Boyle; 13 Sep 2023, 13:18.
Nick Cox

Join Date: Mar 2014

Posts: 35724
#6

13 Sep 2023, 14:30

I am not myself recommending a combination of ECDF and histogram for your problem. In my view that only works well for a much smaller number of bins than you seem to want. I am not keen on mixing scales but, as I said, others would no doubt regard that as acceptable.

In the linked thread the code you quote in #5 was suggested as code that "will yield dataex output that you can show us", and the emphasis here flags the explanation that I gave. That is, my guess is that your full dataset is far too large to show us, and is not needed anyway. All that is needed for any reader to experiment is a frequency table of values and counts. It would also allow experimentation with transformations.
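
To illustrate why a frequency table is enough, here is a hedged sketch of rebuilding an ECDF from the output of contract alone. It uses mpg from the auto dataset as a stand-in for xvarm; the variable names cumfreq and ecdf are illustrative choices, while _freq is the name contract gives the count variable by default.

```stata
sysuse auto, clear

* Reduce the data to one row per distinct value, with counts in _freq
contract mpg
sort mpg

* Running total of counts, then scale to cumulative probability
generate cumfreq = sum(_freq)
generate ecdf = cumfreq / cumfreq[_N]

list mpg _freq ecdf in 1/5
line ecdf mpg, connect(J)
```

Anyone given the contracted table can run the last four commands, which is why the full dataset does not need to be posted.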

Perhaps as a side-effect of mathematical education, posters here often use abstract names. Knowing what xvarm is here would be informative: e.g. people familiar with the subject-matter will have a sense of what graphics are used for such variables. On the other hand, if there is a need not to reveal sensitive data, that is understood, but the reason for coyness is still worth mentioning.