  • plotting ecdf graph - too many lines

    Hi, I want to plot a histogram together with an ECDF line graph.

    code used:
    Code:
    cum xvarm gen(ecdf)
    twoway histogram xvarm, width (10) yaxis(1) || line cdf xvarm, c(J) yaxis(2)
    My xvarm is a continuous variable ranging from 1 to 498.

    I got the following graph, which was not what I was expecting: too many lines!

    [Attached graph: Capture2.PNG]

  • #2
    My guess is that you typed something more like

    Code:
    cumul xvarm. gen(ecdf)
    The main problem with your command is sort order. You're specifying implicitly that values are connected in the current order of the dataset, which is not what you want for a cumulative distribution function.

    This example code shows some technique, as typically density units and cumulative probability are not commensurate.
    Using frequency and cumulative frequency is just one choice among several.

    Code:
    sysuse auto, clear
    cumul mpg, gen(ecdf) freq equal 
    twoway histogram mpg, width(1) freq || line ecdf mpg, sort
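
    A minimal sketch of the sort-order point alone, using the auto data rather than the poster's data. The graph names unsorted and sorted are only illustrative labels. The first graph connects points in the current order of the dataset; the second uses the sort option, with connect(J) for a stairstep line as in the original command.

    ```stata
    sysuse auto, clear
    cumul mpg, gen(ecdf)

    * Points joined in dataset order: expect a tangle of criss-crossing lines
    twoway line ecdf mpg, name(unsorted, replace)

    * Points joined in order of mpg: a single monotone step function
    twoway line ecdf mpg, sort connect(J) name(sorted, replace)
    ```

    Sorting the data beforehand (sort mpg) should have the same effect as the sort option, but the option leaves the dataset order untouched.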


    • #3
      Perhaps a bit late, I just wanted to double check.
      I managed to get this graph.
      However, because frequencies are plotted, my values appear very small, as you can see, with 1.0e+04 appearing at the top of my y-axis.

      I want to confirm that I cannot change the y-axis, as this would change the meaning of my frequencies and give a false impression. Is this correct?

      [Attached graph: Screenshot 2023-09-13 at 19.04.03.png]


      • #4
        In #2 the code should have been

        Code:
         cumul xvarm, gen(ecdf)
        The total frequency is evidently about 16000, and the average frequency is of the order of 16000/500 or 32, or about 0.2% of the total frequency.
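
        The arithmetic can be checked directly. This is only a sketch: the totals are the rough figures quoted above, not values computed from the poster's data, and the scalar names are illustrative.

        ```stata
        * Rough figures quoted in the post (assumed, not computed from data)
        scalar total   = 16000
        scalar nvalues = 500

        * Average frequency per distinct value: about 32
        scalar avg = total / nvalues
        display avg

        * Average frequency as a percent of the total: about 0.2
        display 100 * avg / total
        ```

        On a shared axis that must reach the total, a typical bar would therefore occupy only about 0.2% of the axis height.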

        I was being a bit mischievous in stating that this choice is one of several. Calculations like that above underline that it is not a good choice.

        Some people would be happy if you plotted cumulative and bin frequencies on different scales, so long as you declared what you are doing. For my part, I recommend using a transformation and choosing one of histogram, ECDF or quantile plot.
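
        A hedged sketch of both routes, using the auto data as a stand-in because the poster's data are not shown. The bin width, axis titles, graph names and the choice of a log scale for price are all assumptions for illustration, not recommendations for xvarm.

        ```stata
        sysuse auto, clear
        cumul mpg, gen(ecdf)

        * Route 1: histogram and ECDF on separate, explicitly labelled y axes
        * (/// continues a line in a do-file; it does not work interactively)
        twoway histogram mpg, width(2) freq yaxis(1)                    ///
            || line ecdf mpg, sort connect(J) yaxis(2)                  ///
            ytitle("Frequency", axis(1))                                ///
            ytitle("Cumulative probability", axis(2))                   ///
            name(twoscales, replace)

        * Route 2: a single display, here a quantile plot on a log scale
        quantile price, yscale(log) name(qplot, replace)
        ```

        Route 1 declares each scale through its own axis title; route 2 avoids mixing scales altogether.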

        See also my suggestions at #4 in https://www.statalist.org/forums/for...ging-bin-width
        Last edited by Nick Cox; 13 Sep 2023, 12:22.


        • #5
          Hi, I did try histogram, to no avail.
          Do you mean plotting an ECDF and superimposing it on the histogram using different scales?

          Perhaps I didn't understand you clearly in your post #4 here:
          https://www.statalist.org/forums/for...ging-bin-width

          Why did you advise preserve, contract and restore? How will this help?

          Code:
            
           preserve  contract xvarm dataex, count(498)  restore
          Last edited by Tara Boyle; 13 Sep 2023, 13:18.


          • #6
            I am not myself recommending a combination of ECDF and histogram for your problem. In my view that only works well for a much smaller number of bins than you seem to want. I am not keen on mixing scales but, as said, others would no doubt regard that as acceptable.

            In the linked thread the code you quote in #5 was suggested as code that "will yield dataex output that you can show us", and the emphasis here flags the explanation that I gave. That is, my guess is that your full dataset is far too large to show us, and is not needed anyway. All that is needed for any reader to experiment is a frequency table of values and counts. It would also allow experimentation with transformations.
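
            To see what each step does, here is a sketch on the auto data, with mpg standing in for xvarm. It assumes dataex is available (it ships with recent Stata versions; otherwise it can be installed from SSC).

            ```stata
            sysuse auto, clear

            * Keep a copy of the full dataset in the background
            preserve

            * Reduce to one observation per distinct value, with its count in _freq
            contract mpg

            * Print that frequency table as pasteable code; count() raises
            * the limit on how many observations dataex will list
            dataex, count(498)

            * Bring back the full dataset, unchanged
            restore
            ```

            Readers can run the dataex output to recreate the table of values and counts, then weight by _freq when experimenting with graphs.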

            Perhaps as a side-effect of mathematical education, posters here often use abstract names. Knowing what xvarm is here would be informative: e.g. people familiar with the subject-matter will have a sense of what graphics are used for such variables. On the other hand, if there is a need not to reveal sensitive data, that is understood, but the reason for coyness is still worth mentioning.
