
  • Expected frequencies

    Hi,

    I divided my sample into 120 intervals. I want to compare the actual frequencies in each interval with the expected frequencies in that interval (according to a normal distribution of that data).

    Is there any clever way to do this in Stata? I browsed the forum and the internet but cannot find an appropriate example.

    Thanks in advance,

    Jesse

  • #2
    You can use the -normal()- function to calculate the expected frequency in each interval from the normal distribution. Then you use -collapse- to get one observation per interval containing a count of the observed frequencies along with the normal-expected count. Then you can compare them in whatever way you like from there.
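
    A minimal sketch of that workflow, not code posted in the thread. It assumes Stata's bundled auto data with price standing in for the poster's variable, 10 equal-width intervals instead of 120, and illustrative names (interval, observed, expected, lower, upper):

    ```stata
    * Sketch under stated assumptions: auto data, price, and k = 10 are illustrative
    sysuse auto, clear
    local k = 10                          // number of intervals (the thread uses 120)
    quietly summarize price
    local mu = r(mean)
    local sd = r(sd)
    local n  = r(N)
    local lo = r(min)
    local w  = (r(max) - r(min)) / `k'    // common interval width

    * Assign each nonmissing observation to an interval 1..k;
    * min() keeps the sample maximum in the last interval
    generate int interval = min(floor((price - `lo') / `w') + 1, `k') if !missing(price)

    * One observation per interval holding the observed count
    generate byte one = 1
    collapse (count) observed = one, by(interval)

    * Interval limits, then the normal-expected count from -normal()-
    generate double lower = `lo' + (interval - 1) * `w'
    generate double upper = lower + `w'
    generate double expected = `n' * (normal((upper - `mu') / `sd') - normal((lower - `mu') / `sd'))

    list interval observed expected, noobs
    ```

    Two caveats on this sketch: -collapse- produces no row for an interval with zero observations, and because the intervals only span the sample minimum to maximum, the expected counts sum to somewhat less than the sample size.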

    You can also use -graph histogram interval, discrete normal- to get a histogram of the number of observations in each interval, with a normal curve superimposed on it.
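
    A hedged sketch of the same idea applied to the raw variable rather than to an interval index, again assuming the auto data and price as stand-ins and 10 bins as an arbitrary choice:

    ```stata
    * Illustrative only: data set, variable, and bin count are assumptions
    sysuse auto, clear
    histogram price, bin(10) frequency normal
    ```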

    If you want actual code, you need to post an example of your data, using the -dataex- command. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.


    • #3
      Did the job. Thank you very much Clyde!


      • #4
        I have to say that this is a very poor method to check or test for normality. For modest sample sizes many if not most of the expected frequencies will be small. For any sample size, binning throws away information, and most comparisons of observed and expected frequencies lose the ordering or quantitative information on where the bins are. In particular, a discrepancy in either tail of a distribution will usually be much more important than one of the same size in the middle.

        It's hard to beat a normal quantile plot (normal probability plot) for checking on normality and that has long been available in qnorm.
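
        For reference, a minimal usage sketch of -qnorm-, assuming the auto data and price as stand-ins for the poster's variable:

        ```stata
        * Illustrative only: the auto data and price are assumptions
        sysuse auto, clear
        qnorm price
        ```

        Points lying close to the reference line are consistent with normality; systematic curvature, especially in the tails, indicates departures.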


        • #5
          I don't think that the original post was asking for a test of normality. It says "compare the actual frequencies in each interval with the expected frequencies in that interval (according to a normal distribution)." I agree that for most purposes, -qnorm- is the best way to check on normality of a distribution in Stata. But it does not provide a direct comparison of the observed and expected frequencies in pre-defined intervals.


          • #6
            Indeed; but I did say check or test. For any assessment of normality, I can't see any advantage in a 120 x 2 table of frequencies, whether as a table, as a table plotted, or as reduced to a lack of fit measure or test statistic, over plotting the data directly. Conversely, if there's a sound rationale, I would be happy to learn from it. Why 120, for example?


            • #7
              Jesse, in light of posts #4 to #6, perhaps you could explain why you want to compare observed and expected frequencies in the manner you described. What is the real underlying question? Thanks for clarifying.
              --
              Bruce Weaver
              Email: [email protected]
              Version: Stata/MP 18.5 (Windows)