  • Fitting a normal distribution to a CDF

    I have data like
    x cumfract
    0 0.4
    1 0.7
    2 0.9
    The variable cumfract is the proportion of cases with values below the corresponding x value. I would like to fit a normal distribution to this (and return the best fitting mean and variance). Suggestions?


  • #2
    Have you tried Adrian Mander's -cdfplot- (user written program)? To find it, type findit cdfplot in Stata.
    Stata/MP 14.1 (64-bit x86-64)
    Revision 19 May 2016
    Win 8.1



    • #3
      Looks like that procedure does do the fit calculation. However, it takes unit record data as input (and doesn't output the mean and variance).



      • #4
        I've found a convenient way to do this in SAS (I'm sure it's possible in Stata also!). This does the job.
        proc lifereg ;
        weight fract;
        model (lower,upper)= /distribution=normal;
        run;

        Here fract is the fraction in each range (as opposed to the cumulative fraction in my source data above), and lower and upper are the corresponding x values bounding the range.
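For readers without SAS, the same idea (a weighted, interval-censored normal likelihood with no covariates) can be sketched in plain Python. This is only an illustration under stated assumptions: the bin fractions are the differences of the cumfract values in the first post, the two end bins are treated as open-ended, and the crude coordinate search is a stand-in for a proper optimizer, not a description of what proc lifereg does internally. The names bins, loglik, and fit are made up for this sketch.

```python
from math import inf, log
from statistics import NormalDist

# (lower, upper, fraction) per bin, derived from the cumulative fractions
# 0.4, 0.7, 0.9 at x = 0, 1, 2. Open-ended end bins are an assumption.
bins = [(-inf, 0, 0.4), (0, 1, 0.3), (1, 2, 0.2), (2, inf, 0.1)]


def cdf(x, mu, sigma):
    """Normal CDF that also accepts infinite bounds."""
    if x == -inf:
        return 0.0
    if x == inf:
        return 1.0
    return NormalDist(mu, sigma).cdf(x)


def loglik(mu, sigma):
    """Weighted log-likelihood of the binned (interval-censored) data."""
    total = 0.0
    for lo, hi, w in bins:
        p = cdf(hi, mu, sigma) - cdf(lo, mu, sigma)
        total += w * log(max(p, 1e-300))  # guard against log(0)
    return total


def fit(mu=0.0, sigma=1.0, step=0.5, tol=1e-6):
    """Crude coordinate search: move while it helps, else halve the step."""
    best = loglik(mu, sigma)
    while step > tol:
        improved = False
        for dm, ds in ((step, 0), (-step, 0), (0, step), (0, -step)):
            m, s = mu + dm, sigma + ds
            if s <= 0:
                continue  # sigma must stay positive
            ll = loglik(m, s)
            if ll > best:
                mu, sigma, best = m, s, ll
                improved = True
        if not improved:
            step /= 2
    return mu, sigma


mu_hat, sigma_hat = fit()
print(f"mean = {mu_hat:.3f}, sd = {sigma_hat:.3f}, variance = {sigma_hat**2:.3f}")
```

The fitted standard deviation is squared at the end because the original question asked for the variance.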



        • #5
          There is no call for any special fitting command here. The mean and the variance concerned are just the mean and variance you can get from summarize.

          It is not what you ask for but qnorm is in my view an immensely better command for checking normality graphically. See also qplot (SJ). That's one reason among several that I never added this functionality to distplot (SJ).
          Last edited by Nick Cox; 10 Mar 2016, 05:19.



          • #6
            ... although you would need to use class midpoints and weights.
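To make the midpoints-and-weights calculation concrete, here is a small Python sketch. It does not use the data from the first post, because the end categories there have no known midpoints. Instead it assumes hypothetical closed bins with edges at -3, -1, 0, 1, 3 and takes the bin fractions from a standard normal, so the result can be compared with a known variance of 1.

```python
from statistics import NormalDist

# Hypothetical closed bins: these edges are an assumption for illustration,
# not taken from the thread's data.
edges = [-3, -1, 0, 1, 3]
true_dist = NormalDist(0, 1)

# Fraction of a standard normal falling in each bin, rescaled to sum to 1.
weights = [true_dist.cdf(b) - true_dist.cdf(a) for a, b in zip(edges, edges[1:])]
total = sum(weights)
weights = [w / total for w in weights]

# Class midpoints stand in for the unobserved values within each bin.
midpoints = [(a + b) / 2 for a, b in zip(edges, edges[1:])]

# Weighted mean and variance, as a weighted summary would report them.
mean_mid = sum(w * m for w, m in zip(weights, midpoints))
var_mid = sum(w * (m - mean_mid) ** 2 for w, m in zip(weights, midpoints))

print(f"midpoint mean = {mean_mid:.3f}, midpoint variance = {var_mid:.3f}")
```

With these four wide bins the midpoint variance comes out at about 1.43, well above the true value of 1, so the answer depends heavily on how the bins are drawn.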



            • #7
              But I don't know the midpoints for the end categories. (And even if the data were from a bounded distribution, would using the midpoints give the same variance estimate as fitting a distribution?)



              • #8
                I hadn't realised you had so little information: my fault for insufficiently close reading. You need to make some strong assumptions, and your procedure seems closer to assuming normality and estimating parameters than to checking for normality, which is only ever what I do here.

                If you have only a few bins, then using midpoints will not give a good approximation.

                But if all that is known are some sample cumulative probabilities P for values x, then the only information available for assessing normality is invnormal(P), which should be checked for linearity with x. Doing that graphically is what qnorm does with raw data.
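The linearity check can be sketched numerically with the three points from the first post. If the data are normal with mean mu and standard deviation sigma, then invnormal(P) = (x - mu)/sigma, so a straight-line fit of invnormal(P) on x has slope 1/sigma and intercept -mu/sigma. The Python below uses NormalDist().inv_cdf in place of Stata's invnormal() and an ordinary least-squares line. Reading the mean and variance off the fitted line is an extra step assumed here for illustration; the post itself proposes only the check for linearity.

```python
from statistics import NormalDist

# Data from the first post: proportion of cases below each x value.
x = [0, 1, 2]
cumfract = [0.4, 0.7, 0.9]

# Normal quantiles of the cumulative proportions (invnormal(P) in Stata).
z = [NormalDist().inv_cdf(p) for p in cumfract]

# Ordinary least-squares line z = intercept + slope * x, computed by hand.
n = len(x)
xbar = sum(x) / n
zbar = sum(z) / n
slope = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z)) / sum(
    (xi - xbar) ** 2 for xi in x
)
intercept = zbar - slope * xbar

# z = (x - mu) / sigma  implies  slope = 1/sigma and intercept = -mu/sigma.
sigma_hat = 1 / slope
mu_hat = -intercept / slope

# Residuals near zero indicate the points lie close to a straight line.
residuals = [zi - (intercept + slope * xi) for xi, zi in zip(x, z)]

print(f"mean = {mu_hat:.3f}, sd = {sigma_hat:.3f}, variance = {sigma_hat**2:.3f}")
print("residuals:", [round(r, 4) for r in residuals])
```

With only three points this is a weak check of normality, but the residuals show at a glance how far the points are from a straight line.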
