
  • the Confidence Interval on the mean for an observation

    Dear Statalist,

    Could you please explain what "the Confidence Interval on the mean for an observation" is? How can I calculate it using Stata?

  • #2
    Hi Masoumeh,

    It is usually our goal to collect data so that we may make inferences. Typically, due to logistical constraints, we collect a finite sample of individuals so that we may make inferences about a population. If we have collected our sample in an unbiased manner (and the sample is large enough that the Central Limit Theorem kicks in), we can usually make good inferences from our sample.

    However, it is important to keep in mind that the sample we collected is only one of a nearly infinite number of possible samples from the target population we could've drawn. So even though we may be confident in our specific sample, we still want to capture the uncertainty in the fact that we only have a sample of the population, and not the entire population itself.

    So let's say we collect a bunch of data and want to describe the mean of some variable, such as adult height (as in the question you have asked). We can do this in Stata by simply using the following commands:

    webuse nhanes2, clear
    sum height
    This will give us the mean height we are looking for, but do we genuinely think this estimate reflects the population exactly, with no uncertainty? Probably not. So we can compute a measure, called the standard error (SE), which reflects the uncertainty in this estimate. The SE of the mean is the standard deviation of height divided by the square root of the sample size (N). Using the SE, we can then compute a confidence interval which covers 95% of the sampling distribution (this is true, on average, in the long run, but let's skip over the details for now). The confidence bounds are computed as the mean +/- 1.96 * SE. Why 1.96? Because the range from 1.96 standard deviations below to 1.96 standard deviations above the mean of a Normal distribution contains 95% of the distribution.

    We can get these numbers in Stata using the following code:
    mean height
    There you have it: a mean and its confidence interval, which quantifies the uncertainty in our estimate.
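
    As a rough sketch of the arithmetic described above, the SE and the bounds can also be computed by hand from the results that summarize stores. The local macro names (m, se) are only illustrative. Note that mean uses a t critical value rather than 1.96, so its interval may differ very slightly from the hand calculation, although with a sample this large the two should be nearly identical.

    ```stata
    webuse nhanes2, clear
    quietly summarize height

    * SE of the mean = SD / sqrt(N), using results stored by summarize
    local m  = r(mean)
    local se = r(sd) / sqrt(r(N))

    display "Mean:        " `m'
    display "SE:          " `se'
    display "Lower bound: " `m' - 1.96 * `se'
    display "Upper bound: " `m' + 1.96 * `se'

    * Built-in interval for comparison (based on the t distribution)
    mean height
    ```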
    Last edited by Matt Warkentin; 12 Jun 2018, 20:49.


    • #3
      Thanks for the reply. However, I recently saw mean +/- SD (not SE) called a "68% Confidence Interval for individual observations" or a "Confidence Interval on the mean for an observation",

      and mean +/- 1.96 * SD called a "95% normal confidence interval for individual observations around the mean".
      Last edited by Masoumeh Sanagou; 12 Jun 2018, 21:13.


      • #4
        Could you provide more context on where you saw these phrases used? A single observation has no variance and thus no standard error. Perhaps this is in reference to a confidence interval for a model prediction for a new observation.
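
        If that is what was meant, a minimal sketch of such an interval after a linear regression might look like the following. The choice of age as the predictor and the variable names yhat, sef, pi_lo and pi_hi are assumptions made purely for illustration. The stdf option of predict gives the standard error of the forecast, which is the quantity relevant for a new individual observation.

        ```stata
        webuse nhanes2, clear

        * Illustrative model only: height as a linear function of age
        regress height age

        * Fitted values and the standard error of the forecast
        predict double yhat, xb
        predict double sef, stdf

        * 95% interval for a new observation at each value of age
        local t = invttail(e(df_r), 0.025)
        generate double pi_lo = yhat - `t' * sef
        generate double pi_hi = yhat + `t' * sef

        list age height yhat pi_lo pi_hi in 1/5
        ```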


        • #5
          Thanks for your time and replies.

          Please have a look at “Three Ways to Think About the 95% Confidence Interval for a Mean”, then “For X? 95% of individuals have X within ±1.96 sd of µ”.

          Also look at “(3) Means, standard deviations, and 1.96 times the standard deviations (95% normal confidence interval for individual observations around the mean);”

          Evaluation of Descriptors for Solids.
          Figure 22

          Figure 4.22


          • #6
            Interesting question (and I am interested to learn if and how to compute and graph this). Possibly this blog on Cross Validated is helpful.


            • #7
              If I'm not mistaken, the two links you provided are describing the spread of the distribution and not the spread of the mean of the distribution. For the first link under the "Three ways..." heading, they want to describe the range of their data (assuming a Normal distribution) that contains 95% of observations. This is simply the mean +/- 1.96*SD. This differs from the confidence interval around a mean using the SE which describes the likely spread of the MEAN and not of a randomly selected observation.

              I believe this is the exact same meaning as described in the second link. I can't see the two figures you've provided so I can't comment. It isn't a confidence interval for a single specific observation, but rather a confidence interval that would contain 95% of randomly selected individual observations.
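              webuse nhanes2, clear
              sum height
              * lower and upper bounds of the range expected to hold 95% of observations (assuming Normality)
              di r(mean) - (1.96 * r(sd))
              di r(mean) + (1.96 * r(sd))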
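
              One way to check how well this works for a given variable is to count the share of observations that actually fall inside those bounds. This is only a sketch, and the local macro names (n, lo, hi) are illustrative. If height is roughly Normal, the share should come out close to 0.95.

              ```stata
              webuse nhanes2, clear
              quietly summarize height

              * Save the results before count overwrites r()
              local n  = r(N)
              local lo = r(mean) - 1.96 * r(sd)
              local hi = r(mean) + 1.96 * r(sd)

              * Share of individual observations inside mean +/- 1.96 * SD
              quietly count if inrange(height, `lo', `hi')
              display "Share within bounds: " %5.3f r(N) / `n'
              ```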