  • Formula for confidence intervals of mean() - why is this the case?

    Hello! I realize this has been discussed in the past; for example, this post is an in-depth discussion of the topic. However, I am not sure the reasoning is very clear. At the risk of annoying some of the older forum members, I was wondering if I could bring this up again.

    Consider the following example:

    Code:
    clear
    sysuse auto
    
    // CIs estimated simultaneously
    mean price, over(rep78)
    di `e(df_r)'
    
    // CIs estimated for one specific group
    mean price if rep78 == 1
    di `e(df_r)'

    There is an inconsistency in the way confidence intervals are calculated. First, the estimates of the means, as well as their standard errors, are clearly correct. However, the two routines calculate the CIs differently. In both routines, Stata uses the group-specific N to calculate the standard error. In the first routine, though, it uses the overall N (through the degrees of freedom, e(df_r)) to calculate the confidence intervals, while in the second it uses the group-specific N (because the if condition restricts the estimation sample, I suppose?). I think the second routine gives the more intuitive CIs, but I'm curious about the reasoning behind the CIs from the first routine.
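    A sketch of the two calculations as a formula, assuming both routines use the usual t-based interval that mean reports. The standard error term is the same in both; the only term that differs is the degrees of freedom, written here as ν. Here ȳ_g, s_g and n_g are the mean, standard deviation and size of group g, and N is the size of the whole estimation sample. Taking ν = N - 1 for routine 1 is an assumption about what e(df_r) shows in the example above.

```latex
\[
\mathrm{CI}_{95}(\bar{y}_g) = \bar{y}_g \pm t_{0.975,\,\nu}\,\frac{s_g}{\sqrt{n_g}},
\qquad
\nu =
\begin{cases}
N - 1   & \text{routine 1: \texttt{mean price, over(rep78)}} \\
n_g - 1 & \text{routine 2: \texttt{mean price if rep78 == 1}}
\end{cases}
\]
```

    For the rep78 == 1 group, n_g = 2, so routine 2 uses t_{0.975,1} ≈ 12.71. With N = 69 observations that have non-missing rep78, routine 1 would use ν = 68 and a critical value of about 2.0. That would explain why the two intervals differ so much even though the standard error is identical.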

    My questions are the following:
    1. When is it "correct" to use mean Y, over(X) or mean Y if X == x, as opposed to tabstat Y, stats(mean semean) by(X) and hand-rolling the confidence intervals?
    2. Why is there a difference between the N used to calculate the standard error and the N used to calculate the confidence intervals? It seems like an arbitrary choice, and the manual is not clear on this either.
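    For question 1, a minimal hand-rolled version might look like the sketch below. It swaps collapse in for tabstat so the results land in variables. The (semean) statistic of collapse should give the same standard error of the mean as tabstat's semean. The sketch builds each group's CI twice: once with group-specific degrees of freedom (n_g - 1), and once with the e(df_r) stored by mean, over(). The variable names (mean_p, se_p, lo_grp, lo_all, and so on) are made up for illustration.

```stata
sysuse auto, clear

// Store the degrees of freedom that -mean, over()- uses for every group
quietly mean price, over(rep78)
local df_over = e(df_r)

// -mean- drops observations with missing rep78, so do the same here
drop if missing(rep78)

// One row per group: mean, standard error of the mean, and group size
collapse (mean) mean_p=price (semean) se_p=price (count) n_p=price, by(rep78)

// CI with group-specific df (should match -mean price if rep78 == x-)
generate t_grp  = invttail(n_p - 1, 0.025)
generate lo_grp = mean_p - t_grp*se_p
generate hi_grp = mean_p + t_grp*se_p

// CI with the pooled df (should reproduce -mean price, over(rep78)-)
generate t_all  = invttail(`df_over', 0.025)
generate lo_all = mean_p - t_all*se_p
generate hi_all = mean_p + t_all*se_p

list rep78 n_p mean_p se_p lo_grp hi_grp lo_all hi_all, noobs
```

    Note that collapse replaces the data in memory, so wrap it in preserve and restore if the original data are still needed afterwards.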

    Thanks!

    Zaeen