Hello! I realize this has been discussed in the past - for example, this post is an in-depth discussion of the topic. However, I am not sure the reasoning there is very clear. At the risk of annoying some of the older forum members, I was wondering if I can bring this up again.
There is an inconsistency in the way the confidence intervals are calculated across the two routines in the example below: mean price, over(rep78) versus mean price if rep78 == 1. The estimates of the means, as well as their standard errors, are clearly correct in both cases. However, the CIs are calculated differently. Stata uses a group-specific N for calculating the standard error in both routines, yet it uses the overall N for calculating the confidence intervals in the first routine, while it uses the group-specific N in the second (because the if condition restricts the sample, I suppose?). I think the CIs from the second routine are more intuitive, but I am curious about the reasoning behind the CIs estimated via the first routine.
Consider the following example:
Code:
clear
sysuse auto

// CIs estimated simultaneously
mean price, over(rep78)
di `e(df_r)'

// CIs estimated for one specific group
mean price if rep78 == 1
di `e(df_r)'
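To make the discrepancy concrete, here is my own rough sketch of the two interval constructions I have in mind after the first routine. I am assuming that column 1 of e(b) and e(V) corresponds to the rep78 == 1 group, and the group-specific version at the end is only what I would have expected, not what the manual documents:
Code:
// sketch: rebuild the rep78 == 1 interval by hand after -mean price, over(rep78)-
sysuse auto, clear
quietly mean price, over(rep78)
matrix b = e(b)
matrix V = e(V)
local m  = b[1,1]        // assuming column 1 is the rep78 == 1 group
local se = sqrt(V[1,1])

// interval as reported: overall degrees of freedom from e(df_r)
di "reported: " `m' - invttail(e(df_r), 0.025)*`se' " to " `m' + invttail(e(df_r), 0.025)*`se'

// interval I would have expected: group-specific degrees of freedom
quietly count if rep78 == 1 & !missing(price)
local dfg = r(N) - 1
di "expected: " `m' - invttail(`dfg', 0.025)*`se' " to " `m' + invttail(`dfg', 0.025)*`se'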
My questions are the following:
- When is it "correct" to use mean Y, over(X) or mean Y if X == x, as opposed to tabstat Y, stats(mean semean) by(X) and hand-rolling the confidence intervals (a sketch of what I mean follows this list)?
- Why is there a difference between the N used in the calculation of the standard error and the N used for the confidence intervals? It seems like an arbitrary choice, and the manual is not clear on this either.
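For reference, this is the kind of hand-rolled, group-specific calculation I have in mind in the first question - just a sketch for one group, using summarize rather than tabstat to keep it short:
Code:
// hand-rolled 95% CI for one group, using the group-specific N and df
sysuse auto, clear
quietly summarize price if rep78 == 1
local m  = r(mean)
local se = r(sd)/sqrt(r(N))
local df = r(N) - 1
di "mean:   " `m'
di "95% CI: " `m' - invttail(`df', 0.025)*`se' " to " `m' + invttail(`df', 0.025)*`se'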
Thanks!
Zaeen