  • Checking Clustered Standard Errors / p-value

    Hello! I am new to Stata and would appreciate some guidance on this.

    I am running a clustered regression with one dependent variable (access_base) and one independent variable (treatment_pooled). The cluster variable is time_afternoon, and I am also including fixed effects for daily_strata_base.

    In the Stata results, the coefficient is -0.308, the standard error is 0.093, and there are 9,563 observations. The t-statistic is -3.30, so I expected a low p-value. But the p-value is quite high, at 0.187. The command and output are below:


    reg access_base treatment_pooled i.daily_strata_base, vce(cluster time_afternoon)

    Linear regression                               Number of obs     =      9,563
                                                    F(0, 1)           =          .
                                                    Prob > F          =          .
                                                    R-squared         =     0.1383
                                                    Root MSE          =     3.6333

                          (Std. err. adjusted for 2 clusters in time_afternoon_base)
    -----------------------------------------------------------------------------------
                      |               Robust
          access_base | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    ------------------+----------------------------------------------------------------
     treatment_pooled |  -.3077829   .0932648    -3.30   0.187    -1.492825    .8772587


    How can I check whether the standard errors and p-value are calculated correctly? Thank you very much!

  • #2
    It is all calculated correctly. The problem is your use of clustered standard errors. You have only two clusters, so clustered standard errors are not valid. While there is no simple rule of thumb for how many clusters are needed, you shouldn't even consider them with fewer than 15 clusters, and most people would recommend a much larger number, such as 50 or even 100.
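    A quick way to see how many clusters Stata actually used, and the degrees of freedom it carries forward, is to inspect the stored results right after the regression. The sketch below is only an illustration: it uses Stata's shipped auto dataset, where clustering on foreign (which takes just two values) mimics the two-cluster situation in the question. With your own data, the same two display lines can be run directly after your reg command.

```stata
* Illustration only: the auto data ship with Stata, and foreign takes just
* two values, so clustering on it reproduces a two-cluster setup.
sysuse auto, clear
tabulate foreign                          // lists the distinct cluster values

regress price mpg, vce(cluster foreign)

* Stored results left behind by regress with vce(cluster ...)
display "clusters used = " e(N_clust)     // number of clusters
display "residual df   = " e(df_r)        // clusters minus 1; used for the t and F tests
```

    If e(N_clust) is small, the t-based p-values and confidence intervals built on e(df_r) will be very wide, as in the output above.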

    Putting that misuse of clustering aside, and imagining that it was OK: when you have n clusters, the degrees of freedom for your t-statistic are not based on the number of observations in the estimation sample. They are, instead, n-1; in your case that's 2-1 = 1. (You can also see this in the header of the output, which shows F(0, 1) = .; you have only 1 df.) The p-value associated with t = -3.30 with 1 df is, indeed, 0.187 (to 3 decimal places).
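    To confirm the numbers by hand: a t distribution with 1 df is the Cauchy distribution, so the two-sided p-value has the closed form p = (2/π)·arctan(1/|t|) = (2/π)·arctan(1/3.30) ≈ 0.187. The sketch below does the same calculation with Stata's ttail() and invttail() functions, plugging in the coefficient and standard error printed in the output and assuming df = 2 - 1 = 1. It needs no data in memory.

```stata
* Reproduce the reported p-value and 95% CI from the printed output.
* Assumption: df = number of clusters - 1 = 2 - 1 = 1.
local b     = -.3077829                   // coefficient on treatment_pooled
local se    = .0932648                    // cluster-robust standard error
local df    = 2 - 1

local t     = `b'/`se'
local p     = 2*ttail(`df', abs(`t'))     // two-sided p-value from t(1)
local tcrit = invttail(`df', .025)        // 97.5th percentile of t(1)

display "t       = " %8.4f `t'
display "p-value = " %8.4f `p'
display "t crit  = " %8.4f `tcrit'
display "95% CI  = [" %9.6f `b' - `tcrit'*`se' ", " %9.6f `b' + `tcrit'*`se' "]"
```

    The interval comes out as about [-1.4928, 0.8773], matching the output: with 1 df the critical value is about 12.71 rather than the familiar 1.96, which is why an estimate with |t| = 3.30 is nowhere near significant.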
