Dear all,
I am dealing with a fairly complicated cross-sectional data structure ("cross-classified"), which implies non-independence of the errors when analysing my Y. Individuals (N=1200) are nested in two level-two classifications, C1 (8 clusters) and C2 (29 clusters). These interact to form a level-three classification, C3 (195 clusters).
Hence, I started with a variance-component model for my Y.
In Stata notation (model 1):
Code:
mixed Y || _all: R.C1 || C2: || C3:, reml
However, the ICCs for C2 (0.008) and C3 (0.0003) are very low and close to zero. Only the ICC for C1 is larger (0.08).
Hence, I thought of just using a single-level regression model with dummies for C1, as I only have explanatory variables at the C2, C3, and individual levels.
Again in Stata notation:
Code:
reg Y i.C1
Of course, a first conclusion from the variance-component model (1) is that explanatory variables at the C2 and C3 levels can explain little of the variation in Y. However, since theory suggests they matter, I still want to include them in the model.
Now, I was thinking about applying cluster-robust standard errors (CSR). But at which level should they be applied: the interaction level C3, or C1? For clustering at C1, the number of clusters (8) is too small. Clustering at level three (195 clusters) would be more feasible in this regard, I think.
Code:
reg Y i.C1, cluster(C3)
Is this approach wrong, since the ICC was larger only for C1? Should I instead try to bootstrap the CSR for C1, as some suggest when the number of clusters is small?
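For comparison, clustering at C1 despite having only 8 clusters could be sketched with a wild cluster bootstrap. This is only a rough sketch under assumptions: x is a hypothetical explanatory variable (not one from my data), boottest is a user-written command that has to be installed from SSC first, and the Webb weights, number of replications, and seed are illustrative choices rather than recommendations.
Code:
* Sketch only: x stands in for a hypothetical explanatory variable
* boottest is user-written; install once with: ssc install boottest
reg Y x i.C1, cluster(C1)
* Wild cluster bootstrap test of H0: coefficient on x equals zero,
* using Webb six-point weights, often suggested when clusters are few
boottest x, weighttype(webb) reps(9999) seed(12345)
The bootstrap p-value for x can then be set against the one from the regression clustered at C3 to see how sensitive the inference is to the choice of clustering level.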
This topic was cross-posted at: https://stats.stackexchange.com/ques...fied-structure
Kind regards