Again, we have to distinguish two meanings of "clustering."
In terms of specifying a level in a random- or mixed-effects model, you generally wouldn't do it with only 5-10 entities at that level because a random sample of only 5-10 from a normal distribution gives extremely imprecise estimates of its variance. So the results you get from including random effects at that level are just not useful.
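To put a rough number on "extremely imprecise," here is a small simulation sketch in Python using only the standard library. It is an illustration rather than anything from an actual mixed-model fit: it simply draws a handful of group effects from a normal distribution with true variance 1 and looks at how widely the sample variance scatters. The function name, the number of replications, and the seed are all my own arbitrary choices.

```python
import random
import statistics


def variance_estimate_spread(n_groups, true_sd=1.0, n_reps=5000, seed=12345):
    """Return the 2.5th and 97.5th percentiles of the sample variance
    computed from n_groups normal draws, across n_reps simulated samples."""
    rng = random.Random(seed)
    estimates = sorted(
        statistics.variance([rng.gauss(0.0, true_sd) for _ in range(n_groups)])
        for _ in range(n_reps)
    )
    lo = estimates[int(0.025 * n_reps)]
    hi = estimates[int(0.975 * n_reps) - 1]
    return lo, hi


# Compare a few "number of entities at that level" scenarios.
for k in (5, 10, 50, 200):
    lo, hi = variance_estimate_spread(k)
    print(f"{k:>4} groups: 95% of variance estimates fall in "
          f"[{lo:.2f}, {hi:.2f}] (true value 1.00)")
```

With 5 groups the middle 95% of estimates should run from roughly 0.12 to 2.8 times the true variance, which is what the chi-square distribution with 4 degrees of freedom predicts. A real mixed model estimates the variance component from noisy data rather than from the group effects themselves, so if anything this sketch is optimistic.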
In terms of specifying clustering for cluster robust variance estimators, simulations have shown that the standard errors provided by the cluster robust VCE are actually less accurate than the ordinary standard errors when the number of clusters is that small. The cluster robust VCE has good large-sample properties, where "large" refers to the number of clusters, but it works poorly when there are only a few clusters. Unfortunately, as far as I know, this is backed up only by limited simulation studies and, for that reason, there is no rigorous basis for knowing how few is too few and how many is enough under what circumstances.
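For a feel of how the cluster robust standard errors behave with few clusters, here is a second standard-library Python sketch. Everything in it is an assumption made for illustration: a single cluster-level regressor with a true slope of zero, equal-sized clusters, normal random intercepts, and the usual G/(G-1) * (N-1)/(N-K) finite-sample correction. It compares the average cluster robust standard error of the slope with the actual sampling standard deviation of the slope across replications. It does not compare against the ordinary standard errors, so it illustrates only part of the point above.

```python
import math
import random
import statistics


def cluster_se_ratio(n_clusters, cluster_size=10, n_reps=2000, seed=2024):
    """Mean cluster robust SE of the slope divided by the empirical SD of
    the slope estimates. A ratio below 1 means the SE is too small on average."""
    rng = random.Random(seed)
    G, m = n_clusters, cluster_size
    N, K = G * m, 2  # K counts the intercept and the slope

    # Cluster-level regressor, drawn once and held fixed across replications.
    x = [rng.gauss(0.0, 1.0) for _ in range(G)]
    xbar = sum(x) / G
    d = [xg - xbar for xg in x]            # centered regressor per cluster
    sxx = m * sum(dg * dg for dg in d)     # sum of squares over all N rows
    correction = (G / (G - 1)) * ((N - 1) / (N - K))

    slopes, ses = [], []
    for _ in range(n_reps):
        # y = cluster random intercept + idiosyncratic noise (true slope is 0).
        cluster_sums = []
        for _g in range(G):
            u = rng.gauss(0.0, 1.0)
            cluster_sums.append(sum(u + rng.gauss(0.0, 1.0) for _ in range(m)))
        ybar = sum(cluster_sums) / N

        # OLS slope for a simple regression with an intercept.
        slope = sum(d[g] * cluster_sums[g] for g in range(G)) / sxx

        # Slope element of the sandwich: sum over clusters of the squared
        # within-cluster sum of (centered x * residual), divided by sxx^2.
        meat = 0.0
        for g in range(G):
            resid_sum = cluster_sums[g] - m * ybar - m * slope * d[g]
            meat += (d[g] * resid_sum) ** 2
        ses.append(math.sqrt(correction * meat / sxx ** 2))
        slopes.append(slope)

    return statistics.mean(ses) / statistics.stdev(slopes)


for g in (5, 10, 50):
    print(f"{g:>3} clusters: mean robust SE / true SD of slope = "
          f"{cluster_se_ratio(g):.2f}")
```

In this setup the ratio should sit well below 1 with 5 clusters and move toward 1 as the number of clusters grows. How quickly it gets there depends on the design, which is consistent with there being no general rule for how many clusters is enough.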