Dear Statalist,
I'm struggling with a negative intraclass correlation while testing the null hypothesis that a multilevel model is not needed, and I am not sure how to proceed. The data are a panel of financial statistics covering only three years.
The dependent variable is each county's government spending on personnel in each of those years. The yearly figures are probably clustered within counties, since a given county is more likely to spend similar amounts on its personnel from year to year than different counties are. (GKZ = county identifier, perso_pk = personnel spending)
I know that this is probably a sign of model misspecification. (https://www.stata.com/support/faqs/s...luster-option/)
Code:
. mixed perso_pk || GKZ: if asample==2

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log likelihood = -9049.8381
Iteration 1:   log likelihood = -9042.8251
Iteration 2:   log likelihood = -9042.7822
Iteration 3:   log likelihood = -9042.7822

Computing standard errors:

Mixed-effects ML regression                     Number of obs     =      1,188
Group variable: GKZ                             Number of groups  =        396

                                                Obs per group:
                                                              min =          3
                                                              avg =        3.0
                                                              max =          3

                                                Wald chi2(0)      =          .
Log likelihood = -9042.7822                     Prob > chi2       =          .

------------------------------------------------------------------------------
    perso_pk |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   715.5544   14.19398    50.41   0.000     687.7347    743.3741
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
GKZ: Identity                |
                  var(_cons) |   9.43e-14   2.10e-13      1.19e-15    7.46e-12
-----------------------------+------------------------------------------------
               var(Residual) |   239345.2   9820.468        220851      259388
------------------------------------------------------------------------------
LR test vs. linear model: chibar2(01) = 0.00          Prob >= chibar2 = 1.0000

. estimates store nmonelev

. estat icc

Intraclass correlation

------------------------------------------------------------------------------
                       Level |        ICC   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
                         GKZ |   3.94e-19          0      3.94e-19    3.94e-19
------------------------------------------------------------------------------

. mixed perso_pk if asample==2

Mixed-effects ML regression                     Number of obs     =      1,188
                                                Wald chi2(0)      =          .
Log likelihood = -9042.7822                     Prob > chi2       =          .

------------------------------------------------------------------------------
    perso_pk |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   715.5544   14.19398    50.41   0.000     687.7347    743.3741
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
               var(Residual) |   239345.2   9820.451        220851      259388
------------------------------------------------------------------------------

. estimates store nmnolev

. lrtest nmonelev nmnolev

Likelihood-ratio test                                 LR chi2(1)  =       0.00
(Assumption: nmnolev nested in nmonelev)              Prob > chi2 =     1.0000

Note: The reported degrees of freedom assumes the null hypothesis is not on
      the boundary of the parameter space. If this is not true, then the
      reported test is conservative.
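For reference, this is how I read the ICC line in the output above: it is the county-level variance as a share of the total variance. The symbols \sigma_u^2 for var(_cons) and \sigma_e^2 for var(Residual) are my own notation, not Stata's. Since mixed keeps var(_cons) non-negative, the estimate ends up at the zero boundary rather than being printed as a negative number.

```latex
\rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}
     = \frac{9.43 \times 10^{-14}}{9.43 \times 10^{-14} + 239345.2}
     \approx 3.94 \times 10^{-19}
```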
I think the negative ICC results from the extreme within-county variance that some or most clusters show because of a drop in the dependent variable (personnel spending falls from the first survey year to the second). (Here's a graph of the counties' spending in each of the three panel years; the numbers next to the extreme values are the county identifiers, GKZ.)
My first question is: do you think this average drop in spending can cause the negative intraclass correlation? In fact, the intraclass correlation is almost 0.9 if I calculate it only for the years after the drop (2008 and 2009).
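To make the first question concrete, here is a minimal simulation sketch. It does not use my data: the number of counties and years matches the output above, but the effect sizes (county SD 200, noise SD 60, a drop of 1000 after the first wave) and the variable names u and wave are invented for illustration.

```stata
* Hypothetical simulation, not the real data: 396 counties, 3 waves,
* stable county effects plus a large common drop after the first wave.
clear
set seed 12345
set obs 396
generate GKZ = _n
generate u = rnormal(0, 200)        // persistent county effect (assumed SD)
expand 3
bysort GKZ: generate wave = _n      // 1, 2, 3 stand in for the panel years
generate perso_pk = 700 + u + 1000*(wave == 1) + rnormal(0, 60)

* Null model without a wave effect: the common drop is counted as
* within-county variance and can swamp the county effect
mixed perso_pk || GKZ:
estat icc

* Same model with wave indicators absorbing the common drop
mixed perso_pk i.wave || GKZ:
estat icc
```

By construction the ICC net of the wave effect is 200^2/(200^2 + 60^2), roughly 0.92. If my reasoning is right, the first model should push var(_cons) towards zero, while the second should recover a value close to that.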
My second question is how to proceed. The problem is that I don't have an independent variable that could explain the general drop in spending and thereby possibly resolve the misspecification. (I'm not trying to explain the drop itself. I'm interested in what determines the level of spending, such as GDP, so I would like to more or less ignore the drop.) Since I'm testing the null hypothesis that a multilevel model is not needed, I could conclude that it isn't and go on with an ordinary regression. Yet this would ignore the fact that there is an intraclass correlation, and the misspecification is likely to affect other models as well. Is there a way to work around the lack of an adequate variable to explain the drop, other than dropping the first year of the panel (year 2000)? What do you think is the best solution?
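One workaround I can think of, sketched here only as a possibility, is to absorb the common drop with year indicators instead of a substantive covariate and then repeat the test. The variable name year and the stored-estimate names are assumptions; the variable may be named differently in my file.

```stata
* Sketch only: assumes a survey-year variable called year (hypothetical name)
mixed perso_pk i.year || GKZ: if asample==2
estimates store nmyear_onelev
estat icc

* Comparison model without the county-level random intercept
mixed perso_pk i.year if asample==2
estimates store nmyear_nolev

lrtest nmyear_onelev nmyear_nolev
```

This would keep all three years in the sample, but I'm not sure whether treating the drop as a pure year effect is defensible.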
Any ideas, hints and good advice are highly appreciated. Thank you and have a nice day,
Benedikt
I'm using Stata 14.0.