
  • How to deal with a negative intraclass correlation

    Dear Statalist,

    I'm struggling with a negative intraclass correlation while testing whether a multilevel model is warranted at all, and I'm not sure how to proceed. My data are a panel of financial statistics covering only three years.
    The dependent variable is county government spending on personnel in each of those years. These yearly amounts are probably clustered within counties, since spending within the same county is likely to be more similar from year to year than spending across different counties. (GKZ = county variable, perso_pk = personnel spending)

    Code:
     mixed perso_pk || GKZ: if asample==2
    
    Performing EM optimization: 
    
    Performing gradient-based optimization: 
    
    Iteration 0:   log likelihood = -9049.8381  
    Iteration 1:   log likelihood = -9042.8251  
    Iteration 2:   log likelihood = -9042.7822  
    Iteration 3:   log likelihood = -9042.7822  
    
    Computing standard errors:
    
    Mixed-effects ML regression                     Number of obs     =      1,188
    Group variable: GKZ                             Number of groups  =        396
    
                                                    Obs per group:
                                                                  min =          3
                                                                  avg =        3.0
                                                                  max =          3
    
                                                    Wald chi2(0)      =          .
    Log likelihood = -9042.7822                     Prob > chi2       =          .
    
    ------------------------------------------------------------------------------
        perso_pk |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _cons |   715.5544   14.19398    50.41   0.000     687.7347    743.3741
    ------------------------------------------------------------------------------
    
    ------------------------------------------------------------------------------
      Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
    -----------------------------+------------------------------------------------
    GKZ: Identity                |
                      var(_cons) |   9.43e-14   2.10e-13      1.19e-15    7.46e-12
    -----------------------------+------------------------------------------------
                   var(Residual) |   239345.2   9820.468        220851      259388
    ------------------------------------------------------------------------------
    LR test vs. linear model: chibar2(01) = 0.00          Prob >= chibar2 = 1.0000
    
    . estimates store nmonelev
    
    . estat icc
    
    Intraclass correlation
    
    ------------------------------------------------------------------------------
                           Level |        ICC   Std. Err.     [95% Conf. Interval]
    -----------------------------+------------------------------------------------
                             GKZ |   3.94e-19          0      3.94e-19    3.94e-19
    ------------------------------------------------------------------------------
    
    mixed perso_pk if asample==2
    
    Mixed-effects ML regression                     Number of obs     =      1,188
    
                                                    Wald chi2(0)      =          .
    Log likelihood = -9042.7822                     Prob > chi2       =          .
    
    ------------------------------------------------------------------------------
        perso_pk |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _cons |   715.5544   14.19398    50.41   0.000     687.7347    743.3741
    ------------------------------------------------------------------------------
    
    ------------------------------------------------------------------------------
      Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
    -----------------------------+------------------------------------------------
                   var(Residual) |   239345.2   9820.451        220851      259388
    ------------------------------------------------------------------------------
    
    . estimates store nmnolev
    
    . 
    . lrtest nmonelev nmnolev
    
    Likelihood-ratio test                                 LR chi2(1)  =      0.00
    (Assumption: nmnolev nested in nmonelev)              Prob > chi2 =    1.0000
    
    Note: The reported degrees of freedom assumes the null hypothesis is not on the boundary of
          the parameter space.  If this is not true, then the reported test is conservative.
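    Both stored models have the same log likelihood, so the likelihood-ratio statistic is exactly zero. The worked arithmetic below is a sketch based on the displayed (rounded) log likelihoods. It also shows why mixed's own chibar2(01) test differs from lrtest's chi2(1) in general: the null value var(_cons) = 0 lies on the boundary of the parameter space, so the reference distribution is a 50:50 mixture of a point mass at zero and a chi2(1).

```latex
\begin{aligned}
\mathrm{LR} &= 2\,\bigl(\ell_{\text{nmonelev}} - \ell_{\text{nmnolev}}\bigr)
             = 2\,\bigl(-9042.7822 - (-9042.7822)\bigr) = 0 \\
p_{\bar\chi^2_{01}} &= \tfrac{1}{2}\Pr\bigl(\chi^2_0 \ge \mathrm{LR}\bigr)
                      + \tfrac{1}{2}\Pr\bigl(\chi^2_1 \ge \mathrm{LR}\bigr)
\end{aligned}
```

    For any positive statistic the mixture p-value is half the chi2(1) p-value, which is why lrtest calls its own result conservative; with LR = 0 both p-values equal 1, matching the 1.0000 reported by both tests above.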
    I know that this is probably a sign of model misspecification (https://www.stata.com/support/faqs/s...luster-option/).
    I think the negative ICC results from the extreme variance that some or most clusters show because of a drop in the dependent variable (personnel spending) from the first survey year to the second. (Here's a graph of the counties' spending in each of the three panel years; the labels on the extreme values are the county IDs, GKZ.)

    [Graph: Personalaufwand pro Kopf_Kreis_Jahr.jpg (personnel spending per capita, by county and year)]


    My first question is: do you think this average drop in spending can cause the negative intraclass correlation? In fact, the intraclass correlation is almost 0.9 if I calculate it only for the years after the drop (2008 and 2009).
    My second question is how to proceed. The problem is that I don't have an independent variable that could explain the general drop in spending and thereby perhaps fix the misspecification. (I'm not trying to explain the drop itself, but rather the conditions that drive spending, such as GDP, so I would like to more or less ignore the drop.) Since I'm testing the null hypothesis of whether to apply a multilevel model at all, I could conclude that I should drop the idea and go ahead with an ordinary regression. Yet this would ignore the fact that there is an intraclass correlation, and the misspecification problem is likely to show up in other models as well. Is there a way to work around not having an adequate variable to explain the drop, other than dropping the first year of the panel (2000)? What do you think is the best solution?

    Any ideas, hints and good advice are highly appreciated. Thank you and have a nice day,

    Benedikt

    I'm using Stata 14.0

  • #2
    Can you help me to see the negative ICC? I can only see an ICC of 3.94e-19, practically equivalent to zero (but not below zero).
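    For reference, in a two-level random-intercept model the ICC that estat icc reports is the county-level share of the total variance. Plugging in the two rounded variance components from the output reproduces the displayed value (a back-of-the-envelope check, not Stata's exact computation):

```latex
\widehat{\mathrm{ICC}}
  = \frac{\widehat{\operatorname{var}}(\text{\_cons})}
         {\widehat{\operatorname{var}}(\text{\_cons}) + \widehat{\operatorname{var}}(\text{Residual})}
  = \frac{9.43\times10^{-14}}{9.43\times10^{-14} + 239345.2}
  \approx 3.94\times10^{-19}
```

    Both components are variances, so this ratio can approach zero but cannot fall below it.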



    • #3
      I got the same impression, albeit viewing the output on a smartphone.

      This is scientific notation; you may wish to look the term up. In short, the exponent tells you where the decimal point goes: when the exponent is negative, move the point that many places to the left.
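      A worked example of the notation: "e-19" means "times 10 to the power -19", so the decimal point moves 19 places to the left and the ICC is, for all practical purposes, zero.

```latex
\begin{aligned}
3.94\text{e-}2  &= 3.94\times10^{-2}  = 0.0394 \\
3.94\text{e-}19 &= 3.94\times10^{-19} = 0.\underbrace{00\ldots0}_{18\text{ zeros}}394
\end{aligned}
```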
      Best regards,

      Marcos



      • #4
        Originally posted by Benedikt Walker
        My first question is: do you think this average drop in spending can cause the negative intraclass correlation? In fact, the intraclass correlation is almost 0.9 if I calculate it only for the years after the drop (2008 and 2009).
        My second question is how to proceed. The problem is that I don't have an independent variable that could explain the general drop in spending and thereby perhaps fix the misspecification. . . . Is there a way to work around not having an adequate variable to explain the drop, other than dropping the first year of the panel (2000)?
        It's curious that you don't include survey year as a fixed effect in order to capture the decline in per-capita county government spending.
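        In equation form, adding survey year as a fixed effect gives a random-intercept model like the one below. This is a sketch assuming the three panel years are 2000, 2008 and 2009 (as mentioned in the original post) with 2000 as the base year; D_{2008,t} and D_{2009,t} are hypothetical names for the year indicators that i.jahr would create, i indexes counties (GKZ) and t survey years. The year coefficients absorb the common drop in spending, so that drop no longer ends up in the within-county residuals, where it can depress the estimated ICC.

```latex
\mathrm{perso\_pk}_{it}
  = \beta_0 + \gamma_{2008} D_{2008,t} + \gamma_{2009} D_{2009,t}
  + u_i + \varepsilon_{it},
\qquad u_i \sim N(0,\sigma_u^2),\quad \varepsilon_{it} \sim N(0,\sigma_\varepsilon^2)
```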

        Anyway, you've fitted a random-effects regression model to your data and, as others have mentioned, mixed's output doesn't show a negative intraclass correlation coefficient. It can't. The way that mixed is implemented, its variance components are constrained to be strictly positive.
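        One way to see why: as I understand mixed's parameterization, each variance component is estimated on the log-standard-deviation scale (the lns1_1_1 and lnsig_e equations in e(b)), so any finite estimate maps back to a strictly positive variance, and the ICC is confined to the open interval (0, 1). That would explain why the county variance in the original output collapsed toward 9.43e-14 instead of reaching zero or going negative. Here theta denotes the log standard deviation:

```latex
\sigma_u^2 = e^{2\theta_u} > 0,\qquad
\sigma_\varepsilon^2 = e^{2\theta_\varepsilon} > 0
\;\Longrightarrow\;
0 < \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2} < 1
```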

        So, to show whether your data demonstrate a negative intraclass correlation, re-express your model like this:
        Code:
        mixed perso_pk i.jahr if asample==2 || GKZ: , noconstant residuals(exchangeable)
        (Use the actual variable name for your survey year variable.)
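        With no random intercept and residuals(exchangeable), each county's three yearly residuals share one variance sigma^2 and one common covariance c, and the within-county correlation (the ICC under this parameterization) is their ratio. A sketch of the structure, assuming T = 3 observations per county as in the output above, shows that this correlation may be negative, but only down to -1/(T-1) = -1/2, because every eigenvalue of the covariance matrix must stay positive:

```latex
\operatorname{Var}(\boldsymbol{\varepsilon}_i) = \sigma^2
\begin{pmatrix} 1 & \rho & \rho \\ \rho & 1 & \rho \\ \rho & \rho & 1 \end{pmatrix},
\qquad \rho = \frac{c}{\sigma^2},
\qquad \text{eigenvalues: } \sigma^2(1+2\rho),\ \sigma^2(1-\rho),\ \sigma^2(1-\rho)
\;\Longrightarrow\; -\tfrac{1}{2} < \rho < 1
```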

        What do you think is the best solution?
        Given the heteroskedasticity shown in your graph, and given what you've mentioned (intraclass correlation coefficient for the last two years alone is 0.9 while overall it's zero), the following might be a better-fitting model of your survey's data.
        Code:
        mixed perso_pk i.jahr if asample==2 || GKZ: , noconstant residuals(unstructured, t(jahr))
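        For comparison, residuals(unstructured, t(jahr)) lets each year have its own residual variance and each pair of years its own covariance: T(T+1)/2 = 6 parameters for T = 3, instead of the 2 in the exchangeable model. A sketch, assuming the years are 2000, 2008 and 2009:

```latex
\operatorname{Var}(\boldsymbol{\varepsilon}_i) =
\begin{pmatrix}
\sigma_{2000}^2 & \sigma_{2000,2008} & \sigma_{2000,2009} \\
\sigma_{2000,2008} & \sigma_{2008}^2 & \sigma_{2008,2009} \\
\sigma_{2000,2009} & \sigma_{2008,2009} & \sigma_{2009}^2
\end{pmatrix},
\qquad
\rho_{jk} = \frac{\sigma_{jk}}{\sigma_j \sigma_k}
```

        This is how such a model can accommodate a high 2008/2009 correlation alongside weak correlations with 2000, as well as different spreads in different years. Because the exchangeable structure is a constrained special case of the unstructured one, the two fits could in principle be compared with lrtest.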
        You can also fit these same models (and, in general, very flexibly model your ideas about what's going on) using sem and gsem.

