Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to assess heteroskedasticity with mixed-effects linear regression models?

    Hi!
    I am working with data that have multilevel structure: repeated measures (2 time points), nested within 102 individuals (idtot), nested within 89 households (idhse), nested 12 villages (village). First, I'd like to run very simple models with "time" as my only predictor; I want to know if the dependant variable (kesspd6 in this case) changes between baseline and follow-up.

    Here is my syntax: mixed kesspd6 time, || village:, covariance(identity) || idhse:, covariance(identity) || idtot:, covariance(identity)

    I was told I should assess the heteroskedasticity, which I think means to plot the residuals against fitted values. Could anyone help me to do so?

    Many thanks!

    Karine

  • #2
    Maybe something like the following will give you the residuals-versus-fitted (fixed effects) plot that you're looking for. Take a look at the part beginning at the "Begin here" comment; the rest is just to create a fictional dataset for illustration.
    Code:
    clear *
    set seed 1420220
    
    quietly set obs 12
    generate byte village = _n
    generate double village_u = rnormal()
    
    quietly expand `=floor(89/12)'
    generate byte household = _n
    generate double household_u = rnormal()
    
    generate int tally = runiformint(`=floor(109/89)', `=ceil(109/89)') // not exact
    quietly expand tally
    generate int individual = _n
    generate double individual_u = rnormal()
    
    forvalues time = 1/2 {
        generate double kesspd6`time' = village_u + household_u + individual_u + rnormal()
    }
    
    quietly reshape long kesspd6, i(individual) j(time)
    
    *
    * Begin here
    *
    mixed kesspd6 i.time || village: || household: || individual:
    
    predict double xb, xb
    predict double re, residuals
    
    graph twoway scatter re xb, mcolor(black) msize(small) ylabel( , angle(horizontal) nogrid)
    
    *
    * Optional
    *
    estimates store Pooled
    
    mixed kesspd6 i.time || village: || household: || individual: , residuals(independent, by(time))
    lrtest . Pooled
    
    exit

    Comment


    • #3
      Thanks for posting these commands. I tested the predict double xb, xb and predict double re, residuals, and plotting the scatter plot on my own dataset, and I get residuals which looks heteroscedastic (different variance). I then run my mixed command over again, this time with the ", residuals(independent, by (time)) and I get a Stata output which seems to me like it is allowing for heteroscedastic variance and thus reporting the estimates accounting for the heteroscedasticity. Am I correct? :-) I added my scatter plot and my Stata output below.

      I'm also wondering if anyone has an opinion whether one should go for an approach with allowing for heteroscedastic variance, or if it is better to use robust standard errors (and get worse p-values and lose several significant findings). Click image for larger version

Name:	scatter re xb after used predic double xb double re with depvar tnfalpha and random intercept.png
Views:	1
Size:	18.2 KB
ID:	1442427

      ****STATA commands with output****

      mixed tnfalpha time3 time3_ptsd i.ptsd_any3 ///
      if pas_over95_percentil_19_stykk==1 || id:, cov(unstructured) dfmethod(repeated)
      Note: single-variable random-effects specification in id equation; covariance structure set to identity

      Performing EM optimization:

      Performing gradient-based optimization:

      Iteration 0: log likelihood = -975.91813
      Iteration 1: log likelihood = -975.91813

      Computing standard errors:

      Computing degrees of freedom:

      Mixed-effects ML regression Number of obs = 330
      Group variable: id Number of groups = 112

      Obs per group:
      min = 1
      avg = 2.9
      max = 3
      DF method: Repeated DF: min = 110.00
      avg = 163.00
      max = 216.00

      F(3, . ) = .
      Log likelihood = -975.91813 Prob > F = .

      ------------------------------------------------------------------------------
      tnfalpha | Coef. Std. Err. t P>|t| [95% Conf. Interval]
      -------------+----------------------------------------------------------------
      time3 | -.0035024 .0438655 -0.08 0.936 -.0899617 .0829569
      time3_ptsd | .168248 .0714347 2.36 0.019 .0274497 .3090463
      |
      ptsd_any3 |
      Yes PTSD | 2.24675 1.473598 1.52 0.130 -.6735754 5.167076
      _cons | 2.212733 .8918672 2.48 0.015 .4452615 3.980205
      ------------------------------------------------------------------------------

      ------------------------------------------------------------------------------
      Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
      -----------------------------+------------------------------------------------
      id: Identity |
      var(_cons) | 50.07602 7.049178 38.00201 65.98619
      -----------------------------+------------------------------------------------
      var(Residual) | 7.906566 .756715 6.554232 9.537927
      ------------------------------------------------------------------------------
      LR test vs. linear model: chibar2(01) = 328.91 Prob >= chibar2 = 0.0000

      . estimates store M5

      . estat ic

      Akaike's information criterion and Bayesian information criterion

      -----------------------------------------------------------------------------
      Model | Obs ll(null) ll(model) df AIC BIC
      -------------+---------------------------------------------------------------
      M5 | 330 . -975.9181 6 1963.836 1986.631
      -----------------------------------------------------------------------------



      mixed tnfalpha time3 time3_ptsd i.ptsd_any3 ///
      > if pas_over95_percentil_19_stykk==1 || id:, cov(unstructured) dfmethod(repeated) residuals(independent, by(time3))

      Note: single-variable random-effects specification in id equation; covariance structure set to identity

      Obtaining starting values by EM:

      Performing gradient-based optimization:

      Iteration 0: log likelihood = -975.91813
      Iteration 1: log likelihood = -967.55957
      Iteration 2: log likelihood = -965.5496
      Iteration 3: log likelihood = -965.11209
      Iteration 4: log likelihood = -965.03766
      Iteration 5: log likelihood = -965.03303
      Iteration 6: log likelihood = -965.033

      Computing standard errors:

      Computing degrees of freedom:

      Mixed-effects ML regression Number of obs = 330
      Group variable: id Number of groups = 112

      Obs per group:
      min = 1
      avg = 2.9
      max = 3
      DF method: Repeated DF: min = 110.00
      avg = 163.00
      max = 216.00

      F(3, . ) = .
      Log likelihood = -965.033 Prob > F = .

      ------------------------------------------------------------------------------
      tnfalpha | Coef. Std. Err. t P>|t| [95% Conf. Interval]
      -------------+----------------------------------------------------------------
      time3 | -.0002223 .0392183 -0.01 0.995 -.077522 .0770773
      time3_ptsd | .1954901 .0638117 3.06 0.002 .0697168 .3212634
      |
      ptsd_any3 |
      Yes PTSD | 2.049005 1.378937 1.49 0.140 -.6837256 4.781735
      _cons | 2.189979 .8343173 2.62 0.010 .5365584 3.8434
      ------------------------------------------------------------------------------

      ------------------------------------------------------------------------------
      Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
      -----------------------------+------------------------------------------------
      id: Identity |
      var(_cons) | 48.16031 6.584984 36.83868 62.96142
      -----------------------------+------------------------------------------------
      Residual: Independent, |
      by time3 |
      0: var(e) | 1.29829 1.12809 .2364608 7.128275
      5: var(e) | 10.05536 1.676319 7.252616 13.94121
      11: var(e) | 13.49873 2.419107 9.500557 19.17947
      ------------------------------------------------------------------------------
      LR test vs. linear model: chi2(3) = 350.68 Prob > chi2 = 0.0000

      Note: LR test is conservative and provided only for reference.

      .
      end of do-file

      . lrtest . pooled

      Likelihood-ratio test LR chi2(2) = 21.77
      (Assumption: pooled nested in .) Prob > chi2 = 0.0000
      Last edited by Helge Toft; 02 May 2018, 02:37.

      Comment

      Working...
      X