How to assess heteroskedasticity with mixed-effects linear regression models?

Karine Perreault

Join Date: Nov 2017

Posts: 1
#1

How to assess heteroskedasticity with mixed-effects linear regression models?

29 Nov 2017, 19:45

Hi!
I am working with data that have multilevel structure: repeated measures (2 time points), nested within 102 individuals (idtot), nested within 89 households (idhse), nested 12 villages (village). First, I'd like to run very simple models with "time" as my only predictor; I want to know if the dependant variable (kesspd6 in this case) changes between baseline and follow-up.

Here is my syntax: mixed kesspd6 time, || village:, covariance(identity) || idhse:, covariance(identity) || idtot:, covariance(identity)

I was told I should assess the heteroskedasticity, which I think means to plot the residuals against fitted values. Could anyone help me to do so?

Many thanks!

Karine
Tags: None

Joseph Coveney

Join Date: Apr 2014
Posts: 4420

30 Nov 2017, 03:05

Maybe something like the following will give you the residuals-versus-fitted (fixed effects) plot that you're looking for. Take a look at the part beginning at the "Begin here" comment; the rest is just to create a fictional dataset for illustration.

Code:

clear *
set seed 1420220

quietly set obs 12
generate byte village = _n
generate double village_u = rnormal()

quietly expand `=floor(89/12)'
generate byte household = _n
generate double household_u = rnormal()

generate int tally = runiformint(`=floor(109/89)', `=ceil(109/89)') // not exact
quietly expand tally
generate int individual = _n
generate double individual_u = rnormal()

forvalues time = 1/2 {
    generate double kesspd6`time' = village_u + household_u + individual_u + rnormal()
}

quietly reshape long kesspd6, i(individual) j(time)

*
* Begin here
*
mixed kesspd6 i.time || village: || household: || individual:

predict double xb, xb
predict double re, residuals

graph twoway scatter re xb, mcolor(black) msize(small) ylabel( , angle(horizontal) nogrid)

*
* Optional
*
estimates store Pooled

mixed kesspd6 i.time || village: || household: || individual: , residuals(independent, by(time))
lrtest . Pooled

exit

Comment

Helge Toft

Join Date: Apr 2018

Posts: 8
#3

02 May 2018, 02:35

Thanks for posting these commands. I tested the predict double xb, xb and predict double re, residuals, and plotting the scatter plot on my own dataset, and I get residuals which looks heteroscedastic (different variance). I then run my mixed command over again, this time with the ", residuals(independent, by (time)) and I get a Stata output which seems to me like it is allowing for heteroscedastic variance and thus reporting the estimates accounting for the heteroscedasticity. Am I correct? :-) I added my scatter plot and my Stata output below.

I'm also wondering if anyone has an opinion whether one should go for an approach with allowing for heteroscedastic variance, or if it is better to use robust standard errors (and get worse p-values and lose several significant findings).

****STATA commands with output****

mixed tnfalpha time3 time3_ptsd i.ptsd_any3 ///
if pas_over95_percentil_19_stykk==1 || id:, cov(unstructured) dfmethod(repeated)
Note: single-variable random-effects specification in id equation; covariance structure set to identity

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0: log likelihood = -975.91813
Iteration 1: log likelihood = -975.91813

Computing standard errors:

Computing degrees of freedom:

Mixed-effects ML regression Number of obs = 330
Group variable: id Number of groups = 112

Obs per group:
min = 1
avg = 2.9
max = 3
DF method: Repeated DF: min = 110.00
avg = 163.00
max = 216.00

F(3, . ) = .
Log likelihood = -975.91813 Prob > F = .

------------------------------------------------------------------------------
tnfalpha | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
time3 | -.0035024 .0438655 -0.08 0.936 -.0899617 .0829569
time3_ptsd | .168248 .0714347 2.36 0.019 .0274497 .3090463
|
ptsd_any3 |
Yes PTSD | 2.24675 1.473598 1.52 0.130 -.6735754 5.167076
_cons | 2.212733 .8918672 2.48 0.015 .4452615 3.980205
------------------------------------------------------------------------------

------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity |
var(_cons) | 50.07602 7.049178 38.00201 65.98619
-----------------------------+------------------------------------------------
var(Residual) | 7.906566 .756715 6.554232 9.537927
------------------------------------------------------------------------------
LR test vs. linear model: chibar2(01) = 328.91 Prob >= chibar2 = 0.0000

. estimates store M5

. estat ic

Akaike's information criterion and Bayesian information criterion

-----------------------------------------------------------------------------
Model | Obs ll(null) ll(model) df AIC BIC
-------------+---------------------------------------------------------------
M5 | 330 . -975.9181 6 1963.836 1986.631
-----------------------------------------------------------------------------

mixed tnfalpha time3 time3_ptsd i.ptsd_any3 ///
> if pas_over95_percentil_19_stykk==1 || id:, cov(unstructured) dfmethod(repeated) residuals(independent, by(time3))

Note: single-variable random-effects specification in id equation; covariance structure set to identity

Obtaining starting values by EM:

Performing gradient-based optimization:

Iteration 0: log likelihood = -975.91813
Iteration 1: log likelihood = -967.55957
Iteration 2: log likelihood = -965.5496
Iteration 3: log likelihood = -965.11209
Iteration 4: log likelihood = -965.03766
Iteration 5: log likelihood = -965.03303
Iteration 6: log likelihood = -965.033

Computing standard errors:

Computing degrees of freedom:

Mixed-effects ML regression Number of obs = 330
Group variable: id Number of groups = 112

Obs per group:
min = 1
avg = 2.9
max = 3
DF method: Repeated DF: min = 110.00
avg = 163.00
max = 216.00

F(3, . ) = .
Log likelihood = -965.033 Prob > F = .

------------------------------------------------------------------------------
tnfalpha | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
time3 | -.0002223 .0392183 -0.01 0.995 -.077522 .0770773
time3_ptsd | .1954901 .0638117 3.06 0.002 .0697168 .3212634
|
ptsd_any3 |
Yes PTSD | 2.049005 1.378937 1.49 0.140 -.6837256 4.781735
_cons | 2.189979 .8343173 2.62 0.010 .5365584 3.8434
------------------------------------------------------------------------------

------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity |
var(_cons) | 48.16031 6.584984 36.83868 62.96142
-----------------------------+------------------------------------------------
Residual: Independent, |
by time3 |
0: var(e) | 1.29829 1.12809 .2364608 7.128275
5: var(e) | 10.05536 1.676319 7.252616 13.94121
11: var(e) | 13.49873 2.419107 9.500557 19.17947
------------------------------------------------------------------------------
LR test vs. linear model: chi2(3) = 350.68 Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.

.
end of do-file

. lrtest . pooled

Likelihood-ratio test LR chi2(2) = 21.77
(Assumption: pooled nested in .) Prob > chi2 = 0.0000

Last edited by Helge Toft; 02 May 2018, 02:37.
Comment

Announcement

How to assess heteroskedasticity with mixed-effects linear regression models?

Comment

Comment