Modeling heterogeneous group variances

Leonardo Guizzetti

22 Jan 2019, 17:31

I would appreciate it if Statalist could help with my understanding of the mixed-effects syntax and the model it implies.

Suppose that I have a randomly sampled group of people (-person_id-) on whom I have measured two different characteristics (the values are stored in -x-, and -measure- indicates which characteristic was measured). I would like to compute the difference between the means of the two measures while modeling a separate (heterogeneous) variance for each type of measure. Because both measures come from the same people, a t-test for independent groups is clearly inappropriate.

To help, I've provided a small, reproducible dataset with models that I think do what I'm asking.

Code:

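clear
set seed 423

* Simulate two correlated measures (x1, x2) for 100 people
set obs 100
* Means, standard deviations, and correlation matrix used for the simulation
mat def M = (4, 7)
mat def SD = (1.5, 0.75)
mat def R = (1, 0.2 \ 0.2, 1)
drawnorm x1 x2, mean(M) sd(SD) corr(R)
gen int person_id = _n
order person_id, first
* Reshape to long form: one row per person per measure
reshape long x , i(person_id) j(measure)
compress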

In the first model, I estimate a separate mean for each measurement type. The residuals are independent, with a separate residual variance estimated for each measurement type, so the two measurements are not correlated within subjects. The second level is included so that each person forms their own cluster, but I'm not estimating a person-specific intercept since I think this is handled by the residual variance structure in this case.
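To make the implied model explicit, here is how I read the first specification, written out as a sketch. The symbols (i for person, j for measure, n for the number of people) are my own notation and not anything Stata reports; the last line assumes balanced, complete data as in this example.

```latex
\[
\begin{aligned}
x_{ij} &= \mu_j + \varepsilon_{ij}, \qquad i = 1,\dots,n,\quad j = 1,2,\\
\varepsilon_{ij} &\sim N(0,\ \sigma_j^2), \qquad
\operatorname{Cov}(\varepsilon_{i1}, \varepsilon_{i2}) = 0,\\
\operatorname{Var}(\hat\mu_2 - \hat\mu_1) &= \frac{\sigma_1^2 + \sigma_2^2}{n}.
\end{aligned}
\]
```

Under this reading the difference in means is treated as if the two measures came from unrelated samples with unequal variances.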

Code:

. mixed x ibn.measure, nocons || (person_id : , nocons), resid(ind, by(measure)) reml dfmethod(kroger) 
* output omitted

Mixed-effects REML regression                   Number of obs     =        200
Group variable: person_id                       Number of groups  =        100

                                                Obs per group:
                                                              min =          2
                                                              avg =        2.0
                                                              max =          2
DF method: Kenward-Roger                        DF:           min =      99.00
                                                              avg =      49.50
                                                              max =      99.00

                                                F(2,   263.30)    =    3910.89
Log restricted-likelihood = -297.96965          Prob > F          =     0.0000

------------------------------------------------------------------------------
           x |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     measure |
          1  |   3.912262   .1333588    29.34   0.000     3.647649    4.176875
          2  |   7.102253   .0850039    83.55   0.000     6.935648    7.268857
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
person_id:           (empty) |
-----------------------------+------------------------------------------------
Residual: Independent,       |
    by measure               |
                   1: var(e) |   1.778458   .2527791      1.346043    2.349785
                   2: var(e) |   .7225666   .1027011      .5468817    .9546901
------------------------------------------------------------------------------
LR test vs. linear model: chi2(1) = 19.43                 Prob > chi2 = 0.0000

Note: The reported degrees of freedom assumes the null hypothesis is not on the boundary of the parameter space.  If this is not true, then the reported
      test is conservative.

The second model is similar to the one above, except that I allow the two measurements within a subject to be correlated.
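Written out as a sketch, and using my own notation (i for person, j for measure, n for the number of people), I understand the unstructured residual specification to imply the following. The last line assumes balanced, complete data as in this example.

```latex
\[
\begin{aligned}
x_{ij} &= \mu_j + \varepsilon_{ij}, \qquad i = 1,\dots,n,\quad j = 1,2,\\
\begin{pmatrix}\varepsilon_{i1}\\ \varepsilon_{i2}\end{pmatrix}
&\sim N\!\left(\begin{pmatrix}0\\ 0\end{pmatrix},\
\begin{pmatrix}\sigma_1^2 & \sigma_{12}\\ \sigma_{12} & \sigma_2^2\end{pmatrix}\right),\\
\operatorname{Var}(\hat\mu_2 - \hat\mu_1) &= \frac{\sigma_1^2 + \sigma_2^2 - 2\sigma_{12}}{n}.
\end{aligned}
\]
```

The covariance term is what makes the difference in means behave like a paired comparison rather than a comparison of independent groups.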

Code:

. mixed x ibn.measure, nocons || (person_id : , nocons), resid(un, t(measure)) reml dfmethod(kroger) 
* output omitted

Mixed-effects REML regression                   Number of obs     =        200
Group variable: person_id                       Number of groups  =        100

                                                Obs per group:
                                                              min =          2
                                                              avg =        2.0
                                                              max =          2
DF method: Kenward-Roger                        DF:           min =      99.00
                                                              avg =      99.00
                                                              max =      99.00

                                                F(2,    98.00)    =    3463.37
Log restricted-likelihood = -293.14194          Prob > F          =     0.0000

------------------------------------------------------------------------------
           x |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     measure |
          1  |   3.912262   .1333588    29.34   0.000     3.647649    4.176875
          2  |   7.102253   .0850039    83.55   0.000     6.933587    7.270919
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
person_id:           (empty) |
-----------------------------+------------------------------------------------
Residual: Unstructured       |
                     var(e1) |   1.778458   .2527791      1.346043    2.349785
                     var(e2) |   .7225667   .1027011      .5468817    .9546901
                  cov(e1,e2) |   .3455619   .1191073       .112116    .5790078
------------------------------------------------------------------------------
LR test vs. linear model: chi2(2) = 29.09                 Prob > chi2 = 0.0000

Note: The reported degrees of freedom assumes the null hypothesis is not on the boundary of the parameter space.  If this is not true, then the reported
      test is conservative.

Based on these, I seem to be able to recover the same group means and variance-covariance structure. So am I on the right path?
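For the difference in means itself, a minimal sketch of what I have in mind is below. It assumes the long-form example data are still in memory, refits the unstructured model quietly so that its estimates are active, and then cross-checks against a paired t test on the wide-form data. My expectation (an assumption, not something I have confirmed) is that the two approaches should agree closely with balanced, complete data.

```stata
* Assumes the long-form example data created earlier are in memory.
* Refit the unstructured-residual model quietly so its estimates are active.
quietly mixed x ibn.measure, nocons || (person_id : , nocons), ///
    resid(un, t(measure)) reml dfmethod(kroger)

* Difference in means (measure 2 minus measure 1) with its standard error
lincom 2.measure - 1.measure

* Cross-check: paired t test on the same data in wide form
preserve
reshape wide x, i(person_id) j(measure)
ttest x2 == x1
restore
```

The -preserve-/-restore- pair is only there so the reshape to wide form does not disturb the long-form data used by -mixed-.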
