Mixed model specification for neuroimaging data

Gianfranco Di Gennaro

Join Date: Oct 2020
Posts: 144

Mixed model specification for neuroimaging data

17 May 2026, 00:41

Hi everyone,

I am working on a longitudinal neuroimaging dataset and I would appreciate some advice regarding the most appropriate analytical strategy.

I have cortical thickness measurements from 68 different brain regions for each subject.

The data are in long format, where each row represents:

subject (idd)
cortical area (area_id)
timepoint
cortical thickness (a)

Each subject was measured at two timepoints only: baseline and follow-up

The follow-up interval is variable across subjects and is represented by the variable "Months_between_MRI"

At baseline Months_between_MRI is always 0

The main clinical predictors are:

"sclerosi" (presence of sclerosis)
"resistant" (drug resistance)
"toniclonic" (presence of seizures)

Additional covariates:

Age
Gender
ICV
Epilepsy_duration

A datasubste with only 7 out 0f 68 brain areas is the following:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float idd int Months_between_MRI byte area_id double a byte(Age Gender) double(Epilepsy_duration Onset_years) float(sclerosi resistant toniclonic)
1  0 1  2.45 30 0  2 28 0 0 1
1  0 2 2.406 30 0  2 28 0 0 1
1  0 3 2.544 30 0  2 28 0 0 1
1  0 4 1.987 30 0  2 28 0 0 1
1  0 5 2.917 30 0  2 28 0 0 1
1  0 6 2.667 30 0  2 28 0 0 1
1 44 1 2.362 33 0  5 28 0 0 1
1 44 2 2.408 33 0  5 28 0 0 1
1 44 3 2.521 33 0  5 28 0 0 1
1 44 4  1.95 33 0  5 28 0 0 1
1 44 5 2.677 33 0  5 28 0 0 1
1 44 6  2.59 33 0  5 28 0 0 1
2  0 1 2.253 36 0 15 21 1 1 1
2  0 2 1.969 36 0 15 21 1 1 1
2  0 3 2.105 36 0 15 21 1 1 1
2  0 4 1.668 36 0 15 21 1 1 1
2  0 5 1.083 36 0 15 21 1 1 1
2  0 6 1.683 36 0 15 21 1 1 1
2 72 1 2.414 42 0 21 21 1 1 1
2 72 2 2.708 42 0 21 21 1 1 1
2 72 3 2.163 42 0 21 21 1 1 1
2 72 4  2.06 42 0 21 21 1 1 1
2 72 5 1.344 42 0 21 21 1 1 1
2 72 6 1.872 42 0 21 21 1 1 1
3  0 1  2.08 34 0 16 18 0 1 1
3  0 2 2.637 34 0 16 18 0 1 1
3  0 3 2.274 34 0 16 18 0 1 1
3  0 4 1.796 34 0 16 18 0 1 1
3  0 5  .944 34 0 16 18 0 1 1
3  0 6   1.3 34 0 16 18 0 1 1
3 38 1 2.039 37 0 19 18 0 1 1
3 38 2 2.516 37 0 19 18 0 1 1
3 38 3 2.092 37 0 19 18 0 1 1
3 38 4 2.457 37 0 19 18 0 1 1
3 38 5  .982 37 0 19 18 0 1 1
3 38 6 2.094 37 0 19 18 0 1 1
4  0 1 2.035 44 0  2 42 0 0 0
4  0 2 1.936 44 0  2 42 0 0 0
4  0 3 2.137 44 0  2 42 0 0 0
4  0 4 1.374 44 0  2 42 0 0 0
4  0 5   .83 44 0  2 42 0 0 0
4  0 6 1.563 44 0  2 42 0 0 0
4 36 1 1.786 47 0  5 42 0 0 0
4 36 2 1.805 47 0  5 42 0 0 0
4 36 3 1.993 47 0  5 42 0 0 0
4 36 4 2.314 47 0  5 42 0 0 0
4 36 5 1.433 47 0  5 42 0 0 0
4 36 6 1.628 47 0  5 42 0 0 0
5  0 1 2.002 49 0 15 34 0 1 1
5  0 2 1.783 49 0 15 34 0 1 1
5  0 3 2.322 49 0 15 34 0 1 1
5  0 4 1.403 49 0 15 34 0 1 1
5  0 5  .945 49 0 15 34 0 1 1
5  0 6  1.45 49 0 15 34 0 1 1
5 18 1 2.468 51 0 17 34 0 1 1
5 18 2  1.99 51 0 17 34 0 1 1
5 18 3 1.985 51 0 17 34 0 1 1
5 18 4 2.149 51 0 17 34 0 1 1
5 18 5  .942 51 0 17 34 0 1 1
5 18 6 1.599 51 0 17 34 0 1 1
6  0 1 2.484 25 0  5 20 0 0 1
6  0 2 3.064 25 0  5 20 0 0 1
6  0 3 2.445 25 0  5 20 0 0 1
6  0 4 1.973 25 0  5 20 0 0 1
6  0 5 3.265 25 0  5 20 0 0 1
6  0 6  2.84 25 0  5 20 0 0 1
6 66 1 2.072 31 0 11 20 0 0 1
6 66 2 2.731 31 0 11 20 0 0 1
6 66 3 1.656 31 0 11 20 0 0 1
6 66 4 2.286 31 0 11 20 0 0 1
6 66 5 1.459 31 0 11 20 0 0 1
6 66 6  1.88 31 0 11 20 0 0 1
7  0 1  2.11 24 0 11 13 0 1 1
7  0 2 2.386 24 0 11 13 0 1 1
7  0 3 2.184 24 0 11 13 0 1 1
7  0 4  1.28 24 0 11 13 0 1 1
7  0 5  .905 24 0 11 13 0 1 1
7  0 6 1.741 24 0 11 13 0 1 1
7 34 1 2.135 27 0 14 13 0 1 1
7 34 2 2.514 27 0 14 13 0 1 1
7 34 3 1.857 27 0 14 13 0 1 1
7 34 4 2.569 27 0 14 13 0 1 1
7 34 5   1.4 27 0 14 13 0 1 1
7 34 6 1.964 27 0 14 13 0 1 1
8  0 1 2.523 15 1  0 15 0 0 0
8  0 2 3.226 15 1  0 15 0 0 0
8  0 3 2.501 15 1  0 15 0 0 0
8  0 4 2.098 15 1  0 15 0 0 0
8  0 5 3.217 15 1  0 15 0 0 0
8  0 6 2.716 15 1  0 15 0 0 0
8 19 1 2.299 16 1  1 15 0 0 0
8 19 2 2.448 16 1  1 15 0 0 0
8 19 3 2.307 16 1  1 15 0 0 0
8 19 4 2.347 16 1  1 15 0 0 0
8 19 5 2.289 16 1  1 15 0 0 0
8 19 6 2.462 16 1  1 15 0 0 0
9  0 1 1.985 32 0 21 11 1 1 1
9  0 2 2.497 32 0 21 11 1 1 1
9  0 3 1.735 32 0 21 11 1 1 1
9  0 4 1.562 32 0 21 11 1 1 1
end

The cortical areas have very different absolute thickness scales, so using raw thickness values makes interpretation difficult across regions.

Initially, I tried z-scoring within area
andlongitudinal mixed models such as:

Code:

mixed z_area  i.sclerosi##c.Months_between_MRI i.resistant##c.Months_between_MRI  i.toniclonic##c.Months_between_MRI  Age Gender ICV Epilepsy_duration || idd: || area_id:

And then:

Code:

margins sclerosi#resistant#toniclonic, ///
    at(Months_between_MRI=(0 30 60 120))
marginsplot, xdimension(Months_between_MRI)

However:

interpretation became less intuitive, plotting trajectories was difficult and I am unsure whether this is the best strategy given that I only have two timepoints.

Current idea

I am now considering computing percentage change from baseline pct_change = 100*(followup - baseline)/baseline

and then analyzing it by keeping only the follow-up observation.

This would produce one observation per subject-area combination representing the rate of cortical thinning.

Then fitting something like:

Code:

mixed pct_change i.sclerosi i.responder  i.toniclonic  c.baseline_area Age Gender ICV Epilepsy_duration  || area_id:

Does this approach seem statistically reasonable given only two timepoints, variable follow-up duration, many cortical regions per subject?

In the end...Which analytical strategy would you suggest? My main aim is to identify which subjects, in terms of hippocampal sclerosis, drug resistance, and presence of tonic-clonic seizures,show the greatest cortical thickness reduction over time.
Thanks for your time!
Gianfranco

Last edited by Gianfranco Di Gennaro; 17 May 2026, 00:41. Reason: multilevel modelling

Tags: None

Joseph Coveney

Join Date: Apr 2014

Posts: 4540
#2

17 May 2026, 15:41

Originally posted by Gianfranco Di Gennaro View Post

Does this approach seem statistically reasonable given only two timepoints, variable follow-up duration, many cortical regions per subject?

You have the fixed and limited set of anatomical locations (area_id) in the model as the random effect. That doesn’t seem ideal.

In the end...Which analytical strategy would you suggest?

I recommend reducing the number of dimensions to something manageable and interpretable. One approach is to go back to your neurologist friends and ask them to provide you a list of the local surface areas circumscribed by (or represented by) the 68 cortical regions (area IDs) they’ve chosen, even if just relative to each other—they don’t have to be cm²; what you’re looking for is a set of reference values by which you can weight the 68 cortical thickness values in a dimension-reduction approach.

With this set of (absolute or relative) local surface areas, you can compute cortical volumes for each of the 68 cortical regions and sum them to get the total 68-region cortical volume—a single value for each patient at each of the two time points. This allows you to reshape wide to time-zero and time-t total cortical volumes and fit an ANCOVA model with the time-zero volume as the covariate along with the predictors of interest. You can examine diagnostic plots (rvfplot, qnorm, pnorm) and entertain a RESET-like modification of the ANCOVA model as warranted for baseline volume or observation interval.

Alternatively, you can use the change scores as you've shown, but rather than weight them by the baseline and retain all 68 values, weight the change scores by the list from your neurologist colleagues, and then sum the weighted change scores to get a single volume lost outcome variable.

This 68-cortical-region thing comes across as kinda phrenological. I assume that your colleagues have biological support for the premise that you can “identify which subjects, in terms of hippocampal sclerosis, drug resistance, and presence of tonic-clonic seizures, show the greatest cortical thickness reduction over time“.
Comment
Dave Airey

Join Date: Apr 2014

Posts: 416
#3

18 May 2026, 14:09

Probably this 68-cortical region thing comes from an MRI atlas like... Desikan RS, et al., PMID: 16530430, which seems to divide the whole cortex into 34 parcels per hemisphere. Unless the PI has specific hypotheses for cortical areas by disease conditions, I'd be inclined to summarize across region and time too, before trying a mixed model. Do you have lots of subjects?
Comment
Joseph Coveney

Join Date: Apr 2014

Posts: 4540
#4

19 May 2026, 02:02

Originally posted by Dave Airey View Post

Probably this 68-cortical region thing comes from an MRI atlas . . .

Yeah, it's paywalled and I cannot tell whether the atlas gives areas for the regions, if that's what.Gianfranco's colleague used.

Unless the PI has specific hypotheses for cortical areas by disease conditions, I'd be inclined to summarize across region and time too, before trying a mixed model

Gianfranco's post doesn't seem to imply that, and then the question becomes how to summarize across regions if the atlas or colleague does not have areas for cortical regions. Forming interaction terms with each of the three predictors and follow-up interval (i.area_id##i.(sclerosi resistant toniclonic)##c.Months_between_MRI) will accommodate the difference in thickness between regions that troubles Gianfranco, but it leaves a lot of temptation for phrenology by the PI.

In the absence of a list of areas or other weights to allow summarizing across region, an alternative dimension-reducing approach to #2 above might be to consider the 68 cortical regions as judges of patients as targets, and fit a cross-classified random effects model, perhaps something like the following.

Code:

mixed a i.(sclerosi resistant toniclonic)##c.Months_between_MRI || _all: R.area_id || idd: c.Months_between_MRI

It has its own assumptions, and I'd be curious as to a critique of that approach given such circumstances.
Comment

Gianfranco Di Gennaro

Join Date: Oct 2020
Posts: 144

20 May 2026, 09:05

Dear Dave Airey and Joseph Coveney , thanky you for your efforts.
Yes, I think the areas' classification is from Desikan RS, et al.,

However, I attempted a (very) raw data synthesis by summing all 68 areas for each patient at baseline and follow-up to obtain an "overall thickness" (or "cortical integrity", if you prefer).
Then I ran a very simple ANCOVA in which I use the followup-baseline delta and adjusting for baseline and other covariates.

The results I obtained are consistent with physiology and expected deltas.

Code:

egen n_area = rownonmiss(a1-a68)
egen sum_area = rowtotal(a1-a68)




keep idd time sum_area ///
     Months_between_MRI ///
     sclerosi resistant toniclonic ///
     Age Gender Epilepsy_duration

reshape wide ///
    sum_area ///
    Months_between_MRI ///
    Age ///
    Epilepsy_duration, ///
    i(idd) j(time)
    

    


    
gen delta_mm = sum_area2 - sum_area1
regress delta_mm sum_area1 Months_between_MRI2 Epilepsy_duration1 Age1 Gender i.sclerosi i.resistant i.toniclonic
    margins sclerosi#resistant#toniclonic, saving(margins_results, replace)
    use margins_results, clear
    gen gruppo = _n

    
    
    
label define group ///
1 "No Scl / No Res / No TC" ///
2 "No Scl / No Res / TC" ///
3 "No Scl / Res / No TC" ///
4 "No Scl / Res / TC" ///
5 "Scl / No Res / No TC" ///
6 "Scl / No Res / TC" ///
7 "Scl / Res / No TC" ///
8 "Scl / Res / TC"

label values gruppo group
    
gsort -_margin
twoway ///
(scatter _margin gruppo, msize(medlarge)) ///
(rcap _ci_lb _ci_ub gruppo), ///
xlabel(1/8, valuelabel angle(45)) ///
ytitle("Adjusted global cortical lost (mm) in thickness") ///
xtitle("")
    
-------------------------------------------------------------------------------------
           delta_mm | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
--------------------+----------------------------------------------------------------
          sum_area1 |  -.9623607   .0695779   -13.83   0.000    -1.101916   -.8228052
Months_between_MRI2 |   .0155737   .0701308     0.22   0.825     -.125091    .1562383
 Epilepsy_duration1 |   .1867963   .1280367     1.46   0.150    -.0700128    .4436053
               Age1 |   .2890841   .1255578     2.30   0.025      .037247    .5409212
             Gender |   2.353688    3.71765     0.63   0.529    -5.102974     9.81035
         1.sclerosi |  -9.383257   3.676014    -2.55   0.014    -16.75641   -2.010107
        1.resistant |  -3.822598   4.367367    -0.88   0.385    -12.58243     4.93723
       1.toniclonic |  -.7848044   4.255132    -0.18   0.854    -9.319519     7.74991
              _cons |   133.1212    12.4609    10.68   0.000     108.1278    158.1146
-------------------------------------------------------------------------------------

These are the margins:

Click image for larger version

Name: image_38064.png
Views: 1
Size: 182.5 KB
ID: 1786161

On the other hand, by using a mixed model I have a different interpretation:

Code:

. mixed a i.(sclerosi resistant toniclonic)##c.Months_between_MRI Age Gender ICV Epilepsy_duration if caso==1 & area_id<69 || _all: R.area_id || idd: c.Months_between_MRI
margins sclerosi#resistant#toniclonic, atmeans at(Months_between_MRI=(0 30 60 120))
marginsplot
  


. . mixed a i.(sclerosi resistant toniclonic)##c.Months_between_MRI Age Gender ICV Epilepsy_duration if caso==1 & area_id<69 || _all: R.area
> _id || idd: c.Months_between_MRI

Performing EM optimization ...

Performing gradient-based optimization:
Iteration 0:  Log likelihood = -3633.5422  
Iteration 1:  Log likelihood = -3633.5422  

Computing standard errors ...

Mixed-effects ML regression                             Number of obs =  8,364

        Grouping information
        -------------------------------------------------------------
                        |     No. of       Observations per group
         Group variable |     groups    Minimum    Average    Maximum
        ----------------+--------------------------------------------
                   _all |          1      8,364    8,364.0      8,364
                    idd |         62         68      134.9        136
        -------------------------------------------------------------

                                                        Wald chi2(11) =  35.21
Log likelihood = -3633.5422                             Prob > chi2   = 0.0002

-------------------------------------------------------------------------------------------------
                              a | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
--------------------------------+----------------------------------------------------------------
                     1.sclerosi |   .0329639    .093033     0.35   0.723    -.1493774    .2153052
                    1.resistant |  -.3035172   .1093375    -2.78   0.006    -.5178149   -.0892196
                   1.toniclonic |   .1022727     .10962     0.93   0.351    -.1125785    .3171239
             Months_between_MRI |   -.003485   .0032172    -1.08   0.279    -.0097906    .0028207
                                |
  sclerosi#c.Months_between_MRI |
                             1  |  -.0094558   .0039387    -2.40   0.016    -.0171756    -.001736
                                |
 resistant#c.Months_between_MRI |
                             1  |    .004087   .0051298     0.80   0.426    -.0059673    .0141412
                                |
toniclonic#c.Months_between_MRI |
                             1  |   .0030731   .0051027     0.60   0.547     -.006928    .0130742
                                |
                            Age |   .0013671   .0032011     0.43   0.669    -.0049069     .007641
                         Gender |   .1404521   .1129306     1.24   0.214    -.0808877     .361792
                            ICV |   6.64e-07   3.35e-07     1.98   0.047     7.81e-09    1.32e-06
              Epilepsy_duration |   .0004757   .0031939     0.15   0.882    -.0057841    .0067355
                          _cons |   1.340091   .4769221     2.81   0.005     .4053409    2.274841
-------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
-----------------------------+------------------------------------------------
_all: Identity               |
              var(R.area_id) |   .0536837   .0094138      .0380696    .0757018
-----------------------------+------------------------------------------------
idd: Independent             |
     var(Months_between_MRI) |    .000206   .0000411      .0001393    .0003046
                  var(_cons) |   .0972191   .0178125      .0678882    .1392223
-----------------------------+------------------------------------------------
               var(Residual) |   .1266521   .0019825      .1228256    .1305978
------------------------------------------------------------------------------
LR test vs. linear model: chi2(3) = 4900.90               Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.

.
end of do-file

Click image for larger version

Name: image_38065.png
Views: 1
Size: 258.1 KB
ID: 1786171

Residuals are well distributed and mode fits data reasonably well.

I well know that the two models have very different interpretations. On the other side I can't explaine trajectories of groups. for example, the yellow groups are people with toniclonic seizures and I expect a decreasing trajectory.

Do you have any ideas? Did I miss-spceify the model?

I thank you again.
Gianfranco

Last edited by Gianfranco Di Gennaro; 20 May 2026, 09:08.

Comment

Dave Airey

Join Date: Apr 2014

Posts: 416
#6

21 May 2026, 14:46

The aggregate approach looks to have turned out as well as could be expected! You might aggregate by functional cortical areas too (e.g., frontal, temporal, parietal, occiptial, limbic). Following that I would do a model for each region. Within region I would give each subject its own intercept and random slope. This approach is easier to interpret but requires careful multiplicity adjustment and ignores between region covariance missing multivariate signal perhaps. I would apply FDR (qqvalue ssc command) to a contrast of interest across regions. To this model you could explore adding a fixed effect for region of interest (either at the network aggregate level or the 68 regions) and test the time*clinical_factor*region interaction. But here you will have to figure out the covariance structure and hope you have a very large sample size. I would probably stick with a region by region analysis if you are interested in region by change over time results. It was not clear to me if you were mostly interested in change by region or also interested in actual rate of change by region. GEE might be another route if time is a nuisance factor.
Comment
Joseph Coveney

Join Date: Apr 2014

Posts: 4540
#7

22 May 2026, 00:20

Originally posted by Gianfranco Di Gennaro View Post

DI ran a very simple ANCOVA in which I use the followup-baseline delta and adjusting for baseline and other covariates.

The results I obtained are consistent with physiology and expected deltas.

A couple of points. First, you fitted a regression model of a change score against the baseline value. To quote the relevant part of your code:

Code:

gen delta_mm = sum_area2 - sum_area1 regress delta_mm sum_area1 . . .

If you're going to fit an ANCOVA with baseline thickness value as the covariate, then I recommend that you use the follow-up thickness value as the outcome (dependent) variable and not the change score.

Second, your colleague's objective was something like "identify which subjects, in terms of hippocampal sclerosis, drug resistance, and presence of tonic-clonic seizures,show the greatest cortical thickness reduction over time". An ANCOVA model for that would include the interaction of each of those three patient characteristics and time. So, in order to get at something more aligned with your colleague's goal by means of an ANCOVA model, I recommend something more like the following.

Code:

regress sum_area2 c.sum_area1 i.(sclerosi resistant toniclonic)##c.Months_between_MRI2

The model that you did fit does show that at least hippocampal sclerosis is associated with a reduction of cortical thickness between the baseline and follow-up MRI scans, but it doesn't tease out its relationship with time.

On the other hand, by using a mixed model I have a different interpretation:
. . . I can't explaine trajectories of groups. for example, the yellow groups are people with toniclonic seizures and I expect a decreasing trajectory.

Do you have any ideas? Did I miss-spceify the model?

First, no, you didn't misspecify this second model,* but it does differ from your first in that it does model the patient-characteristic × follow-up-inteval interaction that your ANCOVA doesn't—see my comments about that above—and so this second model's focus is properly on the temporal relationship.

Second, according to both models, presence of tonic-clonic seizures doesn't do anything at all, and so I'm not sure why you expect to see a decreasing trajectory for it. The only one of your three "main clinical predictors" that bears any association with changes in cortical thickness is hippocampal sclerosis—for that both models are in complete agreement. The marginsplot that you show illustrates this: all of the groups of patients with hippocampal sclerosis have a decreasing trajectory and for the remaining groups the trajectory is basically flat.

*I didn't suggest it in #4 above, but you might want to consider including in the model a term for covariance between the random intercept and the random slope. You can to that by adding the option covariance(unstructured). I notice that you have only 62 patients' data, and so you might be reaching a limit as to what you can model, but including the covariance term might improve efficiency a little, assuming that it converges.
Comment

Announcement

Mixed model specification for neuroimaging data

Comment

Comment

Comment

Comment

Comment

Comment