  • GSEM - Moderated mediation with multilevel latent variables

    Dear Statalist users,

    A data example is at the bottom of the post.
    I am unsure how to properly account for the specific structure of the dataset and would be happy for any help.
    I ran an experiment in which I analyze teams of two members each, identified by team in the example. Treatment is a random treatment at the team level, i.e. each team either receives the treatment or not (0/1). Moderator_treat is another treatment at the team level (0/1), which I expect to moderate the relationship between treatment and outcome_binary (or also the latent variable outcome). Team_process1, team_process2 and team_process3 are three different items (which measure a team process) and were asked of all participants (i.e. there is variation within the team) on a Likert scale from 1-7. I also have a second outcome, which is likewise measured by three items on a Likert scale from 1-7. The graph below sums up the structure (only for the binary outcome).

    [Attachment: Example.jpg (model structure for the binary outcome)]

    I am trying the following code (or some similar variations), but I am unsure whether it is correct:

    Code:
    * Interaction
    gen treat_mod = treatment * moderator_treat
    * Structure
    xtset team
    * Run multilevel SEM using gsem
    gsem ///
      (team_process1 team_process2 team_process3 <- Team_process@1) ///
      (outcome_binary <- treatment moderator_treat treat_mod Team_process), ///
      latent(Team_process) ///
      group(team) ///
      nocapslatent
    Note: The code may produce errors; that could be an issue with the data example, since I plugged in random numbers.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(treatment moderator_treat team team_process1 team_process2 team_process3 outcome_binary outcome1 outcome2 outcome3)
    1 1  1 5 6 7 1 2 5 3
    1 1  1 6 5 4 1 6 6 7
    1 0  2 5 4 5 1 5 7 5
    1 0  2 4 3 2 1 2 2 1
    0 0  3 1 3 2 1 2 2 1
    0 0  3 2 5 3 1 3 3 3
    1 0  4 6 6 7 1 5 4 4
    1 0  4 5 7 5 1 4 3 3
    0 1  5 2 2 1 1 7 7 7
    0 1  5 2 2 1 1 5 6 5
    1 1  6 3 3 3 0 3 4 3
    1 1  6 5 4 4 0 4 5 4
    1 0  7 1 2 2 0 3 2 3
    0 0  7 4 3 3 0 5 4 5
    0 0  8 7 7 7 0 4 3 2
    1 0  8 5 6 5 0 1 3 2
    1 0  9 3 4 3 0 2 5 3
    1 0  9 4 5 4 0 6 6 7
    0 1 10 3 2 3 1 3 2 3
    0 1 10 6 6 6 1 6 6 6
    end

  • #2
    Originally posted by Julian Nuessle
    I am unsure how to properly account for the specific structure of the dataset and would be happy for any help. . . . The graph below sums up the structure . . .
    First, I think that you don't have multilevel latent variables in this study, at least with respect to the binary outcome variable, for which each team has a single value. Rather, it seems to me that you have a single-level latent factor, Team Process, which is measured by two administrations of a three-item instrument: the same questionnaire given to each of a team's two members. (With only two participants per team, it would seem problematic to try to define Team Process as a second-order latent factor in a CFA submodel.) This interpretation is consistent with the way the variables are laid out in your diagram.

    Thus, my first suggestion is to reshape your dataset wide so that each team is represented by a single row and the two sets of three items are indexed by team member, arbitrarily assigned a within-team identifier of 1 or 2. Then I suggest constraining the factor loadings of the respective pairs of the three items on the Team Process latent factor to be equal. Finally, I suggest taking advantage of Stata's factor-variable notation to express the interaction of the treatment and "moderator" assignments to the teams.

    So, perhaps consider something like the following as a start.
    Code:
    set seed 818726003
    generate double randu = runiform() // random tie-breaker for ordering members within team
    isid team randu, sort
    by team: generate byte pid = _n // Within-team participant ID
    * One row per team; item variables are suffixed with the member's pid
    reshape wide team_process1 team_process2 team_process3 outcome1 outcome2 outcome3, i(team) j(pid)
    
    #delimit ;
    gsem 
        (outcome_binary <- i.treatment##i.moderator_treat TeamProcess, probit)
        (   team_process11@a team_process12@a
            team_process21@b team_process22@b
            team_process31@c team_process32@c <- TeamProcess, oprobit);
    #delimit cr
    You can substitute logit and ologit to taste, but I don't recommend using the Gaussian distribution family as you show in your code, at least for the binary outcome variable.
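
    A minimal sketch of that substitution, keeping the reshaped variable names and equality constraints from the code above, might look like the following (only the distribution families change):
    Code:
    #delimit ;
    gsem
        (outcome_binary <- i.treatment##i.moderator_treat TeamProcess, logit)
        (   team_process11@a team_process12@a
            team_process21@b team_process22@b
            team_process31@c team_process32@c <- TeamProcess, ologit);
    #delimit cr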

    My model above includes only a direct effect of the predictors and their interaction on the outcome, that is, it doesn't include "moderated mediation". My understanding is that "mediation" can get kinda messy in other than linear models.

    I haven't considered the three ordered-categorical outcome variables, which in your "plugged in random numbers" data snippet appear to differ between team members. Is that correct?



    • #3
      First, I would like to thank you very much! It really helps me move forward! I have used your approach as a starting point!

      To your question: Yes, outcome1-3 are pretty much the same as team_process1-3. Both team members rated the outcome on a Likert scale. So the values differ between team members, yet the outcome is still a team-level construct, such as team coordination or the like. Do I handle the outcome variable the same way as TeamProcess in that case?

      May I ask your opinion on averaging the items across individuals, i.e., collapsing the data to the team level?

      Thank you so much in advance!



      • #4
        Originally posted by Julian Nuessle
        . . . outcome1-3 are pretty much the same as team_process1-3. Both team members rated the outcome on a Likert scale. So the values differ between team members, yet the outcome is still a team-level construct, such as team coordination or the like. Do I handle the outcome variable the same way as TeamProcess in that case?
        You could, but here you have more than one set of outcome scores per team, and so you could handle team as a random effect. It would look something like the code below.
        Code:
        gsem ///
            (team_process1 team_process2 team_process3 ///
            outcome1 outcome2 outcome3 <- i.treatment##i.moderator_treat M[team], oprobit)
        It assumes that the two questionnaires were administered to the study participants after assignment to treatment and moderator conditions so that the participants' team-process scores are just another outcome.

        May I ask your opinion on averaging the items across individuals, i.e., collapsing the data to the team level?
        That’s essentially what the constraints are doing in #2 above, on a per-item basis between team members, and I think that the random effects approach in the code just above will accomplish much the same.
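
        For illustration only, a sketch of that collapsing alternative in the original long-format data (variable names as in your data example; the averaged score and the simple probit at the end are just hypothetical stand-ins for a team-level analysis that discards the measurement model) could be:
        Code:
        * Average the items within each team (team-level variables are constant within team)
        collapse (mean) team_process1 team_process2 team_process3 ///
            outcome1 outcome2 outcome3, by(team treatment moderator_treat outcome_binary)
        * A simple team-level analysis on the averaged items
        egen team_process_mean = rowmean(team_process1 team_process2 team_process3)
        probit outcome_binary i.treatment##i.moderator_treat c.team_process_mean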
