Three level HLM with mediation analysis

Yuhan HU

Join Date: Apr 2024

Posts: 22
#1

16 Apr 2025, 04:33

Hello, I am a PhD student working on my thesis. I have longitudinal data, so I am using a three-level HLM to examine the association between y (country level) and x (individual level), with the following mixed command: mixed y x control variables(individual level) || ID2:|| ID: age, covariance(unstructured) log.
When I gradually introduced mediator variables (country level) into this model, the significance of x decreased. I would now like to estimate the proportion of the effect that is mediated, but I have not found a suitable command. I found that ml_mediation is for two-level models. Are there other commands for this analysis? Any ideas would be greatly appreciated.
Tags: HLM, mediation analysis, Multilevel Analysis
Erik Ruzek

Join Date: Oct 2017

Posts: 432
#2

17 Apr 2025, 13:03

It is probably easiest to do this with gsem. A 2-level mediation model is shown in the sem/gsem documentation. You will need to add the third level where your mediators sit as well as the country level mean of x (and potentially the country level means of the control variables). See here for an example of the three-level gsem syntax.
Yuhan HU

Join Date: Apr 2024

Posts: 22
#3

17 Apr 2025, 22:05

Hi Erik, thanks for your comments! I am not clear about M1 and M2 in this example: gsem (perform <- satis support M1[branch]) (satis <- support M2[branch]), cov(M1[branch]*M2[branch]@0). I guess M1 and M2 may be cluster names; however, it seems that perform and satis are at the same level (individual).
By the way, only the individual-level variables in my model are measured repeatedly. The country-level variables have only one year of observations. In this case, can I average the individual-level variables and conduct a traditional mediation analysis?
Erik Ruzek

Join Date: Oct 2017

Posts: 432
#4

18 Apr 2025, 10:04

No problem, Yuhan. In the syntax, M1[branch] and M2[branch] tell gsem to include a random intercept for the cluster (branch) in each equation. In sem and gsem syntax, variables whose names begin with a capital letter are treated as latent variables unless otherwise specified. Yes, both perform and satis vary within branches.

In terms of your analysis, you are correct - take the country averages of your variables of interest. However, you said,

I am using the three-level HLM to examine the association between y (country level) and x (individual level) with mixed code: mixed y x control variables(individual level) || ID2:|| ID: age, covariance(unstructured) log.

In a mixed model, the outcome needs to be time-varying. Are you saying that your outcome varies within countries, but you are interested in the effect of certain mediators on the country-level average of y?
Yuhan HU

Join Date: Apr 2024

Posts: 22
#5

18 Apr 2025, 20:46

Yes, my outcome variables (individual level) are time-varying across four years: y1, y2, y3, and y4. However, both the exposure and the mediator (country level) have only one observation each, that is, x1 and m1. I am wondering whether I can examine the mediating role of m1 in the relationship between x1 and (y1+y2+y3+y4)/4. Thanks again for your help!
Erik Ruzek

Join Date: Oct 2017

Posts: 432
#6

19 Apr 2025, 13:28

In really simple terms, gsem is doing two regressions:

Code:

* First create the aggregate measures of y
egen pmn_y = mean(y), by(individual)
egen cmn_y = mean(pmn_y), by(country)

* Regressions (note: you would need to calculate the country mean of each covariate)
regress cmn_y x m cmn_covariates
regress m x cmn_covariates

And from there, you use postestimation commands to estimate the indirect effect. If you have Stata 18 or later, you could use Stata's mediate command with these variables to estimate models from the causal mediation framework. If you have an earlier version of Stata, check out Hicks and Tingley's medeff program. You might be able to get away with this aggregate approach, depending on your field and how picky your advisors are about measurement issues.

How gsem improves upon the above is that it does a latent decomposition of y across your three levels (within person, between person within country, and between country), which addresses sampling error when aggregating repeated measures data to the aggregate levels (see Marsh et al., 2014 for more details).
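
For the aggregate (country-mean) approach, one possible way to get the indirect effect with postestimation commands is to fit both regressions jointly with sem and then call nlcom. The sketch below is only an illustration under stated assumptions: the data are simulated, there is one row per country, covariates are omitted, and the variable names (country, x, m, cmn_y) are hypothetical stand-ins for the aggregated variables described above.

```stata
* Hypothetical simulated data: one row per country (not the poster's data)
clear
set seed 12345
set obs 40
gen country = _n
gen x = rnormal()
gen m = 0.5*x + rnormal()
gen cmn_y = 0.4*m + 0.2*x + rnormal()

* Fit the outcome and mediator equations jointly
sem (cmn_y <- x m) (m <- x)

* Indirect effect as the product of the two paths: (x -> m) * (m -> y)
nlcom _b[m:x] * _b[cmn_y:m]

* Direct, indirect, and total effects in one table
estat teffects
```

Fitting the two equations in one sem call is what lets nlcom combine coefficients from both; after two separate regress commands only the last set of estimates is active.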
Yuhan HU

Join Date: Apr 2024

Posts: 22
#7

19 Apr 2025, 20:51

Thanks a lot Erik! It's really helpful!
Yuhan HU

Join Date: Apr 2024

Posts: 22
#8

20 Apr 2025, 21:28

Hi Erik, I am still wondering whether I can simply average y at the individual level across the 4 years: gen avr_y = (y1+y2+y3+y4)/4, rather than creating the aggregate measures of y, and then run "sgmediation2 avr_y, iv(x) mv(m) cv(control var on both community and individual level)" to conduct the mediation analysis. Would that work? Thanks for your help!
Erik Ruzek

Join Date: Oct 2017

Posts: 432
#9

21 Apr 2025, 12:47

In theory you could use avr_y = (y1+y2+y3+y4)/4 if your dataset is wide. I assumed that your dataset was long since you were using mixed, which is why I suggested using egen to construct a person mean for y. You could feed the person mean of y as the outcome into sgmediation2, along with the relevant iv(), mv(), and cv() variables at whatever level they are measured. Note that you must specify the vce option of sgmediation2 and request cluster-robust standard errors at the country level: vce(cluster countryID).
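
To make the wide versus long distinction concrete, here is a minimal sketch on a hypothetical toy dataset (the values and the names avr_y2 and wave are made up for illustration; ID, countryID, y1 to y4, avr_y, and pmn_y follow the names used in this thread). It also shows one difference worth knowing: the gen formula returns missing when any wave is missing, whereas egen averages over the nonmissing waves.

```stata
* Hypothetical toy data in wide form: one row per person
clear
input ID countryID y1 y2 y3 y4
1 1 2 3 4 5
2 1 1 . 3 4
3 2 5 5 6 6
end

* Wide data: person mean across the four waves
* gen returns missing if any of y1-y4 is missing
gen avr_y = (y1 + y2 + y3 + y4)/4
* egen rowmean() averages over the nonmissing waves only
egen avr_y2 = rowmean(y1 y2 y3 y4)

* Long data: reshape, then take the person mean with egen
reshape long y, i(ID) j(wave)
* mean() also skips missing values, so pmn_y matches avr_y2
egen pmn_y = mean(y), by(ID)

list ID wave y avr_y avr_y2 pmn_y, sepby(ID)
```

Whichever version is used, the person mean is constant within ID, so the mediation model should be run on one row per person (or with standard errors clustered appropriately).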
Yuhan HU

Join Date: Apr 2024

Posts: 22
#10

21 Apr 2025, 20:58

Got it! Thank you so much!
Yuhan HU

Join Date: Apr 2024

Posts: 22
#11

23 Apr 2025, 22:12

Hi Erik! I tried using the SEM Builder to set up the analysis, and Stata generated this command: gsem (x-> y, ) (x -> m, ) (m -> y, ) (M1[countryID>ID] -> y, ) (M2[countryID] -> y, ), covstruct (_lexogenous, diagonal) group( ) latent(M1 M2 ) nocapslatent
I am wondering whether this is correct and how I can add control variables. It seems that I can add different control variables to different paths?
Erik Ruzek

Join Date: Oct 2017

Posts: 432
#12

24 Apr 2025, 17:11

Yes. Just add covariates to the x side of your path.
Yuhan HU

Join Date: Apr 2024

Posts: 22
#13

24 Apr 2025, 21:47

Thanks a lot, Erik! I have two mediators, and I examined them separately. In both analyses, c' became non-significant. Does that mean there are two complete mediation pathways? That seems strange to me.
How can we explain two complete mediators appearing at the same time?

My commands are:
gsem (x control var-> y, ) (x control var -> m1, ) (m1 control var-> y, ) (M1[countryID>ID] -> y, ) (M2[countryID] -> y, ), covstruct (_lexogenous, diagonal) latent(M1 M2 ) nocapslatent
gsem (x control var-> y, ) (x control var -> m2, ) (m2 control var-> y, ) (M1[countryID>ID] -> y, ) (M2[countryID] -> y, ), covstruct (_lexogenous, diagonal) latent(M1 M2 ) nocapslatent

By the way, in the mixed model, when I added m1 and m2 simultaneously, the coefficient of x also became non-significant.
Yuhan HU

Join Date: Apr 2024

Posts: 22
#14

24 Apr 2025, 21:56

When I did not include any control variables and examined the two mediators in the same model, the coefficient of x was marginally non-significant (p = 0.057).

gsem (x->y, ) (x -> m1, ) (x -> m2, ) (m1 -> y, ) (M1[countryID>ID] -> y, ) (M2[countryid] -> y, ) (m2 -> y, ) if rural==1, covstruct(_lexogenous, diagonal) latent(M1 M2 ) nocapslatent
Erik Ruzek

Join Date: Oct 2017

Posts: 432
#15

25 Apr 2025, 08:19

It seems possible that your main x variable is not a very strong predictor of your outcome (note that statistical significance is not a measure of strength). Mediation is not assessed by looking at whether the coefficient for x loses its significance in the presence of the mediator. If you are using the Baron and Kenny framework, then you need to multiply the two path coefficients, (x -> m)*(m -> y), and test the product using nlcom. See example 42g in the Stata sem manual. See here for some example code.
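
As a rough sketch of the product-of-coefficients step, the following follows the two-level model quoted earlier in this thread from example 42g (perform, satis, support, branch). The dataset name gsem_multmed is my recollection of what that manual example uses, so treat it as an assumption and check the manual entry. The /// line continuations assume the code is run from a do-file.

```stata
* Dataset name assumed from example 42g of the SEM manual; verify before use
webuse gsem_multmed, clear

* Two-level mediation: support -> satis -> perform,
* with a random intercept for branch in each equation
gsem (perform <- satis support M1[branch]) ///
     (satis   <- support M2[branch]),      ///
     cov(M1[branch]*M2[branch]@0)

* Indirect effect: (support -> satis) * (satis -> perform)
nlcom _b[satis:support] * _b[perform:satis]

* Total effect: direct path plus the indirect effect
nlcom _b[perform:support] + _b[satis:support] * _b[perform:satis]
```

In the three-level models posted above, the same idea would apply with the names swapped in, for example something like nlcom _b[m1:x] * _b[y:m1], with the test of mediation based on that product and its confidence interval rather than on whether the coefficient of x stays significant.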