Hi Stata community,
I'm using SEM to investigate young people’s wellbeing. The data are self-reported, and wellbeing is measured multidimensionally with a pre-validated instrument: 20 items measuring four subconstructs (Interpersonal, Life Satisfaction, Negative and Eudaimonic). My research seeks to quantify each subconstruct's relative association with my outcome variable. At present I'm running EFA to build a model that best represents the four wellbeing subconstructs as distinct but related.
Potential multicollinearity between the four subconstructs led me to consider Exploratory Structural Equation Modelling (ESEM), which is now widely used in my field. However, Prof. Ender's materials here are the only resource I have been able to find on how to conduct ESEM in Stata. As for the extent of multicollinearity in my data, my exploration (scrutinising VIFs, bivariate correlations between variables, and AVEs) found some evidence that it may be problematic (output below). However, when I build the model up into the full SEM step by step (i.e., Model 1 with only one wellbeing subconstruct, Model 2 with two wellbeing subconstructs, etc.), follow-up comparisons of the path coefficients and SEs across the models don't suggest that multicollinearity is causing the estimates to change too drastically (output is again below for the Stata community to scrutinise).
My questions are:
- How widely is ESEM used in Stata in the way Prof. Ender proposes? I've managed to replicate his approach with my data, but the lack of other resources on ESEM in Stata puzzles me and makes me wonder whether there is a reason for it. I'm also concerned that combining ESEM with the structural part of my models may prove too complex.
- When exploring multicollinearity in the context of SEM, should VIFs, correlations and AVEs be examined using sum scores, factor scores or individual items?
- Should the evidence of multicollinearity between my variables (output below) concern me when entering all four latent exogenous variables into a SEM together?
Example of my data
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input long(wbs1 wbs4 wbs8 wbs10 wbs18 wbs5 wbs13 wbs17 wbs19 wbs6 wbs7 wbs14 wbs20 wbs21 wbs2 wbs3 wbs9 wbs15)
2 4 4 4 3 3 3 3 3 3 2 3 2 2 2 2 2 5
1 5 5 5 5 1 1 1 1 2 1 5 1 1 1 1 2 1
3 2 4 4 3 2 4 3 2 3 3 4 3 3 2 3 2 2
2 5 4 4 3 3 1 2 2 3 3 3 2 4 3 2 3 2
1 5 5 5 4 1 1 2 1 2 1 1 1 1 1 2 2 1
4 3 5 5 3 3 3 2 4 3 4 3 5 2 2 4 4 4
5 2 2 3 1 5 3 4 4 3 2 4 4 3 5 4 4 4
2 4 5 4 5 3 2 2 3 2 2 2 2 2 3 2 2 2
3 3 5 5 3 4 1 2 3 2 2 4 3 2 3 2 2 3
1 5 5 5 5 1 1 1 1 1 1 1 1 1 1 1 1 1
2 4 5 5 1 2 2 2 2 3 2 2 2 3 4 2 4 2
3 3 4 4 3 3 4 4 4 3 3 4 4 4 3 4 4 4
3 3 4 2 4 2 3 3 3 2 1 4 3 2 3 4 3 3
3 4 5 3 3 3 3 2 3 2 3 4 2 4 5 3 3 3
4 2 3 3 3 3 4 3 4 4 4 4 4 4 4 4 4 4
1 4 4 3 5 3 1 2 2 3 1 2 2 2 2 1 1 2
4 2 4 3 3 3 3 3 3 4 4 3 3 3 4 4 3 3
3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3
1 4 5 5 4 4 1 2 1 2 1 2 3 1 2 2 2 1
. . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . .
end
label def WB 1 "Never", modify
label def WB 2 "Not Often", modify
label def WB 3 "Sometimes", modify
label def WB 4 "Often", modify
label def WB 5 "Always", modify
Evidence of multicollinearity
Code:
**(1) Check VIFs for all factors in dataset. VIFs > 10 (equivalently, tolerance 1/VIF < .10) suggest a problem **
** Using sumscores of the wellbeing subconstructs**
. regress outcomevariable wbsint_sum2 wbseud_sum2 wbslife_sum2 wbsneg_sum2
Source | SS df MS Number of obs = 875
-------------+---------------------------------- F(4, 870) = 12.97
Model | 162.409346 4 40.6023365 Prob > F = 0.0000
Residual | 2722.69465 870 3.12953409 R-squared = 0.0563
-------------+---------------------------------- Adj R-squared = 0.0520
Total | 2885.104 874 3.30103432 Root MSE = 1.769
------------------------------------------------------------------------------
outcomevariable~h | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
wbsint_sum2 | -.0289679 .0233199 -1.24 0.214 -.0747377 .0168019
wbseud_sum2 | .1358681 .0242438 5.60 0.000 .0882849 .1834513
wbslife_sum2 | -.0358798 .0248797 -1.44 0.150 -.0847111 .0129515
wbsneg_sum2 | -.002507 .0261081 -0.10 0.924 -.0537493 .0487352
_cons | 3.981834 .6141696 6.48 0.000 2.776406 5.187261
------------------------------------------------------------------------------
. vif
Variable | VIF 1/VIF
-------------+----------------------
wbseud_sum2 | 3.20 0.312717
wbslife_sum2 | 2.94 0.340589
wbsint_sum2 | 2.53 0.395344
wbsneg_sum2 | 1.92 0.520799
-------------+----------------------
Mean VIF | 2.65
.
** Using factor scores of wellbeing subconstructs
. regress outcomevariable Eudaimonic Interpersonal Lifesat Negative
Source | SS df MS Number of obs = 917
-------------+---------------------------------- F(4, 912) = 12.90
Model | 163.757132 4 40.9392829 Prob > F = 0.0000
Residual | 2893.38791 912 3.17257446 R-squared = 0.0536
-------------+---------------------------------- Adj R-squared = 0.0494
Total | 3057.14504 916 3.33749458 Root MSE = 1.7812
-------------------------------------------------------------------------------
outcomevariable | Coefficient Std. err. t P>|t| [95% conf. interval]
--------------+----------------------------------------------------------------
Eudaimonic | .9078153 .190283 4.77 0.000 .5343718 1.281259
Interpersonal | -.4550978 .1775992 -2.56 0.011 -.8036483 -.1065472
Lifesat | -.29361 .266601 -1.10 0.271 -.8168328 .2296127
Negative | -.1445177 .0541875 -2.67 0.008 -.2508643 -.0381711
_cons | 4.854106 .2617944 18.54 0.000 4.340317 5.367896
-------------------------------------------------------------------------------
. vif
Variable | VIF 1/VIF
-------------+----------------------
Lifesat | 13.09 0.076418
Eudaimonic | 10.95 0.091284
Interperso~l | 9.01 0.110954
Negative | 1.03 0.967547
-------------+----------------------
Mean VIF | 8.52
.
*Using individual items
. regress outcomevariable wbs1 wbs2 wbs3 wbs9 wbs15 wbs6 wbs7 wbs11 wbs14 wbs21 wbs5 wbs13 wbs17 wbs19 wbs20
Source | SS df MS Number of obs = 875
-------------+---------------------------------- F(15, 859) = 7.11
Model | 318.516808 15 21.2344539 Prob > F = 0.0000
Residual | 2566.58719 859 2.98787799 R-squared = 0.1104
-------------+---------------------------------- Adj R-squared = 0.0949
Total | 2885.104 874 3.30103432 Root MSE = 1.7285
------------------------------------------------------------------------------
outcomevariable~h | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
wbs1 | .0059105 .1001725 0.06 0.953 -.1907011 .2025221
wbs2 | .093906 .08118 1.16 0.248 -.0654283 .2532404
wbs3 | .3839984 .0871796 4.40 0.000 .2128884 .5551084
wbs9 | .3026809 .0830548 3.64 0.000 .1396668 .4656949
wbs15 | -.0773907 .0832107 -0.93 0.353 -.2407107 .0859294
wbs6 | -.2069086 .0983767 -2.10 0.036 -.3999954 -.0138218
wbs7 | -.0988398 .098042 -1.01 0.314 -.2912697 .09359
wbs11 | -.1612017 .0884901 -1.82 0.069 -.3348838 .0124805
wbs14 | .0466253 .0756693 0.62 0.538 -.101893 .1951437
wbs21 | .1308543 .0767839 1.70 0.089 -.0198518 .2815604
wbs5 | -.2193482 .079129 -2.77 0.006 -.3746569 -.0640395
wbs13 | -.0558144 .078205 -0.71 0.476 -.2093097 .0976808
wbs17 | .0219586 .0921006 0.24 0.812 -.15881 .2027272
wbs19 | -.0667771 .0954528 -0.70 0.484 -.2541252 .120571
wbs20 | .2736249 .0985175 2.78 0.006 .0802617 .466988
_cons | 3.773488 .2565241 14.71 0.000 3.27 4.276975
------------------------------------------------------------------------------
. vif
Variable | VIF 1/VIF
-------------+----------------------
wbs1 | 3.16 0.316891
wbs7 | 3.04 0.328607
wbs6 | 2.87 0.348388
wbs20 | 2.79 0.358511
wbs3 | 2.55 0.392300
wbs19 | 2.49 0.401982
wbs15 | 2.49 0.402062
wbs17 | 2.29 0.437160
wbs13 | 2.21 0.452178
wbs2 | 2.16 0.462730
wbs9 | 2.03 0.492234
wbs11 | 1.98 0.504404
wbs21 | 1.88 0.532478
wbs5 | 1.84 0.544028
wbs14 | 1.82 0.549123
-------------+----------------------
Mean VIF | 2.37
.
collin wbsint_sum2 wbseud_sum2 wbslife_sum2 wbsneg_sum2
(obs=897)
Collinearity Diagnostics
SQRT R-
Variable VIF VIF Tolerance Squared
----------------------------------------------------
wbsint_sum2 2.52 1.59 0.3964 0.6036
wbseud_sum2 3.15 1.77 0.3175 0.6825
wbslife_sum2 2.96 1.72 0.3377 0.6623
wbsneg_sum2 1.91 1.38 0.5247 0.4753
----------------------------------------------------
Mean VIF 2.63
Cond
Eigenval Index
---------------------------------
1 4.7957 1.0000
2 0.1590 5.4926
3 0.0199 15.5422
4 0.0186 16.0768
5 0.0070 26.2302
---------------------------------
Condition Number 26.2302
Eigenvalues & Cond Index computed from scaled raw sscp (w/ intercept)
Det(correlation matrix) 0.0867
.
collin Eudaimonic Interpersonal Lifesat Negative
(obs=942)
Collinearity Diagnostics
SQRT R-
Variable VIF VIF Tolerance Squared
----------------------------------------------------
Eudaimonic 10.89 3.30 0.0918 0.9082
Interpersonal 9.11 3.02 0.1097 0.8903
Lifesat 13.31 3.65 0.0751 0.9249
Negative 1.03 1.01 0.9707 0.0293
----------------------------------------------------
Mean VIF 8.59
Cond
Eigenval Index
---------------------------------
1 4.7737 1.0000
2 0.1738 5.2402
3 0.0384 11.1434
4 0.0086 23.5783
5 0.0055 29.5547
---------------------------------
Condition Number 29.5547
Eigenvalues & Cond Index computed from scaled raw sscp (w/ intercept)
Det(correlation matrix) 0.0111
.
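** Cross-check, added as a sketch rather than as part of the original log: VIF and tolerance
** carry the same information, because VIF = 1/(1 - R-squared) and tolerance = 1 - R-squared,
** where R-squared comes from regressing each predictor on the others.
** Using the R-squared values that collin reports above:
. display 1/(1 - 0.9249)
. display 1/(1 - 0.6825)
** The first should reproduce the VIF of about 13.3 for the Lifesat factor score, the second
** the VIF of about 3.15 for wbseud_sum2.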
** (2) How are the factors correlated? [estat common command, run after EFA]
.
** Eudaimonic*Interpersonal = .42, Eudaimonic*Lifesat = .45, Eudaimonic*Negative = -.35
. ** Interpersonal*Lifesat = .49, Interpersonal*Negative = -.26
. ** Lifesat*Negative = -.33
.
** Correlation between sum scores
. pwcorr wbsint_sum2 wbseud_sum2 wbslife_sum2 wbsneg_sum2, sig star(0.05)
| wbsint~2 wbseud~2 wbslif~2 wbsneg~2
-------------+------------------------------------
wbsint_sum2 | 1.0000
|
|
wbseud_sum2 | 0.7248* 1.0000
| 0.0000
|
wbslife_sum2 | 0.7304* 0.7664* 1.0000
| 0.0000 0.0000
|
wbsneg_sum2 | -0.5741* -0.6641* -0.6183* 1.0000
| 0.0000 0.0000 0.0000
|
** Correlation between factor scores
. pwcorr Eudaimonic Interpersonal Lifesat, sig star(0.05)
| Eudaim~c Interp~l Lifesat
-------------+---------------------------
Eudaimonic | 1.0000
|
|
Interperso~l | 0.9206* 1.0000
| 0.0000
|
Lifesat | 0.9474* 0.9368* 1.0000
| 0.0000 0.0000
|
** Correlation between individual items
. pwcorr wbs1 wbs2 wbs3 wbs9 wbs15 wbs6 wbs7 wbs11 wbs14 wbs21 wbs5 wbs13 wbs17 wbs19 wbs20, sig star(0.05)
| wbs1 wbs2 wbs3 wbs9 wbs15 wbs6 wbs7
-------------+---------------------------------------------------------------
wbs1 | 1.0000
|
|
wbs2 | 0.6764* 1.0000
| 0.0000
|
wbs3 | 0.7035* 0.6051* 1.0000
| 0.0000 0.0000
|
wbs9 | 0.5713* 0.5334* 0.6235* 1.0000
| 0.0000 0.0000 0.0000
|
wbs15 | 0.6973* 0.5813* 0.6441* 0.5725* 1.0000
| 0.0000 0.0000 0.0000 0.0000
|
wbs6 | 0.5458* 0.5208* 0.5091* 0.4724* 0.4863* 1.0000
| 0.0000 0.0000 0.0000 0.0000 0.0000
|
wbs7 | 0.6242* 0.5624* 0.5780* 0.4985* 0.5582* 0.7516* 1.0000
| 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
|
wbs11 | 0.5396* 0.4928* 0.4923* 0.5210* 0.5197* 0.5714* 0.5925*
| 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
|
wbs14 | 0.4532* 0.4397* 0.4382* 0.4426* 0.4388* 0.5454* 0.4681*
| 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
|
wbs21 | 0.4118* 0.4084* 0.4125* 0.3987* 0.3871* 0.5836* 0.5578*
| 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
|
wbs5 | 0.5286* 0.4962* 0.4735* 0.4370* 0.4864* 0.4935* 0.5113*
| 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
|
wbs13 | 0.5548* 0.4799* 0.5111* 0.4934* 0.5427* 0.5100* 0.5396*
| 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
|
wbs17 | 0.5248* 0.4694* 0.5115* 0.4646* 0.4958* 0.4940* 0.5123*
| 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
|
wbs19 | 0.5476* 0.5085* 0.5384* 0.5186* 0.5574* 0.5188* 0.5378*
| 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
|
wbs20 | 0.6407* 0.5552* 0.5755* 0.5575* 0.6139* 0.6026* 0.5939*
| 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
|
| wbs11 wbs14 wbs21 wbs5 wbs13 wbs17 wbs19
-------------+---------------------------------------------------------------
wbs11 | 1.0000
|
|
wbs14 | 0.4462* 1.0000
| 0.0000
|
wbs21 | 0.4601* 0.5494* 1.0000
| 0.0000 0.0000
|
wbs5 | 0.4231* 0.3201* 0.3563* 1.0000
| 0.0000 0.0000 0.0000
|
wbs13 | 0.5202* 0.4238* 0.4253* 0.4630* 1.0000
| 0.0000 0.0000 0.0000 0.0000
|
wbs17 | 0.4878* 0.3967* 0.4295* 0.5101* 0.6532* 1.0000
| 0.0000 0.0000 0.0000 0.0000 0.0000
|
wbs19 | 0.4864* 0.3686* 0.4463* 0.5971* 0.5770* 0.6331* 1.0000
| 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
|
wbs20 | 0.5606* 0.5077* 0.5242* 0.5419* 0.6112* 0.6289* 0.6593*
| 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
|
** (3) (Check the Average Variance Extracted in the measurement models )
.
** type 'condisc' after running SEM to assess the average variance extracted (AVE) **
. sem (Eudaimonic -> wbs1@1 wbs2 wbs3 wbs9 wbs15)(Interpersonal -> wbs6@1 wbs7 wbs11 wbs14 wbs21)(Lifesat -> wbs5@1 wbs13 wbs17 wbs19 wbs20)(Negative -> wbs4 wbs8 wbs10), latent(Eudaimonic Interpersonal Lifesat Negative) stand
Endogenous variables
Measurement: wbs1 wbs2 wbs3 wbs9 wbs15 wbs6 wbs7 wbs11 wbs14 wbs21 wbs5 wbs13 wbs17 wbs19 wbs20 wbs4 wbs8 wbs10
Exogenous variables
Latent: Eudaimonic Interpersonal Lifesat Negative
[Full output omitted]
-----------------------------+----------------------------------------------------------------
cov(Eudaimonic,Interpersonal)| .8145113 .016024 50.83 0.000 .7831049 .8459178
cov(Eudaimonic,Lifesat)| .8613038 .013381 64.37 0.000 .8350775 .88753
cov(Eudaimonic,Negative)| -.7648549 .0235044 -32.54 0.000 -.8109226 -.7187872
cov(Interpersonal,Lifesat)| .8369258 .015206 55.04 0.000 .8071225 .866729
cov(Interpersonal,Negative)| -.6530577 .028078 -23.26 0.000 -.7080895 -.5980259
cov(Lifesat,Negative)| -.7470133 .0242314 -30.83 0.000 -.794506 -.6995207
----------------------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(129) = 562.28 Prob > chi2 = 0.0000
. condisc
Convergent and Discriminant Validity Assessment
------------------------------------------------------------------------------------------
Squared correlations (SC) among latent variables
------------------------------------------------------------------------------------------
Eudaimonic Interperso~l Lifesat Negative
Eudaimonic 1.000
Interperso~l 0.663 1.000
Lifesat 0.742 0.700 1.000
Negative 0.585 0.426 0.558 1.000
------------------------------------------------------------------------------------------
Average variance extracted (AVE) by latent variables
------------------------------------------------------------------------------------------
type mismatch
r(109);
end of do-file
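** condisc stopped with r(109) before printing the AVEs. A possible manual cross-check,
** sketched under assumptions: with standardized loadings, a factor's AVE is the mean of its
** squared loadings. The loadings below are PLACEHOLDERS for illustration, not my estimates;
** the real ones would be read off the standardized sem output.
matrix lam = (.80, .75, .78, .70, .77)       // hypothetical standardized loadings, one factor
matrix ssq = lam * lam'                      // 1x1 matrix holding the sum of squared loadings
scalar ave = ssq[1,1] / colsof(lam)          // mean squared loading = AVE
display "AVE = " %5.3f scalar(ave)
** Fornell-Larcker check: the AVE should exceed the factor's largest squared correlation
** with another factor, e.g. 0.742 for Eudaimonic with Lifesat in the condisc table above.
display "AVE exceeds 0.742 (1 = yes, 0 = no): " (scalar(ave) > 0.742)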
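Model comparisons (path coefficients and SEs in full SEM models) - NB. Each model adds a new wellbeing subconstruct, with the full four-factor SEM last (right-hand side)
Code: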
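* Sketch of how the stored estimates tabulated below could be built up one subconstruct at
* a time. The model names (eud, eudint, ...) are those in the table; the sem specifications
* are assumed from the four-factor measurement model above, not copied from my do-file.
sem (Eudaimonic -> wbs1@1 wbs2 wbs3 wbs9 wbs15) ///
    (outcomevariable <- Eudaimonic), latent(Eudaimonic)
estimates store eud
sem (Eudaimonic -> wbs1@1 wbs2 wbs3 wbs9 wbs15) ///
    (Interpersonal -> wbs6@1 wbs7 wbs11 wbs14 wbs21) ///
    (outcomevariable <- Eudaimonic Interpersonal), latent(Eudaimonic Interpersonal)
estimates store eudint
* ... repeated in the same way for interp, lifesat, neg, eudintlife and eudintlifeneg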
estimates table eud eudint interp lifesat neg eudintlife eudintlifeneg, se
---------------------------------------------------------------------------------------------------------
    Variable |          eud       eudint       interp      lifesat          neg   eudintlife   eudintli~g
-------------+-------------------------------------------------------------------------------------------
outcomevariable~h
  Eudaimonic |    .36133661    .75222035                                           .85003712    .86322774
             |    .06790982     .1384507                                           .18293653    .19670361
Interperso~l |                -.48209672    .11152458                              -.4168596   -.42507065
             |                  .1372821     .0661019                              .15463278    .15633722
     Lifesat |                                           .25266663                -.21519739   -.16072192
             |                                           .08652639                 .23852414    .25306004
    Negative |                                                       -.22298494                 .07071995
             |                                                        .07563153                  .1527385
Many thanks in advance for your time and expertise.
Kind regards,
Tania
