Dear Statalist users,

I am using Stata 14, and am working with cross-sectional data.

I am trying to run Confirmatory Factor Analysis (CFA) on nine items using the 'sem' command. The items are 4-category ordinal variables.

I would like to compare two models: one with a correlated three-factor solution and another with a second-order factor. As a basis for comparison, I intend to use AIC, RMSEA, chi2, TLI, and CFI.

The items are called i1, i2, ..., i9, and the factors are called f1, f2, and f3.

To fit the correlated-factor model (Model 1), I use the following command:

Code:

sem (f1-> i1 i2 i3) (f2-> i4 i5 i6 i7) (f3-> i7 i8 i9), covstruct(_lexogenous, diagonal) ///
    latent(f1 f2 f3) cov(f1*f2 f1*f3 f2*f3) nocapslatent difficult standardized

One of the items is complex; i7 loads onto f2 and f3. Exploratory Factor Analysis indicated the cross-loading, and I have enough theoretical reason to keep it as is.

For the second-order model (Model 2), where I suspect a fourth latent variable, f4, may be an overarching construct (a second-order factor) on which f1, f2, and f3 load strongly, I use:

Code:

sem (f1-> i1 i2 i3) (f2-> i4 i5 i6 i7) (f3-> i7 i8 i9) (f4-> f1 f2 f3), ///
    latent(f1 f2 f3 f4) nocapslatent difficult standardized

For Model 1, Stata introduces default constraints by fixing the loadings of certain anchor variables to 1. For Model 2, it does the same and adds one more constraint, anchoring f4 on f1 by fixing f1's loading to 1. Because I would like to see all the factor loadings, I instead constrain the variances of the latent variables to 1.

For Model 1:

Code:

sem (f1-> i1 i2 i3) (f2-> i4 i5 i6 i7) (f3-> i7 i8 i9), covstruct(_lexogenous, diagonal) ///
    latent(f1 f2 f3) cov(f1@1 f1*f2 f1*f3 f2@1 f2*f3 f3@1) nocapslatent standardized

For Model 2:

Code:

sem (f1-> i1 i2 i3) (f2-> i4 i5 i6 i7) (f3-> i7 i8 i9) (f4-> f1 f2 f3), latent(f1 f2 f3 f4) ///
    cov(e.f1@1 e.f2@1 e.f3@1 f4@1) nocapslatent standardized

Yet with this set of constraints, Model 1 fits but Model 2 does not. Model 2 fits when I remove the cross-loading item (i7) from f2.
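For concreteness, the variant of Model 2 that does converge for me is the same command with i7 dropped from f2's equation (same variance constraints as above):

Code:

sem (f1-> i1 i2 i3) (f2-> i4 i5 i6) (f3-> i7 i8 i9) (f4-> f1 f2 f3), latent(f1 f2 f3 f4) ///
    cov(e.f1@1 e.f2@1 e.f3@1 f4@1) nocapslatent standardized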

However, I am not sure how comparable the two models are if I omit i7 from f2. The fit statistics are poorer for Model 2, but I believe that may be because of the omission of one item from f2.

I have four questions:

1) Is there another constraint I can add to Model 2 to make it work without omitting a cross-loading item?

2) Using the default constraints added by Stata (the first set of commands), the fit indices of the two models are exactly the same. Does that mean the second-order model does not add anything?

3) I would like to predict the second-order latent variable in Model 2. Does the choice of identification constraint matter for the prediction of this latent variable? I guess constraining its variance to 1 would normalize it, but are there any rules of thumb I should be aware of?
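To be concrete, what I have in mind is the standard sem postestimation prediction of the factor score (f4hat is just an illustrative name I chose):

Code:

predict f4hat, latent(f4)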

4) Because the items are ordinal, 'gsem' seems like the better command, yet I cannot get it to fit the second-order model. Also, because no model fit statistics can be calculated with 'gsem', I am not sure it is the right route to take, but I would still appreciate any input on how to get 'gsem' to converge.
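For reference, the 'gsem' specification I have been trying treats each item as ordered logit; this is my attempted translation of Model 2, not a command that has converged for me:

Code:

gsem (f1-> i1 i2 i3, ologit) (f2-> i4 i5 i6 i7, ologit) (f3-> i7 i8 i9, ologit) ///
    (f4-> f1 f2 f3), latent(f1 f2 f3 f4)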

Thanks,

Sule
