SEM specification with syntax: overriding standard identification constraints

Raphael Lefranc

Join Date: Jul 2025

Posts: 10
#1

09 Jul 2025, 04:38

Hello everyone,

I am new to Stata (version 19).

I want to specify a second-order model using command syntax rather than the diagram (SEM Builder).

In this model, I want to fix to 1 the variance of the first-order factors (A, B, C, D in my syntax) as well as that of the second-order factor (F), instead of fixing to 1 a path from F to A as well as the path to one observed variable within each first-order factor (the standard constraints).

In other words, I would like to fix the variances of the first-order factors (A, B, C, D) and of the second-order factor (F) to 1, thereby overriding the standard constraints.

I managed to do this for the second-order factor by typing var(F@1).

It does not work, however, for my first-order factors: if I type var(A@1), for example, I get an error message.

Can anyone help me with this problem?

Thanks in advance.

Here is my syntax:

clear all
ssd init x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12
ssd set obs 2000
ssd set cov (ltd) 1 .02 1 .02 .02 1 .01 .01 .01 1 .01 .01 .01 .02 1 .01 .01 .01 .02 .02 1 .01 .01 .01 .01 .01 .01 1 .01 .01 .01 .01 .01 .01 .02 1 .01 .01 .01 .01 .01 .01 .02 .02 1 .01 .01 .01 .01 .01 .01 .01 .01 .01 1 .01 .01 .01 .01 .01 .01 .01 .01 .01 .02 1 .01 .01 .01 .01 .01 .01 .01 .01 .01 .02 .02 1
sem (A->x1 x2 x3) (B->x4 x5 x6) (C->x7 x8 x9) (D->x10 x11 x12) (A B C D<-F), var(F@1)
Joseph Coveney

Join Date: Apr 2014

Posts: 4449
#2

09 Jul 2025, 05:40

Originally posted by Raphael Lefranc

. . . I want to fix to 1 the variance of the first-order factors (A, B, C, D in my syntax) as well as that of the second-order factor (F), instead of fixing to 1 a path from F to A as well as the path to one observed variable within each first-order factor (the standard constraints).

I managed to do this for the second-order factor by typing var(F@1).

It does not work, however, for my first-order factors: if I type var(A@1) . . .

Latent factors A through D are endogenous, so you would fix the variances of their error terms, e.A, e.B, e.C, and e.D. Try something like the following.

Code:

version 19

clear *

ssd init x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12
ssd set obs 2000

#delimit ;
ssd set cov (ltd)
    1
    .02 1
    .02 .02 1
    .01 .01 .01 1
    .01 .01 .01 .02 1
    .01 .01 .01 .02 .02 1
    .01 .01 .01 .01 .01 .01 1
    .01 .01 .01 .01 .01 .01 .02 1
    .01 .01 .01 .01 .01 .01 .02 .02 1
    .01 .01 .01 .01 .01 .01 .01 .01 .01 1
    .01 .01 .01 .01 .01 .01 .01 .01 .01 .02 1
    .01 .01 .01 .01 .01 .01 .01 .01 .01 .02 .02 1;

sem (x1-x3 <- A) (x4-x6 <- B) (x7-x9 <- C) (x10-x12 <- D)
    (A B C D <- F),
    variance(e.A@1 e.B@1 e.C@1 e.D@1 F@1);
#delimit cr

exit
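A sketch of why var(A@1) is rejected, written out for factor A only. The symbols λ_j, γ_A, ε_j, and e_A are introduced here for illustration and are not Stata parameter names. Because A appears on the left of a structural equation, its variance is implied by other parameters rather than being a free parameter, so the variances available to constrain are those of F and of A's error term:

```latex
\begin{aligned}
x_j &= \lambda_j A + \varepsilon_j, \qquad j = 1, 2, 3 \\
A   &= \gamma_A F + e_A
\end{aligned}
```

In this notation, variance(e.A@1) fixes Var(e_A) = 1 and F@1 fixes Var(F) = 1; the same pattern holds for B, C, and D.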
Richard Williams

Join Date: Apr 2014

Posts: 5024
#3

09 Jul 2025, 12:37

I don't think Joe's solution does exactly what Raphael wants. Joe's solution fixes the residual variances of the endogenous latent variables at 1, but the total variances will be something greater than 1.
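To make Richard's point concrete, write A = γ_A F + e_A, where γ_A is an illustrative label for the estimated F → A path and F is assumed uncorrelated with e_A, as in the standard model. Fixing Var(F) = 1 and Var(e_A) = 1 then gives:

```latex
\operatorname{Var}(A) = \gamma_A^2 \operatorname{Var}(F) + \operatorname{Var}(e_A)
                      = \gamma_A^2 + 1 > 1
\quad \text{whenever } \gamma_A \neq 0
```

So the total variance of A exceeds 1 by exactly the part explained by F.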

You can tack on the standardized option, but then all the observed variables will be standardized too. But maybe that would be OK given what is wanted? [Edit: I am not sure about this sentence, but I think I am right about Joe's solution not being exactly what Raphael wants.]

Last edited by Richard Williams; 09 Jul 2025, 12:41.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://academicweb.nd.edu/~rwilliam/
Joseph Coveney

Join Date: Apr 2014

Posts: 4449
#4

09 Jul 2025, 17:40

Originally posted by Richard Williams

. . . but the total variances will be something greater than 1.

It's pretty common to fix the variance of each latent factor to 1 as an alternative to letting its variance be governed by the first manifest variable (whose loading is then fixed at 1).
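To illustrate the two identification choices Joseph contrasts, here is a one-factor sketch on the first three variables of the example data. The first command uses sem's default anchoring (the loading of x1 on A is fixed at 1, so A takes x1's scale). The second fixes Var(A) = 1 instead; I am assuming, as the thread implies, that sem then drops its default anchor and estimates all three loadings. A is exogenous here, which is why variance(A@1) is accepted.

```stata
clear all
ssd init x1 x2 x3
ssd set obs 2000
ssd set cov (ltd) 1 .02 1 .02 .02 1

* (a) Default anchoring: loading of x1 on A fixed at 1, Var(A) estimated
sem (A -> x1 x2 x3)

* (b) Alternative: Var(A) fixed at 1, all three loadings estimated
sem (A -> x1 x2 x3), variance(A@1)
```

Both versions describe the same covariance structure; they differ only in which parameter carries the latent scale.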

And given his example syntax showing what he's trying to accomplish

I managed to do this for the second-order factor by typing var(F@1).

It does not work, however, for my first-order factors: if I type var(A@1), for example, I get an error message.

that's how I interpreted Raphael's request.

I've not encountered a situation where the sum of the variances of latent factors at two levels of a CFA model is constrained to be 1. I'm not sure how you would do that without specifying at least one additional constraint, such as that the variances of the latent factors at the two levels are equal, because otherwise there are infinitely many combinations of individual variances that sum to 1.
Joseph Coveney

Join Date: Apr 2014

Posts: 4449
#5

09 Jul 2025, 17:52

Originally posted by Joseph Coveney

And given his example syntax showing what he's trying to accomplish . . .

. . . and given the title of this thread.
Richard Williams

Join Date: Apr 2014

Posts: 5024
#6

09 Jul 2025, 19:46

Yes, but Raphael says "In this model, I want to fix to 1 the variance of the first-order factors (A, B, C, D in my syntax) as well as that of the second-order factor (F), instead of fixing to 1." I don't know how to do that either! But I don't see the point of standardizing if he can't do that. I think he wants the residual variances of A, B, C, and D to equal 1 minus the explained variance.
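Richard's reading can be written as a constraint, again using the illustrative notation A = γ_A F + e_A with Var(F) = 1 and F uncorrelated with e_A. Requiring Var(A) = 1 means the residual variance must equal 1 minus the explained variance γ_A²:

```latex
1 = \gamma_A^2 \cdot 1 + \operatorname{Var}(e_A)
\quad\Longrightarrow\quad
\operatorname{Var}(e_A) = 1 - \gamma_A^2
```

This ties one parameter to the square of another, that is, a nonlinear constraint, so it cannot be written as a fixed value such as e.A@1. As far as I know, sem's @ notation and its constraints() option handle only fixed values and linear constraints; treat that as an assumption to check against the documentation.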

If he were just analyzing observed variables, he could analyze the correlation matrix, and the variances of all vars would be one. But I don't know how you can do that with latent endogenous vars.

Joseph Coveney

Join Date: Apr 2014

Posts: 4449
#7

09 Jul 2025, 21:52

Originally posted by Richard Williams

Yes, but Raphael says "In this model, I want to fix to 1 the variance of the first-order factors (A, B, C, D in my syntax) as well as that of the second-order factor (F), instead of fixing to 1."

Well, I read it as, "In this model, I want to fix to 1 the variance of the first-order factors (A, B, C, D in my syntax) as well as that of the second-order factor (F), instead of fixing to 1 a path from F to A as well as the path to one observed variable within each first-order factor (the standard constraints)." [Emphasis added]

But maybe Raphael can clarify for both of us, and then we can go from there if needed.
Raphael Lefranc

Join Date: Jul 2025

Posts: 10
#8

10 Jul 2025, 03:52

Dear Joseph and Richard,

Thank you very much for your helpful replies and comments.

I will reply in chronological order.

Richard, I did not know about the standardized option. It sounds interesting. How do you specify it?

As Joseph wrote, "it is common to fix all latent variables' variances to 1," and this is precisely what I want to do.

However, I am a little confused at the moment by the possibilities you have given me.

All these specifications are simple for me in lavaan or Mplus, but difficult for now in Stata.

One last question: I want to manually constrain all covariances between my latent variables to zero. Despite reading the documentation on the topic carefully, it is not very clear to me.

Could you tell me what I have to add to this line to constrain them manually? sem (A->x1 x2 x3) (B->x4 x5 x6) (C->x7 x8 x9) (D->x10 x11 x12) (A B C D<-F)

Thank you again.
Richard Williams

Join Date: Apr 2014

Posts: 5024
#9

10 Jul 2025, 05:31

Raphael, are you sure about lavaan making this simple? According to this post

https://stats.stackexchange.com/ques...ance-in-lavaan

"Lavaan (and most other software) makes it trivially easy to constrain exogenous latent variable variances and endogenous latent variable residuals to 1 (or any other number), but I haven't found a good way to constrain the endogenous latent variable variance itself. I understand that this isn't frequently done in SEM because typically the latent variable metric is relatively unimportant/not interpreted, but in this case it would be extremely useful."

In any event, if lavaan or Mplus really does make it easy, why not use them?

As for standardized, just add standardized as a sem option; in your case, make it the last thing on your sem command. But I don't think it does the standardization you want.
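A minimal sketch of Richard's suggestion, reusing the example data and Joseph's second-order specification with standardized added as the last option. standardized is a reporting option, so it changes how the estimates are displayed, not the constraints that are imposed; as Richard and Joseph note, whether the output can be read as a standardized second-order CFA in the usual sense is uncertain. Run it as a do-file, since /// continuation lines are not accepted interactively.

```stata
clear all
ssd init x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12
ssd set obs 2000
ssd set cov (ltd) 1 .02 1 .02 .02 1 ///
    .01 .01 .01 1 .01 .01 .01 .02 1 .01 .01 .01 .02 .02 1 ///
    .01 .01 .01 .01 .01 .01 1 .01 .01 .01 .01 .01 .01 .02 1 .01 .01 .01 .01 .01 .01 .02 .02 1 ///
    .01 .01 .01 .01 .01 .01 .01 .01 .01 1 .01 .01 .01 .01 .01 .01 .01 .01 .01 .02 1 .01 .01 .01 .01 .01 .01 .01 .01 .01 .02 .02 1

* Joseph's constraints; standardized only changes how the results are reported
sem (x1-x3 <- A) (x4-x6 <- B) (x7-x9 <- C) (x10-x12 <- D) (A B C D <- F), ///
    variance(e.A@1 e.B@1 e.C@1 e.D@1 F@1) standardized
```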

Given what you want, it would be nice if there were a standardizedlatent option. If your life truly depended on it, you might be able to rescale the coefficients by hand. Or maybe run the model as Joseph suggested and then work out what the residual variances would have to be fixed at for the total variance to be 1.
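One way to do the hand rescaling Richard mentions, sketched with nlcom after fitting Joseph's model to the example data. Assumptions: Var(F) = 1 and Var(e.A) = 1 as in Joseph's code, so the implied total variance of A is _b[A:F]^2 + 1, and dividing A by its implied standard deviation gives a factor with unit total variance. The labels var_A, std_F_to_A, resvar_A_std, and x1_on_A_std are names made up for this sketch; only factor A is shown, and B, C, and D follow the same pattern. Run it as a do-file because of the /// continuation lines.

```stata
clear all
ssd init x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12
ssd set obs 2000
ssd set cov (ltd) 1 .02 1 .02 .02 1 ///
    .01 .01 .01 1 .01 .01 .01 .02 1 .01 .01 .01 .02 .02 1 ///
    .01 .01 .01 .01 .01 .01 1 .01 .01 .01 .01 .01 .01 .02 1 .01 .01 .01 .01 .01 .01 .02 .02 1 ///
    .01 .01 .01 .01 .01 .01 .01 .01 .01 1 .01 .01 .01 .01 .01 .01 .01 .01 .01 .02 1 .01 .01 .01 .01 .01 .01 .01 .01 .01 .02 .02 1

* Joseph's model: Var(e.A) = ... = Var(e.D) = 1 and Var(F) = 1
sem (x1-x3 <- A) (x4-x6 <- B) (x7-x9 <- C) (x10-x12 <- D) (A B C D <- F), ///
    variance(e.A@1 e.B@1 e.C@1 e.D@1 F@1)

* Rescale A to unit total variance (delta-method SEs come from nlcom):
*   var_A        implied total variance of A
*   std_F_to_A   F -> A path when Var(A) = 1
*   resvar_A_std residual variance of A when Var(A) = 1
*   x1_on_A_std  loading of x1 on A when Var(A) = 1 (x1 left unstandardized)
nlcom (var_A:        _b[A:F]^2 + 1)                 ///
      (std_F_to_A:   _b[A:F] / sqrt(_b[A:F]^2 + 1)) ///
      (resvar_A_std: 1 / (_b[A:F]^2 + 1))           ///
      (x1_on_A_std:  _b[x1:A] * sqrt(_b[A:F]^2 + 1))
```

In the rescaled metric, std_F_to_A squared plus resvar_A_std equals 1, which is the split into explained and residual variance that Richard describes.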

Joseph Coveney

Join Date: Apr 2014

Posts: 4449
#10

10 Jul 2025, 05:55

Originally posted by Raphael Lefranc

Richard, I did not know about the standardized option. It sounds interesting. How do you specify it?

You use the standardized option to sem.

"it is common to fix all latent variables' variances to 1," and this is precisely what I want to do.

Yeah, I took your request at face value, that is, as nothing more than alternative identification constraints. On the other hand, if I understand him correctly, Richard assumes that what you're really after is a standardized presentation of the model, which, as he mentions, is not obvious how to do with endogenous latent factors in the model. (We're not certain that just adding standardized to the sem command line will give you something that can be interpreted in the usual manner as a standardized CFA when it's second-order. At least I'm not certain.)

One last question: I want to manually constrain all covariances between my latent variables to zero. . . . Could you tell me what I have to add to this line to constrain them manually? sem (A->x1 x2 x3) (B->x4 x5 x6) (C->x7 x8 x9) (D->x10 x11 x12) (A B C D<-F)

In short, you have to get rid of the second-order latent factor, because it automatically imposes a covariance structure on your first-order latent factors. More directly to your question, it would look like this.

Code:

sem ///
    (x1-x3 <- A) ///
    (x4-x6 <- B) ///
    (x7-x9 <- C) ///
    (x10-x12 <- D), ///
    variance(A@1 B@1 C@1 D@1) ///
    covariance(A*B@0 A*C@0 A*D@0 B*C@0 B*D@0 C*D@0)

Note that, as far as I'm aware, you can't use the covstructure(_LEx, identity) option in this case, because it would override your constraints on the variances immediately before.

But if Richard is correct, and what you're really after is a standardized presentation with zero correlation between the latent factors, then you're probably better off with something like this.

Code:

sem ///
    (x1-x3 <- A) ///
    (x4-x6 <- B) ///
    (x7-x9 <- C) ///
    (x10-x12 <- D), ///
    standardized ///
    covstructure(_LEx, diagonal)
Richard Williams

Join Date: Apr 2014

Posts: 5024
#11

10 Jul 2025, 06:15

I think Joe is right, or at least getting closer. The second-order factor makes it hard to do what you want. Also,

"One last question: I want to manually constrain all covariances between my latent variables to zero."

As is, the covariances between A, B, C, and D can't be zero, because they share F as a common cause. Get rid of F and you could make them uncorrelated. But why do you want them to be uncorrelated in the first place? The fit will probably be terrible. One consequence is that the Xs associated with one latent variable would be completely uncorrelated with the Xs associated with the other latent variables. Unless the sets of variables were designed to be uncorrelated with one another, that is generally unlikely.
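Richard's consequence follows from the measurement equations. Take an item x_i loading on A and an item x_k loading on B, with illustrative loadings λ_i and λ_k, and measurement errors uncorrelated with each other and with the factors:

```latex
\operatorname{Cov}(x_i, x_k) = \lambda_i \lambda_k \operatorname{Cov}(A, B) = 0
\quad \text{if } \operatorname{Cov}(A, B) = 0
```

In the example data, every cross-block covariance (for instance, between x1 and x4) is .01 rather than 0, so a model with uncorrelated A, B, C, and D could not reproduce them.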

In short, you have some conflicts here. You want a second-order factor, but you also want the latent variables to be uncorrelated. You can't have both.
Raphael Lefranc

Join Date: Jul 2025

Posts: 10
#12

10 Jul 2025, 06:18

Thank you very much for your replies.

I think I made a mistake, and I am sorry. Let me explain why.

I would first like to estimate a bifactor model, that is:

clear all
ssd init x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12
ssd set obs 2000
ssd set cov (ltd) 1 .02 1 .02 .02 1 .01 .01 .01 1 .01 .01 .01 .02 1 .01 .01 .01 .02 .02 1 .01 .01 .01 .01 .01 .01 1 .01 .01 .01 .01 .01 .01 .02 1 .01 .01 .01 .01 .01 .01 .02 .02 1 .01 .01 .01 .01 .01 .01 .01 .01 .01 1 .01 .01 .01 .01 .01 .01 .01 .01 .01 .02 1 .01 .01 .01 .01 .01 .01 .01 .01 .01 .02 .02 1
sem (A->x1 x2 x3) (B->x4 x5 x6) (C->x7 x8 x9) (D->x10 x11 x12) (x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12<-F), var(F@1 A@1 B@1 C@1 D@1)

In this case, there is no problem fixing the variances of all latent variables to 1, because A, B, C, and D do not depend on F.

Then I want to estimate a second-order model (my question):

clear all
ssd init x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12
ssd set obs 2000
ssd set cov (ltd) 1 .02 1 .02 .02 1 .01 .01 .01 1 .01 .01 .01 .02 1 .01 .01 .01 .02 .02 1 .01 .01 .01 .01 .01 .01 1 .01 .01 .01 .01 .01 .01 .02 1 .01 .01 .01 .01 .01 .01 .02 .02 1 .01 .01 .01 .01 .01 .01 .01 .01 .01 1 .01 .01 .01 .01 .01 .01 .01 .01 .01 .02 1 .01 .01 .01 .01 .01 .01 .01 .01 .01 .02 .02 1
sem (A->x1 x2 x3) (B->x4 x5 x6) (C->x7 x8 x9) (D->x10 x11 x12) (x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12<-F), var(F@1)

With this model, researchers often fix the variance of the general factor (F) to 1 and, as I thought, also the variances of the first-order factors.

But it seems I made a mistake: it is in fact the residual variance of each first-order factor that is fixed to 1, not the total variance. Richard, thank you very much for making me understand that. I apologize.

I think I got confused because one generally speaks about fixing the variance of a latent variable to 1, but in this case the latent variable depends on another one...

Therefore, if I am right, Joseph's syntax (e.A@1 e.B@1 e.C@1 e.D@1 F@1) will do what I want.

Also, thank you very much, Joseph, for your help with the constraints on covariances.
Raphael Lefranc

Join Date: Jul 2025

Posts: 10
#13

10 Jul 2025, 06:31

I made a mistake; the second-order model is obviously this one:

clear all
ssd init x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12
ssd set obs 2000
ssd set cov (ltd) 1 .02 1 .02 .02 1 .01 .01 .01 1 .01 .01 .01 .02 1 .01 .01 .01 .02 .02 1 .01 .01 .01 .01 .01 .01 1 .01 .01 .01 .01 .01 .01 .02 1 .01 .01 .01 .01 .01 .01 .02 .02 1 .01 .01 .01 .01 .01 .01 .01 .01 .01 1 .01 .01 .01 .01 .01 .01 .01 .01 .01 .02 1 .01 .01 .01 .01 .01 .01 .01 .01 .01 .02 .02 1
sem (A->x1 x2 x3) (B->x4 x5 x6) (C->x7 x8 x9) (D->x10 x11 x12) (A B C D<-F)
Richard Williams

Join Date: Apr 2014

Posts: 5024
#14

10 Jul 2025, 06:42

Raphael, this is indeed very different and makes much more sense. Observed variables can be affected by more than one latent variable.

All latent vars are now exogenous, so you can set all their variances to 1 if you want.

You can also now easily make all the latent vars have zero correlation if you want to, but does that make sense? The Xs would be correlated thanks to all having F as a cause. One way your specification might make sense is if A, B, C, and D are method-artifact factors that only affect the vars in their own set, e.g., something about the wording of the questions in each set affected them but not the Xs in the other sets. But you may have some other reasonable justification.
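A sketch of the bifactor specification with all latent variables exogenous, every latent variance fixed at 1, and all ten covariances among F, A, B, C, and D constrained to zero, using the covariance() syntax Joseph showed earlier. Whether zero correlations are justified is the substantive question Richard raises; this only illustrates the syntax. Run it as a do-file because of the /// continuation lines.

```stata
clear all
ssd init x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12
ssd set obs 2000
ssd set cov (ltd) 1 .02 1 .02 .02 1 ///
    .01 .01 .01 1 .01 .01 .01 .02 1 .01 .01 .01 .02 .02 1 ///
    .01 .01 .01 .01 .01 .01 1 .01 .01 .01 .01 .01 .01 .02 1 .01 .01 .01 .01 .01 .01 .02 .02 1 ///
    .01 .01 .01 .01 .01 .01 .01 .01 .01 1 .01 .01 .01 .01 .01 .01 .01 .01 .01 .02 1 .01 .01 .01 .01 .01 .01 .01 .01 .01 .02 .02 1

* Bifactor: each item loads on the general factor F and on one specific factor.
* All latent variances fixed at 1; every latent covariance fixed at 0.
sem (x1-x3 <- A) (x4-x6 <- B) (x7-x9 <- C) (x10-x12 <- D) (x1-x12 <- F), ///
    variance(F@1 A@1 B@1 C@1 D@1) ///
    covariance(F*A@0 F*B@0 F*C@0 F*D@0 A*B@0 A*C@0 A*D@0 B*C@0 B*D@0 C*D@0)
```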

EDIT: I hadn't seen your second-order factor post before. Are you giving up on that, or do you still want it? It isn't compatible with the other things you say you want.

Last edited by Richard Williams; 10 Jul 2025, 06:58.

Erik Ruzek

Join Date: Oct 2017

Posts: 442
#15

10 Jul 2025, 10:35

I wonder whether Raphael is interested in a bifactor model. It is one of the few "common" measurement models in which people set covariances between latent factors to 0. There, you always set the covariances between the general factor and the specific factors to 0. The covariances among the specific factors themselves are often set to 0, but this isn't strictly required. See here; there have also been some Statalist discussions about bifactor models.
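For comparison with Erik's description, a sketch of the same bifactor model where the zero covariances are set with covstructure(_LEx, diagonal), the shortcut Joseph used earlier. It fixes all covariances among the exogenous latent variables (F, A, B, C, D) at zero and leaves their variances free, so the latent scales come from sem's default anchoring constraints. Freeing some covariances among the specific factors, which Erik says is possible, is not shown, because whether such a model is identified depends on the specification. Run it as a do-file because of the /// continuation lines.

```stata
clear all
ssd init x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12
ssd set obs 2000
ssd set cov (ltd) 1 .02 1 .02 .02 1 ///
    .01 .01 .01 1 .01 .01 .01 .02 1 .01 .01 .01 .02 .02 1 ///
    .01 .01 .01 .01 .01 .01 1 .01 .01 .01 .01 .01 .01 .02 1 .01 .01 .01 .01 .01 .01 .02 .02 1 ///
    .01 .01 .01 .01 .01 .01 .01 .01 .01 1 .01 .01 .01 .01 .01 .01 .01 .01 .01 .02 1 .01 .01 .01 .01 .01 .01 .01 .01 .01 .02 .02 1

* Bifactor with every latent covariance fixed at 0 via covstructure();
* latent variances are estimated, scales set by sem's default anchoring
sem (x1-x3 <- A) (x4-x6 <- B) (x7-x9 <- C) (x10-x12 <- D) (x1-x12 <- F), ///
    covstructure(_LEx, diagonal)
```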