
  • SEM specification with syntax: overriding standard identification constraints

    Hello everyone,

    I am new to Stata (19).

    I want to specify a second-order model using command syntax rather than a diagram.

    In this model I want to fix to 1 the variances of the first-order factors (A, B, C, D in my syntax) as well as that of the second-order factor (F), instead of fixing to 1 a path from F to A as well as the path of an observed variable within each first-order factor (the standard constraints).

    In other words, I would like to fix these variances to 1 and thereby override the standard constraints.

    I managed to do it for the second-order factor by typing var(F@1).

    It does not work, however, for my first-order factors. If I type var(A@1), for example, I get an error message.

    Can anyone help me with this problem?

    Thanks in advance.

    Here is my syntax:

    clear all
    ssd init x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12
    ssd set obs 2000
    ssd set cov (ltd) 1 .02 1 .02 .02 1 .01 .01 .01 1 .01 .01 .01 .02 1 .01 .01 .01 .02 .02 1 .01 .01 .01 .01 .01 .01 1 .01 .01 .01 .01 .01 .01 .02 1 .01 .01 .01 .01 .01 .01 .02 .02 1 .01 .01 .01 .01 .01 .01 .01 .01 .01 1 .01 .01 .01 .01 .01 .01 .01 .01 .01 .02 1 .01 .01 .01 .01 .01 .01 .01 .01 .01 .02 .02 1
    . sem (A->x1 x2 x3) (B->x4 x5 x6) (C->x7 x8 x9) (D->x10 x11 x12) (A B C D<-F), var (F@1)




  • #2
    Originally posted by Raphael Lefranc
    . . . I want to fix to 1 the variance of the first-order factors (A,B,C, D in my syntax) as well as that of the second-order factor (F) instead of fixing to 1 a path from F to A as well as the path of an observed variable within each first-order factor (standard constraints).

    I managed to do it for the second-order factor by typing var(F@1).

    It does not work, however, for my first-order factors. If I type var(A@1) . . .

    Latent factors A through D are endogenous, so the variances you can constrain are those of their error terms: e.A, e.B, e.C and e.D. Try something like the following.
    Code:
    version 19
    
    clear *
    
    ssd init x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12
    
    ssd set obs 2000
    
    #delimit ;
    ssd set cov (ltd) 
        1 .02 1 .02 .02 1 .01 .01 .01 1 .01 .01 .01 .02 1 .01 .01 
        .01 .02 .02 1 .01 .01 .01 .01 .01 .01 1 .01 .01 .01 .01 .01 
        .01 .02 1 .01 .01 .01 .01 .01 .01 .02 .02 1 .01 .01 .01 .01 
        .01 .01 .01 .01 .01 1 .01 .01 .01 .01 .01 .01 .01 .01 .01 
        .02 1 .01 .01 .01 .01 .01 .01 .01 .01 .01 .02 .02 1;
    sem 
        (x1-x3 <- A)
        (x4-x6 <- B)
        (x7-x9 <- C)
        (x10-x12 <- D)
        (A B C D <- F),
        variance(e.A@1 e.B@1 e.C@1 e.D@1 F@1);
    #delimit cr
    
    exit



    • #3
      I don't think Joe's solution does exactly what Raphael wants. Joe's solution fixes the residual variances of the endogenous latent variables at 1, but their total variances will be something greater than 1.
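
      To spell out why, here is a short derivation. It assumes the usual linear equation for a first-order factor, with an error e.A that is uncorrelated with F; the symbol \lambda_A for the loading of A on F is notation introduced here, not something from the thread.

      ```latex
      \begin{aligned}
      A &= \lambda_A F + e_A \\
      \operatorname{Var}(A) &= \lambda_A^2 \operatorname{Var}(F) + \operatorname{Var}(e_A) \\
                            &= \lambda_A^2 \cdot 1 + 1 \;=\; 1 + \lambda_A^2
      \end{aligned}
      ```

      So with Var(F) and Var(e.A) both fixed at 1, the total variance of A exceeds 1 whenever the loading on F is nonzero.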

      You can tack on the standardized option, but then all the observed variables will be standardized too. But maybe that would be OK given what is wanted? [Edit: I am not sure about this sentence, but I think I am right about Joe's solution not being exactly what Raphael wants.]
      Last edited by Richard Williams; 09 Jul 2025, 12:41.
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      StataNow Version: 19.5 MP (2 processor)

      WWW: https://academicweb.nd.edu/~rwilliam/



      • #4
        Originally posted by Richard Williams
        . . . but the total variances will be something greater than 1.

        It's pretty common to fix the variance of each latent factor to one as an alternative to allowing its variance to be governed by the first manifest variable.

        And given his example syntax showing what he's trying to accomplish

        I managed to do it for the second-order factor by typing var(F@1).

        It does not work, however, for my first-order factors. If I type var(A@1), for example, I get an error message.

        that's how I interpreted Raphael's request.

        I've not encountered a situation where it's desired that the sum of the variances of latent factors at two levels of a CFA model be constrained to one. I'm not sure how you would do that without specifying at least one additional constraint, such as that the variances of the latent factors (at the two levels) are equal, because otherwise infinitely many combinations of individual variances sum to one.



        • #5
          Originally posted by Joseph Coveney
          And given his example syntax showing what he's trying to accomplish . . .
          . . . and given the title of this thread.



          • #6
            Yes, but Raphael says "In this model I want to fix to 1 the variance of the first-order factors (A,B,C, D in my syntax) as well as that of the second-order factor (F) instead of fixing to 1." I don't know how to do that either! But I don't see the point of standardizing if he can't do that. I think he wants the residual variances of A, B, C, D to equal 1 minus the explained variance.

            If he were just analyzing observed variables, he could analyze the correlation matrix, and the variances of all vars would be one. But I don't know how you can do that with latent endogenous vars.
            -------------------------------------------
            Richard Williams, Notre Dame Dept of Sociology
            StataNow Version: 19.5 MP (2 processor)

            WWW: https://academicweb.nd.edu/~rwilliam/



            • #7
              Originally posted by Richard Williams
              Yes, but Raphael says "In this model I want to fix to 1 the variance of the first-order factors (A,B,C, D in my syntax) as well as that of the second-order factor (F) instead of fixing to 1."

              Well, I read it as, "In this model I want to fix to 1 the variance of the first-order factors (A,B,C, D in my syntax) as well as that of the second-order factor (F) instead of fixing to 1 a path from F to A as well as the path of an observed variable within each first-order factor (standard constraints)." [Emphasis added]

              But maybe Raphael can clarify for both of us, and then we can go from there if needed.



              • #8
                Dear Joseph and Richard,

                Thank you very much for your helpful reply and comments.

                I will reply following the chronological order.

                Richard, I did not know about the standardized option. It is interesting. How do you specify it?

                As Joseph wrote, "it is common to fix all latent variables variance to 1", and this is precisely what I want to do.

                However, I am a little confused at the moment, given the possibilities you gave me.

                All these specifications are simple for me with lavaan or Mplus but difficult at the moment with Stata.

                A last question. I want to manually constrain to zero all covariances between my latent variables. Despite carefully reading the documentation on the topic, it is not very clear to me.

                Could you help me with what I have to add to this line to impose that constraint manually? sem (A->x1 x2 x3) (B->x4 x5 x6) (C->x7 x8 x9) (D->x10 x11 x12) (A B C D<-F)

                Thank you again.



                • #9
                  Raphael, are you sure about Lavaan making this simple? According to this post

                  https://stats.stackexchange.com/ques...ance-in-lavaan

                  "Lavaan (and most other software) makes it trivially easy to constrain exogenous latent variable variances and endogenous latent variable residuals to 1 (or any other number), but I haven't found a good way to constrain the endogenous latent variable variance itself. I understand that this isn't frequently done in SEM because typically the latent variable metric is relatively unimportant/not interpreted, but in this case it would be extremely useful."

                  In any event, if lavaan or Mplus really does make it easy, why not use them?

                  As for standardized, just add standardized as a sem option; in your case, make it the last thing on your sem command. But I don't think it does the standardization you want.

                  Given what you want, it would be nice if there were a standardizedlatent option. If your life truly depended on it, you might be able to rescale the coefficients by hand. Or maybe run the model as Joseph suggested and then figure out what the residual variances would have to be fixed at so that the total variance is 1.
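
                  A rough sketch of that last idea follows. It assumes the model from #2 has just been fit (variances of F and of e.A through e.D all fixed at 1) and that the second-order loadings are stored under names like _b[A:F]. That naming is an assumption, so check it against the coefficient legend before relying on it.

                  ```stata
                  * Show the name under which sem stores each parameter
                  sem, coeflegend

                  * Implied total variance of each first-order factor:
                  * loading^2 * Var(F) + Var(error), with Var(F) = Var(error) = 1 by constraint
                  nlcom (totvar_A: _b[A:F]^2 + 1) ///
                        (totvar_B: _b[B:F]^2 + 1) ///
                        (totvar_C: _b[C:F]^2 + 1) ///
                        (totvar_D: _b[D:F]^2 + 1)
                  ```

                  Dividing a first-order factor by the square root of its total variance would give it a variance of 1, which implies a residual variance of 1/(1 + loading^2) on that rescaled metric.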
                  -------------------------------------------
                  Richard Williams, Notre Dame Dept of Sociology
                  StataNow Version: 19.5 MP (2 processor)

                  WWW: https://academicweb.nd.edu/~rwilliam/



                  • #10
                    Originally posted by Raphael Lefranc
                    Richard, I do not know the standardized option. It is interesting. How do you specify it ?
                    You use the standardized option to sem.
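
                    A minimal sketch of two ways to request it, assuming the second-order model and constraints from #2. Because standardized only affects how results are reported, it can also be given when replaying results, without refitting.

                    ```stata
                    * Fit the model and report standardized coefficients in one step
                    sem (x1-x3 <- A) (x4-x6 <- B) (x7-x9 <- C) (x10-x12 <- D) ///
                        (A B C D <- F), ///
                        variance(e.A@1 e.B@1 e.C@1 e.D@1 F@1) standardized

                    * Or redisplay the most recent sem results in standardized form
                    sem, standardized
                    ```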

                    "it is common to fix all latent variables variance to 1" and this is precisely what I want to do.
                    Yeah, I took your request at face value, that is, nothing more than alternative identification constraints. On the other hand, if I understand him correctly, I think Richard assumes that what you're really after is a standardized presentation of the model, which as Richard mentions, is not obvious how to do with endogenous latent factors in the model. (We're not certain that just adding standardized to the sem command line will give you something that can be interpreted in the usual manner as a standardized CFA when it's second-order. At least I'm not certain.)

                    A last question. I want to manually constrain to zero all covariances between my latent variables. . . . Could you help me with what I have to add to this line to impose that constraint manually? sem (A->x1 x2 x3) (B->x4 x5 x6) (C->x7 x8 x9) (D->x10 x11 x12) (A B C D<-F)

                    In short, you have to get rid of the second-order latent factor, because it automatically induces a covariance structure among your first-order latent factors. More to your question, it would be like this.
                    Code:
                    sem ///
                        (x1-x3 <- A) ///
                        (x4-x6 <- B) ///
                        (x7-x9 <- C) ///
                        (x10-x12 <- D), ///
                            variance(A@1 B@1 C@1 D@1) ///
                            covariance(A*B@0 A*C@0 A*D@0 B*C@0 B*D@0 C*D@0)
                    Note that, as far as I'm aware, you can't use the covstructure(_LEx, identity) option in this case, because it will override your constraints on the variances immediately before.

                    But if Richard is correct, and what you're really after is a standardized presentation with zero correlation between the latent factors, then you're probably better off with something like this.
                    Code:
                    sem ///
                        (x1-x3 <- A) ///
                        (x4-x6 <- B) ///
                        (x7-x9 <- C) ///
                        (x10-x12 <- D), /// 
                            standardized ///
                            covstructure(_LEx, diagonal)



                    • #11
                      I think Joe is right, or at least getting closer. This second-order factor makes it hard to do what you want. Also

                      "A last question. I want to manually constrain to zero all covariances between my latent variables."

                      As is, the covariances between A, B, C, and D can't be zero, because they share F as a common cause. Get rid of F and you could make them be uncorrelated. But why do you want to make them be uncorrelated in the first place? The fit will probably be terrible. One consequence is that the Xs associated with one latent variable would be totally uncorrelated with the Xs associated with other latent variables. Unless the sets of variables were designed to be uncorrelated with the other sets, that is generally unlikely.
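
                      The implied covariance can be written out as follows. This assumes the errors e.A and e.B are uncorrelated with F and with each other; the loadings \lambda_A and \lambda_B of A and B on F are notation introduced here.

                      ```latex
                      \operatorname{Cov}(A, B)
                        = \operatorname{Cov}(\lambda_A F + e_A,\; \lambda_B F + e_B)
                        = \lambda_A \lambda_B \operatorname{Var}(F)
                      ```

                      That product is zero only if one of the loadings is zero or F has no variance, which would defeat the purpose of the second-order factor.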

                      In short you have some conflicts here. You want a 2nd order factor but you also want the latent vars to be uncorrelated. You can't have both.
                      -------------------------------------------
                      Richard Williams, Notre Dame Dept of Sociology
                      StataNow Version: 19.5 MP (2 processor)

                      WWW: https://academicweb.nd.edu/~rwilliam/



                      • #12
                        Thank you very much for your reply.

                        I think I made a mistake, and I am sorry. Let me explain why.

                        First, I would like to estimate a bifactor model, that is:

                        clear all
                        ssd init x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12
                        ssd set obs 2000
                        ssd set cov (ltd) 1 .02 1 .02 .02 1 .01 .01 .01 1 .01 .01 .01 .02 1 .01 .01 .01 .02 .02 1 .01 .01 .01 .01 .01 .01 1 .01 .01 .01 .01 .01 .01 .02 1 .01 .01 .01 .01 .01 .01 .02 .02 1 .01 .01 .01 .01 .01 .01 .01 .01 .01 1 .01 .01 .01 .01 .01 .01 .01 .01 .01 .02 1 .01 .01 .01 .01 .01 .01 .01 .01 .01 .02 .02 1
                        . sem (A->x1 x2 x3) (B->x4 x5 x6) (C->x7 x8 x9) (D->x10 x11 x12) (x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12<-F), var (F@1 A@1 B@1 C@1 D@1)

                        In this case there is no problem fixing the variances of all latent variables to one, because A, B, C and D do not depend on F.

                        Then I want to estimate a second-order model (my question):

                        clear all
                        ssd init x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12
                        ssd set obs 2000
                        ssd set cov (ltd) 1 .02 1 .02 .02 1 .01 .01 .01 1 .01 .01 .01 .02 1 .01 .01 .01 .02 .02 1 .01 .01 .01 .01 .01 .01 1 .01 .01 .01 .01 .01 .01 .02 1 .01 .01 .01 .01 .01 .01 .02 .02 1 .01 .01 .01 .01 .01 .01 .01 .01 .01 1 .01 .01 .01 .01 .01 .01 .01 .01 .01 .02 1 .01 .01 .01 .01 .01 .01 .01 .01 .01 .02 .02 1
                        . sem (A->x1 x2 x3) (B->x4 x5 x6) (C->x7 x8 x9) (D->x10 x11 x12) (x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12<-F), var (F@1).


                        Often with this model, researchers also fix the variance of the general factor (F) to 1 as well as, so I thought, the variances of the first-order factors.

                        But it seems that I made a mistake: it is in fact the residual variance of each first-order factor that is fixed to 1, not the total variance. Richard, thank you very much for helping me understand that. I apologize.

                        I think I made the mistake because one generally speaks of fixing the variance of a latent variable at 1, but in this case the latent variable depends on another...

                        Therefore, if I am right, Joseph's syntax (e.A@1 e.B@1 e.C@1 e.D@1 F@1) will do what I want.

                        Also, thank you very much, Joseph, for your help regarding the constraints on covariances.






                        • #13
                          I made a mistake: the second-order model is obviously this one:

                          clear all
                          ssd init x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12
                          ssd set obs 2000
                          ssd set cov (ltd) 1 .02 1 .02 .02 1 .01 .01 .01 1 .01 .01 .01 .02 1 .01 .01 .01 .02 .02 1 .01 .01 .01 .01 .01 .01 1 .01 .01 .01 .01 .01 .01 .02 1 .01 .01 .01 .01 .01 .01 .02 .02 1 .01 .01 .01 .01 .01 .01 .01 .01 .01 1 .01 .01 .01 .01 .01 .01 .01 .01 .01 .02 1 .01 .01 .01 .01 .01 .01 .01 .01 .01 .02 .02 1
                          . sem (A->x1 x2 x3) (B->x4 x5 x6) (C->x7 x8 x9) (D->x10 x11 x12) (A B C D<-F),



                          • #14
                            Raphael, this is indeed very different and makes much more sense. Observed variables can be affected by more than one latent variable.

                            All latent vars are now exogenous, so you can set all their variances at 1 if you want.

                            You can also now easily make all the latent vars have zero correlation if you want to, but does that make sense? The Xs would still be correlated thanks to all having F as a cause. One way your specification might make sense is if A, B, C and D are methods artifact factors that only affect the vars in their own set, e.g. something about the wording of the Qs in each set affected them but not the Xs in the other sets. But you may have some other reasonable justification.

                            EDIT: I didn't see your 2nd order factor post before. Are you giving up on that, or do you still want it? It isn't compatible with the other things you say you want.
                            Last edited by Richard Williams; 10 Jul 2025, 06:58.
                            -------------------------------------------
                            Richard Williams, Notre Dame Dept of Sociology
                            StataNow Version: 19.5 MP (2 processor)

                            WWW: https://academicweb.nd.edu/~rwilliam/



                            • #15
                              I wonder if Raphael is interested in a bifactor model? It is one of the few "common" measurement models in which people set covariances between latent factors to 0. There, you always set the covariances between the general factor and the specific factors to 0. The covariances among the specific factors themselves are often set to 0 as well, but this isn't a requirement. See here; there have also been some Statalist discussions about bifactor models.
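
                              For illustration, a bifactor specification along those lines might look like the sketch below. It assumes the summary-statistics data from #12 are in memory and reuses Raphael's factor names; whether it converges on that particular covariance matrix is not something I have checked.

                              ```stata
                              * Bifactor sketch: every x loads on the general factor F
                              * and on exactly one specific factor (A, B, C or D)
                              sem ///
                                  (x1-x3 <- A F) ///
                                  (x4-x6 <- B F) ///
                                  (x7-x9 <- C F) ///
                                  (x10-x12 <- D F), ///
                                  variance(F@1 A@1 B@1 C@1 D@1) /// all latent variances fixed at 1
                                  covariance(F*A@0 F*B@0 F*C@0 F*D@0 /// general with specific: always 0
                                             A*B@0 A*C@0 A*D@0 B*C@0 B*D@0 C*D@0) // among specific: optional
                              ```

                              Dropping the second row of covariance constraints would leave the specific factors free to correlate with one another.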
