  • sem and gsem can produce very different coefficients in the same linear regression

    I'm trying to fit a linear structural equation (along with a nonlinear measurement model) in a panel data setting. I have noticed that the commands sem and gsem produce very different estimates on the same model specification, and I don't understand why.

    Here's a stylized example (without a measurement model):
    Code:
    webuse nlswork, clear
    
    reg d.ln_wage d.hours d.union
    
    sem (ln_wage <- hours@h union@u FE@1 _cons@c) ///
        (l.ln_wage <- l.hours@h l.union@u FE@1 _cons@c)
    
    gsem (ln_wage <- hours@h union@u FE@1 _cons@c) ///
        (l.ln_wage <- l.hours@h l.union@u FE@1 _cons@c) ///
        , listwise
    The coefficients from the sem command are close to the ones from the differenced OLS regression; those from the gsem command differ a lot. Can anyone explain why?

  • #2
    What is that listwise option doing after gsem? If you are after listwise deletion of cases in gsem, that is a potential mismatch with sem. ML estimates are based on the likelihood function given the distribution of your data. The distributions of the data used by sem and gsem are naturally not the same because of the listwise deletion, and the two commands can produce different results if different samples are used for estimation. gsem can accommodate more data points with missing values than sem can; on the other hand, sem has the mlmv option, which incorporates missing values via full-information maximum likelihood. Either way, the multivariate normality assumption for the joint probability distribution of the data given the parameters needs to hold. In any case, try running both commands on complete cases and see if they still differ.
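    A minimal sketch of what I mean (my code, using the variables from #1; the xtset call is just a precaution and is harmless if the data are already declared as a panel):
    Code:
    * build one complete-case sample and feed it to all three commands
    webuse nlswork, clear
    xtset idcode year
    
    * flag observations complete for every variable the models use
    generate byte touse = !missing(ln_wage, hours, union, L.ln_wage, L.hours, L.union)
    
    reg d.ln_wage d.hours d.union if touse
    
    sem (ln_wage <- hours@h union@u FE@1 _cons@c) ///
        (l.ln_wage <- l.hours@h l.union@u FE@1 _cons@c) if touse
    
    gsem (ln_wage <- hours@h union@u FE@1 _cons@c) ///
        (l.ln_wage <- l.hours@h l.union@u FE@1 _cons@c) if touse, listwise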
    Roman

    Comment


    • #3
      Roman's point is well taken, but I don't think it's the root of the problem here. I ran the code that Thorsten posted, and the N is the same for all three analyses, so I think the estimation samples are the same.
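      One quick way to check this directly is sketched below, using e(sample) (my code; it assumes the data from #1 are in memory and xtset):
      Code:
      * compare the estimation samples of sem and gsem observation by observation
      quietly sem (ln_wage <- hours@h union@u FE@1 _cons@c) ///
          (l.ln_wage <- l.hours@h l.union@u FE@1 _cons@c)
      generate byte in_sem = e(sample)
      
      quietly gsem (ln_wage <- hours@h union@u FE@1 _cons@c) ///
          (l.ln_wage <- l.hours@h l.union@u FE@1 _cons@c), listwise
      generate byte in_gsem = e(sample)
      
      count if in_sem != in_gsem   // 0 means the two estimation samples coincide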

      Comment


      • #4
        Perhaps a difference in the default family and link functions in gsem? There might also be differences in how gsem is treating the errors between the two models, but this definitely seems a bit strange at first glance.

        Comment


        • #5
          Roman Mostazir According to the documentation, listwise "appl[ies] sem's (not gsem's) rules for omitting observations with missing values", so the samples should be the same; I should have pointed that out above, sorry. Based on the different estimates, I first suspected that gsem imposes zero covariance between latent exogenous variables and observed exogenous variables, but according to the documentation ("intro 4") that is not the case either: the covariance is "assumed to be nonzero".

          wbuchanan The default for gsem is the Gaussian family with the identity link function, so it's the same as in sem. Both commands also employ ml as the default estimation method.
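          To make those defaults explicit, here is a minimal sketch (my addition, not from the documentation); it should reproduce the gsem results from #1:
          Code:
          * spell out gsem's default family and link for each response
          gsem (ln_wage <- hours@h union@u FE@1 _cons@c, family(gaussian) link(identity)) ///
              (l.ln_wage <- l.hours@h l.union@u FE@1 _cons@c, family(gaussian) link(identity)) ///
              , listwise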

          Comment


          • #6
            I agree with Clyde: listwise is not the problem, and after running your commands I get the same sample size. The problem lies with additional error covariance parameters that are estimated by sem but not by gsem. By default, sem assumes a covariance structure between the error terms even when it is not specified. If you define a diagonal covariance structure for the error terms in sem (assuming they are not correlated), both commands produce identical results:

            Code:
            gsem:
            =====================
            gsem (ln_wage <- hours@h union@u FE@1 _cons@c) ///
                (l.ln_wage <- l.hours@h l.union@u FE@1 _cons@c) ///
                , listwise  method(ml)
            
            gsem Output:
            ====================
            
            --------------------------------------------------------------------------------
                           |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            ---------------+----------------------------------------------------------------
            ln_wage <-     |
                     hours |   -.004186   .0004131   -10.13   0.000    -.0049956   -.0033763
                     union |   .1095575   .0080115    13.68   0.000     .0938552    .1252597
                        FE |          1  (constrained)
                     _cons |   1.920513   .0165162   116.28   0.000     1.888142    1.952884
            ---------------+----------------------------------------------------------------
            L.ln_wage <-   |
                     hours |
                       L1. |   -.004186   .0004131   -10.13   0.000    -.0049956   -.0033763
                           |
                     union |
                       L1. |   .1095575   .0080115    13.68   0.000     .0938552    .1252597
                           |
                        FE |          1  (constrained)
                     _cons |   1.920513   .0165162   116.28   0.000     1.888142    1.952884
            ---------------+----------------------------------------------------------------
                    var(FE)|   .1565935   .0032284                       .150392    .1630507
            ---------------+----------------------------------------------------------------
             var(e.ln_wage)|   .0348415   .0015021                      .0320183    .0379136
            var(Le.ln_wage)|   .0271895   .0014486                      .0244935    .0301822
            --------------------------------------------------------------------------------
            
            sem
            ==================
            sem (ln_wage <- hours@h union@u FE@1 _cons@c) ///
                (l.ln_wage <- l.hours@h l.union@u FE@1 _cons@c), method(ml) covstructure(_Ex,diag)
            
            sem Output:
            ====================
            ---------------+----------------------------------------------------------------
            Structural     |
              ln_wage <-   |
                     hours |   -.004186   .0004131   -10.13   0.000    -.0049956   -.0033763
                     union |   .1095575   .0080115    13.68   0.000     .0938552    .1252597
                        FE |          1  (constrained)
                     _cons |   1.920513   .0165162   116.28   0.000     1.888142    1.952884
              -------------+----------------------------------------------------------------
              L.ln_wage <- |
                     hours |
                       L1. |   -.004186   .0004131   -10.13   0.000    -.0049956   -.0033763
                           |
                     union |
                       L1. |   .1095575   .0080115    13.68   0.000     .0938552    .1252597
                           |
                        FE |          1  (constrained)
                     _cons |   1.920513   .0165162   116.28   0.000     1.888142    1.952884
            ---------------+----------------------------------------------------------------
            
            /* comment: additional output is omitted */






            Last edited by Roman Mostazir; 21 Feb 2016, 09:56. Reason: added comment
            Roman

            Comment


            • #7
              Great catch. The manual entry says

              The default covariance structure for exogenous variables is covstructure(_Ex, unstructured) for sem. There is no simple way in this notation to write the default for gsem.

              I would think you'd want to go the other way -- have gsem handle things the same way sem does. But I am having trouble figuring out how to do that.
              -------------------------------------------
              Richard Williams, Notre Dame Dept of Sociology
              Stata Version: 17.0 MP (2 processor)

              EMAIL: [email protected]
              WWW: https://www3.nd.edu/~rwilliam

              Comment


              • #8
                Thank you Roman, the outputs are clearly the same, but I believe that the reason is a bit different from your explanation. Please correct me if I'm wrong.

                sem's option covstructure(_Ex,diag) refers to all exogenous variables (latent and observed) in the model, not to the error terms. So it looks to me as though you specified the covariance between the latent variable FE and the observed exogenous variables to be zero. (This is stated on p. 505 of the SEM manual, "sem and gsem option covstructure( )".)

                But given the identical output in your example, that must mean that gsem also assumes the covariance between latent exogenous variables and observed exogenous variables to be zero by default. However, on p. 48 of the manual it says "Covariances between latent exogenous and observed exogenous variables ... [a]ssumed to be nonzero. Cannot be estimated or constrained because this covariance is not among the identified parameters of the generalized SEM." Or am I mistaken here?

                Comment


                • #9
                  One other tidbit: If you specify predict after running sem, it wants you to specify two variables:

                  Code:
                  . predict v1
                  (xb(ln_wage L.ln_wage) assumed)
                  too few variables specified
                  r(102);
                  
                  . predict v1 v2
                  (xb(ln_wage L.ln_wage) assumed)
                  But if you specify predict after gsem it only wants one variable:

                  Code:
                  . predict xv1 xv2
                  (option mu assumed)
                  (option conditional(ebmeans) assumed)
                  (using 7 quadrature points)
                  too many variables specified
                  r(103);
                  
                  . predict xv1
                  (option mu assumed)
                  (option conditional(ebmeans) assumed)
                  (using 7 quadrature points)
                  (22555 missing values generated)
                  I am guessing that is why it is relatively easy to get sem to clone gsem's default, but harder to get gsem to clone sem's defaults (at least I haven't figured it out yet).

                  I find it a little odd that the programs behave differently in these respects, but I assume there are good reasons for it.
                  -------------------------------------------
                  Richard Williams, Notre Dame Dept of Sociology
                  Stata Version: 17.0 MP (2 processor)

                  EMAIL: [email protected]
                  WWW: https://www3.nd.edu/~rwilliam

                  Comment


                  • #10
                    I have a question related to the likelihood function employed in gsem. Let's assume for a second that the exogenous variables in the vector x were not fixed but random; then we would have a joint probability density function over the vector of response variables, y, the exogenous observed variables, x, and the vector of latent variables, u: g(y,x,u). This could be decomposed into f(y|x,u) * h(u|x) * k(x), right? Now I might not be interested in the marginal distribution of x and could leave k(x) out of the likelihood.

                    Now, the likelihood in gsem is the integral over u of f(y|x,u) * h(u) (with h() the normal density function). If x were indeed random, this would imply that h(u|x) = h(u), i.e., that x and u are independent, or am I wrong? However, x is assumed to be fixed. Can anyone explain to me what this means for the implied relationship between x and u?
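                    Restated compactly (just a sketch of the same argument, in the notation above):

                    $$g(y, x, u) = f(y \mid x, u)\, h(u \mid x)\, k(x)$$
                    $$L \propto \int f(y \mid x, u)\, h(u \mid x)\, du \quad \text{(dropping } k(x)\text{)}$$
                    $$L_{gsem} \propto \int f(y \mid x, u)\, h(u)\, du$$

                    and the last two coincide only if h(u|x) = h(u), i.e., only if u and x are independent.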
                    Last edited by Thorsten Kemper; 23 Feb 2016, 12:04.

                    Comment


                    • #11
                      You're now asking questions completely unrelated to your original one. Please start a new topic.
                      Steve Samuels
                      Statistical Consulting
                      [email protected]

                      Stata 14.2

                      Comment


                      • #12
                        I believe the question is very much related to the original one. A poster has pointed out that gsem's output in the original example is identical to sem's when the covariance between exogenous observed variables and latent exogenous ones is restricted to be zero. I'd like to find out why.

                        Comment


                        • #13
                          Hi Thorsten, apologies for the delay in replying. As I am traveling, I can't check everything that has been referred to in the manual. To be honest, I have not gone through the whole gsem manual and haven't done much work with it, so my reply to your question was somewhat mechanistic. My thinking was: if we claim two things to be equal, we need to set up equivalent functions for both. Seeing your sem and gsem commands produce different results, my first reflection (from the reported results) was that they estimate different covariance parameters too. You can check this further by typing estat vce after each command: no covariance parameters are reported by gsem, while sem reports them both in the results output and after estat vce. Therefore, we cannot claim that they produce different results once they are set to the same functions. Since I did not know (and still don't) how to clone the sem results in gsem, I went the other way around (as Richard picked up) and made sem behave like gsem (not estimating any covariances), which produced identical results and is perhaps a basis to claim that, when set to the same functions, both commands produce the same results.

                          I hope somebody else can shed further light on this, as your reference to the manual about the default non-zero covariance in gsem is valid and I am confused too. In fact, that is what one would expect, i.e., some sort of correlation among the predictors. If the default is non-zero, estat vce should at least show the estimated covariance parameters, even if for some reason they are omitted from the results output. Re: your second question, I understand that it may add a relevant dimension to this thread, but I agree with Steve that it deserves a new thread; let's keep this one focused and see whether we get any new insights. You can always start a new topic, and the new question may take the discussion in a different direction.
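                          A minimal sketch of the check I mean (my code; it assumes the data and model from #1 are in memory):
                          Code:
                          quietly sem (ln_wage <- hours@h union@u FE@1 _cons@c) ///
                              (l.ln_wage <- l.hours@h l.union@u FE@1 _cons@c)
                          estat vce    // covariance parameters appear among the estimates here
                          
                          quietly gsem (ln_wage <- hours@h union@u FE@1 _cons@c) ///
                              (l.ln_wage <- l.hours@h l.union@u FE@1 _cons@c), listwise
                          estat vce    // no covariance parameters are reported here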
                          Best,
                          Roman

                          Comment


                          • #14
                            Roman, Steve,

                            thank you for your advice regarding my post #10; I have opened a new thread for it.

                            Thanks to all for the suggestions so far.

                            Comment
