I have more of a basic statistics-related question regarding the models underlying Stata's gsem command.
Please bear with me for a short moment before I pose the actual question.
In general, gsem treats observed exogenous variables as given: "Generalized SEMs drop the observed [exogenous] variables from the joint-normality assumption. Instead, generalized SEMs treat the observed exogenous variables as given and produce estimates conditional on their values" (p. 43, SEM manual). Regarding the covariance between latent variables and observed exogenous variables, the manual says "Covariances between latent exogenous and observed exogenous variables [... are a]ssumed to be nonzero. Cannot be estimated or constrained because this covariance is not among the identified parameters of the generalized SEM" (p. 48).
What I don't understand is what treating regressors "as given" actually implies in the context of ML estimation.
To get closer to my question, let's assume for a second that the exogenous variables in the vector x were not fixed but random, and that we had a joint probability density function over the vector of response variables, y, the observed exogenous variables, x, and the vector of latent variables, u: g(y,x,u). This could be factorized as f(y|x,u) * h(u|x) * k(x), right? Suppose my goal were to estimate the parameters of the conditional density f(y|x,u). Then I might not be interested in the marginal distribution of x and could leave k(x) out of the likelihood.
Now, the likelihood in gsem is the integral of f(y|x,u) * h(u) over u (with h(u) in place of h(u|x), where h() is the normal density function). If x were indeed random, this would imply that h(u|x) = h(u), i.e. that x and u are independent (and hence uncorrelated), or am I wrong?
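To make that last point concrete outside of Stata, here is a quick Python sketch (an illustration of the statistics only, not of gsem's internals): when (x,u) are correlated bivariate normal, the conditional mean of u shifts with x, so the conditional density h(u|x) cannot equal the marginal h(u). The correlation rho = 0.6 and the sample size are arbitrary choices for the simulation.

```python
import random

random.seed(12345)
rho = 0.6       # chosen correlation between x and u
n = 200_000

xs, us = [], []
for _ in range(n):
    x = random.gauss(0, 1)
    # u = rho*x + sqrt(1 - rho^2)*e yields corr(x, u) = rho
    u = rho * x + (1 - rho**2) ** 0.5 * random.gauss(0, 1)
    xs.append(x)
    us.append(u)

# Marginal mean of u is ~0, but the mean of u among draws with x > 1
# is clearly positive: E[u | x] = rho * x, so h(u|x) != h(u).
mean_u = sum(us) / n
n_high = sum(1 for x in xs if x > 1)
mean_u_given_x_high = sum(u for x, u in zip(xs, us) if x > 1) / n_high

print(round(mean_u, 2))           # close to 0
print(mean_u_given_x_high > 0.3)  # conditional mean shifted upward
```

So a likelihood that integrates against the marginal h(u) rather than h(u|x) is, in effect, treating x and u as independent, which is exactly what my question is about.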
However, x is assumed to be given. My main question is this: Can anyone explain to me what this means for the implied relationship between x and u?
(Disclaimer: I have posted the last three paragraphs in another thread and got the advice to open a new one instead.)
Why does it matter? My impression is that gsem does in fact assume the covariance between latent variables and observed exogenous variables to be zero. In the following code I simulate a data set with correlated observed exogenous and latent variables and run an instrumental-variable regression as well as the commands sem and gsem.
sem's output is very close to ivreg's. gsem's differs a lot from it. gsem's output is, however, identical to sem's when I explicitly restrict the covariances among all exogenous variables to zero in sem (even though I know their covariance is not zero).
Code:
clear
set seed 2000
set obs 2000
cap drop l y1 y2 y3
gen l  = rnormal()                         // Latent variable
gen y1 = l + .7*rnormal()                  // Measurement 1
gen y2 = .5*l + .3*rnormal()               // Measurement 2
gen y3 = -.4*l + .6*rnormal()              // Measurement 3
gen x1 = .5 + .6*l + .3*rnormal()          // Observed exog. regressor 1
gen x2 = .8 - .3*l + .8*x1 + .4*rnormal()  // Observed exog. regressor 2
gen y  = .2 + x1 + .3*x2 + l + rnormal()   // Structural equation

ivreg y x1 x2 (y1 = y2 y3)

sem (y <- x1 x2 L)  ///
    (y1 <- L@1)     ///
    (y2 y3 <- L)

gsem (y <- x1 x2 L) ///
     (y1 <- L@1)    ///
     (y2 y3 <- L)

sem (y <- x1 x2 L)  ///
    (y1 <- L@1)     ///
    (y2 y3 <- L), covstructure(_Ex, diag)