I have more of a basic statistics-related question regarding the models underlying Stata's gsem command.
Please bear with me for a short moment before I pose the actual question.
In general, gsem treats observed exogenous variables as given: "Generalized SEMs drop the observed [exogenous] variables from the joint-normality assumption. Instead, generalized SEMs treat the observed exogenous variables as given and produce estimates conditional on their values" (p. 43, SEM manual). Regarding the covariance between latent variables and observed exogenous variables, the manual says "Covariances between latent exogenous and observed exogenous variables [... are a]ssumed to be nonzero. Cannot be estimated or constrained because this covariance is not among the identified parameters of the generalized SEM" (p. 48).
What I don't understand is what treating regressors "as given" actually implies in the context of ML estimation.
To get closer to my question, let's assume for a second that the exogenous variables in the vector x were not fixed but random, and that we had a joint probability density function over the vector of response variables, y, the observed exogenous variables, x, and the vector of latent variables, u: g(y,x,u). This could be factorized as f(y|x,u) * h(u|x) * k(x), right? Suppose my goal were to estimate the parameters of the conditional density f(y|x,u). Then I might not be interested in the marginal distribution of x and could leave k(x) out of the likelihood.
Now, the likelihood in gsem is the integral of f(y|x,u) * h(u) over u (with h(u) in place of h(u|x), where h() is the normal density function). If x were indeed random, this would imply that h(u|x) = h(u), i.e. that x and u are independent (and hence uncorrelated), or am I wrong?
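To make that last point concrete outside of Stata, here is a quick Python sketch (an illustration of the statistics only, not of gsem's internals): when (x,u) are correlated bivariate normal, the conditional mean of u shifts with x, so the conditional density h(u|x) cannot equal the marginal h(u). The correlation rho = 0.6 and the sample size are arbitrary choices for the simulation.

```python
import random

random.seed(12345)
rho = 0.6       # chosen correlation between x and u
n = 200_000

xs, us = [], []
for _ in range(n):
    x = random.gauss(0, 1)
    # u = rho*x + sqrt(1 - rho^2)*e yields corr(x, u) = rho
    u = rho * x + (1 - rho**2) ** 0.5 * random.gauss(0, 1)
    xs.append(x)
    us.append(u)

# Marginal mean of u is ~0, but the mean of u among draws with x > 1
# is clearly positive: E[u | x] = rho * x, so h(u|x) != h(u).
mean_u = sum(us) / n
n_high = sum(1 for x in xs if x > 1)
mean_u_given_x_high = sum(u for x, u in zip(xs, us) if x > 1) / n_high

print(round(mean_u, 2))           # close to 0
print(mean_u_given_x_high > 0.3)  # conditional mean shifted upward
```

So a likelihood that integrates against the marginal h(u) rather than h(u|x) is, in effect, treating x and u as independent, which is exactly what my question is about.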
However, x is assumed to be given. My main question is this: Can anyone explain to me what this means for the implied relationship between x and u?
(Disclaimer: I have posted the last three paragraphs in another thread and got the advice to open a new one instead.)
Why does it matter? My impression is that gsem does in fact assume the covariance between latent variables and observed exogenous variables to be zero. In the following code I simulate a data set with correlated observed exogenous and latent variables and run an instrumental-variable regression as well as the commands sem and gsem.
sem's output is very close to ivreg's. gsem's differs a lot from it. gsem's output is, however, identical to sem's when I explicitly restrict the covariances among all exogenous variables to zero in sem (even though I know their covariance is not zero).
Code:
clear
set seed 2000
set obs 2000
cap drop l y1 y2 y3
gen l  = rnormal()                         // Latent variable
gen y1 = l + .7*rnormal()                  // Measurement 1
gen y2 = .5*l + .3*rnormal()               // Measurement 2
gen y3 = -.4*l + .6*rnormal()              // Measurement 3
gen x1 = .5 + .6*l + .3*rnormal()          // Observed exog. regressor 1
gen x2 = .8 - .3*l + .8*x1 + .4*rnormal()  // Observed exog. regressor 2
gen y  = .2 + x1 + .3*x2 + l + rnormal()   // Structural equation

ivreg y x1 x2 (y1 = y2 y3)

sem (y <- x1 x2 L)  ///
    (y1 <- L@1)     ///
    (y2 y3 <- L)

gsem (y <- x1 x2 L) ///
     (y1 <- L@1)    ///
     (y2 y3 <- L)

sem (y <- x1 x2 L)  ///
    (y1 <- L@1)     ///
    (y2 y3 <- L), covstructure(_Ex, diag)