SEM Latent Variable Estimation

Christos Makridis

#1

07 Dec 2014, 01:02

Hi All,

I had posted a related question a few weeks ago, and since then, to my delight, I have come across the SEM capabilities in Stata. I found a thread about a related question, but its solution doesn't seem to work for me (http://www.statalist.org/forums/foru...able-after-sem). I will explain below.

I have multiple measurements of my latent variable H; call them m1 m2 m3 m4. There are measurement equations, i.e., regressions of each m_j on H, and a transition equation, i.e., a regression of HFOR (forward iterated H) on H and another variable, call it V. I thought that this would work:

sem (m1 <- H) ///
(m2 <- H) ///
(m3 <- H) ///
(m3 <- H) ///
(HFOR <- H V)
predict lhcap, latent(H)

Thank you in advance for your help!
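As a point of reference for the model in this post, here is a Python simulation of the measurement part only. It is not Stata code and assumes numpy. Four measurements m1 to m4 are generated from a latent H using hypothetical loadings and noise variances. The sketch then forms regression-method factor scores, w = var(H) * lambda' * Sigma^{-1}, which is one common way to predict a latent from its indicators. The weights use the simulated parameters; in practice they would come from the fitted model. Whether this is exactly the formula behind -predict, latent()- is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical parameters: loadings (first fixed to 1 for scale), noise variances, var(H)
loadings = np.array([1.0, 0.8, 1.2, 0.9])
noise_var = np.full(4, 0.5)
var_h = 1.0

# Measurement equations: m_j = loading_j * H + eps_j
h = rng.normal(0.0, np.sqrt(var_h), n)
eps = rng.normal(0.0, np.sqrt(noise_var), (n, 4))
m = h[:, None] * loadings + eps

# Model-implied covariance of the measurements: Sigma = var_h * lambda lambda' + Psi
sigma = var_h * np.outer(loadings, loadings) + np.diag(noise_var)

# Regression-method factor-score weights: w = var_h * Sigma^{-1} lambda (Sigma is symmetric)
weights = var_h * np.linalg.solve(sigma, loadings)

h_hat = m @ weights
corr = np.corrcoef(h, h_hat)[0, 1]
print(f"corr(H, H_hat) = {corr:.3f}")
```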
wbuchanan

#2

07 Dec 2014, 10:05

It would be more useful to follow the guidance in the FAQ. The problem is likely caused by regressing the same manifest variable on the same latent twice (m3 <- H appears twice), by referencing a latent (V) that has not been defined, or by a second-order factor that isn't identified. It also isn't clear what you expect to happen in this case.
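One quick identification check is the counting rule (t-rule). A model cannot have more free parameters than there are distinct variances and covariances among the observed variables. The Python sketch below applies the rule to a single latent with k indicators. It assumes the common normalization that fixes the first loading to 1. Passing the count is necessary but not sufficient for identification.

```python
def one_factor_df(k: int) -> int:
    """Counting-rule degrees of freedom for a one-factor model with k indicators.

    Distinct variances and covariances among k indicators: k*(k+1)/2.
    Free parameters: k-1 loadings (the first is fixed to 1 to set the scale),
    k error variances, and 1 factor variance.
    """
    moments = k * (k + 1) // 2
    free_params = (k - 1) + k + 1
    return moments - free_params


for k in range(2, 6):
    df = one_factor_df(k)
    if df < 0:
        status = "under-identified"
    elif df == 0:
        status = "just identified (by counting)"
    else:
        status = "over-identified (by counting)"
    print(f"{k} indicators: df = {df}, {status}")
```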
Christos Makridis

#3

07 Dec 2014, 11:16

Thank you for the reply! The structure I specified in SEM is a variant of Cunha and Heckman (2007, AER) and Cunha, Heckman, and Schennach (2010, ECTA), which use a latent factor model. The idea is to use multiple measurements of a latent variable, together with a proxy for it, to estimate the distribution of the latent variable. These variables come into play because I hoped to implement something similar via SEM. (I was treating V as a given time-varying determinant of HFOR, together with the latent H.)
Christos Makridis

#4

07 Dec 2014, 22:35

I tried various other specifications that are more consistent with the SEM structure. I will give two examples from my data. Neither works: Stata reports "backed up". I suppose this indicates a violation of concavity, but I am not sure why, since I do not know what it's trying to do.

1) Suppose that hours worked, weeks worked, and potential experience determine human capital. Suppose also that human capital determines labor income, hours worked, potential experience, weeks worked, and total compensation. Then:

sem (H -> linc_tilde lhours_tilde tilde_lpotexp tilde_lwkw tilde_ltot_comp) ///
(H <- lhours_tilde tilde_lwkw tilde_lpotexp)

2) Suppose that potential experience, labor income, and total compensation determine latent human capital. Hours worked, weeks worked, age, and overtime determine latent effective labor supply, and latent effective labor supply determines human capital. Then:

set more off
sem (H -> tilde_ltot_comp tilde_lpotexp linc_tilde lncomp) (L -> tilde_lwkw lhours_tilde tilde_lage tilde_lotime) (H <- L)
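For intuition about "backed up", here is a generic Python illustration of a Newton-type maximizer with step halving. It is not Stata's actual -ml- code and is meant only as a rough analogy for that message. When a full step would lower the objective, the step is halved until the objective improves; that is what "backing up" means here. The objective -log(cosh(x)) is a hypothetical toy that is concave everywhere. It shows that backing up can happen without any violation of concavity.

```python
import math


def f(x):
    # Toy objective, concave everywhere, with its maximum at x = 0
    return -math.log(math.cosh(x))


def grad(x):
    return -math.tanh(x)


def hess(x):
    return -1.0 / math.cosh(x) ** 2


def newton_with_backup(x, max_iter=50, max_halvings=20, tol=1e-10):
    """Newton-Raphson maximization with step halving.

    Returns the final x and a log of (iteration, x, f(x), halvings).
    """
    log = []
    for it in range(max_iter):
        step = -grad(x) / hess(x)  # full Newton step
        halvings = 0
        # "Back up": halve the step while it would lower the objective
        while f(x + step) < f(x) and halvings < max_halvings:
            step /= 2.0
            halvings += 1
        x += step
        log.append((it, x, f(x), halvings))
        if abs(step) < tol:
            break
    return x, log


x_star, history = newton_with_backup(1.5)
for it, x, fx, halvings in history:
    note = f"  (backed up {halvings}x)" if halvings else ""
    print(f"iter {it}: x = {x:.6g}, f = {fx:.6g}{note}")
```

From x = 1.5 the full Newton step lands near x = -3.5, where the objective is lower, so the first iteration backs up once. Without halving, the iterates move away from the maximum and diverge.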
Christos Makridis

#5

08 Dec 2014, 17:08

Just checking again in case anyone has worked with SEM before!
Clyde Schechter

#6

08 Dec 2014, 17:58

I use SEM from time to time, though I don't consider myself an expert in it.

You have several paths in which a variable indirectly regresses on itself, e.g., lhours_tilde -> H -> lhours_tilde. My guess is that that's a setup for problems. Non-recursive models are occasionally estimable, but only in the presence of many other constraints that identify them. If you want to see what Stata is "thinking", use the -iterate()- option. Specify a number of iterations that gets you just a few iterations past the point where it starts perseverating and backing up. Stata will then stop and show you its interim results. From missing estimates, or from parameter estimates that are absurd, you'll be able to see which variables are problematic.

Some generic advice: when working with complicated models like SEM (or multi-level models), convergence difficulties are not uncommon, so it is usually best to start very simple and add things one at a time to see what can be estimated. When adding something leads to breakdown, back it out and either stop or try adding something else. (Also check that variable to see if there is something obviously wrong with it.) Sometimes using a different estimator will work: I have seen models successfully converge with -ml- but not -adf- or vice versa, for example. (Of course, you have to be careful that the estimator is appropriate for your purposes and not just convenient because it leads to convergence!)
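The problem with paths such as lhours_tilde -> H -> lhours_tilde can be seen in the simplest non-recursive case. Take two observed variables that cause each other, y1 = b*y2 + e1 and y2 = c*y1 + e2, with uncorrelated errors and no exogenous variables. That system has three observed moments but four parameters. The Python sketch below assumes numpy and uses hypothetical parameter values. It picks a different b, re-solves the remaining parameters, and reproduces the same covariance matrix exactly. The data therefore cannot distinguish the two sets of reciprocal effects.

```python
import numpy as np


def implied_cov(b, c, psi1, psi2):
    """Covariance of (y1, y2) implied by y1 = b*y2 + e1, y2 = c*y1 + e2,
    with uncorrelated errors of variances psi1 and psi2."""
    beta = np.array([[0.0, b], [c, 0.0]])
    a = np.linalg.inv(np.eye(2) - beta)
    return a @ np.diag([psi1, psi2]) @ a.T


def refit_given_b(sigma, b):
    """For any chosen b, solve for c, psi1, psi2 that reproduce sigma exactly,
    by making e1 = y1 - b*y2 and e2 = y2 - c*y1 uncorrelated."""
    s11, s12, s22 = sigma[0, 0], sigma[0, 1], sigma[1, 1]
    c = (s12 - b * s22) / (s11 - b * s12)
    psi1 = s11 - 2 * b * s12 + b**2 * s22
    psi2 = s22 - 2 * c * s12 + c**2 * s11
    return c, psi1, psi2


# Hypothetical "true" feedback loop
sigma = implied_cov(b=0.3, c=0.5, psi1=1.0, psi2=2.0)

# A different value of b, with the other parameters re-solved
b_alt = -0.2
c_alt, psi1_alt, psi2_alt = refit_given_b(sigma, b_alt)
sigma_alt = implied_cov(b_alt, c_alt, psi1_alt, psi2_alt)

print("same covariance matrix:", np.allclose(sigma, sigma_alt))
print(f"alternative parameters: b = {b_alt}, c = {c_alt:.3f}")
```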
ben earnhart

#7

08 Dec 2014, 18:05

Yes, but non-recursive models are not a good place to start learning SEM.

I can do non-recursive models in LISREL and Mplus, and am sure they are possible in Stata, but I wouldn't trust myself to get them right. They take so many constraints that they are seldom used or published. These are generally hard-to-defend ones, like equality constraints or paths arbitrarily fixed based on the proportion of observed variance or something like that. Yes, I know the world is non-recursive, but non-recursive SEMs are unstable and difficult to estimate.

I can vaguely understand your model from your syntax, but a path diagram would help to pinpoint what's really going on. It would also show which parts of your model would be difficult to estimate and how to estimate them. Or it might make it easier to show that your model is under-identified and would remain so without heroic constraints.
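A path diagram makes it easier to apply standard identification checks to the structural part of the model. One such check for a non-recursive system is the order condition: each equation must exclude at least as many exogenous variables (instruments) as it includes endogenous regressors. The Python sketch below treats the structural variables as if they were observed, and the variable names are hypothetical. Passing the order condition is necessary but not sufficient, since the rank condition must also hold.

```python
def order_condition(equations, exogenous):
    """Order condition for each structural equation of a simultaneous system.

    equations: dict mapping each endogenous variable to the set of variables
               on its right-hand side.
    exogenous: set of all exogenous variables in the system.
    An equation passes if the number of exogenous variables it excludes is at
    least the number of endogenous regressors it includes. Passing is necessary,
    not sufficient (the rank condition must also hold).
    """
    endogenous = set(equations)
    result = {}
    for lhs, rhs in equations.items():
        n_endog_included = len(rhs & endogenous)
        n_exog_excluded = len(exogenous - rhs)
        result[lhs] = n_exog_excluded >= n_endog_included
    return result


# Hypothetical feedback loop with one instrument per equation
with_instruments = {"y1": {"y2", "x1"}, "y2": {"y1", "x2"}}
print(order_condition(with_instruments, {"x1", "x2"}))

# The same loop with no exogenous variables
without_instruments = {"y1": {"y2"}, "y2": {"y1"}}
print(order_condition(without_instruments, set()))
```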

Last edited by ben earnhart; 08 Dec 2014, 18:15.
Christos Makridis

#8

08 Dec 2014, 18:33

Thank you for the replies! I want to distinguish general SEM approaches from my objective, which is a variant of Cunha/Heckman AER 2007 and Cunha et al. ECTA 2010. (Both are motivated by the original MIMIC model but make a number of modifications.)

The idea is to use multiple measurements of one or more latent variables to estimate their distribution, where each latent variable is proxied. In other words, let's call H and L the two latent variables, and mH1, ..., mHj and mL1, ..., mLj the corresponding measurements of each latent variable.

The measurement equation will have the form:

mH1 = alpha1*H + epsH1
...
mHj = alphaj*H + epsHj

and similarly for the L latent variable.

The transition equation will have the form:

H_{t+1} = phi*H_t + gamma*L_t + eta

The covariance restrictions among the different measurements of the latent variable identify its distribution up to scale, which is fixed by setting one of the alphas to 1. However, the code in my first post didn't work. It seems that Stata's SEM thinks the latent variables cannot be proxied, which is why my most recent specifications became a little more circular.
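The covariance argument can be checked numerically. With independent errors, cov(m_i, m_j) = alpha_i * alpha_j * var(H) for i != j. So with alpha1 = 1 and three measurements, var(H) = cov(m1,m2) * cov(m1,m3) / cov(m2,m3), and each remaining alpha is a ratio of covariances. The Python simulation below assumes numpy and uses hypothetical parameter values to recover the parameters from sample covariances. It covers only the second moments of a linear system with independent errors. Identifying the full distribution, as in the papers cited above, relies on further assumptions beyond this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical true values; alpha1 = 1 sets the scale of H
alpha = np.array([1.0, 0.7, 1.3])
var_h = 2.0

# Measurement equations m_j = alpha_j * H + eps_j with independent unit-variance errors
h = rng.normal(0.0, np.sqrt(var_h), n)
m = h[:, None] * alpha + rng.normal(0.0, 1.0, (n, 3))

c = np.cov(m, rowvar=False)

# For i != j, cov(m_i, m_j) = alpha_i * alpha_j * var(H), so ratios of
# covariances recover the parameters once alpha1 = 1
var_h_hat = c[0, 1] * c[0, 2] / c[1, 2]
alpha2_hat = c[1, 2] / c[0, 2]
alpha3_hat = c[1, 2] / c[0, 1]

print(f"var(H): {var_h_hat:.3f}, alpha2: {alpha2_hat:.3f}, alpha3: {alpha3_hat:.3f}")
```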
ben earnhart

#9

08 Dec 2014, 18:45

Can you do a path diagram? Even if Stata won't let you run the model, you can draw it, then export as a .png and upload. We might then be able to begin to figure out what you are attempting to do.
Christos Makridis

#10

08 Dec 2014, 19:05

Sure, I can work on that. (The reason I haven't yet is that I tend to understand concepts better through equations than through figures... I am still trying to figure out the representation of path diagrams in the SEM manual!) I will have it in another day. Thank you again for your interest!
Christos Makridis

#11

10 Dec 2014, 10:10

Hi Ben and Clyde!

Here is a toy path diagram. There are two things about it that I'd like to improve but am unsure how. First, how do I characterize a dynamic process, e.g., if L1 evolves over time based on past values of both L1 and L2? Second, the problem seems to be non-concave. With data, I am never sure how to find the root of an issue. With a macro model and a set of nonlinear equations, by contrast, there's more of a methodology for understanding the parameter space where convexity may not hold.

Thank you!
ben earnhart

#12

10 Dec 2014, 11:20

Without instruments for L3 (a.k.a. x4), your model is under-identified: it cannot apportion the effects when each variable causes the other. The best you can do is their covariance or a unidirectional effect. Attached is the closest model I see as possible in a SEM framework. It would be easy enough to include past observations of L1 or L2 if you *have* previous observations to include in the model, but unless you have those observations, you're out of luck. BTW, you might consider subscribing to SEMnet (http://www2.gsu.edu/~mkteer/semnet.html), which is devoted to SEM. Some of us here have some SEM experience, but there you get pure SEM devotees who might be able to help with something as complex and strange as what you are trying to do. Even with this model, the error term for x4 needs to be constrained to some value (usually 0, sometimes a value determined by a known reliability, if you have one).
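The single-indicator problem behind the x4 constraint is plain arithmetic. With x4 = L3 + e, var(x4) = var(L3) + var(e), which gives one observed moment for two unknowns. Below is a minimal Python sketch of the usual fix, assuming a loading of 1 and a known (hypothetical) reliability. In Stata's -sem-, such a constraint can be written with the @ notation, for example var(e.x4@0).

```python
def single_indicator_error_variance(var_x, reliability):
    """Error variance to fix when a latent L is measured by one indicator x.

    With x = L + e (loading fixed to 1) and cov(L, e) = 0,
    var(x) = var(L) + var(e): one observed moment, two unknowns. Fixing
    var(e) = (1 - reliability) * var(x) resolves this; reliability = 1
    gives the common choice var(e) = 0.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return (1.0 - reliability) * var_x


# Hypothetical indicator x4 with sample variance 2.5
print(single_indicator_error_variance(2.5, 1.0))
print(single_indicator_error_variance(2.5, 0.8))
```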

Last edited by ben earnhart; 10 Dec 2014, 11:37.
Christos Makridis

#13

10 Dec 2014, 11:53

Thank you for the quick and concrete reply! From what I understand, L3 could behave as a lagged value to the extent that I have lagged values available. I do, since I collected panel microdata covering 40 years. I could just lag all the terms in L1 and use them as terms for L3.

I also appreciate your suggestion about SEMnet, which I had not heard of before. What I am trying to implement is indeed complex and not well documented. James Heckman's research group has done an admirable job applying factor loading methods, and I am eager to learn them as well as I can, starting at a more basic level.
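Taking lags in panel data needs care. Each lag must come from the same person and from exactly one period earlier, so that a gap in the years is not bridged. Below is a minimal Python sketch over a list of records with hypothetical field names (id, year, linc). In Stata itself, the L. operator after -xtset id year- handles this.

```python
def lag_within_panel(rows, id_key, time_key, var, lag=1):
    """Add '<var>_lag<lag>': the value of var `lag` periods earlier for the same
    unit, or None if that period is missing (gaps are not bridged)."""
    lookup = {(r[id_key], r[time_key]): r[var] for r in rows}
    out = []
    for r in rows:
        new = dict(r)
        new[f"{var}_lag{lag}"] = lookup.get((r[id_key], r[time_key] - lag))
        out.append(new)
    return out


# Hypothetical records: person id, year, and log labor income
panel = [
    {"id": 1, "year": 2000, "linc": 10.1},
    {"id": 1, "year": 2001, "linc": 10.3},
    {"id": 1, "year": 2003, "linc": 10.6},
    {"id": 2, "year": 2000, "linc": 9.8},
    {"id": 2, "year": 2001, "linc": 9.9},
]

for row in lag_within_panel(panel, "id", "year", "linc"):
    print(row)
```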
ben earnhart

#14

10 Dec 2014, 15:13

Ah. If you have time series data, you may be able to achieve identification for a non-recursive model after all. I apologize that it is beyond me to explain it here (I'd just be confusing at best, misleading at worst), but with multiple observations over time, it may be possible. Check a complete SEM book and the literature, and you will probably find examples that are at least close to what you're trying to achieve. Such models still require stringent and heroic assumptions (generally equality constraints such that a->b = b->a) but they're estimable. But it's too complicated to get into here, and without my seeking out references and re-reading, I might get things wrong.
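The role of an equality constraint can be seen in the two-variable feedback loop y1 = b*y2 + e1, y2 = c*y1 + e2 with uncorrelated errors. Imposing b = c leaves three parameters for three moments. Zero error covariance then requires s12*b^2 - (s11 + s22)*b + s12 = 0. The Python sketch below assumes numpy, uses hypothetical values, and solves that equation. It recovers the true b = 0.4, but also b = 2.5 = 1/0.4, which fits the covariances equally well. A further restriction such as |b| < 1 is therefore needed to pick one.

```python
import numpy as np


def implied_cov(b, c, psi1, psi2):
    """Covariance of (y1, y2) implied by y1 = b*y2 + e1, y2 = c*y1 + e2,
    with uncorrelated errors of variances psi1 and psi2."""
    beta = np.array([[0.0, b], [c, 0.0]])
    a = np.linalg.inv(np.eye(2) - beta)
    return a @ np.diag([psi1, psi2]) @ a.T


def solve_equal_paths(sigma):
    """Solutions (b, psi1, psi2) of y1 = b*y2 + e1, y2 = b*y1 + e2 (equal
    reciprocal paths, uncorrelated errors) that reproduce the 2x2 covariance sigma."""
    s11, s12, s22 = sigma[0, 0], sigma[0, 1], sigma[1, 1]
    # Zero error covariance with c = b gives s12*b^2 - (s11 + s22)*b + s12 = 0
    roots = np.roots([s12, -(s11 + s22), s12])
    solutions = []
    for b in np.sort(roots.real):
        psi1 = s11 - 2 * b * s12 + b**2 * s22
        psi2 = s22 - 2 * b * s12 + b**2 * s11
        solutions.append((b, psi1, psi2))
    return solutions


# Hypothetical true model with equal reciprocal paths b = c = 0.4
sigma = implied_cov(0.4, 0.4, 1.0, 1.5)
solutions = solve_equal_paths(sigma)
for b, psi1, psi2 in solutions:
    print(f"b = {b:.3f}, psi1 = {psi1:.3f}, psi2 = {psi2:.3f}")
```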