Understanding gsem

Felix Bittmann

Join Date: Aug 2018
Posts: 677

21 Jan 2021, 04:35

Dear all,
maybe these questions are a bit silly, but I am new to sem and gsem and want to use them to test a model. For brevity I display the estimated model as is below. I want to model how performance, aspirations and social origin (income) influence school choice (a binary variable). For reference, see https://www.stata.com/manuals/sem.pdf

1. Aspirations is a latent construct generated from four ordinal variables with three levels each. I wonder how Stata generates this (continuous?) construct from the indicators. Is it possible to do this "manually" in Stata to evaluate the quality of this construct and how well this works? Like the reliability or something related. Or in other words, how can I demonstrate that it is possible and fine to generate this construct from the 4 variables?
2. Performance is another latent construct. What I understood from the manual is that the paths to school_choice are basically a logistic regression but why is performance constrained to 1 here? The manual explains this but not really how this affects interpretation (page 61). So the logit effect of performance is 1 and all other coefficients are relatively scaled to it? Can I say that aspirations are thus stronger / more important?

Code:

. gsem (aspirations -> idealabschluss, family(ordinal) link(logit)) (aspirations -> idealabsch
> luss_eltern, family(ordinal) link(logit)) (aspirations -> realabschluss, family(ordinal) lin
> k(logit)) (aspirations -> realabschluss_eltern, family(ordinal) link(logit)) (aspirations ->
>  gym5, family(binomial) link(logit)) (logeinkommen -> aspirations, ) (logeinkommen -> gym5,
> family(binomial) link(logit)) (logeinkommen -> performance, ) (performance -> aspirations, )
>  (performance -> gym5, family(binomial) link(logit)) (performance -> mathe3, ) (performance
> -> mathe4, ) (performance -> deutsch3, ) (performance -> deutsch4, ) (performance -> lehrerb
> ewertung, ) if wave ==6, difficult latent(aspirations performance ) nocapslatent

Fitting fixed-effects model:

Iteration 0:   log likelihood = -38046.866  
Iteration 1:   log likelihood = -38045.233  
Iteration 2:   log likelihood = -38045.233  

Refining starting values:

Grid node 0:   log likelihood = -34637.233

Fitting full model:

Iteration 0:   log likelihood = -34637.233  (not concave)
Iteration 1:   log likelihood = -30344.857  (not concave)
Iteration 2:   log likelihood = -29599.725  (not concave)
Iteration 3:   log likelihood = -29224.302  (not concave)
Iteration 4:   log likelihood = -29197.838  (not concave)
Iteration 5:   log likelihood = -29168.182  (not concave)
Iteration 6:   log likelihood = -29093.018  (not concave)
Iteration 7:   log likelihood = -29073.603  (not concave)
Iteration 8:   log likelihood = -29062.762  (not concave)
Iteration 9:   log likelihood = -29018.634  (not concave)
Iteration 10:  log likelihood = -28997.441  (not concave)
Iteration 11:  log likelihood = -28972.917  (not concave)
Iteration 12:  log likelihood = -28938.079  (not concave)
Iteration 13:  log likelihood = -28920.101  (not concave)
Iteration 14:  log likelihood = -28911.102  (not concave)
Iteration 15:  log likelihood = -28905.134  (not concave)
Iteration 16:  log likelihood = -28892.625  (not concave)
Iteration 17:  log likelihood = -28866.879  (not concave)
Iteration 18:  log likelihood = -28854.751  (not concave)
Iteration 19:  log likelihood = -28846.679  (not concave)
Iteration 20:  log likelihood =  -28836.32  (not concave)
Iteration 21:  log likelihood = -28831.094  (not concave)
Iteration 22:  log likelihood = -28826.781  (not concave)
Iteration 23:  log likelihood = -28820.993  (not concave)
Iteration 24:  log likelihood = -28817.148  (not concave)
Iteration 25:  log likelihood = -28815.084  (not concave)
Iteration 26:  log likelihood = -28813.244  (not concave)
Iteration 27:  log likelihood = -28813.061  (not concave)
Iteration 28:  log likelihood = -28813.014  
Iteration 29:  log likelihood = -28813.224  
Iteration 30:  log likelihood = -28813.076  
Iteration 31:  log likelihood = -28813.071  
Iteration 32:  log likelihood = -28813.072  
Iteration 33:  log likelihood = -28813.073  
Iteration 34:  log likelihood = -28813.072  
Iteration 35:  log likelihood = -28813.073  

Generalized structural equation model           Number of obs     =      6,401

Response       : idealabschluss                 Number of obs     =      5,351
Family         : ordinal
Link           : logit

Response       : idealabschluss_elt~n           Number of obs     =      4,651
Family         : ordinal
Link           : logit

Response       : realabschluss                  Number of obs     =      5,128
Family         : ordinal
Link           : logit

Response       : realabschluss_eltern           Number of obs     =      4,638
Family         : ordinal
Link           : logit

Response       : gym5                           Number of obs     =      3,369
Family         : Bernoulli
Link           : logit

Response       : mathe3                         Number of obs     =      4,246
Family         : Gaussian
Link           : identity

Response       : mathe4                         Number of obs     =      4,410
Family         : Gaussian
Link           : identity

Response       : deutsch3                       Number of obs     =      4,237
Family         : Gaussian
Link           : identity

Response       : deutsch4                       Number of obs     =      4,407
Family         : Gaussian
Link           : identity

Response       : lehrerbewertung                Number of obs     =      3,583
Family         : Gaussian
Link           : identity

Log likelihood = -28813.073

 ( 1)  [idealabschluss]aspirations = 1
 ( 2)  [gym5]performance = 1
----------------------------------------------------------------------------------------
                       |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------------+----------------------------------------------------------------
idealabschluss         |
           aspirations |          1  (constrained)
-----------------------+----------------------------------------------------------------
idealabschluss_eltern  |
           aspirations |   1.042155   .0725223    14.37   0.000     .9000138    1.184296
-----------------------+----------------------------------------------------------------
realabschluss          |
           aspirations |   .9120171   .0482997    18.88   0.000     .8173515    1.006683
-----------------------+----------------------------------------------------------------
realabschluss_eltern   |
           aspirations |   2.750176   .4215265     6.52   0.000     1.923999    3.576353
-----------------------+----------------------------------------------------------------
gym5                   |
          logeinkommen |  -.0495102   .1663453    -0.30   0.766    -.3755411    .2765206
           aspirations |    1.10003   .1044508    10.53   0.000     .8953105     1.30475
           performance |          1  (constrained)
                 _cons |  -22.37072   1.725779   -12.96   0.000    -25.75319   -18.98826
-----------------------+----------------------------------------------------------------
mathe3                 |
           performance |   .8408953   .1393619     6.03   0.000     .5677509     1.11404
                 _cons |   .8236341   .1490849     5.52   0.000     .5314331    1.115835
-----------------------+----------------------------------------------------------------
mathe4                 |
           performance |   .8815648   .1460884     6.03   0.000     .5952368    1.167893
                 _cons |   .5626762   .1558728     3.61   0.000      .257171    .8681814
-----------------------+----------------------------------------------------------------
deutsch3               |
           performance |   .8874467   .1470255     6.04   0.000      .599282    1.175611
                 _cons |   .5307909   .1544092     3.44   0.001     .2281544    .8334275
-----------------------+----------------------------------------------------------------
deutsch4               |
           performance |   .9502487   .1572729     6.04   0.000     .6419996    1.258498
                 _cons |   .2418313   .1644166     1.47   0.141    -.0804194     .564082
-----------------------+----------------------------------------------------------------
lehrerbewertung        |
           performance |   .9363136   .1552998     6.03   0.000     .6319316    1.240696
                 _cons |  -.0363143   .1649169    -0.22   0.826    -.3595455    .2869169
-----------------------+----------------------------------------------------------------
aspirations            |
           performance |    2.88466   .5057989     5.70   0.000     1.893312    3.876008
          logeinkommen |   .8647389   .1046278     8.26   0.000     .6596721    1.069806
-----------------------+----------------------------------------------------------------
performance            |
          logeinkommen |   .4601273   .0758535     6.07   0.000     .3114572    .6087974
-----------------------+----------------------------------------------------------------
/idealabschluss        |
                  cut1 |    15.3001    1.13888                      13.06793    17.53226
-----------------------+----------------------------------------------------------------
/idealabschluss_eltern |
                  cut1 |   16.21353   1.184927                      13.89112    18.53595
-----------------------+----------------------------------------------------------------
/realabschluss         |
                  cut1 |   15.09398   .9326309                      13.26606    16.92191
-----------------------+----------------------------------------------------------------
/realabschluss_eltern  |
                  cut1 |   45.71751   6.376759                       33.2193    58.21573
-----------------------+----------------------------------------------------------------
     var(e.aspirations)|   3.187809   .3478379                      2.574029    3.947946
     var(e.performance)|   .4555128   .1509999                      .2378659    .8723064
-----------------------+----------------------------------------------------------------
          var(e.mathe3)|   .2587328   .0072316                      .2449404    .2733018
          var(e.mathe4)|   .2667887   .0074515                      .2525766    .2818004
        var(e.deutsch3)|   .2163404   .0064982                      .2039717     .229459
        var(e.deutsch4)|     .20087   .0064828                      .1885575    .2139864
 var(e.lehrerbewertung)|   .2471626   .0081774                      .2316438     .263721
----------------------------------------------------------------------------------------

Best wishes

Stata 18.0 MP | ORCID | Google Scholar

Tags: gsem

Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#2

21 Jan 2021, 09:15

This is a fairly complex generalized SEM model (or perhaps it's not that the model is complex, but that I am simple). However, in terms of 1), you might benefit from reading the item response theory part of the manual to understand how we estimate the value of a latent trait from indicators. Basically, you assume there is a latent trait that varies in strength and is normally distributed with mean 0. (In IRT, you assume that the variance is 1, but this SEM model doesn't do that, and I think SEM models in general don't. Here, you have the variance freely estimated.) The higher the latent trait, the more likely you are to endorse each indicator, i.e. to give a higher response to it. The indicators probably vary in difficulty, which enables you to make that estimation.

As a concrete example from my field, consider depression. Depression symptom questionnaires frequently ask questions like have you been bothered by being sad, and they may ask about suicide ideation (do you think you're better off dead). Responding yes to "better off dead" indicates much higher depression than saying I've been bothered by feeling sad.
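As a rough sketch of how one could look at the aspirations items by themselves, a graded response model can be fit with irt and the latent trait predicted afterwards. The item names are taken from the model in #1. The variable name theta_asp is made up for illustration, and whether a graded response model suits these items is an assumption.

```stata
* Graded response model for the four ordinal aspiration items (wave 6 only)
irt grm idealabschluss idealabschluss_eltern realabschluss realabschluss_eltern if wave == 6

* Test information function: where along the trait the items measure precisely
irtgraph tif

* Empirical Bayes prediction of the latent trait for each pupil
* (theta_asp is a hypothetical variable name)
predict theta_asp, latent
summarize theta_asp
```

The predicted values are what "generating the construct" amounts to in practice: each pupil gets an estimate of the trait that is most compatible with his or her pattern of responses.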

Related to 2): In SEM, you typically constrain the loading of the first item on its latent trait to 1. Basically, the idea is that the latent trait causes responses to the indicators. And here, maybe intentionally or maybe accidentally, you specified that the latent trait performance causes responses to mathe3 and 4, German 3 and 4, and school choice (gym5).

You may be trying to regress school choice on performance. If so, then while this is confusing, gym5 needs to sit at the head of the arrow, with its predictors at the tail. I prefer to write my syntax this way, grouping all the indicators together instead of in separate brackets. Anyway, if you want to regress school choice on performance and aspirations, you want to type something like this - check the syntax to make sure that I haven't got the arrows the wrong way:

Code:

gsem (aspirations -> idealabschluss idealabschluss_eltern realabschluss realabschluss_eltern, ologit) (gym5 <- aspirations performance logeinkommen, logit) (logeinkommen -> performance, ) (performance -> aspirations mathe3 mathe4 deutsch3 deutsch4 lehrerbewertung) if wave == 6, difficult latent(aspirations performance) nocapslatent
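On the constraint asked about in question 2: gsem needs one normalizing constraint per latent variable to set its scale, and in #1 it ended up on the path from performance to gym5. A sketch of one alternative, assuming you would rather anchor performance on a test score, is to put @1 on mathe3 yourself. My understanding is that gsem then does not add an anchor of its own for performance, so the path to gym5 gets an estimate and a standard error; please check the constraint list at the top of the output to confirm. In principle this only changes the scale of the latent variable, not the fit.

```stata
* Same variables as in #1; mathe3@1 anchors the scale of performance.
* The /// line continuations work in a do-file, not at the command line.
gsem (aspirations -> idealabschluss idealabschluss_eltern realabschluss realabschluss_eltern, ologit) ///
     (performance -> mathe3@1 mathe4 deutsch3 deutsch4 lehrerbewertung) ///
     (aspirations <- performance logeinkommen) ///
     (performance <- logeinkommen) ///
     (gym5 <- aspirations performance logeinkommen, logit) ///
     if wave == 6, latent(aspirations performance) nocapslatent difficult
```

Because the two latent variables are on different, arbitrary scales, their raw logit coefficients on gym5 still cannot be compared directly.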

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

daniel klein

Join Date: Mar 2014

Posts: 3842
#3

21 Jan 2021, 13:19

Just some quick comments/questions on various aspects (some of which are pretty specific to the German educational system and probably go beyond the scope of Statalist):

1) I agree with Weiwen; this is a pretty complicated model. What exactly are you trying to gain over simple (additive) scores for performance and aspirations that you then plug into logistic regression?
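For comparison, a minimal sketch of that simpler approach, assuming the items within each score are coded in the same direction and on comparable scales (neither can be seen from the output in #1). The names perf_score and asp_score are hypothetical.

```stata
* Additive (mean) scores; check item direction and scaling before averaging
egen perf_score = rowmean(mathe3 mathe4 deutsch3 deutsch4 lehrerbewertung) if wave == 6
egen asp_score  = rowmean(idealabschluss idealabschluss_eltern realabschluss realabschluss_eltern) if wave == 6

* Internal consistency of the performance items, with item-level statistics
alpha mathe3 mathe4 deutsch3 deutsch4 lehrerbewertung if wave == 6, item

* Plain logistic regression of school choice on the two scores and income
logit gym5 perf_score asp_score logeinkommen if wave == 6
```

Note that rowmean() averages over whichever items are nonmissing, so pupils with partly missing items still receive a score.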

2) The

Code:

(performance -> mathe3, ) (performance -> mathe4, ) (performance -> deutsch3, ) (performance -> deutsch4, ) (performance -> lehrerbewertung, )

part is called a measurement model. This is basically just a (confirmatory) factor analysis where you restrict the cross-factor loadings to 0. You can extract that part into sem where you have all the rule-of-thumb stuff such as CFI, RMSEA, etc. to assess the "quality" of your model. The measurement model for aspiration is probably more closely related to IRT, as Weiwen has already pointed out. Stata's irt calls gsem. If I remember correctly, there are some pretty strong assumptions that underlie ordinal irt models. As you have mentioned reliability, note that some reliability measures (e.g., McDonald's omega) are based on factor loadings and unique variances; you get both from sem and/or gsem.
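A sketch of what extracting the performance part into sem could look like, followed by McDonald's omega computed from the loadings and unique variances. The _b[] names below are the ones I would expect coeflegend to show in recent Stata versions; that is an assumption, so compare them against the legend before trusting the result. The scalar names are made up.

```stata
* Measurement model for performance only, as a linear CFA
* (default ML uses listwise deletion; method(mlmv) is an alternative)
sem (performance -> mathe3 mathe4 deutsch3 deutsch4 lehrerbewertung) if wave == 6, ///
    latent(performance) nocapslatent

* Fit statistics such as RMSEA, CFI, TLI and SRMR
estat gof, stats(all)

* Replay with parameter names, to check the _b[] references used below
sem, coeflegend

* Omega = (sum of loadings)^2 * var(factor) / [that quantity + sum of unique variances]
scalar sl = _b[mathe3:performance] + _b[mathe4:performance] + _b[deutsch3:performance] ///
          + _b[deutsch4:performance] + _b[lehrerbewertung:performance]
scalar su = _b[/var(e.mathe3)] + _b[/var(e.mathe4)] + _b[/var(e.deutsch3)] ///
          + _b[/var(e.deutsch4)] + _b[/var(e.lehrerbewertung)]
scalar omega = sl^2 * _b[/var(performance)] / (sl^2 * _b[/var(performance)] + su)
display "omega = " omega
```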

3) In my view, the binary choice (Gymnasium vs rest) that you are trying to model here is not really consistent with the aspirations measures (Haupt-, Real-, and Gymnasium), which are often (most of the time?) considered to be nominal rather than ordinal. In my view, it would be more consistent to have binary indicator variables for aspirations (Gymnasium vs. rest), matching your outcome; conversely, you might want to have an (alternative-specific) multinomial model for the outcome, which actually has more than two alternatives (in the German educational system).

4) Related to 3): If you stick with the binary model, check whether you still obtain the results that you would expect for a model 'Hauptschule' vs. rest as a sensitivity check.

5) You are aware of this. "Importance" is not a statistical concept nor is it mathematically defined. Thus, statements about the relative importance of predictors cannot be based solely on mathematical results.

Last edited by daniel klein; 21 Jan 2021, 13:24. Reason: formatting
Felix Bittmann

Join Date: Aug 2018

Posts: 677
#4

22 Jan 2021, 00:13

Dear Weiwen and Daniel,
thank you both so much for your detailed comments, I really appreciate it. It was quite difficult for me to find relevant information online and your comments are on point. You both pointed out that my model is too complex and I will think about how to simplify it. Simple is usually better than complex, indeed.
Weiwen Ng : Thanks, I will have a look at these IRT models. Also your syntax is much cleaner and easier to read.
daniel klein : I agree that things are not really operationalized consistently at the moment. Some of this is due to theory and some of it is due to data restrictions (the number of pupils who actually choose a Haupt- or Realschule is so low in the NEPS that it will be a challenge to estimate any effects for these at all). But it is really helpful to think about these aspects in detail so I can defend them plausibly if I need to keep them.

Best wishes
