  • Understanding gsem

    Dear all,
    maybe these questions are a bit silly, but I am new to sem and gsem and want to use them to test a model. For brevity, I display the estimated model as is below. I want to model how performance, aspirations and social origin (income) influence school choice (a binary variable). For reference, see https://www.stata.com/manuals/sem.pdf


    1. Aspirations is a latent construct generated from four ordinal variables with three levels each. I wonder how Stata generates this (continuous?) construct from the indicators. Is it possible to do this "manually" in Stata to evaluate the quality of this construct and how well this works? Like the reliability or something related. Or in other words, how can I demonstrate that it is possible and fine to generate this construct from the 4 variables?
    2. Performance is another latent construct. What I understood from the manual is that the paths to school_choice are basically a logistic regression but why is performance constrained to 1 here? The manual explains this but not really how this affects interpretation (page 61). So the logit effect of performance is 1 and all other coefficients are relatively scaled to it? Can I say that aspirations are thus stronger / more important?

    Code:
    gsem (aspirations -> idealabschluss, family(ordinal) link(logit)) ///
         (aspirations -> idealabschluss_eltern, family(ordinal) link(logit)) ///
         (aspirations -> realabschluss, family(ordinal) link(logit)) ///
         (aspirations -> realabschluss_eltern, family(ordinal) link(logit)) ///
         (aspirations -> gym5, family(binomial) link(logit)) ///
         (logeinkommen -> aspirations, ) ///
         (logeinkommen -> gym5, family(binomial) link(logit)) ///
         (logeinkommen -> performance, ) ///
         (performance -> aspirations, ) ///
         (performance -> gym5, family(binomial) link(logit)) ///
         (performance -> mathe3, ) (performance -> mathe4, ) ///
         (performance -> deutsch3, ) (performance -> deutsch4, ) ///
         (performance -> lehrerbewertung, ) ///
         if wave == 6, difficult latent(aspirations performance) nocapslatent
    
    Fitting fixed-effects model:
    
    Iteration 0:   log likelihood = -38046.866  
    Iteration 1:   log likelihood = -38045.233  
    Iteration 2:   log likelihood = -38045.233  
    
    Refining starting values:
    
    Grid node 0:   log likelihood = -34637.233
    
    Fitting full model:
    
    Iteration 0:   log likelihood = -34637.233  (not concave)
    Iteration 1:   log likelihood = -30344.857  (not concave)
    Iteration 2:   log likelihood = -29599.725  (not concave)
    Iteration 3:   log likelihood = -29224.302  (not concave)
    Iteration 4:   log likelihood = -29197.838  (not concave)
    Iteration 5:   log likelihood = -29168.182  (not concave)
    Iteration 6:   log likelihood = -29093.018  (not concave)
    Iteration 7:   log likelihood = -29073.603  (not concave)
    Iteration 8:   log likelihood = -29062.762  (not concave)
    Iteration 9:   log likelihood = -29018.634  (not concave)
    Iteration 10:  log likelihood = -28997.441  (not concave)
    Iteration 11:  log likelihood = -28972.917  (not concave)
    Iteration 12:  log likelihood = -28938.079  (not concave)
    Iteration 13:  log likelihood = -28920.101  (not concave)
    Iteration 14:  log likelihood = -28911.102  (not concave)
    Iteration 15:  log likelihood = -28905.134  (not concave)
    Iteration 16:  log likelihood = -28892.625  (not concave)
    Iteration 17:  log likelihood = -28866.879  (not concave)
    Iteration 18:  log likelihood = -28854.751  (not concave)
    Iteration 19:  log likelihood = -28846.679  (not concave)
    Iteration 20:  log likelihood =  -28836.32  (not concave)
    Iteration 21:  log likelihood = -28831.094  (not concave)
    Iteration 22:  log likelihood = -28826.781  (not concave)
    Iteration 23:  log likelihood = -28820.993  (not concave)
    Iteration 24:  log likelihood = -28817.148  (not concave)
    Iteration 25:  log likelihood = -28815.084  (not concave)
    Iteration 26:  log likelihood = -28813.244  (not concave)
    Iteration 27:  log likelihood = -28813.061  (not concave)
    Iteration 28:  log likelihood = -28813.014  
    Iteration 29:  log likelihood = -28813.224  
    Iteration 30:  log likelihood = -28813.076  
    Iteration 31:  log likelihood = -28813.071  
    Iteration 32:  log likelihood = -28813.072  
    Iteration 33:  log likelihood = -28813.073  
    Iteration 34:  log likelihood = -28813.072  
    Iteration 35:  log likelihood = -28813.073  
    
    Generalized structural equation model           Number of obs     =      6,401
    
    Response       : idealabschluss                 Number of obs     =      5,351
    Family         : ordinal
    Link           : logit
    
    Response       : idealabschluss_elt~n           Number of obs     =      4,651
    Family         : ordinal
    Link           : logit
    
    Response       : realabschluss                  Number of obs     =      5,128
    Family         : ordinal
    Link           : logit
    
    Response       : realabschluss_eltern           Number of obs     =      4,638
    Family         : ordinal
    Link           : logit
    
    Response       : gym5                           Number of obs     =      3,369
    Family         : Bernoulli
    Link           : logit
    
    Response       : mathe3                         Number of obs     =      4,246
    Family         : Gaussian
    Link           : identity
    
    Response       : mathe4                         Number of obs     =      4,410
    Family         : Gaussian
    Link           : identity
    
    Response       : deutsch3                       Number of obs     =      4,237
    Family         : Gaussian
    Link           : identity
    
    Response       : deutsch4                       Number of obs     =      4,407
    Family         : Gaussian
    Link           : identity
    
    Response       : lehrerbewertung                Number of obs     =      3,583
    Family         : Gaussian
    Link           : identity
    
    Log likelihood = -28813.073
    
     ( 1)  [idealabschluss]aspirations = 1
     ( 2)  [gym5]performance = 1
    ----------------------------------------------------------------------------------------
                           |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -----------------------+----------------------------------------------------------------
    idealabschluss         |
               aspirations |          1  (constrained)
    -----------------------+----------------------------------------------------------------
    idealabschluss_eltern  |
               aspirations |   1.042155   .0725223    14.37   0.000     .9000138    1.184296
    -----------------------+----------------------------------------------------------------
    realabschluss          |
               aspirations |   .9120171   .0482997    18.88   0.000     .8173515    1.006683
    -----------------------+----------------------------------------------------------------
    realabschluss_eltern   |
               aspirations |   2.750176   .4215265     6.52   0.000     1.923999    3.576353
    -----------------------+----------------------------------------------------------------
    gym5                   |
              logeinkommen |  -.0495102   .1663453    -0.30   0.766    -.3755411    .2765206
               aspirations |    1.10003   .1044508    10.53   0.000     .8953105     1.30475
               performance |          1  (constrained)
                     _cons |  -22.37072   1.725779   -12.96   0.000    -25.75319   -18.98826
    -----------------------+----------------------------------------------------------------
    mathe3                 |
               performance |   .8408953   .1393619     6.03   0.000     .5677509     1.11404
                     _cons |   .8236341   .1490849     5.52   0.000     .5314331    1.115835
    -----------------------+----------------------------------------------------------------
    mathe4                 |
               performance |   .8815648   .1460884     6.03   0.000     .5952368    1.167893
                     _cons |   .5626762   .1558728     3.61   0.000      .257171    .8681814
    -----------------------+----------------------------------------------------------------
    deutsch3               |
               performance |   .8874467   .1470255     6.04   0.000      .599282    1.175611
                     _cons |   .5307909   .1544092     3.44   0.001     .2281544    .8334275
    -----------------------+----------------------------------------------------------------
    deutsch4               |
               performance |   .9502487   .1572729     6.04   0.000     .6419996    1.258498
                     _cons |   .2418313   .1644166     1.47   0.141    -.0804194     .564082
    -----------------------+----------------------------------------------------------------
    lehrerbewertung        |
               performance |   .9363136   .1552998     6.03   0.000     .6319316    1.240696
                     _cons |  -.0363143   .1649169    -0.22   0.826    -.3595455    .2869169
    -----------------------+----------------------------------------------------------------
    aspirations            |
               performance |    2.88466   .5057989     5.70   0.000     1.893312    3.876008
              logeinkommen |   .8647389   .1046278     8.26   0.000     .6596721    1.069806
    -----------------------+----------------------------------------------------------------
    performance            |
              logeinkommen |   .4601273   .0758535     6.07   0.000     .3114572    .6087974
    -----------------------+----------------------------------------------------------------
    /idealabschluss        |
                      cut1 |    15.3001    1.13888                      13.06793    17.53226
    -----------------------+----------------------------------------------------------------
    /idealabschluss_eltern |
                      cut1 |   16.21353   1.184927                      13.89112    18.53595
    -----------------------+----------------------------------------------------------------
    /realabschluss         |
                      cut1 |   15.09398   .9326309                      13.26606    16.92191
    -----------------------+----------------------------------------------------------------
    /realabschluss_eltern  |
                      cut1 |   45.71751   6.376759                       33.2193    58.21573
    -----------------------+----------------------------------------------------------------
         var(e.aspirations)|   3.187809   .3478379                      2.574029    3.947946
         var(e.performance)|   .4555128   .1509999                      .2378659    .8723064
    -----------------------+----------------------------------------------------------------
              var(e.mathe3)|   .2587328   .0072316                      .2449404    .2733018
              var(e.mathe4)|   .2667887   .0074515                      .2525766    .2818004
            var(e.deutsch3)|   .2163404   .0064982                      .2039717     .229459
            var(e.deutsch4)|     .20087   .0064828                      .1885575    .2139864
     var(e.lehrerbewertung)|   .2471626   .0081774                      .2316438     .263721
    ----------------------------------------------------------------------------------------
    Best wishes

    Stata 18.0 MP | ORCID | Google Scholar

  • #2
    This is a fairly complex generalized SEM model (or perhaps it's not that the model is complex, but that I am simple). However, in terms of 1), you might benefit from reading the item response theory part of the manual to understand how we estimate the value of a latent trait from indicators. Basically, you assume there is a latent trait that varies in strength and is normally distributed with mean 0. (In IRT, you assume that the variance is 1, but this SEM model doesn't do that, and I think SEM models in general don't. Here, you have the variance freely estimated.) The higher the latent trait, the more likely you are to endorse each indicator, or to choose a higher response category on it. The indicators probably vary in difficulty, which is what enables you to make that estimation.

    As a concrete example from my field, consider depression. Depression symptom questionnaires frequently ask questions like have you been bothered by being sad, and they may ask about suicide ideation (do you think you're better off dead). Responding yes to "better off dead" indicates much higher depression than saying I've been bothered by feeling sad.
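To make the IRT connection concrete, here is a minimal sketch of how the four aspiration items could be fitted on their own as a graded response model with Stata's irt command. It assumes the variable names and the wave == 6 restriction from the original post; theta_asp is a made-up name for the predicted latent score. It is meant as a way to inspect the items, not as a replacement for the full model.

```stata
* Sketch only: graded response model for the four aspiration items.
* irt grm fits an ordinal-logit measurement model through gsem, but it
* fixes the latent variance at 1 instead of anchoring a loading at 1.
irt grm idealabschluss idealabschluss_eltern ///
        realabschluss realabschluss_eltern if wave == 6

* Discrimination and difficulty (cutpoint) estimates, grouped by parameter
estat report, byparm

* Empirical Bayes prediction of each pupil's latent aspiration
* (theta_asp is a hypothetical new variable name)
predict theta_asp, latent

* Test information function: where on the latent scale the items
* measure precisely and where they do not
irtgraph tif
```

Items with low discrimination, or a test information curve that is flat over most of the latent range, would suggest that the four indicators do not pin down the construct very well.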

    Related to 2): In SEM, you typically constrain the loading of the first item on its latent trait to 1. Basically, the idea is that the latent trait causes responses to the indicators. And here, maybe intentionally or maybe accidentally, you specified that the latent trait performance causes responses to mathe3 and 4, German 3 and 4, and school choice (gym5).
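The reason some loading has to be fixed can be written out in a few lines. This is a sketch of the standard scale-indeterminacy argument, applied to the gym5 equation from the output above; the symbols beta, lambda and c are notation introduced here, not something taken from the Stata output.

```latex
% Equation for school choice as estimated in the original post
\operatorname{logit}\Pr(\text{gym5}=1)
  = \beta_0 + \beta_1\,\text{logeinkommen}
    + \beta_2\,\text{aspirations} + \lambda\,\text{performance}

% Rescaling the latent variable by any c > 0 leaves the linear predictor,
% and hence the likelihood, unchanged:
\lambda\,\text{performance}
  = \frac{\lambda}{c}\,\bigl(c\cdot\text{performance}\bigr)

% So \lambda and \operatorname{Var}(\text{performance}) are not separately
% identified. Fixing one loading at 1 (here \lambda = 1), or alternatively
% fixing the variance, only chooses the unit of the latent variable.
```

Read this way, the 1 on performance is a choice of unit rather than an estimated effect. Aspirations gets its own unit from its own anchor (the idealabschluss loading), so comparing 1 with 1.10 does not by itself show that aspirations matter more.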

    You may be trying to regress school choice on performance. If so, then while this is confusing, keep in mind that in gsem path notation the arrow always points at the outcome, so school choice has to sit at the head of the arrow. I prefer to write my syntax this way, grouping all the indicators together instead of in separate brackets. Anyway, if you want to regress school choice on performance and aspirations, you want to type something like this - check the syntax to make sure that I haven't got the arrows the wrong way:

    Code:
    * gym5 is the outcome; its predictors sit at the tail of the arrow
    gsem (aspirations -> idealabschluss idealabschluss_eltern realabschluss realabschluss_eltern, ologit) ///
         (gym5 <- aspirations performance logeinkommen, logit) ///
         (logeinkommen -> performance, ) ///
         (performance -> aspirations mathe3 mathe4 deutsch3 deutsch4 lehrerbewertung) ///
         if wave == 6, difficult latent(aspirations performance) nocapslatent
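If the goal is to get an estimated coefficient (with a standard error) for performance in the gym5 equation, one option is to place the anchor on a different indicator yourself. The sketch below assumes the variable names from the original post and anchors performance on mathe3 with @1. As far as I understand the manual, gsem does not add its own default anchor for a latent variable once you have supplied a normalization constraint for it, but please check this against the constraint list printed above the coefficient table.

```stata
* Sketch only: same structural paths as in the original post, but the
* scale of performance is set by mathe3 (performance@1), not by gym5.
gsem (mathe3 <- performance@1) ///
     (mathe4 deutsch3 deutsch4 lehrerbewertung <- performance) ///
     (idealabschluss idealabschluss_eltern realabschluss ///
      realabschluss_eltern <- aspirations, ologit) ///
     (gym5 <- aspirations performance logeinkommen, logit) ///
     (aspirations <- performance logeinkommen) ///
     (performance <- logeinkommen) ///
     if wave == 6, difficult latent(aspirations performance) nocapslatent
```

This is only a change of unit, so the log likelihood should match the original model. The gym5 coefficient on performance is then the change in the log odds per one-unit change in performance, measured on the mathe3 scale.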
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



    • #3
      Just some quick comments/questions on various aspects (some of which are pretty specific to the German educational system and probably go beyond the scope of Statalist):

      1) I agree with Weiwen; this is a pretty complicated model. What exactly are you trying to gain over simple (additive) scores for performance and aspirations that you then plug into logistic regression?

      2) The

      Code:
      (performance -> mathe3, ) (performance -> mathe4, ) (performance -> deutsch3, ) (performance -> deutsch4, ) (performance -> lehrerbewertung, )
      part is called a measurement model. This is basically just a (confirmatory) factor analysis where you restrict the cross-factor loadings to 0. You can extract that part and fit it with sem, where you have all the rule-of-thumb stuff such as CFI, RMSEA, etc. to assess the "quality" of your model. The measurement model for aspirations is probably more closely related to IRT, as Weiwen has already pointed out. Stata's irt calls gsem. If I remember correctly, there are some pretty strong assumptions that underlie ordinal irt models. As you have mentioned reliability, note that some reliability measures (e.g., McDonald's omega) are based on factor loadings and unique variances; you get both from sem and/or gsem.

      3) In my view, the binary choice (Gymnasium vs. rest) that you are trying to model here is not really consistent with the aspirations measures (Hauptschule, Realschule, and Gymnasium), which are often (most of the time?) considered to be nominal rather than ordinal. In my view, it would be more consistent to have binary indicator variables for aspirations (Gymnasium vs. rest), matching your outcome; conversely, you might want to have an (alternative-specific) multinomial model for the outcome, which actually has more than two alternatives (in the German educational system).

      4) Related to 3): If you stick with the binary model, check whether you still obtain the results that you would expect for a model 'Hauptschule' vs. rest as a sensitivity check.

      5) You are aware of this. "Importance" is not a statistical concept nor is it mathematically defined. Thus, statements about the relative importance of predictors cannot be based solely on mathematical results.
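Points 1) and 2) can be tried out with a few lines. The following is a rough sketch under the assumption that the variable names and the wave == 6 restriction from the original post apply; perf_score and asp_score are hypothetical names for the generated scores. It first builds simple standardized scores and checks their internal consistency, then fits the performance measurement model on its own with sem so that fit statistics become available.

```stata
* Sketch only; perf_score and asp_score are hypothetical variable names.

* 1) Simple scores plus Cronbach's alpha. std standardizes the items
*    first, so that differently scaled items can be combined.
alpha mathe3 mathe4 deutsch3 deutsch4 lehrerbewertung if wave == 6, ///
    std item generate(perf_score)
alpha idealabschluss idealabschluss_eltern realabschluss ///
    realabschluss_eltern if wave == 6, std item generate(asp_score)

* Plain logistic regression on the scores as a benchmark
logit gym5 perf_score asp_score logeinkommen if wave == 6

* 2) Measurement model for performance alone, fitted with sem.
*    By default sem drops observations with any missing indicator;
*    method(mlmv) would use all available data instead.
sem (performance -> mathe3 mathe4 deutsch3 deutsch4 lehrerbewertung) ///
    if wave == 6, latent(performance) nocapslatent

estat gof, stats(all)   // overall fit statistics such as RMSEA and CFI
estat eqgof             // equation-level R-squared for each indicator
```

If the benchmark logit tells much the same story as the full gsem model, that is an argument for the simpler specification.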
      Last edited by daniel klein; 21 Jan 2021, 13:24. Reason: formatting



      • #4
        Dear Weiwen and Daniel,
        thank you both so much for your detailed comments, I really appreciate it. It was quite difficult for me to find relevant information online and your comments are on point. You both pointed out that my model is too complex and I will think about solutions on how to simplify it. Simple is usually better than complex, indeed.
        Weiwen Ng : Thanks, I will have a look at these IRT models. Also your syntax is much cleaner and easier to read.
        daniel klein : I agree that this is not really consistent in how things are operationalized at the moment. Some part of this is due to theory and some of it is due to data restrictions (the number of pupils who actually choose a Haupt- or Realschule is so low in the NEPS that it will be a challenge to estimate any effects for these at all). But it is really helpful to think about these aspects in detail so I can defend them plausibly if I need to keep them.
        Best wishes

