

  • GSEM goodness of fit


    I posted this on the L-serv approximately 2 weeks ago but got no responses, so I'm trying it here in the hope that by then everyone had already checked out of the L-serv and migrated here.

    I recently started learning the SEM analysis technique in an attempt to verify that our data support the theoretical information-motivation-behavior model. Most literature I've found on
    testing the validity of this model involves fully continuous data and therefore relies on goodness-of-fit statistics such as CFI and RMSEA.

    However, most if not all of my data is categorical. Some fit nicely into latent factors, others do not and/or need to enter the model as categorical.

    I understand why GSEM does not provide the same type of GOF statistics that SEM does (it's similar to the difference between logistic and linear regression). However, I can't seem to find any literature that does a similar type of analysis, so I don't know how to determine whether a model that seemingly fits (no convergence problems, all paths significant) is actually doing a good job. Is there some sort of ROC curve that can be created?

    I have also read briefly in this listserv's archives that you can treat all variables as continuous just to get the measures of fit. Are there any references indicating that this is a valid technique?

    Any suggestions for resources on how to interpret/use/learn GSEM would be appreciated.


  • #2
    Hi Robin:

    Here is a post I made on this topic a while back, copied verbatim:

    "At this time, and based on my asking the Tech. support people at Stata, the overidentification test (and here I mean the likelihood ratio test, or chi-square test) is not available for -gsem-, which is unfortunate, but understandable. This is only version 2 of -sem- and the program is really very advanced as compared to other programs when they were on version 2 (AMOS, which is on version a zillion, still can't do gsem, for example). From what tech. support told me, it is on the wishlist and hopefully we will have a Yuan-Bentler style chi-square test for models estimated by gsem, like Mplus does.
    As for assessing fit, you only need the chi-square test--indexes like RMSEA or CFI don't help at all. I elaborate below on an edited version of what I had written recently on SEMNET on this point (in particular see the anecdote about Karl Joreskog, who as you may know, was instrumental in developing SEM, about why approximate fit indexes were invented):

    "At the end of the day, science is self-correcting and with time, most researchers will gravitate towards some sort of consensus. I think that what will prevail are methods that are analytically derived (e.g., chi-square test and corrections to it for when it is not well behaved) and found to have support too via Monte Carlo. With respect to the latter, what is funny--well, ironic and hypocritical too--is that measures of approximate fit are not analytically derived and the only support that they have is via what I would characterize as weak Monte Carlos--which in turn are often summarily dismissed--by the very people who ignore the chi-square test--when the Monte Carlos provide evidence for the chi-square test.

    We have the following issues that need to be correctly dealt with to ensure the model passes the chi-square test (and also that inference is correct--i.e., with respect to standard errors):

    1. low sample size to parameters estimated ratio (need to correct the chi-square)
    2. non-multivariate normal data (need to correct the chi-square)
    3. non-continuous measures (need to use appropriate estimator)
    4. causal heterogeneity (need to control for sources of variance that render relations heterogenous)*
    5. bad measures
    6. incorrectly specified model (need to ensure that the causal structure reflects reality and that all endogeneity threats are dealt with).

    Any of these or a combination of these can make the chi-square test fail. Now, some researchers shrug in a defeatist kind of way and say, "well, I don't know why my model failed the chi-square test, but I will interpret it in any case because the approximate fit indexes [like RMSEA or CFI] say it is OK." Unfortunately, the researcher will not know to what extent these estimates may be misleading or completely wrong. And reporting misleading estimates is, I think, unethical and uneconomical for society. That is why all efforts should be made to develop measures and find models that fit. At this time the best test we have is the chi-square test; we can also localize misfit via score tests or modification indexes. I will rejoice the day we find better and stronger tests; however, inventing weaker tests is not going to help us.
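
    To make the contrast concrete, here is a minimal Python sketch (standard library only) that computes the chi-square p-value alongside RMSEA and CFI from the same test statistic. All numbers in it are hypothetical and chosen only for illustration, the function names are my own, and the RMSEA formula uses N - 1 in the denominator (some programs use N instead). It is a sketch under these assumptions, not output from -sem- or -gsem-.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Start from a closed form: Q(1, z) for even df, Q(1/2, z) for odd df.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q


def rmsea(chi2, df, n):
    """RMSEA point estimate from the model chi-square (N - 1 convention)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_base, df_base):
    """CFI from the model and baseline (independence) chi-squares."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


# Hypothetical values, for illustration only.
chi2_m, df_m, n = 85.0, 40, 1000
chi2_b, df_b = 2000.0, 55

print("chi-square p-value:", chi2_sf(chi2_m, df_m))
print("RMSEA:", rmsea(chi2_m, df_m, n))
print("CFI:", cfi(chi2_m, df_m, chi2_b, df_b))
```

    With these made-up inputs, the chi-square test rejects the model at any conventional level, while RMSEA comes out below 0.05 and CFI above 0.95, the cutoffs that are commonly cited as indicating acceptable fit. That is the situation described above: the approximate indexes say "OK" for a model that the exact test says is misspecified.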

    Again, here is a snippet from Cam McIntosh's (2012) recent paper on this point:

    "A telling anecdote in this regard comes from Dag Sörbom, a long-time collaborator of Karl Jöreskog, one of the key pioneers of SEM and creator of the LISREL software package. In recounting a LISREL workshop that he jointly gave with Jöreskog in 1985, Sörbom notes that: ‘‘In his lecture Karl would say that the Chi-square is all you really need. One participant then asked ‘Why have you then added GFI [goodness-of-fit index]?’ Whereupon Karl answered ‘Well, users threaten us saying they would stop using LISREL if it always produces such large Chi-squares. So we had to invent something to make people happy. GFI serves that purpose’ (p. 10)’’.

    With respect to the causal heterogeneity point, according to Mulaik and James (1995, p. 132), samples must be causally homogenous to ensure that ‘‘the relations among their variable attributes are accounted for by the same causal relations.’’ As we say in our causal claims paper (Antonakis et al., 2010), "causally homogenous samples are not infinite (thus, there is a limit to how large the sample can be). Thus, finding sources of population heterogeneity and controlling for it will improve model fit whether using multiple groups (moderator models) or multiple indicator, multiple causes (MIMIC) models" (p. 1103). This issue is something that many applied researchers either fail to understand or completely ignore.

    *Antonakis J., Bendahan S., Jacquart P. & Lalive R. (2010). On making causal claims: A review and recommendations. The Leadership Quarterly, 21(6), 1086-1120.
    Bera, A. K., & Bilias, Y. (2001). Rao's score, Neyman's C(α) and Silvey's LM tests: an essay on historical developments and some new results. Journal of Statistical Planning and Inference, 97(1), 9-44.
    *Bollen, K. A. 1989. Structural equations with latent variables. New York: Wiley.
    *James, L. R., Mulaik, S. A., & Brett, J. M. 1982. Causal Analysis: Assumptions, Models, and Data. Beverly Hills: Sage Publications.
    *Joreskog, K. G., & Goldberger, A. S. 1975. Estimation of a model with multiple indicators and multiple causes of a single latent variable. Journal of the American Statistical Association, 70(351): 631-639.
    McIntosh, C. (2012). Improving the evaluation of model fit in confirmatory factor analysis: A commentary on Gundy, C.M., Fayers, P.M., Groenvold, M., Petersen, M. Aa., Scott, N.W., Sprangers, M.A.J., Velikov, G., Aaronson, N.K. (2011). Comparing higher-order models for the EORTC QLQ-C30. Quality of Life Research. Quality of Life Research, 21(9), 1619-1621.
    *Muthén, B. O. 1989. Latent variable modeling in heterogenous populations. Psychometrika, 54(4): 557-585.
    *Mulaik, S. A. & James, L. R. 1995. Objectivity and reasoning in science and structural equation modeling. In R. H. Hoyle (Ed.), Structural Equation Modeling: Concepts, Issues, and Applications: 118-137. Thousand Oaks, CA: Sage Publications.
    And, here are some examples from my work where the chi-square test was passed (and the first study had a rather large sample)--so I don't live in a theoretical statistical bubble:

    Best, J.

    P.S. Take a look at the following posts too by me on these points on Statalist.

