  • How to structure SEM (for CFA) in Stata when my latent variables are measured by both categorical (binary) and continuous indicators?

    Dear fellow Stata enthusiasts,

    after establishing a new construct by EFA (using polychoric correlations and the factormat command in Stata, because I have both categorical (binary) and continuous indicators), I am now trying to validate that construct using CFA.

    I would greatly appreciate your recommendations on whether:
    a) a solution based on gsem would be most applicable (however, the indicators for my latent factors vary by type, so that, e.g., the 'logit' link option would also be applied to the continuous indicators: gsem (latent_factor1 -> continuous_indicator1 continuous_indicator2 binary_indicator1, logit) )

    or

    b) there is a reliable way to take the polychoric correlation matrix from my EFA and use it as the basis for my (g)sem (preferably sem, given that postestimation commands are available), possibly with 'ssd init'?
    I very much appreciate your help and look forward to helpful recommendations!

    Thanks and best,
    Florian

  • #2
    You can do both in Stata (see example below), although I would probably favor gsem, because some of the postestimation commands available after sem aren't really suitable for categorical indicator variables and you'll lose information forming the polychoric correlation matrix. You'd get the polychoric correlation matrix not from an exploratory factor analysis (EFA), but rather from the user-written command polychoric (search at Stata's command line in order to find it and install it). Begin at the "Begin here" comment in the output below (the first part of the output just shows creation of a fictitious dataset for use as illustration).

    .
    . version 16.1

    .
    . clear *

    .
    . set seed `=strreverse("1588940")'

    .
    . tempname Corr

    . matrix define `Corr' = J(4, 4, 0.5) + I(4) * 0.5

    . quietly drawnorm x1 x2 x3 x4, double corr(`Corr') n(250)

    .
    . foreach var of varlist x3 x4 {
      2.         quietly replace `var' = `var' > 0
      3. }

    .
    . *
    . * Begin here
    . *
    . preserve

    .
    . // Using -ssd- & -sem- on polyserial/tetrachoric correlation matrix
    . quietly polychoric x?

    .
    . tempname Rho

    . matrix define `Rho' = r(R)

    . local N `r(N)'

    .
    . drop _all

    . quietly ssd init x1 x2 x3 x4

    . quietly ssd set observations `N'

    . quietly ssd set correlations (stata) `Rho'

    . sem (x? <- F), nocnsreport nodescribe nolog

    Structural equation model                       Number of obs     =        250
    Estimation method  = ml
    Log likelihood     = -1274.7227

    ------------------------------------------------------------------------------
                 |                 OIM
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    Measurement  |
      x1         |
               F |          1  (constrained)
      -----------+----------------------------------------------------------------
      x2         |
               F |   .9812251   .1145466     8.57   0.000     .7567179    1.205732
      -----------+----------------------------------------------------------------
      x3         |
               F |   .9670884   .1032164     9.37   0.000      .764788    1.169389
      -----------+----------------------------------------------------------------
      x4         |
               F |   .9969965   .1184567     8.42   0.000     .7648256    1.229167
    -------------+----------------------------------------------------------------
        var(e.x1)|   .4897168   .0632865                      .3801399    .6308795
        var(e.x2)|   .5085491   .0633444                      .3983896    .6491689
        var(e.x3)|   .5224936   .0638398                      .4112241    .6638704
        var(e.x4)|   .4927534    .063256                      .3831411    .6337246
           var(F)|   .5062832    .090023                      .3573058    .7173764
    ------------------------------------------------------------------------------
    LR test of model vs. saturated: chi2(2)   =     15.71, Prob > chi2 = 0.0004

    .
    . restore

    .
    . // Directly with -gsem-
    . gsem (x1@1 x2 <- F) (x3 x4 <- F, probit), nocnsreport nodvheader nolog

    Generalized structural equation model           Number of obs     =        250
    Log likelihood = -968.48131

    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    x1           |
               F |          1  (constrained)
           _cons |  -.0079629   .0633284    -0.13   0.900    -.1320842    .1161584
    -------------+----------------------------------------------------------------
    x2           |
               F |   .9858853   .1351548     7.29   0.000     .7209868    1.250784
           _cons |  -.1270612   .0652009    -1.95   0.051    -.2548527    .0007303
    -------------+----------------------------------------------------------------
    x3           |
               F |   1.330986   .2558206     5.20   0.000     .8295872    1.832386
           _cons |  -.0595528   .1095103    -0.54   0.587     -.274189    .1550834
    -------------+----------------------------------------------------------------
    x4           |
               F |    1.37879   .3045173     4.53   0.000     .7819471    1.975633
           _cons |  -.0218087    .111201    -0.20   0.845    -.2397586    .1961412
    -------------+----------------------------------------------------------------
           var(F)|   .5177472   .0992615                      .3555715    .7538909
    -------------+----------------------------------------------------------------
        var(e.x1)|   .4848748   .0746497                      .3585762    .6556587
        var(e.x2)|   .5595573    .078487                      .4250594    .7366131
    ------------------------------------------------------------------------------

    .
    . exit

    end of do-file


    .


    With the first approach, because the correlation matrix is created pairwise, you can end up with a correlation matrix that isn't positive semidefinite, especially if you have a lot of indicator variables. In that case, you'll need to launder your correlation matrix through the official Stata command factormat, in a manner like the following, before proceeding to ssd and sem:
    Code:
    quietly factormat `Rho', n(`N') forcepsd
    matrix define `Rho' = e(C)
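
    Whether the laundering step is needed can be checked beforehand by looking at the smallest eigenvalue of the correlation matrix. The following is only a minimal sketch: the 3 x 3 matrix R is made up for illustration (it is deliberately not positive semidefinite), and the names R, V, lambda and min_eig are hypothetical. In practice you would put the matrix returned by polychoric in place of R.

```stata
// Sketch: detect a correlation matrix that is not positive semidefinite.
// R is an invented example, not output from -polychoric-.
matrix define R = (1, 0.9, -0.9 \ 0.9, 1, 0.9 \ -0.9, 0.9, 1)

// -matrix symeigen- returns the eigenvalues as a row vector,
// sorted from largest to smallest, so the last element is the minimum.
matrix symeigen V lambda = R
scalar min_eig = lambda[1, colsof(lambda)]
display "smallest eigenvalue = " scalar(min_eig)

if scalar(min_eig) < 0 {
    display "not positive semidefinite: consider -factormat ..., forcepsd-"
}
```

    A negative smallest eigenvalue signals that the matrix should go through factormat with forcepsd before it is handed to ssd.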



    • #3
      Dear Joseph,

      thanks a lot for the swift and greatly helpful response!

      Indeed, I use the polychoric and factormat commands for my EFA. Hence my question: for reasons of consistency, would you still recommend the gsem solution for the CFA, or should I follow the same approach as for my EFA?
      Additionally, my understanding is that gsem does not provide postestimation fit statistics such as CFI, TLI, and RMSEA. What would be suitable goodness-of-fit and reliability tests to apply after gsem?
      Lastly, I want to use the (hopefully validated) (g)sem model for further analyses in which the latent factors from my CFA serve as independent variables (e.g., EBIT = a x Factor1 + b x Factor2 + ... + constant). Depending on your recommended option, how would I proceed in my Stata code to "save" these latent factors (predict?)?

      Many many thanks for your help and best regards,

      Florian



      • #4
        I'd use gsem inasmuch as it's designed for this use case.

        I can't answer for SEM goodness-of-fit indexes, because I never use them.

        Latent factor prediction is described in the help file, e.g., help gsem_predict##lstatistic: you'd use the latent(varlist) option of the predict postestimation command.
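
        As a rough sketch of what that looks like, the code below refits the gsem model on simulated data built with the same recipe as the illustration in #2 and then saves the predicted factor. The seed and the variable names Fhat and Fhat_se are hypothetical; by default predict returns empirical Bayes means, and the se() option stores their standard errors.

```stata
// Sketch on simulated data: two continuous and two binary indicators.
clear
set seed 12345
matrix define C = J(4, 4, 0.5) + I(4) * 0.5
quietly drawnorm x1 x2 x3 x4, double corr(C) n(250)
foreach var of varlist x3 x4 {
    quietly replace `var' = `var' > 0    // dichotomize x3 and x4
}

quietly gsem (x1@1 x2 <- F) (x3 x4 <- F, probit), nolog

// Save the predicted latent factor and its standard error.
// Fhat and Fhat_se are made-up names for this illustration.
predict double Fhat, latent(F) se(Fhat_se)
summarize Fhat Fhat_se
```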

        I have only very limited experience taking predictions of the latent factor as explanatory variables in another regression model. I've eschewed it mainly because it's problematic to account for the uncertainty in (linear combinations of) the latent factor predictions in the follow-on model.
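
        One way to sidestep the problem, given here only as a sketch and not as something tested on the original poster's data, is to put the outcome into the same gsem call so that the measurement model and the regression on the factor are estimated jointly. The data are simulated, and y is a made-up continuous outcome standing in for something like EBIT.

```stata
// Sketch: one-step estimation instead of predict-then-regress.
clear
set seed 2468
matrix define C = J(5, 5, 0.5) + I(5) * 0.5
quietly drawnorm x1 x2 x3 x4 y, double corr(C) n(250)
foreach var of varlist x3 x4 {
    quietly replace `var' = `var' > 0    // dichotomize x3 and x4
}

// The path (y <- F) regresses the outcome on the latent factor itself,
// so its standard error reflects the uncertainty about F.
gsem (x1@1 x2 <- F) (x3 x4 <- F, probit) (y <- F), nolog
display "slope of y on F = " _b[y:F]
```

        With several factors, the same idea would extend to a path such as (y <- F1 F2), at the cost of a heavier estimation problem.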

        Sorry that I couldn't be of more help.
