How to structure SEM (for CFA) in Stata when my latent variables consist of both categorical (binary) and continuous indicators?

Florian Bartels

Join Date: Jan 2021

Posts: 2
#1

08 Jan 2021, 12:39

Dear fellow Stata enthusiasts,

Having established a new construct by EFA, using polychoric correlations and the factormat command in Stata because I have both categorical (binary) and continuous indicators, I am now trying to validate the construct using CFA.

I would greatly appreciate your recommendations on whether:
a) a solution based on gsem would be most applicable (however, the indicators of my latent factors vary by type, so that, e.g., a single logit option would also be applied to the continuous indicators: gsem (latent_factor1 -> continuous_indicator1 continuous_indicator2 binary_indicator1, logit))

or

b) there is a reliable way to take the polychoric correlation matrix from my EFA and use it as the basis for my (g)sem (preferably sem, given that its postestimation commands are available), possibly with 'ssd init'?
I very much appreciate your help and look forward to helpful recommendations!

Thanks and best,
Florian
Tags: cfa, factormat, Latent Variable, polychoric, SEM
Joseph Coveney

Join Date: Apr 2014

Posts: 4410
#2

10 Jan 2021, 05:35

You can do both in Stata (see example below), although I would probably favor gsem, because some of the postestimation commands available after sem aren't really suitable for categorical indicator variables and you'll lose information forming the polychoric correlation matrix. You'd get the polychoric correlation matrix not from an exploratory factor analysis (EFA), but rather from the user-written command polychoric (search at Stata's command line in order to find it and install it). Begin at the "Begin here" comment in the output below (the first part of the output just shows creation of a fictitious dataset for use as illustration).

.
. version 16.1

.
. clear *

.
. set seed `=strreverse("1588940")'

.
. tempname Corr

. matrix define `Corr' = J(4, 4, 0.5) + I(4) * 0.5

. quietly drawnorm x1 x2 x3 x4, double corr(`Corr') n(250)

.
. foreach var of varlist x3 x4 {
  2.         quietly replace `var' = `var' > 0
  3. }

.
. *
. * Begin here
. *
. preserve

.
. // Using -ssd- & -sem- on polyserial/tetrachoric correlation matrix
. quietly polychoric x?

.
. tempname Rho

. matrix define `Rho' = r(R)

. local N `r(N)'

.
. drop _all

. quietly ssd init x1 x2 x3 x4

. quietly ssd set observations `N'

. quietly ssd set correlations (stata) `Rho'

. sem (x? <- F), nocnsreport nodescribe nolog

Structural equation model                       Number of obs     =        250
Estimation method  = ml
Log likelihood     = -1274.7227

------------------------------------------------------------------------------
             |                 OIM
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Measurement  |
  x1         |
           F |          1  (constrained)
  -----------+----------------------------------------------------------------
  x2         |
           F |   .9812251   .1145466     8.57   0.000     .7567179    1.205732
  -----------+----------------------------------------------------------------
  x3         |
           F |   .9670884   .1032164     9.37   0.000      .764788    1.169389
  -----------+----------------------------------------------------------------
  x4         |
           F |   .9969965   .1184567     8.42   0.000     .7648256    1.229167
-------------+----------------------------------------------------------------
    var(e.x1)|   .4897168   .0632865                      .3801399    .6308795
    var(e.x2)|   .5085491   .0633444                      .3983896    .6491689
    var(e.x3)|   .5224936   .0638398                      .4112241    .6638704
    var(e.x4)|   .4927534    .063256                      .3831411    .6337246
       var(F)|   .5062832    .090023                      .3573058    .7173764
------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(2)   =     15.71, Prob > chi2 = 0.0004

.
. restore

.
. // Directly with -gsem-
. gsem (x1@1 x2 <- F) (x3 x4 <- F, probit), nocnsreport nodvheader nolog

Generalized structural equation model           Number of obs     =        250
Log likelihood = -968.48131

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1           |
           F |          1  (constrained)
       _cons |  -.0079629   .0633284    -0.13   0.900    -.1320842    .1161584
-------------+----------------------------------------------------------------
x2           |
           F |   .9858853   .1351548     7.29   0.000     .7209868    1.250784
       _cons |  -.1270612   .0652009    -1.95   0.051    -.2548527    .0007303
-------------+----------------------------------------------------------------
x3           |
           F |   1.330986   .2558206     5.20   0.000     .8295872    1.832386
       _cons |  -.0595528   .1095103    -0.54   0.587     -.274189    .1550834
-------------+----------------------------------------------------------------
x4           |
           F |    1.37879   .3045173     4.53   0.000     .7819471    1.975633
       _cons |  -.0218087    .111201    -0.20   0.845    -.2397586    .1961412
-------------+----------------------------------------------------------------
       var(F)|   .5177472   .0992615                      .3555715    .7538909
-------------+----------------------------------------------------------------
    var(e.x1)|   .4848748   .0746497                      .3585762    .6556587
    var(e.x2)|   .5595573    .078487                      .4250594    .7366131
------------------------------------------------------------------------------

.
. exit

end of do-file

.

With the first approach, because the correlation matrix is built pairwise, you can end up with a correlation matrix that isn't positive semidefinite, especially if you have many indicator variables. In that case, you'll need to launder your correlation matrix through official Stata's factormat, in a manner like the following, before proceeding to ssd and sem:

Code:

quietly factormat `Rho', n(`N') forcepsd
matrix define `Rho' = e(C)
Florian Bartels

Join Date: Jan 2021

Posts: 2
#3

12 Jan 2021, 01:48

Dear Joseph,

thanks a lot for the swift and greatly helpful response!

Indeed, I used the polychoric and factormat commands for my EFA. Hence my question: would you still recommend the gsem solution for the CFA, or, for the sake of consistency, the same approach that I applied in my EFA?
Additionally, my understanding is that gsem does not support postestimation fit statistics such as CFI, TLI, and RMSEA. What would be suitable goodness-of-fit and reliability measures to apply after gsem?
Lastly, I want to use the (hopefully validated) (g)sem model in further analyses in which the latent factors of my CFA serve as independent variables (e.g., EBIT = a x Factor1 + b x Factor2 + ... + constant). Depending on your recommended option, how would I proceed in my Stata code to "save" these latent factors (predict?)?

Many many thanks for your help and best regards,

Florian
Joseph Coveney

Join Date: Apr 2014

Posts: 4410
#4

12 Jan 2021, 05:05

I'd use gsem inasmuch as it's designed for this use case.

I can't answer for SEM goodness-of-fit indexes, because I never use them.
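
For reference only, here is a minimal sketch of where fit statistics can be requested under each approach. It assumes that the fictitious dataset from post #2 is rebuilt (with an arbitrary seed, so the numbers will differ from the output above) and that the user-written polychoric command is installed. After sem, estat gof can report RMSEA, CFI, and TLI; estat gof is not available after gsem, where estat ic (AIC and BIC) is one of the few model-comparison summaries on offer. Indices computed from a polychoric matrix with a nominal sample size are probably best read as approximate.

```stata
version 16.1
clear *
set seed 20210112                       // arbitrary seed (an assumption)

* Fictitious data: x1 x2 continuous, x3 x4 binary
matrix define C = J(4, 4, 0.5) + I(4) * 0.5
quietly drawnorm x1 x2 x3 x4, double corr(C) n(250)
foreach var of varlist x3 x4 {
    quietly replace `var' = `var' > 0
}

* Approach 1: sem on the polychoric correlation matrix
preserve
quietly polychoric x?                   // user-written; must be installed
matrix define Rho = r(R)
local N `r(N)'
drop _all
quietly ssd init x1 x2 x3 x4
quietly ssd set observations `N'
quietly ssd set correlations (stata) Rho
quietly sem (x? <- F), nolog
estat gof, stats(rmsea indices)         // RMSEA, CFI, TLI
restore

* Approach 2: gsem on the raw data
quietly gsem (x1@1 x2 <- F) (x3 x4 <- F, probit), nolog
estat ic                                // AIC and BIC only
```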

Latent factor prediction is described in the help file, e.g., help gsem_predict##lstatistic: you'd use the latent(varlist) option of the predict postestimation command.
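
As a minimal sketch of that step, assuming the same kind of fictitious data as in post #2 (the seed and the new variable names F_hat and F_se are arbitrary choices for illustration):

```stata
version 16.1
clear *
set seed 20210112                       // arbitrary seed (an assumption)

* Fictitious data: x1 x2 continuous, x3 x4 binary
matrix define C = J(4, 4, 0.5) + I(4) * 0.5
quietly drawnorm x1 x2 x3 x4, double corr(C) n(250)
foreach var of varlist x3 x4 {
    quietly replace `var' = `var' > 0
}

quietly gsem (x1@1 x2 <- F) (x3 x4 <- F, probit), nolog

* Empirical Bayes mean predictions of the latent factor F (the default),
* with their standard errors stored in F_se
predict double F_hat, latent(F) se(F_se)
summarize F_hat F_se
```

With several latent factors, listing them in latent() and supplying one new variable name per factor should store one prediction for each.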

I have only very limited experience taking predictions of the latent factor as explanatory variables in another regression model. I've eschewed it mainly because it's problematic to account for the uncertainty in (linear combinations of) the latent factor predictions in the follow-on model.
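
One way around that problem, sketched here under stated assumptions, is to skip the two-step route and put the outcome into the same gsem model, so that the uncertainty about the latent factor is carried into the regression coefficient. The outcome y below is hypothetical (a stand-in for something like EBIT) and is simulated only so that the sketch runs.

```stata
version 16.1
clear *
set seed 20210112                       // arbitrary seed (an assumption)

* Fictitious data: x1 x2 continuous, x3 x4 binary, y a hypothetical outcome
matrix define C = J(5, 5, 0.5) + I(5) * 0.5
quietly drawnorm x1 x2 x3 x4 y, double corr(C) n(250)
foreach var of varlist x3 x4 {
    quietly replace `var' = `var' > 0
}

* Measurement model and structural regression estimated jointly:
* the coefficient on F in the y equation plays the role of "a" in
* y = a x Factor + constant
gsem (x1@1 x2 <- F) (x3 x4 <- F, probit) (y <- F), nolog
```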

Sorry that I couldn't have been more help.