Conducting a higher-order confirmatory factor analysis with categorical and binary data

Pauley Faran

Join Date: Apr 2019

Posts: 1
#1

27 Apr 2019, 16:03

Hello Everyone in Stataland!

I am struggling to conduct a higher-order confirmatory factor analysis with ordinal categorical and binary variables (i.e. observed items).

My model is composed of four latent domains (first order) and one overarching latent domain (second order). Two of the domains comprise ordinal categorical items (e.g. "agree," "neutral," "disagree" etc.), while the other two comprise items with binary responses (e.g. "yes/no"). I have tried running the model, specifying the appropriate link/family option for each observed item (i.e. binary vs. ordinal). None of the models has converged. When I remove entire domains from the model, I have the same issue. I am working with a data set of 986 respondents, all of whom responded to all items.

I have scoured the online forums and tried to adapt code that I have found for conducting exploratory factor analysis and principal components analysis with binary and categorical data in Stata (e.g. using polychoric correlations), but to no avail. A lot of people I have talked to at my university have recommended that I conduct this type of analysis in MPlus, but I can't imagine that Stata is not capable of a higher-order CFA with binary and categorical data.

Thank you in advance for your assistance!

Pauley

Joseph Coveney

Join Date: Apr 2014
Posts: 4410

28 Apr 2019, 01:47

You could probably fit the model in Stata, but with five latent factors, it's liable to take a while with gsem, even with intmethod(laplace). So, you're less likely to grow impatient waiting for an answer with MPlus, as it seems to have faster algorithms.
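For reference, a minimal sketch of what such a gsem call might look like. It assumes the ordinal items are named o11 through o23 and the binary items b31 through b43 (hypothetical names, chosen to match the simulated data further down), and it uses probit links as one reasonable choice, not necessarily the one you tried.

```stata
// Sketch only: variable names are hypothetical and the data are assumed to be in memory.
// oprobit = ordinal family with probit link; probit = Bernoulli family with probit link.
gsem ///
    (o11 o12 o13 <- F1, oprobit) ///
    (o21 o22 o23 <- F2, oprobit) ///
    (b31 b32 b33 <- F3, probit) ///
    (b41 b42 b43 <- F4, probit) ///
    (F1 F2 F3 F4 <- S), intmethod(laplace)
```

The Laplace approximation trades some accuracy for speed compared with the default adaptive quadrature, which is why it is the option mentioned above for a model with five latent variables.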

I don't understand what "but to no avail" means with your attempt with polychoric correlations. The following do-file for the CFA via a polychoric correlation matrix runs with no hitches.

Code:

version 15.1

clear *

set seed `=strreverse("1495431")'
quietly set obs 986

// Second-order factor score shared by every item
generate double secondary = rnormal()

forvalues domain = 1/4 {
    drawnorm v`domain'1 v`domain'2 v`domain'3, ///
        double corr(1 0.75 0.75 \ 0.75 1 0.75 \ 0.75 0.75 1)
    forvalues item = 1/3 {
        quietly replace v`domain'`item' = v`domain'`item' + secondary
    }
}

// Reference fit on the underlying continuous items
sem ///
    (v1? <- F1) ///
    (v2? <- F2) ///
    (v3? <- F3) ///
    (v4? <- F4) ///
        (F1 F2 F3 F4 <- S), nofootnote nocnsreport nodescribe nolog

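// Discretization: domains 1-2 become four-category ordinal items,
// domains 3-4 become binary items split at the mean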
forvalues domain = 1/2 {
    forvalues item = 1/3 {
        egen byte o`domain'`item' = cut(v`domain'`item'), group(4)
    }
}
forvalues domain = 3/4 {
    forvalues item = 1/3 {
        summarize v`domain'`item', meanonly
        generate byte b`domain'`item' = v`domain'`item' > r(mean)
    }
}

*
* Begin here
*

quietly polychoric o* b* // -search polychoric-

tempname Rho n
matrix define `Rho' = r(R)
scalar define `n' = r(N)

// Replace the raw data with summary statistics data built from the polychoric matrix
drop _all
quietly ssd init o11 o12 o13 o21 o22 o23 b31 b32 b33 b41 b42 b43
quietly ssd set observations `=`n''
quietly ssd set correlations (stata) `Rho'

sem ///
    (o1? <- F1) ///
    (o2? <- F2) ///
    (b3? <- F3) ///
    (b4? <- F4) ///
        (F1 F2 F3 F4 <- S), nofootnote nocnsreport nodescribe nolog

exit

If your particular polychoric correlation matrix turns out not to be positive definite, then you can always run it through factormat with the forcepsd option first.
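A rough sketch of how that check and repair might go, using a small made-up matrix in place of the polychoric one. The matrix Rho, its row and column names, and the sample size here are illustrative only, and reading the adjusted matrix back from e(C) is an assumption worth confirming against the factormat documentation for your Stata version.

```stata
// Toy 3 x 3 "correlation" matrix that is deliberately not positive definite
matrix define Rho = (1, 0.9, 0.2 \ 0.9, 1, 0.9 \ 0.2, 0.9, 1)
matrix rownames Rho = a b c
matrix colnames Rho = a b c

// Eigenvalues come back sorted from largest to smallest,
// so the last element of L is the smallest one
matrix symeigen X L = Rho
display "Smallest eigenvalue: " L[1, colsof(L)]

if L[1, colsof(L)] <= 0 {
    // forcepsd replaces the matrix with a positive-semidefinite approximation
    factormat Rho, n(986) forcepsd
    // Assumption: e(C) holds the adjusted correlation matrix
    matrix define Rho_psd = e(C)
    matrix list Rho_psd
}
```

The adjusted matrix would then take the place of the original one in the ssd set correlations step of the do-file.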
