  • Conducting a higher-order confirmatory factor analysis with categorical and binary data

    Hello Everyone in Stataland!

    I am struggling to conduct a higher-order confirmatory factor analysis with ordinal categorical and binary variables (i.e. observed items).

    My model is composed of four latent domains (first order) and one overarching latent domain (second order). Two of the domains comprise ordinal categorical items (e.g. "agree," "neutral," "disagree" etc.), while the other two comprise items with binary responses (e.g. "yes/no"). I have tried running the model, specifying the appropriate link/family option for each observed item (i.e. binary vs. ordinal). None of the models has converged, and I have the same issue when I remove entire domains from the model. I am working with a data set of 986 respondents, all of whom responded to all items.

    I have scoured the online forums and tried to adapt code that I have found for conducting exploratory factor analysis and principal components analysis with binary and categorical data in Stata (e.g. using polychoric correlations), but to no avail. A lot of people I have talked to at my university have recommended that I conduct this type of analysis in MPlus, but I can't imagine that Stata is not capable of a higher-order CFA with binary and categorical data.


    Thank you in advance for your assistance!

    Pauley

  • #2
    You could probably fit the model in Stata, but with five latent factors, it's liable to take a while with gsem, even with intmethod(laplace). So, you're less likely to grow impatient waiting for an answer with MPlus, as it seems to have faster algorithms.
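
    For reference, a gsem specification of this kind might look like the sketch below. It is an illustration only, not the original poster's model: it assumes ordinal items named o11-o23 and binary items named b31-b43 are in memory, and it uses probit-type links on the assumption that the result should be roughly comparable to a polychoric-based approach. It may run slowly or fail to converge.

```stata
// Sketch only: assumes ordinal items o11-o23 and binary items b31-b43 exist.
// oprobit/probit are shorthand for the ordinal and Bernoulli families
// with a probit link; ologit/logit could be substituted.
gsem ///
    (o11 o12 o13 <- F1, oprobit) ///
    (o21 o22 o23 <- F2, oprobit) ///
    (b31 b32 b33 <- F3, probit) ///
    (b41 b42 b43 <- F4, probit) ///
    (F1 F2 F3 F4 <- S), intmethod(laplace)
```

    The Laplace approximation is chosen here only to reduce computation time with five latent variables; the default integration method would be more accurate but slower.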

    I don't understand what "but to no avail" means with your attempt with polychoric correlations. The following do-file for the CFA via a polychoric correlation matrix runs with no hitches.
    Code:
    version 15.1
    
    clear *
    
    set seed `=strreverse("1495431")'
    quietly set obs 986
    
    generate double secondary = rnormal()
    
    forvalues domain = 1/4 {
        drawnorm v`domain'1 v`domain'2 v`domain'3, ///
            double corr(1 0.75 0.75 \ 0.75 1 0.75 \ 0.75 0.75 1)
        forvalues item = 1/3 {
            quietly replace v`domain'`item' = v`domain'`item' + secondary
        }
    }
    
    sem ///
        (v1? <- F1) ///
        (v2? <- F2) ///
        (v3? <- F3) ///
        (v4? <- F4) ///
            (F1 F2 F3 F4 <- S), nofootnote nocnsreport nodescribe nolog
    
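    // discretization: domains 1-2 become four-category ordinal items,
    // domains 3-4 become binary items split at the mean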
    forvalues domain = 1/2 {
        forvalues item = 1/3 {
            egen byte o`domain'`item' = cut(v`domain'`item'), group(4)
        }
    }
    forvalues domain = 3/4 {
        forvalues item = 1/3 {
            summarize v`domain'`item', meanonly
            generate byte b`domain'`item' = v`domain'`item' > r(mean)
        }
    }
    
    *
    * Begin here
    *
    
    quietly polychoric o* b* // -search polychoric-
    
    tempname Rho n
    matrix define `Rho' = r(R)
    scalar define `n' = r(N)
    
    drop _all
    quietly ssd init o11 o12 o13 o21 o22 o23 b31 b32 b33 b41 b42 b43
    quietly ssd set observations `=`n''
    quietly ssd set correlations (stata) `Rho'
    
    sem ///
        (o1? <- F1) ///
        (o2? <- F2) ///
        (b3? <- F3) ///
        (b4? <- F4) ///
            (F1 F2 F3 F4 <- S), nofootnote nocnsreport nodescribe nolog
    
    exit
    If your particular polychoric correlation matrix turns out not to be positive definite, you can always run it through factormat with the forcepsd option first.
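
    One way to check for positive definiteness is to look at the smallest eigenvalue of the matrix. The sketch below uses a small made-up correlation matrix named R (hypothetical, chosen so that it is not positive definite); in practice R would be a copy of r(R) taken right after polychoric.

```stata
// Made-up 3 x 3 example; its determinant is negative, so it is not positive definite.
// In practice, replace this line with: matrix R = r(R)   (right after -polychoric-)
matrix R = (1, 0.9, 0.2 \ 0.9, 1, 0.9 \ 0.2, 0.9, 1)

// -matrix symeigen- returns the eigenvalues as a row vector in descending order,
// so the smallest eigenvalue is the last element.
matrix symeigen Vectors Values = R
scalar min_eigen = Values[1, colsof(Values)]
display "Smallest eigenvalue: " min_eigen

if min_eigen <= 0 {
    display as error "R is not positive definite"
}
```

    A smallest eigenvalue at or below zero is the signal that the matrix needs repair before it is passed to ssd and sem.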
