Categorical Survey Data - Polychoric and Factor Analysis Question

katharine sadowski

#1

27 Apr 2017, 13:57

I have a survey with 258 observations.
Survey questions are all categorical (None/Some/Most/All and Yes/No).
I am trying to see which questions fall into factors/scales.

Because my data are categorical, I ran:
1. Reliability tests (alpha, for inter-item reliability)
2. Polychoric correlations, to create a matrix to plug into my exploratory factor analysis:
polychoric `my vars'
display r(sum_w)
global N = r(sum_w)
matrix r = r(R)
factormat r, n(213) factors(3)   // n is lower here because of missingness
3. Once I do this, I am stuck on how to run a CFA.
Because I am running an analysis on categorical variables, I want to use sem with the adf method, but there are a few things in my output that I don't fully understand.
a. When I run sem r, method(adf), which is meant to run my factor analysis on the polychoric matrix I saved above, Stata does not converge. BUT when I run sem `my vars', method(adf), it gives me output. Why would the latter give me output? And is it reliable? I am assuming that, because the adf method deals with categorical variables, I can take the output from sem `my vars', method(adf) as reliable. I just wanted to check, and to understand why the first one doesn't work. I suspect it has something to do with sem not knowing how to interpret a matrix.
b. When I run sem `my vars', method(adf), the coefficients are all between 0.9 and 2. These factor loadings are very different from the ones I got in Step 2 (which range between 0.4 and 0.8). I feel as though this is because of my small sample size. BUT when I run sem `my vars', stand, I get factor loadings comparable to Step 2. I think those are incorrect because they don't account for the categorical nature of my data, but I don't know how to interpret the very high coefficients from the adf command.
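One thing that can make the comparison in 3b misleading, shown here as a sketch on simulated stand-in data (the item names q30a-q30d, the latent name F, the seed, and the cutpoints are hypothetical): by default sem identifies a latent variable by fixing the loading of its first indicator at 1, so the unstandardized coefficients it reports are on that item's scale rather than on the correlation-like scale of factormat loadings. Replaying the same fit with the standardized option puts the ADF estimates on a comparable scale.

```stata
clear
set seed 2017
set obs 258

* Simulated stand-in data: four hypothetical items coded 0-3
* (None/Some/Most/All), all driven by one latent trait
generate double eta = rnormal()
foreach v in a b c d {
    generate double ystar = 0.7*eta + rnormal()
    generate byte q30`v' = (ystar > -0.5) + (ystar > 0.3) + (ystar > 1.2)
    drop ystar
}

* Default normalization: the q30a loading is fixed at 1, so the
* other unstandardized loadings are expressed relative to q30a
sem (F -> q30a q30b q30c q30d), method(adf)

* Replay the same estimates as standardized coefficients
sem, standardized
```

Note that this treats the 0-3 codes as continuous, exactly as sem does in the post; it only illustrates the scaling difference, not a categorical estimator.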

Any advice on the issues in 3a and 3b would be greatly appreciated!

Last edited by katharine sadowski; 27 Apr 2017, 14:01.
Tags: categorical, data, SEM
Joseph Coveney

#2

27 Apr 2017, 17:50

If you want to use sem with your polychoric correlation matrix, then I recommend first

Code:

help sem_ssd_options

if you haven't already. ("When I run sem r, method(adf)" isn't correct syntax for the command as far as I know.)

But why don't you use gsem instead, and fit the model using the categorical indicator variables themselves?
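On the summary-statistics route, a stored correlation matrix (for example, the one saved with matrix r = r(R) after polychoric) can be loaded into summary statistics data instead of being typed by hand. A minimal sketch: the values below are the q24 polychoric correlations posted later in this thread, the matrix name R is hypothetical, and passing it as (R) to ssd set correlations assumes your Stata version accepts the parenthesized matrix-name form (check help ssd).

```stata
* Plain -clear- drops the data but keeps matrices (unlike -clear all-)
clear

* Stand-in for the matrix saved from polychoric
matrix input R = ( 1          .62912024 -.02284957  .10250394 \ ///
                   .62912024  1         -.07465799  .24679275 \ ///
                  -.02284957 -.07465799  1          .33601044 \ ///
                   .10250394  .24679275  .33601044  1 )
matrix rownames R = q24a q24b q24c q24d
matrix colnames R = q24a q24b q24c q24d

* Declare summary statistics data and load the stored matrix into it
ssd init q24a q24b q24c q24d
ssd set observations 213
ssd set correlations (R)
ssd describe
```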
katharine sadowski

#3

28 Apr 2017, 11:02

Thanks for the ideas! I have two follow-ups:

First:
I attempted to run gsem with categorical indicator variables.
For one of my binary questions, I ran:

Code:

gsem (q22 -> i.q22a i.q22b i.q22c)

When I run that, though, I get the error "cannot compute an improvement -- discontinuous region encountered". When I run an ordinary gsem command without designating the variables as categorical, I get output.

Major question: I am confused because I have read many posts saying you can run gsem, or sem with method(adf), for categorical questions. I originally chose ADF because of its similarity to WLS; Mplus has a WLSMV estimator that seems perfect for categorical variables. I assumed ADF was the closest I could get while still receiving my goodness-of-fit statistics. Is my assumption wrong? I am not sure which approach is right, or whether there is really any difference once the differing assumptions are accounted for. I am also unsure why gsem wouldn't run when I specified the variables as categorical.

Second:
I ran the following code with ssd:

Code:

clear all
ssd init q24a q24b q24c q24d
ssd set observations 213
ssd set correlations ///
    1 \ ///
    .62912024 1 \ ///
    -.02284957 -.07465799 1 \ ///
    .10250394 .24679275 .33601044 1
sem (Q24 -> q24a q24b q24c q24d)

The correlations I entered here come from the polychoric correlation matrix mentioned above. When I run this, the model does not converge.

Major Questions:
1) Theoretically, should I be using gsem with categorical indicators OR sem on the polychoric correlation matrix? In theory, should both give me the same results?
2) When I run other models, like gsem without designating which variables are categorical, or sem with method(adf), are the convergence and factor table I get based on the assumption that my variables are continuous, and therefore not accurate for interpretation?

Final note: when I run the initial polychoric correlations and the resulting factormat, I get factor output. I feel that my assumptions and method are appropriate, so what I am uncertain about is how to confirm that model.

Joseph Coveney


28 Apr 2017, 19:38

For your first question, the syntax for binary (0/1) indicator variables is

Code:

gsem (q22a q22b q22c <- Q22, probit)
// or
gsem (q22a q22b q22c <- Q22, logit)
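The same layout extends to the four-category None/Some/Most/All items through gsem's oprobit option, which this reply also mentions for indicators with more than two categories. A minimal sketch on simulated stand-in data (the item names q30a-q30d, the latent name Q30, the seed, and the cutpoints are hypothetical):

```stata
clear
set seed 2017
set obs 258

* Simulated stand-in data: four hypothetical ordinal items coded
* 0=None 1=Some 2=Most 3=All, all driven by one latent trait
generate double eta = rnormal()
foreach v in a b c d {
    generate double ystar = 0.7*eta + rnormal()
    generate byte q30`v' = (ystar > -0.5) + (ystar > 0.3) + (ystar > 1.2)
    drop ystar
}

* Ordered-probit measurement model: each item gets its own cutpoints,
* and the q30a loading is fixed at 1 to set the scale of Q30
gsem (q30a q30b q30c q30d <- Q30, oprobit)
```

Binary and ordinal items can load on the same latent variable by giving each group its own family in a separate equation, e.g. one equation with probit and another with oprobit.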

For your second, try

Code:

 gsem (q24a q24b q24c q24d <- Q24, probit)

(or oprobit if any of the categorical indicator variables has more than two categories). If you want to use sem with that polychoric correlation matrix, you'll probably need to add constraints. EFA with it using maximum likelihood gives rise to a Heywood case.

Code:

matrix input C = (  ///
    1            0.62912024 -0.02284957   0.10250394 \ ///
    0.62912024   1          -0.07465799   0.24679275 \ ///      
   -0.02284957  -0.07465799  1            0.33601044 \ ///  
    0.10250394   0.24679275  0.33601044   1)
matrix rownames C = q24a q24b q24c q24d
matrix colnames C = q24a q24b q24c q24d

factormat C, n(213) factors(1) ml nolog
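To see where the Heywood case shows up, the loadings and uniquenesses that factor and factormat store in e(L) and e(Psi) can be listed. A self-contained sketch that re-enters the same matrix (the matrix names L and Psi are arbitrary):

```stata
matrix input C = (  ///
    1            0.62912024 -0.02284957   0.10250394 \ ///
    0.62912024   1          -0.07465799   0.24679275 \ ///
   -0.02284957  -0.07465799  1            0.33601044 \ ///
    0.10250394   0.24679275  0.33601044   1)
matrix rownames C = q24a q24b q24c q24d
matrix colnames C = q24a q24b q24c q24d

quietly factormat C, n(213) factors(1) ml nolog

* e(L): loadings (variables x factors); e(Psi): uniquenesses (1 x variables)
matrix L = e(L)
matrix Psi = e(Psi)
matrix list L
matrix list Psi
```

A uniqueness at or very near 0 (equivalently, a loading near 1 in magnitude) is the boundary solution that factormat flags as a Heywood case.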
