CFA with Binary variables

Anna Geo

Join Date: Feb 2021

Posts: 11
#1

CFA with Binary variables

18 Mar 2021, 13:20

Hi,

I have 4 dichotomous variables (sad, nervous, sleep_problems, lonely) that I want to load on one factor (Depression). I want to use this new variable (depressed) as the outcome of interest in a multilevel model (survey structure).

I have been reading posts that suggest that I should conduct the CFA on a tetrachoric correlation matrix, by using SSD (summary statistics data). I have done this, but when I try to predict the latent variable I get this error:

. predict depressed, latent(Depression)

predict not possible with summary statistics data
r(198);

I am still not skilled enough to conduct a GSEM, so I would really appreciate it if anyone could help me find a way around this.

Thank you,
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30114
#2

18 Mar 2021, 14:12

I believe the equivalent would be:

Code:

gsem (Depression -> sad nervous sleep_problems lonely, probit)
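
As a minimal sketch, assuming the model converges: because -gsem- is fit to the raw item-level data rather than to summary statistics data, -predict- with the -latent()- option should then be available. The new variable name depressed is taken from post #1; by default these predictions are empirical Bayes means.

Code:

* Fit the one-factor probit model on the raw 0/1 items (not on SSD)
gsem (Depression -> sad nervous sleep_problems lonely, probit)

* Predict the latent variable for each observation
predict depressed, latent(Depression)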
Comment
Anna Geo

Join Date: Feb 2021

Posts: 11
#3

19 Mar 2021, 08:57

Thank you,

I tried this and Stata could not identify a model. The iterations keep going for hours. I am using Stata/IC 16.1.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30114
#4

19 Mar 2021, 10:30

At some point, the log likelihood from the iterations is not really changing, right? Find that point and note at which iteration it gets stuck. Then add the -iterate(#)- option to your -gsem- and run it again, replacing # by a number just a bit larger than the iteration number at which it gets stuck. That way Stata will stop shortly after it gets stuck and will show you its results so far. Those results are not usable as a model, but they may identify what the sticking point is. You may find that the coefficient or standard error for one of the variables is some utterly outlandish value (near positive or negative infinity) or missing altogether. If there is such a variable, remove it from the model.

Added: If the log likelihood is still increasing, and not stuck, then you just need to be more patient.
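
A sketch of what that rerun might look like. The value 25 is only a placeholder, assuming the log likelihood stalled a little before iteration 25; substitute the number you observe in your own log.

Code:

* iterate(25) is a hypothetical cap: set it just above the iteration where the log likelihood stops changing
gsem (Depression -> sad nervous sleep_problems lonely, probit), iterate(25)

Then look through the reported loadings, intercepts, and standard errors for values that are enormous or missing.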
Comment
Joseph Coveney

Join Date: Apr 2014

Posts: 4420
#5

19 Mar 2021, 21:29

Originally posted by Anna Geo

I tried this and Stata could not identify a model. The iterations keep going for hours.

Examine the tetrachoric correlation matrix or the factor loadings from your summary-statistic SEM: if the individual interitem correlation coefficients in the tetrachoric correlation matrix are all about the same, or if the factor loadings in the summary-statistic SEM are all about the same (assuming that you've fixed the variance to one in order to free all of the factor loadings), then perhaps you could consider using the simple sumscore of the four items in lieu of factor scores (latent variable predictions). Or, especially if you're vexed by incomplete responses to some items, you could use the row mean of the four items.

If one of the items correlates poorly with the others (or its factor loading is close to zero), then you could just ignore that questionnaire item and use the sumscore (or row mean) of the remaining items.

Admittedly, it's a judgment call as to what constitutes "all about the same". But using predictions (factor scores) as if they were a measured-without-error explanatory (independent) variable in a follow-on regression model makes it problematic to propagate uncertainty, anyway.
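
For illustration, the checks and scores described above could be sketched as follows, assuming the four items are coded 0/1. The names dep_sum and dep_mean are made up for this example.

Code:

* Inspect the interitem tetrachoric correlations
tetrachoric sad nervous sleep_problems lonely

* Sumscore; the missing option returns missing only when all four items are missing
egen dep_sum = rowtotal(sad nervous sleep_problems lonely), missing

* Row mean over whichever items were answered
egen dep_mean = rowmean(sad nervous sleep_problems lonely)

Note that -rowtotal()- counts a missing item as zero, which is one reason the row mean may be preferable when responses are incomplete.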