How to generate scores from a bifactor (bi-factor) model OR how to generate scores from a SEM model?

Marc Rain

Join Date: Dec 2019

Posts: 19
#1

07 Dec 2023, 11:04

I am comparing the scores from a few factor models. So far I have generated scores after an orthogonal and an oblique rotation, and those seem "fine". I recently heard about the bifactor model, which could be a good fit for my dataset.

Note that I am not familiar with SEM (structural equation modelling) techniques and syntax, and I am also far from comfortable with all the nuances of conducting a CFA (confirmatory factor analysis), so any generic advice in those regards is welcome.

So, my main questions are:

1) Is the following the correct syntax for a bifactor model?
2) Are there any restrictions I have to put in place?
3) After estimating the model, is it possible to generate scores for each factor (similar to those after running the "factor" command)?

Note that with the current syntax the model does not converge, neither on the example dataset below nor on my own data.

Code:

use https://www.stata-press.com/data/r18/audiometric, clear

factor lft* rght*, factors(2) pcf
rotate, ortho
cap drop fc_pcf_1 fc_pcf_2
predict fc_pcf_1 fc_pcf_2, b

factor lft* rght*, factors(2)
rotate, oblique promax(3)
cap drop fct_obli_1 fct_obli_2
predict fct_obli_1 fct_obli_2, bart

corr fc_pcf_?
corr fct_obli_?

sem ( Left -> (lft*) ) ///
    ( Right -> (rght*) ) ///
    ( Bifct -> (lft* rght*) ), var(Left@1 Right@1)

Diagram of the bifactor model (from: https://www.frontiersin.org/articles...020.01357/full)

Very broadly, in my use case, I have 12 variables (6 mental and 6 physical) and I aim to check which scores are more predictive of diagnoses, and if a single factor is good enough or if having two factors is worth any added complexity.
I became aware of the bifactor model through https://onlinelibrary.wiley.com/doi/10.1002/wps.21097 and would like to incorporate it in my analysis, although I don't really agree with their conclusion, but that is for another topic.

Last edited by Marc Rain; 07 Dec 2023, 11:07.
Erik Ruzek

Join Date: Oct 2017

Posts: 430
#2

07 Dec 2023, 12:38

I would strongly suggest that you familiarize yourself with the varieties and complexities of measurement modeling (EFA, CFA) before going much further. This is a difficult area of statistics. A nice introductory paper aimed at applied researchers is here. It was published in Personality Disorders: Theory, Research, and Treatment. There you will find a lot of references to get you started. The bifactor model, in particular, is an area of much controversy and discussion. The work of Steven Reise and colleagues is particularly noteworthy.

In terms of your questions:

1. No. Bifactor models specify 0 covariance between the "general" factor and any "specific" factors. The "specific" factors can be allowed to covary, but often researchers specify a diagonal covariance matrix for the latent variables in these models.

2. Yes. See above. The Stata syntax to do this is the option covstructure(e._LEn _LEx,diagonal) in the sem statement.

3. Yes. See help sem_predict
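
To make points 1 to 3 concrete, here is a minimal sketch on the audiometric example data from #1. It is only an illustration under stated assumptions, not a vetted specification: it imposes the zero covariances with explicit cov() constraints (intended to give the same diagonal structure as the covstructure() option mentioned above), it borrows the difficult option because convergence on these data is not guaranteed, and the stub sc_ for the score variables is a made-up name.

```stata
* Sketch only: bifactor CFA with all latent covariances fixed at zero.
use https://www.stata-press.com/data/r18/audiometric, clear

sem (Left  -> lft*)  ///
    (Right -> rght*) ///
    (Bifct -> lft* rght*), ///
    cov(Left*Right@0 Left*Bifct@0 Right*Bifct@0) ///
    difficult

* Factor scores (empirical Bayes means), one new variable per named latent.
* The sc_* names are hypothetical.
predict sc_Left sc_Right sc_Bifct, latent(Left Right Bifct)

* Quick look at how the scores relate to one another.
correlate sc_Left sc_Right sc_Bifct
```

By default sem fixes the first loading of each latent variable to 1, so no extra identification constraint is added here. If you prefer standardized factors, you would instead free those loadings and fix the latent variances.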
ericmelse

Join Date: May 2014

Posts: 434
#3

08 Dec 2023, 02:47

Note that the Watts et al. replication materials are available here and the supporting information is available here.

http://publicationslist.org/eric.melse
ericmelse

Join Date: May 2014

Posts: 434
#4

08 Dec 2023, 03:04

Marc, in addition to Erik's concern about the bifactor model, I would like to refer you to this PsyArXiv preprint:
Fried, E. I. (2023, July 9). No evidence for the existence of the d (isease) factor. https://doi.org/10.31234/osf.io/47avw
in which the Brandt et al. World Psychiatry paper that you refer to in #1 is discussed. Frankly, Eiko Fried repudiates their argumentation: 'the authors did not discover the d factor: they created it.'

http://publicationslist.org/eric.melse
Marc Rain

Join Date: Dec 2019

Posts: 19
#5

09 Dec 2023, 07:30

Thank you, Erik and Eric, for your helpful suggestions. I'll definitely have to dig a bit in the literature.

I was able to run the model with the following syntax. Note that it only converges when using the option "difficult".

Code:

sem ( Left -> (lft*) ) ///
    ( Right -> (rght*) ) ///
    ( Bifct -> (lft* rght*) ), covstructure(e._LEn _LEx,diagonal) difficult

I am still exploring the options, but as a quick validation I calculated the average area under the ROC curve after running a simple logit model of selected health diagnoses on each of the different health scores. The results are sorted from lowest to highest.

Code:

Mental health diagnoses (migraine depression sleep dementia)
sc_mean_p1        = .615
sc_mean_mcs_ortho = .632
sc_mean_mcs_def   = .651
sc_mean_mcs_obli  = .671
sc_mean_mcs_sep   = .701
sc_mean_mcs_rmean = .703
sc_mean_p0        = .726

Physical health diagnoses (diabetes cardio stroke jointpain backpain)
sc_mean_p2        = .542
sc_mean_pcs_sgl   = .715
sc_mean_p0        = .720
sc_mean_pcs_rmean = .748
sc_mean_pcs_sep   = .749
sc_mean_pcs_obli  = .757
sc_mean_pcs_ortho = .757
sc_mean_pcs_def   = .761

* _def:   Default method (SF12)
* _obli:  Oblique rotation (promax(2))
* _ortho: Alternative orthogonal rotation
* _rmean: Simple row mean
* _sep:   Two single factors loading only on each domain
* _p1:    First specific factor (of bifactor)
* _p2:    Second specific factor (of bifactor)
* _p0:    General factor (of bifactor)
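
For anyone who wants to reproduce this kind of comparison, the sketch below shows roughly how an AUC per score can be obtained: one logit per score, then lroc for the area under the ROC curve. It is a simplified illustration, not my exact code. The variable names (depression as a 0/1 diagnosis; sc_p0, sc_p1 and sc_rmean as scores) are placeholders, and it handles a single diagnosis rather than averaging over several.

```stata
* Sketch only: AUC of a simple logit of one diagnosis on each score.
* depression, sc_p0, sc_p1 and sc_rmean are hypothetical variable names.
foreach s of varlist sc_p0 sc_p1 sc_rmean {
    quietly logit depression `s'
    quietly lroc, nograph            // area under the ROC curve is left in r(area)
    display "`s'" _col(24) %5.3f r(area)
}
```

Averaging over several diagnoses would mean wrapping this in a second loop over the diagnosis variables and accumulating r(area) before dividing by the number of diagnoses.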

So it looks like the general factor (p0) works relatively well in both domains, physical and mental, though in the physical domain other scores perform better. The corresponding specific factors (p1 and p2) are not very predictive (but I might be doing something wrong here). It is worth noting that the simple row average of the input variables also performs surprisingly well. The two separate single-domain factors are another option that performs well.

Having said all that, since I do want to be able to compare the predictiveness of physical and mental health on other socio-economic outcomes, I might leave the bifactor aside and pick a solution that performs relatively well in both domains.

-------------------------

Regarding the Brandt et al. World Psychiatry paper, I also don't agree with their conclusion, but the idea of the bifactor model looked promising. In my setting, I was previously just "blindly" using the Mental and Physical Component Summaries based on the SF-12 methodology. It happens, as already explored in the literature, that there are some issues due to the orthogonal rotation used in this methodology (see Taft et al, 2001, and others). That's why I started looking for an alternative. And that's also why I deeply appreciate your suggestions.

Refs:
- Taft et al (2001) Do SF-36 summary component scores accurately summarize subscale scores?
Marc Rain

Join Date: Dec 2019

Posts: 19
#6

09 Dec 2023, 07:42

By the way, these are the types of artifacts I was finding with the default SF12 method of generating the scores based on an orthogonal rotation. With the oblique rotation, I don't get those, at least not as strongly. The MCS works well for "predicting" depression, but the PCS shows some distortions at extremes. The same happens, but to a lesser extent, with MCS when predicting a "physical health" diagnosis.

These are simple binscatter plots of "being diagnosed with depressive disorder" on each score, respectively.
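
In case it helps, plots of this kind can be produced along the following lines. This is a sketch with placeholder names (depression for the 0/1 diagnosis, sc_mcs_def and sc_pcs_def for the two scores), and it assumes the community-contributed binscatter command has been installed from SSC.

```stata
* Sketch only: binned scatter of a 0/1 diagnosis against each score.
* Requires the community-contributed command: ssc install binscatter
* depression, sc_mcs_def and sc_pcs_def are hypothetical variable names.
binscatter depression sc_mcs_def, nquantiles(20) name(g_mcs, replace)
binscatter depression sc_pcs_def, nquantiles(20) name(g_pcs, replace)
```

Each dot is the share diagnosed within one quantile bin of the score, which is what makes distortions at the extremes easy to see.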

Last edited by Marc Rain; 09 Dec 2023, 07:49.