
  • exploratory factor analysis w/ maximum likelihood estimation, model comparison

    Dear Statalisters,

    I am trying to conduct an EFA on 17 binary variables. I have successfully used both tetrachoric and polychoric correlation matrices with the factormat command, using the ml option.

    Below is my syntax:

    Code:
    polychoric var1 - var17 
    display r(sum_w)
    global N = r(sum_w)
    matrix r = r(R)
    factormat r, n($N) ml
    factormat r,  factor(2) ml n($N) altdiv
    rotate, promax
    sortl
    estat kmo
    estat factors 
    factortest var1 - var17

    and

    Code:
    tetrachoric var1 - var17
    matrix r = r(Rho)
    factormat r,  n(8819) ml
    estat factors 
    
    factormat r,  factor(2) ml n(8819) altdiv
    rotate, promax
    sortl
    estat kmo
    estat factors 
    factortest var1 - var17

    Now, the reviewer says that the advantage of ML estimation over PAF (principal axis factoring) is: "There isn't a good objective way to discern the optimal number of factors, and using the scree plot and eigenvalue rules are quite subjective and open to interpretation, and research shows that they tend to favor more rather than fewer factors. State of the art EFA would use maximum likelihood estimation for factoring, producing a chi-square test which allows for model comparison between a model with k factors vs. k-1 factors."

    My question is: how do I obtain that chi-square statistic?

    thanks in advance!

  • #2
    Maybe something along the lines of the following?

    . 
    . version 16.1

    . 
    . clear *

    . 
    . set seed `=strreverse("1567069")'

    . 
    . // Two factors is true
    . tempname Corr

    . matrix define `Corr' = J(3, 3, 0.5) + I(3) * 0.5

    . matrix define `Corr' = `Corr' , J(3, 3, 0) \ J(3, 3, 0), `Corr'

    . 
    . quietly drawnorm y1 y2 y3 y4 y5 y6, double corr(`Corr') n(250)

    . 
    . *
    . * Begin here
    . *
    . factor y?, factors(2) ml nolog
    (obs=250)

    Factor analysis/correlation                      Number of obs    =        250
        Method: maximum likelihood                   Retained factors =          2
        Rotation: (unrotated)                        Number of params =         11
                                                     Schwarz's BIC    =     61.669
        Log likelihood = -.4664546                   (Akaike's) AIC   =    22.9329

        --------------------------------------------------------------------------
             Factor  |   Eigenvalue   Difference        Proportion   Cumulative
        -------------+------------------------------------------------------------
            Factor1  |      1.49355      0.13916            0.5244       0.5244
            Factor2  |      1.35439            .            0.4756       1.0000
        --------------------------------------------------------------------------
        LR test: independent vs. saturated:  chi2(15) =  308.06 Prob>chi2 = 0.0000
        LR test:   2 factors vs. saturated:  chi2(4)  =    0.92 Prob>chi2 = 0.9221

    Factor loadings (pattern matrix) and unique variances

        -------------------------------------------------
            Variable |  Factor1   Factor2 |   Uniqueness 
        -------------+--------------------+--------------
                  y1 |  -0.1072    0.6262 |      0.5964  
                  y2 |  -0.1261    0.7171 |      0.4698  
                  y3 |  -0.1085    0.6452 |      0.5720  
                  y4 |   0.7225    0.1243 |      0.4625  
                  y5 |   0.6654    0.0967 |      0.5478  
                  y6 |   0.6997    0.0834 |      0.5035  
        -------------------------------------------------

    . estimates store TwoFactors

    . 
    . tempname df2

    . scalar define `df2' = e(df_m)

    . 
    . factor y?, factors(1) ml nolog
    (obs=250)

    Factor analysis/correlation                      Number of obs    =        250
        Method: maximum likelihood                   Retained factors =          1
        Rotation: (unrotated)                        Number of params =          6
                                                     Schwarz's BIC    =    174.928
        Log likelihood = -70.89948                   (Akaike's) AIC   =    153.799

        --------------------------------------------------------------------------
             Factor  |   Eigenvalue   Difference        Proportion   Cumulative
        -------------+------------------------------------------------------------
            Factor1  |      1.48634            .            1.0000       1.0000
        --------------------------------------------------------------------------
        LR test: independent vs. saturated:  chi2(15) =  308.06 Prob>chi2 = 0.0000
        LR test:    1 factor vs. saturated:  chi2(9)  =  139.81 Prob>chi2 = 0.0000

    Factor loadings (pattern matrix) and unique variances

        ---------------------------------------
            Variable |  Factor1 |   Uniqueness 
        -------------+----------+--------------
              y1 |  -0.0222 |      0.9995  
              y2 |  -0.0274 |      0.9992  
              y3 |  -0.0211 |      0.9996  
              y4 |   0.7316 |      0.4648  
              y5 |   0.6725 |      0.5477  
              y6 |   0.7051 |      0.5028  
        ---------------------------------------

    . estimates store OneFactor

    . 
    . lrtest TwoFactors OneFactor, df(`=`df2' - e(df_m)')
    (TwoFactors does not contain matrix e(V); rank = 0 assumed)
    (OneFactor does not contain matrix e(V); rank = 0 assumed)

    Likelihood-ratio test                                 LR chi2(5)  =    140.87
    (Assumption: OneFactor nested in TwoFactors)          Prob > chi2 =    0.0000

    . 
    . exit

    end of do-file


    The approach is by analogy, but caveat emptor: it's not officially sanctioned in the help file for factor, so you might want to look into things like the size of the test. A reference text on factor analysis might also be worth consulting.
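
    For instance, here is a rough sketch of how one might probe the size of this by-analogy test (under the simplifying assumption of continuous multivariate-normal items rather than your binary items and a polychoric matrix): simulate data in which one factor is true, run the 2- versus 1-factor comparison repeatedly, and see how often the test rejects at the 5% level.

    Code:
    * Sketch only: empirical size of the by-analogy LR test when one factor is true
    capture program drop simlrsize
    program define simlrsize, rclass
        tempname Corr df2
        // six items, all pairwise correlations 0.3 => a single common factor
        matrix define `Corr' = J(6, 6, 0.3) + I(6) * 0.7
        drawnorm y1 y2 y3 y4 y5 y6, double corr(`Corr') n(250) clear
        quietly factor y1-y6, factors(2) ml nolog
        estimates store Two
        scalar define `df2' = e(df_m)
        quietly factor y1-y6, factors(1) ml nolog
        estimates store One
        quietly lrtest Two One, df(`=`df2' - e(df_m)')
        return scalar p = r(p)    // p-value of the 2- vs. 1-factor comparison
    end

    set seed 1567069
    simulate p = r(p), reps(200) nodots: simlrsize
    quietly count if !missing(p)
    local done = r(N)
    quietly count if p < 0.05
    display "empirical rejection rate at the 5% level: " r(N)/`done'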



    • #3
      This could be useful. Could you explain the following passages:

      Code:
      tempname df2
      
      scalar define `df2' = e(df_m)
      and the df() option used with lrtest:

      Code:
      lrtest TwoFactors OneFactor, df(`=`df2'-e(df_m)')
      Thanks



      • #4
        I'm not the reviewer, and reviewers can be fickle. I'm generally not a big fan of formal significance tests for determining the number of factors in EFA either. If n is large, even trivial patterns can emerge as significant factors. If you've ever done much CFA, you'll understand how difficult it can be to find a model that fits well in a statistical sense: I've seen CFA models in which all of the items load strongly and significantly on the hypothesized factors, yet the model doesn't fit statistically until you start freeing correlations between error terms, and so on.

        I also don't rely on a single method. I usually look at results from Velicer's minimum average partial (to the best of my knowledge this is limited to determining the number of principal components), Horn's parallel analysis (ssc install paran), BIC statistics for ML solutions, and an examination of the scree plot. Kaiser's eigenvalue > 1 rule for the correlation matrix usually suggests retaining too many factors. I don't think any method is uniformly best, in some well-accepted statistical sense, for determining how many factors to retain. These criteria usually suggest examining a limited range of solutions.

        Then carefully study the solutions within that range and select one based on substantive interpretability. In a 3-factor solution, do the variables that load on each factor make sense conceptually? There's usually some point at which only one or two variables load on the additional factors, or several items that load together strongly in, say, a 3-factor model begin to splinter off while exhibiting fairly substantial cross-loadings on two factors. For example, in a 3-factor model six variables load strongly together, but in a 4-factor model some of those items load more strongly on different factors while still cross-loading, etc.

        The E in EFA stands for exploratory. It's a method that can make it easier for you to make conceptual sense of a set of variables, but it isn't a method that will make sense of them for you. It ultimately requires you to make some judgements based on conceptual evaluation.
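
        In Stata terms, a rough sketch of that kind of multi-method check might look like the following (it assumes the polychoric matrix r and the sample size in $N from #1, and note that paran works from the items in memory rather than from a tetrachoric/polychoric matrix, so its results for binary variables are only indicative):

        Code:
        * Sketch only: several retention checks side by side
        ssc install paran, replace
        paran var1-var17                        // Horn's parallel analysis on the raw items

        forvalues k = 1/4 {                     // compare Schwarz's BIC across ML solutions
            quietly factormat r, factors(`k') ml n($N)
            display "factors = `k'   BIC = " %9.3f e(bic)
        }

        quietly pcamat r, n($N)                 // principal components of the polychoric matrix
        screeplot                               // scree plot of all 17 eigenvalues
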
        Last edited by Brad Anderson; 06 Aug 2020, 11:21.



        • #5
          Originally posted by rudy rossi View Post
          This could be useful. Could you explain the following passages
          They're just there to capture the ersatz model degrees of freedom (basically a tally of free parameters). You need the difference in those between the two models to get a positive degrees-of-freedom value against which the chi-square test statistic is evaluated. Again, what I showed is basically a way to feed your referee what he or she asked for so that you can move on. Yes, it's exploratory factor analysis, but Brad basically has it, and it's summed up by your referee's opening phrase, "There isn't a good objective way to discern the optimal number of factors".
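
          Purely as an illustration (and with the same caveats as before, since the likelihood treats r as an ordinary correlation matrix from $N complete observations), the same k versus k-1 comparison carried over to the factormat workflow in #1 might look like this; the chi-square is just twice the difference in log likelihoods, evaluated against the difference in parameter counts.

          Code:
          * Sketch only: 2-factor vs. 1-factor LR comparison after -factormat, ml-
          quietly factormat r, factors(1) ml n($N)
          scalar ll_1f = e(ll)                  // log likelihood of the 1-factor model
          scalar np_1f = e(df_m)                // its number of free parameters

          quietly factormat r, factors(2) ml n($N)
          scalar lrstat = 2 * (e(ll) - ll_1f)   // LR chi-square statistic
          scalar dfdiff = e(df_m) - np_1f       // difference in parameter counts
          display "LR chi2(" dfdiff ") = " %8.2f lrstat ", Prob > chi2 = " %6.4f chi2tail(dfdiff, lrstat)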
