
  • exploratory factor analysis w/ maximum likelihood estimation, model comparison

    Dear Statalisters,

    I am trying to conduct an EFA on 17 binary variables. I have successfully used both tetrachoric and polychoric correlation matrices with the factormat command, using the ml option.

    Below is my syntax:

    Code:
    polychoric var1 - var17 
    display r(sum_w)
    global N = r(sum_w)
    matrix r = r(R)
    factormat r, n($N) ml
    factormat r,  factor(2) ml n($N) altdiv
    rotate, promax
    sortl
    estat kmo
    estat factors 
    factortest var1 - var17

    and

    Code:
    tetrachoric var1 - var17
    matrix r = r(Rho)
    factormat r,  n(8819) ml
    estat factors 
    
    factormat r,  factor(2) ml n(8819) altdiv
    rotate, promax
    sortl
    estat kmo
    estat factors 
    factortest var1 - var17

    Now, the reviewer says that the advantage of ML estimation over PAF (principal axis factoring) is: "There isn't a good objective way to discern the optimal number of factors, and using the scree plot and eigenvalue rules are quite subjective and open to interpretation, and research shows that they tend to favor more rather than fewer factors. State of the art EFA would use maximum likelihood estimation for factoring, producing a chi-square test which allows for model comparison between a model with k factors vs. k-1 factors."

    My question is: how do I obtain that chi-square statistic?

    thanks in advance!

  • #2
    Maybe something along the lines of the following?

    . 
    . version 16.1

    . 
    . clear *

    . 
    . set seed `=strreverse("1567069")'

    . 
    . // Two factors is true
    . tempname Corr

    . matrix define `Corr' = J(3, 3, 0.5) + I(3) * 0.5

    . matrix define `Corr' = `Corr' , J(3, 3, 0) \ J(3, 3, 0), `Corr'

    . 
    . quietly drawnorm y1 y2 y3 y4 y5 y6, double corr(`Corr') n(250)

    . 
    . *
    . * Begin here
    . *
    . factor y?, factors(2) ml nolog
    (obs=250)

    Factor analysis/correlation                      Number of obs    =        250
        Method: maximum likelihood                   Retained factors =          2
        Rotation: (unrotated)                        Number of params =         11
                                                     Schwarz's BIC    =     61.669
        Log likelihood = -.4664546                   (Akaike's) AIC   =    22.9329

        --------------------------------------------------------------------------
             Factor  |   Eigenvalue   Difference        Proportion   Cumulative
        -------------+------------------------------------------------------------
            Factor1  |      1.49355      0.13916            0.5244       0.5244
            Factor2  |      1.35439            .            0.4756       1.0000
        --------------------------------------------------------------------------
        LR test: independent vs. saturated:  chi2(15) =  308.06 Prob>chi2 = 0.0000
        LR test:   2 factors vs. saturated:  chi2(4)  =    0.92 Prob>chi2 = 0.9221

    Factor loadings (pattern matrix) and unique variances

        -------------------------------------------------
            Variable |  Factor1   Factor2 |   Uniqueness 
        -------------+--------------------+--------------
                  y1 |  -0.1072    0.6262 |      0.5964  
                  y2 |  -0.1261    0.7171 |      0.4698  
                  y3 |  -0.1085    0.6452 |      0.5720  
                  y4 |   0.7225    0.1243 |      0.4625  
                  y5 |   0.6654    0.0967 |      0.5478  
                  y6 |   0.6997    0.0834 |      0.5035  
        -------------------------------------------------

    . estimates store TwoFactors

    . 
    . tempname df2

    . scalar define `df2' = e(df_m)

    . 
    . factor y?, factors(1) ml nolog
    (obs=250)

    Factor analysis/correlation                      Number of obs    =        250
        Method: maximum likelihood                   Retained factors =          1
        Rotation: (unrotated)                        Number of params =          6
                                                     Schwarz's BIC    =    174.928
        Log likelihood = -70.89948                   (Akaike's) AIC   =    153.799

        --------------------------------------------------------------------------
             Factor  |   Eigenvalue   Difference        Proportion   Cumulative
        -------------+------------------------------------------------------------
            Factor1  |      1.48634            .            1.0000       1.0000
        --------------------------------------------------------------------------
        LR test: independent vs. saturated:  chi2(15) =  308.06 Prob>chi2 = 0.0000
        LR test:    1 factor vs. saturated:  chi2(9)  =  139.81 Prob>chi2 = 0.0000

    Factor loadings (pattern matrix) and unique variances

        ---------------------------------------
            Variable |  Factor1 |   Uniqueness 
        -------------+----------+--------------
              y1 |  -0.0222 |      0.9995  
              y2 |  -0.0274 |      0.9992  
              y3 |  -0.0211 |      0.9996  
              y4 |   0.7316 |      0.4648  
              y5 |   0.6725 |      0.5477  
              y6 |   0.7051 |      0.5028  
        ---------------------------------------

    . estimates store OneFactor

    . 
    . lrtest TwoFactors OneFactor, df(`=`df2' - e(df_m)')
    (TwoFactors does not contain matrix e(V); rank = 0 assumed)
    (OneFactor does not contain matrix e(V); rank = 0 assumed)

    Likelihood-ratio test                                 LR chi2(5)  =    140.87
    (Assumption: OneFactor nested in TwoFactors)          Prob > chi2 =    0.0000

    . 
    . exit

    end of do-file


    The approach is by analogy, but caveat emptor: it's not officially sanctioned in the help file for factor, so you might want to look into things like the size of the test. A reference text on factor analysis might also be worth consulting.
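
    For instance, here is a rough sketch of how one might probe the size of this by-analogy test (under the simplifying assumption of continuous multivariate-normal items rather than your binary items and a polychoric matrix): simulate data in which one factor is true, run the 2- versus 1-factor comparison repeatedly, and see how often the test rejects at the 5% level.

    Code:
    * Sketch only: empirical size of the by-analogy LR test when one factor is true
    capture program drop simlrsize
    program define simlrsize, rclass
        tempname Corr df2
        // six items, all pairwise correlations 0.3 => a single common factor
        matrix define `Corr' = J(6, 6, 0.3) + I(6) * 0.7
        drawnorm y1 y2 y3 y4 y5 y6, double corr(`Corr') n(250) clear
        quietly factor y1-y6, factors(2) ml nolog
        estimates store Two
        scalar define `df2' = e(df_m)
        quietly factor y1-y6, factors(1) ml nolog
        estimates store One
        quietly lrtest Two One, df(`=`df2' - e(df_m)')
        return scalar p = r(p)    // p-value of the 2- vs. 1-factor comparison
    end

    set seed 1567069
    simulate p = r(p), reps(200) nodots: simlrsize
    quietly count if !missing(p)
    local done = r(N)
    quietly count if p < 0.05
    display "empirical rejection rate at the 5% level: " r(N)/`done'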



    • #3
      This could be useful. Could you explain the following passages:

      Code:
      tempname df2
      
      scalar define `df2' = e(df_m)
      and the df() option used with lrtest:

      Code:
      lrtest TwoFactors OneFactor, df(`=`df2'-e(df_m)')
      Thanks



      • #4
        I'm not the reviewer, and reviewers can be fickle. I'm generally not a big fan of formal significance tests for determining the number of factors in EFA either. If n is large, even trivial patterns can emerge as significant factors. If you've ever done much CFA, you'll understand how difficult it can be to find a model that fits well in a statistical sense: I've seen CFA models in which all of the items load strongly and significantly on the hypothesized factors, yet the model doesn't fit statistically until you start freeing correlations between error terms, and so on.

        I also don't rely on a single method. I usually look at results from Velicer's minimum average partial (to the best of my knowledge this is limited to determining the number of principal components), Horn's parallel analysis (ssc install paran), BIC statistics for ML solutions, and an examination of the scree plot. Kaiser's eigenvalue > 1 rule for the correlation matrix usually suggests retaining too many factors. I don't think any method is uniformly best, in some well-accepted statistical sense, for determining how many factors to retain. These criteria usually suggest examining a limited range of solutions.

        Then carefully study the solutions within that range and select one based on substantive interpretability. In a 3-factor solution, do the variables that load on each factor make sense conceptually? There's usually some point at which only one or two variables load on the additional factors, or several items that load together strongly in, say, a 3-factor model begin to splinter off while exhibiting fairly substantial cross-loadings on two factors. For example, in a 3-factor model six variables load strongly together, but in a 4-factor model some of those items load more strongly on different factors while still cross-loading, etc.

        The E in EFA stands for exploratory. It's a method that can make it easier for you to make conceptual sense of a set of variables, but it isn't a method that will make sense of them for you. It ultimately requires you to make some judgements based on conceptual evaluation.
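
        In Stata terms, a rough sketch of that kind of multi-method check might look like the following (it assumes the polychoric matrix r and the sample size in $N from #1, and note that paran works from the items in memory rather than from a tetrachoric/polychoric matrix, so its results for binary variables are only indicative):

        Code:
        * Sketch only: several retention checks side by side
        ssc install paran, replace
        paran var1-var17                        // Horn's parallel analysis on the raw items

        forvalues k = 1/4 {                     // compare Schwarz's BIC across ML solutions
            quietly factormat r, factors(`k') ml n($N)
            display "factors = `k'   BIC = " %9.3f e(bic)
        }

        quietly pcamat r, n($N)                 // principal components of the polychoric matrix
        screeplot                               // scree plot of all 17 eigenvalues
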
        Last edited by Brad Anderson; 06 Aug 2020, 11:21.



        • #5
          Originally posted by rudy rossi View Post
          This could be useful. Could you explain the following passages
          They're just there to capture the ersatz model degrees of freedom (basically a tally of free parameters). You need the difference in those between the two models to get a positive degrees-of-freedom value against which the chi-square test statistic is evaluated. Again, what I showed is basically a way to feed your referee what he or she asked for so that you can move on. Yes, it's exploratory factor analysis, but Brad basically has it, and it's summed up by your referee's opening phrase, "There isn't a good objective way to discern the optimal number of factors".
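
          Purely as an illustration (and with the same caveats as before, since the likelihood treats r as an ordinary correlation matrix from $N complete observations), the same k versus k-1 comparison carried over to the factormat workflow in #1 might look like this; the chi-square is just twice the difference in log likelihoods, evaluated against the difference in parameter counts.

          Code:
          * Sketch only: 2-factor vs. 1-factor LR comparison after -factormat, ml-
          quietly factormat r, factors(1) ml n($N)
          scalar ll_1f = e(ll)                  // log likelihood of the 1-factor model
          scalar np_1f = e(df_m)                // its number of free parameters

          quietly factormat r, factors(2) ml n($N)
          scalar lrstat = 2 * (e(ll) - ll_1f)   // LR chi-square statistic
          scalar dfdiff = e(df_m) - np_1f       // difference in parameter counts
          display "LR chi2(" dfdiff ") = " %8.2f lrstat ", Prob > chi2 = " %6.4f chi2tail(dfdiff, lrstat)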
