
  • Principal Component Analysis (PCA) in STATA and SPSS - completely different results

    For my PhD thesis I have to do a Principal Component Analysis (PCA). I didn't find it too difficult in STATA and was happy interpreting the results.
    (I am well aware that there is a difference between factor and principal component analysis).

    However, I discussed it with a colleague who uses SPSS, so I imported my data (from Excel) into SPSS too, and performed the PCA in there as well.
    Shockingly, the results differed enormously from my STATA results. They were not even close (even the number of components differed in the end).
    How can that be? I definitely ran a PCA in both STATA and SPSS, and the dataset is the same.

    Even stranger to me: when I ran FACTOR varnames, PCF (principal-component factor) in STATA, I got (almost) the same results as for the PCA in SPSS.
    What are principal-component factors? A mixture of PCA and factor analysis?

    I am confused. If people report in journals that they did a PCA, should I then ask whether it was with SPSS or STATA?
    Could anyone explain it to me? I'd be very grateful!
    Thanks very much for your help!
    Hanne

    P.S. I attached the rotated output of both SPSS and STATA (varimax, Kaiser on, blanks(.4)); based on an eigenvalue > 1, both retained 3 components.
    Last edited by hanne brandt; 27 May 2015, 17:31.

  • #2
    It would probably be helpful for you to show the code from both platforms. Did you use the same rotation methods? Did you specify any options to the rotation methods? You may find that the results could differ due to different default settings, but it isn't possible to give you any more useful feedback without knowing exactly what it is that you did.

    • #3
      I second wbuchanan. We need to see the code that led to these results to assess comparability.

      In particular, within your SPSS output it states that the rotation was varimax with Kaiser normalization. We can't tell from your Stata output which rotation you used at all, nor whether Kaiser normalization was applied. Varimax is the default orthogonal rotation in Stata, but Kaiser normalization is not used by default. (And your Stata rotation matrix is, at least to a reasonable degree of numerical accuracy, orthogonal.)
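The orthogonality remark can be checked against the component rotation matrix in the Stata listing posted in #6 of this thread. The following is a minimal sketch in plain Python, not Stata code; the helper names are hypothetical, and the four-decimal values are copied from that listing. If the matrix T is orthogonal, T'T should equal the identity matrix up to rounding.

```python
# Component rotation matrix as printed in the Stata listing in post #6.
T = [
    [0.7942, -0.5573, 0.2422],
    [0.5724, 0.5523, -0.6061],
    [0.2040, 0.6200, 0.7576],
]


def gram(matrix):
    """Return matrix' * matrix, i.e. the inner products of the columns."""
    n = len(matrix)
    return [
        [sum(matrix[k][i] * matrix[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def max_deviation_from_identity(matrix):
    """Largest absolute entry of (matrix' * matrix - I)."""
    g = gram(matrix)
    n = len(g)
    return max(
        abs(g[i][j] - (1.0 if i == j else 0.0))
        for i in range(n)
        for j in range(n)
    )


deviation = max_deviation_from_identity(T)
print(f"largest deviation of T'T from the identity: {deviation:.6f}")
```

The deviation stays well below 0.001, which is what one would expect from values printed to four decimals.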

      • #4
        Cross-posted at http://stats.stackexchange.com/quest...ferent-results

        Please see FAQ Advice for our policy on cross-posting, which is that you should tell us about it.

        • #5
          The attached photos don't help me. Anyone trying to answer this question must be able to read the results side by side and compare. Two photos, each unreadable, just do not qualify. I already made this remark on Stack Exchange.

          Sorry, but we do give advice in the FAQ, which you are asked to read before posting, about how to present problems. The only real merit of being able to post photos is to show pictures of people, not Stata code, not Stata results. We advise listings as text, not photos.

          • #6
            Sorry, I cannot copy the SPSS tables with my results directly from the program into this post without the formatting going completely wild.
            I will attach them as a PDF, which should be readable.

            Stata:
            Code:
            pca bewert_sfu_a bewert_sfu_b bewert_sfu_c bewert_sfu_d bewert_sfu_e bewert_sfu_f bewert_sfu_g bewert_sfu_h bewert_sfu_i bewert_sfu_j bewert_sfu_k bewert_sfu_l, mineigen(1)
            Code:
            Principal components/correlation                  Number of obs    =       158
                                                              Number of comp.  =         3
                                                              Trace            =        12
                Rotation: (unrotated = principal)             Rho              =    0.5382
            
                --------------------------------------------------------------------------
                   Component |   Eigenvalue   Difference         Proportion   Cumulative
                -------------+------------------------------------------------------------
                       Comp1 |       3.8723      2.46548             0.3227       0.3227
                       Comp2 |      1.40682      .227718             0.1172       0.4399
                       Comp3 |       1.1791      .206742             0.0983       0.5382
                       Comp4 |      .972359      .169164             0.0810       0.6192
                       Comp5 |      .803195      .050871             0.0669       0.6861
                       Comp6 |      .752324     .0953662             0.0627       0.7488
                       Comp7 |      .656957     .0137592             0.0547       0.8036
                       Comp8 |      .643198      .135894             0.0536       0.8572
                       Comp9 |      .507304     .0435925             0.0423       0.8995
                      Comp10 |      .463711     .0749052             0.0386       0.9381
                      Comp11 |      .388806     .0348752             0.0324       0.9705
                      Comp12 |      .353931            .             0.0295       1.0000
                --------------------------------------------------------------------------
            
            Principal components (eigenvectors)
            
                ----------------------------------------------------------
                    Variable |    Comp1     Comp2     Comp3 | Unexplained
                -------------+------------------------------+-------------
                bewert_sfu_a |   0.2700    0.3901   -0.1477 |       .4779
                bewert_sfu_b |   0.3298    0.2303   -0.4027 |       .3129
                bewert_sfu_c |  -0.3046    0.3149    0.1773 |       .4642
                bewert_sfu_d |   0.3489    0.1910    0.0700 |       .4715
                bewert_sfu_e |   0.3342    0.2067    0.2720 |       .4202
                bewert_sfu_f |  -0.2001    0.4561   -0.1587 |       .5227
                bewert_sfu_g |   0.3057    0.3128    0.1531 |       .4728
                bewert_sfu_h |  -0.3611    0.2180    0.2913 |        .328
                bewert_sfu_i |   0.2352   -0.2211    0.3662 |       .5588
                bewert_sfu_j |  -0.1556    0.3894    0.4578 |       .4457
                bewert_sfu_k |   0.3239    0.0525    0.0754 |       .5832
                bewert_sfu_l |   0.2091   -0.2445    0.4720 |       .4839
                ----------------------------------------------------------

            Code:
             rotate, varimax kaiser blanks(.4)
            Code:
            Principal components/correlation                  Number of obs    =       158
                                                              Number of comp.  =         3
                                                              Trace            =        12
                Rotation: orthogonal varimax (Kaiser on)      Rho              =    0.5382
            
                --------------------------------------------------------------------------
                   Component |     Variance   Difference         Proportion   Cumulative
                -------------+------------------------------------------------------------
                       Comp1 |      2.95242      .867357             0.2460       0.2460
                       Comp2 |      2.08506       .66433             0.1738       0.4198
                       Comp3 |      1.42073            .             0.1184       0.5382
                --------------------------------------------------------------------------
            
            Rotated components  (blanks are abs(loading)<.4)
            
                ----------------------------------------------------------
                    Variable |    Comp1     Comp2     Comp3 | Unexplained
                -------------+------------------------------+-------------
                bewert_sfu_a |   0.4076                     |       .4779
                bewert_sfu_b |                              |       .3129
                bewert_sfu_c |             0.4536           |       .4642
                bewert_sfu_d |   0.4007                     |       .4715
                bewert_sfu_e |   0.4392                     |       .4202
                bewert_sfu_f |                      -0.4451 |       .5227
                bewert_sfu_g |   0.4531                     |       .4728
                bewert_sfu_h |             0.5023           |        .328
                bewert_sfu_i |                       0.4684 |       .5588
                bewert_sfu_j |             0.5856           |       .4457
                bewert_sfu_k |                              |       .5832
                bewert_sfu_l |                       0.5564 |       .4839
                ----------------------------------------------------------
            
            Component rotation matrix
            
                --------------------------------------------
                             |    Comp1     Comp2     Comp3
                -------------+------------------------------
                       Comp1 |   0.7942   -0.5573    0.2422
                       Comp2 |   0.5724    0.5523   -0.6061
                       Comp3 |   0.2040    0.6200    0.7576
                --------------------------------------------
            SPSS:
            Code:
            FACTOR
              /VARIABLES bewert_sfu_a bewert_sfu_b bewert_sfu_c bewert_sfu_d bewert_sfu_e bewert_sfu_f bewert_sfu_g bewert_sfu_h bewert_sfu_i bewert_sfu_j bewert_sfu_k bewert_sfu_l
              /MISSING LISTWISE
              /ANALYSIS bewert_sfu_a bewert_sfu_b bewert_sfu_c bewert_sfu_d bewert_sfu_e bewert_sfu_f bewert_sfu_g bewert_sfu_h bewert_sfu_i bewert_sfu_j bewert_sfu_k bewert_sfu_l
              /PRINT EXTRACTION ROTATION
              /FORMAT BLANK(.40)
              /CRITERIA MINEIGEN(1) ITERATE(50)
              /EXTRACTION PC
              /CRITERIA ITERATE(50)
              /ROTATION VARIMAX
              /METHOD=CORRELATION.
            Last edited by hanne brandt; 28 May 2015, 06:25.

            • #7
              Here are the more detailed results as a PDF.
              Last edited by hanne brandt; 28 May 2015, 06:52.

              • #8
                For all intents and purposes, the SPSS extraction sums of squared loadings are identical to the Stata unrotated results, so your results are not as completely different as you suggested in your initial post.

                As others have suggested, the differences lie in the rotation and/or normalization. Those subtleties are beyond my expertise, however.
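One consistency check needs only the Stata listings in #6. The sketch below is plain Python with values copied from those listings, and the variable names are hypothetical. It confirms that the twelve eigenvalues add up to the trace of 12, that the three retained components account for the reported Rho of 0.5382, and that the rotated variances sum to the same total as the unrotated eigenvalues. Rotation therefore redistributes the explained variance among the three components without changing how much is explained overall.

```python
# Unrotated eigenvalues from the pca listing in post #6.
eigenvalues = [
    3.8723, 1.40682, 1.1791, 0.972359, 0.803195, 0.752324,
    0.656957, 0.643198, 0.507304, 0.463711, 0.388806, 0.353931,
]

# Variances of the three rotated components from the rotate listing in post #6.
rotated_variances = [2.95242, 2.08506, 1.42073]

total = sum(eigenvalues)          # should match Trace = 12
retained = sum(eigenvalues[:3])   # components kept by mineigen(1)
rho = retained / total            # proportion of variance explained

print(f"trace              : {total:.4f}")
print(f"retained (unrot.)  : {retained:.4f}")
print(f"retained (rotated) : {sum(rotated_variances):.4f}")
print(f"rho                : {rho:.4f}")
```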

                • #9
                  Thank you for looking into it, William!
                  Even if I do a PCF, the eigenvalues stay the same (see below).

                  The problem is: I need to work with the rotated solution and that does differ enormously.
                  The rotation method I used was the same...

                  Code:
                  factor bewert_sfu_a bewert_sfu_b bewert_sfu_c bewert_sfu_d bewert_sfu_e bewert_sfu_f bewert_sfu_g bewert_sfu_h bewert_sfu_i bewert_sfu_j bewert_sfu_k bewert_sfu_l, pcf
                  Code:
                  Factor analysis/correlation                        Number of obs    =      158
                      Method: principal-component factors            Retained factors =        3
                      Rotation: (unrotated)                          Number of params =       33
                  
                      --------------------------------------------------------------------------
                           Factor  |   Eigenvalue   Difference        Proportion   Cumulative
                      -------------+------------------------------------------------------------
                          Factor1  |      3.87230      2.46548            0.3227       0.3227
                          Factor2  |      1.40682      0.22772            0.1172       0.4399
                          Factor3  |      1.17910      0.20674            0.0983       0.5382
                          Factor4  |      0.97236      0.16916            0.0810       0.6192
                          Factor5  |      0.80319      0.05087            0.0669       0.6861
                          Factor6  |      0.75232      0.09537            0.0627       0.7488
                          Factor7  |      0.65696      0.01376            0.0547       0.8036
                          Factor8  |      0.64320      0.13589            0.0536       0.8572
                          Factor9  |      0.50730      0.04359            0.0423       0.8995
                         Factor10  |      0.46371      0.07491            0.0386       0.9381
                         Factor11  |      0.38881      0.03488            0.0324       0.9705
                         Factor12  |      0.35393            .            0.0295       1.0000
                      --------------------------------------------------------------------------
                      LR test: independent vs. saturated:  chi2(66) =  453.95 Prob>chi2 = 0.0000
                  Last edited by hanne brandt; 28 May 2015, 09:23.

                  • #10
                    Note that this question was eventually addressed in the Stack Exchange discussion linked in #4 above.

                    • #11
                      Thank you all for helping me with my problem. Let me sum up what I understood:

                      The results of the initial calculation (before rotation) of a PCA in Stata and SPSS are the same, i.e. the same eigenvalues and number of components, provided you select the same options in Stata and SPSS (mineigen(1) etc.). The same holds true for the Stata command factor [varlist], pcf, which produces different eigenvalues than the plain factor command (without the pcf option).

                      However, differences between Stata and SPSS arise during postestimation, because Stata (unlike other programmes) rotates eigenvectors whereas SPSS rotates factor loadings.

                      Therefore, (I am not 100% sure, please let me know if I am right):
                      Stata principal-component factor (`factor [varlist], pcf') is the same as SPSS pca (principal component analysis).

                      This could be important, especially for beginning Stata users like me, because in Stata you could just do a PCA, then hit rotate and come to different results than people using other programmes.
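The difference between eigenvectors and loadings in this summary can be made concrete with the numbers from #6. A loading is an eigenvector entry multiplied by the square root of the corresponding eigenvalue. The sketch below is plain Python with hypothetical helper names and only two of the twelve variables, copied from the unrotated pca listing. It applies that scaling and checks it against the Unexplained column: one minus the sum of squared loadings should reproduce the printed value.

```python
import math

# Eigenvalues of the three retained components (pca listing, post #6).
eigenvalues = [3.8723, 1.40682, 1.1791]

# Unit-length eigenvector entries for two of the variables (same listing).
eigenvectors = {
    "bewert_sfu_a": [0.2700, 0.3901, -0.1477],
    "bewert_sfu_b": [0.3298, 0.2303, -0.4027],
}

# "Unexplained" column from the same listing, used only as a check.
unexplained = {"bewert_sfu_a": 0.4779, "bewert_sfu_b": 0.3129}


def to_loadings(vector, eigs):
    """Scale eigenvector entries by sqrt(eigenvalue) to obtain loadings."""
    return [v * math.sqrt(e) for v, e in zip(vector, eigs)]


loadings = {name: to_loadings(vec, eigenvalues) for name, vec in eigenvectors.items()}

for name, row in loadings.items():
    uniqueness = 1.0 - sum(x * x for x in row)
    print(name, [round(x, 4) for x in row],
          "1 - sum of squares:", round(uniqueness, 4),
          "reported:", unexplained[name])
```

Because each column is scaled by a different factor, a varimax rotation of the eigenvectors and a varimax rotation of the loadings optimize over different matrices. That is consistent with the rotated results differing even though the unrotated eigenvalues agree.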
                      Last edited by hanne brandt; 29 May 2015, 04:45.

                      • #12
                        Stata principal-component factor (`factor [varlist], pcf') is the same as SPSS pca (principal component analysis).
                        I don't think that's quite right. I think the root of your problem is an understandable confusion between "principal component analysis" and "factor analysis using principal component analysis for factor extraction". What I would say, adopting the terminology of Stata's help factor documentation, is

                        Factor analysis using the principal component factor method produces the same results in Stata using factor ... , pcf as it does in SPSS using factor ... /extraction pc. Also, in Stata pca followed by rotate is not the same as factor analysis using the principal component factor method.
                        Regarding the confusion between principal component analysis and factor analysis, I commonly see "principal component analysis" used as short for "factor analysis using principal component analysis for factor extraction", but the two are not the same. This confusion is enhanced by SPSS's apparent lack of a separate command for doing principal component analysis as other than the first step of a factor analysis. Wikipedia's discussions of principal component analysis and factor analysis help clarify the distinction. In particular, from the article on principal component analysis,

                        PCA is generally preferred for purposes of data reduction (i.e., translating variable space into optimal factor space) but not when the goal is to detect the latent construct or factors. ... Factor analysis is generally used when the research purpose is detecting data structure (i.e., latent constructs or factors) or causal modeling.
                        Given that you need to work with a rotated solution, factor analysis is the appropriate context for your work. Finally, although

                        in Stata you could just do a PCA, then hit rotate and come to different results than people using other programmes
                        I would say that doing so is not the same as doing a factor analysis in Stata, which has the factor command dedicated to that purpose, and separate menu items for factor analysis and principal component analysis.
                        Last edited by William Lisowski; 29 May 2015, 07:48.

                        • #13
                          Thank you, William!

                          • #14
                            Hello, I am constructing a wealth index and have several doubts.

                            1. I followed the steps proposed for the DHS wealth index by Rutstein and, if I have not misunderstood, they do not say to normalize the variables. But other literature, as well as Filmer and Pritchett, suggests normalization. It is not clear to me whether I should do this, since I created a dummy variable for each category of my indicator variables.
                            I found a document from Population Services International (PSI) that also shows step by step how to construct the index. They use the pca command and then generate a wealth score using the predict command. Then, in a next step, they generate a normalized variable for each of the dummy variables and create another wealth score as a linear combination of these normalized variables and the factor weights obtained in the previous step. Isn't that what the predict command actually does?
                            2. I am not sure what the difference is between running the pca command and factor ..., pcf factor(1). Can someone please clarify the difference?

                            These are the commands I ran (I tried both factor and pca; the differences are not huge):

                            pca x1-x20, components(1) means
                            predict wealthscorepca

                            factor x1-x20, pcf factor(1)
                            predict wealthscorepcf
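The relationship between the two scores can be sketched on a toy example, under stated assumptions rather than as a description of what Stata does internally. The assumptions are that the PCA score is the eigenvector-weighted sum of standardized variables and that the principal-component-factor score is the regression-scored factor. With one retained factor, the regression score then equals the PCA score divided by the square root of the eigenvalue. The data and names below are hypothetical, and two variables are used so the eigen-decomposition of the correlation matrix is known in closed form.

```python
import math
import statistics

# Hypothetical toy data: two positively correlated indicators.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]


def standardize(values):
    """Subtract the mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


z1, z2 = standardize(x1), standardize(x2)
n = len(z1)

# Sample correlation of the two standardized variables.
r = sum(a * b for a, b in zip(z1, z2)) / (n - 1)

# For the correlation matrix [[1, r], [r, 1]] with r > 0, the first
# eigenvalue is 1 + r and its unit eigenvector is (1, 1) / sqrt(2).
eigenvalue = 1.0 + r
weights = (1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0))

# PCA-style score: eigenvector-weighted sum of standardized variables.
pca_score = [weights[0] * a + weights[1] * b for a, b in zip(z1, z2)]

# PCF-style regression score: the same combination rescaled to unit variance.
pcf_score = [s / math.sqrt(eigenvalue) for s in pca_score]

print("eigenvalue           :", round(eigenvalue, 4))
print("variance of pca score:", round(statistics.variance(pca_score), 4))
print("variance of pcf score:", round(statistics.variance(pcf_score), 4))
```

Under these assumptions the two scores differ only by a constant scale factor, so they rank observations identically. That would fit the observation that the differences between the two approaches are not huge.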

                            Thank you!
                            Mercedes

                            • #15
                              Mercedes: This is really a new question. Please ask again in a new thread using a more appropriate title.
