Principal Component Analysis (PCA) in STATA and SPSS - completely different results

hanne brandt

Join Date: May 2015

Posts: 12
#1

27 May 2015, 17:13

For my PhD thesis I have to do a Principal Component Analysis (PCA). I didn't find it too difficult in Stata and was happy interpreting the results.
(I am well aware that there is a difference between factor and principal component analysis).

However, I discussed it with a colleague who uses SPSS, so I imported my data (from Excel) into SPSS too, and performed the PCA there as well.
To my shock, the results differed enormously from my Stata results; they were not even close (even the number of components differed in the end).
How can that be? I definitely ran a PCA in both Stata and SPSS, and the dataset is the same.

Even stranger to me: when I ran factor varnames, pcf (principal-component factor) in Stata, I received (almost) the same results as for the PCA in SPSS.
What are principal-component factors? A mixture of PCA and factor analysis?

I am confused. If people report in journals having done a PCA, should I then ask whether it was done with SPSS or Stata?
Could anyone explain it to me? I'd be very grateful!
Thanks very much for your help!
Hanne

P.S. I attached the output of both SPSS and Stata after rotation (varimax, Kaiser on, blanks(.4)); based on an eigenvalue > 1, both retained 3 components.
Attached Files

output SPSS.pdf (4.0 KB, 1 view)

output STATA.pdf (128.9 KB, 1 view)

Last edited by hanne brandt; 27 May 2015, 17:31.
Tags: factor analysis, pca, spss
wbuchanan

Join Date: Mar 2014

Posts: 1361
#2

27 May 2015, 17:28

It would probably be helpful for you to show the code from both platforms. Did you use the same rotation methods? Did you specify any options to the rotation methods? You may find that the results could differ due to different default settings, but it isn't possible to give you any more useful feedback without knowing exactly what it is that you did.
Clyde Schechter

Join Date: Apr 2014

Posts: 30069
#3

27 May 2015, 17:40

I second wbuchanan. We need to see the code that led to these results to assess comparability.

In particular, within your SPSS output it states that the rotation was varimax with Kaiser normalization. We can't tell from your Stata output which rotation you used at all, nor whether Kaiser normalization was applied. Varimax is the default orthogonal rotation in Stata, but Kaiser normalization is not used by default. (And your Stata rotation matrix is, at least to a reasonable degree of numerical accuracy, orthogonal.)
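The orthogonality remark can be checked numerically. The following is a minimal Python sketch, assuming the 3x3 component rotation matrix that appears in the Stata output posted as text later in this thread; the function name max_orthogonality_error is hypothetical, chosen only for illustration. It forms R'R and reports the largest deviation from the identity matrix, which should be close to zero (up to the 4-decimal rounding of the display) if the rotation is orthogonal.

```python
# Component rotation matrix copied from the Stata output in this thread
# (values are rounded to 4 decimals, as displayed by Stata).
R = [
    [0.7942, -0.5573, 0.2422],
    [0.5724, 0.5523, -0.6061],
    [0.2040, 0.6200, 0.7576],
]


def max_orthogonality_error(matrix):
    """Return the largest absolute entry of (matrix' * matrix - I)."""
    n = len(matrix)
    worst = 0.0
    for i in range(n):
        for j in range(n):
            # Entry (i, j) of matrix' * matrix is the dot product of columns i and j.
            dot = sum(matrix[k][i] * matrix[k][j] for k in range(n))
            target = 1.0 if i == j else 0.0
            worst = max(worst, abs(dot - target))
    return worst


if __name__ == "__main__":
    print(f"max |R'R - I| = {max_orthogonality_error(R):.5f}")
```

A deviation well below the rounding precision of the printed matrix is consistent with an orthogonal rotation such as varimax; it does not by itself reveal whether Kaiser normalization was applied.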
Nick Cox

Join Date: Mar 2014

Posts: 35656
#4

28 May 2015, 05:08

Cross-posted at http://stats.stackexchange.com/quest...ferent-results

Please see FAQ Advice for our policy on cross-posting, which is that you should tell us about it.
Nick Cox

Join Date: Mar 2014

Posts: 35656
#5

28 May 2015, 05:25

The attached files don't help me. Anyone trying to answer this question must be able to read the results side by side and compare them. Two photos, each unreadable, just do not qualify. I already made this remark on Stack Exchange.

Sorry, but we do give advice in the FAQ, which you are asked to read before posting, about how to present problems. The only real merit of being able to post photos is to show pictures of people, not Stata code, not Stata results. We advise listings as text, not photos.

hanne brandt

Join Date: May 2015

Posts: 12
#6

28 May 2015, 06:17

Sorry, I cannot copy the SPSS tables with my results directly from the program into this post without the tables going completely wild.
I will attach them as PDFs, which should be readable.

Stata:

Code:

pca bewert_sfu_a bewert_sfu_b bewert_sfu_c bewert_sfu_d bewert_sfu_e bewert_sfu_f bewert_sfu_g bewert_sfu_h bewert_sfu_i bewert_sfu_j bewert_sfu_k bewert_sfu_l, mineigen(1)

Code:

Principal components/correlation                  Number of obs    =       158
                                                  Number of comp.  =         3
                                                  Trace            =        12
Rotation: (unrotated = principal)                 Rho              =    0.5382

--------------------------------------------------------------------------
   Component |   Eigenvalue   Difference         Proportion   Cumulative
-------------+------------------------------------------------------------
       Comp1 |       3.8723      2.46548             0.3227       0.3227
       Comp2 |      1.40682      .227718             0.1172       0.4399
       Comp3 |       1.1791      .206742             0.0983       0.5382
       Comp4 |      .972359      .169164             0.0810       0.6192
       Comp5 |      .803195      .050871             0.0669       0.6861
       Comp6 |      .752324     .0953662             0.0627       0.7488
       Comp7 |      .656957     .0137592             0.0547       0.8036
       Comp8 |      .643198      .135894             0.0536       0.8572
       Comp9 |      .507304     .0435925             0.0423       0.8995
      Comp10 |      .463711     .0749052             0.0386       0.9381
      Comp11 |      .388806     .0348752             0.0324       0.9705
      Comp12 |      .353931            .             0.0295       1.0000
--------------------------------------------------------------------------

Principal components (eigenvectors)

----------------------------------------------------------
Variable |    Comp1     Comp2     Comp3 | Unexplained
-------------+------------------------------+-------------
bewert_sfu_a |   0.2700    0.3901   -0.1477 |       .4779
bewert_sfu_b |   0.3298    0.2303   -0.4027 |       .3129
bewert_sfu_c |  -0.3046    0.3149    0.1773 |       .4642
bewert_sfu_d |   0.3489    0.1910    0.0700 |       .4715
bewert_sfu_e |   0.3342    0.2067    0.2720 |       .4202
bewert_sfu_f |  -0.2001    0.4561   -0.1587 |       .5227
bewert_sfu_g |   0.3057    0.3128    0.1531 |       .4728
bewert_sfu_h |  -0.3611    0.2180    0.2913 |        .328
bewert_sfu_i |   0.2352   -0.2211    0.3662 |       .5588
bewert_sfu_j |  -0.1556    0.3894    0.4578 |       .4457
bewert_sfu_k |   0.3239    0.0525    0.0754 |       .5832
bewert_sfu_l |   0.2091   -0.2445    0.4720 |       .4839
----------------------------------------------------------

Code:

 rotate, varimax kaiser blanks(.4)

Code:

Principal components/correlation                  Number of obs    =       158
                                                  Number of comp.  =         3
                                                  Trace            =        12
    Rotation: orthogonal varimax (Kaiser on)      Rho              =    0.5382

    --------------------------------------------------------------------------
       Component |     Variance   Difference         Proportion   Cumulative
    -------------+------------------------------------------------------------
           Comp1 |      2.95242      .867357             0.2460       0.2460
           Comp2 |      2.08506       .66433             0.1738       0.4198
           Comp3 |      1.42073            .             0.1184       0.5382
    --------------------------------------------------------------------------

Rotated components  (blanks are abs(loading)<.4)

    ----------------------------------------------------------
        Variable |    Comp1     Comp2     Comp3 | Unexplained
    -------------+------------------------------+-------------
    bewert_sfu_a |   0.4076                     |       .4779
    bewert_sfu_b |                              |       .3129
    bewert_sfu_c |             0.4536           |       .4642
    bewert_sfu_d |   0.4007                     |       .4715
    bewert_sfu_e |   0.4392                     |       .4202
    bewert_sfu_f |                      -0.4451 |       .5227
    bewert_sfu_g |   0.4531                     |       .4728
    bewert_sfu_h |             0.5023           |        .328
    bewert_sfu_i |                       0.4684 |       .5588
    bewert_sfu_j |             0.5856           |       .4457
    bewert_sfu_k |                              |       .5832
    bewert_sfu_l |                       0.5564 |       .4839
    ----------------------------------------------------------

Component rotation matrix

    --------------------------------------------
                 |    Comp1     Comp2     Comp3
    -------------+------------------------------
           Comp1 |   0.7942   -0.5573    0.2422
           Comp2 |   0.5724    0.5523   -0.6061
           Comp3 |   0.2040    0.6200    0.7576
    --------------------------------------------

SPSS:

Code:

FACTOR
  /VARIABLES bewert_sfu_a bewert_sfu_b bewert_sfu_c bewert_sfu_d bewert_sfu_e bewert_sfu_f bewert_sfu_g bewert_sfu_h bewert_sfu_i bewert_sfu_j bewert_sfu_k bewert_sfu_l
  /MISSING LISTWISE
  /ANALYSIS bewert_sfu_a bewert_sfu_b bewert_sfu_c bewert_sfu_d bewert_sfu_e bewert_sfu_f bewert_sfu_g bewert_sfu_h bewert_sfu_i bewert_sfu_j bewert_sfu_k bewert_sfu_l
  /PRINT EXTRACTION ROTATION
  /FORMAT BLANK(.40)
  /CRITERIA MINEIGEN(1) ITERATE(50)
  /EXTRACTION PC
  /CRITERIA ITERATE(50)
  /ROTATION VARIMAX
  /METHOD=CORRELATION.

Last edited by hanne brandt; 28 May 2015, 06:25.


hanne brandt

Join Date: May 2015

Posts: 12
#7

28 May 2015, 06:29

Here are the more detailed results as PDFs.
Attached Files

results SPSS.pdf (661.7 KB, 1 view)

results Stata.pdf (693.5 KB, 1 view)

Last edited by hanne brandt; 28 May 2015, 06:52.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#8

28 May 2015, 08:44

For all intents and purposes, the SPSS extraction sums of squared loadings are identical to the eigenvalues in the unrotated Stata results, so your results are not as completely different as you suggested in your initial post.

As others have suggested, the differences lie in the rotation and/or normalization. Those subtleties are beyond my expertise, however.

hanne brandt

Join Date: May 2015

Posts: 12
#9

28 May 2015, 09:09

Thank you for looking into it, William!
Even if I run a PCF, the eigenvalues stay the same (see below).

The problem is: I need to work with the rotated solution and that does differ enormously.
The rotation method I used was the same...

Code:

factor bewert_sfu_a bewert_sfu_b bewert_sfu_c bewert_sfu_d bewert_sfu_e bewert_sfu_f bewert_sfu_g bewert_sfu_h bewert_sfu_i bewert_sfu_j bewert_sfu_k bewert_sfu_l, pcf

Code:

Factor analysis/correlation                        Number of obs    =      158
    Method: principal-component factors            Retained factors =        3
    Rotation: (unrotated)                          Number of params =       33

--------------------------------------------------------------------------
Factor  |   Eigenvalue   Difference        Proportion   Cumulative
-------------+------------------------------------------------------------
Factor1  |      3.87230      2.46548            0.3227       0.3227
Factor2  |      1.40682      0.22772            0.1172       0.4399
Factor3  |      1.17910      0.20674            0.0983       0.5382
Factor4  |      0.97236      0.16916            0.0810       0.6192
Factor5  |      0.80319      0.05087            0.0669       0.6861
Factor6  |      0.75232      0.09537            0.0627       0.7488
Factor7  |      0.65696      0.01376            0.0547       0.8036
Factor8  |      0.64320      0.13589            0.0536       0.8572
Factor9  |      0.50730      0.04359            0.0423       0.8995
Factor10  |     0.46371      0.07491            0.0386       0.9381
Factor11  |     0.38881      0.03488            0.0324       0.9705
Factor12  |     0.35393          .               0.0295       1.0000
--------------------------------------------------------------------------
LR test: independent vs. saturated:  chi2(66) =  453.95 Prob>chi2 = 0.0000

Last edited by hanne brandt; 28 May 2015, 09:23.


William Lisowski

Join Date: Dec 2014

Posts: 10150
#10

28 May 2015, 20:44

Note that this question was eventually addressed in the Stack Exchange discussion referred to in #4 above.
hanne brandt

Join Date: May 2015

Posts: 12
#11

29 May 2015, 04:42

Thank you guys for helping me with my problem, let me sum up what I understood:

The results of the initial calculation (before rotation) of a PCA in Stata and SPSS are the same, i.e. the same eigenvalues and the same number of components, provided you select the same options in both programs (mineigen(1) etc.). The same holds true for the Stata command factor [varlist], pcf, which produces different eigenvalues than the plain factor command (without the pcf option).
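The retention rule and the summary figures in the output header can be reproduced from the posted eigenvalues alone. This is a minimal Python sketch, assuming that mineigen(1) keeps eigenvalues strictly larger than 1 and that Rho is the retained share of the trace; the function names are hypothetical.

```python
# Eigenvalues copied from the unrotated output in this thread (12 variables).
eigenvalues = [3.8723, 1.40682, 1.1791, 0.972359, 0.803195, 0.752324,
               0.656957, 0.643198, 0.507304, 0.463711, 0.388806, 0.353931]


def retained_by_mineigen(values, cutoff=1.0):
    """Keep the eigenvalues strictly larger than the cutoff (assumed rule)."""
    return [v for v in values if v > cutoff]


def proportion_explained(values, kept):
    """Share of the total variance (the trace) carried by the kept eigenvalues."""
    return sum(kept) / sum(values)


retained = retained_by_mineigen(eigenvalues)
rho = proportion_explained(eigenvalues, retained)

if __name__ == "__main__":
    print(f"retained components: {len(retained)}")
    print(f"trace: {sum(eigenvalues):.4f}")
    print(f"proportion explained: {rho:.4f}")
```

Because pca and factor, pcf report the same eigenvalues here, both give the same component count and the same explained share before any rotation.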

However, differences between Stata and SPSS occur during the postestimation process, as Stata (unlike other programmes) rotates eigenvectors after pca, whereas SPSS rotates factor loadings.
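The scale difference between the two objects can be illustrated with the numbers already posted. This is a minimal Python sketch, assuming the usual relation loading = eigenvector x sqrt(eigenvalue) for principal-component factors; the names to_loadings, max_unexplained_error and count_at_least are hypothetical. It converts the unit-length eigenvectors from the pca output into loadings, checks that 1 minus the sum of squared loadings reproduces the Unexplained column, and counts how many first-component entries pass a blanks(.4) threshold on each scale.

```python
import math

# First three eigenvalues and eigenvectors copied from the unrotated
# Stata pca output in this thread (rounded as displayed).
eigenvalues = [3.8723, 1.40682, 1.1791]
eigenvectors = {
    "bewert_sfu_a": [0.2700, 0.3901, -0.1477],
    "bewert_sfu_b": [0.3298, 0.2303, -0.4027],
    "bewert_sfu_c": [-0.3046, 0.3149, 0.1773],
    "bewert_sfu_d": [0.3489, 0.1910, 0.0700],
    "bewert_sfu_e": [0.3342, 0.2067, 0.2720],
    "bewert_sfu_f": [-0.2001, 0.4561, -0.1587],
    "bewert_sfu_g": [0.3057, 0.3128, 0.1531],
    "bewert_sfu_h": [-0.3611, 0.2180, 0.2913],
    "bewert_sfu_i": [0.2352, -0.2211, 0.3662],
    "bewert_sfu_j": [-0.1556, 0.3894, 0.4578],
    "bewert_sfu_k": [0.3239, 0.0525, 0.0754],
    "bewert_sfu_l": [0.2091, -0.2445, 0.4720],
}
# "Unexplained" column from the same output.
unexplained = {
    "bewert_sfu_a": 0.4779, "bewert_sfu_b": 0.3129, "bewert_sfu_c": 0.4642,
    "bewert_sfu_d": 0.4715, "bewert_sfu_e": 0.4202, "bewert_sfu_f": 0.5227,
    "bewert_sfu_g": 0.4728, "bewert_sfu_h": 0.3280, "bewert_sfu_i": 0.5588,
    "bewert_sfu_j": 0.4457, "bewert_sfu_k": 0.5832, "bewert_sfu_l": 0.4839,
}


def to_loadings(vector, values):
    """Scale each eigenvector entry by the square root of its eigenvalue."""
    return [v * math.sqrt(lam) for v, lam in zip(vector, values)]


loadings = {name: to_loadings(vec, eigenvalues)
            for name, vec in eigenvectors.items()}


def max_unexplained_error():
    """Largest gap between 1 - sum(loading^2) and the posted Unexplained value."""
    return max(
        abs((1.0 - sum(l * l for l in loadings[name])) - unexplained[name])
        for name in loadings
    )


def count_at_least(table, component, threshold=0.4):
    """Count rows whose absolute entry in the given column reaches the threshold."""
    return sum(1 for row in table.values() if abs(row[component]) >= threshold)


if __name__ == "__main__":
    print(f"max error vs. Unexplained column: {max_unexplained_error():.5f}")
    print(f"Comp1 entries >= .4 (eigenvectors): {count_at_least(eigenvectors, 0)}")
    print(f"Comp1 entries >= .4 (loadings):     {count_at_least(loadings, 0)}")
```

On the eigenvector scale no first-component entry reaches .4, while on the loading scale ten of the twelve do, so the same blanks(.4) display rule hides very different sets of cells even before any rotation is applied.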

Therefore, (I am not 100% sure, please let me know if I am right):
Stata principal-component factor (`factor [varlist], pcf') is the same as SPSS pca (principal component analysis).

This could be of importance especially for beginner-Stata-users like me, because in Stata you could just do a PCA, then hit rotate and come to different results than people using other programmes.

Last edited by hanne brandt; 29 May 2015, 04:45.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#12

29 May 2015, 07:37

Stata principal-component factor (`factor [varlist], pcf') is the same as SPSS pca (principal component analysis).

I don't think that's quite right. I think the root of your problem is an understandable confusion between "principal component analysis" and "factor analysis using principal component analysis for factor extraction". What I would say, adopting the terminology of Stata's help factor documentation, is

Factor analysis using the principal component factor method produces the same results in Stata using factor ... , pcf as it does in SPSS using factor ... /extraction pc. Also, in Stata pca followed by rotate is not the same as factor analysis using the principal component factor method.

Regarding the confusion between principal component analysis and factor analysis, I commonly see "principal component analysis" used as shorthand for "factor analysis using principal component analysis for factor extraction", but the two are not the same. This confusion is compounded by SPSS's apparent lack of a separate command for doing principal component analysis other than as the first step of a factor analysis. Wikipedia's discussions of principal component analysis and factor analysis help clarify the distinction. In particular, from the article on principal component analysis,

PCA is generally preferred for purposes of data reduction (i.e., translating variable space into optimal factor space) but not when the goal is to detect the latent construct or factors. ... Factor analysis is generally used when the research purpose is detecting data structure (i.e., latent constructs or factors) or causal modeling.

Given that you need to work with a rotated solution, factor analysis is the appropriate context for your work. Finally, although

in Stata you could just do a PCA, then hit rotate and come to different results than people using other programmes

I would say that doing so is not the same as doing a factor analysis in Stata, which has the factor command dedicated to that purpose, and separate menu items for factor analysis and principal component analysis.

Last edited by William Lisowski; 29 May 2015, 07:48.
hanne brandt

Join Date: May 2015

Posts: 12
#13

29 May 2015, 17:19

Thank you, William!
Mercedes Mac Mullen

Join Date: Jun 2015

Posts: 1
#14

13 Jun 2015, 22:45

Hello, I am constructing a wealth index and have several questions.

1. I followed the steps proposed for the DHS wealth index by Rutstein and, if I have not misunderstood, they do not say to normalize the variables. But other literature and Filmer and Pritchett suggest normalization. It is not clear to me whether I should do this, since I created a dummy variable for each category of my indicator variables.
I found a document from Population Services International (PSI) that also shows step by step how to construct the index; they use the pca command and then generate a wealth score using the predict command. Then, in a next step, they generate a normalized variable for each of the dummy variables and create another wealth score as a linear combination of these normalized variables and the factor weights obtained in the previous step. Isn't that what the predict command actually does?
2. I am not sure what the difference is between running the pca command and factor ..., pcf factor(1). Can someone please clarify the difference?

These are the commands I ran (I tried both factor and pca; the differences are not huge):

pca x1-x20, components(1) means
predict wealthscorepca

factor x1-x20, pcf factor(1)
predict wealthscorepcf

Thank you!
Mercedes
Nick Cox

Join Date: Mar 2014

Posts: 35656
#15

14 Jun 2015, 03:17

Mercedes: This is really a new question. Please ask again in a new thread using a more appropriate title.