Difference in PCA and PCA, factor

Naika Sangroo

Join Date: May 2020

Posts: 78
#1


09 Aug 2020, 10:28

Hi everyone,

I created an asset index for my paper using Principal Components Analysis. I used two different pieces of code.

In the first I used just pca and got a positive coefficient, but when I used the second, with factor(1), I got a negative coefficient. Which is more accurate, and which should I use for my paper? And what is the difference between just doing pca and doing pca, factor(1) or pca, factor(3)?

Code:

*** First Code
pca $assets
rotate
predict asset_index
drop S4D_Q1_2-S10_mud_rof_5

Code:

*** Second Code
pca $assets, factor(1)
rotate
predict asset_index
drop S4D_Q1_2-S10_mud_rof_5
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#2

09 Aug 2020, 11:10

All the -factors()- option to -pca- does is specify how many of the principal components you want to keep. So, if your global macro assets contains 5 variables, Stata will calculate 5 principal components. If you don't specify -factors()-, Stata will report results for all of them. If you specify -factors(3)-, Stata will report results for the first three (the ones with the three largest eigenvalues) and skip telling you about the last two. If you specify -factors(1)- you will get results only for the first principal component (i.e. the one with the largest eigenvalue).
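To check this on data anyone can load, here is a minimal sketch. It assumes Stata's shipped auto dataset and a stand-in definition of the global macro assets (both are illustrative and not the poster's data). It uses -components()-, for which -factors()- is a documented synonym, and compares the first column of the full loading matrix with the single column kept by -components(1)-.

```stata
* Illustrative data: Stata's shipped auto dataset, not the poster's survey
sysuse auto, clear
global assets price mpg weight length turn   // hypothetical stand-in for $assets

pca $assets                      // all 5 components reported
matrix L_all   = e(L)            // eigenvectors (loadings), one column per component
matrix L_first = L_all[1..., 1]  // first column only

pca $assets, components(1)       // same as factors(1): keep only the first component
matrix L_one = e(L)

* Relative difference between the two versions of component 1
display mreldif(L_first, L_one)
```

If the option only limits what is retained, the displayed relative difference should be zero up to rounding.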

I have never had the experience you report of having the sign change when I specify the -factors()- option. That said, if the signs flip on all of the variables, then you are getting the same principal components reported. Remember that if vector X is a principal component, then so is aX for any a != 0. So just flipping all the signs changes nothing.
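The point about signs can be seen directly with a small sketch, again assuming the shipped auto dataset rather than the poster's data. The variable names pc1 and pc1_flipped are made up for the illustration, and headroom is an arbitrary outcome chosen only to have something to regress.

```stata
* Illustrative data: Stata's shipped auto dataset
sysuse auto, clear
pca price mpg weight length turn, components(1)
predict pc1                     // score on the first principal component
generate pc1_flipped = -pc1     // same component with every loading's sign reversed

summarize pc1 pc1_flipped       // same spread, mirrored around zero
correlate pc1 pc1_flipped       // perfectly negatively correlated

* Same fit either way; only the sign of the slope changes
regress headroom pc1
regress headroom pc1_flipped
```

So an index whose loadings all changed sign carries the same information; only the direction in which "more" is read is reversed.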

(If your reversed sign applies to some but not all of the variables, then something is wrong. In that case, please post back with example data [using the -dataex- command] and show the definition of global macro assets so that others can try to replicate your problem and figure out what is going wrong.)

If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

Naika Sangroo

Join Date: May 2020
Posts: 78

10 Aug 2020, 09:38

Hi Clyde,

Thank you so much for your response. Is it preferable to use pca, factor or plain pca? In your opinion, which would be seen as more acceptable by journals or in academia? I tried earlier to post my dataset but got the message "input statement exceeds linesize limit", so this time I am posting just a few of my 51 variables.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(S4D_Q1_1 S4D_Q1_2 S10_a_1 S10_a_2) float(S10_ind_hou_1 S10_pub_pip_1 S10_sur_wat_3 S10_mud_rof_5)
1 0 1 0 1 1 0 0
0 0 1 0 1 0 0 0
1 0 1 0 1 0 1 0
0 1 0 0 0 0 0 0
1 0 1 0 1 0 0 0
0 0 1 0 1 0 1 0
0 0 1 0 1 1 0 0
0 0 1 0 1 0 1 0
0 0 1 0 1 0 1 1
0 0 1 0 1 0 0 0
1 0 1 0 1 1 0 0
0 0 1 0 0 0 1 0
0 0 1 0 1 0 1 0
0 0 1 0 1 0 1 0
1 0 1 0 1 0 0 0
0 0 1 0 1 0 1 0
0 0 1 0 1 0 1 0
1 0 1 0 1 0 1 0
0 0 1 0 1 0 1 0
0 0 1 0 1 0 1 0
0 0 1 0 1 0 1 0
0 0 1 0 1 1 0 0
0 0 1 0 1 0 1 0
1 0 1 0 1 1 0 0
1 1 1 0 1 0 1 0
1 0 1 0 1 0 1 0
0 0 0 0 1 0 1 0
0 0 1 0 1 1 0 0
1 0 1 0 0 0 1 0
0 0 1 0 1 0 1 0
1 0 1 0 1 1 0 0
1 0 1 0 1 0 1 0
0 0 1 0 1 0 1 0
0 0 1 1 1 0 1 0
0 0 1 0 0 0 1 0
0 0 1 0 1 0 1 0
1 0 1 0 1 0 1 0
0 0 1 0 1 0 1 0
0 0 1 0 1 0 1 0
1 0 1 0 1 0 0 0
1 0 1 0 1 0 1 0
0 0 1 0 1 1 0 0
1 0 1 0 1 0 0 0
1 1 1 0 1 1 0 0
1 0 1 0 1 0 1 0
0 0 1 0 1 1 0 0
1 0 0 0 0 0 1 0
1 0 1 0 1 0 1 0
1 0 1 0 1 0 1 0
1 0 1 0 1 0 1 0
1 0 1 0 1 0 1 0
1 0 1 0 1 0 1 0
1 0 1 0 1 0 1 0
0 0 1 0 1 0 1 0
0 0 1 0 1 0 1 0
1 0 1 0 1 0 1 0
1 0 1 0 1 0 1 0
1 0 1 0 1 0 1 0
0 0 1 0 1 0 1 0
1 1 1 0 1 1 0 0
0 0 1 0 1 0 1 0
0 0 1 0 1 0 1 0
0 0 1 0 1 0 1 0
0 0 1 0 0 0 1 0
0 0 1 0 1 0 1 0
0 0 1 0 1 0 1 0
0 0 1 0 1 0 1 0
1 0 1 0 1 0 0 0
1 0 1 0 1 1 0 0
0 0 1 0 1 1 0 0
0 0 0 0 0 0 1 0
0 1 1 0 1 0 1 0
0 0 1 0 1 0 1 0
0 0 1 0 1 0 1 0
1 0 1 0 1 0 1 0
1 0 1 0 1 0 1 0
1 0 1 0 1 0 1 0
1 0 1 0 1 0 0 0
1 0 1 0 1 0 1 0
1 0 1 0 1 0 1 0
1 0 0 0 0 0 1 0
1 0 1 0 1 0 0 0
0 0 1 0 1 0 0 0
0 0 1 1 0 0 1 0
0 0 1 0 1 1 0 0
0 0 1 0 1 0 1 0
1 0 1 0 1 0 1 0
1 0 1 0 1 0 1 0
0 0 1 0 1 0 1 0
0 0 1 0 1 0 1 0
0 0 1 0 1 0 1 0
0 0 1 0 1 0 1 0
0 0 1 0 1 1 0 0
0 0 1 0 1 0 0 0
1 0 1 0 1 1 0 0
0 0 1 0 1 0 0 0
0 0 1 0 1 0 0 0
0 0 1 0 1 0 1 0
1 0 1 0 1 0 0 0
0 0 1 0 1 0 0 0
end


Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#4

10 Aug 2020, 16:23

OK, I think I see what you are talking about. If you use -pca, factors(1)-, only one component (the first) is retained, and it is identical to the first principal component from plain -pca-. The difference arises when you rotate: after -pca, factors(1)- there is nothing to rotate, so you just get the same first principal component back, whereas after plain -pca- you actually get rotated components.
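A minimal sketch of that comparison follows, assuming the shipped auto dataset and a stand-in for the global macro assets (neither is the poster's). The idx_* names are hypothetical. It relies on -predict- after -rotate- using the rotated results by default, with the -norotated- option requesting the unrotated ones.

```stata
* Illustrative data: Stata's shipped auto dataset
sysuse auto, clear
global assets price mpg weight length turn   // hypothetical stand-in for $assets

* (a) keep every component, then rotate: the "first" score is a rotated component
pca $assets
rotate
predict idx_rot_all

* (b) keep one component, then rotate: a single component cannot be rotated
pca $assets, components(1)
rotate
predict idx_rot_one

* (c) the unrotated first component, for reference
predict idx_unrot, norotated

correlate idx_rot_all idx_rot_one idx_unrot
```

In the correlation table, (b) and (c) should coincide, while (a) is a different variable; that difference, rather than the -factors()- option itself, is what changes the index.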

So let's just review some things about principal components analysis and rotation. There are a few uses for principal components. One is to orthogonalize some collinear variables for use in subsequent analyses like regressions. When that is the purpose, the unrotated and rotated components will serve equally well, and since few or none of them may be interpretable in real world terms in either case, there is no compelling reason to prefer the rotated or unrotated.

Another common use, and I believe this is what you have in mind, is to reduce the dimensionality of a set of variables, sometimes to reduce it to a single summary variable (or, as they are sometimes called, an index). When that is the sole purpose, the unrotated first principal component is the best choice because it has the largest possible variance and hence provides the least loss of information.
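As a rough illustration of "largest possible variance", the sketch below (assuming the shipped auto dataset, not the poster's data) pulls the eigenvalues from e(Ev) and compares the first one with the variance of the unrotated first-component score. Because -pca- analyzes the correlation matrix by default, the total variance equals the number of variables, which is how the share is computed here.

```stata
* Illustrative data: Stata's shipped auto dataset
sysuse auto, clear
pca price mpg weight length turn
matrix Ev = e(Ev)                 // row vector of eigenvalues, largest first

* Share of total variance captured by component 1 (total = number of variables)
display "share, component 1: " Ev[1,1] / colsof(Ev)

predict pc1                       // unrotated first-component score
quietly summarize pc1
display "variance of pc1: " r(Var) "   first eigenvalue: " Ev[1,1]
```

With the default unit-length eigenvectors, the two numbers on the last line should agree up to rounding.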

However, sometimes people also want their "index" to be easily understood in terms of the variables it is created from. In fact, often the desire is to not even use the component directly but to reduce it to a "scale" that is just the simple sum of the items that load heavily on the component. In that case, it is helpful to have some of the loadings close to 1 and others close to 0. For that purpose, some version of rotation (varimax, promax, or quartimax) may produce the most desirable results, although some information will, in general, be lost.
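A hedged sketch of that workflow is below, assuming the shipped auto dataset. The choice of two retained components and of the three items summed into size_scale is hypothetical and only shows the mechanics. The items are standardized first because they are measured in different units; with 0/1 asset indicators a plain sum would do.

```stata
* Illustrative data: Stata's shipped auto dataset
sysuse auto, clear
pca price mpg weight length turn, components(2)

rotate, varimax         // orthogonal rotation (the default method)
estat rotatecompare     // rotated and unrotated loadings side by side
rotate, promax          // oblique alternative; rotated components may correlate

* Simple-sum "scale" from items assumed to load heavily on one rotated component
egen z_weight = std(weight)
egen z_length = std(length)
egen z_turn   = std(turn)
generate size_scale = z_weight + z_length + z_turn
```

Which items belong in the sum is a judgment made from the rotated loadings, not something the commands decide for you.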

There is no sense in which one approach is more "academic" than another. Principal components analysis, except when used for orthogonalization, is a pragmatic trade-off between loss of information and gaining simplicity of structure. There is no "scientific" way to do it. It's a matter of looking at the results and choosing the ones that will be most practical for the purpose at hand.
Naika Sangroo

Join Date: May 2020

Posts: 78
#5

12 Aug 2020, 12:07

Hi Clyde,

Thank you so much for the explanation. I think for now I will use pca without the factor.