PCA for a composite index

Yashdeep Singh

Join Date: May 2020

Posts: 1
#1

19 May 2020, 15:46

Hello Statalist users,

I have read questions posted on this forum regarding PCA and forming an index. But I have a suite of queries that need to be tackled together.

I have panel data on 45 countries for the period 2003-2018. The data comprise 9 variables on gender statistics retrieved from World Bank data. I would like to create a single composite index, which I call the Women Empowerment index, from all 9 variables using PCA. This index would then be used as an independent variable in a later regression analysis. Making this index is important, instead of performing the usual multivariate regression with these variables separated, because the other independent variables in my regression framework contain a host of cognitive-test-score parameters and, thus, through this index I intend to capture women empowerment prevalent in the macro-environment of different countries.

The following are my queries:

1. This one may be basic, but I am finding it hard to decide how many principal components (PCs) to retain. By Kaiser's criterion (eigenvalue > 1), I get 3 PCs, which together explain 69.62% of the variance; that could be considered acceptable. On the other hand, the scree plot shows a wide gap and a sharp elbow between component 1 and component 2, which suggests that only component 1 should be retained. However, the 1st component (Comp1) alone explains only 44.26% of the total variance. Am I good to go with 3 components?

--------------------------------------------------------------------------
   Component |   Eigenvalue   Difference         Proportion   Cumulative
-------------+------------------------------------------------------------
       Comp1 |       3.9832      2.80695             0.4426       0.4426
       Comp2 |      1.17625     .0699101             0.1307       0.5733
       Comp3 |      1.10634      .349979             0.1229       0.6962
       Comp4 |      .756364      .124735             0.0840       0.7802
       Comp5 |      .631629     .0777993             0.0702       0.8504
       Comp6 |      .553829      .101858             0.0615       0.9120
       Comp7 |      .451971      .159765             0.0502       0.9622
       Comp8 |      .292207      .244002             0.0325       0.9946
       Comp9 |     .0482048            .             0.0054       1.0000
--------------------------------------------------------------------------
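
The two retention rules mentioned in this question can be checked by hand from the eigenvalues in the table. The following is a minimal Python sketch, not Stata code, that recomputes the Proportion and Cumulative columns, counts the components passing Kaiser's criterion, and finds the largest drop between consecutive eigenvalues. The "largest drop" rule is only a crude stand-in for reading a scree plot by eye, and all variable names are illustrative.

```python
# Eigenvalues copied from the pca output above (9 variables).
eigenvalues = [3.9832, 1.17625, 1.10634, 0.756364, 0.631629,
               0.553829, 0.451971, 0.292207, 0.0482048]

# For a PCA of a correlation matrix the eigenvalues sum to the number
# of variables, so the total should be very close to 9.
total = sum(eigenvalues)
proportions = [ev / total for ev in eigenvalues]

# A running total of the proportions reproduces the Cumulative column.
cumulative = []
running = 0.0
for p in proportions:
    running += p
    cumulative.append(running)

# Kaiser's criterion: keep components whose eigenvalue exceeds 1.
n_kaiser = sum(1 for ev in eigenvalues if ev > 1)

# Crude scree-plot proxy (an assumption of this sketch): locate the
# largest drop between consecutive eigenvalues.
drops = [eigenvalues[i] - eigenvalues[i + 1]
         for i in range(len(eigenvalues) - 1)]
elbow_after = drops.index(max(drops)) + 1  # 1-based component number

print("components with eigenvalue > 1:", n_kaiser)
print("largest drop occurs after component:", elbow_after)
print("variance explained by Comp1:", round(cumulative[0], 4))
print("variance explained by Comp1-Comp3:", round(cumulative[2], 4))
```

Comp2 and Comp3 pass Kaiser's cutoff only narrowly (1.17625 and 1.10634), which is why the two rules disagree here.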

2. Now that I have selected the components, I obtain the loadings of each variable on them. These are unrotated for now. The loadings on Comp1 make sense from a theoretical standpoint, while those on Comp2 and Comp3 don't. This matters for the rest of my question.
----------------------------------------------------------
    Variable |    Comp1     Comp2     Comp3 | Unexplained
-------------+------------------------------+-------------
         v4_ |   0.4169    0.2472   -0.2823 |       .1477
         v5_ |   0.1017   -0.7868   -0.1225 |        .214
         v8_ |  -0.3573   -0.0330    0.0064 |       .4902
         v9_ |  -0.1447    0.1295    0.8026 |       .1841
        v10_ |   0.4157    0.2776   -0.1901 |        .181
        v11_ |   0.3866   -0.2515    0.1857 |       .2922
        v13_ |  -0.3753    0.2580   -0.2979 |       .2626
        v14_ |   0.3552   -0.0617    0.2460 |       .4261
        v15_ |   0.2794    0.3021    0.2027 |       .5362
----------------------------------------------------------

After that, I run

Code:

rotate, varimax

This yields the following result:

Rotated components

----------------------------------------------------------
    Variable |    Comp1     Comp2     Comp3 | Unexplained
-------------+------------------------------+-------------
         v4_ |   0.3739   -0.3607   -0.2113 |       .1477
         v5_ |   0.0030   -0.1346    0.7914 |        .214
         v8_ |  -0.3505    0.0769    0.0004 |       .4902
         v9_ |   0.0287    0.8145   -0.1332 |       .1841
        v10_ |   0.3938   -0.2703   -0.2406 |        .181
        v11_ |   0.3909    0.1076    0.2878 |       .2922
        v13_ |  -0.4014   -0.2199   -0.2944 |       .2626
        v14_ |   0.3895    0.1716    0.0966 |       .4261
        v15_ |   0.3403    0.1415   -0.2731 |       .5362
----------------------------------------------------------

Comp1 now shows somewhat different loadings on the variables. The most striking change is the sign of v9_, which goes from negative (-0.1447) to positive (0.0287); this does not agree with theory. Can someone please explain why this is happening, and whether I should rotate at all?
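
One way to see what the rotation did is to compare the two loading tables row by row. Varimax is an orthogonal rotation, so it re-expresses the same three-dimensional space with new axes: each variable's sum of squared loadings across the three components should be unchanged (consistent with the identical Unexplained column in both tables), while individual loadings, including their signs, are free to move. The Python sketch below checks this on the posted numbers. It assumes the rotation was applied to the loadings exactly as displayed, and the names in it are illustrative.

```python
variables = ["v4_", "v5_", "v8_", "v9_", "v10_", "v11_", "v13_", "v14_", "v15_"]

# Loadings copied from the unrotated table (Comp1, Comp2, Comp3).
unrotated = [
    [0.4169, 0.2472, -0.2823],
    [0.1017, -0.7868, -0.1225],
    [-0.3573, -0.0330, 0.0064],
    [-0.1447, 0.1295, 0.8026],
    [0.4157, 0.2776, -0.1901],
    [0.3866, -0.2515, 0.1857],
    [-0.3753, 0.2580, -0.2979],
    [0.3552, -0.0617, 0.2460],
    [0.2794, 0.3021, 0.2027],
]

# Loadings copied from the varimax-rotated table.
rotated = [
    [0.3739, -0.3607, -0.2113],
    [0.0030, -0.1346, 0.7914],
    [-0.3505, 0.0769, 0.0004],
    [0.0287, 0.8145, -0.1332],
    [0.3938, -0.2703, -0.2406],
    [0.3909, 0.1076, 0.2878],
    [-0.4014, -0.2199, -0.2944],
    [0.3895, 0.1716, 0.0966],
    [0.3403, 0.1415, -0.2731],
]


def sum_of_squares(row):
    """Sum of squared loadings of one variable across the components."""
    return sum(x * x for x in row)


# An orthogonal rotation leaves each row's sum of squares unchanged,
# up to the four-decimal rounding of the printed tables.
gaps = [abs(sum_of_squares(u) - sum_of_squares(r))
        for u, r in zip(unrotated, rotated)]
max_gap = max(gaps)

for name, u, r in zip(variables, unrotated, rotated):
    print(name, round(sum_of_squares(u), 4), round(sum_of_squares(r), 4))
print("largest difference:", max_gap)

# Where the v9_ loadings sit before and after rotation.
v9_before = unrotated[variables.index("v9_")]
v9_after = rotated[variables.index("v9_")]
print("v9_ before:", v9_before, "after:", v9_after)
```

Read this way, the sign change of v9_ on Comp1 is a move from a small negative loading to a loading close to zero, with nearly all of its weight sitting on a single component before (Comp3, 0.8026) and after (Comp2, 0.8145) rotation.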

3. My final aim is to produce one single composite Women Empowerment index. Some of my peers have suggested weighting each PC by its proportion of explained variance and summing the weighted PCs. Is this a correct approach? Are there alternatives? Or should I be combining the components at all? Looking at the loadings, it is increasingly difficult to justify the existence and character of each PC in my regression framework.
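
To make the peers' proposal concrete: if each PC score is a linear combination of the standardized variables with the loadings above as coefficients (an assumption of this sketch), then a weighted sum of PCs is itself just one linear combination of the 9 variables. The Python sketch below computes the weight each variable would implicitly receive, using the unrotated loadings and the posted proportions rescaled to sum to 1. The rescaling and all names are illustrative choices, not an established method.

```python
variables = ["v4_", "v5_", "v8_", "v9_", "v10_", "v11_", "v13_", "v14_", "v15_"]

# Unrotated loadings on Comp1, Comp2, Comp3, copied from the table above.
loadings = [
    [0.4169, 0.2472, -0.2823],
    [0.1017, -0.7868, -0.1225],
    [-0.3573, -0.0330, 0.0064],
    [-0.1447, 0.1295, 0.8026],
    [0.4157, 0.2776, -0.1901],
    [0.3866, -0.2515, 0.1857],
    [-0.3753, 0.2580, -0.2979],
    [0.3552, -0.0617, 0.2460],
    [0.2794, 0.3021, 0.2027],
]

# Proportion of variance explained by Comp1-Comp3, from the eigenvalue table.
proportions = [0.4426, 0.1307, 0.1229]

# Rescale so the three component weights sum to 1 (an illustrative choice).
pc_weights = [p / sum(proportions) for p in proportions]

# Implied weight on each original (standardized) variable in the composite:
# the weighted sum of that variable's loadings across the retained PCs.
implied = {
    name: sum(w * l for w, l in zip(pc_weights, row))
    for name, row in zip(variables, loadings)
}

for name, row in zip(variables, loadings):
    print(name, "Comp1 loading:", row[0], "implied weight:", round(implied[name], 4))
```

In this calculation the implied weights on v5_ and v9_ have the opposite sign to their Comp1 loadings. The sign of each principal component is also arbitrary (flipping all loadings of a component gives an equally valid solution), so a composite built this way depends on which signs the software happened to report.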
Tags: composite index, panel data, pca
Nick Cox

Join Date: Mar 2014

Posts: 36053
#2

21 May 2020, 03:12

These are hard questions. But they don't even seem consistent. You start by saying that a single measure or index of women's empowerment is needed first and will then be used in a regression with other predictors. You finish by saying that the index is your final goal.

All the indications here are that the PCA doesn't identify a single component that would serve well as an index. Without seeing your data, I am totally unsurprised. Many important ideas in social science can't be captured well by single precise measures. Some of the reasons might include nonlinearity, skewness and outliers messing up correlations, but the main reason is just too few strong correlations. Rotation cannot be a solution to this. If you want to use more PCs in the regression you can, but then all ways of combining them are arbitrary, and even if you let the regression sort out the weights

response = intercept + terms using the PCs + terms using the other predictors

that is in practice just a messy alternative to

response = intercept + terms using women's empowerment variables + terms using the other predictors.

Much of this pivots on your statement

"Making this index is important, instead of performing the usual multivariate regression with these variables separated, because the other independent variables in my regression framework contain a host of cognitive-test-score parameters"

(where by "multivariate" you presumably mean "multiple")

I'd say that on your evidence a useful single index is unattainable, and that it is not necessarily the best goal anyway, even given a project aim of focusing on women's empowerment in the context of other predictors too.