
  • Creating single index using PCA

    Hello dear STATA users,

    I am struggling to understand how to generate an index score using PCA. First of all, I scaled my variables between 0 (worst) and 1 (best) because the original variables have wildly different scales. Then I ran PCA on the scaled variables.

    Please see the PCA output below.

    . pca sc_vul_m sc_vul_ex sc_n5y sc_od

    Principal components/correlation        Number of obs   =     27
                                            Number of comp. =      4
                                            Trace           =      4
    Rotation: (unrotated = principal)       Rho             = 1.0000

    Component |  Eigenvalue  Difference  Proportion  Cumulative
    ----------+------------------------------------------------
    Comp1     |     2.34104     1.33824      0.5853      0.5853
    Comp2     |      1.0028     .380757      0.2507      0.8360
    Comp3     |     .622046     .587939      0.1555      0.9915
    Comp4     |    .0341067           .      0.0085      1.0000

    Principal components (eigenvectors)

    Variable  |    Comp1    Comp2    Comp3    Comp4 | Unexplained
    ----------+-------------------------------------+------------
    sc_vul_m  |   0.6293  -0.0023  -0.2984  -0.7176 |           0
    sc_vul_ex |   0.6205   0.0164  -0.3629   0.6950 |           0
    sc_n5y    |   0.0812   0.9813   0.1747  -0.0045 |           0
    sc_od     |   0.4609  -0.1920   0.8653   0.0450 |           0

    Based on my reading, after running PCA I need to rotate the components and predict PC1, then use that score as an "index" score, basically using the rotate and predict commands.

    My first question is: as you can see, sc_n5y loads on PC2, whereas the remaining variables load on PC1. So when I rotate and predict the score, sc_n5y will not contribute to that score, since it will be based on PC1, right? Is there any way to account for both PC1 and PC2 and generate a single index score from them?
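
    A minimal sketch of how the unrotated scores could be generated and checked, assuming the four scaled variables are in memory. The names pc1 and pc2 are made up for illustration. Because the Comp1 loading of sc_n5y is only 0.0812, pc1 should be nearly unrelated to sc_n5y:

    ```stata
    * Refit the PCA shown above (correlation-based by default)
    pca sc_vul_m sc_vul_ex sc_n5y sc_od

    * Scores for the first two unrotated components (hypothetical names)
    predict pc1 pc2, score

    * See how each original variable relates to each score
    correlate pc1 pc2 sc_vul_m sc_vul_ex sc_n5y sc_od
    ```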

    My second question is about the rotate command. When I type only rotate, the rotation is applied to all PCs; however, I don't need to take PC3 and PC4 into account, since they don't provide useful information. In that case, if I run rotate comp1 comp2, my results are different from the plain rotate results. To obtain an index score from this PCA, which one should I choose? I see this as a problem because of the lack of correlation between sc_n5y and the other variables. Is there any way to combine all of them in one index through PCA?
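
    One way to restrict the rotation to the first two components is to retain only two in the pca call itself. This is a sketch under that assumption, not a recommendation, and rpc1 and rpc2 are made-up names. Note that a rotation redistributes variance across the retained components, so the first rotated component is not the same thing as PC1:

    ```stata
    * Retain only the first two components
    pca sc_vul_m sc_vul_ex sc_n5y sc_od, components(2)

    * Orthogonal varimax rotation of the retained components
    rotate, varimax

    * Scores based on the rotated components (hypothetical names)
    predict rpc1 rpc2, score
    ```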

    Many thanks!

    Last edited by Gizem Levent; 05 Mar 2021, 09:57.

  • #2
    Warning: Prejudices ahead.

    I'd back up and look at a scatter plot matrix and correlation matrix here. The indications are that sc_n5y is uncorrelated with the other three variables. This is not a situation in which any PC, or any combination of PCs, is going to be more helpful than using the original variables (and perhaps finding out that not all are useful for whatever is the ultimate purpose).
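
    For concreteness, those two checks could look something like this, using the variable names from the pca call in #1:

    ```stata
    * Scatter plot matrix of the four scaled variables (lower triangle only)
    graph matrix sc_vul_m sc_vul_ex sc_n5y sc_od, half

    * Pairwise correlation matrix
    correlate sc_vul_m sc_vul_ex sc_n5y sc_od
    ```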

    This practice of index construction is localized to some subfields. I've never seen a convincing rationale. The Holy Grail is, according to the sales pitch, a single composite variable that somehow stands for several, but even when that is possible, you are better off choosing one of them or doing something a lot simpler, like a plain average.
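
    A plain average of the scaled variables might be sketched as follows. The name index_mean is made up for illustration, and whether all four variables belong in the average is a substantive choice, not something the code settles:

    ```stata
    * Row-wise mean of the 0-1 scaled variables (hypothetical name);
    * rowmean() averages over the nonmissing values in each observation
    egen index_mean = rowmean(sc_vul_m sc_vul_ex sc_n5y sc_od)

    summarize index_mean
    ```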


    • #3
      Thank you so much for your response, Mr. Cox! I guess I might try a simpler approach then. Much appreciated!