Combining components (from PCA) to a single variable

Gustav Egede Hansen

Join Date: May 2021

Posts: 94
#1

18 Oct 2022, 03:51

Hi everybody

I am studying how public buyers safeguard employees in outsourcing contracts. Safeguards are binary variables, indicating the presence or absence of each safeguard in the tender material. So far, I have created a count variable by adding the safeguards together. However, I would also like to try to reduce the dimensions using a tetrachoric PCA. Here is my current code:

Code:

tetrachoric safe_2 safe_3 safe_4 safe_5 safe_6 safe_7 safe_8 safe_10 safe_11 safe_12 safe_13 safe_15 safe_17 safe_18 safe_19 safe_20 safe_22 safe_23 safe_27 safe_28 safe_29 safe_33 safe_35 safe_36, pw posdef
matrix C = r(Rho)
matrix symeigen eigenvectors eigenvalues = C
matrix list eigenvalues
pcamat C, n(247) factor(6)
screeplot
rotate, blanks(.35) oblique oblimin
predict safe_pca1 safe_pca2 safe_pca3 safe_pca4 safe_pca5 safe_pca6

From here, I would still like to create an overall safeguard variable fit for regression analysis. Would it be advisable to add safe_pca* together (gen = safe_pca1 + safe_pca2 + ... + safe_pca6)? Or is there a better solution?

Best,
Gustav
Nick Cox

Join Date: Mar 2014

Posts: 36053
#2

18 Oct 2022, 06:47

I see this idea quite often but I don't understand it.

Much of the point of PCA is that the first PC is the best single summary of the total pattern of variation. You can't improve on that by mushing it together with other PCs.

I can see a good case for using the total count as a summary (although there is a key substantive question of whether the single variables are really equally important for your research) -- but what is the logic that makes a mix of PCs better than that? Indeed, giving PCs equal weight in an average or total PC ignores the evidence of the PCA itself.
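As a rough illustration of the two candidates discussed here, the sketch below puts the unrotated first PC next to the simple count. It assumes the matrix C and the safeguard variables from #1 are still in memory; the names safes, safe_count and safe_pc1 are made up for this example and are not from the original post.

```stata
* Sketch only: assumes C = r(Rho) from the tetrachoric call in #1
* and that the safeguard indicators are in the dataset.
* safes, safe_count and safe_pc1 are hypothetical names.
local safes safe_2 safe_3 safe_4 safe_5 safe_6 safe_7 safe_8 safe_10 ///
    safe_11 safe_12 safe_13 safe_15 safe_17 safe_18 safe_19 safe_20  ///
    safe_22 safe_23 safe_27 safe_28 safe_29 safe_33 safe_35 safe_36

* Total count of safeguards present (every indicator weighted equally)
egen safe_count = rowtotal(`safes')

* Unrotated PCA on the tetrachoric matrix, keeping only the first component
pcamat C, n(247) components(1)
predict safe_pc1

* How closely do the two single-variable summaries agree?
correlate safe_pc1 safe_count
scatter safe_pc1 safe_count
```

If the two summaries turn out to be strongly related, the choice between them is mostly a substantive question about whether the safeguards deserve equal weight. Run it from a do-file, since the local macro and the /// continuations do not carry over to line-by-line interactive use.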

Rotation complicates the issue but I can't see that it solves it. Also, taking PCA of binary variables divides opinion, but put that to one side.

Can you give a respected reference which explains why this is a good idea? A while back, I ploughed through a couple of books on PCA for other reasons and I don't recall this idea ever surfacing except in questions like this. I've said this much previously on Statalist but don't recollect any comeback except stunned silence.