  • Combining components (from PCA) to a single variable

    Hi everybody

    I am studying how public buyers safeguard employees in outsourcing contracts. Safeguards are binary variables indicating the presence or absence of each safeguard in the tender material. So far, I have created a count variable by adding the safeguards together. However, I would also like to try to reduce the dimensions using a PCA based on tetrachoric correlations. Here is my current code:

    Code:
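    * Tetrachoric correlations among the binary safeguard indicators
    tetrachoric safe_2 safe_3 safe_4 safe_5 safe_6 safe_7 safe_8 safe_10 safe_11 safe_12 safe_13 safe_15 safe_17 safe_18 safe_19 safe_20 safe_22 safe_23 safe_27 safe_28 safe_29 safe_33 safe_35 safe_36, pw posdef
    matrix C = r(Rho)

    * Eigenvalues and eigenvectors of the correlation matrix
    matrix symeigen eigenvectors eigenvalues = C
    matrix list eigenvalues

    * PCA on the tetrachoric matrix, retaining six components
    pcamat C, n(247) components(6)
    screeplot

    * Oblique oblimin rotation; loadings below .35 in absolute value are shown as blanks
    rotate, oblimin oblique blanks(.35)

    * Scores for the six components (rotated, because rotate was run)
    predict safe_pca1 safe_pca2 safe_pca3 safe_pca4 safe_pca5 safe_pca6

    From here, I would still like to create an overall safeguard variable fit for regression analysis. Would it be advisable to add the safe_pca* scores together (gen newvar = safe_pca1 + safe_pca2 + ... + safe_pca6)? Or is there a better solution?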

    Best,
    Gustav

  • #2
    I see this idea quite often but I don't understand it.

    Much of the point of PCA is that the first PC is the best single summary of the total pattern of variation. You can't improve on that by mushing it together with other PCs.

    I can see a good case for using the total count as a summary (although there is a key substantive question of whether the single variables are really equally important for your research), but what is the logic that makes a mix of PCs better than that? Indeed, giving PCs equal weight in an average or total ignores the evidence of the PCA itself, which is that successive PCs account for different shares of the variation.

    Rotation complicates the issue but I can't see that it solves it. Also, taking PCA of binary variables divides opinion, but put that to one side.
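
    As a minimal sketch of the comparison at stake, assuming the safeguard indicators are in memory and the PCA is left unrotated, the first PC score can be set beside the raw count. The names safes, safe_pc1 and safe_count are made up for illustration, and n(247) is taken from the code in #1. As far as the documentation goes, predict after pcamat treats the variables as having mean 0 and SD 1 unless means() and sds() are supplied, so check help pcamat before relying on the scores.

    Code:
    * Sketch only: run as one do-file so that the local macro stays defined
    local safes safe_2 safe_3 safe_4 safe_5 safe_6 safe_7 safe_8 safe_10 safe_11 safe_12 safe_13 safe_15 safe_17 safe_18 safe_19 safe_20 safe_22 safe_23 safe_27 safe_28 safe_29 safe_33 safe_35 safe_36
    tetrachoric `safes', pw posdef
    matrix C = r(Rho)

    * Keep only the first component and do not rotate
    pcamat C, n(247) components(1)
    predict safe_pc1

    * Equal-weight count of safeguards present (rowtotal() treats missing values as 0)
    egen safe_count = rowtotal(`safes')

    * How differently do the two summaries order the contracts?
    correlate safe_pc1 safe_count
    scatter safe_pc1 safe_count

    If the two summaries are very highly correlated, the choice between them is unlikely to matter much in a regression. If they are not, the loadings reported by pcamat indicate which safeguards the first PC weights away from equality.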

    Can you give a respected reference which explains why this is a good idea? A while back, I ploughed through a couple of books on PCA for other reasons and I don't recall this idea ever surfacing except in questions like this. I've said this much previously on Statalist but don't recollect any comeback except stunned silence.
