  • Low Loadings on the first component of PCA

    Hello all,
    I am running PCA to derive gentrification scores for census tracts. Specifically, I am trying to measure how much each census tract gentrified between two time points, such as between 1990 and 2000 or between 2000 and 2010. I have 17 variables that are theoretically related to gentrification, such as the change in total population, the change in the percentage of professional jobs, and the change in median home rent in each census tract. When I run PCA on these changes for 1990-2000, 2000-2010, or any other period, I get low loadings on the first component: all of them are below 0.5. I have attached an example of the PCA results. Could you help me understand why the loadings are low and how I can solve this issue? By the way, I also checked the correlations between the components and each variable, and I have attached those as well. Thanks in advance.

  • #2
    I don't find anything very surprising about these results. More positively, they are quite good, considering the kind of data you have.

    Loosely, variables that are changes (so, derivatives or, equivalently in practice, first differences) don't march together as closely as variables that are levels do. Also, from the names of your variables I would expect a mix of weak and strong correlations.
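The point about changes versus levels can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a model of the poster's data: the variable names (`rent_*`, `prof_*`), the persistent tract characteristic `status`, and the noise level are all hypothetical, and the changes are deliberately constructed to share no common component, which is an extreme case.

```python
import math
import random

random.seed(1)  # fixed seed so the sketch is reproducible


def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


n_tracts = 500

# Hypothetical persistent tract characteristic that drives the LEVEL
# of both variables in both years (an assumption for illustration).
status = [random.gauss(0, 1) for _ in range(n_tracts)]


def level(noise_sd=0.3):
    """One observed level per tract: persistent part plus year-specific noise."""
    return [s + random.gauss(0, noise_sd) for s in status]


rent_1990, rent_2000 = level(), level()
prof_1990, prof_2000 = level(), level()

# Changes between the two years: the persistent part cancels out.
d_rent = [later - earlier for earlier, later in zip(rent_1990, rent_2000)]
d_prof = [later - earlier for earlier, later in zip(prof_1990, prof_2000)]

r_levels = pearson(rent_2000, prof_2000)
r_changes = pearson(d_rent, d_prof)

print(f"correlation of levels:  {r_levels:.3f}")
print(f"correlation of changes: {r_changes:.3f}")
```

Differencing removes whatever is stable within a tract, so only the part of the correlation carried by the changes themselves survives. Real changes will usually share some common component, so their correlations need not be near zero, but they will typically be weaker than those of the levels.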

    I have a love-hate relationship with PCA. I am interested enough to have written a couple of auxiliary commands (including pcacoefsave from SSC, which I think you used here), but PCA is usually disappointing, even given an expectation that it is usually disappointing. It can't find strong relationships that don't exist in a bundle of variables, and so high expectations are often unrealistic. Also, the pattern it looks for is linear correlations (ellipsoidal structure) and so it will miss or mess up anything more complicated.
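A further reason the loadings look small may be scaling. The sketch below assumes the reported loadings are unit-length eigenvectors, which, as far as I know, is what Stata's pca shows by default; whether that applies to the attached results is an assumption. With unit length, the squared loadings sum to 1, so with 17 variables they cannot all be large. The correlation matrix here is hypothetical (every pair of variables correlated at `rho = 0.2`), and power iteration stands in for a full eigen-decomposition.

```python
import math

p = 17      # number of variables, as in the question
rho = 0.2   # hypothetical common pairwise correlation (an assumption)

# Hypothetical correlation matrix: 1 on the diagonal, rho elsewhere.
R = [[1.0 if i == j else rho for j in range(p)] for i in range(p)]


def first_component(matrix, iterations=200):
    """Power iteration: return (largest eigenvalue, unit-length eigenvector)."""
    size = len(matrix)
    v = [float(i + 1) for i in range(size)]  # arbitrary positive start vector
    for _ in range(iterations):
        w = [sum(row[j] * v[j] for j in range(size)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the unit-length vector v.
    Rv = [sum(row[j] * v[j] for j in range(size)) for row in matrix]
    eigenvalue = sum(v[i] * Rv[i] for i in range(size))
    return eigenvalue, v


eigenvalue, loadings = first_component(R)

# Correlation of each variable with the first component:
# eigenvector entry times the square root of the eigenvalue.
correlations = [a * math.sqrt(eigenvalue) for a in loadings]

print(f"first eigenvalue:          {eigenvalue:.3f}")
print(f"loading per variable:      {loadings[0]:.3f}")
print(f"sum of squared loadings:   {sum(a * a for a in loadings):.3f}")
print(f"correlation with PC1:      {correlations[0]:.3f}")
```

In this equal-correlation case every loading equals 1/sqrt(17), whatever the value of `rho`, so a 0.5 threshold could never be met by all variables. Rules of thumb such as 0.5 are usually stated for loadings scaled as correlations between variables and components, which is why the correlations are the more relevant quantities to compare against such a threshold.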



    • #3
      Thanks, Mr. Cox, for your comments. I have another question on which I would like your opinion. If I create an index based on these results, could it be questioned because the loadings are lower than 0.5? I have heard that people only consider loadings higher than 0.5 when labeling a component. So I am worried about creating an index based on a first component whose loadings are all below 0.5. Thanks again.



      • #4
        I don't see a different question there, but if principal components aren't well defined or easy to interpret, why use them at all? There are answers to that -- say, that scatter plots of the first few components can be useful maps of the data even if the axes are hard to "name". It sounds as if you want to use these PCs in later modelling, in which case it's hard to see how they improve on (some of) the original variables. Still, it's your project.
