  • Principal Component Analysis - Interpretation

    Hi everyone,

    I have 26 variables (reduced to 13 for this post) that record ownership of household assets, plus a variable for household income. I'm using the following code for a PCA:

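    * Asset variables to summarize (13 of the 26 are shown in this post)
    global household_assets qn3_19_1-qn3_19_13
    * PCA on the covariance matrix, keeping 5 components
    pca $household_assets, covariance comp(5)
    * Note: the eigenvalue > 1 reference line is a convention for correlation-matrix PCA
    screeplot, yline(1)
    rotate
    estat kmo // values are more than 0.5 so using PCA is justified
    * Scores are predicted after -rotate-, so these are rotated scores
    predict ha1 ha2 ha3 ha4 ha5, score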

    Now that I have the 5 components, which explain about 88% of the variation, I'd like to know how I can use this information in a regression analysis. More specifically, for the purpose of this post, I'd appreciate guidance on how to interpret the coefficients of the following model, where X1 to X5 are the component scores ha1 to ha5:

    Y (household income in Ugandan shillings) = B0 + B1X1 + B2X2 + B3X3 + B4X4 + B5X5

    a) How should the coefficients be interpreted?
    b) What does it mean when B1, B2, B3, B4 and B5 have opposite signs?
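
    For concreteness, here is a minimal sketch of how this model might be fit and read, assuming the -dataex- example below is in memory. It deliberately skips -rotate-, so the scores are unrotated, and it assumes that the scores are built from the loadings stored in e(L). The matrix names L, b, bpc and implied are only illustrative.

    ```stata
    * Sketch only: unrotated PCA on the covariance matrix, then a regression on the scores
    pca qn3_19_1-qn3_19_13, covariance components(5)
    matrix L = e(L)                      // loadings: rows are variables, columns are components
    predict ha1 ha2 ha3 ha4 ha5, score
    regress hincome ha1 ha2 ha3 ha4 ha5

    * B_k is the expected change in income for a one-unit change in score k.
    * Apart from a constant, score k is a weighted sum of the asset variables,
    * so the implied coefficient on asset j is the sum over k of L[j,k]*B_k.
    matrix b = e(b)
    matrix bpc = b[1, 1..5]              // drop the constant
    matrix implied = L * bpc'
    matrix list implied
    ```

    On b): the sign of each component is arbitrary, because an eigenvector multiplied by -1 is still an eigenvector. Opposite signs among B1 to B5 therefore carry no meaning on their own; the implied coefficients on the original variables do not change when a component's sign flips.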


    Thanks


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(qn3_19_1 qn3_19_2 qn3_19_3 qn3_19_4 qn3_19_5 qn3_19_6 qn3_19_7 qn3_19_8 qn3_19_9 qn3_19_10 qn3_19_11 qn3_19_12) int qn3_19_13 float hincome
    1 1 0 0 0 0 0 0 0 0  6  4 0      0
    1 2 0 0 0 0 0 0 0 2  5  3 0 300000
    0 1 0 0 0 0 0 0 0 0  7  3 0 300000
    1 1 0 0 0 0 0 0 0 0  4  4 0 100000
    0 1 0 0 0 0 0 0 0 0  7  4 0      0
    0 2 0 0 0 0 0 0 0 1  4  2 0  30000
    1 2 0 0 0 0 0 0 0 0  5  7 0 200000
    1 2 0 0 0 0 0 0 0 6  7  5 0      0
    1 2 0 0 0 0 0 0 0 2  4  5 0 120000
    1 1 0 0 0 0 0 0 0 1  6  3 0 150000
    1 2 0 0 0 0 0 0 0 2  7  5 5 100000
    0 3 0 0 0 0 0 0 0 0  6  4 0 140000
    1 2 0 0 1 0 0 0 0 0  5  7 2 150000
    0 2 0 0 0 0 0 0 0 0  5  3 0  50000
    1 1 0 0 0 0 0 0 0 2  6  4 0  50000
    1 1 0 0 0 0 0 0 0 0  2  2 0  30000
    1 1 0 0 0 0 0 0 0 2  6  4 0 100000
    1 1 0 0 0 0 0 0 0 0  7  4 2  50000
    1 1 0 0 0 0 0 0 0 0  2  2 0 300000
    1 1 0 0 0 0 0 0 0 0  3  4 0  10000
    0 0 0 0 0 0 0 0 0 1  2  2 0      0
    0 1 0 0 0 0 0 0 0 0  4  4 0      0
    1 1 0 0 0 0 0 0 0 0  8  3 0      0
    0 1 0 0 0 0 0 0 0 0  3  2 0 150000
    0 0 0 0 0 0 0 0 0 0  3  1 0      0
    0 1 0 0 0 0 0 0 0 1  6  4 0      .
    1 1 0 0 0 0 0 0 0 0  5  2 0      .
    1 0 0 0 0 0 0 0 0 0  0  0 0      0
    0 0 0 0 0 0 0 0 0 0  3  3 0      0
    1 1 0 0 0 0 0 0 0 0  5  4 0 100000
    1 1 0 0 0 0 0 0 0 2  7  2 0      0
    1 1 0 0 0 0 0 0 0 2  5  3 0      .
    0 1 0 0 0 0 0 0 0 0  2  3 0      0
    0 2 0 0 0 0 0 0 0 0  4  1 0 200000
    1 1 0 0 0 0 0 0 0 6  4  3 0      0
    1 2 0 0 0 0 0 0 0 2  5  4 0  50000
    0 1 0 0 0 0 0 0 0 1  4  5 0 100000
    1 2 0 0 1 0 0 0 0 0  5  3 0  70000
    1 0 0 0 0 0 0 0 0 0  4  6 0  50000
    1 1 0 0 0 0 0 0 0 3  8  5 0      0
    0 3 0 0 0 0 0 0 0 0  6  2 1 150000
    0 2 0 0 0 0 0 0 0 2  7  6 0      .
    0 1 0 0 0 0 0 0 0 0  5  4 0      0
    0 2 0 0 0 0 0 0 0 1  5  2 0 150000
    0 3 0 0 0 0 0 0 0 0  6  2 1      0
    1 2 0 0 0 0 0 0 0 3  7  4 0 100000
    1 2 0 0 1 0 0 0 0 3  7  3 0  85000
    1 2 0 0 1 0 0 0 0 3  6  3 0      0
    1 1 0 0 1 0 0 0 0 2  5  2 1 150000
    1 2 0 0 0 0 0 0 0 3  6  5 0 200000
    0 3 0 0 0 0 0 0 0 2  8  3 0  50000
    0 2 0 0 0 0 0 0 0 0  6 10 0  15000
    1 1 0 0 0 0 0 0 0 0  6  3 0 300000
    1 2 0 1 1 0 0 0 0 4  6  5 4 200000
    0 2 0 0 0 0 0 0 0 0  3  2 0      0
    1 1 0 1 0 1 0 0 0 5  8  5 0 300000
    3 3 0 0 0 0 0 0 0 0  5  3 0 150000
    0 7 1 3 2 0 0 0 0 0 10  7 5 200000
    1 2 0 0 0 0 0 0 0 0  6  3 1      0
    1 1 0 0 0 0 0 0 0 0  5  3 0      0
    0 0 0 0 0 0 0 0 0 0  7  6 0  80000
    1 0 0 0 0 0 0 0 0 2  5  3 0 250000
    1 1 0 0 0 0 0 0 0 0  6  8 0 100000
    1 0 0 0 0 0 0 0 0 0  5  4 0      0
    1 2 0 0 0 0 0 0 0 3  6  8 0 150000
    1 2 0 0 0 0 0 0 0 0  5  3 0 200000
    8 1 0 0 0 0 0 0 0 0  4  2 4      0
    0 0 0 0 1 0 0 0 0 0  2  2 4      0
    2 4 1 0 1 0 0 0 0 4  2  7 2      0
    1 1 0 0 0 0 0 0 0 2  7  3 2 300000
    0 0 0 0 0 0 0 0 0 0  4  1 2 180000
    0 3 0 0 0 0 0 0 0 0  3  4 5      0
    0 1 0 0 0 0 0 0 0 0  3  3 0 200000
    1 2 0 1 1 0 0 0 0 0  5  4 5  90000
    0 1 0 0 0 0 0 0 0 0  4  1 0 200000
    1 2 0 0 0 0 0 0 0 0  4  4 2      0
    0 0 0 0 0 0 0 0 0 3  3  3 0  80000
    0 0 0 0 0 0 0 0 0 1  4  2 0      0
    1 1 0 0 0 0 0 0 0 5  6  2 0      0
    1 4 0 1 0 0 0 0 0 0  5  3 1 100000
    0 1 0 0 0 0 0 0 0 5  6  2 2 100000
    0 1 0 0 0 0 0 0 0 0  7  2 0 200000
    0 1 0 0 0 0 0 0 0 0  4  2 1     99
    0 0 0 0 0 0 0 0 0 0  3  2 0 100000
    0 2 0 0 1 0 0 0 0 0  5  2 4      0
    0 1 0 0 0 0 0 0 0 2  6  3 0  80000
    1 3 0 0 0 0 0 0 0 0  8  6 5  40098
    1 1 0 0 0 0 0 0 0 0  5  4 4  60000
    0 3 0 0 0 0 0 0 0 0  5  6 5 300000
    0 3 0 0 0 0 0 0 0 0  5  3 2  75000
    1 2 0 0 0 0 0 0 0 0  5  2 3  60000
    0 0 0 0 1 0 0 0 0 0  4  8 4      0
    0 1 0 0 0 0 0 0 0 0  6  7 2      0
    0 2 0 0 0 0 0 0 0 0  5  1 4      .
    1 1 0 0 0 0 0 0 0 0  2  2 2  50000
    1 2 0 0 0 0 0 0 0 0  3  2 0      0
    0 1 0 0 0 0 0 0 0 0  3  1 0      0
    1 0 0 0 0 0 0 0 0 0  2  1 0 100000
    1 2 0 0 1 1 0 0 0 0  8  4 0  50000
    0 2 0 0 0 0 0 0 0 1  4  1 0      0
    end


  • #2
    Thanks for the clear data example.

    I don't understand whether "ownership of assets" means just "assets that are owned", that is, whether the assets variables are categorical or measure amounts of assets.

    Your data look highly problematic for what you're doing. In the sample, some assets variables are identically zero; others have several zeros; and so forth. Although PCA doesn't make strong assumptions about the data -- it's basically a transformation technique -- that's some distance from the multivariate normal shape that is, if not ideal, then at least easiest to work with.
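
    One way to see this in the posted sample is sketched below; it assumes the -dataex- example from #1 is in memory and simply counts zeros and flags variables with no variation.

    ```stata
    * Sketch only: how many zeros, and which variables are constant in the sample?
    summarize qn3_19_1-qn3_19_13
    foreach v of varlist qn3_19_1-qn3_19_13 {
        quietly count if `v' == 0
        display "`v': " r(N) " zeros out of " _N " observations"
        quietly summarize `v'
        if r(sd) == 0 {
            display "    `v' is constant in this sample"
        }
    }
    ```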

    It's often the case that individual PCs have no clear interpretation in any case.

    I looked at the sum over the assets variables and its relation to income and could see almost nothing there, but I don't know what your expectations are.
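
    A check of this kind could look roughly like the following. This is a sketch under my own assumptions, not necessarily the exact commands used; the variable name assets_total is made up for illustration.

    ```stata
    * Sketch only: total assets per household and its relation to income
    egen assets_total = rowtotal(qn3_19_1-qn3_19_13)
    scatter hincome assets_total
    correlate hincome assets_total
    spearman hincome assets_total        // rank correlation, less sensitive to outliers
    ```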


    • #3
      Hi Nick,

      Thanks for the prompt reply. Ownership is a numeric variable: if a household owns, say, 2 cell phones, the enumerator records 2, and so on. My data cover over 2,500 households. Could the small sample be the reason you're not seeing any association? Regardless, this is just an example I made up. I just want to understand how to interpret the coefficients when components are used in a regression model. I couldn't find any resource on this, so I'd appreciate it if you could explain with an example or direct me to a resource that specifically answers the question.

      Thanks.


      • #4
        My line is that

        1. No concrete interpretation is guaranteed, especially with quite different kinds of variables mushed together.

        2. I try to interpret variables by looking at the loadings and (not the same) the correlations between variables and components. My add-on pcacoefsave (SSC) may help in the mechanics of that but it can't tell you about meanings.

        3. I never rotate. That is for consenting factor analysts in public.
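
        For point 2, the mechanics might be sketched as follows, assuming the example data from #1 are in memory. The score names pc1 to pc5 are illustrative, and pcacoefsave itself is not used here.

        ```stata
        * Sketch only: loadings versus variable-component correlations
        pca qn3_19_1-qn3_19_13, covariance components(5)
        estat loadings                       // the loadings (eigenvectors)
        predict pc1 pc2 pc3 pc4 pc5, score
        * First column of the output: correlation of each asset variable with PC1.
        * Variables that are constant in the sample will show missing correlations.
        correlate pc1 qn3_19_1-qn3_19_13
        * pcacoefsave is on SSC: ssc install pcacoefsave
        ```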

        I've looked through various texts on PCA and never found one I liked. They all oversell PCA in my view. Conversely, it would be surprising if someone wrote a book on PCA and felt rather negatively about it.


        • #5
          If you want to do a regression afterwards, you might consider skipping the exploratory step and moving straight to SEM or GSEM. This would let you handle the measurement structure better. It does require you to make some assumptions about which indicators go together.
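
          As a rough sketch of what that could look like with the posted variables: the grouping of four asset counts under a single latent variable, here called Wealth, is purely hypothetical and is not something the data or the thread establish. The model may well fail to converge on data this sparse.

          ```stata
          * Sketch only: one latent variable measured by four asset counts, predicting income
          * (the choice of indicators is an assumption made for illustration)
          sem (Wealth -> qn3_19_2 qn3_19_10 qn3_19_11 qn3_19_12) (hincome <- Wealth)

          * A count-data alternative with gsem, also only a sketch:
          * gsem (qn3_19_2 qn3_19_10 qn3_19_11 qn3_19_12 <- Wealth, family(poisson) link(log)) ///
          *      (hincome <- Wealth)
          ```

          Latent variable names in sem and gsem start with a capital letter, which is why Wealth is capitalized.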

          If you're considering PCA, you might look at J. Scott Armstrong, "Derivation of Theory by Means of Factor Analysis or Tom Swift and His Electric Factor Analysis Machine", The American Statistician, 1967, 17-21.


          • #6
            The Armstrong article title alludes to a hero of 20th century juvenile fiction in the US. Tom Swift was a young boy who could do whatever was needed, fly planes, bring down villains, etc., etc. I only know this because I've talked to Stata people in the US who read Tom Swift books when aged about 9. I don't think "Tom Swift" is a widely known character outside the US, unlike Batman, Superman, Wolverine and other such Stata super-users.

            Thor too: see the hammer command forthcoming in Stata 16.
            Last edited by Nick Cox; 13 Oct 2017, 11:29.


            • #7
              Thanks, Nick and Phil, for your comments. I will definitely look into this further.
