  • Using mca or pca to generate a single variable

    Dear colleagues, I know I can use principal component analysis (PCA) on categorical variables by first coding them as dummies. Multiple correspondence analysis (MCA) can also be used, and it does not require converting categorical variables into dummies. Let me assume I will convert my variables into dummies and use PCA, and ask the question below.
    I have a latent variable called Accessibility, inferred from five Likert questions (scale 1-7) and the other questions below:
    1 How much do you spend on NHIF
    2 Mode of NHIF payment, coded 1 "Out of pocket" 2 "Salary deduction" 3 "Insurance"
    I want to combine all these questions to generate a single variable called Accessibility, and I plan to generate a set of dummies from my categorical variables in order to use pca.

    Suppose my dummies for the categorical variables were a b c d e f g h and I do:
    * nhif_spend is a placeholder name for "How much do you spend on NHIF"
    pca a b c d e f g h nhif_spend
    predict comp
    Can I use comp as my Accessibility variable? I just want to rename it to Accessibility and use it. I know the predicted comp will be the first principal component anyway.
    The main question here is: can I work with the first component as my Accessibility variable?

    I plan to use the same idea for the affordability and quality variables, which are inferred from several other questions. Please advise.

  • #2
    Although the wording is repeatedly "can", you really seem to be asking "should".

    Motives for this kind of reduction seem to be

    1. You are following a previous paper's methodology.

    2. It is repeated or standard practice in your field.

    3. It is a really good idea given your goals and relationships in your data.

    The last is by far the best reason.

    PCA and MCA have these simple things in common:

    1. The books are written by enthusiasts. Many researchers have intensely statistical careers without touching either technique.

    2. The examples are examples that work well (even this is in doubt: when teaching PCA in a minor way I had a hard time trying to find really convincing examples). The hearing loss data used in pca as an example are a rare convincing exception. When people ask about this on Statalist their goal is to have a single measure of "development" or "inequality" or some such vague concept and it seems all too predictable that their success will be limited given a lot of weak or moderate correlations between their variables.

    3. People who don't use or think much of the techniques tend to keep quiet about that (again, there are exceptions: in a few fields I know devastating critiques of PCA and factor analysis seem to have killed off serious interest).
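
    To see what "limited success" looks like with weakly correlated variables, here is a minimal sketch. Everything in it is an assumption for illustration: the simulated variables x1-x5, the shared component, and the 0.4 weight are hypothetical and are not the poster's data.

    ```stata
    * Hypothetical data: five variables that are only weakly related
    clear
    set seed 2024
    set obs 200
    generate double shared = rnormal()
    forvalues j = 1/5 {
        generate double x`j' = 0.4*shared + rnormal()
    }

    * Look at the pairwise correlations before reducing anything
    correlate x1-x5

    * PCA on the correlation matrix (the default); the Proportion column
    * shows the share of total variance carried by each component
    pca x1-x5
    screeplot

    * Scores on the first component (score is the default statistic)
    predict pc1, score
    ```

    If the first component carries only a modest share of the variance, a single score is a thin summary of the bundle, whatever name it is given.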

    Faced with a bundle of Likert scale items, alpha offers ways of assessing how far they swing together.
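
    A minimal sketch of that check, assuming five 1-7 items named q1-q5. The names and the simulated data are hypothetical stand-ins for the poster's Likert questions.

    ```stata
    * Hypothetical data: five 1-7 Likert items sharing a common component
    clear
    set seed 12345
    set obs 200
    generate double common = rnormal()
    forvalues j = 1/5 {
        generate byte q`j' = min(7, max(1, round(4 + common + rnormal())))
    }

    * Cronbach's alpha for the scale; the item option adds item-test and
    * item-rest correlations and the alpha obtained when each item is dropped
    alpha q1-q5, item
    ```

    Items with low item-rest correlations are the ones that do not swing with the rest, which is worth knowing before any of them are combined.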

    Faced with a bundle of loosely related variables, here are some things to think about.

    1. Why not let a regression tell you how far they each have predictive value?

    2. If a bundle of variables have similar meanings, and you wish to cut down how many you use in a model, consider using their average, or choosing just one as the best single measure. That is a lot easier to think about and to defend and explain. See the previous point again.

    3. Calling a variable "accessibility" (or whatever else is in mind) when it is a mish-mash of this, that and the other can be just wishful thinking.

    4. It's really hard to do reproducible research, or for others to discuss your choices, if they hinge on early arbitrary decisions about mushing predictors together.
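
    Points 1 and 2 can be sketched as follows. This is only an illustration under stated assumptions: the items q1-q5, the outcome y, and the simulated data are all hypothetical, and the thread does not say which outcome the poster has in mind.

    ```stata
    * Hypothetical data: five 1-7 items and an outcome y
    clear
    set seed 12345
    set obs 200
    generate double common = rnormal()
    forvalues j = 1/5 {
        generate byte q`j' = min(7, max(1, round(4 + common + rnormal())))
    }
    generate double y = 0.5*common + rnormal()

    * Point 1: let a regression show what each item contributes
    regress y q1-q5

    * Point 2: use the row mean of the items as a single, easily explained measure
    egen double access_mean = rowmean(q1-q5)
    regress y access_mean
    ```

    The row mean stays on the original 1-7 scale, so it is easier to explain and defend than a component score. Comparing the two regressions shows how much is lost by collapsing the items.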

    Executive summary: Your project, your decisions, but speaking personally I am not a fan.



    • #3
      Thanks Nick
