
  • Principal Component Analysis

    I am working with data on Indian schools. I intend to create an index by combining several variables (availability of a library, number of computers in the school, number of toilets in the school) that reflects the overall infrastructure available in the school. I wish to use this index later in my regression specification.

    I performed a PCA to construct this index. However, my first principal component explained only around 23% of the variation in the data, and the second component explained around 20%. I read that 23% is quite low and that the first component cannot be used directly as the index in such cases.

    Can somebody recommend what I should do in such a case? Is there any other way to construct the index? Is it sensible to combine the first two components? If so, how can I do it?
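    For reference, the percentages quoted above are the shares of total variance attached to each component. A minimal sketch of that calculation follows, in Python with NumPy rather than the software used in this thread. The data are simulated purely for illustration, and the function name `pca_explained_variance` is hypothetical.

```python
import numpy as np

def pca_explained_variance(X):
    """Return the share of total variance explained by each principal component."""
    X = np.asarray(X, dtype=float)
    # Standardize each column so variables on different scales are comparable
    # (equivalent to running PCA on the correlation matrix).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.cov(Z, rowvar=False)
    # eigvalsh returns eigenvalues in ascending order; reverse to descending.
    eigvals = np.linalg.eigvalsh(corr)[::-1]
    return eigvals / eigvals.sum()

# Simulated stand-in for the school variables (assumption: 200 schools, 5 variables).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
ratios = pca_explained_variance(X)
print(ratios)
```

    With p uncorrelated standardized variables, each component would explain roughly 1/p of the variance, so a first-component share is perhaps best judged against that baseline rather than in isolation.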

  • #2
    I would use the original variables directly in the regression and see which help.

    A mishmash of variables in quite different forms usually disappoints in PCA.



    • #3
      Thanks a lot for the reply!!

      1. I also thought of doing the same initially, but there are 20 such variables. Won't that be an issue?

      2. I also intend to generate certain segregation measures, such as the index of dissimilarity. For this purpose, I was thinking of dividing the schools into quintiles and then generating the indices to see if the segregation changes with the quintile. Can you recommend some other way of combining these variables into one?
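      For readers unfamiliar with it, the index of dissimilarity is commonly defined as half the sum, over units, of the absolute difference between each unit's share of two groups. A minimal sketch follows, assuming per-school counts for two groups; the function name and the example counts are hypothetical and not taken from the data discussed here.

```python
def dissimilarity_index(group_a, group_b):
    """Index of dissimilarity: 0.5 * sum(|a_i / A - b_i / B|) over units i."""
    if len(group_a) != len(group_b):
        raise ValueError("both groups need one count per unit")
    total_a = sum(group_a)
    total_b = sum(group_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each group needs a positive total")
    # Compare each unit's share of group A with its share of group B.
    return 0.5 * sum(
        abs(a / total_a - b / total_b) for a, b in zip(group_a, group_b)
    )

# Hypothetical counts of two groups of pupils across three schools.
d = dissimilarity_index([30, 10, 10], [10, 10, 30])
print(d)
```

      The index ranges from 0 (both groups distributed identically across units) to 1 (complete separation).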



      • #4
        1. Indeed.

        2. I don't understand how quintiles will help. That sounds like an arbitrary distraction. Otherwise the question is just as in the original post: how to combine many variables?

        Expertise on the internet does not help when the issue is a complicated dataset and a researcher who is unclear about what to do. There are many possible answers, including

        A1. You need a much more strongly theory-based approach to selection of predictors.

        A2. You need a much more thorough exploratory analysis before you should even begin to think about regression.

        Some people arguing for A1 would disagree with A2, but not so much, I think, the other way round. It's hard to avoid the appearance of a fishing expedition either way. Invocations of theory are often rhetorical in any case, often amounting to no more than the fact that some previous author mentioned a variable.



        • #5
          Thanks a lot for your advice.

          I will go back to the literature and try to work it out.
