Creating an index using PCA

Nina Persson

Join Date: Oct 2017

Posts: 1
#1

07 Oct 2017, 05:36

Hi,
I want to create an index for each of the Big 5 personality traits using PCA. My dataset consists of questions put to the participants, each of which captures some part of a personality trait.
For extroversion, I have 17 questions, each believed to capture a different part of the trait. Based on the PCA, I decided to retain 4 components, which together explain 45% of the total variation.

Is this enough?
Can I merge the 4 components into a single index that captures an overall measure of extroversion? (This is crucial, since it is the dependent variable in the analysis.)
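
To make the 45% figure concrete, here is a minimal Python sketch of how the share of variance explained is obtained from the eigenvalues. It assumes the PCA is run on the correlation matrix of the 17 items, so the eigenvalues sum to 17. The eigenvalues and the function name `explained_variance` are made up for illustration; they are not output from the actual dataset.

```python
# Hypothetical eigenvalues for a PCA on the correlation matrix of 17 items.
# These are illustrative numbers only, chosen so that they sum to 17.
eigenvalues = [3.4, 1.7, 1.4, 1.15, 1.05, 1.0, 0.95, 0.9, 0.85,
               0.8, 0.75, 0.7, 0.65, 0.6, 0.5, 0.35, 0.25]


def explained_variance(eigenvalues):
    """Return (share, cumulative share) of total variance per component."""
    total = sum(eigenvalues)
    shares = [ev / total for ev in eigenvalues]
    cumulative = []
    running = 0.0
    for share in shares:
        running += share
        cumulative.append(running)
    return shares, cumulative


shares, cumulative = explained_variance(eigenvalues)
for k in range(4):
    print(f"component {k + 1}: {shares[k]:.3f} (cumulative {cumulative[k]:.3f})")
```

With these invented values the first 4 components account for 45% of the total, which leaves 55% of the variation in the items outside the retained components.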

Thanks, Nina Persson
Anat Tchetchik

Join Date: Jun 2014

Posts: 217
#2

07 Oct 2017, 08:00

Nina, what do you mean by merging the 4 components? (The -predict- command creates the factor.) Can you post the code you are using?
ericmelse

Join Date: May 2014

Posts: 434
#3

07 Oct 2017, 10:43

Nina, I think you mean to write about extraversion (https://en.wikipedia.org/wiki/Big_Fi...onality_traits).
I am not certain that it is appropriate to create an 'index' of a personality trait. What would it mean to have an extraversion index of 0, 1 or 100? Would a person with an index of 100 be 100 times more extraverted than a person with an index of 1? And suppose that you administered your questions to the same persons on two different dates, for example 6 months apart. Would a person with an index of 80 at t2 be 4 times more extraverted than at t1, when he or she had an index of 20?

I suggest you consult the Stata manual, for instance example 15 (Higher-order CFA).
Also consider using an IRT graded response model (GRM); the manual explains the basics.
If you are new to these methods, I recommend the Stata Press book by Alan Acock. It is a very good read with well-explained examples (http://www.stata-press.com/books/dis...g-using-stata/).

As Anat already points out, the principal objective is to compute (predict) each respondent's score on a so-called latent scale, which is assumed to drive the respondents' answers. We can never measure that scale directly (it is in the minds of our respondents), but our models may come close. How close depends on the number of respondents and their characteristics. If we succeed in getting a cohort of respondents that is more or less representative of the general population, then we can expect the scores to be approximately normally distributed. The mean of your extraversion measure should then be (close to) 0 and the standard deviation (close to) 1 (yes, you can standardize your scales after prediction). In that case, you would compare individual respondents against the cohort, with something like: 'Your profile of extraversion is 1 standard deviation above average, which puts you in stanine 7, a score that you share with about 12% of the general (or reference) population' (https://en.wikipedia.org/wiki/Stanine). Each stanine is 0.5 standard deviation wide, except stanines 1 and 9, which are open-ended.
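
To illustrate the standardize-then-classify step, here is a minimal Python sketch (not Stata code, and only an illustration). It assumes the predicted scores are already in a list; the function names `standardize` and `stanine`, the constant `STANINE_CUTS`, and the example scores are all hypothetical. The cut points follow the usual stanine definition of half-standard-deviation bands centred on the mean.

```python
import bisect
import statistics


def standardize(scores):
    """Rescale scores to mean 0 and (sample) standard deviation 1."""
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    return [(x - mean) / sd for x in scores]


# z-score boundaries between adjacent stanines: stanine 1 lies below -1.75,
# stanine 9 above +1.75, and stanines 2 to 8 are each 0.5 SD wide.
STANINE_CUTS = [-1.75, -1.25, -0.75, -0.25, 0.25, 0.75, 1.25, 1.75]


def stanine(z):
    """Map a standardized score to its stanine (1 to 9).

    A score exactly on a boundary is assigned to the higher stanine;
    that tie-breaking rule is an assumption of this sketch.
    """
    return 1 + bisect.bisect_right(STANINE_CUTS, z)


# Made-up predicted scores for a handful of respondents.
raw_scores = [2.1, 3.4, 1.8, 4.0, 2.9, 3.1, 2.5]
z_scores = standardize(raw_scores)
for raw, z in zip(raw_scores, z_scores):
    print(f"raw {raw:.1f} -> z {z:+.2f} -> stanine {stanine(z)}")
```

A respondent exactly 1 standard deviation above the mean falls between the 0.75 and 1.25 cut points, which is stanine 7.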

When you use scores on such a latent scale (referred to in the literature as 'theta'), you can make meaningful comparisons because the scale is partitioned in standard deviations.
In theory neither end of the scale is finite, but scores below -3 or above +3 are extreme.

Personally, I recommend classifying or visualizing scores by stanines. It is more informative to know that someone's profile is average (stanine 5, 20% of the cohort), and that 40% of the cohort is above average and 40% below. It is very important to note that scoring above or below average is neither good nor bad! There are no good or bad scores in psychometrics. It all depends on the context, that is, on how particular scores on particular scales are associated with a particular phenomenon (e.g. dropping out of school or not, being married or not, etc.).
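
The stanine percentages quoted above can be checked against the standard normal distribution. The following Python sketch is only an illustration and assumes the scores are exactly standard normal, which real data will only approximate; the names `STANINE_CUTS` and `stanine_shares` are hypothetical.

```python
from statistics import NormalDist

# z-score boundaries between adjacent stanines (0.5 SD steps around the mean).
STANINE_CUTS = [-1.75, -1.25, -0.75, -0.25, 0.25, 0.75, 1.25, 1.75]


def stanine_shares():
    """Expected share of a standard normal population in stanines 1 to 9."""
    normal = NormalDist()  # mean 0, standard deviation 1
    cdf = [0.0] + [normal.cdf(cut) for cut in STANINE_CUTS] + [1.0]
    return [upper - lower for lower, upper in zip(cdf, cdf[1:])]


shares = stanine_shares()
for number, share in enumerate(shares, start=1):
    print(f"stanine {number}: {100 * share:.1f}%")

# Stanines 6 to 9 together are the 'above average' group.
print(f"above average: {100 * sum(shares[5:]):.1f}%")
```

Rounded to whole percentages, the shares come out as 4, 7, 12, 17, 20, 17, 12, 7 and 4, so about 20% of the cohort is in stanine 5 and about 40% lies on each side of it.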

http://publicationslist.org/eric.melse