  • Obtaining mean zero, standardized factor scores after principal factor analysis of polychoric correlation matrix

    Dear all,

    After running principal factor analysis on the polychoric correlation matrix of my (ordinal scale) items (factormat matrix, pcf), I saved the predicted factor scores to be used for further analysis. However, my new variables are not standardized with mean zero. They are not even normally distributed, exhibiting some degree of skewness. I attach their basic descriptives. Can someone help me understand why this is the case and whether there is a way of correcting it? That is, how can I obtain zero-mean, standardized factor scores after the -factormat- command in Stata?

    Best regards,
    R
    Attached Files

  • #2
    Welcome to Statalist. Please read the FAQ to learn more about effective posting. Attached picture files are typically unreadable; the best way to show output from Stata is to copy it directly from either Stata's Results window or your log file and paste it into the forum editor in a code block. That said, your attachment happens to be readable on my computer today. Also, when showing results, it is really important to show the command(s) that gave those results along with them, so we know exactly what you did and what you got.

    The distributions of factor scores depend on the distributions of the underlying variables. They will seldom be normal--basically only if each of the underlying variables has a normal distribution--and even then it is not necessarily so. So on this point it is your expectations, rather than Stata's output, that need adjustment.

    As for having them standardized, they don't necessarily come out that way when Stata creates them. Factor scores are inherently dimensionless, so you can apply any linear or affine transformation you like to them and have results that are equivalent for most purposes. They usually do come out centered very close to zero, but various rounding errors etc. sometimes lead to means that are slightly off zero. The results that you show are surprising. You don't show us the command you issued to produce those results; I'm guessing that you mistakenly -summarize-d the variables you factored rather than the factor scores. However, as you also don't show us the -factormat- command you used, it is possible that you mis-specified it, for example by not specifying the -means()- and -sds()- options.

    So I think you need to repost your question, showing us everything from the -factormat- command through its results, then the -predict- command you used to calculate the factor scores and its results, and the -summarize- output for the factor scores.
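
    In outline, that is a sequence like the following (just a sketch with placeholder names: M and S stand for matrices of means and standard deviations, and fscore for the predicted score):

    * factor the correlation matrix, telling -predict- the original means and SDs
    factormat R, n(#) pcf means(M) sds(S)
    * score for the first factor
    predict fscore
    summarize fscore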

    • #3
      Dear Clyde,

      Thank you very much for the time you took to answer my post. I am sorry for not following the forum rules earlier. Below I reproduce the Stata commands (in bold characters) that led me to the final results.


      global insight_new insight10 insight09 insight03 insight11 insight06 insight08 insight07 insight04
      . codebook $insight_new, compact
      Variable Obs Unique Mean Min Max Label
      insight10 16194 5 3.435408 1 5 My self-discipline is good
      insight09 16194 5 2.959121 1 5 I do not have a tendency to procrastinate
      insight03 16194 5 3.600408 1 5 I find planning my study independently, easy
      insight11 16194 5 2.957453 1 5 I spend enough time on my studies
      insight06 16194 5 3.322836 1 5 I can study well, generally
      insight08 16194 5 2.589848 1 5 I find putting effort into uninteresting parts of my study, easy
      insight07 16194 5 3.754539 1 5 I am satisfied with the study performance that I've accomplished so far
      insight04 16194 5 3.447326 1 5 My activities outside the study, does not prevent me from focusing on my studies


      ***Tabulation of one of the variables as an example****


      . tab insight10

      My self-discipline is good      Freq.   Percent      Cum.
      Totally disagree                  569      3.51      3.51
      2                               2,585     15.96     19.48
      3                               4,544     28.06     47.54
      4                               6,218     38.40     85.93
      Totally agree                   2,278     14.07    100.00
      Total                          16,194    100.00

      ****Then I proceeded with the following commands****

      polychoric $insight_new
      display r(sum_w)
      global N = r(sum_w)
      matrix R = r(R)
      factormat R, n($N) mineigen(1) blanks(.4) pcf
      predict f1
      rename f1 determination
      label var determination "F1 score of PCF analysis on insight variables, unidimensional"


      ****Summary statistics of the resulting factor variable***
      . sum determination
      Variable Obs Mean Std. Dev. Min Max
      determination 16194 4.414692 1.058 1.359118 6.795589

      • #4
        So, it is as I suspected. Your -factormat- command does not specify the -means()- and -sds()- options. Consequently, when you run -predict-, -predict- assumes that the underlying variables have mean 0 and sd 1, which is manifestly not the case.

        From the help file for -factormat-:

        sds(matname2) specifies a k x 1 or 1 x k matrix with the standard deviations of the variables. The row or column names should match the variable names, unless the names() option is specified. sds() may be specified only if matname is a correlation matrix. Specify sds() if you have variables in your dataset and want to use predict after factormat. sds() does not affect the computations of factormat but provides information so that predict does not assume that the standard deviations are one.

        means(matname3) specifies a k x 1 or 1 x k matrix with the means of the variables. The row or column names should match the variable names, unless the names() option is specified. Specify means() if you have variables in your dataset and want to use predict after factormat. means() does not affect the computations of factormat but provides information so that predict does not assume the means are zero.
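
        For completeness, one way to build such matrices directly from the data is sketched below (the matrix names M and S are just placeholders, and this assumes the $insight_new global you defined):

        * 1 x 8 matrices (initially missing) to hold the means and standard deviations
        matrix M = J(1, 8, .)
        matrix S = J(1, 8, .)
        local i = 1
        foreach v of global insight_new {
            quietly summarize `v'
            matrix M[1, `i'] = r(mean)
            matrix S[1, `i'] = r(sd)
            local ++i
        }
        * name the columns so they match the variables in the correlation matrix
        matrix colnames M = $insight_new
        matrix colnames S = $insight_new
        factormat R, n($N) mineigen(1) blanks(.4) pcf means(M) sds(S)
        predict determination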

        • #5
          Sir, I appreciate your help very much! Thanks for pointing it out!! I am not an advanced Stata user, as you can tell. I gave it a go and resolved the issue as follows. Could you please remark on whether the steps I took were correct?

          ***Continuing with the variables I introduced above***

          tabstat $insight_new, stat(mean) save

          ***With tabstatmat, I stored the vector of means in a matrix named mean***
          tabstatmat mean

          . mat list mean

          ***Stata output***
          mean[1,8]
          insight10 insight09 insight03 insight11 insight06 insight08 insight07 insight04
          mean 3.4354082 2.9591207 3.6004076 2.9574534 3.3228356 2.5898481 3.7545387 3.4473262

          tabstat $insight_new, stat(sd) save

          ***With tabstatmat, I stored the vector of standard deviations in a matrix named stdev***
          tabstatmat stdev

          . mat list stdev

          ***Stata output***
          stdev[1,8]
          insight10 insight09 insight03 insight11 insight06 insight08 insight07 insight04
          sd 1.0282533 1.1573788 1.1435377 1.1389503 1.1234056 1.0267157 1.0003795 1.1123946

          polychoric $insight_new
          global N = r(sum_w)
          matrix R = r(R)
          factormat R, n($N) mineigen(1) blanks(.4) sds(stdev) means(mean) pcf
          predict determination


          ***Summary statistics of the resulting factor variable***
          . sum determination
          Variable Obs Mean Std. Dev. Min Max
          determination 16194 -7.82e-10 .9621974 -2.800276 2.175893

          I have also uploaded a histogram of the variable.

          • #6
            Yes, this looks correct. And you will notice that the mean is now -7.82e-10, which is, for practical purposes, zero. (Rounding errors creep into the calculation of the factor scores, so a mean of exactly zero is not always obtained.)

            The standard deviation came out to be 0.9621974. That's fairly close to 1, but that may be coincidental. The predicted factors are not guaranteed to have standard deviation = 1. Now, factors are inherently dimensionless, so if it is convenient for you to rescale them to standard deviation = 1, you can do that easily with -egen, std()-.
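
            Something along these lines (a sketch; the new variable name is just an example):

            * rescale the factor score to mean 0, standard deviation 1
            egen determination_std = std(determination)
            summarize determination_std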

            • #7
              Thank you once more!! I did so: I rescaled the variable to a standardized one with a standard deviation of 1, which I think will help with interpretation later on. I will be using this variable as an independent variable in a binary logistic regression. If I am not mistaken, a one-unit increase in this variable will denote a one standard deviation increase? My online searches about the interpretation of coefficients of factor variables used as independent variables in regression analyses have been fruitless. Could you please point me to a source, if you know of any?

              Best regards,
              R

              • #8
                Well, there is nothing special about interpreting the coefficients of factor variables. They work the same way as the coefficients of any other variables. The coefficient is the expected difference in outcome associated with a unit change in the predictor variable. When the predictor variable is standardized to SD = 1, a unit change in the predictor variable is the same as a 1 SD change.
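
                For concreteness (a sketch only; the binary outcome variable y here is hypothetical):

                * logistic regression of a binary outcome on the standardized factor score;
                * each coefficient is the change in the log odds of y per 1 SD increase in the score
                logit y determination_std
                * or report odds ratios instead
                logit y determination_std, or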

                The challenge is typically in explaining the factors themselves. They are latent variables that capture shared variance among a set of manifest variables, estimated as linear combinations of the manifest variables. And it often takes considerable knowledge of the content and science of your discipline, as well as a hefty dose of creativity to attach a meaning to a factor. In any case, deciding what they mean is a blend of art and science; but it is not statistics. The best statistician in the world, without understanding of the underlying science, will be of no help for that.

                Finally, let me suggest that if your plan is to use factors as variables in other regression analyses, you might want to use structural equation modeling (-sem- or -gsem- in Stata) rather than your current approach of estimating factor scores and using those in a regression.

                • #9
                  Thank you for your valuable advice and instructions. I am going to look into -sem-.

                  Best regards!!

                  • #10
                    You're welcome. Since you are interested in polychoric correlations, you should look at -gsem- in particular, as you will probably want a -logit- link for some of the regressions in your model.
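
                    In outline, something like this (a sketch only; y again stands for a hypothetical binary outcome, and Determination is simply the name chosen for the latent variable):

                    * ordinal (ologit) measurement model for the items, logit link for the outcome
                    gsem (Determination -> insight10 insight09 insight03 insight11 insight06 insight08 insight07 insight04, ologit) ///
                         (y <- Determination, logit)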
