
  • Factor analysis - Not use the factors but the scale mean?

    Hello,

    I am very new to Stata and statistical analysis and have a lot of questions, sorry if they are stupid. =)
    I tried to figure out the basics myself, but now I am not very sure how to move forward.

    I have used a scale consisting of 10 items in a questionnaire and conducted principal-component factor analysis, which resulted in two factors explaining about 65% of the total variance.
    Cronbach's alpha as well as the KMO measure for the 10 items is larger than 0.8.

    Now there is no big difference in subject between the items of factor 1 and the items of factor 2, and I would like to use a score over the whole 10-item scale for further analysis.
    E.g., I would calculate the mean score over the 10 items' values.

    Is there a way I could argue for doing it this way instead of using the two factors?
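
    For reference, the steps I describe would look roughly like this (item1-item10 are placeholder names for my actual items):

      factor item1-item10, pcf                // principal-component factor analysis
      estat kmo                               // Kaiser-Meyer-Olkin measure
      alpha item1-item10                      // Cronbach's alpha for the 10 items
      egen cet_mean = rowmean(item1-item10)   // mean score over the 10 items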

    Thanks a lot in advance!

  • #2
    Without a reproducible example you are unlikely to get useful advice.
    As for creating scores after factor analysis, you should read -help factor postestimation- (in particular the -predict- section) as well as -help rotate-.
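
    For instance, a minimal sketch of predicting scores after rotation (the item names are hypothetical):

      factor item1-item10, pcf factor(2)
      rotate, varimax
      predict f1 f2        // regression scoring is the default; see -help factor postestimation-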



    • #3
      I did the varimax rotation after the factor analysis, resulting in two factors, one containing 6 items and one containing 4 items, each with loadings > 0.6.
      My problem is that I used an existing scale (CETSCALE) which should measure one construct called consumer ethnocentrism.
      It seems to me that it does not make much sense to split this scale up into two components, when the items do not really measure two distinct factors, so I would like to have one factor.
      Is this possible if the factor analysis tells me that there should be two factors?



      • #4
        I think you are answering your own question. If you ask the factor analysis for a two-factor solution, then you can predict two scores. Although you could always use just the first predicted score in subsequent analyses, that sounds like a bad idea to me. If you want one score, why not just force the factor analysis to produce a one-factor solution?
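
        For example, a one-factor solution would be forced like this (hypothetical item names):

          factor item1-item10, pcf factor(1)
          predict score1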



        • #5
          It sounds like you are using an existing scale which previous researchers have declared to be unidimensional, meaning that you expected to find one factor. If your intent is to confirm that one factor is adequate, you should be doing a confirmatory factor analysis (CFA) rather than an exploratory one, and in any case you should not be using a principal component solution. Further, the varimax rotation that you did assumes that the two components (not factors) you obtained are uncorrelated, which is unlikely to be the case.

          So, it seems to me that you have two choices here: (a) use the existing scale uncritically, citing the source that tells you it is unidimensional or (b) make a serious effort to test the previous assertion. It may well be the case that the scale is not unidimensional in your particular population.

          You tell us that you are fairly new to this, and I realize that asking you to jump into confirmatory factor analysis may be a bit of a leap if you are unfamiliar with it. But the current state of the art for testing whether the factor structure of a set of items in your own data matches previously reported results probably requires you to do CFA if you intend to publish this. I say "probably" because I don't know your field and what standard practice is within it. In any case, take a look at examples 1 and 3 in the SEM manual.
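
          As an illustration only, a one-factor CFA along the lines of those examples might look like this (item names are hypothetical; latent variable names in -sem- must begin with a capital letter):

            sem (Ethno -> item1-item10)
            estat gof, stats(all)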
          Richard T. Campbell
          Emeritus Professor of Biostatistics and Sociology
          University of Illinois at Chicago



          • #6
            CFA and PCF are not equivalent methods, but @Dick Campbell's advice is sound nonetheless. In CFA you are making assumptions about a causal relationship between theta and the probability of a given response. In PCA/PCF the same assumption is not a requirement; rather, it is a form of singular value decomposition. I would second the suggestion to use CFA, and would go a step further to say that if there is an intent to use some sort of scale in secondary analyses, you should use SEM to do so. The reason is that the regression scores from the PCF will not allow you to adjust your subsequent parameter estimates for the uncertainty inherent in the PCF scores, while SEM would allow the simultaneous estimation of the scale, its error, and the model you are interested in fitting to your data in the end.
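
            A sketch of that simultaneous approach, with a hypothetical outcome y and covariate x1:

              sem (Ethno -> item1-item10) (y <- Ethno x1)   // measurement model and structural model fit together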



            • #7
              NO, FA and PCA are not equivalent methods, which is why I said "you should not be using a principal component solution." In fact, the term "principal components factor analysis" is a contradiction in terms. This is a point that may seem pedantic, but it is not. PCA carries with it a set of assumptions that one typically does not want to make.
              Richard T. Campbell
              Emeritus Professor of Biostatistics and Sociology
              University of Illinois at Chicago



              • #8
                While I agree with nearly all of what Dick Campbell said about PCA, I'm surprised he thinks it is unlikely to be the case that the two components are uncorrelated. My understanding of PCA is that the components are always orthogonal. They are, after all, eigenvectors of a symmetric matrix. And one of the purposes for which PCA is sometimes used is to replace a group of regression predictor variables with independent predictors. Am I missing something?



                • #9
                  Any chance we can get the actual commands used? -pca- and -factor, pcf- are very different animals. I also support CFA instead.
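
                  That is, the difference between, e.g. (hypothetical item names):

                    pca item1-item10             // principal component analysis
                    factor item1-item10, pcf     // principal-component factoring
                    factor item1-item10, ml      // maximum-likelihood factor analysis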



                  • #10
                    Thanks for all of your comments! I will try to understand and apply CFA now (I got the point why this makes more sense for an existing scale) and may come back with follow-up questions.
                    Unfortunately, unidimensionality has not been proven. The CETSCALE has been applied very often in different studies, some of them finding unidimensionality, some of them finding multidimensionality (mostly with two factors).

                    @ben earnhart: I used "factor, pcf".



                    • #11
                      I tried to conduct a CFA using sem, following examples 1 and 3 of the sem help.
                      Though I am not quite sure whether I did it correctly or how to interpret the results (I consulted the Wikipedia entry on CFA).
                      It seems to me that the two-factor model is more appropriate than the one-factor model, although neither shows very good goodness-of-fit measures.

                      Overall, I am a bit confused now and not sure how to proceed.
                      What I would like to do is use a variable which represents this scale: the mean of the 10 items' values for each respondent. (Each of the items uses a 7-point Likert scale.)

                      This mean score variable should then be used as one of the independent variables in a multinomial logit model.
                      I am not sure what to do with two factors which really measure the same construct (there might be a slight difference in the strength of the statements which could justify the two factors, but overall, they all measure the attitude of consumers towards products made in their country vs. foreign-made products).

                      I also tried Oded Mcdossi's proposal to force factor analysis to produce only one factor. Factor loadings for all variables are >0.6 (except for one variable which is around 0.55). But the uniqueness values are relatively high.
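
                      Concretely, what I have in mind is something like this (choice is a placeholder for my actual dependent variable):

                        egen cet_mean = rowmean(item1-item10)
                        mlogit choice cet_mean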



                      • #12
                        Likert-type response sets are ordinal, so a mean forces a major assumption on the data (e.g., that all respondents interpreted the response set exactly the same, including the distance between the options). Rather than just saying what you tried, it would be good to show the syntax you used along with the output for additional help.

                        With ordinal response sets, -gsem- would be a better approach in terms of fitting the CFA (you can use -ologit- to specify the family and link functions). If you use AIC or BIC for model comparisons, you would want the model with the smaller value on the information criterion you use.

                        If you do find that the unidimensional model fits the data better, you could also try fitting a Partial Credit, Graded Response, or Rating Scale model to your data to calibrate the items and get an estimate of theta that will be independent of the item pool. (From what you've said, you could probably get a second paper out of an IRT approach; and if others have done any IRT modeling, you could see whether using the existing item parameters fits the model better, or check for things like parameter drift and/or DIF.)
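
                        For instance, a rough sketch of the ordinal CFAs and the comparison (item names, and the split of items between the two factors, are hypothetical):

                          gsem (F -> item1-item10, ologit)       // one-factor ordinal CFA
                          estimates store onefac
                          gsem (F1 -> item1-item6, ologit) (F2 -> item7-item10, ologit), cov(F1*F2)
                          estimates store twofac
                          estimates stats onefac twofac          // compare AIC/BIC; smaller is better
                          irt grm item1-item10                   // a Graded Response model, if you take the IRT route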



                        • #13
                          I have not really figured out yet how to post output correctly in here. I installed dataex but am not sure how to use it. =)
                          Most of the things you just mentioned I have never heard of before, and I am not sure whether they might be a bit "over the top". I am a beginner to this whole topic and only need it for my Master's thesis, where it is a small part, so I am trying not to do anything which is complete nonsense, but I also don't have the capacity to go deep into the science here.
                          I read publications which used the same scale as a sum of its items' values (also 7-point Likert scale).



                          • #14
                            -dataex- is for posting sample data, not results. To post results, you should open a code block. You can do that by clicking on the A button at the top of your post window, which opens the advanced editor. Then look for a # button and click on it. That opens a code block, and you will see its beginning and ending delimiters. Then copy Stata's output directly from the Results window or your log file and paste it between those delimiters. It is important to show exactly what happened, so do not edit the output in any way!

                            Since you are doing this for a master's thesis, I suggest you speak to your thesis advisor about what level of statistical depth is expected. The advice you are getting about using -gsem- for CFA, etc., strikes me as appropriate for a master's thesis, and I would expect it in a thesis I advise. But the expectations in your discipline and at your institution may differ. Your thesis advisor should be able to clarify the expectations for you.



                            • #15
                              For many, many years it has been common practice in the social sciences to create a scale by simply summing the scores on a set of Likert items such as those you have. As Mr. Buchanan states, that practice cannot be justified on strictly mathematical grounds. With the availability of greatly increased computational power, the kinds of models that Mr. Buchanan describes have become increasingly popular. However, applying them does require a fair amount of study and experience.

                              There is a long history of simulation studies showing that treating Likert items of the kind you have as though they were continuous measures will often lead to reasonably correct conclusions. See, for example: Bollen, K. A., and Barb, K. H. (1981). Pearson's r and coarsely categorized measures. American Sociological Review 46(2): 232-239. If you were to search for subsequent papers which cite Bollen and Barb, you would find arguments taking both points of view.

                              In your case, trying to decide between a single- or two-factor model, the available literature suggests that estimates of factor loadings in CFA are reasonably unbiased when you violate the underlying multivariate normality assumption in standard CFA, but the goodness-of-fit statistics are not. In particular, the chi-square model fit statistic and various statistics based on it tend to indicate poor fit, particularly when the item distributions are badly skewed. The implication is that you will tend to require more factors to get a good model fit. This is a case, unfortunately, where you would want to use gsem (generalized structural equation modeling), which does not require a multivariate normality assumption. If you are going to pursue this further, I'd suggest you take a look at Alan Acock's excellent book Discovering Structural Equation Modeling Using Stata, Revised Edition. You can find the exact reference on the Stata Press website.

                              The two-factor CFA that you ran should give you an estimate of the correlation between the two factors. In a previous post I suggested that you would find such a correlation, and I wouldn't be surprised if it is fairly high. Even so, there is no reason why you can't compute two scale scores and use them in your multinomial logistic analysis. You will need to acknowledge that what you did is an approximation to a more correct analysis. All research involves compromises of one sort or another. We just need to be honest about them.
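
                              For example, the two-score route might look like this (variable names are hypothetical; the oblique promax rotation lets the factors correlate):

                                factor item1-item10, ml factor(2)
                                rotate, promax
                                predict f1 f2
                                mlogit choice f1 f2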
                              Richard T. Campbell
                              Emeritus Professor of Biostatistics and Sociology
                              University of Illinois at Chicago

