Create factor score after multiple imputation

Lisa van Dijk

Join Date: Apr 2020

Posts: 6
#1

24 May 2022, 04:12

Dear all,

I have a question re: the creation of a factor score after multiple imputation.

In my main analysis, I use a factor score created from six items (using factor items*, then predict f1). As a robustness check, I have created multiple imputations of the missing values on those six items (entered individually).
I now want to recreate the factor score from the six items using the imputed datasets (using mi passive). Yet, it returns the following error: factor invalid mi passive subcommand.

Your advice is very welcome!
daniel klein

Join Date: Mar 2014

Posts: 3842
#2

24 May 2022, 05:36

mi passive does not support factor. That is just a technical issue. The substantively more complicated question is how to create factor scores for multiply imputed data. There is probably literature on that, which I have not read.

Given M imputed datasets, factor will produce M sets of factor loadings. If we view those loadings as population estimates, we might want to combine the loadings over the imputed datasets just like we would combine regression coefficients. We would then use the combined loadings to predict factor scores. Moreover, if we were to produce predictions in the same way as we would for regression models, we would average the predicted values over all datasets. We would then end up with constant factor scores within units across the M imputed datasets.

As an alternative, we might not want to average the predicted factor scores and just keep the imputed-dataset-specific predictions. That would lead to varying factor scores within units, based on the one set of combined loadings.

We could also not combine the factor loadings. That is, we could run both factor and predict for each imputed dataset, basing the factor scores on the imputed-dataset-specific loadings and predictions.

If you tell us which of those alternatives, if any, you want, we can move forward.

Edit/Addendum

While pointing to different possible approaches to obtain predicted values from multiply imputed datasets, my reply falsely implies that mi predict bases predictions on the combined coefficients. After reviewing the manual, I believe that this is not the case. Instead, mi predict produces averages (at the unit level) over the imputed-dataset-specific predicted values based on imputed-dataset-specific coefficients.
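To make the behavior described above concrete, here is a minimal sketch for an ordinary regression-type model. It assumes the mheart1s20 example dataset from the [MI] manual is available through webuse; the file name miest and the variable name xb_mi are arbitrary choices for illustration.

```stata
* Sketch only: example data and names are assumptions, not from this thread
webuse mheart1s20, clear

* Fit the model in each imputed dataset and save the
* imputation-specific estimates to a file
mi estimate, saving(miest, replace) : logit attack smokes age bmi female hsgrad

* Obtain MI linear predictions from the saved estimates
mi predict xb_mi using miest
```

Nothing comparable exists for factor, because factor cannot be run under mi estimate in the first place.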

Last edited by daniel klein; 24 May 2022, 06:07.
Lisa van Dijk

Join Date: Apr 2020

Posts: 6
#3

24 May 2022, 08:34

Dear Daniel,

Thank you for your answer.

The option of combining the loadings so as to create a combined factor score sounds great. If I understood correctly, that approach mimics what would happen if 'factor' would work under 'mi passive'.
Do you have a concrete suggestion as to how I can implement / specify this approach in my Stata do-file?

Thank you in advance!
daniel klein

Join Date: Mar 2014

Posts: 3842
#4

24 May 2022, 09:40

Originally posted by Lisa van Dijk

If I understood correctly, that approach mimics what would happen if 'factor' would work under 'mi passive'.

I think you might slightly misunderstand what mi passive does. The prefix command is for creating (or changing) variables; it works only with generate, egen, and replace. In contrast, factor does not create or change variables; it produces estimates. Combining the loadings would therefore be more closely related to mi estimate. However, there are a couple of problems with (exploratory) factor analysis in multiply imputed data, both technical and substantive. Technically, factor does not return e(b), i.e., a coefficient vector, but a loadings matrix (unless there is only one factor). Moreover, factor does not compute a covariance matrix for the loadings, something that mi estimate expects to see. Substantively, there is no guarantee that factor will produce the same number of factors (in the same sequence) in each imputed dataset. Obviously, a different number of factors and/or a different order would probably pose technical challenges, too. Overall, it is not trivial to get factor to work with mi estimate.

If I wanted to do this, I would start with getting the data into flong style (if they are not already).

Code:

mi convert flong

Then, I would run factor on each imputed dataset and get the predicted values for the first factor.

Code:

mi xeq : factor ... ; predict f1

At this point, we could already use the dataset-specific predicted factor scores. And, that might be the better alternative.

Anyway, we could also take this a step further and apply Rubin's rules to the factor scores at the unit-level.

Code:

mi query
local M = r(M)
sort _mi_id _mi_m
by _mi_id (_mi_m) : generate mi_f1 = sum(f1*(_mi_m!=0))/`M'
by _mi_id (_mi_m) : replace mi_f1 = mi_f1[_N]

Note that we must exclude the predicted values in the original dataset (_mi_m == 0) when summing up to get the averages. The resulting predicted factor scores will be constant within units. You can now use both f1 and mi_f1 and see whether the results differ.
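As a sanity check on the averaging step, one could recompute the unit-level mean with egen and compare it with mi_f1. This is only a sketch: chk_f1 is a hypothetical variable name, and the tolerance is an arbitrary choice that allows for mi_f1 being stored as float.

```stata
* Sketch: cross-check mi_f1 against a plain mean of f1 over m = 1, ..., M
* chk_f1 is a hypothetical name; f1 and mi_f1 are the variables created above
bysort _mi_id : egen double chk_f1 = mean(cond(_mi_m > 0, f1, .))

* The two should agree when f1 is nonmissing in every imputed dataset
assert reldif(mi_f1, chk_f1) < 1e-5 if !missing(chk_f1)

drop chk_f1
```

If the assert fails, f1 is probably missing in some imputed datasets. In that case the running sum treats the missing scores as zero while still dividing by M, so mi_f1 would be pulled toward zero for those units.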

Note that the suggested code does nothing to check for an equal number of factors (in the same order) across imputed datasets.
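One simple way to eyeball the number of retained factors is to display it after each run. A minimal sketch, assuming the six items are named item1-item6 (hypothetical names; substitute your own variable list and factor options):

```stata
* Sketch: item1-item6 are hypothetical item names
* e(f) holds the number of factors retained by the most recent factor run
mi xeq : factor item1-item6 ; display "retained factors: " e(f)
```

By default, mi xeq also runs the commands on the original data (m = 0), so the first count shown refers to the analysis of the incomplete data.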

Edit/Addendum

I would like to point to sem as related to factor analysis and as another possible route. Because structural equation modeling and CFA are conceptually different topics, I will stick closely to the original question and not go into details.

Last edited by daniel klein; 24 May 2022, 09:51.