Exploratory Factor Analysis with missing values and ordinal data: EM algorithm and polychoric correlations?

Svane Blume

Join Date: Sep 2019

Posts: 5
#1

25 Sep 2019, 10:02

Dear Statalist users,

I am using Stata 15.1 and trying to run an exploratory factor analysis (EFA). I have 32 ordinal variables (4- to 5-point Likert scales) on nursing care quality during recent hospital stays. Some of them contain missing values.

The data set contains N=1,727 observations with only 44.5% complete cases. I assume MAR because missingness only depends on the observed variable "service XY was not necessary during my hospital stay".
With my boss, I agreed on using a matrix of EM covariances as input for the EFA to address the missingness (as suggested by John Graham, 2009). However, since the variables are ordinal, I would also like to use polychoric correlations. Is there any chance to combine the command

Code:

polychoric

(package from Stata Journal) with

Code:

mi impute mvn varlist, emonly

to obtain a covariance matrix that can be used as input for

Code:

factormat

afterwards? Or is there another way of addressing the ordinal data structure when using the EM algorithm?
Any suggestions would be highly appreciated!
Thanks a lot, Svane
daniel klein

Join Date: Mar 2014

Posts: 3842
#2

25 Sep 2019, 13:09

Originally posted by Svane Blume:

I assume MAR because missingness only depends on the observed variable "service XY was not necessary during my hospital stay".

Usually, when we think about multiple imputation (or related methods), we assume that there is a "true" value that we merely do not observe; it is "masked" by a missing value. From what you describe here, I would not be sure that this assumption is plausible. If someone did not experience a certain type of service, does it really make sense to impute their judgment of the quality of that service?

Technically, you could, for example, impute missing values using an ordered logit model, run polychoric on each imputed dataset, combine the results (perhaps using a proper transformation), and feed the final matrix to factormat. Another possibility might be to run both polychoric and factormat on each imputed dataset and combine the results.

Which approach is best suited, I cannot really tell.

Best
Daniel
Svane Blume

Join Date: Sep 2019

Posts: 5
#3

26 Sep 2019, 05:14

Thanks for your thoughts, Daniel.

I, too, was wondering whether multiple imputation makes sense given that the question was actually not applicable to the subset of individuals producing the missing values.
But what would be an adequate alternative? Pairwise deletion is said to produce biased estimates and requires MCAR. So I thought of the imputation as a prediction: "What would the individual's judgement of the service quality have been if he or she had experienced it?" I could swear I've read that multiple imputation is superior to other approaches even in such cases of intentional missing data, but - shame on me - I can't find it at the moment.

Turning back to the technical aspect:
As far as I can see, ordered logit imputation models are only available for univariate regressions. I thought I would need to apply multivariate models and didn't even think about ordered logit (also, I am completely inexperienced and a bit overwhelmed by the many degrees of freedom in my estimation).
Would you say that univariate ordered logit is definitely superior to the multivariate normal regression?

I think I would apply your first approach: run polychoric on each imputed dataset, combine the results, and use the resulting matrix for factormat. With the second proposal, I would have to run the factor rotation on each imputed dataset as well. As Lorenzo-Seva & van Ginkel (2016) state, this is only possible with consensus rotation, which ensures that the rotation results are comparable and can be pooled afterwards. Consensus rotation, however, is not available in Stata.

Is it relatively straightforward to combine the resulting matrices manually? Time is running out and I have to find the best pragmatic solution.

Thanks again!
Best
Kai
daniel klein

Join Date: Mar 2014

Posts: 3842
#4

26 Sep 2019, 07:35

Originally posted by Svane Blume:

I could swear I've read that multiple imputation is superior to other approaches even in such cases of intentional missing data, but - shame on me - I can't find it at the moment.

"Intentional" missing has no clear definition that I am aware of. However, there is a difference between not asking respondents a question for which an answer in principle exists and not asking a question for which there simply is no answer.

Originally posted by Svane Blume:

So I thought of the imputation as a prediction

Predictions have their place (e.g., in counterfactual causality frameworks or in forecasting the stock market); only you can judge whether they also make sense in your case. I am not saying they do not.

Originally posted by Svane Blume:

But what would be an adequate alternative?

You could look into gsem which, according to the manual, is an "equationwise" deleter. That seems a very different route and I cannot advise much further without looking into it more closely.
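To make the gsem idea a little more concrete, here is a minimal, untested sketch. The item names q1-q8 and the latent variable name Care are hypothetical, and note that gsem fits a measurement model you specify, so this is not an exploratory analysis in the factor sense.

```stata
* Hypothetical ordinal items q1-q8 loading on one latent variable, Care.
* Latent variable names in gsem must start with a capital letter.
* By default gsem deletes equationwise: each item's equation uses every
* observation that is nonmissing for that item.
gsem (Care -> q1-q8, ologit)

* For comparison only: the listwise option restricts estimation to
* observations that are complete on all items.
gsem (Care -> q1-q8, ologit) listwise
```

Comparing the two fits would at least show how much the complete-case restriction changes the loadings.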

Originally posted by Svane Blume:

As far as I can see, ordered logit imputation models are only available for univariate regressions.

No. All univariate methods* can be used in a chained-equation (often called [M]ICE) approach.
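A minimal sketch of such a chained-equations setup with ordered logit follows. The variable names q1-q32, the number of imputations, and the seed are placeholders, not recommendations, and with 32 ordinal items the imputation models may well need simplifying before they converge.

```stata
* Hypothetical item names q1-q32; adjust to the actual variable list.
mi set flong
mi register imputed q1-q32

* Chained equations: each item is imputed by ordered logit conditional
* on all other items. add(20) and rseed(2019) are arbitrary choices;
* augment requests augmented regression to cope with perfect prediction.
mi impute chained (ologit) q1-q32, add(20) rseed(2019) augment
```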

Originally posted by Svane Blume:

Would you say that univariate ordered logit is definitely superior to the multivariate normal regression?

No. I think that using multivariate normal (or the correlation matrix from EM) is not really compatible with the idea of following up with polychoric correlation.

Originally posted by Svane Blume:

I think I would apply your first approach: run polychoric on each imputed dataset, combine the results, and use the resulting matrix for factormat. With the second proposal, I would have to run the factor rotation on each imputed dataset as well. As Lorenzo-Seva & van Ginkel (2016) state, this is only possible with consensus rotation, which ensures that the rotation results are comparable and can be pooled afterwards. Consensus rotation, however, is not available in Stata.

Thanks for pointing me to this; I cannot comment because I have not read that (or related) work. If you find the time, we appreciate full citations here on Statalist.

Originally posted by Svane Blume:

Is it relatively straightforward to combine the resulting matrices manually? Time is running out and I have to find the best pragmatic solution.

That probably depends on your experience with Stata. Here is a quick draft:

Code:

*! version 1.0.0 26sep2019
program mipolychoric , rclass
    version 15

    syntax varlist(min = 2 numeric)

    // stop unless the data are mi set
    u_mi_assert_set

    local M `_dta[_mi_M]'
    local nvar : word count `varlist'

    tempname R Z

    matrix `R' = J(`nvar', `nvar', 0)

    // in each imputed dataset: estimate the polychoric correlations,
    // apply Fisher's z transformation, and add up the results
    mi xeq 1/`M' : polychoric `varlist'; ///
        mata : st_matrix("`Z'", atanh(st_matrix("r(R)"))); ///
        matrix `R' = `R' + `Z'

    // average over the M imputations and transform back
    mata : st_matrix("`R'", tanh(st_matrix("`R'")/`M'))

    // atanh(1) is missing, so reset the diagonal to 1
    forvalues i = 1/`nvar' {
        matrix `R'[`i', `i'] = 1
    }

    matrix rownames `R' = `varlist'
    matrix colnames `R' = `varlist'

    matlist `R'

    return matrix R = `R'
end
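The draft averages the Fisher z-transformed (atanh) polychoric correlations over the imputations and transforms the mean back with tanh. A possible way to use it is sketched below; this is untested and assumes that the data are already mi set and imputed, that polychoric is installed, and that the items are named q1-q32 (hypothetical names). The number of factors and the rotation are placeholders.

```stata
* Pool the polychoric correlations across imputations.
mipolychoric q1-q32

* Copy the pooled matrix out of r() before another command overwrites it.
matrix R = r(R)

* n(1727) is the sample size given in the first post; factors(3) and
* the promax rotation are placeholders, not recommendations.
factormat R, n(1727) factors(3)
rotate, promax
```

A pooled matrix is not guaranteed to be positive semidefinite; if factormat complains, its forcepsd option may be worth a look.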

Best
Daniel

* This might not be true for user-defined methods; if so, that should perhaps go on the Wishlist for Stata 17.
Svane Blume

Join Date: Sep 2019

Posts: 5
#5

26 Sep 2019, 08:05

Thanks a lot for your helpful suggestions and the code!

I will do more research on the different possible paths and will hopefully be able to make a decision afterwards.

The full citation of the paper on multiple imputation for EFA using consensus rotation is the following:

Lorenzo-Seva, U. & van Ginkel, J. R. (2016): Multiple Imputation of missing values in exploratory factor analysis of multidimensional scales: estimating latent trait scores, Anales de Psicología 32 (2): 596.

Best
Kai