
  • Exploratory Factor Analysis with missing values and ordinal data: EM algorithm and polychoric correlations?

    Dear Statalist users,

    I am using Stata 15.1 and am trying to run an exploratory factor analysis (EFA). I have 32 ordinal variables (4- to 5-point Likert scales) on nursing care quality during recent hospital stays. Some of them contain missing values.

    The data set contains N=1,727 observations with only 44.5% complete cases. I assume MAR because missingness only depends on the observed variable "service XY was not necessary during my hospital stay".
    With my boss, I agreed on using a matrix of EM covariances as input for the EFA to address the missingness (as suggested by John Graham 2009). However, since the variables are ordinal, I would also like to use polychoric correlations. Is there any chance to combine the command
    Code:
    polychoric
    (package from Stata Journal) with
    Code:
    mi impute mvn varlist, emonly
    to obtain a covariance matrix that can be used as input for
    Code:
    factormat
    afterwards? Or is there another way of addressing the ordinal data structure when using the EM algorithm?
    Any suggestions would be highly appreciated!
    Thanks a lot, Svane


  • #2
    Originally posted by Svane Blume
    I assume MAR because missingness only depends on the observed variable "service XY was not necessary during my hospital stay".
    Usually, when we think about multiple imputation (or related methods), we assume that there is a "true" value that we merely do not observe; it is "masked" by a missing value. From what you describe here, I would not be sure that this assumption is plausible. If someone did not experience a certain type of service, does it really make sense to impute their judgment of the quality of that service?

    Technically, you could, for example, impute missing values using an ordered logit model, run polychoric on each imputed dataset, combine the results (perhaps using a proper transformation) and feed the final matrix to factormat. Another possibility might be to run both polychoric and factormat on each imputed dataset and combine the results.
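
    One candidate for such a transformation is Fisher's z (the inverse hyperbolic tangent). As a sketch only, assuming M imputed datasets and writing r_jk^(m) for the polychoric correlation between items j and k in dataset m (notation introduced here purely for illustration), the correlations would be averaged on the z scale and transformed back:

    ```latex
    \bar{z}_{jk} = \frac{1}{M} \sum_{m=1}^{M} \operatorname{artanh}\bigl(r_{jk}^{(m)}\bigr),
    \qquad
    \bar{r}_{jk} = \tanh\bigl(\bar{z}_{jk}\bigr)
    ```

    For example, with M = 2 and correlations 0.5 and 0.6, the pooled value is tanh((0.5493 + 0.6931)/2) ≈ 0.552, close to but not identical to the simple mean of 0.55.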

    Which approach is best suited, I cannot really tell.

    Best
    Daniel



    • #3
      Thanks for your thoughts Daniel.

      I, too, was wondering whether multiple imputation makes sense given that the question was actually not applicable to the subset of individuals producing the missing values.
      But what would be an adequate alternative? Pairwise deletion is said to produce biased estimates and requires MCAR. So I thought of the imputation as a prediction: "What would have been the individual's judgement of the service quality if he or she had experienced it?" I could swear I've read that multiple imputation is superior to other approaches even in such cases of intentional missing data, but - shame on me - I can't find it at the moment.

      Turning back to the technical aspect:
      As far as I can see, ordered logit imputation models are only available for univariate regressions. I thought I would need to apply multivariate models and didn't even think about ordered logit (also, I am completely inexperienced and a bit overwhelmed by the various degrees of freedom in my estimation).
      Would you say that univariate ordered logit is definitely superior to the multivariate normal regression?

      I think I would go with your first approach: run polychoric on each imputed dataset, combine the results, and use the resulting matrix for factormat. With the second proposal, I'd have to run factor rotation on each imputed dataset as well. As Lorenzo-Seva & van Ginkel (2016) state, that is only possible using consensus rotation, which ensures that the rotation results are comparable and can be pooled afterwards. Consensus rotation, however, is not available in Stata.

      Is it relatively straightforward to combine the matrices manually? Time is running out and I have to find the best pragmatic solution.

      Thanks again!
      Best
      Kai



      • #4
        Originally posted by Svane Blume
        I could swear I've read that multiple imputation is superior to other approaches even in such cases of intentional missing data, but - shame on me - I can't find it at the moment.
        "Intentional" missing has no clear definition that I am aware of. However, there is a difference between not asking respondents a question for which an answer in principle exists and not asking a question for which there simply is no answer.

        Originally posted by Svane Blume
        So I thought of the imputation as a prediction
        Predictions have their place (e.g., in counterfactual causality frameworks or forecasting at the stock market); only you can judge whether they also make sense in your case; I am not saying they do not.

        Originally posted by Svane Blume
        But what would be an adequate alternative?
        You could look into gsem which, according to the manual, is an "equationwise" deleter. That seems a very different route and I cannot advise much further without looking into it more closely.
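
        As a rough illustration of what that route might look like, here is a minimal sketch. The item names q1-q4 and the single latent factor F are hypothetical, and a model of this kind is closer to a confirmatory one-factor model than to an EFA, so it shows the syntax rather than a recommended specification.

        ```stata
        * Sketch only: q1-q4 are hypothetical ordinal items, F is a hypothetical latent factor.
        * Each item gets its own ordered logit equation; by default gsem deletes
        * equationwise, so a respondent contributes to every item he or she answered.
        gsem (F -> q1 q2 q3 q4, ologit)
        ```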

        Originally posted by Svane Blume
        As far as I see, ordered logit imputation models are only available for univariate regressions.
        No. All univariate methods* can be used in a chained-equation (often called [M]ICE) approach.
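
        A minimal sketch of such a chained-equations setup with ordered logit might look as follows. The variable names q1-q32, the number of imputations, and the seed are assumptions chosen for illustration, not values from this thread.

        ```stata
        * Sketch only: q1-q32 are hypothetical names for the 32 ordinal items
        mi set wide
        mi register imputed q1-q32

        * Each item is imputed from an ordered logit on all the other items;
        * add(20) and rseed(12345) are arbitrary illustrative choices
        mi impute chained (ologit) q1-q32, add(20) rseed(12345)
        ```

        If some of the ordered logit models run into perfect prediction, which is not unlikely with this many categorical items, specifying the method as (ologit, augment) may help.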

        Originally posted by Svane Blume
        Would you say that univariate ordered logit is definitely superior to the multivariate normal regression?
        No. I think that using multivariate normal (or the correlation matrix from EM) is not really compatible with the idea of following up with polychoric correlation.

        Originally posted by Svane Blume
        I think I would apply your first approach: run polychoric on each imputed dataset, combine the results and use the resulting matrix for factormat, because applying the second proposal, I'd have to run factor rotation on each imputed dataset as well, which as Lorenzo-Seva & van Ginkel (2016) state, is only possible using consensus rotation which ensures that the rotation results are comparable and can be pooled afterwards. Consensus rotation, however, is not available in Stata.
        Thanks for pointing me to this; I cannot comment because I have not read that (or related) work. If you find the time, we appreciate full citations here on Statalist.

        Originally posted by Svane Blume
        Is it relatively straightforward to manually combine results of matrices? Time is running and I have to find the best pragmatic solution.
        That probably depends on your experience with Stata. Here is a quick draft:

        Code:
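        *! version 1.0.0 26sep2019
        * pool polychoric correlations across imputed datasets via Fisher's z
        program mipolychoric , rclass
            version 15
            
            syntax varlist(min = 2 numeric)
            
            // data must be mi set; M is the number of imputations
            u_mi_assert_set
            local M `_dta[_mi_M]'
            
            local nvar : word count `varlist'
            
            tempname R Z
            
            // running sum of the z-transformed correlation matrices
            matrix `R' = J(`nvar', `nvar', 0)
            
            // for each imputed dataset: run polychoric, apply atanh(), add to the sum
            mi xeq 1/`M' : polychoric `varlist';                              ///
                           mata : st_matrix("`Z'", atanh(st_matrix("r(R)"))); ///
                           matrix `R' = `R' + `Z'
            
            // average on the z scale and transform back with tanh()
            mata : st_matrix("`R'", tanh(st_matrix("`R'")/`M'))
            // the diagonal is not meaningful on the z scale; reset it to 1
            forvalues i = 1/`nvar' {
                matrix `R'[`i', `i'] = 1
            }
            matrix rownames `R' = `varlist'
            matrix colnames `R' = `varlist'
            
            matlist `R'
            
            return matrix R = `R'
        end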
        Best
        Daniel


        * This might not be true for user-defined methods; if so, that should perhaps go on the Wishlist for Stata 17.
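
        For illustration, a call to mipolychoric followed by factormat might look like the sketch below. It assumes that the data are already mi set with at least one imputation, that polychoric is installed, and that mipolychoric has been defined as in the draft above. The variable names q1-q32, the matrix name Rpooled, and the number of factors are hypothetical choices.

        ```stata
        * Sketch only: q1-q32 are hypothetical names for the 32 ordinal items
        mipolychoric q1-q32
        matrix Rpooled = r(R)

        * n(1727) is the sample size reported in the first post;
        * factors(3) is an arbitrary illustrative choice
        factormat Rpooled, n(1727) factors(3)
        rotate, promax
        ```

        An averaged correlation matrix is not guaranteed to be positive semidefinite; factormat offers a forcepsd option for that case.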



        • #5
          Thanks a lot for your helpful suggestions and the code!

          I will do more research on the different possible paths and will hopefully be able to make a decision afterwards.

          The full citation of the paper on multiple imputation for EFA using consensus rotation is the following:

          Lorenzo-Seva, U. & van Ginkel, J. R. (2016): Multiple Imputation of missing values in exploratory factor analysis of multidimensional scales: estimating latent trait scores, Anales de Psicología 32 (2): 596.

          Best
          Kai

