How to resolve errors on: polychoric: Matrix R has missing values when using Stata 12

Katherine Picho

Join Date: Apr 2014

Posts: 32
#1


26 Nov 2014, 07:22

Hi
I currently use Stata 12.
I have 25 test items assessing different components of clinical reasoning. Scores for each item range from 0-2 such that: 0 = wrong diagnosis, 1 = partially correct, and 2 = correct.

I have to compute test item reliability for each facet of clinical reasoning. Because the items are scored on a ranked, ordinal scale, step #1 is to obtain a polychoric correlation matrix, which I then use to compute ordinal reliability coefficients.

I have used the following commands:

Code:

polychoric x*
matrix r = r(R)
factormat r, n(20) factors(1)

However, I am running into problems obtaining this matrix, possibly because of the nature of my variables. My variables do have missing values, some more than others. Also, although every item is scored 0-2, some items were done very well (i.e. almost everyone got a 2), others were done 50/50 (i.e. mostly 1s), and some were so difficult that most respondents left them blank. For example, qtn 17 is blank for every observation except one, which scored a 1.

As a result, I keep getting error messages along these lines:

1. no variability in qtn17 (or other such questions)
2. can't calculate numerical derivatives, missing values encountered... no variables defined
3. Matrix R has missing values
The full output is below, followed by Stata's explanation of return code r(504), which I get when I look up error #3:

Code:

could not calculate numerical derivatives
missing values encountered
could not calculate numerical derivatives
missing values encountered
numerical derivatives are approximate
nearby values are missing
numerical derivatives are approximate
nearby values are missing
matrix r has missing values
r(504);


Code:

matrix has missing values; This return code is now infrequently used because, beginning with version 8, Stata now permits missing values in matrices.

But if Stata now permits missing values in matrices (and I'm using Stata 12), what option do I specify so that I can get the command to run?

How do I deal with missing values or lack of variability in items so that I can get the polychoric matrix? I'm pretty sure there has to be a way, because I can't imagine that every dataset is free of missing values.
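One way to see in advance which items will trip up -polychoric- is to screen each item's observed categories before fitting. The sketch below is in Python rather than Stata and assumes the scores have been exported as lists, with None for a blank response; the min_count cutoff of 5 is an arbitrary illustrative threshold, not something taken from -polychoric-'s documentation. An item like qtn 17, with a single non-blank response, is flagged as having no variability.

```python
from collections import Counter


def screen_items(data, min_count=5):
    """Flag items likely to cause trouble for a polychoric fit.

    data: dict mapping item name -> list of scores, None for a blank response.
    min_count: illustrative cutoff for a "sparse" category (an assumption).
    Returns a dict mapping item name -> reason, for flagged items only.
    """
    problems = {}
    for name, scores in data.items():
        # Count observed categories only; blanks (None) are ignored, and a
        # category nobody chose simply does not appear in the counts.
        counts = Counter(s for s in scores if s is not None)
        if len(counts) < 2:
            problems[name] = "no variability"  # zero or one observed category
        elif min(counts.values()) < min_count:
            problems[name] = "sparse category"  # some category has very few cases
    return problems
```

Items flagged "no variability" cannot enter a correlation at all; "sparse category" items may still run but are candidates for collapsing categories or closer inspection.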

Any help on this would be greatly appreciated.
daniel klein

Join Date: Mar 2014

Posts: 3850
#2

26 Nov 2014, 07:38

I have no clear idea here, but you might have some more fundamental problems beyond the fact that missing values are now present. Maybe instead of asking how to deal with them, you might want to consider where they come from, and whether this problem needs to be fixed in the first place. You might find this discussion useful.

Best
Daniel
Nick Cox

Join Date: Mar 2014

Posts: 35698
#3

26 Nov 2014, 08:13

Stata may permit missing values in matrices, but that's not going to help calculations such as eigenvalue-eigenvector extraction from a correlation matrix. It's true that some operations such as matrix addition are easy to extend when missing values are present, but how would you define the eigenvalues, for example?

There is no forcing here. Either you omit observations with missing values, or you get into imputation.
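Omitting observations with missing values (listwise deletion) also changes the sample size that should go into -factormat-'s n() option. A minimal sketch in Python, assuming each observation has been exported as a tuple of item scores with None marking a missing value:

```python
def listwise(rows):
    """Listwise deletion: keep only observations with no missing (None) entries.

    Returns (complete_rows, number_dropped).
    """
    complete = [r for r in rows if all(v is not None for v in r)]
    return complete, len(rows) - len(complete)
```

If the correlation matrix is computed on the complete cases only, the number of complete cases, not the original number of observations, is the natural value to pass as n().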

(polychoric is user-written from http://web.missouri.edu/~kolenikovs/stata as you are asked to specify.)

(Please note the request to use full real names here. In your case, you can't expect that everyone will remember your signature on some previous posts as Katherine Picho. It's easiest just to contact the administrators and get them to change your identifier.)
Katherine Picho

Join Date: Apr 2014

Posts: 32
#4

26 Nov 2014, 08:30

Thank you for your recommendations on the matrix situation. I will take a crack at imputation.

(Also, I sent an email to have the identifier changed. Thanks!)
ben earnhart

Join Date: May 2014

Posts: 1027
#5

26 Nov 2014, 11:35

Katherine --

MI may just work, now that I think about it. If -polychoric- can run off of multiply imputed datasets, to get one complete and usable matrix, then you're over the hump -- even if -factormat- doesn't recognize multiple matrices to impute from (which I'm sure it can't), all you need is the final one combining information from all the imputed ones.

Really messy no matter how you look at it, and I've thought of another complication. Is your n of 20 after listwise deletion, or before? If it's after, then maybe not so bad, depending on how many cases you lost along the way. If it's before, and (if I remember correctly) you have 25 items, then your factor analysis is pretty sketchy, even if -factormat- doesn't know the true sample size. See, for example, https://www.encorewiki.org/display/~...actor+Analysis for a nice write-up. If I had known your sample size was so small to begin with, I might have told you it was hopeless or near-hopeless.

Which brings us back to an idea I proposed in another thread (one that has since dropped off the front page). In http://www.statalist.org/forums/foru...using-stata-12 I suggested running it in pieces, with just six variables in one analysis, five in another, etc., on the assumption that the factor structure is already known. Does anybody see any problems with this approach to end-running sample-size issues? The ultimate goal is alpha, not determining factor structure.
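If the factor structure is taken as known, reliability for each facet can be computed from that facet's block of the correlation matrix. The Python sketch below applies the correlation-based (standardized) form of Cronbach's alpha, alpha = p/(p-1) * (1 - trace(R)/sum(R)), to each block; when R holds polychoric correlations this is what is usually called ordinal alpha. The facet-to-item mapping is hypothetical, and whether a block of the full matrix matches a separate run on just those items depends on how each run handles missing values.

```python
def alpha_from_corr(R):
    """Standardized Cronbach's alpha from a p x p correlation matrix (list of rows)."""
    p = len(R)
    if p < 2:
        raise ValueError("need at least two items")
    total = sum(sum(row) for row in R)      # sum of every entry of R
    trace = sum(R[i][i] for i in range(p))  # sum of the diagonal (p for a correlation matrix)
    return p / (p - 1) * (1 - trace / total)


def submatrix(R, idx):
    """Rows and columns of R restricted to the item positions in idx."""
    return [[R[i][j] for j in idx] for i in idx]


def facet_alphas(R, facets):
    """facets: dict mapping a (hypothetical) facet name -> list of item positions."""
    return {name: alpha_from_corr(submatrix(R, idx)) for name, idx in facets.items()}
```

For items whose pairwise correlations are all equal to r, this reduces to the familiar p*r / (1 + (p-1)*r).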
daniel klein

Join Date: Mar 2014

Posts: 3850
#6

26 Nov 2014, 13:31

The ultimate goal is alpha, not determining factor structure.

Did not follow this one closely, but if the factor structure, i.e. the underlying dimensions, is/are not of interest, then why would a factor analysis be necessary in the first place?

This points to yet other suggestions.

Best
Daniel
ben earnhart

Join Date: May 2014

Posts: 1027
#7

26 Nov 2014, 15:48

Daniel -- Well, maybe Alpha is obsolete, but Katherine had a reasonably recent article that claimed the proper approach was to run factor analysis *on* the polychorics, then you take the loadings, uniquenesses/communalities from the factor analysis to compute Alpha. I actually found the post you found, and mentioned it to her, but it seemed simpler to run the factor analysis and compute Alpha in Excel (after the factor analysis, the formula is trivial, but using matrices from within Stata like Coveney did is not trivial).
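For a one-factor solution, the "trivial formula" step can be written as Cronbach's alpha applied to the model-implied correlation matrix, whose off-diagonal entries are l_i * l_j (products of loadings) and whose diagonal entries are l_i^2 + u_i (communality plus uniqueness). That gives alpha = p/(p-1) * ((sum l)^2 - sum l^2) / ((sum l)^2 + sum u). The Python sketch below assumes the loadings, and optionally the uniquenesses, have been copied out of the -factormat- output; the thread does not say whether this is exactly the formula in the article Katherine is following.

```python
def alpha_from_loadings(loadings, uniquenesses=None):
    """Alpha implied by a one-factor model fitted to a correlation matrix.

    loadings: factor loadings l_i, one per item.
    uniquenesses: u_i; if omitted, taken as 1 - l_i**2 (standardized items).
    """
    p = len(loadings)
    if p < 2:
        raise ValueError("need at least two items")
    if uniquenesses is None:
        uniquenesses = [1.0 - l * l for l in loadings]
    s = sum(loadings)                  # sum of loadings
    ss = sum(l * l for l in loadings)  # sum of squared loadings (communalities)
    u = sum(uniquenesses)              # sum of uniquenesses
    # Off-diagonal sum of the implied matrix is s*s - ss; its total sum is s*s + u.
    return p / (p - 1) * (s * s - ss) / (s * s + u)
```

With four items all loading 0.7, this gives the same value as standardized alpha for a correlation matrix whose off-diagonal entries are all 0.49.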

But the super-small n is a new twist, one that may make the endeavor even more iffy than it is.
daniel klein

Join Date: Mar 2014

Posts: 3850
#8

27 Nov 2014, 03:52

Originally posted by ben earnhart:

but Katherine had a reasonably recent article that claimed the proper approach was to run factor analysis *on* the polychorics, then you take the loadings, uniquenesses/communalities from the factor analysis to compute Alpha.

Thanks for clarifying.

Best
Daniel
Joana M. Lima

Join Date: Mar 2015

Posts: 11
#9

21 Dec 2016, 10:15

Hi,
I would like to follow up on this thread as I am currently struggling with a similar situation.

I am using Stata 13 to undertake a factor analysis of 126 variables collected in 168 countries.

The vast majority of these variables are binary and capture the presence (variable value = 1) or absence (variable value = 0) of a policy. A smaller number of these variables may take three values (Presence = 1; Undetermined = 0.5; Absence = 0).

The code I used is as follows:

findit polychoric
polychoric B*
/*display r(sum_w)*/
/*global N = r(sum_w)*/
matrix r = r(R)
factormat r, n(168) factors(20)

I am encountering a problem very similar to the one discussed here; the response I obtain is

"matrix r has missing values"

I find this quite puzzling because my dataset has no missing values, yet the actual matrix displays correlations as missing.

Additionally, the explanation of the error code r(504) states that "This return code is now infrequently used because, beginning with version 8, Stata now permits missing values in matrices."

I would very much appreciate your help on how to proceed.
Joana
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#10

21 Dec 2016, 11:28

Well, I don't know if this will help or not, as the occurrence of missing values in the correlation matrix when there are no missing data probably implies some other problem with the data that makes it difficult or impossible to identify the underlying latent variables or something like that. But you can try emulating polychoric correlation using David Roodman's -cmp- command, which you can get at -net describe st0224_1, from(http://www.stata-journal.com/software/sj12-4)-. The -cmp- help file has an example for tetrachoric correlation, and polychoric would work analogously, but using an oprobit link. The main drawback is that you will not get a matrix of correlations, but rather coefficients stored in e(b), which you would then have to extract into a matrix to use with -factormat-. With 126 variables that might be something of a chore, but it's doable with some nested -foreach- loops.
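The step of assembling pairwise estimates into a correlation matrix can be sketched as follows. This is Python, not Stata (in Stata the nested -foreach- loops above would do the same job), and it assumes the pairwise estimates have already been pulled out of e(b) into a dictionary keyed by variable pair. The on_atanh_scale flag is there because correlation parameters in models like this are often estimated on the inverse hyperbolic tangent scale; check how your -cmp- output labels them before relying on it. Pairs with no estimate are left as NaN so they stand out rather than silently becoming zero.

```python
import math


def matrix_from_pairs(names, pairs, on_atanh_scale=False):
    """Build a symmetric correlation matrix from pairwise estimates.

    names: variable names, in the desired row/column order.
    pairs: dict mapping (name_a, name_b) -> estimate (hypothetical layout).
    on_atanh_scale: if True, estimates are atanh(rho) and are mapped back with tanh.
    """
    idx = {n: i for i, n in enumerate(names)}
    p = len(names)
    # Unit diagonal; every off-diagonal cell starts as NaN until an estimate fills it.
    R = [[1.0 if i == j else math.nan for j in range(p)] for i in range(p)]
    for (a, b), est in pairs.items():
        rho = math.tanh(est) if on_atanh_scale else est
        i, j = idx[a], idx[b]
        R[i][j] = R[j][i] = rho  # fill both triangles
    return R
```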

Not directly responsive to your question, but I would also raise a doubt about whether the three-valued variables you describe are suitable for polychoric correlation. Polychoric correlation is for ordinal variables, and it seems to me that treating "Undetermined" as ordered between Presence and Absence is a stretch. Now, it may be that in your particular context this really does make sense (e.g. if the policy consists of several components and "Undetermined" is being used to indicate that some but not all of the components are in place). But generically, "Undetermined" seems more analogous to a missing value.
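If "Undetermined" is treated as missing rather than as a middle category, the recode itself is simple, but it turns a dataset with no missing values into one that has some, which brings back the issues discussed earlier in this thread. A Python sketch, assuming the 1 / 0.5 / 0 coding described in #9:

```python
def recode_undetermined(values, undetermined=0.5):
    """Recode the 'Undetermined' value to missing (None).

    Returns (recoded_values, number_now_missing).
    """
    recoded = [None if v == undetermined else v for v in values]
    n_missing = sum(v is None for v in recoded)
    return recoded, n_missing
```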
Joana M. Lima

Join Date: Mar 2015

Posts: 11
#11

21 Dec 2016, 11:56

I will recode and try again. Thank you for your input!
Joana M. Lima

Join Date: Mar 2015

Posts: 11
#12

05 Jan 2017, 08:52

Dear Clyde,
Happy New Year!
I followed your advice and re-coded my variables so as to obtain a dataset with 168 binary variables with no missing data, and used a tetrachoric correlation.
This is my code:

tetrachoric B*
display r(sum_w)
global N = r(sum_w)
matrix r = r(R)
factormat r, n(168) factors(30)

Much to my frustration, given that I know for a fact that I have no missing values, I keep getting the error message

"matrix r has missing values". Furthermore, r(504) tells me that "This return code is now infrequently used because, beginning with version 8, Stata now permits missing values in matrices".

Should I just abandon the idea of a factor analysis on this dataset completely?
Thank you again,
Joana
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#13

05 Jan 2017, 09:39

Um, first try running -which tetrachoric-. The current version is version 2.2.0 28feb2015. If that's not what you have, you need to -update- your Stata. If it is what you have, then your code above is wrong because -tetrachoric- returns the correlation matrix in r(Rho), not r(R). There is no r(R) after -tetrachoric-, and attempting to use it will result in an entirely empty matrix which, of course, cannot be factored. Sorry I didn't notice this sooner.

If, however, you are really using r(Rho) in your code, you could try a few things:

1. Inspect matrix r to see where the missing values are arising. You might then go back and see if the variables heading those rows/columns are problematic in your data in some way. How does an ordinary Pearson correlation matrix among these variables look: are the values that turn up missing with tetrachoric unusual in the Pearson correlation matrix in some way (e.g. very close to 1 or -1)? Perhaps it suffices to remove those variables from your model and work with just the remaining ones?

2. As mentioned in #10 you can try to estimate these correlations using the -cmp- command. That won't give them to you wrapped up in a neat matrix: you will find them in e(b) instead. You can then write some code to create a matrix from those values and proceed with that. If -cmp- also gives you missing values for some of the tetrachoric correlations, then it probably just can't be done and you may have to abandon ship.

3. If you have a particular correlation structure you are anticipating, you could skip this exploratory step and do a confirmatory factor analysis using -gsem-.
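For point 1, one possible reason a tetrachoric correlation comes out missing or pinned at 1 or -1 is that the 2x2 table for that pair of variables has an empty cell, which tends to push the estimate to the boundary. A Python sketch that lists such pairs, assuming complete 0/1 data stored as equal-length lists keyed by variable name:

```python
from itertools import combinations


def zero_cell_pairs(data):
    """List variable pairs whose 2x2 cross-tabulation has an empty cell.

    data: dict mapping variable name -> list of 0/1 values (no missing values;
    any other value raises KeyError, which is deliberate).
    Returns a list of (name_a, name_b, [empty cells as (a_value, b_value)]).
    """
    flagged = []
    for a, b in combinations(sorted(data), 2):
        cells = {(x, y): 0 for x in (0, 1) for y in (0, 1)}
        for x, y in zip(data[a], data[b]):
            cells[(x, y)] += 1
        empty = [c for c, n in cells.items() if n == 0]
        if empty:
            flagged.append((a, b, empty))
    return flagged
```

Pairs that show up here are natural candidates for the rows/columns to inspect, or to drop, as suggested in point 1.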

Completely as an aside, kill that -global N = r(sum_w)- command. You aren't even using global N (at least in what you show so far). But more important, global macros are unsafe programming because they are subject to name clashes with other global macros, including global macros being used by other programs which may be running without your being aware of them. So you may be changing a global macro that some other program needs to use, or, that other program might change the value of global N "behind your back" between the time you create it and when you need its value. It is much safer to use local macros, and you should always prefer locals to globals except when there is no alternative. (And, I'll even go a step further and say if you really need to use global macros it is usually a sign of poor program architecture with inadequate information hiding and independent functional structure.)
Joana M. Lima

Join Date: Mar 2015

Posts: 11
#14

06 Jan 2017, 04:03

Dear Clyde,

Thank you so very much for your guidance! I followed up with some further reading and came up with this:

tetrachoric B*, posdef
matrix C = r(corr)
matrix symeigen eigenvectors eigenvalues = C
matrix list eigenvalues
factormat C, n(168) ipf factor(26)

It works!
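Even when every pairwise tetrachoric correlation lies in [-1,1], the matrix as a whole need not be positive definite, because each entry is estimated separately; this is the situation the posdef option is meant to repair, and negative values among the eigenvalues listed by -matrix symeigen- are the symptom. A Cholesky factorization gives a quick yes/no check. A pure-Python sketch, assuming the matrix has been exported as a list of rows:

```python
import math


def is_positive_definite(R, tol=1e-12):
    """Return True if the symmetric matrix R has a Cholesky factor (is positive definite).

    Pivots at or below tol count as failure, so a singular (positive
    semidefinite but not definite) matrix also returns False.
    """
    n = len(R)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = R[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False  # non-positive pivot: not positive definite
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True
```

For example, [[1, 0.9, 0.9], [0.9, 1, -0.9], [0.9, -0.9, 1]] has every entry within [-1,1], yet it fails this check: the three correlations cannot hold at the same time.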

Thank you again,

Joana
Joana M. Lima

Join Date: Mar 2015

Posts: 11
#15

31 Mar 2017, 03:34

Hello,

I return to this thread because I am attempting to replicate my findings using the same dataset as in the previous post. This time I am using one fewer observation, so a total of 167 observations. I ran the same code:

tetrachoric B*, posdef
matrix C = r(corr)
matrix symeigen eigenvectors eigenvalues = C
matrix list eigenvalues
factormat C, n(167) ipf factor(26)

Bizarrely, I now obtain an error message

C invalid; correlation outside [-1,1] found

I find this strange because when I inspect the correlation matrix visually I don't find any such correlation, i.e. outside the range.

Has anyone encountered this problem? What does this mean?
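One possible explanation (not confirmed anywhere in this thread) is an entry that exceeds 1 in absolute value only by a rounding amount, so it displays as 1.000 but fails a strict range check; a missing entry would also fail. Scanning the matrix at full precision shows such entries. A Python sketch, assuming the matrix has been exported as a list of rows with matching variable names:

```python
import math


def invalid_entries(R, names):
    """Report entries of a correlation matrix that are NaN or outside [-1, 1].

    Checks the upper triangle including the diagonal; values are reported with
    repr() so that tiny excesses over 1 are visible.
    """
    bad = []
    p = len(R)
    for i in range(p):
        for j in range(i, p):
            v = R[i][j]
            if math.isnan(v) or abs(v) > 1.0:
                bad.append((names[i], names[j], repr(v)))
    return bad
```

For instance, 1 + 2**-52 is flagged by this scan, yet formatted to three decimals it prints as 1.000, which is why such an entry would be invisible in a rounded listing.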

Thank you for your help,

Joana