Problem with 11x11 correlation matrix in SEM

Soheil Goodarzi

Join Date: Nov 2019

Posts: 5
#1

25 Nov 2019, 10:05

Hi All,

I am trying to fit an SEM model from summary statistics (number of observations, means, SDs, and a correlation matrix) for 11 variables. It appears that Stata cannot handle a correlation matrix with 11 variables in SEM, because when I run the syntax I get the following error message: “matrix not positive semidefinite. One or more numeric values are incorrect because real data can generate only positive semidefinite covariance or correlation matrices. r(459);”
The issue does not seem to be the numeric values, because I tried many other numbers in the correlation matrix and got the same error message. However, when I reduce the number of variables to 10 and use a 10 by 10 correlation matrix, it runs OK as expected. I would appreciate any help.

Here is my entire code:

set more off

clear all

ssd init knsh knut knen knem perform inte soca innov masu trans know

ssd set obs 1520
ssd set means 3.24 4.79 3.82 4.87 3.55 4.72 3.53 4.07 4.01 4.33 3.53
ssd set sd 1.08 1.44 1.09 1.99 1.45 1.98 1.72 0.97 1.86 1.17 1.16

#delimit ;
ssd set corr 1 \
0.96 1 \
0.40 0.52 1 \
0.48 0.63 0.54 1 \
0.46 0.33 0.72 0.28 1 \
0.37 0.77 0.57 0.58 0.23 1 \
0.62 0.63 0.59 0.55 0.39 0.29 1 \
0.55 0.14 0.21 0.70 0.64 0.77 0.46 1 \
0.31 0.27 0.29 0.18 0.36 0.34 0.45 0.30 1 \
0.52 0.39 0.72 0.81 0.33 0.62 0.17 0.45 0.60 1 \
0.43 0.34 0.20 0.47 0.24 0.44 0.68 0.38 0.33 0.46 1;

#delimit cr

ssd describe

ssd list

* RESEARCH MODEL

sem (KS@1 -> knsh) (KU@1 -> knut) (KEnh2@1 -> knen) (KEmbed@1 -> knem) ///
(Performance@1 -> perform) (IT@1 -> inte) (SC@1 -> soca) ///
(Innovation@1 -> innov) (MS@1 -> masu) (TMS@1 -> trans) (K@1 -> know) ///
(IT SC MS -> TMS) (KEnh2 -> K) (IT SC MS TMS -> KEnh2) (IT SC MS TMS K KEnh2 -> KEmbed) (IT SC MS TMS K KEmbed -> KS) ///
(IT SC MS TMS K KS -> KU) (KU -> Perfomance), ///
latent(KS KU KEnh2 KEmbed Performance IT SC Innovation MS TMS K) ///
reliability (knsh .99999 knut .99999 knen .99999 knem .99999 perform .99999 inte .99999 soca .99999 innov .99999 masu .99999 trans .99999 know .99999) ///
covstruct(_lexogenous, unstructured) nocapslatent

estat eqgof
estat gof, stats(all)
estat mindices
Clyde Schechter

Join Date: Apr 2014

Posts: 30153
#2

25 Nov 2019, 10:30

Well, if you input those same numbers, save them as a matrix, and then calculate its eigenvalues (-help matrix symeigen-), you will find that the last two are negative. So Stata is quite correct that this matrix is not positive semidefinite, and so it cannot be a true correlation matrix of any set of variables.
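A minimal sketch of that check, assuming the lower triangle from post #1 is typed in by hand (the matrix names L, C, X, v, and S are arbitrary choices, not anything from the original code). It pads the upper triangle with zeros, symmetrizes, and lists the eigenvalues, which matrix symeigen returns in descending order. The last two lines isolate one inconsistency that can be verified by hand: knsh and knut correlate at 0.96, yet their correlations with innov are 0.55 and 0.14. The determinant of that 3x3 submatrix works out to about -0.096, so no real data could have produced it.

```stata
* Lower triangle of the correlation matrix from post #1 (upper triangle padded with 0)
matrix L = ( ///
    1   , 0   , 0   , 0   , 0   , 0   , 0   , 0   , 0   , 0   , 0 \ ///
    0.96, 1   , 0   , 0   , 0   , 0   , 0   , 0   , 0   , 0   , 0 \ ///
    0.40, 0.52, 1   , 0   , 0   , 0   , 0   , 0   , 0   , 0   , 0 \ ///
    0.48, 0.63, 0.54, 1   , 0   , 0   , 0   , 0   , 0   , 0   , 0 \ ///
    0.46, 0.33, 0.72, 0.28, 1   , 0   , 0   , 0   , 0   , 0   , 0 \ ///
    0.37, 0.77, 0.57, 0.58, 0.23, 1   , 0   , 0   , 0   , 0   , 0 \ ///
    0.62, 0.63, 0.59, 0.55, 0.39, 0.29, 1   , 0   , 0   , 0   , 0 \ ///
    0.55, 0.14, 0.21, 0.70, 0.64, 0.77, 0.46, 1   , 0   , 0   , 0 \ ///
    0.31, 0.27, 0.29, 0.18, 0.36, 0.34, 0.45, 0.30, 1   , 0   , 0 \ ///
    0.52, 0.39, 0.72, 0.81, 0.33, 0.62, 0.17, 0.45, 0.60, 1   , 0 \ ///
    0.43, 0.34, 0.20, 0.47, 0.24, 0.44, 0.68, 0.38, 0.33, 0.46, 1 )

* Reflect the lower triangle; subtract I(11) because the diagonal was added twice
matrix C = L + L' - I(11)

* Eigenvectors go to X, eigenvalues (largest first) to the row vector v
matrix symeigen X v = C
matrix list v

* Hand-checkable submatrix for knsh, knut, innov: a negative determinant rules out PSD
matrix S = (1, 0.96, 0.55 \ 0.96, 1, 0.14 \ 0.55, 0.14, 1)
display det(S)
```

Any negative entry in v means the matrix is not positive semidefinite, which is exactly what the r(459) message reports.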
William Lisowski

Join Date: Dec 2014

Posts: 10150
#3

25 Nov 2019, 10:50

Following up on Clyde's explanation, can you tell us how the correlation matrix you are using was calculated from your original data? What command did you use?

If perhaps you used pwcorr rather than correlate, and if these variables have observations with missing values, that could explain the problem, since the pairwise correlations from pwcorr would be calculated using different subsets of the observations, and the result is not a true correlation matrix.
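To make the distinction concrete, here is a small sketch using the auto dataset that ships with Stata, in which rep78 has missing values. It only illustrates the two commands and is not a reconstruction of how the matrix in post #1 was produced; the matrix names P, X, and v are arbitrary. correlate drops any observation with a missing value in any listed variable, so every entry comes from the same sample, whereas pwcorr uses all observations available for each pair, and its obs option shows that the pairs rest on different sample sizes.

```stata
sysuse auto, clear

* Casewise (listwise) deletion: all correlations use the same 69 observations
correlate price mpg rep78

* Pairwise deletion: each correlation uses whatever observations that pair has
pwcorr price mpg rep78, obs

* Check the pairwise matrix: any negative eigenvalue means it is not PSD
matrix P = r(C)
matrix symeigen X v = P
matrix list v
```

In a small example like this the pairwise matrix may well still be positive semidefinite; pairwise deletion removes the guarantee, it does not force a violation.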
Clyde Schechter

Join Date: Apr 2014

Posts: 30153
#4

25 Nov 2019, 10:55

Adding on to William Lisowski's wisdom, another source of "correlation" matrices that are not positive semidefinite is tetrachoric correlation, because each entry is estimated separately from its own pair of variables.
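As a hedged illustration, the sketch below builds a few binary variables from the auto dataset (the variable names expensive, heavy, and thirsty and their cutoffs are made up for the example) and checks the eigenvalues of the resulting tetrachoric matrix. It assumes the tetrachoric command's posdef option, which is documented as adjusting the matrix to be positive semidefinite, and the stored result r(Rho); this toy matrix may not need any adjustment at all.

```stata
sysuse auto, clear

* Hypothetical binary indicators, created only for this illustration
generate byte expensive = price > 6000
generate byte heavy     = weight > 3000
generate byte thirsty   = mpg < 20

* Tetrachoric correlations; posdef requests a positive semidefinite result
tetrachoric expensive heavy thirsty foreign, posdef

* Inspect the eigenvalues of the stored matrix
matrix T = r(Rho)
matrix symeigen X v = T
matrix list v
```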
Soheil Goodarzi

Join Date: Nov 2019

Posts: 5
#5

25 Nov 2019, 11:35

Thank you, Clyde and William. The correlation matrix is the result of meta-analyses, so the entries are correlations corrected for measurement and sampling error. But I made numerous attempts at changing the correlations to different numbers, and I still get the same error.

Clyde Schechter

Join Date: Apr 2014
Posts: 30153
#6

25 Nov 2019, 11:57

Well, without knowing specifically what you did when changing the correlations to different numbers, it isn't possible to comment. Here is an example that demonstrates that there is no general problem with 11x11 correlation matrices:

Code:

. sysuse auto, clear
(1978 Automobile Data)

.
. corr price-foreign
(obs=69)

             |    price      mpg    rep78 headroom    trunk   weight   length     turn displa~t gear_r~o  foreign
-------------+---------------------------------------------------------------------------------------------------
       price |   1.0000
         mpg |  -0.4559   1.0000
       rep78 |   0.0066   0.4023   1.0000
    headroom |   0.1112  -0.3996  -0.1480   1.0000
       trunk |   0.3232  -0.5798  -0.1572   0.6608   1.0000
      weight |   0.5478  -0.8055  -0.4003   0.4795   0.6691   1.0000
      length |   0.4425  -0.8037  -0.3606   0.5240   0.7326   0.9478   1.0000
        turn |   0.3302  -0.7355  -0.4961   0.4347   0.6008   0.8610   0.8631   1.0000
displacement |   0.5479  -0.7434  -0.4119   0.4763   0.6287   0.9316   0.8621   0.8124   1.0000
  gear_ratio |  -0.3802   0.6565   0.4103  -0.3790  -0.5107  -0.7906  -0.7232  -0.7005  -0.8381   1.0000
     foreign |  -0.0174   0.4538   0.5922  -0.3347  -0.4053  -0.6460  -0.6110  -0.6768  -0.6383   0.7266   1.0000


. matrix C = r(C)

. matrix list C

symmetric C[11,11]
                     price           mpg         rep78      headroom         trunk        weight        length          turn  displacement    gear_ratio       foreign
       price             1
         mpg    -.45594895             1
       rep78     .00655327     .40234039             1
    headroom     .11124345    -.39958535    -.14799818             1
       trunk     .32320651     -.5798199     -.1572433     .66078419             1
      weight     .54783956     -.8055198    -.40034415     .47946639     .66913796             1
      length     .44245747    -.80367632    -.36056548     .52396633     .73257959      .9478298             1
        turn     .33020127    -.73545844    -.49613079     .43468557     .60079461     .86102259     .86309638             1
displacement     .54792381    -.74336291    -.41194287     .47629768     .62869657     .93162119     .86206376     .81237497             1
  gear_ratio    -.38023757      .6565194     .41034002    -.37904863    -.51069855    -.79060384    -.72317304    -.70051512    -.83805559             1
     foreign    -.01736391     .45383397     .59223686    -.33468341    -.40528928    -.64598391    -.61103826    -.67683603    -.63831697     .72655293             1

.
. clear all

.
. ssd init knsh knut knen knem perform inte soca innov masu trans know

Summary statistics data initialized.  Next use, in any order,

    ssd set observations (required)
        It is best to do this first.

    ssd set means (optional)
        Default setting is 0.

    ssd set variances or ssd set sd (optional)
        Use this only if you have set or will set correlations and, even then, this is optional but highly recommended.  Default setting is 1.

    ssd set covariances or ssd set correlations (required)

.
.
. ssd set obs 1520
  (value set)

    Status:
                       observations:    set
                              means:  unset
                    variances or sd:  unset
        covariances or correlations:  unset (required to be set)

. ssd set means 3.24 4.79 3.82 4.87 3.55 4.72 3.53 4.07 4.01 4.33 3.53
  (values set)

    Status:
                       observations:    set
                              means:    set
                    variances or sd:  unset
        covariances or correlations:  unset (required to be set)

. ssd set sd 1.08 1.44 1.09 1.99 1.45 1.98 1.72 0.97 1.86 1.17 1.16
  (values set)

    Status:
                       observations:    set
                              means:    set
                    variances or sd:    set
        covariances or correlations:  unset (required to be set)

.
. #delimit ;
delimiter now ;
. ssd set corr
>          1 \
> -.45594895 1 \
> .00655327 .4023403 1 \
> .11124345 -.39958535 -.14799818 1 \
> .32320651 -.5798199 -.1572433 .66078419 1 \
> .54783956 -.8055198 -.40034415 .47946639 .66913796 1 \
> .44245747 -.80367632 -.36056548 .52396633 .73257959 .9478298 1 \
> .33020127 -.73545844 -.49613079 .43468557 .60079461 .86102259 .86309638 1 \
> .54792381 -.74336291 -.41194287 .47629768 .62869657 .93162119 .86206376 .81237497 1 \
> -.38023757 .6565194 .41034002 -.37904863 -.51069855 -.79060384 -.72317304 -.70051512 -.83805559 1 \
> -.01736391 .45383397 .59223686 -.33468341 -.40528928 -.64598391 -.61103826 -.67683603 -.63831697 .72655293 1;
  (values set)

    Status:
                       observations:    set
                              means:    set
                    variances or sd:    set
        covariances or correlations:    set

. #delimit cr
delimiter now cr

And, with this toy data, your SEM model runs fine, at least once you correct the typo in the (KU -> Perfomance) equation, where Performance is misspelled.


Soheil Goodarzi

Join Date: Nov 2019

Posts: 5
#7

25 Nov 2019, 12:06

Thanks a lot, Clyde. Will look into it.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#8

25 Nov 2019, 12:06

It doesn't matter how many unsuccessful arbitrary changes you make; what Stata tells you is correct: your matrix is not positive semidefinite and thus cannot be used for SEM (or even for a task as simple as OLS).

Without some idea of the Stata commands you used that produced the not-quite-correlation matrix, there's not much more I can advise.

I will add that I believe I have seen, possibly even used, a technique that "adjusts" elements in a non-positive-semidefinite matrix to yield a positive semidefinite matrix "close" to the original, but I cannot recall what that technique/command/option was.
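One generic way to do this kind of adjustment, offered only as a sketch and not necessarily the technique recalled above, is to clip the eigenvalues: decompose the matrix, raise any nonpositive eigenvalues to a small positive floor, rebuild, and rescale so the diagonal is 1 again. The example uses the 3x3 submatrix for knsh, knut, and innov from post #1; the floor of 1e-8 and all matrix names are arbitrary assumptions. The result is a different set of correlations from the one that was entered, so an adjustment like this would need to be reported and justified.

```stata
* Non-PSD example: correlations among knsh, knut, innov from post #1
matrix C = (1, 0.96, 0.55 \ 0.96, 1, 0.14 \ 0.55, 0.14, 1)

mata:
C = st_matrix("C")
// X holds eigenvectors in columns, L is a row vector of eigenvalues
symeigensystem(C, X = ., L = .)
// Raise every eigenvalue at or below 1e-8 to 1e-8 (arbitrary floor)
L = L :* (L :> 1e-8) + 1e-8 :* (L :<= 1e-8)
// Rebuild the matrix from the clipped eigenvalues
A = X * diag(L) * X'
// Rescale to unit diagonal so the result is again a correlation matrix
d = sqrt(diagonal(A))
R = makesymmetric(A :/ (d * d'))
st_matrix("R", R)
end

* Adjusted matrix and its eigenvalues, which should now all be positive
matrix list R
matrix symeigen V e = R
matrix list e
```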
Soheil Goodarzi

Join Date: Nov 2019

Posts: 5
#9

25 Nov 2019, 13:16

Thank you William.
Kai-Yuan Cheng

Join Date: Apr 2017

Posts: 10
#10

30 Apr 2020, 09:33

Hello to all,

Thank you to everyone in this thread, who gave quite an informative and helpful discussion for me to read.
I hope it is permissible to take the opportunity to ask a related question:
In my limited experience of running SEM (using a full dataset rather than just the covariance structure), I have never been told by Stata that my data structure is not positive definite. I always generate the matrix and check it myself, and it is always positive definite. However, part of me fears that I may have failed to notice problems when checking it myself.

That Soheil encountered such an error message prompts me to wonder: does Stata actually tell you when the generated matrix is not positive definite, and if a model is successfully run and estimated, is the data matrix then guaranteed to be positive definite?

Thank you in advance for your kind help!

Kai-Yuan