  • Problem with 11x11 correlation matrix in SEM

    Hi All,

    I am trying to fit an SEM from summary statistics data (the number of observations, means, SDs, and a correlation matrix) for 11 variables. It appears that Stata cannot handle an 11-variable correlation matrix in SEM, because when I run the syntax I get the following error message: “matrix not positive semidefinite. One or more numeric values are incorrect because real data can generate only positive semidefinite covariance or correlation matrices. r(459);”
    I do not think the problem lies in the numeric values, because I tried many other numbers in the correlation matrix and got the same error message. However, when I reduce the number of variables to 10 and use a 10 by 10 correlation matrix, the model runs as expected. I would appreciate any help.

    Here is my entire code:

    set more off

    clear all

    ssd init knsh knut knen knem perform inte soca innov masu trans know


    ssd set obs 1520
    ssd set means 3.24 4.79 3.82 4.87 3.55 4.72 3.53 4.07 4.01 4.33 3.53
    ssd set sd 1.08 1.44 1.09 1.99 1.45 1.98 1.72 0.97 1.86 1.17 1.16

    #delimit ;
    ssd set corr 1 \
    0.96 1 \
    0.40 0.52 1 \
    0.48 0.63 0.54 1 \
    0.46 0.33 0.72 0.28 1 \
    0.37 0.77 0.57 0.58 0.23 1 \
    0.62 0.63 0.59 0.55 0.39 0.29 1 \
    0.55 0.14 0.21 0.70 0.64 0.77 0.46 1 \
    0.31 0.27 0.29 0.18 0.36 0.34 0.45 0.30 1 \
    0.52 0.39 0.72 0.81 0.33 0.62 0.17 0.45 0.60 1 \
    0.43 0.34 0.20 0.47 0.24 0.44 0.68 0.38 0.33 0.46 1;

    #delimit cr

    ssd describe

    ssd list


    * RESEARCH MODEL

    sem (KS@1 -> knsh) (KU@1 -> knut) (KEnh2@1 -> knen) (KEmbed@1 -> knem) ///
    (Performance@1 -> perform) (IT@1 -> inte) (SC@1 -> soca) ///
    (Innovation@1 -> innov) (MS@1 -> masu) (TMS@1 -> trans) (K@1 -> know) ///
    (IT SC MS -> TMS) (KEnh2 -> K) (IT SC MS TMS -> KEnh2) (IT SC MS TMS K KEnh2 -> KEmbed) (IT SC MS TMS K KEmbed -> KS) ///
    (IT SC MS TMS K KS -> KU) (KU -> Perfomance), ///
    latent(KS KU KEnh2 KEmbed Performance IT SC Innovation MS TMS K) ///
    reliability (knsh .99999 knut .99999 knen .99999 knem .99999 perform .99999 inte .99999 soca .99999 innov .99999 masu .99999 trans .99999 know .99999) ///
    covstruct(_lexogenous, unstructured) nocapslatent

    estat eqgof
    estat gof, stats(all)
    estat mindices

  • #2
    Well, if you input those same numbers, save them as a matrix, and then calculate its eigenvalues (-help matrix symeigen-), you will find that the last two are negative. So Stata is quite correct: this matrix is not positive semidefinite, and so it cannot be a true correlation matrix of any set of variables.
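The eigenvalue check described above can also be reproduced outside Stata. Below is a minimal pure-Python sketch, assuming the lower triangle exactly as posted in the first message; the function names `from_lower` and `jacobi_eigenvalues` are hypothetical, and the cyclic Jacobi method is used only because it needs no external libraries. The eigenvalues are sorted in ascending order here, so any negative values appear first.

```python
import math

# Lower triangle of the 11x11 matrix from the first post
# (order: knsh knut knen knem perform inte soca innov masu trans know).
LOWER = [
    [1],
    [0.96, 1],
    [0.40, 0.52, 1],
    [0.48, 0.63, 0.54, 1],
    [0.46, 0.33, 0.72, 0.28, 1],
    [0.37, 0.77, 0.57, 0.58, 0.23, 1],
    [0.62, 0.63, 0.59, 0.55, 0.39, 0.29, 1],
    [0.55, 0.14, 0.21, 0.70, 0.64, 0.77, 0.46, 1],
    [0.31, 0.27, 0.29, 0.18, 0.36, 0.34, 0.45, 0.30, 1],
    [0.52, 0.39, 0.72, 0.81, 0.33, 0.62, 0.17, 0.45, 0.60, 1],
    [0.43, 0.34, 0.20, 0.47, 0.24, 0.44, 0.68, 0.38, 0.33, 0.46, 1],
]


def from_lower(rows):
    """Expand lower-triangular rows into a full symmetric matrix of floats."""
    n = len(rows)
    return [[float(rows[max(i, j)][min(i, j)]) for j in range(n)] for i in range(n)]


def jacobi_eigenvalues(a, tol=1e-18, max_sweeps=50):
    """Eigenvalues (ascending) of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:  # off-diagonal part negligible: the diagonal holds the eigenvalues
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation chosen so that the new a[p][q] is zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


eigs = jacobi_eigenvalues(from_lower(LOWER))
print(["%.4f" % e for e in eigs])  # ascending, so negative values come first
```

Because the rotations are orthogonal, the eigenvalues still sum to the trace (11 here); a negative entry at the front of the list is enough to rule out a valid correlation matrix.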



    • #3
      Following up on Clyde's explanation, can you tell us how the correlation matrix you are using was calculated from your original data? What command did you use?

      If you used pwcorr rather than correlate, and some of these variables have missing values, that could explain the problem: pwcorr calculates each pairwise correlation from a different subset of the observations, so the result need not be a true correlation matrix.
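The pairwise-deletion problem can be seen with a tiny hypothetical dataset. This is a minimal pure-Python sketch, not Stata code, and `pearson` and `det3` are hypothetical helper names: every variable has missing values, and each pair of variables is observed on a different set of observations. Each pairwise correlation is a perfectly valid number on its own, yet the matrix they form has a negative determinant.

```python
import math


def pearson(x, y):
    """Pearson correlation over observations where both values are present (pairwise deletion)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Hypothetical data; None marks a missing value, and no observation has all three.
x = [1, 2, 3, None, None, None, 1, 2, 3]
y = [1, 2, 3, 1, 2, 3, None, None, None]
z = [None, None, None, 1, 2, 3, 3, 2, 1]

rxy, ryz, rxz = pearson(x, y), pearson(y, z), pearson(x, z)
R = [[1.0, rxy, rxz],
     [rxy, 1.0, ryz],
     [rxz, ryz, 1.0]]
print(rxy, ryz, rxz)  # 1.0 1.0 -1.0: each pair looks fine on its own
print(det3(R))        # -4.0: negative, so R cannot be a correlation matrix
```

Listwise deletion, which is what correlate does, would leave no observations at all with these data.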



      • #4
        Adding on to William Lisowski's wisdom, another source of "correlation" matrices that are not positive definite is tetrachoric correlation.



        • #5
          Thank you, Clyde and William. The correlation matrix comes from meta-analyses, so its entries are correlations corrected for measurement and sampling error. But I made numerous attempts at changing the correlations to different numbers, and I still get the same error.
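Meta-analytic correlations are typically pooled cell by cell, each from its own set of studies, so, as with pwcorr output, nothing forces the entries to be mutually consistent. Even one 3x3 block of the posted matrix shows the problem. With |r| <= 1 for every entry, a 3x3 correlation matrix is positive semidefinite exactly when its determinant 1 - r12^2 - r13^2 - r23^2 + 2*r12*r13*r23 is nonnegative, which is equivalent to r23 lying within r12*r13 +/- sqrt((1 - r12^2)(1 - r13^2)). The minimal sketch below (hypothetical function names) applies this to knsh, knut and innov: with r(knsh, knut) = 0.96 and r(knsh, innov) = 0.55, r(knut, innov) must lie between about 0.294 and 0.762, but the posted value is 0.14.

```python
import math


def third_corr_bounds(r12, r13):
    """Range of r23 that keeps a 3x3 correlation matrix positive semidefinite."""
    half_width = math.sqrt((1.0 - r12 ** 2) * (1.0 - r13 ** 2))
    return r12 * r13 - half_width, r12 * r13 + half_width


def det3_corr(r12, r13, r23):
    """Determinant of the 3x3 correlation matrix built from three correlations."""
    return 1.0 - r12 ** 2 - r13 ** 2 - r23 ** 2 + 2.0 * r12 * r13 * r23


# Values from the posted matrix: 1 = knsh, 2 = knut, 3 = innov.
r12, r13, r23 = 0.96, 0.55, 0.14
lo, hi = third_corr_bounds(r12, r13)
print("r(knut, innov) must lie in [%.3f, %.3f]; posted value %.2f" % (lo, hi, r23))
print("determinant: %.5f" % det3_corr(r12, r13, r23))
```

Any edit to the matrix that leaves such a triple outside its allowed range will keep producing the same r(459) error, regardless of how many variables are in the model.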



          • #6
            Well, without knowing specifically what you did when changing the correlations to different numbers, it isn't possible to comment. Here is an example that demonstrates that there is no general problem with 11x11 correlation matrices:

            Code:
            . sysuse auto, clear
            (1978 Automobile Data)
            
            .
            . corr price-foreign
            (obs=69)
            
                         |    price      mpg    rep78 headroom    trunk   weight   length     turn displa~t gear_r~o  foreign
            -------------+---------------------------------------------------------------------------------------------------
                   price |   1.0000
                     mpg |  -0.4559   1.0000
                   rep78 |   0.0066   0.4023   1.0000
                headroom |   0.1112  -0.3996  -0.1480   1.0000
                   trunk |   0.3232  -0.5798  -0.1572   0.6608   1.0000
                  weight |   0.5478  -0.8055  -0.4003   0.4795   0.6691   1.0000
                  length |   0.4425  -0.8037  -0.3606   0.5240   0.7326   0.9478   1.0000
                    turn |   0.3302  -0.7355  -0.4961   0.4347   0.6008   0.8610   0.8631   1.0000
            displacement |   0.5479  -0.7434  -0.4119   0.4763   0.6287   0.9316   0.8621   0.8124   1.0000
              gear_ratio |  -0.3802   0.6565   0.4103  -0.3790  -0.5107  -0.7906  -0.7232  -0.7005  -0.8381   1.0000
                 foreign |  -0.0174   0.4538   0.5922  -0.3347  -0.4053  -0.6460  -0.6110  -0.6768  -0.6383   0.7266   1.0000
            
            
            . matrix C = r(C)
            
            . matrix list C
            
            symmetric C[11,11]
                                 price           mpg         rep78      headroom         trunk        weight        length          turn  displacement    gear_ratio       foreign
                   price             1
                     mpg    -.45594895             1
                   rep78     .00655327     .40234039             1
                headroom     .11124345    -.39958535    -.14799818             1
                   trunk     .32320651     -.5798199     -.1572433     .66078419             1
                  weight     .54783956     -.8055198    -.40034415     .47946639     .66913796             1
                  length     .44245747    -.80367632    -.36056548     .52396633     .73257959      .9478298             1
                    turn     .33020127    -.73545844    -.49613079     .43468557     .60079461     .86102259     .86309638             1
            displacement     .54792381    -.74336291    -.41194287     .47629768     .62869657     .93162119     .86206376     .81237497             1
              gear_ratio    -.38023757      .6565194     .41034002    -.37904863    -.51069855    -.79060384    -.72317304    -.70051512    -.83805559             1
                 foreign    -.01736391     .45383397     .59223686    -.33468341    -.40528928    -.64598391    -.61103826    -.67683603    -.63831697     .72655293             1
            
            .
            . clear all
            
            .
            . ssd init knsh knut knen knem perform inte soca innov masu trans know
            
            Summary statistics data initialized.  Next use, in any order,
            
                ssd set observations (required)
                    It is best to do this first.
            
                ssd set means (optional)
                    Default setting is 0.
            
                ssd set variances or ssd set sd (optional)
                    Use this only if you have set or will set correlations and, even then, this is optional but highly recommended.  Default setting is 1.
            
                ssd set covariances or ssd set correlations (required)
            
            .
            .
            . ssd set obs 1520
              (value set)
            
                Status:
                                   observations:    set
                                          means:  unset
                                variances or sd:  unset
                    covariances or correlations:  unset (required to be set)
            
            . ssd set means 3.24 4.79 3.82 4.87 3.55 4.72 3.53 4.07 4.01 4.33 3.53
              (values set)
            
                Status:
                                   observations:    set
                                          means:    set
                                variances or sd:  unset
                    covariances or correlations:  unset (required to be set)
            
            . ssd set sd 1.08 1.44 1.09 1.99 1.45 1.98 1.72 0.97 1.86 1.17 1.16
              (values set)
            
                Status:
                                   observations:    set
                                          means:    set
                                variances or sd:    set
                    covariances or correlations:  unset (required to be set)
            
            .
            . #delimit ;
            delimiter now ;
            . ssd set corr
            >          1 \
            > -.45594895 1 \
            > .00655327 .4023403 1 \
            > .11124345 -.39958535 -.14799818 1 \
            > .32320651 -.5798199 -.1572433 .66078419 1 \
            > .54783956 -.8055198 -.40034415 .47946639 .66913796 1 \
            > .44245747 -.80367632 -.36056548 .52396633 .73257959 .9478298 1 \
            > .33020127 -.73545844 -.49613079 .43468557 .60079461 .86102259 .86309638 1 \
            > .54792381 -.74336291 -.41194287 .47629768 .62869657 .93162119 .86206376 .81237497 1 \
            > -.38023757 .6565194 .41034002 -.37904863 -.51069855 -.79060384 -.72317304 -.70051512 -.83805559 1 \
            > -.01736391 .45383397 .59223686 -.33468341 -.40528928 -.64598391 -.61103826 -.67683603 -.63831697 .72655293 1;
              (values set)
            
                Status:
                                   observations:    set
                                          means:    set
                                variances or sd:    set
                    covariances or correlations:    set
            
            . #delimit cr
            delimiter now cr
            And with these toy data, your SEM model runs fine, at least after you correct the typo in the (KU -> Perfomance) equation, which should read (KU -> Performance).



            • #7
              Thanks a lot, Clyde. Will look into it.



              • #8
                No matter how many unsuccessful arbitrary changes you make, what Stata tells you is correct: your matrix is not positive semidefinite and thus cannot be used for SEM (or even for a task as simple as OLS).

                Without some idea of the Stata commands you used that produced the not-quite-correlation matrix, there's not much more I can advise.

                I will add that I believe I have seen, possibly even used, a technique that "adjusts" elements of a non-positive-semidefinite matrix to yield a positive semidefinite matrix "close" to the original, but I cannot recall what that technique/command/option was.
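Since the technique is not recalled above, the following shows only one common option, not necessarily the one William had in mind: eigenvalue clipping. Every eigenvalue below a small floor is raised to that floor, the matrix is rebuilt from its eigenvectors, and it is rescaled back to a unit diagonal. This is a minimal pure-Python sketch with hypothetical function names (`jacobi_eigh`, `clip_to_correlation`). With a floor of zero, the clipping step alone gives the nearest positive semidefinite matrix in the Frobenius norm, but after rescaling the result is generally not the nearest correlation matrix; Higham's alternating-projections method targets that. Either way, the adjusted matrix is no longer the one that was estimated.

```python
import math


def jacobi_eigh(a, tol=1e-18, max_sweeps=50):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix, cyclic Jacobi."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for m in (a, v):  # column rotation: A <- A J and V <- V J
                    for k in range(n):
                        mkp, mkq = m[k][p], m[k][q]
                        m[k][p] = c * mkp - s * mkq
                        m[k][q] = s * mkp + c * mkq
                for k in range(n):  # row rotation: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return [a[i][i] for i in range(n)], v


def clip_to_correlation(r, floor=1e-6):
    """Clip eigenvalues below `floor`, rebuild, then rescale to a unit diagonal."""
    n = len(r)
    vals, vecs = jacobi_eigh(r)
    vals = [max(lam, floor) for lam in vals]
    # B = V diag(vals) V^T
    b = [[sum(vals[k] * vecs[i][k] * vecs[j][k] for k in range(n)) for j in range(n)]
         for i in range(n)]
    d = [math.sqrt(b[i][i]) for i in range(n)]
    # D^(-1/2) B D^(-1/2) keeps B's positive semidefiniteness and restores 1s on the diagonal
    return [[b[i][j] / (d[i] * d[j]) for j in range(n)] for i in range(n)]


R = [[1.0, 1.0, -1.0],
     [1.0, 1.0, 1.0],
     [-1.0, 1.0, 1.0]]  # not positive semidefinite (determinant -4)
C = clip_to_correlation(R, floor=0.0)
for row in C:
    print(["%.3f" % value for value in row])
```

For this toy matrix, the sketch replaces the off-diagonal entries 1, -1 and 1 with 0.5, -0.5 and 0.5. A small positive `floor` makes the result strictly positive definite rather than merely semidefinite.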



                • #9
                  Thank you William.



                  • #10
                    Hello to all,

                    Thank you to everyone in this thread for a very informative and helpful discussion.
                    I hope it is permissible to take the opportunity to ask a related question:
                    In my limited experience of running SEM (using a full dataset rather than just the covariance structure), Stata has never told me that my data matrix is not positive definite. I always generate the matrix and check it myself, and it has always been positive definite. However, part of me always fears that I may have missed problems while checking it myself.

                    Soheil's error message prompts me to wonder: does Stata always tell you when the generated matrix is not positive definite? In other words, if a model runs and is estimated successfully, is the data matrix guaranteed to be positive definite?
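One way to do the kind of self-check described above is to attempt a Cholesky factorization, which succeeds only when a symmetric matrix is positive definite. Below is a minimal pure-Python sketch with a hypothetical function name. It is stricter than Stata's message, since a positive semidefinite but singular matrix (a zero eigenvalue) also fails it, and for nearly singular matrices rounding error can tip the result either way.

```python
import math


def is_positive_definite(a):
    """Return True if a Cholesky factorization of symmetric matrix `a` succeeds."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:  # a non-positive pivot means not positive definite
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True


print(is_positive_definite([[1.0, 0.5], [0.5, 1.0]]))  # True
# knsh, knut, innov block from the first post
print(is_positive_definite([[1.0, 0.96, 0.55],
                            [0.96, 1.0, 0.14],
                            [0.55, 0.14, 1.0]]))        # False
```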

                    Thank you in advance for your kind help!

                    Kai-Yuan

