  • #16
    Okay. I didn't post the table with the correlation coefficient as it seemed a bit "overkill".

    Thank you, I will try the proposed method.



    • #17
      I have had time to return to the data, and I had some colleagues look at it as well. We have not been able to solve this issue, which is why I am returning to your comment, William, with some follow-up questions.

      Here is the entire output from the tetrachoric command:
      Code:
       tetrachoric requirement funding competition discussion event citizen_science cam
      > paign platform training guiding rules policy network unit strategy standard coop
      > eration res_area sup_res sup_ed
      (obs=217)
      
      matrix with tetrachoric correlations is not positive semidefinite;
        it has 5 negative eigenvalues
        maxdiff(corr,adj-corr) =  0,3721
        (adj-corr: tetrachoric correlations adjusted to be positive semidefinite)
      
                   | requir~t  funding compet~n discus~n    event citize~e campaign
      -------------+---------------------------------------------------------------
       requirement |   1,0000
           funding |   0,4324   1,0000
       competition |  -0,0793   0,3978   1,0000
        discussion |   0,2243  -0,1203  -0,0237   1,0000
             event |  -0,0324   0,5958   0,6756   0,1967   1,0000
      citizen_sc~e |   0,1179   0,7278   0,0196   0,0858   0,7500   1,0000
          campaign |  -0,0662   0,5213  -0,0535   0,1996   0,6992   0,7144   1,0000
          platform |   0,0072  -0,0525   0,3951   0,0028   0,4322   0,3194   0,0325
          training |  -0,1158   0,4146   0,4114   0,4036   0,4363   0,1631   0,1344
           guiding |  -0,0856   0,1019  -0,1200   0,0301  -0,0822   0,1609   0,2408
             rules |   0,5276   0,2780  -0,3215   0,1724  -0,1953  -0,0759  -0,1423
            policy |   0,3280   0,0655   0,0826   0,4743   0,4133   0,2716   0,4290
           network |   0,2335  -0,0340  -1,0000   0,5163   0,1251   0,1201   0,1619
              unit |   0,0249  -0,1447   0,4409   0,0978   0,4082  -0,0638  -0,0774
          strategy |   0,3853   0,3276   0,3085   0,1831   0,2463   0,4806   0,1703
          standard |   0,3935  -0,0214  -0,1923  -0,1689  -0,3900  -0,2688  -0,1765
       cooperation |   0,3161   0,4847  -0,0482   0,3448   0,5356   0,7006   0,7144
          res_area |   0,1842   0,7106   0,1242  -0,1252   0,5347   0,7205   0,5274
           sup_res |   0,5837   0,5239  -0,1151   0,1727   0,1608   0,0809   0,1774
            sup_ed |   0,3814   0,5254   0,2286   0,0858   0,3090  -0,2188   0,1445
      
                   | platform training  guiding    rules   policy  network     unit
      -------------+---------------------------------------------------------------
          platform |   1,0000
          training |   0,2358   1,0000
           guiding |  -0,0352  -0,1662   1,0000
             rules |  -0,0759   0,0277   0,0701   1,0000
            policy |   0,2716   0,3622   0,0701   0,0145   1,0000
           network |   0,2362   0,4568   0,0614   0,4572   0,7082   1,0000
              unit |   0,3637   0,1421   0,1180  -0,0419  -0,0419   0,1168   1,0000
          strategy |   0,0485   0,0263   0,3949   0,0036  -0,1743   0,0887   0,3676
          standard |  -0,0065   0,0206   0,2743   0,4876   0,2042   0,1660  -0,0323
       cooperation |   0,2833   0,5182   0,3405   0,1989   0,6876   0,7500   0,1872
          res_area |   0,1416   0,4321   0,3638   0,2500   0,3215   0,1607  -0,0112
           sup_res |  -0,0281   0,3880  -0,1474   0,7382   0,2875   0,2829   0,0360
            sup_ed |  -0,2188   0,4902   0,1609   0,1728   0,4389   0,2362   0,0175
      
                   | strategy standard cooper~n res_area  sup_res   sup_ed
      -------------+------------------------------------------------------
          strategy |   1,0000
          standard |   0,3688   1,0000
       cooperation |   0,3327   0,1095   1,0000
          res_area |   0,4191  -0,0972   0,6530   1,0000
           sup_res |   0,1619   0,4269   0,5665   0,3928   1,0000
            sup_ed |  -0,0829   0,3397   0,5830   0,0466   0,6018   1,0000
      
      . matrix r = r(Rho)
      
      . factormat r, n(217)
      r not positive (semi)definite
      r(506);
      
      end of do-file
      
      r(506);
      A colleague of mine noticed that the correlation between network and competition is -1. Using the polychoric command, this correlation coefficient equals 0.0. I understand now why you requested the entire output. It turns out that no observation in the dataset both has networking activities and hosts competitions. Can this be why I get the error message? I tried recoding competition so that the competition=1/network=1 combination was represented in the dataset; I do, however, still get the above error message.

      The requirement that the correlation matrix input to principal components analysis be positive semidefinite is a serious statistical concern, so the error message from factormat about the correlation matrix from tetrachoric is important.
      The problem can be solved with the posdef option, as you mentioned, but what does this mean? Is it legitimate? I am highly confused, as I did a similar analysis on other variables in the dataset, which are also all binary. These concern which understandings of responsibility the organizations worked with, and this worked just fine with both polychoricpca and tetrachoric varlist followed by matrix r = r(Rho) and factormat r, n(217).
      However, these two methods - that yield the same correlation coefficients - suggest three dimensions for the first method and 5 dimensions for the latter. I believe it should be similar? These analyses are done with all 217 observations in the dataset. I hope you can help me; I am on the verge of quitting this factor analysis altogether.
      Last edited by Malene Christensen; 18 Jul 2017, 08:01.



      • #18
        Code:
        polychoricpca
        and
        Code:
        tetrachoric varlist
        matrix r = r(Rho)
        factormat r, n(217)
        ... these two methods - that yield the same correlation coefficients - suggest three dimensions for the first method and 5 dimensions for the latter. I believe it should be similar?
        Reviewing the documentation in help polychoricpca, help factormat, and help pca suggests that you are comparing apples with oranges. polychoricpca produces a principal components analysis, whereas factormat produces a factor analysis, by default using the principal-factor method, although optionally using the principal-component factor method.

        I think that to be comparable to polychoricpca, you would have to use the pca command rather than the factormat command. Or conversely, going back to your post #1, since you want a factor analysis, polychoricpca is not the tool for you; factormat is. However, it is possible that
        Code:
        factormat r, n(217) pcf
        would produce results more similar to those from polychoricpca.
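To see the distinction on a small scale, here is a minimal sketch using a made-up 3x3 correlation matrix. The values and the names a, b, c are purely illustrative and are not taken from the data in this thread. Both commands start from the same matrix. With the pcf option the leading eigenvalue should agree with the one from pcamat, while the scaling of the loadings and the number of components or factors retained by default may still differ.

```stata
* Toy correlation matrix; values and names are illustrative assumptions
matrix r = (1, .5, .3 \ .5, 1, .4 \ .3, .4, 1)
matrix rownames r = a b c
matrix colnames r = a b c

* Principal components analysis of the matrix
pcamat r, n(217)
matrix ev_pca = e(Ev)

* Factor analysis using the principal-component factor method
factormat r, n(217) pcf
matrix ev_pcf = e(Ev)

* Compare the leading eigenvalue reported by each command
display ev_pca[1,1]
display ev_pcf[1,1]
```

Dropping the pcf option in the second command gives the default principal-factor method, which works from a reduced correlation matrix and so would not be expected to match.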



        • #19
          Wow, I have indeed confused these terms! Thank you for clearing this up! I believe that principal components analysis is the most appropriate choice in this case, as the primary purpose is data reduction in an exploratory setting. So I need a polychoric PCA, which does not work; alternatively, I could store the tetrachoric/polychoric correlation coefficients and run pcamat, which also does not work.

          Code:
           polychoricpca requirement funding competition discussion event citizen_science c
          > ampaign platform training guiding rules policy network unit strategy standard co
          > operation res_area sup_res sup_ed
          could not calculate numerical derivatives
          missing values encountered
          could not calculate numerical derivatives
          missing values encountered
          
          Polychoric correlation matrix
          
                               requirement          funding      competition
              requirement                1
                  funding        ,43360211                1
              competition       -,07963118        ,39906673                1
               discussion        ,22511135       -,12062146       -,02376225
                    event       -,03243606        ,59658092        ,67674132
          citizen_science        ,11872629        ,72926096        ,01991276
                 campaign       -,06642385        ,52262024       -,05369341
                 platform        ,00737133       -,05270511        ,39700047
                 training        -,1162005        ,41540034        ,41257345
                  guiding       -,08579849        ,10211343       -,12030967
                    rules        ,52943032         ,2791685       -,32344164
                   policy        ,32958669        ,06590443        ,08317795
                  network        ,23488741       -,03405168                .
                     unit        ,02502208       -,14509878        ,44205054
                 strategy        ,38606621        ,32795366        ,30917024
                 standard        ,39458861       -,02141349       -,19300705
              cooperation        ,31689848        ,48534115       -,04827437
                 res_area        ,18502194        ,71155498        ,12484554
                  sup_res        ,58525384        ,52510344       -,11564683
                   sup_ed        ,38322625        ,52704529        ,22991512
          
                                discussion            event  citizen_science
               discussion                1
                    event        ,19717124                1
          citizen_science        ,08629164        ,75138492                1
                 campaign        ,20043572        ,70035335        ,71610885
                 platform        ,00290081        ,43367535        ,32136945
                 training        ,40441367        ,43696061         ,1639602
                  guiding        ,03021667       -,08227838        ,16150499
                    rules        ,17318865       -,19611518        -,0762605
                   policy        ,47571215        ,41453787        ,27321665
                  network        ,51792978        ,12572792        ,12105779
                     unit        ,09806615        ,40882123        -,0639711
                 strategy        ,18334256        ,24649133        ,48180778
                 standard       -,16930525       -,39110504       -,27012386
              cooperation        ,34534373        ,53610536        ,70199426
                 res_area        -,1256119        ,53572544        ,72206206
                  sup_res        ,17333193        ,16136692        ,08144885
                   sup_ed        ,08629164        ,31020763       -,22033115
          
                                  campaign         platform         training
                 campaign                1
                 platform        ,03285636                1
                 training        ,13500998        ,23688703                1
                  guiding        ,24149222       -,03522224       -,16647947
                    rules       -,14311739        -,0762605        ,02786731
                   policy        ,43079963        ,27321665        ,36346845
                  network        ,16301905        ,23786434        ,45842578
                     unit       -,07768877        ,36506261        ,14251282
                 strategy        ,17071145        ,04870872        ,02642428
                 standard       -,17717211       -,00640102        ,02070659
              cooperation        ,71552201        ,28435191        ,51882571
                 res_area        ,52891711        ,14245858        ,43315516
                  sup_res        ,17836936       -,02810993        ,38906015
                   sup_ed        ,14547083       -,22033115        ,49178715
          
                                   guiding            rules           policy
                  guiding                1
                    rules        ,07038232                1
                   policy        ,07038232        ,01469897                1
                  network        ,06168779        ,45937531        ,71012239
                     unit        ,11815959       -,04198881       -,04198881
                 strategy        ,39487928         ,0036789        -,1747876
                 standard        ,27461083         ,4889252        ,20498582
              cooperation         ,3407695        ,19958837        ,68885162
                 res_area        ,36451458        ,25116821        ,32285854
                  sup_res       -,14781197        ,73962198        ,28886311
                   sup_ed        ,16150499        ,17401462         ,4409355
          
                                   network             unit         strategy
                  network                1
                     unit        ,11746568                1
                 strategy        ,08910018        ,36786705                1
                 standard        ,16677031       -,03233762        ,36906466
              cooperation        ,75144149        ,18751041        ,33280116
                 res_area        ,16164549       -,01119367        ,41976814
                  sup_res        ,28442846         ,0362096        ,16230451
                   sup_ed        ,23786434        ,01768492       -,08308129
          
                                  standard      cooperation         res_area
                 standard                1
              cooperation        ,10972283                1
                 res_area       -,09751973        ,65382663                1
                  sup_res        ,42791047        ,56748995        ,39403875
                   sup_ed        ,34095752        ,58448547        ,04701396
          
                                   sup_res           sup_ed
                  sup_res                1
                   sup_ed        ,60365551                1
          matrix symeigen: matrix has missing values
          r(504);
          Code:
          . tetrachoric requirement funding competition discussion event citizen_science cam
          > paign platform training guiding rules policy network unit strategy standard coop
          > eration res_area sup_res sup_ed
          (obs=217)
          
          matrix with tetrachoric correlations is not positive semidefinite;
            it has 5 negative eigenvalues
            maxdiff(corr,adj-corr) =  0,3721
            (adj-corr: tetrachoric correlations adjusted to be positive semidefinite)
          
                       | requir~t  funding compet~n discus~n    event citize~e campaign
          -------------+---------------------------------------------------------------
           requirement |   1,0000
               funding |   0,4324   1,0000
           competition |  -0,0793   0,3978   1,0000
            discussion |   0,2243  -0,1203  -0,0237   1,0000
                 event |  -0,0324   0,5958   0,6756   0,1967   1,0000
          citizen_sc~e |   0,1179   0,7278   0,0196   0,0858   0,7500   1,0000
              campaign |  -0,0662   0,5213  -0,0535   0,1996   0,6992   0,7144   1,0000
              platform |   0,0072  -0,0525   0,3951   0,0028   0,4322   0,3194   0,0325
              training |  -0,1158   0,4146   0,4114   0,4036   0,4363   0,1631   0,1344
               guiding |  -0,0856   0,1019  -0,1200   0,0301  -0,0822   0,1609   0,2408
                 rules |   0,5276   0,2780  -0,3215   0,1724  -0,1953  -0,0759  -0,1423
                policy |   0,3280   0,0655   0,0826   0,4743   0,4133   0,2716   0,4290
               network |   0,2335  -0,0340  -1,0000   0,5163   0,1251   0,1201   0,1619
                  unit |   0,0249  -0,1447   0,4409   0,0978   0,4082  -0,0638  -0,0774
              strategy |   0,3853   0,3276   0,3085   0,1831   0,2463   0,4806   0,1703
              standard |   0,3935  -0,0214  -0,1923  -0,1689  -0,3900  -0,2688  -0,1765
           cooperation |   0,3161   0,4847  -0,0482   0,3448   0,5356   0,7006   0,7144
              res_area |   0,1842   0,7106   0,1242  -0,1252   0,5347   0,7205   0,5274
               sup_res |   0,5837   0,5239  -0,1151   0,1727   0,1608   0,0809   0,1774
                sup_ed |   0,3814   0,5254   0,2286   0,0858   0,3090  -0,2188   0,1445
          
                       | platform training  guiding    rules   policy  network     unit
          -------------+---------------------------------------------------------------
              platform |   1,0000
              training |   0,2358   1,0000
               guiding |  -0,0352  -0,1662   1,0000
                 rules |  -0,0759   0,0277   0,0701   1,0000
                policy |   0,2716   0,3622   0,0701   0,0145   1,0000
               network |   0,2362   0,4568   0,0614   0,4572   0,7082   1,0000
                  unit |   0,3637   0,1421   0,1180  -0,0419  -0,0419   0,1168   1,0000
              strategy |   0,0485   0,0263   0,3949   0,0036  -0,1743   0,0887   0,3676
              standard |  -0,0065   0,0206   0,2743   0,4876   0,2042   0,1660  -0,0323
           cooperation |   0,2833   0,5182   0,3405   0,1989   0,6876   0,7500   0,1872
              res_area |   0,1416   0,4321   0,3638   0,2500   0,3215   0,1607  -0,0112
               sup_res |  -0,0281   0,3880  -0,1474   0,7382   0,2875   0,2829   0,0360
                sup_ed |  -0,2188   0,4902   0,1609   0,1728   0,4389   0,2362   0,0175
          
                       | strategy standard cooper~n res_area  sup_res   sup_ed
          -------------+------------------------------------------------------
              strategy |   1,0000
              standard |   0,3688   1,0000
           cooperation |   0,3327   0,1095   1,0000
              res_area |   0,4191  -0,0972   0,6530   1,0000
               sup_res |   0,1619   0,4269   0,5665   0,3928   1,0000
                sup_ed |  -0,0829   0,3397   0,5830   0,0466   0,6018   1,0000
          
          . matrix r = r(R)
          
          . pcamat r, n(217)
          matrix r has missing values
          r(504);
          
          end of do-file
          
          r(504);
          If I exclude either the network variable or the competition variable, it works, which is highly peculiar! The only odd thing here is, as already mentioned, that there are no observations in the dataset which score 1 on both of these variables:

          Code:
                     |      competition
             network |         0          1 |     Total
          -----------+----------------------+----------
                   0 |       168         28 |       196
                   1 |        21          0 |        21
          -----------+----------------------+----------
               Total |       189         28 |       217
          I don't understand, however, why this should be a problem for the PCA on tetrachoric coefficients. Can anyone enlighten me on this?
          Last edited by Malene Christensen; 19 Jul 2017, 03:41.



          • #20
            In general, when working with categorical variables like network and competition, if knowing that in your data one of the variables has a particular value (e.g. network=1) allows you to state that another variable has a particular value (e.g. competition=0), the methodology breaks down. The best estimate of P{competition=0 given network=1} is 1, but do you really believe that if you have 217 million observations not a single value of competition=1 and network=1 will occur? Let's look at the following example.

            I start by reproducing your data for network and competition (which I shorten to net and comp), and create a third binary variable z.
            Code:
            . summarize
            
                Variable |        Obs        Mean    Std. Dev.       Min        Max
            -------------+---------------------------------------------------------
                     net |        217    .0967742    .2963336          0          1
                    comp |        217    .1290323    .3360108          0          1
                       z |        217    .4976959    .5011507          0          1
            
            . tab comp net
            
                       |          net
                  comp |         0          1 |     Total
            -----------+----------------------+----------
                     0 |       168         21 |       189 
                     1 |        28          0 |        28 
            -----------+----------------------+----------
                 Total |       196         21 |       217
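The post does not show how the example data were built. One way to reproduce the 2x2 table is sketched below, assuming one row per cell with a frequency that is then expanded. The helper variable freq, the seed, and the way z is generated are assumptions, so the mean of z will not match the summarize output above exactly.

```stata
clear
* One row per cell of the 2x2 table, with its frequency
input net comp freq
0 0 168
0 1  28
1 0  21
1 1   0
end

* expand cannot create zero copies of a row, so drop the empty cell first
drop if freq == 0
expand freq
drop freq

* z is an assumption: an arbitrary binary variable, roughly half ones
set seed 12345
generate byte z = runiform() < 0.5

tabulate comp net
```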
            Now let's look at what happens when we try to model comp as a function of net and z.
            Code:
            . logit comp net z
            
            note: net != 0 predicts failure perfectly
                  net dropped and 21 obs not used
            
            Iteration 0:   log likelihood = -80.382798  
            Iteration 1:   log likelihood = -78.760538  
            Iteration 2:   log likelihood = -78.731712  
            Iteration 3:   log likelihood = -78.731702  
            Iteration 4:   log likelihood = -78.731702  
            
            Logistic regression                             Number of obs     =        196
                                                            LR chi2(1)        =       3.30
                                                            Prob > chi2       =     0.0692
            Log likelihood = -78.731702                     Pseudo R2         =     0.0205
            
            ------------------------------------------------------------------------------
                    comp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                     net |          0  (omitted)
                       z |   .7548408   .4237117     1.78   0.075    -.0756188      1.5853
                   _cons |  -2.208275   .3331501    -6.63   0.000    -2.861237   -1.555312
            ------------------------------------------------------------------------------
            In your data, every observation with net != 0 has the dependent variable comp == 0, and that is what logit tells us in the note at the top of its output. It cannot deal with that, so it drops the variable net and the 21 observations for which net != 0. The objective of this is to show you that the problem you are experiencing is not unique to polychoric and tetrachoric.

            Now let's look at tetrachoric.
            Code:
            . tetrachoric comp net z
            (obs=217)
            
            matrix with tetrachoric correlations is not positive semidefinite;
              it has 1 negative eigenvalue
              maxdiff(corr,adj-corr) =  0.0661
              (adj-corr: tetrachoric correlations adjusted to be positive semidefinite)
            
                         |     comp      net        z
            -------------+---------------------------
                    comp |   1.0000 
                     net |  -1.0000   1.0000 
                       z |   0.2235   0.1723   1.0000 
            
            . matrix r = r(Rho)
            
            . matrix symeigen e v = r
            
            . matrix list v
            
            v[1,3]
                        e1          e2          e3
            r1   2.0013614   1.0716784  -.07303978
            
            . pcamat r, n(217)
            r not positive (semi)definite
            Same sort of results you got: tetrachoric tells us the correlation matrix is not positive semidefinite, that adjusting it to be positive semidefinite would result in changing correlations by no more than 0.0661, and shows us the correlation matrix, in which the correlation between net and comp is shown as -1.0. For later reference, I capture and display the three eigenvalues, and we indeed see that the smallest is negative. And then pcamat declines to perform.
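One way to see why a correlation of exactly -1 causes trouble: for a 3x3 correlation matrix with off-diagonal elements a, b, c, the determinant is 1 + 2abc - a^2 - b^2 - c^2. With a = -1 this collapses to -(b + c)^2, which is negative unless b = -c. So once comp and net are treated as perfectly negatively correlated, z would have to correlate with them in exactly opposite ways for the matrix to be positive semidefinite, and the separately estimated pairwise correlations (0.2235 and 0.1723) do not do that. A quick check, assuming the rounded correlations typed in from the displayed output:

```stata
* Correlations typed in from the displayed output (rounded to 4 decimals)
matrix r = (1, -1, .2235 \ -1, 1, .1723 \ .2235, .1723, 1)

* Determinant of the matrix, and the closed form -(b + c)^2
display det(r)
display -1 * (.2235 + .1723)^2

* Eigenvalues in descending order: the smallest one is negative
matrix symeigen e v = r
matrix list v
```

Both displayed values should be about -0.157, which is also the product of the three eigenvalues. The same argument applies to any third variable in a larger matrix, which would be consistent with the 20-variable matrix failing as long as both network and competition are included.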

            Now let's try tetrachoric with the posdef option to actually do the adjusting referred to in the previous output.
            Code:
            . tetrachoric comp net z, posdef
            (obs=217)
            
            matrix with tetrachoric correlations is not positive semidefinite;
              it has 1 negative eigenvalue
              maxdiff(corr,adj-corr) =  0.0661
              (adj-corr: tetrachoric correlations adjusted to be positive semidefinite)
            
                adj-corr |     comp      net        z
            -------------+---------------------------
                    comp |   1.0000 
                     net |  -0.9339   1.0000 
                       z |   0.2068   0.1567   1.0000 
            
            . matrix r = r(Rho)
            
            . matrix symeigen e v = r
            
            . matrix list v
            
            v[1,3]
                        e1          e2          e3
            r1   1.9352714   1.0647286  -6.661e-16
            
            . pcamat r, n(217)
            
            Principal components/correlation                 Number of obs    =        217
                                                             Number of comp.  =          2
                                                             Trace            =          3
                Rotation: (unrotated = principal)            Rho              =     1.0000
            
                --------------------------------------------------------------------------
                   Component |   Eigenvalue   Difference         Proportion   Cumulative
                -------------+------------------------------------------------------------
                       Comp1 |      1.93527      .870543             0.6451       0.6451
                       Comp2 |      1.06473      1.06473             0.3549       1.0000
                       Comp3 |            0            .             0.0000       1.0000
                --------------------------------------------------------------------------
            
            Principal components (eigenvectors) 
            
                ------------------------------------------------
                    Variable |    Comp1     Comp2 | Unexplained 
                -------------+--------------------+-------------
                        comp |   0.7104    0.1483 |           0 
                         net |  -0.7027    0.2040 |           0 
                           z |   0.0393    0.9677 |           0 
                ------------------------------------------------
            We note that as advertised, when we compare the adjusted correlations to the earlier unadjusted correlations, the largest adjustment was to the correlation between comp and net, which changed from -1.0 to -0.9339. We see that the third eigenvalue has been set to (effectively) zero, the first two are somewhat different, and that pcamat uses the adjusted correlation matrix to produce two principal components that account for 100% of the variance.
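The reported maxdiff can be checked by hand. The sketch below assumes the two correlation matrices are typed in from the rounded output (unadjusted first, then adjusted) and computes the largest absolute element-wise difference in Mata. The matrix names r0 and r1 and the scalar name maxdiff are illustrative.

```stata
* Unadjusted and adjusted correlations, typed in from the two outputs
matrix r0 = (1, -1, .2235 \ -1, 1, .1723 \ .2235, .1723, 1)
matrix r1 = (1, -.9339, .2068 \ -.9339, 1, .1567 \ .2068, .1567, 1)

* Largest absolute element-wise difference, computed in Mata
mata: st_numscalar("maxdiff", max(abs(st_matrix("r0") - st_matrix("r1"))))
display maxdiff
```

The largest change is the one to the comp/net correlation, 0.0661, and the changes to the two correlations with z are under 0.02.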

            Not reported here, I also tried tetrachoric with the zeroadjust option. It succeeded in producing a positive semidefinite correlation matrix, and pcamat produced three principal components. I don't encourage this approach: fiddling with the data is a somewhat older approach with little theoretical justification, and I like the fact that tetrachoric, posdef reduces the number of principal components, which feels a lot like logit dropping the variable.

            Finally, for completeness, polychoric.
            Code:
            . polychoric comp net z
            could not calculate numerical derivatives
            missing values encountered
            could not calculate numerical derivatives
            missing values encountered
            
            Polychoric correlation matrix
            
                       comp        net          z
            comp          1
             net          .          1
               z  .22407535  .17295066          1
            
            . matrix r = r(R)
            
            . matrix list r
            
            symmetric r[3,3]
                       comp        net          z
            comp          1
             net          .          1
               z  .22407535  .17295066          1
            
            . pcamat r, n(217)
            matrix r has missing values
            I note that polychoric reports a missing correlation between net and comp, rather than -1.0 as in tetrachoric. It provides no options for accommodating the problem data, and pcamat is of course unable to extract principal components. My own feeling on this is that tetrachoric reporting -1.0 correlation is appropriate - after all, whenever one of the variables is 1 the other is zero. Perhaps an expert would feel otherwise.
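Since both commands stumble on the same thing, it may help to screen for empty cells before estimating any correlations. This is a minimal sketch, assuming binary 0/1 variables. The data are rebuilt from the 2x2 table earlier in the thread, and the variable list in the local macro vars is a placeholder to be replaced with the variables in the analysis.

```stata
clear
* Rebuild the net/comp data from the 2x2 table (the empty cell is omitted)
input net comp freq
0 0 168
0 1  28
1 0  21
end
expand freq
drop freq

* Placeholder variable list; replace with the variables in the analysis
local vars net comp
local k : word count `vars'
scalar n_empty = 0

* Check every pair of variables for a combination with no observations
forvalues i = 1/`k' {
    local a : word `i' of `vars'
    local first = `i' + 1
    forvalues j = `first'/`k' {
        local b : word `j' of `vars'
        foreach va in 0 1 {
            foreach vb in 0 1 {
                quietly count if `a' == `va' & `b' == `vb'
                if r(N) == 0 {
                    display "`a'=`va' and `b'=`vb': no observations"
                    scalar n_empty = n_empty + 1
                }
            }
        }
    }
}
display "empty cells found: " n_empty
```

Any pair flagged here is a candidate for the -1.0 (tetrachoric) or missing (polychoric) correlation seen above.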
