  • Stumped: CFA not converging

    Hello Community:

    I'm having issues getting a simple CFA model to converge: sem (LATENT -> observed1 observed2)

    sem gets stuck at a particular log-likelihood value, flagged "(not concave)". After many iterations it stops with "Warning: convergence not achieved."

    observed1 is a binary variable
    observed2 is a categorical (ordinal) variable, from 1-5

    I've tried standardizing the variables (z-scores), but it doesn't seem to make a difference. I've also tried dropping observations with missing values, and the methods in [SEM] intro 12 (https://www.stata.com/manuals13/semintro12.pdf), without success. Only one approach has worked so far, but it is so elaborate that I no longer trust the results: combining the "stand", "difficult" and "init" options.


    What could be wrong that is causing it not to converge?



    Many Thanks for your assistance!

  • #2
    Show us your data as

    Code:
    tabulate observed1 observed2 
    
    tabulate observed1 observed2, missing
    Expecting a linear composite to underlie a binary variable and a five-level ordinal variable may be optimistic.

    z-scores are just a linear rescaling and won't make the data any more suitable for this than they were originally.
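
    The rescaling point can be checked on any dataset. The sketch below uses Stata's shipped auto data purely as a stand-in (an assumption for illustration; price and mpg are not the poster's variables) and shows that the correlation is the same before and after z-scoring.

    ```stata
    * Illustration only: auto.dta stands in for the real data
    sysuse auto, clear

    * z-scores via egen's std() function: (x - mean) / sd
    egen double z_price = std(price)
    egen double z_mpg   = std(mpg)

    * Both commands report the same correlation
    correlate price mpg
    correlate z_price z_mpg
    ```

    Whatever correlation structure sem sees in the raw variables, it sees the same structure in the z-scores.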



    • #3
      Thanks for the speedy response! I am trying to replicate a factor (latent variable) that was constructed in other research using data similar to mine.

      Here are the data:


                 |                       observed2
       observed1 |         1          2          3          4          5 |     Total
      -----------+-------------------------------------------------------+----------
            0 No |     1,087          9        971      2,349        386 |     4,802
           1 Yes |     1,770         35      1,607      3,665        747 |     7,824
      -----------+-------------------------------------------------------+----------
           Total |     2,857         44      2,578      6,014      1,133 |    12,626


      . tab observed1 observed2, missing

                 |                             observed2
       observed1 |         1          2          3          4          5          . |     Total
      -----------+------------------------------------------------------------------+----------
            0 No |     1,087          9        971      2,349        386          9 |     4,811
           1 Yes |     1,770         35      1,607      3,665        747         48 |     7,872
               . |        67          0        195        131          0         29 |       422
      -----------+------------------------------------------------------------------+----------
           Total |     2,924         44      2,773      6,145      1,133         86 |    13,105


      Last edited by Zach Rodgers; 22 Mar 2018, 05:44.



      • #4
        Thanks for sharing the data. In essence, your two variables don't have enough in common to make a composite summary interesting or useful. They are practically uncorrelated. I don't know much at all about structural equation models, but a PCA shows this without fuss and indeed the correlation that is nearly zero predicts the futility of either exercise.

        Small tips for anyone watching: pushing a small table of counts through tabi if there are no data in memory has the useful side-effect of leaving behind the equivalent dataset as new data when you finish. Also, Zach's tables are clear enough but a graph (tabplot from the Stata Journal) perhaps makes the lack of correlation (also lack of association) clearer.

        For your wider project, whatever it is, both variables could be helpful but there is little or no scope for, and no gain from, searching for a composite, whether it's called latent or is just a pragmatic linear combination. That's my diagnosis.


        Code:
        . clear 
        
        . tabi 1087 9 971 2349 386 \ 1770 35 1607 3665 747 
        
                   |                          col
               row |         1          2          3          4          5 |     Total
        -----------+-------------------------------------------------------+----------
                 1 |     1,087          9        971      2,349        386 |     4,802 
                 2 |     1,770         35      1,607      3,665        747 |     7,824 
        -----------+-------------------------------------------------------+----------
             Total |     2,857         44      2,578      6,014      1,133 |    12,626 
        
                  Pearson chi2(4) =  16.1575   Pr = 0.003
        
        . 
        . replace row  = row - 1
        (10 real changes made)
        
        . rename (row col) (observed1 observed2)
        
        . expand pop 
        (12,616 observations created)
        
        . corr observed* 
        (obs=12,626)
        
                     | observ~1 observ~2
        -------------+------------------
           observed1 |   1.0000
           observed2 |   0.0026   1.0000
        
        
        . pca observed*
        
        Principal components/correlation                 Number of obs    =     12,626
                                                         Number of comp.  =          2
                                                         Trace            =          2
            Rotation: (unrotated = principal)            Rho              =     1.0000
        
            --------------------------------------------------------------------------
               Component |   Eigenvalue   Difference         Proportion   Cumulative
            -------------+------------------------------------------------------------
                   Comp1 |      1.00265    .00529494             0.5013       0.5013
                   Comp2 |      .997353            .             0.4987       1.0000
            --------------------------------------------------------------------------
        
        Principal components (eigenvectors) 
        
            ------------------------------------------------
                Variable |    Comp1     Comp2 | Unexplained 
            -------------+--------------------+-------------
               observed1 |   0.7071    0.7071 |           0 
               observed2 |   0.7071   -0.7071 |           0 
            ------------------------------------------------
        
        . tabplot observed* , showval bfcolor(none)
        [Graph: tabplot of observed1 against observed2 with cell counts shown; image nothingdoing.png not reproduced]

        Last edited by Nick Cox; 22 Mar 2018, 06:27.



        • #5
          I don't know what these variables are, but there is a hint that grade 2 on observed2 is very unusual. Why would people (?) avoid it or not admit it?



          • #6
            Awesome, Thanks so much for the insight, Nick! Very helpful. I'll see if I can reach the authors of the studies that used my variables as a factor.


            I thought that a CFA would still converge with poor or even zero correlation, just with poor fit statistics (RMSEA, etc.). But it sounds like poor correlation can actually prevent convergence entirely?


            Is it possible that the authors of the other study put enough other variables in the model that a correlation between observed1 and observed2 emerged? I'm assuming that the correlation is calculated independently of the rest of the model. If non-zero correlation is a pre-condition of CFA model convergence, it's hard for me to imagine how their factor could have been calculated.



            • #7
              P.S. Funny how much more intuitive it is to view tabplot than the raw table outputs.



              • #8
                I haven't studied the internals of sem but my guess is that it's using a very general algorithm and you're presenting it with an extreme case. There is almost nothing to find in your example and the command is having a hard time finding it.
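
                The "almost nothing to find" point can be made concrete with a sketch of the covariance structure of a one-factor model with two indicators. The symbols (loadings \lambda_j, factor variance \phi, error variances \theta_j) are introduced here for illustration and do not come from the thread, and the sketch treats both indicators as continuous, as sem does. It is not a statement about sem's internals.

                ```latex
                x_j = \alpha_j + \lambda_j L + e_j, \qquad j = 1, 2, \qquad
                \operatorname{Var}(L) = \phi, \quad \operatorname{Var}(e_j) = \theta_j, \quad \operatorname{Cov}(e_1, e_2) = 0

                \operatorname{Var}(x_j) = \lambda_j^2 \phi + \theta_j, \qquad
                \operatorname{Cov}(x_1, x_2) = \lambda_1 \lambda_2 \phi
                ```

                The single observed covariance is the only information about the common part. With a correlation of 0.0026, \lambda_1 \lambda_2 \phi has to be close to zero, so many combinations of loading and factor variance fit about equally well, which is consistent with a flat likelihood and the "(not concave)" messages. Counting parameters points the same way: with one loading fixed at 1 to set the scale, four covariance parameters (\lambda_2, \phi, \theta_1, \theta_2) are free, but two indicators supply only three variances and covariances.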

                I really can't speculate helpfully about another study and the other variables in that study given zero information on either. But in principle it is possible that conditionally on other variables more structure could be found.

                My prejudice is that the fancier the models in a sub-field, the less skill and attention is often devoted to looking at the data carefully, a task wrongly dismissed or even deprecated as too elementary to engage many researchers. Conversely, although you have named nothing, this looks loosely like social survey data with a big sample, where weak relationships are typical yet the sample size allows significance at conventional levels.

                It's salutary that the chi-square test in #4 shows a P-value many researchers would be delighted to see, but the association is very weak. Yet again, if the variables were named, it is possible that a slight difference between groups is exactly what would make sense to researchers and they just want to quantify it.
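
                One way to put a number on "significant but very weak" is Cramér's V, which tabulate and tabi report with the V option. A minimal sketch using the counts from #4; the hand calculation simply plugs in the chi-square statistic and sample size shown there.

                ```stata
                * Counts from #4; chi2 requests the Pearson test, V requests Cramér's V
                tabi 1087 9 971 2349 386 \ 1770 35 1607 3665 747, chi2 V

                * By hand for a 2 x 5 table: V = sqrt(chi2 / (N * min(r - 1, c - 1))), and min(1, 4) = 1
                display sqrt(16.1575 / (12626 * 1))
                ```

                V comes out at roughly 0.036 on a scale from 0 to 1, which squares with a P-value of 0.003 sitting alongside a correlation of 0.0026.
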
                Last edited by Nick Cox; 23 Mar 2018, 05:20.



                • #9
                  #7 Not funny to me! I've devoted a fair fraction of my career to trying to get people to look at their data....

