  • Factor analysis steps in Stata

    Dear Stata users,

    I have an unbalanced panel data set on six World Bank governance indicators. Their theoretical range is from 0 to 100.

    Code:
    . su
    
        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
         country |         0
            year |      4066    2007.684    5.992421       1996       2017
           voice |      3936    50.01535    29.02587          0        100
       stability |      3901    50.03778    29.05658          0        100
          effect |      3889    50.02948    29.02575          0        100
    -------------+--------------------------------------------------------
            regq |      3889      50.025    29.03567          0        100
             law |      3961    50.03459    29.04241          0        100
      corruption |      3903    50.04943    29.06459          0        100
    I would like to create a "Governance quality" index from these six indicators for visualization purposes using factor analysis. The pairwise correlations between the indicators range from 0.66 to 0.94.
    Code:
    . corr voice stability effect regq law corruption
    (obs=3828)
    
               |    voice stabil~y   effect     regq      law corrup~n
    -------------+------------------------------------------------------
           voice |   1.0000
       stability |   0.7117   1.0000
          effect |   0.7639   0.7078   1.0000
            regq |   0.7724   0.6572   0.9307   1.0000
             law |   0.8344   0.7971   0.9249   0.8952   1.0000
      corruption |   0.7934   0.7776   0.9105   0.8563   0.9379   1.0000
    This is the code I use to perform the analysis and to create the index.


    Code:
    . factor voice stability effect regq law corruption, factors(1)
    (obs=3828)
    
    Factor analysis/correlation                        Number of obs    =     3828
        Method: principal factors                      Retained factors =        1
        Rotation: (unrotated)                          Number of params =        6
    
        --------------------------------------------------------------------------
             Factor  |   Eigenvalue   Difference        Proportion   Cumulative
        -------------+------------------------------------------------------------
            Factor1  |      4.95288      4.78361            0.9866       0.9866
            Factor2  |      0.16927      0.14594            0.0337       1.0203
            Factor3  |      0.02333      0.05405            0.0046       1.0249
            Factor4  |     -0.03072      0.00477           -0.0061       1.0188
            Factor5  |     -0.03549      0.02353           -0.0071       1.0118
            Factor6  |     -0.05902            .           -0.0118       1.0000
        --------------------------------------------------------------------------
        LR test: independent vs. saturated:  chi2(15) = 3.3e+04 Prob>chi2 = 0.0000
    
    Factor loadings (pattern matrix) and unique variances
    
        ---------------------------------------
            Variable |  Factor1 |   Uniqueness 
        -------------+----------+--------------
               voice |   0.8428 |      0.2896  
           stability |   0.7933 |      0.3706  
              effect |   0.9506 |      0.0964  
                regq |   0.9208 |      0.1521  
                 law |   0.9782 |      0.0432  
          corruption |   0.9512 |      0.0952  
        ---------------------------------------
    
    . rotate
    
    Factor analysis/correlation                        Number of obs    =     3828
        Method: principal factors                      Retained factors =        1
        Rotation: orthogonal varimax (Kaiser off)      Number of params =        6
    
        --------------------------------------------------------------------------
             Factor  |     Variance   Difference        Proportion   Cumulative
        -------------+------------------------------------------------------------
            Factor1  |      4.95288            .            0.9866       0.9866
        --------------------------------------------------------------------------
        LR test: independent vs. saturated:  chi2(15) = 3.3e+04 Prob>chi2 = 0.0000
    
    Rotated factor loadings (pattern matrix) and unique variances
    
        ---------------------------------------
            Variable |  Factor1 |   Uniqueness 
        -------------+----------+--------------
               voice |   0.8428 |      0.2896  
           stability |   0.7933 |      0.3706  
              effect |   0.9506 |      0.0964  
                regq |   0.9208 |      0.1521  
                 law |   0.9782 |      0.0432  
          corruption |   0.9512 |      0.0952  
        ---------------------------------------
    
    Factor rotation matrix
    
        -----------------------
                     | Factor1 
        -------------+---------
             Factor1 |  1.0000 
        -----------------------
    
    
    . predict index
    (regression scoring assumed)
    
    Scoring coefficients (method = regression; based on varimax rotated factors)
    
        ------------------------
            Variable |  Factor1 
        -------------+----------
               voice |  0.06952 
           stability |  0.04774 
              effect |  0.19615 
                regq |  0.11562 
                 law |  0.43183 
          corruption |  0.17631 
        ------------------------
    
    
    . su index
    
        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
           index |      3828    1.44e-10    .9896982  -1.763607   1.790659
    And here is what I would like to ask some advice on. Based on my understanding of the process and results, creating one index from the six indicators, rather than two or more indices, makes sense in this case. Only one factor has an eigenvalue above 1, it explains 98.7 percent of the variance, and according to the factor loadings it is highly correlated with each of the six indicators. Is this logic reasonable? Are there any other checks I need to perform to be reasonably confident that one factor is sufficient with these data?

    I am using Stata/SE 13.1 on Windows 10 x64.

    Thank you!

  • #2
    Your interpretation of EFA seems reasonable. The results are consistent with a one-factor solution. If you are only retaining one factor, you do not need to rotate anything - you can see that the rotated factor solution is identical to the unrotated one. There's nothing to rotate. Although it does look like predicting factor scores requires you to use rotate.
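
    The points above can be checked directly. What follows is a minimal sketch, assuming the variable names from the first post. index_unrot and index_rot are hypothetical names, and screeplot and estat kmo are standard factor postestimation commands offered here as complementary checks, not as a required procedure.

    Code:
    * refit the one-factor model
    factor voice stability effect regq law corruption, factors(1)
    
    * scree plot of the eigenvalues: look for a sharp drop after Factor1
    screeplot
    
    * Kaiser-Meyer-Olkin measure of sampling adequacy
    estat kmo
    
    * score before and after rotate, then compare the two sets of scores
    predict index_unrot
    rotate
    predict index_rot
    compare index_unrot index_rot
    corr index_unrot index_rot

    If the two sets of scores agree, the rotate step can be left out when only one factor is retained.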

    If those six items had been designed a priori to be a unidimensional scale, I think you could have simply used CFA and checked the model fit indices under a single factor solution. You would probably have presented one of the accepted fit indices.
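
    A single-factor CFA could be sketched with sem as follows. This is a sketch under assumptions: Governance is a hypothetical name for the latent variable, the default maximum likelihood estimator drops observations with any missing indicator, and the model treats the country-year observations as independent, which panel data generally are not.

    Code:
    * one latent factor measured by the six indicators
    sem (Governance -> voice stability effect regq law corruption)
    
    * fit statistics, including RMSEA, CFI, TLI and SRMR
    estat gof, stats(all)
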
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.
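
    As an illustration, a data example for this thread might be produced along these lines. The variable names are taken from the first post, and the 20-observation range is an arbitrary choice. dataex ships with Stata 14.2 and later; on older versions such as 13.1 it should be installable from SSC.

    Code:
    * only needed on Stata versions that do not ship dataex
    ssc install dataex
    
    * list the first 20 observations in a form that can be pasted into a post
    dataex country year voice stability effect regq law corruption in 1/20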

    When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.


    • #3
      @Weiwen Thank you for the feedback.


      • #4
        I remembered another related question I wanted to ask. How should one interpret the "LR test: independent vs. saturated" line reported after the factor command? What is the null hypothesis, and what literature is the test derived from (so that I can cite it in a paper)?
