  • Obtaining mean zero, standardized factor scores after principal factor analysis of polychoric correlation matrix

    Dear all,

    After running principal factor analysis on the polychoric correlation matrix of my (ordinal scale) items (factormat matrix, pcf), I saved the predicted factor scores to be used for further analysis. However, my new variables are not standardized with mean zero. They are not even normally distributed, exhibiting some degree of skewness. I attach their basic descriptives. Can someone help me understand why this is the case and whether there is a way of correcting it? That is, how can I obtain zero-mean, standardized factor scores after the -factormat- command in Stata?

    Best regards,
    R
    Attached Files

  • #2
    Welcome to Statalist. Please read the FAQ to learn more about effective posting. Attached picture files are typically unreadable; the best way to show output from Stata is to copy it directly from either Stata's Results window or your log file and paste it into the forum editor in a code block. That said, your attachment happens to be readable on my computer today. Also, when showing results, it is really important to show the command(s) that gave those results along with them, so we know exactly what you did and what you got.

    The distributions of factor scores depend on the distributions of the underlying variables. They will seldom be normal--basically only if each of the underlying variables has a normal distribution--and even then it is not necessarily so. So on this point it is your expectations, rather than Stata's output, that need adjustment.

    As for having them standardized, they don't necessarily come out that way when Stata creates them. Factor scores are inherently dimensionless, so you can apply any linear or affine transformation you like to them and have results that are equivalent for most purposes. They usually do come out centered very close to zero, but various rounding errors etc. sometimes lead to means that are slightly off zero. The results that you show are surprising. You don't show us the command you issued to produce those results; I'm guessing that you mistakenly -summarize-d the variables you factored rather than the factor scores. However, as you also don't show us the -factormat- command you used, it is possible that you mis-specified it, for example by not specifying the -means()- and -sds()- options.

    So I think you need to repost your question, showing us everything from the -factormat- command through its results, then the -predict- command you used to calculate the factor scores and its results, and the -summarize- output for the factor scores.
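
    In outline, that is a sequence like the following (just a sketch with placeholder names: M and S stand for matrices of means and standard deviations, and fscore for the predicted score):

    * factor the correlation matrix, telling -predict- the original means and SDs
    factormat R, n(#) pcf means(M) sds(S)
    * score for the first factor
    predict fscore
    summarize fscore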

    • #3
      Dear Clyde,

      Thank you very much for the time you took to answer my post. I am sorry for not following the forum rules earlier. Below I reproduce the Stata commands (in bold characters) that led me to the final results.


      global insight_new insight10 insight09 insight03 insight11 insight06 insight08 insight07 insight04
      . codebook $insight_new, compact
      Variable Obs Unique Mean Min Max Label
      insight10 16194 5 3.435408 1 5 My self-discipline is good
      insight09 16194 5 2.959121 1 5 I do not have a tendency to procrastinate
      insight03 16194 5 3.600408 1 5 I find planning my study independently, easy
      insight11 16194 5 2.957453 1 5 I spend enough time on my studies
      insight06 16194 5 3.322836 1 5 I can study well, generally
      insight08 16194 5 2.589848 1 5 I find putting effort into uninteresting parts of my study, easy
      insight07 16194 5 3.754539 1 5 I am satisfied with the study performance that I've accomplished so far
      insight04 16194 5 3.447326 1 5 My activities outside the study, does not prevent me from focusing on my studies


      ***Tabulation of one of the variables as an example****


      . tab insight10

      My self-discipline is good      Freq.   Percent      Cum.
      Totally disagree                  569      3.51      3.51
      2                               2,585     15.96     19.48
      3                               4,544     28.06     47.54
      4                               6,218     38.40     85.93
      Totally agree                   2,278     14.07    100.00
      Total                          16,194    100.00

      ****Then I proceeded with the following commands****

      polychoric $insight_new
      display r(sum_w)
      global N = r(sum_w)
      matrix R = r(R)
      factormat R, n($N) mineigen(1) blanks(.4) pcf
      predict f1
      rename f1 determination
      label var determination "F1 score of PCF analysis on insight variables, unidimensional"


      ****Summary statistics of the resulting factor variable***
      . sum determination
      Variable Obs Mean Std. Dev. Min Max
      determination 16194 4.414692 1.058 1.359118 6.795589

      • #4
        So, it is as I suspected. Your -factormat- command does not specify the -means()- and -sds()- options. Consequently, when you run -predict-, -predict- assumes that the underlying variables have mean 0 and sd 1, which is manifestly not the case.

        From the help file for -factormat-:

        sds(matname2) specifies a k x 1 or 1 x k matrix with the standard deviations of the variables. The row or column names should match the variable names, unless the names() option is specified. sds() may be specified only if matname is a correlation matrix. Specify sds() if you have variables in your dataset and want to use predict after factormat. sds() does not affect the computations of factormat but provides information so that predict does not assume that the standard deviations are one.

        means(matname3) specifies a k x 1 or 1 x k matrix with the means of the variables. The row or column names should match the variable names, unless the names() option is specified. Specify means() if you have variables in your dataset and want to use predict after factormat. means() does not affect the computations of factormat but provides information so that predict does not assume the means are zero.
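
        For completeness, one way to build such matrices directly from the data is sketched below (the matrix names M and S are just placeholders, and this assumes the $insight_new global you defined):

        * 1 x 8 matrices (initially missing) to hold the means and standard deviations
        matrix M = J(1, 8, .)
        matrix S = J(1, 8, .)
        local i = 1
        foreach v of global insight_new {
            quietly summarize `v'
            matrix M[1, `i'] = r(mean)
            matrix S[1, `i'] = r(sd)
            local ++i
        }
        * name the columns so they match the variables in the correlation matrix
        matrix colnames M = $insight_new
        matrix colnames S = $insight_new
        factormat R, n($N) mineigen(1) blanks(.4) pcf means(M) sds(S)
        predict determination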

        • #5
          Sir, I appreciate your help very much! Thanks for pointing it out!! I am not an advanced Stata user, as you can tell. I gave it a go and resolved the issue as follows. Could you please remark on whether the steps I took were correct?

          ***Continuing with the variables I introduced above***

          tabstat $insight_new, stat(mean) save

          ***With tabstatmat, I stored the vector of means in a matrix named mean***
          tabstatmat mean

          . mat list mean

          ***Stata output***
          mean[1,8]
          insight10 insight09 insight03 insight11 insight06 insight08 insight07 insight04
          mean 3.4354082 2.9591207 3.6004076 2.9574534 3.3228356 2.5898481 3.7545387 3.4473262

          tabstat $insight_new, stat(sd) save

          ***With tabstatmat, I stored the vector of standard deviations in a matrix named stdev***
          tabstatmat stdev

          . mat list stdev

          ***Stata output***
          stdev[1,8]
          insight10 insight09 insight03 insight11 insight06 insight08 insight07 insight04
          sd 1.0282533 1.1573788 1.1435377 1.1389503 1.1234056 1.0267157 1.0003795 1.1123946

          polychoric $insight_new
          global N = r(sum_w)
          matrix R = r(R)
          factormat R, n($N) mineigen(1) blanks(.4) sds(stdev) means(mean) pcf
          predict determination


          ***Summary statistics of the resulting factor variable***
          . sum determination
          Variable Obs Mean Std. Dev. Min Max
          determination 16194 -7.82e-10 .9621974 -2.800276 2.175893

          I have also uploaded a histogram of the variable.

          • #6
            Yes, this looks correct. And you will notice that the mean is now -7.82e-10, which is, for practical purposes, zero. (Rounding errors creep into the calculation of the factor scores, so a mean of exactly zero is not always obtained.)

            The standard deviation came out to be 0.9621974. That's fairly close to 1, but that may be coincidental. The predicted factors are not guaranteed to have standard deviation = 1. Now, factors are inherently dimensionless, so if it is convenient for you to rescale them to standard deviation = 1, you can do that easily with -egen, std()-.
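
            Something along these lines (a sketch; the new variable name is just an example):

            * rescale the factor score to mean 0, standard deviation 1
            egen determination_std = std(determination)
            summarize determination_std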

            • #7
              Thank you once more!! I did so: I rescaled the variable to a standardized one with a standard deviation of 1, which I think will help with interpretation later on. I will be using this variable as an independent variable in a binary logistic regression. If I am not mistaken, a one-unit increase in this variable will denote a one standard deviation increase? My online searches about the interpretation of coefficients of factor variables used as independent variables in regression analyses have been fruitless. Could you please point me to a source, if you know of any?

              Best regards,
              R

              • #8
                Well, there is nothing special about interpreting the coefficients of factor variables. They work the same way as the coefficients of any other variables. The coefficient is the expected difference in outcome associated with a unit change in the predictor variable. When the predictor variable is standardized to SD = 1, a unit change in the predictor variable is the same as a 1 SD change.
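
                For concreteness (a sketch only; the binary outcome variable y here is hypothetical):

                * logistic regression of a binary outcome on the standardized factor score;
                * each coefficient is the change in the log odds of y per 1 SD increase in the score
                logit y determination_std
                * or report odds ratios instead
                logit y determination_std, or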

                The challenge is typically in explaining the factors themselves. They are latent variables that capture shared variance among a set of manifest variables, estimated as linear combinations of the manifest variables. And it often takes considerable knowledge of the content and science of your discipline, as well as a hefty dose of creativity to attach a meaning to a factor. In any case, deciding what they mean is a blend of art and science; but it is not statistics. The best statistician in the world, without understanding of the underlying science, will be of no help for that.

                Finally, let me suggest that if your plan is to use factors as variables in other regression analyses, you might want to use structural equation modeling (-sem- or -gsem- in Stata) rather than your current approach of estimating factor scores and using those in a regression.

                • #9
                  Thank you for your valuable advice and instructions. I am going to look into -sem-.

                  Best regards!!

                  • #10
                    You're welcome. Since you are interested in polychoric correlations, you should look at -gsem- in particular, as you will probably want a -logit- link for some of the regressions in your model.
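
                    In outline, something like this (a sketch only; y again stands for a hypothetical binary outcome, and Determination is simply the name chosen for the latent variable):

                    * ordinal (ologit) measurement model for the items, logit link for the outcome
                    gsem (Determination -> insight10 insight09 insight03 insight11 insight06 insight08 insight07 insight04, ologit) ///
                         (y <- Determination, logit)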
