Creating single variable as the standardized factor score for common factor

Roger Clements

Join Date: Jun 2017

Posts: 40
#1


12 Jul 2018, 01:43

Hello All,

I am somewhat familiar with factor analysis. A paper on the risk-taking of companies that I am reading states: "we calculated a single risk-taking variable as the standardized factor score for this common factor." They use three variables (total dollar values of R&D spending, capital expenditures, and long-term debt) in their factor analysis. They state "Our factor analysis produced a single factor explaining 73.1 percent of the variance. The factor loadings were 0.82 for R&D expense, 0.86 for capital expenditure, and 0.88 for long-term debt; the eigenvalue was 2.19."

My question is: What does it mean to create a standardized factor score for this common factor? And, how should I do this?

The authors do not elaborate further, so I am unsure whether they weighted each variable (R&D spending, capital expenditures, and long-term debt) to create the final risk-taking variable, or whether they simply summed the three variables. I think what is tripping me up is the meaning of "standardized factor score..."

Any advice and guidance would be greatly appreciated! Thank you.

Roger
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17678
#2

12 Jul 2018, 02:07

Roger:
you may be interested in the following thread: https://www.statalist.org/forums/for...elation-matrix.
Besides, googling the string -standardized factor score- gives back some promising (I hope) entries.
As is always the case with published papers/articles, the best approach is to email the corresponding author about your methodological concerns.

Kind regards,
Carlo
(Stata 19.0)

Nick Cox

Join Date: Mar 2014
Posts: 35453

12 Jul 2018, 02:07

Naturally I can't speak for unnamed authors and an unreferenced paper, particularly if it is not clear. But this is possibly simpler than you fear. In Stata terms, the guess is to use factor and then predict to get a single summary variable, the most important or successful factor.

Here is a silly example. In Stata's auto data you can notice that several variables are measures of size in some sense. So, we might seek some overall construct.

Code:

. sysuse auto
(1978 Automobile Data)

. ds
make          rep78         weight        displacement
price         headroom      length        gear_ratio
mpg           trunk         turn          foreign

. factor headroom trunk weight length displacement
(obs=74)

Factor analysis/correlation                      Number of obs    =         74
    Method: principal factors                    Retained factors =          3
    Rotation: (unrotated)                        Number of params =         10

    --------------------------------------------------------------------------
         Factor  |   Eigenvalue   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      3.55375      3.20457            0.9506       0.9506
        Factor2  |      0.34918      0.32503            0.0934       1.0441
        Factor3  |      0.02414      0.07150            0.0065       1.0505
        Factor4  |     -0.04736      0.09411           -0.0127       1.0378
        Factor5  |     -0.14147            .           -0.0378       1.0000
    --------------------------------------------------------------------------
    LR test: independent vs. saturated:  chi2(10) =  373.68 Prob>chi2 = 0.0000

Factor loadings (pattern matrix) and unique variances

    -----------------------------------------------------------
        Variable |  Factor1   Factor2   Factor3 |   Uniqueness 
    -------------+------------------------------+--------------
        headroom |   0.6063    0.3769    0.0335 |      0.4892  
           trunk |   0.7783    0.3287   -0.0133 |      0.2861  
          weight |   0.9534   -0.2272   -0.0047 |      0.0393  
          length |   0.9506   -0.1064   -0.1055 |      0.0740  
    displacement |   0.8763   -0.1900    0.1081 |      0.1843  
    -----------------------------------------------------------

. predict double unicorn
(regression scoring assumed)

Scoring coefficients (method = regression)

    --------------------------------------------
        Variable |  Factor1   Factor2   Factor3 
    -------------+------------------------------
        headroom |  0.10001   0.34016   0.04821 
           trunk |  0.13832   0.61838   0.11699 
          weight |  0.43381  -1.08999   0.43569 
          length |  0.29869   0.30726  -1.03676 
    displacement |  0.10697  -0.00897   0.48999 
    --------------------------------------------


. su unicorn

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     unicorn |         74    9.28e-17    .9795709   -1.69056   2.183327

.

Standardized here means what it typically means in statistics, that the new variable has in essence mean 0 and SD 1, although I am not an expert on factor analysis (indeed would never use it for any purpose of my own) and I can't explain why the SD is a bit off 1. The story here is more important: namely that I threw into the analysis variables with quite different units of measurement and different spread, and there is no good reason for a factor analysis to respect those differences. So, it's part of the machinery that such differences are washed out and the factor analysis is based on the correlations, not the covariances.
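A minimal sketch of what predict appears to be doing in the example above, assuming regression scoring as reported in the output: the score is a weighted sum of the standardized (mean 0, SD 1) input variables, with the Factor1 scoring coefficients as weights. The variable names z_* and manual are made up for illustration, and the coefficients are copied by hand from the printed table, so agreement is only up to rounding.

```stata
* Sketch: rebuild the first factor score by hand from the output shown above.
sysuse auto, clear
quietly factor headroom trunk weight length displacement
quietly predict double unicorn      // regression-scored first factor

* Standardize each input to mean 0, SD 1 using egen's std() function
foreach v of varlist headroom trunk weight length displacement {
    egen double z_`v' = std(`v')
}

* Weighted sum, using the Factor1 scoring coefficients printed above
generate double manual = 0.10001*z_headroom + 0.13832*z_trunk ///
    + 0.43381*z_weight + 0.29869*z_length + 0.10697*z_displacement

* The two versions should be almost perfectly correlated
correlate manual unicorn
summarize manual unicorn

* SD implied by summing (scoring coefficient x loading) for Factor1
display sqrt(0.10001*0.6063 + 0.13832*0.7783 + 0.43381*0.9534 ///
    + 0.29869*0.9506 + 0.10697*0.8763)
```

The last line works out to about 0.9796, in line with the SD of .9795709 reported by su. With regression scoring, the variance of the score is the sum of scoring coefficient times loading over the variables, which falls below 1 whenever the observed variables do not determine the factor exactly. So the score is standardized in the sense of mean 0 and SD close to, but not exactly, 1.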

That's the easy bit. The difficult bit is that I just used the default of Stata. From what you tell us, it seems likely that the authors did something similar in whatever software they used, but the territory is bestrewn with pitfalls. I wouldn't even assume that the default factor analysis in other software is the same procedure as used by Stata.

I am not an economist or even a social scientist, but I wouldn't mush together variables like R&D spending, capital expenditures, and long-term debt, which have distinct meanings and seem likely to be only moderately correlated. Even if your purpose is purely predictive, that sounds a dubious exercise. I'd also worry a lot about skewness and outliers for such data. Naturally if your instructions are to mimic a previous analysis, and no more, then that is what you are doing, but how far it is a good idea is part of a wider discussion.
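For the three-variable case in the original question, a replication attempt might look like the sketch below. It rests on two assumptions. First, weight, length and displacement from the auto data are only stand-ins for R&D spending, capital expenditures and long-term debt, which are not available here. Second, the pcf option is a guess: the quoted eigenvalue of 2.19 divided by 3 variables is 0.73, close to the quoted 73.1 percent, and the squared loadings 0.82^2 + 0.86^2 + 0.88^2 sum to about 2.19, which is the pattern principal-component factoring would produce. The paper as quoted does not confirm the method.

```stata
* Stand-in variables from the auto data; replace with your own three measures
sysuse auto, clear

* Principal-component factoring (an assumption), keeping a single factor
factor weight length displacement, pcf factors(1)

* Regression-scored factor; risk_taking is a hypothetical variable name
predict double risk_taking

* Check that the new variable has mean near 0 and SD near 1
summarize risk_taking
```

If the reported eigenvalue, proportion of variance and loadings from your own data come out close to the published figures, that is some evidence the guess about the method is reasonable; if not, the default principal factors method or another package's default may be what was used.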


Roger Clements

Join Date: Jun 2017

Posts: 40
#4

12 Jul 2018, 02:29

Thank you both.

Nick, I really appreciate the example to illustrate a simple response to what was confusing me. This makes things a lot clearer! Sure enough, I checked out the mean and SD in the correlation table of that published article and it is indeed a standardized variable. I was just missing the postestimation 'predict' part.

As well, thanks for the warnings. In this case, I am simply starting by replicating, but, like you, I am also concerned about tossing data away to distill three quite different variables into a single factor. I'll start with this basic replication then try to improve. The challenge right now is picking variables with available secondary data that indicate firm risk-taking. But that's clearly not a statistical problem or one for this forum. Lastly, thanks for the point about the differences in FA across software packages -- I didn't know such a thing was a possibility. Very informative!
jaskaran kaur

Join Date: Jul 2017

Posts: 12
#5

12 May 2020, 18:00

Dear Roger,
I am stuck at exactly the same point. I was wondering if you were able to find the solution to the problem of "Creating single variable as the standardized factor score for common factor".