PCA: Storing principal components coefficients and calculate component scores for other data -pcacoefsave-

DORRA SELLAMI

Join Date: Feb 2017

Posts: 28
#1


28 Aug 2018, 14:47

Hello,

I performed a principal component analysis on my database (T=2010). The database is composed of 8 different financial indicators for 9 countries for the year 2010.

Code:

<code snipped for confidentiality>

I performed a PCA and kept only two factors, one for debt and the other one for profitability.
Principal components/correlation                 Number of obs    =          9
                                                 Number of comp.  =          7
                                                 Trace            =          8
    Rotation: (unrotated = principal)            Rho              =     1.0000

--------------------------------------------------------------------------
   Component |   Eigenvalue   Difference         Proportion   Cumulative
-------------+------------------------------------------------------------
       Comp1 |      3.90452      2.02562             0.4881       0.4881
       Comp2 |       1.8789      .610633             0.2349       0.7229
       Comp3 |      1.26827      .761312             0.1585       0.8815
       Comp4 |      .506957      .183429             0.0634       0.9448
       Comp5 |      .323528      .234112             0.0404       0.9853
       Comp6 |     .0894166       .06101             0.0112       0.9964
       Comp7 |     .0284067     .0284061             0.0036       1.0000
       Comp8 |  6.01525e-07            .             0.0000       1.0000
--------------------------------------------------------------------------

Principal components (eigenvectors)

-------------------------------------------------------------------------------------------------
    Variable |     Comp1     Comp2     Comp3     Comp4     Comp5     Comp6     Comp7 | Unexplained
-------------+-----------------------------------------------------------------------+-------------
  Investment |    0.1941   -0.2343    0.6841    0.5303   -0.1490   -0.2530   -0.1667 |           0
      Profit |   -0.4123    0.4029   -0.1143    0.0721   -0.1736    0.0448   -0.2658 |           0
      Income |   -0.3790    0.3527    0.2719    0.4050   -0.0737    0.5081    0.3570 |           0
         Tax |    0.2978   -0.4402   -0.3934    0.3676   -0.0311    0.5105    0.2128 |           0
   Repayment |   -0.4026   -0.2626    0.1072    0.0552    0.8240   -0.0996    0.1709 |           0
    Leverage |    0.4117    0.3513    0.0802    0.0697    0.4771    0.3848   -0.5616 |           0
    Interest |    0.2801    0.4558   -0.3523    0.4692    0.1748   -0.4921    0.3109 |           0
   Liquidity |    0.3872    0.2541    0.3851   -0.4316    0.0647    0.1227    0.5363 |           0
-------------------------------------------------------------------------------------------------

I want to take the coefficients from the matrix of weights and apply them to a new database for the year 2011.
I saw in this previous thread (https://www.statalist.org/forums/for...pal-components) that the command pcacoefsave could work. But I can't work out which command comes after pcacoefsave. Do I have to clear and then import the new database? If so, what do I have to do after that?

Thank you for your answer

Last edited by sladmin; 29 Aug 2018, 11:15. Reason: confidential data removed
Tags: pca
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

28 Aug 2018, 18:48

As Clyde suggested in the topic you link to, saving the coefficients isn't always necessary, and it is not necessary in this case. What is necessary is saving the estimation results created by pca, and subsequently restoring those results. The pcacoefsave command was used in the linked topic to save the coefficients for use as weights, not for use with the predict command to generate predictions.

Here's sample code that demonstrates the approach.

Code:

sysuse auto, clear
tempfile auto
save `auto'
use `auto' if foreign==0, clear
pca weight length turn, components(1)
estimates save pcaest, replace
predict score
summarize score
clear all
use `auto' if foreign==1, clear
estimates use pcaest
predict score
summarize score

Copy this code into the do-file editor, run it, and review the results. The key parts of the documentation I referred to were
help pca

help pca postestimation

help estimates
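
To see what actually travels between the two datasets, it can help to look inside the saved file. The following is a minimal sketch, not part of the original post, that reuses the auto data and the file name pcaest from the code above; the matrix name L is made up for the illustration. It saves the pca results, wipes everything from memory, and then shows that the results (including the eigenvectors in e(L)) come back from disk without any data loaded.

```stata
* Sketch only: what estimates save / estimates use carry across
sysuse auto, clear
pca weight length turn if foreign==0, components(1)
estimates save pcaest, replace     // writes pcaest.ster to the working directory

clear all                          // drops the data and the estimates in memory

estimates describe using pcaest    // shows the command that produced the saved results
estimates use pcaest               // restores the e() results; no data are needed
ereturn list                       // everything pca stored
matrix L = e(L)                    // L is a hypothetical name for a copy of the eigenvectors
matrix list L
```

After estimates use, any dataset that contains variables named weight, length and turn can be loaded and passed to predict.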

Last edited by William Lisowski; 28 Aug 2018, 18:50.
Nick Cox

Join Date: Mar 2014

Posts: 35724
#3

29 Aug 2018, 01:54

I'm the author of pcacoefsave (SSC, as you are asked to explain). William's advice is exactly what I would suggest. pcacoefsave is for when you want results as a new dataset, but in your case you want to read in a new dataset and apply previous estimation results, which is a different problem.
DORRA SELLAMI

Join Date: Feb 2017

Posts: 28
#4

29 Aug 2018, 15:00

Thank you for your answer!
However, I am not sure I understand, mathematically, what the estimation results in pcaest are and what exactly is going on here.
What do I keep from the first estimation (foreign==0) and how do I apply it to foreign==1?
This may be obvious to you, but I am not sure I understand it correctly.
Thank you again for your help.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#5

29 Aug 2018, 15:15

What do I keep from the first estimation (foreign==0) and how do I apply it to foreign==1?

Using estimates save keeps the estimation results from foreign==0. Then, estimates use reads them back in, so they are available for predict to use to create the score variable. These are described in the help documentation I recommended.
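
For a sense of what predict does with those results, the sketch below (an illustration, not part of the original reply) rebuilds the first component score by hand in the estimation sample: each variable is standardized, multiplied by its entry in the eigenvector stored in e(L), and the products are summed. The variable name byhand and the matrix name L are made up for the example. It only covers the estimation sample; for new data, help pca postestimation is the place to check which means and standard deviations predict uses when standardizing.

```stata
* Sketch only: reproduce the first component score by hand (estimation sample)
sysuse auto, clear
keep if foreign==0
pca weight length turn, components(1)
predict score                      // Stata's own score for component 1

matrix L = e(L)                    // eigenvectors; rows are named after the variables
generate double byhand = 0
foreach v in weight length turn {
    quietly summarize `v'
    * add (eigenvector entry) x (standardized variable)
    replace byhand = byhand + L[rownumb(L, "`v'"), 1] * (`v' - r(mean)) / r(sd)
}
summarize score byhand
```

If the two variables agree, the saved estimation results are, for scoring purposes, essentially the weights in e(L) together with the information needed to standardize the variables.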

If you are not familiar with Stata's approach to estimation commands and postestimation commands, and how results from estimation are used by postestimation, you should review Chapter 20 "Estimation and postestimation commands" in the Stata User's Guide PDF included in your Stata installation and accessible from Stata's help menu.

And if, as it seems, you are unfamiliar with that chapter, I'd like to give some advice I often give to those new to Stata. When I began using Stata in a serious way, I started - as others here did - by reading my way through the Getting Started with Stata manual relevant to my setup. Chapter 18 then gives suggested further reading, much of which is in the Stata User's Guide, and I worked my way through much of that reading as well. All of these manuals are included as PDFs in the Stata installation (since version 11) and are accessible from within Stata - for example, through Stata's Help menu. The objective in doing this was not so much to master Stata as to be sure I'd become familiar with a wide variety of important basic techniques, so that when the time came that I needed them, I might recall their existence, if not the full syntax, and know how to find out more about them in the help files and manual.

Stata supplies exceptionally good documentation that amply repays the time spent studying it - there's just a lot of it. The path I followed surfaces the things you need to know to get started in a hurry and to work effectively.

If you had done that reading already, you would have already read Chapter 20 and, when you were looking to apply your pca results to a new dataset, would have understood the basic approach, although, if you're like me, you would have needed to review the necessary commands. But you would have known where to go for the answers you sought, and that wasn't to run a search that led you to a tool (pcacoefsave) that even the author agrees was not appropriate for what you want to do.