
  • PCA: Storing principal components coefficients and calculate component scores for other data -pcacoefsave-

    Hello,

    I performed a principal component analysis on my database (T=2010). The database consists of 8 different financial indicators for 9 countries for the year 2010.

    Code:
    <code snipped for confidentiality>
    I performed a PCA and kept only two factors, one for debt and the other one for profitability.
    Principal components/correlation          Number of obs    =          9
                                              Number of comp.  =          7
                                              Trace            =          8
    Rotation: (unrotated = principal)         Rho              =     1.0000

    --------------------------------------------------------------------------
       Component |   Eigenvalue   Difference         Proportion   Cumulative
    -------------+------------------------------------------------------------
           Comp1 |      3.90452      2.02562             0.4881       0.4881
           Comp2 |       1.8789      .610633             0.2349       0.7229
           Comp3 |      1.26827      .761312             0.1585       0.8815
           Comp4 |      .506957      .183429             0.0634       0.9448
           Comp5 |      .323528      .234112             0.0404       0.9853
           Comp6 |     .0894166       .06101             0.0112       0.9964
           Comp7 |     .0284067     .0284061             0.0036       1.0000
           Comp8 |  6.01525e-07            .             0.0000       1.0000
    --------------------------------------------------------------------------

    Principal components (eigenvectors)

    --------------------------------------------------------------------------------------------------
        Variable |    Comp1     Comp2     Comp3     Comp4     Comp5     Comp6     Comp7 | Unexplained
    -------------+----------------------------------------------------------------------+-------------
      Investment |   0.1941   -0.2343    0.6841    0.5303   -0.1490   -0.2530   -0.1667 |           0
          Profit |  -0.4123    0.4029   -0.1143    0.0721   -0.1736    0.0448   -0.2658 |           0
          Income |  -0.3790    0.3527    0.2719    0.4050   -0.0737    0.5081    0.3570 |           0
             Tax |   0.2978   -0.4402   -0.3934    0.3676   -0.0311    0.5105    0.2128 |           0
       Repayment |  -0.4026   -0.2626    0.1072    0.0552    0.8240   -0.0996    0.1709 |           0
        Leverage |   0.4117    0.3513    0.0802    0.0697    0.4771    0.3848   -0.5616 |           0
        Interest |   0.2801    0.4558   -0.3523    0.4692    0.1748   -0.4921    0.3109 |           0
       Liquidity |   0.3872    0.2541    0.3851   -0.4316    0.0647    0.1227    0.5363 |           0
    --------------------------------------------------------------------------------------------------


    I want to take the coefficients from this matrix of weights and apply them to a new database for the year 2011.
    I saw in this previous thread (https://www.statalist.org/forums/for...pal-components) that the command pcacoefsave could work, but I can't work out which command comes after pcacoefsave. Do I have to clear and then import the new database? If so, what do I have to do after that?

    Thank you for your answer
    Last edited by sladmin; 29 Aug 2018, 11:15. Reason: confidential data removed

  • #2
    As Clyde suggested in the topic you link to, saving the coefficients isn't always necessary, and it is not necessary in this case. What is necessary is saving the estimation results created by pca, and subsequently restoring those results. The pcacoefsave command was used in the linked topic to save the coefficients for use as weights, not for use with the predict command to generate predictions.

    Here's sample code that demonstrates the approach.
    Code:
    sysuse auto, clear
    tempfile auto
    save `auto'
    
    use `auto' if foreign==0, clear
    pca weight length turn, components(1)
    estimates save pcaest, replace
    predict score
    summarize score
    
    clear all
    
    use `auto' if foreign==1, clear
    estimates use pcaest
    predict score
    summarize score
    Copy this code into the do-file editor, run it, and review the results. The key parts of the documentation I referred to were
    • help pca
    • help pca postestimation
    • help estimates
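    A possible shortcut, offered as a sketch rather than as part of the approach above: when the estimation data and the new data can sit in one dataset at the same time (an assumption that holds for separate yearly files only after they are appended), pca can be fit on a subset with an if qualifier, and predict then creates scores for every observation, including those outside the estimation sample. The variable name score_dom is made up for illustration.

```stata
* Sketch: fit the PCA on domestic cars only, then score all cars in one pass.
sysuse auto, clear
pca weight length turn if foreign==0, components(1)

* predict is not restricted to the estimation sample,
* so foreign cars receive scores from the domestic-car components too.
predict score_dom
summarize score_dom if foreign==0
summarize score_dom if foreign==1
```

    When the two datasets cannot be combined, the estimates save / estimates use route shown above remains the way to carry the results across.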
    Last edited by William Lisowski; 28 Aug 2018, 18:50.



    • #3
      I'm the author of pcacoefsave (SSC, as you are asked to explain). William's advice is exactly what I would suggest. pcacoefsave is for when you want results as a new dataset, but in your case you want to read in a new dataset and apply previous estimation results, which is a different problem.



      • #4
        Thank you for your answer!
        However, I am not sure I understand, mathematically, what the estimation results saved in pcaest are and what exactly is going on here.
        What do I keep from the first estimation (foreign==0), and how do I apply it to foreign==1?
        This may be obvious to you, but I am not sure I understand it correctly.
        Thank you again for your help.



        • #5
          What do i keep from the first estimation (foreign==0) and how do I apply it to foreign==1?
          Using estimates save keeps the estimation results from foreign==0. Then, estimates use reads them back in, so they are available for predict to use to create the score variable. These are described in the help documentation I recommended.
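          To see concretely what is kept, one can reload the saved file in a cleared session and list the stored results. The sketch below reuses the auto example; after pca, e(L) holds the component loadings (eigenvectors) and e(Ev) the eigenvalues, which are among the results predict draws on. It is an illustration only, not a full account of everything predict uses.

```stata
* Sketch: fit on domestic cars, save the results, then inspect what was saved.
sysuse auto, clear
pca weight length turn if foreign==0, components(1)
estimates save pcaest, replace

* Wipe data and stored results, as a new session would.
clear all

* Restore the saved estimation results and look inside them.
estimates use pcaest
ereturn list
matrix list e(L)     // loadings for the retained component
matrix list e(Ev)    // eigenvalues
```

          The loadings listed here are the ones from the foreign==0 fit; running predict on any dataset that contains weight, length, and turn then builds the score from them.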

          If you are not familiar with Stata's approach to estimation commands and postestimation commands, and how results from estimation are used by postestimation, you should review Chapter 20 "Estimation and postestimation commands" in the Stata User's Guide PDF included in your Stata installation and accessible from Stata's help menu.

          And if, as it seems, you are unfamiliar with that chapter, I'd like to give some advice I often give to those new to Stata. When I began using Stata in a serious way, I started - as others here did - by reading my way through the Getting Started with Stata manual relevant to my setup. Chapter 18 then gives suggested further reading, much of which is in the Stata User's Guide, and I worked my way through much of that reading as well. All of these manuals are included as PDFs in the Stata installation (since version 11) and are accessible from within Stata - for example, through Stata's Help menu. The objective in doing this was not so much to master Stata as to be sure I'd become familiar with a wide variety of important basic techniques, so that when the time came that I needed them, I might recall their existence, if not the full syntax, and know how to find out more about them in the help files and manual.

          Stata supplies exceptionally good documentation that amply repays the time spent studying it - there's just a lot of it. The path I followed surfaces the things you need to know to get started in a hurry and to work effectively.

          If you had done that reading already, you would have already read Chapter 20 and, when you were looking to apply your pca results to a new dataset, would have understood the basic approach, although, if you're like me, you would have needed to review the necessary commands. But you would have known where to go for the answers you sought, and that wasn't to run a search that led you to a tool (pcacoefsave) that even the author agrees was not appropriate for what you want to do.
