principal component analysis for panel data

e ahmadpour

Join Date: Jun 2014

Posts: 14
#1

26 Jan 2015, 14:19

Dear forum members,
I have panel data and used PCA in Stata to build an index, but I doubt my approach: Stata seems to give me a wrong index! Should I run the PCA in a loop, run it separately for each country, or is the usual way right?
I would appreciate it if anyone could give an answer.
Thanks!
Clyde Schechter

Join Date: Apr 2014

Posts: 30121
#2

26 Jan 2015, 14:22

You need to tell us more about your data and your research goals. Why are you making an index: what purpose will it serve in your research? When we know what you are trying to accomplish, we can better advise you how to go about it.
Comment
e ahmadpour

Join Date: Jun 2014

Posts: 14
#3

26 Jan 2015, 14:51

I have 3 variables for building a financial development index (18 countries, 10 years). To run the PCA I followed the Stata manual, but I didn't see anything about data types (panel, time series, ...). My concern is whether PCA on panel data works differently, or whether there is something more I should do.
When I run PCA for one of the countries, I get a different result than when I run PCA for all countries together!
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30121
#4

26 Jan 2015, 15:34

When you refer to making an index, I take it that you are trying to create a single variable that in some reasonable way summarizes the three variables that are its constituents: you are trying to reduce financial development from 3 degrees of freedom down to 1. PCA is often a reasonable way to do this, and the fact that you have panel data doesn't really matter, if you are just going to use this index as a variable in later analysis.

If you want to use the index to predict some other outcome in a regression model, though, it might be better to just include the 3 variables of the index directly in the regression model as predictors instead. It isn't clear why combining the three variables into an index is better than that: presumably your data set is large enough that saving 2 degrees of freedom shouldn't be necessary. (Unless you are scrimping on degrees of freedom because you are planning to incorporate a huge number of other variables in your model--in which case you need to seriously consider whether that really makes sense. If you are desperate to save a mere 2 degrees of freedom, you are probably skating on thin ice to start with.) Moreover, by combining into a single index you are constraining subsequent analyses to weigh the three variables in the same way that the index does. If the index in question were some theoretically justified construct that is widely used with well-established properties that might make sense. But given that it is an ad hoc calculation in your data set, that doesn't seem so sensible.

On the other hand, perhaps you are looking for some simple measure of financial development to use as an outcome. In that case forming an index makes sense. If the purpose is to see how this outcome measure responds to other variables that operate in these countries over the time periods you have or to make comparisons of trends in this outcome over time in different countries, you definitely need the index to mean the same thing in each country at each time: so a single PCA would make sense; a separate PCA for each country would not. Really, the only thing I can think of, off hand, that would make me recommend doing a separate PCA for each country is if you want to do separate analyses in each country over time with no plan to compare countries or combine them in any way.

It is tempting to raise the question whether you really want an index calculated by PCA or a latent variable estimated by factor analysis. But that is a much deeper question, and it requires more knowledge of the science underlying the variables in your project than I would be able to muster. It is something you might want to explore with your disciplinary colleagues.
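To make the single-versus-separate PCA point concrete, here is a minimal Python sketch (not Stata, and not part of the original post). The two countries and their values are invented toy data, deliberately extreme, and the helper names `corr_matrix` and `first_pc_loadings` are hypothetical. It extracts the first principal component of the correlation matrix by power iteration, once per country and once on the pooled data.

```python
import math

def corr_matrix(rows):
    """Pearson correlation matrix of a list of observations (one tuple per row)."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(k)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows)
             / ((n - 1) * sds[i] * sds[j])
             for j in range(k)] for i in range(k)]

def first_pc_loadings(rows, iters=500):
    """Unit-length weights of the first principal component (power iteration)."""
    R = corr_matrix(rows)
    k = len(R)
    v = [1.0] * k
    for _ in range(iters):
        w = [sum(R[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The sign of an eigenvector is arbitrary; fix it so the first weight is positive.
    if v[0] < 0:
        v = [-x for x in v]
    return v

# Invented toy data: (x1, x2, x3) for five years in each of two countries.
country_a = [(1, 1, 2), (2, 3, 4), (3, 2, 6), (4, 5, 8), (5, 4, 10)]
country_b = [(1, 1, 10), (2, 3, 8), (3, 2, 6), (4, 5, 4), (5, 4, 2)]

load_a = first_pc_loadings(country_a)
load_b = first_pc_loadings(country_b)
load_pooled = first_pc_loadings(country_a + country_b)

print("country A weights:", [round(x, 3) for x in load_a])
print("country B weights:", [round(x, 3) for x in load_b])
print("pooled weights:   ", [round(x, 3) for x in load_pooled])
```

In country A the third variable enters with the same sign as the first; in country B it enters with the opposite sign. The two per-country indexes therefore weight the inputs differently, and their scores are not on a common scale. The pooled PCA applies one set of weights to every country-year; in this contrived case the third variable's relationship cancels across countries, so its pooled weight is essentially zero.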
Comment
e ahmadpour

Join Date: Jun 2014

Posts: 14
#5

26 Jan 2015, 23:12

Dear Clyde,

Thanks so much for your excellent explanation. I want this index to test for causality between financial development and trade openness, and I think I have my answer: I will use a single PCA, not a separate PCA for each country.
I hope that is right.

thanks again.
Comment
Steffen Heinig

Join Date: Jun 2015

Posts: 7
#6

28 Jan 2016, 13:38

Hi,
I have a quick question, which is linked to the previous posts. It is not a Stata question, but a more general quantitative one.
I also have a panel data set and would like to produce an index with the help of PCA.
However, PCA suggests more than one PC. The Kaiser rule further suggests using all PCs with an eigenvalue above 1.

Anyway, I have two questions from here:
1. Is there any issue if I run a PCA on my macroeconomic variables, take the 4-5 suggested PCs, and then run another PCA on them, let's say multiple times?
2. Will this process end up in a single PC, maybe by chance? Or would this violate the whole idea of PCA?

Hope to get a short answer.
Any suggestions for reading are also appreciated.

Many thanks in advance,
Steffen
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35730
#7

28 Jan 2016, 14:01

Why not try it and see? PCs are uncorrelated by construction. If you focus on a subset of them that's still true. There is no extra information in the PCs over and above that in the original data. In fact, there is less information.

Otherwise put, if this was a good idea, it would be a standard tip.
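A small Python sketch of what "trying it" shows (an illustration added here, not from the original post; the data are invented). It uses two variables only because a 2x2 correlation matrix has a closed-form PCA: eigenvalues 1 + r and 1 - r, with components proportional to z1 + z2 and z1 - z2. The same logic applies to any subset of PCs, since every pair of them is uncorrelated.

```python
import math

def standardize(xs):
    """Center and scale to unit sample standard deviation."""
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    return [(x - m) / sd for x in xs]

def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    zx, zy = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# Invented toy data for two positively correlated variables.
x1 = [2.0, 4.0, 5.0, 7.0, 9.0, 12.0]
x2 = [1.0, 3.0, 2.0, 6.0, 5.0, 9.0]

# Round 1: PCA of the 2x2 correlation matrix [[1, r], [r, 1]].
z1, z2 = standardize(x1), standardize(x2)
r = corr(x1, x2)
eigenvalues_round1 = (1 + r, 1 - r)
pc1 = [(a + b) / math.sqrt(2) for a, b in zip(z1, z2)]
pc2 = [(a - b) / math.sqrt(2) for a, b in zip(z1, z2)]

# Round 2: PCA of the correlation matrix of the PC scores themselves.
r_scores = corr(pc1, pc2)
eigenvalues_round2 = (1 + abs(r_scores), 1 - abs(r_scores))

print("round 1 eigenvalues:", eigenvalues_round1)
print("correlation between PC scores:", r_scores)
print("round 2 eigenvalues:", eigenvalues_round2)
```

The scores are uncorrelated (up to rounding error), so the second-round correlation matrix is the identity and every eigenvalue equals 1. No component dominates, so repeating the PCA cannot collapse several PCs into a single one.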
Comment
Steffen Heinig

Join Date: Jun 2015

Posts: 7
#8

29 Jan 2016, 04:07

Thanks Nick!
I think this is what I am going to do.
Comment
Gurpreet Singh

Join Date: Jul 2017

Posts: 21
#9

17 Jul 2017, 12:50

Stata doesn't have the PARAFAC algorithm for PCA on longitudinal data. That could have sorted it out.
Comment
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#10

17 Jul 2017, 13:51

For building an index, PCA may not be the ideal approach. Clyde explored the issue with insightful comments in #4.

That said, should you be dealing with odds ratios (OR) or hazard ratios (HR), perhaps you could think about using a nomogram (after the regression analysis) so as to develop such an index.

With PCA, I think the main aim is not exactly "ending up with a single PC", but exploring the dimensions of a composite of variables. As a consequence, you may think the opposite way: instead of "trimming" PCs sequentially until reaching a single "ideal" PC, you may stick with detecting the "main" variables which mirror each PC, say, following the Kaiser rule and picking the components with eigenvalues over 1 (but you can be less strict and start the "expedition" having selected eigenvalues over 0.7). IMHO, PCA usually gives fewer than 5 components in well-developed questionnaires (hopefully it behaves the same way with an array of "to-become-index" variables). Additionally, you may "force" Stata to provide, say, the 3 "nec plus ultra" PCs, if you will. If you are getting lots of PCs, I fear there may be a problem with the array itself...

That said, in order to "sieve through" PCA whereabouts, eigenvalues are surely important, but, arguably, the percentage of variation they reflect as well as the communalities are quite important aspects, not to be neglected.
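As a rough illustration of the retention rules and the percentage of variation mentioned above, here is a short Python sketch. The eigenvalues are hypothetical (not from any data in this thread) and the helper name `retained` is made up. For a PCA on a correlation matrix the eigenvalues sum to the number of variables, so each eigenvalue divided by that total is the proportion of variance its component accounts for.

```python
# Hypothetical eigenvalues from a correlation-matrix PCA of 8 variables.
eigenvalues = [3.6, 1.9, 1.1, 0.75, 0.3, 0.2, 0.1, 0.05]

def retained(eigs, cutoff):
    """Number of components whose eigenvalue exceeds the cutoff."""
    return sum(1 for e in eigs if e > cutoff)

total = sum(eigenvalues)
proportion = [e / total for e in eigenvalues]

# Running total of the proportions, one entry per component.
cumulative = []
running = 0.0
for p in proportion:
    running += p
    cumulative.append(running)

kaiser_count = retained(eigenvalues, 1.0)   # Kaiser rule: eigenvalue > 1
looser_count = retained(eigenvalues, 0.7)   # the less strict cutoff of 0.7

for k, (e, p, c) in enumerate(zip(eigenvalues, proportion, cumulative), start=1):
    print(f"PC{k}: eigenvalue={e:.2f}  proportion={p:.3f}  cumulative={c:.3f}")
print("retained at cutoff 1.0:", kaiser_count)
print("retained at cutoff 0.7:", looser_count)
```

With these made-up numbers the Kaiser rule keeps three components and the 0.7 cutoff keeps four. Either way more than one component is retained, which is a reminder that a single index would discard a good share of the variation.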

Last but not least, structural equation models could be mentioned as a remarkable alternative. By the way, it can encompass longitudinal data as well.

Last edited by Marcos Almeida; 17 Jul 2017, 13:57.

Best regards,

Marcos
Comment
Oriol Anguera-Torrell

Join Date: Nov 2018

Posts: 1
#11

20 Sep 2020, 09:58

Dear forum members,

I would like to ask something related to the previous posts. I am constructing an index through principal component analysis (PCA) in Stata, aiming to measure the impact of an event. This index should measure that impact on different individuals and also across time. That is, in the end, I want to have this index for each individual and time period, so I can compare the index across individuals and also across time periods. To this end, I have 8 continuous variables measuring different responses to that event (which are highly correlated with one another) for 15 individuals and 6 time periods. I understand that it is more appropriate to estimate the PCA using all time periods and individuals together than to estimate it separately for each time period using all individuals. Otherwise, it would be difficult to compare the evolution of the index across time periods. Could you confirm that I am right?

Thank you kindly for your time and help.
Best regards,

Oriol
Comment
Hamid muili

Join Date: Aug 2020

Posts: 94
#12

09 Aug 2023, 22:10

Clyde Schechter, Nick Cox, is there any package equivalent to PCA that can be used to generate a single index from an array of categorical variables in Stata?
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30121
#13

09 Aug 2023, 22:23

Well, if the categorical variables are dichotomous, many people would just use -pca- with the data as is. Others might call that into question, but given that the usual uses to which PCA is put are also procedures that require the application of a large amount of judgment about substantive issues, with the actual PCA calculations playing a limited role, I personally would never quibble about the suitability of those variables for use in PCA. But if you wish to be a purist about it, the -tetrachoric- command can calculate a tetrachoric correlation matrix for the dichotomous outcomes, and then -pcamat- will apply PCA to that.

That said, if the categorical variables are not dichotomous, but are nevertheless ordinal, there is a command written by Stas Kolenikov that calculates the polychoric correlation matrix of the variables and then does PCA on that. If you run -search polychoric- you will find links to where you can download that package. His help file is well written and includes worked examples.

Note that both tetrachoric and polychoric correlation matrices may fail to be positive definite, and if so, PCA cannot work with them. However, both the -tetrachoric- and -polychoric- commands offer the option of forcing the result to be a positive definite matrix that is minimally adjusted from the regular result, and then you can apply PCA to that.
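To illustrate what failing to be positive definite means, here is a minimal Python sketch (not Stata, and not a substitute for the options just mentioned). When each entry of a correlation matrix is estimated separately, the entries need not be mutually consistent. The two matrices below are invented examples, and `is_positive_definite` is a hypothetical helper that simply attempts a Cholesky factorization.

```python
import math

def is_positive_definite(matrix, tol=1e-12):
    """Return True if a symmetric matrix admits a Cholesky factorization."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - s
                if pivot <= tol:
                    # A non-positive pivot means the matrix is not positive definite.
                    return False
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return True

# Invented example: mutually consistent pairwise correlations.
coherent = [[1.0, 0.5, 0.4],
            [0.5, 1.0, 0.3],
            [0.4, 0.3, 1.0]]

# Invented example: each entry is a valid correlation on its own, but together
# they are impossible (variable 1 cannot be strongly positive with variable 2
# and strongly negative with variable 3 while 2 and 3 are strongly positive).
incoherent = [[1.0, 0.9, -0.9],
              [0.9, 1.0, 0.9],
              [-0.9, 0.9, 1.0]]

print("coherent matrix positive definite:  ", is_positive_definite(coherent))
print("incoherent matrix positive definite:", is_positive_definite(incoherent))
```

The second matrix has a negative determinant, so at least one of its eigenvalues is negative. An eigenvalue is supposed to be the variance of a component, so a negative one has no sensible interpretation, which is why the matrix needs adjusting before PCA is applied.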

If your categorical variables are polytomous and do not have ordinal properties, I am unaware of any PCA-like approach that can be used, and I struggle to understand what that would even mean.

Last edited by Clyde Schechter; 09 Aug 2023, 22:28.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35730
#14

10 Aug 2023, 01:17

The goal of getting a single index out of a bundle of variables is usually optimistic at best and doomed to failure at worst, as the single index can be a poor numerical summary and very hard to interpret any way except as a mishmash. But in this territory someone should mention correspondence analysis as specifically intended for categorical input.

See https://econ-papers.upf.edu/papers/1856.pdf for an accessible version of a recent review. You may have access to the published version at https://www.nature.com/articles/s43586-022-00192-w
Comment
