New on SSC: xtnumfac - estimation of the number of factors in panel data

JanDitzen

Join Date: Jan 2015

Posts: 354
#1


16 Mar 2022, 05:49

Thanks to Kit Baum, a new package is available on SSC: xtnumfac. You can install it by typing ssc install xtnumfac

xtnumfac is joint work with Simon Reese.

xtnumfac estimates the number of common factors in panel datasets using the methods of Bai and Ng (2002), Ahn and Horenstein (2013), Onatski (2010), and Gagliardini et al. (2019). The methods in Bai and Ng (2002) are based on six information criteria, while Ahn and Horenstein (2013) propose two estimators. Onatski (2010) and Gagliardini et al. (2019) contribute a single estimator each. In total, the results of 10 different methods are displayed.
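As a brief illustration of what two of these estimators compute, here are the eigenvalue ratio (ER) and growth ratio (GR) statistics as defined in Ahn and Horenstein (2013). The notation is an assumption made for this sketch and is not taken from the xtnumfac output: \( \mathbf{X} \) is the T x N data matrix, \( \tilde\mu_k \) is the k-th largest eigenvalue of \( \mathbf{X}\mathbf{X}'/(NT) \), \( m = \min(N,T) \), and \( V(k) = \sum_{j=k+1}^{m} \tilde\mu_j \).

\[ ER(k) = \frac{\tilde\mu_k}{\tilde\mu_{k+1}}, \qquad GR(k) = \frac{\ln\left[ V(k-1)/V(k) \right]}{\ln\left[ V(k)/V(k+1) \right]}, \qquad k = 1, \dots, k_{max}. \]

Each estimator picks the k that maximises its statistic, so both look for the largest relative drop between adjacent eigenvalues rather than penalising a fit criterion as the Bai and Ng (2002) criteria do.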

The program requires the data to be xtset.

Syntax
The syntax is very simple:

Code:

xtnumfac [varname] [if] [in] [, options]

Options set the maximum number of factors, control the level of detail displayed, and select the form of standardisation.
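A minimal sketch of how such options might be passed. The option names kmax() and detail are assumptions here, since this post does not list them; check help xtnumfac for the exact names and defaults before relying on them.

Code:

* load the example panel (already xtset in the shipped dataset)
webuse pennxrate, clear
* assumed option: consider at most 5 factors instead of the default
xtnumfac realxrate, kmax(5)
* assumed option: show more detailed output
xtnumfac realxrate, detail

Lowering the maximum is mainly useful as a sensitivity check, because criteria that select the largest number considered will move with it.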

Example
We can use Stata's own pennxrate dataset and estimate the number of common factors in the real exchange rate:

Code:

webuse pennxrate
xtnumfac realxrate

Estimated number of common factors in realxrate
N = 151 T = 34
-------------------------------------------------
    IC    |  # factors  |    IC    |  # factors
----------+-------------+----------+-------------
 PC_{p1}  |      8      | IC_{p1}  |      8
 PC_{p2}  |      8      | IC_{p2}  |      8
 PC_{p3}  |      8      | IC_{p3}  |      8
 ER       |      1      | GR       |      2
 GOL      |      1      | ED       |      3
-------------------------------------------------
8 factors maximally considered.
PC_{p1},...,IC_{p3} from Bai and Ng (2002)
ER, GR from Ahn and Horenstein (2013)
ED from Onatski (2010)
GOL from Gagliardini, Ossola, Scaillet (2019)

Last edited by JanDitzen; 16 Mar 2022, 05:52.

3 likes
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#2

16 Mar 2022, 06:35

Does this have any connection with factor analysis or principal components analysis?
JanDitzen

Join Date: Jan 2015

Posts: 354
#3

16 Mar 2022, 06:37

Yes, establishing the number of common factors in a panel dataset is necessary when doing PCA.
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#4

16 Mar 2022, 06:44

Okay, and just so I understand, the estimated number of common factors is the number of principal components we'd choose if we were, say, doing principal components regression?

That is, it gives us the number that we could check with the screeplot and use the elbow method?
Sebastian Kripfganz

Join Date: May 2014

Posts: 2624
#5

16 Mar 2022, 06:46

Would it be possible to extend the command to allow factor extraction jointly from multiple variables, instead of just from a single variable?

https://www.kripfganz.de/stata/
Raymond Zhang

Join Date: Jan 2021

Posts: 352
#6

16 Mar 2022, 07:11

Would it be possible to extend the command to allow unbalanced panel data?

Best regards.

Raymond Zhang
Stata 17.0,MP
ericmelse

Join Date: May 2014

Posts: 445
#7

16 Mar 2022, 08:39

Would it be possible to save the analytical result into an indicator variable that codes the factor to which an observation is assigned?
I assume that an option would be required to set any number of such indicator variables following the applied method (e.g. f_er f_gr f_gol f_ed f_pc1 f_pc2 f_pc3 f_ic1 f_ic2 f_ic3 ).

http://publicationslist.org/eric.melse
JanDitzen

Join Date: Jan 2015

Posts: 354
#8

16 Mar 2022, 08:43

Jared Greathouse: yes to both questions.
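A rough sketch of the scree plot cross-check mentioned in the question, using only official Stata commands. It treats each country's series as one variable, which is an assumption about how to map the panel into pca; pca works on the correlation matrix by default, so its eigenvalues need not match what xtnumfac uses internally. With more series (151) than time periods (34) the correlation matrix is singular, so treat the result as an eyeball check only.

Code:

webuse pennxrate, clear
keep id year realxrate
* one column per country, one row per year
reshape wide realxrate, i(year) j(id)
* retain the first 8 components, matching the 8 factors maximally considered above
pca realxrate*, components(8)
* plot the eigenvalues and look for the elbow
screeplot

The elbow is a visual judgement, whereas the estimators reported by xtnumfac each apply a formal rule to the eigenvalues.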

Sebastian Kripfganz: Do you mean in a time series context? The community-contributed baing command implements the Bai and Ng criteria, where each variable is a time series of either a variable or a cross-section. In a panel setting, I would need to think about it.

Raymond Zhang: Not at the moment. We know that the lack of support for unbalanced panels is a downside of the command. The main difficulty is calculating the eigenvalues for an unbalanced panel. Some methods use, for example, an expectation-maximisation algorithm (I think Susan Athey has some work on it), but this is beyond the scope of this command.
JanDitzen

Join Date: Jan 2015

Posts: 354
#9

16 Mar 2022, 08:48

Originally posted by ericmelse

Would it be possible to save the analytical result into an indicator variable that codes the factor to which an observation is assigned?
I assume that an option would be required to set any number of such indicator variables following the applied method (e.g. f_er f_gr f_gol f_ed f_pc1 f_pc2 f_pc3 f_ic1 f_ic2 f_ic3 ).

We are not associating a factor with a certain variable or observation. The main scope is to find out how many factors there are. The extraction of the factors would then be the next step.
Sebastian Kripfganz

Join Date: May 2014

Posts: 2624
#10

16 Mar 2022, 08:55

JanDitzen
Maybe I am misunderstanding something. In our xtivdfreg command, we extract factors as the eigenvectors corresponding to the largest eigenvalues of the matrix \( \sum_i \mathbf{X}_i \mathbf{X}_i' / NT \), where \( \mathbf{X}_i \) is a TxK matrix of K variables. In other words, multiple variables can be driven by the same factors; see equation (2) in our recent Stata Journal article.
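A minimal Mata sketch of that eigen-decomposition, for illustration only. It is not the xtivdfreg source code: the choice of the two pennxrate variables, the number of factors r = 2, and the sqrt(T) normalisation of the eigenvectors are all assumptions, and the sketch presumes a balanced panel without missing values.

Code:

webuse pennxrate, clear
sort id year
mata:
// stacked NT x K data matrix, here with K = 2 variables
X    = st_data(., ("lnrxrate", "realxrate"))
// one row per panel unit: first and last observation of that unit
info = panelsetup(st_data(., "id"), 1)
N    = rows(info)
T    = info[1, 2] - info[1, 1] + 1
// accumulate sum_i X_i X_i' / NT, a T x T symmetric matrix
S    = J(T, T, 0)
for (i = 1; i <= N; i++) {
    Xi = panelsubmatrix(X, i, info)
    S  = S + Xi * Xi'
}
S = S / (N * T)
// eigenvalues in L are returned from largest to smallest
symeigensystem(S, V = ., L = .)
r = 2
// assumed normalisation: factors are sqrt(T) times the leading eigenvectors
F = sqrt(T) * V[., (1..r)]
L[(1..5)]
end

Because every X_i contributes to the same T x T matrix, the extracted columns of F are common to all K variables, which is the point being made about joint extraction.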

Our command also accommodates unbalanced panels by using an expectation-maximization approach; see section 2.3 of our article.

https://www.kripfganz.de/stata/
1 like
JanDitzen

Join Date: Jan 2015

Posts: 354
#11

16 Mar 2022, 09:11

Originally posted by Sebastian Kripfganz

JanDitzen
Maybe I am misunderstanding something. In our xtivdfreg command, we extract factors as the eigenvectors corresponding to the largest eigenvalues of the matrix \( \sum_i \mathbf{X}_i \mathbf{X}_i' / NT \), where \( \mathbf{X}_i \) is a TxK matrix of K variables. In other words, multiple variables can be driven by the same factors; see equation (2) in our recent Stata Journal article.

Our command also accommodates unbalanced panels by using an expectation-maximization approach; see section 2.3 of our article.

Now I understand. Both would be interesting and useful extensions. xtivdfreg only calculates the Bai and Ng criteria though.
Sebastian Kripfganz

Join Date: May 2014

Posts: 2624
#12

16 Mar 2022, 09:25

Originally posted by JanDitzen

xtivdfreg only calculates the Bai and Ng criteria though.

Actually, no. xtivdfreg uses the Ahn-Horenstein method. But this is exactly the reason why your command could be so useful. A user could obtain the number of factors with any of the methods provided by your command, and then simply supply this number to our command.

https://www.kripfganz.de/stata/
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#13

16 Mar 2022, 09:29

JanDitzen: Okay, well, you've no idea how much this would help then. I'd been writing a synthetic control estimator that uses principal components regression, and I had to find a way to choose the number of components beyond "ehh, this looks about right", so I'll give this a try.
1 like
JanDitzen

Join Date: Jan 2015

Posts: 354
#14

13 Jul 2022, 13:00

Thanks to Kit Baum, a new version of xtnumfac is available on SSC. You can install it by typing ssc install xtnumfac

The update extends xtnumfac with support for:
unbalanced panels, via an expectation-maximisation algorithm

time series

varlists, i.e. estimation of the number of common factors across multiple variables

xtnumfac is joint work with Simon Reese.
2 likes