
  • New on SSC: xtnumfac - estimation of the number of factors in panel data

    Thanks to Kit Baum, a new package is available: xtnumfac. You can install it by typing ssc install xtnumfac

    xtnumfac is joint work with Simon Reese.

    xtnumfac estimates the number of common factors in panel datasets using the methods of Bai and Ng (2002), Ahn and Horenstein (2013), Onatski (2010) and Gagliardini et al. (2019). The methods in Bai and Ng (2002) are based on six information criteria, while Ahn and Horenstein (2013) propose two estimation methods. Onatski (2010) and Gagliardini et al. (2019) contribute a single estimator each. In total, 10 different methods are displayed.
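
    For readers who want the estimators spelled out, the block below sketches the first Bai and Ng criterion of each family and the two Ahn and Horenstein ratios as they are usually stated in the cited papers. The notation (\( V(k) \), \( \hat\sigma^2 \), \( C_{NT} \), \( \tilde\mu_k \), \( V^*(k) \), \( m \)) is an assumption made for illustration and is not taken from the xtnumfac help file; the command's internals may differ in detail.

    ```latex
    % Factor model with k common factors and its average squared residual
    x_{it} = \lambda_i' f_t + e_{it}, \qquad
    V(k) = \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \hat e_{it}(k)^2, \qquad
    C_{NT}^2 = \min(N, T)

    % Bai and Ng (2002): choose k in 0..kmax that minimises the criterion
    PC_{p1}(k) = V(k) + k\,\hat\sigma^2\,\frac{N+T}{NT}\,\ln\!\left(\frac{NT}{N+T}\right), \qquad
    IC_{p1}(k) = \ln V(k) + k\,\frac{N+T}{NT}\,\ln\!\left(\frac{NT}{N+T}\right)

    % Ahn and Horenstein (2013): choose k in 1..kmax that maximises the ratio
    ER(k) = \frac{\tilde\mu_k}{\tilde\mu_{k+1}}, \qquad
    GR(k) = \frac{\ln\left[ V^*(k-1) / V^*(k) \right]}{\ln\left[ V^*(k) / V^*(k+1) \right]}, \qquad
    V^*(k) = \sum_{j=k+1}^{m} \tilde\mu_j
    ```

    Here \( \hat\sigma^2 = V(k_{max}) \), \( \tilde\mu_k \) is the k-th largest eigenvalue of the sample covariance matrix of the data scaled by NT, and \( m = \min(N, T) \). The p2 and p3 variants keep the same structure and replace the penalty term by \( \frac{N+T}{NT} \ln C_{NT}^2 \) and \( \ln C_{NT}^2 / C_{NT}^2 \) respectively.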

    The program requires the data to be xtset.
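
    A minimal sketch of that step, assuming hypothetical identifier names id (panel) and year (time); substitute the variables from your own data.

    ```stata
    * Declare the panel structure; id and year are assumed variable names.
    xtset id year

    * Inspect the participation pattern to see whether the panel is balanced.
    xtdescribe
    ```

    xtset also reports whether the panel is strongly balanced, which matters because, as discussed later in this thread, the first release did not support unbalanced panels.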

    Syntax
    The syntax is simple:

    Code:
    xtnumfac [varname] [if] [in] [, options]
    Options set the maximum number of factors to consider, the level of detail displayed, and the form of standardisation.

    Example
    We can use Stata's own pennxrate dataset and estimate the number of common factors in the real exchange rate:

    Code:
    webuse pennxrate
    
    xtnumfac realxrate
    
    Estimated number of common factors in realxrate
                                       N  =       151
                                       T  =        34
    -------------------------------------------------
     IC       | # factors   |  IC      | # factors
    ----------+-------------+----------+-------------
     PC_{p1}  |     8       |  IC_{p1} |     8
     PC_{p2}  |     8       |  IC_{p2} |     8
     PC_{p3}  |     8       |  IC_{p3} |     8
     ER       |     1       |  GR      |     2
     GOL      |     1       |  ED      |     3
    -------------------------------------------------
    8 factors maximally considered.
    PC_{p1},...,IC_{p3} from Bai and Ng (2002)
    ER, GR from Ahn and Horenstein (2013)
    ED from Onatski (2010)
    GOL from Gagliardini, Ossola, Scaillet (2019)


    Last edited by JanDitzen; 16 Mar 2022, 05:52.

  • #2
    Does this have any connection with factor analysis or principal components analysis?



    • #3
      Yes, establishing the number of common factors in panel data is necessary when doing PCA.



      • #4
        Okay, and just so I understand: the estimated number of common factors is the number of principal components we'd choose if we were, say, doing principal components regression?


        That is, it gives us the number that we could check with the screeplot and use the elbow method?
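
        A rough way to look at that connection is sketched below. It reshapes the example data so that each panel unit becomes a column, runs a principal components analysis on those columns, and draws a scree plot. It assumes that id and year are the panel and time identifiers of the pennxrate data, and it is only a visual check under those assumptions, not what xtnumfac computes internally.

        ```stata
        webuse pennxrate, clear
        * Keep only the identifiers and the series of interest.
        * id and year are assumed to be the panel and time variables.
        keep id year realxrate
        * One column per panel unit, one row per year.
        reshape wide realxrate, i(year) j(id)
        * Principal components of the unit-level series (correlation matrix by default).
        pca realxrate*
        * Plot the ten largest eigenvalues to look for an elbow.
        screeplot, neigen(10)
        ```

        The elbow in the plot can then be compared with the numbers that xtnumfac reports for the same variable.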



        • #5
          Would it be possible to extend the command to allow factor extraction jointly from multiple variables, instead of just from a single variable?
          https://www.kripfganz.de/stata/



          • #6
            Would it be possible to extend the command to allow unbalanced panel data?
            Best regards.

            Raymond Zhang
            Stata 17.0,MP



            • #7
              Would it be possible to save the analytical result into an indicator variable that codes the factor to which an observation is assigned?
              I assume that an option would be required to set any number of such indicator variables following the applied method (e.g. f_er f_gr f_gol f_ed f_pc1 f_pc2 f_pc3 f_ic1 f_ic2 f_ic3 ).
              http://publicationslist.org/eric.melse



              • #8
                Jared Greathouse: yes to both questions.

                Sebastian Kripfganz Do you mean in a time series context? The community-contributed baing command implements the Bai and Ng criteria, where each variable is a time series of either a variable or a cross-section. In a panel setting, I would need to think about it.

                Raymond Zhang Not at the moment. We know that not supporting unbalanced panels is a downside of the command. The main difficulty is calculating the eigenvalues of an unbalanced panel. There are methods that use, for example, an expectation-maximisation algorithm (I think Susan Athey has some work on it), but this is beyond the scope of this command.



                • #9
                  Originally posted by ericmelse
                  Would it be possible to save the analytical result into an indicator variable that codes the factor to which an observation is assigned?
                  I assume that an option would be required to set any number of such indicator variables following the applied method (e.g. f_er f_gr f_gol f_ed f_pc1 f_pc2 f_pc3 f_ic1 f_ic2 f_ic3 ).
                  We are not associating a factor with a certain variable or observation. The main scope is to find out how many factors there are. The extraction of the factors would then be the next step.



                  • #10
                    JanDitzen
                    Maybe I am misunderstanding something. In our xtivdfreg command, we extract factors as the eigenvectors corresponding to the largest eigenvalues of the matrix \( \sum_i \mathbf{X}_i \mathbf{X}_i' / NT \), where \( \mathbf{X}_i \) is a TxK matrix of K variables. In other words, multiple variables can be driven by the same factors; see equation (2) in our recent Stata Journal article.

                    Our command also accommodates unbalanced panels by using an expectation-maximization approach; see section 2.3 of our article.
                    https://www.kripfganz.de/stata/



                    • #11
                      Originally posted by Sebastian Kripfganz
                      JanDitzen
                      Maybe I am misunderstanding something. In our xtivdfreg command, we extract factors as the eigenvectors corresponding to the largest eigenvalues of the matrix \( \sum_i \mathbf{X}_i \mathbf{X}_i' / NT \), where \( \mathbf{X}_i \) is a TxK matrix of K variables. In other words, multiple variables can be driven by the same factors; see equation (2) in our recent Stata Journal article.

                      Our command also accommodates unbalanced panels by using an expectation-maximization approach; see section 2.3 of our article.
                      Now I understand. Both would be an interesting and useful extension. xtivdfreg only calculates the Bai and Ng criteria though.



                      • #12
                        Originally posted by JanDitzen
                        xtivdfreg only calculates the Bai and Ng criteria though.
                        Actually, no. xtivdfreg uses the Ahn-Horenstein method. But this is exactly the reason why your command could be so useful. A user could obtain the number of factors with any of the methods provided by your command, and then simply supply this number to our command.
                        https://www.kripfganz.de/stata/



                        • #13
                          JanDitzen Okay well you've no idea how much this would help then. I'd been writing a synthetic control estimator that uses principal components regression and I had to find a way to choose the number of components to use beyond "ehh this looks about right", so I'll give this a try.



                          • #14
                            Thanks to Kit Baum, a new version of xtnumfac is available on SSC. You can install it by typing ssc install xtnumfac

                            The update extends xtnumfac with support for:
                            • unbalanced panels, via an expectation-maximisation algorithm
                            • time series data
                            • varlists, i.e. estimating the number of common factors across multiple variables
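
                            As a sketch of the varlist feature, the call below passes two variables at once. It assumes that lnrxrate is available in the pennxrate data alongside realxrate and that the updated version is installed; how the variables are combined and reported is best checked in the help file.

                            ```stata
                            webuse pennxrate, clear
                            * Several variables in one call; lnrxrate is assumed to be in the dataset.
                            xtnumfac realxrate lnrxrate
                            ```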

                            xtnumfac is joint work with Simon Reese.
