
  • XTIVDFREG: new Stata command for instrumental variable estimation of large panel data models with common factors

    Together with Vasilis Sarafidis, I have released a new Stata package called xtivdfreg. The command implements a general instrumental variables approach for estimating large panel data models (large N and large T) with unobserved common factors or interactive effects, as developed by Norkute et al. (2020). The underlying idea of this approach is to project out the common factors from exogenous covariates using principal components analysis, and run IV regression using defactored covariates as instruments. The resulting "IVDF" method is valid for models with homogeneous or heterogeneous slope coefficients, and has several advantages relative to existing popular approaches (e.g. common correlated effects estimation). The algorithm accommodates unbalanced panel data and permits highly flexible instrumentation strategies.
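The defactoring idea described above can be illustrated with a small pure-Python sketch. It is not the xtivdfreg implementation and makes simplifying assumptions: a single exogenous covariate stored as a T x N array (rows are time periods, columns are panel units), exactly one common factor, and power iteration instead of a full eigendecomposition. The function names top_factor and defactor are hypothetical.

```python
import math
import random

# Illustrative sketch only (assumes one factor, T x N layout); not the xtivdfreg code.


def top_factor(X, iters=200):
    """Estimate one common factor (length T) by principal components.

    X is a T x N list of lists (rows = time periods, columns = units).
    The factor is the leading eigenvector of X X' / (N T), found by power
    iteration and scaled so that f'f / T = 1.
    """
    T, N = len(X), len(X[0])
    S = [[sum(X[s][i] * X[t][i] for i in range(N)) / (N * T) for t in range(T)]
         for s in range(T)]
    v = [1.0] * T
    for _ in range(iters):
        w = [sum(S[s][t] * v[t] for t in range(T)) for s in range(T)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [math.sqrt(T) * x for x in v]


def defactor(X, f):
    """Project the factor out of each unit's series: M_F x_i = x_i - f (f'f)^-1 f' x_i."""
    T, N = len(X), len(X[0])
    ff = sum(x * x for x in f)
    Z = [row[:] for row in X]
    for i in range(N):
        coef = sum(f[t] * X[t][i] for t in range(T)) / ff
        for t in range(T):
            Z[t][i] -= coef * f[t]
    return Z


# Hypothetical simulated covariate with one factor: x_it = lambda_i * g_t + e_it
random.seed(42)
T, N = 20, 50
g = [random.gauss(0, 1) for _ in range(T)]
lam = [random.gauss(1, 0.5) for _ in range(N)]
X = [[lam[i] * g[t] + random.gauss(0, 0.3) for i in range(N)] for t in range(T)]

f_hat = top_factor(X)
Z = defactor(X, f_hat)  # defactored covariate, usable as an instrument
```

In the IVDF approach, defactored covariates such as Z (and their lags) then serve as instruments. The sketch fixes the number of factors at one, whereas the spxtivdfreg output later in this thread reports two factors in X.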

    You can install the command from my personal website:
    Code:
    net install xtivdfreg, from(http://www.kripfganz.de/stata/)
    The syntax and options are explained in the Stata help file:
    Code:
    help xtivdfreg
    The help file also contains a few examples.

    For full details, see our accompanying article.
    https://twitter.com/Kripfganz

  • #2
    Due to an issue with the way Stata handles interaction terms since Stata 15 (see https://www.statalist.org/forums/for...94#post1576494), using interaction terms with xtivdfreg could result in unexpected error messages.

    A workaround is now implemented in the latest version 1.0.1.
    Code:
    adoupdate xtivdfreg, update


    • #3
      With thanks to Kit Baum, the latest version 1.0.3 of the xtivdfreg command is now also available on SSC (in addition to my personal website):
      Code:
      ssc install xtivdfreg
      Compared with earlier versions, this version has the new suboption fvar() for option iv(), which makes it possible to extract factors from only a subset of the specified instrumental variables. This could, for instance, be useful if variables x and x2 are used as regressors/instruments, but the squared term x2 should not be used for the factor extraction. Please see the help file for details.
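A rough sketch of the idea, in our own notation and assuming that the factors estimated from the fvar() variables are then used to defactor all variables listed in iv() (please check the help file for the exact behaviour). Here X stacks the series of x for all N units, x_i^2 denotes the squared term (the variable x2), and M is the usual projection that removes the estimated factors:

```latex
\begin{aligned}
\hat F_x &= \text{principal-component estimate of the factors in } X = (x_1,\dots,x_N) \quad (T \times N)\\
M_{\hat F_x} &= I_T - \hat F_x\,(\hat F_x'\hat F_x)^{-1}\hat F_x'\\
Z_i &= M_{\hat F_x}\,\bigl[\,x_i,\; x_i^{2}\,\bigr] \qquad \text{(defactored instruments for unit } i\text{)}
\end{aligned}
```

Without fvar(), the factor estimate would instead be based on both x and x2.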

      Our accompanying article was accepted for publication in the Stata Journal.


      • #4
        First, thank you all for continuously making applied work accessible and easily applicable.
        I do have a few questions about the xtivdfreg command (sorry for the lengthy post):
        1. I noticed that the "mg" option does not report the J-test, and the manual does not say why. How can we then tell whether the J-test would hold after accounting for slope heterogeneity, if it did not hold in the model with homogeneous slopes?
        2. I noticed that xtdcce2 has the option "full", which reports the individual estimates for the panel units. Does xtivdfreg have this feature?
        3. I just want to make sure that xtivdfreg accommodates both stationary and nonstationary variables. For example, if one uses nonstationary variables in a bivariate regression, can the coefficient be interpreted as a long-run parameter?
        4. In a footnote of your article, you discuss that if no valid external instrument exists for an endogenous variable, one can include other informative exogenous variables on the right-hand side so that they and their lags can serve as valid instruments. If this is done, can the parameter of interest in the regression be interpreted as causal?
        5. Lastly (I might have missed this), if a right-hand-side variable (say X) is endogenous rather than exogenous, does it simply enter the regression as "xtivdfreg Y X, [options]", or is there a special way of specifying it?

        Thank you and I would really appreciate your response on these.

        -John



        • #5
          1. The J-test is not valid for the model with heterogeneous coefficients and therefore not reported. We briefly mention this in Section 2.2 of our article.
          2. There is no such option for xtivdfreg to show the individual coefficients from the MG estimation. I need to think about whether this would be a useful addition.
          3. Since the regressors are assumed to have a factor structure, and the factors are assumed to be stationary, I would conclude that the approach does not allow for nonstationary variables.
          4. There is nothing special about the interpretation of the coefficients. If the model is correctly specified and you have valid instruments, then you can interpret the coefficients as causal in the usual way.
          5. You still specify an endogenous regressor in the list of right-hand-side variables, but you then need to specify appropriate instruments with the iv() option.
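As background to point 2: a mean-group (MG) estimator averages unit-specific slope estimates, so individual coefficients are computed along the way even if they are not displayed. Below is a minimal sketch of that averaging step, assuming the unit-specific estimates are already available and using the common nonparametric MG variance. The function name mean_group and the input values are hypothetical, and this is not necessarily the variance estimator that xtivdfreg reports.

```python
import math


def mean_group(betas):
    """Mean-group estimate and standard error from N unit-specific slopes.

    Uses the nonparametric variance sum_i (b_i - b_mg)^2 / (N (N - 1)),
    an assumption for illustration, not taken from the xtivdfreg code.
    """
    n = len(betas)
    b_mg = sum(betas) / n
    var = sum((b - b_mg) ** 2 for b in betas) / (n * (n - 1))
    return b_mg, math.sqrt(var)


# Hypothetical unit-specific slope estimates for four panel units
b, se = mean_group([0.8, 1.1, 0.9, 1.2])
```

Here b is 1.0, and se is the square root of 0.10 / 12.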


          • #6
            Thank you so much for the prompt response. I really do appreciate it.



            • #7
              Hi Sebastian, I was wondering whether it is possible to retrieve the residuals (which are free from the unobserved factors) after an xtivdfreg regression. I want to test whether the residuals from my model estimated with xtivdfreg are cross-sectionally independent, compared with those from other estimators. Thanks for the help.



              • #8
                Sorry for all the questions, but I have an additional one. If X is endogenous and one has an external instrument (Z), can one specify the iv() option to include both the external variable Z and the lags of the endogenous variable X (i.e., xtivdfreg Y X, iv(X Z, lags(2)) factmax(N)), or should X not be included in iv(), only Z? I ask this because my takeaway from reading the paper is that when some of the regressors are endogenous with respect to epsilon_it, extracting the principal components from those endogenous regressors can be invalid. I understand that the lags of the endogenous variable, by construction, are not endogenous with respect to epsilon_it; hence, they can be included in iv(). However, I just wanted to clarify whether I am thinking about the treatment of the endogenous regressor case correctly. Thanks again!



                • #9
                  Originally posted by John Francois
                  I was wondering whether it is possible to retrieve the residuals (which are free from the unobserved factors) after an xtivdfreg regression. I want to test whether the residuals from my model estimated with xtivdfreg are cross-sectionally independent, compared with those from other estimators.
                  I am afraid this is not (currently) possible.

                  Originally posted by John Francois
                  If X is endogenous and one has an external instrument (Z), can one specify the iv() option to include both the external variable Z and the lags of the endogenous variable X (i.e., xtivdfreg Y X, iv(X Z, lags(2)) factmax(N)), or should X not be included in iv(), only Z? I ask this because my takeaway from reading the paper is that when some of the regressors are endogenous with respect to epsilon_it, extracting the principal components from those endogenous regressors can be invalid. I understand that the lags of the endogenous variable, by construction, are not endogenous with respect to epsilon_it; hence, they can be included in iv(). However, I just wanted to clarify whether I am thinking about the treatment of the endogenous regressor case correctly.
                  Factor extraction is only valid from strictly exogenous variables, i.e. they must be uncorrelated with any future, current, and past errors. The lag of an endogenous variable X does not satisfy this condition and therefore should not be included in iv(). You would need to find a valid external instrument Z.
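The argument can be written out in our own notation (x the regressor, w its first lag, ε the idiosyncratic error). Strict exogeneity rules out correlation with errors at any lead or lag. If x is correlated with the contemporaneous error, its lag dated t+1 equals x dated t and is therefore correlated with the error of period t, which is a past error from the viewpoint of period t+1:

```latex
\begin{aligned}
&\text{Strict exogeneity of } w:\quad E(w_{is}\,\varepsilon_{it}) = 0 \quad \text{for all } s,\ t\\
&\text{Endogenous } x:\quad E(x_{it}\,\varepsilon_{it}) \neq 0\\
&\text{Lag } w_{it} = x_{i,t-1}:\quad E(w_{i,t+1}\,\varepsilon_{it}) = E(x_{it}\,\varepsilon_{it}) \neq 0
\end{aligned}
```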


                  • #10
                    Thanks again, Sebastian. I appreciate all the help.



                    • #11
                      An update to xtivdfreg is available from my website:
                      Code:
                      net install xtivdfreg, from(http://www.kripfganz.de/stata/) replace
                      The new version 1.1.0 has substantial speed improvements and now also works with larger data sets.


                      • #12
                        Update announcement: The new xtivdfreg version 1.3.1, available on SSC and my personal website, comes with a significant extension. It can now fit spatial panel data models with spatial lags of the dependent and independent variables. For this purpose, the package contains the new spxtivdfreg command, which parses the additional spatial features but is otherwise a wrapper for xtivdfreg.

                        To illustrate the new features, I am using the example data set from the community-contributed xsmle command. spxtivdfreg can use spatial weights matrices that are available as an spmatrix object, as a Mata matrix, as a Stata matrix, or as a text/Excel file. Here, we are obtaining a Mata matrix from the spmat object (part of the community-contributed sppack package, not to be confused with the official spmatrix), which we then use with spxtivdfreg:
                        Code:
                        . use http://www.econometrics.it/stata/data/xsmle/product.dta, clear
                        . gen lngsp = ln(gsp)
                        . gen lnpcap = ln(pcap)
                        . gen lnpc = ln(pc)
                        . gen lnemp = ln(emp)
                        
                        . spmat use usaww using http://www.econometrics.it/stata/data/xsmle/usaww.spmat
                        . spmat getmatrix usaww W
                        
                        . spxtivdfreg lngsp lnpcap lnpc lnemp, spmatrix(W, mata) splag tlags(1) sptlags(1) iv(lnpcap lnpc lnemp, splags lags(2)) absorb(state) std
                        
                        
                        Defactored instrumental variables estimation
                        
                        Group variable: state                        Number of obs         =       720
                        Time variable: year                          Number of groups      =        48
                        
                        Number of instruments  =     18              Obs per group     min =        15
                        Number of factors in X =      2                                avg =        15
                        Number of factors in u =      1                                max =        15
                        
                        Second-stage estimator (model with homogeneous slope coefficients)
                        ------------------------------------------------------------------------------
                                     |               Robust
                               lngsp | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
                        -------------+----------------------------------------------------------------
                               lngsp |
                                 L1. |   .2532778   .0543828     4.66   0.000     .1466894    .3598661
                                     |
                              lnpcap |  -.2299721   .0442759    -5.19   0.000    -.3167512   -.1431929
                                lnpc |  -.0197032   .0295865    -0.67   0.505    -.0776916    .0382853
                               lnemp |   .7265171   .0615828    11.80   0.000     .6058169    .8472172
                               _cons |   3.687231   .4145171     8.90   0.000     2.874793     4.49967
                        -------------+----------------------------------------------------------------
                        W            |
                               lngsp |
                                 --. |    .502397   .0430744    11.66   0.000     .4179727    .5868213
                                 L1. |  -.3575409   .0523114    -6.83   0.000    -.4600694   -.2550125
                        -------------+----------------------------------------------------------------
                             sigma_f |  .02458298   (std. dev. of factor error component)
                             sigma_e |  .01571357   (std. dev. of idiosyncratic error component)
                                 rho |  .70993317   (fraction of variance due to factors)
                        ------------------------------------------------------------------------------
                        Hansen test of the overidentifying restrictions        chi2(12)    =   20.6457
                        H0: overidentifying restrictions are valid             Prob > chi2 =    0.0558
                        We specified the spatial weights matrix with the spmatrix() option. The options splag, tlags(1), and sptlags(1) then tell the command to estimate a model with a spatial lag, a time lag, and a spatial time lag of the dependent variable, respectively. This yields a time-space dynamic panel data model with autoregressive components in both the spatial and the time dimension. As with the non-spatial xtivdfreg, instruments are specified with the iv() option. Here, the suboption splags requests that spatial lags of those instruments also be used as additional instruments. absorb() eliminates fixed effects in the usual way. Finally, std is a new xtivdfreg option that requests a standardization of the instruments in the factor-extraction process; this can help stabilize the estimation. The regression output contains a new section labelled "W" for the spatially lagged variables. Here, both time dynamics and spatial dynamics appear to be relevant.
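In equation form, the model estimated above can be sketched as follows. The notation is our own, not that of the help file or the underlying article. Here w_ij are the elements of the spatial weights matrix W, ψ, φ, and λ are the coefficients on the spatial lag, time lag, and spatial time lag of y, α_i are the fixed effects removed by absorb(), and γ_i'f_t is the unobserved factor component:

```latex
y_{it} = \psi \sum_{j=1}^{N} w_{ij}\, y_{jt} + \phi\, y_{i,t-1} + \lambda \sum_{j=1}^{N} w_{ij}\, y_{j,t-1} + x_{it}'\beta + \alpha_i + \gamma_i' f_t + \varepsilon_{it}
```

In the output above, φ corresponds to L1.lngsp (.2532778), ψ to the "--." row under W (.502397), λ to the "L1." row under W (-.3575409), and β to the coefficients on lnpcap, lnpc, and lnemp.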

                        As a postestimation feature, we can compute short-run and long-run direct, indirect, and total impacts, which is standard in the spatial econometrics literature. It is imperative to specify time lags of the dependent variable with the tlags() option of spxtivdfreg; otherwise, the long-run impacts will be calculated incorrectly.
                        Code:
                        . estat impact, sr
                        
                        Short-run impacts
                        ------------------------------------------------------------------------------
                                     |            Delta-method
                                     |     Impact   std. err.      z    P>|z|     [95% conf. interval]
                        -------------+----------------------------------------------------------------
                        direct       |
                              lnpcap |  -.2482317   .0473038    -5.25   0.000    -.3409456   -.1555179
                                lnpc |  -.0212676   .0318672    -0.67   0.505    -.0837261    .0411909
                               lnemp |   .7842023   .0563274    13.92   0.000     .6738026     .894602
                        -------------+----------------------------------------------------------------
                        indirect     |
                              lnpcap |   -.213928   .0500893    -4.27   0.000    -.3121012   -.1157547
                                lnpc |  -.0183286   .0270505    -0.68   0.498    -.0713466    .0346895
                               lnemp |   .6758313   .0723162     9.35   0.000     .5340942    .8175685
                        -------------+----------------------------------------------------------------
                        total        |
                              lnpcap |  -.4621597    .091373    -5.06   0.000    -.6412474    -.283072
                                lnpc |  -.0395961   .0588423    -0.67   0.501    -.1549248    .0757326
                               lnemp |   1.460034   .0696914    20.95   0.000     1.323441    1.596626
                        ------------------------------------------------------------------------------
                        
                        . estat impact, lr
                        
                        Long-run impacts
                        ------------------------------------------------------------------------------
                                     |            Delta-method
                                     |     Impact   std. err.      z    P>|z|     [95% conf. interval]
                        -------------+----------------------------------------------------------------
                        direct       |
                              lnpcap |   -.310883   .0582214    -5.34   0.000    -.4249949   -.1967711
                                lnpc |  -.0266353   .0404538    -0.66   0.510    -.1059233    .0526527
                               lnemp |   .9821273   .0515769    19.04   0.000     .8810383    1.083216
                        -------------+----------------------------------------------------------------
                        indirect     |
                              lnpcap |  -.0712153   .0268197    -2.66   0.008     -.123781   -.0186496
                                lnpc |  -.0061015   .0094865    -0.64   0.520    -.0246946    .0124917
                               lnemp |     .22498   .0590485     3.81   0.000      .109247    .3407131
                        -------------+----------------------------------------------------------------
                        total        |
                              lnpcap |  -.3820983   .0770055    -4.96   0.000    -.5330264   -.2311702
                                lnpc |  -.0327368   .0497881    -0.66   0.511    -.1303197    .0648461
                               lnemp |   1.207107   .0325818    37.05   0.000     1.143248    1.270966
                        ------------------------------------------------------------------------------
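To see where the numbers in these tables come from, here is a minimal pure-Python sketch of the textbook impact formulas for a time-space dynamic model. It writes ψ (psi) for the spatial-lag coefficient, φ (phi) for the time-lag coefficient, and λ (lam) for the spatial-time-lag coefficient. The short-run impact matrix is (I - ψW)^-1 β, and the long-run impact matrix is ((1 - φ)I - (ψ + λ)W)^-1 β. Direct impacts average the diagonal, total impacts average the row sums, and indirect = total - direct. The 3 x 3 weights matrix is hypothetical, so only the totals are comparable. For any row-standardized W, the totals reduce to β/(1 - ψ) and β/(1 - φ - ψ - λ), and these reproduce the reported totals, e.g. -.2299721/(1 - .502397) ≈ -.46216 for lnpcap. This is not the estat impact code, and it omits the delta-method standard errors. In this sketch, leaving out φ would visibly change the long-run totals, which illustrates why the time lag must be declared.

```python
def mat_inv(A):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def impacts(W, beta, psi, phi=0.0, lam=0.0, long_run=False):
    """Return (direct, indirect, total) average impacts of a regressor with coefficient beta."""
    n = len(W)
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    if long_run:
        # Steady state: ((1 - phi) I - (psi + lam) W) y = x beta
        A = [[(1.0 - phi) * eye[i][j] - (psi + lam) * W[i][j] for j in range(n)]
             for i in range(n)]
    else:
        # Same period: (I - psi W) y = x beta
        A = [[eye[i][j] - psi * W[i][j] for j in range(n)] for i in range(n)]
    S = [[beta * v for v in row] for row in mat_inv(A)]
    direct = sum(S[i][i] for i in range(n)) / n
    total = sum(sum(row) for row in S) / n
    return direct, total - direct, total


# Hypothetical row-standardized weights for three units (not the usaww matrix)
W = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
# Coefficients reported by spxtivdfreg above
psi, phi, lam = 0.502397, 0.2532778, -0.3575409
b_lnpcap = -0.2299721

sr = impacts(W, b_lnpcap, psi)
lr = impacts(W, b_lnpcap, psi, phi, lam, long_run=True)
```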
                        A slight complication arises in the computation of long-run effects when there are distributed lags of the regressors, because the command treats them as separate variables, while their long-run effects should be added up for the respective variable. To circumvent this problem, the post option of estat impact posts the impact effects in e(b) and e(V), as if they were the actual estimation results. We can then use standard postestimation commands, in particular lincom (and test for hypothesis tests). Here is a simplified example with a distributed lag of the variable lnemp:
                        Code:
                        . spxtivdfreg lngsp lnpcap lnpc L(0/1).lnemp, spmatrix(W, mata) splag iv(lnpcap lnpc lnemp, splags lags(2)) tlags(1) sptlags(1) std absorb(state)
                        (output omitted)
                        
                        . estat impact, lr post
                        
                        Long-run impacts
                        ------------------------------------------------------------------------------
                                     |            Delta-method
                               lngsp |     Impact   std. err.      z    P>|z|     [95% conf. interval]
                        -------------+----------------------------------------------------------------
                        direct       |
                              lnpcap |  -.0045801   .0629467    -0.07   0.942    -.1279534    .1187932
                                lnpc |   .0158823   .0526601     0.30   0.763    -.0873296    .1190943
                                     |
                               lnemp |
                                 --. |   2.201223   .2833431     7.77   0.000     1.645881    2.756566
                                 L1. |  -1.239101   .2609633    -4.75   0.000     -1.75058   -.7276224
                        -------------+----------------------------------------------------------------
                        indirect     |
                              lnpcap |   -.000232   .0033171    -0.07   0.944    -.0067335    .0062694
                                lnpc |   .0008047   .0027482     0.29   0.770    -.0045817     .006191
                                     |
                               lnemp |
                                 --. |   .1115222   .1617033     0.69   0.490    -.2054104    .4284548
                                 L1. |  -.0627775   .0899942    -0.70   0.485     -.239163     .113608
                        -------------+----------------------------------------------------------------
                        total        |
                              lnpcap |  -.0048121   .0662475    -0.07   0.942    -.1346549    .1250306
                                lnpc |    .016687   .0551394     0.30   0.762    -.0913843    .1247583
                                     |
                               lnemp |
                                 --. |   2.312745   .2306537    10.03   0.000     1.860672    2.764818
                                 L1. |  -1.301879   .2404403    -5.41   0.000    -1.773133   -.8306243
                        ------------------------------------------------------------------------------
                        
                        . lincom [direct]lnemp + [direct]L.lnemp
                        
                         ( 1)  [direct]lnemp + [direct]L.lnemp = 0
                        
                        ------------------------------------------------------------------------------
                               lngsp | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                        -------------+----------------------------------------------------------------
                                 (1) |   .9621221    .385208     2.50   0.013     .2071284    1.717116
                        ------------------------------------------------------------------------------
                        
                        . lincom [indirect]lnemp + [indirect]L.lnemp
                        
                         ( 1)  [indirect]lnemp + [indirect]L.lnemp = 0
                        
                        ------------------------------------------------------------------------------
                               lngsp | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                        -------------+----------------------------------------------------------------
                                 (1) |   .0487447   .1850592     0.26   0.792    -.3139647    .4114541
                        ------------------------------------------------------------------------------
                        
                        . lincom [total]lnemp + [total]L.lnemp
                        
                         ( 1)  [total]lnemp + [total]L.lnemp = 0
                        
                        ------------------------------------------------------------------------------
                               lngsp | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                        -------------+----------------------------------------------------------------
                                 (1) |   1.010867   .3331856     3.03   0.002      .357835    1.663899
                        ------------------------------------------------------------------------------
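The lincom calls above compute a linear combination a'b of the posted impacts together with its delta-method standard error sqrt(a'Va). Below is a minimal sketch of that arithmetic. The function name lin_comb is hypothetical. The covariance between the two impacts is not displayed in the output, so it is set to zero here purely for illustration. In practice, substitute the actual entry of e(V), for example as shown by matrix list e(V).

```python
import math


def lin_comb(a, b, V, z_crit=1.959963984540054):
    """Linear combination a'b with delta-method std. error sqrt(a'Va), z statistic, and 95% CI."""
    k = len(a)
    est = sum(a[i] * b[i] for i in range(k))
    var = sum(a[i] * V[i][j] * a[j] for i in range(k) for j in range(k))
    se = math.sqrt(var)
    return est, se, est / se, (est - z_crit * se, est + z_crit * se)


# Direct long-run impacts of lnemp and L.lnemp, copied from the output above
b = [2.201223, -1.239101]
# Diagonal: squared std. errors from the output. Off-diagonal: hypothetical
# zero, because the covariance is not displayed in the output.
V = [[0.2833431 ** 2, 0.0], [0.0, 0.2609633 ** 2]]
est, se, z, ci = lin_comb([1.0, 1.0], b, V)
```

The point estimate, .962122, matches the first lincom result up to rounding. The standard error depends on the covariance that is actually used.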
                        For full details of the new options and features, please see the help files:
                        Code:
                        help spxtivdfreg
                        help spxtivdfreg postestimation
                        help xtivdfreg
                        help xtivdfreg postestimation
                        The estimator implemented in spxtivdfreg is based on the research article by Cui et al. (forthcoming), which extends the defactored instrumental-variables estimator of Norkute et al. (2021) for panel data models with a multifactor error structure to spatial models. The xtivdfreg command is also described in detail in our Stata Journal article (Kripfganz and Sarafidis, 2021).


                        • #13
                          A minor bug-fix update to version 1.3.2 is now available on my personal website:
                          Code:
                          net install xtivdfreg, from(http://www.kripfganz.de/stata/) replace


                          • #14
                            Another minor bug-fix update to version 1.3.3 is now available. It fixes a faulty stability check, introduced in the previous version, for the calculation of long-run impacts after spxtivdfreg.
                            Code:
                            net install xtivdfreg, from(http://www.kripfganz.de/stata/) replace


                            • #15
                              Unfortunately, there was another bug when options std and doubledefact were combined. This has been fixed now in version 1.3.4.