  • XTIVDFREG: new Stata command for instrumental variable estimation of large panel data models with common factors

    Together with Vasilis Sarafidis, I have released a new Stata package called xtivdfreg. The command implements a general instrumental variables approach for estimating large panel data models (large N and large T) with unobserved common factors or interactive effects, as developed by Norkute et al. (2020). The underlying idea of this approach is to project out the common factors from exogenous covariates using principal components analysis, and run IV regression using defactored covariates as instruments. The resulting "IVDF" method is valid for models with homogeneous or heterogeneous slope coefficients, and has several advantages relative to existing popular approaches (e.g. common correlated effects estimation). The algorithm accommodates unbalanced panel data and permits highly flexible instrumentation strategies.

    You can install the command from my personal website:
    Code:
    net install xtivdfreg, from(http://www.kripfganz.de/stata/)
    The syntax and options are explained in the Stata help file:
    Code:
    help xtivdfreg
    The help file also contains a few examples.

    For full details, see our accompanying article: Further reference:

  • Sebastian Kripfganz
    replied
    An update to version 1.5.0 is available on my personal website:
    Code:
    net install xtivdfreg, from(http://www.kripfganz.de/stata/) replace
    Besides some minor bug fixes related to collinear instruments, the postestimation predict command now has 3 new options: yabsorb, xabsorb, and iv.
    • yabsorb and xabsorb create new variables in the data set for the dependent and independent variables net of the absorbed fixed effects.
    • iv creates new variables for the defactored instrumental variables as used in the final estimation stage.
    In addition, xtivdfreg now stores the weighting matrix in e(W), and predict with option iv does the same in r(W) with updated row and column titles reflecting the names of the instruments.

    Together, these new features now allow replication of the xtivdfreg results with ivreg2, and therefore provide access to additional postestimation statistics reported by the latter. Here is an example:
    Code:
    use http://www.kripfganz.de/stata/xtivdfreg_example.dta
    xtivdfreg L(0/1).CAR size ROA liquidity, absorb(id t) iv(size ROA liquidity, lags(2)) factmax(3)
    predict ya, yabsorb
    predict xa*, xabsorb
    predict iv*, iv
    matrix W = r(W)
    matrix S = invsym(W)
    ivreg2 ya (xa* = iv*), nocons smatrix(S)
    Note that the inverse of the weighting matrix needs to be supplied to ivreg2 with the smatrix() option. While ivreg2 also has a wmatrix() option, it apparently ignores the supplied weighting matrix unless a (cluster-)robust VCE or the 2-step GMM estimator is requested as well. (ivreg2 does not know that the supplied weighting matrix is already optimal.)

    Also, it is essential to specify the nocons option with ivreg2. (The constant has already been absorbed.)
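    The reason for inverting the matrix can be sketched with the textbook linear GMM estimator. The notation here is an assumption for illustration and is not taken from the help files: y stands for ya, X for the xa* variables, Z for the iv* variables, W for the weighting matrix returned in r(W), and S for the covariance matrix of the moment conditions that ivreg2 expects in smatrix().

    ```latex
    \hat{\beta} = \left( X' Z \, W \, Z' X \right)^{-1} X' Z \, W \, Z' y ,
    \qquad W = S^{-1} \;\Longleftrightarrow\; S = W^{-1}
    ```

    Under this reading, passing S = invsym(W) lets ivreg2 use the same W internally, which is why the point estimates should coincide.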

    Please see the help file for further details:
    Code:
    help xtivdfreg postestimation
    Last edited by Sebastian Kripfganz; 13 Oct 2024, 10:03.


  • Sebastian Kripfganz
    replied
    I am excited to announce that my co-author Vasilis Sarafidis has recorded 2 YouTube videos on the IV method for estimating panel data models with common factors and the implementation with our xtivdfreg package.

    Check them out: https://www.youtube.com/playlist?lis...LYo47_qkDedqqS


  • Sebastian Kripfganz
    replied
    The latest version 1.4.2 of the xtivdfreg package comes with some smaller bug fixes and new functionality for mean-group (MG) estimation. Previously, the mg option would only return the final MG estimates. Now, the group-specific estimates and standard errors can also be accessed through the returned matrices e(b_mg) and e(se_mg), respectively. Moreover, the estimates for a specific group can be directly displayed in standard regression output format by specifying the group's ID number as an argument to the mg() option. Please see the help file for details.

    In relation to MG estimation, another technical modification has been made: The latest version now requires either the absorb() option (with specification of the panel identifier variable) or the noconstant option for MG estimation. Previously, without either of these options, the heterogeneous slope coefficients were estimated under the implicit assumption of homogeneous intercepts, but the MG intercept was subsequently still computed from heterogeneous intercept estimates. To avoid this inconsistency, the user now needs to take an explicit stance on whether there are heterogeneous intercepts (by using the absorb() option) or not (by using the noconstant option). This is only relevant for MG estimation.
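    A minimal sketch of how these MG features might be used, assuming the example data set from the package website and assuming that id is its panel identifier. The group ID 1 in the last command is hypothetical and may not exist in the data; the help file remains the authoritative reference for the syntax.

    ```stata
    * Assumption: example data from the package website; id is the panel identifier
    use http://www.kripfganz.de/stata/xtivdfreg_example.dta, clear

    * MG estimation with heterogeneous intercepts (panel identifier absorbed)
    xtivdfreg L(0/1).CAR size ROA liquidity, absorb(id) iv(size ROA liquidity, lags(2)) factmax(3) mg

    * Group-specific estimates and standard errors returned by the command
    matrix list e(b_mg)
    matrix list e(se_mg)

    * Display the results for a single group (ID 1 is a hypothetical example)
    xtivdfreg L(0/1).CAR size ROA liquidity, absorb(id) iv(size ROA liquidity, lags(2)) factmax(3) mg(1)
    ```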

    The latest version can be obtained in the usual way either from SSC or from my personal website.


  • Sebastian Kripfganz
    replied
    An important update is available for the xtivdfreg package. For the absorption of fixed effects, the command relies on Sergio Correia's reghdfe and ftools packages. The latest version of reghdfe introduced some code-breaking syntax changes. To reestablish compatibility between xtivdfreg and reghdfe, please update to the latest xtivdfreg version 1.3.5 from my personal website, and make sure that you have the latest versions of reghdfe (6.12.3) and ftools (2.49.1) from SSC.

    Code:
    net install xtivdfreg, from(http://www.kripfganz.de/stata/) replace
    
    ssc install reghdfe
    ssc install ftools
    Warning: If some of your own programs or do-files use reghdfe, the new version might also break your code. You might want to keep a backup of your current reghdfe version before updating.
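    Before updating, it may help to check which versions are currently installed. The sketch below uses only the official which command, which reports the file location and the version line of an installed command. If an older version is already installed, ssc install usually needs the replace option to overwrite it.

    ```stata
    * Show location and version of the currently installed commands
    which xtivdfreg
    which reghdfe
    which ftools

    * Update from SSC, overwriting the installed files
    ssc install reghdfe, replace
    ssc install ftools, replace
    ```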


  • Sebastian Kripfganz
    replied
    Unfortunately, there was another bug when options std and doubledefact were combined. This has been fixed now in version 1.3.4.


  • Sebastian Kripfganz
    replied
    Another minor bug-fix update to version 1.3.3 is now available, which corrects a wrong stability check introduced in the previous version for the calculation of long-run impacts after spxtivdfreg.
    Code:
    net install xtivdfreg, from(http://www.kripfganz.de/stata/) replace


  • Sebastian Kripfganz
    replied
    A minor bug-fix update to version 1.3.2 is now available on my personal website:
    Code:
    net install xtivdfreg, from(http://www.kripfganz.de/stata/) replace


  • Sebastian Kripfganz
    replied
    Update announcement: The new xtivdfreg version 1.3.1 - available on SSC and my personal website - comes with a significant extension. It now allows fitting spatial panel data models with spatial lags of the dependent and independent variables. For this purpose, the package contains the new spxtivdfreg command, which parses the additional spatial features, but otherwise is a wrapper for xtivdfreg.

    To illustrate the new features, I am using the example data set from the community-contributed xsmle command. spxtivdfreg can use spatial weights matrices that are available as an spmatrix object, as a Mata matrix, as a Stata matrix, or as a text/Excel file. Here, we are obtaining a Mata matrix from the spmat object (part of the community-contributed sppack package, not to be confused with the official spmatrix), which we then use with spxtivdfreg:
    Code:
    . use http://www.econometrics.it/stata/data/xsmle/product.dta, clear
    . gen lngsp = ln(gsp)
    . gen lnpcap = ln(pcap)
    . gen lnpc = ln(pc)
    . gen lnemp = ln(emp)
    
    . spmat use usaww using http://www.econometrics.it/stata/data/xsmle/usaww.spmat
    . spmat getmatrix usaww W
    
    . spxtivdfreg lngsp lnpcap lnpc lnemp, spmatrix(W, mata) splag tlags(1) sptlags(1) iv(lnpcap lnpc lnemp, splags lags(2)) absorb(state) std
    
    
    Defactored instrumental variables estimation
    
    Group variable: state                        Number of obs         =       720
    Time variable: year                          Number of groups      =        48
    
    Number of instruments  =     18              Obs per group     min =        15
    Number of factors in X =      2                                avg =        15
    Number of factors in u =      1                                max =        15
    
    Second-stage estimator (model with homogeneous slope coefficients)
    ------------------------------------------------------------------------------
                 |               Robust
           lngsp | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           lngsp |
             L1. |   .2532778   .0543828     4.66   0.000     .1466894    .3598661
                 |
          lnpcap |  -.2299721   .0442759    -5.19   0.000    -.3167512   -.1431929
            lnpc |  -.0197032   .0295865    -0.67   0.505    -.0776916    .0382853
           lnemp |   .7265171   .0615828    11.80   0.000     .6058169    .8472172
           _cons |   3.687231   .4145171     8.90   0.000     2.874793     4.49967
    -------------+----------------------------------------------------------------
    W            |
           lngsp |
             --. |    .502397   .0430744    11.66   0.000     .4179727    .5868213
             L1. |  -.3575409   .0523114    -6.83   0.000    -.4600694   -.2550125
    -------------+----------------------------------------------------------------
         sigma_f |  .02458298   (std. dev. of factor error component)
         sigma_e |  .01571357   (std. dev. of idiosyncratic error component)
             rho |  .70993317   (fraction of variance due to factors)
    ------------------------------------------------------------------------------
    Hansen test of the overidentifying restrictions        chi2(12)    =   20.6457
    H0: overidentifying restrictions are valid             Prob > chi2 =    0.0558
    We specified the spatial weights matrix with the spmatrix() option. The options splag tlags(1) sptlags(1) then tell the command to estimate a model with a spatial lag, a time lag, and a spatial time lag of the dependent variable, respectively. This yields a time-space dynamic panel data model with autoregressive components both in the spatial and the time dimension. Similar to the non-spatial xtivdfreg, instruments are specified with the iv() option. Here, the suboption splags requests that spatial lags of those instruments be included as additional instruments. absorb() eliminates fixed effects in the usual way. Finally, std is a new xtivdfreg option, which requests a standardization of the instruments in the factor-extraction process; this can help to stabilize the estimation. The regression output contains the new section labelled "W" for the spatially lagged variables. Here, both time dynamics and spatial dynamics appear to be relevant.

    As a postestimation feature, we can compute short-run and long-run direct, indirect, and total impacts, which is standard in the spatial econometrics literature. It is essential to specify time lags with the tlags() option of spxtivdfreg; otherwise, the calculation of the long-run impacts will be incorrect.
    Code:
    . estat impact, sr
    
    Short-run impacts
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Impact   std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
    direct       |
          lnpcap |  -.2482317   .0473038    -5.25   0.000    -.3409456   -.1555179
            lnpc |  -.0212676   .0318672    -0.67   0.505    -.0837261    .0411909
           lnemp |   .7842023   .0563274    13.92   0.000     .6738026     .894602
    -------------+----------------------------------------------------------------
    indirect     |
          lnpcap |   -.213928   .0500893    -4.27   0.000    -.3121012   -.1157547
            lnpc |  -.0183286   .0270505    -0.68   0.498    -.0713466    .0346895
           lnemp |   .6758313   .0723162     9.35   0.000     .5340942    .8175685
    -------------+----------------------------------------------------------------
    total        |
          lnpcap |  -.4621597    .091373    -5.06   0.000    -.6412474    -.283072
            lnpc |  -.0395961   .0588423    -0.67   0.501    -.1549248    .0757326
           lnemp |   1.460034   .0696914    20.95   0.000     1.323441    1.596626
    ------------------------------------------------------------------------------
    
    . estat impact, lr
    
    Long-run impacts
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Impact   std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
    direct       |
          lnpcap |   -.310883   .0582214    -5.34   0.000    -.4249949   -.1967711
            lnpc |  -.0266353   .0404538    -0.66   0.510    -.1059233    .0526527
           lnemp |   .9821273   .0515769    19.04   0.000     .8810383    1.083216
    -------------+----------------------------------------------------------------
    indirect     |
          lnpcap |  -.0712153   .0268197    -2.66   0.008     -.123781   -.0186496
            lnpc |  -.0061015   .0094865    -0.64   0.520    -.0246946    .0124917
           lnemp |     .22498   .0590485     3.81   0.000      .109247    .3407131
    -------------+----------------------------------------------------------------
    total        |
          lnpcap |  -.3820983   .0770055    -4.96   0.000    -.5330264   -.2311702
            lnpc |  -.0327368   .0497881    -0.66   0.511    -.1303197    .0648461
           lnemp |   1.207107   .0325818    37.05   0.000     1.143248    1.270966
    ------------------------------------------------------------------------------
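    As a rough plausibility check, the total impacts of lnemp can be reproduced by hand from the coefficient table further above. This is a sketch under two assumptions that are not stated in the output: the spatial weights matrix is row-normalized, and the totals follow the usual spatial-econometrics formulas, i.e. beta/(1 - rho) in the short run and beta/(1 - phi - rho - psi) in the long run, where rho, phi, and psi denote the coefficients of the spatial lag, the time lag, and the spatial time lag of lngsp.

    ```stata
    * Coefficients copied from the spxtivdfreg output:
    * lnemp = .7265171, L1.lngsp = .2532778, W*lngsp = .502397, W*L1.lngsp = -.3575409

    * Short-run total impact of lnemp: beta / (1 - rho)
    display .7265171 / (1 - .502397)

    * Long-run total impact of lnemp: beta / (1 - phi - rho - psi)
    display .7265171 / (1 - .2532778 - .502397 + .3575409)
    ```

    Both values should come out close to the reported totals of 1.460034 and 1.207107, up to rounding of the displayed coefficients.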
    A slight complication arises in the computation of long-run effects when there are distributed lags of the regressors, because the command treats them as separate variables, while their long-run effects should be added up for the respective variable. To circumvent this problem, the post option of estat impact allows posting the impacts in e(b) and e(V), as if they were the actual estimation results. We can then use standard regression postestimation commands, especially lincom (and test for hypothesis tests). Here is a simplified example with a distributed lag of the variable lnemp:
    Code:
    . spxtivdfreg lngsp lnpcap lnpc L(0/1).lnemp, spmatrix(W, mata) splag iv(lnpcap lnpc lnemp, splags lags(2)) tlags(1) sptlags(1) std absorb(state)
    (output omitted)
    
    . estat impact, lr post
    
    Long-run impacts
    ------------------------------------------------------------------------------
                 |            Delta-method
           lngsp |     Impact   std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
    direct       |
          lnpcap |  -.0045801   .0629467    -0.07   0.942    -.1279534    .1187932
            lnpc |   .0158823   .0526601     0.30   0.763    -.0873296    .1190943
                 |
           lnemp |
             --. |   2.201223   .2833431     7.77   0.000     1.645881    2.756566
             L1. |  -1.239101   .2609633    -4.75   0.000     -1.75058   -.7276224
    -------------+----------------------------------------------------------------
    indirect     |
          lnpcap |   -.000232   .0033171    -0.07   0.944    -.0067335    .0062694
            lnpc |   .0008047   .0027482     0.29   0.770    -.0045817     .006191
                 |
           lnemp |
             --. |   .1115222   .1617033     0.69   0.490    -.2054104    .4284548
             L1. |  -.0627775   .0899942    -0.70   0.485     -.239163     .113608
    -------------+----------------------------------------------------------------
    total        |
          lnpcap |  -.0048121   .0662475    -0.07   0.942    -.1346549    .1250306
            lnpc |    .016687   .0551394     0.30   0.762    -.0913843    .1247583
                 |
           lnemp |
             --. |   2.312745   .2306537    10.03   0.000     1.860672    2.764818
             L1. |  -1.301879   .2404403    -5.41   0.000    -1.773133   -.8306243
    ------------------------------------------------------------------------------
    
    . lincom [direct]lnemp + [direct]L.lnemp
    
     ( 1)  [direct]lnemp + [direct]L.lnemp = 0
    
    ------------------------------------------------------------------------------
           lngsp | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             (1) |   .9621221    .385208     2.50   0.013     .2071284    1.717116
    ------------------------------------------------------------------------------
    
    . lincom [indirect]lnemp + [indirect]L.lnemp
    
     ( 1)  [indirect]lnemp + [indirect]L.lnemp = 0
    
    ------------------------------------------------------------------------------
           lngsp | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             (1) |   .0487447   .1850592     0.26   0.792    -.3139647    .4114541
    ------------------------------------------------------------------------------
    
    . lincom [total]lnemp + [total]L.lnemp
    
     ( 1)  [total]lnemp + [total]L.lnemp = 0
    
    ------------------------------------------------------------------------------
           lngsp | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             (1) |   1.010867   .3331856     3.03   0.002      .357835    1.663899
    ------------------------------------------------------------------------------
    For full details of the new options and features, please see the help files:
    Code:
    help spxtivdfreg
    help spxtivdfreg postestimation
    help xtivdfreg
    help xtivdfreg postestimation
    The estimator implemented in spxtivdfreg is based on the research article by Cui et al. (forthcoming), which is an extension to spatial models of the defactored instrumental-variables estimator of Norkute et al. (2021) for panel data models with a multifactor error structure. The xtivdfreg command is also described in detail in our Stata Journal article (Kripfganz and Sarafidis, 2021).


  • Sebastian Kripfganz
    replied
    An update to xtivdfreg is available from my website:
    Code:
    net install xtivdfreg, from(http://www.kripfganz.de/stata/) replace
    The new version 1.1.0 has substantial speed improvements and now also works with larger data sets.


  • John Francois
    replied
    Originally posted by Sebastian Kripfganz
    I am afraid this is not (currently) possible.


    Factor extraction is only valid from strictly exogenous variables, i.e. they must be uncorrelated with any future, current, and past errors. The lag of an endogenous variable X does not satisfy this condition and therefore should not be included in iv(). You would need to find a valid external instrument Z.
    Thanks again, Sebastian. I appreciate all the help.


  • Sebastian Kripfganz
    replied
    Originally posted by John Francois
    I was wondering if it is possible to retrieve the residuals (which are free from the unobserved factors) after an xtivdfreg regression. I want to test whether the residuals from my model estimated with xtivdfreg are cross-sectionally independent compared to other estimators.
    I am afraid this is not (currently) possible.

    Originally posted by John Francois
    If X is endogenous and one has an external instrument (Z), can one specify the iv() option to include both the external variable Z and the lags of the endogenous variable X (i.e., xtivdfreg Y X, iv(X, Z, lags(2)) factmax(N))? Or should X be left out of iv(), so that only Z is included? I ask this because my takeaway from reading the paper is that when some of the regressors are endogenous with respect to epsilon_it, extracting the principal components from those endogenous regressors can be invalid. I understand that the lags of the endogenous variable, by construction, are not endogenous with respect to epsilon_it; hence, they can be included in iv(). However, I just wanted to clarify if I am thinking about the treatment of the endogenous regressor case correctly.
    Factor extraction is only valid from strictly exogenous variables, i.e. they must be uncorrelated with any future, current, and past errors. The lag of an endogenous variable X does not satisfy this condition and therefore should not be included in iv(). You would need to find a valid external instrument Z.
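    A minimal sketch of the specification this answer points to, with hypothetical variable names (y for the outcome, x for the endogenous regressor, z for a strictly exogenous external instrument, id for the panel identifier). The values chosen for lags() and factmax() are placeholders rather than recommendations.

    ```stata
    * Hypothetical variable names: y, x (endogenous), z (strictly exogenous), id (panel identifier)
    * Only z enters iv(); neither x nor its lags are used for factor extraction
    xtivdfreg y x, absorb(id) iv(z, lags(2)) factmax(3)
    ```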


  • John Francois
    replied
    Sorry for all the questions, but I do have an additional one. If X is endogenous and one has an external instrument (Z), can one specify the iv() option to include both the external variable Z and the lags of the endogenous variable X (i.e., xtivdfreg Y X, iv(X, Z, lags(2)) factmax(N))? Or should X be left out of iv(), so that only Z is included? I ask this because my takeaway from reading the paper is that when some of the regressors are endogenous with respect to epsilon_it, extracting the principal components from those endogenous regressors can be invalid. I understand that the lags of the endogenous variable, by construction, are not endogenous with respect to epsilon_it; hence, they can be included in iv(). However, I just wanted to clarify if I am thinking about the treatment of the endogenous regressor case correctly. Thanks again!


  • John Francois
    replied
    Hi Sebastian, I was wondering if it is possible to retrieve the residuals (which are free from the unobserved factors) after an xtivdfreg regression. I want to test whether the residuals from my model estimated with xtivdfreg are cross-sectionally independent compared to other estimators. Thanks for the help.


  • John Francois
    replied
    Thank you so much for the prompt response. I really do appreciate it.
