  • Sarah Magd
    replied
    Dear Prof. Sebastian Kripfganz

    Thanks for your constructive replies.

    1. Are there any issues if we restrict our sample to 28 countries and 13 years? We use a one-step system GMM estimator to estimate our model with this sample. Could you please let us know if we still have any issues with this setup?
    2. Given this sample, can we use the diff-GMM for robustness checks? Or would you recommend another estimator for robustness?


  • Sebastian Kripfganz
    replied
    1. I would call this a small N, moderately small T sample. You probably do not need to be concerned much with asymptotic efficiency; it might thus be a good idea to use the one-step instead of the two-step estimator, to avoid estimating the weighting matrix. Also, use the available options (collapsing and lag restrictions) to limit the number of instruments. You could still use the system GMM estimator if you can theoretically justify its assumptions. With such a data set, testing these assumptions empirically is challenging and probably not very reliable.
    2. From the outset, we do not know what the true value of the coefficient of the lagged dependent variable is; that is why we are estimating it. There can be different reasons for the observed differences: (i) sampling variability due to the small data set; (ii) endogeneity of the lagged dependent variable (due to neglected serial correlation in the error term) such that the model treating it as predetermined is misspecified; (iii) weak instruments when treating the lagged dependent variable as endogenous, to name a few.
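
    A minimal sketch of how the instrument count could be limited along the lines of point 1, assuming the abdata example data set used elsewhere in this thread. The variable names, the treatment of w and k as predetermined, and the lag limits are illustrative assumptions, not a recommendation for the 28-country sample:

    Code:
    webuse abdata

    * one-step system GMM (no twostep option), collapsed instruments, restricted lag ranges
    * w and k are treated as predetermined here purely for illustration
    xtdpdgmm L(0/1).n w k, model(difference) collapse ///
    gmm(n, lag(2 4)) gmm(w k, lag(1 3)) ///
    gmm(n, lag(1 1) difference model(level)) gmm(w k, lag(0 0) difference model(level)) ///
    vce(robust) teffects

    The number of instruments reported in the output header can then be compared against the number of groups.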


  • Sarah Magd
    replied
    Dear Prof. Sebastian Kripfganz

    1) Can we use the sys-GMM with a sample that has 28 countries and 20 years? Is this considered a big T, or can we still use the sys-GMM?
    2) When we define the lagged dependent variable as a predetermined variable, its estimated coefficient is 0.542. However, when we specify the variable as endogenous, its magnitude becomes 0.745. Does the magnitude of the lagged dependent variable have to be close to 1?

    Could you please guide us on these two points.


    Thanks


  • Tugrul Cinar
    replied
    What you are describing is a data set with repeated cross sections. xtdpdgmm requires the data to be declared as panel data; in particular, a panel identifier variable needs to be declared with xtset. This may not be possible with the type of data you have.
    Thank you very much for the quick response.


  • Sebastian Kripfganz
    replied
    What you are describing is a data set with repeated cross sections. xtdpdgmm requires the data to be declared as panel data; in particular, a panel identifier variable needs to be declared with xtset. This may not be possible with the type of data you have.
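
    For reference, a minimal sketch of the declaration step, assuming the abdata example data set (panel identifier id, time variable year) rather than the pooled data described in the question:

    Code:
    webuse abdata
    * check that each id-year combination occurs only once; isid exits with an error otherwise
    isid id year
    * declare the panel structure before calling xtdpdgmm
    xtset id year

    With repeated cross sections, no variable follows the same unit over time, so there is nothing to put in place of id.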


  • Tugrul Cinar
    replied
    Dear Sebastian,

    I am going to use a micro dataset for an upcoming study. However, this dataset consists of random samples for each year. Essentially, it is a pooled dataset rather than panel data. Moreover, I suspect an endogeneity issue between the dependent and independent variables in the model I am aiming to estimate. The dataset encompasses roughly 100,000 units per year and spans seven years.

    Given that the xtdpdgmm command is designed for linear (dynamic) panel data, do you recommend it for analyzing a pooled dataset?


  • Arkangel Cordero
    replied
    Dear Professor @Sebastian Kripfganz


    Understood. This is helpful. Thank you!


  • Sebastian Kripfganz
    replied
    If the variables w k are strictly exogenous (with respect to the idiosyncratic error component), then any serial error correlation does not affect their validity (or that of any of their lags) as instruments. If there is serial error correlation due to the omission of relevant lags of w k as regressors, e.g. due to delayed direct effects of L2.(w k), then w k would not be strictly exogenous in the first place in a model with those omitted lags. Thus, saying that w k are strictly exogenous effectively is also a statement about the correct specification of the model dynamics.

    In this regard, I wonder what your motivation is for including L.(w k) as regressors instead of w k. Sometimes, people do this to avert simultaneous feedback from the dependent variable. In that case, however, L.(w k) may not be endogenous any more, but they cannot be strictly exogenous either. At best, they would be predetermined (weakly exogenous). For predetermined variables, serial error correlation does matter for the validity of the instruments. Probably even more important, simply lagging the regressors for this argument typically creates model misspecification, which then puts the whole analysis in jeopardy.
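
    To illustrate how the classification changes the admissible lags, here is a sketch for the first-differenced model. It assumes the abdata example data set, w and k entering contemporaneously, no serial correlation in the idiosyncratic level errors, and illustrative upper lag limits:

    Code:
    webuse abdata

    * w and k treated as predetermined: lag 1 and deeper are used as instruments
    xtdpdgmm L(0/1).n w k, model(difference) collapse ///
    gmm(n, lag(2 4)) gmm(w k, lag(1 3)) noconstant two vce(robust)
    estimates store predet

    * w and k treated as endogenous: instruments start at lag 2
    xtdpdgmm L(0/1).n w k, model(difference) collapse ///
    gmm(n, lag(2 4)) gmm(w k, lag(2 4)) noconstant two vce(robust)
    estimates store endog

    If serial correlation is present, the starting lags in both variants would need to be pushed further back.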


  • Arkangel Cordero
    replied
    Dear Professor @Sebastian Kripfganz


    That all makes sense! I am grateful for your insights. These exchanges have been quite enlightening! I can clearly see the flexibility of xtdpdgmm. Thank you for this command!

    To close this series of exchanges, I just want to confirm that, assuming that L(w k) are exogenous, your comment that
    So, yes, you would need to adjust both lag orders if there is serial correlation.
    would not apply to the iv(L(w k), model(difference)) iv(L(w k), difference model(level)) part of m3 below. My logic is that if L(w k) are assumed exogenous, then (and once the fixed-effects are expunged), iv(L(w k), model(difference)) iv(L(w k), difference model(level)) cannot possibly be causing any autocorrelation in the residuals by assumption? Is that correct? Do you have any practical guidance on this?

    Code:
    xtdpdgmm L(0/1).n L(w k), model(level) ///
    gmm(L(n), lag(2 3) model(difference)) gmm(L(n), lag(1 1) difference model(level)) ///
    iv(L(w k),  model(difference)) iv(L(w k), difference model(level)) ///
    two vce(cluster id) teffects
    estimate store m3
    Thank you again.


  • Sebastian Kripfganz
    replied
    Btw: I elaborate a bit on the lag(0 0) issue with xtabond2 in this other topic: https://www.statalist.org/forums/for...32#post1740632


  • Sebastian Kripfganz
    replied
    1) Yes, m1 still relies on the random-effects assumption.

    2) m2 looks fine; just keep in mind that a system GMM estimator generally requires stronger assumptions about the initial observations than a difference GMM estimator.

    3) - 5) If higher-order serial correlation in the first-differenced residuals is detected, this would generally invalidate the lags for the dependent variable as instruments. Note that your specification is a bit odd in the sense that the instruments for the lagged dependent variable in the first-differenced model allow for second-order serial correlation in the first-differenced errors (because the first instrument is effectively only the third lag of n - the second lag of L.n), while the respective instruments for the level model do not allow for any serial correlation in the level errors (which would correspond to at most first-order serial correlation in the first-differenced errors). In that regard, m3 makes more sense. So, yes, you would need to adjust both lag orders if there is serial correlation. In the basic case without serial correlation in levels (but first-order serial correlation detected by estat serial in first differences), you would specify gmm(L(n), lag(1 2) model(difference)) gmm(L(n), lag(0 0) difference model(level)).
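
    Put together, the basic case could be sketched as follows, assuming the abdata example and keeping the remaining options from m2 unchanged. The ar(1/3) range for the test is an illustrative choice:

    Code:
    webuse abdata

    * basic case: no serial correlation in the level errors
    xtdpdgmm L(0/1).n L(w k), model(level) ///
    gmm(L(n), lag(1 2) model(difference)) gmm(L(n), lag(0 0) difference model(level)) ///
    iv(L(w k), model(difference)) iv(L(w k), difference model(level)) ///
    two vce(cluster id) teffects

    * test for serial correlation in the first-differenced residuals up to order 3
    estat serial, ar(1/3)

    A rejection at order 2 or higher would be the signal to shift both lag ranges back, as in m3.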


  • Arkangel Cordero
    replied
    Dear Professor @Sebastian Kripfganz


    Got it. Thank you so much for your helpful insights. I think I'll stick with xtdpdgmm.

    Some final questions in response to your comment at the end of #624 that:
    As an aside, note that iv(L(w k)) for the level model makes the strong assumption that w and k are uncorrelated with the unobserved group-specific effects (akin to a random-effects assumption), which may not be desired.
    Regarding the "iv-style" instruments with exogenous L(w k):

    1) Assuming that i) L(w k) are exogenous but ii) we don't want to rely on the random-effects assumption while iii) still wanting to instrument w and k in both equations of a system GMM model, would m2 below be preferable to m1 below?

    2) Do you see anything inherently problematic with m2 below regarding the "iv-style" instruments for exogenous L(w k)?

    3) Does the fact that the "estat serial, ar()" test after xtdpdgmm reveals that ar(1) (or higher) is statistically significant have any implications for the lags of the "iv-style" instruments?

    Regarding the "gmm-style" instruments with endogenous L(n):

    4) Assuming that L(n) is endogenous and that the "estat serial, ar()" test after xtdpdgmm reveals that only ar(1) is statistically significant, would m2 be preferable to m3? The difference between these specifications is the lag(0 0) or lag(1 1) in the gmm(L(n), lag(x x) difference model(level)).

    5) I guess my more general question is on the appropriate lags for the first-difference instruments in the levels equation for the gmm-style instruments. Do we need to adjust those lags depending on the results of the "estat serial, ar()" test after xtdpdgmm, just as we have to adjust the lags in the difference model? If so, in the most basic case where L(n) is endogenous but only ar(1) is statistically significant, would the lags have to be lag(1 1) so that the instruments for the level equation would be D.L2(n)?



    Code:
    webuse  abdata
    
    xtdpdgmm L(0/1).n L(w k), model(level) ///
    gmm(L(n), lag(2 3) model(difference)) gmm(L(n), lag(0 0) difference model(level)) ///
    iv(L(w k)) iv(L(w k), difference model(difference)) ///
    two vce(cluster id) teffects
    estimate store m1
    
    
    xtdpdgmm L(0/1).n L(w k), model(level) ///
    gmm(L(n), lag(2 3) model(difference)) gmm(L(n), lag(0 0) difference model(level)) ///
    iv(L(w k),  model(difference)) iv(L(w k), difference model(level)) ///
    two vce(cluster id) teffects
    estimate store m2
    
    xtdpdgmm L(0/1).n L(w k), model(level) ///
    gmm(L(n), lag(2 3) model(difference)) gmm(L(n), lag(1 1) difference model(level)) ///
    iv(L(w k),  model(difference)) iv(L(w k), difference model(level)) ///
    two vce(cluster id) teffects
    estimate store m3
    Last edited by Arkangel Cordero; 20 Jan 2024, 14:23.


  • Sebastian Kripfganz
    replied
    In general, iv() is just a collapsed version of gmm(); see also the help file for xtdpdgmm:
    gmmiv(varlist, lagrange(#_1 #_2) collapse) is equivalent to iv(varlist, lagrange(#_1 #_2))
    For xtabond2, I again recommend specifying instruments separately for the level and differenced model, and explicitly specifying lag orders. The default settings can be very confusing. In your example, I do not even understand what is going on with xtabond2. The command appears to create the first lag of the first-differenced instruments (which in your case are therefore lagged twice, because you specified the lag operator in the variable list) for the level model. This is inconsistent with the help file. It should not lag those instruments when you specify lag(0 0); this appears to be another bug. Why it does not replicate xtdpdgmm is another mystery to me. If you explicitly specify the instruments separately for each equation, you should be able to replicate the results.

    The ivreg2 standard errors do not apply the Windmeijer correction.
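
    The iv()/gmm() equivalence quoted from the help file above could be checked with a sketch like the following. It assumes the abdata example data set, a difference GMM specification, and illustrative lag ranges; the stored estimate names are arbitrary:

    Code:
    webuse abdata

    * gmm-style instruments, collapsed within each gmm() option
    xtdpdgmm L(0/1).n w k, model(difference) ///
    gmm(n, lag(2 4) collapse) gmm(w k, lag(1 3) collapse) noconstant two vce(robust)
    estimates store gmm_collapsed

    * the same lag ranges specified as iv-style instruments
    xtdpdgmm L(0/1).n w k, model(difference) ///
    iv(n, lag(2 4)) iv(w k, lag(1 3)) noconstant two vce(robust)
    estimates store iv_style

    * compare coefficients and standard errors side by side
    estimates table gmm_collapsed iv_style, b(%9.4f) se(%9.4f)

    If the equivalence holds as stated, the two columns should coincide.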


  • Arkangel Cordero
    replied
    Dear Professor @Sebastian Kripfganz


    Understood! Thank you.

    I have a question regarding your comment that
    As an aside, note that iv(L(w k)) for the level model makes the strong assumption that w and k are uncorrelated with the unobserved group-specific effects (akin to a random-effects assumption), which may not be desired.
    I understand your point. Assuming that L(w k) are exogenous, would the following xtabond2 specification make more sense if we want to have instruments for L(w k) in both equations? gmm(L(w k), lag(0 0)) Or would it have to be gmm(L(w k), lag(1 1))?

    Relatedly, can you please provide some guidance as to why I can't seem to reproduce the xtabond2 model below with xtdpdgmm.

    Code:
    webuse abdata
    
    xtabond2 L(0/1).n L(w k) (yr1978 - yr1984),  iv(yr1978 - yr1984, eq(level)) ///
    gmm(L(n), lag(2 3)) ///
    gmm(L(w k), lag(0 0))  ///
    two cluster(id)
    estimate store m1
    
    
    xtdpdgmm L(0/1).n L(w k), teffects model(level) ///
    gmm(L(n), lag(2 3) model(difference)) gmm(L(n), lag(1 1) difference model(level)) ///
    gmm(L(w k),   lag(0 0) model(difference)) gmm(L(w k),   lag(1 1) difference model(level)) ///
    two vce(cluster id) 
    estimate store m2
    
    quietly predict iv*, iv
    
    ivreg2 n  (L1.(n w k) yr1978 - yr1984 = iv*),     gmm2s  cluster(id) nocollin
    estimate store m3
    
    esttab m1 m2 m3, b(2) se(3)
    
    
    -----------------------------------------------------------
                          (1)             (2)             (3)   
                            n               n               n   
    ------------------------------------------------------------
    L.n                  0.88***         0.97***         0.97***
                      (0.060)         (0.052)         (0.024)   
    
    L.w                  0.04           -0.20           -0.20***
                      (0.078)         (0.107)         (0.049)   
    
    L.k                  0.10*           0.05            0.05** 
                      (0.046)         (0.044)         (0.017)   
    
    yr1978              -0.01                           -0.03** 
                      (0.015)                         (0.012)   
    
    yr1979              -0.01                           -0.05***
                      (0.017)                         (0.014)   
    
    yr1980              -0.06**                         -0.09***
                      (0.019)                         (0.014)   
    
    yr1981              -0.15***                        -0.17***
                      (0.022)                         (0.016)   
    
    yr1982              -0.14***                        -0.15***
                      (0.020)                         (0.014)   
    
    yr1983              -0.10***                        -0.10***
                      (0.026)                         (0.016)   
    
    yr1984              -0.09*                          -0.07***
                      (0.034)                         (0.019)   
    
    1978.year                           -0.03                   
                                      (0.019)                   
    
    1979.year                           -0.05*                  
                                      (0.024)                   
    
    1980.year                           -0.09***                
                                      (0.024)                   
    
    1981.year                           -0.17***                
                                      (0.027)                   
    
    1982.year                           -0.15***                
                                      (0.023)                   
    
    1983.year                           -0.10***                
                                      (0.026)                   
    
    1984.year                           -0.07*                  
                                      (0.030)                   
    
    _cons                0.07            0.73*           0.73***
                      (0.263)         (0.347)         (0.158)   
    ------------------------------------------------------------
    N                     891             891             891   
    ------------------------------------------------------------
    Standard errors in parentheses
    * p<0.05, ** p<0.01, *** p<0.001
    Finally, and I am sure this is a dumb question on my part, why are the standard errors so different between xtdpdgmm and ivreg2? Is there a way to adjust them in ivreg2 to make them match xtdpdgmm?


  • Sebastian Kripfganz
    replied
    Originally posted by Arkangel Cordero
    However, for the purposes of reproducing the results of xtabond2 (point estimates and their standard errors) with xtdpdgmm for the system gmm, I could only do so by including iv(yr1978 - yr1984) and iv(L(w k)) in both equations for xtdpdgmm?
    I am not sure what you mean by that. The following code without time dummies as instruments for the differenced model still yields the same results, and similarly if I drop the corresponding instruments for w and k:
    Code:
    xtabond2 L(0/1).n L(w k) (yr1978 - yr1984), ///
    gmm(L(n), lag(2 3)) iv(yr1978 - yr1984, eq(level)) iv(L(w k), eq(diff)) iv(L(w k), eq(level))  two cluster(id)
    
    xtdpdgmm L(0/1).n L(w k), model(level) gmm(L(n), lag(2 3) model(difference)) ///
    gmm(L(n), lag(1 1) difference model(level)) iv(L(w k)) iv(L(w k), difference model(difference)) two vce(cluster id) teffects
    As an aside, note that iv(L(w k)) for the level model makes the strong assumption that w and k are uncorrelated with the unobserved group-specific effects (akin to a random-effects assumption), which may not be desired.
