  • Sarah Magd
    replied
    Thanks a lot for the constructive and organized reply.

    As far as I understood, for the case of static regression with fixed effects:
    - The two-step system GMM estimator can control for endogeneity resulting from either reverse causality or omitted variable bias (assuming that appropriate instruments are available).
    - The two-step system GMM estimator is relatively more efficient than the one-step system GMM estimator because it accounts for the extra variance coming from the unobserved fixed effects.
    - The validity of the two-step GMM estimator is tested by the Hansen test. If it is insignificant, we can conclude that the results obtained by this estimator are consistent and the GMM can deal with the problem of omitted variable bias.
    - Given the existence of endogenous regressors, serial correlation would still affect the first admissible lag for the instruments. Therefore, if the Arellano-Bond test for autocorrelation of the first-differenced residuals does not reject H0: no autocorrelation of order 2, this can be an indication that there are neither omitted dynamics nor omitted lags of the regressors.

    Am I right?


  • Sebastian Kripfganz
    replied
    An estimator is either consistent or not. The GMM estimator is consistent if all the moment conditions/instruments are valid (and there are sufficiently many instruments available to estimate all coefficients).

    Efficiency is a relative concept. Among different GMM estimators, the asymptotically efficient estimator uses all non-redundant moment conditions/instruments and an optimal weighting matrix (as the two-step estimator does). If feasible, other estimators (such as a maximum likelihood estimator) might be more efficient in the sense that they achieve a smaller asymptotic variance.

    Omitted variables are a source of endogeneity. If appropriate instruments are available (which are uncorrelated with the omitted variables), then GMM can deal with this problem.

    Serial correlation may or may not be a problem. If all regressors are strictly exogenous, serial correlation can be accounted for by using an optimal weighting matrix and panel-robust standard errors. Sometimes, serial correlation can be an indication of omitted dynamics (which could be an omitted lagged dependent variable or omitted lags of the regressors). In that case, an omitted variables problem could arise.


  • Sarah Magd
    replied
    Dear Prof. Sebastian Kripfganz
    - We have a static panel regression with relatively small T (i.e., T = 13 and cross-section units = 30), an endogenous variable (i.e., due to the reverse causality), and fixed effects.
    The fixed-effects OLS estimator with robust standard errors is used first to obtain baseline results. As far as I understand, the two-step system GMM estimator can be used only to control for the endogeneity problem. Does the two-step system GMM estimator address any other statistical issue in a static specification (e.g., efficiency, consistency, omitted variable bias, or serial correlation)?


  • Sebastian Kripfganz
    replied
    A general answer is that lots of things can happen to your estimates when you change the underlying assumptions (e.g., one variable is treated as endogenous instead of exogenous). Instrumental variables estimators (including GMM) may help to alleviate the endogeneity problem, but they might create other problems. For example, standard errors might become quite large if instruments are relatively weak. Especially when you have a relatively small sample size, the differences between estimators might appear large because the coefficients are not estimated very precisely.

    I would recommend changing the estimator as little as necessary when you make different assumptions, to get the best possible comparison. Say, you start with a fixed-effects estimator:
    Code:
    xtreg Y X1 X2 X3, fe vce(robust)
    Note that you can replicate this regression with xtdpdgmm as follows:
    Code:
    xtdpdgmm Y X1 X2 X3, model(mdev) iv(X1 X2 X3, norescale) small vce(robust)
    Then you assume that X1 is endogenous and you want to instrument it in the typical GMM style:
    Code:
    xtdpdgmm Y X1 X2 X3, model(mdev) iv(X2 X3, norescale) gmm(X1, lag(2 8) collapse model(diff)) twostep small vce(robust, dc)
    Notice that I have left the instruments for X2 X3 in the same format as for the traditional fixed-effects regression. This way, you can best compare the results.
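    To line the two sets of results up side by side, one option is Stata's built-in estimates store and estimates table commands. This is only a sketch; the stored names fe and gmm_x1 are placeholders and not part of the commands above.
    Code:
    * fixed-effects benchmark, replicated with xtdpdgmm
    xtdpdgmm Y X1 X2 X3, model(mdev) iv(X1 X2 X3, norescale) small vce(robust)
    estimates store fe
    * same model, with X1 now treated as endogenous
    xtdpdgmm Y X1 X2 X3, model(mdev) iv(X2 X3, norescale) gmm(X1, lag(2 8) collapse model(diff)) twostep small vce(robust, dc)
    estimates store gmm_x1
    * coefficients and standard errors in adjacent columns
    estimates table fe gmm_x1, se stats(N)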


  • Sarah Magd
    replied
    I estimate the Cobb-Douglas production function in a static form as follows:
    GDP per capita = Capital formation per capita + energy consumption per capita + inflation + trade openness + financial development
    My sample is 13 years for 27 countries.
    - I am using the fixed-effects regression with robust standard errors and panel-corrected standard errors with fixed effects. The two regressions give the expected results for my variable of interest (i.e., financial development). However, since the energy consumption variable is endogenous (i.e., due to reverse causality), I should use a model that corrects the potential biases from this endogeneity. As I mentioned in #424, I can use the two-step GMM estimator to control for the endogeneity. Nevertheless, financial development (my main variable) is insignificant or counterintuitive in this regression.

    - Given my sample size and the static specification, which estimator would be the most relevant to control for the endogeneity?


  • Sebastian Kripfganz
    replied
    Originally posted by Sarah Magd
    should I replace the missing values in the newly generated threshold variables with zero?
    Yes
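    A sketch of that step, with two assumptions that are not in the original post: X2 may itself contain missing values, and the interaction terms are generated explicitly under the hypothetical names X1_X2h and X1_X2l. Stata treats a missing value as larger than any number, so X2 > 0.32 is also true where X2 is missing, and a blanket replacement with zero would turn those observations into zeros in both threshold variables. In a varlist, the * character is a wildcard rather than a multiplication sign, hence the explicit products:
    Code:
    * zero outside the regime, but still missing where X2 itself is missing
    gen X2_h = cond(X2 > 0.32, X2, 0) if !missing(X2)
    gen X2_l = cond(X2 <= 0.32, X2, 0) if !missing(X2)
    * explicit interaction terms (hypothetical variable names)
    gen X1_X2h = X1 * X2_h
    gen X1_X2l = X1 * X2_l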


  • Sarah Magd
    replied
    I tried the following command:
    xtdpdgmm L(0/1).Y X1*X2_h X1*X2_l X3 X4, model(diff) collapse gmm(Y X3 X4, lag(2 4)) gmm(X1*X2_h X1*X2_l, lag(1 7)) gmm(Y X3 X4, lag(1 1) diff model(level)) gmm(X1*X2_h X1*X2_l, lag(0 0) diff model (level)) vce(r, dc) overid twostep
    However, it gives this error:
    no observations
    r(2000);


    In this case, should I replace the missing values in the newly generated threshold variables with zero? As follows:

    gen X2_h = X2 if X2 > 0.32
    replace X2_h = 0 if X2_h == .

    gen X2_l = X2 if X2 <= 0.32
    replace X2_l = 0 if X2_l == .


  • Sebastian Kripfganz
    replied
    You would not normally run two separate regressions for the effects above and below the threshold. Just combine everything in a single regression:
    Code:
    xtdpdgmm L(0/1).Y X1*X2_h X1*X2_l X3 X4, model(diff) collapse gmm(Y X3 X4, lag(2 4)) gmm(X1*X2_h X1*X2_l, lag(1 7)) gmm(Y X3 X4, lag(1 1) diff model(level)) gmm(X1*X2_h X1*X2_l, lag(0 0) diff model (level)) vce(r, dc) overid twostep


  • Sarah Magd
    replied
    #################################################
    #Threshold dynamic panel model using xtdpdgmm
    #################################################

    Dear Prof. Kripfganz,

    I want to estimate a threshold dynamic panel model using the xtdpdgmm. I have estimated the threshold value with another command. My problem is how to estimate the model with xtdpdgmm.
    Suppose that X1 is the variable of interest and it is predetermined and X2 is the threshold variable with a threshold value = 0.32. X3 and X4 are control variables (endogenous). Would the following specification be right?

    gen X2_h = X2 if X2 > 0.32
    gen X2_l = X2 if X2 <= 0.32

    xtdpdgmm L(0/1).Y X1*X2_h X3 X4, model(diff) collapse gmm(Y X3 X4, lag(2 4)) gmm(X1*X2_h, lag(1 7)) gmm(Y X3 X4, lag(1 1) diff model(level)) gmm(X1*X2_h, lag(0 0) diff model (level)) vce(r, dc) overid twostep
    xtdpdgmm L(0/1).Y X1*X2_l X3 X4, model(diff) collapse gmm(Y X3 X4, lag(2 4)) gmm(X1*X2_l, lag(1 7)) gmm(Y X3 X4, lag(1 1) diff model(level)) gmm(X1*X2_l, lag(0 0) diff model (level)) vce(r, dc) overid twostep

    If the code is not correctly specified for a threshold dynamic panel model, it would be highly appreciated if you could guide us on the right specification.


  • Sebastian Kripfganz
    replied
    You do not necessarily need stationarity tests.

    I have a couple of comments about your specification:
    1. For the instruments in the iv() option, you are implicitly assuming that all of those variables are uncorrelated with the unobserved country-fixed effects. This is often hardly justifiable with such macroeconomic data.
    2. You are using a system GMM estimator. It is almost never justified to specify the nocons option when you do not have time dummies (as in your second specification). This has the potential to substantially bias your results. There is not really a justification anyway to leave out the time dummies in the subsamples.
    3. Subsample analysis can be difficult if the number of countries in those subsamples is very small. You may not get reliable estimates. The total number of instruments is actually not the most important metric. The number of overidentifying restrictions is what matters. It seems to me that you only have 2 overidentifying restrictions (see the degrees of freedom of the Hansen test) in your second specification. That's quite unproblematic, assuming you still have a reasonably large number of countries in each subsample.
    4. xtdpdgmm cannot exactly replicate your specifications because of the particular way the iv() option is implemented in xtabond2. Notice that iv() without the equation() suboption is not the same as the combination of two iv() options, one with eq(diff) and one with eq(level). If this surprises you, then I recommend explicitly specifying all instruments with the eq() suboption to ensure that you really get what you want. This also assists you in carefully thinking about what instruments you really want to specify.
    5. Instead of trying to replicate your current xtabond2 specification with xtdpdgmm, I suggest that you rebuild your model from scratch (with either command). Think first about the assumptions for each variable (strictly exogenous, predetermined, endogenous; correlated/uncorrelated with the fixed effects) and then build the instruments accordingly. Before you specify a gmm() or an iv() option, make sure to understand its implications. My 2019 London Stata Conference presentation can serve as a guideline.
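    As a rough sketch of points 1, 2, and 4 (an illustration, not a recommended final specification), the instruments could be spelled out with explicit eq() suboptions and the constant kept in the model. The shortened variable list, the treatment of lgdpcap and lrp as strictly exogenous but possibly correlated with the fixed effects, and the restriction of the time dummies to the level equation are assumptions made only for this example:
    Code:
    * run from a do-file because of the /// line continuations
    xtabond2 lPoO l.lPoO lafto lgdpcap lrp i.year, ///
        gmm(l.lPoO, lag(1 .) collapse) ///
        gmm(lafto, lag(1 .) collapse) ///
        iv(lgdpcap lrp, eq(diff)) ///
        iv(i.year, eq(level)) ///
        twostep robust small
    Each iv() option now states which equation its instruments belong to, and the nocons option has been dropped.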


  • Abiola Ajila
    replied
    Hello Prof. Sebastian

    I have a dataset of 114 countries (N) over an 18-year time period (T). Is it necessary to perform a stationarity test?

    For the full sample, I ran the below model for the effect of agric trade openness (lafto) on prevalence of obesity (lPoO)

    xtabond2 lPoO l.lPoO l2.lPoO lafto ///
    lgdpcap lrp lal lap gdpgr pg ind ac infl ///
    i.year, gmm(l.lPoO, lag (1 .) collapse) ///
    gmm(lafto, lag (1 .) collapse) ///
    iv(lgdpcap lrp lal lap gdpgr pg ind ac infl ///
    i.year) ///
    nodiffsargan twostep robust small nocons

    I would like to run a similar model for sub-samples by income category (low income, lower middle income, upper middle income, and high income countries).
    I ran the below model but I realized that the number of instruments is greater than the number of groups.

    xtabond2 lPoO l.lPoO lafto ///
    lgdpcap lrp lal lap gdpgr pg ind ac infl if inc_gr=="LI", ///
    gmm(l.lPoO, lag (1 2) collapse) ///
    gmm(lafto, lag (1 2) collapse) ///
    iv(lgdpcap lrp lal lap gdpgr pg ind ac infl) ///
    nodiffsargan twostep robust small nocons

    How can I run the xtdpdgmm for my estimations (full sample and sub-samples)?


  • Sebastian Kripfganz
    replied
    1. For the endogenous variable X1, the first admissible lag as an instrument in the first-differenced model is lag 2. Thus, you need to change your first gmm() option into gmm(X1, lag(2 8)). Everything else looks okay.
    2. Your ivreg2 command specification makes much stronger assumptions. It assumes that all variables X1, X2, and X3 are uncorrelated with the unobserved group-specific effects (or that such effects are absent). This might become apparent when you look at it from the perspective of the equivalent xtdpdgmm code:
      Code:
      xtdpdgmm Y X1 X2 X3, iv(X2 X3 L.X1 L.X2, m(level)) twostep
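    Putting point 1 together with the command from the question, the corrected system GMM specification would read as follows. Only the first gmm() option changes; estat overid and estat serial are the same postestimation checks used in the question.
    Code:
    xtdpdgmm Y X1 X2 X3, model(diff) collapse gmm(X1, lag(2 8)) gmm(X2 X3, lag(1 7)) gmm(X1, lag(1 1) diff model(level)) gmm(X2 X3, lag(0 0) diff model(level)) vce(r, dc) overid twostep
    estat overid
    estat serial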
    Last edited by Sebastian Kripfganz; 18 Jun 2022, 11:48.


  • Sarah Magd
    replied
    Thanks a lot Prof. Kripfganz for this update on the xtdpdgmm code. I have run these doubly-corrected (DC) standard errors in estimating a static model. I have two questions:
    (1) Is this two-step system GMM specification correct for this static model (with X1 endogenous, and X2 and X3 predetermined)?
    xtdpdgmm Y X1 X2 X3, model(diff) collapse gmm(X1, lag(1 8)) gmm(X2 X3, lag(1 7)) gmm(X1, lag(1 1) diff model(level)) gmm(X2 X3, lag(0 0) diff model (level)) vce(r, dc) overid twostep
    Group variable: iso_num                      Number of obs         =       364
    Time variable: year                          Number of groups      =        28

    Moment conditions:     linear =      26      Obs per group:    min =        13
                        nonlinear =       0                        avg =        13
                            total =      26                        max =        13

                                   (Std. Err. adjusted for 28 clusters in iso_num)
    ------------------------------------------------------------------------------
                 |              DC-Robust
               Y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              X1 |     .56047   .2827629     1.98   0.047     .0062648    1.114675
              X2 |   .1636905   .0700395     2.34   0.019     .0264156    .3009653
              X3 |   .0537871   .0110046     4.89   0.000     .0322184    .0753557
           _cons |   6.639132   1.125762     5.90   0.000     4.432678    8.845586
    ------------------------------------------------------------------------------
    Instruments corresponding to the linear moment conditions:
     1, model(diff):
       L1.X1 L2.X1 L3.X1 L4.X1 L5.X1 L6.X1 L7.X1 L8.X1
     2, model(diff):
       L1.X2 L2.X2 L3.X2 L4.X2 L5.X2 L6.X2 L7.X2 L1.X3 L2.X3 L3.X3 L4.X3 L5.X3
       L6.X3 L7.X3
     3, model(level):
       L1.D.X1
     4, model(level):
       D.X2 D.X3
     5, model(level):
       _cons

    . estat overid

    Sargan-Hansen test of the overidentifying restrictions
    H0: overidentifying restrictions are valid

    2-step moment functions, 2-step weighting matrix chi2(22) = 26.5779
    Prob > chi2 = 0.2277

    2-step moment functions, 3-step weighting matrix chi2(22) = 27.6004
    Prob > chi2 = 0.1893

    . estat serial

    Arellano-Bond test for autocorrelation of the first-differenced residuals
    H0: no autocorrelation of order 1: z = 2.1103 Prob > |z| = 0.0348
    H0: no autocorrelation of order 2: z = 0.3919 Prob > |z| = 0.6952

    ######################################################
    (2) Can I also estimate this static model with the Instrumental Variable GMM model using the following code:
    ivreg2 Y X2 X3 (X1 = l.X1 l.X2) ,gmm2s first robust


    2-Step GMM estimation
    ---------------------

    Estimates efficient for arbitrary heteroskedasticity
    Statistics robust to heteroskedasticity

                                                          Number of obs =      308
                                                          F(  3,   304) =   365.97
                                                          Prob > F      =   0.0000
    Total (centered) SS     =  43.70746242                Centered R2   =   0.7327
    Total (uncentered) SS   =  33773.57935                Uncentered R2 =   0.9997
    Residual SS             =   11.6822856                Root MSE      =    .1948

    ------------------------------------------------------------------------------
                 |               Robust
               Y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              X1 |   .3953722    .033303    11.87   0.000     .3300995    .4606449
              X2 |    .378198    .027334    13.84   0.000     .3246244    .4317715
              X3 |   .0524587   .0132118     3.97   0.000     .0265641    .0783534
           _cons |   4.559668   .2689765    16.95   0.000     4.032483    5.086852
    ------------------------------------------------------------------------------
    Underidentification test (Kleibergen-Paap rk LM statistic):            102.662
                                                       Chi-sq(2) P-val =    0.0000
    ------------------------------------------------------------------------------
    Weak identification test (Cragg-Donald Wald F statistic):              1.1e+04
                             (Kleibergen-Paap rk Wald F statistic):       9730.920
    Stock-Yogo weak ID test critical values: 10% maximal IV size             19.93
                                             15% maximal IV size             11.59
                                             20% maximal IV size              8.75
                                             25% maximal IV size              7.25
    Source: Stock-Yogo (2005).  Reproduced by permission.
    NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
    ------------------------------------------------------------------------------
    Hansen J statistic (overidentification test of all instruments):         0.406
                                                       Chi-sq(1) P-val =    0.5242
    ------------------------------------------------------------------------------
    Instrumented:         X1
    Included instruments: X2 X3
    Excluded instruments: L.X1 L2.X1
    ------------------------------------------------------------------------------


  • Sebastian Kripfganz
    replied
    It is update time again. This update is all about standard errors. As a new feature, you can now obtain doubly-corrected (DC) standard errors (Hwang, Kang, and Lee; 2022, Journal of Econometrics) as an improvement over the familiar Windmeijer-corrected (WC) standard errors. As these authors point out, the DC standard errors correct for an "overidentification bias" in the variance estimation on top of the WC finite-sample correction. These DC standard errors are also misspecification robust, in the sense that the variance-covariance matrix is consistently estimated even if the moment conditions are misspecified. (Obviously, the estimator for the coefficients is still inconsistent under such misspecification.)

    All you need to do for DC standard errors is to specify the option vce(robust, dc). For backward-compatibility reasons, by default, vce(robust) continues to compute WC standard errors. DC standard errors are available for one-step, two-step, and iterated GMM estimators. For the time being, they are implemented for models with linear moment conditions only. For models with nonlinear moment conditions, WC standard errors are calculated instead.
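    For example, the two corrections can be compared on the same two-step specification. This is only a sketch; Y, X1, X2, and the lag range are placeholders. The coefficient estimates are identical in both runs, and only the reported standard errors differ.
    Code:
    * default robust option: Windmeijer-corrected (WC) standard errors
    xtdpdgmm L(0/1).Y X1 X2, model(diff) collapse gmm(Y X1 X2, lag(2 4)) twostep vce(robust)
    * doubly-corrected (DC) standard errors
    xtdpdgmm L(0/1).Y X1 X2, model(diff) collapse gmm(Y X1 X2, lag(2 4)) twostep vce(robust, dc)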

    In this update, I also improved the calculation of WC standard errors for the iterated GMM estimator, using a simplification of the variance formula exploiting convergence of the iterated GMM estimator. This leads to slightly different standard error estimates than in previous versions. (If the iterated GMM estimator did not converge, the previous iterative variance formula is still applied, analogously to two-step estimation.) I also fixed a small bug in the calculation of conventional two-step standard errors with nonlinear moment conditions.

    As a technical comment with little relevance for most users: While scores computed with the postestimation command predict factor in the Windmeijer correction (if specified), they do not account for the double correction because the respective influence functions are nonstandard. Consequently, generated score variables under vce(robust, wc) and vce(robust, dc) are the same.

    The following table provides an overview of the implications of different options for your standard errors:
                     vce(conventional)               vce(robust, wc)                 vce(robust, dc)
    onestep nolevel  non-robust SEs                  robust SEs (sandwich formula)   DC-robust SEs
    onestep          generally invalid SEs           robust SEs (sandwich formula)   DC-robust SEs
    onestep nl()     robust SEs (sandwich formula)   WC-robust SEs                   WC-robust SEs (for now)
    twostep          robust SEs                      WC-robust SEs                   DC-robust SEs
    twostep nl()     robust SEs                      WC-robust SEs                   WC-robust SEs (for now)
    igmm             robust SEs                      WC-robust SEs                   DC-robust SEs
    igmm nl()        robust SEs                      WC-robust SEs                   WC-robust SEs (for now)
    cugmm            robust SEs                      robust SEs                      robust SEs
    cugmm nl()       robust SEs                      robust SEs                      robust SEs
    The SE labels in the xtdpdgmm regression output have been adjusted accordingly.

    You can update to the latest version 4.2.1 of xtdpdgmm (or install it for the first time) by typing the following in Stata's command window:
    Code:
    net install xtdpdgmm, from(http://www.kripfganz.de/stata) replace
    Disclaimer: I have extensively tested this new version and cross-checked the results with alternative software, where possible. However, due to the complexity of the command and the variety of options, I cannot guarantee that the implementation is error-free. Please let me know if you spot any irregularities.


  • Sebastian Kripfganz
    replied
    I had a quick look into the first article. I believe they used a two-step "level" GMM estimator, with standard instruments for the level model only. They did not seem to specify which instruments they actually used. In any case, I do not think there is anything special about it. With xtdpdgmm, you would simply specify appropriate instruments with the iv() option for model(level). Of course, finding appropriate instruments is the key task.
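    A minimal sketch of such a specification, where Z1 and Z2 stand for hypothetical external instruments for an endogenous X1, and X2 is assumed to be exogenous so that it serves as its own instrument:
    Code:
    * two-step GMM with standard instruments for the level model only
    xtdpdgmm Y X1 X2, model(level) iv(X2 Z1 Z2) twostep vce(robust)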
