  • Zainab Mariam
    replied
    Dear Professor Sebastian,

    Many thanks for your swift useful reply. Your cooperation and support are invaluable, Professor!

    1) Please, correct me if I am wrong! What I know is that the Difference GMM estimator has a problem with weak instruments if the time series is persistent (i.e., the time series is near unit root/random walk) and the dataset is short. Thus, the System GMM estimator should be applied instead. In other words, the System GMM estimator should be applied when there is no stationarity (i.e., we cannot rely on the findings of the Difference GMM estimator if the time series is not stationary, and thus we should apply the System GMM estimator). Am I wrong or right?

    2) Also, how can I check that the Difference GMM estimator has a problem with weak instruments?

    3) How to check if the time series is persistent (i.e., the time series is near unit root/random walk)?

    Your patience, cooperation and help are highly appreciated, Professor!


  • Sebastian Kripfganz
    replied
    1) The system GMM estimator requires that the initial deviations from the long-run mean should not be systematically related to that long-run mean. In other words, groups which are initially further away from their steady state should not systematically experience higher growth rates. In practical terms, groups with large fixed effects should not systematically experience larger or smaller changes of the relevant variables over time. Mean stationarity ensures that this is not the case. In practice, however, it can often be argued on theoretical grounds that this assumption is violated (even though such theoretical concerns are too often ignored in empirical practice).

    2) To use these instruments for the lagged dependent variable, this should be satisfied for all variables. But you might otherwise still be able to selectively use instruments for some independent variables if the assumption is satisfied for them (as I have done in the model selection example in my presentation). Stationarity needs to be satisfied for the levels.

    3) N/A

    4) Separate stationarity tests are usually not applicable due to the short time horizon in typical data sets when we use these GMM methods. They are also not needed: if there are no strong theoretical arguments already ruling out the mean stationarity assumption, we can simply use the Difference-in-Hansen test as a validity check.

    5.1) Yes. If the required assumption (mean stationarity) is violated, the respective instruments will be invalid. This is precisely what the Difference-in-Hansen test checks. The null hypothesis is that the respective instruments are valid (which requires mean stationarity to hold). The alternative hypothesis is that they are not valid (which could be due to a violation of the mean stationarity assumption or due to any other model misspecification).

    5.2) You could in principle check for each variable separately by only including the instruments for the respective variable in the level equation and leaving out all the instruments for all other variables in the level equation.

    6.1) Yes, please check the model selection section in my presentation.

    6.2) Again, please check the model selection section in my presentation.
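
    The checks described in 5.1) and 5.2) can be sketched as follows. This is only an illustration under assumptions that are not part of the answers above: it uses the abdata example data set (variables n, w, k, panel identifier id, time variable year) and the lag ranges from the presentation examples quoted elsewhere in this thread, and it relies on the overid option and the estat overid, difference postestimation command as documented in the xtdpdgmm help file, which should be checked against the installed version.
    Code:
    * requires the community-contributed xtdpdgmm package
    webuse abdata, clear
    xtset id year
    * system GMM: lagged levels for the first-differenced model, lagged differences for the level model
    xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) gmm(n, lag(1 1) diff model(level)) gmm(w k, lag(0 0) diff model(level)) two vce(r) overid
    * incremental (Difference-in-Hansen) tests; H0: the respective instruments are valid
    estat overid, difference
    * variant for 5.2): keep the level-model instruments for n only and drop those for w and k
    xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) gmm(n, lag(1 1) diff model(level)) two vce(r) overid
    estat overid, difference
    A small p-value for the level-model instruments would speak against their validity, which could reflect a violation of mean stationarity or some other misspecification.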


  • Zainab Mariam
    replied
    Dear Professor Sebastian,

    Thank you very much for your swift valuable reply. You are a magnificent supervisor. It is not a compliment; it is the truth. I am very grateful to you for all your support and effort, professor! Please, if I may follow up with your response!

    1) To apply the System GMM estimator, should the stationarity be satisfied? Or should the non-stationarity be satisfied to apply the System GMM estimator?

    And what is the reason/rationale behind this condition/moment that makes this condition/moment required to be satisfied to apply the System GMM estimator?

    2) If the stationarity should be satisfied to apply the System GMM estimator, do all variables have to be stationary? Or does the dependent variable y only have to be stationary? Or do both the dependent variable y and the main independent variable have to be stationary?

    Also, should the stationarity be satisfied for a variable at level or at difference?

    3) If the non-stationarity should be satisfied to apply the System GMM estimator, do all variables have to be non-stationary? Or does the dependent variable y only have to be non-stationary? Or do both the dependent variable y and the independent variable have to be non-stationary?

    Also, should the non-stationarity be satisfied for a variable at level or at difference?

    4) To apply the System GMM estimator, can the popular stationarity tests (such as the Augmented Dickey-Fuller test) be used to check for stationarity/non-stationarity? Can the popular stationarity tests (such as the Augmented Dickey-Fuller test) be used to check for the additional assumption/condition of Blundell and Bond?

    5) Regarding the Difference-in-Hansen test, I have the following questions, please!

    5.1) Can the Difference-in-Hansen test be used to check for stationarity/non-stationarity? If so, how can the Difference-in-Hansen test check for stationarity/non-stationarity? What is the null hypothesis of the Difference-in-Hansen test in terms of stationarity/non-stationarity? e.g., the null hypothesis of the popular tests for stationarity/non-stationarity is H0: All panels contain unit roots.

    5.2) The popular stationarity tests (such as the Augmented Dickey-Fuller test) check for stationarity/non-stationarity for each variable individually. Thus, does the Difference-in-Hansen test check for stationarity/non-stationarity for each variable separately/individually? If so, how? Or does the Difference-in-Hansen test check for stationarity/non-stationarity for all variables together? If so, how?

    6) For the classification of a variable whether it is exogenous, predetermined or endogenous, I have the following questions, please!

    6.1) Can the Difference-in-Hansen test be used to check for the classification of the variables included in the regression model whether the variable is endogenous, predetermined or exogenous? If so, how? And what is the null hypothesis of the Difference-in-Hansen test in terms of the classification of the variables?

    6.2) Does the Difference-in-Hansen test check for the classification of each variable individually? If so, how? Or does the Difference-in-Hansen test check for the classification of all variables together? If so, how?

    Your help, patience and cooperation are highly appreciated, Professor!


  • Sebastian Kripfganz
    replied
    1.1) If you change the estimator, coefficients can turn from insignificant to significant or from significant to insignificant. If there was no possibility of such a change, then we would not need to think about different estimators in the first place. It could possibly be that some of the additional assumptions of the system GMM estimator are violated, consequently biasing the coefficient estimates.

    1.2) It is difficult to give a general answer to this question. If the tests are passed and there is no indication of potentially weak instruments, then the difference GMM estimator should be sufficient and more robust, as it does not rely on the additional assumptions for the system GMM estimator. The motivation for the system GMM estimator is typically that it may help overcome a potential weak-instruments problem of the difference GMM estimator.

    1.3) The system GMM estimator relies on additional assumptions (essentially mean stationarity of all variables). If those assumptions are violated, then the estimator is biased/inconsistent and thus likely worse than the difference GMM estimator. If those assumptions are satisfied, then the system GMM estimator uses stronger instruments and typically performs better than the difference GMM estimator.

    2) The Difference-in-Hansen test can help to assess the validity of the additional assumptions needed for the system GMM estimator. So, yes, it generally should be checked. A rejection of the difference-in-Hansen test comparing the difference and system GMM estimates tells us that those additional assumptions are likely violated (assuming that the difference GMM estimator used for the comparison is correctly specified in the first place).

    3) There are two asymptotically equivalent ways of computing the difference-in-Hansen test for this purpose. (i) You can compute the difference GMM and system GMM estimator separately and then contrast the two; see slide 49 of my 2019 London Stata Conference presentation. (ii) Alternatively, you can just compute the system GMM estimator and compute a difference-in-Hansen statistic directly from there; see the subsequent slides in my presentation.

    4) Possibly, yes. If you make stronger assumptions (say, strictly exogenous regressor), this can bias the coefficient estimates if this assumption is incorrect; if the bias is towards zero, it can mean that the estimate turns insignificant. If you make weaker assumptions (say, endogenous regressor), the respective instruments become weaker, which generally increases the standard errors and thus can again lead to less significant results. As you can see, there is no automatism that guarantees you significant results. Statistical significance of coefficients should not be used to choose a particular estimator or model specification; if we adjust our estimator until we get the desired result, then why do a statistical analysis in the first place?

    5) Possibly, yes. If there are (too) many instruments, this can lead to overfitting of the endogenous regressors and difficulties with estimating the optimal weighting matrix. One of the consequences could be unreliable results from the Hansen test. Again, there is no automatism, although if you are using an extremely large number of instruments, it is likely that the p-value of the Hansen test might be biased towards 1.

    6.1) The ARDL panel data model is simply a model in which you include a lagged dependent variable (or possibly several lags of it) and possibly lags of the independent variables. Model selection criteria could be used to choose between model specifications; see the section on model selection in my presentation. The reason for using lags of the variables is to obtain a dynamically complete model. In dynamically incomplete models, some of the dynamic adjustment processes are left unexplained, which could lead to serial correlation in the error term (which in turn might invalidate some of the instruments). Thus, if there is evidence of residual serial correlation by the Arellano-Bond test, adding lags of the variables could be a promising approach for dealing with that.

    6.2) There is no general answer to this question. It depends on whether your theory suggests that there might be delayed effects of the independent variables on your dependent variables. And you can use the empirical approach suggested in 6.1).

    7.1) You do not necessarily have to exclude any firms. If there are insufficient observations for a firm, the command will automatically leave that firm out. However, sometimes the performance of the estimator might be better if the panel is not heavily unbalanced. Also, ask yourself whether those firms with few observations are systematically different from your other firms. If they are, then maybe they are not representative of your target population of firms, and you may want to exclude them.

    7.2) You cannot selectively use some variables for some firms. If data is missing for some variables and firms, the whole observation - i.e., all variables - won't be used.

    7.3) You can use the if condition in the command syntax.
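
    Points 3), 6.1) and 7.3) can be illustrated with a short sketch. Everything specific here is an assumption made for illustration only: the abdata example data set, the stored-estimates name dgmm, the helper variable nobs, and the threshold of 8 observations. The syntax estat overid followed by the name of stored estimates, and estat serial with the ar() option, follow the xtdpdgmm help file as far as I can tell and should be verified there.
    Code:
    webuse abdata, clear
    xtset id year
    * 3) route (i): estimate difference GMM first and store the results
    quietly xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) two vce(r)
    estimates store dgmm
    * then estimate system GMM and contrast it with the stored difference GMM results
    quietly xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) gmm(n, lag(1 1) diff model(level)) gmm(w k, lag(0 0) diff model(level)) two vce(r)
    estat overid dgmm
    * 6.1) Arellano-Bond test for residual serial correlation
    estat serial, ar(1/3)
    * if there is evidence of higher-order serial correlation, add lags (ARDL specification) and test again
    quietly xtdpdgmm L(0/2).n L(0/1).(w k), model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) two vce(r)
    estat serial, ar(1/3)
    * 7.3) restrict the sample with an if condition; nobs counts nonmissing observations of n per firm
    bysort id: egen nobs = count(n)
    xtdpdgmm L(0/1).n w k if nobs >= 8, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) two vce(r)
    Note that nobs counts observations per firm, not consecutive observations; a rule based on consecutive years would need a different helper variable.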


  • Zainab Mariam
    replied
    Dear Professor Sebastian,

    Many thanks for your beneficial response. Professor, you really deserve to mention your name as a supervisor. Your help is much more than my supervisors’. Your support is a main contributor to the empirical part of my thesis. I still have the following questions, please! Sorry!

    1) The coefficient of the independent variable is significant if the Difference GMM estimator is applied using your command xtdpdgmm, while the coefficient of the independent variable became insignificant when the System GMM estimator is applied. Thus, I have the following questions, please!

    1.1) Is there any justification for that? i.e., what is the reason behind having an insignificant coefficient of the independent variable when the System GMM estimator is applied, while that coefficient is significant when the Difference GMM estimator is applied?

    1.2) Is it sufficient to apply the Difference GMM estimator and rely on the Difference GMM findings? Given that the tests corresponding to the Difference GMM estimator (i.e., the tests of serial correlation and overidentification) passed.

    1.3) Is the System GMM estimator superior to the Difference GMM estimator even if the serial correlation and overidentification tests, corresponding to the Difference GMM estimator, passed? i.e., is applying the System GMM estimator better than the Difference GMM estimator (does the System GMM estimator outperform the Difference GMM estimator) even if the tests corresponding to the Difference GMM estimator (i.e., the tests of serial correlation and overidentification) passed?

    2) When the System GMM estimator is applied using your command xtdpdgmm, is there any need to apply the Difference-in-Hansen test after running the regression of the System GMM estimator? If so, why? Also, how do I read the findings of the Difference-in-Hansen test corresponding to the System GMM estimator (what is the interpretation of the outcomes of the Difference-in-Hansen test which is applied after running the System GMM estimator regression)?

    3) To apply the System-GMM estimator using your command xtdpdgmm, do I have first to apply the Difference GMM estimator i.e., do I have to apply the Difference GMM estimator before applying the System GMM estimator? Or can I apply the System GMM estimator directly without any need to apply the Difference GMM estimator? Is there an order of steps to apply the System GMM estimator?

    4) Does the classification of a variable whether it is exogenous, predetermined, or endogenous affect the significance of the variable’s coefficient? i.e., is there any relation between the significance of the variable’s coefficient and the classification of that variable whether it is exogenous, predetermined, or endogenous?

    5) Does the number of instruments affect negatively the findings of tests e.g., Hansen test findings? Is there any relation between the findings of the Hansen test and increasing the number of instruments? When increasing the number of instruments, does that increase the probability of the test not passing?

    6) Regarding the Autoregressive Distributed Lag (ARDL) panel data model mentioned on slide 9 of your 2019 London Stata Conference presentation, I have the following questions, please!

    6.1) Why should the ARDL panel data model be applied? And how is it applied?

    6.2) Do I have to apply ARDL for the dependent variable y only, or for only both the dependent variable y and the independent variable, or individually for each variable included in the regression model, or for all variables together in the same code?

    7) To apply GMM estimation (the Difference GMM estimator and the System GMM estimator), I have the following questions, please!

    7.1) Do I have to exclude all firms with less than 5 consecutive years of data? Or do I have to keep only the firms that have at least 3 continuous time series observations during the research time period?

    7.2) Suppose that I have to keep only the firms that have at least 5 consecutive years of data, then, do I have to exclude those firms for each variable (i.e., for all variables) included in the regression model? Or do I have to exclude those firms for only the dependent variable y and the main independent variable?

    7.3) Is there a function/command/expression in Stata to perform that exclusion of firms to apply GMM estimation?

    Your patience, support and effort are highly appreciated, Professor!


  • Sebastian Kripfganz
    replied
    1.1) That's a system GMM estimator.
    1.2) N/A
    1.3) I believe incremental overidentification tests may not have supported the inclusion of these instruments for the level model.

    2.1) Strictly speaking, it is a system estimator, although it only uses the instruments for the dummy variables in the level model. This example demonstrates why the terms "difference GMM" and "system GMM" can be misleading. Different people might have different things in mind when using these terms. Additionally, with the nl(noserial) option, we are not looking at the traditional difference/system GMM estimator anymore anyway.
    2.2) Instruments for such dummy variables achieve the strongest correlation with the instrumented dummies if the instruments are specified for the level model. Those dummies are typically assumed to be uncorrelated with the idiosyncratic and the group-specific error components, which is often innocuous given that those coefficients are not assigned a structural interpretation. Often, additional instruments for them in the first-differenced model are then redundant.
    2.3) The additional assumptions for the validity of those instruments in the level model may not be satisfied.

    3.1) Same answer as in 2.1).
    3.2) Same answer as in 2.2).
    3.3) Same answer as in 2.3).

    4.1) Here, it is just a difference GMM estimator.
    4.2) You can also use the level time dummies; it should not matter. Generally, in the first-differenced model the correlation between first-differenced regressors and instruments is maximized if the latter are differenced as well.

    5) This is an example of what should not be done. In a balanced panel model, the instruments for the time dummies in the first-differenced model are redundant once the respective instruments are specified for the level model.

    6.1) That is a system GMM estimator.
    6.2) The option model(diff) indicates that the instruments are specified for the differenced model. In the absence of another transformation option, no transformation is applied to the instruments (i.e., level instruments for the differenced model).

    7) The two codes are equivalent.

    8) The two codes are equivalent.
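
    The equivalence stated in 7) can be checked numerically. The sketch below is an illustration only, assuming the abdata example data set; the matrix names b1 and b2 are hypothetical. It runs the two specifications from question 7 and compares the coefficient vectors with Stata's mreldif() function.
    Code:
    webuse abdata, clear
    xtset id year
    * specification with model(diff) as a suboption of each gmm() option
    quietly xtdpdgmm L(0/1).n w k, collapse gmm(n, lag(2 4) model(diff)) gmm(w k, lag(1 3) model(diff)) two vce(r)
    matrix b1 = e(b)
    * specification with model(diff) as a global option
    quietly xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) two vce(r)
    matrix b2 = e(b)
    * relative difference between the two coefficient vectors
    display mreldif(b1, b2)
    A value at or very close to zero would be consistent with the two codes being equivalent. The same comparison can be made for the two specifications in question 8.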


  • Zainab Mariam
    replied
    Dear Professor Sebastian,

    Many thanks for your crystal clear answers. I do not know how to thank you, Professor! Indeed, saying “thank you very much” is not enough. Your cooperation and support are priceless. You are an invaluable source of information.
    Please, if I may follow up with your response!

    1) Your code on slide 118 of your 2019 London Stata Conference presentation is: xtdpdgmm L(0/2).n L(0/2).w k L(0/3).ys c.w#c.w c.w#c.k, model(fod) collapse gmm(n, lag(1 .)) gmm(w, lag(0 .)) gmm(k, lag(0 .)) gmm(ys, lag(1 .)) gmm(c.w#c.w, lag(0 .)) gmm(c.w#c.k, lag(0 .)) gmm(k, lag(0 0) model(md)) gmm(w k, lag(0 0) diff model(level)) teffects two vce(r) overid

    I have the following questions, please!

    1.1) Are you applying the Difference GMM estimator or the System GMM estimator?

    1.2) If the System GMM estimator is not applied, why instrument the variables w and k in the level model?

    1.3) If the System GMM estimator is applied, why not instrument the dependent variable n, the variable ys, w squared, and the interaction between w and k in the level model?

    2) Your code on slide 86 of your 2019 London Stata Conference presentation is: xtdpdgmm L(0/1).n w k i.ind, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) iv(i.ind, model(level)) nl(noserial) teffects igmm vce(r)

    2.1) Are you applying the Difference GMM estimator or the System GMM estimator?

    2.2) If the Difference GMM estimator is applied, why instrument the industry dummies ‘ind’ in the level model? And why use the level instruments for the industry dummies in the level model?

    2.3) If the System GMM estimator is applied, why not instrument the dependent variable n and the variables w and k in the level model? And why use the level instruments for the industry dummies in the level model, i.e., why not use the differenced instruments for the industry dummies in the level model?

    3) Your code on slide 78 of your 2019 London Stata Conference presentation is: quietly xtdpdgmm L(0/1).n w k yr1978-yr1984, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) iv(yr1978-yr1984, model(level)) two vce(r)

    3.1) Are you applying the Difference GMM estimator or the System GMM estimator?

    3.2) If the Difference GMM estimator is applied, why instrument the year dummies in the level model? And why use the level instruments for the year dummies in the level model?

    3.3) If the System GMM estimator is applied, why not instrument the dependent variable n and the variables w and k in the level model? And why use the level instruments for the year dummies in the level model?

    4) Your code on slide 75 of your 2019 London Stata Conference presentation is: xtdpdgmm L(0/1).n w k yr1980-yr1982, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) iv(yr1980-yr1982, diff) two vce(r)

    4.1) Are you applying the Difference GMM estimator or the System GMM estimator?

    4.2) If the Difference GMM estimator is applied, why use the differenced instruments for the year dummies in the differenced model, i.e., why not use the level instruments for the year dummies in the differenced model? My understanding is that the Difference GMM estimator uses level instruments in the differenced model.

    5) Your code on slide 77 of your 2019 London Stata Conference presentation is: xtdpdgmm L(0/1).n w k yr1980-yr1982, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) iv(yr1980-yr1982, diff) iv(yr1980-yr1982, model(level)) two vce(r)

    5.1) Are you applying the Difference GMM estimator or the System GMM estimator?

    5.2) If the Difference GMM estimator is applied, why instrument the year dummies in the level model? And why not use the level instruments for the year dummies in the differenced model?

    5.3) If the System GMM estimator is applied, why instrument the year dummies in the differenced model? And why not use the differenced instruments for the year dummies in the level model?

    5.4) If the System GMM estimator is applied, why not instrument the dependent variable n and the variables w and k in the level model?

    6) Your code on slide 38 of your 2019 London Stata Conference presentation is: xtdpdgmm L(0/1).n w k, collapse gmm(n, lag(2 4) model(diff)) gmm(w k, lag(1 3) model(diff)) gmm(n, lag(1 1) diff) gmm(w k, lag(0 0) diff) two vce(r)

    6.1) Are you applying the Difference GMM estimator or the System GMM estimator?

    6.2) When you write ‘gmm(n, lag(2 4) model(diff)) gmm(w k, lag(1 3) model(diff))’, are you instrumenting the variables n, w, k in the differenced model or in the level model i.e., are you instrumenting the differenced model or the level model? Also, are you using the differenced instruments or the level instruments for the variables n, w, k?

    7) Are the following codes equivalent or different?

    xtdpdgmm L(0/1).n w k, collapse gmm(n, lag(2 4) model(diff)) gmm(w k, lag(1 3) model(diff)) two vce(r)

    xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) two vce(r)

    8) Are the following codes equivalent or different?

    xtdpdgmm L(0/1).n w k, collapse gmm(n, lag(2 4) model(diff)) gmm(w k, lag(1 3) model(diff)) gmm(n, lag(1 1) diff) gmm(w k, lag(0 0) diff) two vce(r)

    xtdpdgmm L(0/1).n w k, collapse gmm(n, lag(2 4) model(diff)) gmm(w k, lag(1 3) model(diff)) gmm(n, lag(1 1) diff model(level)) gmm(w k, lag(0 0) diff model(level)) two vce(r)

    I am very grateful to you for all your support and effort, professor!


  • Sebastian Kripfganz
    replied
    1-3) The model on slide 81 accounts for year effects. Some of those year effects are statistically significant. You could test for their joint statistical significance with the command
    Code:
    testparm i.year
    So, given the statistical significance, you could argue that there are year effects. But as I mentioned before, some authors simply say that they include year effects whenever they include those dummy variables, whether they are significant or not.
    On slide 122, the year effects are not statistically significant. You could probably remove them from the regression model if you want to estimate a more parsimonious model.

    4.1) The suboption model(level) implies that the industry dummies are instrumented in the level model. This suboption takes precedence over the global option model(diff).

    4.2) Because there are no other suboptions specified, the default applies, which means that level instruments are used for the industry dummies.

    5) Same as 4).

    6.1) Here, no suboption for the model transformation is specified; therefore, the default applies, which is set by the global option model(diff). Thus, the instruments here are specified for the first-differenced model.

    6.2) The diff suboption implies that first-differences of the time dummies are used as instruments.

    7.1) Here, the time dummies are instrumented in both models.

    7.2) Differenced instruments are used for the first-differenced model - as in 6) - and level instruments are used for the level model - as in 5).

    8.1) Here, no suboption for the model transformation is specified; therefore, the default applies, which in this case is the level model because no global option for a different model transformation was specified.

    8.2) Same as 6.2).
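
    The rules in 4) to 8) about global options, suboptions and defaults can be summarized in one annotated specification. This is a sketch for illustration, assuming the abdata example data set and combining instrument sets that appear in the presentation examples quoted in this thread; it is not a recommended model. The /// line-continuation comments only work in a do-file, not when typed interactively.
    Code:
    webuse abdata, clear
    xtset id year
    xtdpdgmm L(0/1).n w k yr1978-yr1984, model(diff) collapse /// global default: instruments refer to the first-differenced model
        gmm(n, lag(2 4)) /// no suboptions: lagged levels of n for the first-differenced model
        gmm(w k, lag(1 3)) /// no suboptions: lagged levels of w and k for the first-differenced model
        gmm(n, lag(1 1) diff model(level)) /// model(level) overrides the global default; diff uses differenced instruments
        iv(yr1978-yr1984, model(level)) /// level time dummies as instruments for the level model
        two vce(r)
    Without the global option model(diff), any gmm() or iv() option lacking its own model() suboption would refer to the level model instead.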


  • Zainab Mariam
    replied
    Dear Professor Sebastian,

    Thank you very much for your valuable reply. I am very grateful to you for all your support and effort, professor! Please, if I may follow up with your response!

    1) According to the table on slide 122 of your 2019 London Stata Conference presentation, can we say that there is ‘Year effect’ or can we say that there is no ‘Year effect’? Also, according to the table on slide 81 of your 2019 London Stata Conference presentation, can we say that there is ‘Year effect’ or can we say that there is no ‘Year effect’?

    2) According to the table on slide 122 of your 2019 London Stata Conference presentation, how should the findings for ‘year’ be interpreted?

    3) According to the table on slide 81 of your 2019 London Stata Conference presentation, how should the findings for ‘year’ be interpreted?

    4) Your code on slide 86 of your 2019 London Stata Conference presentation is: xtdpdgmm L(0/1).n w k i.ind, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) iv(i.ind, model(level)) nl(noserial) teffects igmm vce(r)

    When you write ‘iv(i.ind, model(level))’, I have the following questions, please!

    4.1) Are you instrumenting the industry dummies in the differenced model or in the level model?

    4.2) Are you using the differenced instruments or the level instruments for the industry dummies?

    5) Your code on slide 75 of your 2019 London Stata Conference presentation is: xtdpdgmm L(0/1).n w k yr1980-yr1982, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) iv(yr1980-yr1982, model(level)) two vce(r)

    When you write ‘iv(yr1980-yr1982, model(level))’, I have the following questions:

    5.1) Are you instrumenting the year dummies in the differenced model or in the level model?

    5.2) Are you using the differenced instruments or the level instruments for the year dummies?

    6) Your code on slide 75 of your 2019 London Stata Conference presentation is: xtdpdgmm L(0/1).n w k yr1980-yr1982, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) iv(yr1980-yr1982, diff) two vce(r)

    When you write ‘iv(yr1980-yr1982, diff)’, I have the following questions:

    6.1) Are you instrumenting the year dummies in the differenced model or in the level model?

    6.2) Are you using the differenced instruments or the level instruments for the year dummies?

    7) Your code on slide 77 of your 2019 London Stata Conference presentation is: xtdpdgmm L(0/1).n w k yr1980-yr1982, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) iv(yr1980-yr1982, diff) iv(yr1980-yr1982, model(level)) two vce(r)

    When you write ‘iv(yr1980-yr1982, diff) iv(yr1980-yr1982, model(level))’, I have the following questions, please!

    7.1) Are you instrumenting the year dummies in the differenced model or in the level model or in both models?

    7.2) Are you using the differenced instruments or the level instruments or both instruments for the year dummies?

    8) Your code on slide 38 of your 2019 London Stata Conference presentation is: xtdpdgmm L(0/1).n w k, collapse gmm(n, lag(2 4) model(diff)) gmm(w k, lag(1 3) model(diff)) gmm(n, lag(1 1) diff) gmm(w k, lag(0 0) diff) two vce(r)

    When you write ‘gmm(n, lag(1 1) diff) gmm(w k, lag(0 0) diff)’, I have the following questions, please!

    8.1) Are you instrumenting the variables n, w, k in the differenced model or in the level model i.e., are you instrumenting the differenced model or the level model?

    8.2) Are you using the differenced instruments or the level instruments for the variables n, w, k?

    Your patience, support and effort are highly appreciated, Professor!


  • Sebastian Kripfganz
    replied
    1.1) We need to be careful here what type of "exogeneity" we have in mind. In the dynamic panel data literature, exogeneity typically refers to the stochastic relationship between the respective variables and the idiosyncratic error component. Thus, we typically call a variable "strictly exogenous" if it is uncorrelated with the idiosyncratic error component for all time periods, even though it might be correlated with the unobserved group-specific error component (aka "fixed effects"). Strictly speaking, the latter correlation still turns those variables endogenous in the classical sense. Now, when it comes to time-invariant regressors, they may or may not be correlated with either of the error components, although typically we would assume them to be uncorrelated with the idiosyncratic time-varying error component. In this regard, time-invariant regressors would be strictly exogenous in the dynamic panel data sense, but this is not of much help: we cannot use the typical instruments (lagged differences for levels or lagged levels for differences), because the differences of time-invariant regressors vanish.

    1.2) The question whether a dummy variable is exogenous or not is no different to the same question for any other regressor. It may be exogenous, predetermined, or endogenous. It may be correlated with the group-specific effects or not.

    1.3) Dummy variables are often treated as exogenous, but this should not be an automatism. Whether you can treat a dummy variable as uncorrelated with the group-specific effects typically depends on what unobserved characteristics you think those group-specific effects represent. Considering time dummies, there is usually no reason not to treat them as exogenous; but we would not give them any structural interpretation anyway.

    2.1) Without the factor-variable prefix i., you would include a linear time trend instead of separate time dummies for every year. This would be fine if there is such a linear trend in the time effects indeed.

    2.2) If you use i. for the regressors, you should also use i. for the instruments.

    2.3) Time dummies are usually treated as exogenous.

    3.1) For a binary dummy variable which takes only values 1 or 0, the i. prefix is optional. The results will be the same with or without the prefix.

    3.2) See above.

    3.3) This depends on what you think the unobserved group-specific error component represents and whether you want to give the country dummy a structural interpretation. If this should be a Japan-specific effect conditional on some other unobserved time-invariant characteristic which differs systematically across countries, then you need to find an alternative instrument which also differs systematically across countries but is uncorrelated with the unobserved characteristic you want to hold fixed. Normally, you would not care too much about such a structural interpretation, and then you can just treat the country dummy as exogenous.

    4) You don't normally have to include lags of those dummies. Normally, those lags would be dropped because of collinearity anyway.

    5) Same as in 4).
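
    On 4) and 5), a minimal sketch of why lags of such dummies add nothing new. It uses plain Python rather than Stata, and the three-firm panel for 2000 to 2005 is purely hypothetical. In a balanced panel, the lag of a time-invariant dummy equals the dummy itself, and the lag of the dummy for year y equals the dummy for year y + 1, so the lagged columns duplicate columns that are already there.

```python
# Hypothetical balanced panel: 3 firms observed 2000-2005 (illustrative only).
years = list(range(2000, 2006))
firms = {"A": 1, "B": 1, "C": 0}  # firm -> mn (1 = Japan, 0 = UK)

# Lookup table (firm, year) -> value of the time-invariant dummy mn.
mn = {(firm, year): value for firm, value in firms.items() for year in years}

# Observations for which a first lag exists (every year except the first).
rows = [(firm, year) for firm in firms for year in years if year > years[0]]


def year_dummy(year, y):
    """Dummy for year y, evaluated at an observation dated `year`."""
    return 1 if year == y else 0


# Time-invariant dummy: L.mn equals mn in every row.
mn_now = [mn[(firm, year)] for firm, year in rows]
mn_lag = [mn[(firm, year - 1)] for firm, year in rows]
lag_mn_equals_mn = mn_now == mn_lag

# Year dummy: the lag of the 2002 dummy equals the 2003 dummy.
y = 2002
lag_dummy = [year_dummy(year - 1, y) for _, year in rows]
next_dummy = [year_dummy(year, y + 1) for _, year in rows]
lag_equals_next = lag_dummy == next_dummy

print(lag_mn_equals_mn, lag_equals_next)
```

    Both comparisons hold by construction, which is the collinearity that leads to the lagged dummies being dropped.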

    6.1) Time dummies are often included to account for global shocks which affect all firms simultaneously. If a global shock affects both the dependent and the independent variables, then omitting the time dummies could lead to spuriously significant coefficient estimates.
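
    To illustrate 6.1), here is a small simulation sketch in plain Python. It is deliberately simplified and rests on assumptions of my own choosing: a static pooled regression instead of a dynamic GMM model, 200 firms over 20 years, and a common shock g_t that enters both x and y while x has no effect on y at all. Subtracting the yearly cross-sectional means has the same effect on the slope as including a full set of time dummies.

```python
import random

rng = random.Random(12345)  # fixed seed so the sketch is reproducible
N, T = 200, 20              # hypothetical number of firms and years

# Common shock g_t hits both x and y in every firm; x has no effect on y.
g = [rng.gauss(0, 2) for _ in range(T)]
x = [[g[t] + rng.gauss(0, 1) for t in range(T)] for _ in range(N)]
y = [[g[t] + rng.gauss(0, 1) for t in range(T)] for _ in range(N)]


def ols_slope(xs, ys):
    """Slope from a simple regression of ys on xs (with intercept)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


def demean_by_year(v):
    """Subtract each year's cross-sectional mean (what time dummies absorb)."""
    means = [sum(v[i][t] for i in range(N)) / N for t in range(T)]
    return [[v[i][t] - means[t] for t in range(T)] for i in range(N)]


def flatten(v):
    """Stack the firm-by-year table into one long list."""
    return [value for row in v for value in row]


slope_without_dummies = ols_slope(flatten(x), flatten(y))
slope_with_dummies = ols_slope(flatten(demean_by_year(x)),
                               flatten(demean_by_year(y)))

print(round(slope_without_dummies, 3), round(slope_with_dummies, 3))
```

    Without the time dummies, the slope picks up the common shock and is far from zero; once the yearly means are removed, it should be close to zero.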

    6.2) You would include i.cf in the list of independent variables, together with option iv(i.cf).

    6.3) Including both cf and i.cf leads to a problem of perfect collinearity. There is no need (and usually no reason) to include cf once you included i.cf (or teffects).
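
    A short sketch of the collinearity in 6.3), again in plain Python with hypothetical helper names. Any variable that varies only over time, whether a crisis dummy for 2008 to 2010 or a linear trend, is an exact linear combination of a full set of year dummies (or, equivalently, of a constant plus all but one of them).

```python
years = list(range(2000, 2021))  # sample period from the question: 2000-2020


def year_dummies(year):
    """Full set of year dummies for one observation (one entry per year)."""
    return [1 if year == y else 0 for y in years]


def reproduced_by_dummies(values_by_year):
    """Check that sum_y value(y) * D_y equals the variable in every year."""
    weights = [values_by_year[y] for y in years]
    return all(
        sum(w * d for w, d in zip(weights, year_dummies(year)))
        == values_by_year[year]
        for year in years
    )


# Crisis dummy from the question: 1 in 2008, 2009, 2010 and 0 otherwise.
crisis = {y: 1 if 2008 <= y <= 2010 else 0 for y in years}
# Linear time trend (the year itself).
trend = {y: y for y in years}

crisis_collinear = reproduced_by_dummies(crisis)
trend_collinear = reproduced_by_dummies(trend)
print(crisis_collinear, trend_collinear)
```

    Because the weighted sum of the year dummies reproduces the variable exactly, adding it next to the time dummies contributes no independent variation.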

    7.1) You need to include the "yes" or "no" manually in the tables of your research paper. The command does not produce anything like that. If you have included time dummies (i.cf or option teffects), you can write "yes"; similarly for country dummies. People still write "yes" even if those dummies are not statistically significant. It is usually just an indication that those dummies are included in the model.

    7.2) Whether there are time effects or country effects could be assessed by checking their (joint) statistical significance; but again, the "yes"/"no" in 7.1) is typically not based on such a test.

    Leave a comment:


  • Zainab Mariam
    replied
    Dear Professor Sebastian,

    Many thanks for all your prior help, and sorry for coming back!

    1) Regarding dummy variables, I have the following questions, please!

    1.1) Can time-invariant dummies be classified as exogenous?

    1.2) Can time-variant dummies be classified as exogenous?

    1.3) Can all dummies be classified as exogenous?

    2) My regression model includes the dummy variable (cf) that takes the value of One for the 3 years 2008, 2009, 2010. Thus, my questions are:

    2.1) Do I have to put ‘i.’ before the dummy variable (cf)? i.e., do I have to put ‘i.cf’ in the regression?

    2.2) Also, do I have to put ‘i.’ before the dummy variable (cf) when instrumenting the dummy variable (cf) i.e., ‘iv(i.cf, …)’?

    2.3) Can I consider the dummy variable (cf) as exogenous or as endogenous?

    3) Suppose I have two countries (Japan and UK), thus, my regression model includes the dummy variable (mn) that takes the value of One if the firm is in Japan, and Zero otherwise. Thus, my questions are:

    3.1) Do I have to put ‘i.’ before the dummy variable (mn)? i.e., do I have to put ‘i.mn’ in the regression?

    3.2) Also, do I have to put ‘i.’ before the dummy variable (mn) when instrumenting the dummy variable (mn) i.e., ‘iv(i.mn, …)’?

    3.3) Can I consider the dummy variable (mn) as exogenous or as endogenous?

    4) For the dummy variable cf (that takes the value of 1 for the 3 years 2008, 2009, 2010) and regarding the dummy variable mn {that takes the value of One if the firm is in Japan, and Zero otherwise, given I have two countries (Japan and UK)}, my question is: do I have to include lags in the iv() option for dummies for (cf) and (mn) when instrumenting these dummies?

    5) Do I have to include lags in the iv() option for dummy variables when instrumenting the dummies?

    6) Regarding time dummies, I have the following questions, please!

    6.1) Why include time dummies in the regression model? i.e., what is the rationale behind including them?

    6.2) If I am not using the teffects option, then how do I have to include the time dummies explicitly in my regression model? i.e., how do I have to express/type the time dummies explicitly in my regression model? Suppose the research’s time period is 2000-2020.

    6.3) My regression model includes the dummy variable (cf) that takes the value of One for the 3 years 2008, 2009, 2010. Thus, my question is: Is it correct to include both the dummy variable (cf) and the time dummies in the same regression model (in the same code)? If so, how do I have to express/type both the dummy variable (cf) and the time dummies using the teffects option and without using the teffects option?

    7) Tables of research and articles show ‘Year Effects’ and ‘Country Effects’. Regarding the ‘Year Effects’ and the ‘Country Effects’, those tables show ‘Yes’, and sometimes show ‘No’. Thus, my questions are:

    7.1) How can I get ‘Yes’ or ‘No’ for the ‘Year Effects’ and the ‘Country Effects’ from my regression using your command xtdpdgmm? Is there any option, test, or expression I have to include or apply in order to obtain them?

    7.2) How can I decide whether or not there are ‘Year Effects’? Also, how can I decide whether or not there are ‘Country Effects’?

    Your patience, support and effort are highly appreciated, Professor!

    Leave a comment:


  • Sebastian Kripfganz
    replied
    I am sorry for the confusion. Yes, you are absolutely right. I have edited my previous post.

    Leave a comment:


  • Joseph L. Staats
    replied
    Thanks again. I'm a bit confused about your comments concerning my underidentification results. Don't I want low p-value results? That's what slides 111 and 114 of your 2019 London Stata Conference presentation seem to suggest. Is it possible you thought I had asked about overidentification?

    Leave a comment:


  • Sebastian Kripfganz
    replied
    If there are differences in the direction of the effects, then those dummy variables have a place in your model. If you change the model in this or any other way, it is not surprising that the effects of other variables can change as well, especially if this other variable is strongly correlated with the dummy variables. This could go in any direction. If you believe in the model with dummy variables, then the smaller effect of your main variable is not a concern but a feature of this model.

    For the underidentification tests, a p-value of 0.3 might indeed be worrying. Using lags of dummy variables as instruments can often lead to weak instruments. I am afraid I don't have a general solution for this problem.
    Last edited by Sebastian Kripfganz; 02 Dec 2022, 09:34. Reason: Incorrect statement about underidentification tests amended

    Leave a comment:


  • Joseph L. Staats
    replied
    Thanks so much. I have a couple of follow-up questions. When using these positive/negative dummy variables, I note that the coefficient of my main independent variable of interest for the project drops a lot and is no longer statistically significant. Is that something I should be concerned with, or is it just a product of including dummy variables that don't really belong in the model except for the specific purpose of testing whether the negative direction of the bond rating change is stronger than the positive direction? Also, when I include the dummy variables in my models, the overidentification test results are fine for both rating companies I am looking at, but underidentification for one company is about p=.09 and for the other company about .300. How concerned about underidentification at these levels should I be?

    Leave a comment:
