Dynamic panel-data, two step system GMM, instrumental variables

John Stavrakiev

Join Date: Apr 2020

Posts: 2
#1

14 Apr 2020, 14:24

Dear all,

I am trying to answer a question regarding the capital structure of companies. Since my data form a dynamic panel, I use the two-step system GMM estimator, as in similar papers. However, my PC runs into problems when I try to estimate the following equation:

xtabond2 LeverageBV L.LeverageBV L.ROA L.Size L.Tangibility L.MTB CrisisGlobal L.GovD M3 t* j* i*, gmm(L.LeverageBV) iv(L.ROA L.Size L.Tangibility L.MTB t* j* i*, equation(level)) nodiffsargan twostep robust

It runs perfectly when I remove the i*. My question is how I can account for the firm-fixed effects by including the i* without my PC crashing, or my friend's PC running for 10 hours without producing any results. i* is a set of dummies, one created for each observation, ranging from i1 to i3970.

Thanks in advance.
Sebastian Kripfganz

Join Date: May 2014

Posts: 2593
#2

15 Apr 2020, 11:49

Why would you want to include a dummy variable for each observation? Such a model is generally not estimable.

Some potentially useful general information on dynamic panel data GMM estimation in Stata, although not specific to your question:
Kripfganz, S. (2019). Generalized method of moments estimation of linear dynamic panel data models. Proceedings of the 2019 London Stata Conference.

XTDPDGMM: new Stata command for GMM estimation of linear (dynamic) panel data models

https://www.kripfganz.de/stata/
John Stavrakiev

Join Date: Apr 2020

Posts: 2
#3

15 Apr 2020, 16:00

Dear Sebastian Kripfganz,

I am new to this estimation method in general and I still don't understand the syntax. What I found in similar papers is that they do include firm fixed effects, which I assume they do by using a dummy variable for each observation. After running for 8 hours straight, it did produce results, but all the coefficients of interest were dropped due to collinearity. I believe the reason was that there were around 4,200 instruments but only 3,790 groups. Would collapse help then, or does it still not make sense?

Maybe you could also help me out a little with the whole N and T notion and the instruments. From what I understand, the command should look like xtabond2 Depvar Indvar gmm(endogenous variables) iv(exogenous variables), and there should always be the same number of variables in the xtabond2 variable list as in the gmm and iv brackets; otherwise there are more regressors than instruments and it results in an error. I should also mention that my T is 20 years (1999-2019) and I have 28 countries. Overall, this is the regression of interest; after estimating it, I want to retrieve the predicted values:

Any help is appreciated, as I'm entirely new to this syntax and way of regressing. And thanks a lot for the two articles; I've been through the help manual and the 2009 article a couple of times, but econometrics isn't my strongest trait.

Kindest regards,
John
Sebastian Kripfganz

Join Date: May 2014

Posts: 2593
#4

17 Apr 2020, 04:20

Group-fixed effects (i.e. firm-fixed effects in your case) are not directly added in the form of dummy variables when estimating such dynamic panel data models. Instead, they remain part of the unobserved error term and instrumental variables are chosen in a way that they are uncorrelated with these group-fixed effects. Thus, do not explicitly include the i* variables.

The exogenous variables that you put into the iv() option for the level model must be assumed to be uncorrelated with those group-fixed effects, not just with the idiosyncratic error component.

In principle, there can be more or fewer variables specified in gmm() and iv() as long as the total number of instruments is at least as large as the number of coefficients to be estimated.
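
As a minimal sketch of what this implies for the command in #1 (an untested illustration, not a recommended specification): the i* dummies are dropped from both the regressor list and iv(), and the collapse suboption is added to gmm() to keep the instrument count down. The identifiers firm and year are assumed names for the panel and time variables, and do-file line continuation (///) is used. Whether the variables left in iv() are really uncorrelated with the firm-fixed effects remains an assumption to be justified.

```stata
* Sketch only: firm and year are assumed names for the panel and time identifiers.
xtset firm year

* Firm-fixed effects stay in the error term, so no i* dummies appear anywhere.
* collapse limits the number of GMM-style instruments.
xtabond2 LeverageBV L.LeverageBV L.ROA L.Size L.Tangibility L.MTB ///
    CrisisGlobal L.GovD M3 t* j*, ///
    gmm(L.LeverageBV, collapse) ///
    iv(L.ROA L.Size L.Tangibility L.MTB t* j*, equation(level)) ///
    nodiffsargan twostep robust
```

The xtabond2 output header reports both the number of instruments and the number of groups, so the two can be compared directly after estimation.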

Zeenat Murtaza

Join Date: Aug 2021

Posts: 44
#5

03 Sep 2022, 06:10

Hi Sebastian,

Hope you are doing well. I would be grateful if you could comment on the system GMM command below (N=1880; T=25), which includes the lag of the dependent variable Y as a regressor. MTB is a ratio of financial variables, while S is the difference between time t and t-1. Both variables are established as predetermined in the literature, while the remaining financial variables are endogenous. I included CHC among the predetermined variables because it showed a strong correlation with STI, so, as you mentioned in your lecture and slides, I added it to the predetermined variables. i.FC is a dummy variable for firm type: the sample is divided into two categories, founder firms (1) and non-founder firms (0). The main focus of the study is founder firms, so I also added it to the command below:

xtdpdgmm L(0/1).Y Y^2 MTB S CFW CFW_L1 STI_ta STI_L1 DEI_ta DEI_L1 CHC_ta CHC_L1 i.FC, model(diff) collapse gmm(L.Y CFW_ta STI_ta DBI_ta, lag(2 6)) gmm(MTB S CHC_ta, lag( 1 3)) gmm(L.Y CFW_ta STI_ta DBI_ta, lag(1 2) diff model(level)) gmm(MTB S CHC_ta, lag(0 1) diff model(level)) iv(i.FC, model (level)) teffects two vce(r)

The diagnostic test results are as follows:

Arellano-Bond test for autocorrelation of the first-differenced residuals
H0: no autocorrelation of order 1 z = -4.0922 Prob > |z| = 0.0000
H0: no autocorrelation of order 2 z = 0.0646 Prob > |z| = 0.9485
H0: no autocorrelation of order 3 z = -0.9219 Prob > |z| = 0.3566

Sargan-Hansen test of the overidentifying restrictions
H0: overidentifying restrictions are valid

2-step moment functions, 2-step weighting matrix chi2(31) = 28.3170
Prob > chi2 = 0.6048

2-step moment functions, 3-step weighting matrix chi2(31) = 39.4525
Prob > chi2 = 0.1419

However, the underidentification tests are not satisfactory: the test reports collinearities among the variables.

Also, I have to test how the variable CHC affects the dependent variable in founder firms only. Is the following command, in which I included an interaction term between the two variables, correct?

xtdpdgmm L(0/1).Y Y2 MTB S CFW CFW_L1 STI_ta STI_L1 DEI_ta DEI_L1 CHC_ta CHC_L1 i.FC c.CHC_L1#1.FC, model(diff) collapse gmm(L.Y CFW_ta STI_ta DBI_ta, lag(2 5)) gmm(MTB S CHC_ta, lag(1 2)) gmm(c.CHC_L1#1.FC, lag(0 2)) gmm(L.Y CFW_ta STI_ta DBI_ta, lag(1 2) diff model(level)) gmm(MTB S CHC_ta, lag(0 0) diff model(level)) gmm(c.CHC_L1#1.FC, lag(0 0) diff model(level)) iv(i.FC, model (level)) teffects two vce(r)

None of the time effects are significant in the results of either command. Also, including this interaction term makes some of the variables insignificant, which contradicts theoretical findings. The AR and Hansen results are as follows; the Hansen test results are a bit worrying.

Arellano-Bond test for autocorrelation of the first-differenced residuals
H0: no autocorrelation of order 1 z = -2.7039 Prob > |z| = 0.0069
H0: no autocorrelation of order 2 z = -0.6679 Prob > |z| = 0.5042
H0: no autocorrelation of order 3 z = -0.1927 Prob > |z| = 0.8472

Sargan-Hansen test of the overidentifying restrictions
H0: overidentifying restrictions are valid

2-step moment functions, 2-step weighting matrix chi2(24) = 26.9837
Prob > chi2 = 0.3052

2-step moment functions, 3-step weighting matrix chi2(24) = 36.8710
Prob > chi2 = 0.0451

The underidentification test is again not satisfactory, so the overidentification and underidentification results for the model with the interaction term reveal both model misspecification and a weak-instrument problem. Can you please suggest how to improve these results? Also, CHC has increased over time, so can I run a difference-in-differences (DID) analysis within system GMM to test the impact of the CHC increase on the dependent variable across founder firms, and how this effect differs before and after the increase?
Sebastian Kripfganz

Join Date: May 2014

Posts: 2593
#6

05 Sep 2022, 09:20

My time is quite limited these days, so I will only be able to give a very short answer. Apologies.

In your specifications, you treat the lagged dependent variable as endogenous. Usually, it is considered predetermined. If you change the command accordingly, this might possibly help with the underidentification problem.

Your lag selection for the instruments looks a bit arbitrary. Why do you use up to the 6th lag for endogenous but only up to the 3rd lag for predetermined variables?

For the level model, it is uncommon to include more than just a single lag as instrument.

Normally, it is good practice to include time dummies, e.g. with option teffects.
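
A sketch of how the first command in #5 might be adapted along these lines (an untested illustration, not a recommended final specification): L.Y is moved into the predetermined group, both groups get the same number of lags for the differenced model, and the level model uses a single lagged difference per variable. The lag ranges are illustrative assumptions. Variable names are copied from #5 as posted, with Y2 taken from the second command there.

```stata
* Sketch only: lag ranges are illustrative; variable names are copied from #5.
xtdpdgmm L(0/1).Y Y2 MTB S CFW CFW_L1 STI_ta STI_L1 DEI_ta DEI_L1 CHC_ta CHC_L1 i.FC, ///
    model(diff) collapse ///
    gmm(CFW_ta STI_ta DBI_ta, lag(2 4)) ///
    gmm(L.Y MTB S CHC_ta, lag(1 3)) ///
    gmm(CFW_ta STI_ta DBI_ta, lag(1 1) diff model(level)) ///
    gmm(L.Y MTB S CHC_ta, lag(0 0) diff model(level)) ///
    iv(i.FC, model(level)) teffects twostep vce(robust)

* Re-run the diagnostics reported in #5 on the revised specification.
estat serial, ar(1/3)
estat overid
```

Endogenous variables start at lag 2 and predetermined ones at lag 1 in the differenced model; for the level model the corresponding differences start at lag 1 and lag 0.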

