testing whether to include lagged dependent variable, xtabond2 with dummy variables

Sam Murgatroyd

Join Date: Oct 2023

Posts: 33
#1


25 Oct 2023, 20:18

Hello!

I have country-level panel data (N=40, T=5) on cigarette smoking prevalence, per capita cigarette consumption, and a set of explanatory variables that includes both binary indicators and continuous variables. The coefficients on the dummy variables are my key parameters of interest: the type of cigarette tax structure that a country implements has been graded by other researchers on a scale of 1 to 5, where a score of 1 means that the country has a weak tax structure and a score of 5 means that it has the most desirable tax structure. I want to use these scores, which are ultimately proxies for different types of tax structures, to answer two questions:

1. Do countries with simpler excise tax structures (i.e., a higher score) have lower smoking prevalence and lower cigarette consumption than countries adopting more complex excise tax structures (a lower score)?

2. What is the impact of a change in excise tax structure (reflected by a change in the tax structure score) on cigarette smoking prevalence and cigarette consumption?

To answer question 1, I have run the following regression for cigarette smoking prevalence, using a fractional logit regression:

Smoking_prevalence_jt = β₀ + β₁TSS2_jt + β₂TSS3_jt + β₃TSS4_jt + β₄TSS5_jt + β₅X_jt + δ_t + u_jt (eq1)

where TSS2 is a dummy variable =1 if a country's tax structure scored a 2, TSS3 is a dummy variable =1 if a country's tax structure scored a 3, etc. A score of 1 (TSS1) is the omitted category. X_jt is a vector of country-level demographic and macroeconomic controls; all variables in X_jt are continuous. δ_t represents year dummies.

For per capita cigarette consumption, I used OLS to estimate an identical equation, with per capita cigarette consumption as the dependent variable, as shown in eq2.

Per_capita_cig_consumption_jt = β₀ + β₁TSS2_jt + β₂TSS3_jt + β₃TSS4_jt + β₄TSS5_jt + β₅X_jt + δ_t + u_jt (eq2)

In both estimations (eq1 and eq2), I clustered standard errors at the country-level.
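For concreteness, eq1 and eq2 could be set up roughly as follows. This is only a sketch: the variable names (prev_pct, prev, cons, tss, x1, x2, country, year) are hypothetical placeholders, and it assumes prevalence is stored as a percentage and rescaled to a proportion before the fractional logit.

```stata
* Hypothetical variable names throughout:
*   prev_pct  smoking prevalence in percent (0-100)
*   cons      per capita cigarette consumption
*   tss       tax structure score (1-5)
*   x1 x2     continuous controls standing in for X_jt
*   country   numeric country identifier; year is the survey year

* fracreg expects the outcome in [0,1], so rescale the percentage
generate double prev = prev_pct / 100

* eq1: fractional logit, score 1 as the omitted category, year dummies
fracreg logit prev ib1.tss x1 x2 i.year, vce(cluster country)

* average marginal effects of each score relative to score 1
margins, dydx(tss)

* eq2: pooled OLS with the same regressors
regress cons ib1.tss x1 x2 i.year, vce(cluster country)
```

Using ib1.tss rather than four hand-made dummies keeps score 1 as the base category and lets -margins- report the contrasts directly.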

To answer question 2 - what is the impact of a change in excise tax structure on cigarette smoking prevalence and cigarette consumption - I changed the specifications shown in equations 1 and 2 by adding country-fixed effects, so that I estimate the models shown in equations 3 and 4.

Smoking_prevalence_jt = β₀ + β₁TSS2_jt + β₂TSS3_jt + β₃TSS4_jt + β₄TSS5_jt + β₅X_jt + δ_t + α_j + u_jt (eq3)

Per_capita_cig_consumption_jt = β₀ + β₁TSS2_jt + β₂TSS3_jt + β₃TSS4_jt + β₄TSS5_jt + β₅X_jt + δ_t + α_j + u_jt (eq4)

My understanding is that the use of two-way fixed effects will ensure that only within-country variation is used for model identification, allowing an analysis of the impact of a change in excise tax structure on smoking prevalence and per capita cigarette consumption in the sampled countries. Again, in both eq3 and eq4, standard errors were clustered by country. Following guidance from Carlo Lazzaro on this forum, equations 3 and 4 were implemented using -xtreg-.
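A minimal sketch of how eq3 and eq4 might look with -xtreg-, again with hypothetical variable names (prev, cons, tss, x1, x2, country, year). The -xttrans- line is an addition for checking how often the score actually changes within countries, since only those changes identify the TSS coefficients once country fixed effects are included.

```stata
* Hypothetical variable names; country must be numeric for xtset
xtset country year

* How much within-country movement is there in the score?
* xttrans tabulates transitions from one period's score to the next
xttrans tss

* eq3: two-way fixed effects (linear) for smoking prevalence
xtreg prev ib1.tss x1 x2 i.year, fe vce(cluster country)

* eq4: two-way fixed effects for per capita consumption
xtreg cons ib1.tss x1 x2 i.year, fe vce(cluster country)
```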

However, I want to know how I can test whether a lag of the dependent variables should be included in the relevant models. Conceptually, it makes sense that, at the country-level, past smoking prevalence determines current smoking prevalence, and past consumption determines current consumption, given that cigarette smoking is addictive. Question: is there a formal test I can conduct to know if adding a lag of the dependent variable is the right thing to do?

Additionally, I am aware that if I do end up making this a dynamic panel data model, OLS will be biased. I have learned from reading this forum, supplemented with these lecture notes, that because my N>T, I should be using system GMM with -xtabond2-. However, while there is variation in my dummy variables within many (but not all) countries over time, I am concerned that one cannot employ system GMM with binary indicators (I haven't seen examples in the literature). I did find this forum thread in which it is indicated that dummy variables aren't a special case when it comes to differencing; but I am not confident that this carries over to system GMM, and the lack of empirical studies in my field that have used dummy variables makes me doubt my own understanding. Question: is it possible to estimate a model that includes dummy variables using -xtabond2-?

Thank you in advance for taking the time to read this.

Kind regards,

Sam
Tags: dummyvariables, lagged dependent variable, panel data, xtabond2
Sebastian Kripfganz

Join Date: May 2014

Posts: 2595
#2

26 Oct 2023, 03:46

You could first run a serial correlation test after estimating the static model; see xtserial for one possibility. More possibilities were presented by Jesse Wursten in a 2018 Stata Journal article: Testing for Serial Correlation in Fixed-effects Panel Models.
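As a rough illustration of that workflow for the consumption equation, under several assumptions: the variable names (cons, tss, x1, x2, country, year) are hypothetical, all three commands are community-contributed and need to be installed first, and the way residuals are passed to -xtqptest- and -xtistest- reflects my reading of their syntax, so please check the help files before relying on it.

```stata
* Hypothetical variable names; xtserial, xtqptest and xtistest are
* community-contributed (locate them with e.g. -search xtserial-)
xtset country year

* Explicit dummies, in case factor-variable notation is not accepted
tabulate tss, generate(tssd)    // tssd1-tssd5 if all five scores occur
tabulate year, generate(yr)     // yr1-yr5 with T=5

* Wooldridge test for first-order serial correlation (first differences)
xtserial cons tssd2 tssd3 tssd4 tssd5 x1 x2 yr2 yr3 yr4 yr5

* Tests on the residuals of the static fixed-effects model
xtreg cons tssd2 tssd3 tssd4 tssd5 x1 x2 yr2 yr3 yr4 yr5, fe
predict double res_fe, ue       // assumed residual type; see help files

xtqptest res_fe, lags(1)
xtistest res_fe, lags(1)
```

With T=5 only a few lags are feasible, so lags(1) is used here purely as an illustrative choice.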

Alternatively, you could estimate a dynamic model and test for statistical significance of the coefficient of the lagged dependent variable. With N=40, however, estimating a dynamic panel data model might be challenging. Generally, as long as there is some time variation in your dummy variables, you can apply the same procedures as with continuous variables. However, if those dummies only change rarely and stay constant for most of the time, you might run into issues with weak instruments. More on dynamic panel GMM estimation:
Kripfganz, S. (2019). Generalized method of moments estimation of linear dynamic panel data models. Proceedings of the 2019 London Stata Conference.
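A hedged sketch of what a system GMM specification with dummy regressors could look like for consumption. The variable names are hypothetical, consecutive yearly observations are assumed, and the choices shown (treating the tax structure dummies and controls as strictly exogenous, collapsing the instruments, restricting the lag range) are illustrative assumptions rather than recommendations for this particular data set.

```stata
* Assumes xtabond2 is installed: ssc install xtabond2
* Hypothetical variable names: cons, tss, x1, x2, country, year
xtset country year

* Explicit dummies (skip if they already exist in the data)
tabulate tss, generate(tssd)
tabulate year, generate(yr)

* System GMM: lagged dependent variable instrumented GMM-style;
* collapse and a short lag range keep the instrument count low for N=40.
* yr3-yr5 only, because the first period is lost to the lag.
xtabond2 cons L.cons tssd2 tssd3 tssd4 tssd5 x1 x2 yr3 yr4 yr5, ///
    gmmstyle(L.cons, laglimits(1 2) collapse)                   ///
    ivstyle(tssd2 tssd3 tssd4 tssd5 x1 x2)                      ///
    ivstyle(yr3 yr4 yr5, equation(level))                       ///
    twostep robust small
```

The output reports the instrument count, the Hansen test and the Arellano-Bond serial correlation tests; the coefficient on L.cons is the one to inspect for the significance test mentioned above. If the dummies are thought to be predetermined or endogenous, they would move from ivstyle() into a gmmstyle() term instead.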

https://www.kripfganz.de/stata/
Sam Murgatroyd

Join Date: Oct 2023

Posts: 33
#3

26 Oct 2023, 17:05

Dear Sebastian,

Thank you for answering my questions, and for sharing these resources: -xtserial- for first-order serial correlation, and -xtistest- and -xtqptest- for serial correlation up to any order, are new to me and I am happy to learn about them.

I ran the -xtistest- command and it appears that I will need to include the lag of the dependent variables. While I feel okay about this for cigarette consumption, I am lost on how -xtabond2- can be altered to account for a bounded dependent variable (in my case, smoking prevalence, which is between 0% and 100%). Fractional logit was my solution for modelling smoking prevalence in the static panel. Question: is it possible to account for the bounded nature of the dependent variable when using -xtabond2-?

Also, my understanding of -xtabond2- is that the country fixed effects are included. This is helpful for my second research question - what is the impact of a change in excise tax structure (reflected by a change in the tax structure score) on cigarette smoking prevalence and cigarette consumption?

However, given that I now know that I need to specify a model with a lag of the dependent variable, I am unsure of the appropriate dynamic panel estimator required to answer my first research question - do countries with simpler excise tax structures (i.e., a higher score) have lower smoking prevalence and lower cigarette consumption than countries adopting more complex excise tax structures (a lower score)?

Originally, I just did this by excluding the country fixed effects in the fractional logit for smoking prevalence, and the OLS for cigarette consumption; but now that I know I need to include a lagged dependent variable, these estimation techniques are no longer appropriate. Question: Are there recommended methods for estimating models that include lagged dependent variables, but no country fixed effects, in the case of data with short T, in which N>T? Or does one just estimate the models with the lag and acknowledge the bias resulting from the use of fractional logit and OLS?

Thank you!

Sam

Last edited by Sam Murgatroyd; 26 Oct 2023, 17:45. Reason: spelling mistake and grammatical error
Sebastian Kripfganz

Join Date: May 2014

Posts: 2595
#4

27 Oct 2023, 03:52

There is no particular way the xtabond2 command can be adjusted for fractional outcome variables. It estimates a linear model with all the usual caveats. For dynamic probit or logit models, one might use the community-contributed commands probitfe or logitfe, as presented in a 2017 Stata Journal article by Cruz-Gonzalez, Fernández-Val, and Weidner: Bias Corrections for Probit and Logit Models with Two-way Fixed Effects

Country fixed effects are not explicitly included, but they are accounted for by choosing instruments that are orthogonal to (uncorrelated with) them. (The fixed effects are part of the error term.)
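To spell that out in the notation of eq4 with a lagged dependent variable added, here is a sketch of the standard Arellano-Bond and Blundell-Bond moment conditions. The symbols ρ, w_jt (stacking the TSS dummies and X_jt) and e_jt are introduced here for illustration; the conditions assume serially uncorrelated u_jt, and the last one additionally assumes that changes in y are uncorrelated with α_j.

```latex
% Level equation: the fixed effect \alpha_j sits in the composite error e_{jt}
y_{jt} = \rho\, y_{j,t-1} + \beta' w_{jt} + \delta_t + e_{jt},
\qquad e_{jt} = \alpha_j + u_{jt}

% First-differenced equation: \alpha_j drops out
\Delta y_{jt} = \rho\, \Delta y_{j,t-1} + \beta' \Delta w_{jt}
              + \Delta \delta_t + \Delta u_{jt}

% Instruments for the differenced equation: lagged levels
E\left[ y_{j,t-s}\, \Delta u_{jt} \right] = 0, \qquad s \ge 2

% Additional instruments for the level equation: lagged differences,
% valid only if they are uncorrelated with \alpha_j
E\left[ \Delta y_{j,t-1}\, (\alpha_j + u_{jt}) \right] = 0
```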

Regarding your last question, I am afraid that depends on what is deemed acceptable in the particular literature you are contributing to. Both options are problematic.

https://www.kripfganz.de/stata/
Sam Murgatroyd

Join Date: Oct 2023

Posts: 33
#5

27 Oct 2023, 11:53

Dear Sebastian,

Thank you so much for your time on this. Your responses, and the resources you have shared, are incredibly helpful.

Sam