Although I have only two variables measured on patients over time, I got lost in dynamic panel data modelling. ‘How hard can it be?’, I thought originally.
I use scores on both physical health and mental health of patients, measured at four time points, and intend to do a cross-lagged regression analysis. Being a newbie in this field, I used Rabe-Hesketh & Skrondal’s (RH&S’s) book Multilevel and Longitudinal Modeling Using Stata (2012) as a guideline.
Having declared the panel data and using Stata’s time-series operators to refer to lagged variables, the naïve way would be
Code:
xtset id t
regress mental L.mental L.physical
Since (repeated) observations are nested within patients, residuals are expected to be correlated, and I was tempted to use
Code:
mixed mental L.mental L.physical || id: , res(uns, t(t)) reml
RH&S state that this is problematic, since “it would produce inconsistent estimates of the regression coefficients because lagged responses included as covariates, are correlated with the random intercept” (p273).
The authors suggest two ways out: 1. the Anderson-Hsiao approach and 2. the Arellano-Bond approach.
In the Anderson-Hsiao approach, the second lag of the response is used as an instrumental variable for the lagged difference. Translating their example (p275) to our situation, I used:
Code:
ivregress 2sls D.mental LD.(physical) (LD.mental=L2.mental)
For the Arellano-Bond approach (p277) I used:
Code:
xtabond mental L.physical , twostep noconstant vce(robust)
Additionally, as a third method, I used the xtdpdml command by Allison, Moral-Benito and Williams (2015):
Code:
xtdpdml mental L.physical
The regression parameters that I found for L.mental (with SE) and L.physical (with SE) were respectively:
Method       L.mental (SE)    L.physical (SE)
regress       0.74 (0.06)      0.01 (0.01)
mixed         0.94 (0.04)      0.00 (0.01)
ivregress    -0.58 (0.34)      0.01 (0.04)
xtabond       0.40 (0.50)      0.09 (0.05)
xtdpdml       0.30 (0.24)      0.06 (0.03)
The results clearly show how the two naïve methods differ from the latter three. I had hoped that the latter three methods would agree with each other; unfortunately, they do not. The result that puzzles me most is the deviating estimate for L.mental from the ivregress method (note the minus sign). In the Stata output of ivregress the parameter is labelled LD.mental, not L.mental, but RH&S state that it estimates the same parameter “gamma” (Table 5.2, p272). My question is: have other Stata users seen similar differences between estimates from ivregress and xtabond? Am I missing something?
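To make the remark about LD.mental concrete, this is the differencing step as I understand it. It is a sketch in my own notation, not copied from RH&S: $y_{it}$ stands for mental and $x_{it}$ for physical of patient $i$ at occasion $t$, $\zeta_i$ is the patient-specific random intercept, $\epsilon_{it}$ is the occasion-specific residual, the label $\beta$ is my own, and I leave out any constant term.

```latex
\begin{align}
y_{it} &= \gamma\, y_{i,t-1} + \beta\, x_{i,t-1} + \zeta_i + \epsilon_{it} \\
y_{it} - y_{i,t-1} &= \gamma\,(y_{i,t-1} - y_{i,t-2}) + \beta\,(x_{i,t-1} - x_{i,t-2}) + (\epsilon_{it} - \epsilon_{i,t-1})
\end{align}
```

Differencing removes $\zeta_i$, and the coefficient on the lagged difference (LD.mental in Stata) is the same $\gamma$ that multiplies L.mental in the level equation. Since $y_{i,t-1}$ depends on $\epsilon_{i,t-1}$, the lagged difference is correlated with the differenced residual, which is why $y_{i,t-2}$ (L2.mental) serves as the instrument, assuming the $\epsilon_{it}$ are serially uncorrelated.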
Kind regards,
Adriaan Hoogendoorn
References
Rabe-Hesketh, Sophia & Anders Skrondal (2012) Multilevel and Longitudinal Modeling Using Stata (Third edition), Stata Press.
Allison, Moral-Benito and Williams (2015) https://ideas.repec.org/p/boc/scon15/11.html
