Hello,
I am running a difference-in-differences analysis to evaluate the impact of introducing a policy on the treatment rate for a specific disease. I have a country-level panel of annual observations from 2004 to 2016. The policy was introduced in a staggered way, in 2014 and in 2015. In total I have ~30 countries, of which about half receive the policy; the remainder are controls that never receive it. The data are xtset by country and year (~300 observations).
My outcome variable is the treatment rate per 1,000 people diagnosed with the disease, defined as (number treated per year / total diagnosed) × 1,000.
In the models below, policyyrs is my DID estimator. It equals one from the year the policy is introduced in the countries eligible for it, and zero otherwise. I use it instead of an interaction term because the introduction is staggered. The variable group is the treatment-group indicator.
At present I have specified the following fixed-effects and random-effects models:
Code:
xtreg treatmentrate policyyrs group i.country i.year, fe
xtreg treatmentrate policyyrs group i.country i.year, re
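Here is a minimal sketch of one way to build a staggered policyyrs indicator like the one described above. It assumes a hypothetical variable adoptyear that holds each country's first policy year and is missing for never-treated controls. The three-country toy data are made up for illustration.

```stata
* Toy panel for illustration only; adoptyear is a hypothetical variable
* holding the first policy year (missing for never-treated controls)
clear
input country year adoptyear
1 2013 2014
1 2014 2014
1 2015 2014
2 2013 2015
2 2014 2015
2 2015 2015
3 2013 .
3 2014 .
3 2015 .
end

* Treatment-group indicator: 1 for countries that ever receive the policy
generate byte group = !missing(adoptyear)

* DID indicator: 1 from the adoption year onward in policy countries, 0 otherwise
generate byte policyyrs = group & year >= adoptyear

list country year group policyyrs, sepby(country)
```

In this toy panel, policyyrs switches on in 2014 for country 1 and in 2015 for country 2, and it stays at 0 for country 3. Because Stata treats missing values as larger than any number, year >= adoptyear is already false for controls. The explicit group term just makes that intent visible.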
In later models, I extend the basic model to include i) important covariates and ii) an interaction between region and year (i.region##i.year).
I am a bit stuck on how to proceed and have a few questions about my specifications:
1) I think I need cluster-robust standard errors at the country level, so that errors may be correlated within countries but are independent across them. I am nervous about this approach because I only have ~30 clusters with only ~10 observations per cluster, and I'm not sure how well it would perform or whether it is necessary. In case it is relevant, I have already included country fixed effects in the model. I could also look at wild cluster bootstrap standard errors.
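The sketch below shows the two inference options mentioned here. It runs on simulated data (30 countries, 2004-2016, a made-up policy effect of 5 per 1,000) so that it is self-contained. The first step estimates country-clustered standard errors with xtreg. The second runs a wild cluster bootstrap test of policyyrs with the user-written boottest package from SSC. With the real data, only the last two commands would be needed, assuming the variable names match the post.

```stata
* Simulated stand-in for the real panel: 30 countries x 13 years (2004-2016)
* ssc install boottest   // run once if boottest is not installed
clear
set seed 12345
set obs 30
generate country = _n
* Toy staggered adoption: 8 countries in 2014, 7 in 2015, the rest never
generate adoptyear = cond(country <= 8, 2014, cond(country <= 15, 2015, .))
expand 13
bysort country: generate year = 2003 + _n
generate byte policyyrs = year >= adoptyear
* Toy outcome with a true policy effect of 5 per 1,000
generate treatmentrate = 30 + country/2 + (year - 2004) + 5*policyyrs + rnormal(0, 3)
xtset country year

* Country fixed effects (absorbed by -fe-), year dummies, country-clustered SEs
xtreg treatmentrate policyyrs i.year, fe vce(cluster country)

* Wild cluster bootstrap test of policyyrs, reusing the country clustering
boottest policyyrs, reps(9999)
```

By default boottest imposes the null hypothesis and uses Rademacher weights. It reports a bootstrap p-value and confidence set rather than a standard error. Imposing the null is the variant generally recommended when clusters are few.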
2) By using the treatment rate, I lose important information about the size of the population and about the standard error of each treatment-rate estimate. How can I account for this in the model? I have the denominator of the treatment rate, so I could include it as a predictor. But wouldn't that be problematic, since it is endogenous with the outcome variable?
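One standard way to keep the population size in the model is to model the count with the log denominator as an offset, a term whose coefficient is fixed at 1 rather than estimated. This avoids entering the denominator as a freely estimated, and potentially endogenous, regressor. The notation below is introduced here for illustration: y_it is the number treated, N_it the number diagnosed, D_it is policyyrs, and α_i and γ_t are country and year effects.

```latex
\begin{aligned}
\mathbb{E}[y_{it} \mid D_{it}, \alpha_i, \gamma_t] &= N_{it}\,\exp(\alpha_i + \gamma_t + \beta D_{it}) \\
\log \mathbb{E}[y_{it} \mid D_{it}, \alpha_i, \gamma_t] &= \log N_{it} + \alpha_i + \gamma_t + \beta D_{it} \\
1000 \times \frac{\mathbb{E}[y_{it} \mid D_{it}, \alpha_i, \gamma_t]}{N_{it}} &= 1000\,\exp(\alpha_i + \gamma_t + \beta D_{it})
\end{aligned}
```

Because log N_it enters with a coefficient of 1, this is still a model for the rate per 1,000 (third line), and exp(β) is the ratio of treatment rates with versus without the policy. Under a Poisson-type likelihood, observations with larger expected counts carry more weight. This is roughly the population-size information that is lost when the rate alone is the outcome.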
Alternatively, since I am modelling a rate, should I be using a Poisson model? If so, should my outcome be i) the treatment rate or ii) the count of people treated (which I could calculate from the denominator I have)? The spread of my observations is very wide: the median number treated is ~1,000 (range ~10-200,000), and the median denominator is 31,000 (range 1,000-1,500,000). I can run these with fixed or random effects, and with robust SEs as above.
Code:
/* Model i */
xtpoisson treatmentrate policyyrs group i.country i.year
/* Model ii */
xtpoisson treatednum policyyrs group i.country i.year, exposure(popdiagnosed)
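Below is a self-contained sketch of the count version (model ii) on simulated data. It assumes, as in the post, that the denominator variable is popdiagnosed. The code recomputes the count from the rate per 1,000 and its denominator. It then fits a Poisson model with country fixed effects, year dummies, and log(popdiagnosed) as the offset via exposure(). With xtpoisson, fe, the vce(robust) option gives standard errors clustered on the panel (country) variable. The simulated rate ratio of exp(0.2) is an arbitrary toy value.

```stata
* Simulated stand-in: 30 countries x 13 years, toy true rate ratio exp(0.2)
clear
set seed 2016
set obs 30
generate country = _n
generate adoptyear = cond(country <= 8, 2014, cond(country <= 15, 2015, .))
generate basepop = round(exp(rnormal(10, 1)))
expand 13
bysort country: generate year = 2003 + _n
generate byte policyyrs = year >= adoptyear
generate double popdiagnosed = round(basepop * exp(0.02*(year - 2004)))
generate treatednum_true = rpoisson(0.03 * popdiagnosed * exp(0.2*policyyrs))
generate double treatmentrate = 1000 * treatednum_true / popdiagnosed

* Recover the count from the rate per 1,000 and its denominator
generate double treatednum = round(treatmentrate * popdiagnosed / 1000)

xtset country year

* Poisson FE: log(popdiagnosed) enters as an offset with coefficient 1;
* vce(robust) here clusters on the panel variable (country)
xtpoisson treatednum policyyrs i.year, fe exposure(popdiagnosed) vce(robust)

* Incidence-rate ratio for the policy
display exp(_b[policyyrs])
```

Adding the irr option to xtpoisson would report the incidence-rate ratio directly instead of computing it with display.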
I would really appreciate any help thinking through these questions.
Best wishes,
Bryony