
  • Difference-in-difference model for treatment rate with small sample

    Hello,

    I am running a difference-in-difference analysis to evaluate the impact of a policy's introduction on the treatment rate for a specific disease. I have a country-level panel of annual observations from 2004-2016. The policy was introduced in a staggered fashion, in 2014 in some countries and in 2015 in others. In total, I have ~30 countries, of which about half receive the policy; the rest are controls that never receive it. The data are xtset by country and year (~300 observations).

    My outcome variable is the treatment rate per 1,000 people diagnosed with the disease (defined as number treated per year / total number diagnosed * 1,000).

    At present I have specified the following fixed-effects and random-effects models:

    Code:
    xtreg treatmentrate policyyrs group i.country i.year, fe
    xtreg treatmentrate policyyrs group i.country i.year, re
    Here policyyrs is my DID term: it equals one from the year the policy is introduced onward in the countries that receive the policy, and zero otherwise (I use this instead of an interaction term because of the staggered introduction). group is the treatment-group indicator.
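
    For reference, the basic specification described above can be sketched as a two-way fixed-effects model. The symbols are notation introduced here for illustration, not variable names from the post: $y_{ct}$ is treatmentrate for country $c$ in year $t$, $T_c$ is the year country $c$ introduces the policy (2014 or 2015), and $D_{ct}$ corresponds to policyyrs.

```latex
\begin{align}
y_{ct} &= \alpha_c + \gamma_t + \delta D_{ct} + \varepsilon_{ct}, \\
D_{ct} &=
  \begin{cases}
    1 & \text{if country } c \text{ receives the policy and } t \ge T_c, \\
    0 & \text{otherwise.}
  \end{cases}
\end{align}
```

    Here $\alpha_c$ and $\gamma_t$ are the country and year effects and $\delta$ is the DID estimate. Written this way, the group indicator is constant within each country, so in the fixed-effects version it is collinear with $\alpha_c$.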

    In later models, I build the basic model up to include i) important covariates & ii) an interaction term between region & year (i.region##i.year).

    I am a bit stuck about how to progress & have a few questions about my specifications:

    1) I think I need robust standard errors clustered at the country level, which allow the errors to be correlated within countries (and assume independence across them). I am nervous about this approach because I only have ~30 clusters and only ~10 observations per cluster, and I'm not sure how well it would perform or whether it is necessary. If it's relevant, I have already included country fixed effects in the model. I could also look at the wild cluster bootstrap.

    2) By using the treatment rate, I lose important information about the size of the population and the standard error associated with the treatment rate estimate. How can I account for this in the model? I have the denominator of the treatment rate, so I could include it among the predictors; however, it is endogenous to the outcome variable, so could that be problematic?

    Alternatively, as I am modelling a rate, should I be using a Poisson model? If a Poisson model is more appropriate, should my outcome be i) the treatment rate or ii) the count of people treated (which I could calculate from the denominator I have)? The spread of my observations is very wide: the median number treated is ~1,000 (range ~10-200,000) and the median denominator is 31,000 (range 1,000-1,500,000). I can run these with fixed or random effects, and with robust SEs as above.

    Code:
    /* Model i */  xtpoisson treatmentrate policyyrs group i.country i.year
    /* Model ii */ xtpoisson treatednum policyyrs group i.country i.year, exposure(popdiagnosed)
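
    To make the comparison between the two models concrete, Model ii with exposure() can be written out as below. This is a sketch that assumes the same country and year effects as the linear model; $Y_{ct}$ (treatednum), $N_{ct}$ (popdiagnosed) and $D_{ct}$ (policyyrs) are notation introduced here for illustration.

```latex
\begin{align}
\log \mathbb{E}[Y_{ct} \mid N_{ct}] &= \log N_{ct} + \alpha_c + \gamma_t + \delta D_{ct} \\
\Longleftrightarrow \quad
\mathbb{E}\!\left[ \frac{Y_{ct}}{N_{ct}} \,\middle|\, N_{ct} \right] &= \exp\!\left( \alpha_c + \gamma_t + \delta D_{ct} \right)
\end{align}
```

    The exposure() option enters $\log N_{ct}$ with its coefficient fixed at 1, so the count model is already a model for the rate. Under this form $\exp(\delta)$ is a rate ratio, not a difference in rates per 1,000 as in the linear model.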
    I would really appreciate any help thinking through these questions.

    Best wishes,
    Bryony