I want to estimate the impact of deforestation (X) on disease cases (Y), a count variable. The data are a panel aggregated at the municipal level, and municipalities are very heterogeneous in population (P) and area. I'm wondering how to correctly account for differences in the exposed population (which is the whole population, with no age or other restrictions) in each municipality.
More specifically, when estimating the (fixed effects) Poisson or Negative Binomial, should I:
a) compute the ratio Y/P and use that as the dependent variable
b) include P as an exposure term, i.e. ln(P) as a regressor with its coefficient locked at 1 (implemented with the "offset" option applied to ln(P)).
c) include P as a regressor without restricting its coefficient.
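To make the three options concrete, here is how I would write them down. This is only a sketch in my own notation, not taken from any reference: i indexes municipalities, t indexes periods, \alpha_i is the municipality fixed effect, and I am assuming a log link with population entering in logs (if P entered in levels, the mean would be scaled by exp(P) rather than by P, which is not a rate model).

```latex
% (a) rate as the dependent variable
\mathrm{E}\!\left[\frac{Y_{it}}{P_{it}} \,\middle|\, X_{it}, P_{it}, \alpha_i\right]
  = \exp\!\left(\alpha_i + \beta X_{it}\right)

% (b) count as the dependent variable, log population as an offset (coefficient fixed at 1)
\mathrm{E}\!\left[Y_{it} \mid X_{it}, P_{it}, \alpha_i\right]
  = \exp\!\left(\alpha_i + \beta X_{it} + \ln P_{it}\right)
  = P_{it}\,\exp\!\left(\alpha_i + \beta X_{it}\right)

% (c) count as the dependent variable, log population with a free coefficient
\mathrm{E}\!\left[Y_{it} \mid X_{it}, P_{it}, \alpha_i\right]
  = \exp\!\left(\alpha_i + \beta X_{it} + \gamma \ln P_{it}\right)
```

If I have this right, (a) and (b) imply the same conditional mean for the rate, since dividing (b) by P_{it} gives (a), and (b) is the special case of (c) with \gamma = 1. So my question is mostly about what changes in estimation and inference across the three.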
I've found other threads that tangentially indicate that "b)" is the correct solution. Is there a good textbook section or paper explaining this that I could cite as an authoritative source? Also, does this affect the interpretation of the coefficients?
Many of the articles we reviewed just use "a)", with the number of cases per thousand inhabitants as the dependent variable. Does this bias the estimates or the standard errors in any particular way?
regards
Lucas