Hello,
This is my first forum post. As a quick introduction, I'm a master's student in public health working on my thesis project, and I'm very excited to have found the Statalist forum. Thank you to everyone for letting me share my questions!
My current project examines the association between illegal wildlife trade shipments (as a proxy for human-wildlife contact) and zoonotic disease transmission (transmission from animal to human). I am using Ebola as an ecological case study (country-level), and I am limited to a very small sample of n=32 countries that have a known or predicted geographic distribution of Ebola virus. My outcome variable is the number of index cases of Ebola hemorrhagic fever; only seven of the 32 countries have reported index cases. My primary exposure variable is illegal wildlife shipments of host mammals (i.e., mammals capable of hosting the disease pathogen) by country of origin. Other covariates include human population density, forest area (% of land area), health expenditure per capita, etc.
So far, I have run a series of Poisson regression models using a stepwise process to identify significant variables. The most parsimonious model, which is also the best-fitting model based on AIC/BIC, includes the independent variables wildlife trade shipments, human population density (log-transformed), and % forest area. Using these variables, I fit a negative binomial regression model to account for overdispersion and excess zeroes; it produced no significant associations, but much lower AIC/BIC values. I also explored a zero-inflated Poisson model (with population density as the inflation variable), which produced the lowest AIC/BIC values and a significant Vuong's test z-statistic (p<0.05), which supposedly indicates that the zero-inflated model is preferable to the standard Poisson model. In this zero-inflated model all associations were statistically significant, although the inflation variable, population density, showed opposite effects in the two processes (certain-zero vs. non-certain-zero countries).
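To make the Poisson versus zero-inflated Poisson comparison concrete, here is a minimal sketch in Python rather than Stata. It rests on several assumptions: the counts are made up for illustration (they are not my thesis data), both models are intercept-only with no covariates, and the ZIP fit uses a simple EM loop instead of Stata's estimator. The function names are hypothetical. It only shows how the two log-likelihoods and AIC values are computed; with n=32, an AIC difference should still be read cautiously.

```python
import math

# Made-up counts for illustration only (NOT the thesis data):
# 32 hypothetical countries, 25 with zero index cases and 7 with at least one.
counts = [0] * 25 + [1, 1, 2, 3, 4, 6, 9]


def poisson_loglik(y, lam):
    """Log-likelihood of an intercept-only Poisson model with rate lam."""
    return sum(v * math.log(lam) - lam - math.lgamma(v + 1) for v in y)


def zip_loglik(y, pi, lam):
    """Log-likelihood of an intercept-only zero-inflated Poisson model.

    pi  = probability of a "certain zero"
    lam = Poisson rate for the remaining observations
    """
    total = 0.0
    for v in y:
        if v == 0:
            # A zero can come from either process.
            total += math.log(pi + (1 - pi) * math.exp(-lam))
        else:
            total += math.log(1 - pi) + v * math.log(lam) - lam - math.lgamma(v + 1)
    return total


def fit_zip_em(y, n_iter=500):
    """Estimate (pi, lam) by EM; assumes y has at least one positive count."""
    n = len(y)
    n_zero = sum(1 for v in y if v == 0)
    pi, lam = 0.5, sum(y) / n
    for _ in range(n_iter):
        # E-step: probability that an observed zero is a certain zero.
        w = pi / (pi + (1 - pi) * math.exp(-lam))
        expected_certain = n_zero * w
        # M-step: update the mixing proportion and the Poisson rate.
        pi = expected_certain / n
        lam = sum(y) / (n - expected_certain)
    return pi, lam


def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik


lam_pois = sum(counts) / len(counts)      # Poisson MLE is the sample mean
ll_pois = poisson_loglik(counts, lam_pois)

pi_hat, lam_zip = fit_zip_em(counts)
ll_zip = zip_loglik(counts, pi_hat, lam_zip)

aic_pois = aic(ll_pois, 1)                # one parameter: lam
aic_zip = aic(ll_zip, 2)                  # two parameters: pi and lam

print("Poisson: loglik = %.2f, AIC = %.2f" % (ll_pois, aic_pois))
print("ZIP:     loglik = %.2f, AIC = %.2f" % (ll_zip, aic_zip))
```

On these made-up counts the ZIP model has the lower AIC, but that is a property of the invented data and says nothing about the real models with covariates.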
I'm aware that Poisson, negative binomial, ZINB, and ZIP models are not recommended for small sample sizes. I'm not aware, however, of any alternative models.
Are there alternative regression models for count outcomes when the sample size is very small? Or are there ways to validate the models given that my data do not meet certain model assumptions? I'm new to the concepts of cross-validation and bootstrapping, and from what I've read, k-fold cross-validation is often the preferred method of model validation but requires a large sample size.
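For readers who, like me, are new to these two ideas, here is a rough sketch of what I understand them to mean, again in Python and again on made-up counts (not my thesis data). It assumes an intercept-only Poisson model, so the only "parameter" is the mean count. The bootstrap part resamples countries with replacement to get a percentile interval for that mean; the cross-validation part uses leave-one-out, which is the special case of k-fold with k = n and is one way k-fold can be applied when n is small. The function names are hypothetical, and this is not a recommendation of either method for my data.

```python
import math
import random

# Made-up counts for illustration only (NOT the thesis data).
counts = [0] * 25 + [1, 1, 2, 3, 4, 6, 9]


def poisson_logpmf(y, lam):
    """Log probability of count y under a Poisson distribution with rate lam."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)


def bootstrap_mean_ci(data, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap interval for the mean count."""
    rng = random.Random(seed)          # fixed seed so the result is reproducible
    n = len(data)
    # Resample n observations with replacement, n_boot times.
    means = sorted(sum(rng.choices(data, k=n)) / n for _ in range(n_boot))
    lower = means[round((alpha / 2) * n_boot)]
    upper = means[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def loo_log_score(data):
    """Leave-one-out cross-validation: sum of out-of-sample log probabilities.

    Each observation is predicted from a Poisson rate estimated on the
    other n - 1 observations. Higher (less negative) is better.
    """
    total = 0.0
    for i, y in enumerate(data):
        rest = data[:i] + data[i + 1:]
        lam = sum(rest) / len(rest)    # rate estimated without observation i
        total += poisson_logpmf(y, lam)
    return total


mean_count = sum(counts) / len(counts)
ci_lower, ci_upper = bootstrap_mean_ci(counts)
loo_score = loo_log_score(counts)

print("mean count = %.3f" % mean_count)
print("95%% bootstrap interval = (%.3f, %.3f)" % (ci_lower, ci_upper))
print("leave-one-out log score = %.2f" % loo_score)
```

With covariates, the same loops would refit the full regression inside each resample or each left-out fold, and the leave-one-out score could be compared across candidate models.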
I apologize as I'm quite new to statistics, but any recommendations on how to incorporate more robust methods in multivariate analyses (given a small sample size) would be greatly appreciated!
Thank you very much,
Katie