Dear all,

I'm comparing 3 treatments over time; each treatment has a different number of cases (houses). The outcome of interest is the reduction in the number of insects across treatments over time.

I have a lot of zeros in the dataset, meaning that the evaluations were made but no insects were found in the houses (so, as I understand it, this is not a zero-inflated case).

As the baseline evaluation shows a different number of insects for each house in each treatment, I'm using the baseline value of the outcome variable as an offset.

My doubt is whether the GLM already treats these zeros as negative results (no insects found), or whether I need to tell the model about them. I've seen some references where the authors transform the outcome variable using log(x+1) before running the model, but it is not clear to me whether I should transform the data before fitting the GLM.

Q1: Is it sensible to use the raw data with plenty of zeros, or should I transform the outcome with log(x+1) before running the model?

Q2: If I do need to transform the outcome variable (log(x+1)), how should I deal with the offset (raw or transformed)?

My code follows:

*** Generating the offset from the value obtained at baseline:

* (followup) sorts within each house, so [1] is guaranteed to be the baseline record

bysort houseid (followup): gen offset1 = num_insects[1]

*** Model: interaction of treatment with days post-IRS

glm num_insects i.treat##c.daysfromirs i.empty i.season i.presence if irsround==1, offset(offset1) family(nbinomial) link(log)
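
To make Q2 concrete, here is a minimal sketch of the alternative I am unsure about. It assumes that, because the link is log, the baseline count should enter the linear predictor on the log scale rather than as a raw count. The variable name ln_base and the +1 shift (to avoid ln(0) in case a house has a baseline of zero) are my own illustrative assumptions, not something I have validated.

```stata
* Sketch only, under the assumptions stated above.
* ln_base is a hypothetical name; +1 avoids a missing value when the baseline is zero.
bysort houseid (followup): gen ln_base = ln(num_insects[1] + 1)

* Option A: logged baseline as an offset (its coefficient is constrained to 1)
glm num_insects i.treat##c.daysfromirs i.empty i.season i.presence if irsround==1, ///
    offset(ln_base) family(nbinomial) link(log)

* Option B: logged baseline as an ordinary covariate (its coefficient is estimated)
glm num_insects i.treat##c.daysfromirs i.empty i.season i.presence c.ln_base if irsround==1, ///
    family(nbinomial) link(log)
```

The /// line continuations only work in a do-file; typed interactively, each glm command has to go on one line. In both options the outcome itself stays as the raw count, so the zeros are modelled directly by the negative binomial family.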

Thank you so much for your time,

Regards,

Raq

## Comment