Weighted Poisson Regression Advice

Hannah Moody

Join Date: Jul 2018
Posts: 10

08 Aug 2018, 09:03

I'm using a Poisson model to regress the number of bacteria resistant to antibiotic 1 on time. However, I would like to adjust for the fact that there will inevitably be more resistant bacteria every year, because the total number of bacteria reported to the lab increases each year. Below is a fake data frame with the first year of data (there are 5 years in total); the total number of bacteria tested increases with each year. I understand I have to use the offset command to do this. Can I have some advice on how to do this with my data frame?

I have used the following command in Stata, but realised that offsetting in this way doesn't actually help, because it's not adjusting for the fact that more bacteria are being reported per year:

poisson resistance1 yearmo, irr offset(total1)

Additionally, when I run the above command in Stata, I get the following output:

Iteration 4: log likelihood = -5145028.5 (not concave)
Iteration 5: log likelihood = -5145007.3 (not concave)
Iteration 6: log likelihood = -5144995 (not concave)
Iteration 7: log likelihood = -5144989.6 (not concave)

Year	Month	Number of bacteria resistant to antibiotic 1 (resistance1)	Total number of bacteria tested against antibiotic 1 (total1)	Year and month (yearmo)
2014	1	644	1673	2014m1
2014	2	658	1691	2014m2
2014	3	715	1798	2014m3
2014	4	706	1912	2014m4
2014	5	700	1929	2014m5
2014	6	756	1967	2014m6
2014	7	870	2151	2014m7
2014	8	870	2164	2014m8
2014	9	817	2095	2014m9
2014	10	811	2096	2014m10
2014	11	724	1891	2014m11
2014	12	765	1908	2014m12

Any help is much appreciated.

Clyde Schechter

Join Date: Apr 2014

Posts: 30064
#2

08 Aug 2018, 09:50

There is no -offset- command in Stata. Some estimation commands include an -offset()- option, and -poisson- is among them. However, I don't think it would be appropriate in your situation to use it. For a -poisson- model like this one, the -exposure()- option would be more suitable. So:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float date int(n_resistant n_tested)
648 644 1673
649 658 1691
650 715 1798
651 706 1912
652 700 1929
653 756 1967
654 870 2151
655 870 2164
656 817 2095
657 811 2096
658 724 1891
659 765 1908
end
format %tm date

poisson n_resistant date, exposure(n_tested)
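
To see how the two options relate, here is a minimal sketch, assuming the example data above are in memory. -offset(varname)- adds varname to the linear predictor as is, with its coefficient constrained to 1, whereas -exposure(varname)- adds ln(varname). Passing the raw count total1 to -offset()- therefore puts a term in the thousands on the log scale, which is a plausible reason for the "not concave" messages in #1. The variable ln_tested is a helper introduced here for illustration only.

```stata
* Assumes date, n_resistant and n_tested from the example above are in memory.
* ln_tested is a hypothetical helper variable, not part of the original data.
generate double ln_tested = ln(n_tested)

* exposure() takes the raw count and logs it internally.
poisson n_resistant date, exposure(n_tested) irr

* offset() expects a variable that is already on the log scale.
poisson n_resistant date, offset(ln_tested) irr
```

Both commands should report the same incidence-rate ratio for date.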

That said, I don't think a Poisson model is appropriate for this data in the first place. Poisson models are used where the outcome will be proportional to the size of an exposure, but they are not appropriate when the outcomes are themselves a subset of the exposure. So, for example, Poisson would be appropriate to estimate number of chocolate chips per pound of cookies, or number of potholes per mile of road. But it is not appropriate to model, for example, the ratio of male births to total live births, nor resistant bacterial cultures per number of cultures done. One reason for this is that the Poisson model explicitly supports the possibility of the outcome number being arbitrarily large, whereas the number of resistant cultures cannot exceed the total number of cultures. A better model for this is a binomial regression:

Code:

glm n_resistant date, family(binomial n_tested)
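
A minimal sketch of how one might read the results of this model, again assuming the example data are in memory. The -eform- option reports the coefficient on date as an odds ratio per month, and the fitted proportion resistant is recovered from the linear predictor. The variables xb_hat, p_hat and p_obs are names made up for this illustration.

```stata
* Assumes date, n_resistant and n_tested from the example above are in memory.
* Logit is the default link for family(binomial); eform shows odds ratios.
glm n_resistant date, family(binomial n_tested) link(logit) eform

* Linear predictor on the logit scale, then back-transform to a proportion.
predict double xb_hat, xb
generate double p_hat = invlogit(xb_hat)

* Observed proportion resistant in each month, for comparison.
generate double p_obs = n_resistant / n_tested
list date p_obs p_hat, noobs
```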

In the future, when showing data examples, please use the -dataex- command, as I have done in this reply. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example data. When showing example data, always use -dataex-.
Hannah Moody

Join Date: Jul 2018

Posts: 10
#3

08 Aug 2018, 10:00

Dear Clyde

Thank you for your extremely detailed and helpful response. On further reflection, it seems that it would indeed be wise to use the binomial regression. I am guessing binomial regressions can be used for count data such as my above data set?

Additionally, I was wondering what your thoughts were regarding weighting the data. The reason I ask is that there is a general trend in my data set whereby the number of bacterial cultures reported to the laboratory has increased each year. This could have implications for my data analysis. Do you have any advice regarding this?

Thank you for the advice about using -dataex- also.
Clyde Schechter

Join Date: Apr 2014

Posts: 30064
#4

08 Aug 2018, 10:06

The binomial regression model is suitable when the outcome is a count variable and it represents a subset of a total number of opportunities, as here. The number of opportunities must also be a variable in the data set, and it appears in the -family(binomial ...)- option as the "denominator." It is not suitable for use when the outcome is not a subset of a total number of opportunities. So, for example, it should not be used to model the number of potholes per mile of road.

The binomial regression model already captures the idea that the number of resistant cultures will rise in proportion to the total number of cultures, all else being equal. So there is no need to weight the data in any way to reflect this.
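
One way to see this, sketched under the assumption that the example data from #2 are in memory: fitting a logistic regression with one record per culture (represented here by frequency weights) should reproduce the coefficients of the grouped binomial model, so each month already counts in proportion to the number of cultures tested. The variables n_susceptible, resistant and freq are hypothetical helpers.

```stata
* Grouped binomial model, as in #2.
glm n_resistant date, family(binomial n_tested)

* Same model on culture-level data: two rows per month (resistant / not),
* each weighted by the number of cultures it stands for.
preserve
generate long n_susceptible = n_tested - n_resistant
expand 2
bysort date: generate byte resistant = (_n == 1)
generate long freq = cond(resistant, n_resistant, n_susceptible)
logit resistant date [fweight = freq]
restore
```

The -preserve- and -restore- pair leaves the original one-row-per-month data untouched afterwards.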
Hannah Moody

Join Date: Jul 2018

Posts: 10
#5

08 Aug 2018, 10:10

Dear Clyde

Your response was clear and informative. Thank you for this. Do you have any suggestions for where I can find some literature regarding this type of GLM method? I ask because I'd like to further my own understanding and fortify what you've advised me above.

All the best
Clyde Schechter

Join Date: Apr 2014

Posts: 30064
#6

08 Aug 2018, 10:52

I learned about generalized linear models a long time ago, when they were fairly new. I'm not really well-positioned to recommend a reference on them that isn't highly technical. I did some Googling and came up with https://www.amazon.com/Regression-Ca.../dp/0803973748, which looks like it might be appropriate.
Hannah Moody

Join Date: Jul 2018

Posts: 10
#7

09 Aug 2018, 02:43

Thank you Clyde! Much appreciated.
Richard Hofler

Join Date: Apr 2014

Posts: 12
#8

09 Aug 2018, 08:18

Hannah,

This might be worth a look. I use it in my undergrad GLM course.

An Introduction to Generalized Linear Models, Annette J. Dobson and Adrian G. Barnett, third ed., CRC Press, 2008.

https://www.amazon.com/Introduction-.../dp/1584889500
Hannah Moody

Join Date: Jul 2018

Posts: 10
#9

16 Aug 2018, 07:38

This is great, thanks Richard. I will certainly look at purchasing this book.