  • Weighted Poisson Regression Advice

    I'm using a Poisson model to regress the number of resistant bacteria (to antibiotic 1) over time. However, I would like to adjust for the fact that every year there will inevitably be more resistant bacteria, because the total number of bacteria reported to the lab increases each year. Below I have created a fake data frame with the first year of data (there are 5 years in total). With each year there is an increase in the total number of bacteria tested. I understand I have to use the offset command to do this. Can I have some advice on how to do this with my data frame? I have used the following command in Stata but realised that when I offset in this way, it doesn't actually help, because it's not adjusting for the fact that more bacteria are being reported per year:

    poisson resistance1 yearmo, irr offset(total1)
    Additionally, when I put the above code into Stata, I get the following output:

    Iteration 4: log likelihood = -5145028.5 (not concave)
    Iteration 5: log likelihood = -5145007.3 (not concave)
    Iteration 6: log likelihood = -5144995 (not concave)
    Iteration 7: log likelihood = -5144989.6 (not concave)

    Variables: Year; month; number of resistant bacteria to antibiotic 1 (resistance1); total number of bacteria tested against antibiotic 1 (total1); year and month (yearmo)

    Year  month  resistance1  total1  yearmo
    2014      1          644    1673  2013m1
    2014      2          658    1691  2013m2
    2014      3          715    1798  2013m3
    2014      4          706    1912  2013m4
    2014      5          700    1929  2013m5
    2014      6          756    1967  2013m6
    2014      7          870    2151  2013m7
    2014      8          870    2164  2013m8
    2014      9          817    2095  2013m9
    2014     10          811    2096  2013m10
    2014     11          724    1891  2013m11
    2014     12          765    1908  2013m12


    Any help is much appreciated.

  • #2
    There is no -offset- command in Stata. Some estimation commands include an -offset()- option, and -poisson- is among them. However, I don't think it would be appropriate in your situation to use it. For a -poisson- model like this one, the -exposure()- option would be more suitable. So:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float date int(n_resistant n_tested)
    648 644 1673
    649 658 1691
    650 715 1798
    651 706 1912
    652 700 1929
    653 756 1967
    654 870 2151
    655 870 2164
    656 817 2095
    657 811 2096
    658 724 1891
    659 765 1908
    end
    format %tm date
    
    poisson n_resistant date, exposure(n_tested)
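
    A minimal sketch of how the two options relate, assuming the variable names from the -dataex- example above (ln_tested is a hypothetical name introduced here for illustration). -offset(varname)- adds varname to the linear predictor as it stands, with its coefficient fixed at 1, whereas -exposure(varname)- adds ln(varname). An offset therefore has to be on the log scale already; supplying the raw total, as in -offset(total1)-, does not adjust for the number tested, and may well be what is behind the "not concave" messages.

```stata
* Re-enter the example data so that this block runs on its own
clear
input float date int(n_resistant n_tested)
648 644 1673
649 658 1691
650 715 1798
651 706 1912
652 700 1929
653 756 1967
654 870 2151
655 870 2164
656 817 2095
657 811 2096
658 724 1891
659 765 1908
end
format %tm date

* exposure() takes the raw count and enters ln(n_tested) with coefficient 1
poisson n_resistant date, exposure(n_tested) irr

* offset() expects the log to have been taken already (ln_tested is a hypothetical name)
generate double ln_tested = ln(n_tested)
poisson n_resistant date, offset(ln_tested) irr
```

    The two -poisson- commands should report the same incidence-rate ratio for date, since they fit the same model.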
    That said, I don't think a Poisson model is appropriate for these data in the first place. Poisson models are used where the outcome will be proportional to the size of an exposure, but they are not appropriate when the outcomes are themselves a subset of the exposure. So, for example, Poisson would be appropriate to estimate number of chocolate chips per pound of cookies, or number of potholes per mile of road. But it is not appropriate to model, for example, the ratio of male births to total live births, nor resistant bacterial cultures per number of cultures done. One reason for this is that the Poisson model explicitly supports the possibility of the outcome number being arbitrarily large, whereas the number of resistant cultures cannot exceed the total number of cultures. A better model for this is a binomial regression:

    Code:
    glm n_resistant date, family(binomial n_tested)
    In the future, when showing data examples, please use the -dataex- command, as I have done in this reply. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.


    • #3
      Dear Clyde

      Thank you for your extremely detailed and helpful response. On further reflection, it seems that it would indeed be wise to use the binomial regression. I am guessing binomial regressions can be used for count data such as my above data set?

      Additionally, I was wondering what your thoughts were regarding weighting the data. The reason I ask is that there is a general trend in my data set whereby the number of bacterial cultures reported to the laboratory has increased each year. This could have implications for my data analysis. Do you have any advice regarding this?

      Thank you for the advice about using -dataex- also.


      • #4
        The binomial regression model is suitable when the outcome is a count variable and it represents a subset of a total number of opportunities, as here. The number of opportunities must also be a variable in the data set, and it appears in the -family(binomial ...)- option as the "denominator." It is not suitable for use when the outcome is not a subset of a total number of opportunities. So, for example, it should not be used to model the number of potholes per mile of road.

        The binomial regression model already captures the idea that the number of resistant cultures will rise in proportion to the total number of cultures, all else being equal. So there is no need to weight the data in any way to reflect this.
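
        As a rough illustration of the point above, assuming the -dataex- example from #2 (pct_resistant is a hypothetical variable name introduced here): the binomial model describes the share of cultures that are resistant, so a month with more cultures tested simply contributes a larger denominator rather than needing a weight.

```stata
* Re-enter the example data from #2 so that this block runs on its own
clear
input float date int(n_resistant n_tested)
648 644 1673
649 658 1691
650 715 1798
651 706 1912
652 700 1929
653 756 1967
654 870 2151
655 870 2164
656 817 2095
657 811 2096
658 724 1891
659 765 1908
end
format %tm date

* Observed percentage resistant in each month (pct_resistant is a hypothetical name)
generate double pct_resistant = 100 * n_resistant / n_tested
list date n_resistant n_tested pct_resistant, noobs

* Binomial regression with n_tested as the denominator; eform reports odds ratios
glm n_resistant date, family(binomial n_tested) link(logit) eform
```

        With the logit link, the exponentiated coefficient on date can be read as the odds ratio for resistance per one-month step in date.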


        • #5
          Dear Clyde

          Your response was clear and informative. Thank you for this. Do you have any suggestions for where I can find some literature regarding this type of GLM method? I ask because I'd like to further my own understanding and fortify what you've advised me above.

          All the best


          • #6
            I learned about generalized linear models a long time ago, when they were fairly new. I'm not really well-positioned to recommend a reference on them that isn't highly technical. I did some Googling and came up with https://www.amazon.com/Regression-Ca.../dp/0803973748, which looks like it might be appropriate.


            • #7
              Thank you Clyde! Much appreciated.


              • #8
                Hannah,

                This might be worth a look. I use it in my undergrad GLM course.

                An Introduction to Generalized Linear Models, Annette J. Dobson and Adrian G. Barnett, third ed., CRC Press, 2008.

                https://www.amazon.com/Introduction-.../dp/1584889500




                • #9
                  This is great, thanks Richard. I will certainly look at purchasing this book.
