
  • Problem with evaluating trends over time using Poisson regression

    Hi there. I’m new to this forum. I am an intermediate user of Stata (version 13.1) and have a question about a dataset I am currently analysing. It is a dataset of 15,000 bacteria from patients with an infection. The dataset contains count data. All of these bacteria have been tested for their susceptibility to 5 antibiotics. Susceptibility means whether or not the bacteria are killed off by the antibiotic; a bacterium that is not killed off is resistant. For each antibiotic, the outcome is either ‘resistant’ or ‘susceptible’.

    There are also other variables in the dataset, such as the age of the patient the bacteria came from, the region the bacteria/patient came from, and the year and month in which the infection occurred. I want to run a Poisson regression to quantify the trends in resistance to these 5 antibiotics over time (i.e. per month per year). I have generated a new variable that contains only the number of ‘resistant’ infections to each antibiotic.

    I want to use this variable to represent the resistant cases. I’ve read in previous literature that Poisson regression is the best way to analyse trends over time for count data. I understand that I also need to look at over-dispersion, which may mean I need to use negative binomial regression instead. I am not using a time-series analysis, as I simply want to model trends over time.
    My specific questions are the following:
    1. How exactly can I effectively use the Poisson command in Stata to model trends over time? I have tried to use it with the code: poisson cases year i.month – but I’m not sure what other considerations I need to make. For example, sometimes I’ve seen this command followed by ‘irr’ or ‘robust’ and I am not entirely sure what these mean.
    2. Do I need to add the other variables in my regression e.g. region and age?
    3. Can I then create a graph once I’ve quantified these trends? I would like to create a graph with months on the x axis and trends on the y axis.
    Any help would be really appreciated. I am sorry if this isn’t enough information – this is my first post on this forum. If I could improve in any way, please let me know.


  • #2
    You'll increase your chances of a helpful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex. Also try to keep your posting as short as possible.

    If the count data has a large number of values, then many would ignore the count issue and treat these as continuous. However, if you want the exponential functional form, then poisson may be the easiest way to get it in Stata. [Some on this website question the automatic use of negative binomial estimators in cases of over dispersion - search for poisson or dispersion on the Forum and you'll find the discussions.]
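For reference, the exponential functional form is the log-linear mean that Poisson regression fits. With a single time variable t (for example, a running month count), the model and the incidence-rate ratio that the irr option reports are:

```latex
\log \mathrm{E}[y_t \mid t] = \beta_0 + \beta_1 t
\quad\Longleftrightarrow\quad
\mathrm{E}[y_t \mid t] = e^{\beta_0 + \beta_1 t},
\qquad
\mathrm{IRR} = \frac{\mathrm{E}[y_{t+1} \mid t+1]}{\mathrm{E}[y_t \mid t]} = e^{\beta_1}
```

An IRR of, say, 1.01 would mean that the expected count is multiplied by 1.01 (about 1% higher) for each additional month. Overdispersion does not make the Poisson coefficient estimates inconsistent, provided this mean is correctly specified. It mainly affects the standard errors, which is one reason vce(robust) is often paired with poisson.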

    What exactly do you mean by "model trends over time"? Are you just interested in running a single kind of case versus time? If so, you probably want to create a year-month variable and use it instead of year and i.month. What you've done would enter year as a single linear term and add dummies for month of year, which would only make sense if there are within-year cycles.
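Here is a minimal sketch of the single-trend version. It assumes monthly counts in variables named cases, year and month, as in the original post. So that the block runs on its own, the data are simulated (hypothetical years 2013 to 2017, arbitrary seed and coefficients). Drop the simulation lines and use your own data instead, and note that clear drops whatever is in memory. In the poisson line, irr reports exponentiated coefficients (incidence-rate ratios) and vce(robust) requests Huber/White sandwich standard errors. The older robust option does the same.

```stata
* Hypothetical simulated data: 60 months, 2013m1 to 2017m12
clear
set seed 12345
set obs 60
generate year  = 2013 + floor((_n - 1)/12)
generate month = mod(_n - 1, 12) + 1

* One running monthly date instead of year plus i.month
generate mdate = ym(year, month)
format mdate %tm

* Simulated resistant counts with a slow log-linear upward trend
generate cases = rpoisson(exp(3 + 0.01*(mdate - ym(2013, 1))))

* Log-linear trend in mdate: the IRR is the multiplicative change per month
poisson cases mdate, irr vce(robust)

* Implied rate ratio per year (12 months), i.e. the monthly IRR to the power 12
lincom 12*mdate, irr
```

With this coding, the IRR for mdate applies per month. Rescaling the time variable (or using lincom as above) changes only the unit, not the fit.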

    Whether you need to add other variables depends completely on the point of the analysis. What do you want to learn? Are you testing a model to explain cases or do you just want to predict, or what? Most of the time, we do analyses to try to understand how iv's influence the dv, in which case you'll want the important iv's in the model. Given the point of the analysis, you would want to look at how such analyses are handled in your discipline.

    After any Stata estimator, you can use the margins command to generate predicted values (for specified values of the iv's) and marginsplot to plot those values. Alternatively, you could use predict and then twoway graph it versus the iv, but this does not give as neat a graph.
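Here is a sketch of both routes, again on simulated monthly counts (hypothetical 2013m1 to 2017m12, arbitrary seed and coefficients). margins evaluates the predicted mean count at chosen values of mdate, which are numeric month codes: 636 is 2013m1 and 690 is 2017m7. marginsplot then draws these predictions with confidence intervals. predict followed by twoway gives a month-by-month fitted line over the observed counts.

```stata
* Hypothetical simulated data: 60 months of counts with a log-linear trend
clear
set seed 12345
set obs 60
generate mdate = ym(2013, 1) + _n - 1
format mdate %tm
generate cases = rpoisson(exp(3 + 0.01*(mdate - ym(2013, 1))))

poisson cases mdate, vce(robust)

* Predicted mean counts every 6 months from 2013m1 (636) to 2017m7 (690)
margins, at(mdate = (636(6)690))
marginsplot, xlabel(636(12)684, format(%tm))

* Alternative: predicted counts for every month, overlaid on the observed counts
predict mu, n
twoway (scatter cases mdate) (line mu mdate, sort), xtitle("Month") ytitle("Resistant cases")
```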



    • #3
      Thank you for your advice on how to ask my question better, Phil. This is hugely appreciated.

      By modelling trends over time, I mean I'd like to understand what is happening to resistance patterns over time. I have 5 years' worth of data and I'd like to understand what is happening across these months and years. You're right that it would probably be easier to combine year and month into a single variable.

      I understand that adding other variables depends on the point of the analysis. I am testing this model simply to understand resistance patterns over time and don't want to look at reasons for resistance yet. Therefore perhaps I don't need other variables in the model at this time.

      Thank you for introducing me to the marginsplot command to plot a graph. I did not know about this command before.

      Thanks for your time and useful input.



      • #4
        Phil gave excellent advice but you haven't followed it all! In particular, without a data example, I can't be certain, but I guess that your month variable does mean month of year and is numerically 1 to 12.

        If so I might first test a simple model in terms of mdate (say) from

        Code:
        gen mdate = ym(year, month)
        format mdate %tm 
        except that lumping patients together may be medically far too crude, so some panel model seems called for. Yet again, patients at any one time may be in quite different phases of infection, so is there a case for patient-specific times?

        Using an indicator variable for month of year is a favourite device of economists. I doubt that bacteria know the calendar but both they and the patients may be sensitive to time of year. A more parsimonious way to model seasonality is in terms of sines and cosines. See https://www.stata-journal.com/sjpdf....iclenum=st0116 for an introduction.

        Trend and seasonality can be examined in a more complicated model.
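Here is a minimal sketch of such a combined model. It uses one sine/cosine pair (the first annual harmonic, period 12 months) in place of 11 month indicators, in the spirit of the trigonometric approach mentioned above. The data are simulated (hypothetical), with a seasonal peak around December built in. The names sin1 and cos1 are illustrative. Further harmonics (for example, period 6 months) could be added if one pair is not enough.

```stata
* Hypothetical simulated data: trend plus an annual cycle peaking in December
clear
set seed 12345
set obs 60
generate mdate = ym(2013, 1) + _n - 1
format mdate %tm
generate month = month(dofm(mdate))
generate cases = rpoisson(exp(3 + 0.01*(mdate - ym(2013, 1)) + 0.3*cos(2*_pi*month/12)))

* First annual harmonic: 2 parameters instead of 11 month indicators
generate sin1 = sin(2*_pi*month/12)
generate cos1 = cos(2*_pi*month/12)

* Trend and seasonality in one log-linear model
poisson cases mdate sin1 cos1, irr vce(robust)

* Joint test of whether the seasonal terms are needed on top of the trend
test sin1 cos1
```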

        As you're new here I will stress that I am not a medic or medical statistician or epidemiologist.
        Last edited by Nick Cox; 30 Jul 2018, 08:45.



        • #5
          Hannah:
          as an aside to the previous excellent advice, it looks strange to ask for a regression model specification after you've obtained the dataset.
          Surely other researchers have already published on the same topic and might be inspiring as far as predictors are concerned.
          As Nick said, sines and cosines can be used here. I've read about (but not tried myself) their use in Poisson regression to model hospital admissions in Italy due to seasonal flu outbreaks (https://doi.org/10.1177/2284240318777148; Warning: the article is in Italian with an abstract in English).
          However, empirical count data are frequently overdispersed, in which case the researcher should switch to a negative binomial distribution.
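Here is a sketch of one way to check for overdispersion. It uses simulated (hypothetical) monthly counts that are deliberately made overdispersed via a gamma-Poisson mixture. After poisson, estat gof reports deviance and Pearson goodness-of-fit tests. nbreg reports a likelihood-ratio test of alpha = 0 (no overdispersion) below its coefficient table. Whether to switch estimators, or to stay with poisson plus robust standard errors, remains a judgment call.

```stata
* Hypothetical simulated data: gamma-Poisson mixture gives overdispersed counts
clear
set seed 12345
set obs 60
generate mdate = ym(2013, 1) + _n - 1
format mdate %tm
generate mu = exp(3 + 0.01*(mdate - ym(2013, 1)))
generate cases = rpoisson(mu * rgamma(2, 0.5))

* Poisson fit and goodness-of-fit tests
poisson cases mdate, irr
estat gof

* Negative binomial fit; the LR test of alpha = 0 appears below the table
nbreg cases mdate, irr
display "alpha = " e(alpha)
```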
          Kind regards,
          Carlo
          (Stata 19.0)



          • #6
            Dear Nick. Thank you for your input. I have used your excellent code and created a variable that is year and month combined. I have also gone away and tried to create a better question, and will add an excerpt of the dataset and the code I am using. Also, the linked PDF regarding sines and cosines will definitely come in handy. I will have a read of this.

            Dear Carlo. Thanks also for your paper. I understand your point regarding negative binomial regression. I have looked into this quite extensively and have stayed with Poisson regression, as this is what my research group have advised me since I posted this question.

            Thanks all!

