  • Help with time series data analysis

    Hello all,
    Statistics novice here trying to find the best test to use for my data. I am currently looking at the number of cases of a disease presenting to emergency departments every year from 2010 to 2020. I would like to find the most appropriate test to see if there is a significant increase from 2019 to 2020. There has been a gradual increase in the number of cases in prior years, but there is a huge jump in 2020 and I would like to know if this is significant. Any thoughts?

  • #2
    If the underlying process is normally a simple constant growth rate over the years, and you wish to see if 2020 substantially broke the trend, you could run something like this:

    Code:
    poisson case_count year 2020.year, irr vce(robust)
    This assumes that you have one observation for each year, case_count is the number of cases presenting to the ED in the year, and year is the year itself (2010 to 2020). The results for 2020.year will tell you the extent (incidence rate ratio) to which the incidence in 2020 departed from what would be expected by continuation of the constant growth trend.
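A sketch of the model that command fits, written out to make the role of 2020.year explicit. The symbols $\beta_0$, $\beta_1$, $\beta_2$ are notation introduced here for illustration, not names that appear in the Stata output:

```latex
\log E[\text{case\_count}_t] = \beta_0 + \beta_1\,\text{year}_t + \beta_2\,\mathbf{1}[\text{year}_t = 2020],
\qquad \text{IRR}_{2020} = e^{\beta_2}
```

With one observation per year, the 2020 indicator absorbs that year entirely, so the trend $\beta_1$ is effectively estimated from 2010 to 2019, and $e^{\beta_2}$ is the ratio of the observed 2020 count to the count the trend would project for 2020.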

    This is a simple model. There may be other factors that influence the annual incidence of the disease, and they are not accounted for in this model. Then again if you have only 11 observations, you don't have enough data to model the influence of any other factors anyway. Indeed, it is a stretch just doing this analysis.

    To get the most out of your Statalist posts, it is better not to pose questions that leave the nature of the data to the reader's imagination. If your data do not match my description, then the above code may not be helpful. The simplest way to provide good information about the kind of data you are working with is to post example data using the -dataex- command. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
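As a minimal sketch of that workflow, assuming the variables are named case_count and year as in the description above (both names are guesses about the poster's data):

```stata
* Only needed if -dataex- is not already part of the installation
ssc install dataex

* Write a copy/paste-ready example of the two variables to the Results window
dataex year case_count
```

The output, including its own code delimiters, is meant to be copied from the Results window and pasted as text into the post.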


    • #3
      Hey Clyde, thank you so much for your help. Good to know about -dataex-; I will definitely use it going forward when posting here. My data are below. If the total number of cases (including both dog bites and non-dog bites) varies from year to year, is it still valid to use the raw number of bites in my model, or should I use ratios to normalize over the 10-year period?


      • #4
        Apologies, it appears this didn't make it into the last post.
        Attached Files


        • #5
          OK. Thanks. You didn't quite use -dataex- as intended. The idea is not to post a screenshot of the -dataex- output but to copy/paste it from the Results window into the edit window here on Statalist. That way I'd be able to copy/paste it into my do-file editor and actually work with your data. A screenshot can't be imported.

          Nevertheless, for present purposes, this is great information. The data you have is much richer--it looks like each observation corresponds to a single ED visit and includes information about the patient demographics as well as information about whether or not the visit was for a bite. Each observation also contains summary data for the year (total number of visits, total number of bites, and the ratio of those.)

          So, first you need to be clear on what you want to find out. Do you want to know if the total number of bite visits jumped in 2020, or if the proportion of visits that are for bites jumped? Those questions might have different answers. The proportion question would be less influenced by variation in overall ED utilization over time, so the answers might be easier to understand.
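One way to look at both quantities side by side before choosing a model is sketched below. It assumes the visit-level data are in memory and that bite is a 0/1 indicator for a bite visit, which is an inference from the description rather than something confirmed in the thread:

```stata
* Per year: number of bite visits (sum), proportion of visits
* that are bites (mean), and total number of visits (n)
tabstat bite, by(year) statistics(sum mean n)
```

If the sum column and the mean column tell different stories for 2020, the count question and the proportion question have different answers.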

          I vaguely remember learning once that, at least among children, dog bites are more common in boys than girls. So one might want to refine the analysis looking at what's going on separately in boys, girls, and adults. I don't think there's a sex difference in dog bite incidence among adults. I'm not sure if there are racial disparities. But don't trust my fading memory about these demographic issues--see what the literature says about that.

          Anyway, I'm thinking along these lines:

          Code:
          // PROPORTION OF BITE VISITS AMONG ADULTS
          logistic bite year 2020.year if peds == 0
          
          // PROPORTION OF BITE VISITS AMONG BOYS
          logistic bite year 2020.year if peds == 1 & Sex == 1
          
          // PROPORTION OF BITE VISITS AMONG GIRLS
          logistic bite year 2020.year if peds == 1 & Sex == 2
          Note: In the above I guessed that Sex is coded as 1 = Male and 2 = Female. If I got that backwards, change the code accordingly. If Race is a relevant factor, I think it is clear how to modify the code to take it into account. If you want an overall assessment not broken down by demographic categories, just don't use an -if- clause.
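A sketch of the two modifications mentioned in the note, assuming Race is stored as a numeric categorical variable (its actual coding is not shown in the thread):

```stata
// OVERALL ASSESSMENT, NO DEMOGRAPHIC BREAKDOWN
logistic bite year 2020.year

// ADULTS, ADJUSTING FOR RACE AS A CATEGORICAL COVARIATE
logistic bite year 2020.year i.Race if peds == 0
```

If Race is a string variable, it would first need to be converted to numeric, for example with -encode-, before i.Race can be used.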

          In each output, the odds ratio for 2020.year will tell you how much greater (or smaller) the odds that a visit was for a dog bite were in 2020 than would be expected based on the time trend that otherwise prevailed over the period.


          • #6
            Sir,
            Thank you kindly for sharing your knowledge. It appears that logistic regression does indeed work better if my hypothesis is exploring a change in the proportion of visits. If I also wanted to look at changes in the number of cases, would Poisson be the test of choice?

            edvisits:
            year | mean
            ---------+----------
            2010 | 3.23e+08
            2011 | 3.12e+08
            2012 | 3.11e+08
            2013 | 2.83e+08
            2014 | 2.76e+08
            2015 | 2.67e+08
            2016 | 2.83e+08
            2017 | 2.94e+08
            2018 | 2.66e+08
            2019 | 2.63e+08
            2020 | 2.09e+08
            ---------+----------
            Total | 2.84e+08
            --------------------


            • #7
              Yes. Here's how I would approach that. I would first reduce the dataset to grouped data.

              Code:
              collapse (sum) bite_cases = bite (count) visits = bite, by(peds Sex year)
              poisson bite_cases year 2020.year, irr exposure(visits) // ALL COMERS
              Use the various -if- conditions to get results for adults, boys, and girls. (You only have to do the collapse once; the collapsed data are then good for all of the Poisson regressions.) The -exposure(visits)- option normalizes the counts by the number of visits, which accounts for variation in overall ED utilization.
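Spelled out, the subgroup versions might look like the sketch below. It reuses the -if- conditions from the logistic models in #5 and carries over the same guess that Sex is coded 1 = Male and 2 = Female:

```stata
// ADULTS
poisson bite_cases year 2020.year if peds == 0, irr exposure(visits)

// BOYS
poisson bite_cases year 2020.year if peds == 1 & Sex == 1, irr exposure(visits)

// GIRLS
poisson bite_cases year 2020.year if peds == 1 & Sex == 2, irr exposure(visits)
```

With -exposure(visits)-, Stata includes ln(visits) in the model with its coefficient constrained to 1, so the incidence rate ratios describe bites per visit rather than raw counts.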
