  • Generate event/month data to use in time series graph

    Hello,

    Please could you advise? I have a large dataset in long format (more than 50,000 rows). I am trying to produce a line or bar graph with time on the x-axis (calendar month, from Jan 2017 to Dec 2022) and a count on the y-axis (number of admissions per month).
    Each row corresponds to a different admission date for a person (see the example data below). The number of admission dates per person varies from 1 up to 20 or 30 over the 5 year period in question. Each person is identified by a study ID number, and there is a variable for the admission date.

    My ID variable to identify individual people is studyid.
    My date of admission variable is addateonly, currently in %td format but easy to convert to %tm format.
    I have also included an event indicator called max_epi in case it is needed; I am not sure whether it simplifies the counting.

    Questions:
    1. Do I need to generate a new variable for the count per month, e.g. egen monthlyadmissions = count(addateonly)?
    2. If so, do I then use a simple line or bar graph command?
    3. Is there a way I can use the time-series functions (tsset and tsline) to generate the graph automatically?
    4. Finally, once the graph is created, what code should I use to convert the Stata date back to a 'real life' date, i.e. Dec 2021, Jan 2022 etc.?

    input float(studyid addateonly max_epi)
    1 21700 1
    1 21839 1
    1 22216 1
    2 21399 1
    2 21418 1
    3 21046 1
    3 21077 1
    3 21548 1
    3 21650 1
    3 21753 1
    3 21799 1
    3 21817 1
    3 21850 1
    4 21406 1
    4 22109 1
    4 22231 1
    4 22830 1
    5 21370 1
    6 20891 1
    6 20894 1
    6 20912 1
    6 20937 1
    6 21084 1
    6 21334 1
    6 21375 1
    6 21546 1
    7 21754 1
    7 21814 1
    7 21850 1
    7 22158 1
    7 22725 1
    7 22919 1
    7 22939 1
    8 21151 1
    8 22378 1
    8 22469 1
    9 22411 1
    9 22452 1
    9 22664 1
    9 22692 1
    9 22720 1
    9 22799 1
    9 22909 1
    9 22980 1
    10 22662 1
    11 22468 1
    11 22670 1
    11 22782 1
    12 20861 1
    12 20877 1
    13 20848 1
    14 20835 1
    14 22720 1
    14 22759 1
    14 22788 1
    14 22797 1
    14 22798 1
    14 22813 1
    15 20836 1
    15 20944 1
    15 21007 1
    15 22764 1
    16 20927 1
    16 21025 1
    16 21395 1
    16 21637 1
    17 20914 1
    17 21350 1
    17 21474 1
    17 21540 1
    17 21567 1
    17 21585 1
    17 21586 1
    17 22134 1
    17 22154 1
    17 22160 1
    17 22169 1
    17 22210 1
    17 22215 1
    18 21217 1
    18 21257 1
    18 21381 1
    18 21453 1
    18 21597 1
    18 21700 1
    18 22340 1
    18 22459 1
    18 22483 1
    18 22537 1
    18 22545 1
    18 22556 1
    18 22567 1
    18 22574 1
    18 22586 1
    18 22601 1
    18 22622 1
    18 22710 1
    18 22768 1
    18 22771 1
    18 22785 1
    end

    Thanks in advance

    Chris

  • #2
    I forgot to say that I am using Stata 17.0.


    • #3
      You can go

      Code:
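      * monthly date computed from the daily date
      gen mdate = mofd(addateonly)

      * number of admissions (rows) in each month
      bysort mdate : gen count = _N
      format %tm mdate

      * line chart, then bar chart, of monthly counts
      line count mdate

      twoway bar count mdate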
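
      The commands above count every admission row, so a person admitted twice in one month contributes two to that month. If a count of distinct people per month is wanted as well, the following is a minimal sketch of one way to get both, assuming only the variable names studyid and addateonly from the question. The names firstinmonth, admissions and persons are made up for illustration, and the four rows are taken from the example data. Note that collapse replaces the data in memory, so it should be run on a copy or inside preserve and restore.

      ```stata
      clear
      input float(studyid addateonly)
      17 21585
      17 21586
       1 21700
      18 21700
      end
      format %td addateonly

      * monthly date computed from the daily date
      gen mdate = mofd(addateonly)
      format %tm mdate

      * flag exactly one row per person-month (hypothetical variable name)
      egen byte firstinmonth = tag(studyid mdate)

      * one row per month: admissions counts rows, persons counts distinct people
      collapse (count) admissions = addateonly (sum) persons = firstinmonth, by(mdate)
      list

      twoway bar admissions mdate
      ```

      In this small example February 2019 has two admissions from one person, and May 2019 has two admissions from two people. Months with no admissions at all do not appear as rows after collapse.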
      Notes. Here Q3 is your 3rd question and so on.

      1. You need a new variable (Q1). Mine counts admissions from the data, ignoring the identifier, so a given person admitted two or more times in a given month is counted that many times.

      2. There is no need here for, and no advantage in, using commands (not functions) tsset and tsline (Q3).

      3. Mapping from daily dates to monthly dates is easiest with the mofd() function; it is not achieved by changing the display format, if that is what you're thinking. See e.g. https://journals.sagepub.com/doi/pdf...867X1201200415 if you want more detail on that. (Q4).
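
      To illustrate the difference between converting a date and merely changing its display format, here is a small sketch using one value from the example data (22980, which is 1 December 2022 as a daily date). The variable name notmonthly is made up for illustration. The last lines show one possible display format for monthly dates, %tmMon_CCYY, which is an assumption about the kind of label wanted in Q4.

      ```stata
      clear
      set obs 1
      gen addateonly = 22980
      format %td addateonly

      * mofd() computes a new value: months elapsed since January 1960
      gen mdate = mofd(addateonly)
      format %tm mdate
      assert mdate == ym(2022, 12)

      * changing only the display format leaves the stored value at 22980,
      * which Stata then interprets as 22980 months after January 1960
      gen notmonthly = addateonly
      format %tm notmonthly
      list

      * a display format that shows monthly dates in the style "Dec 2022"
      format %tmMon_CCYY mdate
      list mdate
      ```

      On a graph, the same display format can be applied to the axis labels only, for example with the option xlabel(, format(%tmMon_CCYY)).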
