  • Counting observations over duplicate dates

    Hi all,

    I'm an economics student in college, doing my best to learn Stata. I've browsed the forum and googled everything I could; it took me too long to realize I should just ask. I'm not extremely well versed in Stata, but I know my way around. I feel that what I'm trying to do is pretty basic, so hopefully any help I receive will seem familiar to me.

    In short, I'm working toward an interrupted time series analysis of a speed limit increase using crash data. My main objective right now is to count the number of crashes that occurred in each month so I can graph them. I have the date in several variables (as a date, date(crash_datetime "MDYhm"), and as separate variables for month, year, etc.).

    Since there are multiple crashes on any given day, I need to count the observations in a given month/year, but I have not been able to figure it out. My knowledge (and apparently researching ability) of egen, count, and dates is inadequate.

    Any help or nudge in the right direction is much appreciated,

    Caleb

  • #2
    I'm going to assume that each observation in your data set refers to a single crash.

    Code:
    gen mdate = mofd(crash_datetime)
    format mdate %tm
    by mdate, sort: gen month_crash_count = _N
    Note: you could do this using your separate month and year variables instead of creating mdate, but it is almost always better in Stata to use the combined monthly-date variable--and you'll probably need the mdate variable for other things later in your analysis anyway.
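
    For comparison, here is a minimal sketch of the alternative mentioned in the note, run on made-up toy data. The variable names year and month are assumptions about how the separate variables are named in the real data set. ym() builds the same kind of monthly date that mofd() returns, so both routes should give identical counts.

```stata
* Toy data (hypothetical): one row per crash, with separate year and month variables
clear
input int year byte month
2015 1
2015 1
2015 2
2016 1
2016 1
2016 1
end

* Option A: count within year-month groups directly
bysort year month: gen month_crash_count = _N

* Option B: build a monthly date from the parts, then count by that
gen mdate = ym(year, month)
format mdate %tm
by mdate, sort: gen month_crash_count2 = _N

list, sepby(mdate)
```

    Option B leaves a single %tm variable that can go straight on a graph axis, which is the practical reason for preferring the combined monthly date.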


    • #3
      That worked beautifully, though now I'm not sure how to graph it appropriately, since I now have a bunch of duplicates in both mdate and month_crash_count. What do you recommend?


      • #4
        After a little trial and error, I tried -twoway (line month_crash_count mdate)-. Is there a better way?


        • #5
          Welcome to Statalist.

          "I've browsed the forum and googled everything I could"
          Since you omit the word "documentation" from your past activities, let me suggest the following to improve your Stata expertise.

          When I began using Stata in a serious way, I started - as others here did - by reading my way through the Getting Started with Stata manual relevant to my setup. Chapter 18 then gives suggested further reading, much of which is in the Stata User's Guide, and I worked my way through much of that reading as well. All of these manuals are included as PDFs in the Stata installation (since version 11) and are accessible from within Stata - for example, through Stata's Help menu.

          The objective in doing this was not so much to master Stata as to be sure I'd become familiar with a wide variety of important basic techniques, so that when the time came that I needed them, I might recall their existence, if not the full syntax, and know how to find out more about them in the help files and manual.

          Stata supplies exceptionally good documentation that amply repays the time spent studying it - there's just a lot of it. The path I followed surfaces the things you need to know to get started in a hurry and to work effectively.

          Stata also supplies YouTube videos, if that's your thing.

          One particularly relevant piece of documentation is that concerning Stata's "date and time" variables, which are complicated and take a lot of learning, as you have discovered in this problem. If you have not already read the very detailed Chapter 24 (Working with dates and times) of the Stata User's Guide PDF, do so now. If you have, it's time for a refresher. After that, the help datetime documentation will usually be enough to point the way. You can't remember everything; even the most experienced users end up referring to the help datetime documentation or back to the manual for details. But at least you will get a good understanding of the basics and the underlying principles. It is an investment of time that will be amply repaid.
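
          As a small illustration of why the type of date variable matters, here is a sketch on made-up strings. The variable names (raw, crash_dt, crash_date, mdate) are hypothetical and not taken from the original data. clock() yields a datetime in milliseconds, dofc() converts that to a daily date, and mofd() converts a daily date to a monthly date. mofd() expects a daily date, so a datetime has to pass through dofc() first.

```stata
* Toy data (hypothetical): date-and-time strings in month/day/year hour:minute order
clear
input str20 raw
"1/15/2015 08:30"
"1/15/2015 17:45"
"2/3/2015 12:00"
end

* Datetime (%tc): milliseconds since 01jan1960 00:00:00; store as double
gen double crash_dt = clock(raw, "MDYhm")
format crash_dt %tc

* Daily date (%td): days since 01jan1960
gen crash_date = dofc(crash_dt)
format crash_date %td

* Monthly date (%tm): months since 1960m1
gen mdate = mofd(crash_date)
format mdate %tm

list
```

          Each variable displays sensibly only with the format that matches its units, which is why a format statement follows every conversion.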


          • #6
            To get a clean graph with just one point per month:
            Code:
            egen month_flag = tag(mdate)
            graph twoway line month_crash_count mdate if flag


            • #7
              It returns "flag not found." I changed it to "if month_flag" and it returned a graph, though I'm not sure if that was the correct fix.


              • #8
                Yes, sorry, I meant -if month_flag-. That was the correct fix.
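
                For reference, the steps from this thread can be put together as the sketch below, using the corrected variable name. It runs on made-up toy data: the string variable d and the daily date crash_date are hypothetical stand-ins for the real crash data, which is assumed to hold one observation per crash.

```stata
* Toy data (hypothetical): one row per crash
clear
input str10 d
"1/5/2015"
"1/20/2015"
"2/11/2015"
"3/2/2015"
"3/9/2015"
"3/30/2015"
end
gen crash_date = date(d, "MDY")
format crash_date %td

* Monthly date and number of crashes in each month
gen mdate = mofd(crash_date)
format mdate %tm
by mdate, sort: gen month_crash_count = _N

* Flag exactly one observation per month and plot only those
egen month_flag = tag(mdate)
twoway line month_crash_count mdate if month_flag, sort
```

                If only the monthly series is needed, -contract mdate, freq(month_crash_count)- would reduce the data to one row per month, at the cost of replacing the crash-level data in memory.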
