  • Group by order of event occurrence

    Hi,
    It may be a relatively simple problem to order dates, but I cannot find an elegant solution.
    I have the dates of several events. The events can occur in any order.
    For each observation, I would like to identify the position of each event in the sequence of events (1st, 2nd, 3rd, etc.), and then report event_1 1st n,%, event_1 2nd n,%, event_1 3rd n,%....., event_2 1st n,%, event_2 2nd n,%, ….. etc.
    My current solution is clumsy: I loop over the observations to find each event's position in the sequence.
    Thank you,
    Horia

  • #2
    I can imagine several ways your data might be organized and coded, each of which would call for different code. I don't think anyone can give you specific advice until you post some example data. Use the -dataex- command to do that.

    If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    • #3
      Good evening Clyde,

      The following is a sample of my dataset. All dates correspond to an event.
      I am using version 16.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input byte id int(dodx doctreo_s dlan_s dsx dcx)
       1 19667     . 20423 19824     .
       3 17829 18141 19905     .     .
       4 20431     . 20417 20431     .
       5 20328     . 20388     .     .
       6 20361     . 20408 20391     .
       7 13515     . 20480 14901     .
       8 19632     . 20458     . 19663
       9 20444     . 20464     .     .
      10 19787     . 20492 20026     .
      11 17667     . 20494 17834     .
      12 20405 21686 20543     .     .
      end
      format %dM_d,_CY dodx
      format %dM_d,_CY doctreo_s
      format %dM_d,_CY dlan_s
      format %dM_d,_CY dsx
      format %dM_d,_CY dcx

      I appreciate the help received from the Stata community,
      Horia

      • #4
        https://www.stata-journal.com/articl...article=pr0046 may help. See the rowsort and rowranks commands.

        That said, you may well be better off with a reshape long.

        • #5
          Building on Nick's suggestions, I would recommend going to long layout. While -rowsort- and -rowranks- can find the ordering you need, the reporting you want to do will be easier in long layout, as will almost anything else you want to do afterwards. Wide layout is only useful in limited circumstances in Stata, and unless you know that you are doing something that is best done wide, your default data organization should be long.

          One issue in converting to long is identifying all your date variables. There are a couple of easy ways that work in your example but might not work in your real data: they are all the variables except id, and they are all the variables that start with d. A more robust way, perhaps, is to exploit the fact that they are dates and have a date display format. I will rely on that last approach:

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input byte id int(dodx doctreo_s dlan_s dsx dcx)
           1 19667     . 20423 19824     .
           3 17829 18141 19905     .     .
           4 20431     . 20417 20431     .
           5 20328     . 20388     .     .
           6 20361     . 20408 20391     .
           7 13515     . 20480 14901     .
           8 19632     . 20458     . 19663
           9 20444     . 20464     .     .
          10 19787     . 20492 20026     .
          11 17667     . 20494 17834     .
          12 20405 21686 20543     .     .
          end
          format %dM_d,_CY dodx
          format %dM_d,_CY doctreo_s
          format %dM_d,_CY dlan_s
          format %dM_d,_CY dsx
          format %dM_d,_CY dcx
          
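          * Collect every variable that has a date display format
          ds, has(format %d*)
          local date_vars `r(varlist)'
          
          * Prefix each date variable with "when" so -reshape- can use it as a stub
          rename (`date_vars') when=
          * One row per id and event; event holds the original variable name
          reshape long when, i(id) j(event) string
          * Drop events that did not occur for this id
          drop if missing(when)
          * Number the events within id from earliest to latest
          * (events with tied dates are ordered arbitrarily)
          by id (when), sort: gen chron_order = _n
          
          * Which events fill each position, then which positions each event takes
          by chron_order, sort: tab event
          by event, sort: tab chron_order

          Now, I am not sure what you have in mind when you say "report event_1 1st n,%, event_1 2nd n,%, event_1 3rd n,%....., event_2 1st n,%, event_2 2nd n,%, ….. etc." But I imagine it is the output of one of the two -tab- commands at the end of this code. If that's not true, please post back illustrating what you want the output to look like.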
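
          A further sketch, offered as one possible reading of the request rather than a definitive answer: if the goal is a single table with one row per event and the n and % in each chronological position, a two-way -tabulate- with the row option may be enough. The variable name chron_rank is introduced here purely for illustration. It is built with -egen, rank()- and the track option so that tied dates share a position (in the example, id 4 has dodx and dsx on the same day), whereas _n breaks such ties arbitrarily. The block repeats the example data so that it runs on its own.

          Code:
          * Sketch only; chron_rank is an illustrative name, not part of the original code
          clear
          input byte id int(dodx doctreo_s dlan_s dsx dcx)
           1 19667     . 20423 19824     .
           3 17829 18141 19905     .     .
           4 20431     . 20417 20431     .
           5 20328     . 20388     .     .
           6 20361     . 20408 20391     .
           7 13515     . 20480 14901     .
           8 19632     . 20458     . 19663
           9 20444     . 20464     .     .
          10 19787     . 20492 20026     .
          11 17667     . 20494 17834     .
          12 20405 21686 20543     .     .
          end
          
          * Same long layout as above: one row per id and event that occurred
          rename (dodx doctreo_s dlan_s dsx dcx) when=
          reshape long when, i(id) j(event) string
          drop if missing(when)
          format when %td
          
          * track: the earliest date gets rank 1 and tied dates share the same rank
          bysort id: egen chron_rank = rank(when), track
          
          * One row per event: frequency and row percentage for each position
          tabulate event chron_rank, row

          If unique positions are preferred over shared ones, chron_order from the code above can be used in place of chron_rank in the -tabulate- call.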

          • #6
            Hi Nick, rowranks works great. If you don't mind, I would also be interested in seeing an example using reshape long.

            Clyde, I wonder if you have a different approach.

            Thank you
            Horia

            • #7
              Clyde Schechter gave both exhortation and example on reshape long and I am not clear what else you seek.

              • #8
                Sorry, I missed the post; the forum page was slow to refresh.
                Thank you,
                Horia
