  • Ordering dates

    I have what's probably a simple issue for most. I'm using data from experimental games and trying to create an ordering variable based on dates, so that all observations recorded on the first date, 14may2013, would be coded as 1, those on the second date, 15may2013, as 2, those on the third date, 17may2013, as 3, etc. Note that not all dates are consecutive (that is, the next date is not always date + 1). There are anywhere from 24 to 45 observations per date. I've been tinkering with the egen seq() and levelsof commands, but to no avail. Any tips would be much appreciated. Let me know if any additional clarification is needed.

  • #2
    Erin,

    Check out egen rank().

    Regards,
    Joe


    • #3
      Contrary to Joe, I don't think this sounds like a ranking problem at all.

      egen, group()
      will use the sort order of a date variable just like any other numeric variable and map groups to 1, 2, 3, ....

      For some purposes and for some users it is really puzzling that date variables are just numeric variables decorated with particular display formats. Often, however, and in particular here, what you want comes automatically and there is nothing special needed just because you have dates.
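
      As a minimal sketch of the point above, the following uses a few made-up observations; the variable names sdate, visit_date, and order are assumptions for illustration, not names from the original data. It shows that group() numbers the distinct dates 1, 2, 3, ... in date order even though the dates are not consecutive.

      ```stata
      * Hypothetical toy data: three distinct dates, not all consecutive
      clear
      input str9 sdate
      "14may2013"
      "14may2013"
      "15may2013"
      "17may2013"
      "17may2013"
      end

      * Convert the string to a numeric daily date and give it a date display format
      gen visit_date = date(sdate, "DMY")
      format visit_date %td

      * group() maps the distinct values to 1, 2, 3, ... in sort order
      egen order = group(visit_date)
      list visit_date order, sepby(order)
      ```

      Because a daily date is stored as an ordinary number, nothing date-specific is needed in the egen call itself.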


      • #4
        Thanks, Joe and Nick. Nick, your suggestion was what I was looking for.


        • #5
          Nick is, as always, correct; I always think of egen group() as something to use to make unique groups out of multiple variables, but it also has its uses with a single variable.

          I will quibble with Nick a little on this:
          "I don't think this sounds like a ranking problem"
          It actually is a ranking problem, just not in the statistical/Stata sense. In vernacular speech we often say that several people/things were "tied for first" or "tied for second". In a case, for example, where there are three people tied for first, we don't typically think of the next person as having come in fourth. For the purpose of awarding prizes, they may receive less than second prize money because the first prize money needs to be divided three ways, but as far as their actual achievement is concerned, they came in second.

          I have wished from time to time that Stata had an option in the egen rank() command that accommodated this interpretation (and apparently this wish has clouded my reality). Even though it is not a common statistical interpretation, it still has practical value in situations like the one that Erin describes. To me, egen group() is not an intuitive choice because it is not obvious until you read the details of the command that the groups are ordered in the sort order of the original variable. Maybe it's just me....


          • #6
            A follow-up question to the above:

            I've now ranked my observations based on dates using egen, group(); however, the inability to use -by- with egen, group() is holding me up a bit. I now want to rank observations (villages) within sectors, according to the dates visited. So something along the lines of:

            Code:
            bys sector: egen order = group(date_ass)
            Do you know of any workarounds?


            • #7
              We are disagreeing amiably about a very small patch of territory.

              History shows that many of the options in egen, rank() go back to programs that Rich Goldstein (active here too) and I wrote nearly 20 years ago, so I am partly to blame for any deficiencies here.

              I am afraid also that I tend to equate "ranking problem" with "what Stata calls a ranking problem", which is no doubt myopic. Actually there are many situations in which the next person after three who tie for first is called fourth in rank. This is precisely how I and my contemporaries were ranked in British secondary (high) school in the 1960s and I've seen the same rule called "schoolmaster's rank", although on several other grounds, not least sexism, that is a terrible term.

              But more narrowly, the reason why rank() doesn't do this in egen is precisely that group() does it anyway.

              Nothing is familiar until it's familiar, but group() is something I use almost daily, so it doesn't puzzle me. I can't think of a systematic order to use other than sort order, but there you go.
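
              To make the distinction in this exchange concrete, here is a small sketch on made-up scores (the variable names score, r_default, r_track, and r_group are assumptions for illustration). It compares the default behavior of egen, rank(), its track option, and egen, group() when three observations tie for the lowest value.

              ```stata
              * Hypothetical toy data: three observations tie for the lowest value
              clear
              input score
              10
              10
              10
              20
              30
              end

              * Default rank(): tied observations share the average of their ranks
              egen r_default = rank(score)

              * track: the lowest value is ranked 1 and ties are not corrected,
              * so the observation after a three-way tie for first is ranked 4
              egen r_track = rank(score), track

              * group(): distinct values are numbered 1, 2, 3, ... in sort order,
              * so the observation after a three-way tie for first is numbered 2
              egen r_group = group(score)

              list score r_default r_track r_group
              ```

              Here r_track corresponds to the rule under which the next person after a three-way tie is fourth, while r_group gives the "came in second" interpretation.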


              • #8
                Thanks, Nick. Per my question above, does anyone have a suggested workaround for grouping further by another variable (sector) when -by- can't be used?


                • #9
                  This is the trouble with egen; it is a Swiss Army knife but there is no way it can include all possible small tools.

                  So you need to use standard technique with by:

                  Code:
                  bys sector (date_ass): gen new_order = sum(date_ass != date_ass[_n-1])
                  We first score 1 in every observation in which the date is not the same as the previous date (and 0 in every observation in which it is the same). This works at the beginning of a panel too, because the previous date for the first observation of a panel can only be returned as missing, and missing != first date. If the scores within each panel run 1, 0, ..., 1, 0, ..., or whatever, then their cumulative sum runs 1, 1, ..., 2, 2, ... or whatever.
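
                  The two steps described above can be sketched separately on a few made-up observations. The variable names sector and date_ass follow the thread; sdate and flag are assumptions introduced only for illustration, and the one-line version above does the same thing without the intermediate variable.

                  ```stata
                  * Hypothetical toy data: two sectors, repeated visit dates within each
                  clear
                  input sector str9 sdate
                  1 "14may2013"
                  1 "14may2013"
                  1 "17may2013"
                  2 "15may2013"
                  2 "20may2013"
                  2 "20may2013"
                  end
                  gen date_ass = date(sdate, "DMY")
                  format date_ass %td

                  * Step 1: within each sector, sorted by date, score 1 where the date
                  * differs from the previous observation; date_ass[_n-1] is missing for
                  * the first observation in each sector, so that observation scores 1
                  bysort sector (date_ass): gen flag = date_ass != date_ass[_n-1]

                  * Step 2: the running sum of the scores is the order within the sector
                  bysort sector (date_ass): gen new_order = sum(flag)

                  list sector date_ass flag new_order, sepby(sector)
                  ```

                  In this toy data the scores are 1, 0, 1 in sector 1 and 1, 1, 0 in sector 2, so new_order runs 1, 1, 2 and 1, 2, 2 respectively.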

                  For the same stuff (and some roundabout and incorrect alternatives!) see http://stackoverflow.com/questions/1...roups-in-stata

                  For a tutorial see e.g. http://www.stata-journal.com/article...article=pr0004

                  Last edited by Nick Cox; 30 Sep 2015, 10:51.
