
  • how to know a ddmmyyyy variable's corresponding value in numbers

    Dear statalist,

    I have some dates imported from Excel, stored as strings. I use
    Code:
    generate publishdate = date(publish_date, "YMD")
    format %td publishdate
    to convert them into readable, numerically encoded Stata dates.

    Now I want to keep only observations within certain periods: specifically, those with publishdate between 23jan2019 and 8apr2019 and between 23jan2020 and 8apr2020. I tried
    Code:
    keep if 23jan2019 < publishdate < 8apr2019
    and Stata says "23jan2019 invalid name". I think I need those dates expressed as numbers (I vaguely remember that a date stands for the number of days since a certain date way back in the 1980s?). How can I find or calculate this number?

    Thanks a lot for any help!

  • #2

    Code:
    keep if inrange(publishdate, date("24jan2019", "DMY"), date("7apr2019", "DMY"))
    is one solution that follows from code you have already used.

    Here is another

    Code:
     
    keep if inrange(publishdate, mdy(1, 24, 2019), mdy(4, 7, 2019))
    See also

    Code:
    help td()

    .
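
    As a rough sketch of what td() offers: it turns a date literal into the underlying number, which counts days since 01jan1960 (not the 1980s), so display will show that number directly. The inclusive bounds 24jan and 7apr mirror the strict inequalities asked for in #1, and the second inrange() covers the 2020 window. Both are my reading of the question, not something tested on the actual data.

    Code:
    * show the number behind a date literal: 21572
    display td(23jan2019)
    * and the date behind a number: 23jan2019
    display %td 21572
    * keep both windows, bounds inclusive
    keep if inrange(publishdate, td(24jan2019), td(07apr2019)) | inrange(publishdate, td(24jan2020), td(07apr2020))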



    • #3
      Thanks it works!



      • #4
        As an aside, I'd like to point out that the syntax -keep if a < x < b- is perfectly legal, but is not at all equivalent to -keep if inrange(x, a, b)-.

        The semantics of -a < x < b- in Stata is based on logical (Boolean) expressions and how they are handled. -a < x- is a logical expression: it is either true or false, depending on the values of a and x. If true, Stata considers it to be 1, and if false Stata considers it to be 0. So -a < x < b- will be understood as -1 < b- or -0 < b-, according to whether a is less than x or not. The final result then depends on the ordering relationship of b compared to 1 or 0. So, if we replace the illegal date representations asked about in #1 by any of their legal equivalents proposed in #2 we would get:
        Code:
        mdy(1, 24, 2019) < publishdate < mdy(4, 7, 2019)
        Now, as noted, mdy(1, 24, 2019) < publishdate will be evaluated as either 0 or 1, depending on what publishdate is. And while I don't know the exact numeric value of mdy(4, 7, 2019) offhand, I can guarantee you that it is way larger than 1. So -mdy(1, 24, 2019) < publishdate < mdy(4, 7, 2019)- will always be true, no matter what the value of publishdate is.

        The a < x < b notation is convenient in mathematics. But it should be avoided in Stata because, even though legal, its meaning is quite different from the mathematical expression, and it takes some effort to figure out exactly how it will work in given instances.
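
        A quick way to see this for yourself, as a sketch: the two test dates below (1jan2000 and 1jan2030) are arbitrary values I picked to fall outside the window, and the bounds are the ones from #2. Both display commands should show 1, even though neither date is in range. The last line is one way to write the strict test that #1 seems to intend.

        Code:
        * (lower bound < x) is 0 or 1, and both are below the upper bound
        display mdy(1, 24, 2019) < mdy(1, 1, 2000) < mdy(4, 7, 2019)
        display mdy(1, 24, 2019) < mdy(1, 1, 2030) < mdy(4, 7, 2019)
        * the intended strict test, with each comparison written out
        keep if publishdate > mdy(1, 23, 2019) & publishdate < mdy(4, 8, 2019)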



        • #5
          In addition, inrange(x, a, b) would in mathematics (but not Stata) be a <= x <= b
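
          A small sketch of both halves of that remark, as I understand the documented behavior: the endpoints count as inside the range, and a missing first argument is never in range, which is where Stata departs from a plain a <= x <= b (Stata treats missing as larger than any number). The first line should display 1 and the second 0.

          Code:
          * both endpoints are included
          display inrange(td(24jan2019), td(24jan2019), td(07apr2019))
          * a missing value is not in range
          display inrange(., td(24jan2019), td(07apr2019))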

