Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • assign a random range of dates to one particular month

    I am trying to assign a range of dates to the month that the most days occur in. The ranges are random (but are usually about 1 month).

    This is not the cleanest way to read in some random dates, but it works.

    Code:
    clear
    input obs str8 firstdate str8 seconddate
    1 "20180712" "20180803"
    2 "20180804" "20180903"
    3 "20180904" "20181005"
    4 "20181006" "20181028"
    5 "20181029" "20181128"
    6 "20181129" "20190103"
    end
    
    destring firstdate seconddate, replace
    gen year = floor(firstdate/10000)
    gen month = floor(mod(firstdate, 10000)/100)
    gen day = mod(firstdate, 100)
    gen year2 = floor(seconddate/10000)
    gen month2 = floor(mod(seconddate, 10000)/100)
    gen day2 = mod(seconddate, 100)
    
    gen first_date = mdy(month, day, year)
    gen second_date = mdy(month2, day2, year2)
    drop year* month* day* firstdate seconddate
    
    format first_date second_date %tdCY-N-D
    I would like a new variable/column called "month" that returns the month that the most days between first_date and second_date occur in. The answers should be numeric (1 represents january, 2 represents february and so on). The "month" variable/column for the 6 ranges in the dataex example should be 7, 8, 9, 10, 11, 12.

    Thanks in advance.

  • #2
    Suppose a range include, say, both July and August. Should the answer be 7 or 8? Whenever you are looking for the "biggest" or "longest" or "largest" ... anything, you always have to have a rule for breaking ties.

    Comment


    • #3
      Thanks Clyde.

      Whichever month has more days in the range. So if the range included 17 days in Jul and 13 days in Aug, July would be the answer. In the case of tie, lets go with the first month. So if there were 15 days in each of Jan and Feb, then the answer would be Jan.

      Comment


      • #4
        Yes, but what if the range were from June 15 to September 10? Then both July and August would be in the range, and both are 31 days, which is longer than the number of days covered in either June or September. So is it July or is it August?

        Comment


        • #5
          I see. There are not cases where it exceeds 60 days. It could take place in 3 months, but for sure would be only a portion of 2 of those months (first and last) and the whole of the third (middle) month.

          If we need to account for the situations you asked, the first full month would be the correct answer. In your case, July.

          Sorry for not being more clear. Appreciate your questions.

          Comment


          • #6
            Thanks for the clarification.

            Code:
            foreach v of varlist firstdate seconddate {
                gen _`v' = daily(`v', "YMD"), after(`v')
                assert missing(_`v') == missing(`v')
                format _`v' %td
                drop `v'
                rename _`v' `v'
            }
            gen expander = mofd(seconddate) - mofd(firstdate) + 1
            expand expander
            by obs, sort: gen mm = mofd(firstdate) + _n - 1
            replace mm = dofm(mm)
            format mm %td
            by obs (mm), sort: assert mofd(seconddate) == mofd(mm[_N])
            gen dim = daysinmonth(mm)
            by obs (mm), sort: replace dim ///
                = lastdayofmonth(firstdate) - firstdate + 1 if _n == 1
            by obs (mm): replace dim ///
                = seconddate - firstdayofmonth(seconddate) + 1 if _n == _N
            gsort obs dim -mm
            by obs: keep if _n == _N
            gen wanted = month(mm)

            Comment


            • #7
              Thanks Clyde. Very slick.

              Comment

              Working...
              X