Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Select value if date is in a date range

    Hi

    I'm trying to "link" a second date variable to my main date variable.

    I think it might be easiest to first post an example. I picked the values below myself to easier show what i want to do.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long date int exdate float(div_sum div)
    17594 17595   0   0
    17594 17594   0   0
    17596 17609 1.5 1.5
    17596 17984 1.5 4.2
    17613 17615 1.5 2.5
    17613 17617 1.5 2.5
    17615 17617 2.5 2.5
    17616 17619 2.5 4.2
    17618 17985 4.2 4.2
    17621 17626 4.2 ?
    end
    format %td date
    format %td exdate

    My main date variable is date which is in ascending order and exists for several (480) companies.

    The variable div_sum is a sum of the dividends of each company's stock that have been paid to date (i generated this earlier on).

    The variable i need help with is div. To generate this i need to match exdate with the largest value of date that is still below or equal to exdate.
    I.e. as seen in observations 1 and 2 they have an exdate below 05mar2008 but above or equal to 03mar2008. So here the value of div should be 0.
    Observation 3 has an exdate below 22mar2008, but above 05mar2008, so here the value for div would be the div_sum value for 05mar2008 which is 1.5.
    And so on - hope it makes sense.

    The main problem is that exdate isn't sorted and sometimes doesn't even match a date included in the date variable (date only contains trading dates and sometimes exdate is on a Saturday for some reason).

    I've tried a lot of things, but can't seem to figure this out, so hope you can help.


    I'm using Stata SE 12.1 for Mac if that matters.


    Rasmus

  • #2
    Thanks for using dataex to provide your data example. As a note of caution to others, please do not edit the output of dataex in your posts, the question mark in the last observation was not generated by dataex and Stata will skip that observation if you run the code as is.

    If I understand correctly what you are trying to do, you want to find the value of div_sum on the latest date that is within the interval bounded by the observation's date and exdate.

    You can do this easily with rangestat, a new user-written program available from SSC. To install it, type in Stata's Command window:

    Code:
    ssc install rangestat
    You have cases with more than one observation per distinct value of date. In your example data, the value of div_sum is constant within date groups so I assume that it's ok to consider that there is only one div_sum value per date. Then the solution would look something like:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str9 date_s long date str9 exdate_s int exdate float(div_sum div)
    "03mar2008" 17594 "04mar2008" 17595   0   0
    "03mar2008" 17594 "03mar2008" 17594   0   0
    "05mar2008" 17596 "18mar2008" 17609 1.5 1.5
    "05mar2008" 17596 "28mar2009" 17984 1.5 4.2
    "22mar2008" 17613 "24mar2008" 17615 1.5 2.5
    "22mar2008" 17613 "26mar2008" 17617 1.5 2.5
    "24mar2008" 17615 "26mar2008" 17617 2.5 2.5
    "25mar2008" 17616 "28mar2008" 17619 2.5 4.2
    "27mar2008" 17618 "29mar2009" 17985 4.2 4.2
    "30mar2008" 17621 "04apr2008" 17626 4.2   .
    end
    format %td date
    format %td exdate
    
    * save div_sum values per date; reduce to one obs per date
    preserve
    keep date div_sum
    bysort date (div_sum): keep if _n == _N
    rename date date_max
    rename div_sum div_max
    save "date_div_sum.dta", replace
    restore
    
    * find the largest date from the observation's date up to exdate
    * to install rangestat, type in Stata's command window: ssc install rangestat
    rangestat (max) date, interval(date date exdate)
    format %td date_max
    
    * match with the saved div_sum values
    merge m:1 date_max using "date_div_sum.dta", keep(master match) nogen 
    
    sort date exdate
    list

    Comment


    • #3
      Hi Robert!

      Thank you very much for your reply. I checked the date_div_sum.dta file (still running the rangestat command, it's very slow it seems).

      Thing is I'm not sure this solution takes in account that i have to do this thing for 480 different companies?

      Cause in the date_div_sum.dta file the div_max on 31mar2008 is 51.33, but for example for company 1 the max dividend on that date would actually be 0.

      Looking at my dataex code i see that this was not really clear, but do you think it's possible to alter this solution so it does this thing for each of my 480 companies separately?

      Thanks

      Comment


      • #4
        Well your sample data did not have company identifiers. Just use the by() option in rangestat. So the call should be something like:

        Code:
        rangestat (max) date, interval(date date exdate) by(id)

        Comment


        • #5
          Yes sorry about that!

          I got it fixed by the code you wrote and i change in the preserve-restore part.
          Here's the code i ended up with if anyone should be interested:
          Code:
           preserve
            keep date div_sum company
            bysort company date (div_sum): keep if _n == _N
            rename date date_max
            rename div_sum div_max
            save "date_div_sum.dta", replace
            restore
             
            rangestat (max) date, interval(date date exdate) by(company)
            format %td date_max
             
            merge m:1 company date_max using "date_div_sum.dta", keep(master match) nogen 
             
            sort company date exdate

          Comment


          • #6
            And thanks a lot again for your help Robert!

            Comment

            Working...
            X