Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • how to identify one or more dates within a period of time from two different data sets

    Dear statalist,

    I have one set of data about some deals, for each deal, there is announcement date, end date, deal number, and symbol (to identify which firm the deal belongs to). A firm could have multiple deals in a year, and there are different deal numbers to distinguish them. I have another set of data, which lists firms' declare date, there is only one declare date for each firm in a year. In this data, variables include year, symbol, and declare date.

    The period between announcement date and end date is the deal period, and I want to identify all the declare dates that fall within the deal period for each deal of a firm. I wonder how this can be achieved? Thanks a lot for any advice!

    Data for deals and dates:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long DealNumber int Announceddate float end_date long Symbol
    1000012032 17493 18341   38
    1943185129 21986 22046   42
    1939082254 21832 21966   42
    1943173585 21967 21984   42
    1943084825 21402 21405   46
    1907166257 20637 20638   46
    1603014662 17960 18690   46
    1600001163 17546 18278   46
    1943042385 21355 21356   46
    1943042389 21340 21341   46
    1943089014 21160 21182   46
    1943089011 20950 20961   46
    1943124865 21778 21833   48
    1907137044 20473 21286   65
    1633058893 18926 18991   69
    1907068614 19772 19773   69
    1907000052 19536 19900   69
    1633062049 18993 19023   69
    end
    format %td Announceddate
    format %td end_date
    Data for firms' declare dates:
    Code:
    Symbol    declaredate    year
    38    24apr2010    2010
    38    27apr2011    2011
    38    20apr2012    2012
    38    24apr2013    2013
    38    25apr2014    2014
    38    28apr2015    2015
    38    21apr2016    2016
    38    15apr2017    2017
    38    21apr2018    2018
    38    30apr2019    2019
    38    23apr2020    2020
    42    10apr2010    2010
    42    13apr2011    2011
    42    12apr2012    2012
    42    13apr2013    2013
    42    28mar2014    2014
    42    21mar2015    2015
    42    29mar2016    2016
    42    29apr2017    2017
    42    27apr2018    2018
    42    13apr2019    2019
    42    28apr2020    2020
    46    02mar2010    2010
    46    16apr2011    2011
    46    21apr2012    2012
    46    20apr2013    2013
    46    28mar2014    2014
    46    04apr2015    2015
    46    31mar2016    2016
    46    25apr2017    2017
    46    28apr2018    2018
    46    17apr2019    2019
    46    29apr2020    2020
    65    23mar2010    2010
    65    29mar2011    2011
    65    13mar2012    2012
    65    26mar2013    2013
    65    31mar2014    2014
    65    24mar2015    2015
    65    31mar2016    2016
    65    29apr2017    2017
    65    28mar2018    2018
    65    02apr2019    2019
    65    20apr2020    2020
    69    22apr2010    2010
    69    04mar2011    2011
    69    17mar2012    2012
    69    09mar2013    2013
    69    13mar2014    2014
    69    21mar2015    2015
    69    23apr2016    2016
    69    08apr2017    2017
    69    27apr2018    2018
    69    30mar2019    2019
    69    28apr2020    2020
    Last edited by Alice Yang; 25 Sep 2022, 09:56.

  • #2
    This solution uses the community-contributed command -rangejoin-, which itself uses -rangestat-. You will probably need to install both:

    Code:
    ssc install rangestat
    ssc install rangejoin
    And then it's essentially one command, but showing you the full code:

    Code:
    clear
    input long Symbol str9 disclosedate int year
    38    24apr2010    2010
    38    27apr2011    2011
    38    20apr2012    2012
    38    24apr2013    2013
    38    25apr2014    2014
    38    28apr2015    2015
    38    21apr2016    2016
    38    15apr2017    2017
    38    21apr2018    2018
    38    30apr2019    2019
    38    23apr2020    2020
    42    10apr2010    2010
    42    13apr2011    2011
    42    12apr2012    2012
    42    13apr2013    2013
    42    28mar2014    2014
    42    21mar2015    2015
    42    29mar2016    2016
    42    29apr2017    2017
    42    27apr2018    2018
    42    13apr2019    2019
    42    28apr2020    2020
    46    02mar2010    2010
    46    16apr2011    2011
    46    21apr2012    2012
    46    20apr2013    2013
    46    28mar2014    2014
    46    04apr2015    2015
    46    31mar2016    2016
    46    25apr2017    2017
    46    28apr2018    2018
    46    17apr2019    2019
    46    29apr2020    2020
    65    23mar2010    2010
    65    29mar2011    2011
    65    13mar2012    2012
    65    26mar2013    2013
    65    31mar2014    2014
    65    24mar2015    2015
    65    31mar2016    2016
    65    29apr2017    2017
    65    28mar2018    2018
    65    02apr2019    2019
    65    20apr2020    2020
    69    22apr2010    2010
    69    04mar2011    2011
    69    17mar2012    2012
    69    09mar2013    2013
    69    13mar2014    2014
    69    21mar2015    2015
    69    23apr2016    2016
    69    08apr2017    2017
    69    27apr2018    2018
    69    30mar2019    2019
    69    28apr2020    2020
    end
    
    gen _disclosedate = daily(disclosedate,"DMY")
    drop disclosedate
    ren _* *
    format disclosedate %td
    
    tempfile declaredates
    save `declaredates'
    
    clear
    input long DealNumber int Announceddate float end_date long Symbol
    1000012032 17493 18341   38
    1943185129 21986 22046   42
    1939082254 21832 21966   42
    1943173585 21967 21984   42
    1943084825 21402 21405   46
    1907166257 20637 20638   46
    1603014662 17960 18690   46
    1600001163 17546 18278   46
    1943042385 21355 21356   46
    1943042389 21340 21341   46
    1943089014 21160 21182   46
    1943089011 20950 20961   46
    1943124865 21778 21833   48
    1907137044 20473 21286   65
    1633058893 18926 18991   69
    1907068614 19772 19773   69
    1907000052 19536 19900   69
    1633062049 18993 19023   69
    end
    format %td Announceddate
    format %td end_date
    
    rangejoin disclosedate Announceddate end_date using `declaredates', by(Symbol)
    This produces:

    Code:
    list , noobs sepby(DealNumber)
    
      +----------------------------------------------------------------+
      | DealNumber   Announc~e    end_date   Symbol   year   disclos~e |
      |----------------------------------------------------------------|
      | 1000012032   23nov2007   20mar2010       38      .           . |
      |----------------------------------------------------------------|
      | 1943185129   12mar2020   11may2020       42   2020   28apr2020 |
      |----------------------------------------------------------------|
      | 1939082254   10oct2019   21feb2020       42      .           . |
      |----------------------------------------------------------------|
      | 1943173585   22feb2020   10mar2020       42      .           . |
      |----------------------------------------------------------------|
      | 1943084825   06aug2018   09aug2018       46      .           . |
      |----------------------------------------------------------------|
      | 1907166257   02jul2016   03jul2016       46      .           . |
      |----------------------------------------------------------------|
      | 1603014662   04mar2009   04mar2011       46   2010   02mar2010 |
      |----------------------------------------------------------------|
      | 1600001163   15jan2008   16jan2010       46      .           . |
      |----------------------------------------------------------------|
      | 1943042385   20jun2018   21jun2018       46      .           . |
      |----------------------------------------------------------------|
      | 1943042389   05jun2018   06jun2018       46      .           . |
      |----------------------------------------------------------------|
      | 1943089014   07dec2017   29dec2017       46      .           . |
      |----------------------------------------------------------------|
      | 1943089011   11may2017   22may2017       46      .           . |
      |----------------------------------------------------------------|
      | 1943124865   17aug2019   11oct2019       48      .           . |
      |----------------------------------------------------------------|
      | 1907137044   20jan2016   12apr2018       65   2016   31mar2016 |
      | 1907137044   20jan2016   12apr2018       65   2017   29apr2017 |
      | 1907137044   20jan2016   12apr2018       65   2018   28mar2018 |
      |----------------------------------------------------------------|
      | 1633058893   26oct2011   30dec2011       69      .           . |
      |----------------------------------------------------------------|
      | 1907068614   18feb2014   19feb2014       69      .           . |
      |----------------------------------------------------------------|
      | 1907000052   27jun2013   26jun2014       69   2014   13mar2014 |
      |----------------------------------------------------------------|
      | 1633062049   01jan2012   31jan2012       69      .           . |
      +----------------------------------------------------------------+

    Comment


    • #3
      Thanks Hemanshu, it works well!

      Comment


      • #4
        Hi again,

        I have one more question. Regarding those deals without disclosure dates that fall within the period between announcement date and end date (deal period), such as deal number 1000012032 with symbol 38, I want to list all the years between the deal period, so for the example cited, the years in between would be 2007, 2008, 2009, 2010. I tried to apply -rangejoin- but that doesn't work the same way. Do you have any idea how this one can be solved? Thanks!

        Comment


        • #5
          Consider this somewhat clunky way of doing it:

          Code:
          rangejoin disclosedate Announceddate end_date using `declaredates', by(Symbol)
          
          gen int year_announced = year(Announceddate)
          gen int year_end = year(end_date)
          
          sum year_announced, meanonly
          local year_low = r(min)
          sum year_end, meanonly
          local year_high = r(max)
          
          preserve
              clear
              set obs `=`year_high'-`year_low'+1'
              gen int year_btw = `year_low' + _n - 1
              tempfile years
              save `years'
          restore
          
          rangejoin year_btw year_announced year_end using `years' 
          duplicates drop DealNumber year_btw if missing(disclosedate), force
          duplicates drop DealNumber disclosedate if !missing(disclosedate), force
          replace year = year_btw if missing(year)
          drop year_*
          which produces:
          Code:
          list , noobs sepby(DealNumber)
            +----------------------------------------------------------------+
            | DealNumber   Announc~e    end_date   Symbol   year   disclos~e |
            |----------------------------------------------------------------|
            | 1000012032   23nov2007   20mar2010       38   2007           . |
            | 1000012032   23nov2007   20mar2010       38   2008           . |
            | 1000012032   23nov2007   20mar2010       38   2009           . |
            | 1000012032   23nov2007   20mar2010       38   2010           . |
            |----------------------------------------------------------------|
            | 1943185129   12mar2020   11may2020       42   2020   28apr2020 |
            |----------------------------------------------------------------|
            | 1939082254   10oct2019   21feb2020       42   2019           . |
            | 1939082254   10oct2019   21feb2020       42   2020           . |
            |----------------------------------------------------------------|
            | 1943173585   22feb2020   10mar2020       42   2020           . |
            |----------------------------------------------------------------|
            | 1943084825   06aug2018   09aug2018       46   2018           . |
            |----------------------------------------------------------------|
            | 1907166257   02jul2016   03jul2016       46   2016           . |
            |----------------------------------------------------------------|
            | 1603014662   04mar2009   04mar2011       46   2010   02mar2010 |
            |----------------------------------------------------------------|
            | 1600001163   15jan2008   16jan2010       46   2008           . |
            | 1600001163   15jan2008   16jan2010       46   2009           . |
            | 1600001163   15jan2008   16jan2010       46   2010           . |
            |----------------------------------------------------------------|
            | 1943042385   20jun2018   21jun2018       46   2018           . |
            |----------------------------------------------------------------|
            | 1943042389   05jun2018   06jun2018       46   2018           . |
            |----------------------------------------------------------------|
            | 1943089014   07dec2017   29dec2017       46   2017           . |
            |----------------------------------------------------------------|
            | 1943089011   11may2017   22may2017       46   2017           . |
            |----------------------------------------------------------------|
            | 1943124865   17aug2019   11oct2019       48   2019           . |
            |----------------------------------------------------------------|
            | 1907137044   20jan2016   12apr2018       65   2016   31mar2016 |
            | 1907137044   20jan2016   12apr2018       65   2017   29apr2017 |
            | 1907137044   20jan2016   12apr2018       65   2018   28mar2018 |
            |----------------------------------------------------------------|
            | 1633058893   26oct2011   30dec2011       69   2011           . |
            |----------------------------------------------------------------|
            | 1907068614   18feb2014   19feb2014       69   2014           . |
            |----------------------------------------------------------------|
            | 1907000052   27jun2013   26jun2014       69   2014   13mar2014 |
            |----------------------------------------------------------------|
            | 1633062049   01jan2012   31jan2012       69   2012           . |
            +----------------------------------------------------------------+

          Comment


          • #6
            Hi Hemanshu,

            Thanks a lot for your detailed solutions. I have encountered a new problem when running
            Code:
            preserve
                clear
                set obs `=`year_high'-`year_low'+1'
                gen int year_btw = `year_low' + _n - 1
                tempfile years
                save `years'
            restore
            and stata says
            Code:
                 set obs `=`year_high'-`year_low'+1'
            observation number out of range
                Observation number must be between 0 and 2,147,483,619.  (Observation numbers are typed without
                commas.)
            r(198);
            What does out of range mean? It is strange that you can run the code without any problem but I have... Do you have any idea how to solve this? I have 581 observations in total.

            Comment


            • #7
              That's strange. That probably means that `year_high' - `year_low' + 1 is working out to a negative number. Can you insert a line before the preserve and check the output for it?

              Code:
              dis "Years run from `year_low' to `year_high'."

              Comment


              • #8
                The output is
                Code:
                . dis "Years run from `year_low' to `year_high'."
                Years run from 2006 to 2022.

                Comment


                • #9
                  So that should work fine. My best guess is that you tried to enter the command interactively on the command line, or ran just that one line from the do-file editor. In either of those cases, Stata does not have the locals year_low and year_high stored in memory, and so it produces an error. Essentially, what Stata reads is

                  Code:
                  set obs `=-+1'
                  which gives that error you saw.

                  Just run the full code, and it should be fine.

                  Comment


                  • #10
                    Omg that's the problem! I should have run the full code. Thanks for your kind help and patience!

                    Comment

                    Working...
                    X