Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to check whether dates between start and end date is within ranges

    There are two dates, start date and end date. I want to know whether any date between start and end date are included in certain date ranges (recession dates)

    For example, (start date) (end date)
    ID /YEAR /BECAMECEO/ LEFTCEO

    00003| 1992 | 01dec1989 | 31dec1998
    00003| 2001 | 01apr2001 | 30oct2009
    00006| 1992 | 01nov1991 | 15feb2001
    00006| 2007 | 02jul2007 | 31mar2008
    00008| 1996 | 01may1996| 28dec2000
    00026| 2002 | 03jul2001 | 01feb2002



    I want to know, for each ID, whether any day of CEO duration period (any date between start date to end date) is included in the ranges of recession dates and if so, how many ranges. If each ID can work multiple times and work dates can be included in multiple ranges

    ((FOR EXAMPLE)) for ID 00003, she worked twice during (01dec1989 - 31dec1998 & 01apr2001 - 30oct2009) and dates within this period are included in three recession ranges as shown below (i.e. recession ranges of (1) July 1990-March 1991 / (2) March 2001 to Nov 2001/ and (3) dec 2007 to June 2009) , I want result value "recession_ceo==3"
    For ID 00006, working twice during (01nov1991- 15feb2001 & 02jul2007 - 31mar2008), and working dates are included in one range of recession (dec 2007 to june 2009) and I want result value "recession_ceo==1"
    For ID 00008, she worked during (01may1996 - 28dec2000) and dates are not included in any range of recession, thus "recession_ceo ==0"
    For ID 00026, working during 03jul2001- 01feb2002, thus one recession range is included (mar2001 to nov 2001), thus "recession_ceo==1"




    Recession dates Ranges:
    August 1929 ~ March 1933
    May 1937 ~ June 1938
    February 1945 ~ October 1945
    November 1948 ~ October 1949
    July 1953 ~ May 1954
    August 1957 ~ April 1958
    April 1960 ~ February 1961
    December 1969 ~ November 1970
    November 1973 ~ March 1975
    January 1980 ~ July 1980 (
    July 1981 ~ November 1982
    July 1990 ~ March 1991
    March 2001 ~ November 2001
    December 2007 ~ June 2009
    February 2020 ~ April 2020


    My coding idea is, for each ID, starting from BECAMECEO, check whether it is in the recession ranges. If yes, then result value (recesssion_ceo ==1). Otherwise, keep adding the date until end date (LEFTOFC) and check any date is included in recession ranges. If none of date between start and end date is included in range, then result value of recession_ceo ==0. and then end the loop.


    gen recession_ceo =0
    gen n =LEFTOFC-BECAMECEO

    forval i = 1/n {
    if betweendate+i = ( inrange(BECAMECEO, td(01aug1929),td(31mar1933)) | inrange(BECAMECEO, td(01may1937) , td(30jun1938)) | inrange(BECAMECEO, td(01feb1945), td(31oct1945)) | inrange(BECAMECEO,td(01nov1948), td(31oct1949)) | inrange(BECAMECEO, td(01jul1953) , td(31may1954)) | inrange(BECAMECEO, td(01aug1957), td(30apr1958)) | inrange(BECAMECEO, td(01apr1960), td(28feb1961)) | inrange(BECAMECEO, td(01dec1969), td(30nov1970)) | inrange(BECAMECEO,td(01nov1973), td(31mar1975)) | inrange(BECAMECEO, td(01jan1980), td(31jul1980)) | inrange(BECAMECEO, td(01jul1981), td(30nov1982)) | inrange(BECAMECEO, td(01jul1990), td(31mar1991)) | inrange(BECAMECEO, td(01mar2001) , td(30nov2001)) | inrange(BECAMECEO, td(01dec2007) , td(30jun2009)) | inrange(BECAMECEO, td(01feb2020) , td(30apr2020))
    {replace recession_ceo==1}
    else {recession_ceo==0}

    }
    Last edited by BB Kim; 14 Feb 2023, 13:00.

  • #2
    It can be done more cleanly and simply than that:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str5 id str11 start_date str10 end_date
    "00003" "01dec1989" "31dec1998"
    "00003" "01apr2001" "30oct2009"
    "00006" "01nov1991" "15feb2001"
    "00006" "02jul2007" "31mar2008"
    "00008" "01may1969" "28dec2003"
    "00026" "03jul2001" "01feb2002"
    end
    
    foreach v of varlist start_date end_date {
        gen _`v' = daily(`v', "DMY"), after(`v')
        assert missing(`v') == missing(_`v')
        drop `v'
        rename _`v' `v'
        format `v' %td
    }
    
    local recession1    August 1929     March 1933
    local recession2    May 1937     June 1938
    local recession3    February 1945     October 1945
    local recession4    November 1948     October 1949
    local recession5    July 1953     May 1954
    local recession6    August 1957     April 1958
    local recession7    April 1960     February 1961
    local recession8    December 1969     November 1970
    local recession9    November 1973     March 1975
    local recession10   January 1980     July 1980
    local recession11   July 1981     November 1982
    local recession12   July 1990     March 1991
    local recession13   March 2001     November 2001
    local recession14   December 2007     June 2009
    local recession15   February 2020     April 2020
    
    //  CLEAN UP THE RECESSION LOCALS AND MAKE THEM NUMERIC
    forvalues i = 1/15 {
        local m1: word 1 of `recession`i''
        local y1: word 2 of `recession`i''
        local m2: word 3 of `recession`i''
        local y2: word 4 of `recession`i''
        local recession_start`i' = dofm(monthly("`m1' `y1'", "MY"))
        local recession_end`i' = dofm(monthly("`m2' `y2'", "MY"))
        assert !missing(`recession_start`i'', `recession_end`i'')
    }
    
    gen worked_during_recession = 0
    forvalues i = 1/15 {
        replace worked_during_recession = 1 ///
            if min(end_date, `recession_end`i'') >= max(start_date, `recession_start`i'')
    }
    Added:
    For ID 0006 in your example, that CEO has one work spell that overlaps a recession and another that doesn't. The code above reports 1 and 0 outcomes, separately, for those observations, respectively. If what you actually want is a single 1 response indicating that the CEO ever worked during a recession, you can follow the above code with:
    Code:
    by id (worked_during_recession), sort: replace worked_during_recession = worked_during_recession[_N]
    Also, your proposed code, in addition to being overly complicated, will not work correctly. Among other things, your use of the -if- command is misplaced here and will result in every ceo being reported as a recession CEO at all times. Moreover, what follows the -if- is syntactically illegal.

    Added later:
    In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your dat
    Last edited by Clyde Schechter; 14 Feb 2023, 13:16.

    Comment


    • #3
      It shows the assertion is false. What is the problem?

      (I have data set (.dta) of BECAMECEO and LEFTOFC , which is long type, so I changed it to string data first using code below)

      tostring BECAMECEO, gen(start_date)
      tostring LEFTOFC, gen(end_date)



      .
      . tostring BECAMECEO, gen(start_date)
      start_date generated as str5

      . tostring LEFTOFC, gen(end_date)
      end_date generated as str5

      . .
      .
      . foreach v of varlist start_date end_date {
      2. gen _`v' = daily(`v', "DMY"), after(`v')
      3. assert missing(`v') == missing(_`v')
      4. drop `v'
      5. rename _`v' `v'
      6. format `v' %td
      7. }
      (333,179 missing values generated)
      333,179 contradictions in 333,179 observations
      assertion is false
      r(9);
      Last edited by BB Kim; 14 Feb 2023, 14:34.

      Comment


      • #4
        Also, I want not only worked_during_recession = 1 , but also the value of worked_during_recession = 2, 3, .....if the dates the ID worked is included in multiple ranges of recession.

        For example, For ID 00003, she worked twice during (01dec1989 - 31dec1998 & 01apr2001 - 30oct2009) and dates within this period are included in three recession ranges as shown below (i.e. recession ranges of (1) July 1990-March 1991 / (2) March 2001 to Nov 2001/ and (3) dec 2007 to June 2009) , I want result value "worked_during_recession=3"

        ID / start_data / end_date
        "00003" "01dec1989" "31dec1998"
        "00003" "01apr2001" "30oct2009"


        I wish the code is available to capture this, since each ID worked multiple times and each work dates can be included in multiple recession ranges.
        Last edited by BB Kim; 14 Feb 2023, 14:14.

        Comment


        • #5
          Concerning #3, I can't respond without your providing a -dataex- data example (and, no, no other way of presenting the data will suffice here.) The assertion says that the conversion of the start_date or date_end variables from string to Stata numeric date variables was not possible. That means that there is something invalid about the input.
          (I have data set (.dta) of BECAMECEO and LEFTOFC , which is long type, so I changed it to string data first using code below)

          tostring BECAMECEO, gen(start_date)
          tostring LEFTOFC, gen(end_date)



          .
          . tostring BECAMECEO, gen(start_date)
          start_date generated as str5

          . tostring LEFTOFC, gen(end_date)
          end_date generated as str5
          doesn't provide enough information because there is no indication of what the resulting variables actually contain. One thing is clear, though, since start_date and end_date were generated as str5, they cannot possibly look like 01dec1989 and the other examples you gave in #1. So, when you present your data in an inaccurate way, you should not be surprised that code written based on what you showed will fail to work with your actual data. There are no minor details in coding. You need to be 100% accurate and complete in presenting what you are working with. That is why we have -dataex-, to make it easy to do that.

          With regard to wanting the result to show the total number of recessions during which each CEO worked, that is a minor modification of the code. (I have not changed the part where you are getting the -assertion is false- error because you have not provided the information needed to fix that.)

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input str5 id str11 start_date str10 end_date
          "00003" "01dec1989" "31dec1998"
          "00003" "01apr2001" "30oct2009"
          "00006" "01nov1991" "15feb2001"
          "00006" "02jul2007" "31mar2008"
          "00008" "01may1969" "28dec2003"
          "00026" "03jul2001" "01feb2002"
          end
          
          foreach v of varlist start_date end_date {
              gen _`v' = daily(`v', "DMY"), after(`v')
              assert missing(`v') == missing(_`v')
              drop `v'
              rename _`v' `v'
              format `v' %td
          }
          
          local recession1    August 1929     March 1933
          local recession2    May 1937     June 1938
          local recession3    February 1945     October 1945
          local recession4    November 1948     October 1949
          local recession5    July 1953     May 1954
          local recession6    August 1957     April 1958
          local recession7    April 1960     February 1961
          local recession8    December 1969     November 1970
          local recession9    November 1973     March 1975
          local recession10   January 1980     July 1980
          local recession11   July 1981     November 1982
          local recession12   July 1990     March 1991
          local recession13   March 2001     November 2001
          local recession14   December 2007     June 2009
          local recession15   February 2020     April 2020
          
          //  CLEAN UP THE RECESSION LOCALS AND MAKE THEM NUMERIC
          forvalues i = 1/15 {
              local m1: word 1 of `recession`i''
              local y1: word 2 of `recession`i''
              local m2: word 3 of `recession`i''
              local y2: word 4 of `recession`i''
              local recession_start`i' = dofm(monthly("`m1' `y1'", "MY"))
              local recession_end`i' = dofm(monthly("`m2' `y2'", "MY"))
              assert !missing(`recession_start`i'', `recession_end`i'')
          }
          
          gen number_recessions_worked = 0
          forvalues i = 1/15 {
              replace number_recessions_worked = number_recessions_worked + 1 ///
                  if min(end_date, `recession_end`i'') >= max(start_date, `recession_start`i'')
          }
          collapse (sum) number_recessions_worked, by(id)
          Modifications shown in bold face.
          Last edited by Clyde Schechter; 14 Feb 2023, 15:24.

          Comment


          • #6
            I figured out the reason since there were some missing dates in the data.
            After I deleted missing values ".", it worked.



            However, the last code with forvalues shows error message (all code before this forvalues works)
            Could you help me figure out the problem of forvalue loop?


            . forvalues i = 1/15 {
            2. replace number_recessions_worked = number_recessions_worked + 1 ///
            > if min(end_date, `recession_end`i'') >= max(start_date, `recession_start`i'')
            3. }
            invalid syntax
            r(198);
            Last edited by BB Kim; 14 Feb 2023, 16:18.

            Comment


            • #7
              The problem of your code is that , for example, if the person's CEO start_date is Mar 2008 and end_date is May 2020, it includes two ranges of recessions( December 2007 - June 2009) and (February 2020 - April 2020) and start_date is after recession_start 14 and end_date is after recession_end 15 ( thus not the same 'i' here in the code of last line if min(end_date, `recession_end`i'') >= max(start_date, `recession_start`i''))
              and for this example, I should get number_recessions_worked =2 (since two recession ranges are included) from the forvalues loop.
              The person can work more than one recession ranges in a row.

              Could you please revise the code reflecting this issue?


              forvalues i = 1/15 {
              replace number_recessions_worked = number_recessions_worked + 1 ///
              if min(end_date, `recession_end`i'') >= max(start_date, `recession_start`i'')
              }
              Last edited by BB Kim; 14 Feb 2023, 17:24.

              Comment


              • #8
                Sorry for not replying sooner. I did not see your responses to #5.

                Regarding #6, I cannot reproduce that error. In my setup, the code in #5 runs with no error messages.

                Regarding #7, I do not completely follow your reasoning, and to the extent I do, I don't think you are correct. I have added a new observation to your data, with id = "99999" and given it the start and end dates you suggested: (1)March 2008 and (31)May 2020. Running the code shown in #5, and then listing the output for this new id gives:
                Code:
                . * Example generated by -dataex-. For more info, type help dataex
                . clear
                
                . input str5 id str11 start_date str10 end_date
                
                            id   start_date    end_date
                  1. "00003" "01dec1989" "31dec1998"
                  2. "00003" "01apr2001" "30oct2009"
                  3. "00006" "01nov1991" "15feb2001"
                  4. "00006" "02jul2007" "31mar2008"
                  5. "00008" "01may1969" "28dec2003"
                  6. "00026" "03jul2001" "01feb2002"
                  7. "99999" "01mar2008" "31may2020"
                  8. end
                
                .
                . foreach v of varlist start_date end_date {
                  2.     gen _`v' = daily(`v', "DMY"), after(`v')
                  3.     assert missing(`v') == missing(_`v')
                  4.     drop `v'
                  5.     rename _`v' `v'
                  6.     format `v' %td
                  7. }
                
                .
                . local recession1    August 1929     March 1933
                
                . local recession2    May 1937     June 1938
                
                . local recession3    February 1945     October 1945
                
                . local recession4    November 1948     October 1949
                
                . local recession5    July 1953     May 1954
                
                . local recession6    August 1957     April 1958
                
                . local recession7    April 1960     February 1961
                
                . local recession8    December 1969     November 1970
                
                . local recession9    November 1973     March 1975
                
                . local recession10   January 1980     July 1980
                
                . local recession11   July 1981     November 1982
                
                . local recession12   July 1990     March 1991
                
                . local recession13   March 2001     November 2001
                
                . local recession14   December 2007     June 2009
                
                . local recession15   February 2020     April 2020
                
                .
                . //  CLEAN UP THE RECESSION LOCALS AND MAKE THEM NUMERIC
                . forvalues i = 1/15 {
                  2.     local m1: word 1 of `recession`i''
                  3.     local y1: word 2 of `recession`i''
                  4.     local m2: word 3 of `recession`i''
                  5.     local y2: word 4 of `recession`i''
                  6.     local recession_start`i' = dofm(monthly("`m1' `y1'", "MY"))
                  7.     local recession_end`i' = dofm(monthly("`m2' `y2'", "MY"))
                  8.     assert !missing(`recession_start`i'', `recession_end`i'')
                  9. }
                
                .
                . gen number_recessions_worked = 0
                
                . forvalues i = 1/15 {
                  2.     replace number_recessions_worked = number_recessions_worked + 1 ///
                >         if min(end_date, `recession_end`i'') >= max(start_date, `recession_start`i'')
                  3. }
                (0 real changes made)
                (0 real changes made)
                (0 real changes made)
                (0 real changes made)
                (0 real changes made)
                (0 real changes made)
                (0 real changes made)
                (1 real change made)
                (1 real change made)
                (1 real change made)
                (1 real change made)
                (2 real changes made)
                (3 real changes made)
                (3 real changes made)
                (1 real change made)
                
                . collapse (sum) number_recessions_worked, by(id)
                
                .
                . list if id == "99999"
                
                     +------------------+
                     |    id   number~d |
                     |------------------|
                  5. | 99999          2 |
                     +------------------+
                This correctly produces the result you want: 2 depressions. I have also hand checked the other outputs from the example, and they all are correct. I think the code in #5 requires no changes.

                If you find any other problems with it, do post back.

                Comment

                Working...
                X