Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating Data Series from Start and End dates

    Hi,

    I have following dataset, with variabes start_date and end_date. I am using Stata 12 version in a secured lab without internet.

    Code:
    * Example generated by -dataex-. To install:    ssc    install    dataex
    clear
    input float id str30(start_date end_date)
    1 "1oct1981"  "1oct1986"  
    1 "2oct1986"  "2oct1991"  
    2 "15nov1991" "15nov1993"  
    3 "20dec1995" "20dec2001"  
    3 "21dec2001" "21dec2006"  
    4 "15mar2005" "15mar2011"  
    4 "16mar2011" "16mar2016"  
    5 "7feb1995"  "7feb2000"  
    5 "8feb2000"  "8feb2005"  
    5 "9feb2005"  "9feb2010"  
    6 "2jun2003"  "2jun2005"  
    7 "16aug1995" "10march1997"
    8 "2sep2001"  "10aug2005"  
    9 "8jan2014"  "9jul2017"  
    10 "14mar2005" "14mar2010"  
    10 "15mar2010" "15mar2015"  
    11 "1feb2013"  "15mar2014"  
    12 "15may1997" "15may2002"  
    13 "25nov2004" "19jan2007"  
    14 "1jan2016"  "1jan2017"  
    15 "21feb2012" "21feb2017"  
    16 "17jul2001" "17jul2016"  
    end
    Code:
    gen start_dt=date(start_date,"DMY")
    format start_dt %td
    
    gen end_dt=date(end_date,"DMY")
    format end_dt %td
    Now, I would like to create a variable "year_date" between the range of "15 June 2000" and "15 June 2016" so that I can keep the observations that are between the interval, in order to calculate population by year for 2000 - 2016."

    Code:
    keep if year_date>=start_dt & year_date<=end_d
    I initially started with the code below but then realized that I would have to make two variables in that case.

    Code:
    gen year_date=mdy(6,15,start_dt)
    I also tried the codes below but got errors.

    Code:
    local start = date("2000/06/15", "YMD")
    local end = date("2010/06/15", "YMD")
    
    egen year_date = seq(), from(`start') to(`end')
    format %td date
    I would appreciate your advice.

    Thank you,
    Pablo
    Last edited by Pablo Miah; 14 Aug 2018, 17:40.

  • #2
    The last block of code you show is nearly correct. It has two problems. First, the -format- command refers to a non-existent variable, date. You mean year_date, the variable you just created, I presume. Second, -egen, seq()- will not allocate memory, so you won't get the expected results unless your data set already includes 3,653 observations into what these dates can be fit.

    Even so, I don't really understand your explanation of what you are trying to do. It is easy enough to create a data set with a date variable that runs from 15 jun 2000 through 15 jun 2010:

    Code:
    clear
    local start = date("2000/06/15", "YMD")
    local end = date("2010/06/15", "YMD")
    
    set obs `=`end'-`start'+1'
    
    egen year_date = seq(), from(`start') to(`end')
    format %td year_date
    But I don't understand how that relates to the data you show and what kind of result you want to end up with.

    If the code shown here isn't helpful to you (and I strongly suspect it isn't), please post back showing an example of what the results you want would look like. I think an example will work better than trying to explain it in words.

    Comment


    • #3
      Originally posted by Clyde Schechter View Post
      The last block of code you show is nearly correct. It has two problems. First, the -format- command refers to a non-existent variable, date. You mean year_date, the variable you just created, I presume. Second, -egen, seq()- will not allocate memory, so you won't get the expected results unless your data set already includes 3,653 observations into what these dates can be fit.

      Even so, I don't really understand your explanation of what you are trying to do. It is easy enough to create a data set with a date variable that runs from 15 jun 2000 through 15 jun 2010:

      Code:
      clear
      local start = date("2000/06/15", "YMD")
      local end = date("2010/06/15", "YMD")
      
      set obs `=`end'-`start'+1'
      
      egen year_date = seq(), from(`start') to(`end')
      format %td year_date
      But I don't understand how that relates to the data you show and what kind of result you want to end up with.

      If the code shown here isn't helpful to you (and I strongly suspect it isn't), please post back showing an example of what the results you want would look like. I think an example will work better than trying to explain it in words.
      Hi Clyde,

      Thank you for your prompt reply. I will try the code and reply here with an example soon.

      Comment


      • #4
        Originally posted by Clyde Schechter View Post
        The last block of code you show is nearly correct. It has two problems. First, the -format- command refers to a non-existent variable, date. You mean year_date, the variable you just created, I presume. Second, -egen, seq()- will not allocate memory, so you won't get the expected results unless your data set already includes 3,653 observations into what these dates can be fit.

        Even so, I don't really understand your explanation of what you are trying to do. It is easy enough to create a data set with a date variable that runs from 15 jun 2000 through 15 jun 2010:

        Code:
        clear
        local start = date("2000/06/15", "YMD")
        local end = date("2010/06/15", "YMD")
        
        set obs `=`end'-`start'+1'
        
        egen year_date = seq(), from(`start') to(`end')
        format %td year_date
        But I don't understand how that relates to the data you show and what kind of result you want to end up with.

        If the code shown here isn't helpful to you (and I strongly suspect it isn't), please post back showing an example of what the results you want would look like. I think an example will work better than trying to explain it in words.
        Hi again,

        The actual dataset has over 100,000 observations. Anyways, after I used the code on the example, I got the outcome below:


        Code:
        gen start_dt=date(start_date,"DMY")
        format start_dt %td
        
        gen end_dt=date(end_date,"DMY")
        format end_dt %td
        
        local start = date("2000/06/15", "YMD")
        local end = date("2016/06/15", "YMD")
        
        set obs `=`end'-`start'+1'
        
        egen year_date = seq(), from(`start') to(`end')
        format %td year_date
        And this is what I got.

        Code:
        * Example generated by -dataex-. To    install: ssc    install    dataex
        clear
        input float(id start_dt end_dt) int    year_date
        1  7944  9770 14776
        1  9771 11597 14777
        2 11641 12372 14778
        3 13137 15329 14779
        3 15330 17156 14780
        4 16510 18701 14781
        4 18702 20529 14782
        5 12821 14647 14783
        5 14648 16475 14784
        5 16476 18302 14785
        6 15858 16589 14786
        7 13011 13583 14787
        8 15220 16658 14788
        9 19731 21009 14789
        10 16509 18335 14790
        10 18336 20162 14791
        end
        format %td start_dt
        format %td end_dt
        format %td year_date
        Eventually what I want to have year_date set on census day ranged from June 15 2000 to June 15 2016, so that I get something similar to this:

        Code:
        keep if year_date>=start_dt & year_date<=end_dt
        gen year=year(year_date)
        Code:
        * Example generated by -dataex-. To    install: ssc install    dataex
        clear
        input float(id start_dt end_dt) int    year_date float year
        3 13137 15329 14779 2000
        5 14648 16475 14784 2000
        12 13649 15475 14793 2000
        end
        format %td start_dt
        format %td end_dt
        format %td year_date
        Even though it worked, I suppose I would need the year_date range to be fixed on 15 June, of each year with range 2000 - 2016, in order for me to estimate annual population.
        Code:
         gen year_date=mdy(6,15,year)
        I am not sure if I made it clear.


        Thank you.
        Last edited by Pablo Miah; 15 Aug 2018, 07:26.

        Comment


        • #5
          UPDATE: So I tried something different with one additional id (#17) in the example.

          Code:
          * Example generated by -dataex-. To install:    ssc    install    dataex
          clear
          input float id str30(start_date end_date)
          1 "1oct1981"  "1oct1986"  
          1 "2oct1986"  "2oct1991"  
          2 "15nov1991" "15nov1993"  
          3 "20dec1995" "20dec2001"  
          3 "21dec2001" "21dec2006"  
          4 "15mar2005" "15mar2011"  
          4 "16mar2011" "16mar2016"  
          5 "7feb1995"  "7feb2000"  
          5 "8feb2000"  "8feb2005"  
          5 "9feb2005"  "9feb2010"  
          6 "2jun2003"  "2jun2005"  
          7 "16aug1995" "10march1997"
          8 "2sep2001"  "10aug2005"  
          9 "8jan2014"  "9jul2017"  
          10 "14mar2005" "14mar2010"  
          10 "15mar2010" "15mar2015"  
          11 "1feb2013"  "15mar2014"  
          12 "15may1997" "15may2002"  
          13 "25nov2004" "19jan2007"  
          14 "1jan2016"  "1jan2017"  
          15 "21feb2012" "21feb2017"  
          16 "17jul2001" "17jul2016"  
          17 "8jan2014"  "9jun2016"  
          end
          After running similar codes, the dataset looks like:

          Code:
          gen start_dt=date(start_date,"DMY")
          format start_dt %td
          
          gen end_dt=date(end_date,"DMY")
          format end_dt %td
          
          local start = date("2000/06/15", "YMD")
          local end = date("2016/06/15", "YMD")
          
          set obs `=`end'-`start'+1'
          
          egen year_date = seq(), from(`start') to(`end')
          format %td year_date
          
          gen year=year(year_date)
          gen year_date_census=mdy(6,15,year)
          format %td year_date_census
          Code:
          * Example generated by -dataex-. To    install: ssc install    dataex
          clear
          input float(id start_dt end_dt) int    year_date float(year    year_date_census)
          1  7944  9770 14776 2000 14776
          1  9771 11597 14777 2000 14776
          2 11641 12372 14778 2000 14776
          3 13137 15329 14779 2000 14776
          3 15330 17156 14780 2000 14776
          4 16510 18701 14781 2000 14776
          4 18702 20529 14782 2000 14776
          5 12821 14647 14783 2000 14776
          5 14648 16475 14784 2000 14776
          5 16476 18302 14785 2000 14776
          6 15858 16589 14786 2000 14776
          7 13011 13583 14787 2000 14776
          8 15220 16658 14788 2000 14776
          9 19731 21009 14789 2000 14776
          10 16509 18335 14790 2000 14776
          10 18336 20162 14791 2000 14776
          11 19390 19797 14792 2000 14776
          end
          format %td start_dt
          format %td end_dt
          format %td year_date
          format %td year_date_census

          However, even though id #17 should fall within the interval, it drops out once I run the code below since the corresponding date is 6/15/2000

          Code:
          keep if year_date_census>=start_dt & year_date_census<=end_dt

          Code:
          * Example generated by -dataex-. To    install: ssc install    dataex
          clear
          input float(id start_dt end_dt) int    year_date float(year    year_date_census)
          3 13137 15329 14779 2000 14776
          5 14648 16475 14784 2000 14776
          12 13649 15475 14793 2000 14776
          end
          format %td start_dt
          format %td end_dt
          format %td year_date
          format %td year_date_census


          Thanks,
          Pablo

          Comment


          • #6
            I still don't understand what you want the end result to look like. Can you just hand create a short example of what the final data should look like using the data editor, then run -dataex- and post that so I can see it?

            Comment


            • #7
              Originally posted by Clyde Schechter View Post
              I still don't understand what you want the end result to look like. Can you just hand create a short example of what the final data should look like using the data editor, then run -dataex- and post that so I can see it?
              Thank you for your response, and apologies for the confusion. Hope this would clarify it, I added shorter example.

              Code:
              * Example generated by -dataex-. To install: ssc install    dataex
              clear
              input float id str30(start_date end_date) float(start_dt    end_dt)
              1 "1oct1981"  "1oct1986"   7944  9770
              1 "2oct1986"  "2oct1991"   9771 11597
              2 "15nov1991" "15nov1993" 11641 12372
              3 "20dec1995" "20dec2001" 13137 15329
              3 "21dec2001" "21dec2006" 15330 17156
              16 "17jul2001" "17jul2016" 15173 20652
              17 "8jan2014"  "9jun2016"  19731 20614
              18 "1may2008"  "31jan2011" 17653 18658
              end
              format %td start_dt
              format %td end_dt
              Based on the condition below,

              Code:
              keep if year_date>=start_dt & year_date<=end_dt
              the final data should look something similar to the example below:

              Code:
              gen year=year(year_date)
              Code:
              * Example generated by -dataex-. To install: ssc    install    dataex
              clear
              input float(id start_dt end_dt year_date year)
              3 13137 15329 15141 2001
              17 19731 20614 19889 2014
              17 19731 20614 20254 2015
              18 17653 18658 17698 2008
              18 17653 18658 18063 2009
              18 17653 18658 18428 2010
              end
              format %td start_dt
              format %td end_dt
              format %td year_date
              Thank you

              Comment


              • #8
                OK, I think I understand what you want now. If I do, this will give it to you:

                Code:
                //    CREATE A DATA SET OF JUNE 15THS 2000-2016
                clear*
                local start_year 2000
                local end_year 2016
                set obs 17
                gen year_date = mdy(6, 15, 1999+_n)
                format year_date %td
                
                tempfile june15s
                save `june15s'
                
                //    NOW BRING IN THE DATA
                * Example generated by -dataex-. To install: ssc install    dataex
                clear
                input float id str30(start_date end_date) float(start_dt    end_dt)
                1 "1oct1981"  "1oct1986"   7944  9770
                1 "2oct1986"  "2oct1991"   9771 11597
                2 "15nov1991" "15nov1993" 11641 12372
                3 "20dec1995" "20dec2001" 13137 15329
                3 "21dec2001" "21dec2006" 15330 17156
                16 "17jul2001" "17jul2016" 15173 20652
                17 "8jan2014"  "9jun2016"  19731 20614
                18 "1may2008"  "31jan2011" 17653 18658
                end
                format %td start_dt
                format %td end_dt
                
                //    USE RANGEJOIN TO LINK EACH OBSERVATION IN THE DATA
                //  TO EACH JUNE 15 THAT FALLS BETWEEN START AND END DATE
                rangejoin year_date start_dt end_dt using `june15s'
                drop if missing(year_date) // NO IN-RANGE JUNE 15TH
                gen matched_year = year(year_date)
                To use this code you must have the -rangejoin- command, written by Robert Picard, and available from SSC. To use -rangejoin- you must also have -rangestat-, by Robert Picard, Roberto Ferrer, and Nick Cox, also available from SSC.

                Comment


                • #9
                  Originally posted by Clyde Schechter View Post
                  OK, I think I understand what you want now. If I do, this will give it to you:

                  Code:
                  // CREATE A DATA SET OF JUNE 15THS 2000-2016
                  clear*
                  local start_year 2000
                  local end_year 2016
                  set obs 17
                  gen year_date = mdy(6, 15, 1999+_n)
                  format year_date %td
                  
                  tempfile june15s
                  save `june15s'
                  
                  // NOW BRING IN THE DATA
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input float id str30(start_date end_date) float(start_dt end_dt)
                  1 "1oct1981" "1oct1986" 7944 9770
                  1 "2oct1986" "2oct1991" 9771 11597
                  2 "15nov1991" "15nov1993" 11641 12372
                  3 "20dec1995" "20dec2001" 13137 15329
                  3 "21dec2001" "21dec2006" 15330 17156
                  16 "17jul2001" "17jul2016" 15173 20652
                  17 "8jan2014" "9jun2016" 19731 20614
                  18 "1may2008" "31jan2011" 17653 18658
                  end
                  format %td start_dt
                  format %td end_dt
                  
                  // USE RANGEJOIN TO LINK EACH OBSERVATION IN THE DATA
                  // TO EACH JUNE 15 THAT FALLS BETWEEN START AND END DATE
                  rangejoin year_date start_dt end_dt using `june15s'
                  drop if missing(year_date) // NO IN-RANGE JUNE 15TH
                  gen matched_year = year(year_date)
                  To use this code you must have the -rangejoin- command, written by Robert Picard, and available from SSC. To use -rangejoin- you must also have -rangestat-, by Robert Picard, Roberto Ferrer, and Nick Cox, also available from SSC.
                  Thank you so much! It worked! But I suppose if I am working on a computer without a network, then I should ask the administrator to put plus folder to have rangestat and rangejoin. But thanks a lot.

                  Regards,
                  Pablo
                  Last edited by Pablo Miah; 15 Aug 2018, 15:17.

                  Comment

                  Working...
                  X