Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Adding monthly observations based on start and end dates

    I am wondering whether there is any other way to add monthly observations to my dataset, than me plotting them in. This is how my dataset looks

    conflictid location year startdate epend ependdate
    1-108 China, Myanmar (Burma) 1969 1969-02-28 1 1969-12-31
    1-109 China, Russia (Soviet Union) 1969 1969-03-02 1 1969-12-31
    1-110 El Salvador, Honduras 1969 1969-07-03 1 1969-07-18


    What I want is for each month of conflict to have their own observation, such that there will be 11 observations for id 1-108 (one for each month of conflict), 10 observations for 1-109 and so on. Is there any way to do this based on the start- and enddate variables?

    Thanks in advance!


  • #2
    Code:
    clear
    // create example data
    input str5 conflictid str28 location year str10 startdate epend str10 ependdate
    "1-108" "China, Myanmar (Burma)"       1969 "1969-02-28" 1 "1969-12-31"
    "1-109" "China, Russia (Soviet Union)" 1969 "1969-03-02" 1 "1969-12-31"
    "1-110" "El Salvador, Honduras"        1969 "1969-07-03" 1 "1969-07-18"
    end
    
    // turn the dates into something Stata can understant
    gen start = date(startdate, "YMD")
    gen ep_end = date(ependdate, "YMD")
    format start ep_end %td
    
    // turn these into monthly dates
    gen start_m = mofd(start)
    gen ep_end_m = mofd(ep_end)
    format start_m ep_end_m %tm
    
    // specify this as monthly survival data
    stset ep_end_m, failure(epend) origin(start_m) id(conflictid)
    
    // create monthly observations
    stsplit month, every(1)
    
    // create which month-year the record refers to
    gen current_month = start_m + month
    
    // one conflict ended in the same month it began
    // this means -stset- ignores that observation and
    // month was thus missing
    replace current_month = start_m if month == .
    
    // format current month
    format current_month %tm
    
    // the end of conflict indicator was turned into missing
    // for the new months created, so they need to be turned
    // into 0s
    replace epend = 0 if epend == .
    
    // a new daily end date for each record
    // in the last observation the end date is the original end date
    // otherwise it is the last day of the month
    bys conflictid (_t) : gen end2 = cond(_n==_N, ep_end, dofm(current_month + 1)-1)
    format end2 %td
    
    // specify the data as daily survival data
    stset end2, failure(epend) origin(start) id(conflictid)
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------

    Comment


    • #3
      Here's another way to do it. I stole Maarten's opening code:

      Code:
       
      clear
      input str5 conflictid str28 location year str10 startdate epend str10 ependdate
      "1-108" "China, Myanmar (Burma)"       1969 "1969-02-28" 1 "1969-12-31"
      "1-109" "China, Russia (Soviet Union)" 1969 "1969-03-02" 1 "1969-12-31"
      "1-110" "El Salvador, Honduras"        1969 "1969-07-03" 1 "1969-07-18"
      end
      
      gen datestart = mofd(date(startdate, "YMD"))
      gen dateend = mofd(date(ependdate, "YMD")) 
      
      
      gen duration = dateend - datestart + 1 
      expand duration
      bysort conflictid : gen datewanted = datestart + _n - 1
      
      format date* %tm 
      
      edit

      Comment


      • #4
        thank you both!

        Comment


        • #5
          I have a follow-up question to this. I used Nick's coding on another dataset that looks like this:

          id country_name startdate enddate duration datewanted
          34 India 1980m8 1980m8 1 1980m8
          35 St Lucia 1980m7 1980m8 2 1980m7
          35 St Lucia 1980m7 1980m8 2 1980m8
          36 St Vincent and The Grenadines 1980m7 1980m8 2 1980m7
          36 St Vincent and The Grenadines 1980m7 1980m8 2 1980m8
          37 Haiti 1980m8 1980m8 1 1980m8
          38 Jamaica 1980m7 1980m8 2 1980m7

          When I tried to split the datewanted variable into two separate month and year variables, I could not seem to get it right when using the following commands;
          /// gen month=month(datewanted)
          /// gen year=year(datewanted)

          What I got was completely different months and years than the datewanted variable gives. Could it be because this is only possible when you have a YMD format - or is there another way to do this?

          Comment


          • #6
            month() and year() are for extracting months and years from daily date variables. This is explained in the help.

            There are various ways to get what you want.

            If you have a monthly date, you can convert it to a daily date and then extract what you want.

            Practical tip: if I don't know the answer, I always experiment using display and particular dates for which I know the correct answer. When I've worked it out, only then do I fire up generate.

            Code:
            . di year(dofm(ym(2014, 11)))
            2014
            
            . di month(dofm(ym(2014, 11)))
            11
            Note that the term "format" is highly ambiguous. How string data are presented to date functions to produce date variables is, however, irrelevant once those date variables are created. Thus November 2014 as a Stata monthly date is 655. Where it came from and how you want to display that are separate questions.
            Last edited by Nick Cox; 14 Nov 2014, 04:01.

            Comment


            • #7
              Originally posted by Nick Cox View Post
              Here's another way to do it. I stole Maarten's opening code:

              Code:
              clear
              input str5 conflictid str28 location year str10 startdate epend str10 ependdate
              "1-108" "China, Myanmar (Burma)" 1969 "1969-02-28" 1 "1969-12-31"
              "1-109" "China, Russia (Soviet Union)" 1969 "1969-03-02" 1 "1969-12-31"
              "1-110" "El Salvador, Honduras" 1969 "1969-07-03" 1 "1969-07-18"
              end
              
              gen datestart = mofd(date(startdate, "YMD"))
              gen dateend = mofd(date(ependdate, "YMD"))
              
              
              gen duration = dateend - datestart + 1
              expand duration
              bysort conflictid : gen datewanted = datestart + _n - 1
              
              format date* %tm
              
              edit
              Hi Nick,
              Sorry to bother you with this old post. I am having a problem converting date from beginning date to ending date. I am constructing a data set which shows daily permno-ncusip relationship. However, I notice that the day when the firm change it cusip is missing in the converted dataset.
              kypermno ncusip namedt nameenddt duration datewanted
              10001 39040610 09 Jan 86 21 Nov 93 2873 19nov1993
              10001 39040610 09 Jan 86 21 Nov 93 2873 20nov1993
              10001 29274A10 22 Nov 93 09 Jun 04 3852 22nov1993
              10001 29274A10 22 Nov 93 09 Jun 04 3852 23nov1993
              I sue code:
              Code:
              gen duration= nameenddt- namedt
               expand duration
               bysort kypermno namedt : gen datewanted = namedt + _n - 1
               format date* %td
              However, in the above example, the last day of on ncusip 39040610 is missing, thus the day of 21 Nov 1993.

              Could you let me how can I deal with this problem?

              Many thanks.

              Best,
              Bo

              Comment

              Working...
              X