  • Detection and removal of same year announcements

    Dear statalisters,

    I have searched the internet and forums extensively, but can't find the answer I'm looking for. Hopefully you can help me out.

    My objective:
    Exclude firms that announce (share repurchases) more than once in a year, in order to avoid an overlapping problem.

    Example of my dataset:
    Date-------------CUSIP_8 --------- Company
    02oct2009----- 02888410 ----- American Physicians Capital
    23jun2009 ----- 02888410 ----- American Physicians Capital
    11dec2008 -----02888410 ----- American Physicians Capital
    04dec2008----- 02888410 ----- American Physicians Capital
    16aug2007 ----- 02888410 ----- American Physicians Capital
    22may2007 ----- 02888410 ----- American Physicians Capital
    11sep2003 ----- 02888410 ----- American Physicians Capital

    I would like to detect and remove announcements that occur within 250 days of another announcement by the same firm (a year has about 252 trading days). In this case, only 11sep2003 would remain.

    How would I do this with Stata code?

    Thank you in advance for your help!

    Michiel

  • #2
    Welcome to Statalist!

    This seems to do what you want. Note that this assumes that the variable Date is stored as a numeric Stata daily date (SIF), not as a string.
    Code:
    sort CUSIP_8 Date
    generate to_drop = 0
    by CUSIP_8: replace to_drop = 1 if Date-Date[_n-1]<=250 | Date[_n+1]-Date<=250
    list, clean
    drop if to_drop
    drop to_drop
    list, clean
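
    A minimal, self-contained sketch of the same rule on the sample data from post #1, assuming the date arrives as a string (the name A is an assumption, borrowed from post #6). It relies on Stata treating a missing value as larger than any number, so the first and last announcement of each firm, whose neighbors Date[_n-1] and Date[_n+1] are missing, are never flagged on that side. Note that differences of daily dates count calendar days, not trading days.

    ```stata
    clear
    input str9 A str8 CUSIP_8
    "02oct2009" "02888410"
    "23jun2009" "02888410"
    "11dec2008" "02888410"
    "04dec2008" "02888410"
    "16aug2007" "02888410"
    "22may2007" "02888410"
    "11sep2003" "02888410"
    end

    * convert the string date to a numeric daily date
    generate Date = date(A, "DMY")
    format Date %td

    * flag an announcement if the previous or the next announcement
    * of the same firm is at most 250 calendar days away
    bysort CUSIP_8 (Date): generate byte to_drop = ///
        (Date - Date[_n-1] <= 250) | (Date[_n+1] - Date <= 250)

    list Date CUSIP_8 to_drop, clean
    drop if to_drop
    drop to_drop
    ```

    On this sample only 11sep2003 should remain, which matches the result requested in post #1.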


    • #3
      See also -panelthin- (SSC).


      • #4
        Nick Cox: Certainly panelthin looks useful; I'm glad to have installed it.

        However, my interpretation of Michiel's requirement is that if two observations are within a minimum distance of each other, both are to be deleted. Thus if the second observation were less than the cutoff from the first observation, both the first and second observation would be deleted. And if the third observation were less than the cutoff from the second observation, all three would be deleted.

        With that said, I'm not certain I've correctly understood the requirements, beyond being able to duplicate the requested results for the small sample of data. Below are the results of my attempt with panelthin, which leaves more observations in place than he suggested. On further consideration, Michiel might prefer having more observations.
        Code:
        . tsset CUSIP_8 Date
               panel variable:  CUSIP_8 (strongly balanced)
                time variable:  Date, 11sep2003 to 02oct2009, but with gaps
                        delta:  1 day
        
        . panelthin, min(250) gen(to_keep)
        
        . list, clean
        
                    Date    CUSIP_8   to_keep  
          1.   11sep2003   02888410         1  
          2.   22may2007   02888410         1  
          3.   16aug2007   02888410         0  
          4.   04dec2008   02888410         1  
          5.   11dec2008   02888410         0  
          6.   23jun2009   02888410         0  
          7.   02oct2009   02888410         1
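
        For comparison, a hedged sketch of the panelthin route on the same sample data. It assumes panelthin can be installed from SSC and that CUSIP_8 is a string, so a numeric panel identifier (firm_id, a name introduced here for illustration) is built before tsset. The min() and gen() options are the ones shown in the log above.

        ```stata
        * install panelthin from SSC only if it is not already available
        capture which panelthin
        if _rc ssc install panelthin

        clear
        input str9 A str8 CUSIP_8
        "02oct2009" "02888410"
        "23jun2009" "02888410"
        "11dec2008" "02888410"
        "04dec2008" "02888410"
        "16aug2007" "02888410"
        "22may2007" "02888410"
        "11sep2003" "02888410"
        end
        generate Date = date(A, "DMY")
        format Date %td

        * tsset needs a numeric panel variable
        egen firm_id = group(CUSIP_8)
        tsset firm_id Date

        * to_keep marks the thinned subset, as in the log above
        panelthin, min(250) gen(to_keep)
        list Date CUSIP_8 to_keep, clean
        ```

        Unlike the rule in post #2, which drops every announcement that has a close neighbor, this keeps one announcement from each cluster, so more observations survive.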


        • #5
          Originally posted by William Lisowski
          Welcome to Statalist!

          This seems to do what you want. Note that this assumes that the variable Date is stored as a numeric Stata daily date (SIF), not as a string.
          Code:
          sort CUSIP_8 Date
          generate to_drop = 0
          by CUSIP_8: replace to_drop = 1 if Date-Date[_n-1]<=250 | Date[_n+1]-Date<=250
          list, clean
          drop if to_drop
          drop to_drop
          list, clean

          It worked! Out of a dataset with 3500 announcements, 817 were removed. This corrects for the overlapping problem in both the estimation window (250 days) and the event window of an event study.

          Originally posted by William Lisowski
          However, my interpretation of Michiel's requirement is that if two observations are within a minimum distance of each other, both are to be deleted. Thus if the second observation were less than the cutoff from the first observation, both the first and second observation would be deleted. And if the third observation were less than the cutoff from the second observation, all three would be deleted.

          Correct.

          Thanks for the help William Lisowski and Nick Cox!
          Last edited by Michiel van Nieuwenhuijzen; 08 Sep 2015, 09:07.


          • #6
            This is odd. These observations should have been removed, just like the other 817. What could be the origin of the problem?

            ------------------------------------------------------------
                          storage   display    value
            variable name   type    format     label      variable label
            ------------------------------------------------------------
            A               str16   %16s

            This is the code I used:
            gen Date=date(A,"DMY")
            format Date %td

            And then William's:
            sort CUSIP_8 Date
            generate to_drop = 0
            by CUSIP_8: replace to_drop = 1 if Date-Date[_n-1]<=250 | Date[_n+1]-Date<=250
            drop if to_drop
            drop to_drop
            Last edited by Michiel van Nieuwenhuijzen; 08 Sep 2015, 09:22.


            • #7
              The problem may lie in there being only two dates for that CUSIP_8. I don't have time to research it at the moment. Try replacing the single replace command with two commands:
              Code:
              by CUSIP_8: replace to_drop = 1 if Date-Date[_n-1]<=250
              by CUSIP_8: replace to_drop = 1 if Date[_n+1]-Date<=250
              In the future, please don't post pictures of data. See the Statalist FAQ linked to at the top of every page for advice on posting effectively. It's particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using CODE delimiters, as described in section 12 of the FAQ. Why would I want to retype your data from a difficult-to-read picture when it could have been copied and pasted into your post (preferably without adding dashes as you did in post #1)?
              Last edited by William Lisowski; 08 Sep 2015, 09:52.


              • #8
                Originally posted by William Lisowski
                The problem may lie in there being only two dates for that CUSIP_8. I don't have time to research it at the moment. Try replacing the single replace command with two commands:
                Code:
                by CUSIP_8: replace to_drop = 1 if Date-Date[_n-1]<=250
                by CUSIP_8: replace to_drop = 1 if Date[_n+1]-Date<=250

                It didn't solve it. I manually browsed through the dataset, and this seems to be the only CUSIP_8 where it happens.

                Originally posted by William Lisowski
                In the future, please don't post pictures of data. See the Statalist FAQ linked to at the top of every page for advice on posting effectively. It's particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using CODE delimiters, as described in section 12 of the FAQ. Why would I want to retype your data from a difficult-to-read picture when it could have been copied and pasted into your post (preferably without adding dashes as you did in post #1)?

                Sorry for the inconvenience. I will read through the FAQ more thoroughly before posting again.


                • #9
                  Add the following to the code and you will see that the two dates are 251 days apart.
                  Code:
                  gen diff = Date-Date[_n-1]
                  If I change the dates to be within 250 days of each other, the original code from post #2 works; there is no need for the two-line solution in post #7.
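
                  A small sketch of that check, computed within each firm so that the first announcement of one CUSIP_8 is not compared with the last announcement of the previous one. The two dates and the CUSIP_8 value below are hypothetical, chosen to be 251 calendar days apart, and the string variable A is named after the one in post #6.

                  ```stata
                  clear
                  input str9 A str8 CUSIP_8
                  "01jan2010" "00000000"
                  "09sep2010" "00000000"
                  end
                  generate Date = date(A, "DMY")
                  format Date %td

                  * gap in calendar days to the previous announcement of the same firm
                  bysort CUSIP_8 (Date): generate diff = Date - Date[_n-1]

                  * pairs just outside the cutoff, which the <=250 rule leaves in place
                  * (the upper bound also excludes the missing diff of each firm's first row)
                  list CUSIP_8 Date diff if diff > 250 & diff <= 260, clean
                  ```

                  Here diff is 251 for the second row, so neither observation is flagged by the code in post #2. If such near-misses should also go, the cutoff itself has to be raised.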


                  • #10
                    Originally posted by William Lisowski
                    Add the following to the code and you will see that the two dates are 251 days apart.
                    Code:
                    gen diff = Date-Date[_n-1]
                    If I change the dates to be within 250 days of each other, the original code from post #2 works; there is no need for the two-line solution in post #7.

                    Thank you so much!
