Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • delete observations according to some rules

    Dear statalist,

    I have two sets of data and I append them together. The first set consists of a series of events and the year the event happens. There is only 1 event per year for each firm. A firm could have multiple events in different years. I call this data the "post" data, and I have a dummy post to indicate whether the data is from the "post" data. The other set of data consists of the year before the event year. For each firm in the post data, there is a corresponding pre year. I call this data the "pre“ data. In the "pre“ data, post == 0.

    I have the following variables: symbol, year to identify which firm it is and of which year, and a post variable that I already explained. If a firm has the event in two consecutive years, say in 2012 and 2013, the pre year for 2012 event is 2011, and the pre year for 2013 event is 2012. The problem arises here as 2012 is both the pre and the post year. I don't want such situations to happen, so I only want to keep 2011 (pre) and 2012 (post) and delete 2012 (pre) and 2013(post) for this firm. I use -duplicates tag Symbol Year, gen(flag)- to indicate the duplicates (in this example, 2012 pre and post are the duplicates), and I could drop the one with post == 0 & flag == 1. I don't know how to drop the observation in the following year (in this example, the observation for 2013).

    Here is some data
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long Symbol float(Year post) byte flag
     62 2016 0 0
     62 2017 1 0
     62 2018 0 0
     62 2019 1 0
    416 2013 0 0
    416 2014 1 1
    416 2014 0 1
    416 2015 1 0
    426 2013 0 0
    426 2014 1 0
    426 2018 0 0
    426 2019 1 0
    547 2012 0 0
    547 2013 1 0
    557 2013 0 0
    557 2014 1 0
    560 2010 0 0
    560 2011 1 0
    2356 2013 0 0
    2356 2014 1 1
    2356 2014 0 1
    2356 2015 1 0
    2713 2014 0 0
    2713 2015 1 1
    2713 2015 0 1
    2713 2016 1 0
    end
    format %ty Year
    The bold ones are what I want to delete. I can only think of how to delete the duplicate year that is not the post year (post == 0 & flag == 1), but not sure how to delete the year following.

    Any help would be appreciated. Thanks!

  • #2
    Code:
    sort Symbol Year
    by Symbol Year: gen num = _n
    drop if num == 2 | num[_n-1] == 2
    drop num
    which produces:

    Code:
    . li, noobs sepby(Symbol)
      +-----------------------------+
      | Symbol   Year   post   flag |
      |-----------------------------|
      |     62   2016      0      0 |
      |     62   2017      1      0 |
      |     62   2018      0      0 |
      |     62   2019      1      0 |
      |-----------------------------|
      |    416   2013      0      0 |
      |    416   2014      1      1 |
      |-----------------------------|
      |    426   2013      0      0 |
      |    426   2014      1      0 |
      |    426   2018      0      0 |
      |    426   2019      1      0 |
      |-----------------------------|
      |    547   2012      0      0 |
      |    547   2013      1      0 |
      |-----------------------------|
      |    557   2013      0      0 |
      |    557   2014      1      0 |
      |-----------------------------|
      |    560   2010      0      0 |
      |    560   2011      1      0 |
      |-----------------------------|
      |   2356   2013      0      0 |
      |   2356   2014      1      1 |
      |-----------------------------|
      |   2713   2014      0      0 |
      |   2713   2015      1      1 |
      +-----------------------------+
    Last edited by Hemanshu Kumar; 13 Nov 2022, 22:08.

    Comment


    • #3
      Hi Hemanshu,

      Thanks a lot for your reply. I found that the code works fine in some cases but not in others. It works only if the data is set in this format:
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long Symbol float(Year post) byte flag
      416 2013 0 0
      416 2014 1 1
      416 2014 0 1
      416 2015 1 0
      end
      format %ty Year
      The two bolded ones that I want to delete are placed together, so I can use drop if num == 2 | num[_n-1] == 2

      However, in some cases the data displaces like this
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long Symbol float(Year post) byte flag
      300296 2014 0 0
      300296 2015 0 1
      300296 2015 1 1
      300296 2016 1 0
      end
      format %ty Year
      Then drop if num[_n-1] == 2 won't work. Any idea how to solve this? Thanks!

      Edit: I don't understand why when I sort Symbol Year, some data are placed in the order shown in the first example while others are ordered in the way shown in the second example.
      Last edited by Alice Yang; 14 Nov 2022, 00:34.

      Comment


      • #4
        Alice Yang just replace the sort command with
        Code:
        gsort Symbol Year -post
        and it should work.

        Comment


        • #5
          Thanks!!! It works this time.

          Comment

          Working...
          X