  • delete observations conditionally

    Dear All, I found this question here (https://bbs.pinggu.org/forum.php?mod...=1#pid57289524). The dataset is:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long id float a int yearin float n
    11000208  7 2008 2
    11000208  8 2011 2
    11000308  7 2008 3
    11000308  9 2011 3
    11000308  9 2014 3
    11000808  6 2008 2
    11000808 12 2011 2
    11001008  7 2008 2
    11001008 12 2011 2
    11001108  6 2008 2
    11001108  6 2011 2
    11001208  8 2008 3
    11001208  8 2011 3
    11001208  9 2014 3
    11001308 10 2008 2
    11001308 11 2011 2
    11001408  6 2008 2
    11001408  6 2011 2
    11001508  6 2008 2
    11001508  6 2011 2
    11001708  6 2008 2
    11001708 11 2011 2
    11001808  6 2008 2
    11001808 11 2011 2
    11002308  6 2008 3
    11002308  6 2011 3
    11002308  6 2014 3
    11002508  8 2008 3
    11002508  7 2011 3
    11002508 14 2014 3
    11002908  6 2008 3
    11002908  6 2011 3
    11002908  6 2014 3
    11003008  6 2008 3
    11003008  6 2011 3
    11003008  6 2014 3
    11003108  6 2008 3
    11003108  6 2011 3
    11003108  6 2014 3
    11003708  6 2008 2
    11003708  8 2011 2
    11003808  6 2008 3
    11003808  6 2011 3
    11003808  6 2014 3
    11003908  6 2008 3
    11003908  6 2011 3
    11003908 14 2014 3
    11004308  6 2008 3
    11004308  6 2011 3
    11004308  8 2014 3
    11004408  6 2008 2
    11004408  6 2011 2
    11004508  7 2008 2
    11004508  9 2011 2
    11004808  7 2008 2
    11004808  6 2011 2
    11005008  6 2008 2
    11005008  6 2011 2
    11005308  6 2008 2
    11005308 16 2011 2
    11005708  6 2008 2
    11005708  6 2011 2
    11006008  6 2008 3
    11006008  6 2011 3
    11006008  8 2014 3
    11006108  6 2008 3
    11006108  8 2011 3
    11006108 14 2014 3
    11006208  6 2008 2
    11006208 16 2011 2
    11006508  6 2008 3
    11006508  6 2011 3
    11006508 11 2014 3
    11006608  6 2008 3
    11006608  6 2011 3
    11006608  6 2014 3
    11006708  6 2008 3
    11006708  6 2011 3
    11006708  6 2014 3
    11006908  6 2008 3
    11006908  6 2011 3
    11006908  6 2014 3
    11007108  6 2008 3
    11007108  6 2011 3
    11007108 13 2014 3
    11007408  6 2008 3
    11007408  6 2011 3
    11007408  6 2014 3
    11007608  6 2008 2
    11007608 12 2011 2
    11007708  6 2008 3
    11007708  6 2011 3
    11007708  6 2014 3
    11007808  6 2008 2
    11007808  6 2011 2
    11007908  6 2008 3
    11007908  6 2011 3
    11007908  6 2014 3
    11009008  6 2008 2
    11009008  6 2011 2
    end
    Suppose that we want to drop every `id' that meets either of the following conditions.
    1. For each `id' with `n=2', if the value of `a' in the later year is smaller than its value in the earlier year, drop this `id' (e.g., id=11004808).
    2. For each `id' with `n=3' or `n=4' (there are no `n=4' cases in this sample), if the value of `a' in the very last year is smaller than any of its values in the previous years, drop this `id' as well (to keep `id' with maximum value in the last period).
    Any suggestions are highly appreciated.
    Ho-Chuan (River) Huang
    Stata 19.0, MP(4)

  • #2
    This is not clear to me. I understand the two conditions to mean that if the observation in the final year is smaller than any earlier value, drop all observations for that id. But I do not understand "(to keep id with maximum value in the last period)". With that said, perhaps this will point in a useful direction.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long id float a int yearin
    1 7 2001
    1 8 2002
    2 7 2001
    2 8 2002
    2 9 2003
    3 8 2001
    3 7 2002
    4 8 2001
    4 8 2002
    5 7 2001
    5 8 2002
    5 6 2003
    6 7 2001
    6 8 2002
    6 7 2003
    7 9 2001
    7 8 2002
    7 8 2003
    end
    
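    * aa holds each id's values of a except the final-year one
    bysort id (yearin): generate aa = a if _n!=_N
    * min_aa is the smallest of those earlier values (egen ignores the missing aa)
    bysort id (yearin): egen min_aa = min(aa)
    * flag the id when its final-year value is below every earlier value
    bysort id (yearin): generate todrop = a[_N]<min_aa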
    list, noobs sepby(id)
    Code:
    . list, noobs sepby(id)
    
      +----------------------------------------+
      | id   a   yearin   aa   min_aa   todrop |
      |----------------------------------------|
      |  1   7     2001    7        7        0 |
      |  1   8     2002    .        7        0 |
      |----------------------------------------|
      |  2   7     2001    7        7        0 |
      |  2   8     2002    8        7        0 |
      |  2   9     2003    .        7        0 |
      |----------------------------------------|
      |  3   8     2001    8        8        1 |
      |  3   7     2002    .        8        1 |
      |----------------------------------------|
      |  4   8     2001    8        8        0 |
      |  4   8     2002    .        8        0 |
      |----------------------------------------|
      |  5   7     2001    7        7        1 |
      |  5   8     2002    8        7        1 |
      |  5   6     2003    .        7        1 |
      |----------------------------------------|
      |  6   7     2001    7        7        0 |
      |  6   8     2002    8        7        0 |
      |  6   7     2003    .        7        0 |
      |----------------------------------------|
      |  7   9     2001    9        8        0 |
      |  7   8     2002    8        8        0 |
      |  7   8     2003    .        8        0 |
      +----------------------------------------+
    Last edited by William Lisowski; 02 Mar 2019, 17:50.

    • #3
      Dear William, Thanks for the reply. Let's forget the sentence "to keep id with maximum value in the last period" for the moment. Taking id=7 as an example, the value of a in 2003 is smaller than its value in 2001, so this id must be dropped. Similarly, id=6 must be dropped as well, because its value in 2003 is smaller than its value in 2002. Any suggestions?
      Ho-Chuan (River) Huang
      Stata 19.0, MP(4)

      • #4
        I see that I misunderstood "if the value of `a' in the very last year is smaller than any of its values in the previous years" to mean "... smaller than all of its values ..." rather than "... smaller than at least one of its values ...".

        With that said, the code becomes much simpler. An id is kept only if its value in the very last year is the maximum of all the values for that id.
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input long id float a int yearin
        1 7 2001
        1 8 2002
        2 7 2001
        2 8 2002
        2 9 2003
        3 8 2001
        3 7 2002
        4 8 2001
        4 8 2002
        5 7 2001
        5 8 2002
        5 6 2003
        6 7 2001
        6 8 2002
        6 7 2003
        7 9 2001
        7 8 2002
        7 8 2003
        end
        
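        * max_a is the largest value of a within each id
        bysort id (yearin): egen max_a = max(a)
        * flag the id when its final-year value is below that maximum
        bysort id (yearin): generate todrop = a[_N]<max_a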
        list, noobs sepby(id)
        Code:
        . list, noobs sepby(id)
        
          +----------------------------------+
          | id   a   yearin   max_a   todrop |
          |----------------------------------|
          |  1   7     2001       8        0 |
          |  1   8     2002       8        0 |
          |----------------------------------|
          |  2   7     2001       9        0 |
          |  2   8     2002       9        0 |
          |  2   9     2003       9        0 |
          |----------------------------------|
          |  3   8     2001       8        1 |
          |  3   7     2002       8        1 |
          |----------------------------------|
          |  4   8     2001       8        0 |
          |  4   8     2002       8        0 |
          |----------------------------------|
          |  5   7     2001       8        1 |
          |  5   8     2002       8        1 |
          |  5   6     2003       8        1 |
          |----------------------------------|
          |  6   7     2001       8        1 |
          |  6   8     2002       8        1 |
          |  6   7     2003       8        1 |
          |----------------------------------|
          |  7   9     2001       9        1 |
          |  7   8     2002       9        1 |
          |  7   8     2003       9        1 |
          +----------------------------------+
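        The code above only flags the ids; it does not yet remove them. A minimal sketch of the remaining step, assuming the variables max_a and todrop exist exactly as created above (the assert line is an optional sanity check added here, not part of the original suggestion):
        Code:
        * remove every observation belonging to a flagged id
        drop if todrop
        * optional check: within each remaining id, the final-year value equals the maximum
        bysort id (yearin): assert a[_N] == max_a
        * the helper variables are no longer needed
        drop max_a todrop
        In the example data this should leave ids 1, 2, and 4. Note that id 4 is kept even though its value does not increase, because the comparison is a strict less-than, so a tie with the maximum does not trigger a drop.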

        • #5
          Dear William, Thanks again. In fact, I followed your earlier suggestion and came up with the same solution.
          Ho-Chuan (River) Huang
          Stata 19.0, MP(4)
