Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Drop observations after the last non missing value a variable

    Hi,

    I would like to drop observations after the last non missing value of the variable WC01001A by "unit_id". How can I do that?

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long unit_id float(mydate date_ann) long WC01001A
    1 699 709 1020218
    1 700 710       .
    1 701 711       .
    1 702 711  938955
    1 703 712       .
    1 704 713       .
    1 705 714 1279468
    1 706 715       .
    1 707 716       .
    1 708 717 1420246
    1 709 718       .
    1 710 719       .
    1 711 721 1100432
    1 712 722       .
    1 713 723       .
    1 714 724  958200
    1 715 725       .
    1 716 726       .
    1 717 727       .
    1 718 728       .
    1 719 729       .
    1 720 730       .
    1 721 731       .
    1 722 732       .
    1 723 733       .
    2 699 708 2133271
    2 700 709       .
    2 701 710       .
    2 702 711 1822220
    2 703 712       .
    2 704 713       .
    2 705 714 2353782
    2 706 715       .
    2 707 716       .
    2 708 717 1984144
    2 709 718       .
    2 710 719       .
    2 711 720 2191785
    2 712 721       .
    2 713 722       .
    2 714 723 1959188
    2 715 724       .
    2 716 725       .
    2 717 726       .
    2 718 727       .
    2 719 728       .
    2 720 729       .
    2 721 730       .
    2 722 731       .
    2 723 732       .
    3 699 709  921699
    3 700 710       .
    3 701 711       .
    3 702 712  832064
    3 703 713       .
    3 704 714       .
    3 705 715  885052
    3 706 716       .
    3 707 717       .
    3 708 718  879747
    3 709 719       .
    3 710 720       .
    3 711 721  992564
    3 712 722       .
    3 713 723       .
    3 714 724  738845
    3 715 725       .
    3 716 726       .
    3 717 727       .
    3 718 728       .
    3 719 729       .
    3 720 730       .
    3 721 731       .
    3 722 732       .
    3 723 733       .
    4 699 709  578530
    4 700 710       .
    4 701 711       .
    4 702 711  566451
    4 703 712       .
    4 704 713       .
    4 705 715  577297
    4 706 716       .
    4 707 717       .
    4 708 717  586578
    4 709 718       .
    4 710 719       .
    4 711 721  592345
    4 712 722       .
    4 713 723       .
    4 714 724  601694
    4 715 725       .
    4 716 726       .
    4 717 727       .
    4 718 728       .
    4 719 729       .
    4 720 730       .
    4 721 731       .
    4 722 732       .
    4 723 733       .
    end
    format %tm mydate
    format %tm date_ann
    label values unit_id unit_id
    label def unit_id 1 "130062", modify
    label def unit_id 2 "130088", modify
    label def unit_id 3 "130298", modify
    label def unit_id 4 "130502", modify
    For instance, for the first "unit_id", I would like to keep only the following:
    Code:
    unit_id    mydate    date_ann    WC01001A
    130062    2018m4    2019m2    1020218
    130062    2018m5    2019m3    
    130062    2018m6    2019m4    
    130062    2018m7    2019m4    938955
    130062    2018m8    2019m5    
    130062    2018m9    2019m6    
    130062    2018m10    2019m7    1279468
    130062    2018m11    2019m8    
    130062    2018m12    2019m9    
    130062    2019m1    2019m10    1420246
    130062    2019m2    2019m11    
    130062    2019m3    2019m12    
    130062    2019m4    2020m2    1100432
    130062    2019m5    2020m3    
    130062    2019m6    2020m4    
    130062    2019m7    2020m5    958200



  • #2
    Assuming your data are currently sorted in ascending order by mydate within unit_id,
    Code:
    assert mydate>mydate[_n-1] | unit_id!=unit_id[_n-1] | _n==1
    gsort unit_id -mydate
    by unit_id: generate temp = sum(cond(WC01001A==.,0,1))
    drop if temp==0
    drop temp
    sort unit_id mydate
    list if unit_id==1, clean
    Code:
    . list if unit_id==1, clean
    
           unit_id    mydate   date_ann   WC01001A  
      1.    130062    2018m4     2019m2    1020218  
      2.    130062    2018m5     2019m3          .  
      3.    130062    2018m6     2019m4          .  
      4.    130062    2018m7     2019m4     938955  
      5.    130062    2018m8     2019m5          .  
      6.    130062    2018m9     2019m6          .  
      7.    130062   2018m10     2019m7    1279468  
      8.    130062   2018m11     2019m8          .  
      9.    130062   2018m12     2019m9          .  
     10.    130062    2019m1    2019m10    1420246  
     11.    130062    2019m2    2019m11          .  
     12.    130062    2019m3    2019m12          .  
     13.    130062    2019m4     2020m2    1100432  
     14.    130062    2019m5     2020m3          .  
     15.    130062    2019m6     2020m4          .  
     16.    130062    2019m7     2020m5     958200

    Comment


    • #3
      Great, thanks a lot William!

      Comment


      • #4
        See also https://www.stata.com/support/faqs/d...t-occurrences/ for some systematic discussion.

        Here is another way to do it:

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input long unit_id float(mydate date_ann) long WC01001A
        1 699 709 1020218
        1 700 710       .
        1 701 711       .
        1 702 711  938955
        1 703 712       .
        1 704 713       .
        1 705 714 1279468
        1 706 715       .
        1 707 716       .
        1 708 717 1420246
        1 709 718       .
        1 710 719       .
        1 711 721 1100432
        1 712 722       .
        1 713 723       .
        1 714 724  958200
        1 715 725       .
        1 716 726       .
        1 717 727       .
        1 718 728       .
        1 719 729       .
        1 720 730       .
        1 721 731       .
        1 722 732       .
        1 723 733       .
        2 699 708 2133271
        2 700 709       .
        2 701 710       .
        2 702 711 1822220
        2 703 712       .
        2 704 713       .
        2 705 714 2353782
        2 706 715       .
        2 707 716       .
        2 708 717 1984144
        2 709 718       .
        2 710 719       .
        2 711 720 2191785
        2 712 721       .
        2 713 722       .
        2 714 723 1959188
        2 715 724       .
        2 716 725       .
        2 717 726       .
        2 718 727       .
        2 719 728       .
        2 720 729       .
        2 721 730       .
        2 722 731       .
        2 723 732       .
        3 699 709  921699
        3 700 710       .
        3 701 711       .
        3 702 712  832064
        3 703 713       .
        3 704 714       .
        3 705 715  885052
        3 706 716       .
        3 707 717       .
        3 708 718  879747
        3 709 719       .
        3 710 720       .
        3 711 721  992564
        3 712 722       .
        3 713 723       .
        3 714 724  738845
        3 715 725       .
        3 716 726       .
        3 717 727       .
        3 718 728       .
        3 719 729       .
        3 720 730       .
        3 721 731       .
        3 722 732       .
        3 723 733       .
        4 699 709  578530
        4 700 710       .
        4 701 711       .
        4 702 711  566451
        4 703 712       .
        4 704 713       .
        4 705 715  577297
        4 706 716       .
        4 707 717       .
        4 708 717  586578
        4 709 718       .
        4 710 719       .
        4 711 721  592345
        4 712 722       .
        4 713 723       .
        4 714 724  601694
        4 715 725       .
        4 716 726       .
        4 717 727       .
        4 718 728       .
        4 719 729       .
        4 720 730       .
        4 721 731       .
        4 722 732       .
        4 723 733       .
        end
        format %tm mydate
        format %tm date_ann
        label values unit_id unit_id
        label def unit_id 1 "130062", modify
        label def unit_id 2 "130088", modify
        label def unit_id 3 "130298", modify
        label def unit_id 4 "130502", modify
        
        bysort unit_id (mydate) : gen count = sum(!missing(WC01001A))
        by unit_id : drop if count == count[_N] & missing(WC01001A)
        
        list if unit_id == 1 , sepby(count)
        
        
             +-------------------------------------------------+
             | unit_id    mydate   date_ann   WC01001A   count |
             |-------------------------------------------------|
          1. |  130062    2018m4     2019m2    1020218       1 |
          2. |  130062    2018m5     2019m3          .       1 |
          3. |  130062    2018m6     2019m4          .       1 |
             |-------------------------------------------------|
          4. |  130062    2018m7     2019m4     938955       2 |
          5. |  130062    2018m8     2019m5          .       2 |
          6. |  130062    2018m9     2019m6          .       2 |
             |-------------------------------------------------|
          7. |  130062   2018m10     2019m7    1279468       3 |
          8. |  130062   2018m11     2019m8          .       3 |
          9. |  130062   2018m12     2019m9          .       3 |
             |-------------------------------------------------|
         10. |  130062    2019m1    2019m10    1420246       4 |
         11. |  130062    2019m2    2019m11          .       4 |
         12. |  130062    2019m3    2019m12          .       4 |
             |-------------------------------------------------|
         13. |  130062    2019m4     2020m2    1100432       5 |
         14. |  130062    2019m5     2020m3          .       5 |
         15. |  130062    2019m6     2020m4          .       5 |
             |-------------------------------------------------|
         16. |  130062    2019m7     2020m5     958200       6 |
             +-------------------------------------------------+
        Each non-missing value starts a new block or spell of observations within each panel. We want to drop any observations with missing values in the last block.

        Comment

        Working...
        X