Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Keep rows until certain value of a variable is true

    Hi ,

    In the example below i want to tell the data to keep all rows until my variable "uitbetaald" equals the value 1. So i want to drop all rows after this. It does not seem like a dificult problem but i cant find the answer.

    Thanks in advance!

    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(uitbetaald lastevent)
    0  1
    0  2
    0  3
    0  4
    0  5
    0  6
    0  7
    0  8
    0  9
    0 10
    0 11
    0 12
    0 13
    0 14
    0 15
    1 16
    end
    ------------------ copy up to and including the previous line ------------------


  • #2
    maybe,
    Code:
    drop if sum(uitbetaald) > 1

    Comment


    • #3
      Ah that a simple solution. It does not work exactly because all values are either 0 or 1. It then drops once the second 1 occurs. But if you substitute all 0's with a value like 0.0001 it works after the 1st one. Thanks a lot!

      code:
      replace uitbetaald = 0.001 if uitbetaald ==0
      drop if sum(uitbetaald) > 1

      Comment


      • #4
        The code in post #3 will fail after 1000 observations of zero. The code below corrects the omission in post #2.
        Code:
        drop if sum(uitbetaald) >= 1

        Comment


        • #5
          Why would you need to do this?

          Comment


          • #6
            Anyways, here I simulate a different way. I generate a random variable and define 1 at a random point within the observation range. I use egen to tag the first time that it's equal to 1, and keep all observations within that range.
            Code:
            cls
            
            
            forv i = 1/20 {
            clear
            qui {
            set obs 10000
            
            
            qui g varint = runiform()
            
            loc x: di round(runiform(1,10000))
            
            qui replace varint = 1 in `x'
            
            tempvar time firsttime
            
            g `time' = _n
            
            qui egen `firsttime' = min(cond(varint == 1, `time',.))
            
            keep in 1/`x'
            
            as _N ==`x'
            }
            di "All obs less than `x' are kept"
            }
            We can see that each time, a different observation is chosen to be the first time the main variable is 1

            Comment


            • #7
              Back to post #1:
              In the example below i want to tell the data to keep all rows until my variable "uitbetaald" equals the value 1. So i want to drop all rows after this.
              I realize now this can be interpreted in two ways. In your example, you show the first 15 observations with 0, and in observation 16 a 1. Do you want to keep observations 1-15 or observatinos 1-16?

              The code in post #4 will keep observations 1-15. To keep observations 1-16,
              Code:
              drop if sum(uitbetaald[_n-1]) >= 1

              Comment


              • #8
                William Lisowski Thanks that was what i was looking for! I want to keep observation 1-15

                Comment


                • #9
                  Jared Greathouse The reason i want to to this is because i want to sum up the amount of payments that still needs to be made for the past months until the last payment has been made (when uitbetaling ==1). Thansk for the example! I may use this in the future!

                  Comment


                  • #10
                    Consider a slightly different example dataset from what was posted in #1. I show two very general methods that can be adapted for a number of such approaches where the subset of interest can be viewed based on the current sort order or row observation number, without a priori knowing where that cut point is. Neither method is especially linked to the value of a variable of interest, so it generalizes when you don't already have a convenient 0/1 indicator variable.

                    Code:
                    clear *
                    cls
                    
                    input float(uitbetaald lastevent)
                    1  1
                    1  2
                    1  3
                    1  4
                    1  5
                    1  6
                    1  7
                    1  8
                    2  9
                    2 10
                    2 11
                    2 12
                    2 13
                    2 14
                    2 15
                    2 16
                    end
                    
                    * based on the boundary of change
                    gen byte start = uitbetaald[_n-1]==1 & uitbetaald==2
                    gen byte wanted1 = sum(start)
                    drop start
                    
                    * based on the observation rank order
                    gen `c(obs_t)' row = _n
                    summ row if uitbetaald==2, meanonly
                    gen byte wanted2 = row >= r(min)
                    drop row
                    
                    assert wanted1==wanted2
                    list
                    Result

                    Code:
                    . list, abbrev(12)
                    
                         +--------------------------------------------+
                         | uitbetaald   lastevent   wanted1   wanted2 |
                         |--------------------------------------------|
                      1. |          1           1         0         0 |
                      2. |          1           2         0         0 |
                      3. |          1           3         0         0 |
                      4. |          1           4         0         0 |
                      5. |          1           5         0         0 |
                         |--------------------------------------------|
                      6. |          1           6         0         0 |
                      7. |          1           7         0         0 |
                      8. |          1           8         0         0 |
                      9. |          2           9         1         1 |
                     10. |          2          10         1         1 |
                         |--------------------------------------------|
                     11. |          2          11         1         1 |
                     12. |          2          12         1         1 |
                     13. |          2          13         1         1 |
                     14. |          2          14         1         1 |
                     15. |          2          15         1         1 |
                         |--------------------------------------------|
                     16. |          2          16         1         1 |
                         +--------------------------------------------+

                    Comment

                    Working...
                    X