Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • find previous nonmissing value by group

    I have a dataset with: i) date, ii) firm, and iii) price. Sometimes price is missing. I need the previous price when there was a (nonmissing) price but am having difficulties locating it as the windows of missing obs are random. I tried searching and found some related posts, but could not get their code to work. Also, it was never the same exact situation as mine.

    Code:
    clear
    input float date float firm float price
    1 1 4
    2 1 .
    3 1 6
    4 1 .
    5 1 .
    6 1 .
    7 1 4
    8 1 .
    9 1 .
    10 1 6
    1 2 .
    2 2 5
    3 2 4
    4 2 .
    5 2 .
    6 2 .
    7 2 6
    8 2 .
    9 2 .
    10 2 .
    end
    On the days when there is a price, i would like to include on the same row/observation the last nonmissing previous price as well as the gap (in dates) from the current and the previous price. And do all of this within group (firm).

    The solution would look like:

    Code:
    clear
    input float date float firm float price float lastprice float dategap
    1 1 4 . .
    2 1 . . .
    3 1 6 4 2
    4 1 . . .
    5 1 . . .
    6 1 . . .
    7 1 4 6 4
    8 1 . . .
    9 1 . . .
    10 1 6 6 3
    1 2 . . .
    2 2 5 . .
    3 2 4 5 1
    4 2 . . .
    5 2 . . .
    6 2 . . .
    7 2 6 4 4
    8 2 . . .
    9 2 3 6 2
    10 2 . . .
    end
    Thank you in advance.


  • #2
    I believe this does what you want:
    Code:
    isid firm date, sort
    
    rangestat (lastnm) last_price = price, by(firm) interval(date . -1)
    replace last_price = . if missing(price)
    
    by firm, sort: gen run_num = sum(!missing(price))
    by firm run_num (date), sort: gen run_length = _N
    by firm (run_num date): gen date_gap = run_length[_n-1] if !missing(price) & run_num[_n-1] > 0 & _n > 1
    
    drop run_*
    -rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer. It is available from SSC.

    Now, this does not produce the same results you show in your second code block when applied to the data in the first code block. But I have verified by hand that my code's results are correct and yours are not. In particular, in your results for firm 1 on date 10, you show last price = 6, but, in fact, the last preceding non-missing price was 4, not 6. And in your results for firm 2 on date 9, you have last price = 6 and date gap = 2. But in the input data shown in the first code bock, the price on date 9 for firm 2 is missing, so there last price and date gap should be missing. In this last case, I believe it is because in the second code block, you also show the price on date 9 for firm 2 as 3, a value which appears nowhere in the input data.
    Last edited by Clyde Schechter; 22 Mar 2025, 16:42. Reason: Correct error in previously posted code.

    Comment


    • #3
      Clyde Schechter : Thanks for the excellent solution. I apologize that my given solution was erroneous. It wasn't the result of a working program, but rather my eyes and mind (which are mistake-prone).

      Comment


      • #4
        It wasn't the result of a working program, but rather my eyes and mind (which are mistake-prone).
        Not a problem. It happens often, to all of us.

        Comment


        • #5
          Here is another way. The main differences from #2 are (i) this only uses inbuilt Stata commands; (ii) this allows for successive dates to be arbitrary; in particular -- non-consecutive.

          Code:
          bys firm (date): gen run = sum(!missing(price))
          
          foreach var in date price {
              bys firm run (date): gen start_`var' = `var'[1] if _n == _N & !missing(price[1])
              bys firm (date): gen last_`var' = start_`var'[_n-1] if !missing(price)
              replace last_`var' = . if missing(price)
              drop start_`var'
          }
          
          gen date_gap = date - last_date
          drop last_date

          Comment


          • #6
            Here is yet another solution. Fairly full disclosure: I tried various other versions hinging on sorting, subscripting and related tricks. I also played with the idea that each non-missing price marked the beginning of a spell, which seemed a good start, but got too fiddly to carry through to the end. That was similar to the approach of Hemanshu Kumar -- but I was using tsspell from SSC, which makes some problems simpler, but made this one messier than Hemanshu's approach.

            The key idea here is this.

            Observations with missing values on price not only carry no information for what is wanted, but also are a distraction to be worked around.
            So, get rid of them (temporarily). At the end, put them back.


            Here's the code and the results:

            Code:
            clear
            input float date float firm float price
            1 1 4
            2 1 .
            3 1 6
            4 1 .
            5 1 .
            6 1 .
            7 1 4
            8 1 .
            9 1 .
            10 1 6
            1 2 .
            2 2 5
            3 2 4
            4 2 .
            5 2 .
            6 2 .
            7 2 6
            8 2 .
            9 2 .
            10 2 .
            end
            
            save work
            
            drop if price == .
            
            bysort firm (date) : gen previous = price[_n-1]
            by firm : gen prev_when = date[_n-1]
            gen date_gap = date - prev_when
            
            list, sepby(firm)
            
            merge 1:1 firm date using work
            
            sort firm date
            
            list, sepby(firm)

            Code:
            . clear
            
            . input float date float firm float price
            
                      date       firm      price
              1. 1 1 4
              2. 2 1 .
              3. 3 1 6
              4. 4 1 .
              5. 5 1 .
              6. 6 1 .
              7. 7 1 4
              8. 8 1 .
              9. 9 1 .
             10. 10 1 6
             11. 1 2 .
             12. 2 2 5
             13. 3 2 4
             14. 4 2 .
             15. 5 2 .
             16. 6 2 .
             17. 7 2 6
             18. 8 2 .
             19. 9 2 .
             20. 10 2 .
             21. end
            
            .
            . save work
            file work.dta saved
            
            .
            . drop if price == .
            (13 observations deleted)
            
            .
            . bysort firm (date) : gen previous = price[_n-1]
            (2 missing values generated)
            
            . by firm : gen prev_when = date[_n-1]
            (2 missing values generated)
            
            . gen date_gap = date - prev_when
            (2 missing values generated)
            
            .
            . list, sepby(firm)
            
                 +------------------------------------------------------+
                 | date   firm   price   previous   prev_w~n   date_gap |
                 |------------------------------------------------------|
              1. |    1      1       4          .          .          . |
              2. |    3      1       6          4          1          2 |
              3. |    7      1       4          6          3          4 |
              4. |   10      1       6          4          7          3 |
                 |------------------------------------------------------|
              5. |    2      2       5          .          .          . |
              6. |    3      2       4          5          2          1 |
              7. |    7      2       6          4          3          4 |
                 +------------------------------------------------------+
            
            .
            . merge 1:1 firm date using work
            
                Result                      Number of obs
                -----------------------------------------
                Not matched                            13
                    from master                         0  (_merge==1)
                    from using                         13  (_merge==2)
            
                Matched                                 7  (_merge==3)
                -----------------------------------------
            
            .
            . sort firm date
            
            .
            . list, sepby(firm)
            
                 +-----------------------------------------------------------------------+
                 | date   firm   price   previous   prev_w~n   date_gap           _merge |
                 |-----------------------------------------------------------------------|
              1. |    1      1       4          .          .          .      Matched (3) |
              2. |    2      1       .          .          .          .   Using only (2) |
              3. |    3      1       6          4          1          2      Matched (3) |
              4. |    4      1       .          .          .          .   Using only (2) |
              5. |    5      1       .          .          .          .   Using only (2) |
              6. |    6      1       .          .          .          .   Using only (2) |
              7. |    7      1       4          6          3          4      Matched (3) |
              8. |    8      1       .          .          .          .   Using only (2) |
              9. |    9      1       .          .          .          .   Using only (2) |
             10. |   10      1       6          4          7          3      Matched (3) |
                 |-----------------------------------------------------------------------|
             11. |    1      2       .          .          .          .   Using only (2) |
             12. |    2      2       5          .          .          .      Matched (3) |
             13. |    3      2       4          5          2          1      Matched (3) |
             14. |    4      2       .          .          .          .   Using only (2) |
             15. |    5      2       .          .          .          .   Using only (2) |
             16. |    6      2       .          .          .          .   Using only (2) |
             17. |    7      2       6          4          3          4      Matched (3) |
             18. |    8      2       .          .          .          .   Using only (2) |
             19. |    9      2       .          .          .          .   Using only (2) |
             20. |   10      2       .          .          .          .   Using only (2) |
            Clearly _merge is of no future use. It's shown here only to make all the used machinery visible. Also, there is no doubt an equivalent solution using frames.

            I am always personally delighted if rangestat is useful, but there is always interest too in seeing whether official commands suffice.

            Given the thread title, this reference may also be of use or interest to future readers.

            SJ-11-3 dm0059 . . . . . . . . . . . . Stata tip 101: Previous but different
            . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
            Q3/11 SJ 11(3):472--473 (no commands)
            illustrates identifying the most recent value that differed
            from the present value in an irregularly spaced time series

            https://journals.sagepub.com/doi/pdf...867X1101100309

            It's not the same problem, but not completely different either.
            Last edited by Nick Cox; 23 Mar 2025, 03:14.

            Comment


            • #7
              Nick Cox : Thanks for your solution, time, and thoughts.

              Hemanshu Kumar : Thanks for your clever, succinct solution.

              Comment


              • #8
                Thanks for the thanks. If you're grading us on succinctness, the code solutions are all about the same length!

                Comment


                • #9
                  Nick Cox : All 3 solutions were A+. Maybe his post (length) made me think of that adjective.

                  Comment


                  • #10
                    Agreed!

                    Here for anyone interested is another solution using tsspell from SSC, improved a little from my suppressed effort earlier today.


                    Code:
                    clear
                    input float date float firm float price
                    1 1 4
                    2 1 .
                    3 1 6
                    4 1 .
                    5 1 .
                    6 1 .
                    7 1 4
                    8 1 .
                    9 1 .
                    10 1 6
                    1 2 .
                    2 2 5
                    3 2 4
                    4 2 .
                    5 2 .
                    6 2 .
                    7 2 6
                    8 2 .
                    9 2 .
                    10 2 .
                    end
                    
                    tsset firm date 
                    tsspell , fcond(price < .)
                    egen this = max(price), by(firm _spell)
                    egen when = min(date), by(firm _spell)
                    gen previous = L.this if _seq == 1 
                    gen prev_when = L.when if _seq == 1 & previous < .  
                    gen date_gap = date - prev_when 
                    
                    list firm date _spell price prev* date_gap, sepby(firm _spell)
                    
                         +---------------------------------------------------------------+
                         | firm   date   _spell   price   previous   prev_w~n   date_gap |
                         |---------------------------------------------------------------|
                      1. |    1      1        1       4          .          .          . |
                      2. |    1      2        1       .          .          .          . |
                         |---------------------------------------------------------------|
                      3. |    1      3        2       6          4          1          2 |
                      4. |    1      4        2       .          .          .          . |
                      5. |    1      5        2       .          .          .          . |
                      6. |    1      6        2       .          .          .          . |
                         |---------------------------------------------------------------|
                      7. |    1      7        3       4          6          3          4 |
                      8. |    1      8        3       .          .          .          . |
                      9. |    1      9        3       .          .          .          . |
                         |---------------------------------------------------------------|
                     10. |    1     10        4       6          4          7          3 |
                         |---------------------------------------------------------------|
                     11. |    2      1        0       .          .          .          . |
                         |---------------------------------------------------------------|
                     12. |    2      2        1       5          .          .          . |
                         |---------------------------------------------------------------|
                     13. |    2      3        2       4          5          2          1 |
                     14. |    2      4        2       .          .          .          . |
                     15. |    2      5        2       .          .          .          . |
                     16. |    2      6        2       .          .          .          . |
                         |---------------------------------------------------------------|
                     17. |    2      7        3       6          4          3          4 |
                     18. |    2      8        3       .          .          .          . |
                     19. |    2      9        3       .          .          .          . |
                     20. |    2     10        3       .          .          .          . |
                         +---------------------------------------------------------------+

                    Comment

                    Working...
                    X