Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • [Panel] Preceding observations with values lower than that of the current observation

    Hi,

    I am using panel data with the following structure: a single task (Task_id, also the panel id) attracts multiple entries (Entry_id) over time. Each entry receives a performance score (between 0 and 1).

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int task_id long entry_id float entry_performance
    1  1  .3610236
    1  2  .5415809
    1  3   .374851
    1  4  .5415809
    1  5 .47416735
    1  6 .47416735
    1  7  .5049182
    1  8  .3841909
    1  9     .4958
    1 10  .5889756
    1 11 .59394366
    1 12  .5415809
    1 13  .4594968
    2  1  .3826254
    2  2  .7526277
    2  3  .7923552
    2  4  .8073489
    2  5  .3816767
    2  6  .4103796
    end
    For each entry in each task, I am trying to count how many prior entries have higher values entry_performance. So, the expected outcome is as follow (output_var):

    Code:
    * Example generated by    -dataex-. To install: ssc install dataex
    clear
    input int task_id long    entry_id float(entry_performance output_var)
    1  1  .3610236 0
    1  2  .5415809 0
    1  3   .374851 2
    1  4  .5415809 0
    1  5 .47416735 2
    1  6 .47416735 2
    1  7  .5049182 2
    1  8  .3841909 5
    1  9     .4958 3
    1 10  .5889756 0
    1 11 .59394366 0
    1 12  .5415809 2
    1 13  .4594968 9
    2  1  .3826254 0
    2  2  .7526277 0
    2  3  .7923552 0
    2  4  .8073489 0
    2  5  .3816767 4
    2  6  .4103796 3
    end

    I have tried running this with loops but as the number of entries grow in a task, the time taken increases exponentially. I have tried using rangestat and rangerun but could not make things work.

    Any help is highly appreciated. Thanks in advance.

  • #2
    Thanks for your data example. It's a nice problem.

    rangestat and rangerun are from SSC, as you are asked to explain (FAQ Advice #12).

    We can't comment on code we can't see, but this should help. If you feed the window ending with the current observation to a program then the comparison is between each preceding value and the last value in the window. Double counting is not an issue as no value can be greater than itself.


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int task_id long entry_id float entry_performance
    1  1  .3610236
    1  2  .5415809
    1  3   .374851
    1  4  .5415809
    1  5 .47416735
    1  6 .47416735
    1  7  .5049182
    1  8  .3841909
    1  9     .4958
    1 10  .5889756
    1 11 .59394366
    1 12  .5415809
    1 13  .4594968
    2  1  .3826254
    2  2  .7526277
    2  3  .7923552
    2  4  .8073489
    2  5  .3816767
    2  6  .4103796
    end
    
    program nhigher
       count if entry_performance > entry_performance[_N]
       gen N = r(N)
    end
    
    rangerun nhigher, by(task_id) int(entry_id . 0) use(entry_performance)
    
    l , sepby(task_id)
    
         +-----------------------------------+
         | task_id   entry_id   entry_~e   N |
         |-----------------------------------|
      1. |       1          1   .3610236   0 |
      2. |       1          2   .5415809   0 |
      3. |       1          3    .374851   1 |
      4. |       1          4   .5415809   0 |
      5. |       1          5   .4741673   2 |
      6. |       1          6   .4741673   2 |
      7. |       1          7   .5049182   2 |
      8. |       1          8   .3841909   5 |
      9. |       1          9      .4958   3 |
     10. |       1         10   .5889756   0 |
     11. |       1         11   .5939437   0 |
     12. |       1         12   .5415809   2 |
     13. |       1         13   .4594968   9 |
         |-----------------------------------|
     14. |       2          1   .3826254   0 |
     15. |       2          2   .7526277   0 |
     16. |       2          3   .7923552   0 |
     17. |       2          4   .8073489   0 |
     18. |       2          5   .3816767   4 |
     19. |       2          6   .4103796   3 |
         +-----------------------------------+
    
    .


    This does not reproduce your example completely, but the explanation is easy. In the first three observations

    Code:
      
    1  1  .3610236 0 1  
    2  .5415809 0 1  
    3   .374851 2
    only .5415809 is greater than .374851 so the count in observation 3 should be 1 not 2.
    Last edited by Nick Cox; 10 Apr 2020, 02:18.

    Comment


    • #3
      Dear Nick,

      That serves my purpose perfectly. Thanks

      --
      regards
      Swanand

      Comment


      • #4
        Good: I note that the thread title says "lower" but #1 says "higher". For counting lower values, just change > to <.

        Comment


        • #5
          Hi,

          Apologies for reviving an old thread but I have a related concern.
          I have panel data (panel_var, time_var). I am trying to create a count variable based on the following rule:

          1. For each record, look at the value in the "based_on_this" column.
          2. In all the preceding records (with a lag of 1), count the records in which values of "count_this" > value of "based_on_this" in the current observation.
          3. For instance, if you look at obs. 7, the based_on_this value is 1 and all the values of count_this from obs-1 to 6 are greater than 1. So, the output variable value is 6.
          3. Create a new variable "output" in which the count is stored.
          4. The variable "based_on_this" can have missing values. The corresponding value in the "output" variable should also be missing.

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float(time_var panel_var count_this based_on_this output)
           1 1  4 . .
           2 1  3 2 1
           3 1  6 . .
           4 1  2 . .
           5 1  7 6 0
           6 1  6 5 2
           7 1  9 1 6
           8 1 10 . .
           9 1  5 8 2
          10 1 12 . .
          end
          So far, I have followed Prof. Cox's excellent suggestions of using rangerun (the suggestions work perfectly when the "based_on_this" and "count_this" are same columns).
          Here is the code I am using

          Code:
          sort panel_var time_var
          program counting_prog
             count if count_this> based_on_this[_N]
             gen output = r(N)
          end
          rangerun counting_prog, by(panel_var) int(time_var . -1) use(based_on_this count_this)
          Any help would be highly appreciated. Thanks in advance and stay safe.

          regards
          Swanand J. Deodhar

          Comment

          Working...
          X