[Panel] Preceding observations with values lower than that of the current observation

Swanand Deodhar

Join Date: Sep 2018
Posts: 7

[Panel] Preceding observations with values lower than that of the current observation

10 Apr 2020, 01:53

Hi,

I am using panel data with the following structure: a single task (Task_id, also the panel id) attracts multiple entries (Entry_id) over time. Each entry receives a performance score (between 0 and 1).

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int task_id long entry_id float entry_performance
1  1  .3610236
1  2  .5415809
1  3   .374851
1  4  .5415809
1  5 .47416735
1  6 .47416735
1  7  .5049182
1  8  .3841909
1  9     .4958
1 10  .5889756
1 11 .59394366
1 12  .5415809
1 13  .4594968
2  1  .3826254
2  2  .7526277
2  3  .7923552
2  4  .8073489
2  5  .3816767
2  6  .4103796
end

For each entry in each task, I am trying to count how many prior entries have higher values entry_performance. So, the expected outcome is as follow (output_var):

Code:

* Example generated by    -dataex-. To install: ssc install dataex
clear
input int task_id long    entry_id float(entry_performance output_var)
1  1  .3610236 0
1  2  .5415809 0
1  3   .374851 2
1  4  .5415809 0
1  5 .47416735 2
1  6 .47416735 2
1  7  .5049182 2
1  8  .3841909 5
1  9     .4958 3
1 10  .5889756 0
1 11 .59394366 0
1 12  .5415809 2
1 13  .4594968 9
2  1  .3826254 0
2  2  .7526277 0
2  3  .7923552 0
2  4  .8073489 0
2  5  .3816767 4
2  6  .4103796 3
end

I have tried running this with loops but as the number of entries grow in a task, the time taken increases exponentially. I have tried using rangestat and rangerun but could not make things work.

Any help is highly appreciated. Thanks in advance.

Tags: None

Nick Cox

Join Date: Mar 2014
Posts: 35641

10 Apr 2020, 02:15

Thanks for your data example. It's a nice problem.

rangestat and rangerun are from SSC, as you are asked to explain (FAQ Advice #12).

We can't comment on code we can't see, but this should help. If you feed the window ending with the current observation to a program then the comparison is between each preceding value and the last value in the window. Double counting is not an issue as no value can be greater than itself.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int task_id long entry_id float entry_performance
1  1  .3610236
1  2  .5415809
1  3   .374851
1  4  .5415809
1  5 .47416735
1  6 .47416735
1  7  .5049182
1  8  .3841909
1  9     .4958
1 10  .5889756
1 11 .59394366
1 12  .5415809
1 13  .4594968
2  1  .3826254
2  2  .7526277
2  3  .7923552
2  4  .8073489
2  5  .3816767
2  6  .4103796
end

program nhigher
   count if entry_performance > entry_performance[_N]
   gen N = r(N)
end

rangerun nhigher, by(task_id) int(entry_id . 0) use(entry_performance)

l , sepby(task_id)

     +-----------------------------------+
     | task_id   entry_id   entry_~e   N |
     |-----------------------------------|
  1. |       1          1   .3610236   0 |
  2. |       1          2   .5415809   0 |
  3. |       1          3    .374851   1 |
  4. |       1          4   .5415809   0 |
  5. |       1          5   .4741673   2 |
  6. |       1          6   .4741673   2 |
  7. |       1          7   .5049182   2 |
  8. |       1          8   .3841909   5 |
  9. |       1          9      .4958   3 |
 10. |       1         10   .5889756   0 |
 11. |       1         11   .5939437   0 |
 12. |       1         12   .5415809   2 |
 13. |       1         13   .4594968   9 |
     |-----------------------------------|
 14. |       2          1   .3826254   0 |
 15. |       2          2   .7526277   0 |
 16. |       2          3   .7923552   0 |
 17. |       2          4   .8073489   0 |
 18. |       2          5   .3816767   4 |
 19. |       2          6   .4103796   3 |
     +-----------------------------------+

.

This does not reproduce your example completely, but the explanation is easy. In the first three observations

Code:

  
1  1  .3610236 0 1  
2  .5415809 0 1  
3   .374851 2

only .5415809 is greater than .374851 so the count in observation 3 should be 1 not 2.

Last edited by Nick Cox; 10 Apr 2020, 02:18.

Comment

Swanand Deodhar

Join Date: Sep 2018

Posts: 7
#3

10 Apr 2020, 03:28

Dear Nick,

That serves my purpose perfectly. Thanks

--
regards
Swanand
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35641
#4

10 Apr 2020, 05:33

Good: I note that the thread title says "lower" but #1 says "higher". For counting lower values, just change > to <.
Comment
Swanand Deodhar

Join Date: Sep 2018

Posts: 7
#5

08 Jun 2021, 01:09

Hi,

Apologies for reviving an old thread but I have a related concern.
I have panel data (panel_var, time_var). I am trying to create a count variable based on the following rule:

1. For each record, look at the value in the "based_on_this" column.
2. In all the preceding records (with a lag of 1), count the records in which values of "count_this" > value of "based_on_this" in the current observation.
3. For instance, if you look at obs. 7, the based_on_this value is 1 and all the values of count_this from obs-1 to 6 are greater than 1. So, the output variable value is 6.
3. Create a new variable "output" in which the count is stored.
4. The variable "based_on_this" can have missing values. The corresponding value in the "output" variable should also be missing.

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input float(time_var panel_var count_this based_on_this output) 1 1 4 . . 2 1 3 2 1 3 1 6 . . 4 1 2 . . 5 1 7 6 0 6 1 6 5 2 7 1 9 1 6 8 1 10 . . 9 1 5 8 2 10 1 12 . . end

So far, I have followed Prof. Cox's excellent suggestions of using rangerun (the suggestions work perfectly when the "based_on_this" and "count_this" are same columns).
Here is the code I am using

Code:

sort panel_var time_var program counting_prog count if count_this> based_on_this[_N] gen output = r(N) end rangerun counting_prog, by(panel_var) int(time_var . -1) use(based_on_this count_this)

Any help would be highly appreciated. Thanks in advance and stay safe.

regards
Swanand J. Deodhar
Comment

Announcement

[Panel] Preceding observations with values lower than that of the current observation

Comment

Comment

Comment

Comment