count number of unique values over a rolling time window defined by a date variable

Yi Chen

Join Date: Dec 2020

Posts: 12
#1

22 Apr 2021, 21:48

Dear Stata community,

I'd like to generate a variable that records the number of unique values of a variable (var1) within each value of another variable (var2) over a rolling time window defined by a date variable. I cannot come up with code that produces the results I'm looking for. Could anyone offer some thoughts? Any help would be greatly appreciated.

Last edited by Yi Chen; 22 Apr 2021, 21:54.
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#2

23 Apr 2021, 00:13

Use -dataex- to show a sample of your data, and explain what you want with reference to this data.

Nick Cox

Join Date: Mar 2014

Posts: 35699
#3

23 Apr 2021, 02:16

Here is one way to think of it. The extension to another variable (e.g. panel or longitudinal data) is a matter of using a by() option too. rangerun is from SSC. The term distinct is recommended strongly over unique in https://www.stata-journal.com/articl...article=dm0042

Code:

clear 
set obs 50 
set seed 2803 
gen y = runiformint(1,5)
gen t = _n 

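* count the distinct values of y among the observations in the current window
capture program drop mydistinct
program mydistinct 
     sort y 
     * running total of first occurrences; its last value is the number of distinct values
     gen count = sum(y != y[_n-1])
     replace count = count[_N]
end 

* window for each observation: t - 5 to t - 1, i.e. the 5 preceding values
rangerun mydistinct, int(t -5 -1) 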
     
list 

     +----------------+
     | y    t   count |
     |----------------|
  1. | 3    1       . |
  2. | 1    2       1 |
  3. | 3    3       2 |
  4. | 1    4       2 |
  5. | 2    5       2 |
     |----------------|
  6. | 4    6       3 |
  7. | 4    7       4 |
  8. | 1    8       4 |
  9. | 1    9       3 |
 10. | 1   10       3 |
     |----------------|
 11. | 3   11       2 |
 12. | 1   12       3 |
 13. | 1   13       2 |
 14. | 5   14       2 |
 15. | 2   15       3 |
     |----------------|
 16. | 2   16       4 |
 17. | 4   17       3 |
 18. | 4   18       4 |
 19. | 3   19       3 |
 20. | 4   20       3 |
     |----------------|
 21. | 4   21       3 |
 22. | 2   22       2 |
 23. | 3   23       3 |
 24. | 1   24       3 |
 25. | 4   25       4 |
     |----------------|
 26. | 1   26       4 |
 27. | 3   27       4 |
 28. | 1   28       3 |
 29. | 2   29       3 |
 30. | 2   30       4 |
     |----------------|
 31. | 2   31       3 |
 32. | 3   32       3 |
 33. | 2   33       3 |
 34. | 2   34       2 |
 35. | 2   35       2 |
     |----------------|
 36. | 3   36       2 |
 37. | 3   37       2 |
 38. | 4   38       2 |
 39. | 5   39       3 |
 40. | 4   40       4 |
     |----------------|
 41. | 1   41       3 |
 42. | 4   42       4 |
 43. | 4   43       3 |
 44. | 5   44       3 |
 45. | 4   45       3 |
     |----------------|
 46. | 1   46       3 |
 47. | 3   47       3 |
 48. | 2   48       4 |
 49. | 3   49       5 |
 50. | 3   50       4 |
     +----------------+
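The by() extension mentioned above can be sketched as follows. This is a minimal illustration on made-up data, not code from the thread: the variable names id, t, y and the two-period window are assumptions chosen for the example, and rangerun (which, as far as I know, also needs rangestat) must be installed from SSC first.

```stata
* Assumes: ssc install rangerun  (and, I believe, ssc install rangestat)
clear
input id t y
1 1 3
1 2 1
1 3 3
1 4 2
2 1 5
2 2 5
2 3 4
2 4 5
end

* same program as above: number of distinct values of y in the window
capture program drop mydistinct
program mydistinct
    sort y
    gen count = sum(y != y[_n-1])
    replace count = count[_N]
end

* window: the 2 preceding periods, evaluated separately within each id
rangerun mydistinct, interval(t -2 -1) by(id)

sort id t
list, sepby(id)
```

With by(id), observations from other panels never enter a window, so the first observation of each panel has an empty window and count stays missing there.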


Yi Chen

Join Date: Dec 2020

Posts: 12
#4

23 Apr 2021, 08:30

Thanks so much, Nick, for the elegant solution. I copied your code into Stata and it works perfectly. But when I adapt the code to my data as follows, Stata does not generate the count variable. I have been debugging but cannot find the reason. Do you have any insight into what the problem might be in the following code?

program mydistinct
sort prf_npi npi_surgeon
g count = sum (prf_npi != prf_npi[_n-1] | npi_surgeon != npi_surgeon[_n-1])
replace count = count[_N]
end
rangerun mydistinct, int(admsndt -90 -1) by(prvdrnum P2A)

Last edited by Yi Chen; 23 Apr 2021, 08:47.
Nick Cox

Join Date: Mar 2014

Posts: 35699
#5

23 Apr 2021, 08:52

For unique read distinct!

rangerun yields a set of scalars for each separate observation, given the window associated with that observation, in the example the 5 preceding values.
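To make "one result per observation, given its window" concrete, here is a brute-force cross-check that uses official Stata commands only. It is a sketch under assumptions: the data are the first seven observations of the example in #3, typed in by hand, and the variable name check is made up for the illustration. It would be slow on large datasets; it is only meant to show what is being computed.

```stata
clear
input t y
1 3
2 1
3 3
4 1
5 2
6 4
7 4
end

gen check = .
forvalues i = 1/`=_N' {
    * distinct values of y among observations with t in [t[i] - 5, t[i] - 1]
    quietly levelsof y if inrange(t, t[`i'] - 5, t[`i'] - 1), local(levels)
    local n : word count `levels'
    * leave the result missing when the window is empty
    if `n' > 0 {
        quietly replace check = `n' in `i'
    }
}

list
```

Each pass of the loop looks at one observation, finds its window, and stores a single number for that observation, which is the same shape of result that rangerun carries back.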

You want, it seems, to do something different: return a set of values, one for each observation in the window. rangerun does not purport to do that, and in any case it makes no sense to me: the window moves, and a value could be tagged for some but not necessarily all of the windows in which it occurs.

For example, consider any particular observation. As the window moves, the observation enters and then leaves it, so that it is or is not a candidate for tagging. Then, within the window, it will be tagged as distinct depending on whether it is the first of its kind to be observed within the window. So, there is no meaning that I can see to what will be a transient result.

Very likely what you want to do is something different, namely to do more than count distinct values, but that must be done in a way that results in a scalar from the program called by rangerun.
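As one illustration of keeping the result a single value per window, two variables that jointly define what is distinct can be combined into one identifier before calling rangerun. This is only a sketch on made-up data, not a diagnosis of the code in #4: the names g, t, a, b, pair, n_pairs and the program name npairs are all hypothetical, and rangerun from SSC is assumed to be installed.

```stata
clear
input g t a b
1 1 10 1
1 2 10 1
1 3 10 2
1 4 11 1
1 5 10 1
end

* one integer code per distinct (a, b) combination
egen pair = group(a b)

capture program drop npairs
program npairs
    sort pair
    * the last value of the running total is the number of distinct pairs
    gen n_pairs = sum(pair != pair[_n-1])
    replace n_pairs = n_pairs[_N]
end

* window: the 3 preceding values of t, within each g
rangerun npairs, interval(t -3 -1) by(g)

sort g t
list
```

The program still leaves one number behind for each window, here the count of distinct pairs, so it fits what rangerun expects.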
Yi Chen

Join Date: Dec 2020

Posts: 12
#6

23 Apr 2021, 09:25

Got it, thanks a lot Nick.