  • count number of unique values over a rolling time window defined by a date variable

    Dear Stata community,

    I'd like to generate a variable that records the number of unique values of a variable (var1) within each value of another variable (var2) over a rolling time window defined by a date variable. I have not been able to come up with code that produces the result I'm looking for. Could anyone offer some thoughts? Any help would be greatly appreciated.
    Last edited by Yi Chen; 22 Apr 2021, 21:54.

  • #2
    Use -dataex- to show a sample of your data, and explain what you want with reference to this data.



    • #3
      Here is one way to approach it. Extending this to groups defined by another variable (e.g. panel or longitudinal data) is a matter of adding a by() option. rangerun is from SSC. The term "distinct" is strongly recommended over "unique" in https://www.stata-journal.com/articl...article=dm0042


      Code:
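      clear
      set obs 50
      set seed 2803
      gen y = runiformint(1,5)
      gen t = _n

      * rangerun calls this once per observation, on the data in that observation's window
      program mydistinct
           sort y
           * after sorting, each change in y marks a new distinct value
           gen count = sum(y != y[_n-1])
           * copy the final total to every row so the result is a constant
           replace count = count[_N]
      end

      * window for each observation: t-5 through t-1, i.e. the 5 preceding values
      rangerun mydistinct, int(t -5 -1)

      list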
      
           +----------------+
           | y    t   count |
           |----------------|
        1. | 3    1       . |
        2. | 1    2       1 |
        3. | 3    3       2 |
        4. | 1    4       2 |
        5. | 2    5       2 |
           |----------------|
        6. | 4    6       3 |
        7. | 4    7       4 |
        8. | 1    8       4 |
        9. | 1    9       3 |
       10. | 1   10       3 |
           |----------------|
       11. | 3   11       2 |
       12. | 1   12       3 |
       13. | 1   13       2 |
       14. | 5   14       2 |
       15. | 2   15       3 |
           |----------------|
       16. | 2   16       4 |
       17. | 4   17       3 |
       18. | 4   18       4 |
       19. | 3   19       3 |
       20. | 4   20       3 |
           |----------------|
       21. | 4   21       3 |
       22. | 2   22       2 |
       23. | 3   23       3 |
       24. | 1   24       3 |
       25. | 4   25       4 |
           |----------------|
       26. | 1   26       4 |
       27. | 3   27       4 |
       28. | 1   28       3 |
       29. | 2   29       3 |
       30. | 2   30       4 |
           |----------------|
       31. | 2   31       3 |
       32. | 3   32       3 |
       33. | 2   33       3 |
       34. | 2   34       2 |
       35. | 2   35       2 |
           |----------------|
       36. | 3   36       2 |
       37. | 3   37       2 |
       38. | 4   38       2 |
       39. | 5   39       3 |
       40. | 4   40       4 |
           |----------------|
       41. | 1   41       3 |
       42. | 4   42       4 |
       43. | 4   43       3 |
       44. | 5   44       3 |
       45. | 4   45       3 |
           |----------------|
       46. | 1   46       3 |
       47. | 3   47       3 |
       48. | 2   48       4 |
       49. | 3   49       5 |
       50. | 3   50       4 |
           +----------------+
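
      As a rough sketch of the by() extension mentioned above, the same program can be run within groups. The identifier id and the three panels of 20 observations are made up for illustration and are not from the original question; by() is the rangerun option referred to above, and the capture program drop line is only there so the block can be rerun.

```stata
* Sketch only: id is a hypothetical panel identifier, not from the thread
clear
set obs 60
set seed 2803
* three panels of 20 observations each
gen id = ceil(_n/20)
* time runs 1..20 within each panel
bysort id: gen t = _n
gen y = runiformint(1,5)

capture program drop mydistinct
program mydistinct
    sort y
    gen count = sum(y != y[_n-1])
    replace count = count[_N]
end

* by(id) restricts each window to observations from the same panel
rangerun mydistinct, interval(t -5 -1) by(id)

sort id t
list id t y count if t <= 3, sepby(id)
```

      The first observation in each panel has nothing before it, so its window is empty and count should be missing there, just as for t = 1 in the single-series example.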



      • #4
        Thanks so much, Nick, for the elegant solution. I copied your code into Stata and it works perfectly. But when I adapted the code to my data as follows, Stata did not generate the count variable. I have been debugging but cannot find the reason. Do you have any insight into what the problem might be in the following code?

        program mydistinct
        sort prf_npi npi_surgeon
        g count = sum (prf_npi != prf_npi[_n-1] | npi_surgeon != npi_surgeon[_n-1])
        replace count = count[_N]
        end
        rangerun mydistinct, int(admsndt -90 -1) by(prvdrnum P2A)
        Last edited by Yi Chen; 23 Apr 2021, 08:47.



        • #5
          For unique read distinct!

          rangerun yields a set of scalars for each separate observation, given the window associated with that observation; in the example, that window is the 5 preceding values.

          You want, it seems, to do something different: return a set of values, one for each observation in the window. rangerun does not purport to do that, and in any case it makes no sense to me: the window moves, and a value could be tagged for some but not necessarily all of the windows in which it occurs.

          For example, consider any particular observation. As the window moves it enters and then leaves the window, so that it is or is not a candidate for tagging. Then within the window it will be tagged as distinct depending on whether it is the first of its kind to be observed within the window. So, there is no meaning that I can see to what will be a transient result.

          Very likely what you want to do is something different, namely to do more than count distinct values, but that must be done in a way that results in a scalar from the program called by rangerun.
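
          To make the point about scalars concrete, here is a minimal sketch of a program that leaves more than one result per observation. The names mysummary, ndistinct and nobs are made up for illustration; the data setup repeats the toy example from #3. Each new variable is constant within the window when the program ends, so each contributes a single value for the observation whose window is being processed.

```stata
* Sketch only: mysummary, ndistinct and nobs are illustrative names
clear
set obs 50
set seed 2803
gen y = runiformint(1,5)
gen t = _n

capture program drop mysummary
program mysummary
    sort y
    * number of distinct values of y in the window, copied to every row
    gen ndistinct = sum(y != y[_n-1])
    replace ndistinct = ndistinct[_N]
    * number of observations in the window, also a constant
    gen nobs = _N
end

rangerun mysummary, interval(t -5 -1)

list in 1/8
```

          A variable that differs from row to row within the window, such as a tag for the first occurrence of each value, has no single value to hand back, which is the difficulty described above.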



          • #6
            Got it, thanks a lot Nick.
