  • Occurrence of maximal values in panel data.

    Hi all,

    I am working with the following data set:

    clear
    input byte(id time)
    1 1
    1 2
    1 3
    1 4
    2 1
    2 2
    2 3
    3 1
    3 2
    3 3
    3 4
    4 1
    4 2
    5 1
    5 2
    5 3
    6 1
    6 2
    7 1
    7 2
    7 3
    8 1
    8 2
    8 3
    8 4
    9 1
    9 2
    9 3
    9 4
    10 1
    10 2
    10 3
    end

    I wish to get the maximum of the variable "time" for each "id". For example:
    - for id "1" the max value of time is 4.
    - for id "2" the max value of time is 3.
    - for id "3" the max value of time is 4.
    - for id "4" the max value of time is 2, etc.

    Then I would like to count, for each possible value of time, how many ids have that value as their maximum. In the example above:
    - The number of occurrences of maximum time value = 1 is 0.
    - The number of occurrences of maximum time value = 2 is 2 (occurs for id = 4 and id = 6).
    - The number of occurrences of maximum time value = 3 is 4 (occurs for id = 2, id = 5, id = 7 and id = 10).
    - The number of occurrences of maximum time value = 4 is 4 (occurs for id = 1, id = 3, id = 8 and id = 9).

    Do you know of a Stata script that could automate this process?

    Thank you so much for your help.

    Best regards,

    Al.

  • #2
    Perhaps this code, which I apply to your sample data, will point you in a useful direction.
    Code:
    . by id (time), sort: generate maxtime = time if _n==_N
    (22 missing values generated)
    
    . list if maxtime!=., clean
    
           id   time   maxtime  
      4.    1      4         4  
      7.    2      3         3  
     11.    3      4         4  
     13.    4      2         2  
     16.    5      3         3  
     18.    6      2         2  
     21.    7      3         3  
     25.    8      4         4  
     29.    9      4         4  
     32.   10      3         3  
    
    . tabulate maxtime
    
        maxtime |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              2 |          2       20.00       20.00
              3 |          4       40.00       60.00
              4 |          4       40.00      100.00
    ------------+-----------------------------------
          Total |         10      100.00
    Here's a second, similar approach.
    Code:
    . by id (time), sort: generate maxtime = _n==_N
    
    . list if maxtime, clean
    
           id   time   maxtime  
      4.    1      4         1  
      7.    2      3         1  
     11.    3      4         1  
     13.    4      2         1  
     16.    5      3         1  
     18.    6      2         1  
     21.    7      3         1  
     25.    8      4         1  
     29.    9      4         1  
     32.   10      3         1  
    
    . tabulate time if maxtime
    
           time |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              2 |          2       20.00       20.00
              3 |          4       40.00       60.00
              4 |          4       40.00      100.00
    ------------+-----------------------------------
          Total |         10      100.00
    Last edited by William Lisowski; 04 Jul 2017, 09:36.



    • #3
      Al:
      as far as your first question is concerned, you may want to try:
      Code:
      bysort id: egen A=max( time)
      Admittedly, I'm not clear on your second question; hence, the following is a tentative answer:
      Code:
      . bysort id A: gen B=A if _n==1
      (22 missing values generated)
      
      . tab B
      
                B |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                2 |          2       20.00       20.00
                3 |          4       40.00       60.00
                4 |          4       40.00      100.00
      ------------+-----------------------------------
            Total |         10      100.00
      PS: Crossed in cyberspace with William's much more efficient code.
      Kind regards,
      Carlo
      (Stata 19.0)



      • #4
        On the assumption that each combination of id and time occurs at most once, and that id and time never have any missing values:

        Code:
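        isid id time, sort // VERIFY NECESSARY ASSUMPTION
        * After sorting by time within id, the last observation holds each id's maximum
        by id (time): gen max_time_this_id = time[_N]
        
        * Flag the observation where each id attains its maximum (exactly one per id)
        gen max_time_is_here = (time == max_time_this_id)
        * For each time value, count the ids whose maximum equals that value
        by time, sort: egen occurrences_as_max_time = total(max_time_is_here)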
        Added: Crossed with both Carlo's and William's responses.
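
        If a compact table of counts is wanted instead of a per-observation variable, one possible sketch is to reduce the data to one row per id and then one row per maximum value. This is only an illustration, not part of the answers above: the variable names maxt, max_time and n_ids are made up here, and the first lines merely rebuild the sample data from post #1 (each id observed at time 1, 2, ..., up to its maximum).

        ```stata
        * Rebuild the example data from post #1: ids 1-10 with consecutive times
        clear
        set obs 10
        generate byte id = _n
        * Hypothetical helper: the last observed time for ids 1..10, in order
        generate byte maxt = real(word("4 3 4 2 3 2 3 4 4 3", _n))
        expand maxt
        bysort id: generate byte time = _n
        drop maxt

        * One row per id, holding that id's maximum time
        collapse (max) max_time = time, by(id)

        * One row per distinct maximum, with the number of ids attaining it
        contract max_time, freq(n_ids)
        list, clean noobs
        ```

        Both collapse and contract replace the data in memory, so wrap them in preserve and restore if the original observations are still needed. Values that are never a maximum (time = 1 in the example) simply do not appear in the result.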



        • #5
          This works perfectly. Thank you so much.
          Al.
