  • Maximum value across rows

    Hello all,

    I have the data below:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long ID double(time miles cars stops1 stops2 stops3)
    1 162 5 40 27 6
    1 163 7 42 32
    1 164 7 43 41
    1 165 2 47 48
    2 162 10 71 39 7 4
    2 163 11 73 42
    2 164 9 78 58
    2 165 6 82 61
    end
    format %tq time
    I want to find the maximum value across the stops variables in each row. How can I do this if each ID has a different number of stops? This is just sample data; the real data has many more stops variables.

    I would appreciate any help.

    A
    Last edited by Anoush Khachatryan; 09 Sep 2023, 16:26.

  • #2
    Code:
    egen wanted = rowmax(stops*)
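
    A minimal sketch of why this handles IDs with different numbers of stops: -egen-'s rowmax() ignores missing values within each row, and returns missing only when every listed variable is missing. The toy data below are hypothetical and not taken from the thread.

    ```stata
    * Hypothetical toy data: each ID has a different number of non-missing stops
    clear
    input byte ID stops1 stops2 stops3
    1 27 6 .
    2 39 7 4
    3  . . .
    end

    * rowmax() skips missing values; an all-missing row yields missing
    egen wanted = rowmax(stops*)
    list, noobs
    ```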


    • #3
      Clyde Schechter, thank you! This code worked very well.

      I have another question. Using the same data, how would I calculate the largest value by row for only stops 4, 5, and 6?

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long ID double(time miles cars stops1 stops2 stops3 stops4 stops5 stops6)
      1 162 5 40 27 6  6 5 3
      1 163 7 42 32
      1 164 7 43 41
      1 165 2 47 48
      2 162 10 71 39 7 4 9 10 22
      2 163 11 73 42
      2 164 9 78 58
      2 165 6 82 61
      end
      format %tq time
      I would appreciate any assistance.

      A


      • #4
        Code:
        egen wanted2 = rowmax(stops4 stops5 stops6)
        Added:
        Or, on the assumption that, as in your example, these variables appear as a consecutive block in your data set:
        Code:
        egen wanted2 = rowmax(stops4-stops6)
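
        To illustrate why the consecutive-block assumption matters, here is a small sketch with hypothetical data in which another variable (cars) happens to sit between stops4 and stops6 in the data set. The names by_range and by_list are made up for the illustration.

        ```stata
        * Hypothetical layout: cars sits inside the stops4-stops6 block
        clear
        input stops4 cars stops5 stops6
        9 100 10 22
        end

        * The hyphenated range follows data set order, so it picks up cars too
        egen by_range = rowmax(stops4-stops6)
        * The explicit list uses only the three named variables
        egen by_list  = rowmax(stops4 stops5 stops6)
        list, noobs
        ```

        If in doubt, -describe- shows the current variable order before you rely on a hyphenated range.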


        • #5
          Clyde Schechter, thank you for your response!

          I apologize, I should have explained a bit better. Since this is only a sample of my data, there are many ID values with many different stops. Because of the size of my data, I want to find the row maximum of the stops variables after excluding the first X stops, where X comes from an equation.

          For example, my equation gives X=10-7 for ID==1 and X=10-5 for ID==2. Therefore, I want to exclude the first 3 stops for ID==1 and the first 5 stops for ID==2. Is there some way I can automate this for the many ID values I have, each with a different number of stops?

          Anoush
          Last edited by Anoush Khachatryan; 09 Sep 2023, 19:31.


          • #6
            In principle, yes. But it cannot be done with the example data you show, because while you may have in your head some information about which stops to include for which id, there is nothing in the example data that provides this information. Please post back with a new data example (using -dataex-, of course) that contains the additional variable(s) needed to see which stops to count for each id.


            • #7
              Sorry about that. Here is an example of the data:

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input long ID double(time miles cars X stops1 stops2 stops3 stops4 stops5 stops6)
              1 162 5 40 3 27 6 6 5 3
              1 163 7 42 3
              1 164 7 43 3
              1 165 2 47 3
              2 162 10 71 5 39 7 4 9 10 22
              2 163 11 73 5
              2 164 9 78 5
              2 165 6 82 5
              3 162 5 7 1 4 2 10 9 
              3 163 6 7 1
              3 164 8 3 1
              3 165 6 2 1
              end
              format %tq time
              For ID==1, X==3 so I would exclude the first three stops (27, 6, and 6) and only find the row max between 5 and 3. For ID==2, X==5 so I would exclude the first five stops, etc.

              Anoush


              • #8
                Code:
                * Example generated by -dataex-. For more info, type help dataex
                clear
                input byte ID int time byte(miles cars X stops1 stops2 stops3 stops4 stops5 stops6)
                1 162  5 40 3 27 6  6 5  3  .
                1 163  7 42 3  . .  . .  .  .
                1 164  7 43 3  . .  . .  .  .
                1 165  2 47 3  . .  . .  .  .
                2 162 10 71 5 39 7  4 9 10 22
                2 163 11 73 5  . .  . .  .  .
                2 164  9 78 5  . .  . .  .  .
                2 165  6 82 5  . .  . .  .  .
                3 162  5  7 1  4 2 10 9  .  .
                3 163  6  7 1  . .  . .  .  .
                3 164  8  3 1  . .  . .  .  .
                3 165  6  2 1  . .  . .  .  .
                end
                
                isid ID time, sort
                reshape long stops, i(ID time)
                by ID time (_j), sort: egen wanted = max(cond(_j > X, stops, .))
                reshape wide
                The example data you show is not genuine -dataex- output. It does not run when used. -dataex- does not elide missing values; it explicitly represents them as . or "". In the future, use genuine -dataex- output to show example data: do not try to mock up your own and dress it up as -dataex-. In this instance, I found another way to import your data and then ran -dataex-: you can see how it looks in the code above. But it is not reasonable to expect that this sort of extra work will be undertaken in the general case.

                Added: When the value of X leads to the exclusion of all non-missing values of the stops variables, this code produces a missing value for wanted, because the max() function of -egen- ignores missing values and returns missing only when there is nothing left to compare.
                Last edited by Clyde Schechter; 10 Sep 2023, 12:56.
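
                A possible alternative that avoids the -reshape- round trip, sketched under the assumption that the variables are named stops1 through stops6 as in the example: loop over the stops variables and keep a running maximum of those whose index exceeds X. The name wanted_alt is hypothetical, and the data are a trimmed copy of the first row for each ID above.

                ```stata
                * Trimmed copy of the example: one row per ID, X = number of leading stops to skip
                clear
                input byte(ID X stops1 stops2 stops3 stops4 stops5 stops6)
                1 3 27 6  6 5  3  .
                2 5 39 7  4 9 10 22
                3 1  4 2 10 9  .  .
                end

                * Running row maximum over stops variables with index greater than X.
                * The max() function ignores missing arguments, so wanted_alt stays
                * missing only if every eligible stop is missing.
                gen wanted_alt = .
                forvalues j = 1/6 {
                    replace wanted_alt = max(wanted_alt, stops`j') if `j' > X
                }
                list ID X wanted_alt, noobs
                ```

                With many stops variables, the upper bound of the loop (6 here) would need to match the highest stops index in the real data.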


                • #9
                  Clyde Schechter, thank you very much!! It works perfectly!
