Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Return Value in Adjacent Column

    Good morning,

    I start with 3 columns on my datasheet.
    In column 4 the maximum value of column 2, for the previous 365 days is calculated using...

    rangestat (max) max_figure1_1year = figure1, int(ddate -365 -1)

    Column 5 is where I need help...

    I need column 5 to return the value in column 3 that is adjacent to the calculated figure in column 4

    For example column 4 returns 74 (the maximum value of column 2 in the previous 365 days, column 5 returns 70 (the adjacent value from column 3)

    If there are joint maximum values for the previous 365 days, as is the case in the second example, then the highest adjacent value is returned (here 84 was the maximum value observed twice, so the value returned in column 5 is 82, as it is the highest adjacent value.

    Hope that makes sense. Thanks.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float ddate byte(figure1 figure2) double max_figure1_1year float figure1_neighbour
    22200 46 68 74 70
    22158 74 70 84 82
    22149 70 72 86  .
    22109 54 77 86  .
    22070 25 78 86  .
    21872 72 80 86  .
    21854 28 81 86  .
    21821 84 82 86  .
    21805 84 77 86  .
    21789 86 78 84  .
    21778 81 78 86  .
    21762 82 78 86  .
    21740 84 76 86  .
    21712 66 78 86  .
    21675 83 79 86  .
    21479 76 80 86  .
    21448 84 81 86  .
    21423 86 76 90  .
    21414 71 79 90  .
    21404 80 80 90  .
    21376 86 80 90  .
    21336 77 82 90  .
    21332 85 83 90  .
    21304 83 81 96  .
    21295 54 82 96  .
    21268 86 82 96  .
    21110 65 84 96  .
    21089 79 86 96  .
    21077 90 87 96  .
    21000 88 88 96  .
    20992 89 88 96  .
    20958 96 83 91  .
    20940 72 84 91  .
    20748 85 84 91  .
    20732 91 83 87  .
    20664 63 85 87  .
    20595 70 87 87  .
    20578 78 87 87  .
    20556 87 84  .  .
    end
    format %td ddate

  • #2
    I was hoping to find a more elegant solution, but the following code does what you ask:
    Code:
    gen wanted = .
    forvalues i = 1/`c(N)' {
        summ figure2 if figure1 == max_figure1_1year[`i'] ///
            & inrange(ddate[`i']-ddate, 1, 365), meanonly
        if `r(N)' > 0 {
            replace wanted = `r(max)' in `i'
        }
    }

    Comment


    • #3
      Thanks Clyde,

      Running that code returns an 'invalid syntax' error. The generated 'wanted column' remains empty.

      Tried playing around with it myself, to no avail.

      . gen wanted = .
      (39 missing values generated)

      .
      . forvalues i = 1/`c(N)' {
      2.
      . summ figure2 if figure1 == max_figure1_1year[`i'] ///
      3.
      . & inrange(ddate[`i']-ddate, 1, 365), meanonly
      4.
      . if `r(N)' > 0 {
      5.
      . replace wanted = `r(max)' in `i'
      6.
      . }
      7.
      . }
      invalid syntax
      r(198);

      Comment


      • #4
        Hans, it should work if you run the code in a do-file, rather than interactively from the command window.

        Alternatively, if you remove the triple slash /// and join the line starting with & inrange... to the previous line, then it'll work from the command window too.
        Last edited by Hemanshu Kumar; 13 Sep 2022, 10:01.

        Comment


        • #5
          Thanks Hemanshu, it works fine from the do-file. Thank you too Clyde. Much appreciated!

          Comment


          • #6
            One more question on this...

            Is it possible to run that same code, but do it by a group variable?
            Thanks

            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input float(group ddate) byte(figure1 figure2) double max_figure1_1year
            1 22200 46 68 74
            1 22158 74 70 84
            1 22149 70 72 86
            1 22109 54 77 86
            1 22070 25 78 86
            1 21872 72 80 86
            1 21854 28 81 86
            1 21821 84 82 86
            1 21805 84 77 86
            1 21789 86 78 84
            1 21778 81 78 86
            1 21762 82 78 86
            1 21740 84 76 86
            1 21712 66 78 86
            1 21675 83 79 86
            1 21479 76 80 86
            1 21448 84 81 86
            1 21423 86 76 90
            2 21414 71 79 90
            2 21404 80 80 90
            2 21376 86 80 90
            2 21336 77 82 90
            2 21332 85 83 90
            2 21304 83 81 96
            2 21295 54 82 96
            2 21268 86 82 96
            2 21110 65 84 96
            2 21089 79 86 96
            2 21077 90 87 96
            2 21000 88 88 96
            2 20992 89 88 96
            2 20958 96 83 91
            2 20940 72 84 91
            2 20748 85 84 91
            2 20732 91 83 87
            2 20664 63 85 87
            2 20595 70 87 87
            2 20578 78 87 87
            2 20556 87 84  .
            end
            format %td ddate

            Comment


            • #7
              Yes, it's a very minor modification:
              Code:
              gen wanted = .
              forvalues i = 1/`c(N)' {
                  summ figure2 if figure1 == max_figure1_1year[`i'] ///
                      & inrange(ddate[`i']-ddate, 1, 365) & group == group[`i'], meanonly
                  if `r(N)' > 0 {
                      replace wanted = `r(max)' in `i'
                  }
              }

              Comment


              • #8
                Thanks very much, Clyde.

                Comment


                • #9
                  One final observation on this - as grateful as I am for the solutions here, the actual dataset is just over a million rows with many, many groups. It will take a very long time for the do-file to cycle through them all. Would need to be running this daily, so that processing time is not really feasible. Thanks again.

                  Comment


                  • #10
                    Well, if, as you say, the number of groups is large, you might get a noticeable speed up by doing it this way instead:
                    Code:
                    capture program drop one_group
                    program define one_group
                        gen wanted = .
                        forvalues i = 1/`c(N)' {
                            summ figure2 if figure1 == max_figure1_1year[`i'] ///
                                & inrange(ddate[`i']-ddate, 1, 365), meanonly
                            if `r(N)' > 0 {
                                replace wanted = `r(max)' in `i'
                            }
                        }
                        exit
                    end
                    
                    runby one_group, by(group) status
                    -runby- is written by Robert Picard and me, and is available from SSC.

                    You might notice that the code inside program one_group is actually the same code as the "group-less" code offered in #2. Also, -runby-, with the -status- option, gives you a progress report periodically, along with an estimate of time remaining to completion.

                    Comment


                    • #11
                      Thanks Clyde. I will check it out in the morning.

                      Comment


                      • #12
                        That's processing just over a million rows in less than 4 minutes now, Clyde. Thank you very much.

                        Comment

                        Working...
                        X