Return Value in Adjacent Column

Hans Schubert

Join Date: Jul 2021

Posts: 109
#1

Return Value in Adjacent Column

13 Sep 2022, 02:56

Good morning,

I start with 3 columns on my datasheet.
In column 4 the maximum value of column 2, for the previous 365 days is calculated using...

rangestat (max) max_figure1_1year = figure1, int(ddate -365 -1)

Column 5 is where I need help...

I need column 5 to return the value in column 3 that is adjacent to the calculated figure in column 4

For example column 4 returns 74 (the maximum value of column 2 in the previous 365 days, column 5 returns 70 (the adjacent value from column 3)

If there are joint maximum values for the previous 365 days, as is the case in the second example, then the highest adjacent value is returned (here 84 was the maximum value observed twice, so the value returned in column 5 is 82, as it is the highest adjacent value.

Hope that makes sense. Thanks.

Code:

* Example generated by -dataex-. For more info, type help dataex clear input float ddate byte(figure1 figure2) double max_figure1_1year float figure1_neighbour 22200 46 68 74 70 22158 74 70 84 82 22149 70 72 86 . 22109 54 77 86 . 22070 25 78 86 . 21872 72 80 86 . 21854 28 81 86 . 21821 84 82 86 . 21805 84 77 86 . 21789 86 78 84 . 21778 81 78 86 . 21762 82 78 86 . 21740 84 76 86 . 21712 66 78 86 . 21675 83 79 86 . 21479 76 80 86 . 21448 84 81 86 . 21423 86 76 90 . 21414 71 79 90 . 21404 80 80 90 . 21376 86 80 90 . 21336 77 82 90 . 21332 85 83 90 . 21304 83 81 96 . 21295 54 82 96 . 21268 86 82 96 . 21110 65 84 96 . 21089 79 86 96 . 21077 90 87 96 . 21000 88 88 96 . 20992 89 88 96 . 20958 96 83 91 . 20940 72 84 91 . 20748 85 84 91 . 20732 91 83 87 . 20664 63 85 87 . 20595 70 87 87 . 20578 78 87 87 . 20556 87 84 . . end format %td ddate
Tags: None

Clyde Schechter

Join Date: Apr 2014
Posts: 30355

13 Sep 2022, 08:59

I was hoping to find a more elegant solution, but the following code does what you ask:

Code:

gen wanted = .
forvalues i = 1/`c(N)' {
    summ figure2 if figure1 == max_figure1_1year[`i'] ///
        & inrange(ddate[`i']-ddate, 1, 365), meanonly
    if `r(N)' > 0 {
        replace wanted = `r(max)' in `i'
    }
}

Comment

Hans Schubert

Join Date: Jul 2021

Posts: 109
#3

13 Sep 2022, 09:53

Thanks Clyde,

Running that code returns an 'invalid syntax' error. The generated 'wanted column' remains empty.

Tried playing around with it myself, to no avail.

. gen wanted = .
(39 missing values generated)

.
. forvalues i = 1/`c(N)' {
2.
. summ figure2 if figure1 == max_figure1_1year[`i'] ///
3.
. & inrange(ddate[`i']-ddate, 1, 365), meanonly
4.
. if `r(N)' > 0 {
5.
. replace wanted = `r(max)' in `i'
6.
. }
7.
. }
invalid syntax
r(198);
Comment
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1548
#4

13 Sep 2022, 09:59

Hans, it should work if you run the code in a do-file, rather than interactively from the command window.

Alternatively, if you remove the triple slash /// and join the line starting with & inrange... to the previous line, then it'll work from the command window too.

Last edited by Hemanshu Kumar; 13 Sep 2022, 10:01.
Comment
Hans Schubert

Join Date: Jul 2021

Posts: 109
#5

13 Sep 2022, 10:20

Thanks Hemanshu, it works fine from the do-file. Thank you too Clyde. Much appreciated!
Comment

Hans Schubert

Join Date: Jul 2021
Posts: 109

13 Sep 2022, 12:40

One more question on this...

Is it possible to run that same code, but do it by a group variable?
Thanks

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(group ddate) byte(figure1 figure2) double max_figure1_1year
1 22200 46 68 74
1 22158 74 70 84
1 22149 70 72 86
1 22109 54 77 86
1 22070 25 78 86
1 21872 72 80 86
1 21854 28 81 86
1 21821 84 82 86
1 21805 84 77 86
1 21789 86 78 84
1 21778 81 78 86
1 21762 82 78 86
1 21740 84 76 86
1 21712 66 78 86
1 21675 83 79 86
1 21479 76 80 86
1 21448 84 81 86
1 21423 86 76 90
2 21414 71 79 90
2 21404 80 80 90
2 21376 86 80 90
2 21336 77 82 90
2 21332 85 83 90
2 21304 83 81 96
2 21295 54 82 96
2 21268 86 82 96
2 21110 65 84 96
2 21089 79 86 96
2 21077 90 87 96
2 21000 88 88 96
2 20992 89 88 96
2 20958 96 83 91
2 20940 72 84 91
2 20748 85 84 91
2 20732 91 83 87
2 20664 63 85 87
2 20595 70 87 87
2 20578 78 87 87
2 20556 87 84  .
end
format %td ddate

Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30355

13 Sep 2022, 12:53

Yes, it's a very minor modification:

Code:

gen wanted = .
forvalues i = 1/`c(N)' {
    summ figure2 if figure1 == max_figure1_1year[`i'] ///
        & inrange(ddate[`i']-ddate, 1, 365) & group == group[`i'], meanonly
    if `r(N)' > 0 {
        replace wanted = `r(max)' in `i'
    }
}

Comment

Hans Schubert

Join Date: Jul 2021

Posts: 109
#8

13 Sep 2022, 15:50

Thanks very much, Clyde.
Comment
Hans Schubert

Join Date: Jul 2021

Posts: 109
#9

13 Sep 2022, 17:15

One final observation on this - as grateful as I am for the solutions here, the actual dataset is just over a million rows with many, many groups. It will take a very long time for the do-file to cycle through them all. Would need to be running this daily, so that processing time is not really feasible. Thanks again.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#10

13 Sep 2022, 17:29

Well, if, as you say, the number of groups is large, you might get a noticeable speed up by doing it this way instead:

Code:

capture program drop one_group program define one_group gen wanted = . forvalues i = 1/`c(N)' { summ figure2 if figure1 == max_figure1_1year[`i'] /// & inrange(ddate[`i']-ddate, 1, 365), meanonly if `r(N)' > 0 { replace wanted = `r(max)' in `i' } } exit end runby one_group, by(group) status

-runby- is written by Robert Picard and me, and is available from SSC.

You might notice that the code inside program one_group is actually the same code as the "group-less" code offered in #2. Also, -runby-, with the -status- option, gives you a progress report periodically, along with an estimate of time remaining to completion.
Comment
Hans Schubert

Join Date: Jul 2021

Posts: 109
#11

13 Sep 2022, 17:43

Thanks Clyde. I will check it out in the morning.
Comment
Hans Schubert

Join Date: Jul 2021

Posts: 109
#12

14 Sep 2022, 01:51

That's processing just over a million rows in less than 4 minutes now, Clyde. Thank you very much.
Comment

Announcement

Return Value in Adjacent Column

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment