Generating a maximum value of variable following non-missing consecutive observations

Alberto Alvarez

Join Date: Jul 2016
Posts: 195

Generating a maximum value of variable following non-missing consecutive observations

08 Dec 2023, 15:17

Dear Stata Users,

I have a dataset in STATA as below:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str3 fic int year byte(age peak)
"AA" 2011 . 1
"AA" 2012 1 0
"AA" 2013 2 0
"AA" 2014 . 0
"AA" 2015 . 0
"AA" 2016 . 0
"AA" 2017 . 1
"AA" 2018 1 0
"AA" 2019 2 0
"AA" 2020 3 0
"BB" 1988 4 0
"BB" 1989 . 1
"BB" 1990 1 0
"BB" 1991 2 0
"BB" 1992 3 0
"BB" 1993 0 0
"BB" 1994 5 0
end

I need to construct the dataset that will look like the one below, where “max_age” indicates the maximum number variable “age” with non-missing consecutive data that starts from t+1 after peak==1:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str3 fic int year byte(age peak) float max_age
"AA" 2011 . 1 2
"AA" 2012 1 0 .
"AA" 2013 2 0 .
"AA" 2014 . 0 .
"AA" 2015 . 0 .
"AA" 2016 . 0 .
"AA" 2017 . 1 4
"AA" 2018 1 0 .
"AA" 2019 2 0 .
"AA" 2020 3 0 .
"BB" 1988 4 0 .
"BB" 1989 . 1 3
"BB" 1990 1 0 .
"BB" 1991 2 0 .
"BB" 1992 3 0 .
"BB" 1993 0 0 .
"BB" 1994 5 0 .
end

Thank you for your help.

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#2

08 Dec 2023, 15:59

Are you sure that your example solution is correct? In particular look at observation 7. Are you certain that the value of max_age there should be 4? I question it because that value 4 is an age that is associated with fic "BB" and everything else about the data suggests that you want to do this calculation within groups of observations having the same fic, in which case the correct answer there would be 3. If I'm right about this and you made a mistake, then the code you want is:

Code:

by fic (year), sort: gen peak_group = sum(peak) by fic peak_group (year), sort: gen consecutive_subgroup = sum(age != age[_n-1]+1) by fic peak_group (year): egen wanted = max(cond(consecutive_subgroup == 1, age, .)) by fic peak_group (year): replace wanted = . if peak != 1

Now, if your example solution really is correct, the implication is that fic is just a red herring in the data, except perhaps for determining the order of observations. In that case:

Code:

gen `c(obs_t)' obs_no = _n // PRESERVE EXISTING SORT ORDER gen peak_group = sum(peak) by peak_group (obs_no), sort: gen consecutive_subgroup = sum(age != age[_n-1]+1) by peak_group (obs_no): egen wanted = max(cond(consecutive_subgroup == 1, age, .)) by peak_group (obs_no): replace wanted = . if peak != 1
Comment
Alberto Alvarez

Join Date: Jul 2016

Posts: 195
#3

09 Dec 2023, 11:10

Dear Clyde,

Thank you for your help! You are right, observation 7 should be ".", so the first solution that you provided is the one that I need! Thank you very much!
Comment

Announcement

Generating a maximum value of variable following non-missing consecutive observations

Comment

Comment