Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generating a maximum value of variable following non-missing consecutive observations

    Dear Stata Users,

    I have a dataset in STATA as below:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str3 fic int year byte(age peak)
    "AA" 2011 . 1
    "AA" 2012 1 0
    "AA" 2013 2 0
    "AA" 2014 . 0
    "AA" 2015 . 0
    "AA" 2016 . 0
    "AA" 2017 . 1
    "AA" 2018 1 0
    "AA" 2019 2 0
    "AA" 2020 3 0
    "BB" 1988 4 0
    "BB" 1989 . 1
    "BB" 1990 1 0
    "BB" 1991 2 0
    "BB" 1992 3 0
    "BB" 1993 0 0
    "BB" 1994 5 0
    end

    I need to construct the dataset that will look like the one below, where “max_age” indicates the maximum number variable “age” with non-missing consecutive data that starts from t+1 after peak==1:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str3 fic int year byte(age peak) float max_age
    "AA" 2011 . 1 2
    "AA" 2012 1 0 .
    "AA" 2013 2 0 .
    "AA" 2014 . 0 .
    "AA" 2015 . 0 .
    "AA" 2016 . 0 .
    "AA" 2017 . 1 4
    "AA" 2018 1 0 .
    "AA" 2019 2 0 .
    "AA" 2020 3 0 .
    "BB" 1988 4 0 .
    "BB" 1989 . 1 3
    "BB" 1990 1 0 .
    "BB" 1991 2 0 .
    "BB" 1992 3 0 .
    "BB" 1993 0 0 .
    "BB" 1994 5 0 .
    end
    Thank you for your help.


  • #2
    Are you sure that your example solution is correct? In particular look at observation 7. Are you certain that the value of max_age there should be 4? I question it because that value 4 is an age that is associated with fic "BB" and everything else about the data suggests that you want to do this calculation within groups of observations having the same fic, in which case the correct answer there would be 3. If I'm right about this and you made a mistake, then the code you want is:
    Code:
    by fic (year), sort: gen peak_group = sum(peak)
    by fic peak_group (year), sort: gen consecutive_subgroup = sum(age != age[_n-1]+1)
    by fic peak_group (year): egen wanted = max(cond(consecutive_subgroup == 1, age, .))
    by fic peak_group (year): replace wanted = . if peak != 1
    Now, if your example solution really is correct, the implication is that fic is just a red herring in the data, except perhaps for determining the order of observations. In that case:
    Code:
    gen `c(obs_t)' obs_no = _n // PRESERVE EXISTING SORT ORDER
    gen peak_group = sum(peak)
    by peak_group (obs_no), sort: gen consecutive_subgroup = sum(age != age[_n-1]+1)
    by peak_group (obs_no): egen wanted = max(cond(consecutive_subgroup == 1, age, .))
    by peak_group (obs_no): replace wanted = . if peak != 1

    Comment


    • #3
      Dear Clyde,

      Thank you for your help! You are right, observation 7 should be ".", so the first solution that you provided is the one that I need! Thank you very much!

      Comment

      Working...
      X