Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Find outcome of the most recent and second most recent event within group

    Dear all,

    I am trying to find the outcomes of the most recent and second most recent events (in the past) for each observation within a group and record those outcomes in two separate columns. The data set is an unbalanced yearly panel dataset (with gaps for some groups). Below is a simple example showing just one group.

    group_id year inspection violation
    1 2010 0 0
    1 2011 1 1
    1 2012 0 0
    1 2013 1 0
    1 2014 0 0
    1 2015 1 1
    1 2016 0 0
    1 2017 0 0
    1 2018 0 0
    1 2019 0 0
    1 2020 1 0

    Here, an event corresponds to an "inspection" and the outcome of an event corresponds to "violation." Both the inspection and violation variables are binary - that is, "inspection" is 1 if there was an inspection in a given year and 0 otherwise; "violation" is 1 if the inspection detected any violation and 0 otherwise. So, when there is no inspection in a given year (i.e., when the value of "inspection" is 0), the value of "violation" is always 0, and when there is an inspection in a given year (i.e., when the value of "inspection" is 1), the value of "violation" could be either 0 or 1.

    Now, what I would like to achieve is shown in the following table where two new columns are added.

    group_id year inspection violation most_recent_insp_viol second_most_recent_insp_viol
    1 2010 0 0 . .
    1 2011 1 1 . .
    1 2012 0 0 1 .
    1 2013 1 0 1 .
    1 2014 0 0 0 1
    1 2015 1 1 0 1
    1 2016 0 0 1 0
    1 2017 0 0 1 0
    1 2018 0 0 1 0
    1 2019 0 0 1 0
    1 2020 1 0 1 0

    For each observation, "most_recent_insp_viol" shows whether there was a violation in the most recent inspection in the past, and "second_most_recent_insp_viol" shows whether there was a violation in the second most recent inspection in the past.

    For 2010 and 2011, there was no inspection in the past, so the values of "most_recent_insp_viol" and "second_most_recent_insp_viol" are missing. For 2012 and 2013, there is one inspection in the past (which occurred in 2011) and the inspection found a violation, so the value of "most_recent_insp_viol" is 1. But the 2012 and 2013 instances do not have the second most recent inspection, so the value of "second_most_recent_insp_viol" is missing. For 2014 and 2015, the most recent inspection occurred in 2013 which did not find a violation, so the value of "most_recent_insp_viol" is 0; the second most recent inspection occurred in 2011 which found a violation, so the value of "second_most_recent_insp_viol" is 1. The logic is the same for 2016 ~ 2020.

    I think I was able to create the "most_recent_insp_viol" column, using the code below:

    Code:
    gen current_viol = violation if inspection == 1
    bysort group_id: gen lagged_viol = current_viol[_n-1] // lag by 1 year
    bysort group_id: carryforward lagged_viol, gen(most_recent_insp_viol)

    But I cannot figure out how to create the "second_most_recent_insp_viol" column. Your help would be greatly appreciated. Thank you so much in advance.

    Best regards,
    IJ

  • #2
    Sorry, below is the code for generating the example dataset above.


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(group_id year inspection violation)
    1  2010 0 0
    1  2011 1 1
    1  2012 0 0
    1  2013 1 0
    1  2014 0 0
    1  2015 1 1
    1  2016 0 0
    1  2017 0 0
    1  2018 0 0
    1  2019 0 0
    1  2020 1 0
    end

    Comment


    • #3
      Thanks for using -dataex-.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input float(group_id year inspection violation)
      1  2010 0 0
      1  2011 1 1
      1  2012 0 0
      1  2013 1 0
      1  2014 0 0
      1  2015 1 1
      1  2016 0 0
      1  2017 0 0
      1  2018 0 0
      1  2019 0 0
      1  2020 1 0
      end
      
      isid group_id year, sort
      by group_id (year): gen most_recent_outcome = .
      by group_id (year): replace most_recent_outcome = cond(inspection[_n-1], ///
          violation[_n-1], most_recent_outcome[_n-1])
          
      by group_id (year): gen inspections_before_now = sum(inspection[_n-1])
      rangestat (max) second_most_recent_outcome = violation, by(group_id) ///
          interval(inspections_before_now -2 -2)
      -rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer. It is available from SSC.

      Comment


      • #4
        Dear Clyde,

        The code works perfectly. Thank you very much.

        Best regards,
        IJ

        Comment

        Working...
        X