  • How to construct a 'reputation' variable?

    Hi,

    I have a dataset of compliance actions and penalties for individual plants, taken from an EPA compliance database for the Clean Air Act. For each violation I have the start and end dates and the total penalty for the entire period of violation. I want to construct a variable that takes a value of 1 for plant i if another plant j in the same county was assessed a penalty in the previous year. (I am interested in the effect of such a variable on the duration of a plant's violation.)

    The complication is that my data do not record penalties for each year of violation/noncompliance, only the overall penalty for each period of violation. I don't have a clue how to construct this variable. The data are set up as follows:

    HTML Code:
    ID   County   Start_Date   End_Date   Duration of Violation (years)   Penalty   Reputation
    --   ------   ----------   --------   -----------------------------   -------   ----------
    1    12345    2005         2006       1                               $15,000   -
    2    12345    2008         2009       1                               $3,000    0
    3    12345    2010         2013       3                               $30,000   1
    4    12345    2012         2014       2                               $9,000    1
    The Reputation variable isn't constructed yet; the column shows what it should look like. For example, Plant 4 has a value of 1 because Plant 3, in the same county, was in violation (with a penalty assessed) in 2011, the year before Plant 4's violation began.

    I know what the Reputation variable is supposed to look like; I just have no idea how to construct it in Stata. The code also needs to ensure that the plants being compared are in the same county.

    Any help would be highly appreciated.



    Last edited by Caroline Abraham; 03 Nov 2019, 16:35.

  • #2
    I'm not completely clear about some details of what you want. I assume that what you are trying to identify here is observations where the year before the start date of a penalty falls in the range from start_date to end_date of another penalty in the same county. If that's it, then this code should do it:

    Code:
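    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte plant_id int(county start_date end_date)
    1 12345 2005 2006
    2 12345 2008 2009
    3 12345 2010 2013
    4 12345 2012 2014
    end

    * Tag each observation so the paired data can be collapsed back later
    gen long obs_no = _n

    * Save a copy of the data with every variable except county suffixed _2
    preserve
    rename (obs_no plant_id start_date end_date) =_2
    tempfile copy
    save `copy'
    restore

    * Form all pairs of observations within the same county
    joinby county using `copy'

    * reputation = 1 if the year before this start date falls inside any paired violation period
    by obs_no, sort: egen reputation = max(inrange(start_date-1, start_date_2, end_date_2))

    * Return to one row per original observation
    drop *_2
    by obs_no: keep if _n == 1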
    In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
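
    One detail the approach above leaves open: if a plant appears in the data with more than one violation episode, its own earlier episode could set reputation to 1, whereas the question in #1 asks about a different plant j. The following is a minimal sketch of a variant, assuming plant_id identifies the same plant across episodes (an assumption; the example data have one row per plant). It adds that condition through a helper variable, hit (a name introduced here for illustration), and then checks the result against the table in #1. Plant 1 gets 0 rather than "-", because no other violation in the example data covers 2004.

```stata
clear
input byte plant_id int(county start_date end_date)
1 12345 2005 2006
2 12345 2008 2009
3 12345 2010 2013
4 12345 2012 2014
end

gen long obs_no = _n
preserve
rename (obs_no plant_id start_date end_date) =_2
tempfile copy
save `copy'
restore

* Pair every observation with every observation in the same county
joinby county using `copy'

* Flag pairs where a DIFFERENT plant (assumption: plant_id is a stable plant
* identifier) was in violation in the year before this violation started
gen byte hit = plant_id != plant_id_2 & inrange(start_date - 1, start_date_2, end_date_2)
by obs_no, sort: egen byte reputation = max(hit)

* Return to one row per original observation
drop *_2 hit
by obs_no: keep if _n == 1

* Sanity check against the hand-built table: plants 3 and 4 get 1, plants 1 and 2 get 0
assert reputation == inlist(plant_id, 3, 4)
list, noobs
```

    With one row per plant, as in the example, this gives the same result as the code above, since an observation can never match its own period (start_date - 1 is always earlier than its own start_date). The extra condition only matters when a plant has repeated episodes.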


