Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Trying to assess yearly coverage for 5-year date ranges, using half (or more) of each year as the cutoff for coverage

    Hi all,

    I am working with data involving coverage of a 5-year environmental safety plan for various municipalities in several states. In most cases, the municipalities have multiple rows to account for multiple plans and also multi-jurisdiction plans that they have been included in (with other municipalities, counties, school/school districts, etc.). Each row contains the name of the municipality, the state, and the approval/expiration dates for a given plan. The dataset is very large (and is confidential), so I have changed the municipal names/states the municipalities are located in, but here is a small example which I think illustrates the data:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str12 community str6 state str9(approval expiration) float(approval_date expiration_date)
    "Springdon"    "Alaska" "1/8/2008"  "1/8/2013"  17539 19366
    "Springdon"    "Alaska" "3/5/2012"  "3/5/2017"  19057 20883
    "Springdon"    "Alaska" "4/16/2012" "4/16/2017" 19099 20925
    "Springdon"    "Alaska" "5/7/2012"  "5/7/2017"  19120 20946
    "Springdon"    "Alaska" "6/11/2012" "6/11/2017" 19155 20981
    "Springdon"    "Alaska" "2/12/2013" "2/12/2018" 19401 21227
    "Springdon"    "Alaska" "2/12/2013" "2/12/2018" 19401 21227
    "Springdon"    "Alaska" "5/9/2013"  "5/9/2018"  19487 21313
    "Springdon"    "Alaska" "5/28/2014" "5/27/2019" 19871 21696
    "Springdon"    "Alaska" "11/8/2021" "11/7/2026" 22592 24417
    "Springfield"  "Alaska" "9/30/2008" "9/30/2013" 17805 19631
    "Springfield"  "Alaska" "2/8/2010"  "2/8/2015"  18301 20127
    "Springfield"  "Alaska" "6/11/2012" "6/11/2017" 19155 20981
    "Springfield"  "Alaska" "6/11/2012" "6/11/2017" 19155 20981
    "Springfield"  "Alaska" "6/11/2012" "6/11/2017" 19155 20981
    "Springfield"  "Alaska" "6/6/2014"  "6/6/2019"  19880 21706
    "Springfield"  "Alaska" "7/7/2015"  "7/6/2020"  20276 22102
    "Springfield"  "Alaska" "2/12/2018" "2/11/2023" 21227 23052
    "Springfield"  "Alaska" "2/11/2020" "2/10/2025" 21956 23782
    "Springfield"  "Alaska" "5/3/2021"  "5/2/2026"  22403 24228
    "Spring Creek" "Alaska" "3/18/1997" "3/17/2002" 13591 15416
    "Spring Creek" "Alaska" "11/1/2004" "11/1/2009" 16376 18202
    end
    format %td approval_date
    format %td expiration_date
    My task is to show the plan coverage in the data by year (coded "0" if not covered in a given year and "1" if covered), with a place considered "covered" for a given year if it has a plan (or plans) in place for more than half of the days of that year (i.e., 183 days or more). So, ultimately I hope to wind up with something that looks like (in long format, wide format would be equally good):

    Code:
    * Example generated by -dataex-.    For    more    info, type help dataex
    clear
    input str12 community str6 state    int    year    byte plan_coverage
    "Springdon"    "Alaska" 1997 0
    "Springdon"    "Alaska" 1998 0
    "Springdon"    "Alaska" 1999 0
    "Springdon"    "Alaska" 2000 0
    "Springdon"    "Alaska" 2001 0
    "Springdon"    "Alaska" 2002 0
    "Springdon"    "Alaska" 2003 0
    "Springdon"    "Alaska" 2004 0
    "Springdon"    "Alaska" 2005 0
    "Springdon"    "Alaska" 2006 0
    "Springdon"    "Alaska" 2007 0
    "Springdon"    "Alaska" 2008 1
    "Springdon"    "Alaska" 2009 1
    "Springdon"    "Alaska" 2010 1
    "Springdon"    "Alaska" 2011 1
    "Springdon"    "Alaska" 2012 1
    "Springdon"    "Alaska" 2013 1
    "Springdon"    "Alaska" 2014 1
    "Springdon"    "Alaska" 2015 1
    "Springdon"    "Alaska" 2016 1
    "Springdon"    "Alaska" 2017 1
    "Springdon"    "Alaska" 2018 1
    "Springdon"    "Alaska" 2019 0
    "Springdon"    "Alaska" 2020 0
    "Springdon"    "Alaska" 2021 0
    "Springdon"    "Alaska" 2022 1
    "Springdon"    "Alaska" 2023 1
    "Springdon"    "Alaska" 2024 1
    "Springdon"    "Alaska" 2025 1
    "Springdon"    "Alaska" 2026 1
    "Springfield"  "Alaska" 1997 0
    "Springfield"  "Alaska" 1998 0
    "Springfield"  "Alaska" 1999 0
    "Springfield"  "Alaska" 2000 0
    "Springfield"  "Alaska" 2001 0
    "Springfield"  "Alaska" 2002 0
    "Springfield"  "Alaska" 2003 0
    "Springfield"  "Alaska" 2004 0
    "Springfield"  "Alaska" 2005 0
    "Springfield"  "Alaska" 2006 0
    "Springfield"  "Alaska" 2007 0
    "Springfield"  "Alaska" 2008 1
    "Springfield"  "Alaska" 2009 1
    "Springfield"  "Alaska" 2010 1
    "Springfield"  "Alaska" 2011 1
    "Springfield"  "Alaska" 2012 1
    "Springfield"  "Alaska" 2013 1
    "Springfield"  "Alaska" 2014 1
    "Springfield"  "Alaska" 2015 1
    "Springfield"  "Alaska" 2016 1
    "Springfield"  "Alaska" 2017 1
    "Springfield"  "Alaska" 2018 1
    "Springfield"  "Alaska" 2019 1
    "Springfield"  "Alaska" 2020 1
    "Springfield"  "Alaska" 2021 1
    "Springfield"  "Alaska" 2022 1
    "Springfield"  "Alaska" 2023 1
    "Springfield"  "Alaska" 2024 1
    "Springfield"  "Alaska" 2025 1
    "Springfield"  "Alaska" 2026 1
    "Spring Creek" "Alaska" 1997 1
    "Spring Creek" "Alaska" 1998 1
    "Spring Creek" "Alaska" 1999 1
    "Spring Creek" "Alaska" 2000 1
    "Spring Creek" "Alaska" 2001 1
    "Spring Creek" "Alaska" 2002 0
    "Spring Creek" "Alaska" 2003 0
    "Spring Creek" "Alaska" 2004 0
    "Spring Creek" "Alaska" 2005 1
    "Spring Creek" "Alaska" 2006 1
    "Spring Creek" "Alaska" 2007 1
    "Spring Creek" "Alaska" 2008 1
    "Spring Creek" "Alaska" 2009 1
    "Spring Creek" "Alaska" 2010 0
    "Spring Creek" "Alaska" 2011 0
    "Spring Creek" "Alaska" 2012 0
    "Spring Creek" "Alaska" 2013 0
    "Spring Creek" "Alaska" 2014 0
    "Spring Creek" "Alaska" 2015 0
    "Spring Creek" "Alaska" 2016 0
    "Spring Creek" "Alaska" 2017 0
    "Spring Creek" "Alaska" 2018 0
    "Spring Creek" "Alaska" 2019 0
    "Spring Creek" "Alaska" 2020 0
    "Spring Creek" "Alaska" 2021 0
    "Spring Creek" "Alaska" 2022 0
    "Spring Creek" "Alaska" 2023 0
    "Spring Creek" "Alaska" 2024 0
    "Spring Creek" "Alaska" 2025 0
    "Spring Creek" "Alaska" 2026 0
    end
    I really appreciate any advice on this that you all might be able to offer, especially since I have been trying to "hand code" this for the past week using brute force without much progress. I hope that I have provided enough information about the data and the way that I explained my issue makes sense.

    Best wishes,
    Matt


  • #2
    I see that there are overlapping dates across multiple rows for each community and state combination. I also see duplicate rows. Can you clarify what it means?

    Comment


    • #3
      Sure, Navi. Each row represents a different plan, and most of the municipalities have several. This is due to the fact that the plans are broken out by date (2001-2005, 2005-2010, etc). and also because the municipalities are entered into multijurisdiction plans with other municipalities, school districts, counties, utility services, etc. For each municipality, all of these plans have their own row in the data, and the dates often overlap (or, as you noted, in some cases have the exact same dates). I hope that makes sense!!

      Comment


      • #4
        I'm assuming that you consider the community to be covered if any of its associated plans is active during that year (or at least half thereof).
        Code:
        preserve
        clear
        set obs `=2026-1997+1'
        gen year = 1996 + _n
        tempfile years
        save `years'
        
        restore
        cross using `years'
        
        gen byte covered = approval_date <= mdy(6, 30, year) ///
            & expiration_date >= mdy(7, 1, year)
        collapse (max) plan_coverage = covered, by(community state year)

        Comment


        • #5
          Thanks so much, Clyde. That is exactly what I am looking to do! However, when I enter the code into a .do file and then execute, Stata returns the following:

          Code:
          . do "C:\Users\mattb\AppData\Local\Temp\STD968_000000.tmp"
          
          . preserve
          
          . clear
          
          . set obs `=2026-1997+1'
          Number of observations (_N) was 0, now 30.
          
          . gen year = 1996 + _n
          
          . tempfile years
          
          . save `years'
          file C:\Users\mattb\AppData\Local\Temp\ST_968_00000a.tmp saved as .dta format
          
          .
          . restore
          
          . cross using `years'
          
          .
          . gen byte covered = approval_date <= mdy(6, 30, year) ///
          >     & expiration_date >= mdy(7, 1, year)
          type mismatch
          r(109);
          
          end of do-file
          
          r(109);
          Given my lack of experience doing this kind of data shaping in Stata, my first inclination is that I probably did something wrong...

          Thanks again, and sorry for any inconvenience!

          Comment


          • #6
            In #1
            Code:
             
             approval_date expiration_date year 
            are all numeric, and year is numeric in #5 too, so it is hard to see what the problem is. Perhaps you are using a different version of your data in which one or more of those daily date variables is in fact string.

            Comment


            • #7
              You're right, Nick, I was. Works like a charm! Thanks so much to all of you for your help.

              Comment

              Working...
              X