Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Using (nested) loops to conditionally sum

    Hello, thank you in advance for any advice you are able to provide me with. I am a rather novice when it comes to coding, so feel free to comment on mistakes in my methodology.

    Here is an example of some of the data I am working with for reference.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long PlantID str2 PlantState double NameplateCapacityMW int OperatingYear str25 County float Year
    60127 "VA"   80 2016 "Accomack" 2016
    60127 "VA"   80 2016 "Accomack" 2017
    60127 "VA"   80 2016 "Accomack" 2018
    60127 "VA"   80 2016 "Accomack" 2019
    60445 "ID"   40 2016 "Ada"      2016
    60445 "ID"   40 2016 "Ada"      2017
    60010 "ID"   20 2017 "Ada"      2018
    60445 "ID"   40 2016 "Ada"      2019
    60445 "ID"   40 2016 "Ada"      2016
    60010 "ID"   20 2017 "Ada"      2017
    60010 "ID"   20 2017 "Ada"      2018
    60445 "ID"   40 2016 "Ada"      2019
    59850 "CO"  1.2 2015 "Adams"    2017
    59850 "CO"  1.2 2015 "Adams"    2018
    59144 "CO"  1.8 2012 "Adams"    2019
    59850 "CO"  1.2 2015 "Adams"    2016
    60725 "CO"  1.5 2017 "Adams"    2017
    59849 "CO"  1.2 2015 "Adams"    2018
    59144 "CO"  1.8 2012 "Adams"    2019
    59849 "CO"  1.2 2015 "Adams"    2015
    60726 "CO"  1.8 2017 "Adams"    2017
    59144 "CO"  1.8 2012 "Adams"    2018
    61563 "CO"  1.6 2017 "Adams"    2019
    60764 "CO" 12.8 2016 "Adams"    2016
    60764 "CO" 12.8 2016 "Adams"    2018
    59144 "CO"  1.8 2012 "Adams"    2019
    59850 "CO"  1.2 2015 "Adams"    2016
    60726 "CO"  1.8 2017 "Adams"    2017
    61439 "CO"  1.3 2017 "Adams"    2018
    61439 "CO"  1.3 2017 "Adams"    2019
    59144 "CO"  1.8 2012 "Adams"    2012
    59849 "CO"  1.2 2015 "Adams"    2016
    59850 "CO"  1.2 2015 "Adams"    2017
    61753 "CO"  1.5 2018 "Adams"    2018
    60764 "CO" 12.8 2016 "Adams"    2019
    59850 "CO"  1.2 2015 "Adams"    2015
    59144 "CO"  1.8 2012 "Adams"    2016
    59849 "CO"  1.2 2015 "Adams"    2017
    59850 "CO"  1.2 2015 "Adams"    2018
    59849 "CO"  1.2 2015 "Adams"    2019
    61563 "CO"  1.6 2017 "Adams"    2018
    60726 "CO"  1.8 2017 "Adams"    2019
    60725 "CO"  1.5 2017 "Adams"    2018
    61439 "CO"  1.3 2017 "Adams"    2019
    59850 "CO"  1.2 2015 "Adams"    2016
    59849 "CO"  1.2 2015 "Adams"    2017
    59144 "CO"  1.8 2012 "Adams"    2018
    61563 "CO"  1.6 2017 "Adams"    2019
    58935 "VT"    2 2013 "Addison"  2013
    58935 "VT"    2 2013 "Addison"  2014
    61340 "VT"  2.2 2017 "Addison"  2017
    61340 "VT"  2.2 2017 "Addison"  2018
    61340 "VT"  2.2 2017 "Addison"  2019
    58935 "VT"    2 2013 "Addison"  2013
    61340 "VT"  2.2 2017 "Addison"  2017
    58935 "VT"    2 2013 "Addison"  2018
    60562 "VT"  4.9 2016 "Addison"  2019
    58935 "VT"    2 2013 "Addison"  2013
    60562 "VT"  4.9 2016 "Addison"  2016
    58935 "VT"    2 2013 "Addison"  2017
    60562 "VT"  4.9 2016 "Addison"  2018
    58935 "VT"    2 2013 "Addison"  2019
    58618 "FL"  1.2 2010 "Alachua"  2010
    58618 "FL"  1.2 2010 "Alachua"  2011
    58618 "FL"  1.2 2010 "Alachua"  2012
    57438 "FL"  1.6 2011 "Alachua"  2013
    58618 "FL"  1.2 2010 "Alachua"  2014
    57438 "FL"  1.6 2011 "Alachua"  2015
    57438 "FL"  1.6 2011 "Alachua"  2016
    58618 "FL"  1.2 2010 "Alachua"  2017
    58618 "FL"  1.2 2010 "Alachua"  2018
    57438 "FL"  1.6 2011 "Alachua"  2019
    57438 "FL"  1.6 2011 "Alachua"  2011
    58618 "FL"  1.2 2010 "Alachua"  2012
    58618 "FL"  1.2 2010 "Alachua"  2013
    58618 "FL"  1.2 2010 "Alachua"  2014
    58618 "FL"  1.2 2010 "Alachua"  2015
    58618 "FL"  1.2 2010 "Alachua"  2016
    57438 "FL"  1.6 2011 "Alachua"  2017
    58618 "FL"  1.2 2010 "Alachua"  2018
    57438 "FL"  1.6 2011 "Alachua"  2019
    58739 "NC"    3 2016 "Alamance" 2016
    58740 "NC"    3 2016 "Alamance" 2017
    59406 "NC"    2 2015 "Alamance" 2018
    60367 "NC"  5.2 2016 "Alamance" 2019
    61527 "NC"    5 2017 "Alamance" 2017
    60367 "NC"  5.2 2016 "Alamance" 2018
    58739 "NC"    3 2016 "Alamance" 2019
    58739 "NC"    3 2016 "Alamance" 2016
    61527 "NC"    5 2017 "Alamance" 2017
    59406 "NC"    2 2015 "Alamance" 2018
    58725 "NC"    5 2015 "Alamance" 2019
    58739 "NC"    3 2016 "Alamance" 2016
    58740 "NC"    3 2016 "Alamance" 2017
    58725 "NC"    5 2015 "Alamance" 2018
    61527 "NC"    5 2017 "Alamance" 2019
    58725 "NC"    5 2015 "Alamance" 2015
    58725 "NC"    5 2015 "Alamance" 2016
    58725 "NC"    5 2015 "Alamance" 2017
    58725 "NC"    5 2015 "Alamance" 2018
    end



    I am trying to create a new dataframe that contains sums of each capacity type (for example, variable "NameplateCapacityMW") by year for each unique County entry. The idea of the algorithm is that within a given County, for Year x (say, 2016) all unique Plants within the given County with OperatingYear <= 2016 are summed, with this sum recorded for that year in some variable (AnnCntyNmCap).

    Basically, for example, my first few entries of the desired output would be:

    County Year AnnCntyNmCap
    Accomack 2016 80
    Accomack 2017 80
    Accomack 2018 80
    Accomack 2019 80
    Ada 2016 40
    Ada 2017 60
    Ada 2018 60
    Ada 2019 60

    Notice that for Ada, in year 2017 a value of 20 is added because a second generator opened in 2017 (denoted by the second plant having an "OperatingYear" of 2017)

    Here was my initial idea for implementing the algorithm I described above:


    gen AnnCntyNmCap = .

    levelsof County, local(counties)

    foreach i of local counties {
    forvalues of j = i {
    replace AnnCntyNmCap = AnnCntyNmCap + ((Year-YrEntrdServ+1)*NmPltCap_MW)
    }
    }


    Because my ultimate goal is to have one line per Year per County, I expect to have to use the collapse command at some point, but the above code does not execute so it will take some further development before I will worry about that.

    Thank you for taking the time to look over my issue, please let me know if there is anything I can elaborate on to better convey my problem or goal.

  • #2
    The trickiness here is to avoid counting the same power station (?) more than once, but your problem could be both simpler (I can't see any need for loops) and more complicated than you think.

    I did this

    Code:
    duplicates drop *, force 
    bysort PlantID (Year) : gen tag = sum(Year >= Operating) == 1 
    bysort PlantState County (Year) : gen wanted = sum(tag * Name)
    What the code doesn't do is cope with changes in capacity in the lifetime of each station. There aren't any in the example data.

    What my code does do is cope with the possibility of the same county name in different states.

    Comment


    • #3
      Thank you very much Nick. Funny because after posting this my roommate and I were talking about a way to use by-grouping in SAS to achieve the desired result; I just wasn't sure how to implement it in Stata.

      Luckily capacity does not change over the life of the power station, at least for the types of generators I am working with.

      Your code seems to accomplish exactly what I needed, but now I am having trouble collapsing the data with a certain parameter. I need to get rid of all observations but one for each County, and the County that needs to be saved is the County with the greatest "wanted" value. This is with the goal being to get a result similar to the one I noted in my original post, so I need a list of years for each county. I limited the dataset to Adams County to provide another sample:

      Code:
      input long PlantID str2 PlantState double NameplateCapacityMW int OperatingYear str25 County float(Year tag wanted)
      59144 "CO"  1.8 2012 "Adams" 2012 1  1.8
      59144 "CO"  1.8 2012 "Adams" 2019 0 24.7
      59144 "CO"  1.8 2012 "Adams" 2014 0  1.8
      59144 "CO"  1.8 2012 "Adams" 2013 0  1.8
      59144 "CO"  1.8 2012 "Adams" 2017 0 20.3
      59144 "CO"  1.8 2012 "Adams" 2015 0  1.8
      59849 "CO"  1.2 2015 "Adams" 2017 0 18.8
      59849 "CO"  1.2 2015 "Adams" 2019 0 24.7
      59849 "CO"  1.2 2015 "Adams" 2015 1    3
      59850 "CO"  1.2 2015 "Adams" 2017 0 18.8
      59850 "CO"  1.2 2015 "Adams" 2019 0 24.7
      59850 "CO"  1.2 2015 "Adams" 2015 1  4.2
      59850 "CO"  1.2 2015 "Adams" 2018 0 23.4
      60725 "CO"  1.5 2017 "Adams" 2017 1 20.3
      60726 "CO"  1.8 2017 "Adams" 2017 1 18.8
      60726 "CO"  1.8 2017 "Adams" 2019 0 24.7
      60764 "CO" 12.8 2016 "Adams" 2017 0 20.3
      60764 "CO" 12.8 2016 "Adams" 2018 0 20.3
      60764 "CO" 12.8 2016 "Adams" 2019 0 24.7
      60764 "CO" 12.8 2016 "Adams" 2016 1   17
      61439 "CO"  1.3 2017 "Adams" 2018 1 24.7
      61439 "CO"  1.3 2017 "Adams" 2019 0 24.7
      61563 "CO"  1.6 2017 "Adams" 2018 1 21.9
      61563 "CO"  1.6 2017 "Adams" 2019 0 24.7
      61753 "CO"  1.5 2018 "Adams" 2018 1 23.4
      61753 "CO"  1.5 2018 "Adams" 2019 0 24.7

      So I need something of this form to reduce down to the following observations:

      Code:
      PlantState County Year tag wanted
      "CO"  "Adams" 2012 1  1.8
      "CO"  "Adams" 2013 0  1.8
      "CO"  "Adams" 2014 0  1.8
      "CO"  "Adams" 2015 0  4.2
      "CO"  "Adams" 2016 0  17
      "CO"  "Adams" 2017 0  23.2
      "CO"  "Adams" 2018 0  24.7
      "CO"  "Adams" 2019 0  24.7
      I dropped variables just to get the point across that I don't want Plant specific data in the end, I want the capacity for the whole county per year in the end.

      By manually putting it together for this county I have realized that my system is failing at some point, because for Year = 2017 there is no observation with wanted = 23.2, so this data would be incorrect even if collapsed. I can share the rest of my code if desired to see if I've made a mistake elsewhere in my methodology, but I can also troubleshoot on my own.

      Thank you again for he help Nick, don't feel obligated to spend anymore time fixing my problems.

      Comment


      • #4
        Check whether the following is not doing what you want:

        Code:
        . gsort County Year -wanted
        
        
        . by County Year: keep if _n==1
        (18 observations deleted)

        Comment


        • #5
          Hi Joro,

          Sorry for the late response. That did exactly what I needed, thank you!

          Working out a few bugs earlier in the code, but everything that was suggested works beautifully. Thank you both for your help.

          Comment


          • #6
            Seth you could also use collapse, which is Stata's command for aggregating. (Also, it deletes obs as part of collapsing, so save your data beforehand.)
            Also, see this entry here

            Code:
            * Using the data from post #1
            rename _all, lower
            rename plantstate state  // just renaming some of the variables with shorter names
            rename nameplatecapacitymw capacity
            renam operatingyear operating_year
            collapse (sum) total_capacity = capacity (count) count_plants = plantid (max) max_plant = capacity, by(state county year)
            sort state county year
            
            . list, sepby(state) noobs abbrev(14)
            
              +---------------------------------------------------------------------+
              | state     county   year   total_capacity   count_plants   max_plant |
              |---------------------------------------------------------------------|
              |    CO      Adams   2012              1.8              1         1.8 |
              |    CO      Adams   2015              2.4              2         1.2 |
              |    CO      Adams   2016             19.4              6        12.8 |
              |    CO      Adams   2017              9.9              7         1.8 |
              |    CO      Adams   2018             25.9             10        12.8 |
              |    CO      Adams   2019               27             10        12.8 |
              |---------------------------------------------------------------------|
              |    FL    Alachua   2010              1.2              1         1.2 |
              |    FL    Alachua   2011              2.8              2         1.6 |
              |    FL    Alachua   2012              2.4              2         1.2 |
              |    FL    Alachua   2013              2.8              2         1.6 |
              |    FL    Alachua   2014              2.4              2         1.2 |
              |    FL    Alachua   2015              2.8              2         1.6 |
              |    FL    Alachua   2016              2.8              2         1.6 |
              |    FL    Alachua   2017              2.8              2         1.6 |
              |    FL    Alachua   2018              2.4              2         1.2 |
              |    FL    Alachua   2019              3.2              2         1.6 |
              |---------------------------------------------------------------------|
              |    ID        Ada   2016               80              2          40 |
              |    ID        Ada   2017               60              2          40 |
              |    ID        Ada   2018               40              2          20 |
              |    ID        Ada   2019               80              2          40 |
              |---------------------------------------------------------------------|
              |    NC   Alamance   2015                5              1           5 |
              |    NC   Alamance   2016               14              4           5 |
              |    NC   Alamance   2017               21              5           5 |
              |    NC   Alamance   2018             19.2              5         5.2 |
              |    NC   Alamance   2019             18.2              4         5.2 |
              |---------------------------------------------------------------------|
              |    VA   Accomack   2016               80              1          80 |
              |    VA   Accomack   2017               80              1          80 |
              |    VA   Accomack   2018               80              1          80 |
              |    VA   Accomack   2019               80              1          80 |
              |---------------------------------------------------------------------|
              |    VT    Addison   2013                6              3           2 |
              |    VT    Addison   2014                2              1           2 |
              |    VT    Addison   2016              4.9              1         4.9 |
              |    VT    Addison   2017              6.4              3         2.2 |
              |    VT    Addison   2018              9.1              3         4.9 |
              |    VT    Addison   2019              9.1              3         4.9 |
              +---------------------------------------------------------------------+
            
            
            * With the data from post #3
            * Also note, this is with year, not operating_year
            collapse (sum) total_capacity = capacity (count) count_plants = plantid (max) max_plant = capacity (min) min_plant=capacity, by(state county year)
            
            . list, sepby(state) noobs abbrev(14)
            
              +-------------------------------------------------------------------------------+
              | state   county   year   total_capacity   count_plants   max_plant   min_plant |
              |-------------------------------------------------------------------------------|
              |    CO    Adams   2012              1.8              1         1.8         1.8 |
              |    CO    Adams   2013              1.8              1         1.8         1.8 |
              |    CO    Adams   2014              1.8              1         1.8         1.8 |
              |    CO    Adams   2015              4.2              3         1.8         1.2 |
              |    CO    Adams   2016             12.8              1        12.8        12.8 |
              |    CO    Adams   2017             20.3              6        12.8         1.2 |
              |    CO    Adams   2018             18.4              5        12.8         1.2 |
              |    CO    Adams   2019             23.2              8        12.8         1.2 |
              +-------------------------------------------------------------------------------+
            Last edited by David Benson; 18 Jan 2019, 14:40.

            Comment

            Working...
            X