Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Counting unique observations in a complicated data structure

    Dear Statalist,

    I have a panel data on the level of towns and their administrative parts. Each town and administrative part has a unique id - id_town and id_part. For each town, I have an observation with data on the town as a whole (type=="O", id_part==0). If a town has more administrative parts than the one, there are also further observations consisting of data on the separate administrative parts (type=="K"). Every year, a certain value of coefficient is assigned by law to all towns based on their size. The coefficient is the same for all administrative parts of a town. However, a town can choose to change the value of the coefficient in some of its administrative parts.

    I want to count how many towns decided to change the coefficient in some of its parts by year (i.e. when the coefficient in some of a town's administrative parts is different from the coefficient on a town level in a given year). Secondly, I want to know how many administrative parts inside the towns decided for such a change every year..

    Thank you very much for a help.

    The structure of the data is following:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int year long id_town str1 type long id_part double coefficient
    2014 502316 "O"      0 3.5
    2015 502316 "O"      0 3.5
    2016 502316 "O"      0 3.5
    2017 502316 "O"      0 3.5
    2018 502316 "O"      0 3.5
    2014 502545 "O"      0 1.4
    2015 502545 "O"      0 1.4
    2016 502545 "O"      0 1.4
    2017 502545 "O"      0 1.4
    2018 502545 "O"      0 1.4
    2014 503444 "K" 621056 1.6
    2015 503444 "K" 621056 1.6
    2016 503444 "K" 621056 1.6
    2017 503444 "K" 621056 1.6
    2018 503444 "K" 621056 1.6
    2014 503444 "K" 637157 1.6
    2015 503444 "K" 637157 1.6
    2016 503444 "K" 637157 1.6
    2017 503444 "K" 637157 1.6
    2018 503444 "K" 637157 1.6
    2014 503444 "K" 637165 1.6
    2015 503444 "K" 637165 1.6
    2016 503444 "K" 637165 1.6
    2017 503444 "K" 637165 1.6
    2018 503444 "K" 637165 1.6
    2014 503444 "K" 637181 1.6
    2015 503444 "K" 637181 1.6
    2016 503444 "K" 637181 1.6
    2017 503444 "K" 637181 1.6
    2018 503444 "K" 637181 1.6
    2014 503444 "K" 637190 1.6
    2015 503444 "K" 637190 1.6
    2016 503444 "K" 637190 1.6
    2017 503444 "K" 637190 1.6
    2018 503444 "K" 637190 1.6
    2014 503444 "K" 652784 1.6
    2015 503444 "K" 652784 1.6
    2016 503444 "K" 652784 1.6
    2017 503444 "K" 652784 1.6
    2018 503444 "K" 652784 1.6
    2014 503444 "K" 746142 1.6
    2015 503444 "K" 746142 1.6
    2016 503444 "K" 746142 1.6
    2017 503444 "K" 746142 1.6
    2018 503444 "K" 746142 1.6
    2014 503444 "K" 774332 1.6
    2015 503444 "K" 774332 1.6
    2016 503444 "K" 774332 1.6
    2017 503444 "K" 774332 1.6
    2018 503444 "K" 774332 1.6
    2014 503444 "K" 774341 1.6
    2015 503444 "K" 774341 1.6
    2016 503444 "K" 774341 1.6
    2017 503444 "K" 774341 1.6
    2018 503444 "K" 774341 1.6
    2014 503444 "O"      0   2
    2015 503444 "O"      0   2
    2016 503444 "O"      0   2
    2017 503444 "O"      0   2
    2018 503444 "O"      0   2
    2014 503657 "O"      0 1.4
    2015 503657 "O"      0 1.4
    2016 503657 "O"      0 1.4
    2017 503657 "O"      0 1.4
    2018 503657 "O"      0 1.4
    2014 503738 "O"      0 1.4
    2015 503738 "O"      0 1.4
    2016 503738 "O"      0 1.4
    2017 503738 "O"      0 1.4
    2018 503738 "O"      0 1.4
    2014 503941 "K" 990043 1.4
    2015 503941 "K" 990043 1.4
    2016 503941 "K" 990043 1.4
    2017 503941 "K" 990043 1.4
    2018 503941 "K" 990043 1.4
    2014 503941 "K" 990051 1.4
    2015 503941 "K" 990051 1.4
    2016 503941 "K" 990051 1.4
    2017 503941 "K" 990051 1.4
    2018 503941 "K" 990051 1.4
    2014 503941 "K" 990094 1.4
    2015 503941 "K" 990094 1.4
    2016 503941 "K" 990094 1.4
    2017 503941 "K" 990094 1.4
    2018 503941 "K" 990094 1.4
    2014 503941 "K" 990205 1.4
    2015 503941 "K" 990205 1.4
    2016 503941 "K" 990205 1.4
    2017 503941 "K" 990205 1.4
    2018 503941 "K" 990205 1.4
    2014 503941 "K" 990213 1.4
    2015 503941 "K" 990213 1.4
    2016 503941 "K" 990213 1.4
    2017 503941 "K" 990213 1.4
    2018 503941 "K" 990213 1.4
    2014 503941 "O"      0 1.4
    2015 503941 "O"      0 1.4
    2016 503941 "O"      0 1.4
    2017 503941 "O"      0 1.4
    2018 503941 "O"      0 1.4
    end

  • #2
    If I follow your explanation correctly, this should do it:

    Code:
    //    VERIFY TOWN, PART, AND YEAR IDENTIFY OBS. UNIQUELY
    isid id_town id_part year
    assert inlist(type, "K", "O")
    
    by year id_town (type), sort: gen double town_coef = coefficient[_N]
    assert town_coef == coefficient if type == "O"
    
    gen byte this_part_changed = (coefficient != town_coef)
    by year id_town: egen how_many_parts_changed = total(this_part_changed)
    gen byte some_part_changed = (how_many_parts_changed > 0)

    Comment


    • #3
      Thank you, it works great. If I may ask a similar follow-up question:

      Again, I have a panel data on towns (id) and coefficients. The observations are only on the town-year level. Now, every year, towns can decide if they want to set a coefficient. If yes, they can choose from values 2, 3, 4, 5. If they decide not to have any coefficient at all, the value of the coefficient is recorded as missing.

      I would like to calculate:
      - how many times a town changed its coefficient in the whole period
      - how many different coefficient values a town ever used
      - how many towns changed the coefficient from one year to another by more than 1 category (e.g. from 2 to 4)

      Thank you again for help. Data:
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input int year str6 id float coefficient
      2014 "500291" .
      2015 "500291" .
      2016 "500291" 2
      2017 "500291" 2
      2014 "503444" 2
      2015 "503444" 2
      2016 "503444" 2
      2017 "503444" 2
      2014 "506486" 5
      2015 "506486" 4
      2016 "506486" 5
      2017 "506486" 3
      2014 "507580" .
      2015 "507580" .
      2016 "507580" .
      2017 "507580" .
      2014 "509841" .
      2015 "509841" .
      2016 "509841" .
      2017 "509841" 2
      2014 "512869" .
      2015 "512869" .
      2016 "512869" .
      2017 "512869" .
      2014 "513458" 3
      2015 "513458" .
      2016 "513458" 3
      2017 "513458" 3
      2014 "513539" 2
      2015 "513539" 3
      2016 "513539" 2
      2017 "513539" 2
      2014 "514705" 2
      2015 "514705" 2
      2016 "514705" 3
      2017 "514705" .
      end

      Comment


      • #4
        I would like to calculate:
        - how many times a town changed its coefficient in the whole period.
        Note that I count changes from missing to a coefficient and from a coefficient to missing.

        Code:
        bys id (year): egen count1= total(coefficient!=coefficient[_n-1] &_n>1)
        - how many different coefficient values a town ever used
        A bit involved since you need to exclude the missing values, but this does it.

        Code:
        sort id coefficient
        by id: egen tag= total(coefficient!=coefficient[_n-1] & _n>1) if !missing(coefficient)
        bys id: egen count2= max(tag)
        replace count2= count2 +1
        - how many towns changed the coefficient from one year to another by more than 1 category (e.g. from 2 to 4)
        Code:
        bys id (year): gen tag2= abs(coefficient-coefficient[_n-1]) if _n>1 &!missing(coefficient)
        replace tag2= 0 if missing(tag2)
        bys id (year): egen count3= total(tag2>1)

        Result:

        Code:
        . l id year coefficient count*, sepby(id)
        
             +-----------------------------------------------------+
             |     id   year   coeffi~t   count1   count2   count3 |
             |-----------------------------------------------------|
          1. | 500291   2014          .        1        1        0 |
          2. | 500291   2015          .        1        1        0 |
          3. | 500291   2016          2        1        1        0 |
          4. | 500291   2017          2        1        1        0 |
             |-----------------------------------------------------|
          5. | 503444   2014          2        0        1        0 |
          6. | 503444   2015          2        0        1        0 |
          7. | 503444   2016          2        0        1        0 |
          8. | 503444   2017          2        0        1        0 |
             |-----------------------------------------------------|
          9. | 506486   2014          5        3        3        1 |
         10. | 506486   2015          4        3        3        1 |
         11. | 506486   2016          5        3        3        1 |
         12. | 506486   2017          3        3        3        1 |
             |-----------------------------------------------------|
         13. | 507580   2014          .        0        .        0 |
         14. | 507580   2015          .        0        .        0 |
         15. | 507580   2016          .        0        .        0 |
         16. | 507580   2017          .        0        .        0 |
             |-----------------------------------------------------|
         17. | 509841   2014          .        1        1        0 |
         18. | 509841   2015          .        1        1        0 |
         19. | 509841   2016          .        1        1        0 |
         20. | 509841   2017          2        1        1        0 |
             |-----------------------------------------------------|
         21. | 512869   2014          .        0        .        0 |
         22. | 512869   2015          .        0        .        0 |
         23. | 512869   2016          .        0        .        0 |
         24. | 512869   2017          .        0        .        0 |
             |-----------------------------------------------------|
         25. | 513458   2014          3        2        1        0 |
         26. | 513458   2015          .        2        1        0 |
         27. | 513458   2016          3        2        1        0 |
         28. | 513458   2017          3        2        1        0 |
             |-----------------------------------------------------|
         29. | 513539   2014          2        2        2        0 |
         30. | 513539   2015          3        2        2        0 |
         31. | 513539   2016          2        2        2        0 |
         32. | 513539   2017          2        2        2        0 |
             |-----------------------------------------------------|
         33. | 514705   2014          2        2        2        0 |
         34. | 514705   2015          2        2        2        0 |
         35. | 514705   2016          3        2        2        0 |
         36. | 514705   2017          .        2        2        0 |
             +-----------------------------------------------------+

        ADDED IN EDIT: The last question asks the total number of towns. My solution (count3) gives you the total number of times a town had a change in coefficient of greater than 2. To answer your question:

        Code:
        by id: gen wanted= count3>=1 & _n==1
        tab wanted
        The frequency for the category 1 gives you the number of towns.

        Code:
        . by id: gen wanted= count3>=1 & _n==1
        
        . tab wanted
        
             wanted |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  0 |         35       97.22       97.22
                  1 |          1        2.78      100.00
        ------------+-----------------------------------
              Total |         36      100.00
        Last edited by Andrew Musau; 12 Apr 2019, 02:49.

        Comment

        Working...
        X