
  • summing over a unique identifier over a panel data of county and year

    I want to find the number of unique rf_id values for each county in each year of my dataset. Is the following going to suffice?

    Code:
    bysort rf_id county (year) : gen wanted = _n == 1
    bysort rf_id (year county): replace wanted = sum(wanted)
    by county year: replace wanted = wanted[_N]


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long rf_id float(year county)
    102990885 2000 13121
    103020981 2000 36061
    103070511 2001 36061
    103090560 2000  9009
    103100020 2001 9009
    103100550 2000 41051
    103100882 2002 9009
    103100885 2000  6081
    103120813 2001 48113
    103170821 2000  4013
    103180001 2001  6075
    103180015 2001 51013
    103180097 2000 10003
    103200783 2000     .
    103210207 2000 36061
    103270266 2000 17031
    103340240 2000 9009
    103360009 2000  6037
    103360119 2000  6037
    103390422 2000 48201
    103400270 2000 24031
    103400773 2000 36061
    103400785 2000 27053
    103400808 2000  8031
    103400829 2000  8031
    103400840 2000  6075
    103430018 2000  6085
    103430187 2002 24031
    103460235 2000 11001
    103470580 2001 11001
    103480485 2000  6059
    103480531 2000 17031
    103490132 2001  8031
    103490362 2000 36061
    103510154 2001 36061
    103510160 2002 36061
    103510293 2000 27053
    end

  • #2
    I think the following code did the trick, but if you feel it is not the right code, please do let me know.

    Code:
    bysort rf_id (year county) : gen wanted = _n == 1
    bysort year county (rf_id ): replace wanted = sum(wanted)
    by year county: replace wanted = wanted[_N]


    • #3
      See https://www.stata-journal.com/articl...article=dm0042 for various twists on this question.

      The number of distinct identifiers in each county and each year could be got by

      Code:
      bysort county year rf_id : gen wanted = _n == 1
      by county year : replace wanted = sum(wanted)
      by county year : replace wanted = wanted[_N]
      Is that what you wanted?
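
      As an alternative sketch (not part of the reply above), the same count can be built with egen: tag() flags one observation per distinct combination of county, year and rf_id, and total() adds up the flags within each county and year. The toy rows and the variable names tag and n_ids are made up for illustration. According to help egen, tag() leaves observations with missing values in the listed variables untagged unless the missing option is specified.

      ```stata
      * Hypothetical toy data: rf_id 2 appears twice in county 36061 in 2000
      clear
      input long rf_id float(year county)
      1 2000 36061
      2 2000 36061
      2 2000 36061
      3 2001 36061
      4 2000  9009
      end

      * Flag exactly one observation per distinct county-year-rf_id combination
      egen byte tag = tag(county year rf_id)

      * Sum the flags within each county and year: the number of distinct rf_id
      egen n_ids = total(tag), by(county year)

      sort county year rf_id
      list, sepby(county year)
      ```

      Every observation in a county-year group carries the same n_ids, so the duplicate row for rf_id 2 is counted only once.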
      Last edited by Nick Cox; 21 Oct 2022, 19:04.


      • #4
        Mr Cox,

        Your code gave me a better and cleaner result than what I came up with myself. Thanks for taking the time to look at my code and graciously providing a better and more efficient snippet! Very humbled to learn from you!
