Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • unique identifier at County year level

    in my dataset i have a uniqye fake_id. this data is distributed at panel of year and county level. i have another inc variable which takes either value of 1 or 0. i want to find out how many total unique identifier fake_id

    Can anyone tell me how I can do that?

    The best I could come up with this following , but I'm not sure if this is doable.

    Code:
    bysort county year inc : gen wanted = _n == 1 
    by county year : replace wanted = sum(wanted) 
    by county year : replace  wanted = wanted[_N]
    The following is a sample of my data

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double fake_id float(county year) byte inc
    20010000020  6085 2000 0
    20010000104  6085 2000 0
    20010000164 27053 2000 0
    20010000185  8001 2000 0
    20010000225 48167 2000 1
    20010000246  6085 2000 0
    20010000331  6097 2000 0
    20010000421  6085 2000 0
    20010000476  6085 2000 0
    20010000488  6081 2000 0
    20010000493  6085 2000 0
    20010000505  6041 2000 1
    20010000519 10003 2000 0
    20010000604  6085 2000 0
    20010000644 39153 2000 0
    20010000701 37183 2000 0
    20010000729 17097 2000 0
    20010000764  6073 2000 0
    20010000783 24027 2000 0
    20010000898  6085 2000 0
    20010000971  8013 2000 1
    20010001012 48453 2000 1
    20010001055 36029 2001 0
    20010001059  6037 2000 1
    20010001099 24027 2000 0
    20010001113  6001 2000 0
    20010001201  6085 2000 0
    20010001237 27161 2000 0
    20010001259 27171 2000 0
    20010001270 27053 2001 1
    20010001306 34039 2001 0
    20010001316 27003 2000 0
    20010001665 48453 2001 0
    20010001666 48453 2001 0
    20010001679 48453 2001 1
    20010001706 36061 2000 0
    20010001708 34027 2001 0
    20010001726 16001 2001 0
    20010001812  6085 2001 0
    20010001868 16001 2001 0
    20010001880 48453 2000 0
    20010001885 24005 2001 1
    20010001889 41067 2001 0
    20010001910  6037 2001 0
    20010001927 29215 2001 0
    20010001958 50007 2000 0
    20010002020 36103 2001 0
    20010002228  6037 2001 0
    20010002251  6085 2001 0
    20010002267 17031 2001 0
    20010002269  6037 2000 0
    20010002298 34025 2001 1
    20010002348 41045 2000 0
    20010002395 34039 2001 0
    20010002406  6081 2001 0
    20010002407 26065 2001 0
    20010002431  6007 2001 0
    20010002434 34027 2000 0
    20010002446 18141 2001 0
    20010002492  6013 2000 0
    20010002605 34021 2001 0
    20010002779 34023 2001 0
    20010002798  6085 2000 0
    20010003001  9001 2000 0
    20010003039 24031 2001 0
    20010003183 34025 2000 0
    20010003221  6001 2001 0
    20010003338 45003 2000 0
    20010003628 42101 2001 0
    20010003755  9001 2000 0
    20010003795 48113 2001 0
    20010003846 12099 2000 0
    20010004000  6085 2000 0
    20010004080  6111 2001 0
    20010004247 18083 2000 0
    20010004306  6085 2001 0
    20010004354  6085 2001 0
    20010004395 44007 2000 0
    20010004472 27163 2001 0
    20010004534  6085 2001 0
    20010004535 13179 2000 0
    20010004552  6085 2000 0
    20010004641 29189 2001 0
    20010004773 48251 2000 0
    20010004801 25025 2001 0
    20010004808 34027 2000 0
    20010004882 48439 2001 0
    20010004928 55139 2001 0
    20010005042 48453 2001 0
    20010005281 36061 2001 0
    20010005303 41067 2001 0
    20010005414 17031 2001 0
    20010005492  6059 2001 0
    20010005509 48453 2001 0
    20010005510 34031 2001 0
    20010005532 27163 2001 0
    20010005629  6085 2001 0
    20010005716 34023 2001 0
    20010005724 45077 2001 0
    20010005729 42091 2001 0
    end

  • #2
    Your code encapsulates the main idea in this territory. For one thread, see

    Code:
    SJ-23-2 dm0042_4  . . . . . . . . . . . . . . . . Software update for distinct
            (help distinct, distinctgen if installed)  N. J. Cox and G. M. Longton
            Q2/23   SJ 23(2):595--596
            most important change is addition of distinctgen command
    
    SJ-20-4 dm0042_3  . . . . . . . . . . . . . . . . Software update for distinct
            (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
            Q4/20   SJ 20(4):1028--1030
            sort() option has been added
    
    SJ-15-3 dm0042_2  . . . . . . . . . . . . . . . . Software update for distinct
            (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
            Q3/15   SJ 15(3):899
            improved table format and display of large numbers of
            observations
    
    SJ-12-2 dm0042_1  . . . . . . . . . . . . . . . . Software update for distinct
            (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
            Q2/12   SJ 12(2):352
            options added to restrict output to variables with a minimum
            or maximum of distinct values
    
    SJ-8-4  dm0042  . . . . . . . . . . . .  Speaking Stata: Distinct observations
            (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
            Q4/08   SJ 8(4):557--568
            shows how to answer questions about distinct observations
            from first principles; provides a convenience command

    Comment


    • #3
      Originally posted by Nick Cox View Post
      Your code encapsulates the main idea in this territory. For one thread, see

      Code:
      SJ-23-2 dm0042_4 . . . . . . . . . . . . . . . . Software update for distinct
      (help distinct, distinctgen if installed) N. J. Cox and G. M. Longton
      Q2/23 SJ 23(2):595--596
      most important change is addition of distinctgen command
      
      SJ-20-4 dm0042_3 . . . . . . . . . . . . . . . . Software update for distinct
      (help distinct if installed) . . . . . . N. J. Cox and G. M. Longton
      Q4/20 SJ 20(4):1028--1030
      sort() option has been added
      
      SJ-15-3 dm0042_2 . . . . . . . . . . . . . . . . Software update for distinct
      (help distinct if installed) . . . . . . N. J. Cox and G. M. Longton
      Q3/15 SJ 15(3):899
      improved table format and display of large numbers of
      observations
      
      SJ-12-2 dm0042_1 . . . . . . . . . . . . . . . . Software update for distinct
      (help distinct if installed) . . . . . . N. J. Cox and G. M. Longton
      Q2/12 SJ 12(2):352
      options added to restrict output to variables with a minimum
      or maximum of distinct values
      
      SJ-8-4 dm0042 . . . . . . . . . . . . Speaking Stata: Distinct observations
      (help distinct if installed) . . . . . . N. J. Cox and G. M. Longton
      Q4/08 SJ 8(4):557--568
      shows how to answer questions about distinct observations
      from first principles; provides a convenience command
      Mr. Cox, are you supposed to give me a link of certain thread but mistakenly posted something else ?

      Comment


      • #4
        I am just flagging that the command distinct is one contribution here; see especially the 2008 paper for a review of various techniques.

        Comment


        • #5
          Mr. Cox,

          It took me a day to understand the whole mechanism. Now, I realize that I don't need to include the fake id in my coding to get the summation of my inc variable = 1 or 0 at a unique county level for a speficic year to make my desired panel level data. Because, my fake_id has exactly the same amount of distinct number as my total observation of full data. On top of it for each fake_id observation my inc variable has either 1 or 0 value.

          Therefore, I have come up with the following code:

          Code:
          // Sort the data by county, year
          bysort county year : gen count_1 = sum(inc == 1)
          bysort county year : gen count_0 = sum(inc == 0)
          
          collapse (sum) count_1 count_0, by(county year)
          OR

          Code:
          egen inc_sum = total(inc == 1), by(county year)
          collapse (sum) inc_sum, by(county year)
          My personal understanding is both of them are right. But, in case if you have some other thoughts please do share

          Comment

          Working...
          X