
  • What does the miss option mean in gunique?

    Dear Stata users,
    I found the gunique command, and with the same variable list I get different totals for observations and unique groups depending on whether or not I add the miss option.

    Code:
    gunique id name_first name_last
    N = 19,632,203; 4,678,062 unbalanced groups of sizes 1 to 6,026
    Code:
    gunique id name_first name_last, miss
    N = 19,632,557; 4,678,416 unbalanced groups of sizes 1 to 6,026
    Does anyone know why N and the number of unbalanced groups differ between these two cases? And what does "sizes 1 to 6,026" mean?

    Thanks a lot!

  • #2
    When you specify the -miss- option, -gunique- includes missing values as distinct values; without that option -gunique- skips over observations with missing values. So that's why with -miss- you get a bigger total N and a larger number of groups. The sizes refer to how many observations are associated with a given distinct value. So, in your case, there is some combination of id, name_first, and name_last that appears only once in the data set. And there is some combination that appears 6,026 times. All other combinations appear some number of times between those limits.
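
    If it helps to confirm this on your own data, here is a rough sketch using only built-in Stata commands. It assumes that -gunique- treats empty strings and numeric missing values the same way -missing()- does, which I have not verified; the new variable names (tag_nomiss, tag_miss, group_size) are made up for illustration. If the explanation above holds, the first count should equal the gap between your two N values (19,632,557 - 19,632,203 = 354), but that is something to check rather than a result I can promise.

    Code:
    * Observations with a missing value in at least one of the three variables
    count if missing(id, name_first, name_last)

    * Tag one observation per distinct combination, ignoring missings (egen default)
    egen byte tag_nomiss = tag(id name_first name_last)
    count if tag_nomiss

    * Same, but treating missing values as values in their own right
    egen byte tag_miss = tag(id name_first name_last), missing
    count if tag_miss

    * Size of each group; the smallest and largest correspond to the "sizes 1 to ..." range
    bysort id name_first name_last: generate long group_size = _N
    summarize group_size

    The two tagged counts are meant to be compared with the number of groups reported without and with -miss-. Note that -bysort- keeps observations with missing values as groups of their own, so the last step is comparable to the -miss- run. On 19 million observations the -egen- steps may take a while.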



    • #3
      Originally posted by Clyde Schechter
      When you specify the -miss- option, -gunique- includes missing values as distinct values; without that option -gunique- skips over observations with missing values. So that's why with -miss- you get a bigger total N and a larger number of groups. The sizes refer to how many observations are associated with a given distinct value. So, in your case, there is some combination of id, name_first, and name_last that appears only once in the data set. And there is some combination that appears 6,026 times. All other combinations appear some number of times between those limits.
      Hi Clyde, thank you so much!
