  • egen min producing wrong value

    Hello, StataListers. I am using Stata SE 15.1 for Windows and have run into a strange -egen- problem. I would like to use -egen- to create a variable equal to the minimum randomid value within each value of a variable I have called selectiongroup. I am doing this to select one case in each selectiongroup to focus on for additional research activities.

    The variable selectiongroup is a float variable with 26 unique values: 111, 112, 113, 115, 121, 123...[some more three-digit values]...345. As you can see, they are not sequential, but I don't think that should matter. The variable randomid is a long, ranging from 311428 to 99509254, and is also not sequential.

    I wrote the following simple script:

    Code:
    bysort selectiongroup: egen minID=min(randomid)
    format min %15.0f
    gen selected=1 if randomid==minID
    However, for a subset of the selectiongroup values, no case was selected:

    Code:
    selectiong |       selected
          roup |         1          . |     Total
    -----------+----------------------+----------
           111 |         1          0 |         1 
           112 |         0          1 |         1 
           113 |         1          2 |         3 
           115 |         1          5 |         6 
           121 |         1          3 |         4 
           123 |         1          2 |         3 
           124 |         0          1 |         1 
           125 |         0          2 |         2 
           131 |         1         14 |        15 
           132 |         0          1 |         1 
           133 |         0          4 |         4 
           134 |         0          2 |         2 
           135 |         1          3 |         4 
           141 |         1          8 |         9 
           142 |         0          3 |         3 
           143 |         0          1 |         1 
           144 |         0          1 |         1 
           145 |         1          1 |         2 
           323 |         1          4 |         5 
           324 |         1          2 |         3 
           331 |         1          0 |         1 
           333 |         1          3 |         4 
           334 |         1          7 |         8 
           335 |         1          2 |         3 
           344 |         1          4 |         5 
           345 |         1          1 |         2 
    -----------+----------------------+----------
         Total |        17         77 |        94
    When I browse the data, I see that some of the minID values differ by 1 or 2 from the smallest randomid within that selectiongroup. For example, selectiongroup 112 has only one randomid, 69729831, but its minID is 69729832. For selectiongroup 125, where the smallest randomid is 59146117, minID equals 59146116. For the other groups, the code seems to have worked fine. Any ideas about why this is producing imprecise results? There are not many groups, so I could sort this out manually, but it seems like a perplexing issue.


  • #2
    The word "imprecise" was well chosen. This is a precision problem. Your variable randomid is a long, and its values have too many digits to be stored exactly in a float: a float can hold every integer exactly only up to 16,777,216 (2^24), while your IDs go as high as 99509254. Your -egen- command does not specify a storage type for minID, so Stata uses the default type, which is float unless you have changed it with -set type-. The low-order (binary) digits are therefore rounded off, and that produces the discrepancies you are seeing. The fix is:

    Code:
    by selectiongroup, sort: egen long minID = min(randomid)
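    The rounding described above can be seen directly with Stata's float() function, which returns its argument rounded to float precision. A minimal sketch using the two IDs quoted in the question (nothing here touches your data):

```stata
* float() rounds a number to float precision, which mimics what
* happens when -egen- stores minID as a float.
* The next line displays 69729832.
display %12.0f float(69729831)
* The next line displays 59146116.
display %12.0f float(59146117)

* Integers are exact in a float only up to 2^24 = 16777216.
* The next line displays 1 (exact).
display (float(16777216) == 16777216)
* The next line displays 0 (16777217 cannot be stored as a float).
display (float(16777217) == 16777217)
```

    Between 2^25 and 2^26 adjacent floats are 4 apart, and between 2^26 and 2^27 they are 8 apart, so 59146117 lands on the nearest multiple of 4 (59146116) and 69729831 on the nearest multiple of 8 (69729832).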
    Added: see -help precision- and -help data types- for more information.
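    To confirm the diagnosis, here is a minimal self-contained reproduction. The data are made up for illustration, except for the two IDs quoted in the question (69729831 and 59146117), and the variable names minID_float, minID_long, n_match, and selected2 are hypothetical. It builds minID both ways, checks which version matches a case in each group, and also shows a sort-based way to flag the smallest randomid without storing a copy of it. It starts with -clear-, so run it in a fresh session rather than on your working data.

```stata
* Made-up example data (only the two IDs from the question are real).
clear
input selectiongroup long randomid
112 69729831
125 59146117
125 60000001
end

* minID stored as float (what happened in the original post) ...
bysort selectiongroup: egen float minID_float = min(randomid)
* ... and stored as long (the fix).
bysort selectiongroup: egen long minID_long = min(randomid)
format minID_float minID_long %12.0f
list, sepby(selectiongroup)

* The float copy no longer equals any randomid in either group.
count if randomid == minID_float
assert r(N) == 0

* The long copy matches exactly one case per group.
bysort selectiongroup: egen n_match = total(randomid == minID_long)
assert n_match == 1

* Alternative: sort within each group by randomid and flag the first case.
* No copy of randomid is stored, so its storage type cannot interfere.
bysort selectiongroup (randomid): generate byte selected2 = (_n == 1)
assert selected2 == (randomid == minID_long)
```

    In the sort-based version, selected2 is coded 0/1 rather than 1/missing, and if two cases in a group ever shared the smallest randomid, only one of them (chosen arbitrarily) would be flagged.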



    • #3
      Thank you so much!

