  • Strange error: "do not uniquely identify" when they clearly do

    Here's a weird one you guys might be able to help me out with.

    I have 4 variables: id, id_2, year, month.

    When I try
    Code:
     isid id id_2 year month
    I get an error saying that they do not uniquely identify the observations. However, they should, so I explored more and ran
    Code:
     bysort id id_2 year month: gen count=_n
    and then tabbed the count, finding that there were ~3000 observations that were repeats and 2.8 mil that were unique.

    However, when I look at the supposed repeats, I see this (attached picture). Look at the first two rows. Clearly they are unique. Yet Stata disagrees.

    Thoughts? I'm stumped.


  • #2
    This has come up in several posts lately. You probably need to recast the id_2 variable as a "double" or a "long". A plain "float" (the default) can hold integers exactly only up to 16,777,216, so longer IDs get rounded.

    Depending on how you got the variables into Stata, you may face a complicated situation: once values are stored as float, the extra digits are just plain gone. Hopefully it's an ID you generated (so you have control over how it's generated and can declare a variable type right then and there), or you're using StatTransfer, where you can specify storage types. If it's an insheet situation, it's kind of messy, since you would need to fall back on infile and fully specify things. If it was an existing Stata dataset, you may be in a bind.

    See http://www.statalist.org/forums/foru...vidual-records and check out Clyde's suggestion about post three; there have been other discussions that go into greater depth, just that one was a handy place to start (and searching doesn't work so well, at least not for me).

    Or, I may be completely wrong, and there is some other issue at work. It's just that a lot of people have been bitten by this (including myself; I fell back on string in the situation I faced), so it feels like the float precision problem at work.
    Last edited by ben earnhart; 27 Sep 2014, 16:00.
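
To illustrate the precision point above, here is a minimal sketch with made-up nine-digit IDs. The variable names id_double, id_float, and id_long are hypothetical and not from the original dataset, and the sketch assumes float storage is the culprit, which may not be the case in this thread. It is meant to be run as a do-file, since // comments do not work interactively.

```stata
* Hypothetical example: two distinct nine-digit IDs (not from the original data)
clear
set obs 2
generate double id_double = 123456789 + _n   // 123456790 and 123456791, stored exactly
generate float  id_float  = id_double        // float keeps only about 7 significant digits
generate long   id_long   = id_double        // long stores integers of this size exactly

format %12.0f id_double id_float id_long
list

isid id_double                   // succeeds: the two values are distinct
isid id_long                     // succeeds
capture noisily isid id_float    // expected to fail: both IDs round to the same float
```

If the IDs were already stored as float when the data were read in, recasting afterwards cannot bring the lost digits back; the variable has to be re-imported with a wider type or as a string.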


    • #3
      Oh, I just realized you had a recent post about "expand." If it was the expand command that put you over the edge, then you *are* in a position to control the way id_2 gets generated, and then you're set.


      • #4
        Precision is worth checking. But in the example, month clearly differs yet Stata is saying that the values are the same. Are you sure that what you typed above is what you actually typed into Stata?
        -------------------------------------------
        Richard Williams, Notre Dame Dept of Sociology
        StataNow Version: 19.5 MP (2 processor)

        EMAIL: [email protected]
        WWW: https://www3.nd.edu/~rwilliam


        • #5
          Yes, I tried it multiple ways to confirm that I wasn't mistyping it. I just retried again. Same problem.

          This is actually only tangentially related to the post I had about expanding -- these IDs are, unfortunately, inherited. However, I don't think the IDs are the error. They are all the exact same length, so if it were an issue with the type, it should affect all of the observations.


          • #6
            Originally posted by Cody Cook
            Yes, I tried it multiple ways to confirm that I wasn't mistyping it. I just retried again. Same problem.
            It can't be.

            As Richard intimates, what you claim to have typed, bysort id id_2 year month: gen count=_n, won't produce the values that you show for the count variable in the screenshot that you attached . . . unless, of course, you're listing only the second of two identical rows for each observation shown in the attachment, the first of each pair of duplicates having a value of 1 for count. (That is, the command that you didn't show was list if count == 2.)

            If your dataset is too large to attach, why don't you run through
            Code:
            log using Log.smcl
            capture noisily isid id id_2 year month
            sort id id_2 year month
            list if id == 10959, noobs sepby(month)
            log close
            and attach the log file?


            • #7
              I like Joseph's theory. We don't know what command produced the listing. I had some other ideas but Joseph's idea is definitely worth pursuing first.


              • #8
                The problem is that you typed
                Code:
                bysort id id_2 year month: gen count=_n
                li if count==2
                which will only show you the second instance of each duplicate pair. You want instead
                Code:
                bysort id id_2 year month: gen count=_N
                li if count==2
                Note that _n is the observation number within the by-group, so it varies by observation, whereas _N is the total number of observations in the by-group, so it is constant within each by-group.
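
A small made-up dataset makes the difference easy to see. This is only a sketch: the values and the variable names seq, total, and dup are hypothetical, not the poster's data. It also shows the built-in duplicates commands as one alternative way to find repeated rows. Run it as a do-file so the // comments are accepted.

```stata
* Hypothetical toy data: the first two rows are exact duplicates
clear
input id month
1 1
1 1
1 2
end

bysort id month: generate seq   = _n   // position within the by-group: 1, 2, 1
bysort id month: generate total = _N   // size of the by-group: 2, 2, 1

list if seq == 2      // shows only the second row of the duplicate pair
list if total > 1     // shows both rows of the duplicate pair

* The built-in duplicates commands give the same information
duplicates report id month
duplicates tag id month, generate(dup)   // dup = number of other copies of each row
list if dup > 0
```

Listing on seq == 2 alone hides the first member of each pair, which is why the listed rows can look unique even though each one has an identical twin.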


                • #9
                  Ahh I'm an idiot. Thanks guys. It was indeed _n instead of _N. I'm traveling now and can't confirm that, but I'm fairly sure. I'll post again if there are still any errors.
