  • Stata Automatically rounds numbers... Turning off this feature?

    Hi everyone,

    I have the following empirical dilemma. To match observations between two separate datasets, I manually created a common identifier. However, when I type in an id, Stata seems to round it. For example, using Compustat data, I do:

    replace firmid=19999997 if conm=="ACCRETIVE HEALTH INC"

    However, Stata rounds it to 19999996.

    How can I avoid this, since it's producing duplicate estimates when I try to match the two datasets together? I did some simple searches online, but didn't find anything -- the only things showing up on Google were how to round numbers, not how to avoid rounding!

    (I should also note that I've been playing around with the "format" feature, e.g., format firmid %12.0f, but it's not working 100% as hoped.)

    Thank you!
    Last edited by Christos Makridis; 07 Feb 2016, 16:28.

  • #2
    The "automatic rounding" arises because Stata is obeying your command. By default you are using a float storage type, which isn't adequate for your purpose. There aren't enough bits in a float to hold every 8-digit integer exactly.
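
    A minimal sketch of the rounding itself, assuming only Stata's built-in float() function, which rounds its argument to float precision. It is meant as an illustration of why the id changes, not as a fix:

    ```stata
    * float() rounds its argument to float precision, mimicking
    * what happens when the value is stored in a float variable
    * (the id from the question comes back as 19999996)
    display %12.0f float(19999997)

    * a float holds every integer exactly only up to 2^24 = 16,777,216
    display %12.0f float(16777216)
    * 16777217 has no exact float representation and comes back as 16777216
    display %12.0f float(16777217)
    ```

    Above 2^24 the representable floats are spaced 2 apart, so odd ids in this range cannot be stored exactly.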

    The key word here is precision, and there are probably hundreds of posts on Statalist on the subject. Run search precision in Stata for resources. (As always, one does need to know good keywords.)

    Your question is complicated by the fact that you are trying to replace an existing variable. Note that the existing values may be compromised already. Otherwise I would try:

    Code:
     
    recast long firmid 
    replace firmid=19999997 if conm=="ACCRETIVE HEALTH INC"



    • #3
      This is a precision issue. Do read -help precision- and -help data types- for more information. Briefly, the default storage type for a variable created with -gen- is -float-, and that can only hold about 7 digits of precision. For numbers with 8 or more digits, there are not enough bits available to represent them all exactly, so Stata takes the nearest value that fits in -float-.

      You can get around this by using the -double- storage type or the -long- storage type instead. With -long- you can get 9 digits, and with -double- you can get 16. If it is feasible to go back and re-create the data set you are working with, at the point where firmid is created, make it a long or double. If you cannot re-create the data set, you can -recast double firmid- before you get to your -replace firmid...- command. (But, in all honesty, I would be worried that -firmid- already contains incorrect values even before you get to your -replace firmid...- command as a result of this issue.)
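
      A small sketch of the difference between the storage types, using a toy one-observation dataset. The variable names id_float, id_long and id_double are made up for illustration and are not part of the original data:

      ```stata
      * toy dataset: the same id stored under three storage types
      * (variable names are hypothetical)
      clear
      set obs 1
      generate float  id_float  = 19999997
      generate long   id_long   = 19999997
      generate double id_double = 19999997

      * show all digits rather than the default display format
      format id_float id_long id_double %12.0f
      list

      * only the float copy has drifted from the intended value
      assert id_float  != 19999997
      assert id_long   == 19999997
      assert id_double == 19999997
      ```

      Note that -format- only changes how the values are displayed. It does not change what is stored, so it cannot repair a value that was already rounded when it went into a -float-.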

      Note: Crossed in cyberspace with Nick's post, which says essentially the same thing.



      • #4
        Ah, precision does raise many more relevant searches and not surprisingly your suggestions work out exactly as they should. Re-running the data using "double" works fine, but "recast" is a good trick to know for future reference too. Thanks a lot!
