
  • Basic addition stumper

    Hello,

    A couple of folks on our team have been stumped by this one. We're working in Stata 16.1. I stripped the issue down to a little bit of test code that I'm hoping somebody can pick apart. Basically, when `i'*10000 is commented out we get the output we'd expect (630, 1230, 631, 1231, ...), but when that piece is included the output skips the odd values (19610630, 19621230, 19610632, 19621232, 19610632, 19621232, ...). It feels like we're missing something really obvious, but I'm not sure what it could be. I tried floor(`i'*10000), but got the same result.

    We don't need a workaround. This is easy to do in a different way. But it is concerning not to be able to explain it.

    Thanks!

    Code:
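    clear
    set seed 1000
    set obs 63

    * one observation per year (1961-2023), each with a random month
    gen year = 1960 + _n
    gen month = runiformint(1,12)

    * no storage type is specified for cohort here
    gen cohort = .
    format cohort %9.0f

    * build year*10000 + 630..639 (months 1-6) or 1230..1239 (months 7-12)
    forvalues y = 0/9 {
        forvalues i = 1960/2023 {
            qui replace cohort = `i'*10000 + 63`y' if inrange(month,1,6) & year==`i'
            qui replace cohort = `i'*10000 + 123`y' if inrange(month,7,12) & year==`i'
        }
        * show the first two observations, then reset for the next pass
        di cohort[1]
        di cohort[2]
        qui replace cohort = .
    }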

  • #2
    Your variable needs to be declared as a double data type instead of the default, float. See -help data types-. This issue has come up here several times before.
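
    As a minimal sketch of that suggestion (the variable names cohort_f and cohort_d are made up for illustration, and the value is one of the odd values from the original post), the same number can be stored both ways and compared:

    ```stata
    clear
    set obs 1

    * float storage, requested explicitly so the result does not depend on -set type-
    gen float cohort_f = 19610631

    * double storage, as suggested above
    gen double cohort_d = 19610631

    format cohort_f cohort_d %9.0f

    * cohort_f comes back as 19610632, cohort_d as 19610631
    list cohort_f cohort_d
    ```

    In the original test code this would amount to replacing gen cohort = . with gen double cohort = . and leaving the rest unchanged.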


    • #3
      For integers, I would go with long here.
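
      A short sketch of the long alternative, assuming the values stay whole numbers within the range that -help data types- documents for long (up to 2,147,483,620):

      ```stata
      clear
      set obs 1

      * long stores integers exactly in 4 bytes
      gen long cohort = 1961*10000 + 631
      format cohort %9.0f

      * the odd value survives intact
      di %12.0f cohort[1]

      * largest value a long can hold, as reported by Stata itself
      di %12.0f c(maxlong)
      ```

      The trade-off is that a long cannot hold a fractional part, so it only suits variables that are known to be integers.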


      • #4
        The problem that Leonardo and Daniel are referring to is called precision. It has to do with how computers store numbers: they use a string of 0s and 1s to represent each number. By default Stata reserves 32 bits (or 4 bytes (or 8 nybbles)) for a single number, stored as a float. If the number doesn't fit within that reserved space, it is rounded. That is what happened in your case: with 4 bytes a float can hold about 7 significant digits, and the number you want to store has 8. More precisely, a float represents every integer exactly only up to 16,777,216 (2^24); from there up to 33,554,432 (2^25) it can represent only the even integers, which is why your odd values were rounded to an even neighbor.

        The number of significant digits is in part reduced by the fact that Stata wants to allow for a fractional part. If you know that the values in your variable have no fractional part (i.e. they are integers), you can store more with those 4 bytes. This is Daniel's suggestion to use long. With a long you can store any 9-digit integer, instead of about 7 significant digits, in those same 4 bytes. Alternatively, you can reserve more bytes for each number and increase the precision that way. This is Leonardo's suggestion to use double, which reserves 8 bytes instead of 4 for each number and gives you about 16 digits of accuracy.
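
        The rounding can be reproduced without any dataset. This is only an illustrative sketch using Stata's float() function, which rounds its argument to float precision; the three values are the ones from the original post:

        ```stata
        * every integer up to 2^24 fits exactly in a float
        di %12.0f 2^24

        * above 2^24 only even integers are representable
        di %12.0f float(19610630)
        di %12.0f float(19610631)
        di %12.0f float(19610632)

        * the odd value no longer equals itself once it has been through a float
        di float(19610631) == 19610631
        ```

        The second and third float() lines both display 19610632, and the final comparison displays 0, which matches the skipped odd values in the original output.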

        You can read more about this fascinating/nerdish topic in help precision
        ---------------------------------
        Maarten L. Buis
        University of Konstanz
        Department of history and sociology
        box 40
        78457 Konstanz
        Germany
        http://www.maartenbuis.nl
        ---------------------------------


        • #5
          Thanks for the replies - especially to Maarten for the patient reply! We're doing our best over here. I wasn't sure how to search for this issue: since I didn't know what the cause was, I didn't really know what terms to search for. Sorry for spamming the board with a problem that has come up in the past.

          Our team gets focused on the implications of output pretty quickly in our daily work, aren't usually running very complex code, and aren't often thinking about why our software would seemingly change numbers when we didn't direct it to. Also, we're rarely working with big data whose size can become a resource constraint. Given that, it looks like recommending that my team members run set type double, permanently might be a good idea and, if needed, just run compress at the end of processes. I didn't see any bright red flags in the precision help file with doing something like this. Does anybody have any important warnings about making double permanent?
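
          For anyone following along, here is a rough sketch of that workflow. It changes the default for the current session only; adding the permanently option to set type is what would carry it across sessions. The variable and its values are invented for illustration:

          ```stata
          * make double the default storage type for this session
          set type double
          display c(type)

          clear
          set obs 3

          * no type given, so cohort is created as a double
          gen cohort = 19610630 + _n
          format cohort %9.0f
          list cohort
          describe cohort

          * compress demotes variables to smaller types only when no information is lost
          compress
          describe cohort
          ```

          Because these values are all integers, compress should be able to store cohort as a long here, so the extra memory used by double is temporary.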


          • #6
            Many users have come to the same conclusion as you about set type double. Suffice it to say that Stata leaves it to users to decide, by having a default of float that you can override.

            That's what I go with myself - default float. But even as someone long familiar with this issue (I've written about it many times), I get bitten occasionally and then blame myself when I realise, usually fairly quickly, what I forgot. On the other hand, in a team you have to think about the least experienced user too, and about the risk of bizarre or even incorrect results wasting time and effort.


            • #7
              I take the opposite approach to Nick: I have set my default type to -double-. The reason is that my datasets are rarely larger than available memory, so in that respect memory is cheap. I also run -compress- before saving any analysis datasets, which can give back some memory savings. When I do start bumping up against memory limits, the problems at that point are usually not ones that are solved by the memory saved by using floats instead of doubles.


              • #8
                It's the same point: make your own decision according to your needs.


                • #9
                  Originally posted by Nick Cox
                  It's the same point: make your own decision according to your needs.
                  I completely agree, and was offering only a different approach.
