Basic addition stumper

Michael Weinerman

Join Date: Nov 2017

Posts: 22
#1

06 Nov 2023, 15:17

Hello,

A couple of folks on our team have been stumped by this one. We're working in Stata 16.1. I stripped the issue down to a little bit of test code that I'm hoping somebody can pick apart. Basically, when `i'*10000 is commented out we get the output we'd expect (630, 1230, 631, 1231, ...), but when that piece is included it skips the odd values (19610630, 19621230, 19610632, 19621232, 19610632, 19621232, ...). It feels like we're missing something really obvious, but I'm not sure what it could be. I tried floor(`i'*10000), but got the same result.

We don't need a workaround. This is easy to do in a different way. But this is a concerning issue to not be able to resolve.

Thanks!

Code:

clear
set seed 1000
set obs 63
gen year = 1960 + _n
gen month = runiformint(1,12)
gen cohort = .
format cohort %9.0f
forval y = 0/9 {
    forvalues i = 1960/2023 {
        qui replace cohort = `i'*10000 + 63`y' if inrange(month,1,6) & year==`i'
        qui replace cohort = `i'*10000 + 123`y' if inrange(month,7,12) & year==`i'
    }
    di cohort[1]
    di cohort[2]
    qui replace cohort = .
}
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2403
#2

06 Nov 2023, 15:49

Your variable needs to be declared as a double data type instead of the default, float. See -help data types-. This issue has come up here several times before.
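As a rough illustration of this advice, the sketch below builds the same kind of value twice, once as a float and once as a double. The variable names cohort_f and cohort_d are made up for the example, and it is meant to be run from a do-file. The float column is expected to show an even neighbour of the intended value (post #1 reports 19610632), while the double column holds the value exactly.

```stata
* Minimal sketch: same expression stored as float and as double.
* cohort_f and cohort_d are hypothetical names used only for this example.
clear
set obs 2
gen year = 1960 + _n

gen float  cohort_f = year*10000 + 631   // explicit float: 8-digit odd values get rounded
gen double cohort_d = year*10000 + 631   // explicit double: stored exactly

format cohort_f cohort_d %9.0f
list year cohort_f cohort_d

* The double copy matches the intended value in every observation.
assert cohort_d == year*10000 + 631
```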
daniel klein

Join Date: Mar 2014

Posts: 3872
#3

06 Nov 2023, 15:59

For integers, I would go with long here.
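A minimal sketch of the long alternative, assuming the values are whole numbers that stay within the range of a Stata long (up to 2,147,483,620), which holds comfortably for an 8-digit cohort code:

```stata
* Minimal sketch: store the integer code as a long (4 bytes, integers only).
clear
set obs 2
gen year = 1960 + _n

gen long cohort = year*10000 + 631
list year cohort

* Every observation holds exactly the intended integer.
assert cohort == year*10000 + 631
describe cohort
```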
Maarten Buis

Join Date: Mar 2014

Posts: 3464
#4

07 Nov 2023, 02:37

The problem that Leonardo and Daniel are referring to is called precision. It has to do with how computers store numbers: they use a string of 0s and 1s to represent a number. By default Stata reserves 32 bits (or 4 bytes (or 8 nybbles)) for a single number. If the number doesn't fit within that reserved space, it will be rounded. That is what happened in your case: with 4 bytes Stata can store about 7 significant digits, and the number you want to store has 8.
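One way to see the rounding directly is Stata's float() function, which rounds a value to float precision. The sketch below assumes the usual 24-bit float significand: every integer up to 2^24 is stored exactly, and between 2^24 and 2^25 only even integers are, which would explain why the odd values in post #1 disappear.

```stata
* Minimal sketch: probing float precision with float().
display 2^24                               // 16777216, last point where every integer fits

* An even 8-digit value in this range survives the trip through float...
display float(19610630) == 19610630        // 1 (true)

* ...but an odd one does not, so it is rounded to a neighbouring even value.
display float(19610631) == 19610631        // 0 (false)
display %9.0f float(19610631)
```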

The number of significant digits is in part reduced by the fact that Stata wants to allow for a fractional part. If you know that the values in your variable have no fractional part (i.e. they are integers) you can store more with those 4 bytes. This is Daniel's suggestion for using long. Using longs you can store 9 significant digits instead of 7 with those same 4 bytes. Alternatively, you can just reserve more bytes for a single number and increase the precision that way. This is Leonardo's suggestion using double. This reserves 8 bytes instead of 4 for each number, which gives you about 16 digits of accuracy.

You can read more about this fascinating/nerdish topic in help precision

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Michael Weinerman

Join Date: Nov 2017

Posts: 22
#5

07 Nov 2023, 09:19

Thanks for the replies - especially to Maarten for the patient reply! We're doing our best over here. I wasn't sure how to search for this issue, since I didn't know what the underlying problem was called. Sorry for spamming the board with a problem that has come up in the past.

Our team gets focused on the implications of output pretty quickly in our daily work, aren't usually running very complex code, and aren't often thinking about why our software would seemingly change numbers when we didn't direct it to. Also, we're rarely working with big data whose size can become a resource constraint. Given that, it looks like recommending that my team members run set type double, permanently might be a good idea and, if needed, just run compress at the end of processes. I didn't see any bright red flags in the precision help file with doing something like this. Does anybody have any important warnings about making double permanent?
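A small sketch of what changing the default does, for the current session only. It relies on set type and the c(type) system value; the local macro oldtype is just a name chosen here to put the previous setting back at the end. Adding the permanently option to set type would keep the change across sessions.

```stata
* Minimal sketch: make double the default type for new variables (this session).
local oldtype "`c(type)'"          // remember the current default (hypothetical macro name)
display "default was: `oldtype'"

set type double
clear
set obs 2
gen year   = 1960 + _n
gen cohort = year*10000 + 631      // no explicit type, so it follows -set type-
format cohort %9.0f
list year cohort

* With a double default the value is stored exactly.
assert cohort == year*10000 + 631

set type `oldtype'                 // put the previous default back
```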
Nick Cox

Join Date: Mar 2014

Posts: 35754
#6

07 Nov 2023, 09:27

Many users have come to the same conclusion as you about set type double. Suffice it to say that Stata leaves it to users to decide by having a default of float that you can override.

That's what I go with myself - default float. But even as someone long familiar with this issue (I've written about it many times) I get bitten occasionally and then blame myself when I realise usually fairly quickly what I forgot. Other way round, in a team you have to think about the least experienced user too and the risk of bizarre or even incorrect results wasting time and effort.
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2403
#7

07 Nov 2023, 10:10

I take the opposite approach to Nick: I have set my default type to -double-. The reason is that my datasets are rarely larger than available memory, and so in that respect, memory is cheap. I also make use of -compress- prior to saving any analysis datasets, and that can give back some memory savings. When I start bumping up against memory limits, the problems at that point are usually not ones that are solved by saving some memory between doubles and floats.
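The workflow described here might look roughly like the sketch below. The tempfile named analysis is a stand-in for a real file name. compress only demotes a variable to a smaller storage type when no value would change, so the assert after it should still pass.

```stata
* Minimal sketch: work in double, then compress before saving.
clear
set obs 3
gen double year   = 1960 + _n
gen double cohort = year*10000 + 631
describe                           // both variables are double (8 bytes each)

compress                           // whole-number doubles can drop to int or long
describe

* Values are unchanged by compress.
assert cohort == year*10000 + 631

tempfile analysis                  // hypothetical stand-in for a real file name
save `analysis'
```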
Nick Cox

Join Date: Mar 2014

Posts: 35754
#8

07 Nov 2023, 10:16

It's the same point: make your own decision according to your needs.
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2403
#9

07 Nov 2023, 11:49

Originally posted by Nick Cox:

It's the same point: make your own decision according to your needs.

I completely agree, and was offering only a different approach.