-float- and -compress- don't quite handle "bogus" doubles

Mike Lacy

Join Date: Apr 2014

Posts: 2425
#1

14 Jan 2019, 15:38

Pedantic point that sometimes matters: I've just noticed that replacing a variable stored as a double with its float() value does not make it -compress-ible. This is not earth-shattering, but it seemed odd to me. -recast- turns out to be necessary.

When importing from an Excel file (and perhaps in other situations), numeric values import as doubles even when they don't really have or need that precision. In such situations, I'm compulsive enough to not want to store them as doubles. On discovering that -compress- would not compress them to floats, I figured I would just replace each bogus double variable with its float() representation and then compress. That didn't do anything: the floated values were still stored as doubles. Here's an illustration:

Code:

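clear
set obs 1
gen double x = 1.23
replace x = float(x)   // round the value to float precision
compress               // x is still stored as a double
desc
recast float x, force  // only -recast- changes the storage type
desc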

Since -help float()- indicates that it returns "the value of x rounded to float precision," I should think that it would produce a compressible value for x, but it doesn't. Is this just imprecise documentation?

This would matter when one genuinely needs to compress a large file with a sizeable number of bogus double variables. -recast float x1-x999, force- will work in that case, but I'd wager that most of us don't tend to recall that -recast- is out there.
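For that bulk case, here is a minimal sketch that avoids typing the variable list by hand. It assumes it is run from a do-file, and it assumes that every variable still stored as a double after -compress- really is a bogus double; the local name dbl is only illustrative. Running -compress- first matters because floats hold integers exactly only up to 16,777,216, so integer-valued doubles such as long IDs are better demoted to long/int/byte than forced to float.

```stata
* Sketch only: force every remaining double down to float.
* Assumption: none of these variables genuinely needs double precision.
compress                        // integer-valued doubles become byte/int/long
ds, has(type double)            // list the variables still stored as double
local dbl `r(varlist)'          // dbl is an illustrative local name
if "`dbl'" != "" {
    recast float `dbl', force   // values are rounded to float precision
}
describe
```

If some doubles are genuine (dates with times, large identifiers with decimals, and so on), it would be safer to pass an explicit varlist to -ds- rather than sweep the whole dataset.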
Clyde Schechter

Join Date: Apr 2014

Posts: 30179
#2

14 Jan 2019, 16:28

Well, I hadn't remembered this either and was a bit surprised by what you've found, but the current Stata 15.1 documentation is actually quite clear:

compress reduces the size of your dataset by considering two things. First, it considers demoting
doubles to longs, ints, or bytes
floats to ints or bytes
longs to ints or bytes
ints to bytes
str#s to shorter str#s
strLs to str#s

In other words, shrinking doubles to floats isn't even a consideration in -compress-. It doesn't even try to see if that could be done with no loss of information. So it looks like you have to use -recast- for this purpose, either with the -force- option or after -replace x = float(x)-.
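The two routes can be sketched on a one-observation example. This assumes a do-file; the second variable y is added here only so the two results can be compared.

```stata
clear
set obs 1
gen double x = 1.23
gen double y = 1.23

* Route 1: let -recast- do the rounding itself
recast float x, force

* Route 2: round the values first, then -recast- without -force-
replace y = float(y)
recast float y

describe x y        // both should now be stored as float
assert x == y       // both routes should leave the same stored value
```

Route 2 has the small advantage that -recast- without -force- refuses to change a variable if doing so would lose precision, so it doubles as a check that the rounding step really was applied.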
Mike Lacy

Join Date: Apr 2014

Posts: 2425
#3

14 Jan 2019, 22:59

Whoops, thanks for catching my presumption there.