
  • -float- and -compress- don't quite handle "bogus" doubles

    Pedantic point that sometimes matters: I've just noticed that replacing a variable stored as a double with its float() value does not yield something -compress-ible. This is not earth-shattering, but it seemed odd to me. -recast- is often necessary.

    When importing from an Excel file (and perhaps in other situations), numeric values import as doubles even when they don't really have or need that precision. In such situations, I'm compulsive enough to not want to store them as doubles. On discovering that -compress- would not compress them to floats, I figured I would just replace each bogus double variable with its float() representation and then compress. That didn't do anything: the floated values were still stored as doubles. Here's an illustration:
    Code:
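    clear
    set obs 1
    gen double x = 1.23
    replace x = float(x)   // rounds the value to float precision; storage type stays double
    compress               // does not demote x to float
    desc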
    Since -help float()- indicates that it returns "the value of x rounded to float precision," I would have thought it would produce a compressible value for x, but it doesn't. Is this just imprecise documentation?

    The time this would matter would be when one genuinely needs to compress a large file, with a sizeable number of bogus double variables. -recast float x1-x999, force- will work in that case, but I'd wager that most of us don't tend to recall that -recast- is out there.
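    For the many-variable case, here is a rough sketch that avoids typing the variable list by hand. It assumes that every double in the dataset is "bogus"; any variable that genuinely needs double precision would be rounded too, so the list stored in the local macro (named doubles here purely for illustration) should be checked before the -recast- line is run.

    ```stata
    * Collect the names of all variables currently stored as double
    ds, has(type double)
    local doubles `r(varlist)'

    * Recast them to float; -force- permits the rounding this implies
    if "`doubles'" != "" {
        recast float `doubles', force
    }
    ```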




  • #2
    Well, I hadn't remembered this either and was a bit surprised by what you've found, but the Stata 15.1 current documentation is actually quite clear:

    compress reduces the size of your dataset by considering two things. First, it considers demoting
    doubles to longs, ints, or bytes
    floats to ints or bytes
    longs to ints or bytes
    ints to bytes
    str#s to shorter str#s
    strLs to str#s
    In other words, shrinking doubles to floats isn't even a consideration in -compress-. It doesn't even try to see if that could be done with no loss of information. So it looks like you have to use -recast- for this purpose, either with the -force- option or after -replace x = float(x)-.
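    A minimal sketch of the two routes, reusing the one-observation example from #1. This is an illustration rather than anything taken from the documentation. The second route is commented out so that the block runs as a single do-file.

    ```stata
    clear
    set obs 1
    gen double x = 1.23

    * Route 1: round the values to float precision first,
    * after which -recast- should not need -force-
    replace x = float(x)
    recast float x

    * Route 2 (alternative): let -recast- do the rounding itself
    * recast float x, force

    describe x
    ```

    Either way, -describe- should then report x as a float. Route 1 makes the loss of precision an explicit step, which may be easier to audit later.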



    • #3
      Whoops, thanks for catching my presumption there.
