I wonder if anyone can help me interpret the results below, which come from running the compress command.
My confusion arises from the fact that Stata has allocated a total of 253,813,893 bytes (see the grand total line) for all purposes, yet after the compress command it reports saving more memory than this grand total (457,062,876 bytes saved).
IMHO the savings should be capped at the grand total allocated, or more specifically at the grand total used (173,867,693 bytes).
Furthermore, the reported figure doesn't agree with the actual savings (if the data are saved to a file and the file sizes before and after are compared).
Code:
. memory
Memory usage
                                          used              allocated
---------------------------------------------------------------------
data                               154,974,000            234,881,024
strLs                                        0                      0
---------------------------------------------------------------------
data & strLs                       154,974,000            234,881,024
---------------------------------------------------------------------
data & strLs                       154,974,000            234,881,024
var. names, %fmts, ...                   1,188                 34,832
overhead                            13,647,920             13,648,076
Stata matrices                               0                      0
ado-files                               13,484                 13,484
stored results                               0                      0
Mata matrices                                0                      0
Mata functions                               0                      0
set maxvar usage                     5,231,728              5,231,728
other                                    4,749                  4,749
---------------------------------------------------------------------
grand total                        173,867,693            253,813,893
. save "C:\temp\industry_before.dta", replace
file C:\temp\industry_before.dta saved
.
. recast strL industrystr
. compress
industrystr is strL now coalesced
(457,062,876 bytes saved)
. memory
Memory usage
                                          used              allocated
---------------------------------------------------------------------
data                                53,904,000            301,989,888
strLs                               80,352,806             80,352,806
---------------------------------------------------------------------
data & strLs                       134,256,806            382,342,694
---------------------------------------------------------------------
data & strLs                       134,256,806            382,342,694
var. names, %fmts, ...                   1,188                 34,832
overhead                            13,647,936             13,648,076
Stata matrices                               0                      0
ado-files                               13,484                 13,484
stored results                               0                      0
Mata matrices                                0                      0
Mata functions                               0                      0
set maxvar usage                     5,231,728              5,231,728
other                                    4,744                  4,744
---------------------------------------------------------------------
grand total                        153,150,510            401,275,558
. save "C:\temp\industry_after.dta", replace
file C:\temp\industry_after.dta saved
.
end of do-file
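
For what it's worth, the net change implied by the two memory reports can be checked by hand. The following is only a sketch that uses display as a calculator on the figures already shown above; it introduces no new data and makes no claim about how compress arrives at its own number.

```stata
* Net change in "grand total, used" (before minus after)
display %15.0fc 173867693 - 153150510

* Net change in "data & strLs, used" (before minus after)
display %15.0fc 154974000 - 134256806

* Change in "grand total, allocated" (after minus before)
display %15.0fc 401275558 - 253813893
```

By these figures, used memory fell by 20,717,183 bytes and allocated memory actually rose by 147,461,665 bytes, so neither comes close to the 457,062,876 bytes that compress reports as saved.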
