The Stata manual lists the range of each storage type in its entry on data types; for example, the range for byte is [-127; 100]. The interval is asymmetric because the largest values a byte can hold are reserved for the missing and extended missing values.
So the total number of values that a byte variable can represent is 127 (negative) + 1 (zero) + 100 (positive) + 1 (missing value) + 26 (extended missing values) = 255.
But one byte can hold 256 different values! On careful inspection, the value -128 (which a signed byte can in theory represent) turns out to be blacklisted. We can't store this value in a byte variable in Stata:
Code:
clear
set obs 1
generate byte x=-128
This results in a missing value in variable x.
The problem is that Stata doesn't perform the same validation when opening datasets and happily accepts the value -128. This leads to rather strange behavior later:
Code:
clear
use "http://www.radyakin.org/statalist/2017/1371324-stata-data-types.dta"
clonevar V1=V0
summarize
This produces the following output:
Code:
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          V0 |          1        -128           .       -128       -128
          V1 |          0
Identical statistics are expected for the clone and the original.
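The count of 255 can be checked mechanically by walking over all 256 signed 8-bit values. The sketch below is in Python rather than Stata, since it is about the raw byte encoding. It assumes that the missing values occupy the codes 101 (.) through 127 (.z), which is my reading of the .dta format description and not something stated in this post; the function name classify_stata_byte is made up for illustration.

```python
from collections import Counter


def classify_stata_byte(raw: int) -> str:
    """Classify a signed 8-bit value under the assumed Stata byte layout."""
    if not -128 <= raw <= 127:
        raise ValueError("not a signed 8-bit value")
    if -127 <= raw <= 100:
        return "valid"             # documented range of the byte type
    if raw == 101:
        return "missing"           # assumed code of the system missing value .
    if 102 <= raw <= 127:
        return "extended missing"  # assumed codes of .a through .z
    return "unassigned"            # only -128 reaches this branch


# Tally every bit pattern a signed byte can hold.
counts = Counter(classify_stata_byte(v) for v in range(-128, 128))
print(dict(counts))
print("total:", sum(counts.values()))
```

Under these assumptions the tally gives 228 valid values, 1 missing value and 26 extended missing values, which is 255 in total and leaves exactly one pattern, -128, without a meaning.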
- What is the purpose of blacklisting values like -128, -32768, etc. for their respective storage types? For example, C# fits -128 nicely into its signed byte type (sbyte).
- Would it be possible to add validation that rejects such values in input files, given that Stata subsequently processes them incorrectly (as the clonevar/summarize example above demonstrates)?
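To make the second question concrete, here is a rough Python sketch of the kind of check I have in mind. It is not how Stata reads files and it does not parse a .dta file; it only scans a raw column of integers for values below the documented minimum of the storage type. The name find_unassigned, the STORAGE table and the little-endian layout are assumptions made for illustration.

```python
import struct

# Documented minimum and struct format code for each integer storage type.
# Assumption: the one bit pattern below the minimum is the unassigned one.
STORAGE = {
    "byte": (-127, "b"),
    "int": (-32767, "h"),
    "long": (-2147483647, "i"),
}


def find_unassigned(raw: bytes, storage_type: str) -> list[int]:
    """Return observation indices holding a value below the documented minimum."""
    minimum, code = STORAGE[storage_type]
    fmt = "<" + code  # assumed little-endian, standard sizes (1, 2, 4 bytes)
    if len(raw) % struct.calcsize(fmt):
        raise ValueError("buffer length is not a multiple of the element size")
    values = [item[0] for item in struct.iter_unpack(fmt, raw)]
    return [i for i, v in enumerate(values) if v < minimum]


# A hypothetical byte column: 5, -128, 101 (a missing-value code), -127.
column = struct.pack("<4b", 5, -128, 101, -127)
print(find_unassigned(column, "byte"))
```

Only the second observation is reported here: 101 falls in the range reserved for missing values and -127 is the documented minimum, so neither is flagged.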