Stata specifies a display format for every variable in the dataset. This format affects how the values are displayed on the screen and, in some cases, in output files. If the variable is numeric and has value labels for some or all of its values, the display of these labels is also affected by the variable's format. A user can adjust the default formats to fit a particular need with the format command.
The manual for the format command contains the following sentence: "For example, %9.2f specifies the f format that is nine characters wide and has two digits following the decimal point." Before version 14 of Stata, "nine characters wide" literally meant 9 bytes wide. In the Unicode context of Stata 14, this is no longer the same thing.
Compare, for example, the following presentation of the same dataset in the output window and in the browser:
Here all variables are formatted with %20.0g, which according to the manual should provide room for 20 characters. However, only the content in the data browser window appears to be formatted consistently with the manual. The output window does not follow the same rule and instead sizes the values by their byte width (Cyrillic letters occupy 2 bytes each in the UTF-8 encoding).
If the format width is doubled, the text fits nicely in the output window, but the columns in the browser window become unnecessarily wide (and the situation would be worse for languages that use 3- and 4-byte Unicode characters):
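To make the discrepancy concrete, here is a small Python sketch (not Stata code; the sample labels are made up for illustration). It counts the width of a few labels both in Unicode code points and in UTF-8 bytes, and checks each against the width of 20 taken from the %20.0g format. The mapping of "characters" to the browser and "bytes" to the output window is an assumption based only on the observation above.

```python
# Sketch: compare the width of a label counted in Unicode characters
# with its width counted in UTF-8 bytes, against a %20-style width.
# Assumption (from the observation in the post): the browser counts
# characters, while the output window counts bytes.

WIDTH = 20  # display width taken from the %20.0g format in the example


def char_width(s: str) -> int:
    """Width in Unicode code points (browser behaviour, as observed)."""
    return len(s)


def byte_width(s: str) -> int:
    """Width in UTF-8 bytes (output window behaviour, as observed)."""
    return len(s.encode("utf-8"))


# Hypothetical sample labels
labels = [
    "Moscow",           # ASCII: 1 byte per character
    "Санкт-Петербург",  # Cyrillic: 2 bytes per letter, 1 for the hyphen
    "北京",              # CJK: 3 bytes per character
]

for s in labels:
    c, b = char_width(s), byte_width(s)
    print(f"{s!r}: {c} chars, {b} bytes, "
          f"fits by chars: {c <= WIDTH}, fits by bytes: {b <= WIDTH}")
```

The Cyrillic label is 15 characters but 29 bytes, so it fits a width of 20 when counted in characters and overflows it when counted in bytes. Code points are also not the same as screen columns: East Asian characters such as 北京 usually take two columns each in a monospaced font, which this sketch ignores.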
- What are the recommendations for an external program that saves a dataset in Stata's .dta format? Should it set the format width from the byte width or from the character width of the widest value?
- In practice, which do users more commonly prefer: browse or list?
- Is there a "fit" format, one that expands the column just enough to hold the widest label or value for the purpose of the list and browse commands? (I suppose not, based on the description of the existing formats.)
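For an external writer, the choice between the two policies can at least be made explicit. The following Python sketch uses a hypothetical helper, `fit_format`, which is not part of Stata or of any .dta library. It computes a %w.0g format just wide enough for the widest value label, counting the width either in characters or in UTF-8 bytes. The `minimum` floor of 9 is an arbitrary illustrative choice, not a Stata rule.

```python
# Sketch (hypothetical helper, not part of Stata or any .dta library):
# choose a %w.0g display width for a value-labelled numeric variable so
# that its widest label fits, counting width in characters or in bytes.

from typing import Dict


def label_width(text: str, unit: str) -> int:
    """Width of a label in Unicode code points ("chars") or UTF-8 bytes."""
    if unit == "chars":
        return len(text)
    if unit == "bytes":
        return len(text.encode("utf-8"))
    raise ValueError(f"unknown unit: {unit!r}")


def fit_format(value_labels: Dict[int, str], unit: str = "chars",
               minimum: int = 9) -> str:
    """Return a %w.0g format just wide enough for the widest label.

    `minimum` is an arbitrary floor used for illustration, so that
    variables with short or no labels still get a usable width.
    """
    widest = max((label_width(t, unit) for t in value_labels.values()),
                 default=0)
    return f"%{max(widest, minimum)}.0g"


# Hypothetical value labels
cities = {1: "Москва", 2: "Санкт-Петербург", 3: "Новосибирск"}
print(fit_format(cities, "chars"))  # %15.0g
print(fit_format(cities, "bytes"))  # %29.0g
```

The two units correspond to the two behaviours observed in the browser and the output window. Since which one a writer should use is exactly the open question, the sketch leaves it as a parameter. It also ignores unlabelled values, whose numeric representation would need to fit as well.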