Stata limits variable labels to a maximum of 80 characters. SPSS, on the other hand, limits variable labels to a maximum of 256 bytes. Stata's documentation states:
"If an SPSS variable label is too long, it will be truncated to 80 characters, and the original variable label will be stored as a variable characteristic."
This is wrong on two counts. The first is that after importing a .sav file, variable labels are truncated to 80 bytes, not 80 characters. If your labels are purely ASCII characters, you will not notice the difference. But if your labels are written in a script where each character takes multiple bytes, like Arabic, you'll notice quite quickly. Your label will be at most half as long as what Stata can actually store, and truncation will frequently occur partway through a character, leaving an invalid Unicode character at the end (appearing as �).
There's a chance there is some esoteric reason for doing it this way, and that this is not a bug but rather a mistake in the documentation. But the second problem is almost surely a bug: the original variable label is only stored as a variable characteristic (named spss_variable_label) if it is 82 bytes or longer. If you import a variable with an 81-byte label, the last byte is simply lost and not recoverable in the Stata data, existing neither in the 80-byte label nor in a variable characteristic.
If any of this behavior is fixed or otherwise changed in a future update, I would really appreciate it if StataCorp could reply and let me know which version changed it. I have written a command for internal use at my company that, in one step, has to match up variables in Stata with variables in a different file format, and it explicitly takes all of this odd behavior into account.
(I don't believe this behavior is version dependent, but just in case, I am running the latest version of Stata 19.5, born date 18 Feb 2026, compile number 195038, on macOS 14.7.4.)
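To make the two problems concrete, here is a small Python sketch that models the import behavior as I observed it. It is not Stata's implementation and it does not call Stata; the names simulate_import and ends_mid_character are made up for illustration, and the constants 80 and 82 are simply the byte counts reported above.

```python
from typing import Optional, Tuple

# Observed behavior, not documented behavior (assumptions taken from this post):
LABEL_LIMIT_BYTES = 80            # the imported label is cut at 80 bytes
CHARACTERISTIC_MIN_BYTES = 82     # the original is only preserved from 82 bytes up


def simulate_import(label: str) -> Tuple[bytes, Optional[str]]:
    """Model what ends up in Stata for one SPSS variable label.

    Returns (bytes kept as the variable label,
             value of the spss_variable_label characteristic or None).
    """
    raw = label.encode("utf-8")
    kept = raw[:LABEL_LIMIT_BYTES]  # byte-based cut, ignores character boundaries
    if len(raw) >= CHARACTERISTIC_MIN_BYTES:
        characteristic = label
    else:
        characteristic = None
    return kept, characteristic


def ends_mid_character(kept: bytes) -> bool:
    """True if the kept bytes are not valid UTF-8, i.e. the cut split a character."""
    try:
        kept.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


if __name__ == "__main__":
    # One ASCII letter plus 40 Arabic letters (2 bytes each in UTF-8):
    # 41 characters, 81 bytes.
    label = "a" + "\u0628" * 40
    kept, characteristic = simulate_import(label)
    print(len(label), len(label.encode("utf-8")))   # 41 81
    print(len(kept), ends_mid_character(kept))      # 80 True
    print(characteristic)                           # None
```

The example label is well under 80 characters, so by the documentation it should survive intact. Under the observed rules it is cut at 80 bytes in the middle of the last character, and because it is 81 bytes rather than 82, no characteristic is written, so the original cannot be recovered.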
