Hi.
I am importing some voter registration databases that have >8 million records. The text files are tab delimited, with column names in the first row. If it matters, the support file says "encoding: UTF-16 LE."
When importing in Stata/MP 15.1, I use a command like this: import delimited "...`filedate'.txt", stringcols(_all) clear
For some files, I get this statement from Stata: "Note: 1,171,366,858 binary zeros were ignored in the source file. The first instance occurred on line 1. Binary zeros are not valid in text data. Inspect your data carefully."
The files seem to import fine, but I'm wondering what this warning means. Does it have to do with the encoding? Or with the use of the -stringcols(_all)- option? Should I specify the UTF type?
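To check the encoding idea outside Stata, here is a minimal Python sketch. It only shows that plain ASCII text stored as UTF-16 LE carries one zero byte per character, while the same text in UTF-8 carries none. The sample line and the helper name count_zero_bytes are made up for illustration and are not taken from the real files. Whether this is what the Stata note is counting is my guess, not something I have confirmed.

```python
# Hypothetical sample: a short tab-delimited header line, not real voter data.
sample = "id\tname\n"

# Encode the same text two ways.
utf16 = sample.encode("utf-16-le")  # two bytes per ASCII character, no BOM
utf8 = sample.encode("utf-8")       # one byte per ASCII character


def count_zero_bytes(data: bytes) -> int:
    """Return how many 0x00 bytes appear in the byte string."""
    return data.count(b"\x00")


if __name__ == "__main__":
    print("UTF-16 LE length:", len(utf16), "zero bytes:", count_zero_bytes(utf16))
    print("UTF-8 length:", len(utf8), "zero bytes:", count_zero_bytes(utf8))
```

If that guess is right, a file that is mostly ASCII would have roughly as many zero bytes as it has characters, which would be consistent with a count in the billions for files this size.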
Thanks for any advice.