While reading a UTF-8 tab-delimited file with

Code:
import delimited using Bankers-current.txt, encoding("utf-8") bindquote(strict)

I got the following message:

Code:
Unmatched quote exceeded 20 lines while processing row 5688;
there may be a problem with your data or perhaps you have a quoted string
with too many lines. You may specify maxquotedrows() to override the
default behavior.

There are supposedly some quoted fields containing newlines, hence the -bindquote(strict)- option, but removing it only changes the message to:

Code:
Note: Unmatched quote while processing row 5091; this can be due to a
formatting problem in the file or because a quoted data element spans
multiple lines..

Looking at the file, I can see that there are many unmatched double quotes, including at line 5091, but always in the context of a non-ASCII string, never a multi-line field. Here is the offending line in octal:

Code:
0000000   R   U   7   1   3   8   5   3   8   6  \t 320 221 320 220 320
0000020 235 320 232   " 320 241 320 220 320 235 320 232 320 242

Is it possible that -import delimited- is treating individual bytes of a double-byte character as quote marks? In any case, I thought I would use the -unicode convertfile- command to work around the problem, but the command

Code:
unicode convertfile Bankers-current.txt bankers.tsv ,dstencoding(latin1) srccallback(skip) replace srcencoding(utf8)

returns the message:

Code:
file "Bankers-current.txt" can not be converted to the same file
r(602);

which I don't understand, since I have given a different destination file name (bankers.tsv). Suggestions welcome.

Daniel Feenberg
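One way to test the double-byte hypothesis is to decode the bytes from the od dump directly. In valid UTF-8, every byte of a multibyte character is 0x80 or higher, so a 0x22 byte can only be a literal double quote; splitting the row at that byte and decoding both halves succeeds only if the quote sits on a character boundary. The sketch below uses Python rather than Stata, purely for byte-level inspection. `RAW` holds only the 30 bytes visible in the dump, and the helpers `describe_quotes` and `lines_with_odd_quotes` are hypothetical names for illustration, not part of Stata or any library. The second helper is one rough way to list lines with unmatched quotes; it counts physical lines, so its numbers may not line up exactly with the row numbers that -import delimited- reports (for example, because of a header line).

```python
import sys

# The 30 bytes of row 5091 shown in the od -c dump (od prints non-ASCII bytes in octal).
# Assumption: this is only the visible prefix of the row, not the whole line.
RAW = (
    b"RU71385386\t"
    + bytes([0o320, 0o221, 0o320, 0o220, 0o320, 0o235, 0o320, 0o232])
    + b'"'
    + bytes([0o320, 0o241, 0o320, 0o220, 0o320, 0o235, 0o320, 0o232, 0o320, 0o242])
)


def describe_quotes(row: bytes) -> list[tuple[int, str, str]]:
    """For each 0x22 byte in a tab-delimited row, return its byte offset plus the
    decoded text of the field before and after it.

    bytes.decode("utf-8") is strict by default, so if a quote byte sat inside a
    multibyte character, decoding one of the two halves would raise
    UnicodeDecodeError.
    """
    row.decode("utf-8")  # validate the whole row first; raises on malformed UTF-8
    found = []
    for offset, byte in enumerate(row):
        if byte == 0x22:
            before = row[:offset].decode("utf-8").split("\t")[-1]
            after = row[offset + 1:].decode("utf-8").split("\t")[0]
            found.append((offset, before, after))
    return found


def lines_with_odd_quotes(path: str) -> list[int]:
    """Return 1-based physical line numbers containing an odd number of '"' bytes."""
    hits = []
    with open(path, "rb") as f:  # binary mode: count raw bytes, no decoding needed
        for lineno, line in enumerate(f, start=1):
            if line.count(b'"') % 2 == 1:
                hits.append(lineno)
    return hits


if __name__ == "__main__":
    print(describe_quotes(RAW))
    if len(sys.argv) > 1:
        # e.g. python check_quotes.py Bankers-current.txt
        print(lines_with_odd_quotes(sys.argv[1]))
```

Run on the dump bytes, `describe_quotes(RAW)` reports a single quote at byte offset 19, between БАНК and САНКТ, which indicates an ordinary ASCII quote embedded in the Cyrillic text rather than a byte belonging to a double-byte character.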