Dear Statalist Users,
I want to import a .csv dataset that contains both numeric and string variables.
First I tried
Code:
import delimited mydata.csv, delimiter(";") varnames(1)
The command runs, but for some observations all values end up in a single variable. I looked through the raw data in a text editor, and I think the cause is that both quotation marks (") and the delimiter (;) occur inside the values of one particular string variable. Some string values look like this:
"bet"yes"
Because of the odd number of quotation marks, I think Stata keeps searching further along the line for the closing quotation mark, which causes the problem.
Next, I tried
Code:
import delimited mydata.csv, delimiter(";") varnames(1) stripquotes(yes)
because I thought this might solve the issue, but it does not: the same problem occurs. Is this because I have not specified the -bindquotes- option?
My next attempt was
Code:
import delimited mydata.csv, delimiter(";") varnames(1) bindquote(nobind) stripquotes(yes)
This solves the first problem but creates another. As mentioned above, some string values also contain the delimiter itself, for instance "a;b".
Stata now treats every semicolon as a delimiter, including those that are part of a string value. As a result, in observations with such values, the following values are shifted to the right, and one extra variable is created for each of these semicolons.
Is there a way to tackle both problems at once? I haven't found a solution yet. My fallback would be to stick with the last command and shift the affected values back into place after the import.
Best regards,
Ali
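A quick way to check the diagnosis in the post (an odd number of quotation marks, or extra semicolons, on the affected lines) is to read every raw line into a single string variable and count those characters. The following is a minimal Stata sketch, not tested on the actual file. It assumes that mydata.csv contains no tab characters (so that delimiters("\t") leaves each line whole in v1) and that the header line has the correct number of semicolons; the variables nquote, nsemi and suspect are hypothetical names chosen for illustration.

```stata
* Diagnostic sketch. Assumptions: mydata.csv contains no tab characters,
* and its first line is a header with the correct number of semicolons.
* Run as a do-file (the /// line continuation needs it).

* Read every raw line, unparsed and with its quotes kept, into v1
import delimited using "mydata.csv", delimiters("\t") varnames(nonames) ///
    bindquotes(nobind) stripquotes(no) stringcols(_all) clear

* Count double quotes (char(34)) and semicolons on each line
generate nquote = strlen(v1) - strlen(subinstr(v1, char(34), "", .))
generate nsemi  = strlen(v1) - strlen(subinstr(v1, ";", "", .))

* Flag lines with an odd number of quotes, or with a different number
* of semicolons than the header line (observation 1)
generate byte suspect = mod(nquote, 2) == 1 | nsemi != nsemi[1]
list v1 if suspect, noobs
```

The flagged lines are the ones the imports above misread, so they are also the only ones that would need re-shifting after an import with bindquote(nobind).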
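If, as in the examples in the post, only one string variable can contain stray quotation marks and semicolons, one possible approach (a sketch under stated assumptions, not a tested solution) is to repair the file before importing it. Each line is read whole; the fields before the problematic one are taken from the start of the line and the fields after it from the end of the line, so semicolons inside the problematic field cannot shift anything. That field is then rewritten as valid CSV by doubling its inner quotation marks. The number of fields K, the position k of the problematic field and the output file mydata_clean.csv are hypothetical placeholders. The sketch also assumes no tab characters in the file, Stata 14 or newer for ustrregexm()/ustrregexs(), and that -import delimited- reads a doubled quotation mark inside a quoted field as one literal quotation mark (worth checking on your version).

```stata
* Repair sketch. Assumptions (hypothetical, adapt to your file):
*   - every line of mydata.csv, header included, has K fields
*   - only field number k can contain stray " or ; characters
*   - the file contains no tab characters; Stata 14 or newer
* Run as a do-file (the /// line continuations need it).
local K = 10   // hypothetical number of fields per line
local k = 4    // hypothetical position of the problematic field

* 1. Read every raw line, unparsed and with its quotes kept, into v1
import delimited using "mydata.csv", delimiters("\t") varnames(nonames) ///
    bindquotes(nobind) stripquotes(no) stringcols(_all) clear

* 2. Split each line into the k-1 fields before field k (from the start),
*    field k itself, and the K-k fields after it (counted from the end)
local nleft  = `k' - 1
local nright = `K' - `k'
local re "^((?:[^;]*;){`nleft'})(.*)((?:;[^;]*){`nright'})$"
generate byte matched = ustrregexm(v1, "`re'")
generate strL head  = ustrregexs(1) if ustrregexm(v1, "`re'")
generate strL field = ustrregexs(2) if ustrregexm(v1, "`re'")
generate strL tail  = ustrregexs(3) if ustrregexm(v1, "`re'")

* 3. Make field k valid CSV: drop its enclosing quotes (if present),
*    double every remaining quote, then enclose it in quotes again
replace field = substr(field, 2, strlen(field) - 2) ///
    if strlen(field) >= 2 & substr(field, 1, 1) == char(34) ///
    & substr(field, -1, 1) == char(34)
replace field = subinstr(field, char(34), char(34) + char(34), .)
generate strL clean = head + char(34) + field + char(34) + tail if matched

* Lines with fewer than K-1 semicolons cannot be repaired this way
list v1 if !matched, noobs

* 4. Write the repaired lines to a new file and import that one normally
tempname fh
file open `fh' using "mydata_clean.csv", write text replace
forvalues i = 1/`=_N' {
    if matched[`i'] {
        file write `fh' (clean[`i']) _n
    }
}
file close `fh'
import delimited using "mydata_clean.csv", delimiters(";") varnames(1) clear
```

Counting fields from both ends only works when a single field is affected; if several string variables can contain semicolons, such lines are genuinely ambiguous, and no import option can recover them without more information about the data.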