Export delimited CSV / Empty rows [BUG?]

Valerio Giuffrida

Join Date: Dec 2015

Posts: 7
#1

Export delimited CSV / Empty rows [BUG?]

26 Feb 2016, 07:24

Dear STATALIST,
I have a problem which I can't find reason for and I couldn't find any reference to that in previous discussions.
I have a STATA 14.1 SE version

I have a big dataset of 593,000 rows and 18 cols.
If I simply save the dataset it has no empty rows. [save "temp.dta"]
If I export in csv format I have the issue described below. [export delimited "temp.csv"]

From 351,646 to 366,451 I will have some entry behaving like in the attached table.

Is it possible that this is a STATA bug in the export function?

Regards,
Valerio Giuffrida

Last edited by Valerio Giuffrida; 26 Feb 2016, 07:29.
Tags: bug, CSV, export
eric_a_booth

Join Date: Apr 2014

Posts: 292
#2

26 Feb 2016, 08:50

No I don't think it's a bug.
(Without seeing your file) My best guess is that you have ascii codes in the string variables in these rows that create line breaks.

These line break ascii codes are hidden characters. There are two common types: carriage returns which are ascii code 13 (commonly \r) and linefeeds which are ascii value 10 (commonly \n). [The combination of these two is a CRLF and marked as \r\n]

You want to remove these from within your strings. To do so, you can run something like:

Code:

**remove the problematic ascii codes from all string vars ds, has(type string) foreach x in `r(varlist)' { replace `x' = subinstr(`x', `"`=char(10)'"', "", .) }

or you can explicitly list the variables you think are problematic. In order to find these problematic ascii codes, you can use the SSC package -charlist- . Additionally, you can help diagnose other issues with your file by using -hexdump- (useful if my suspicions are wrong and the issue is not related to the ascii codes 10 or 13).

Here's some example do-file code where I (1) recreate your issue in the system dataset "auto.dta" that is (2) exported to Excel/csv. (3) Then I use the code above to remove the problem ascii codes .

Code:

sysuse auto, clear export delimited using temp1.csv, replace replace make = make + `" `=char(10)'"' if rep78==3 l make rep78 in 25/50 //examine the problem export delimited using temp2.csv, replace **examine hexdump temp2.csv , analyze res cap which charlist //from SSC if _rc ssc install charlist, replace charlist make local x `"`r(ascii)'"' local y = `=regexm(`"`x'"', "(10|13)")' local y = `=regexs(1)' di `"`y' code exists in your data"' //shows the problematic ascii codes in 'y' **remove the problematic ascii codes from all string vars ds, has(type string) foreach x in `r(varlist)' { replace `x' = subinstr(`x', `"`=char(10)'"', "", .) } l make rep78 in 25/50 //FIXED

Last edited by eric_a_booth; 26 Feb 2016, 09:00.

Eric A. Booth | Senior Director of Research | Far Harbor | Austin TX
Comment
Valerio Giuffrida

Join Date: Dec 2015

Posts: 7
#3

26 Feb 2016, 09:04

Thank you Eric for your quick reply,
I will run this code on my files soon and I'll let you know if that is the case.
I'm not sharing the complete file because of its dimension, but I can imagine the possibility you pointed out is realistic. If it's not the case I will share a portion of the database.

Regards,
Valerio
Comment
Sourav Sarkar

Join Date: Aug 2017

Posts: 1
#4

16 Aug 2017, 14:24

Thank you very much for this discussion. Thank you eric. Your following code worked in my case for exporting stata data to csv files:
ds, has(type string)
foreach x in `r(varlist)' {
replace `x' = subinstr(`x', `"`=char(10)'"', "", .)
}
Even if there is nothing to replace, do we lose anything if we type the above code anytime before exporting the stata data as a csv file? I actually have to deal with large datasets and will not like to check everytime whether the exported file is proper.

Thank You!
Comment

Announcement

Export delimited CSV / Empty rows [BUG?]

Comment

Comment

Comment