Dear Statalisters,
For further manipulations, I'm looking for a native Stata solution to read in text data, so that every character (including spaces) is read in as one separate observation (one character per line).
So what I basically need is to insert whitespace between all characters of my input file.
With a stream editor (e.g. GNU sed) this can easily be done by replacing every character with itself followed by a space.
The following solution using the filefilter command works, but it is very crude. I am looking for something more elegant and, above all, faster (for larger text files).
********************* START *********************
/* generate some test data */
clear
set obs 1
gen str word="Hello world. This is a test! `"
outfile using test.raw, noquote replace
/* filefilter the test data
to replace every character with the character + a space in front of it
[ backslash and left quote in one additional routine] */
qui {
    forvalues i=33(1)255 {
        if !(`i'==92|`i'==96) {
            noisily di "`=char(`i')'"
            filefilter test.raw test_1.raw, ///
                from(`"`=char(`i')'"') to(`" `=char(`i')'"') replace
            erase test.raw
            copy test_1.raw test.raw, replace
        }
    }
    filefilter test.raw test_1.raw, ///
        from(\BS) to(`" \BS"') replace
    erase test.raw
    copy test_1.raw test.raw, replace
    filefilter test.raw test_1.raw, ///
        from(\LQ) to(`" \LQ"') replace
    erase test.raw
    copy test_1.raw test.raw, replace
    /* replace spaces by a placeholder */
    filefilter test.raw test_1.raw, ///
        from(`" "') to(`" *SPACE* "') replace
    erase test.raw
    copy test_1.raw test.raw, replace
}
/* infile data */
infile str10 char using test, clear
/* replace placeholder */
replace char=" " if char=="*SPACE*"
compress
/* erase the test data */
erase test.raw
erase test_1.raw
********************* END *********************
So if anyone has any ideas (e.g. using the file command or regular expressions), please let me know. I would be very grateful.
Many thanks
Ali
P.S.: I'm using Stata 12.1
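For illustration only, one direction that might avoid the repeated filefilter passes is to read the file in Mata and split each line into characters there. This is a minimal sketch, not a tested solution for large files: the file name chars_demo.txt and the sample text are made up so that the example is self-contained, and it assumes one byte per character and that line-end characters should not become observations.

```stata
clear
mata:
// hypothetical demo file, written here only to keep the sketch self-contained
fname = "chars_demo.txt"
if (fileexists(fname)) unlink(fname)
fh = fopen(fname, "w")
fput(fh, "Hello world.")
fput(fh, "A test!")
fclose(fh)

// read all lines at once; cat() returns one string per line, without line ends
lines = cat(fname)

// preallocate one slot per character, then fill it line by line
total = sum(strlen(lines))
chars = J(total, 1, "")
k = 0
for (r = 1; r <= rows(lines); r++) {
    for (i = 1; i <= strlen(lines[r]); i++) {
        k = k + 1
        chars[k] = substr(lines[r], i, 1)
    }
}

// push the characters into the Stata dataset, one per observation
st_addobs(total)
(void) st_addvar("str1", "char")
st_sstore(., "char", chars)

unlink(fname)
end
list in 1/5
```

To try this on the test data above, the idea would be to point cat() at test.raw instead of the demo file and drop the lines that write and delete it. Spaces need no placeholder here because each character is stored directly rather than parsed by infile.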