removing spaces in string variable using subinstr doesn't fully work

Zhang_Lu

Join Date: Oct 2014

Posts: 155
#1

27 Oct 2014, 21:27

Hello, specialists. I have encountered a weird problem when trying to remove spaces from string variables using the subinstr() function. This approach previously worked reasonably well for some variables imported from several .xls files. However, in some recent cases it cannot fully remove the spaces. Here I upload the raw dataset (land_granting_y2008) and my do-file (importing_mulsheets_test.do), so I'd highly appreciate it if someone could check them to see where the problem arises. The main problem is that after a command like replace area=subinstr(area," ","",.), there is still a space after the decimal point, and I cannot remove it. I guess there may be some invisible character or something, as suggested by previous FAQ answers, but I don't quite understand it.
Attached Files

importing_mulsheets_test.do

land_granting_y2007.dta (45.2 KB)

land_granting_y2007.xls

Last edited by Zhang_Lu; 27 Oct 2014, 21:29.
Joseph Coveney

Join Date: Apr 2014

Posts: 4421
#2

27 Oct 2014, 21:35

Try something like replace area = subinstr(area, char(160), "", .) and report back to the list if that doesn't work.
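For readers unfamiliar with char(160): in Latin-1 / Windows-1252 that byte is the non-breaking space, which displays like an ordinary space but is a different byte (160 rather than 32), so subinstr(area," ","",.) does not touch it. The short sketch below is in Python rather than Stata and only illustrates the byte-level behavior; the sample value is hypothetical.

```python
# Illustration only (Python, not Stata): a non-breaking space, byte 160 in
# Latin-1 / Windows-1252, survives a replacement that targets the ordinary
# space, byte 32. The sample value below is hypothetical.
NBSP = chr(160)

value = "12" + NBSP + "5"

# Same idea as Stata's subinstr(area, " ", "", .): only byte 32 is targeted.
after_space = value.replace(" ", "")

# Same idea as Stata's subinstr(area, char(160), "", .)
after_nbsp = value.replace(NBSP, "")

print(NBSP in after_space)  # True: the non-breaking space is still there
print(after_nbsp)           # 125
```

If a value contains no byte 160 at all, the second replacement changes nothing, which is a hint that the "space" is some other character.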
Zhang_Lu

Join Date: Oct 2014

Posts: 155
#3

27 Oct 2014, 21:51

I'm afraid that doesn't work either. In fact, with replace area=subinstr(area," ","",.) the data did change, and Stata returned:
. replace area=subinstr(area," ","",.)
(351 real changes made)
With the new command, nothing changes.
Zhang_Lu

Join Date: Oct 2014

Posts: 155
#4

28 Oct 2014, 01:14

After reading some earlier FAQs, I guess my problem is that the space in each of my observations has a length of more than 1, while the subinstr() function can only eliminate substrings of length 1 (I'm not sure whether this statement is correct; it's based on http://www.stata.com/statalist/archi...8.html), so it means " " equals " " as an argument to the subinstr() function.
The other possibility, as I mentioned, is that it's some invisible element after the decimal point, not a space.
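One way to test the "invisible element" hypothesis is to list the code of every byte in a value that falls outside printable ASCII. (Note that Stata's subinstr() itself does accept search strings longer than one character, so string length is unlikely to be the obstacle.) The Python sketch below is only an illustration of that byte dump; the helper name suspicious_bytes and the sample bytes are hypothetical, chosen to mimic a double-byte character sitting where the decimal point should be.

```python
# Illustration only (Python, not Stata): report every byte outside printable
# ASCII (32-126) in a raw value, so an "invisible" character becomes visible.
# Hypothetical sample: "12", two non-ASCII bytes (0xA3 0xAE), then "5".

def suspicious_bytes(raw: bytes) -> list[tuple[int, int]]:
    """Return (position, byte value) for each byte that is not printable ASCII."""
    return [(pos, b) for pos, b in enumerate(raw) if not 32 <= b <= 126]


raw_value = b"12\xa3\xae5"

for pos, b in suspicious_bytes(raw_value):
    print(f"position {pos}: byte {b} (0x{b:02X})")
# position 2: byte 163 (0xA3)
# position 3: byte 174 (0xAE)
```

A plain ASCII value such as b"12.5" produces an empty list, so any output at all points to the byte(s) that subinstr(area," ","",.) cannot match.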
Joseph Coveney

Join Date: Apr 2014

Posts: 4421
#5

28 Oct 2014, 01:55

In the Excel worksheets, those decimal points appear not to be decimal points, but rather periods (full stops) in Unicode.

Stata cannot work with Unicode, and so you've apparently set up your Windows operating system to use some kind of ANSI-compliant coding scheme in order to be able to see the Chinese characters correctly when you use Stata on your machine. But they're still double-byte encoded underneath.

If the glyphs for periods (full stops) in Chinese character sets are anything like those in Japanese, then they have their own built-in space after them, and that's what you're seeing, not an actual separate space character.
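The "built-in space" can be checked outside Stata. The Python sketch below assumes that the character in question is U+FF0E FULLWIDTH FULL STOP and that the machine uses GBK (the Chinese Windows ANSI code page 936); both are assumptions, not facts established from the attached files. It shows that the character is a single fullwidth glyph, stored as two bytes, and contains no actual space character.

```python
import unicodedata

# Illustration only (Python, not Stata). Assumptions: the "decimal point" is
# U+FF0E FULLWIDTH FULL STOP and the system encoding is GBK (code page 936).
FULLWIDTH_STOP = "\uff0e"

print(unicodedata.name(FULLWIDTH_STOP))              # FULLWIDTH FULL STOP
print(unicodedata.east_asian_width(FULLWIDTH_STOP))  # F (drawn double width)
print(len(FULLWIDTH_STOP.encode("gbk")))             # 2 bytes, not 1
print(" " in FULLWIDTH_STOP)                         # False: no real space inside
```

The double-width glyph is what makes the period appear to be followed by a space, even though there is nothing for subinstr(area," ","",.) to remove.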

You might try list in 1, selecting and copying the period on the screen, and then pasting it between the two double-quote marks in the second subinstr() argument. On my machine, when I use your Stata datasets, the period looks like £® in the non-Unicode setting that I have at the moment (I think that it's ISO Latin 1 or something; it takes a restart in Windows 8/8.1 to switch between encoding schemes, which I haven't done), so it would look something like replace area = subinstr(area, "£®", "", .) if I were to do it on my machine. On yours, with your current encoding set-up, I believe it would look like a period and a space.

Last edited by Joseph Coveney; 28 Oct 2014, 01:57. Reason: Make that replace area = subinstr(area, "£®", ".", 1)
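Why the period shows up as "£®" on a Latin-1 machine, and why the corrected command replaces it with "." rather than with nothing, can be sketched in Python. This assumes the stored value holds the GBK bytes of a fullwidth full stop; the number 12.5 is hypothetical. Replacing the pair with "" would silently turn 12.5 into 125, so substituting an ASCII period keeps the value numeric. The count of 1 presumably reflects one decimal point per value.

```python
# Illustration only (Python, not Stata). Assumption: the stored value holds
# the GBK bytes of "12．5" (fullwidth full stop); the number is hypothetical.
raw = "12\uff0e5".encode("gbk")   # b'12\xa3\xae5'

# Read under a Latin-1 setting, the two bytes become two characters:
# 0xA3 -> "£" and 0xAE -> "®", so the value reads "12£®5".
as_latin1 = raw.decode("latin-1")

# Counterpart of the corrected Stata command
#   replace area = subinstr(area, "£®", ".", 1)
# i.e. swap the first occurrence for an ASCII period instead of deleting it.
fixed = as_latin1.replace("\u00a3\u00ae", ".", 1)
print(fixed)         # 12.5
print(float(fixed))  # 12.5
```

Deleting the pair instead (replacing with "") would give "125", a silently wrong number.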