Removing Leading and Trailing Spaces

Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#1

Removing Leading and Trailing Spaces

03 Jul 2022, 15:29

Consider the following full worked example

Code:

import delimited "http://qed.econ.queensu.ca/jae/datasets/ke001/readme.kh.txt", /// rowrange(45:75) clear delim(")") bindquote(strict) replace v2 =ustrregexra(v2,"\p{Z}","")

My goal is to remove all leading and trailing spaces in my string variable, since a command I've written doesn't play nice with trailing spaces.

As current, my syntax here gets rid of leading spaces (and apparently the one for Inner Mongolia, but for now I can live with that). Optimally, I'd want all leading and trailing spaces gone. Equally, I've tried

Code:

import delimited "http://qed.econ.queensu.ca/jae/datasets/ke001/readme.kh.txt", /// rowrange(45:75) clear delim(")") bindquote(strict) replace v2 =strtrim(v2)

And I still have trailing spaces. Why? Maybe it's a special character being interpreted by me as a space?
Tags: None

Daniel Schaefer

Join Date: Mar 2020
Posts: 814

03 Jul 2022, 16:04

Hey Jared, I just used R to quickly print the string literal of the section of text you are working with:

Code:

"(1) Beijing\t\n(2) Tianjin\t\n(3) Hebei\t\n(4) Shanxi\t\n(5) Inner Mongolia\t\n(6) Liaoning\t\n(7) Jilin\t\n(8) Heilongjiang\t\n(9) Shanghai\t\n(10) Jiangsu\t\n(11) Zhejiang\t\n(12) Anhui\t\n(13) Fujian\t\n(14) Jiangxi\t\n(15) Shandong\t\n(16) Henan\t\n(17) Hubei\t\n(18) Hunan\t\n(19) Guangdong\t\n(20) Guangxi\t\n(21) Hainan\t\n(22) Chongqing\t\n(23) Sichuan\t\n(24) Guizhou\t\n(25) Yunnan\t\n(26) Tibet\t\n(27) Shaanxi\t\n(28) Gansu\t\n(29) Qinghai\t\n(30) Ningxia\t\n(31) Xinjiang"

I'm thinking those tab characters (\t) are causing your problem.

Comment

William Lisowski

Join Date: Dec 2014

Posts: 10150
#3

03 Jul 2022, 16:14

Your data has a nuisance trailing tab on each line.

Code:

28 31 29 20 42 65 69 6A 69 6E 67 09 0A ( 1 ) B e i j i n g

Not sure enough about \p{Z} to say why your regex didn't solve the problem.
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10194
#4

03 Jul 2022, 16:17

Code:

replace v2 =ustrtrim(v2)

See

Code:

help ustrtrim()
Comment

Jared Greathouse

Join Date: Sep 2021
Posts: 2170

03 Jul 2022, 16:22

Yeah I figured this was part of the answer.

Code:

import delimited "http://qed.econ.queensu.ca/jae/datasets/ke001/readme.kh.txt", ///
    delimiter(tab) varnames(1) ///
    rowrange(45:75) clear  ///
    bindquote(nobind)
    
rename xi province

split province, p(")")

replace province1 = regexs(2) if regexm(province1, "^([^0-9]*)([0-9]+)([^0-9]*)$")

drop province v2

rename (province1 province2) (id province)

replace province = subinstr(province, " ", "", .)

This is more or less what I've done, which works. Presumably, a more efficient solution exists.

Comment

Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2402
#6

03 Jul 2022, 17:59

This seems sufficient for your described problem. I also use Andrew's suggestion of ustrtrim() and use a different regex pattern to remove the province number.

Code:

gen want = ustrregexs(2) if ustrregexm(province, "^(\([0-9]+\)[\s]+)(.*)", .) replace want = ustrtrim(want) // for good measure, but not necessary with example data
Comment
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#7

03 Jul 2022, 18:57

Yep all the solutions here work. Thank you all so much!
Comment

Announcement

Removing Leading and Trailing Spaces

Comment

Comment

Comment

Comment

Comment

Comment