I have a string variable People that provides a sentence about the number and type of people on board boats. I want to convert this variable into three variables: N_all for the total number of passengers, N_crew for the number of crew and N_children for the number of children. The text is inconsistent in that it doesn't mention crew or children if there are none, e.g.:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str29 People
"49 (2 crew, 17 children) "
"50 (1 crew, 8 children) "
"40 (2 crew, 4 children) "
"47 (2 crew, 13 children) "
"27 (2 crew, 4 children) "
"58 (2, crew, 2 children) "
"38 (2 crew, 3 children) "
"28 (2 crew, 2 children) "
"20 (2 crew) "
"3 (1 crew) "
"3 (2 crew) "
"41 (1 crew, 9 children) "
"10 (3 crew) "
"37 (6 children) "
"3 (2 crew) "
"4 "
end

I created N_all via:

Code:
gen N_all = regexs(0) if regexm(People, "^[0-9]+")

But I have not been able to successfully extract the crew or children using regular expressions. For example,

Code:
gen N_crew = regexs(0) if regexm(People, "(\d+)[^\d]+?(?=crew)")

gives the error "regexp: nested *?+". What am I doing wrong?
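
A possible direction, offered as a sketch rather than a confirmed fix: the error may arise because the older regexm() engine rejects the lazy quantifier +? (one quantifier directly following another), and it probably does not understand \d or the lookahead (?=...) either. Assuming that is the cause, a pattern restricted to basic syntax ([0-9], +, ? and one capture group) might look like the following. The optional comma is an assumption made only to cover the "2, crew" row in the example data.

```stata
* Sketch under the assumptions stated above; uses only basic regex syntax.
* regexs(1) returns the first capture group, i.e. just the digits.
gen N_crew     = regexs(1) if regexm(People, "([0-9]+),? crew")
gen N_children = regexs(1) if regexm(People, "([0-9]+) children")

* Both results are strings; convert them to numeric.
* Rows with no match stay missing.
destring N_crew N_children, replace
```

Rows that do not mention crew or children fail the regexm() condition and are left missing, which can be recoded to 0 afterwards if that is the intended meaning.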