Manipulating string -- Delete everything after space

Budu Gulo

Join Date: Feb 2018
Posts: 238

12 Apr 2019, 08:20

Code:

  +--------------------------------------------------------+
  |        sex           age                     education |
  |--------------------------------------------------------|
  | Male (Sex)     25+ (Age)   Advanced (Aggregate levels) |
  | Male (Sex)     25+ (Age)   Advanced (Aggregate levels) |
  | Male (Sex)     25+ (Age)   Advanced (Aggregate levels) |
  | Male (Sex)     15+ (Age)   Advanced (Aggregate levels) |
  | Male (Sex)   15-64 (Age)   Advanced (Aggregate levels) |
  +--------------------------------------------------------+

I would like to delete everything after the first space. For example, in the first observation, sex becomes Male, age becomes 25+, and education becomes Advanced.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str244 sex str11 age str31 education
"Male (Sex)"   "25+ (Age)"   "Advanced (Aggregate levels)"    
"Male (Sex)"   "25+ (Age)"   "Advanced (Aggregate levels)"    
"Male (Sex)"   "25+ (Age)"   "Advanced (Aggregate levels)"    
"Male (Sex)"   "15+ (Age)"   "Advanced (Aggregate levels)"    
"Male (Sex)"   "15-64 (Age)" "Advanced (Aggregate levels)"    
"Male (Sex)"   "25+ (Age)"   "Advanced (Aggregate levels)"    
"Male (Sex)"   "15-64 (Age)" "Advanced (Aggregate levels)"    
"Male (Sex)"   "15+ (Age)"   "Advanced (Aggregate levels)"    
"Male (Sex)"   "25+ (Age)"   "Advanced (Aggregate levels)"    
"Male (Sex)"   "25+ (Age)"   "Advanced (Aggregate levels)"    
"Male (Sex)"   "25+ (Age)"   "Intermediate (Aggregate levels)"
"Female (Sex)" "25+ (Age)"   "Advanced (Aggregate levels)"    
"Male (Sex)"   "25+ (Age)"   "Advanced (Aggregate levels)"    
"Male (Sex)"   "15+ (Age)"   "Advanced (Aggregate levels)"    
"Male (Sex)"   "15-64 (Age)" "Advanced (Aggregate levels)"    
end


Sven-Kristjan Bormann

Join Date: Jul 2018

Posts: 310
#2

12 Apr 2019, 08:33

The code below puts the first word of each variable into a new variable. If you don't want new variables, just use "replace" instead of "gen".

Code:

foreach var of sex age education {
    gen `var'_reduced = word(`var',1)
}
Budu Gulo

Join Date: Feb 2018

Posts: 238
#3

12 Apr 2019, 08:47

Sven-Kristjan Bormann, thanks for the code. Unfortunately, it gives an "invalid syntax" error. Please check.
Sven-Kristjan Bormann

Join Date: Jul 2018

Posts: 310
#4

12 Apr 2019, 10:11

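You are right. With an explicit list of names, foreach takes "in"; the "of" form must name a list type, such as "of varlist". The correct code is: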

Code:

foreach var in sex age education {
    gen `var'_reduced = word(`var',1)
}
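To try the corrected loop in isolation, here is a minimal sketch on three rows copied from the example data. It relies on Stata's word(s, n), which returns the n-th blank-separated word of s; the _reduced suffix is just the naming used in the post above.

```stata
* Three rows copied from the example data
clear
input str20 sex str11 age str31 education
"Male (Sex)"   "15-64 (Age)" "Advanced (Aggregate levels)"
"Male (Sex)"   "25+ (Age)"   "Intermediate (Aggregate levels)"
"Female (Sex)" "25+ (Age)"   "Advanced (Aggregate levels)"
end

* Store the first blank-separated word of each variable in a new variable
foreach var in sex age education {
    generate `var'_reduced = word(`var', 1)
}

list sex_reduced age_reduced education_reduced
```

Swapping generate for replace (and dropping the suffix) overwrites the original variables instead.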
Budu Gulo

Join Date: Feb 2018

Posts: 238
#5

12 Apr 2019, 10:59

Many thanks, Sven-Kristjan Bormann! The code works now. I opted for "replace". For reference, here is the code I used:

Code:

foreach var in sex age education {
    replace `var' = word(`var',1)
}
Sergiy Radyakin

Join Date: Apr 2014

Posts: 1867
#6

12 Apr 2019, 12:34

Budu Gulo,

It looks like the data-producing system combined the value labels and the variable labels into the values, since Male/Female are value labels of the categorical variable Sex. If that is the case, perhaps you could re-export the data from the original system (which one was it?) with different settings, so that no further cleanup is needed.

If this hypothesis is right, there is no reason to expect those labels to be single words. That may hold for the three variables listed, but it may not hold for other variables or for future extracts. If that matters to you, a more robust strategy is to keep everything before " (" rather than delete everything after the first space.

Code:

foreach var in sex age education {
    // keep the text before " (", only where " (" actually occurs
    replace `var' = substr(`var',1,strpos(`var'," (")-1) if strpos(`var'," (")
}
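The two strategies differ only when a label has more than one word. The sketch below uses hypothetical values (they are not from the posted ILO data): a multi-word label and a value with no " (...)" suffix. The variable names first_word and before_paren are illustrative.

```stata
* Hypothetical values, not taken from the posted data
clear
input str40 education
"Advanced (Aggregate levels)"
"Less than basic (Aggregate levels)"
"Advanced"
end

* Strategy 1: first word only (truncates multi-word labels)
generate first_word = word(education, 1)

* Strategy 2: everything before " (", unchanged where " (" is absent
generate before_paren = education
replace before_paren = substr(education, 1, strpos(education, " (") - 1) if strpos(education, " (") > 0

list
```

On the second row, the first strategy keeps only "Less", while the second keeps "Less than basic". The if condition matters because strpos() returns 0 when " (" is missing; without the guard, the third row would be cut to an empty string.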

In any case, storing categorical attributes (such as sex or education) as numeric variables with value labels will make your work more efficient and more pleasant, especially if you are working with large data files.
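As a sketch of that suggestion, assuming the strings have already been cleaned to values such as Male or 25+, Stata's encode creates a numeric variable whose value labels carry the original text. The _cat suffix is a hypothetical naming choice.

```stata
* Cleaned string values, as they look after the loops above
clear
input str6 sex str5 age str12 education
"Male"   "25+"   "Advanced"
"Male"   "15-64" "Intermediate"
"Female" "25+"   "Advanced"
end

* Convert each string variable to a labelled numeric variable
foreach var in sex age education {
    encode `var', generate(`var'_cat)
}

* Values are stored as numeric codes; the labels display the original text
describe sex_cat age_cat education_cat
tabulate education_cat
```

encode assigns codes in alphabetical order of the strings (here Advanced = 1, Intermediate = 2). If a substantive order matters, for example for education levels, you may prefer to define the value label yourself with label define and pass it through encode's label() option.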

Best, Sergiy Radyakin

Last edited by Sergiy Radyakin; 12 Apr 2019, 12:35. Reason: added formatting
Budu Gulo

Join Date: Feb 2018

Posts: 238
#7

13 Apr 2019, 18:03

Sergiy Radyakin: Sorry for the late reply. The data (unemployment rates) are from the ILO, downloaded in Excel format. As for the strategy, I find your suggestion of splitting at " (" very helpful. It is indeed more robust. Thanks a lot!