Trimming labels of categorical variables

Sonnen Blume

Join Date: Aug 2018

Posts: 342
#1

09 Nov 2023, 13:10

Hi,

I want to remove the parentheses and their contents from all value labels in the dataset. Here is an example:

educational level
- no education (below primary)
- has education

to

educational level
- no education
- has education

Please let me know if this is possible to do.

Thanks in advance!
Girish Venkataraman

Join Date: Dec 2021

Posts: 281
#2

09 Nov 2023, 13:32

Regex will help. See below. Please do consider using -dataex- to post your raw data. Your example above seemed easy enough, but I had to spend two minutes copy/pasting your data into Stata and finagling it to make sure my code works.

Code:

ds, has(type string)
foreach v in `r(varlist)' {
    gen _`v' = ustrregexra(`v', "\(.*\)", "")
}

Maarten Buis

Join Date: Mar 2014

Posts: 3464
#3

09 Nov 2023, 13:35

Code:

. // example dataset
. clear

. input ed

            ed
  1. 1
  2. 0
  3. end

. label define ed_lb 0 "no education (below primary)" ///
>                    1 "has education"

. label values ed ed_lb

.
. // what to do
. local labname : value label ed

. levelsof ed
0 1

. foreach lev in `r(levels)' {
  2.         local lab : label `labname' `lev'
  3.         gettoken lab rest : lab, parse("(")
  4.         label define `labname' `lev' `"`lab'"', modify
  5. }

.
. codebook ed

-----------------------------------------------------------------------------------------------------------------------
ed                                                                                                          (unlabeled)
-----------------------------------------------------------------------------------------------------------------------

                  Type: Numeric (float)
                 Label: ed_lb

                 Range: [0,1]                         Units: 1
         Unique values: 2                         Missing .: 0/2

            Tabulation: Freq.   Numeric  Label
                            1         0  no education
                            1         1  has education

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------


Girish Venkataraman

Join Date: Dec 2021

Posts: 281
#4

09 Nov 2023, 13:37

My bad. Did not read the title properly in the OP. Glad to see Maarten Buis's excellent solution.
Sonnen Blume

Join Date: Aug 2018

Posts: 342
#5

09 Nov 2023, 13:53

Thank you Girish!

Sonnen Blume

Join Date: Aug 2018

Posts: 342
#6

09 Nov 2023, 13:54

Marvellous! Thanks so much, Maarten.

One more thing, please: is it possible to apply this to the entire dataset, i.e. without specifying a variable?
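
A minimal sketch of one way this could be done, offered as an illustration rather than as a reply from the thread: it wraps Maarten Buis's loop in an outer loop over all variables and skips those that have no value label attached. The example data, the variable job, and the label job_lb are made up for illustration. The sketch also swaps gettoken for ustrregexra() (which needs Stata 14 or later) combined with strtrim(), so that blanks in front of the parenthesis are removed as well.

```stata
* Example data: two labelled variables (names and labels are illustrative)
clear
input ed job
1 0
0 1
end
label define ed_lb 0 "no education (below primary)" 1 "has education"
label define job_lb 0 "not working (incl. retired)" 1 "working"
label values ed ed_lb
label values job job_lb

* Loop over every variable; skip those without a value label
foreach v of varlist _all {
    local labname : value label `v'
    if "`labname'" == "" continue
    quietly levelsof `v', local(levels)
    foreach lev of local levels {
        * with strict, an unlabeled value yields an empty string
        local lab : label `labname' `lev', strict
        if `"`lab'"' == "" continue
        * drop each parenthesised group together with the blanks before it
        local new = strtrim(ustrregexra(`"`lab'"', "\s*\([^)]*\)", ""))
        label define `labname' `lev' `"`new'"', modify
    }
}
```

Because levelsof only returns values that occur in the data and ignores missing values, labels for unused values or for extended missing values such as .a are left untouched by this sketch. Checking the result with label list afterwards is advisable.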
