Renaming the variables by parsing a string

Aman Ojas Desai

Join Date: Jul 2022

Posts: 9
#1


17 Jul 2022, 19:18

Dear all,

First-time poster here. I am using the PSID data set and want to include all the variables from all waves in my analysis. I am aware that it will be a huge dataset. As you know, PSID variable names are not coded consistently across waves. I renamed all the variables using their variable labels with -lab2varn-. Now I want to rename the variables so that the numeric values in the names end up as suffixes. For example, "education_1980_head" should become something like "education_head_80", and "tot_txbl_income_80_81" should become "tot_txbl_income_81". The goal is to have proper stub names with a year suffix so that I can ultimately create a panel. Perhaps a loop over all variable names would help. How should I approach this?

Thank you for your help!

Code:

clear
input byte release_number float _1968_INTERVIEW_NUMBER byte(education_1980_head education_1980_wife education_1981_head education_1981_wife) long tot_txbl_income_80_81
2 2 1 1 1 0 0
2 2 1 1 1 0 0
2 2 1 1 . . 0
2 4 2 3 2 3 25000
2 4 2 3 2 3 0
2 4 3 3 3 3 10000
2 4 7 4 7 4 3000
2 4 3 2 3 2 0
2 4 2 3 . . 0
2 4 2 3 2 3 0
2 4 3 3 3 3 0
2 4 3 2 3 2 0
2 4 3 3 3 3 0
2 4 3 2 3 2 0
2 4 7 4 7 4 0
2 4 7 4 7 4 0
2 4 2 3 . . 0
2 4 3 3 3 3 0
2 4 3 2 3 2 0
2 4 3 3 3 3 0
2 4 3 3 3 3 0
2 4 2 3 2 3 0
2 4 2 3 2 3 0
2 4 3 3 3 3 0
2 4 3 3 3 3 0
2 4 2 3 2 3 0
2 4 3 3 3 3 0
2 4 3 2 3 2 16600
2 4 7 4 7 4 22480
2 4 3 3 3 3 10000
2 4 2 3 2 3 0
2 4 2 3 2 3 0
2 4 2 3 . . 0
2 4 3 3 3 3 0
2 4 3 3 3 3 0
2 4 3 3 3 3 0
2 4 3 3 3 3 0
2 5 3 3 3 3 12200
2 5 3 3 3 3 0
2 5 3 3 3 3 3050
2 5 3 3 3 3 0
2 5 3 3 3 3 0
2 5 3 3 3 3 0
2 5 3 3 3 3 0
2 5 3 3 3 3 0
2 5 3 3 3 3 0
2 5 3 3 3 3 0
2 5 3 3 3 3 0
2 5 3 3 3 3 0
2 5 3 3 3 3 0
2 5 3 3 3 3 0
2 5 3 3 3 3 0
2 5 3 3 3 3 0
2 5 3 3 3 3 0
2 5 3 3 3 3 0
2 5 3 3 3 3 0
2 5 3 3 3 3 0
2 6 2 4 2 4 2000
2 6 2 4 2 4 4700
2 6 3 5 3 5 12500
2 6 7 7 7 7 4600
2 6 3 5 3 5 0
2 6 3 5 3 5 0
2 6 7 7 7 7 0
2 6 7 7 7 7 0
2 6 7 7 7 7 0
2 6 7 7 7 7 0
2 6 7 7 7 7 0
2 6 7 7 7 7 0
2 6 7 7 7 7 0
2 6 3 5 3 5 30000
2 6 3 5 3 5 0
2 6 7 7 7 7 12500
2 6 7 7 7 7 0
2 6 7 7 7 7 0
2 6 7 7 7 7 0
2 6 7 7 7 7 0
2 6 7 7 7 7 0
2 7 2 2 2 2 3300
2 7 2 2 2 2 0
2 7 4 4 4 4 2881
2 7 3 0 3 0 0
2 7 3 3 3 3 0
2 7 3 0 3 0 0
2 7 4 4 4 4 0
2 7 4 4 4 4 0
2 7 3 0 3 0 0
2 7 3 0 3 0 0
2 7 4 4 4 4 0
2 7 3 3 3 3 0
2 7 3 0 3 0 0
2 7 3 3 3 3 0
2 7 3 3 3 3 0
2 7 3 3 3 3 0
2 7 3 0 3 0 0
2 7 3 3 3 3 0
2 7 3 0 3 0 0
2 7 3 0 3 0 0
2 7 4 4 4 4 0
2 7 3 0 3 0 0
end
Tags: label, loop, panel data, rename

Fei Wang

Join Date: Oct 2021
Posts: 726
#2

17 Jul 2022, 20:02

Aman, you may use -rename group- to rename the variables flexibly, as below.

Code:

rename *_(##)_(##) *_(##)[3]
rename *_(##)(##)_* *[1]_*[4]_(##)[3]

Please note that the code above also renames "_1968_INTERVIEW_NUMBER" (to "_INTERVIEW_NUMBER_68"), because its name matches the second pattern.

Code:

. des, f

Contains data
  obs:           100                          
 vars:             7                          
---------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
---------------------------------------------------------------------------------------------
release_number  byte    %8.0g                 
_INTERVIEW_NUMBER_68
                float   %9.0g                 
education_head_80
                byte    %8.0g                 
education_wife_80
                byte    %8.0g                 
education_head_81
                byte    %8.0g                 
education_wife_81
                byte    %8.0g                 
tot_txbl_income_81
                long    %12.0g                
---------------------------------------------------------------------------------------------
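If the interview number should keep a name without a year suffix, one option is to rename it before applying the patterns. The sketch below rebuilds the example's variable names on a one-observation toy dataset; the name "interview_number" is a hypothetical choice, not something from the original data. The -dryrun- option of -rename group- only displays the proposed changes, which may be worth using before touching several thousand variables.

```stata
* Toy dataset carrying the same variable names as the data example
clear
set obs 1
foreach v in release_number _1968_INTERVIEW_NUMBER education_1980_head ///
    education_1980_wife education_1981_head education_1981_wife ///
    tot_txbl_income_80_81 {
    generate byte `v' = 1
}

* Move the ID variable out of the way first
* ("interview_number" is a hypothetical name chosen for illustration)
rename _1968_INTERVIEW_NUMBER interview_number

* Preview the changes without applying them
rename *_(##)_(##) *_(##)[3], dryrun
rename *_(##)(##)_* *[1]_*[4]_(##)[3], dryrun

* Apply the same two patterns for real
rename *_(##)_(##) *_(##)[3]
rename *_(##)(##)_* *[1]_*[4]_(##)[3]

describe, fullnames
```

With the ID renamed first, only the education and income variables match the patterns, so they end up with the same names as in the output above.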

Last edited by Fei Wang; 17 Jul 2022, 20:07.


Aman Ojas Desai

Join Date: Jul 2022

Posts: 9
#3

17 Jul 2022, 20:16

This is exactly what I was looking for. Thank you so much. If I may ask a follow-up question, is there a way to include all stubs with a reshape command while converting the dataset from wide to long? Since I wish to include all the variables (around 5000), it would be helpful not to type all the stubs manually.

Thank you so much for your help. I do appreciate it.
Fei Wang

Join Date: Oct 2021

Posts: 726
#4

17 Jul 2022, 20:59

Aman, assuming that in your data example the stubs are "education_head_", "education_wife_", and "tot_txbl_income_", and that the two-digit year should be extracted to create the new variable "year" in the long form, below is a code example that extracts all stubs.

Code:

des education_head_80-tot_txbl_income_81, varlist // place all the original variables of the stubs in the list
local vl = r(varlist)
local stub = ustrregexra("`vl'", "\d", "")
local stub: list uniq stub

The macro "stub" now stores all the stubs and can be used for reshaping.

Code:

. di "`stub'"
education_head_ education_wife_ tot_txbl_income_

reshape long "`stub'", ...
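For a complete run from stub extraction to reshape, here is a minimal sketch on a toy dataset. The identifier "id" and the generated values are assumptions made for illustration only; in the real data, i() would need whatever variable(s) uniquely identify an observation. The macro is passed to -reshape- unquoted here. Note that "\d" strips every digit, so a stub that itself contains digits would be altered too, and that ustrregexra() requires Stata 14 or later.

```stata
* Toy wide dataset with already-renamed variables
clear
set obs 3
generate id = _n                        // hypothetical unique identifier
foreach v in education_head_80 education_wife_80 education_head_81 ///
    education_wife_81 tot_txbl_income_81 {
    generate `v' = _n                   // placeholder values
}

* Collect the variable names and strip the digits to get the stubs
describe education_head_80-tot_txbl_income_81, varlist
local vl = r(varlist)
local stub = ustrregexra("`vl'", "\d", "")
local stub: list uniq stub

* Reshape using every stub; j(year) takes the two-digit suffix
reshape long `stub', i(id) j(year)

list, sepby(id)
```

Because "tot_txbl_income_" exists only for year 81 in this toy data, -reshape- fills its year-80 rows with missing values.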

Last edited by Fei Wang; 17 Jul 2022, 21:09.
Aman Ojas Desai

Join Date: Jul 2022

Posts: 9
#5

18 Jul 2022, 07:56

This is really helpful, Fei! I really appreciate it. Thank you!

Best,