  • Renaming the variables by parsing a string

    Dear all,

    First-time poster here. I am using the PSID data set and want to include all the variables from the waves in my analysis; I am aware that this will be a huge dataset. As you know, PSID variable names are not coded consistently across waves. I renamed all the variables after their variable labels using -lab2varn-. Now I want to rename the variables so that the numeric values in the names end up as suffixes. For example, “education_1980_head” should become something like “education_head_80”, and “tot_txbl_income_80_81” should become “tot_txbl_income_81”. The goal is to have proper stub names with a year suffix so that I can ultimately create a panel. Perhaps a loop over all the variable names would help. How should I approach this?




    Thank you for your help!


    input byte release_number float _1968_INTERVIEW_NUMBER byte(education_1980_head education_1980_wife education_1981_head education_1981_wife) long tot_txbl_income_80_81
    2 2 1 1 1 0 0
    2 2 1 1 1 0 0
    2 2 1 1 . . 0
    2 4 2 3 2 3 25000
    2 4 2 3 2 3 0
    2 4 3 3 3 3 10000
    2 4 7 4 7 4 3000
    2 4 3 2 3 2 0
    2 4 2 3 . . 0
    2 4 2 3 2 3 0
    2 4 3 3 3 3 0
    2 4 3 2 3 2 0
    2 4 3 3 3 3 0
    2 4 3 2 3 2 0
    2 4 7 4 7 4 0
    2 4 7 4 7 4 0
    2 4 2 3 . . 0
    2 4 3 3 3 3 0
    2 4 3 2 3 2 0
    2 4 3 3 3 3 0
    2 4 3 3 3 3 0
    2 4 2 3 2 3 0
    2 4 2 3 2 3 0
    2 4 3 3 3 3 0
    2 4 3 3 3 3 0
    2 4 2 3 2 3 0
    2 4 3 3 3 3 0
    2 4 3 2 3 2 16600
    2 4 7 4 7 4 22480
    2 4 3 3 3 3 10000
    2 4 2 3 2 3 0
    2 4 2 3 2 3 0
    2 4 2 3 . . 0
    2 4 3 3 3 3 0
    2 4 3 3 3 3 0
    2 4 3 3 3 3 0
    2 4 3 3 3 3 0
    2 5 3 3 3 3 12200
    2 5 3 3 3 3 0
    2 5 3 3 3 3 3050
    2 5 3 3 3 3 0
    2 5 3 3 3 3 0
    2 5 3 3 3 3 0
    2 5 3 3 3 3 0
    2 5 3 3 3 3 0
    2 5 3 3 3 3 0
    2 5 3 3 3 3 0
    2 5 3 3 3 3 0
    2 5 3 3 3 3 0
    2 5 3 3 3 3 0
    2 5 3 3 3 3 0
    2 5 3 3 3 3 0
    2 5 3 3 3 3 0
    2 5 3 3 3 3 0
    2 5 3 3 3 3 0
    2 5 3 3 3 3 0
    2 5 3 3 3 3 0
    2 6 2 4 2 4 2000
    2 6 2 4 2 4 4700
    2 6 3 5 3 5 12500
    2 6 7 7 7 7 4600
    2 6 3 5 3 5 0
    2 6 3 5 3 5 0
    2 6 7 7 7 7 0
    2 6 7 7 7 7 0
    2 6 7 7 7 7 0
    2 6 7 7 7 7 0
    2 6 7 7 7 7 0
    2 6 7 7 7 7 0
    2 6 7 7 7 7 0
    2 6 3 5 3 5 30000
    2 6 3 5 3 5 0
    2 6 7 7 7 7 12500
    2 6 7 7 7 7 0
    2 6 7 7 7 7 0
    2 6 7 7 7 7 0
    2 6 7 7 7 7 0
    2 6 7 7 7 7 0
    2 7 2 2 2 2 3300
    2 7 2 2 2 2 0
    2 7 4 4 4 4 2881
    2 7 3 0 3 0 0
    2 7 3 3 3 3 0
    2 7 3 0 3 0 0
    2 7 4 4 4 4 0
    2 7 4 4 4 4 0
    2 7 3 0 3 0 0
    2 7 3 0 3 0 0
    2 7 4 4 4 4 0
    2 7 3 3 3 3 0
    2 7 3 0 3 0 0
    2 7 3 3 3 3 0
    2 7 3 3 3 3 0
    2 7 3 3 3 3 0
    2 7 3 0 3 0 0
    2 7 3 3 3 3 0
    2 7 3 0 3 0 0
    2 7 3 0 3 0 0
    2 7 4 4 4 4 0
    2 7 3 0 3 0 0
    end


  • #2
    Aman, you may use -rename group- to rename variables flexibly, as below.

    Code:
    rename *_(##)_(##) *_(##)[3]
    rename *_(##)(##)_* *[1]_*[4]_(##)[3]
    Please note that the second command also renames "_1968_INTERVIEW_NUMBER" (to "_INTERVIEW_NUMBER_68").

    Code:
    . des, f
    
    Contains data
      obs:           100                          
     vars:             7                          
    ---------------------------------------------------------------------------------------------
                  storage   display    value
    variable name   type    format     label      variable label
    ---------------------------------------------------------------------------------------------
    release_number  byte    %8.0g                 
    _INTERVIEW_NUMBER_68
                    float   %9.0g                 
    education_head_80
                    byte    %8.0g                 
    education_wife_80
                    byte    %8.0g                 
    education_head_81
                    byte    %8.0g                 
    education_wife_81
                    byte    %8.0g                 
    tot_txbl_income_81
                    long    %12.0g                
    ---------------------------------------------------------------------------------------------
    Last edited by Fei Wang; 17 Jul 2022, 20:07.
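
For readers who prefer the explicit loop mentioned in the question, the same renaming can be sketched with a regular-expression loop. This is not from the thread. The toy data set below only reuses the thread's variable names with arbitrary values, and the two patterns assume that a year appears either as a four-digit 19YY/20YY block in the middle of a name or as a trailing YY_YY pair.

```stata
* Toy data: variable names taken from the thread, values arbitrary
clear
set obs 3
generate byte education_1980_head = 1
generate byte education_1980_wife = 2
generate byte education_1981_head = 3
generate long tot_txbl_income_80_81 = 0

* Rule 1: name_19YY_rest -> name_rest_YY
* Rule 2: name_YY_YY     -> name_YY (keeps the later year)
foreach v of varlist _all {
    local new
    if regexm("`v'", "^(.*)_(19|20)([0-9][0-9])_(.+)$") {
        local new = regexs(1) + "_" + regexs(4) + "_" + regexs(3)
    }
    else if regexm("`v'", "^(.*)_[0-9][0-9]_([0-9][0-9])$") {
        local new = regexs(1) + "_" + regexs(2)
    }
    * Variables matching neither rule are left unchanged
    if "`new'" != "" {
        rename `v' `new'
    }
}
describe, simple
```

With thousands of variables it may be worth checking first that no two old names map to the same new name, since -rename- stops with an error when the new name already exists.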



    • #3
      This is exactly what I was looking for. Thank you so much. If I may ask a follow-up question, is there a way to include all stubs with a reshape command while converting the dataset from wide to long? Since I wish to include all the variables (around 5000), it would be helpful not to type all the stubs manually.

      Thank you so much for your help. I do appreciate it.



      • #4
        Aman, assume in your data example the stubs are "education_head_", "education_wife_", and "tot_txbl_income_" and the two-digit year should be extracted to create the new variable "year" in the long form. Below is a code example to extract all stubs.

        Code:
        des education_head_80-tot_txbl_income_81, varlist  // place all the original variables of the stubs in the list
        local vl = r(varlist)
        local stub = ustrregexra("`vl'", "\d", "")
        local stub: list uniq stub
        The macro "stub" now stores all the stubs and can be used for reshaping.

        Code:
        . di "`stub'"
        education_head_ education_wife_ tot_txbl_income_
        
        reshape long `stub', ...
        Last edited by Fei Wang; 17 Jul 2022, 21:09.
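
As a fuller illustration of the idea, the stub extraction and the reshape can be sketched end to end on a small made-up data set. Everything here is an assumption for illustration only: the identifier "id", the data values, and the presence of "tot_txbl_income_80" are not in the thread. The sketch also strips only the trailing two-digit year, so digits that happen to appear inside a stub are left alone.

```stata
* Made-up wide data; "id" is a hypothetical unique identifier
clear
input int id byte(education_head_80 education_head_81 education_wife_80 education_wife_81) long(tot_txbl_income_80 tot_txbl_income_81)
1 2 2 3 3 25000 0
2 3 3 3 3 10000 12200
3 7 7 4 4 3000 4600
end

* List every variable except the identifier
ds id, not
local vars `r(varlist)'

* Strip the trailing two-digit year from each name to get its stub
local stubs
foreach v of local vars {
    local s = regexr("`v'", "[0-9][0-9]$", "")
    local stubs `stubs' `s'
}
* Keep each stub once
local stubs : list uniq stubs

* The year suffix becomes the new variable "year"
reshape long `stubs', i(id) j(year)
list, sepby(id)
```

In real PSID data the i() variable would have to be whatever uniquely identifies an observation in the wide file; the thread does not say which variable that is.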



        • #5
          This is really helpful, Fei! I really appreciate it. Thank you!

          Best,
