Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reshape / Rename Problem

    Dear Statalist community,

    I am trying to transform a data from wide into long format. The problem is that my data has thousands of variables, and I do not want to write all of them into the reshape command. To avoid writing all the variables, I am using the "ds" command combined with local to obtain the list of variables that I want to reshape. The problem is that those variables name finish with the year, in particular variables are called XXXXX_2018. What I want to do is rename the variables XXX so that I can then write: reshape varlist, i(PUBID) j(year) and I do not need to write every single variable. However, I am having problems renaming those variables. The code is the following:

    Code:
      ds PUBID KEY_SEX KEY_BDATE_M KEY_BDATE_Y CV_SAMPLE_TYPE KEY_RACE_ETHNICITY,not
      local allvar r(varlist)
      di `allvar'
      
      foreach vari in  `allvar' {
      local  aa = subinstr("`vari'","_2018","",.)
      rename `vari' `aa'
      }
    1. Is there a more efficient way to do a reshape without having to write thousands of variables names?
    2. How can I fix my code so that I can efficiently eliminate the suffix "_2018" from my variables names?

    I attach the code that is producing the following error:

    Code:
    .   ds PUBID KEY_SEX KEY_BDATE_M KEY_BDATE_Y CV_SAMPLE_TYPE KEY_RACE_ETHNICITY,not
    ~1_01_M_1998  ~1_05_Y_1998  ~2_05_M_1998  ~3_04_Y_1998  YSCH_21300..  YSCH_21300..  YSCH_21800..  YSCH_21800..  NE~E_05_1998  NE~T_02_1998
    ~1_01_Y_1998  ~2_01_M_1998  ~2_05_Y_1998  ~3_05_M_1998  YSCH_21300..  YSCH_21300..  YSCH_21800..  YSCH_21800..  NE~E_06_1998  NE~T_03_1998
    ~1_02_M_1998  ~2_01_Y_1998  ~3_01_M_1998  ~3_05_Y_1998  YSCH_21300..  YSCH_21300..  YSCH_21800..  YSCH_21800..  NE~D_01_1998  NE~T_04_1998
    ~1_02_Y_1998  ~2_02_M_1998  ~3_01_Y_1998  ~4_01_M_1998  YSCH_21300..  YSCH_21300..  YSCH_21800..  YSCH_21800..  NE~D_02_1998  NE~T_05_1998
    ~1_03_M_1998  ~2_02_Y_1998  ~3_02_M_1998  ~4_01_Y_1998  YSCH_21300..  YSCH_21300..  YSCH_21800..  YSCH_21800..  NE~D_03_1998  NE~T_06_1998
    ~1_03_Y_1998  ~2_03_M_1998  ~3_02_Y_1998  ~5_01_M_1998  YSCH_21300..  YSCH_21300..  YSCH_21800..  NE~E_01_1998  NE~D_04_1998
    ~1_04_M_1998  ~2_03_Y_1998  ~3_03_M_1998  ~5_01_Y_1998  YSCH_21300..  YSCH_21800..  YSCH_21800..  NE~E_02_1998  NE~D_05_1998
    ~1_04_Y_1998  ~2_04_M_1998  ~3_03_Y_1998  YSCH_21300..  YSCH_21300..  YSCH_21800..  YSCH_21800..  NE~E_03_1998  NE~D_06_1998
    ~1_05_M_1998  ~2_04_Y_1998  ~3_04_M_1998  YSCH_21300..  YSCH_21300..  YSCH_21800..  YSCH_21800..  NE~E_04_1998  NE~T_01_1998
    
    .   local allvar r(varlist)
    
    .   di `allvar'
    YSCH_20400_01_01_M_1998 YSCH_20400_01_01_Y_1998 YSCH_20400_01_02_M_1998 YSCH_20400_01_02_Y_1998 YSCH_20400_01_03_M_1998 YSCH_20400_01_03_Y_1998 YSCH_2
    > 0400_01_04_M_1998 YSCH_20400_01_04_Y_1998 YSCH_20400_01_05_M_1998 YSCH_20400_01_05_Y_1998 YSCH_20400_02_01_M_1998 YSCH_20400_02_01_Y_1998 YSCH_20400
    > _02_02_M_1998 YSCH_20400_02_02_Y_1998 YSCH_20400_02_03_M_1998 YSCH_20400_02_03_Y_1998 YSCH_20400_02_04_M_1998 YSCH_20400_02_04_Y_1998 YSCH_20400_02_
    > 05_M_1998 YSCH_20400_02_05_Y_1998 YSCH_20400_03_01_M_1998 YSCH_20400_03_01_Y_1998 YSCH_20400_03_02_M_1998 YSCH_20400_03_02_Y_1998 YSCH_20400_03_03_M
    > _1998 YSCH_20400_03_03_Y_1998 YSCH_20400_03_04_M_1998 YSCH_20400_03_04_Y_1998 YSCH_20400_03_05_M_1998 YSCH_20400_03_05_Y_1998 YSCH_20400_04_01_M_199
    > 8 YSCH_20400_04_01_Y_1998 YSCH_20400_05_01_M_1998 YSCH_20400_05_01_Y_1998 YSCH_21300_01_01_1998 YSCH_21300_01_02_1998 YSCH_21300_01_03_1998 YSCH_213
    > 00_01_04_1998 YSCH_21300_01_05_1998 YSCH_21300_02_01_1998 YSCH_21300_02_02_1998 YSCH_21300_02_03_1998 YSCH_21300_02_04_1998 YSCH_21300_02_05_1998 YS
    > CH_21300_03_01_1998 YSCH_21300_03_02_1998 YSCH_21300_03_03_1998 YSCH_21300_03_04_1998 YSCH_21300_03_05_1998 YSCH_21300_04_01_1998 YSCH_21300_05_01_1
    > 998 YSCH_21800_01_01_1998 YSCH_21800_01_02_1998 YSCH_21800_01_03_1998 YSCH_21800_01_04_1998 YSCH_21800_01_05_1998 YSCH_21800_02_01_1998 YSCH_21800_0
    > 2_02_1998 YSCH_21800_02_03_1998 YSCH_21800_02_04_1998 YSCH_21800_02_05_1998 YSCH_21800_03_01_1998 YSCH_21800_03_02_1998 YSCH_21800_03_03_1998 YSCH_2
    > 1800_03_04_1998 YSCH_21800_03_05_1998 YSCH_21800_04_01_1998 YSCH_21800_05_01_1998 NEWSCHOOL_SCHCODE_01_1998 NEWSCHOOL_SCHCODE_02_1998 NEWSCHOOL_SCHC
    > ODE_03_1998 NEWSCHOOL_SCHCODE_04_1998 NEWSCHOOL_SCHCODE_05_1998 NEWSCHOOL_SCHCODE_06_1998 NEWSCHOOL_PUBID_01_1998 NEWSCHOOL_PUBID_02_1998 NEWSCHOOL_
    > PUBID_03_1998 NEWSCHOOL_PUBID_04_1998 NEWSCHOOL_PUBID_05_1998 NEWSCHOOL_PUBID_06_1998 NEWSCHOOL_LEFT_01_1998 NEWSCHOOL_LEFT_02_1998 NEWSCHOOL_LEFT_0
    > 3_1998 NEWSCHOOL_LEFT_04_1998 NEWSCHOOL_LEFT_05_1998 NEWSCHOOL_LEFT_06_1998
    
    .   
    .   foreach vari in  `allvar' {
      2.   local  aa = subinstr("`vari'","_2018","",.)
      3.   rename `vari' `aa'
      4.   }
    parentheses unbalanced


  • #2
    reshape long wants to you to supply stub names, not variable names.

    But you shouldn't want to get rid of an element like 1998 from variable names, as that's precisely the information you need for a date variable in the reshaped file.

    You're right that you need to extract the stub but if you do that by rename, it's no longer a stub.

    Here's the principle with a toy dataset.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(id whatever a_1998 a_2008 a_2018 b_1998 b_2008 b_2018 c_1998 c_2008 c_2018)
    1  42 5 1 4 2 4 3 3 3 1
    2 666 2 2 5 5 3 5 2 5 5
    end
    
    * these two lines are crucial 
    unab stubs : *_1998 
    local stubs : subinstr local stubs "1998" "", all 
    
    reshape long `stubs', i(id) j(year)
    
    list, sepby(id)
    
         +-------------------------------------+
         | id   year   whatever   a_   b_   c_ |
         |-------------------------------------|
      1. |  1   1998         42    5    2    3 |
      2. |  1   2008         42    1    4    3 |
      3. |  1   2018         42    4    3    1 |
         |-------------------------------------|
      4. |  2   1998        666    2    5    2 |
      5. |  2   2008        666    2    3    5 |
      6. |  2   2018        666    5    5    5 |
         +-------------------------------------+
    There is further trickery if you don't have the same variables in every year.

    See also https://www.stata.com/support/faqs/d...-with-reshape/

    Comment


    • #3
      Hi Nick,

      Thank you very much for your help. It does work. However, it is the case in my data that I observe different variables across years. Could you tell me how to fix this issue in this case?

      Thanks!

      Comment


      • #4
        Also, I have been playing with your code, and now I have a new question, pherhaps a bit out of topic. After using unab stub, if I display the result using di `stub' or di `"`stub'"' I get different results. Why is that? It looks this is key for latter using the command subinstr. Since I can't use that command if I create my varlist using ds and then local.

        See my code:
        Code:
        unab stubs : *1998 
          di `stubs'
          di `"`stubs'"'
        And the results I obtain:
        Code:
        .   unab stubs : *1998 
        
        .   di `stubs'
        -4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-43-4-4-4-4-4101
        > -4-4-4-4-4-4-4-4-4-4-4
        
        .   di `"`stubs'"' 
        YSCH_20400_01_01_M_1998 YSCH_20400_01_01_Y_1998 YSCH_20400_01_02_M_1998 YSCH_20400_01_02_Y_1998 YSCH_20400_01_03_M_1998 YSCH_20400_01_03_Y_1998 YSCH_2
        > 0400_01_04_M_1998 YSCH_20400_01_04_Y_1998 YSCH_20400_01_05_M_1998 YSCH_20400_01_05_Y_1998 YSCH_20400_02_01_M_1998 YSCH_20400_02_01_Y_1998 YSCH_20400
        > _02_02_M_1998 YSCH_20400_02_02_Y_1998 YSCH_20400_02_03_M_1998 YSCH_20400_02_03_Y_1998 YSCH_20400_02_04_M_1998 YSCH_20400_02_04_Y_1998 YSCH_20400_02_
        > 05_M_1998 YSCH_20400_02_05_Y_1998 YSCH_20400_03_01_M_1998 YSCH_20400_03_01_Y_1998 YSCH_20400_03_02_M_1998 YSCH_20400_03_02_Y_1998 YSCH_20400_03_03_M
        > _1998 YSCH_20400_03_03_Y_1998 YSCH_20400_03_04_M_1998 YSCH_20400_03_04_Y_1998 YSCH_20400_03_05_M_1998 YSCH_20400_03_05_Y_1998 YSCH_20400_04_01_M_199
        > 8 YSCH_20400_04_01_Y_1998 YSCH_20400_05_01_M_1998 YSCH_20400_05_01_Y_1998 YSCH_21300_01_01_1998 YSCH_21300_01_02_1998 YSCH_21300_01_03_1998 YSCH_213
        > 00_01_04_1998 YSCH_21300_01_05_1998 YSCH_21300_02_01_1998 YSCH_21300_02_02_1998 YSCH_21300_02_03_1998 YSCH_21300_02_04_1998 YSCH_21300_02_05_1998 YS
        > CH_21300_03_01_1998 YSCH_21300_03_02_1998 YSCH_21300_03_03_1998 YSCH_21300_03_04_1998 YSCH_21300_03_05_1998 YSCH_21300_04_01_1998 YSCH_21300_05_01_1
        > 998 YSCH_21800_01_01_1998 YSCH_21800_01_02_1998 YSCH_21800_01_03_1998 YSCH_21800_01_04_1998 YSCH_21800_01_05_1998 YSCH_21800_02_01_1998 YSCH_21800_0
        > 2_02_1998 YSCH_21800_02_03_1998 YSCH_21800_02_04_1998 YSCH_21800_02_05_1998 YSCH_21800_03_01_1998 YSCH_21800_03_02_1998 YSCH_21800_03_03_1998 YSCH_2
        > 1800_03_04_1998 YSCH_21800_03_05_1998 YSCH_21800_04_01_1998 YSCH_21800_05_01_1998 NEWSCHOOL_SCHCODE_01_1998 NEWSCHOOL_SCHCODE_02_1998 NEWSCHOOL_SCHC
        > ODE_03_1998 NEWSCHOOL_SCHCODE_04_1998 NEWSCHOOL_SCHCODE_05_1998 NEWSCHOOL_SCHCODE_06_1998 NEWSCHOOL_PUBID_01_1998 NEWSCHOOL_PUBID_02_1998 NEWSCHOOL_
        > PUBID_03_1998 NEWSCHOOL_PUBID_04_1998 NEWSCHOOL_PUBID_05_1998 NEWSCHOOL_PUBID_06_1998 NEWSCHOOL_LEFT_01_1998 NEWSCHOOL_LEFT_02_1998 NEWSCHOOL_LEFT_0
        > 3_1998 NEWSCHOOL_LEFT_04_1998 NEWSCHOOL_LEFT_05_1998 NEWSCHOOL_LEFT_06_1998
        
        . 
        end of do-file

        Comment


        • #5
          Code:
          di "`stubs'"
          insists to display that the argument is a string to be taken literally whereas given

          Code:
          di `stubs'
          display tries to evaluate what follows (meaning for variable names the value in the first observation).


          For overlapping variable sets, let's suppose that your variable names in question range over years 1998 to 2018

          This is a sketch and utterly untested


          Code:
          foreach v of var * { 
                local last = substr("`v'", -4, 4) 
                if inrange("`last'", "1998", "2018") {
                       local this = substr("`v'", 1, length("`v'") - 4)
                       local wanted `wanted' `this' 
                 } 
          } 
          
          local wanted : list uniq wanted

          Comment

          Working...
          X