Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reshape long to wide multiple variables renaming

    Hello,

    I have a panel and need to transform it from Long to Wide. A string variable, countrycode, where that is the 3-letter ISO country code names appears. After putting everything into wide, all variables must be renamed by adding the countrycode 3-letter ISO country code before the variable's name.

    Something like :
    Code:
    countrycode_varname

    Data are for almost 80 years for 100 countries for about 200 variables. In the example below from my data, I am just reporting three countries and only three variables. Data contain gaps or even not observation for a particular variable for some counties, What I want it should be something like this :
    Code:
    ts  FRA_cpi FRA_u FRA_education  DEU_cpi  DEU_u DEU_education GBR_cpi GBR_u GBR_education USA_cpi USA_u USA_education.

    I could do it with repeated pasting in the data, but I have more than 100 countries for 200 variables, so I am thinking a loop may be useful.

    Furthermore, it would be nice if the new variables could maintain their origin label


    Thank you for any help you could provide



    [CODE]
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(id ts) str97 country str3 countrycode float(cpi u education)
    1 2001 "France"         "FRA" 1.6347808  8.61        .
    1 2002 "France"         "FRA" 1.9234123   8.7        .
    1 2003 "France"         "FRA"  2.098472  8.31        .
    1 2004 "France"         "FRA" 2.1420896  8.91 55.69758
    1 2005 "France"         "FRA" 1.7458694  8.49 56.59061
    1 2006 "France"         "FRA" 1.6751245  8.45        .
    1 2007 "France"         "FRA"  1.487998  7.66 59.42295
    1 2008 "France"         "FRA"  2.812862  7.06 60.34956
    1 2009 "France"         "FRA" .08762048  8.74 60.42087
    1 2010 "France"         "FRA" 1.5311227  8.87 61.89888
    1 2011 "France"         "FRA"  2.111598  8.81 62.77157
    2 2001 "Germany"        "DEU"  1.983857  7.77        .
    2 2002 "Germany"        "DEU" 1.4208056  8.48        .
    2 2003 "Germany"        "DEU" 1.0342277  9.78        .
    2 2004 "Germany"        "DEU" 1.6657335 10.73 78.48408
    2 2005 "Germany"        "DEU" 1.5469097 11.17 77.62368
    2 2006 "Germany"        "DEU" 1.5774282 10.25 77.84647
    2 2007 "Germany"        "DEU" 2.2983418  8.66 79.11421
    2 2008 "Germany"        "DEU"  2.628382  7.52 80.16417
    2 2009 "Germany"        "DEU"  .3127376  7.74 80.61875
    2 2010 "Germany"        "DEU" 1.1038091  6.97 81.12799
    2 2011 "Germany"        "DEU" 2.0751746  5.82 81.72286
    3 2001 "United Kingdom" "GBR" 1.5323496   4.7        .
    3 2002 "United Kingdom" "GBR" 1.5204024  5.04        .
    3 2003 "United Kingdom" "GBR" 1.3765004  4.81        .
    3 2004 "United Kingdom" "GBR" 1.3903975  4.59        .
    3 2005 "United Kingdom" "GBR" 2.0891366  4.75        .
    3 2006 "United Kingdom" "GBR"  2.455662  5.35        .
    3 2007 "United Kingdom" "GBR" 2.3865614  5.26        .
    3 2008 "United Kingdom" "GBR" 3.5214086  5.62        .
    3 2009 "United Kingdom" "GBR" 1.9617318  7.54 84.67217
    3 2010 "United Kingdom" "GBR"  2.492655  7.79 86.42583
    3 2011 "United Kingdom" "GBR" 3.8561125  8.04 86.08344
    4 2001 "United States"  "USA"  2.826171  4.73        .
    4 2002 "United States"  "USA" 1.5860317  5.78        .
    4 2003 "United States"  "USA" 2.2700949  5.99        .
    4 2004 "United States"  "USA"  2.677237  5.53 85.15315
    4 2005 "United States"  "USA"  3.392747  5.08 85.19178
    4 2006 "United States"  "USA"  3.225944  4.62 85.72323
    4 2007 "United States"  "USA" 2.8526726  4.62        .
    4 2008 "United States"  "USA" 3.8391004  5.78 86.61413
    4 2009 "United States"  "USA" -.3555463  9.25 86.70908
    4 2010 "United States"  "USA" 1.6400435  9.63 87.14941
    4 2011 "United States"  "USA" 3.1568415  8.95 87.59089
    end

  • #2
    Mario, you may try the code below.

    Code:
    drop id country
    qui d cpi-education, varlist
    local vl = r(varlist)
    rename (`vl') _=
    
    local vl = ustrregexra("`vl'","(\S+)", "@_$1")
    
    reshape wide "`vl'", i(ts) j(countrycode) string
    Last edited by Fei Wang; 21 Jul 2022, 08:06.

    Comment


    • #3
      Another way to do it is:
      Code:
      drop country
      rename (cpi u education) _=
      reshape wide @_cpi @_u @_education, i(id ts) j(countrycode) string

      Comment


      • #4
        Originally posted by Fei Wang View Post
        Mario, you may try the code below.

        Code:
        drop id country
        qui d cpi-education, varlist
        local vl = r(varlist)
        rename (`vl') _=
        
        local vl = ustrregexra("`vl'","(\S+)", "@_$1")
        
        reshape wide "`vl'", i(ts) j(countrycode) string



        Thank you both for the codes.


        @Fei Wang
        I am reaching till the var renaming part, but then it fails. I've used my internal dataset of 200 variable till there without problem.

        Code:
        rename (`vl') =_ local vl = ustrregexra("`vl'","(\S+)", "@_$1")
        syntax error
            Syntax is
                rename  oldname    newname   [, renumber[(#)] addnumber[(#)] so
        > rt ...]
                rename (oldnames) (newnames) [, renumber[(#)] addnumber[(#)] so
        > rt ...]
                rename  oldnames              , {upper|lower|proper}

        @Clyde SchechterI also have tried yours till the reshaped part using all variables (more than 200)


        Code:
         preserve
         rename (seq- ΔLP) _=
        
        . reshape (wide _seq- _ΔLP), i(id ts) j(countrycode) string
        
        
        invalid syntax
            In the reshape command that you typed, you omitted the word wide or long, or
            substituted some other word for it.  You should have typed
        
                . reshape wide varlist, ...
            or
                . reshape long varlist, ...
        
            You might have omitted varlist, too.  The basic syntax of reshape is
        
                 long                                wide
                +---------------+                   +------------------+
                | i   j   a   b |                   | i   a1 a2  b1 b2 |
                |---------------| <--- reshape ---> |------------------|
                | 1   1   1   2 |                   | 1   1   3   2  4 |
                | 1   2   3   4 |                   | 2   5   7   6  8 |
                | 2   1   5   6 |                   +------------------+
                | 2   2   7   8 |
                +---------------+
        
                long to wide: reshape wide a b, i(i) j(j)    (j existing variable)
                wide to long: reshape long a b, i(i) j(j)    (j    new   variable)

        What did I do wrong?

        Comment


        • #5
          Mario, the code in #2 belongs to multiple lines, and in #4, you falsely wrote two lines of code in one line.

          Also, it seems you switched "_" and "=" in the following line.

          Code:
          rename (`vl') _=
          You may copy and paste the code in #2 and replace the variable list "cpi-education" with the list of your 200+ variables. Make sure different lines of code belong to different lines.
          Last edited by Fei Wang; 21 Jul 2022, 20:34.

          Comment


          • #6
            -reshape (wide _seq- _ΔLP), i(id ts) j(countrycode) string- is incorrect, and it is not what I showed in #3. There should be no parentheses around wide _seq- _ΔLP.

            Comment


            • #7
              Originally posted by Fei Wang View Post
              Mario, the code in #2 belongs to multiple lines, and in #4, you falsely wrote two lines of code in one line.

              Also, it seems you switched "_" and "=" in the following line.

              Code:
              rename (`vl') _=
              You may copy and paste the code in #2 and replace the variable list "cpi-education" with the list of your 200+ variables. Make sure different lines of code belong to different lines.
              @Fei Wang
              I have copied exactly your code in my do file, by changing only the variables names in order to correspond with my variables.
              I am reaching in the last line of the code, but there I get
              Code:
              reshape wide "`vl'", i(ts) j(countrycode) string
              { required
              r(100);

              I have tried the same with your code without changing anything. Still same error!



              @Clyde Schechter

              I run the code exactly how you showed.

              Code:
               drop country rename (govtseq-ΔLP_eksL) _=
              reshape wide _govtseq-_ΔLP_eksL, i(id ts) j(countrycode) string

              Similarly to the first code by Fei Wang I get

              Code:
              reshape wide _govtseq-_ΔLP_eksL, i(id ts) j(countrycode) string
              { required
              r(100);

              I hate to disturb people, particularly when they aim to help me. I fell awful, but I do not see the { required
              What am I missing there?
              Last edited by Mario Ferri; 22 Jul 2022, 11:51.

              Comment


              • #8
                That is very bizarre. I should point out, however, that you have, in fact, made several changes here. First, your commands are now referring to variables that do not appear in the example data you posted. I can imagine that you chose to post a simplified, abbreviated data set. But the fact that you have chosen to abbreviate the variables involved with a range, rather than write them out individually, has made an important difference, because the -reshape- command no longer includes the @ characters that appeared in my code. While I doubt this is causing the error message you are getting, it will also mean that after the -reshape- runs you will have variable names like _govtseqFRA, not your intended FRA_govtseq. Moreover, you cannot fix that by writing the command as reshape wide @_govtseq-@_ΔLP_eksL, because the @ characters will prevent Stata from understanding your intended range of variables (since variable names cannot contain @, and only variable names can bound the wildcard var1-var2 sequence.) I also note that in #7, you have put the -drop- and -rename- commands on the same line, which is not legal in Stata. But then, I do not understand why you did not get an error message there, that would have prevented you from even reaching the -reshape- command.

                So let's start fresh. Post back with a new example data set that includes more variables (so as to illustrate the different approach needed for a large list of variables too long to write out individually). Also make sure that the data example you post, when run with your code, produces the same error. Then I can try to troubleshoot appropriately.

                Comment


                • #9
                  Mario, the code in #2 works well on the data example in #1 (I'm using Stata 16.1) -- You may try this first and see if there are any error messages. If it's ok, then I agree with Clyde that something in your full dataset but not in the data example may have caused errors. You may post a new data example, with more variables, which the codes do not work well on, and we'll see what are the problems.

                  Comment


                  • #10
                    deleted
                    Last edited by Mario Ferri; 24 Jul 2022, 06:25.

                    Comment


                    • #11
                      Now, I have managed to run the code. I do not know why the code initially failed. I used an unorthodox way by breaking the data in three different parts. Furthermore, I then run the code for each one and then just used the merge command. Probably there was something in my database, which still I have not determined what was it. Anyway, the code works!

                      I have been greatly benefited from this forum every time I posted a question and cannot thank enough!

                      Since my new dataset created now is a result of three databases, I will need now two different reordering loops in order to produce my final results.
                      For more than 100 countries, a reordering for each one is time-consuming

                      The first is to order everything by each variable separately. In the example in # 1 that will be like:

                      Code:
                       
                       ts FRA_cpi DEU_cpi GBR_cpi USA_cpi FRA_u  DEU_u  GBR_u   USA_u FRA_education DEU_education GBR_education USA_education

                      And the second to order everything by countrycode simliar to #1;

                      Code:
                       
                       ts  FRA_cpi FRA_u FRA_education  DEU_cpi  DEU_u DEU_education GBR_cpi GBR_u GBR_education USA_cpi USA_u USA_education.
                      PS1 @Clyde Schechter
                      It was my bad pasting in the forums of the codes you were mentioning in #6. My mistakes, my apologies!


                      PS2 Fei Wang The code works perfectly! Probably, there was something in my data still I have to locate, as said above


                      Thank you for everything!

                      Mario

                      Comment


                      • #12
                        I have a similar issue, except I want to reshape from wide to long. I am working with multiple datasets: within each dataset I a person id and then a bunch of variables. However, the variable names are not all consistent in length. For example: taxes2016 fedtaxes2016 provtaxes2016 famtaxcredit2016 and so on. Is there a way that I can use a loop to reshape all of the variables without having to manually type them in?

                        So far, I tried:

                        Code:
                        local stubs
                        foreach var of local vars {
                            reshape long `var', i(famid) j(year)
                        }
                        and

                        Code:
                        local stubs
                        foreach var of varlist test* {
                            local stub = substr("`var'",7,2)
                            local stubs : list stubs | stub
                            }
                        display "`stubs'"
                        reshape long `stubs', i(famid) j(year_s) string
                        destring year_s, generate(year)
                        I do not get an error message, but nothing happens. I appreciate any help on this.

                        Comment


                        • #13
                          I'm completely confused by your question. What are the names of the variables. In the descriptive text you suggest they are things like taxes2016, fedtaxes2016, etc. But your code (the second block) suggests that they are something like test*. Please clarify by posting an example of your data, using the -dataex- command to do so. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.



                          When asking for help with code, always show example data. When showing example data, always use -dataex-.

                          Comment


                          • #14
                            Apologies for the confusion. Unfortunately, I cannot provide my exact data, but I can show you an example of what it looks like and what I want to end up with:

                            current structure:

                            Code:
                            * Example generated by -dataex-. For more info, type help dataex
                            clear
                            
                            "person_id" "taxes2016" "fedtaxes2016" "provtaxes2016" "famtaxcredit2016"
                            "1"         "3565"      "4985"         "5687"          "8798"            
                            "2"         "2049"      "7453"         "6989"          "2566"            
                            "3"         "1649"      "6366"         "629"           "4875"            
                            "4"         "1000"      "200"          "300"           "203"             
                            "5"         "500"       "100"          "789"           "100"             
                            end

                            desired structure:

                            Code:
                            * Example generated by -dataex-. For more info, type help dataex
                            clear
                            
                            "person_id" "taxes" "fedtaxes" "provtaxes" "famtaxcredit" "year"
                            "1"         "3565"  "4985"     "5687"      "8798"         "2016"
                            "2"         "2049"  "7453"     "6989"      "2566"         "2016"
                            "3"         "1649"  "6366"     "629"       "4875"         "2016"
                            "4"         "1000"  "200"      "300"       "203"          "2016"
                            "5"         "500"   "100"      "789"       "100"          "2016"
                            end

                            The real dataset I am using has far more variables and multiple years of data. I would like to have a more efficient way of reshaping the datasets into long format without having to name each individual stub. Is that possible?

                            Comment


                            • #15
                              If the -dataex- output you show is really output of -dataex-, then there is some serious bug in -dataex-. I suspect it is not really that. I think that you have either tried to emulate -dataex- output with some typed code, or you have taken some actual -dataex- output and made edits that had the effect of invalidating it. Please don't do that--always provide -dataex- output exactly the way it comes. Never edit it at all--there is no such thing as a "minor change." I'm guessing that you did that because you need to respect the confidentiality of the actual data. The effective way to do that is to edit the data set itself in Stata (but don't save the changes!) and then run -dataex- on that.

                              Be that as it may, I have enough of a sense of your data organization to get you started. I have created a data set that I think resembles yours, and I start with -dataex- output from there. If my guesses about your actual data set are wrong, then my code following that will not work and we have both wasted our time.
                              Code:
                              * Example generated by -dataex-. For more info, type help dataex
                              clear
                              input str1 person_id str4(taxes2016 fedtaxes2016 provtaxes2016 famtaxcredit2016)
                              "1" "3565" "4985" "5687" "8798"
                              "2" "2049" "7453" "6989" "2566"
                              "3" "1649" "6366" "629"  "4875"
                              "4" "1000" "200"  "300"  "203"
                              "5" "500"  "100"  "789"  "100"
                              end
                              
                              ds *2016 // SEE NOTE 1
                              local stubs `r(varlist)'
                              local stubs: subinstr local stubs "2016" "", all
                              
                              reshape long `stubs', i(person_id) j(year)
                              destring `stubs' , replace // SEE NOTE 2
                              Note 1: In your original post you stated that you had data for several years. I am assuming here that every variable that you have for any year, you have for 2016. Otherwise put, there are no 2017 or 2018 etc. variables that do not also appear as 2016 variables. If that is wrong, then this line of code will not do the job. You will have to find a year that actually has all of the variables that ever occur in any year. If there is no such year, then a somewhat different approach will be needed--post back for specific advice in this situation.

                              Note 2: This line of code is not essential to your original request. But with variables that represent money amounts, it makes no sense to hold them as string variables--which completely precludes doing any calculations with them. So I think you need to convert these to numeric variables sooner or later, and now seems as good a time as any. The same is true of the year variable. In your desired output you give it as a string, but, again, that is basically crippling your data by making it unsuitable for calculations. So the way I used the -reshape- syntax, the year variable is already created as numeric. I do not destring the id variable because one doesn't do calculations on those anyway.

                              Comment

                              Working...
                              X