Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Ordering Multiples variables

    I am having a dataset in wide format for almost 80 years for 100 countries for about 200 variables. Variables bring the 3-letter ISO country code as their first component. I would need to reorder the variables in two different ways, with probably two loops.



    The first is to order everything, grouping by each variable separately. In the example, below, that will be like:
    Code:
     
      ts FRA_u DEU_u GBR_u USA_u FRA_size  DEU_size  GBR_size USA_size FRA_Δprdm DEU_Δprdm GBR_Δprdm USA_Δprdm

    And the second to order everything by country code, similar to :

    Code:
    ts FRA_u FRA_size FRA_u FRA_Δprdm DEU_u  DEU_size DEU_Δprdm GBR_u GBR_size GBR_Δprdm  USA_u USA_size USA_Δprdm
    I will need now two different reordering loops in order to produce my final results.
    For more than 100 countries, a reordering for each one is time-consuming



    The following is an example sample created just for this purpose. Variables do follow a random order, and as said above, there are more than 200 for more than 100 countries

    I have been greatly benefited from this forum every time I posted a question and cannot thank enough!

    Best,
    Mario


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(ts DEU_u DEU_size DEU_Δprdm FRA_u FRA_size FRA_Δprdm GBR_u GBR_size GBR_Δprdm USA_u USA_size USA_Δprdm)
    2000  7.92 19.044518         0 10.22  22.32587  .08333331 5.56 16.809107        0 3.99  14.02601         0
    2001  7.77 18.948927         0  8.61 22.141853   .1547619  4.7 17.468409        0 4.73 14.528614         0
    2002  8.48  19.23157 .08333331   8.7  22.72115          0 5.04 18.273579        0 5.78 15.041862         0
    2003  9.78  19.32564         0  8.31 23.143427          0 4.81  18.95393 .0714286 5.99 15.246542         0
    2004 10.73  18.81738         0  8.91 23.044697          0 4.59 19.572525        0 5.53  15.16888         0
    2005 11.17  18.77591 .08333337  8.49  23.06704  .23809522 4.75 19.789913        0 5.08 15.055298 .08333331
    2006 10.25  18.32777         0  8.45  22.75918  .08333337 5.35 19.919577        0 4.62  15.00832         0
    2007  8.66 17.855055 .08333331  7.66  22.43314 -.14285713 5.26  19.74264        0 4.62 15.214797         0
    2008  7.52  18.25556         0  7.06 22.562563          0 5.62 20.502636        0 5.78  15.98871         0
    2009  7.74   19.9917         0  8.74 24.080805  .14285713 7.54   22.0684        0 9.25 16.822023         0
    2010  6.97  19.56321         0  8.87 23.989256          0 7.79 21.641914        0 9.63 16.743158         0
    end

  • #2
    If I understand you correctly, you are working with a data set that has 100 x 200 = 20,000 variables and are looking to reorder it in two different ways. I honestly cannot imagine what can usefully be done with such a data set, in any order, nor why you would want to work with it in this wide layout.

    My approach to a data set like this would be to instantly -reshape- it into long layout (i.e. standard panel data) and do all further data management and analysis from there.
    Code:
    ds DEU_*
    local stubs `r(varlist)'
    local stubs: subinstr local stubs "DEU_" "@", all
    
    reshape long `stubs', i(ts) j(country) string
    encode country, gen(ncountry)
    xtset ncountry ts

    Comment


    • #3
      I agree with Clyde's advice.

      The experienced users here generally agree that, with few exceptions, Stata makes it much more straightforward to accomplish complex analyses using a long layout of your data rather than a wide layout of the same data. You should try to achieve what you need with the data organized as it was before you reshaped it to a wide layout at

      https://www.statalist.org/forums/for...ables-renaming

      and seek the help of Statalist if there are tasks you cannot figure out how to easily accomplish with a long layout. The sort of problems you will encounter trying to use your wide data will almost certainly be solved by reshaping the data back to long.

      Comment


      • #4
        Originally posted by Clyde Schechter View Post
        If I understand you correctly, you are working with a data set that has 100 x 200 = 20,000 variables and are looking to reorder it in two different ways. I honestly cannot imagine what can usefully be done with such a data set, in any order, nor why you would want to work with it in this wide layout.

        My approach to a data set like this would be to instantly -reshape- it into long layout (i.e. standard panel data) and do all further data management and analysis from there.
        Code:
        ds DEU_*
        local stubs `r(varlist)'
        local stubs: subinstr local stubs "DEU_" "@", all
        
        reshape long `stubs', i(ts) j(country) string
        encode country, gen(ncountry)
        xtset ncountry ts
        Thanks for the suggestion. My motivation is of two type, why I want to work with that dataset

        First, it is part of the task to produce a dataset of the general.

        Second, I am having a code in matlab for a procedure which stata is not able to perform, yet Format needs to be in wide as in #1.

        The variables are within the stata matrix size, in the dataset, and already in wide stata format. I just need to do a reordering as in #1.

        So, to keep it short, no need to produce new variables, reshape in long or other. Just a simple reordering loop.

        Thank you very much for your time and help

        Mario
        Last edited by Mario Ferri; 26 Jul 2022, 16:09.

        Comment


        • #5
          First, it is part of the task to produce a dataset of the general.
          OK, but the long data set is just as complete as the wide version, and is at least usable within Stata itself.

          Second, I am having a code in matlab for a procedure which stata is not able to perform, yet Format needs to be in wide as in #1.
          OK, this is a reason for doing what you ask. It seems from your description that all that matters is the grouping by country and the grouping by attribute (variable), but to assure greater consistency, I have also sorted the countries and the groupings alphabetically within their opposite groups.
          Code:
          //  GROUPING SEPARATELY BY VARIABLE
          ds DEU_*
          local vbles `r(varlist)'
          local vbles: subinstr local vbles "DEU_" "", all
          local vbles: list sort vbles
          
          local n_vbles: word count `vbles'
          
          foreach i of numlist `n_vbles' (-1) 1 {
              local var: word `i' of `vbles'
              order *_`var', alpha after(ts)
          }
          
          
          //  GROUPING SEPARATELY BY COUNTRY
          ds *_u
          local countries `r(varlist)'
          local countries: subinstr local countries "_u" "", all
          local countries: list sort countries
          
          local n_countries: word count `countries'
          
          foreach i of numlist `n_countries' (-1) 1 {
              local ctry: word `i' of `countries'
              order `ctry'_*, alpha after(ts)
          }

          Comment


          • #6
            Originally posted by Clyde Schechter View Post
            OK, but the long data set is just as complete as the wide version, and is at least usable within Stata itself.


            OK, this is a reason for doing what you ask. It seems from your description that all that matters is the grouping by country and the grouping by attribute (variable), but to assure greater consistency, I have also sorted the countries and the groupings alphabetically within their opposite groups.
            Code:
            // GROUPING SEPARATELY BY VARIABLE
            ds DEU_*
            local vbles `r(varlist)'
            local vbles: subinstr local vbles "DEU_" "", all
            local vbles: list sort vbles
            
            local n_vbles: word count `vbles'
            
            foreach i of numlist `n_vbles' (-1) 1 {
            local var: word `i' of `vbles'
            order *_`var', alpha after(ts)
            }
            
            
            // GROUPING SEPARATELY BY COUNTRY
            ds *_u
            local countries `r(varlist)'
            local countries: subinstr local countries "_u" "", all
            local countries: list sort countries
            
            local n_countries: word count `countries'
            
            foreach i of numlist `n_countries' (-1) 1 {
            local ctry: word `i' of `countries'
            order `ctry'_*, alpha after(ts)
            }

            Yes, all that matters is the grouping by country and the grouping by attribute (variable). The code provided works like a charm!

            Thank you so much for one more time, Clyde.

            Always have I said, this is an excellent forum composed of awesome people willing to offer their advice patiently for free.

            I have been greatly benefited from this forum every time I've posted a question and cannot thank enough!

            Thank you wholeheartedly for everything!
            Best,
            Mario

            Comment

            Working...
            X