
  • -sreshape-

    I just installed -sreshape- on Stata MP. Does anyone know if it has a maximum variable limit? I have 7,500 variables in wide format that I want to -sreshape- into long, but I keep getting an error message.

  • #2
    -sreshape- is a user-contributed command. What it can or cannot do may be documented in the help file. Since you have posted the same sort of question multiple times now, perhaps you may need to consider performing your reshape in discrete subsets of variables, and then merge the results together. I am away from my computer now but you might try from this short description.



    • #3
      From -help limits-: the maximum number of elements in a numeric list (numlist) is 2,500.

      Searching for numlist in sreshape.ado shows:

      sreshape.ado 218: numlist "`cjv'", sort

      The local cjv holds the list of j values,

      so more than 2,500 j values will fail:

      Code:
      clear
      set obs 2501
      gen x = _n
      gen j= _n
      gen i = 1
      
      reshape wide x, i(i) j(j)
          
      sreshape long
      
      
      Data                               long   ->   wide
      -----------------------------------------------------------------------------
      Number of obs.                     2501   ->       1
      Number of variables                   3   ->    2502
      j variable (2501 values)              j   ->   (dropped)
      xij variables:
                                            x   ->   x1 x2 ... x2501
      -----------------------------------------------------------------------------
      
      .         
      . sreshape long
      invalid numlist has too many elements



      • #4
        See also my post just now at your earlier topic at
        https://www.statalist.org/forums/for...omputing-speed



        • #5
          Leonardo, good to know. I realized that sreshape works faster with fewer j elements. I'm trying to split my dataset "horizontally" as you suggested; however, I'm having problems.

          I've tried variations of this so far:

          Suppose I have variables named x1 x2 x3 ... x2000 and set the following local macro and forvalues loop (which obviously didn't work):

          local j 365
          forv i=366(365)2557{
          use id x(`i'-`j')-x`i' using abcd.dta
          sreshape long x,i(id) j(day)
          save abcd`i'.dta
          }

          So for the first iteration, i=366, I want to read from the master dataset abcd.dta the variable id and x1 through x366.


          Is it possible to specify the above -use- command by "subtracting" macros to build an x variable name? I tried a few formats, but none worked.

          Any thoughts or alternative approaches?

          Thanks



          • #6
            Below are two examples of splitting your data horizontally in the way that you are attempting to do. It appears that you are trying to split a year's worth of data at a time, so the second example allows you to handle leap years.
            Code:
            . forvalues i=1(365)2557 {
              2. local j = min(`i'+364,2557)
              3. display "reading x`i' through x`j'"
              4. // use id x`i'-x`j' using abcd.dta
            . // ...
            . }
            reading x1 through x365
            reading x366 through x730
            reading x731 through x1095
            reading x1096 through x1460
            reading x1461 through x1825
            reading x1826 through x2190
            reading x2191 through x2555
            reading x2556 through x2557
            
            . 
            . local i 1
            
            . local j 0
            
            . foreach y in 365 365 366  365 365 365 366 {
              2. local j = `j'+`y'
              3. display "reading x`i' through x`j'"
              4. // use id x`i'-x`j' using abcd.dta
            . // ...
            . local i = `i'+`y'
              5. }
            reading x1 through x365
            reading x366 through x730
            reading x731 through x1096
            reading x1097 through x1461
            reading x1462 through x1826
            reading x1827 through x2191
            reading x2192 through x2557
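
            The chunked approach above can be sketched end to end as a do-file. This is only a sketch under stated assumptions: the toy data (3 ids, x1 to x20), the chunk width of 7, and the tempfile names are all hypothetical, and the built-in -reshape- stands in for -sreshape- so that the example runs without any user-written package.

```stata
* Hypothetical toy data standing in for abcd.dta: 3 ids, x1..x20
clear
set obs 3
generate id = _n
forvalues k = 1/20 {
    generate x`k' = id*100 + `k'
}
tempfile wide long
save "`wide'"

local J 20      // total number of x variables (2557 in the thread)
local step 7    // chunk width (365 in the thread)

forvalues i = 1(`step')`J' {
    local j = min(`i' + `step' - 1, `J')
    * read only id and this chunk of x variables
    use id x`i'-x`j' using "`wide'", clear
    reshape long x, i(id) j(day)
    * stack the chunks already reshaped, then save the running result
    if `i' > 1 {
        append using "`long'"
    }
    save "`long'", replace
}

sort id day
assert _N == 3*`J'
assert x == id*100 + day
```

            Because each chunk is already in long form, the pieces are combined with -append- here rather than -merge-. The first -save, replace- only prints a note that the temporary file does not exist yet.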



            • #7
              thanks!!!



              • #8
                Have you tested -greshape-?

                Below are some timings of alternatives that do not split the wide variables. I made a small modification to sreshape (sreshape_mod) so that it works with J > 2500.
                J=2557 and J=7600 are based on information given in your recent post "stata mp 15 computing speed".

                In the thread "Making reshape faster", paulvonhippel gives a reshape long solution based on expand and replace.

                A variant of that "expand replace" strategy for reshape long is used below and compared with the alternatives: reshape, greshape, fastreshape, sreshape, and a modified sreshape (sreshape_mod) that allows J > 2500 (the numlist limit).

                The wide datasets have J=7600, J=2557 and J=2500 byte variables, and all have N=2 observations. More records should favor the expand replace approach, and in Stata/MP replace is 100% parallelized, which adds to the benefit of using Stata/MP.

                This is a quick and dirty comparison on small (N=2) but very wide data:
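
                The code behind the timings is not posted in this thread, so the following is only a minimal sketch of the general expand/replace idea, not the variant that was benchmarked. The toy data (N=2, J=5) are hypothetical, and the names v, id and i are taken from the commands in the timing table.

```stata
* Hypothetical toy wide data: N=2 ids, v1..v5
clear
set obs 2
generate id = _n
forvalues k = 1/5 {
    generate byte v`k' = id*10 + `k'
}
local J 5

expand `J'                    // J copies of every wide row
bysort id: generate i = _n    // i = 1..J within each id
generate v = .
forvalues k = 1/`J' {
    * pick the k-th wide variable for the row where i == k
    replace v = v`k' if i == `k'
}
drop v1-v`J'
sort id i

assert _N == 2*`J'
assert v == id*10 + i
```

                This sketch assumes the j values are the consecutive integers 1..J and that id uniquely identifies rows in the wide data.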

                Code:
                Stata MP4, N=2 (ID), timings for 5 repetitions 
                --------------------------------------------------------------------------------
                      J=7600    J=2557   J=2500  
                --------------------------------------------------------------------------------
                 1:   731.43    80.06     53.14  reshape long v, i(id) j(i)
                 2:    36.78     5.92      4.21  greshape long v, i(id) j(i)
                 3:     6.19     0.88      0.52  ad-hoc: expand replace
                 4:   284.17    58.81     40.43  fastreshape long v, i(id) j(i)  
                 5:   145.96    26.68     18.75  sreshape_mod long v, i(id) j(i) nopreserve  
                51:     0.00     0.00     19.06  sreshape long v, i(id) j(i) nopreserve
                --------------------------------------------------------------------------------
                * the modification to sreshape basically avoids the numlist command by setting
                locals cjv and otherjordered from globals defined in the calling do-file as:
                          
                mata : st_global("cjv", invtokens(strofreal(1..`J')))
                mata : st_global("otherjordered", invtokens(strofreal(`J'..2)))
                /* packages installed 2019-06-02

                net install gtools, ///
                from(https://raw.githubusercontent.com/mc.../master/build/)

                ssc install fastreshape

                net install dm0090.pkg , from(http://www.stata-journal.com/software/sj16-3)

                */
                Last edited by Bjarte Aagnes; 02 Jun 2019, 13:11.



                • #9
                  Bjarte Aagnes I am really interested in this benchmark. Can you post the code? If not, I would be curious to know what happens if you run

                  Code:
                   
                   greshape long v, i(id) j(i) nochecks
                  greshape has a couple of checks that slow it down somewhat: it uses preserve/restore, it sorts the data, and it checks for duplicates. I doubt it will reach the speed of the ad-hoc expand/replace but I'd be curious to know how much speed it gains. Cheers!



                  • #10
                    Below are timings including the nochecks option of greshape. (Timings may differ from the previous run because more tasks were running on the system.)
                    Code:
                    Stata MP4, N=2 (ID), mean timings for 10 rep
                    ---------------------------------------------------------------------------------------------------------------
                             J =       7600    2557      2500      1000     250 
                    ---------------------------------------------------------------------------------------------------------------
                       1:   10 =     177.91   17.68     17.20      4.28    0.88   reshape long v, i(id) j(i)
                       2:   10 =       8.03    1.16      1.10      0.27    0.07   greshape long v, i(id) j(i)
                      21:   10 =       7.82    1.14      1.06      0.26    0.07   greshape long v, i(id) j(i) nochecks 
                       3:   10 =       1.40    0.17      0.17      0.03    0.01   ad-hoc: expand replace
                       4:   10 =      64.77   11.48     11.64      3.98    0.94   fastreshape long v, i(id) j(i)  
                       5:   10 =      33.98    5.58      5.34      1.62    0.36   sreshape_mod long v, i(id) j(i) nopreserve    
                      51:   10 =         NA      NA      5.42      1.65    0.37   sreshape long v, i(id) j(i) nopreserve    
                    ---------------------------------------------------------------------------------------------------------------



                    • #11
                      Bjarte Aagnes Ah, I hadn't quite noticed the small N. How are you doing expand, replace by group? I tried a version of it and performance deteriorates pretty quickly as N grows.
