-sreshape-

Vishal Sharma

Join Date: Sep 2018

Posts: 60
#1

-sreshape-

31 May 2019, 17:59

I just installed -sreshape- on Stata MP. Does anyone know whether it has a maximum variable limit? I have 7,500 variables in wide format that I want to -sreshape- into long, but I keep getting an error message.
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2400
#2

01 Jun 2019, 11:46

-sreshape- is a user-contributed command. What it can or cannot do may be documented in the help file. Since you have posted the same sort of question multiple times now, perhaps you may need to consider performing your reshape in discrete subsets of variables, and then merge the results together. I am away from my computer now but you might try from this short description.

Bjarte Aagnes

Join Date: Apr 2014
Posts: 783

01 Jun 2019, 13:11

-help limits- gives the maximum number of elements in a numeric list as 2,500.

Looking for numlist in sreshape.ado shows:

sreshape.ado 218: numlist "`cjv'", sort

The local cjv holds the j values, so more than 2,500 j values will fail:

Code:

clear
set obs 2501
gen x = _n
gen j= _n
gen i = 1

reshape wide x, i(i) j(j)
    
sreshape long


Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                     2501   ->       1
Number of variables                   3   ->    2502
j variable (2501 values)              j   ->   (dropped)
xij variables:
                                      x   ->   x1 x2 ... x2501
-----------------------------------------------------------------------------

.         
. sreshape long
invalid numlist has too many elements


William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

01 Jun 2019, 13:12

See also my post just now at your earlier topic at
https://www.statalist.org/forums/for...omputing-speed
Vishal Sharma

Join Date: Sep 2018

Posts: 60
#5

01 Jun 2019, 18:15

Leonardo, good to know. I realized that sreshape works faster with fewer j elements. I'm trying to split my dataset "horizontally" as you suggested; however, I'm having problems.

I've tried variations of this so far:

Suppose I have variables named x1 x2 x3 ... x2000 and set the following local macro and forvalues loop (which obviously didn't work):

local j 365
forv i=366(365)2557{
use id x(`i'-`j')-x`i' using abcd.dta
sreshape long x,i(id) j(day)
save abcd`i'.dta
}

So for the first iteration, i=366, I want to read from the master dataset abcd.dta the variable id and x1 to x366.

Is it possible to specify the above -use- command by "subtracting" macros to specify an x variable? I tried a few formats but none worked.

Any thoughts or alternative approaches?

Thanks

William Lisowski

Join Date: Dec 2014
Posts: 10150

01 Jun 2019, 18:53

Below are two examples of splitting your data horizontally in the way that you are attempting to do. It appears that you are trying to split a year's worth of data at a time, so the second example allows you to handle leap years.

Code:

. forvalues i=1(365)2557 {
  2. local j = min(`i'+364,2557)
  3. display "reading x`i' through x`j'"
  4. // use id x`i'-x`j' using abcd.dta
. // ...
. }
reading x1 through x365
reading x366 through x730
reading x731 through x1095
reading x1096 through x1460
reading x1461 through x1825
reading x1826 through x2190
reading x2191 through x2555
reading x2556 through x2557

. 
. local i 1

. local j 0

. foreach y in 365 365 366  365 365 365 366 {
  2. local j = `j'+`y'
  3. display "reading x`i' through x`j'"
  4. // use id x`i'-x`j' using abcd.dta
. // ...
. local i = `i'+`y'
  5. }
reading x1 through x365
reading x366 through x730
reading x731 through x1096
reading x1097 through x1461
reading x1462 through x1826
reading x1827 through x2191
reading x2192 through x2557
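
As a rough sketch of how the first loop above could be completed, the chunks can be reshaped one at a time and stacked with append. This assumes abcd.dta contains id and x1-x2557 stored in that order, as described earlier in the thread; the tempfile name stacked and the local first are illustrative only, and -sreshape- could be substituted for -reshape- since each chunk stays well under 2,500 j values.

Code:

* Sketch only: assumes abcd.dta holds id and x1-x2557 in that order
tempfile stacked
local first 1
forvalues i = 1(365)2557 {
    local j = min(`i' + 364, 2557)
    * read only the id and the current block of x variables
    use id x`i'-x`j' using abcd.dta, clear
    reshape long x, i(id) j(day)
    * stack this block under the blocks already reshaped
    if !`first' {
        append using `stacked'
    }
    save `stacked', replace
    local first 0
}
sort id day

Because the long-format chunks cover different days for the same ids, they are combined with append rather than merge.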


Vishal Sharma

Join Date: Sep 2018

Posts: 60
#7

01 Jun 2019, 19:35

thanks!!!
Bjarte Aagnes

Join Date: Apr 2014

Posts: 783
#8

02 Jun 2019, 12:41

Have you tested -greshape-?

Below are some timings of alternatives that do not split the wide variables. I made a small modification to sreshape (sreshape_mod) so that it works with J > 2500.
J=2557 and J=7600 are based on information given in your recent post, "stata mp 15 computing speed".

In the thread "Making reshape faster", paulvonhippel gives a reshape long solution based on expand and replace.

A variant of the "expand replace" strategy for reshape long is compared below with the alternatives: reshape, greshape, fastreshape, sreshape, and a modified sreshape (sreshape_mod) that allows J > 2500 (the numlist limit).
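
For readers unfamiliar with the idea, here is a minimal sketch of an expand/replace reshape long. It is not the code that was benchmarked (that code is not posted in this thread). It assumes wide data with a unique id and variables v1, v2, ..., vJ; the value of J, the names i and v, and the double storage type are illustrative choices.

Code:

* Sketch only: assumes one observation per id and wide variables v1-vJ
local J 2557
* make J copies of every observation
expand `J'
* number the copies within id; this becomes the j variable
bysort id: gen long i = _n
* fill the long variable from the matching wide variable
gen double v = .
forvalues k = 1/`J' {
    quietly replace v = v`k' if i == `k'
}
keep id i v

Note that expand temporarily holds J copies of all J wide variables, so memory use grows quickly with both N and J.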

The wide datasets have J=7600, J=2557 and J=2500 wide byte variables; all have N=2 observations. More records should favor the expand replace approach, and in Stata/MP replace is 100% parallelized, which adds to the benefit of using Stata/MP.

This is a quick and dirty comparison on small (N=2) but very wide data:

Code:

Stata MP4, N=2 (ID), timings for 5 repetitions
--------------------------------------------------------------------------------
        J=7600    J=2557    J=2500
--------------------------------------------------------------------------------
   1:   731.43     80.06     53.14   reshape long v, i(id) j(i)
   2:    36.78      5.92      4.21   greshape long v, i(id) j(i)
   3:     6.19      0.88      0.52   ad-hoc: expand replace
   4:   284.17     58.81     40.43   fastreshape long v, i(id) j(i)
   5:   145.96     26.68     18.75   sreshape_mod long v, i(id) j(i) nopreserve
  51:     0.00      0.00     19.06   sreshape long v, i(id) j(i) nopreserve
--------------------------------------------------------------------------------
* modification to sreshape was basically avoiding the numlist command,
  setting locals cjv and otherjordered to globals defined in the calling dofile as:

  mata : st_global("cjv", invtokens(strofreal(1..`J')))
  mata : st_global("otherjordered", invtokens(strofreal(`J'..2)))

/* packages installed 2019-06-02

net install gtools, ///
from(https://raw.githubusercontent.com/mc.../master/build/)

ssc install fastreshape

net install dm0090.pkg , from(http://www.stata-journal.com/software/sj16-3)

*/
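
To see what the two mata lines in the note above build, they can be run on their own with a small J. This is only an illustration of the strings they produce, not a copy of sreshape_mod; J = 5 is an arbitrary choice.

Code:

* Sketch only: shows the strings produced by the two mata lines for J = 5
local J 5
mata : st_global("cjv", invtokens(strofreal(1..`J')))
mata : st_global("otherjordered", invtokens(strofreal(`J'..2)))
* cjv is the ascending list 1 2 3 4 5
display "$cjv"
* otherjordered is the descending list 5 4 3 2
display "$otherjordered"

Building the lists in Mata sidesteps numlist and therefore its 2,500-element limit.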

Last edited by Bjarte Aagnes; 02 Jun 2019, 13:11.
Mauricio Caceres

Join Date: Sep 2015

Posts: 130
#9

02 Jun 2019, 20:29

Bjarte Aagnes I am really interested in this benchmark. Can you post the code? If not, I would be curious to know what happens if you run

Code:

greshape long v, i(id) j(i) nochecks

greshape has a couple of checks that slow it down somewhat: it uses preserve/restore, it sorts the data, and it checks for duplicates. I doubt it will reach the speed of the ad-hoc expand/replace but I'd be curious to know how much speed it gains. Cheers!

Bjarte Aagnes

Join Date: Apr 2014
Posts: 783

#10

03 Jun 2019, 07:39

Below are timings including the nochecks option of greshape (timings may differ from the previous ones because more tasks were running on the system).

Code:

Stata MP4, N=2 (ID), mean timings for 10 rep
---------------------------------------------------------------------------------------------------------------
         J =       7600    2557      2500      1000     250 
---------------------------------------------------------------------------------------------------------------
   1:   10 =     177.91   17.68     17.20      4.28    0.88   reshape long v, i(id) j(i)
   2:   10 =       8.03    1.16      1.10      0.27    0.07   greshape long v, i(id) j(i)
  21:   10 =       7.82    1.14      1.06      0.26    0.07   greshape long v, i(id) j(i) nochecks 
   3:   10 =       1.40    0.17      0.17      0.03    0.01   ad-hoc: expand replace
   4:   10 =      64.77   11.48     11.64      3.98    0.94   fastreshape long v, i(id) j(i)  
   5:   10 =      33.98    5.58      5.34      1.62    0.36   sreshape_mod long v, i(id) j(i) nopreserve    
  51:   10 =         NA      NA      5.42      1.65    0.37   sreshape long v, i(id) j(i) nopreserve    
---------------------------------------------------------------------------------------------------------------


Mauricio Caceres

Join Date: Sep 2015

Posts: 130
#11

03 Jun 2019, 09:01

Bjarte Aagnes Ah, I hadn't quite noticed the small N. How are you doing expand, replace by group? I tried a version of it and performance deteriorates pretty quickly as N grows.