Creating a more efficient loop for generating multiple datasets

Andy El-Zayaty

Join Date: Jun 2019

Posts: 17
#1

22 Jun 2019, 15:19

Hello Statalist,

I am in the process of creating 718 different data files from one master dataset. I am running a basic loop, keeping the variables it creates, saving them as their own .dta file, and then starting again. The way I'm doing it is not efficient, though, and I'm wondering if there is a better way.

Currently I am running the following code:

Code:

use "S:\elzayaty\Mobility paper\INDUSTRY DIVERSIFICATION DISTANCE DATA.dta", clear
forval x = 1/718 {
    bysort investorid: gen prod_1`x' = ever_1 * ever_`x'
}
keep investorid prod*
save "S:\elzayaty\Mobility paper\prod1.dta", replace

**************************************************

use "S:\elzayaty\Mobility paper\INDUSTRY DIVERSIFICATION DISTANCE DATA.dta", clear
forval x = 1/718 {
    bysort investorid: gen prod_2`x' = ever_2 * ever_`x'
}
keep investorid prod*
save "S:\elzayaty\Mobility paper\prod2.dta", replace

As you can see, this is a fiddly thing to do 718 times. Not only is it a lot of code, but for each block I also need to change the variable names from prod_1`x' to prod_2`x', ever_1 to ever_2, and the filename from prod1.dta to prod2.dta, and so on for all 718 iterations.

Any thoughts on how to address this would be greatly appreciated!

-Andy
Clyde Schechter

Join Date: Apr 2014

Posts: 30091
#2

22 Jun 2019, 17:42

I'm not sure I understand what you are trying to do, but if I do have it right, you need a loop inside a loop. In the following code, I have also taken a few steps to try to minimize the amount of thrashing to disk required. Also, I have eliminated the -bysort investorid:- prefix because it isn't doing anything with that command: you will get the same results without grouping by investorid and you will save the time involved in sorting.

Code:

clear*
use "S:\elzayaty\Mobility paper\INDUSTRY DIVERSIFICATION DISTANCE DATA.dta", clear

forvalues i = 1/718 {
    // preserve the master data; return code 621 means it is already preserved
    capture preserve
    if !inlist(c(rc), 0, 621) {
        display as error "Unexpected error preserving data set"
        exit c(rc)
    }
    // products of ever_`i' with every ever_`j'
    forvalues j = 1/718 {
        gen prod_`i'`j' = ever_`i'*ever_`j'
    }
    keep investorid prod*
    save "S:\elzayaty\Mobility paper\prod`i'.dta", replace
    // bring back the master data and keep it preserved for the next pass
    restore, preserve
}

Note: Not tested, beware of typos or other errors.
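For anyone who wants to try the pattern before running it on the real file, here is a small self-contained sketch on made-up data. The toy dataset, the use of 3 indicators instead of 718, and the tempfile names out1 to out3 are all assumptions for illustration; nothing here comes from the actual data. It also calls -preserve- once before the loop rather than inside it, which is a simplification of the code above, and it checks that the -bysort investorid:- prefix makes no difference to a row-wise product.

```stata
* Toy data: 4 investors and 3 made-up 0/1 indicators (the real data has 718)
clear
set obs 4
gen investorid = _n
forvalues k = 1/3 {
    gen byte ever_`k' = mod(_n + `k', 2)
}

* The -bysort investorid:- prefix does not change a row-wise product
bysort investorid: gen byte with_by = ever_1 * ever_2
gen byte without_by = ever_1 * ever_2
assert with_by == without_by
drop with_by without_by

* Hypothetical temporary files standing in for prod1.dta ... prod3.dta
forvalues i = 1/3 {
    tempfile out`i'
}

preserve                       // snapshot the toy master once
forvalues i = 1/3 {
    forvalues j = 1/3 {
        gen byte prod_`i'`j' = ever_`i' * ever_`j'
    }
    keep investorid prod_*
    save "`out`i''"
    restore, preserve          // back to the master, still preserved
}
restore, not                   // drop the snapshot; master stays in memory
```

Each temporary file should then hold investorid plus the three products for that value of i. In a real run, the -save- line would point at the actual prod`i'.dta path with the -replace- option, as in the code above.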
Andy El-Zayaty

Join Date: Jun 2019

Posts: 17
#3

22 Jun 2019, 21:10

Hi Clyde,

It looks like this code has worked exactly as I needed it to! This is incredibly helpful. Really appreciate your response!

Thanks,
Andy