  • Creating a more efficient loop for generating multiple datasets

    Hello Statalist,

    I am in the process of creating 718 different data files based on one master dataset. I am running a basic loop that generates a set of variables, keeping only those variables, saving them as their own .dta file, and then starting again from the master dataset. The way I'm doing it is not efficient, though, and I'm wondering if there is a better way.

    Currently I am running the following code:
    Code:
    use "S:\elzayaty\Mobility paper\INDUSTRY DIVERSIFICATION DISTANCE DATA.dta", clear
    
    forval x = 1/718 {
          bysort investorid: gen prod_1`x' = ever_1 * ever_`x'
    }
     
    keep investorid prod*
     
    save "S:\elzayaty\Mobility paper\prod1.dta", replace
    **************************************************
    
    use "S:\elzayaty\Mobility paper\INDUSTRY DIVERSIFICATION DISTANCE DATA.dta", clear
    
    forval x = 1/718 {
          bysort investorid: gen prod_2`x' = ever_2 * ever_`x'
    }
     
    keep investorid prod*
     
    save "S:\elzayaty\Mobility paper\prod2.dta", replace
    As you can see, this is a fiddly thing to do 718 times. Not only is it a lot of code, but for each block I also need to change the variable names from prod_1`x' to prod_2`x', the multiplier from ever_1 to ever_2, and the filename from prod1.dta to prod2.dta, and so on for 718 iterations.

    Any thoughts on how to address this would be greatly appreciated!

    -Andy

  • #2
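    I'm not sure I understand what you are trying to do, but if I have it right, you need a loop inside a loop: the outer loop runs over the 718 output files, and the inner loop runs over the 718 product variables in each file. In the following code, I have also taken a few steps to minimize disk thrashing: the master dataset is preserved once and restored from that copy on each pass, rather than being re-read with -use- every time. I have also eliminated the -bysort investorid:- prefix because it does nothing with that command: the product is computed within each observation, so you will get the same results without grouping by investorid, and you will save the time spent sorting.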

    Code:
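    clear*
    use "S:\elzayaty\Mobility paper\INDUSTRY DIVERSIFICATION DISTANCE DATA.dta", clear
    // Outer loop: one output file per value of i
    forvalues i = 1/718 {
        // Preserve the master data once; rc 621 ("already preserved") is expected on later passes
        capture preserve
        if !inlist(c(rc), 0, 621) {
            display as error "Unexpected error preserving data set"
            exit c(rc)
        }
        // Inner loop: product of ever_i with each ever_j
        forvalues j = 1/718 {
            gen prod_`i'`j' = ever_`i'*ever_`j'
        }
        keep investorid prod*
        save "S:\elzayaty\Mobility paper\prod`i'.dta", replace
        // Bring back the master data and keep the preserved copy for the next pass
        restore, preserve
    }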
    Note: Not tested, beware of typos or other errors.
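
    If you want to convince yourself of the two claims above before running all 718 x 718 products, here is a minimal sketch on made-up data. Everything in it is an assumption for illustration only: the toy dataset (20 observations, 5 investors, 3 ever_* indicators), the names with_by and without_by, and the use of tempfiles in place of your real output paths. It checks that the -bysort- prefix does not change the result, then runs the same nested-loop pattern at a 3 x 3 scale with a plain -preserve-/-restore- pair, which is simpler and costs nothing noticeable at this size.

    ```stata
    * Toy data only (NOT the real master file): 5 investors, 4 rows each
    clear
    set seed 12345
    set obs 20
    gen investorid = ceil(_n/4)
    forvalues k = 1/3 {
        gen byte ever_`k' = runiform() < 0.5
    }

    * Claim 1: the by-group prefix does not change a within-observation product
    bysort investorid: gen with_by = ever_1 * ever_2
    gen without_by = ever_1 * ever_2
    assert with_by == without_by
    drop with_by without_by

    * Claim 2: the loop-inside-a-loop pattern, at 3 x 3 instead of 718 x 718
    forvalues i = 1/3 {
        preserve
        forvalues j = 1/3 {
            gen prod_`i'`j' = ever_`i' * ever_`j'
        }
        keep investorid prod_*
        * tempfile stands in for the real "prod`i'.dta" path
        tempfile out`i'
        save "`out`i''"
        restore
    }

    * Inspect one of the saved files
    use "`out2'", clear
    describe
    ```

    If the -assert- passes silently and -describe- lists investorid plus prod_21, prod_22 and prod_23, the pattern is behaving as intended on the toy data.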



    • #3
      Hi Clyde,

      It looks like this code has worked exactly as I needed it to! This is incredibly helpful. Really appreciate your response!

      Thanks,
      Andy
