Help: parallelize loops

Casper Frolijk

Join Date: Dec 2018
Posts: 2


26 May 2023, 04:13

Dear all,

I hope all is well. I have searched Statalist and other forums for a solution to the problem below, without success. Perhaps you can help?

I would like to use "parallel" to do the following for each pair of firms A and B:

1. Load file A and merge it with file B
2. Do some computations
3. Save the result as file_A_B.dta

After that, I append the resulting observations for firm A, firm B, etc.

Each firm A is compared with roughly 15k other firms (14,065 in my loop), and this is repeated for every firm. That is about 198 million merges, so it is a large computational task!

I tried parallel append, but it seems useful only for opening existing files named file_A_B (where A and B are firms), not for loading one file and merging it onto another.
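For concreteness, here is a rough, untested sketch of how the outer loop over firms might be handed to the user-written parallel package instead. The program name firm_worker, the do-file process_firm.do (which would hold the body of the outer loop, taking the firm index as its first argument), the variable name firm, and the choice of 4 child processes are all hypothetical. It also assumes the child processes run in the same working directory as the parent, which should be checked.

```stata
* Rough sketch, not tested on the real data. Assumes the user-written
* -parallel- package is installed (ssc install parallel).
parallel initialize 4    // 4 children is arbitrary; older versions use -parallel setclusters-

* Hypothetical worker: each child process receives a subset of the rows
* created below and runs the per-firm steps for every firm index it holds.
capture program drop firm_worker
program define firm_worker
    levelsof firm, local(firms)
    preserve
    foreach i of local firms {
        * process_firm.do is a hypothetical do-file containing the body of
        * the outer loop, with the firm index read from its first argument
        do "process_firm.do" `i'
    }
    restore
end

* One row per firm; -parallel- splits these rows across the child processes.
clear
set obs 14065
generate long firm = _n
parallel, programs(firm_worker): firm_worker
```

The idea is that the dataset in memory only serves as a work list: each child loops over its own share of firm indices and writes its FirmT files to disk, so nothing has to be merged back by parallel itself.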

Any idea whether this is possible? Below is the loop I am running now (which will take ages). Inside the loop I create and then delete a folder for each firm, because running it for all firms at once jams the memory.

Code:

/* this loop matches and compares each category between firm i and firm v, and creates object T, which indicates a match */
forvalues i=1/14065 {
    mkdir "Firmactivities/Technology/firm`i'/"
    forvalues v=1/14065 {
        use "Firmactivities/Technology/Baseline/firm`i'.dta", clear
        merge Cat using "Firmactivities/Technology/Baseline/firm`v'.dta"
        drop _merge
        gen match = 1 if WIPO`i' == WIPO`v' & !missing(WIPO`i')
        drop WIPO* Cat
        gen id = _n
        egen max = max(id)
        egen sum_m = sum(match)
        gen T = sum_m/max
        drop match id max sum_m
        capture noisily rename ID`i' ID_1
        capture noisily rename ID`v' ID_2
        capture noisily gen ID_2 = "X"
        duplicates drop
        save "Firmactivities/Technology/firm`i'/firm`i'_`v'.dta", replace
    }


    use "Firmactivities/Technology/firm`i'/firm`i'_1.dta", clear
    save "Firmactivities/T/FirmT`i'.dta", replace

    /* this loop appends the T objects of all firms v for a particular firm i */
    forvalues v=2/14065 {
        use "Firmactivities/T/FirmT`i'.dta", clear
        append using "Firmactivities/Technology/firm`i'/firm`i'_`v'.dta", force
        save "Firmactivities/T/FirmT`i'.dta", replace
    }
    
    drop if missing(ID_2)
    order ID_1 ID_2 T
    sort ID_1 ID_2
    save "Firmactivities/T/FirmT`i'.dta", replace

    shell rmdir "Firmactivities/Technology/firm`i'/" /s /q
}

use "Firmactivities/T/FirmT1.dta", clear
save "Firmactivities/FirmT.dta", replace

forvalues i=2/14065 { /* this loop puts together all corresponding files */
    use "Firmactivities/FirmT.dta", clear
    append using "Firmactivities/T/FirmT`i'.dta", force
    save "Firmactivities/FirmT.dta", replace
}
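
A possibly cheaper variant of this last step, as a minimal sketch: keep the growing dataset in memory and save it once at the end, rather than reloading and re-saving FirmT.dta on every pass. It uses the same paths as above and assumes the combined data fit in memory. I have not timed it on the real files.

```stata
* Sketch: append every per-firm file in memory and write to disk once.
* Assumes the FirmT files already exist and the combined data fit in memory.
use "Firmactivities/T/FirmT1.dta", clear
forvalues i = 2/14065 {
    append using "Firmactivities/T/FirmT`i'.dta", force
}
save "Firmactivities/FirmT.dta", replace
```

The same pattern would apply to the inner loop that builds each FirmT file, since that loop also reloads and re-saves the accumulated file on every pass.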

Many thanks in advance!

Best,

Casper
