  • Help: parallelize loops

    Dear all,

    I hope all is well. I have searched Statalist and other forums for a solution to the problem below, without success. Perhaps you can help?

    I would like to use "parallel" to do the following task:
    1. Load file A and merge it with file B
    2. Do some computations
    3. Save the result as file_A_B.dta

    After that, I append all observations for firm A, B, etc.

    This means matching each firm A against roughly 15k other firms, and doing so for every firm, so it is a large computational task!

    I tried parallel append, but it seems useful only for opening files that are already named file_A_B (where A and B are firms), not for loading one file and merging it onto another.

    Any idea whether this is possible? Below is the loop I am running now (which will take ages). Inside the loop I create and then delete a folder for each firm, because running it for all firms at once jams the memory.
    Code:
    forvalues i=1/14065 { /*this loop matches and compares each category between firm i and firm j, and creates object T, which indicates a match*/
    mkdir "Firmactivities/Technology/firm`i'/"    
        forvalues v=1/14065 {
            use "Firmactivities/Technology/Baseline/firm`i'.dta", clear
            merge Cat using "Firmactivities/Technology/Baseline/firm`v'.dta"
            drop _merge
            gen match = 1 if WIPO`i' == WIPO`v' & !missing(WIPO`i')
            drop WIPO* Cat
            gen id = _n
            egen max = max(id)
            egen sum_m = sum(match)
            gen T = sum_m/max
            drop match id max sum_m
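            /*rename the two firm identifiers; when i == v the second rename fails because ID`v' was already renamed, so ID_2 is set to the placeholder "X" (the gen fails harmlessly when ID_2 already exists)*/
            capture noisily rename ID`i' ID_1
            capture noisily rename ID`v' ID_2
            capture noisily gen ID_2 = "X"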
            duplicates drop
            save "Firmactivities/Technology/firm`i'/firm`i'_`v'.dta", replace
        }
    
    
    use "Firmactivities/Technology/firm`i'/firm`i'_1.dta", clear
    save "Firmactivities/T/FirmT`i'.dta", replace
    
     /*this loop appends all object T's for a particular firm i and firm j*/
        forvalues v=2/14065 {
            use "Firmactivities/T/FirmT`i'.dta", clear
            append using "Firmactivities/Technology/firm`i'/firm`i'_`v'.dta", force
            save "Firmactivities/T/FirmT`i'.dta", replace
        }
        
    drop if missing(ID_2)
    order ID_1 ID_2 T
    sort ID_1 ID_2
    save "Firmactivities/T/FirmT`i'.dta", replace    
    
    shell rmdir "Firmactivities/Technology/firm`i'/" /s /q    
    }
    
    use "Firmactivities/T/FirmT1.dta", clear
    save "Firmactivities/FirmT.dta", replace
    
    forvalues i=2/14065 {/*this loop puts together all corresponding files*/
        use "Firmactivities/FirmT.dta", clear
        append using "Firmactivities/T/FirmT`i'.dta", force
        save "Firmactivities/FirmT.dta", replace
        }
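
    As a side note, the final step can probably be written without reloading and resaving FirmT.dta on every pass. A minimal sketch, assuming all FirmT`i'.dta files already exist and the combined file fits in memory (I have not tested this on the full data):
    Code:
    clear /*start from an empty dataset so every file, including firm 1, is appended*/
    forvalues i=1/14065 {
        append using "Firmactivities/T/FirmT`i'.dta", force
    }
    save "Firmactivities/FirmT.dta", replace

    This only removes the repeated use/save in the last loop; it does not parallelize the pairwise merges, which is the part I still need help with.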


    Many thanks in advance!

    Best,

    Casper
