parallel usage

Dave Airey

Join Date: Apr 2014

Posts: 407
#1

parallel usage

04 Aug 2025, 08:32

I sometimes run many regressions on each of 1000s of genes or proteins where subjects from an experiment are measured for each protein. I sometimes use Stata, sometimes R. In this case I'm using Stata because I want an interval censored Cox model. Stata offers a better implementation of this model in my opinion than R solutions. My question has to do with whether I can use the parallel package. Below is example code, that uses third party commands xsvmat and xframeappend to grab r(table) from each model and append into one results frame. xsvmat and xframeappend are from Roger Newson. To speed this up, I broke the 2755 genes into groups of 500 or so and ran those subsets in 5 separate Stata instances because I have a Windows laptop with 32gb of RAM and 14 cores. I manually appended the results into a single results file and ran model diagnostics on top biomarkers. While this approach worked just fine, I was curious about the use of parallel.

Has anyone had a similar problem dataset (long formatted dataset with many endpoints) and used parallel on a Windows machine to collect model results for each endpoint?

Code:

frame reset use mydata.dta, clear frame create cox_results local setA = "c.x1 i.x2" quietly { forvalues i = 1/2755 { noisily display `i' stintcox baseline_ntx `setA' if gene_id == `i', /// interval(ltime rtime) favorspeed xsvmat, from(r(table)') rowname(parm) names(col) idnum(`i') /// frame(outframe, replace) frame change cox_results xframeappend outframe, fast frame change default } } frame change cox_results save cox_setA_cov_results.dta, replace /* The mydata.dta variables are: gene_id subject ltime rtime baseline_ntx x1 x2 2755 gene_id, ~800 subjects measured for each protein (gene_id) A cox model is run for each gene_id using a for loop */

Last edited by Dave Airey; 04 Aug 2025, 08:49.
Tags: None
George Vega

Join Date: May 2014

Posts: 14
#2

04 Aug 2025, 09:28

Hi Dave,

You can certainly do something like we do in this example. You need to write a program that holds all the logic, iterating through gene ids after listing them. It would be something like this (my Stat is a bit rusty!):

Code:

prog def parfor /* some code to list available genes */ /* and store it in genelist */ foreach i in $genelist { ... your fancy code ... ... A step to save the i-th gene results... } end parallel initialize 10 parallel, prog(parfor): parfor /* Collect all the results */
1 like
Comment
Dave Airey

Join Date: Apr 2014

Posts: 407
#3

04 Aug 2025, 09:37

Thank you George Vega! So it is possible. I will give it a try. Cheers!
1 like
Comment

Dave Airey

Join Date: Apr 2014
Posts: 407

04 Aug 2025, 13:18

It worked great George Vega! Just trying 500 genes in a stintcox model with no covariates and using 8 cores, the parallel code ran in 53 seconds whereas the sequential code ran in 184 seconds, 3.5x difference. Pretty nice.

Code:

clear all
use mydata.dta, clear
keep if gene_id <= 500

program define parfor
levelsof gene, local(mygenes)
global mygenes = r(levels)
foreach i of global mygenes {
    stintcox baseline_ntx if gene == "`i'", /// 
        interval(ltime rtime) favorspeed
    xsvmat, from(r(table)') rowname(parm) names(col) idstr("`i'") /// 
        saving(.\parallel\gene_`i', replace)
}
end

timer clear
timer on 1
parallel initialize 8
parallel, prog(parfor): parfor
timer off 1
timer list

use mydata.dta, clear
keep if gene_id <= 500
levelsof gene, local(mygenes)
global mygenes = r(levels)
timer on 2
quietly {
foreach i of global mygenes {
    stintcox baseline_ntx if gene == "`i'", /// 
        interval(ltime rtime) favorspeed
    xsvmat, from(r(table)') rowname(parm) names(col) idstr("`i'") /// 
        saving(.\parallel\gene_`i', replace)
}
}
timer off 2

timer list

. timer list
1: 53.05 / 1 = 53.0550
2: 184.05 / 1 = 184.0460

Last edited by Dave Airey; 04 Aug 2025, 13:21.

Comment

Dave Airey

Join Date: Apr 2014

Posts: 407
#5

04 Aug 2025, 13:26

Hmmm.... this should be in StataNow! A nice complement to Stata MP, which I have and like. Oh, and I did check results against each other and they were the same.

Last edited by Dave Airey; 04 Aug 2025, 13:51.
Comment
Dave Airey

Join Date: Apr 2014

Posts: 407
#6

05 Aug 2025, 10:06

Actually when I checked more carefully by a merge operation with assert commands, I discovered that 4/500 of the genes' data had been split across cores. There are 800+ observations on the same subjects per gene. This results in models being estimated in part of the 800 observations per gene. So the parfor program needs more information to split the data across cores but by gene.
Comment
Felix Bittmann

Join Date: Aug 2018

Posts: 766
#7

05 Aug 2025, 11:12

I think you can solve the issue by just reloading the original dataset within your program. By doing so, each gene is tested on the original, unsplit dataset.

Best wishes

Stata 18.0 MP | ORCID | Google Scholar
Comment

Dave Airey

Join Date: Apr 2014
Posts: 407

05 Aug 2025, 11:36

Thanks Felix Bittmann , I solved this just now using a different method, but one illustrated in George Vega's gallery. I used one of the parallel macros "$pll_instance". I created a variable pid assigning an integer from 1 to the number of cores I wanted to each gene randomly. I then used that in the program to subset the data to a core. Now all my genes get processed and match the sequential results using a series of assert commands (not shown).

Code:

clear all
use mydata.dta, clear
keep if gene_id <= 500

bysort gene: generate pid = runiformint(1,8) if _n == 1
bysort gene: replace pid = pid[1]
save mytempdata.dta, replace

program define parfor
use if pid == $pll_instance using mytempdata.dta, clear
levelsof gene, local(mygenes)
global mygenes = r(levels)
foreach i of global mygenes {
    stintcox baseline_ntx if gene == "`i'", ///
        interval(ltime rtime) favorspeed
    xsvmat, from(r(table)') rowname(parm) names(col) idstr("`i'") ///
        saving(.\parallel\gene_`i', replace)
}
end

timer clear
timer on 1
parallel initialize 8
parallel, prog(parfor): parfor
timer off 1
timer list

use mytempdata.dta, clear
levelsof gene, local(mygenes)
global mygenes = r(levels)
timer on 2
quietly {
foreach i of global mygenes {
    stintcox baseline_ntx if gene == "`i'", ///
        interval(ltime rtime) favorspeed
    xsvmat, from(r(table)') rowname(parm) names(col) idstr("`i'") ///
        saving(.\parallel\gene_`i', replace)
}
}
timer off 2

timer list

Last edited by Dave Airey; 05 Aug 2025, 11:40.

Comment

Dave Airey

Join Date: Apr 2014

Posts: 407
#9

05 Aug 2025, 12:54

Actually, George pointed out offline there is a by option that keeps the by variable from being split up across child processes. Slapped forehead. That will work too.
Comment
Demetrio Guzzardi

Join Date: Nov 2017

Posts: 50
#10

11 Sep 2025, 06:31

Hello everyone,

I am also trying to use the parallel function but I am stuggling to make it work. Could you point me what am I missing?

I am running a very simple code like this one:

Code:

clear all set obs 1000 gen y=rnormal() gen x1=rnormal() gen x2=rnormal() program define parfor foreach n of numlist 1/100 { reghdfe y x1 x2 estimates save "$path/OUTPUT/try_loop/try_`n'", replace } end parallel initialize 4 parallel, prog(parfor): parfor

But I keep having an error as one of the child (or more) stops because cannot write the .ster file. Moreover, from the log files, it seems to me that each child performs all the loop, while I was expecting some child to do some part like from n=1/25 , the other n=25/50 etc.

Is there something I am doing wrong and something I am not grasping from how parallel work?
Comment

Felix Bittmann

Join Date: Aug 2018
Posts: 766

#11

11 Sep 2025, 08:11

The problem is that parallel works a bit differently as you suppose in this example. parallel cannot magically read your dofile and find all relevant loops and then split your file up intelligently (sadly). What it does, it splits the dataset into 4 parts and then applies your code to all. Then it recombines the results into a single dataset. This can be helpful, but in your example. you want to distribute the loop across multiple threads but using the original complete datafile. This could be done as such:

Code:

clear all
global totalcores = 2        //This MUST be a global to work
parallel init $totalcores
set obs 1000
gen y = runiform()
gen x = runiform()
gen id = _n
save maindata, replace        //Must be stored, so each instance can start with a complete dataset

program parfor
    use maindata, clear
    di $pll_instance
    local inst = $pll_instance - 1        //Parallel starts enumerating at 1, so this is required for mod() to work
    forvalues i = 1/10 {        //Here you define your main loop
        if mod(`i', $totalcores) == `inst' {
            reg y x if id > `i'        //Some condition I guess
            estimates save test_`i'
        }
    }
end


parallel, prog(parfor): parfor

Best wishes

Stata 18.0 MP | ORCID | Google Scholar

Announcement

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment