I sometimes run many regressions, one per each of thousands of genes or proteins, where the subjects from an experiment are measured for every protein. I sometimes use Stata, sometimes R. In this case I'm using Stata because I want an interval-censored Cox model, and in my opinion Stata's implementation of this model is better than the R alternatives. My question is whether I can use the parallel package for this. Below is example code that uses the third-party commands xsvmat and xframeappend (both by Roger Newson) to grab r(table) from each model and append the results into one results frame. To speed things up, I broke the 2,755 genes into groups of roughly 500 and ran those subsets in 5 separate Stata instances, since I have a Windows laptop with 32 GB of RAM and 14 cores. I then manually appended the results into a single results file and ran model diagnostics on the top biomarkers. While this approach worked just fine, I was curious about using parallel instead.
Has anyone had a similar problem (a long-format dataset with many endpoints) and used parallel on a Windows machine to collect model results for each endpoint?
Code:
frame reset
use mydata.dta, clear
frame create cox_results
local setA = "c.x1 i.x2"
quietly {
    forvalues i = 1/2755 {
        noisily display `i'
        stintcox baseline_ntx `setA' if gene_id == `i', ///
            interval(ltime rtime) favorspeed
        xsvmat, from(r(table)') rowname(parm) names(col) idnum(`i') ///
            frame(outframe, replace)
        frame change cox_results
        xframeappend outframe, fast
        frame change default
    }
}
frame change cox_results
save cox_setA_cov_results.dta, replace

/* The mydata.dta variables are:
   gene_id subject ltime rtime baseline_ntx x1 x2
   2755 gene_id, ~800 subjects measured for each protein (gene_id).
   A Cox model is run for each gene_id using a for loop. */
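For concreteness, the rough pattern I have been considering with parallel is sketched below. It is untested and rests on several assumptions: that parallel is installed from SSC, that by(gene_id) is accepted here and keeps all rows for a given gene in the same chunk, and that whatever dataset each child instance leaves in memory is what gets appended back in the parent. The file cox_chunk.do is a placeholder do-file that would hold the stintcox/xsvmat loop from above, adapted to loop over levelsof gene_id in its chunk and to finish with the stacked results as the data in memory.

Code:
* untested sketch -- see the assumptions described above
parallel setclusters 5                 // one child Stata instance per chunk
use mydata.dta, clear
* split the data in memory by gene_id across the child instances,
* run the per-gene loop in cox_chunk.do in each child, then append
* whatever dataset each child leaves in memory back into the parent
parallel do cox_chunk.do, by(gene_id)
save cox_setA_cov_results.dta, replace

If that is roughly how parallel behaves on Windows, it would replace the manual step of launching 5 Stata instances and appending the result files by hand.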