Parallel Stata

Rodrigo Garcia Ayala

Join Date: Jun 2020

Posts: 5
#1

Parallel Stata

04 Jun 2020, 14:12

Hi,
Hope you can kindly help me with this issue.

I am trying to parallelize a process which needs to be done over many stats (mean, median, etc.), years, geographical areas and countries using microdata from national population censuses. This amounts to way too many loop iterations. However, I haven't been able to use parallel effectively. In particular, I want to take advantage of the `parallel append` feature.

I am trying to do something like the command below for a bunch of files located within a folder, under the directory labeled by the global $dir, such that the collapsed_d program runs on each individual file. This has clearly not worked because the syntax is incorrect, but I haven't managed to understand how the e() option should be stated. The example in the Stata helper hasn't been helpful because of how files are stored in the example.

parallel append , do("collapsed_d.do") ///
prog(collapsed_d) e("$dir/filename_`g'.dta")

where `g' corresponds to each of the names that I want to process in parallel. In the directory $dir/ files appear as:
filename_112.dta
filename_113.dta
:
filename_998.dta
Even more importantly, I am not sure this is the most efficient way to deal with the challenge. Any thoughts on this would also be highly appreciated.

Thank you in advance.

Rodrigo
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#2

05 Jun 2020, 14:43

Welcome to Statalist. You will increase your chances of a useful answer by following the FAQ on asking questions: provide Stata code in code delimiters, readable Stata output, and sample data using dataex. Even if someone wanted to help you, they couldn't run your code to see what it does.

This particular program parallel I have never seen or heard discussed. I must admit I think trying to do a parallel estimator may be a bit much if you are a new user.

I would start by making sure parallel works correctly using the code and data example provided in the help file.

Next, I might try set trace on. This will help you see how Stata is interpreting your macros. I often find I have made errors in macros.

One thing you can do is run more than one invocation of Stata at the same time, so you could have one invocation of Stata running on the first 200 data files and another on the next 200. You would then need to combine the results in another do-file.

Before I invested too much time in trying to parallel something, I would ask myself whether I need to run this repeatedly. If this is a calculation you will need to do once, would it run overnight for example or over the weekend? Also consider statsby or user-written rangestat instead of loops.
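As a point of comparison for any parallel version, a plain serial loop over the files might look like the sketch below. It is only an illustration: the folder name toy_census, the toy values, and the variable names income and geolev2 are placeholders, and the toy-data step exists only so the example runs on its own.

```stata
* Toy data so the sketch runs on its own (hypothetical folder and values).
* With real data, skip this part and point $dir at your own folder.
global dir "toy_census"
capture mkdir "$dir"
forvalues g = 112/114 {
    clear
    set obs 6
    generate geolev2 = mod(_n, 2)
    generate income  = `g' * _n
    save "$dir/filename_`g'.dta", replace
}

* Serial baseline: collapse each file in turn and stack the results.
local files : dir "$dir" files "filename_*.dta"

tempfile results
local first = 1
foreach f of local files {
    use "$dir/`f'", clear
    collapse (sum) income, by(geolev2) fast
    generate source = "`f'"          // record which file each row came from
    if `first' == 0 {
        append using `results'       // add the rows collected so far
    }
    save `results', replace
    local first = 0
}
```

Timing this loop on a handful of files gives a rough idea of whether the full job fits in an overnight run before any parallel setup is attempted.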
Rodrigo Garcia Ayala

Join Date: Jun 2020

Posts: 5
#3

07 Jun 2020, 22:37

Hi Phil,

Thanks for your kind response and advice on how to post my questions.

Regarding this specific topic, I have tried to follow the examples in section 4 of this entry: help parallel##append. I came up with the parallel command after doing some research on how to make the process more efficient. As I posted in the original question, I am working with microdata coming from censuses of several countries and years, so I really need some sort of efficient way of doing it.

I will re-post my question following your advice. Hopefully, this time I make things easier for those that can kindly help me.

Best,
Rodrigo Garcia Ayala

Join Date: Jun 2020

Posts: 5
#4

07 Jun 2020, 22:38

Hi,
Hope you can kindly help me with this issue.

I am trying to parallelize a process which needs to be done over many stats (mean, median, etc.), years, geographical areas and countries using microdata from national population censuses. This amounts to way too many loop iterations. However, I haven't been able to use parallel effectively. In particular, I want to take advantage of the `parallel append` feature.
I am trying to do something like the command below for a bunch of files located within a folder, under the directory labeled by the global $dir, such that the collapsed_d program runs on each individual file. This has clearly not worked because the syntax is incorrect, but I haven't managed to understand how the e() option should be stated. The example in the Stata helper hasn't been helpful because of how files are stored in the example.

parallel append , do("collapsed_d.do") prog(collapsed_d) e("$dir/filename_`g'.dta")
where the program collapsed_d is defined as below:
program define collapsed_d
collapse (sum) income, by(geolev2) fast
end
and `g' corresponds to each of the names that I want to process in parallel. In the directory $dir/ files appear as:
filename_112.dta
filename_113.dta
:
filename_998.dta
The data can be found here: https://www.dropbox.com/sh/osghpo7c8...oK7m8eKGa?dl=0
Even more importantly, I am not sure this is the most efficient way to deal with the challenge. Any thoughts on this would also be highly appreciated.
Thanks in advance,
Rodrigo

Sven-Kristjan Bormann

Join Date: Jul 2018

Posts: 310
#5

08 Jun 2020, 12:08

The expected format for the expression() option is documented in the help file. Something like the code below should work in theory. I have not tested it.

Code:

parallel append , do("collapsed_d.do") prog(collapsed_d) e("$dir/filename_%03.0f.dta", 112/998)
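To see what the `%03.0f` part of that pattern produces, and which of the names in the range 112/998 actually exist on disk, a small check along these lines may help. It is a sketch that assumes the global $dir is already set; it does not call parallel at all, and the thread does not establish how parallel treats names in the range that have no matching file.

```stata
* %03.0f formats a number as an integer zero-padded to three digits.
display string(7, "%03.0f")

* Build the candidate names filename_112.dta ... filename_998.dta and keep
* only those that exist under $dir (the range 112/998 comes from the thread).
local found
forvalues g = 112/998 {
    local name = "filename_" + string(`g', "%03.0f") + ".dta"
    capture confirm file "$dir/`name'"
    if _rc == 0 {
        local found `found' `name'
    }
}
display "files found: `: word count `found''"
```

If the count is much smaller than the 887 names in the range, listing the files explicitly may be a safer route than a numeric pattern.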
Rodrigo Garcia Ayala

Join Date: Jun 2020

Posts: 5
#6

08 Jun 2020, 13:37

Hi Sven,

Thank you for your help. This is a little embarrassing, but I still can't manage to get it done if the list is of this kind instead:

filename_mx_110.dta
filename_pe_117.dta
filename_pe_110.dta
filename_pe_112.dta
filename_br_115.dta

Thanks again.
Rodrigo
Sven-Kristjan Bormann

Join Date: Jul 2018

Posts: 310
#7

09 Jun 2020, 06:13

You should post exactly what your filenames look like. Otherwise it is difficult to help you. Instead of using the expression option, you can also provide the filenames directly. Depending on where your files are located, something like the code below might work.

Code:

local files : dir "." files "*.dta"
parallel append `files'

Note, that I have never used the parallel command. I only help you based on my understanding of the help file.
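Putting the pieces of this thread together, a fuller version of the file-list approach might look like the untested sketch below. It assumes the community-contributed parallel package is installed, reuses the do() and prog() options exactly as they appear earlier in the thread, and picks 4 child processes as an arbitrary example value. Whether parallel append accepts the quoted list that the dir function returns is an assumption here, not something confirmed in this thread.

```stata
* Untested sketch based on the syntax discussed in this thread and in -help parallel-.
capture program drop collapsed_d
program define collapsed_d
    collapse (sum) income, by(geolev2) fast
end

* dir returns bare file names without the path, so work inside the folder.
cd "$dir"
local files : dir "." files "filename_*.dta"

* Number of child processes: 4 is a hypothetical choice, adjust to the machine.
parallel setclusters 4
parallel append `files', do(collapsed_d) prog(collapsed_d)
```

Because the pattern filename_*.dta matches any text after the underscore, this would also pick up names such as filename_mx_110.dta without needing a numeric expression.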
Anam Yasir

Join Date: Feb 2020

Posts: 13
#8

12 Jun 2020, 02:10

Sorry for this irrelevant comment on this post. I am new to this forum and am facing an issue with a GARCH model. I started a topic almost 3 days ago, following the FAQ, but received no response there. Please help me find a proper response, as this issue is seriously affecting my analysis. Apologies for this irrelevant response.