
  • Using stata in batch mode - issue with random numbers and saving output with task array id

    Hello,

    I am new to posting on the forum, but I follow it frequently. Thank you to the various contributors; it is very helpful.

    I have a regression that I need to run 1000 times (the trial run is for 10 times), saving the output each time using regsave. In each iteration, one variable (id_random) needs to be randomly generated. It is one of the independent variables in the regression, which has a large number of fixed effects. Therefore, I am trying to use batch mode. My dataset is an unbalanced panel of firms across years.

    I am having two issues:
    1) All 10 iterations seem to start from the same seed. I could set a different seed each time with set seed `=123456+`SLURM_ARRAY_TASK_ID'', but I am having difficulty capturing SLURM_ARRAY_TASK_ID in my do-file.
    2) The output is not saved with the task ID suffix. I want each iteration to write its own Stata output file (_FEresults_1, _FEresults_2, and so on up to _FEresults_10) from the regsave command in my do-file:
    regsave using ./_FEresults_SLURM_ARRAY_TASK_ID,
    Again, the issue seems to be that I cannot capture SLURM_ARRAY_TASK_ID in my do-file.


    I submit the following script for batch mode.
    -------------------------------------------------------------------
    #!/bin/bash
    #SBATCH -J state # Job name
    #SBATCH -o stata_%A-%a.out # Job output file name
    #SBATCH --array=1-10 # Replace with your range
    #SBATCH -p standard-mem-s # Job queue
    #SBATCH -c 6 # Cores
    #SBATCH --mem-per-cpu=6G # Memory

    module purge
    module load stata/mp-15

    stata-mp do randomization.do ${SLURM_CPUS_PER_TASK} ${SLURM_ARRAY_TASK_ID}

    The randomization.do file contains the following:
    -----------------------------------------------------
    ssc install regsave

    ***my input file
    use ./_Main, clear

    ***drop the variable from prior iteration and generate a new random number
    drop id_random
    gen id_random = runiform(0 , 1200)


    ***regression
    sort firm year
    tsset firm year, annual
    reg y size i.firm i.year id_random, vce(cluster firm)
    regsave using ./_FEresults_SLURM_ARRAY_TASK_ID, replace ci level(95) detail(scalars)


    My output is as follows:
    -------------------------------
    10 log files named stata_SLURM_CPUS_PER_TASK_SLURM_ARRAY_TASK_ID
    single stata output file _FEresults_SLURM_ARRAY_TASK_ID instead of _FEresults_1 _FEresults_2 and so on up to _FEresults_10


    How do I get the do file to recognize and use the array task id?

    Thank you!
    Gauri
    Last edited by Gauri Bhat; 13 Jun 2022, 22:02.

  • #2
    When you feed arguments to a do file, they are stored as locals `1', `2', etc. Try using the following code for your regsave line:
    Code:
    regsave using "./_FEresults_`2'", replace ci level(95) detail(scalars)
    Associate Professor of Finance and Economics
    University of Illinois
    www.julianreif.com
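
    A minimal sketch of how the do-file might use both arguments, assuming the submission line from the question is unchanged (so the second argument is the array task ID). The names ncores and taskid are hypothetical labels for `1' and `2', the base seed 123456 is the value floated in the question, and the do-file is trimmed to the lines that matter here:

    ```stata
    * Assumed call: stata-mp do randomization.do ${SLURM_CPUS_PER_TASK} ${SLURM_ARRAY_TASK_ID}
    * args copies the positional arguments `1' and `2' into named locals
    * (ncores and taskid are illustrative names, not from the original post)
    args ncores taskid

    * Give each array task its own seed so the random draws differ across tasks
    local seed = 123456 + `taskid'
    set seed `seed'

    * Input file, as in the question
    use ./_Main, clear

    * Drop the variable from any prior iteration and draw a new random regressor
    capture drop id_random
    gen id_random = runiform(0, 1200)

    * Regression, then save the results with the task ID as the file suffix
    reg y size i.firm i.year id_random, vce(cluster firm)
    regsave using "./_FEresults_`taskid'", replace ci level(95) detail(scalars)
    ```

    With --array=1-10 this should produce _FEresults_1 through _FEresults_10, each estimated on a different draw of id_random.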



    • #3
      Thank you very much! That worked.
