  • Multiple Stata Installations on the same Operating System

    I need to run many do-files concurrently. Using the parallel and batcher packages, I have experimented with how many I can run. For whatever reason, once the number of concurrent do-files passes the 20-24 range, they all get bogged down. The cause is NOT CPU or RAM, as I have plenty of both on the server where I run the do-files. To work around this I installed a couple of virtual machines, but they get bogged down after 7-8 concurrent do-files and are much slower. So with VMs I am not even getting 1/3 of the efficiency I get running the files on bare metal.

    Before purchasing more hardware for independent Stata installations, I figured I should explore whether multiple Stata installations on the same operating system (in different locations on the same hard disk, of course) would work. Before I try it and possibly mess up my current setup, I thought I should ask whether anybody could offer any insight.

  • #2
    Hearing none, I bit my lip and installed a second Stata of the same flavor on the same operating system, and it works.

    For reference, for others who are interested, here is how I did it.

    1) I renamed the original Stata installation directory to StataV1.
    2) I installed a new Stata with default options, as if installing for the very first time.
    3) To activate the new Stata, I copied the LIC file from StataV1 to the new Stata directory.
    4) I renamed the new Stata directory to StataV2, so I can install a third one if I need to.
    5) I created a shortcut for each exe file and put them on the desktop.
    6) I launched V1 with its shortcut, opened the batcher file, and ran it, which launches 21 instances on V1.
    7) I launched V2 with its shortcut, opened another batcher file, and ran it, which launches an additional 15-20 instances on V2 without any sign of bogging down.
    8) After installing the second copy I had to specify full do-file paths, as just the name of a file in the same directory no longer seems to work.
    9) You don't need to reinstall SSC packages if you have any, as those are installed elsewhere on the hard disk and both versions appear to know where to find them.
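
    On step 9, a quick way to check where each installation looks for user-written packages is sketched below. This is only a sketch: it assumes both installations keep Stata's default PLUS directory, and the output depends on your machine.

    Code:
    * Run this in each installation and compare the output.
    sysdir                    // lists the system directories, including PLUS
    display c(sysdir_plus)    // the folder where ssc install puts packages
    adopath                   // the search order Stata uses to find ado-files

    If PLUS points to the same folder in both installations, packages installed from one should be visible to the other.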

    Here is some supplemental information on large-scale task parallelization that I found out, mostly by trial and error.

    1) Stata/MP is dubbed the "parallel edition", but what that really means is parallelization within some commands, not to be confused with task parallelization (that is my best guess, which could be wrong). I know for sure that Stata has a list of those commands somewhere on the internet.
    2) batcher is my preferred way of parallelizing tasks. The parallel package is also available, but it is more of a black box where you don't see individual windows running code (maybe you can and I never found out how).
    3) Using virtual machines for task parallelization is problematic because the overhead makes them slow. They are easy to set up for file sharing, but if you want to use winexe and shell commands you need to establish a network between the host and the guest and adjust the firewall settings on both so they don't freak out. (Not an easy thing to do, at least for me, as setting up networks is not something I have done before, nor do I want to.) You can get around the network issue by using a copy of, say, an Excel file to run a VBS script on the guest machine, but then you have to change the paths before you can run the do-files if you want to grab data from the host. That works with file sharing alone, without establishing a new network, but you have to code the new paths in your Stata do-file.

    • #3
      I worked a bit on parallelising Stata (see multishell (Slides UK UGM 2018) and simulate2/psimulate2) and have a couple of comments and questions:
      1. Oscar is correct, Stata MP is not really MP. Some commands are optimized for parallel computing, but not all are supported (see the performance report: https://www.stata.com/statamp/report.pdf). From my perspective the main problem is the cost of Stata MP.
      2. You need to be careful when you do any calculation using the random number generator. Stata might produce the same random numbers in every instance if the default settings are used. Have a look at help rngseed and help rngstream. I have a section on both topics in the help files for multishell and simulate2/psimulate2.
      3. multishell is able to connect Stata instances across PCs in a network. It is fairly simple to set up, but the program has some bugs. While the theory of connecting several machines in a network is simple, implementing it in Stata turned out to be pretty hard.
      4. What exactly do you mean by "bogged down"? Does Stata crash or does it get much slower? I am wondering what the reason for this behaviour is. I have run 15 parallel instances of Stata using simulate2 on a virtual machine and it worked fine.
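
      For point 2, a minimal sketch of giving each instance its own random-number stream, along the lines of help rngstream. It assumes Stata 15 or newer and that whatever launches the instances passes an instance number as the first do-file argument; the argument name and the seed value are placeholders.

      Code:
      * main.do, called as: do main.do 3   (3 = hypothetical instance number)
      args instance
      set rng mt64s              // the stream generator that set rngstream works with
      set rngstream `instance'   // a different stream in every instance
      set seed 12345             // same seed everywhere; the streams keep the draws apart
      display runiform()

      With the same seed but different streams, the instances should not repeat each other's draws, and each run stays reproducible.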

      • #4
        Hi Jan,
        1) I would be interested in trying multishell, not for loops but for launching multiple instances, if it allows the use of locals in loops. I think the link you shared says it does not allow that. Is that possible?
        2) What I mean by bogged down is running slower. Let me explain my saga in a little more detail. I have a pretty long do-file with lots of 2-3 level loops; let's call it main.do. main.do receives 15 or so locals from other do-files initiated by batcher. I have the MP 4-core flavor, a server with 4 CPU sockets, each holding a 10-core processor with two threads per core (so 40 physical cores, 80 logical processors), and 32 GB of RAM.

        My problem started when I wanted to run more than 20 do-files concurrently. Using a single Stata installation, I could open 20-24 instances and they would run fine. Once I pass the 24-instance mark, each following instance opens ever more slowly, and after 32-34 instances it fails to open any more. The do-file editor does not open new files. When I open 24-30 instances, not only do the new ones run slowly, but all the earlier instances (1-23) slow down as well.

        So I installed a VM to have an independent Stata operation. In the VM I toyed with allocating 4-32 cores with enough RAM, but the same problem (failure to launch, or slowness after a certain number of instances) started at about 6-8 instances, compared to 21-24 instances on the host machine. I installed a second VM with similar results. Of course the second VM allowed me to add 6-8 more instances, but the speed was 1/3 of the host machine, so I decided to abandon the VM approach.

        I started thinking about investing in smaller servers (say, two CPU sockets) and connecting them to the main server. However, it bothered me to see so much unused CPU and RAM on the main server, so I started to wonder whether multiple Stata installations would work, and they did.

        Now I have 3 Stata installations (let's say V1, V2, V3) on the same hard disk. I launch about 20 in V1 and it opens 20 instances. The weird thing is that with 20 instances open they run very slowly (my CPU utilization stays at 1-2%) until I start minimizing the windows. When I start minimizing the windows, the speed visibly improves and CPU usage goes up to what it should be. Then I launch an additional 20 instances on V2. The same thing happens: it opens 20 instances with no problem and starts them just fine, but they crawl until I start minimizing the Stata windows.

        I tried to open 20 more instances using V3, but V3 does not even launch. I think that is because I already have 40 instances open and 40 physical cores. So I tried again later, when V1 and V2 had finished about 15-20 do-files. V3 opened and I successfully launched 20 new instances. Initially they were very slow, but when I minimized the windows the speed and the CPU usage shot up.

        So I believe I am making use of all 40 cores with this setup, even though my CPU usage never goes above 50%. I am very curious, though, why the speed is so slow until I start minimizing the windows, and whether there is a way to avoid that or to do it automatically. I really don't want to be on standby to minimize the windows, which would defeat my ultimate goal of clicking once to get the dominoes started and then going off to have fun.





        • #5
          Hi Oscar Ozfidan
          You could add each do-file individually via multishell add, or you could put all the do-files into a loop, which multishell then processes. For example, say you have do_file1 to do_file20. Then the do-file you add to multishell would look like:

          Code:
          forvalues i = 1(1)20 {
             do mypath/do_file`i'.do
          }
          Then multishell should create an instance for each of those do files.

          I just played around with Stata 16.0 SE on a virtual machine. I have no idea about server speed, but my feeling is that the individual cores (or Stata instances) are not very fast, so I benefit more from scaling. I started 25 instances and it felt very slow. I am now running some simulations with 20 instances and might increase to 25 again. In both cases I am tracking the time and will post updates here.

          Is there anyone from Stata who can give some insights ( Alan Riley (StataCorp), Hua Peng (StataCorp), or others)?

          • #6
            If I understand your hardware properly, you have 40 actual cores, and 80 cores if you count hyperthreading. Let's ignore hyperthreading and just call it 40 cores.

            You have a 4-core Stata/MP license, meaning each instance of Stata/MP will try to use 4 cores simultaneously if possible for whatever command is running at the time. If you launch 20 instances, and each of them is running a highly-parallelized command at the same time, then they are trying to use 80 cores simultaneously.

            If you don't care about the within-command parallelization of Stata/MP and instead want to run as many simultaneous single-threaded instances as possible, try putting -set processors 1- at the beginning of your do-file that you are running within each Stata/MP. I would guess that you'll be able to run more simultaneous instances.

            This all ignores potential I/O slowdowns from the various processes doing anything intensive with the hard drive, so keep that in mind as well.
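
            The arithmetic above and the suggested setting can be sketched as follows. The core counts are the ones quoted in this thread, and set processors applies to Stata/MP.

            Code:
            * Put the first three commands at the top of the do-file each instance runs.
            display c(processors_max)   // most cores this license and machine allow
            set processors 1            // this instance now uses a single core
            display c(processors)       // confirms the current setting
            * Worst-case demand with 20 instances on a 40-core machine:
            display 20 * 4              // cores requested at the 4-core default
            display 20 * 1              // cores requested after set processors 1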

            • #7
              I have been trying many different things in the past few days.

              First, a correction: while independent Stata installations on the same hard disk are possible, the limit on how many instances can be launched does not appear to improve with multiple installations. In #2 it appeared that way to me because one of the Stata installations had already closed some instances and I missed that. The number of instances I can launch is 36 (with single or multiple installations), which happens to be the 40 physical cores minus the 4 cores for my Stata flavor. Makes sense!

              I am getting the best results when I assign the maximum number of cores allowed for a VM (32 in my case) and launch no more than 10 instances per VM. The work done by the VM does not appear to affect performance on the host at all. My guess is that this is because the VM utilizes hyperthreading. The same applies to the 2nd VM, which also has access to 32 cores. So with 2 VMs I can run 20 more instances with no downside on host speed. Overall CPU usage on the host goes up to 65% with 31 instances running on the host, 10 on VM1, and 10 on VM2. Since there is no downside on host speed and overall CPU usage is still at 65% during the most intense parts of the script, I plan on adding a third VM.

              It appears multishell requires separate do-files (with arguments to be passed on to main.do). That is not ideal in my case, as I would have to manage hundreds of files, which was the same issue I had with parallel. Using batcher, I can pass arguments to all the instances from a single do-file. I have been in touch with the author of batcher, and he seems interested in granting my wish, based on a suggestion I made, to make it even easier to deal with limits on instances.

              I found out there is a "window manage minimize" command that solved the slowness I was experiencing when all instances were full screen, and I think it is now incorporated in batcher.
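
              A minimal sketch of doing this from inside each do-file, assuming the command meant is Stata's window manage minimize. The check on c(os) is a precaution in case the command is not available on other platforms.

              Code:
              * Minimize this instance's window before the heavy work starts.
              if c(os) == "Windows" {
                  window manage minimize
              }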

              I will definitely experiment with Alan's suggestion on limiting the number of processors and report back on that when I can.

              • #8
                I have been benchmarking execution times, and from the results it appears that an estimation command (gsem with the nbreg option, to be precise) that uses approximately 1/3 of the execution time is not parallelized, and the overhead from it is slowing things down significantly: 36 minutes under the default 4 cores vs 16 minutes when I use a single processor for that estimation, in one of the shortest files I have.
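
                For anyone who wants to reproduce this kind of comparison, here is a sketch using Stata's timer commands in Stata/MP. The commented lines are placeholders for your own gsem call, and the timer numbers are arbitrary.

                Code:
                timer clear
                set processors 1
                timer on 1
                * your gsem ..., nbreg command here
                timer off 1
                set processors `c(processors_max)'
                timer on 2
                * the same command again
                timer off 2
                timer list    // timer 1 = one core, timer 2 = all licensed cores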

                A similar issue has also been bogging down files run on the VM: 45 minutes vs 28 minutes when I use only a single processor for the entire file execution.
                The results are less than favorable for the MP 4-core flavor. Before I post my results on the list, I just wanted to make sure that gsem with the nbreg option is indeed not parallelized, and that a bug is not skewing the results.

                So, could StataCorp please check this? Alan Riley (StataCorp), Hua Peng (StataCorp), JanDitzen

                • #9
                  Oscar Ozfidan, would you please send me the dataset and the do-file you used for gsem with the nbreg option that exhibit the performance issue? We will look into it. Would you also include some brief system information? The output of

                  Code:
                  about
                  query compilenumber
                  will help greatly.

                  You may email them to me directly at [email protected] or through tech support. Thanks.
