I am trying to use Python's multiprocessing module to speed up a program I am working on.
The Python code runs in serial, but it makes calls to an MPI executable. It is these calls that I would like to run in parallel, as they are independent of one another.
For each step the Python script takes, there is a set of calculations that must be done by the MPI program.
For example, if I am running on 24 cores, I would like the Python script to launch 3 instances of the MPI executable, each running on 8 of the cores. Each time one MPI run ends, another instance is started until all members of a queue are finished.
I am just getting started with multiprocessing, and I am fairly certain this is possible, but I am not sure how to go about it. I can set up a queue and start multiple processes; the issue is adding the next set of calculations to the queue and starting them as workers finish.
If some kind soul could give me some pointers, or some example code, I'd be most obliged!
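To make the setup concrete, here is a minimal sketch of one way this could look (not from the original post): a multiprocessing.Pool of 3 workers, each launching the MPI executable on 8 cores through subprocess. The executable name, input files, and mpirun flags are placeholders for whatever the real calculation uses.
import subprocess
from multiprocessing import Pool

def run_mpi_job(input_file):
    # 8 MPI ranks per run; "./my_mpi_executable" and the input files are
    # hypothetical names standing in for the real MPI program and its inputs.
    subprocess.run(["mpirun", "-np", "8", "./my_mpi_executable", input_file],
                   check=True)
    return input_file

if __name__ == "__main__":
    jobs = ["calc_%d.in" % i for i in range(12)]  # hypothetical work queue
    # 3 workers x 8 MPI ranks = 24 cores; as each run finishes, its worker
    # picks up the next job from the list until the whole queue is done.
    with Pool(processes=3) as pool:
        for finished in pool.imap_unordered(run_mpi_job, jobs):
            print("finished", finished)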
Related
I was wondering whether running two scripts on a dual-core CPU (in parallel, at the same time) decreases their speed compared to running them serially (at different times, or on two different CPUs). What factors should be taken into account when trying to answer this question?
No; assuming the processes can run independently (neither one is waiting on the other one to move forward), they will run faster in parallel than in serial on a multicore system. For example, here's a little script called busy.py:
for i in range(400000000):
    pass
Running this once, on its own:
$ time python busy.py
real 0m6.875s
user 0m6.871s
sys 0m0.004s
Running it twice, in serial:
$ time (python busy.py; python busy.py)
real 0m14.702s
user 0m14.701s
sys 0m0.001s
Running it twice, in parallel (on a multicore system - relying on the OS to assign different cores):
$ time (python busy.py & python busy.py)
real 0m7.849s
user 0m7.843s
sys 0m0.004s
If we run python busy.py & python busy.py on a single-core system, or simulate it with taskset, we get results that look more like the serial case than the parallel case:
$ time (taskset 1 python busy.py & taskset 1 python busy.py)
real 0m13.057s
user 0m13.035s
sys 0m0.013s
There is some variance in these numbers because, as @tdelaney mentioned, there are other applications competing for my cores and context switches are occurring. (You can see how many context switches occurred with /usr/bin/time -v.)
Nonetheless, they get the idea across: running twice in serial necessarily takes twice as long as running once, as does running twice in "parallel" (context-switching) on a single core. Running twice in parallel on two separate cores takes only about as long as running once, because the two processes really can run at the same time. (Assuming they are not waiting on each other, competing for the same resource, etc.)
This is why the multiprocessing module is useful for parallelizable tasks in Python. (threading can also be useful, if the tasks are IO-bound rather than CPU-bound.)
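As a minimal sketch of that point (not from the original answer), the same busy loop can be handed to multiprocessing.Pool, which runs the two tasks on separate cores much like the two backgrounded python busy.py invocations above:
from multiprocessing import Pool

def busy(n):
    # same busy loop as busy.py above
    for i in range(n):
        pass
    return n

if __name__ == "__main__":
    with Pool(processes=2) as pool:
        # Two CPU-bound tasks run in separate processes, so on a multicore
        # machine they execute at the same time.
        print(pool.map(busy, [400000000, 400000000]))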
How Dual Cores Work When Running Scripts
Running two separate scripts
If you run one script and then another script on a dual-core CPU, whether or not your scripts run on separate cores depends on your operating system.
Running two separate threads
On a dual-core CPU, your threads will actually run faster than on a single core.
I have a script that runs for about 3 hours. If I use multiprocessing, it takes about 20 minutes. The problem is the memory.
Say there are 10 tasks in the script. If I run it on a single core, it will go all the way through. But if I use multiprocessing, I have to stop it halfway through to free up the memory and start the second half manually.
I've tried garbage collection and something like popen, but it doesn't help.
I'm running the script in PyCharm. Not sure whether that affects how the memory is handled.
I have 32GB of RAM. When the script starts, I'm already using 6GB. I'm running 64-bit Python 3.6. I have a 12-core processor.
I'm running Windows and just using Pool from multiprocessing.
Running out of memory makes the script crash, which is why I have to stop it halfway and start it again from the second half.
The question is: how can I run the script all the way through with multiprocessing without using fewer cores?
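One approach (not from the original post, and assuming the work can be expressed as a function applied to a list of tasks) is to cap how long each worker process lives, so the memory a finished task accumulated is returned to the OS. A minimal sketch using Pool's maxtasksperchild parameter; run_task and the task list are placeholders:
from multiprocessing import Pool

def run_task(task):
    # placeholder for the real, memory-hungry work
    return task * 2

if __name__ == "__main__":
    tasks = range(10)
    # maxtasksperchild=1 retires each worker process after a single task,
    # so the memory it held is handed back to the OS when the worker exits.
    with Pool(processes=12, maxtasksperchild=1) as pool:
        for result in pool.imap_unordered(run_task, tasks):
            print(result)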
I have a machine learning application in Python. And I'm using the multiprocessing module in Python to parallelize some of the work (specifically feature computation).
Now, multiprocessing works differently on Unix variants and on Windows.
Unix (mac/linux): fork/forkserver/spawn
Windows: spawn
Why multiprocessing.Process behave differently on windows and linux for global object and function arguments
Because spawn is used on Windows, launching multiprocessing processes is really slow: every module is imported from scratch in each new process.
Is there a way to speed up the creation of the extra processes on Windows? (using threads instead of multiple processes is not an option)
Instead of creating multiple new processes each time, I highly suggest using concurrent.futures.ProcessPoolExecutor and leaving the executor open in the background.
That way, you don't create a new process each time; you leave the workers open in the background and pass them work using the module's functions, or via queues and pipes.
Bottom line: don't create new processes each time. Leave them open and pass them work.
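A minimal sketch of that pattern; compute_features and the batches are placeholders for the real feature computation:
from concurrent.futures import ProcessPoolExecutor

def compute_features(sample):
    # placeholder for the real feature computation
    return sample * sample

if __name__ == "__main__":
    # The worker processes are created once; on Windows (spawn), the module
    # import cost is paid a single time instead of for every batch of work.
    with ProcessPoolExecutor(max_workers=4) as executor:
        for batch in ([1, 2, 3], [4, 5, 6]):
            print(list(executor.map(compute_features, batch)))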
I have simple Python code that uses 2 processes: the main process and another created by the multiprocessing module. Both processes run in infinite loops. I want my Python code to never crash/hang/freeze. I've already handled most of the errors/exceptions. FYI, it's an IoT project and I'm running this code as a launcher from /etc/rc.local. I tried using the pid module from Python as given here.
According to the link given, the pid module works as below.
from pid import PidFile
with PidFile():
    do_something()
My question is: does the above logic meet my requirements, or do I need to add more logic, such as checking for the existence of the pid file and then deciding to kill/stop/restart the process (or the code itself) if either of the two processes freezes due to a bug in the code?
Please suggest any other way to achieve this if the pid module is not suitable for my requirement.
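As an illustration of the kind of extra logic the question describes (not from the original post), a separate watchdog could read the pidfile and check whether that process still exists, restarting the launcher if not. The pidfile path and restart command below are hypothetical:
import os
import subprocess

PIDFILE = "/tmp/my_iot_app.pid"                   # hypothetical pidfile path
LAUNCH_CMD = ["python3", "/home/pi/launcher.py"]  # hypothetical restart command

def process_is_alive(pidfile):
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
        os.kill(pid, 0)  # signal 0 only checks that a process with this PID exists
        return True
    except (OSError, ValueError):
        return False

if not process_is_alive(PIDFILE):
    # The recorded process is gone (note: a frozen-but-alive process would
    # still pass this check), so start the launcher again.
    subprocess.Popen(LAUNCH_CMD)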
Hi, I resolved this issue by creating separate Python scripts for the two tasks rather than using multiprocessing features such as Queue. I suggest not using a multiprocessing Queue inside an infinite loop, as it freezes the process/processes after some time.
How to start an always-on Python interpreter on a server?
If bash starts multiple Python programs, how can I run them on just one interpreter?
And how can I start a new interpreter after tracking the number of bash requests, say, after X requests to Python programs a new interpreter should start.
EDIT: Not a copy of https://stackoverflow.com/questions/16372590/should-i-run-1000-python-scripts-at-once?rq=1
Requests may come pouring in sequentially
You cannot have new Python programs started through bash run on the same interpreter; each program will always have its own. If you want to limit the number of Python programs running, the best approach would be to have a Python daemon process running on your server and, instead of creating a new program through bash on each request, signal the daemon process to create a thread to handle the task.
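A minimal sketch of that daemon pattern using the standard library's socketserver; the port, the handler body, and do_work are illustrative placeholders:
import socketserver

def do_work(request):
    # placeholder for the real task
    return "done: " + request

class TaskHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # ThreadingTCPServer runs each request in its own thread, all inside
        # this single long-running interpreter.
        request = self.rfile.readline().decode().strip()
        self.wfile.write((do_work(request) + "\n").encode())

if __name__ == "__main__":
    with socketserver.ThreadingTCPServer(("127.0.0.1", 9000), TaskHandler) as server:
        server.serve_forever()
From bash, each request then becomes something like echo mytask | nc 127.0.0.1 9000 instead of starting a fresh interpreter.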
To run a program forever in Python:
while True:
    do_work()
You could look at spawning threads for incoming requests. Take a look at the threading.Thread class.
from threading import Thread
task = Thread(target=do_work, args=())
task.start()
You probably want to take a look at http://docs.python.org/3/library/threading.html and http://docs.python.org/3/library/multiprocessing.html. threading is more lightweight, but it only allows one thread to execute at a time (meaning it won't take advantage of multicore/hyperthreaded systems), while multiprocessing allows true simultaneous execution. multiprocessing can be a bit less lightweight than threading on a system that doesn't use lightweight subprocesses, and it may not be necessary if the threads/processes spend most of their time on I/O requests.
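As an illustration of that trade-off (not from the original answer), concurrent.futures lets you try both models with the same code; swapping the executor class is the only change, and cpu_bound here is a placeholder task:
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def cpu_bound(n):
    # burns CPU for a while; stands in for a real CPU-bound task
    total = 0
    for i in range(n):
        total += i
    return total

if __name__ == "__main__":
    work = [10000000] * 4
    # Threads: lightweight, but these CPU-bound tasks effectively run one at a time.
    with ThreadPoolExecutor(max_workers=4) as ex:
        print(list(ex.map(cpu_bound, work)))
    # Processes: heavier to start, but the tasks can run on separate cores.
    with ProcessPoolExecutor(max_workers=4) as ex:
        print(list(ex.map(cpu_bound, work)))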