I have a machine learning application in Python, and I'm using the multiprocessing module to parallelize some of the work (specifically feature computation).
Now, multiprocessing works differently on Unix variants and on Windows. The available start methods are:
Unix (mac/linux): fork/forkserver/spawn
Windows: spawn
Why multiprocessing.Process behave differently on windows and linux for global object and function arguments
Because spawn is used on Windows, launching multiprocessing processes there is really slow: each new process loads all the modules from scratch.
Is there a way to speed up the creation of the extra processes on Windows? (using threads instead of multiple processes is not an option)
Instead of creating multiple new processes each time, I highly suggest using concurrent.futures.ProcessPoolExecutor and leaving the executor open in the background.
That way, you don't create a new process each time, but rather leave them open in the background and pass some work using the module's functions or queues and pipes.
Bottom line - Don't create new processes each time. Leave them open and pass work.
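A minimal sketch of that advice, assuming a hypothetical compute_features function in place of the real feature computation. The pool is created once and reused for every batch, so the spawn cost is paid once per worker rather than once per task:

```python
from concurrent.futures import ProcessPoolExecutor


def compute_features(x):
    # Hypothetical stand-in for the real feature computation.
    return x * x


def run_batches(batches, max_workers=2):
    # The pool is created once; every batch reuses the same worker
    # processes instead of starting new ones.
    results = []
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        for batch in batches:
            results.append(list(executor.map(compute_features, batch)))
    return results


if __name__ == "__main__":
    # The __main__ guard is required when the spawn start method is used.
    print(run_batches([[1, 2, 3], [4, 5]]))
```

In a long-running application the executor would typically be kept as a module-level or object-level attribute and shut down only at exit, rather than scoped to a single with block as in this sketch.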
Related
Two Python scripts, A and B, have compatibility issues and need separate conda environments. Here's the scenario: when script A runs, it sends data to process B (script B is running in a different terminal), and process B returns the output to process A (process A cannot be put to sleep). I have been using pickle files to exchange data between these two processes, but this method seems slow, and I need to speed it up for my work.
Make one program a child of the other using the subprocess module and communicate over stdin and stdout. (Fastest. Note that you have to activate the other Anaconda environment in the command that launches the child.)
Have one application act as a server bound to a socket on localhost, with the other application as the client, using the socket module. (The most organized and scalable solution.)
Set up a block of shared memory that both applications can read from and write to using multiprocessing.shared_memory. (This requires proper synchronization, but can be faster than the first option when transferring gigabytes of data at a time. Wrapping it in an io.TextIOWrapper will make communication a lot easier, as easy as working with sockets.)
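The first option can be sketched as follows. This is an illustration under assumptions, not a drop-in solution: the child program is inlined as a string, the message format (one JSON document per line) is a choice made here, and sys.executable stands in for the other environment's interpreter path.

```python
import json
import subprocess
import sys

# Hypothetical child program, inlined so the sketch is self-contained.
# In the scenario above this would be script B.
CHILD_SOURCE = r"""
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    reply = {"total": sum(request["values"])}
    sys.stdout.write(json.dumps(reply) + "\n")
    sys.stdout.flush()
"""


def start_child():
    # sys.executable is a placeholder; to cross environments, substitute
    # the path to the other environment's python executable.
    return subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )


def ask(child, values):
    # One JSON document per line in each direction.
    child.stdin.write(json.dumps({"values": values}) + "\n")
    child.stdin.flush()
    return json.loads(child.stdout.readline())


def stop_child(child):
    child.stdin.close()  # EOF ends the child's read loop
    child.wait()
    child.stdout.close()
```

The child stays alive between requests, so each exchange costs only a pipe write and read instead of a file round trip.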
I want to use Python to write a script that runs parallel processes.
I learned that pexpect.spawn, multiprocessing.Process and concurrent.futures can all be used for parallel processing. I'm currently using pexpect.spawn, but I'm not sure whether pexpect.spawn will automatically utilize all the cores available.
This is what the top command returned when I used pexpect.spawn to start several workers. Are those processes running on different cores? How can I read that? Why is the number of processes returned by top much larger than the number of cores on my computer?
I am trying to use python's multiprocessing approach to speed up a program I am working on.
The python code runs in serial, but has calls to an mpi executable. It is these calls that I would like to do in parallel as they are independent of one another.
For each step the python script takes I have a set of calculations that must be done by the mpi program.
For example, if I am running over 24 cores, I would like the python script to call 3 instances of the mpi executable each running on 8 of the cores. Each time one mpi executable run ends, another instance is started until all members of a queue are finished.
I am just getting started with multiprocessing, and I am fairly certain this is possible, but I am not sure how to go about it. I can set up a queue and have multiple processes start; the issue is adding the next set of calculations to the queue and starting them.
If some kind soul could give me some pointers, or some example code, I'd be most obliged!
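One way to sketch the scheduling described in this question, assuming a thread pool is acceptable for launching the external runs. The command here is a placeholder that just runs a tiny Python snippet; the real MPI command line (for example something like mpirun -np 8 followed by the executable) is an assumption and depends on the MPI installation.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_job(job_id):
    # Placeholder command standing in for the MPI executable call.
    command = [sys.executable, "-c", f"print({job_id} * 2)"]
    completed = subprocess.run(
        command, capture_output=True, text=True, check=True
    )
    return job_id, completed.stdout.strip()


def run_all(job_ids, concurrent_runs=3):
    # Threads are enough here: each one only blocks while its external
    # process runs. When a run ends, its thread takes the next job.
    with ThreadPoolExecutor(max_workers=concurrent_runs) as pool:
        return dict(pool.map(run_job, job_ids))
```

With concurrent_runs=3 and each external run using 8 cores, at most 24 cores are busy at once, and the pool takes care of starting the next queued calculation whenever a slot frees up.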
How to start an always on Python Interpreter on a server?
If bash starts multiple Python programs, how can I run them all on just one interpreter?
And how can I start a new interpreter after a certain number of bash requests? Say, after X requests to Python programs, a new interpreter should start.
EDIT: Not a copy of https://stackoverflow.com/questions/16372590/should-i-run-1000-python-scripts-at-once?rq=1
Requests may come pouring in sequentially
You cannot have new Python programs started through bash run on the same interpreter; each program will always have its own. If you want to limit the number of Python programs running, the best approach is to have a Python daemon process running on your server. Instead of creating a new program through bash on each request, you would signal the daemon process to create a thread to handle the task.
To run a program forever in python:
    while True:
        do_work()
You could look at spawning a thread for each incoming request. Look at the threading.Thread class.
    from threading import Thread
    # Python has no "new" keyword; args must be a tuple of positional arguments.
    task = Thread(target=do_work, args=())
    task.start()
You probably want to take a look at http://docs.python.org/3/library/threading.html and http://docs.python.org/3/library/multiprocessing.html. Threading is more lightweight, but in CPython the global interpreter lock lets only one thread execute Python bytecode at a time, so it won't take advantage of multicore or hyperthreaded systems for CPU-bound work. Multiprocessing allows true simultaneous execution, but it is heavier than threading, especially on systems where creating a process is expensive, and it may not be necessary if the threads or processes spend most of their time waiting on I/O.
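The daemon-plus-threads idea from the answers above can be sketched like this. It is only an illustration: the in-process queue.Queue stands in for whatever mechanism bash would use to reach the daemon (a socket, a named pipe, and so on), and handle is a hypothetical task.

```python
import queue
import threading


def handle(request, results):
    # Hypothetical task: record the upper-cased request.
    results.append(request.upper())


def serve(requests, results):
    # Long-running loop: block until a request arrives, then hand it
    # to a new thread so the loop can keep accepting requests.
    workers = []
    while True:
        request = requests.get()
        if request is None:  # sentinel chosen here to stop the loop
            break
        worker = threading.Thread(target=handle, args=(request, results))
        worker.start()
        workers.append(worker)
    for worker in workers:
        worker.join()
```

All requests are served by one interpreter; only the transport that feeds the queue would need to change for a real server.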
I'm hosting a Python script using the Python for Delphi components inside my Delphi application. I'd like to create background tasks from the script that keep running.
Is it possible to create threads that keep running even after the script execution ends (but not the host process, which keeps going)? I've noticed that the program gets stuck if the executing script ends while a thread is still running. However, if I wait until the thread is finished, everything goes fine.
I'm trying to use "threading" standard module for threads.
Python has its own threading module that comes standard, if it helps. You can create thread objects using the threading module.
threading Documentation
thread Documentation
The thread module (renamed _thread in Python 3) offers low-level threading and synchronization using simple Lock objects.
Again, not sure if this helps since you're using Python under a Delphi environment.
If a process dies, all its threads die with it, so a solution might be a separate process.
See if creating an XML-RPC server might help you; that is a simple solution for interprocess communication.
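A minimal sketch of the XML-RPC suggestion using the standard library's xmlrpc.server and xmlrpc.client. The add function and the use of port 0 are choices made for this illustration, and the server runs in a thread here only to keep the example self-contained; in the scenario above it would live in its own process.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def start_server():
    # Port 0 asks the OS for any free port; a real service would
    # normally use a fixed, agreed-upon port.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(lambda a, b: a + b, "add")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_add(server, a, b):
    # The client only needs the server's address, so it can run in a
    # different process or a different Python installation.
    host, port = server.server_address
    with ServerProxy(f"http://{host}:{port}") as proxy:
        return proxy.add(a, b)
```

The script hosted inside the Delphi application would then play the client role and return as soon as the call is made, while the background work continues in the server process.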
Threads are by definition part of the same process. If you want them to keep running, they need to be forked off into a new process; see os.fork() and friends (note that os.fork() is not available on Windows).
You'll probably want the new process to end (via exit() or the like) immediately after spawning the script.