Can I run multiple threads inside each process of a multiprocess program?
For example, let's say I have 4 cores available; can I add 30 threads to each of these 4 processes?
This might sound confusing, so here's some sample code that shows my question better:
from multiprocessing import Process
from threading import Thread

if __name__ == "__main__":
    processes = []
    for i in range(4):
        processes.append(Process(target=target))
    for p in processes:
        # Can I add threads on each of these processes?
        # p.append(Thread(target=target2))
        p.start()
    for p in processes:
        p.join()
This is not for a specific project; it's just for my general knowledge.
Thank you
Yes, each Process can spawn its own Thread objects. In fact, whenever you use Threads without the multiprocessing module you are already seeing this, since your main script runs in its own process and it is the one spawning the Threads! Managing many processes that each have their own threads quickly becomes complicated, though, mostly because processes have separate memory, and you will have to be very careful to avoid deadlock. Your script will likely grow quite lengthy before it accomplishes anything useful with this technique, so in general I think it is best to stick to one or the other. To quote this post, which you would probably find interesting:
Spawning processes is a bit slower than spawning threads. Once they are running, there is not much difference.
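To make the first point concrete, here is a minimal sketch (the worker functions are placeholders, not your code) of 4 processes that each start 30 threads:

from multiprocessing import Process
from threading import Thread

def thread_work(i):
    # placeholder per-thread task
    print("thread", i, "doing some work")

def process_work():
    # each process starts its own 30 threads and waits for them
    threads = [Thread(target=thread_work, args=(i,)) for i in range(30)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

if __name__ == "__main__":
    processes = [Process(target=process_work) for _ in range(4)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()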
I am using Python's multiprocessing.Pool class to distribute tasks among processes.
The simple case works as expected:
from multiprocessing import Pool

def evaluate(data):
    do_something(data)

pool = Pool(processes=N)
for task in tasks:
    pool.apply_async(evaluate, (data,))
N processes are spawned, and they continually work through the tasks that I pass into apply_async. Now, I have another case where I have many different, very complex objects, each of which needs to do computationally heavy work. I initially let each object create its own multiprocessing.Pool on demand when it had work to do, but I eventually ran into OSError for having too many files open, even though I would have assumed that the pools would get garbage collected after use.
At any rate, I decided it would be preferable anyway for each of these complex objects to share the same Pool for computations:
from multiprocessing import Pool

def evaluate(data):
    do_something(data)

pool = Pool(processes=N)

class ComplexClass:
    def work(self):
        for task in tasks:
            self.pool.apply_async(evaluate, (data,))

objects = [ComplexClass() for i in range(50)]
for complex in objects:
    complex.pool = pool

while True:
    for complex in objects:
        complex.work()
Now, when I run this on one of my computers (OS X, Python=3.4), it works just as expected: N processes are spawned, and each complex object distributes its tasks among them. However, when I ran it on another machine (a Google Cloud instance running Ubuntu, Python=3.5), it spawned an enormous number of processes (>> N) and the entire program ground to a halt due to contention.
If I check the pool for more information:
import random
random_object = random.sample(objects, 1)[0]
print(random_object.pool._processes)  # _processes is the pool's (private) worker count
>>> N
Everything looks correct. But it's clearly not. Any ideas what may be going on?
UPDATE
I added some additional logging. I set the pool size to 1 for simplicity. Within the pool, as a task is being completed, I print the current_process() from the multiprocessing module, as well as the pid of the task using os.getpid(). It results in something like this:
<ForkProcess(ForkPoolWorker-1, started daemon)>, PID: 5122
<ForkProcess(ForkPoolWorker-1, started daemon)>, PID: 5122
<ForkProcess(ForkPoolWorker-1, started daemon)>, PID: 5122
<ForkProcess(ForkPoolWorker-1, started daemon)>, PID: 5122
...
Again, looking at the actual activity using htop, I'm seeing many processes (one per object sharing the multiprocessing pool) all consuming CPU cycles while this is happening, resulting in so much OS contention that progress is very slow. 5122 appears to be the parent process.
1. Infinite Loop implemented
If you implement an infinite loop, then it will run like an infinite loop.
Your example (which, as posted, will not run anyway) ...
while True:
    for complex in objects:
        complex.work()
2. Spawn or Fork Processes?
Even though your code above shows only some snippets, you cannot expect the same results on Windows/macOS on the one hand and Linux on the other. Windows spawns processes, while Linux forks them (and macOS has moved from fork to spawn as the default in newer Python versions). If you use global variables that hold state, you will run into trouble when developing on one environment and running on the other.
Make sure not to use global stateful variables in your processes. Either pass them explicitly or get rid of them in another way.
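For illustration (a sketch, not your code): pass the state to the worker explicitly and pin the start method, so the behaviour is the same on every platform:

from multiprocessing import get_context

def evaluate(task, config):
    # config arrives as an argument, so it exists in the worker
    # whether the process was forked or spawned
    return task * config["factor"]

if __name__ == "__main__":
    config = {"factor": 2}        # explicit state instead of a global
    ctx = get_context("spawn")    # same start method on every platform
    with ctx.Pool(processes=4) as pool:
        async_results = [pool.apply_async(evaluate, (t, config)) for t in range(8)]
        print([r.get() for r in async_results])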
3. Use a Program, not a Script
Write a program with at least an if __name__ == '__main__' guard. You need this especially when you use multiprocessing. Instantiate your Pool in that namespace.
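A minimal sketch of that structure, with simplified stand-ins for the evaluate function and ComplexClass from your question:

from multiprocessing import Pool

def evaluate(task):
    return task * task              # placeholder for do_something

class ComplexClass:
    def __init__(self, pool):
        self.pool = pool            # the shared pool is passed in explicitly

    def work(self, tasks):
        return [self.pool.apply_async(evaluate, (t,)) for t in tasks]

if __name__ == "__main__":          # the pool is created only in the parent process
    with Pool(processes=4) as pool:
        objects = [ComplexClass(pool) for _ in range(5)]
        for obj in objects:
            print([r.get() for r in obj.work(range(10))])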
1) Your question contains code which is different from the code you actually run (the code shown is incomplete and cannot be run as posted).
2) The multiprocessing module is extremely bad at handling and reporting errors that happen in workers.
The problem is very likely in code that you don't show. The code you do show (once completed) will just run forever and eat CPU, but it will not cause errors about too many open files or processes.
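As an illustration (a sketch, not your code) of how to surface those hidden worker errors: keep the AsyncResult objects and call get(), which re-raises the worker's exception in the parent, or pass an error_callback:

from multiprocessing import Pool

def evaluate(task):
    if task == 3:
        raise ValueError("boom")    # simulated failure inside a worker
    return task

if __name__ == "__main__":
    with Pool(processes=2) as pool:
        results = [pool.apply_async(evaluate, (t,), error_callback=print)
                   for t in range(5)]
        for r in results:
            try:
                print(r.get())      # re-raises the worker's exception here
            except ValueError as err:
                print("worker failed:", err)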
Just being a noob in this context:
I am trying to run one function in multiple processes so I can process a huge file in less time.
I tried
from multiprocessing import Process

for file_chunk in file_chunks:
    p = Process(target=my_func, args=(file_chunk, my_arg2))
    p.start()
    # no .join(), otherwise the main process would have to wait
    # for proc1 to finish before it could start proc2
but it did not seem to be fast enough.
Now I ask myself whether it is really running the jobs in parallel. I also thought about Pool, but I am using Python 2 and it is ugly to make it map two arguments to the function.
Am I missing something in my code above, or do processes created this way really run in parallel?
The speedup is proportional to the number of CPU cores your PC has, not to the number of chunks.
Ideally, if you have 4 CPU cores, you should see a 4x speedup. Yet other factors such as IPC overhead must be taken into account when considering the performance improvement.
Spawning too many processes will also negatively affect your performance as they will compete against each other for the CPU.
I'd recommend using a multiprocessing.Pool to deal with most of the logic. If you have multiple arguments, just use the apply_async method.
from multiprocessing import Pool

pool = Pool()
for file_chunk in file_chunks:
    pool.apply_async(my_func, args=(file_chunk, arg1, arg2))
pool.close()  # no more tasks will be submitted
pool.join()   # wait for the submitted tasks to finish
I am not an expert either, but what you should try is using joblib Parallel
from joblib import Parallel, delayed
import multiprocessing as mp

def random_function(args):
    pass

proc = mp.cpu_count()
Parallel(n_jobs=proc)(delayed(random_function)(args) for args in args_list)
This will run the given function (random_function) using the number of available CPUs (n_jobs).
Feel free to read the docs!
I have 24 cores on my machine, but I just can't get them all running. When I run top, only 3 Python processes show up, and usually only one hits 100% CPU while the other two sit at around 30%.
I've read all the related threads on this site, but still can't figure out what's wrong with my code.
Pseudocode of how I used pool is as follows
import multiprocessing as mp

def Foo():
    pool = mp.Pool(mp.cpu_count())

    def myCallbackFun():
        pool.map(myFunc_wrapper, myArgs)

    optimization(callback=myCallbackFun)  # scipy optimization that has a callback function
Using pdb, I stopped before the optimization and checked that I indeed have 24 workers.
But when I resume the program, top tells me I only have three Python processes running. Another thing is, when I Ctrl-C to terminate my program, it has soooo many workers to interrupt (e.g., PoolWorker-367); I've been pressing Ctrl-C for minutes, but there are still workers out there. Shouldn't there be just 24 workers?
How to make my program use all CPUs?
With multiprocessing, Python starts new processes. With a script like yours it will keep forking endlessly. You need to wrap the script part of your module like this:
import multiprocessing as mp

if __name__ == '__main__':
    pool = mp.Pool(24)
    pool.map(myFunc_wrapper, myArgs)
For future readers:
As @mata correctly points out,
You may be running into an IO bottleneck if your involved arguments are very big.
This is indeed my case. Try to minimize the size of the arguments passed to each process.
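For illustration, a sketch with hypothetical data: pass small descriptions of the work, such as index ranges, instead of the data itself, so only a few integers cross the process boundary:

from multiprocessing import Pool

# hypothetical large dataset living in the parent process
BIG_DATA = list(range(1_000_000))

def work_on_slice(bounds):
    start, stop = bounds                 # only two small ints were pickled and sent
    return sum(BIG_DATA[start:stop])     # placeholder processing of the slice

if __name__ == "__main__":
    step = 100_000
    slices = [(i, i + step) for i in range(0, len(BIG_DATA), step)]
    with Pool() as pool:
        print(sum(pool.map(work_on_slice, slices)))

How the workers see BIG_DATA depends on the start method (inherited via fork, or rebuilt on import under spawn), but either way the per-task arguments stay tiny.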
I have a Python program that takes around 10 minutes to execute. So I use Pool from multiprocessing to speed things up:
from multiprocessing import Pool

p = Pool(processes=6)  # I have an 8-thread processor
results = p.map(function, argument_list)  # distributes work over 6 processes!
It runs much quicker, just from that. God bless Python! And so I thought that would be it.
However I've noticed that each time I do this, the processes and their considerably sized state remain, even when p has gone out of scope; effectively, I've created a memory leak. The processes show up in my System Monitor application as Python processes, which use no CPU at this point, but considerable memory to maintain their state.
Pool has functions close, terminate, and join, and I'd assume one of these will kill the processes. Does anyone know which is the best way to tell my pool p that I am finished with it?
Thanks a lot for your help!
From the Python docs, it looks like you need to do:
p.close()
p.join()
after the map() to indicate that the workers should terminate and then wait for them to do so.
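As a side note, on Python 3 the pool can also be used as a context manager, which calls terminate() when the block exits; a minimal sketch (function here is a placeholder):

from multiprocessing import Pool

def function(x):
    return x * x                       # placeholder work

if __name__ == "__main__":
    with Pool(processes=6) as p:       # __exit__ calls terminate() on the pool
        results = p.map(function, range(100))
    p.join()                           # wait for the worker processes to exit
    print(results[:5])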
Is there any easy way to make two methods, let's say MethodA() and MethodB(), run on two different cores? I don't mean two different threads. I'm running on Windows, but I'd like to know whether it can be done in a platform-independent way.
edit: And what about
http://docs.python.org/dev/library/multiprocessing.html
and
Parallel Python?
You have to use separate processes (because of the often-mentioned GIL). The multiprocessing module is here to help.
from multiprocessing import Process
from somewhere import A, B

if __name__ == '__main__':
    procs = [Process(target=t) for t in (A, B)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
Assuming you use CPython (the reference implementation), the answer is NO, because of the Global Interpreter Lock. In CPython, threads are mainly useful when there is a lot of IO to do (one thread waits while another does computation).
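A small sketch of that IO-bound case (the URL is just a placeholder); the threads overlap their waiting even with the GIL in place:

from concurrent.futures import ThreadPoolExecutor
import urllib.request

URLS = ["https://example.com"] * 5        # placeholder URLs

def fetch(url):
    # the GIL is released while the thread waits on the network
    with urllib.request.urlopen(url) as resp:
        return len(resp.read())

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=5) as executor:
        print(list(executor.map(fetch, URLS)))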
In general, running different threads is the most portable way to run on multiple cores. Of course, in Python the global interpreter lock makes this a moot point: only one thread makes progress at a time.
Because of the global interpreter lock, Python programs only ever execute Python code in one thread at a time. If you want true multicore Python programming, you could look into Jython (which uses the JVM's threads and has no GIL), or the brilliant Stackless Python, which has Go-like channels and tasklets (though tasklets give you concurrency rather than true parallelism).