Is there an easy way to make two methods, say MethodA() and MethodB(), run on two different cores? I don't mean two different threads. I'm running on Windows, but I'd like to know if it can be done in a platform-independent way.
Edit: And what about
http://docs.python.org/dev/library/multiprocessing.html
and Parallel Python?
You have to use separate processes (because of the often-mentioned GIL). The multiprocessing module is here to help.
from multiprocessing import Process
from somewhere import A, B

if __name__ == '__main__':
    procs = [Process(target=t) for t in (A, B)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
Assuming you use CPython (the reference implementation), the answer is NO, because of the Global Interpreter Lock. In CPython, threads are mainly useful when there is a lot of I/O to do (one thread waits while another computes).
In general, running different threads is the best portable way to run on multiple cores. Of course, in Python, the global interpreter lock makes this a moot point -- only one thread will make progress at a time.
Because of the global interpreter lock, a CPython program only ever executes one thread's bytecode at a time. If you want true multicore Python programming, you could look into Jython (which uses the JVM's threads and has no GIL), or into Stackless Python, which has Go-like channels and tasklets (although its tasklets are cooperative and still subject to the GIL).
Related
Can I run multiple threads in each process of a multiprocess program?
For example, let's say I have 4 cores available; can I run 30 threads on each of these 4 cores?
This might sound confusing, so here's some sample code that illustrates my question better:
from multiprocessing import Process
from threading import Thread

if __name__ == "__main__":
    processes = []
    for i in range(4):
        processes.append(Process(target=target))
    for p in processes:
        # Can I add threads on each of these processes?
        # p.append(Thread(target=target2))
        p.start()
    for p in processes:
        p.join()
This is not for a specific project; it's just for my general knowledge.
Thank you
Yes, each Process can spawn its own Thread objects. In fact, when you use threads without the multiprocessing module you are already seeing this, since your main script runs in its own process and it is the one spawning the threads! Having many processes, each with their own threads, quickly becomes complicated to manage, though (mostly because processes have separate memory), and you will have to be very careful to avoid deadlock. A script that accomplishes something useful with this technique will likely be quite lengthy. In general it is best to stick to one or the other. To quote this post, which you would probably find interesting:
Spawning processes is a bit slower than spawning threads. Once they are running, there is not much difference.
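For illustration, here is a minimal sketch of the idea; the worker functions io_task and process_worker are made-up placeholders, not anything from the question:

from multiprocessing import Process
from threading import Thread

def io_task(n):
    # placeholder for I/O-bound work done by one thread
    print("thread", n, "running")

def process_worker(pnum):
    # each process spawns its own threads (30 here, as in the question)
    threads = [Thread(target=io_task, args=(i,)) for i in range(30)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("process", pnum, "done")

if __name__ == "__main__":
    procs = [Process(target=process_worker, args=(n,)) for n in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()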
The setup
I have written a pretty complex piece of software in Python (on a Windows PC). My software basically starts two Python interpreter shells. The first shell starts up (I suppose) when you double-click the main.py file. Within that shell, other threads are started in the following way:
# Start TCP_thread
TCP_thread = threading.Thread(name='TCP_loop', target=TCP_loop, args=(TCPsock,))
TCP_thread.start()

# Start UDP_thread
UDP_thread = threading.Thread(name='UDP_loop', target=UDP_loop, args=(UDPsock,))
UDP_thread.start()
The Main_thread starts a TCP_thread and a UDP_thread. Although these are separate threads, they all run within one single Python shell.
The Main_thread also starts a subprocess. This is done in the following way:
p = subprocess.Popen(['python', mySubprocessPath], shell=True)
From the Python documentation, I understand that this subprocess runs simultaneously (!) in a separate Python interpreter session/shell. The Main_thread in this subprocess is completely dedicated to my GUI. The GUI starts a TCP_thread for all its communications.
I know that things get a bit complicated. Therefore I have summarized the whole setup in this figure:
I have several questions concerning this setup. I will list them down here:
Question 1 [Solved]
Is it true that a Python interpreter uses only one CPU core at a time to run all the threads? In other words, will the Python interpreter session 1 (from the figure) run all 3 threads (Main_thread, TCP_thread and UDP_thread) on one CPU core?
Answer: yes, this is true. The GIL (Global Interpreter Lock) ensures that only one thread executes Python bytecode at a time, so the threads effectively share a single core.
Question 2 [Not yet solved]
Do I have a way to track which CPU core it is?
Question 3 [Partly solved]
For this question we forget about threads, but we focus on the subprocess mechanism in Python. Starting a new subprocess implies starting up a new Python interpreter instance. Is this correct?
Answer: Yes this is correct. At first there was some confusion about whether the following code would create a new Python interpreter instance:
p = subprocess.Popen(['python', mySubprocessPath], shell = True)
The issue has been clarified. This code indeed starts a new Python interpreter instance.
Will Python be smart enough to make that separate Python interpreter instance run on a different CPU core? Is there a way to track which one, perhaps with some sporadic print statements as well?
Question 4 [New question]
The community discussion raised a new question. There are apparently two approaches when spawning a new process (within a new Python interpreter instance):
# Approach 1(a)
p = subprocess.Popen(['python', mySubprocessPath], shell = True)
# Approach 1(b) (J.F. Sebastian)
p = subprocess.Popen([sys.executable, mySubprocessPath])
# Approach 2
p = multiprocessing.Process(target=foo, args=(q,))
The second approach has the obvious downside that it targets just a function - whereas I need to open up a new Python script. Anyway, are both approaches similar in what they achieve?
Q: Is it true that a Python interpreter uses only one CPU core at a time to run all the threads?
No. The GIL and CPU affinity are unrelated concepts. The GIL can be released during blocking I/O operations and also during long CPU-intensive computations inside a C extension.
If a thread is blocked waiting for the GIL, it is not running on any CPU core, so it is fair to say that pure-Python multithreading code may use only one CPU core at a time on the CPython implementation.
Q: In other words, will the Python interpreter session 1 (from the figure) run all 3 threads (Main_thread, TCP_thread and UDP_thread) on one CPU core?
I don't think CPython manages CPU affinity explicitly. It most likely relies on the OS scheduler to choose where to run each thread. Python threads are implemented on top of real OS threads.
Q: Or is the Python interpreter able to spread them over multiple cores?
To find out the number of usable CPUs:
>>> import os
>>> len(os.sched_getaffinity(0))
16
Again, whether or not threads are scheduled on different CPUs does not depend on the Python interpreter.
Q: Suppose that the answer to Question 1 is 'multiple cores', do I have a way to track on which core each thread is running, perhaps with some sporadic print statements? If the answer to Question 1 is 'only one core', do I have a way to track which one it is?
I imagine the specific CPU may change from one time slice to another. You could look at something like /proc/<pid>/task/<tid>/status on old Linux kernels. On my machine, task_cpu can be read from /proc/<pid>/stat or /proc/<pid>/task/<tid>/stat:
>>> open("/proc/{pid}/stat".format(pid=os.getpid()), 'rb').read().split()[-14]
'4'
For a current portable solution, see whether psutil exposes such info.
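As a rough sketch of the psutil route (assuming psutil is installed; cpu_num() is only available on some platforms, e.g. Linux and FreeBSD):

import psutil

p = psutil.Process()        # the current process
print(p.cpu_num())          # CPU the process last ran on (platform dependent)
print(p.cpu_affinity())     # CPUs the process is allowed to run on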
You could restrict the current process to a set of CPUs:
os.sched_setaffinity(0, {0}) # current process on 0-th core
Q: For this question we forget about threads, but we focus on the subprocess mechanism in Python. Starting a new subprocess implies starting up a new Python interpreter session/shell. Is this correct?
Yes. The subprocess module creates new OS processes. If you run the python executable then it starts a new Python interpreter. If you run a bash script then no new Python interpreter is created, i.e., running the bash executable does not start a new Python interpreter/session/etc.
Q: Supposing that it is correct, will Python be smart enough to make that separate interpreter session run on a different CPU core? Is there a way to track this, perhaps with some sporadic print statements as well?
See above (i.e., OS decides where to run your thread and there could be OS API that exposes where the thread is run).
multiprocessing.Process(target=foo, args=(q,)).start()
multiprocessing.Process also creates a new OS process (that runs a new Python interpreter).
In reality, my subprocess is another file. So this example won't work for me.
Python uses modules to organize the code. If your code is in another_file.py then import another_file in your main module and pass another_file.foo to multiprocessing.Process.
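A minimal sketch of that idea, where another_file.foo is assumed to be the function you want to run from your other script:

import multiprocessing
import another_file   # your other script, refactored into an importable module

if __name__ == '__main__':
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=another_file.foo, args=(q,))
    p.start()
    p.join()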
Nevertheless, how would you compare it to p = subprocess.Popen(..)? Does it matter whether I start the new process (or should I say 'Python interpreter instance') with subprocess.Popen(..) or with multiprocessing.Process(..)?
multiprocessing.Process() also starts a new OS process that runs a new Python interpreter (the exact mechanism, fork or spawning a fresh interpreter, depends on the platform and start method). multiprocessing provides an API that is similar to the threading API, and it abstracts away the details of communication between Python processes (how Python objects are serialized to be sent between processes).
If there are no CPU-intensive tasks, then you could run your GUI and I/O threads in a single process. If you have a series of CPU-intensive tasks, then to utilize multiple CPUs at once, either use multiple threads with C extensions such as lxml, regex, or numpy (or your own, created using Cython) that can release the GIL during long computations, or offload them into separate processes (a simple way is to use a process pool such as the one provided by concurrent.futures).
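As a rough sketch of the process-pool approach (cpu_heavy is a made-up stand-in for a long computation):

from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(n):
    # stand-in for a long, CPU-bound computation
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    with ProcessPoolExecutor() as pool:   # one worker per CPU by default
        results = list(pool.map(cpu_heavy, [10**6] * 8))
    print(results)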
Q: The community discussion raised a new question. There are apparently two approaches when spawning a new process (within a new Python interpreter instance):
# Approach 1(a)
p = subprocess.Popen(['python', mySubprocessPath], shell = True)
# Approach 1(b) (J.F. Sebastian)
p = subprocess.Popen([sys.executable, mySubprocessPath])
# Approach 2
p = multiprocessing.Process(target=foo, args=(q,))
"Approach 1(a)" is wrong on POSIX (though it may work on Windows). For portability, use "Approach 1(b)" unless you know you need cmd.exe (pass a string in this case, to make sure that the correct command-line escaping is used).
The second approach has the obvious downside that it targets just a function - whereas I need to open up a new Python script. Anyway, are both approaches similar in what they achieve?
subprocess creates new processes of any kind, e.g., you could run a bash script. multiprocessing is used to run Python code in another process. It is more flexible to import a Python module and run one of its functions than to run it as a script. See Call python script with input with in a python script using subprocess.
You are using the threading module, which is built on top of the low-level thread module. As the documentation suggests, it uses the POSIX thread implementation (pthread) of your OS.
The threads are managed by the OS instead of the Python interpreter, so the answer will depend on the pthread library in your system. However, CPython uses the GIL to prevent multiple threads from executing Python bytecode simultaneously, so they will be serialized. They can still be scheduled onto different cores, though, depending on your pthread library.
Simply use a debugger and attach it to your python.exe, for example with the GDB thread command.
Similar to question 1, the new process is managed by your OS and will probably run on a different core. Use a debugger or any process monitor to see it. For more details, see the CreateProcess() documentation page.
1, 2: You have three real threads, but in CPython they're limited by the GIL, so, assuming they're running pure Python code, you'll see CPU usage as if only one core were used.
3: As gdlmx said, it's up to the OS to choose a core to run a thread on, but if you really need control, you can set the process or thread affinity using the native API via ctypes. Since you are on Windows, it would look like this:
import ctypes
import subprocess

# This will run your subprocess on core #0 only
p = subprocess.Popen(['python', mySubprocessPath], shell=True)
cpu_mask = 1
ctypes.windll.kernel32.SetProcessAffinityMask(p._handle, cpu_mask)
I use the private Popen._handle here for simplicity. The clean way would be OpenProcess(p.pid) etc.
And yes, subprocess runs Python, like everything else, in another new process.
The following code seems to be executed sequentially rather than concurrently.
And it only made use of one CPU core.
Is there a way to make it use multiple cores, or to switch context between threads?
(I hope it could work like Thread class in java.)
import threading

def work(s):
    for i in range(100):
        print s
        for j in range(12345678):
            pass

a = []
for i in range(3):
    thd = threading.Thread(target = work('#'+str(i)))
    a.append(thd)
for k in a: k.start()
for k in a: k.join()
print "Ended."
Threads cannot utilize multiple cores in Python. Processes however can.
multiprocessing is a package that supports spawning processes using an
API similar to the threading module. The multiprocessing package
offers both local and remote concurrency, effectively side-stepping
the Global Interpreter Lock by using subprocesses instead of threads.
Due to this, the multiprocessing module allows the programmer to fully
leverage multiple processors on a given machine. It runs on both Unix
and Windows.
See the multiprocessing documentation for more information.
A friend of mine asked me this once. In your case, just use multiprocessing.Process, and that will let you use all your cores.
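A minimal sketch of the question's loop rewritten with multiprocessing; note that the function itself is passed as target and its argument goes in args (the original code called work(...) immediately instead of passing it):

from multiprocessing import Process

def work(s):
    for i in range(100):
        print(s)
        for j in range(12345678):
            pass

if __name__ == '__main__':
    procs = [Process(target=work, args=('#' + str(i),)) for i in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print("Ended.")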
I have a python script that has to take many permutations of a large dataset, score each permutation, and retain only the highest scoring permutations. The dataset is so large that this script takes almost 3 days to run.
When I check my system resources in Windows, only 12% of my CPU is being used, and only 4 out of 8 cores are doing any work at all. Even if I set the python.exe process to the highest priority, this doesn't change.
My assumption is that dedicating more CPU usage to running the script could make it run faster, but my ultimate goal is to reduce the runtime by at least half. Is there a python module or some code that could help me do this? As an aside, does this sound like a problem that could benefit from a smarter algorithm?
Thank you in advance!
There are a few ways to go about this, but check out the multiprocessing module. This is a standard library module for creating multiple processes, similar to threads but without the limitations of the GIL.
You can also look into the excellent Celery library. It is a distributed task queue, and it has a lot of great features. It's a pretty easy install, and easy to get started with.
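As a rough sketch of what the multiprocessing route could look like here; score and the permutations list are placeholders for your own scoring function and data:

from multiprocessing import Pool
import heapq

def score(perm):
    # placeholder: replace with your real scoring function
    return sum(perm)

if __name__ == '__main__':
    permutations = [(1, 2, 3), (3, 2, 1), (2, 3, 1)]   # placeholder data
    with Pool() as pool:                               # one worker per core by default
        scores = pool.map(score, permutations)
    # keep only the highest-scoring permutations (top 2 here)
    best = heapq.nlargest(2, zip(scores, permutations))
    print(best)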
I can answer this with a HOW-TO and a simple code sample. While it is running, run /bin/top and watch your processes. Simple to do. Note that I've even included how to clean up after a keyboard interrupt; without that, your subprocesses would keep running and you would have to kill them manually.
from multiprocessing import Process
import traceback
import logging
import time

class AllDoneException(Exception):
    pass

class Dum(object):
    def __init__(self):
        self.numProcesses = 10
        self.logger = logging.getLogger()
        self.logger.setLevel(logging.INFO)
        self.logger.addHandler(logging.StreamHandler())

    def myRoutineHere(self, processNumber):
        print "I'm in process number %d" % (processNumber)
        time.sleep(10)
        # optional: raise AllDoneException

    def myRoutine(self):
        plist = []
        try:
            for pnum in range(0, self.numProcesses):
                p = Process(target=self.myRoutineHere, args=(pnum, ))
                p.start()
                plist.append(p)
            while 1:
                isAliveList = [p.is_alive() for p in plist]
                if not True in isAliveList:
                    break
                time.sleep(1)
        except KeyboardInterrupt:
            self.logger.warning("Caught keyboard interrupt, exiting.")
        except AllDoneException:
            self.logger.warning("Caught AllDoneException, exiting normally.")
        except:
            self.logger.warning("Caught exception, exiting: %s" % (traceback.format_exc()))
        for p in plist:
            p.terminate()

d = Dum()
d.myRoutine()
You should spawn new processes instead of threads to utilize the cores in your CPU. My general rule is one process per core. So you split your problem input space across the number of cores available, each process getting part of the problem space.
Multiprocessing is best for this. You could also use Parallel Python.
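A minimal sketch of that rule of thumb, splitting the input space into one chunk per core; process_chunk is a made-up worker function:

import os
from multiprocessing import Pool

def process_chunk(chunk):
    # made-up worker: handle one slice of the problem space
    return sum(chunk)

if __name__ == '__main__':
    data = list(range(1000000))
    ncores = os.cpu_count() or 1
    size = (len(data) + ncores - 1) // ncores
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(processes=ncores) as pool:
        partials = pool.map(process_chunk, chunks)
    print(sum(partials))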
Very late to the party, but in addition to using the multiprocessing module as reptilicus said, also make sure to check your CPU "affinity".
Some python modules fiddle with it, effectively lowering the number of cores available to Python:
https://stackoverflow.com/a/15641148/4195846
Due to the Global Interpreter Lock, one Python process cannot take advantage of multiple cores with threads alone. But if you can somehow parallelize your problem (which you should do anyway), then you can use multiprocessing to spawn as many Python processes as you have cores and process the data in each subprocess.
Here is the pseudo code for what I want to do.
import time
from multiprocessing import Process

def run():
    x = 0
    while x < 10000000:
        x += 1

if __name__ == "__main__":
    p = Process(target=run)
    p.start()
    time.sleep(3)
    # some code that I don't know that will give me the current value of x
Python's threading module seems to be the way to go; however, I have yet to successfully implement this example.
Everything you need is in the multiprocessing module. Perhaps a shared memory object would help here?
Note that threading in Python is affected by the Global Interpreter Lock, which essentially prevents Python threads from executing bytecode in parallel.
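For instance, here is a minimal sketch using a shared multiprocessing.Value as the counter, assuming the goal is simply to read x from the parent while the child increments it:

from multiprocessing import Process, Value
import time

def run(x):
    # single writer: increment the shared counter (no lock needed for one writer)
    while x.value < 10000000:
        x.value += 1

if __name__ == "__main__":
    x = Value('i', 0, lock=False)   # shared 32-bit integer in shared memory
    p = Process(target=run, args=(x,))
    p.start()
    time.sleep(3)
    print(x.value)                  # read the child's current progress from the parent
    p.join()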
Well, here it is:
from multiprocessing import Process, Pipe
import time

def f(conn):
    x = 0
    while x < 10000000:
        if conn.poll():
            if conn.recv() == "get":
                conn.send(x)
        x += 1
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=f, args=(child_conn,))
    p.start()
    time.sleep(2)
    parent_conn.send("get")
    print(parent_conn.recv())
    p.join()
It turned out to be a duplicate; my version is just more generic.
It really depends on what you're trying to accomplish, and on how often you create subprocesses and how much memory they use. A few long-lived ones, and you can easily get away with multiple OS-level processes (see the subprocess module). If you're spawning a lot of little ones, threading is faster and has less memory overhead. But with threading you run into problems like thread safety, the global interpreter lock, and nasty, boring stuff like semaphores and deadlocks.
Data-sharing strategies between two processes or threads can be roughly divided into two categories: "let's share a block of memory" (using locks and mutexes) and "let's share copies of data" (using messaging, pipes, or sockets). The sharing method is light on memory but difficult to manage, because you must ensure that one thread doesn't read a part of shared memory while another thread is writing to it, which is not trivial and is hard to debug. The copying method is heavier on memory but easier to reason about. It also has the distinct advantage that it can be ported to a network fairly trivially, allowing for distributed computing.
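As a small sketch of the "share copies of data" style, here are two processes exchanging messages over a multiprocessing.Queue (the producer/consumer functions are made up for illustration):

from multiprocessing import Process, Queue

def producer(q):
    for i in range(5):
        q.put(i)      # send a copy of the data to the other process
    q.put(None)       # sentinel: tell the consumer we are done

def consumer(q):
    while True:
        item = q.get()
        if item is None:
            break
        print("got", item)

if __name__ == '__main__':
    q = Queue()
    procs = [Process(target=producer, args=(q,)), Process(target=consumer, args=(q,))]
    for p in procs:
        p.start()
    for p in procs:
        p.join()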
You'll also have to think about the underlying OS. I don't know the specifics, but some are better than others at different approaches.
I'd say start with something like RabbitMQ.