On what CPU cores are my Python processes running?

On what CPU cores are my Python processes running? - python

The setup
I have written a pretty complex piece of software in Python (on a Windows PC). My software starts basically two Python interpreter shells. The first shell starts up (I suppose) when you double click the main.py file. Within that shell, other threads are started in the following way:
# Start TCP_thread
TCP_thread = threading.Thread(name = 'TCP_loop', target = TCP_loop, args = (TCPsock,))
TCP_thread.start()
# Start UDP_thread
UDP_thread = threading.Thread(name = 'UDP_loop', target = UDP_loop, args = (UDPsock,))
TCP_thread.start()
The Main_thread starts a TCP_thread and a UDP_thread. Although these are separate threads, they all run within one single Python shell.
The Main_threadalso starts a subprocess. This is done in the following way:
p = subprocess.Popen(['python', mySubprocessPath], shell=True)
From the Python documentation, I understand that this subprocess is running simultaneously (!) in a separate Python interpreter session/shell. The Main_threadin this subprocess is completely dedicated to my GUI. The GUI starts a TCP_thread for all its communications.
I know that things get a bit complicated. Therefore I have summarized the whole setup in this figure:
I have several questions concerning this setup. I will list them down here:
Question 1 [Solved]
Is it true that a Python interpreter uses only one CPU core at a time to run all the threads? In other words, will the Python interpreter session 1 (from the figure) run all 3 threads (Main_thread, TCP_thread and UDP_thread) on one CPU core?
Answer: yes, this is true. The GIL (Global Interpreter Lock) ensures that all threads run on one CPU core at a time.
Question 2 [Not yet solved]
Do I have a way to track which CPU core it is?
Question 3 [Partly solved]
For this question we forget about threads, but we focus on the subprocess mechanism in Python. Starting a new subprocess implies starting up a new Python interpreter instance. Is this correct?
Answer: Yes this is correct. At first there was some confusion about whether the following code would create a new Python interpreter instance:
p = subprocess.Popen(['python', mySubprocessPath], shell = True)
The issue has been clarified. This code indeed starts a new Python interpreter instance.
Will Python be smart enough to make that separate Python interpreter instance run on a different CPU core? Is there a way to track which one, perhaps with some sporadic print statements as well?
Question 4 [New question]
The community discussion raised a new question. There are apparently two approaches when spawning a new process (within a new Python interpreter instance):
# Approach 1(a)
p = subprocess.Popen(['python', mySubprocessPath], shell = True)
# Approach 1(b) (J.F. Sebastian)
p = subprocess.Popen([sys.executable, mySubprocessPath])
# Approach 2
p = multiprocessing.Process(target=foo, args=(q,))
The second approach has the obvious downside that it targets just a function - whereas I need to open up a new Python script. Anyway, are both approaches similar in what they achieve?

Q: Is it true that a Python interpreter uses only one CPU core at a time to run all the threads?
No. GIL and CPU affinity are unrelated concepts. GIL can be released during blocking I/O operations, long CPU intensive computations inside a C extension anyway.
If a thread is blocked on GIL; it is probably not on any CPU core and therefore it is fair to say that pure Python multithreading code may use only one CPU core at a time on CPython implementation.
Q: In other words, will the Python interpreter session 1 (from the figure) run all 3 threads (Main_thread, TCP_thread and UDP_thread) on one CPU core?
I don't think CPython manages CPU affinity implicitly. It is likely relies on OS scheduler to choose where to run a thread. Python threads are implemented on top of real OS threads.
Q: Or is the Python interpreter able to spread them over multiple cores?
To find out the number of usable CPUs:
>>> import os
>>> len(os.sched_getaffinity(0))
16
Again, whether or not threads are scheduled on different CPUs does not depend on Python interpreter.
Q: Suppose that the answer to Question 1 is 'multiple cores', do I have a way to track on which core each thread is running, perhaps with some sporadic print statements? If the answer to Question 1 is 'only one core', do I have a way to track which one it is?
I imagine, a specific CPU may change from one time-slot to another. You could look at something like /proc/<pid>/task/<tid>/status on old Linux kernels. On my machine, task_cpu can be read from /proc/<pid>/stat or /proc/<pid>/task/<tid>/stat:
>>> open("/proc/{pid}/stat".format(pid=os.getpid()), 'rb').read().split()[-14]
'4'
For a current portable solution, see whether psutil exposes such info.
You could restrict the current process to a set of CPUs:
os.sched_setaffinity(0, {0}) # current process on 0-th core
Q: For this question we forget about threads, but we focus on the subprocess mechanism in Python. Starting a new subprocess implies starting up a new Python interpreter session/shell. Is this correct?
Yes. subprocess module creates new OS processes. If you run python executable then it starts a new Python interpeter. If you run a bash script then no new Python interpreter is created i.e., running bash executable does not start a new Python interpreter/session/etc.
Q: Supposing that it is correct, will Python be smart enough to make that separate interpreter session run on a different CPU core? Is there a way to track this, perhaps with some sporadic print statements as well?
See above (i.e., OS decides where to run your thread and there could be OS API that exposes where the thread is run).
multiprocessing.Process(target=foo, args=(q,)).start()
multiprocessing.Process also creates a new OS process (that runs a new Python interpreter).
In reality, my subprocess is another file. So this example won't work for me.
Python uses modules to organize the code. If your code is in another_file.py then import another_file in your main module and pass another_file.foo to multiprocessing.Process.
Nevertheless, how would you compare it to p = subprocess.Popen(..)? Does it matter if I start the new process (or should I say 'python interpreter instance') with subprocess.Popen(..)versus multiprocessing.Process(..)?
multiprocessing.Process() is likely implemented on top of subprocess.Popen(). multiprocessing provides API that is similar to threading API and it abstracts away details of communication between python processes (how Python objects are serialized to be sent between processes).
If there are no CPU intensive tasks then you could run your GUI and I/O threads in a single process. If you have a series of CPU intensive tasks then to utilize multiple CPUs at once, either use multiple threads with C extensions such as lxml, regex, numpy (or your own one created using Cython) that can release GIL during long computations or offload them into separate processes (a simple way is to use a process pool such as provided by concurrent.futures).
Q: The community discussion raised a new question. There are apparently two approaches when spawning a new process (within a new Python interpreter instance):
# Approach 1(a)
p = subprocess.Popen(['python', mySubprocessPath], shell = True)
# Approach 1(b) (J.F. Sebastian)
p = subprocess.Popen([sys.executable, mySubprocessPath])
# Approach 2
p = multiprocessing.Process(target=foo, args=(q,))
"Approach 1(a)" is wrong on POSIX (though it may work on Windows). For portability, use "Approach 1(b)" unless you know you need cmd.exe (pass a string in this case, to make sure that the correct command-line escaping is used).
The second approach has the obvious downside that it targets just a function - whereas I need to open up a new Python script. Anyway, are both approaches similar in what they achieve?
subprocess creates new processes, any processes e.g., you could run a bash script. multprocessing is used to run Python code in another process. It is more flexible to import a Python module and run its function than to run it as a script. See Call python script with input with in a python script using subprocess.

Since you are using the threading module which is build up on thread. As the documentation suggests, it uses the ''POSIX thread implementation'' pthread of your OS.
The threads are managed by the OS instead of Python interpreter. So the answer will depend on the pthread library in your system. However, CPython uses GIL to prevent multiple threads from executing Python bytecodes simutanously. So they will be sequentialized. But still they can be separated to different cores, which depends on your pthread libs.
Simplly use a debugger and attach it to your python.exe. For example the GDB thread command.
Similar to question 1, the new process is managed by your OS and probably running on a different core. Use debugger or any process monitor to see it. For more details, go to the CreatProcess() documentation page.

1, 2: You have three real threads, but in CPython they're limited by GIL , so, assuming they're running pure python, code you'll see CPU usage as if only one core used.
3: As said gdlmx it's up to OS to choose a core to run a thread on,
but if you really need control, you can set process or thread affinity using
native API via ctypes. Since you are on Windows, it would be like this:
# This will run your subprocess on core#0 only
p = subprocess.Popen(['python', mySubprocessPath], shell = True)
cpu_mask = 1
ctypes.windll.kernel32.SetProcessAffinityMask(p._handle, cpu_mask)
I use here private Popen._handle for simplicty. The clean way would beOpenProcess(p.tid) etc.
And yes, subprocess runs python like everything else in another new process.

Related

Does using subprocess.Popen get around the GIL? [duplicate]

When calling a linux binary which takes a relatively long time through Python's subprocess module, does this release the GIL?
I want to parallelise some code which calls a binary program from the command line. Is it better to use threads (through threading and a multiprocessing.pool.ThreadPool) or multiprocessing? My assumption is that if subprocess releases the GIL then choosing the threading option is better.

When calling a linux binary which takes a relatively long time through Python's subprocess module, does this release the GIL?
Yes, it releases the Global Interpreter Lock (GIL) in the calling process.
As you are likely aware, on POSIX platforms subprocess offers convenience interfaces atop the "raw" components from fork, execve, and waitpid.
By inspection of the CPython 2.7.9 sources, fork and execve do not release the GIL. However, those calls do not block, so we'd not expect the GIL to be released.
waitpid of course does block, but we see it's implementation does give up the GIL using the ALLOW_THREADS macros:
static PyObject *
posix_waitpid(PyObject *self, PyObject *args)
{
....
Py_BEGIN_ALLOW_THREADS
pid = waitpid(pid, &status, options);
Py_END_ALLOW_THREADS
....
This could also be tested by calling out to some long running program like sleep from a demonstration multithreaded python script.

GIL doesn't span multiple processes. subprocess.Popen starts a new process. If it starts a Python process then it will have its own GIL.
You don't need multiple threads (or processes created by multiprocessing) if all you want is to run some linux binaries in parallel:
from subprocess import Popen
# start all processes
processes = [Popen(['program', str(i)]) for i in range(10)]
# now all processes run in parallel
# wait for processes to complete
for p in processes:
p.wait()
You could use multiprocessing.ThreadPool to limit number of concurrently run programs.

Since subprocess is for running executable (it is essentially a wrapper around os.fork() and os.execve()), it probably makes more sense to use it. You can use subprocess.Popen. Something like:
import subprocess
process = subprocess.Popen(["binary"])
This will run in as a separate process, hence not being affected by the GIL. You can then use the Popen.poll() method to check if child process has terminated:
if process.poll():
# process has finished its work
returncode = process.returncode
Just need to make sure you don't call any of the methods that wait for the process to finish its work (e.g. Popen.communicate()) to avoid your Python script blocking.
As mentioned in this answer
multiprocessing is for running functions within your existing
(Python) code with support for more flexible communications among the
family of processes. multiprocessing module is intended to provide
interfaces and features which are very similar to threading while
allowing CPython to scale your processing among multiple CPUs/cores
despite the GIL.
So, given your use-case, subprocess seems to be the right choice.

Multiprocessing slower than serial processing in Windows (but not in Linux)

I'm trying to parallelize a for loop to speed-up my code, since the loop processing operations are all independent. Following online tutorials, it seems the standard multiprocessing library in Python is a good start, and I've got this working for basic examples.
However, for my actual use case, I find that parallel processing (using a dual core machine) is actually a little (<5%) slower, when run on Windows. Running the same code on Linux, however, results in a parallel processing speed-up of ~25%, compared to serial execution.
From the docs, I believe this may relate to Window's lack of fork() function, which means the process needs to be initialised fresh each time. However, I don't fully understand this and wonder if anyone can confirm this please?
Particularly,
--> Does this mean that all code in the calling python file gets run for each parallel process on Windows, even initialising classes and importing packages?
--> If so, can this be avoided by somehow passing a copy (e.g. using deepcopy) of the class into the new processes?
--> Are there any tips / other strategies for efficient parallelisation of code design for both unix and windows.
My exact code is long and uses many files, so I have created a pseucode-style example structure which hopefully shows the issue.
# Imports
from my_package import MyClass
imports many other packages / functions
# Initialization (instantiate class and call slow functions that get it ready for processing)
my_class = Class()
my_class.set_up(input1=1, input2=2)
# Define main processing function to be used in loop
def calculation(_input_data):
# Perform some functions on _input_data
......
# Call method of instantiate class to act on data
return my_class.class_func(_input_data)
input_data = np.linspace(0, 1, 50)
output_data = np.zeros_like(input_data)
# For Loop (SERIAL implementation)
for i, x in enumerate(input_data):
output_data[i] = calculation(x)
# PARALLEL implementation (this doesn't work well!)
with multiprocessing.Pool(processes=4) as pool:
results = pool.map_async(calculation, input_data)
results.wait()
output_data = results.get()
EDIT: I do not believe the question is a duplicate of the one suggested, since this relates to a difference in Windows and Linunx, which is not mentioned at all in the suggested duplicate question.

NT Operating Systems lack the UNIX fork primitive. When a new process is created, it starts as a blank process. It's responsibility of the parent to instruct the new process on how to bootstrap.
Python multiprocessing APIs abstracts the process creation trying to give the same feeling for the fork, forkserver and spawn start methods.
When you use the spawn starting method, this is what happens under the hood.
A blank process is created
The blank process starts a brand new Python interpreter
The Python interpreter is given the MFA (Module Function Arguments) you specified via the Process class initializer
The Python interpreter loads the given module resolving all the imports
The target function is looked up within the module and called with the given args and kwargs
The above flow brings few implications.
As you noticed yourself, it is a much more taxing operation compared to fork. That's why you notice such a difference in performance.
As the module gets imported from scratch in the child process, all import side effects are executed anew. This means that constants, global variables, decorators and first level instructions will be executed again.
On the other side, initializations made during the parent process execution will not be propagated to the child. See this example.
This is why in the multiprocessing documentation they added a specific paragraph for Windows in the Programming Guidelines. I highly recommend to read the Programming Guidelines as they already include all the required information to write portable multi-processing code.

Does python os.fork uses the same python interpreter?

I understand that threads in Python use the same instance of Python interpreter. My question is it the same with process created by os.fork? Or does each process created by os.fork has its own interpreter?

Whenever you fork, the entire Python process is duplicated in memory (including the Python interpreter, your code and any libraries, current stack etc.) to create a second process - one reason why forking a process is much more expensive than creating a thread.
This creates a new copy of the python interpreter.
One advantage of having two python interpreters running is that you now have two GIL's (Global Interpreter Locks), and therefore can have true multi-processing on a multi-core system.
Threads in one process share the same GIL, meaning only one runs at a given moment, giving only the illusion of parallelism.

While fork does indeed create a copy of the current Python interpreter rather than running with the same one, it usually isn't what you want, at least not on its own. Among other problems:
There can be problems forking multi-threaded processes on some platforms. And some libraries (most famously Apple's Cocoa/CoreFoundation) may start threads for you in the background, or use thread-local APIs even though you've only got one thread, etc., without your knowledge.
Some libraries assume that every process will be initialized properly, but if you fork after initialization that isn't true. Most infamously, if you let ssl seed its PRNG in the main process, then fork, you now have potentially predictable random numbers, which is a big hole in your security.
Open file descriptors are inherited (as dups) by the children, with details that vary in annoying ways between platforms.
POSIX only requires platforms to implement a very specific set of syscalls between a fork and an exec. If you never call exec, you can only use those syscalls. Which basically means you can't do anything portably.
Anything to do with signals is especially annoying and nonportable after fork.
See POSIX fork or your platform's manpage for details on these issues.
The right answer is almost always to use multiprocessing, or concurrent.futures (which wraps up multiprocessing), or a similar third-party library.
With 3.4+, you can even specify a start method. The fork method basically just calls fork. The forkserver method runs a single "clean" process (no threads, signal handlers, SSL initialization, etc.) and forks off new children from that. The spawn method calls fork then exec, or an equivalent like posix_spawn, to get you a brand-new interpreter instead of a copy. So you can start off with fork, ut then if there are any problems, switch to forkserver or spawn and nothing else in your code has to change. Which is pretty nice.

os.fork() is equivalent to the fork() syscall in many UNIC(es). So yes your sub-process(es) will be separate from the parent and have a different interpreter (as such).
man fork:
FORK(2)
NAME
fork - create a child process
SYNOPSIS
#include
pid_t fork(void);
DESCRIPTION
fork() creates a new process by duplicating the calling process. The new process, referred to as the child,
is an exact duplicate of the calling process, referred to as the parent, except for the following points:
pydoc os.fork():
os.fork() Fork a child process. Return 0 in the child and the
child’s process id in the parent. If an error occurs OSError is
raised.
Note that some platforms including FreeBSD <= 6.3, Cygwin and OS/2 EMX
have known issues when using fork() from a thread.
See also: Martin Konecny's response as to the why's and advantages of "forking" :)
For brevity; other approaches to concurrency which don't involve a separate process and therefore a separate Python interpreter include:
Green or Lightweight threads; ala greenlet
Coroutines ala Python generators and the new Python 3+ yield from
Async I/O ala asyncio, Twisted, circuits, etc.

Multiprocessing in python on Mac OS

Any basic information would be greatly appreciated.
I am almost completed with my project, all I have to do now is run my code to get my data. However, it takes a very long time, and it has been suggested that I make my code (python) available to multiprocess. However, I am clueless on how to do this and have had a lot of trouble running how. I use a Mac OS X 10.8.2. I know that I need a semaphore.
I have looked up the multiprocessing module and the Thread module, although I could not understand most of this. Do the Process() or Manager() functions have anything to do with this?
Lastly, I have 16 processors available for this.

You need to use the multiprocessing module.
Both modules enable concurrency, but only multiprocessing enables true parallelism. Due to Python's Global Interpreter Lock, multiple threads cannot execute simultaneously.
Keeping all 16 of your processors busy comes at the cost of a certain increased difficulty in programming since separate processes do not execute in a shared memory space, so if a spawned process needs to share data with its parent process you will need to serialize it.

python Global Interpreter Lock GIL problem

I want to provide a service on the web that people can test out the performance of an algo, which is written in python and running on the linux machine
basically what I want to do is that, there is a very trivial PHP handler, let's say start_algo.php, which accepts the request coming from browser, and in the php code through system() or popen() (something like exec( "python algo.py" ) ) to issue a new process running the python script, I think it is doable in this part
problem is that since it is a web service, surely it has to serve multiple users at the same time, but I am quite confused by the Global Interpreter Lock GIL http://wiki.python.org/moin/GlobalInterpreterLock
that the 'standard' CPython has implemented,
does it mean, if I have 3 users running the algo now (which means 3 separated processes, correct me if I am wrong plz), at a particular moment, there is only one user is being served by the Python interpreter and the other 2 are waiting for their turns?
Many thanks in advance
Ted

If you are opening each script by invoking a new process; you will not run afoul of the GIL. Each process gets its own interpreter and therefore its own interpreter lock.

The GIL is per-process. If you start multiple python processes, each will have its own GIL that prevents the interpreter(s) in this specific process from running more than one thread at a time. But independent processes can run at the same time.
Also, multiple threads inside one Python process do take turns on running (rather frequently, IIRC once per hundred opcode instructions or a few dozen milliseconds depending on the version), so it's not like the GIL prevents concurrency at all - it just prevents multi-threading.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.