What is runtime in context of Python? What does it consist of? - python

In the context of the question What is “runtime”? (https://stackoverflow.com/questions/3900549/what-is-runtime/3900561),
I am trying to understand what a Python runtime is made of. My guess is:
The python process that contains all runtime variables.
The GIL
The underlying interpreter code (CPython etc.).
Now if this is right, can we say that multiprocessing in python creates multiple runtimes and a python process is something we can directly relate to the runtime? (I think this is the right option)
Or, every python thread with its own stack which works on the same GIL and memory space as the parent process can be called as having a separate runtime?
Or, doesn't matter how many threads or processes are running, it will all come under a single runtime?
Simply put, what is the definition of runtime in the context of Python?
PS: I understand the difference between threads and processes. GIL: I understand the impacts but I do not grok it.

You are talking about two different (yet related) concepts in computer science: multiprocessing and multithreading. Here is a compilation of questions and answers that might be useful:
Multiprocessing -- Wikipedia
Multiprocessing is the use of two or more central processing units (CPUs) within a single computer system. The term also refers to the ability of a system to support more than one processor or the ability to allocate tasks between them.
Multithreading -- Wikipedia
In computer architecture, multithreading is the ability of a central processing unit (CPU) (or a single core in a multi-core processor) to provide multiple threads of execution concurrently, supported by the operating system. This approach differs from multiprocessing. In a multithreaded application, the threads share the resources of a single or multiple cores, which include the computing units, the CPU caches, and the translation lookaside buffer (TLB).
What is the difference between a process and a thread? -- StackOverflow
Process
Each process provides the resources needed to execute a program. A process has a virtual address space, executable code, open handles to system objects, a security context, a unique process identifier, environment variables, a priority class, minimum and maximum working set sizes, and at least one thread of execution. Each process is started with a single thread, often called the primary thread, but can create additional threads from any of its threads.
Thread
A thread is an entity within a process that can be scheduled for execution. All threads of a process share its virtual address space and system resources. In addition, each thread maintains exception handlers, a scheduling priority, thread local storage, a unique thread identifier, and a set of structures the system will use to save the thread context until it is scheduled. The thread context includes the thread's set of machine registers, the kernel stack, a thread environment block, and a user stack in the address space of the thread's process. Threads can also have their own security context, which can be used for impersonating clients.
Meaning of “Runtime Environment” and of “Software framework”? -- StackOverflow
A runtime environment basically is a virtual machine that runs on top of a machine - provides machine abstraction. It is generally lower level than a library. A framework can contain a runtime environment, but is generally tied to a library.
Runtime System -- Wikipedia
In computer programming, a runtime system, also called runtime environment, primarily implements portions of an execution model. Most languages have some form of runtime system that provides an environment in which programs run. This environment may address a number of issues including the layout of application memory, how the program accesses variables, mechanisms for passing parameters between procedures, interfacing with the operating system, and otherwise. Typically the runtime system will have some responsibility for setting up and managing the stack and heap, and may include features such as garbage collection, threads or other dynamic features built into the language.
global interpreter lock -- Python Docs
The mechanism used by the CPython interpreter to assure that only one thread executes Python bytecode at a time. This simplifies the CPython implementation by making the object model (including critical built-in types such as dict) implicitly safe against concurrent access. Locking the entire interpreter makes it easier for the interpreter to be multi-threaded, at the expense of much of the parallelism afforded by multi-processor machines.
However, some extension modules, either standard or third-party, are designed so as to release the GIL when doing computationally-intensive tasks such as compression or hashing. Also, the GIL is always released when doing I/O.
Past efforts to create a “free-threaded” interpreter (one which locks shared data at a much finer granularity) have not been successful because performance suffered in the common single-processor case. It is believed that overcoming this performance issue would make the implementation much more complicated and therefore costlier to maintain.
What is the Python Global Interpreter Lock (GIL)? -- Real Python
Useful source for more info on GIL.
Does python os.fork uses the same python interpreter? -- StackOverflow
Whenever you fork, the entire Python process is duplicated in memory (including the Python interpreter, your code and any libraries, current stack etc.) to create a second process - one reason why forking a process is much more expensive than creating a thread.
This creates a new copy of the python interpreter.
One advantage of having two python interpreters running is that you now have two GIL's (Global Interpreter Locks), and therefore can have true multi-processing on a multi-core system.
Threads in one process share the same GIL, meaning only one runs at a given moment, giving only the illusion of parallelism.
Memory Management -- Python Docs
Memory management in Python involves a private heap containing all Python objects and data structures. The management of this private heap is ensured internally by the Python memory manager. The Python memory manager has different components which deal with various dynamic storage management aspects, like sharing, segmentation, preallocation or caching.
When you spawn a thread via the threading library, you are effectively spawning jobs inside a single Python runtime. This runtime ensures the threads have a shared memory and manages the running sequence of these threads via the global interpreter lock:
Understanding the Python GIL -- dabeaz
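As a minimal sketch of the point above (the names `run_threads` and `record` are made up for illustration), the following starts a few threads that all append to one list and report the process they run in. Every thread sees the same list and the same PID, which is what "one runtime" looks like from inside the program:

```python
import os
import threading


def run_threads(count=4):
    # One list object, visible to every thread started below.
    shared_log = []

    def record(worker_id):
        # Each thread appends to the same list and notes its process ID.
        shared_log.append((worker_id, os.getpid()))

    threads = [threading.Thread(target=record, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(shared_log)


if __name__ == "__main__":
    for worker_id, pid in run_threads():
        print(f"thread {worker_id} ran in process {pid}")
```

All entries carry the PID of the one Python process that started the threads.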
When you spawn a process via the multiprocessing library, you are spawning a new process that contains a new Python interpreter (a new runtime) that runs the designated code. If you want to share memory you have to use multiprocessing.shared_memory:
multiprocessing.shared_memory -- Python Docs
This module provides a class, SharedMemory, for the allocation and management of shared memory to be accessed by one or more processes on a multicore or symmetric multiprocessor (SMP) machine. To assist with the life-cycle management of shared memory especially across distinct processes, a BaseManager subclass, SharedMemoryManager, is also provided in the multiprocessing.managers module.
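A small sketch of how `SharedMemory` might be used (Python 3.8+). To keep it short, the second handle is opened in the same process; the assumption is that a worker process would attach with the same `SharedMemory(name=...)` call. The function name `shared_memory_roundtrip` is hypothetical:

```python
from multiprocessing import shared_memory


def shared_memory_roundtrip(payload=b"hello"):
    # Create a named block; other processes attach to it by this name.
    owner = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        owner.buf[:len(payload)] = payload
        # Attach a second handle by name. It lives in the same process here
        # for brevity; a worker process would make the same call.
        reader = shared_memory.SharedMemory(name=owner.name)
        try:
            # Copy the bytes out so no view of the buffer outlives close().
            return bytes(reader.buf[:len(payload)])
        finally:
            reader.close()
    finally:
        owner.close()
        owner.unlink()  # free the block once nobody needs it


if __name__ == "__main__":
    print(shared_memory_roundtrip(b"runtime"))
```

Each handle is closed by whoever opened it, and the block is unlinked exactly once by its creator.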
Can we say that multiprocessing in python creates multiple runtimes and a python process is something we can directly relate to the runtime?
Yes. Different GIL, different memory space, different runtime.
Every python thread with its own stack which works on the same GIL and memory space as the parent process can be called as having a separate runtime?
Depends what you mean by "stack". Same GIL, shared memory space, same runtime.
Doesn't matter how many threads and processes are running, it will all come under a single runtime?
It depends on whether you use multithreading or multiprocessing: threads share one runtime, while each process has its own.
Simply put, what is the definition of runtime in the context of Python?
The runtime environment is literally python.exe or /usr/bin/python. It's the Python executable that compiles your source code to bytecode and then executes that bytecode in its interpreter loop (the bytecode is run by the interpreter, not directly by the CPU). When you multithread, you have only one python running. When you multiprocess, you have multiple pythons running.
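One way to see this from inside a program is sketched below, using only the standard library (the helper name `child_interpreter_pid` is made up). It prints which executable is running the current code and then launches a second copy of that executable, which reports a different process ID:

```python
import os
import subprocess
import sys


def child_interpreter_pid():
    # Launch a second copy of the same executable and ask it for its PID.
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getpid())"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    print("executable:    ", sys.executable)
    print("implementation:", sys.implementation.name)
    print("this process:  ", os.getpid())
    print("child process: ", child_interpreter_pid())
```

The child is a completely separate interpreter: it has its own memory and, on CPython, its own GIL.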
I hope that a core dev can come in and speak more to this in greater detail. For now the above is simply just a compilation of sources for you to start understanding/seeing the bigger picture.

Related

Is anyone using zeromq to coordinate multiple Python interpreters in the same process?

I love Python's global interpreter lock because it makes the underlying C code simple.
But it means that each Python interpreter main loop is restricted to one thread at a time.
This is bad because recently the number of cores per processor chip has been doubling frequently.
One of the supposed advantages to zeromq is that it makes multi-threaded programming "easy" or easier.
Is it possible to launch multiple Python interpreters in the same process and have them communicate only using in-process zeromq with no other shared state? Has anyone tried it? Does it work well? Please comment and/or provide links.
I don't know of any way to create multiple instances of the Python interpreter within a single process, but I do have experience with splitting multiple instances across multiple processes and communicating with zmq.
I've been using multiprocessing to implement an island-model architecture for global optimization, with zmq for managing communication between the islands. Each island is its own process with its own Python interpreter, created and managed by the master archipelago process.
Using multiprocessing allows you to launch as many independent Python interpreters as you wish, but they all reside in their own processes with a separate memory space. I believe the OS scheduler takes care of assigning processes to cores and sharing CPU time. The separate memory space is the hardest part, because it means you have to explicitly communicate. To communicate between processes, the objects/data you wish to send must be serializable, because zmq sends byte-strings.
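To illustrate the serialization step without requiring zmq, here is a rough sketch that uses `pickle` and a standard-library `socket.socketpair()` as a stand-in for the messaging layer. The helper names and the length-prefix framing are assumptions made for this example, not part of zmq's API; with zmq you would hand the pickled bytes to the socket instead. Only unpickle data from sources you trust.

```python
import pickle
import socket


def send_object(sock, obj):
    # Serialize to bytes, then prefix a 4-byte length so the receiver
    # knows how much to read.
    data = pickle.dumps(obj)
    sock.sendall(len(data).to_bytes(4, "big") + data)


def recv_exact(sock, size):
    # Keep reading until exactly `size` bytes have arrived.
    chunks = []
    while size > 0:
        chunk = sock.recv(size)
        if not chunk:
            raise ConnectionError("socket closed before message was complete")
        chunks.append(chunk)
        size -= len(chunk)
    return b"".join(chunks)


def recv_object(sock):
    size = int.from_bytes(recv_exact(sock, 4), "big")
    return pickle.loads(recv_exact(sock, size))


def roundtrip(obj):
    # Both ends live in one process here; in practice each end would
    # belong to a different process.
    left, right = socket.socketpair()
    try:
        send_object(left, obj)
        return recv_object(right)
    finally:
        left.close()
        right.close()


if __name__ == "__main__":
    print(roundtrip({"island": 3, "best": [1.5, 2.5]}))
```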
The nice thing about zmq is that it's a piece of cake to scale across systems distributed over a network, and it's pretty lightweight. You can create just about any communication pattern you wish, using REP/REQ, PUB/SUB, or whatever.
But no, it's not as easy as just spinning up a few threads from the threading module.
Edit: Also, here's a Stack Overflow question similar to yours. Inside are some more relevant links which indicate that it may be possible to run multiple Python interpreters within a single process, but it doesn't look simple. Multiple independent embedded Python Interpreters on multiple operating system threads invoked from C/C++ program

Why is a thread slower than a subprocess? When should I use a subprocess in place of a thread and vice versa?

In my application, I have tried Python's threading and subprocess modules to open Firefox, and I noticed that subprocess is faster than threading. What could be the reason behind this?
When should each be used in place of the other?
Python (or rather CPython, the C-based implementation that is commonly used) has a Global Interpreter Lock (a.k.a. the GIL).
Some kind of locking is necessary to synchronize memory access when several threads are accessing the same memory, which is what happens inside a process. Memory is not shared between processes (unless you specifically allocate such memory), so no lock is needed there.
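The same need shows up at the application level. The GIL protects the interpreter's internals, but a compound update such as `counter += 1` is not guaranteed to be atomic, so threads that share data still need their own lock. A minimal sketch with `threading.Lock` (the function name is made up):

```python
import threading


def count_with_lock(thread_count=4, increments=10_000):
    counter = 0
    lock = threading.Lock()

    def work():
        nonlocal counter
        for _ in range(increments):
            # Only one thread at a time may run the read-modify-write.
            with lock:
                counter += 1

    threads = [threading.Thread(target=work) for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(count_with_lock())
```

With the lock in place the result always equals `thread_count * increments`.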
Because the lock is global, it prevents several threads in the same process from running Python code at the same time. When running multiple processes, the GIL does not interfere.
So, Python code does not scale on threads, you need processes for that.
Now, had your Python code mostly been calling C-APIs (NumPy/OpenGL/etc), there would be scaling since the GIL is usually released when native code is executing, so it's alright (and actually a good idea) to use Python to manage several threads that mostly execute native code.
(There are other Python interpreter implementations out there that do scale across threads, such as Jython and IronPython, but these aren't really mainstream yet, and they are usually a bit slower than CPython in single-threaded scenarios.)

Multiple independent embedded Python Interpreters on multiple operating system threads invoked from C/C++ program

Embedding Python interpreter in a C/C++ application is well documented. What is the best approach to run multiple python interpreter on multiple operating system threads (i.e. one interpreter on one operating system thread within the same process) which are invoked from the C/C++ application? Such applications may also have problems related to memory fragmentation and limitations of Py_Finalize().
One such approach can be the following:
Python thread and hence GIL disabled in pyconfig.h to keep it simple (#undef WITH_THREAD)
All mutable global variables of Python Interpreter source code moved to heap-allocated struct referenced via Thread Local Storage (Reference: Python on a Phone).
My questions are:
Is there any better approach?
Are there any tools which can automate conversion of global variables of Python Interpreter source code to heap-allocated struct referenced via TLS (Thread Local Storage)?
Similar topics are discussed here:
Multiple independent Python interpreters in a C/C++ program?
Multiple python interpreters within the same process
Lua Versus Python
It's not exactly an answer to your question, but you could use separate processes instead of threads, then the problems should vanish.
Pros:
No need to hack Python (and make sure the result works in all of the intended cases)
Probably less development effort overall
Easy upgrading to new python versions
Clearly defined interfaces between different processes, thus easier to get right and debug
Cons:
Maybe slightly more heavyweight, depending on your platform (processes are relatively lightweight on Linux)
If you use shared memory for IPC, your resulting application code shouldn't differ too much from what you'd get with threads.
Given that some people are arguing you should always use processes over threads, I'd at least consider it as an alternative if it fits your constraints in any way.

Will Python use all processors in thread mode?

While developing a Django app deployed on Apache mod_wsgi I found that in case of multithreading (Python threads; mod_wsgi processes=1 threads=8) Python won't use all available processors. With the multiprocessing approach (mod_wsgi processes=8 threads=1) all is fine and I can load my machine at full.
So the question: is this Python behavior normal? I doubt it because using 1 process with few threads is the default mod_wsgi approach.
The system is:
2xIntel Xeon 5XXX series (8 cores (16 with hyperthreading)) on FreeBSD 7.2 AMD64 and Python 2.6.4
Thanks all for answers.
We all found that this behavior is normal because of GIL. Here is a good explanation:
http://jessenoller.com/2009/02/01/python-threads-and-the-global-interpreter-lock/
or stackoverflow GIL discussion: What is a global interpreter lock (GIL)?.
Will Python use all processors in thread mode? No.
Python won't use all available processors; is this Python behavior normal? Yes, it's normal because of the GIL.
For a discussion see http://mail.python.org/pipermail/python-3000/2007-May/007414.html.
You may find that having a couple (or 4) of threads per core/process can still improve performance if there is some blocking: for example, while one thread waits for a response from the database, the other threads can keep serving connections that the process would otherwise block.
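A quick way to convince yourself that threads still help with blocking work is sketched below. `time.sleep` stands in for a blocking call such as a database query (an assumption made for the demo), and the function name is hypothetical. Because the GIL is released while a thread is blocked, the waits overlap instead of adding up:

```python
import threading
import time


def timed_blocking_calls(thread_count=4, delay=0.2):
    # time.sleep stands in for a blocking call such as a database query.
    threads = [
        threading.Thread(target=time.sleep, args=(delay,))
        for _ in range(thread_count)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = timed_blocking_calls()
    print(f"4 blocking calls of 0.2 s took {elapsed:.2f} s in total")
```

Run sequentially, the four waits would take about 0.8 seconds; with threads the total should stay close to a single wait.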
Will python use all processors in thread mode? No.
Is this normal? Yes, this is normal. Python makes no effort to locate all your cores.
"1 process with few threads is default mod_wsgi approach". But that's not optimal or even desirable. That's just a default. Don't read anything into it.
If you want to use all your computer's resources, make the OS handle it. Use processes.
The distinction between multi-processing and multi-threading is hard to measure for the most part. Using processes or threads barely matters. It's usually simpler to use processes, since there's trivial OS support for this.
Bottom Line
Use multiple processes, that allows the OS (and Apache) to make as much use as possible of the system.
Threads share a limited set of I/O resources for the Process they're part of, and web page serving is I/O bound. Processes have independent I/O resources and will more easily max out your processor.
There is still hope. The GIL is only an implementation artifact of CPython, the implementation that you download from python.org. Jython and IronPython are two other implementations of Python, and they have no GIL, so you may have better threading results with one of them.
Yes. Python is not really multi-threaded. Instead, there is a global lock and each thread gets to execute a few operations in turn. This makes it much more simple to write MT applications in Python since there can't be any problems with stale caches, etc.
So one Python process can only ever occupy a single CPU. To fully utilize a multi-core system, you must run several Python processes.
I don't know if it is still the case, but there is a global lock in the Python interpreter, which prevents the use of all processor resources from a single interpreter, even when using multi threading. IIRC, the global lock has to do with I/O.
It seems you are watching the result of this lock, so, personally, I would use multiple processes with a single thread.

Does running separate python processes avoid the GIL?

I'm curious about how the Global Interpreter Lock in Python actually works. If I have a C++ application launch four separate instances of a Python script, will they run in parallel on separate cores, or does the GIL go even deeper than just the single process that was launched and control all Python processes regardless of the process that spawned them?
The GIL only affects threads within a single process. The multiprocessing module is in fact an alternative to threading that lets Python programs use multiple cores &c. Your scenario will easily allow use of multiple cores, too.
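The scenario in the question can be approximated from Python itself. In this sketch the launcher is a Python script rather than a C++ application, and `CHILD_CODE` and `run_children` are names invented for the example. All children are started before any of them is waited on, so the operating system is free to schedule them on separate cores:

```python
import subprocess
import sys

# A small CPU-bound job that each child interpreter runs on its own.
CHILD_CODE = "import os; print(os.getpid(), sum(i * i for i in range(100_000)))"


def run_children(count=4):
    # Start every child before waiting on any of them, so they can overlap.
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", CHILD_CODE],
            stdout=subprocess.PIPE,
            text=True,
        )
        for _ in range(count)
    ]
    results = []
    for proc in procs:
        out, _ = proc.communicate()
        pid, total = out.split()
        results.append((int(pid), int(total)))
    return results


if __name__ == "__main__":
    for pid, total in run_children():
        print(f"process {pid} computed {total}")
```

Each child reports its own process ID, and none of them waits on another's GIL.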
As Alex Martelli points out, you can indeed avoid the GIL by running multiple processes. I just want to add that the GIL is a limitation of the implementation (CPython) and not of Python in general; it's possible to implement Python without this limitation. Stackless Python comes to mind.
