CPython is a multi-threaded application, and as such on Unix it uses (p)threads. Python extensions (written in C, say) often need to hold the GIL to make sure Python objects don't get corrupted in critical sections of the code. How about other types of data? Specifically, does holding the GIL in a Python extension guarantee that all other threads of CPython stop?
The reason for asking is that I am trying to port to FreeBSD a Python extension (which works on Linux and OSX) that embeds the Lisp compiler/system ECL using Boehm GC, and which crashes during the initialisation of the embedded Boehm GC. Backtraces suggest that another thread kicks in and causes havoc (pthread implementations on Linux are sufficiently different from FreeBSD's that trouble along these lines is to be expected, too). Is there another mutex in CPython that may be used to achieve the locking?
Specifically, does holding the GIL in a Python extension guarantee that all other threads of CPython stop?
The short answer is no - if other threads are executing code without holding the GIL (e.g. if they're running a C extension that releases the GIL), then they will keep running until they try to re-acquire the GIL (usually when they try to return data to the world of Python).
It's also possible that core parts of CPython (the core interpreter and/or built-in functions/modules) release the GIL in the same circumstances, and for the same reasons, that you would in an extension. I have no idea whether they actually do, though.
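For what it's worth, several built-ins do release the GIL around long-running calls; hashlib, for example, drops it while hashing large buffers. A minimal sketch that makes the effect visible on a multi-core machine:

    import hashlib
    import os
    import threading
    import time

    data = os.urandom(32 * 1024 * 1024)  # 32 MiB of junk to hash

    def hash_it():
        # hashlib releases the GIL while hashing large buffers, so this
        # C code keeps running even while another thread holds the GIL.
        hashlib.sha256(data).hexdigest()

    start = time.perf_counter()
    threads = [threading.Thread(target=hash_it) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # On a multi-core machine this finishes in roughly the time of one
    # hash, not four, because the hashing threads don't hold the GIL.
    print(f"elapsed: {time.perf_counter() - start:.2f}s")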
Related
I have some python code which is heavily dependent on greenlets. I can use either gevent or eventlet.
I have packaged some sections of the code in a C extension, but these calls do not yield to other greenlets. Is it possible to write my extension such that it yields control to other Python threads while it does not require the GIL?
You can use a normal PyObject_CallFunction() call to invoke eventlet.sleep() or greenlet.sleep() and thereby yield control to other green threads. It must be run with the GIL held, like any other Python code.
You cannot run Python code without the GIL (well, you can, but it will quickly go sideways and corrupt memory).
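At the Python level, the pattern looks like the minimal sketch below; a C extension would do the same thing by calling eventlet.sleep through PyObject_CallFunction, with the GIL held. The zero-second sleep is purely a cooperative yield point:

    import eventlet

    def long_task(n):
        total = 0
        for i in range(n):
            total += i * i            # stand-in for the real per-item work
            if i % 1000 == 0:
                eventlet.sleep(0)     # yield to any other runnable greenlets
        return total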
AFAIK, Python (using import thread) and C# don't do "real" multithreading, meaning all threads run on one CPU core.
But in C, using pthreads on Linux, you get real multithreading.
Is this true ?
Assuming it is true, is there any difference between them when you have only 1 CPU core (I have it in a VM)?
Python uses something called a Global Interpreter Lock, which means that although Python threads are real native threads, only one of them can execute Python bytecode at a time.
There is more documentation on the Python wiki: https://wiki.python.org/moin/GlobalInterpreterLock
There shouldn't be a real performance difference on single-core systems. On multi-core systems the difference will vary based on what you do (I/O is for the most part not affected by the GIL).
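For example, blocking calls release the GIL, so I/O-bound threads overlap even on a single core. A minimal sketch, using time.sleep as a stand-in for real I/O:

    import threading
    import time

    def fake_io():
        time.sleep(1)  # CPython releases the GIL during blocking calls

    start = time.perf_counter()
    threads = [threading.Thread(target=fake_io) for _ in range(10)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"elapsed: {time.perf_counter() - start:.2f}s")  # ~1s, not ~10s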
I'm not aware of how C# works internally, but for CPython (the "official" Python interpreter) it is true: threads are not truly parallel because of the GIL.
Other implementations of the Python interpreter do not suffer from this problem, just as C's pthreads library doesn't.
However, if you only have one CPU you won't notice any difference.
As a side note: if you need real parallelism in CPython you can use the multiprocessing module, which uses processes instead of threads.
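A minimal sketch of that approach; the same work run in threads would be serialized by the GIL:

    from multiprocessing import Pool

    def cpu_bound(n):
        # Pure-Python number crunching: the GIL serializes this under
        # threading, but separate processes run it in parallel.
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        with Pool(processes=4) as pool:
            print(pool.map(cpu_bound, [10**6] * 4))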
EDIT:
Also, the thread module is deprecated; you should consider using threading instead.
I have some functions that interactively load Python modules using __import__.
I recently stumbled upon an article about an "import lock" in Python, that is, a lock specifically for imports (not just the GIL). But the article was old, so maybe that's no longer true.
This makes me wonder about the practice of importing in a thread.
Are import / __import__ thread-safe?
Can they create deadlocks?
Can they cause performance issues in a threaded application?
EDIT 12 Sept 2012
Thanks for the great reply Soravux.
So imports are thread-safe, and I'm not worrying about deadlocks, since the functions that use __import__ in my code don't call each other.
Do you know if the lock is acquired even if the module has already been imported?
If that is the case, I should probably look in sys.modules to check if the module has already been imported before making a call to __import__.
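Something like this minimal sketch is what I have in mind (note that __import__ returns the top-level package for dotted names):

    import sys

    def cached_import(name):
        # Consult the module cache first, so the import machinery (and
        # its lock) is only involved the first time.
        try:
            return sys.modules[name]
        except KeyError:
            return __import__(name)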
Sure, this shouldn't make a lot of difference in CPython, since there is the GIL anyway.
However, it could make a lot of difference on other implementations like Jython or Stackless Python.
EDIT 19 Sept 2012
About Jython, here's what they say in the doc:
http://www.jython.org/jythonbook/en/1.0/Concurrency.html#module-import-lock
Python does, however, define a module import lock, which is
implemented by Jython. This lock is acquired whenever an import of any
name is made. This is true whether the import goes through the import
statement, the equivalent __import__ builtin, or related code. It’s
important to note that even if the corresponding module has already
been imported, the module import lock will still be acquired, if only
briefly.
So, it seems that it would make sense to check in sys.modules before making an import, to avoid acquiring the lock. What do you think?
Update: Since Python 3.3, import locks are per-module instead of global, and imp is deprecated in favor of importlib. More information on the changelog and this issue ticket.
The original answer below predates Python 3.3
Normal imports are thread-safe because they acquire an import lock prior to execution and release it once the import is done. If you add your own custom imports using the hooks available, be sure to add this locking scheme to them. Locking facilities in Python may be accessed via the imp module (imp.lock_held() / acquire_lock() / release_lock()). Edit: these are deprecated since Python 3.3; there is no need to handle the lock manually anymore.
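For code targeting those older versions, a custom import hook would bracket its critical section like this minimal sketch (imp was removed entirely in Python 3.12):

    import imp  # deprecated since Python 3.3; importlib locks per-module

    imp.acquire_lock()
    try:
        # custom import-hook work that must not race with other imports
        print("import lock held:", imp.lock_held())
    finally:
        imp.release_lock()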
Using this import lock won't create any deadlocks or dependency errors aside from the circular dependencies that are already known (module a imports module b, which imports module a). Edit: Python 3.3 changed to a per-module locking mechanism to prevent those deadlocks caused by circular imports.
There exist multiple ways to create new processes or threads, for example fork and clone (assuming a Linux environment). Each way yields different memory behaviors when creating the new process. By default, a fork copies most memory segments (Data (often COW), Stack, Code, Heap), effectively not sharing its content between the child and its parent. The result of a clone (often called a thread, this is what Python uses for threading) shares all memory segments with its parent except the stack. The import mechanism in Python uses the global namespace which is not placed on the stack, thus using a shared segment between its threads. This means that all memory modifications (except for the stack) performed by an import in a thread will be visible to all its other related threads and parent. If the imported module is Python-only, it is thread-safe by design. If an imported module uses non-Python libraries, make sure those are thread-safe, otherwise it will cause mayhem in your multithreaded Python code.
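A minimal sketch of that shared-namespace behaviour: an import performed in one thread is immediately visible to every other thread in the process.

    import sys
    import threading

    def worker():
        import json  # mutates the shared, process-wide module cache

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    print("json" in sys.modules)  # True: the import is visible everywhere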
By the way, threaded programs in Python suffer from the GIL, which won't allow much performance gain unless your program is I/O-bound or relies on C or external thread-safe libraries (since they should release the GIL before executing). Running the same imported Python function in two threads won't execute it concurrently because of this GIL. Note that this is only a limitation of CPython; other implementations of Python may behave differently.
To answer your edit: imported modules are all cached by Python. If the module is already loaded in the cache, it won't be run again and the import statement (or function) will return right away. You don't have to implement the cache lookup in sys.modules yourself; Python does that for you and, beyond a possibly brief grab of the import lock (see the Jython quote above) and the GIL for the sys.modules lookup, won't lock anything.
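You can see the cache at work: repeated imports hand back the very same module object without re-executing the module body.

    import sys

    mod_a = __import__("json")
    mod_b = __import__("json")  # cache hit: module code is not re-executed
    assert mod_a is mod_b is sys.modules["json"]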
To answer your second edit: I prefer maintaining simpler code to trying to optimize calls to the libraries I use (in this case, the standard library). The rationale is that the time required to perform something is usually far more important than the time required to import the module that does it. Furthermore, the time required to maintain this kind of code throughout the project is far higher than the time it takes to execute. It all boils down to: "programmer time is more valuable than CPU time".
I could not find an answer to this in the official documentation, but it appears that in some versions of CPython 3.x, __import__ calls are not thread-safe and can cause a deadlock. See: https://bugs.python.org/issue38884.
I have been compiling diagrams (pun intended) in the hope of understanding the different implementations of common programming languages. I understand that whether code is compiled or interpreted depends on the implementation of the language, and is not an aspect of the programming language itself.
I am interested in comparing Python interpretation with direct compilation (e.g. C++) and the virtual machine model (e.g. Java or C#).
In light of the two diagrams above, could you please help me develop a similar flowchart of how the .py file is converted to .pyc, uses the standard libraries (I gather they are called modules), and is then actually run? Many programmers on SO indicate that Python, as a scripting language, is not executed by the CPU but rather by the interpreter, but that sounds quite impossible because ultimately hardware must be doing the computation.
First off, this is an implementation detail. I am limiting my answer to CPython and PyPy because I am familiar with them. Answers for Jython, IronPython, and other implementations will differ - probably radically.
Python is closer to the "virtual machine model". Python code is, contrary to the statements of some too-loud-for-their-level-of-knowledge people and despite everyone (including me) conflating it in casual discussion, never interpreted. It is always compiled to bytecode (again, on CPython and PyPy) when it is loaded. If it was loaded because a module was imported and was loaded from a .py file, a .pyc file may be created to cache the compilation output. This step is not mandatory; you can turn it off via various means, and program execution is not affected the tiniest bit (except that the next process to load the module has to do it again). However, the compilation to bytecode is not avoidable; the bytecode is generated in memory if it is not loaded from disk.
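You can inspect that compiled form yourself with the standard dis module:

    import dis

    def add(a, b):
        return a + b

    # The function object holds a pointer to its compiled bytecode;
    # dis renders it in a human-readable form.
    dis.dis(add)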
This bytecode (the exact details of which are an implementation detail and differ between versions) is then executed, at module level, which entails building function objects, class objects, and the like. These objects simply reuse (hold a pointer to) the bytecode which is already in memory. This is unlike C++ and Java, where code and classes are set in stone during/after compilation. During execution, import statements may be encountered. I lack the space, time and understanding to describe the import machinery, but the short story is:
If it was already imported once, you get that module object (another runtime construct for a thing static languages only have at compile time). A couple of builtin modules (well, all of them in PyPy, for reasons beyond the scope of this question) are already imported before any Python code runs, simply because they are so tightly integrated with the core of the interpreter and so fundamental. sys is such a module. Some Python code may also run beforehand, especially when you start the interactive interpreter (look up site.py).
Otherwise, the module is located. The rules for this are not our concern. In the end, these rules arrive at either a Python file or a dynamically-linked piece of machine code (.DLL on Windows, though Python modules specifically use the extension .pyd, but that's just a name; on Unix the equivalent .so is used).
The module is first loaded into memory (loaded dynamically, or parsed and compiled to bytecode).
Then, the module is initialized. Extension modules have a special function for that which is called. Python modules are simply run, from top to bottom. In well-behaved modules this just sets up global data, defines functions and classes, and imports dependencies. Of course, anything else can also happen. The resulting module object is cached (remember step one) and returned.
All of this applies to standard library modules as well as third party modules. That's also why you can get a confusing error message if you call a script of yours just like a standard library module which you import in that script (it imports itself, albeit without crashing due to caching - one of many things I glossed over).
How the bytecode is executed (the last part of your question) differs. CPython simply interprets it, but as you correctly note, that doesn't mean it magically doesn't use the CPU. Instead, there is a large ugly loop which detects what bytecode instruction shall be executed next, and then jumps to some native code which carries out the semantics of that instruction. PyPy is more interesting; it starts off interpreting but records some stats along the way. When it decides it's worth doing so, it starts recording what the interpreter does in detail, and generates some highly optimized native code. The interpreter is still used for other parts of the Python code. Note that it's the same with many JVMs and possibly .NET, but the diagram you cite glosses over that.
For the reference implementation of Python:
(.py) -> python (checks for .pyc) -> (.pyc) -> python (execution dynamically loads modules)
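You can trigger the .py -> .pyc step explicitly with py_compile; this is the same compilation CPython performs implicitly on import (a minimal sketch; the file name is made up):

    import pathlib
    import py_compile

    pathlib.Path("example.py").write_text("print('hello')\n")
    # Writes the cached bytecode, e.g. __pycache__/example.cpython-312.pyc,
    # and returns its path.
    print(py_compile.compile("example.py"))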
There are other implementations. Most notable are:
Jython, which compiles (.py) to (.class) and follows the Java pattern from there
PyPy, which employs a JIT as it compiles (.py); the chain from there can vary (PyPy can be run in CPython, Jython or .NET environments)
Python is technically a scripting language, but it is also compiled: Python source is taken from its source file and fed into the interpreter, which compiles the source to bytecode, either internally (and then throws it away) or externally (saving it as a .pyc).
Yes, Python is a single virtual machine that sits on top of the actual hardware, but Python bytecode is nothing more than a series of instructions for the PVM (Python virtual machine), much like assembler for the actual CPU.
I have a script in Python which uses a resource that cannot be used by more than a certain number of concurrently running scripts.
Classically, this would be solved by a named semaphore, but I cannot find one in the documentation of the multiprocessing or threading modules.
Am I missing something, or are named semaphores not implemented/exposed by Python? And, more importantly, if they are not, what is the best way to emulate one?
Thanks,
Boaz
PS. For reasons which are not so relevant to this question, I cannot aggregate the task to a continuously running process/daemon or work with spawned processes - both of which, it seems, would have worked with the Python API.
I suggest a third-party extension like these, ideally the posix_ipc one -- see in particular the semaphore section in its docs.
These modules are mostly about exposing "System V IPC" (including semaphores) in a unixy way, but at least one of them (posix_ipc specifically) is claimed to work with Cygwin on Windows (I haven't verified that claim). There are some documented limitations on FreeBSD 7.2 and Mac OSX 10.5, so take care if those platforms are important to you.
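A minimal sketch of the intended use, assuming posix_ipc is installed (the semaphore name and count are made up):

    import posix_ipc

    # Create (or attach to) a named semaphore allowing 3 concurrent holders.
    sem = posix_ipc.Semaphore("/my_resource", posix_ipc.O_CREAT,
                              initial_value=3)
    sem.acquire()
    try:
        pass  # use the limited resource here
    finally:
        sem.release()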
You can emulate them by using the filesystem instead of a kernel path (named semaphores are implemented this way on some platforms anyhow). You'll have to implement sem_[open|wait|post|unlink] yourself, but it ought to be relatively trivial to do so. Your synchronization overhead might be significant (depending on how often you have to fiddle with the semaphore in your app), so you might want to initialize a ramdisk when you launch your process in which to store named semaphores.
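A minimal sketch of that idea, with each token held as a file in a shared directory. The check-then-create step is not atomic, so this can briefly overshoot and has to back off and retry; it is an illustration, not production code:

    import os
    import time
    import uuid

    class FileSemaphore:
        def __init__(self, path, count):
            self.path = path    # shared directory, ideally on a ramdisk
            self.count = count  # maximum number of concurrent holders
            os.makedirs(path, exist_ok=True)

        def acquire(self, poll=0.05):
            token = os.path.join(self.path, uuid.uuid4().hex)
            while True:
                if len(os.listdir(self.path)) < self.count:
                    open(token, "x").close()  # claim a slot
                    # Re-check: another process may have raced us in.
                    if len(os.listdir(self.path)) <= self.count:
                        self.token = token
                        return
                    os.unlink(token)  # overshot; release and retry
                time.sleep(poll)

        def release(self):
            os.unlink(self.token)

Usage would be e.g. sem = FileSemaphore("/dev/shm/my_resource", 3), then sem.acquire() / sem.release() around the critical section.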
Alternatively if you're not comfortable rolling your own, you could probably wrap boost::interprocess::named_semaphore (docs here) in a simple extension module.