I have some functions that interactively load Python modules using __import__.
I recently stumbled upon an article about an "import lock" in Python, that is, a lock specifically for imports (not just the GIL). The article was old, though, so maybe that's no longer true.
This makes me wonder about the practice of importing in a thread.
Are import/__import__ thread-safe?
Can they create deadlocks?
Can they cause performance issues in a threaded application?
EDIT 12 Sept 2012
Thanks for the great reply, Soravux.
So imports are thread-safe, and I'm not worried about deadlocks, since the functions that use __import__ in my code don't call each other.
Do you know if the lock is acquired even if the module has already been imported?
If that is the case, I should probably look in sys.modules to check if the module has already been imported before making a call to __import__.
Sure, this shouldn't make much difference in CPython, since there is the GIL anyway.
However, it could make a big difference on other implementations like Jython or Stackless Python.
EDIT 19 Sept 2012
About Jython, here's what they say in the doc:
http://www.jython.org/jythonbook/en/1.0/Concurrency.html#module-import-lock
Python does, however, define a module import lock, which is
implemented by Jython. This lock is acquired whenever an import of any
name is made. This is true whether the import goes through the import
statement, the equivalent __import__ builtin, or related code. It’s
important to note that even if the corresponding module has already
been imported, the module import lock will still be acquired, if only
briefly.
So, it seems that it would make sense to check in sys.modules before making an import, to avoid acquiring the lock. What do you think?
Update: Since Python 3.3, import locks are per-module instead of global, and imp is deprecated in favor of importlib. More information on the changelog and this issue ticket.
The original answer below predates Python 3.3
Normal imports are thread-safe because they acquire the import lock prior to execution and release it once the import is done. If you add your own custom imports using the hooks available, be sure to add this locking scheme to them. The locking facilities in Python may be accessed through the imp module (imp.lock_held(), imp.acquire_lock(), imp.release_lock()). Edit: these are deprecated since Python 3.3; there is no need to handle the lock manually anymore.
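For illustration, a custom loader on a pre-3.3 interpreter could guard its work the same way the built-in machinery does (a minimal sketch; since imp is deprecated on modern Pythons, treat this as historical):

    import imp

    def load_plugin(name):
        # Mirror what the import statement does: hold the global import
        # lock while the module is being created and initialized.
        imp.acquire_lock()
        try:
            return __import__(name)
        finally:
            imp.release_lock()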
Using this import lock won't create any deadlocks or dependency errors aside from the circular dependencies that are already known (module a imports module b, which imports module a). Edit: Python 3.3 switched to a per-module locking mechanism to prevent the deadlocks caused by circular imports.
There exist multiple ways to create new processes or threads, for example fork and clone (assuming a Linux environment). Each way yields different memory behaviors when creating the new process. By default, a fork copies most memory segments (Data (often COW), Stack, Code, Heap), effectively not sharing its content between the child and its parent. The result of a clone (often called a thread, this is what Python uses for threading) shares all memory segments with its parent except the stack. The import mechanism in Python uses the global namespace which is not placed on the stack, thus using a shared segment between its threads. This means that all memory modifications (except for the stack) performed by an import in a thread will be visible to all its other related threads and parent. If the imported module is Python-only, it is thread-safe by design. If an imported module uses non-Python libraries, make sure those are thread-safe, otherwise it will cause mayhem in your multithreaded Python code.
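A quick way to see that import state is shared between threads (a minimal sketch; json is just an arbitrary stdlib module):

    import sys
    import threading

    def worker():
        __import__("json")  # the import happens in the worker thread

    t = threading.Thread(target=worker)
    t.start()
    t.join()

    # The main thread sees the result: sys.modules lives in shared memory.
    print("json" in sys.modules)  # True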
By the way, threaded programs in Python suffer from the GIL, which won't allow much performance gain unless your program is I/O-bound or relies on C or other external thread-safe libraries (since they should release the GIL before executing). Running the same imported Python function in two threads won't execute concurrently because of this GIL. Note that this is only a limitation of CPython; other implementations of Python may behave differently.
To answer your edit: imported modules are all cached by Python. If a module is already in the cache, it won't be run again, and the import statement (or function) will return right away. You don't have to implement the cache lookup in sys.modules yourself; Python does that for you and won't acquire the imp lock for anything, aside from the GIL for the sys.modules lookup.
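In other words, the check you were considering is roughly what the import machinery already performs (a simplified sketch, not the actual implementation):

    import sys

    def cached_import(name):
        # What __import__ effectively does first: a cheap dict lookup.
        if name in sys.modules:
            return sys.modules[name]
        return __import__(name)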
To answer your second edit: I prefer having to maintain a simpler code than trying to optimize calls to the libraries I use (in this case, the standard library). The rationale is that the time required to perform something is usually way more important than the time required to import the module that does it. Furthermore, the time required to maintain this kind of code throughout the project is way higher than the time it will take to execute. It all boils down to: "programmer time is more valuable than CPU time".
I could not find an answer to this in the official documentation, but it appears that in some versions of CPython 3.x, __import__ calls are not thread-safe and can cause a deadlock. See: https://bugs.python.org/issue38884.
Related
I am starting to read up on possible ways to parallelise Python code.
DISCLAIMER. This is NOT a question about Multiprocessing vs Multithreading.
At this link https://ipyparallel.readthedocs.io/en/latest/demos.html one finds references to several concurrency packages for Python that avoid the GIL: https://scipy.github.io/old-wiki/pages/ParallelProgramming
- IPython1
- mpi4py
- parallel python
- Numba
There is also a multiprocessing package:
https://docs.python.org/3/library/multiprocessing.html
And another one called processing:
https://pypi.org/project/processing/
First of all, the difference between the latter two is not at all clear to me: what is the difference between using the multiprocessing module and the processing module?
In general, I fail to understand the differences between all of these packages -- differences which must exist, given that some developers made the effort to create an mpi4py version for the MPI used in C++. I guess this is not just about the dualism between "threading" and "multiprocessing" approaches, where in one case the memory is shared while in the other each process gets its own memory and interpreter; something more must differ between all of these packages.
Thanks to all of those who will dedicate time to answer this!
The difference is that the last version of processing was released in April 2008, while multiprocessing was added in Python 2.6, released in October 2008.
processing was a library that was used before multiprocessing was distributed with Python.
As far as the specific differences between the other modules designed for multiprocessing go: the scipy page you linked says that "This is a subject for graduate courses in computer science, and I'm not going to address it here....there are some python tools you can use to implement the things you learn in that graduate course." While they admit that may be a bit of an exaggeration, some independent study of multiprocessing in general will be required to discern the differences between these libraries. You should probably just stick to the built-in multiprocessing module for your initial experiments while you learn how it works. Once you're more comfortable with multiprocessing, you might want to check out the pathos framework.
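A minimal starting point with the built-in module (a toy sketch; the function and pool size are arbitrary):

    from multiprocessing import Pool

    def square(x):
        return x * x

    if __name__ == "__main__":  # required on platforms that spawn processes
        with Pool(processes=4) as pool:
            print(pool.map(square, range(10)))  # [0, 1, 4, ..., 81]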
But here are the basics for the packages you mention:
Numba adds decorators that automatically compile functions to make them run faster; it isn't really a multiprocessing tool so much as a JIT-compiling tool (see the sketch after this list).
Parallel Python overcomes the GIL to utilize multiple cores or multiple computers; it's designed to be easy to use and to handle all the complex stuff behind the scenes.
MPI for Python is like Parallel Python, with less emphasis on simplicity.
IPython is a toolkit with many features, including a shell and a Jupyter kernel; it's also not really a multiprocessing tool.
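As a taste of what Numba does (a minimal sketch, assuming numba and numpy are installed; the function is deliberately trivial):

    import numpy as np
    from numba import njit

    @njit  # compiled to machine code on first call
    def total(values):
        s = 0.0
        for v in values:
            s += v
        return s

    print(total(np.arange(1e6)))  # runs at roughly C speed after compilation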
Keep in mind that plenty of libraries/modules do the same thing, there doesn't need to be a reason more than one exists. Use whatever works for you.
CPython is a multi-threaded application, and as such on Unix it uses (p)threads. Python extensions (written in C, say) often need to hold GIL to make sure Python objects don't get corrupted in critical sections of the code. How about other types of data? Specifically, does holding GIL in a Python extension guarantee that all other threads of CPython stop?
The reason for asking is that I am trying to port to FreeBSD a Python extension (which works on Linux and OSX) that embeds a Lisp compiler/system ECL using Boehm GC, and which crashes during the initialisation of the embedded Boehm GC. Backtraces suggest that another thread kicks in and causes havoc (pthread implementation on Linux are sufficiently different from FreeBSD's to expect trouble along these lines, too). Is there another mutex in CPython that may be used to achieve the locking?
Specifically, does holding GIL in a Python extension guarantee that all other threads of CPython stop?
The short answer is no - if other threads are executing code without holding the GIL (e.g. if they're running a C extension that releases the GIL), then they will keep running until they try to re-acquire the GIL (usually when they try to return (data) to the world of Python).
It's also possible that core parts of CPython (the core interpreter and/or built-in functions/packages) release the GIL in circumstances similar to those in which you would in an extension. I have no idea whether they actually do, though.
I have been compiling diagrams (pun intended) in the hope of understanding the different implementations of common programming languages. I understand that whether code is compiled or interpreted depends on the implementation, and is not an aspect of the programming language itself.
I am interested in comparing Python interpretation with direct compilation (e.g. C++)
and the virtual machine model (e.g. Java or C#).
In light of the two diagrams above, could you please help me develop a similar flowchart of how the .py file is converted to .pyc, uses the standard libraries (I gather they are called modules), and is then actually run? Many programmers on SO indicate that Python, as a scripting language, is not executed by the CPU but rather by the interpreter, but that sounds quite impossible, because ultimately the hardware must be doing the computation.
First off, this is an implementation detail. I am limiting my answer to CPython and PyPy because I am familiar with them. Answers for Jython, IronPython, and other implementations will differ - probably radically.
Python is closer to the "virtual machine model". Python code is, contrary to the statements of some too-loud-for-their-level-of-knowledge people and despite everyone (including me) conflating it in casual discussion, never interpreted directly from source. It is always compiled to bytecode (again, on CPython and PyPy) when it is loaded. If it was loaded because a module was imported from a .py file, a .pyc file may be created to cache the compilation output. This step is not mandatory; you can turn it off via various means, and program execution is not affected the tiniest bit (except that the next process to load the module has to compile it again). However, the compilation to bytecode is not avoidable; the bytecode is generated in memory if it is not loaded from disk.
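You can watch the compilation step happen yourself (a small sketch using the standard compile builtin and the dis module):

    import dis

    code = compile("x = 1 + 2", "<example>", "exec")  # source -> code object
    dis.dis(code)  # prints the bytecode instructions CPython will execute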
This bytecode (the exact details of which are an implementation detail and differ between versions) is then executed, at module level, which entails building function objects, class objects, and the like. These objects simply reuse (hold a pointer to) the bytecode which is already in memory. This is unlike C++ and Java, where code and classes are set in stone during/after compilation. During execution, import statements may be encountered. I lack the space, time and understanding to describe the import machinery, but the short story is:
If it was already imported once, you get that module object (another runtime construct for a thing static languages only have at compile time). A couple of builtin modules (well, all of them in PyPy, for reasons beyond the scope of this question) are already imported before any Python code runs, simply because they are so tightly integrated with the core of the interpreter and so fundamental. sys is such a module. Some Python code may also run beforehand, especially when you start the interactive interpreter (look up site.py).
Otherwise, the module is located. The rules for this are not our concern. In the end, these rules arrive at either a Python file or a dynamically-linked piece of machine code (a .DLL on Windows, though Python extension modules specifically use the extension .pyd, but that's just a name; on Unix the equivalent .so is used).
The module is first loaded into memory (loaded dynamically, or parsed and compiled to bytecode).
Then, the module is initialized. Extension modules have a special function for that which is called. Python modules are simply run, from top to bottom. In well-behaved modules this just sets up global data, defines functions and classes, and imports dependencies. Of course, anything else can also happen. The resulting module object is cached (remember step one) and returned.
All of this applies to standard library modules as well as third party modules. That's also why you can get a confusing error message if you call a script of yours just like a standard library module which you import in that script (it imports itself, albeit without crashing due to caching - one of many things I glossed over).
How the bytecode is executed (the last part of your question) differs. CPython simply interprets it, but as you correctly note, that doesn't mean it magically doesn't use the CPU. Instead, there is a large ugly loop which detects what bytecode instruction shall be executed next, and then jumps to some native code which carries out the semantics of that instruction. PyPy is more interesting; it starts off interpreting but records some stats along the way. When it decides it's worth doing so, it starts recording what the interpreter does in detail, and generates some highly optimized native code. The interpreter is still used for other parts of the Python code. Note that it's the same with many JVMs and possibly .NET, but the diagram you cite glosses over that.
For the reference implementation of python:
(.py) -> python (checks for .pyc) -> (.pyc) -> python (execution dynamically loads modules)
There are other implementations. Most notable are:
jython, which compiles (.py) to (.class) and follows the Java pattern from there
pypy, which employs a JIT as it compiles (.py); the chain from there can vary (pypy could be run in cpython, jython or .net environments)
Python is technically a scripting language, but it is also compiled: the source is taken from its source file and fed into the interpreter, which often compiles it to bytecode, either internally (and then throws it away) or externally (saving it as a .pyc file).
Yes, Python is a single virtual machine that sits on top of the actual hardware, but all Python bytecode is, is a series of instructions for the PVM (Python virtual machine), much like assembler for the actual CPU.
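You can trigger the "external" compilation step explicitly (a small sketch; myscript.py is a placeholder for any source file on disk):

    import py_compile

    # Writes the cached bytecode (e.g. __pycache__/myscript.cpython-312.pyc
    # on modern CPython) without actually running the script.
    py_compile.compile("myscript.py")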
I have a script in Python which uses a resource that cannot be used by more than a certain number of concurrently running scripts.
Classically, this would be solved with named semaphores, but I cannot find those in the documentation of the multiprocessing or threading modules.
Am I missing something, or are named semaphores not implemented/exposed by Python? And, more importantly, if the answer is no, what is the best way to emulate one?
Thanks,
Boaz
PS. For reasons which are not so relevant to this question, I cannot aggregate the task into a continuously running process/daemon or work with spawned processes - both of which, it seems, would have worked with the Python API.
I suggest a third party extension like these, ideally the posix_ipc one -- see in particular the semaphore section in its docs.
These modules are mostly about exposing "System V IPC" (including semaphores) in a unixy way, but at least one of them (posix_ipc specifically) is claimed to work with Cygwin on Windows (I haven't verified that claim). There are some documented limitations on FreeBSD 7.2 and Mac OSX 10.5, so take care if those platforms are important to you.
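With posix_ipc, limiting how many script instances use the resource at once could look like this (a sketch, assuming the package is installed; the semaphore name and limit are arbitrary):

    import posix_ipc

    # Created on first use; later script invocations attach to the same one.
    sem = posix_ipc.Semaphore("/my_resource", flags=posix_ipc.O_CREAT,
                              initial_value=3)  # at most 3 concurrent holders
    sem.acquire()
    try:
        pass  # use the limited resource here
    finally:
        sem.release()
        sem.close()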
You can emulate them by using the filesystem instead of the kernel (named semaphores are implemented this way on some platforms anyhow). You'll have to implement sem_[open|wait|post|unlink] yourself, but it ought to be relatively trivial to do so. Your synchronization overhead might be significant (depending on how often you have to fiddle with the semaphore in your app), so you might want to initialize a ramdisk when you launch your process in which to store the named semaphores.
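A crude sketch of that idea, built on O_EXCL lock files (the directory and naming scheme are arbitrary, and note that a crashing process leaves its slot file behind, so real code would need stale-lock cleanup):

    import errno
    import os
    import time

    class FileSemaphore:
        """A counting 'named semaphore' backed by N exclusive lock files."""

        def __init__(self, name, count, directory="/tmp"):
            self.paths = [os.path.join(directory, "%s.%d.lock" % (name, i))
                          for i in range(count)]
            self.held = None

        def acquire(self, poll=0.1):
            while True:
                for path in self.paths:
                    try:
                        # O_EXCL makes creation atomic: only one process
                        # can own each slot file at a time.
                        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
                        os.write(fd, str(os.getpid()).encode())
                        os.close(fd)
                        self.held = path
                        return
                    except OSError as e:
                        if e.errno != errno.EEXIST:
                            raise
                time.sleep(poll)  # all slots taken; retry shortly

        def release(self):
            if self.held is not None:
                os.unlink(self.held)
                self.held = None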
Alternatively if you're not comfortable rolling your own, you could probably wrap boost::interprocess::named_semaphore (docs here) in a simple extension module.
I'm working with PyDev + Jython. Great IDE, but quite slow when I try to run a Jython program.
This is probably due to library load time.
What can I do to speed it up?
Thanks ,
yaniv
Jython startup time is slow ... there's a lot to boot up!
Every time you run a Jython script from scratch, it will incur the same Jython startup cost.
Hence, the reason Jython, Java, and Python are not great for CGI invocations. Hence, the reason for mod_python in Apache.
The key is to start up Jython once and reuse it. But this is not always possible, especially during development, because your modules are always changing and Jython does not recognize these changes automatically.
Jython needs a way to know which modules have changed for automatic reloads. This is not built into Jython, and you'll have to rely on some third party library to help with this. The concept is to remove from sys.modules the modules which have changed. A simple solution is to just clear all the modules from sys.modules, which will cause all modules to be reloaded. This is obviously not the most efficient solution.
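The brute-force version of that is short (a sketch for the development loop only; it throws away every cached non-builtin module so the next import re-executes it):

    import sys

    for name in list(sys.modules):
        if name not in sys.builtin_module_names:
            del sys.modules[name]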
Another tip is to only import modules at the time your code 'really' needs them. If you import every module at the top of your modules, that will increase your module import cost. So, refactor imports into the methods/functions where they are needed and where it 'makes sense'. Of course, if your method/function is computation-heavy and is used frequently, it does not make sense to import modules within that method/function.
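For example (a toy sketch; the function and module are arbitrary):

    def render_report(data):
        # Imported only when a report is actually generated, so runs of
        # the script that never reach this code skip the import cost.
        import json
        return json.dumps(data, indent=2)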
Hopefully, that helps you out!
If you have a machine with more than one processor, you could try starting Eclipse/PyDev with the options -vmargs -XX:+UseParallelGC. You could also try different JVMs to see if any of them gives better performance.