Python: Multi-platform multiprocessing with Python 2.4

I have to write a Python application that must run under Python 2.4 on Unix and Python 2.7 on Windows.
The application must run parallel tasks that synchronize and exchange messages with each other.
What would be the simplest, most reliable, and most lightweight way to do that?
I found a library that uses os.fork(), but unfortunately os.fork() is not available on Windows.
The multiprocessing package is not Python 2.4 compatible.
I think the only solution left is subprocess, but I was wondering if there is another way to solve my problem.

If you use threads it may be easier. There's a practical guide here.
The guide also has examples of using Queues with threads.
Using queues you can pass messages between the threads.
Threads are lighter-weight than creating a new process, they are cross-platform, and the threading and Queue modules are available in Python 2.4; in fact, the guide appears to have been written for 2.4.
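For illustration, here is a minimal sketch of two threads passing messages through Queues, using only modules available in Python 2.4 (the worker logic is a hypothetical placeholder):

    import threading
    import Queue  # renamed to "queue" in Python 3

    def worker(inbox, outbox):
        # Consume messages until we receive the shutdown sentinel.
        while True:
            msg = inbox.get()        # blocks until a message arrives
            if msg is None:          # None signals the worker to stop
                break
            outbox.put(msg.upper())  # placeholder "work": echo in uppercase

    inbox = Queue.Queue()
    outbox = Queue.Queue()
    t = threading.Thread(target=worker, args=(inbox, outbox))
    t.start()

    inbox.put("hello")
    print outbox.get()  # prints HELLO
    inbox.put(None)     # ask the worker to exit
    t.join()

Queue.Queue handles the locking internally, so no explicit synchronization is needed to pass messages safely.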

Related

Python concurrent logging to avoid disk bottleneck in code

I'm not super familiar with asyncio, but I was hoping there would be some easy way to use asyncio.Queue to push log messages to a Queue instead of writing them to the disk, and then have a worker on a thread wait for these Queue events and write them to disk when resources are available. This seems pretty widely necessary, as logging is a huge bottleneck in a lot of code but isn't always needed in real time. Are there any pre-existing packages for this or can anyone with more experience write a short example script to get me started? NOTE: This needs to interface with existing code, so making it all packaged in a class would probably be preferred.
It's handled in the standard library in recent Python versions (3.2+) via logging.handlers.QueueHandler and QueueListener. See this post for information, and the official documentation. This functionality predates asyncio, and so doesn't use it (and doesn't especially need to).
For Python 2.7, you can use the logutils package, which provides equivalent functionality.
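For reference, a minimal sketch using those standard-library classes, wrapped in a class as the question requested (the class and file names are hypothetical):

    import logging
    import logging.handlers
    import queue

    class AsyncLogger(object):
        """Queues log records so disk writes happen on a background thread."""
        def __init__(self, name, filename):
            self._queue = queue.Queue(-1)  # unbounded queue
            self.logger = logging.getLogger(name)
            self.logger.setLevel(logging.INFO)
            # QueueHandler makes logging calls cheap: they just enqueue.
            self.logger.addHandler(logging.handlers.QueueHandler(self._queue))
            # QueueListener drains the queue on its own thread and writes.
            self._listener = logging.handlers.QueueListener(
                self._queue, logging.FileHandler(filename))
            self._listener.start()

        def stop(self):
            self._listener.stop()  # flushes any queued records

    log = AsyncLogger("app", "app.log")
    log.logger.info("returns immediately; the disk write happens off-thread")
    log.stop()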

How to do parallel computing in python 2.7 using MPI?

I have written a Python code to carry out genetic algorithm optimization, but it is too slow. I would like to know how to run it in parallel, making use of multiple CPUs.
For more clarity: my code calls another Python script, say, 100 times one after the other. I want to divide this between 4 CPUs, so that each CPU runs the external script 25 times, thereby increasing the speed.
It would be highly appreciated if someone could help me with this.
Thanks in advance!
There are several packages that provide parallel computing for python2. I am the author of a package called pathos, which provides parallel computing with several parallel backends and gives them a common API. pathos provides parallel pipes and maps for multi-process, multi-threaded, parallel over sockets, MPI-parallel, and also interactions with schedulers and over ssh. pathos relies on several packages, which you can pick from if you don't want all the different options.
pathos uses: pyina which in turn uses mpi4py. mpi4py provides bindings to MPI, but you can't run the code from python 'normally'… you need to run with whatever you use to run MPI. pyina enables you to run mpi4py from normal python, and to interact with schedulers. Plus, pyina uses dill, which can serialize most python objects, and thus you are much more able to send what you want across processes.
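As a rough illustration of the mpi4py layer underneath, here is a sketch that splits 100 runs of an external script across MPI ranks (the script name is a hypothetical placeholder); it would be launched with something like mpiexec -n 4 python driver.py:

    from mpi4py import MPI
    import subprocess

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()   # this process's id, 0..size-1
    size = comm.Get_size()   # total number of MPI processes

    # Round-robin split: rank r handles tasks r, r+size, r+2*size, ...
    for i in range(rank, 100, size):
        subprocess.call(["python", "external_code.py", str(i)])

    comm.Barrier()  # wait until every rank has finished its share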
pathos provides a fork of multiprocessing that also plays well with dill and pyina. Using both can enable you to do hierarchical parallel computing -- like launching an MPI-parallel job that then spawns multiprocess or multithreaded parallelism.
pathos also uses ppft, which is a fork of pp (Parallel Python), which provides parallel computing across sockets -- so that means you can connect a parallel map across several machines.
There are alternatives to pathos, such as IPython-parallel. However, the ability to use MPI is very new, and I don't know how capable it is yet. It may or may not leverage IPython-cluster-helper, which has been in development for a little while. Note that IPython doesn't use pp, it uses zmq instead, and IPython also provides connectivity to EC2 if you like cloud stuff.
Here are some relevant links:
pathos, pyina, dill, ppft: https://github.com/uqfoundation
IPython-cluster-helper: https://github.com/roryk/ipython-cluster-helper
IPython: http://ipython.org/ipython-doc/dev/parallel/parallel_mpi.html
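To make the pathos route concrete, here is a minimal sketch dividing 100 evaluations across 4 worker processes with its process pool (the evaluate function is a hypothetical stand-in for one run of the external code):

    from pathos.multiprocessing import ProcessingPool as Pool

    def evaluate(i):
        # placeholder for one run of the external python code
        return i * i

    if __name__ == '__main__':
        pool = Pool(4)                             # four worker processes
        results = pool.map(evaluate, range(100))   # ~25 tasks per worker
        print(results[:5])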

posix_ipc python package equivalent for Windows?

Are there inter-process communication primitives (semaphores, shared memory) in Python on Windows? posix_ipc works great on Linux; is there anything similar for Windows?
You can use most (if not all) of the Win32 IPC primitives once you install pywin32.
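For example, a minimal sketch of a named Windows semaphore via pywin32's win32event module (the semaphore name is a hypothetical example):

    import win32event

    # CreateSemaphore(securityAttributes, initialCount, maximumCount, name)
    sem = win32event.CreateSemaphore(None, 1, 1, "Global\\MyAppSemaphore")

    win32event.WaitForSingleObject(sem, win32event.INFINITE)  # acquire
    try:
        pass  # critical section, shared with other processes by name
    finally:
        win32event.ReleaseSemaphore(sem, 1)  # release one permit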
For semaphores on Windows I've created https://pypi.org/project/semaphore-win-ctypes/.
This provides low-level access to the Windows semaphore APIs from Python. It enables some interesting use cases, such as letting Python processes and non-Python processes acquire the same semaphore object.

Will Python use all processors in thread mode?

While developing a Django app deployed on Apache mod_wsgi, I found that with multithreading (Python threads; mod_wsgi processes=1 threads=8) Python won't use all available processors. With the multiprocessing approach (mod_wsgi processes=8 threads=1) everything is fine and I can fully load my machine.
So the question: is this Python behavior normal? I doubt it because using 1 process with few threads is the default mod_wsgi approach.
The system is:
2x Intel Xeon 5XXX series (8 cores, 16 with hyperthreading) on FreeBSD 7.2 AMD64, running Python 2.6.4
Thanks all for the answers.
We all found that this behavior is normal because of the GIL. Here is a good explanation:
http://jessenoller.com/2009/02/01/python-threads-and-the-global-interpreter-lock/
or the Stack Overflow GIL discussion: What is a global interpreter lock (GIL)?
Will Python use all processors in thread mode? No.
Python won't use all available processors; is this Python behavior normal? Yes, it's normal because of the GIL.
For a discussion see http://mail.python.org/pipermail/python-3000/2007-May/007414.html.
You may find that having a couple (or four) threads per core/process can still improve performance if there is some blocking; for example, waiting for a response from the database would otherwise cause that process to block other connections.
Will Python use all processors in thread mode? No.
Is this normal? Yes, this is normal. Python makes no effort to spread work across all your cores.
"1 process with few threads is default mod_wsgi approach". But that's not optimal or even desirable. That's just a default. Don't read anything into it.
If you want to use all your computer's resources, make the OS handle it. Use processes.
The distinction between multi-processing and multi-threading is hard to measure for the most part. Using processes or threads barely matters. It's usually simpler to use processes, since there's trivial OS support for this.
Bottom Line
Use multiple processes, that allows the OS (and Apache) to make as much use as possible of the system.
Threads share a limited set of I/O resources for the Process they're part of, and web page serving is I/O bound. Processes have independent I/O resources and will more easily max out your processor.
There is still hope. The GIL is only an implementation artifact of the C Python implementation that you download from python.org. Jython and IronPython are two other implementations of Python, and they have no GIL, so you may have better threading results with one of them.
Yes. Python is not really multi-threaded. Instead, there is a global lock, and each thread gets to execute a few operations in turn. This makes it much simpler to write multi-threaded applications in Python, since there can't be any problems with stale caches, etc.
So one Python process can only ever occupy a single CPU. To fully utilize a multi-core system, you must run several Python processes.
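A small sketch that makes this visible: the same CPU-bound loop run on 4 threads and then on 4 processes. On CPython the threaded version takes roughly as long as a serial run because of the GIL, while the process version scales with the number of cores (timings will vary by machine):

    import threading
    import multiprocessing
    import time

    def burn(n=10 ** 7):
        # Pure-Python CPU-bound work; never releases the GIL.
        while n:
            n -= 1

    def run(factory, k=4):
        workers = [factory(target=burn) for _ in range(k)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()

    if __name__ == '__main__':
        for name, factory in [("threads", threading.Thread),
                              ("processes", multiprocessing.Process)]:
            t0 = time.time()
            run(factory)
            print("%s: %.1fs" % (name, time.time() - t0))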
I don't know if it is still the case, but there is a global lock in the Python interpreter which prevents a single interpreter from using all processor resources, even with multithreading. IIRC, the global lock has to do with I/O.
It seems you are seeing the effect of this lock, so, personally, I would use multiple processes, each with a single thread.

How to make Ruby or Python web sites to use multiple cores?

Even though Python and Ruby have one kernel thread per interpreter thread, they have a global interpreter lock (GIL) that is used to protect potentially shared data structures, and this inhibits multi-processor execution. Even though the portions of those languages that are written in C or C++ can be free-threaded, that's not possible with pure interpreted code unless you use multiple processes. What's the best way to achieve this? Using FastCGI? Creating a cluster or a farm of virtualized servers? Using their Java equivalents, JRuby and Jython?
I'm not totally sure which problem you want to solve, but if you deploy your Python/Django application via an Apache prefork MPM using mod_python, Apache will start several worker processes to handle different requests.
If one request needs so many resources that you want it to use multiple cores, have a look at pyprocessing. But I don't think that would be wise.
The 'standard' way to do this with Rails is to run a "pack" of Mongrel instances (i.e., 4 copies of the Rails application) and then use Apache or nginx or some other piece of software to sit in front of them and act as a load balancer.
This is probably how it's done with other Ruby frameworks such as Merb etc., but I haven't used those personally.
The OS will take care of running each Mongrel on its own CPU.
If you install mod_rails, a.k.a. Phusion Passenger, it will start and stop multiple copies of the Rails process for you as well, so it will end up spreading the load across multiple CPUs/cores in a similar way.
Use an interface that runs each request in a separate interpreter, such as mod_wsgi for Python. This lets multi-threading be used without encountering the GIL.
EDIT: Apparently, mod_wsgi no longer supports multiple interpreters per process because idiots couldn't figure out how to properly implement extension modules. It still supports running requests in separate processes FastCGI-style, though, so that's apparently the current accepted solution.
In Python and Ruby, the only way to use multiple cores is to spawn new (heavyweight) processes.
The Java counterparts inherit the capabilities of the Java platform: you can simply use Java threads. That is, for example, one reason why Java application servers like GlassFish are sometimes (often) used for Ruby on Rails applications.
For Python, the PyProcessing project allows you to program with processes much like you would use threads. It is included in the standard library of the recently released 2.6 version as multiprocessing. The module has many features for establishing and controlling access to shared data structures (queues, pipes, etc.) and support for common idioms (i.e. managers and worker pools).
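A minimal sketch of that multiprocessing API (the worker function and inputs are hypothetical placeholders):

    from multiprocessing import Pool

    def render(page_id):
        # stand-in for the CPU-bound work of one request
        return "rendered page %d" % page_id

    if __name__ == '__main__':
        pool = Pool(processes=4)            # one worker per core
        print(pool.map(render, range(8)))   # fan the work out across processes
        pool.close()
        pool.join()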
