Does Arrayfire python support multi-GPU programming - python

I am trying to use arrayFire python (https://github.com/arrayfire/arrayfire-python) for multi-GPU programming.
However, when I try to interface it with the concurrent futures (https://docs.python.org/3/library/concurrent.futures.html) library, I run into synchronization issues.
Does anyone have inputs on how to use arrayfire-python to parallel process on multiple GPUs ?

ArrayFire allows Mutli-GPU programming but does not distribute the work load automatically. It is up to the user to decide which memory and functions run on which device.
ArrayFire as it stands now is NOT thread safe. Hence running anything on multiple threads can cause issues.
Disclosure: I am a developer for ArrayFire.

Related

Implications of using MPI with TensorFlow

I come from a sort of HPC background and I am just starting to learn about machine learning in general and TensorFlow in particular. I was initially surprised to find out that distributed TensorFlow is designed to communicate with TCP/IP by default though it makes sense in hindsight given what Google is and the kind of hardware it uses most commonly.
I am interested in experimenting with TensorFlow in a parallel way with MPI on a cluster. From my perspective, this should be advantageous because latency should be much lower due to MPI's use of Remote Direct Memory Access (RDMA) across machines without shared memory.
So my question is, why doesn't this approach seem to be more common given the increasing popularity of TensorFlow and machine learning ? Isn't latency a bottleneck ? Is there some typical problem that is solved, that makes this sort of solution impractical? Are there likely to be any meaningful differences between calling TensorFlow functions in a parallel way vs implementing MPI calls inside of the TensorFlow library ?
Thanks
It seems tensorflow already supports MPI, as stated at https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/mpi
MPI support for tensorflow was also discussed at https://arxiv.org/abs/1603.02339
Generally speaking, keep in mind MPI is best at sending/receiving messages, but not so great at sending notifications and acting upon events.
Last but not least, MPI support of multi-threaded applications (e.g. MPI_THREAD_MULTIPLE) has not always been production-ready among MPI implementation s.
These were two general statements and i honestly do not know if they are relevant for tensorflow.
According to the doc in Tensorflow git repo,actually tf utilizes gRPC library by detault, which is based on HTTP2 protocol, rather than TCP/IP protocol, and this paper should give you some insight, hope this information is useful.

How to do parallel computing in python 2.7 using MPI?

I have written a python code to carry out genetic algorithm optimization, but it is too slow. I would like to know how to run the same in parallel mode making use of multiple CPUs ?
For more clarity, another python code will be called by my code for, say 100 times one after the other, I wanted to divide this between 4 CPUs. So that 25 times the outside python code is solved by each CPU. Thereby increasing the speed.
Its highly appreciated if someone can help me with is ?
Thanks in advance!
There are several packages that provide parallel computing for python2. I am the author of a package called pathos, which provides parallel computing with several parallel backends and gives them a common API. pathos provides parallel pipes and maps for multi-process, multi-threaded, parallel over sockets, MPI-parallel, and also interactions with schedulers and over ssh. pathos relies on several packages, which you can pick from if you don't want all the different options.
pathos uses: pyina which in turn uses mpi4py. mpi4py provides bindings to MPI, but you can't run the code from python 'normally'… you need to run with whatever you use to run MPI. pyina enables you to run mpi4py from normal python, and to interact with schedulers. Plus, pyina uses dill, which can serialize most python objects, and thus you are much more able to send what you want across processes.
pathos provides a fork of multiprocessing that also plays well with dill and pyina. Using both can enable you to do hierarchical parallel competing -- like launching MPI parallel that then spawns multiprocess or multithreaded parallel.
pathos also uses ppft, which is a fork of pp (Parallel Python), which provides parallel computing across sockets -- so that means you can connect a parallel map across several machines.
There are alternatives to pathos, such as IPython-parallel. However, the ability to use MPI is very new, and I don't know how capable it is yet. It may or may not leverage IPython-cluster-helper, which has been in development for a little while. Note that IPython doesn't use pp, it uses zmq instead, and IPython also provides connectivity to EC2 if you like cloud stuff.
Here are some relevant links:
pathos, 'pyina, dill, ppft: https://github.com/uqfoundation
Ipython-cluster-helper: https://github.com/roryk/ipython-cluster-helper
IPython: http://ipython.org/ipython-doc/dev/parallel/parallel_mpi.html

Python threads for networking - threads don't run in parallel

I am trying to use two threads in a server program, one for listening for any communications from clients using the Twisted library, and the other for doing some other computations on the server. In my attempt to implement the threads, it seems that the python threading library doesn't support parallel threads as answered in this question. I was wondering if there is any other python library that addresses this problem? Or any other way to circumvent this limitation?
Thank you in advance.
Python's GIL (global interpreter lock) prevents two threads to simultaneously execute Python code. Fortunately that doesn't include I/O, so if your threads do significant amounts of networking, database, or filesystem, then usual threads do work correctly. They won't let you take advantage of multiple cores for computation, but will let other threads advance while any number of them are waiting for something to happen.
If your needs are more for computation than for I/O, then threads (as implemented on Python) won't help. Better use the multiprocessing module (standard since Python 2.6), which uses a 'thread-like' API to spawn multiple processes, each one with an independent Python interpreter, and therefore it's own GIL.

Will Python use all processors in thread mode?

While developing a Django app deployed on Apache mod_wsgi I found that in case of multithreading (Python threads; mod_wsgi processes=1 threads=8) Python won't use all available processors. With the multiprocessing approach (mod_wsgi processes=8 threads=1) all is fine and I can load my machine at full.
So the question: is this Python behavior normal? I doubt it because using 1 process with few threads is the default mod_wsgi approach.
The system is:
2xIntel Xeon 5XXX series (8 cores (16 with hyperthreading)) on FreeBSD 7.2 AMD64 and Python 2.6.4
Thanks all for answers.
We all found that this behavior is normal because of GIL. Here is a good explanation:
http://jessenoller.com/2009/02/01/python-threads-and-the-global-interpreter-lock/
or stackoverflow GIL discussion: What is a global interpreter lock (GIL)?.
Will Python use all processors in thread mode? No.
Python won't use all available processors; is this Python behavior normal? Yes, it's normal because of the GIL.
For a discussion see http://mail.python.org/pipermail/python-3000/2007-May/007414.html.
You may find that having a couple (or 4) of threads per core/process can still improve performance if there is some blocking, for example waiting for a response from the database would cause that process to block other connections otherwise.
Will python use all processors in thread mode? No.
It this normal? Yes, this is normal. Python makes no effort to locate all your cores.
"1 process with few threads is default mod_wsgi approach". But that's not optimal or even desirable. That's just a default. Don't read anything into it.
If you want to use all your computer's resources, make the OS handle it. Use processes.
The distinction between multi-processing and multi-threading is hard to measure for the most part. Using processes or threads barely matters. It's usually simpler to use processes, since there's trivial OS support for this.
Bottom Line
Use multiple processes, that allows the OS (and Apache) to make as much use as possible of the system.
Threads share a limited set of I/O resources for the Process they're part of, and web page serving is I/O bound. Processes have independent I/O resources and will more easily max out your processor.
There is still hope. The GIL is only an implementation artifact of the C Python implementation that you download from python.org. Jython and IronPython are two other implementations of Python, and they have no GIL, so you may have better threading results with one of them.
Yes. Python is not really multi-threaded. Instead, there is a global lock and each thread gets to execute a few operations in turn. This makes it much more simple to write MT applications in Python since there can't be any problems with stale caches, etc.
So one Python process can only ever occupy a single CPU. To fully utilize a multi-core system, you must run several Python processes.
I don't know if it is still the case, but there is a global lock in the Python interpreter, which prevents the use of all processor resources from a single interpreter, even when using multi threading. IIRC, the global lock has to do with I/O.
It seems you are watching the result of this lock, so, personally, I would use multiple processes with a single thread.

How to make Ruby or Python web sites to use multiple cores?

Even though Python and Ruby have one kernel thread per interpreter thread, they have a global interpreter lock (GIL) that is used to protect potentially shared data structures, so this inhibits multi-processor execution. Even though the portions in those languajes that are written in C or C++ can be free-threaded, that's not possible with pure interpreted code unless you use multiple processes. What's the best way to achieve this? Using FastCGI? Creating a cluster or a farm of virtualized servers? Using their Java equivalents, JRuby and Jython?
I'm not totally sure which problem you want so solve, but if you deploy your python/django application via an apache prefork MPM using mod_python apache will start several worker processes for handling different requests.
If one request needs so much resources, that you want to use multiple cores have a look at pyprocessing. But I don't think that would be wise.
The 'standard' way to do this with rails is to run a "pack" of Mongrel instances (ie: 4 copies of the rails application) and then use apache or nginx or some other piece of software to sit in front of them and act as a load balancer.
This is probably how it's done with other ruby frameworks such as merb etc, but I haven't used those personally.
The OS will take care of running each mongrel on it's own CPU.
If you install mod_rails aka phusion passenger it will start and stop multiple copies of the rails process for you as well, so it will end up spreading the load across multiple CPUs/cores in a similar way.
Use an interface that runs each response in a separate interpreter, such as mod_wsgi for Python. This lets multi-threading be used without encountering the GIL.
EDIT: Apparently, mod_wsgi no longer supports multiple interpreters per process because idiots couldn't figure out how to properly implement extension modules. It still supports running requests in separate processes FastCGI-style, though, so that's apparently the current accepted solution.
In Python and Ruby it is only possible to use multiple cores, is to spawn new (heavyweight) processes.
The Java counterparts inherit the possibilities of the Java platform. You could imply use Java threads. That is for example a reason why sometimes (often) Java Application Server like Glassfish are used for Ruby on Rails applications.
For Python, the PyProcessing project allows you to program with processes much like you would use threads. It is included in the standard library of the recently released 2.6 version as multiprocessing. The module has many features for establishing and controlling access to shared data structures (queues, pipes, etc.) and support for common idioms (i.e. managers and worker pools).

Categories