Python best practice for when a module is not always available - python

I have Python code that runs on CUDA. Now I need to support new deployment devices that cannot run CUDA because they don't have Nvidia GPUs. Since I have many cupy imports in the code, I am wondering what the best practice is for this situation.
For instance, I might have to import certain classes based on the availability of CUDA. This seems nasty. Is there a good programming pattern I can follow?
Concretely, I would end up doing something like this:
from my_custom_autoinit import is_cupy_available

if is_cupy_available:
    import my_module_that_uses_cupy
where my_custom_autoinit.py is:
try:
    import cupy as cp
    is_cupy_available = True
except ModuleNotFoundError:
    is_cupy_available = False
This comes with a nasty drawback: every time I want to use my_module_that_uses_cupy I need to check whether cupy is available.
I don't personally like this, and I guess somebody has come up with something better. Thank you

You could add a module called cupywrapper to your project, containing your try..except
cupywrapper.py
try:
    import cupy as cp
    is_cupy_available = True
except ModuleNotFoundError:
    import numpy as cp
    is_cupy_available = False
I'm assuming you can substitute cupy with numpy because, according to the website:
CuPy's interface is highly compatible with NumPy; in most cases it can be used as a drop-in replacement. All you need to do is just replace numpy with cupy in your Python code.
Then, in your code, you'd do:
import cupywrapper
cp = cupywrapper.cp
# Now cp is either cupy or numpy
x = [1, 2, 3]
y = [4, 5, 6]
z = cp.dot(x, y)
print(z)
print("cupy? ", cupywrapper.is_cupy_available)
On my computer, I don't have cupy installed and this falls back to numpy.dot, giving an output of
32
cupy? False
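If you need per-argument dispatch rather than one global switch, CuPy also ships a helper, cupy.get_array_module, which returns either cupy or numpy depending on where its arguments live. A minimal sketch building on cupywrapper above (the CPU-only fallback function is my own addition, not part of CuPy):
import cupywrapper

if cupywrapper.is_cupy_available:
    from cupy import get_array_module
else:
    def get_array_module(*args):
        # CPU-only fallback: everything is a numpy array anyway
        return cupywrapper.cp  # numpy in this case

def normalize(x):
    xp = get_array_module(x)  # numpy or cupy, depending on x
    return x / xp.linalg.norm(x)
This way the same function works on host arrays and device arrays without any explicit is_cupy_available checks at the call sites.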

Related

Python: what is the difference between a package and a compiler?

I was reading the wiki page for Numba, and it says Numba is a "compiler". But then later on, it says that to use Numba, you import it like a package. I later looked up how to use Numba, and indeed, you just pip install it.
So now I am confused. I thought Numba was a compiler? But it seems to be used just like any other package, like numpy or pandas? What's the difference?
A compiler is a program that inputs something in human-readable form (usually a program in a specified language) and outputs a functionally equivalent stream in another, more machine-digestible form. Just as with any other transformation, it's equally viable as a command-line invocation or a function call.
As long as it's wrapped properly in a package for general use, it's perfectly reasonable to deliver a compiler as a Python package.
Does that clear up the difficulty?
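For a concrete illustration of a compiler behind a function call, Python itself exposes one as a builtin: compile() turns source text into a code object, much as a command-line compiler turns a source file into an object file. (This is just an analogy for the point above, not related to Numba.)
# The builtin compile() is a compiler invoked as a plain function call.
code = compile("print(6 * 7)", "<string>", "exec")
exec(code)  # prints 42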
From what I have read in the Numba documentation, it's a package that you import into your project; you then use a Numba decorator to indicate the parts of your code that you would like compiled just-in-time (JIT) in order to optimize them, as in the following example:
from numba import jit
import random

@jit(nopython=True)
def monte_carlo_pi(nsamples):
    acc = 0
    for i in range(nsamples):
        x = random.random()
        y = random.random()
        if (x ** 2 + y ** 2) < 1.0:
            acc += 1
    return 4.0 * acc / nsamples
When monte_carlo_pi is first called, Numba compiles it on the fly in order to optimize it, so there isn't a separate compilation step that you run yourself.
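For example, you can observe the one-time compilation cost on the first call (the timing code below is illustrative; exact numbers will vary by machine):
import time

start = time.time()
monte_carlo_pi(1000000)  # first call: includes JIT compilation
print("first call:", time.time() - start)

start = time.time()
monte_carlo_pi(1000000)  # already compiled: much faster
print("second call:", time.time() - start)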

Python import statement with argument [duplicate]

This question already has answers here:
Python import functions from module twice with different internal imports
(3 answers)
Closed 4 years ago.
I am using numpy in one of my libraries. No surprise there.
One user would essentially like a copy of my project where I don't use the default numpy, but the one bundled with autograd. For instance, let's say I have a dumb function:
import numpy

def doSomething(x):
    return numpy.sin(x)
They would like a copy of the library where all of these import numpy are replaced by from autograd import numpy:
from autograd import numpy

def doSomething(x):
    return numpy.sin(x)
This would allow them to easily compute gradients and jacobians of my functions.
I would like to know what the easiest way to handle this is without copying the whole codebase and replacing all of these lines.
Options I am aware of:
I could make a copy of the codebase (lib and lib_autograd) where the first uses import numpy, and the second uses from autograd import numpy. This is bad because then I have to maintain two codebases.
I could automatically import from autograd if it is available:
try:
    from autograd import numpy
except ImportError:
    import numpy
The reason I do not want to do this is that many people have highly optimized numpy installs, whereas autograd might not. So I want to give the user a choice of which version to import. Silently forcing the autograd version on anyone who happens to have it installed seems bad: it would not be apparent to the user what is going on, and they would have to uninstall autograd to use the library with their default numpy installation.
So what are my options?
Ideally there would be a way of doing something like passing a parameter to the import statement (I do realize that you can't do this):
useAutograd = False
from lib(useAutograd) import doSomething
You can have a 'conditional' import with:
try:
    from autograd import numpy
except ImportError:
    import numpy
Another option is an environment variable that switches between autograd's numpy and the regular one. With the code above you always get autograd.numpy if it exists; there is no way to choose plain numpy when autograd is installed.
To elaborate on giving the user an option to switch, here is one possibility:
import os

if os.environ.get('AUTOGRADNUMPY'):
    try:
        from autograd import numpy
    except ImportError:
        import numpy
else:
    import numpy
Set the environment variable AUTOGRADNUMPY to True (or any other non-empty string) when you want to load numpy from the autograd package. If it is not set or doesn't exist, regular numpy is imported.
All of this assumes the user has at least numpy installed.
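If you want to give the caller explicit control instead of (or in addition to) the environment variable, one possibility is a small loader function in your library (load_numpy below is my own illustration, not an established API):
import importlib

def load_numpy(use_autograd=False):
    """Return autograd's numpy wrapper if requested, else plain numpy."""
    if use_autograd:
        return importlib.import_module('autograd.numpy')
    return importlib.import_module('numpy')
The user then calls load_numpy(use_autograd=True) once at startup and passes the resulting module around, which keeps the choice explicit and visible.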
This might help:
try:
    from autograd import numpy as np
except ImportError:
    import numpy as np

...
...
np.sum(..)

Anaconda package for cufft keeping arrays in gpu memory between fft / ifft calls

I am using the Anaconda suite with ipython 3.6.1 and their accelerate package. There is a cufft sub-package with two functions, fft and ifft. These, as far as I understand, take in a numpy array and output to a numpy array, both in system RAM, i.e. all GPU memory and transfers between system and GPU memory are handled automatically, and GPU memory is released when the function ends. This all seems very nice and works for me. However, I would like to run multiple fft/ifft calls on the same array and each time extract just one number from it. It would be nice to keep the array in GPU memory to minimize system <-> GPU transfers. Am I correct that this is not possible using this package? If so, is there another package that would do the same? I have noticed the reikna project, but that doesn't seem to be available in Anaconda.
The thing I am doing (and would like to do efficiently on a GPU) is, in short, shown here using numpy.fft:
import math as m
import numpy as np
import numpy.fft as dft

nr = 100
nh = 2**16
h = np.random.rand(nh)*1j
H = np.zeros(nh,dtype='complex64')
h[10] = 1
r = np.zeros(nr,dtype='complex64')
fftscale = m.sqrt(nh)
corr = 0.12j
for i in np.arange(nr):
    r[i] = h[10]
    H = dft.fft(h,nh)/fftscale
    h = dft.ifft(h*corr)*fftscale
r[nr-1] = h[10]
print(r)
Thanks in advance!
So I found ArrayFire, which seems rather easy to work with.
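Another option along the same lines is CuPy, which appears elsewhere on this page: its cupy.fft module operates on arrays that live in GPU memory, so nothing is copied back to the host until you ask for it. A rough sketch of the loop above (assuming a CUDA-capable GPU and the cupy package; untested here):
import math as m
import cupy as cp

nr = 100
nh = 2**16
h = cp.random.rand(nh) * 1j
h[10] = 1
r = cp.zeros(nr, dtype='complex64')
fftscale = m.sqrt(nh)
corr = 0.12j
for i in range(nr):
    r[i] = h[10]                          # stays in GPU memory
    H = cp.fft.fft(h, nh) / fftscale
    h = cp.fft.ifft(h * corr) * fftscale
print(cp.asnumpy(r))                      # one transfer back to system RAM at the end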

Using Numpy in different platforms

I have a piece of code which computes the Helmholtz-Hodge decomposition.
I had been running it on my Mac (OS X Yosemite) and it was working just fine. A month ago, however, my Mac got pretty slow (it was really old), and I opted to buy a new notebook (Windows 8.1, Dell).
After installing all the Python libs and so on, I continued my work running this same code (versioned in Git). And then the result was pretty weird, completely different from the one obtained on the old notebook.
For instance, what I do is construct two matrices a and b (a really long calculation) and then call the solver:
s = numpy.linalg.solve(a, b)
This returned a wrong s, different from the result obtained on my Mac (which was right).
Then, I tried to use:
s = scipy.linalg.solve(a, b)
And the program exits with code 0, but in the middle of the run.
Then, I just made a simple test of:
print 'here1'
s = scipy.linalg.solve(a, b)
print 'here2'
And here2 is never printed.
I tried:
print 'here1'
x, info = numpy.linalg.cg(a, b)
print 'here2'
And the same happens.
I also tried to check the solution after using numpy.linalg.solve:
print numpy.allclose(numpy.dot(a, s), b)
And I got a False (?!).
I don't know what is happening or how to find a solution; I just know that the same code runs on my Mac, but it would be very good if I could run it on other platforms. Now I'm stuck on this problem (I don't have a Mac anymore) with no clue about the cause.
The weirdest thing is that I don't receive any runtime error or warning, no feedback at all.
Thank you for any help.
EDIT:
NumPy test suite results: (screenshot not recovered)
SciPy test suite results: (screenshot not recovered)
Download the Anaconda package manager: http://continuum.io/downloads
When you download this it will already have all the dependencies for numpy worked out for you. It installs locally and will work on most platforms.
This is not really an answer, but this blog discusses at length the problems of a numpy ecosystem that evolves fast at the expense of reproducibility.
By the way, which version of numpy are you using? The documentation for the latest 1.9 does not list any method called cg like the one you use...
I suggest the use of this example so that you (and others) can check the results.
>>> import numpy as np
>>> import scipy.linalg
>>> np.random.seed(123)
>>> a = np.random.random(size=(10000, 10000))
>>> b = np.random.random(size=(10000,))
>>> s_np = np.linalg.solve(a, b)
>>> s_sc = scipy.linalg.solve(a, b)
>>> np.allclose(s_np,s_sc)
>>> s_np
array([-15.59186559, 7.08345804, 4.48174646, ..., -16.43310046,
-8.81301553, -10.77509242])
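If the two solvers disagree on your matrices, it is also worth checking the residual and the condition number of a: with an ill-conditioned system, tiny differences between the BLAS builds on your two machines get amplified into visibly different solutions. For example:
>>> np.linalg.norm(np.dot(a, s_np) - b)  # residual of the numpy solution
>>> np.linalg.cond(a)                    # a huge value flags an ill-conditioned system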
I hope you can find the answer - one option in the future is to create a virtual machine for each of your projects, using Docker. This allows easy portability.
See a great article here discussing Docker for research.

numpy OpenBLAS set maximum number of threads

I am using numpy and my model involves intensive matrix-matrix multiplication.
To speed up, I use OpenBLAS multi-threaded library to parallelize the numpy.dot function.
My setting is as follows,
OS : CentOS 6.2 server #CPUs = 12, #MEM = 96GB
python version: Python2.7.6
numpy : numpy 1.8.0
OpenBLAS + IntelMKL
$ OMP_NUM_THREADS=8 python test_mul.py
code, which I took from https://gist.github.com/osdf/
test_mul.py :
import numpy
import sys
import timeit

try:
    import numpy.core._dotblas
    print 'FAST BLAS'
except ImportError:
    print 'slow blas'

print "version:", numpy.__version__
print "maxint:", sys.maxint
print

x = numpy.random.random((1000,1000))
setup = "import numpy; x = numpy.random.random((1000,1000))"

count = 5
t = timeit.Timer("numpy.dot(x, x.T)", setup=setup)
print "dot:", t.timeit(count)/count, "sec"
when I use OMP_NUM_THREADS=1 python test_mul.py, the result is
dot: 0.200172233582 sec
OMP_NUM_THREADS=2
dot: 0.103047609329 sec
OMP_NUM_THREADS=4
dot: 0.0533880233765 sec
things go well.
However, when I set OMP_NUM_THREADS=8, the code starts to work only occasionally: sometimes it runs, sometimes it does not even start and gives me core dumps. When OMP_NUM_THREADS > 10, the code seems to break all the time.
I am wondering what is happening here. Is there something like a MAXIMUM number of threads that each process can use? Can I raise that limit, given that I have 12 CPUs in my machine?
Thanks
Firstly, I don't really understand what you mean by 'OpenBLAS + IntelMKL'. Both of those are BLAS libraries, and numpy should only link to one of them at runtime. You should probably check which of these two numpy is actually using. You can do this by calling:
$ ldd <path-to-site-packages>/numpy/core/_dotblas.so
Update: numpy/core/_dotblas.so was removed in numpy v1.10, but you can check the linkage of numpy/core/multiarray.so instead.
For example, I link against OpenBLAS:
...
libopenblas.so.0 => /opt/OpenBLAS/lib/libopenblas.so.0 (0x00007f788c934000)
...
If you are indeed linking against OpenBLAS, did you build it from source? If you did, you should see that in the Makefile.rule there is a commented option:
...
# You can define maximum number of threads. Basically it should be
# less than actual number of cores. If you don't specify one, it's
# automatically detected by the script.
# NUM_THREADS = 24
...
By default OpenBLAS will try to set the maximum number of threads to use automatically, but you could try uncommenting and editing this line yourself if it is not detecting this correctly.
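If rebuilding is not convenient, OpenBLAS also reads its thread count from the environment at startup, so you can cap it per run without touching the build; as far as I know, OPENBLAS_NUM_THREADS takes precedence over OMP_NUM_THREADS for OpenBLAS:
$ OPENBLAS_NUM_THREADS=4 python test_mul.py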
Also, bear in mind that you will probably see diminishing returns in terms of performance from using more threads. Unless your arrays are very large it is unlikely that using more than 6 threads will give much of a performance boost because of the increased overhead involved in thread creation and management.
