What are the differences between jax.numpy.vectorize and jax.vmap?
Here is a small snippet:
import jax
import jax.numpy as jnp
def f(x):
    return jnp.exp(-x) * jnp.sin(x)
gf = jax.grad(f)
x = jnp.arange(0,1,0.1)
jax.vmap(gf)(x)
jnp.vectorize(gf)(x)
Both computations give the same results:
DeviceArray([ 1. , 0.80998397, 0.63975394, 0.4888039 ,
0.35637075, 0.24149445, 0.14307144, 0.05990037,
-0.00927836, -0.06574923], dtype=float32)
How do I decide which one to use, and is there a difference in terms of performance?
jax.vmap and jax.numpy.vectorize have quite different semantics, and only happen to be similar in the case of a single 1D input as in your example.
The purpose of jax.vmap is to map a function over one or more inputs along a single explicit axis, as specified by the in_axes parameter. On the other hand, jax.numpy.vectorize maps a function over one or more inputs along zero or more implicit axes according to numpy broadcasting rules.
To see the difference, let's pass two 2-dimensional inputs and print the shape within the function:
import jax
import jax.numpy as jnp
def print_shape(x, y):
    print(f"x.shape = {x.shape}")
    print(f"y.shape = {y.shape}")
    return x + y
x = jnp.zeros((20, 10))
y = jnp.zeros((20, 10))
_ = jax.vmap(print_shape)(x, y)
# x.shape = (10,)
# y.shape = (10,)
_ = jnp.vectorize(print_shape)(x, y)
# x.shape = ()
# y.shape = ()
Notice that vmap only maps along the first axis by default, while vectorize maps along both input axes.
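(A small sketch that is not part of the original answer: the mapped axis of vmap can be changed with in_axes; for example, mapping both inputs along their second axis passes slices of shape (20,) to the function.)
_ = jax.vmap(print_shape, in_axes=(1, 1))(x, y)
# x.shape = (20,)
# y.shape = (20,)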
And notice also that the implicit mapping of vectorize means it can be used much more flexibly; for example:
x2 = jnp.arange(10)
y2 = jnp.arange(20).reshape(20, 1)
def add(x, y):
    # vectorize always maps over all axes, such that the function is applied elementwise
    assert x.shape == y.shape == ()
    return x + y
jnp.vectorize(add)(x2, y2).shape
# (20, 10)
vectorize will iterate over all axes of the inputs according to numpy broadcasting rules. On the other hand, vmap cannot handle this by default:
jax.vmap(add)(x2, y2)
# ValueError: vmap got inconsistent sizes for array axes to be mapped:
# arg 0 has shape (10,) and axis 0 is to be mapped
# arg 1 has shape (20, 1) and axis 0 is to be mapped
# so
# arg 0 has an axis to be mapped of size 10
# arg 1 has an axis to be mapped of size 20
To accomplish this same operation with vmap requires more thought, because there are two separate mapped axes, and some of the axes are broadcast. But you can accomplish the same thing this way:
jax.vmap(jax.vmap(add, in_axes=(None, 0)), in_axes=(0, None))(x2, y2[:, 0]).shape
# (20, 10)
This latter nested vmap is essentially what is happening under the hood when you use jax.numpy.vectorize.
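As a quick sanity check (not part of the original answer), you can verify that the nested vmap matches jnp.vectorize on this example:
assert jnp.allclose(
    jnp.vectorize(add)(x2, y2),
    jax.vmap(jax.vmap(add, in_axes=(None, 0)), in_axes=(0, None))(x2, y2[:, 0]),
)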
As for which to use in any given situation:
if you want to map a function across a single, explicitly specified axis of the inputs, use jax.vmap
if you want a function's inputs to be mapped across zero or more axes according to numpy's broadcasting rules as applied to the input, use jax.numpy.vectorize.
in situations where the transforms are identical (for example when mapping a function of 1D inputs) lean toward using vmap, because it more directly does what you want to do.
I'm trying to implement the n-mode tensor-matrix product (as defined by Kolda and Bader: https://www.sandia.gov/~tgkolda/pubs/pubfiles/SAND2007-6702.pdf) efficiently in Python using Numpy. The operation effectively gets down to (for matrix U, tensor X and axis/mode k):
Extract all vectors along axis k from X by collapsing all other axes.
Multiply these vectors on the left by U using standard matrix multiplication.
Insert the vectors again into the output tensor using the same shape, apart from X.shape[k], which is now equal to U.shape[0] (initially, X.shape[k] must be equal to U.shape[1], as a result of the matrix multiplication).
I've been using an explicit implementation for a while which performs all these steps separately:
Transpose the tensor to bring axis k to the front (in my full code I added an exception in case k == X.ndim - 1, in which case it's faster to leave it there and transpose all future operations, or at least in my application, but that's not relevant here).
Reshape the tensor to collapse all other axes.
Calculate the matrix multiplication.
Reshape the tensor to reconstruct all other axes.
Transpose the tensor back into the original order.
I would think this implementation creates a lot of unnecessary (big) arrays, so once I discovered np.einsum I thought this would speed things up considerably. However using the code below I got worse results:
import numpy as np
from time import time
def mode_k_product(U, X, mode):
    transposition_order = list(range(X.ndim))
    transposition_order[mode] = 0
    transposition_order[0] = mode
    Y = np.transpose(X, transposition_order)
    transposed_ranks = list(Y.shape)
    Y = np.reshape(Y, (Y.shape[0], -1))
    Y = U @ Y
    transposed_ranks[0] = Y.shape[0]
    Y = np.reshape(Y, transposed_ranks)
    Y = np.transpose(Y, transposition_order)
    return Y
def einsum_product(U, X, mode):
    axes1 = list(range(X.ndim))
    axes1[mode] = X.ndim + 1
    axes2 = list(range(X.ndim))
    axes2[mode] = X.ndim
    return np.einsum(U, [X.ndim, X.ndim + 1], X, axes1, axes2, optimize=True)
def test_correctness():
    A = np.random.rand(3, 4, 5)
    for i in range(3):
        B = np.random.rand(6, A.shape[i])
        X = mode_k_product(B, A, i)
        Y = einsum_product(B, A, i)
        print(np.allclose(X, Y))
def test_time(method, amount):
    U = np.random.rand(256, 512)
    X = np.random.rand(512, 512, 256)
    start = time()
    for i in range(amount):
        method(U, X, 1)
    return (time() - start) / amount
def test_times():
    print("Explicit:", test_time(mode_k_product, 10))
    print("Einsum:", test_time(einsum_product, 10))
test_correctness()
test_times()
Timings for me:
Explicit: 3.9450525522232054
Einsum: 15.873924326896667
Is this normal or am I doing something wrong? I know there are circumstances where storing intermediate results can decrease complexity (e.g. chained matrix multiplication), however in this case I can't think of any calculations that are being repeated. Is matrix multiplication so optimized that it removes the benefits of not transposing (which technically has a lower complexity)?
I'm more familiar with the subscripts style of using einsum, so I worked out these equivalences:
In [194]: np.allclose(np.einsum('ij,jkl->ikl',B0,A), einsum_product(B0,A,0))
Out[194]: True
In [195]: np.allclose(np.einsum('ij,kjl->kil',B1,A), einsum_product(B1,A,1))
Out[195]: True
In [196]: np.allclose(np.einsum('ij,klj->kli',B2,A), einsum_product(B2,A,2))
Out[196]: True
With a mode parameter, your approach in einsum_product may be best. But the equivalences help me visualize the calculation better, and may help others.
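If you do want subscripts with a runtime mode argument, one possible sketch (not from the original code; subscripts_product is a made-up name, and it assumes X.ndim is small enough that the label 'z' is unused) is to build the subscripts string from the mode:
def subscripts_product(U, X, mode):
    # label each axis of X with a letter, and replace the contracted axis in the output
    in_labels = [chr(ord('a') + i) for i in range(X.ndim)]
    out_labels = list(in_labels)
    out_labels[mode] = 'z'
    subscripts = 'z' + in_labels[mode] + ',' + ''.join(in_labels) + '->' + ''.join(out_labels)
    return np.einsum(subscripts, U, X, optimize=True)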
Timings should basically be the same. There's an extra setup time in einsum_product that should disappear in larger dimensions.
After updating Numpy, Einsum is only slightly slower than the explicit method, with or without multi-threading (see comments to my question).
I am trying to apply Gaussian filtering to the toy digits dataset images. It stores images in a (1797, 8, 8) array. Individually, I can make it work, but when I try to apply it to the whole image set with apply_along_axis, something goes wrong.
Here is the core example:
import numpy as np
from sklearn.datasets import load_digits
from scipy.ndimage.filters import gaussian_filter
images = load_digits().images
# Filter individually
individual = gaussian_filter(images[0], sigma=1, order=0)
# Use `apply_along_axis`
transformed = np.apply_along_axis(
    func1d=lambda x: gaussian_filter(x, sigma=1, order=0),
    axis=2,
    arr=images
)
# They produce different arrays
(transformed[0] != individual).all()
Out: True
I tried changing the axis, but that did not help. I also checked by simply returning the image or its squared values; in those cases, the results are equivalent. Applying a dot product, however, again produces different results.
# Squared values
transformed = np.apply_along_axis(
    func1d=lambda x: x ** 2,
    axis=2,
    arr=images
)
# They produce the same arrays
(transformed[0] == images[0] ** 2).all()
Out: True
# Dot product
transformed = np.apply_along_axis(
    func1d=lambda x: np.dot(x, x),
    axis=2,
    arr=images
)
individual = np.dot(images[0], images[0])
# They produce different arrays
(transformed[0] != individual).all()
Out: True
I am sure I misunderstand the way these functions work. What am I doing wrong?
Update: As @hpaulj pointed out in the comments, the func1d parameter in apply_along_axis only receives 1d arrays. See...
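Since gaussian_filter needs whole 2-D images rather than 1-D slices, a workaround (a sketch, not from the original post) is either a plain loop over the first axis, or a single call with a per-axis sigma where sigma=0 along the image-index axis so that images are not blurred into each other:
import numpy as np
from scipy.ndimage import gaussian_filter  # scipy.ndimage.filters is deprecated
from sklearn.datasets import load_digits
images = load_digits().images  # shape (1797, 8, 8)
# Option 1: filter each 8x8 image separately.
looped = np.stack([gaussian_filter(img, sigma=1, order=0) for img in images])
# Option 2: filter the whole stack at once; sigma=0 along axis 0 means no smoothing across images.
stacked = gaussian_filter(images, sigma=(0, 1, 1), order=0)
print(np.allclose(looped, stacked))  # expected: True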
I am running an optimization with scipy.optimize.minimize
sig_init = 2
b_init = np.array([0.2,0.01,0.5,-0.02])
params_init = np.array([b_init, sig_init])
mle_args = (y,x)
results = opt.minimize(crit, params_init, args=(mle_args))
The problem is that I need to set a bound on sig_init, but opt.minimize() requires that I specify bounds for each of the input parameters, and one of my inputs is a numpy array.
How can I specify the bounds given that one of my inputs is a numpy array?
First of all, scipy.optimize.minimize expects a flat array as its second argument x0 (documentation), which means the function it optimizes also takes a flat array plus optional additional arguments. Therefore it is my understanding that you would have to give it something like:
b_init = [0.2, 0.01, 0.5, -0.02]
sig_init = [2]
params_init = np.array(b_init + sig_init)
for the optimization to work.
Then, you will have to give the bounds for each scalar in your array. A rudimentary example, if you wanted [-1, 1] bounds on sig and no bounds on b:
bounds = [(-np.inf, np.inf) for _ in b_init] + [(-1, 1)]
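Putting it together, a minimal sketch (with a made-up crit and synthetic y, x, since the real ones are not shown in the question; bounds of (None, None) also mean "unbounded" to minimize):
import numpy as np
from scipy.optimize import minimize
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 4))
y = x @ np.array([0.2, 0.01, 0.5, -0.02]) + rng.normal(scale=2.0, size=200)
def crit(params, y, x):
    # hypothetical negative log-likelihood: the last entry is sigma, the rest are b
    b, sig = params[:-1], params[-1]
    resid = y - x @ b
    return len(y) * np.log(sig) + np.sum(resid ** 2) / (2 * sig ** 2)
b_init = [0.2, 0.01, 0.5, -0.02]
sig_init = [2.0]
params_init = np.array(b_init + sig_init)
bounds = [(None, None)] * len(b_init) + [(1e-6, None)]  # no bounds on b, sigma kept positive
results = minimize(crit, params_init, args=(y, x), bounds=bounds, method="L-BFGS-B")
print(results.x)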
tl;dr: How do I predict the shape returned by numpy broadcasting across several arrays without having to actually add the arrays?
I have a lot of scripts that make use of numpy (Python) broadcasting rules so that essentially 1D inputs result in a multiple-dimension output. For a basic example, the ideal gas law (pressure = rho * R_d * temperature) might look like
def rhoIdeal(pressure, temperature):
    rho = np.zeros_like(pressure + temperature)
    rho += pressure / (287.05 * temperature)
    return rho
It's not necessary here, but in more complicated functions it's very useful to initialize the array with the right shape. If pressure and temperature have the same shape, then rho also has that shape. If pressure has shape (n,) and temperature has shape (m,), I can call
rhoIdeal(pressure[:,np.newaxis], temperature[np.newaxis,:])
to get rho with shape (n,m). This lets me make plots with multiple values of temperature without having to loop over rhoIdeal, while still allowing the script to accept arrays of the same shape and compute the result element-by-element.
My question is: Is there a built-in function to return the shape compatible with several inputs? Something that behaves like
def returnShape(list_of_arrays):
    return np.zeros_like(sum(list_of_arrays)).shape
without actually having to sum the arrays? If there's no built-in function, what would a good implementation look like?
You could use np.broadcast. This function returns an object encapsulating the result of broadcasting two or more arrays together. No actual operation (e.g. addition) is performed - the object simply has some of the same attributes that an array produced by means of other operations would have (shape, ndim, etc.).
For example:
x = np.array([1,2,3]) # shape (3,)
y = x.reshape(3,1) # shape (3, 1)
z = np.ones((5,1,1)) # shape (5, 1, 1)
Then you can check what the shape of the array returned by broadcasting x, y and z would be by inspecting the shape attribute:
>>> np.broadcast(x, y, z).shape
(5, 3, 3)
This means that you could implement your function simply as follows:
def returnShape(*args):
    return np.broadcast(*args).shape
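On newer NumPy versions (1.20+), np.broadcast_shapes does the same thing directly from the shapes, without building a broadcast object; a small variant (returnShape2 is just an illustrative name):
def returnShape2(*args):
    # np.broadcast_shapes works on shape tuples rather than arrays
    return np.broadcast_shapes(*(np.shape(a) for a in args))
>>> returnShape2(x, y, z)
(5, 3, 3)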
I am trying to plot a function I created against a range of values (y-axis vs. x-axis).
The operation I would like to compute is common in "matrix multiplication":
r^T * C * r
where r^T should be of shape (1,100), r of shape (100,1), and C is a matrix of shape (100,100) (or an ndarray of shape (100,100)). Multiplied together using numpy.dot(), the output should be a single value.
The function only has one input, which can be an array of data.
import numpy as np
# The user first sets the values used by the function
# Not "true code", because input() too complex for the question at hand
r = data # a numpy ndarray of 100 values, (100,)
original_matrix = M # set matrix, such that M.shape = (100, 100)
param = array of data # EITHER an array of values, shape (50,),
# OR one value, i.e. a 32/64-bit float
# e.g. parameters = np.array of 50 values
def function(param):
    # using broadcasting, "np.sum(param * original_matrix for i in r)"
    new_matrix = np.sum(param[:, None, None] * original_matrix, axis=0)
    # now perform r^T * C * r
    return np.dot(r.transpose(), np.dot(new_matrix, r))
Calling the function
function(param)
results in one value of type numpy.float64.
I would like to plot this function against a series of values, i.e. I need this function to take an np.array as input and return an np.ndarray, much like other ufuncs in NumPy. The function should be evaluated for each element of the input array, and the results plotted.
For example,
import numpy as np
import pylab
X = np.arange(100)
Y = np.sin(X)
pylab.plot(X, Y)
outputs a plot of sin(X) against X.
Given that my original function (which is solely a function of the array "parameters") results in np.float64 format, how can I turn this function into a ufunc? I would like to plot my function on the y-axis against parameters on the x-axis.
What if you change your function to take a single parameter rather than an array?
Then you could just do
X = range(50)
Y = [function(x) for x in X]
pylab.plot(X, Y)
I can offer two solutions.
You can make (almost) any function behave like a ufunc using np.vectorize, which handles numbers as well as np.arrays, just like np.sin:
def my_func_1(param):
    # using broadcasting, "np.sum(param * original_matrix for i in r)"
    new_matrix = np.sum(param * original_matrix[None, :, :], axis=0)
    # now perform r^T * C * r
    return np.dot(r.transpose(), np.dot(new_matrix, r))
my_vec_func_1 = np.vectorize(my_func_1)
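As a usage sketch (assuming r and original_matrix are defined as in the timing code below), the wrapped function accepts both scalars and arrays:
print(my_vec_func_1(2.0))                         # a single value
print(my_vec_func_1(np.array([1.0, 2.0])).shape)  # (2,)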
Note that np.vectorize does not really vectorize your code ... it just automatically runs a for loop if an array is passed as an argument. There is no gain in runtime from using it ... see the timings below.
You can define a truly vectorized function which takes (for the following code) only one-dimensional lists or np.arrays as an argument:
def my_vec_func_2(param):
    param = np.asarray(param)
    new_matrix = np.sum(param[:, None, None, None] * original_matrix[None, None, :, :], axis=1)
    return np.dot(r, np.dot(new_matrix, r).transpose())
Truly vectorized code is usually considerably faster than for loops. Why the gain is so small in this case, I cannot explain ...
Timings
I used the following code to test the runtime
import numpy as np
from numpy.random import randint
r = randint(10, size=(100)) # a numpy ndarray of 100 values, (100,)
original_matrix = randint(30,size=(100,100))
timeit my_vec_func_1(np.arange(10000))
1 loops, best of 3: 508 ms per loop
timeit my_vec_func_2(np.arange(10000))
1 loops, best of 3: 488 ms per loop
timeit [my_func_1(x) for x in np.arange(10000)]
1 loops, best of 3: 505 ms per loop