Vectorization guidelines for JAX - Python

suppose I have a function (for simplicity, covariance between two series, though the question is more general):
import jax.numpy as jnp

def cov(x, y):
    return jnp.dot(x - jnp.mean(x), y - jnp.mean(y))
Now I have a "dataframe" D (a 2-dimensional array whose columns are my series) and I want to vectorize cov in such a way that applying the vectorized function produces the covariance matrix. Now, there is an obvious way of doing it:
cov1 = jax.vmap(cov, in_axes=(None, 1))
cov2 = jax.vmap(cov1, in_axes=(1, None))
but that seems a little clunky. Is there a "canonical" way of doing this?

If you want to express logic equivalent to nested for loops with vmap, then yes, it requires nested vmaps. I think what you've written is probably as canonical as you can get for an operation like this, although it might be slightly clearer if written using decorators:
import jax
import jax.numpy as jnp
from functools import partial

@partial(jax.vmap, in_axes=(1, None))
@partial(jax.vmap, in_axes=(None, 1))
def cov(x, y):
    return jnp.dot(x - jnp.mean(x), y - jnp.mean(y))
For this particular function, though, note that you can express the same thing using a single dot product if you wish:
result = jnp.dot((x - x.mean(0)).T, (y - y.mean(0)))
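As a quick sanity check (a minimal sketch; D here is just a random array standing in for the "dataframe"), both formulations agree:
import jax.numpy as jnp
import numpy as np

D = jnp.asarray(np.random.default_rng(0).normal(size=(50, 3)))  # 50 rows, 3 series
via_vmap = cov(D, D)                                   # nested-vmap version from above
direct = jnp.dot((D - D.mean(0)).T, (D - D.mean(0)))   # single dot product
print(jnp.allclose(via_vmap, direct))                  # True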

Related

How to force a function to broadcast without invoking `np.vectorize`

I want to look for a way to force a function to broadcast.
There are scenarios in which the function/method may be overwritten later by a constant function. In such a case, if
arr = np.arange(0, 1, 0.0001)
f = lambda x: 5
f(arr)  # this gives just the integer 5; I want [5, 5, ..., 5]
I am aware of methods like np.vectorize which force the function to broadcast, but the problem is that this is inefficient, as it is essentially a for loop under the hood (see the documentation).
We can also use factory methods like np.frompyfunc, which transform a Python function into a NumPy universal function (ufunc); see here for instance. This outperformed np.vectorize, but it is still far less efficient than built-in ufuncs.
I was wondering if there is any efficient numpy way of handling this, namely to force the function to broadcast?
If there was a better way to make arbitrary Python functions broadcast, numpy.vectorize would use it. You really have to write the function with broadcasting in mind if you want it to broadcast efficiently.
In the particular case of a constant function, you can write a broadcasting constant function using numpy.full:
import numpy

def f(x):
    return numpy.full(numpy.shape(x), 5)
numba.vectorize can also vectorize functions more effectively than numpy.vectorize, but you need Numba, and you need to write your function in a way that Numba can compile efficiently.
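For example, a minimal sketch of the Numba route (this assumes Numba is installed; the explicit signature is just one possible choice):
import numba
import numpy as np

@numba.vectorize(["float64(float64)"])
def f(x):
    return 5.0  # constant function, compiled as a true ufunc

arr = np.arange(0, 1, 0.0001)
f(arr)  # array([5., 5., ..., 5.])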
For those who can live without a generic answer, the best option is np.full_like(arr, val), which is about 20% faster than np.full(arr.shape, val).
After raising this with the author of the answer above, I found a middle ground that keeps the generality while still performing rather well:
np.broadcast_arrays(x, f(x))[1]
and here are some time analysis:
arr = np.arange(1, 2, 0.0001).reshape(10, -1)
def master_f(x): return np.broadcast_arrays(x, f(x))[-1].copy('K')
def master_f_nocopy(x): return np.broadcast_arrays(x, f(x))[-1]
def vector_f(x): return np.vectorize(f)(x)
%timeit arr + 1               # about 10 microseconds
%timeit master_f(arr)         # about 40 microseconds
%timeit master_f_nocopy(arr)  # about 20 microseconds
Note that this also works for projection functions such as f(x, y) := y, which is beyond what np.full_like can handle.
Moreover, for more complicated functions like np.sin and np.cos, you'll notice the difference between f(arr) and master_f_nocopy(arr) is almost negligible.
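For instance, a minimal sketch of that projection case (g and master_g are hypothetical names, not from the thread above):
import numpy as np

def g(x, y):
    return y  # projection onto the second argument

def master_g(x, y):
    # broadcast the (possibly scalar) result against x
    return np.broadcast_arrays(x, g(x, y))[-1]

master_g(np.zeros((3, 4)), 7.0)  # a (3, 4) array filled with 7.0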

Creating a function in Python which runs over a range and returns a new value to an array each time

Basically, what I'm trying to create is a function which takes an array, in this case:
numpy.linspace(0, 0.2, 100)
and runs a lot of other code for each of the elements in the array, and at the end creates a new array containing one number per element: the result of the calculation for that element. A simple example would be a function doing a multiplication like this:
def func(x):
    y = x * 10
    return y
However, I want it to be able to take an array as an argument and return an array consisting of each y for each multiplication. The function above works for this, but the one I've tried creating for my code doesn't work with this method and only returns one value instead. Is there another way to make the function work as intended? Thanks for the help!
You could use this simple code:
def func(x):
    y = []
    for i in x:
        y.append(i * 10)
    return y
Maybe take a look at np.vectorize:
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.vectorize.html
np.vectorize can for example be used as a decorator:
@np.vectorize
def func(value):
    ...
    return return_value
The function to be vectorized (here func) has to take a single value as input and return a single value; it then gets applied across the whole array.
It is mentioned in the documentation, but it can't hurt to emphasize it here: this function is provided only for convenience, not for performance; it is basically equivalent to using a for loop.
If you are able to build up your function from numpy's ufuncs (like np.add, np.mean, etc.), this will likely be much faster.
Or you could write your own:
https://docs.scipy.org/doc/numpy-1.13.0/reference/ufuncs.html
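For instance, a minimal sketch contrasting the two approaches, using the multiplication example from the question (an assumption about what the real function does):
import numpy as np

@np.vectorize
def func(value):
    return value * 10           # called once per element, Python-level loop

x = np.linspace(0, 0.2, 100)
slow = func(x)                  # convenience only, roughly a for loop
fast = x * 10                   # built from ufuncs, runs in C
print(np.allclose(slow, fast))  # True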
You can already do this with numpy using your function as-is. For example, the code below will do what you want:
x = numpy.linspace(0, 0.2, 100)
y = x*10
If you defined x as above and passed it to your function it would perform exactly as you want.

Fastest way of avoiding a for loop in python

I have a function which takes a scalar as an argument, and I want to map it over a numpy array.
def myfn(scalar):
    return transformed_scalar
I know about np.vectorize, which I have already used to vectorize it
vmyfunc = np.vectorize(myfn)
mapped_array = vmyfunc([x1,x2,...,xn])
Reading the documentation, I realized it's just a for loop; see the quote below:
Notes
The vectorize function is provided primarily for convenience, not for
performance. The implementation is essentially a for loop
I'm looking for another way to implement this vectorization that improves the performance of this method.
Maybe something using pandas?
UPDATE
The function that takes the scalar looks like the following:
def Loss_func(x, y, W):
    delta = 1.0
    scores = W.dot(np.transpose(x))
    margins = np.maximum(0, scores - scores[y] + delta)
    margins[y] = 0
    loss_i = np.sum(margins)
    return loss_i
Where y is a scalar and x and W are 2-D NumPy arrays.
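One possible direction (a sketch, not an answer from the thread): if the goal is to evaluate the loss for a whole batch of samples at once, the per-sample indexing can be expressed with NumPy fancy indexing, so no Python loop over samples is needed. The shapes assumed here are X of shape (N, D), W of shape (C, D), and y an integer vector of length N:
import numpy as np

def loss_all(X, y, W, delta=1.0):
    scores = X.dot(W.T)                                  # (N, C) scores for every sample
    correct = scores[np.arange(X.shape[0]), y][:, None]  # score of the correct class, (N, 1)
    margins = np.maximum(0, scores - correct + delta)
    margins[np.arange(X.shape[0]), y] = 0                # do not count the correct class
    return margins.sum(axis=1)                           # one loss value per sample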

Vectorized assignment of a 2-dimensional array

I work with Python 2.7, numpy and pandas.
I have:
a function y = f(x) where both x and y are scalars.
a one-dimensional array of scalars of length n: [x0, x1, ..., x(n-1)]
I need to construct a 2-dimensional array D[i, j] = f(xi) * f(xj) where i, j are indices in [0, ..., n-1].
I could use loops and/or a comprehension list, but that would be slow. I would like to use a vectorized approach instead.
I thought that "numpy.indices" would help me (see Create a numpy matrix with elements a function of indices), but I admit I am at a loss on how to use that command for my purpose.
Thanks in advance!
Ignore the comments that dismiss vectorization; it's a good habit to have, and it does deliver performance with the right accelerators. Anyway, what I really wanted to say was that you want to find the outer product:
x_ = numpy.array(x)
y = f(x_)
numpy.outer(y, y)
If you're working with numbers you should be working with numpy data structures anyway. Then you get fast, readable code like this.
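For instance (a minimal sketch where np.exp stands in for the user's scalar f, purely as an assumption):
import numpy as np

f = np.exp                       # stand-in for the scalar function f
x = np.linspace(0.0, 1.0, 5)

D = np.outer(f(x), f(x))         # D[i, j] == f(x[i]) * f(x[j])
loop = np.array([[f(a) * f(b) for b in x] for a in x])
print(np.allclose(D, loop))      # True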
I would like to use a vectorized approach instead.
You sound like you might be a Matlab user -- you should be aware that numpy's vectorize function provides no performance benefit:
The vectorize function is provided primarily for convenience, not for
performance. The implementation is essentially a for loop.
Unless it just so happens that there's already an operation in numpy that does exactly what you want, you're going to be stuck with numpy.vectorize and nothing to really gain over a for loop. With that being said, you should be able to do that like so:
import math
import numpy

def makeArray():
    a = [1, 2, 3, 4]
    def addTo(arr):
        return f(a[math.floor(arr / 4)]) * f(a[arr % 4])
    vecAdd = numpy.vectorize(addTo)
    return vecAdd(numpy.arange(4 * 4).reshape(4, 4))
EDIT:
If f is actually a one-dimensional array, you can do this:
f_matrix = numpy.matrix(f)
D = f_matrix.T * f_matrix
You can use np.frompyfunc to vectorize the function and then use a dot product to multiply:
f2 = np.frompyfunc(f, 1, 1)   # vectorize the function
res1 = f2(x)                  # get the results for f(x)
res1 = res1[np.newaxis]       # result has to be 2-D for the next step
res2 = np.dot(res1.T, res1)   # get f(xi)*f(xj)

Applying a function element-wise to multiple numpy arrays

Say I have two numpy arrays of the same dimensions, e.g.:
a = np.ones((4,))
b = np.linspace(0,4,4)
and a function that is supposed to operate on elements of those arrays:
def my_func(x, y):
    # do something, e.g.
    z = x + y
    return z
How can I apply this function to the elements of a and b in an element-wise fashion and get the result back?
It depends, really. For the given function, how about a + b, for instance? Presumably you have something more complex in mind, though.
The most general solution is np.vectorize, but it's also the slowest. Depending on what you want to do, more clever solutions may exist. Take a look at numexpr, for example.
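For instance, a minimal sketch of the numexpr route (this assumes numexpr is installed and reuses the arrays from the question):
import numpy as np
import numexpr as ne

a = np.ones((4,))
b = np.linspace(0, 4, 4)

z_numpy = a + b                    # plain ufunc broadcasting
z_ne = ne.evaluate("a + b")        # compiled elementwise expression
print(np.allclose(z_numpy, z_ne))  # True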
