How do I vectorize a function in python or numpy? - python

For instance, in Julia a function can easily be vectorized, as shown below:
function circumference_of_circle(r)
    return 2*π * r
end
a = collect([i for i=1:200])
circumference_of_circle.(a) # easy vectorization using just the dot (.)
Although I like Julia very much, it is not as mature as Python.
Is there a similar vectorization technique for Python functions?

In [1]: def foo(r):
   ...:     return 2*np.pi * r
   ...:
In [2]: arr = np.arange(5)
In [3]: foo(arr)
Out[3]: array([ 0. , 6.28318531, 12.56637061, 18.84955592, 25.13274123])
All operations in your function work with numpy arrays. There's no need to do anything special.
If your function only works with scalar arguments, "vectorizing" becomes trickier, especially if you are seeking compiled performance.
Have you spent much time reading the numpy basics? https://numpy.org/doc/stable/user/absolute_beginners.html
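A minimal sketch of the point above: because foo uses only arithmetic that NumPy overloads, the same function accepts either a Python scalar or an array, and no wrapper is needed.

```python
import numpy as np

def foo(r):
    # Only arithmetic: works for Python scalars and NumPy arrays alike
    return 2 * np.pi * r

arr = np.arange(5)
scalar_result = foo(3.0)   # a single float
array_result = foo(arr)    # computed elementwise, one value per entry of arr
print(scalar_result)
print(array_result)
```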
===
I don't know Julia, but this code
function _collect(::Type{T}, itr, isz::SizeUnknown) where T
    a = Vector{T}()
    for x in itr
        push!(a, x)
    end
    return a
end
looks a lot like
def foo(func, arr):
    alist = []
    for i in arr:
        alist.append(func(i))
    return alist  # or np.array(alist)
or, equivalently, the list comprehension proposed in another answer,
or list(map(func, arr)).
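The three Python-level approaches mentioned above build the same list. A short sketch, using a hypothetical circumference function as func:

```python
import math

def circumference(r):
    # Hypothetical scalar function used for illustration
    return 2 * math.pi * r

def foo(func, arr):
    # Explicit loop, as in the answer above
    alist = []
    for i in arr:
        alist.append(func(i))
    return alist

radii = range(1, 201)
via_loop = foo(circumference, radii)
via_comprehension = [circumference(r) for r in radii]
via_map = list(map(circumference, radii))
```

All three still call func once per element in Python, so none of them is faster than the others in any meaningful way.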

I'm not familiar with Julia or vectorization of functions, but if I understand correctly, there are a few ways to do this in Python. The plain-jane Python way is a list comprehension.
An example using your circumference function would be:
import math
def circumference_of_circle(r):
    return 2 * math.pi * r
circles = [[x, circumference_of_circle(x)] for x in range(1, 201)]
print(circles)
The circles list will contain inner lists holding both the radius (generated by range()) and its circumference. Like Julia's dot vectorization, a Python list comprehension is essentially shorthand for a for loop: it takes any iterable and returns a list, which makes it very handy.
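For comparison, a sketch of the same radius/circumference table built with NumPy instead of a comprehension. np.column_stack pairs each radius with its circumference, giving a 200×2 array rather than a list of lists.

```python
import numpy as np

radii = np.arange(1, 201)
circumferences = 2 * np.pi * radii   # computed for all radii at once
circles = np.column_stack((radii, circumferences))
print(circles[:3])
```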

Your function contains only simple math, and NumPy arrays (as well as pandas Series and DataFrames) support such elementwise arithmetic directly.
import numpy as np
a = np.array([1,2,3,4])
def circumference_of_circle(r):
    return 2 * np.pi * r
print(circumference_of_circle(a)) # array([ 6.28318531, 12.56637061, 18.84955592, 25.13274123])
More complicated functions cannot be applied directly to an array. You may be able to rewrite the function in a vectorized way, for example using np.where for conditions that would be represented by an if block within a normal function.
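As an illustration of the np.where idea, here is a hypothetical piecewise function (not from the question), written once with an if block and once in vectorized form. Note that np.where evaluates both branches for every element before choosing between them.

```python
import numpy as np

def clipped_circumference(r):
    # Scalar-only version: the `if` fails on arrays with more than one element
    if r < 0:
        return 0.0
    return 2 * np.pi * r

def clipped_circumference_vec(r):
    # Vectorized version: np.where picks, elementwise, between the two branches
    r = np.asarray(r, dtype=float)
    return np.where(r < 0, 0.0, 2 * np.pi * r)

a = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
expected = [clipped_circumference(v) for v in a]
result = clipped_circumference_vec(a)
```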
If this isn't an option, or speed is not a major concern, you can iterate over the array with a list comprehension ([func(v) for v in arr]), numpy's vectorize, or pandas's apply. You can sometimes speed these approaches up by pre-compiling parts of the code.
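A sketch of the np.vectorize fallback for a scalar-only function (the function is hypothetical). Passing otypes fixes the output dtype instead of letting NumPy infer it from the first call. Because np.vectorize loops in Python internally, the result is identical to a plain list comprehension, and so is the speed, roughly.

```python
import numpy as np

def describe(r):
    # Hypothetical scalar-only function with several branches
    if r < 1:
        return 0.0
    elif r < 10:
        return float(r) ** 2
    return float(r)

describe_vec = np.vectorize(describe, otypes=[float])

arr = np.array([0.5, 2.0, 5.0, 20.0])
via_vectorize = describe_vec(arr)
via_comprehension = np.array([describe(v) for v in arr])
```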

Related

Vectorization guidelines for JAX

Suppose I have a function (for simplicity, covariance between two series, though the question is more general):
import jax
import jax.numpy as jnp
def cov(x, y):
    return jnp.dot((x-jnp.mean(x)), (y-jnp.mean(y)))
Now I have a "dataframe" D (a 2-dimensional array whose columns are my series), and I want to vectorize cov in such a way that applying the vectorized function produces the covariance matrix. There is an obvious way of doing it:
cov1 = jax.vmap(cov, in_axes=(None, 1))
cov2 = jax.vmap(cov1, in_axes=(1, None))
but that seems a little clunky. Is there a "canonical" way of doing this?
If you want to express logic equivalent to nested for loops with vmap, then yes, it requires nested vmaps. I think what you've written is probably as canonical as you can get for an operation like this, although it might be slightly clearer if written using decorators:
from functools import partial
import jax
import jax.numpy as jnp
@partial(jax.vmap, in_axes=(1, None))
@partial(jax.vmap, in_axes=(None, 1))
def cov(x, y):
    return jnp.dot((x-jnp.mean(x)), (y-jnp.mean(y)))
For this particular function, though, note that you can express the same thing using a single dot product if you wish:
result = jnp.dot((x - x.mean(0)).T, (y - y.mean(0)))
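A self-contained check, assuming JAX is installed, that the nested-vmap (decorator) form and the single dot product agree on a small random data matrix D (the data are illustrative only). In both results, entry [i, j] is the unnormalized covariance of columns i and j.

```python
from functools import partial

import jax
import jax.numpy as jnp
import numpy as np

@partial(jax.vmap, in_axes=(1, None))   # outer map: over the columns of x
@partial(jax.vmap, in_axes=(None, 1))   # inner map: over the columns of y
def cov(x, y):
    # Unnormalized covariance of two 1-D series
    return jnp.dot(x - jnp.mean(x), y - jnp.mean(y))

rng = np.random.default_rng(0)
D = jnp.asarray(rng.normal(size=(50, 3)))   # 50 observations of 3 series

C_vmap = cov(D, D)                                   # shape (3, 3)
C_dot = jnp.dot((D - D.mean(0)).T, D - D.mean(0))    # same matrix, one dot product
```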

Piecewise Operation on List of Numpy Arrays

My question is: can I make a function or variable that performs an operation or NumPy method on each np.array element of a list more succinctly than what I have below (preferably by calling just one function or variable)?
Generating the list of arrays:
import numpy as np
array_list = [np.random.rand(3,3) for x in range(5)]
array_list
Current Technique of operating on each element:
My current method (as seen below) involves unpacking it and doing something to it:
[arr.std() for arr in array_list]
[arr + 2 for arr in array_list]
Goal:
My hope is to get something that could perform the operations above by simply typing:
x.std()
or
x +2
Yes - use an actual NumPy array and perform your operations over the desired axes, instead of having them stuffed in a list.
actual_array = np.array(array_list)
actual_array.std(axis=(1, 2))
# array([0.15792346, 0.25781021, 0.27554279, 0.2693581 , 0.28742179])
If you generally want to reduce over all axes except the first, you could use tuple(range(1, actual_array.ndim)) instead of specifying the tuple explicitly.
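A sketch combining both operations from the question on a single stacked array. np.stack is used here in place of np.array (both give shape (5, 3, 3) for equal-shaped inputs), and the axis tuple is built generically as described above.

```python
import numpy as np

rng = np.random.default_rng(0)
array_list = [rng.random((3, 3)) for _ in range(5)]
actual_array = np.stack(array_list)          # shape (5, 3, 3)

# Reduce over every axis except the first: one result per original array
other_axes = tuple(range(1, actual_array.ndim))
stds = actual_array.std(axis=other_axes)     # shape (5,)

# Elementwise arithmetic applies to the whole stack at once
plus_two = actual_array + 2                  # shape (5, 3, 3)
```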

Creating a function in Python which runs over a range and returns a new value to an array each time

Basically, what I'm trying to create is a function which takes an array, in this case:
numpy.linspace(0, 0.2, 100)
and runs a lot of other code for each element of the array, and at the end creates a new array containing one number for each element's calculation. A simple example would be a function that does a multiplication like this:
def func(x):
    y = x * 10
    return y
However, I want it to be able to take an array as an argument and return an array consisting of each y for each multiplication. The function above works for this, but the one I've tried creating for my code doesn't work with this method and only returns one value instead. Is there another way to make the function work as intended? Thanks for the help!
You could use this simple code:
def func(x):
    y = []
    for i in x:
        y.append(i*10)
    return y
Maybe take a look at np.vectorize:
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.vectorize.html
np.vectorize can for example be used as a decorator:
@np.vectorize
def func(value):
    ...
    return return_value
The function to be vectorized (here func) has to take a single value as input and return a single value.
This function is then vectorized over the whole array.
It is mentioned in the documentation, but it can't hurt to emphasize it here:
in general this function is provided for convenience, not for performance;
it is basically equivalent to a for loop.
If you are able to build your function from NumPy's ufuncs (np.add, np.multiply, etc.) and other array functions such as np.mean, it will likely be much faster.
Or you could write your own:
https://docs.scipy.org/doc/numpy-1.13.0/reference/ufuncs.html
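A runnable version of the decorator pattern above, with a hypothetical body standing in for the question's "lots of other code": the decorated function is written for one scalar, yet it can be called with the whole linspace array.

```python
import numpy as np

@np.vectorize
def func(value):
    # Hypothetical scalar-only logic: this `if` would fail on a plain array
    if value > 0.1:
        return value * 10
    return 0.0

x = np.linspace(0, 0.2, 100)
y = func(x)   # one output per element of x
```

Without otypes, np.vectorize infers the output dtype from the first element's result. Here that result is 0.0, a float, so nothing is truncated. If the first call could return an int, passing otypes=[float] avoids that pitfall.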
You can do this with numpy already with your function. For example, the code below will do what you want:
x = numpy.linspace(0, 0.2, 100)
y = x*10
If you defined x as above and passed it to your function it would perform exactly as you want.

Python matrix query - must be simpler way

I have created a small Python program to multiply two 2 by 2 matrices but am wondering if it could be simplified in any way (particularly the creation of new arrays)
The python code is below:
matA=[0]*2
matB=[0]*2
matC=[0]*2
matC[0]=[0]*2
matC[1]=[0]*2
# creating new arrays for multiplying two 2 by 2 matrices
# must be a more simple way
def multiply2by2matrices(a,b):
    matC[0][0]=a[0][0]*b[0][0]+a[0][1]*b[1][0]
    matC[0][1]=a[0][0]*b[0][1]+a[0][1]*b[1][1]
    matC[1][0]=a[1][0]*b[0][0]+a[1][1]*b[1][0]
    matC[1][1]=a[1][0]*b[0][1]+a[1][1]*b[1][1]
    print ((matC[0][0]),(matC[0][1]))
    print ((matC[1][0]),(matC[1][1]))
matA[0]=[4,3]
matA[1]=[2,12]
matB[0]=[5,-2]
matB[1]=[6,3]
multiply2by2matrices(matA, matB)
Any thoughts will be gratefully received.
Don't implement it by hand. You are reinventing the wheel, and there are very good wheels around already.
Numpy is the answer.
import numpy as np
a = np.arange(20).reshape(5,4)
b = (np.arange(20) + 10).reshape(4,5)
np.dot(a,b)
Docs:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html
Cheers, P
Unless you are required to use only vanilla python it would be much simpler to use numpy. You can use the matrix class or just use 2d arrays and the dot function. For example:
import numpy as np
a=np.array([[1,1],[2,2]])
b=np.array([[3,3],[3,3]])
c=np.dot(a,b)
produces:
array([[ 6, 6],
[12, 12]])
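Applied to the matrices from the question, a sketch: the @ operator (Python 3.5+) is equivalent to np.matmul and, for 2-D arrays, gives the same result as np.dot.

```python
import numpy as np

matA = np.array([[4, 3], [2, 12]])
matB = np.array([[5, -2], [6, 3]])

matC = matA @ matB
print(matC)
```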
You can definitely simplify matrix multiplication and make it general. The trick is to use zip. There are other methods of course, but I think zip might produce some of the cleaner code.
I didn't test the following, but I think my linear algebra serves me right.
def matmult(a, b):
    zipb = list(zip(*b))  # columns of b; list() so they can be reused for every row in Python 3
    return [[sum(ax*bx for ax, bx in zip(rowa, colb)) for colb in zipb] for rowa in a]
If you don't use a list comprehension you're going to need to pre-allocate a list or use append/extend in between for loops.
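A self-contained check of the zip approach on the question's matrices. In Python 3, zip returns a one-shot iterator, so the columns of b are materialized with list() before the comprehension reuses them for every row of a. Without list(), every row after the first would come out empty.

```python
def matmult(a, b):
    # zip(*b) yields the columns of b; list() lets us iterate them once per row of a
    cols_b = list(zip(*b))
    return [[sum(ax * bx for ax, bx in zip(row_a, col_b)) for col_b in cols_b]
            for row_a in a]

matA = [[4, 3], [2, 12]]
matB = [[5, -2], [6, 3]]
result = matmult(matA, matB)
print(result)
```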

Vectorized assignment of a 2-dimensional array

I work with Python 2.7, numpy and pandas.
I have :
a function y=f(x) where both x and y are scalars.
a one-dimensional array of scalars of length n : [x0, x1, ..., x(n-1)]
I need to construct a 2-dimensional array D[i,j]=f(xi)*f(xj) where i,j are indices in [0,...,n-1].
I could use loops and/or a list comprehension, but that would be slow. I would like to use a vectorized approach instead.
I thought that "numpy.indices" would help me (see Create a numpy matrix with elements a function of indices), but I admit I am at a loss on how to use that command for my purpose.
Thanks in advance!
Ignore the comments that dismiss vectorization; it's a good habit to have, and it does deliver performance with the right accelerators. Anyway, what I really wanted to say was that you want to find the outer product:
x_ = numpy.array(x)
y = f(x_)
numpy.outer(y, y)
If you're working with numbers you should be working with numpy data structures anyway. Then you get fast, readable code like this.
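A minimal sketch of the outer-product approach, assuming f is built from NumPy operations so that f(x_) works elementwise (the f below is hypothetical). Broadcasting a column against a row gives the same matrix, and both agree with the D[i,j] = f(xi)*f(xj) definition computed with explicit loops.

```python
import numpy as np

def f(x):
    # Hypothetical elementwise function standing in for the question's f
    return np.sin(x) + x ** 2

x = [0.0, 0.5, 1.0, 2.0]
x_ = np.array(x)
y = f(x_)
D = np.outer(y, y)

# Equivalent via broadcasting: column vector times row vector
D_broadcast = y[:, np.newaxis] * y[np.newaxis, :]

# Reference: the same matrix straight from the definition, with Python loops
D_loop = np.array([[f(xi) * f(xj) for xj in x] for xi in x])
```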
I would like to use a vectorized approach instead.
You sound like you might be a Matlab user -- you should be aware that numpy's vectorize function provides no performance benefit:
The vectorize function is provided primarily for convenience, not for
performance. The implementation is essentially a for loop.
Unless it just so happens that there's already an operation in numpy that does exactly what you want, you're going to be stuck with numpy.vectorize and nothing to really gain over a for loop. With that being said, you should be able to do that like so:
import numpy
def makeArray():
    a = [1, 2, 3, 4]
    def addTo(arr):
        # arr is a flat index into a 4x4 grid: row = arr // 4, column = arr % 4
        return f(a[arr // 4]) * f(a[arr % 4])
    vecAdd = numpy.vectorize(addTo)
    return vecAdd(numpy.arange(4 * 4).reshape(4, 4))
EDIT:
If f is actually a one-dimensional array, you can do this:
f_matrix = numpy.matrix(f)
D = f_matrix.T * f_matrix
You can use numpy.frompyfunc to vectorize the function, then use a dot product to multiply:
f2 = numpy.frompyfunc(f, 1, 1)  # vectorize the function (output has object dtype)
res1 = f2(x).astype(float)  # get the results for f(x)
res1 = res1[numpy.newaxis]  # result has to be 2D for the next step
res2 = numpy.dot(res1.T, res1)  # get f(xi)*f(xj)
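A sketch of the frompyfunc approach with a hypothetical scalar-only f. numpy.frompyfunc returns a ufunc whose outputs have object dtype, so converting with astype(float) before the dot product keeps the rest of the computation in ordinary floating point.

```python
import math
import numpy

def f(x):
    # Hypothetical scalar-only function (math.exp rejects arrays with more than one element)
    return math.exp(-x)

x = numpy.array([0.0, 1.0, 2.0])
f2 = numpy.frompyfunc(f, 1, 1)              # vectorized f, object-dtype output
res1 = f2(x).astype(float)[numpy.newaxis]   # shape (1, 3)
res2 = numpy.dot(res1.T, res1)              # shape (3, 3): res2[i, j] = f(xi) * f(xj)
```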
