Fastest way of avoiding a for loop in Python

I have a function which takes a scalar as an argument. I want to apply it to every element of a NumPy array.
def myfn(scalar):
    return transformed_scalar
I know about np.vectorize, which I have already used to vectorize it
vmyfunc = np.vectorize(myfn)
mapped_array = vmyfunc([x1,x2,...,xn])
Reading the documentation I realized it's just a for loop, see quote below:
Notes
The vectorize function is provided primarily for convenience, not for
performance. The implementation is essentially a for loop
I'm looking for another way to vectorize this function, one that actually improves performance.
Maybe something using pandas?
UPDATE
The function that takes the scalar looks like the following:
def Loss_func(x, y, W):
    delta = 1.0
    scores = W.dot(np.transpose(x))
    margins = np.maximum(0, scores - scores[y] + delta)
    margins[y] = 0
    loss_i = np.sum(margins)
    return loss_i
Here y is a scalar, and X and W are 2-D NumPy arrays.
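One way to avoid the per-sample loop can be sketched as follows, under assumptions the question does not state: X holds one sample per row (shape (N, D)), W has one row per class (shape (C, D)), and y is an integer array with one label per sample. The names loss_single, batch_loss, N, D and C are illustrative only. The sketch scores every sample with a single matrix product and replaces scores[y] with integer-array indexing:

```python
import numpy as np

def loss_single(x, y, W, delta=1.0):
    # Per-sample loss from the question, for a 1-D sample x and an integer label y.
    scores = W.dot(x)
    margins = np.maximum(0, scores - scores[y] + delta)
    margins[y] = 0
    return np.sum(margins)

def batch_loss(X, y, W, delta=1.0):
    # Assumed shapes: X is (N, D), y is (N,) integer labels, W is (C, D).
    rows = np.arange(X.shape[0])
    scores = X.dot(W.T)                  # (N, C): one row of class scores per sample
    correct = scores[rows, y][:, None]   # (N, 1): score of each sample's true class
    margins = np.maximum(0, scores - correct + delta)
    margins[rows, y] = 0                 # the true class contributes no margin
    return margins.sum(axis=1)           # (N,): one loss per sample

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
W = rng.normal(size=(3, 4))
y = rng.integers(0, 3, size=5)

losses = batch_loss(X, y, W)
looped = np.array([loss_single(X[i], y[i], W) for i in range(5)])
```

If the assumed shapes match the real data, losses and looped should agree, and the batch version does its work inside NumPy rather than in a Python-level loop.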

Related

Vectorization guidelines for jax

Suppose I have a function (for simplicity, covariance between two series, though the question is more general):
def cov(x, y):
    return jnp.dot((x-jnp.mean(x)), (y-jnp.mean(y)))
Now I have a "dataframe" D (a 2-dimensional array, whose columns are my series) and I want to vectorize cov in such a way that the application of the vectorized function produces the covariance matrix. Now, there is an obvious way of doing it:
cov1 = jax.vmap(cov, in_axes=(None, 1))
cov2 = jax.vmap(cov1, in_axes=(1, None))
but that seems a little clunky. Is there a "canonical" way of doing this?
If you want to express logic equivalent to nested for loops with vmap, then yes it requires nested vmaps. I think what you've written is probably as canonical as you can get for an operation like this, although it might be slightly more clear if written using decorators:
from functools import partial
import jax
import jax.numpy as jnp

@partial(jax.vmap, in_axes=(1, None))
@partial(jax.vmap, in_axes=(None, 1))
def cov(x, y):
    return jnp.dot((x-jnp.mean(x)), (y-jnp.mean(y)))
For this particular function, though, note that you can express the same thing using a single dot product if you wish:
result = jnp.dot((x - x.mean(0)).T, (y - y.mean(0)))
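The equivalence between the pairwise formulation and the single dot product can be checked numerically. The sketch below uses plain NumPy as a stand-in for jax.numpy (an assumption made only to keep the example dependency-light), and the data D is made up for illustration:

```python
import numpy as np

def cov(x, y):
    # Same formula as in the question, written with NumPy instead of jax.numpy.
    return np.dot(x - np.mean(x), y - np.mean(y))

rng = np.random.default_rng(0)
D = rng.normal(size=(10, 3))   # hypothetical data: 10 observations of 3 series

# Explicit nested loops over column pairs (what the nested vmaps express).
n_series = D.shape[1]
pairwise = np.array([[cov(D[:, i], D[:, j]) for j in range(n_series)]
                     for i in range(n_series)])

# Single matrix product on the column-centered data.
centered = D - D.mean(0)
single = np.dot(centered.T, centered)
```

Both pairwise and single are 3 x 3 matrices of unnormalized covariances, and they should match up to floating-point error.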

Creating a function in Python which runs over a range and returns a new value to an array each time

Basically, what I'm trying to create is a function which takes an array, in this case:
numpy.linspace(0, 0.2, 100)
and runs a lot of other code for each element of the array, and at the end creates a new array with one number per element, the result of that element's calculation. A simple example would be a function that does a multiplication like this:
def func(x):
    y = x * 10
    return y
However, I want it to be able to take an array as an argument and return an array consisting of each y for each multiplication. The function above works for this, but the one I've tried creating for my code doesn't work with this method and only returns one value instead. Is there another way to make the function work as intended? Thanks for the help!
You could use this simple code:
def func(x):
    y = []
    for i in x:
        y.append(i*10)
    return y
Maybe take a look at np.vectorize:
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.vectorize.html
np.vectorize can for example be used as a decorator:
@np.vectorize
def func(value):
    ...
    return return_value
The function to be vectorized (here func) has to be a function,
that takes a value as input and returns a value.
This function then gets vectorized over the whole array.
It is mentioned in the documentation, but it can't hurt to emphasize it here:
In general this function is only used for convenience, not for performance;
it is basically equivalent to using a for loop.
If you are able to build up your function from NumPy's ufuncs and array functions (np.add, np.mean, etc.), this will likely be much faster.
Or you could write your own:
https://docs.scipy.org/doc/numpy-1.13.0/reference/ufuncs.html
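To illustrate the difference between the two routes, here is a minimal sketch. The function scalar_func and its threshold are hypothetical, chosen only because a Python `if` is a common reason a function works on scalars but not on whole arrays:

```python
import numpy as np

def scalar_func(value):
    # Hypothetical scalar-only function: the `if` fails when given a whole array.
    if value > 0.1:
        return value * 10
    return value

x = np.linspace(0, 0.2, 100)

# Convenience route: np.vectorize calls scalar_func once per element.
looped = np.vectorize(scalar_func)(x)

# Array route: the same branch expressed with np.where on the whole array.
direct = np.where(x > 0.1, x * 10, x)
```

Both results hold the same 100 values; the second version avoids calling a Python function for every element.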
You can do this with numpy already with your function. For example, the code below will do what you want:
x = numpy.linspace(0, 0.2, 100)
y = x*10
If you defined x as above and passed it to your function it would perform exactly as you want.

iterate over two numpy arrays return 1d array

I often have a function that returns a single value, such as a maximum or an integral, and I then want to evaluate it over a range of another parameter. Here is a trivial example using a parabola. I don't think it's broadcasting, since I only want a 1D array, in this case of the maximums. A real-world example is the maximum power point of a solar cell as a function of light intensity, but the principle is the same as in this example.
import numpy as np
x = np.linspace(-1,1) # sometimes this is read from file
parameters = np.array([1,12,3,5,6])
maximums = np.zeros_like(parameters)
for idx, parameter in enumerate(parameters):
    y = -x**2 + parameter
    maximums[idx] = np.max(y) # after I have the maximum I don't need the rest of the data.
print(maximums)
What is the best way to do this in Python/Numpy? I know one simplification is to make the function a def and then use np.vectorize but my understanding is it doesn't make the code any faster.
Extend one of those arrays to 2D and then let broadcasting do those outer additions in a vectorized way -
maximums = (-x**2 + parameters[:,None]).max(1).astype(parameters.dtype)
Alternatively, with the explicit use of the outer addition method -
np.add.outer(parameters, -x**2).max(1).astype(parameters.dtype)
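As a quick check, the broadcasting expression can be compared against the loop from the question. This is a sketch reusing the question's own arrays; the names looped, grid and broadcast are illustrative:

```python
import numpy as np

x = np.linspace(-1, 1)
parameters = np.array([1, 12, 3, 5, 6])

# Loop version from the question.
looped = np.zeros_like(parameters)
for idx, parameter in enumerate(parameters):
    looped[idx] = np.max(-x**2 + parameter)

# Broadcasting: (5, 1) + (50,) -> (5, 50), then reduce along axis 1.
grid = -x**2 + parameters[:, None]
broadcast = grid.max(1).astype(parameters.dtype)
```

Because parameters is an integer array, both versions truncate the maxima to integers; dropping the astype call (and allocating a float array in the loop version) would keep the floating-point values.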

How can you index python functions for use in a for loop?

I'd like to figure out how to code the following pseudo-code:
# Base-case
u_0(x) = x^3
for i in [0,5):
    u_(i+1)(x) = u_(i)(x)^2
So that in the end I can call u_5(x), for example.
The difficulty I'm having with accomplishing the above is finding a way to index Python functions by i so that I can iteratively define each function.
I tried using recursion with two functions in place of indexing but I get "maximum recursion depth exceeded".
Here is a minimal working example:
import math
import sympy as sym
a,b = sym.symbols('x y')
def f1(x,y):
    return sym.sin(x) + sym.cos(y)*sym.tan(x*y)
for i in range(0,5):
    def f2(x,y):
        return sym.diff(f1(x,y),x) + sym.cos(sym.diff(f1(x,y),y,y))
    def f1(x,y):
        return f2(x,y)
print(f2(a,b))
Yes, the general idea would be to "index" the results in order to avoid recalculating them. The simplest way to achieve that is to "memoize", meaning telling a function to remember the result for values it has already calculated.
If f(i+1) is based on f(i) where i is a natural number, that can be especially effective.
In Python 3, doing it for a one-variable function is surprisingly simple, with a decorator:
import functools
@functools.lru_cache(maxsize=None)
def f(x):
    .....
    return ....
To learn more about this, see "What is memoization and how can I use it in Python?". (If you are using Python 2.7, there is also a way to do it with a prepackaged decorator.)
Your specific case (if my understanding of your pseudo-code is correct) relies on a function of two variables, where i is an integer variable and x is a symbol (i.e. not supposed to be resolved here). So you would need to memoize along i.
To avoid overflowing the stack when you ask directly for the image of 5 (I am not sure why it happens, but there is no doubt more recursion than meets the eye), use a for loop to calculate the images over the range from 0 to 5, in that order: 0, 1, 2, and so on.
I hope this helps.
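A minimal sketch of this idea follows. It assumes the recurrence from the question's pseudo-code and uses a plain number in place of a symbol, so the function name u and the argument value 2 are illustrative only:

```python
import functools

@functools.lru_cache(maxsize=None)
def u(i, x):
    """Return u_i(x), where u_0(x) = x**3 and u_(i+1)(x) = u_i(x)**2."""
    if i == 0:
        return x ** 3
    return u(i - 1, x) ** 2

# Fill the cache in increasing order of i, so each call recurses only one level.
values = [u(i, 2) for i in range(6)]
```

Each u(i, 2) is computed once; later calls reuse the cached result of u(i - 1, 2) instead of recursing all the way down to the base case.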
The answer is actually pretty simple:
Pseudocode:
u_0(x) = x^3
for i in [0,5):
    u_(i+1)(x) = u_(i)(x)^2
Actual code:
import sympy as sym
u = [None]*6  # Creates a list of 6 entries, i.e., u[0], u[1], ..., u[5]
x = sym.symbols('x')
u[0] = lambda x: x**3
for i in range(0,5):
    # The i=i in the argument of the lambda function is necessary in Python;
    # for more about this, see this question.
    u[i+1] = lambda x, i=i: (u[i](x))**2
# Now the functions are stored in the list u. However, to call them (e.g., evaluate them,
# plot them, print them, etc.) requires that we "lambdify" them, i.e., replace sympy
# functions with numpy functions, which the following for loop accomplishes:
ulambdified = [None]*6  # must exist before its entries are assigned
for i in range(0,6):
    ulambdified[i] = sym.lambdify(x, u[i](x), "numpy")
for i in range(0,6):
    print(ulambdified[i](x))
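The reason for the i=i default argument can be shown with a small, self-contained sketch (the lists late and bound are illustrative names, unrelated to the sympy code):

```python
# Without a default argument, every lambda looks up i when it is called,
# so all of them see the final value of the loop variable.
late = [lambda x: x + i for i in range(3)]
late_results = [f(10) for f in late]

# With i=i, the current value is stored as a default when the lambda is created.
bound = [lambda x, i=i: x + i for i in range(3)]
bound_results = [f(10) for f in bound]
```

Here late_results is [12, 12, 12], while bound_results is [10, 11, 12]. In the recursive definition of u, the late-binding form would make each function refer to the last index instead of its own predecessor.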

Vectorized assignment of a 2-dimensional array

I work with Python 2.7, numpy and pandas.
I have :
a function y=f(x) where both x and y are scalars.
a one-dimensional array of scalars of length n : [x0, x1, ..., x(n-1)]
I need to construct a 2-dimensional array D[i,j]=f(xi)*f(xj) where i,j are indices in [0,...,n-1].
I could use loops and/or a list comprehension, but that would be slow. I would like to use a vectorized approach instead.
I thought that "numpy.indices" would help me (see Create a numpy matrix with elements a function of indices), but I admit I am at a loss on how to use that command for my purpose.
Thanks in advance!
Ignore the comments that dismiss vectorization; it's a good habit to have, and it does deliver performance with the right accelerators. Anyway, what I really wanted to say was that you want to find the outer product:
x_ = numpy.array(x)
y = f(x_)
numpy.outer(y, y)
If you're working with numbers you should be working with numpy data structures anyway. Then you get fast, readable code like this.
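A small sketch of the outer-product approach, checked against explicit loops. The function f below is hypothetical and is assumed to be built from NumPy operations, so that it accepts an array as well as a scalar:

```python
import numpy as np

def f(x):
    # Hypothetical function built from NumPy operations, so it also accepts arrays.
    return np.sqrt(x) + 1.0

x = np.array([0.0, 1.0, 4.0, 9.0])
y = f(x)              # f applied to every element at once
D = np.outer(y, y)    # D[i, j] = f(x[i]) * f(x[j])

# Reference result from explicit loops.
expected = np.array([[f(xi) * f(xj) for xj in x] for xi in x])
```

If f only works on scalars, y would first have to be built element by element, but the outer product itself stays the same.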
I would like to use a vectorized approach instead.
You sound like you might be a Matlab user -- you should be aware that numpy's vectorize function provides no performance benefit:
The vectorize function is provided primarily for convenience, not for
performance. The implementation is essentially a for loop.
Unless it just so happens that there's already an operation in numpy that does exactly what you want, you're going to be stuck with numpy.vectorize and nothing to really gain over a for loop. With that being said, you should be able to do that like so:
import numpy

def makeArray():
    a = [1, 2, 3, 4]
    def addTo(arr):
        # Integer division gives the row index, the remainder gives the column index.
        return f(a[arr // 4]) * f(a[arr % 4])
    vecAdd = numpy.vectorize(addTo)
    return vecAdd(numpy.arange(4 * 4).reshape(4, 4))
EDIT:
If f is actually a one-dimensional array, you can do this:
f_matrix = numpy.matrix(f)
D = f_matrix.T * f_matrix
You can use frompyfunc to vectorize the function, then use the dot product to multiply:
import numpy as np
f2 = np.frompyfunc(f, 1, 1) # vectorize the function (the result has dtype object)
res1 = f2(x) # get the results for f(x)
res1 = res1[np.newaxis] # result has to be 2D for the next step
res2 = np.dot(res1.T, res1) # get f(xi)*f(xj)
