Making a vectorized numpy function behave like a ufunc - python

Let's suppose that we have a Python function that takes in Numpy arrays and returns another array:
import numpy as np

def f(x, y, method='p'):
    """Parameters: x (np.ndarray), y (np.ndarray), method (str)
    Returns: np.ndarray"""
    z = x.copy()
    if method == 'p':
        mask = x < 0
    else:
        mask = x > 0
    z[mask] = 0
    return z*y
although the actual implementation does not matter. We can assume that x and y will always be arrays of the same shape, and that the output is of the same shape as x.
The question is what would be the simplest/most elegant way of wrapping such function so it would work with both ND arrays (N>1) and scalar arguments, in a manner somewhat similar to universal functions in Numpy.
For instance, the expected output for the above function should be,
In [1]: f_ufunc(np.arange(-1,2), np.ones(3), method='p')
Out[1]: array([ 0.,  0.,  1.])  # array input -> output of the same shape

In [2]: f_ufunc(np.array([1]), np.array([1]), method='p')
Out[2]: array([1])              # array input of length 1 -> output of length 1

In [3]: f_ufunc(1, 1, method='p')
Out[3]: 1                       # scalar input -> scalar output
The function f cannot be changed, and it will fail if given a scalar argument for x or y.
When x and y are scalars, the wrapper should transform them to 1D arrays, do the calculation, and then transform the result back to a scalar at the end.
f is optimized to work with arrays; scalar input is mostly a convenience. So writing a function that works with scalars and then using np.vectorize or np.frompyfunc would not be acceptable.
A beginning of an implementation could be,
def atleast_1d_inverse(res, x):
    # this function fails in some cases (see point 1 below)
    if res.shape[0] == 1:
        return res[0]
    else:
        return res

def ufunc_wrapper(func, args=[]):
    """ func: the wrapped function
        args: arguments of func to which we apply np.atleast_1d """
    # this needs to be generated dynamically depending on the definition of func
    def wrapper(x, y, method='p'):
        # we apply np.atleast_1d to the variables given in args
        x = np.atleast_1d(x)
        y = np.atleast_1d(y)
        res = func(x, y, method=method)
        return atleast_1d_inverse(res, x)
    return wrapper

f_ufunc = ufunc_wrapper(f, args=['x', 'y'])
which mostly works, but will fail test 2 above, producing a scalar output instead of an array of length 1. If we want to fix that, we would need to add more checks on the type of the input (e.g. isinstance(x, np.ndarray), x.ndim > 0, etc.), but I'm afraid of forgetting some corner cases there. Furthermore, the above implementation is not generic enough to wrap a function with a different number of arguments (see point 2 below).
This seems to be a rather common problem when working with Cython / f2py functions, and I was wondering if there is a generic solution for this somewhere?
Edit: a bit more precision following @hpaulj's comments. Essentially, I'm looking for
1. a function that would be the inverse of np.atleast_1d, such that
   atleast_1d_inverse(np.atleast_1d(x), x) == x, where the second argument is only used to determine the type or the number of dimensions of the original object x. Returning numpy scalars (i.e. arrays with ndim = 0) instead of Python scalars is fine.
2. a way to inspect the function f and generate a wrapper that is consistent with its definition. For instance, such a wrapper could be used as
   f_ufunc = ufunc_wrapper(f, args=['x', 'y'])
   and then if we have a different function def f2(x, option=2): return x**2, we could also use
   f2_ufunc = ufunc_wrapper(f2, args=['x']).
Note: the analogy with ufuncs might be a bit limited, as this corresponds to the opposite problem: instead of having a scalar function that we transform to accept both vector and scalar input, I have a function designed to work with vectors (which can be seen as something that was previously vectorized) that I would like to accept scalars again, without changing the original function.
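For concreteness, here is a rough, untested sketch of the kind of wrapper I have in mind, using np.ndim for the atleast_1d inverse and inspect.signature to locate the wrapped arguments by name (the helper names are mine, not from any library):

import inspect
import functools
import numpy as np

def atleast_1d_inverse(res, orig):
    # undo np.atleast_1d: if the original input was a scalar (ndim == 0),
    # unwrap the length-1 result; otherwise return the array unchanged
    return res[0] if np.ndim(orig) == 0 else res

def ufunc_wrapper(func, args=()):
    """Wrap func so that the arguments named in args also accept scalars."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*call_args, **call_kwargs):
        bound = sig.bind(*call_args, **call_kwargs)
        originals = {name: bound.arguments[name] for name in args}
        for name in args:
            bound.arguments[name] = np.atleast_1d(bound.arguments[name])
        res = func(*bound.args, **bound.kwargs)
        # undo the promotion, using one of the original inputs as reference;
        # only unwrap if *all* wrapped inputs were scalars
        if originals and all(np.ndim(v) == 0 for v in originals.values()):
            return atleast_1d_inverse(res, next(iter(originals.values())))
        return res

    return wrapper

f_ufunc = ufunc_wrapper(f, args=['x', 'y'])
f2_ufunc = ufunc_wrapper(f2, args=['x'])

This passes the three tests above for f, and the same call works for f2, but I have not checked it against functions with more exotic signatures (e.g. wrapped arguments left at their default values).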

This doesn't fully answer the question of making a vectorized function truly behave like a ufunc, but I did recently run into a slight annoyance with numpy.vectorize that sounds similar to your issue. That wrapper insists on returning an array (with ndim=0 and shape=()) even if passed scalar inputs.
But it appears that the following does the right thing. In this case I am vectorizing a simple function to return a floating point value to a certain number of significant digits.
import numpy as np

def signif(x, digits):
    return round(x, digits - int(np.floor(np.log10(abs(x)))) - 1)

def vectorize(f):
    vf = np.vectorize(f)

    def newfunc(*args, **kwargs):
        return vf(*args, **kwargs)[()]
    return newfunc

vsignif = vectorize(signif)
This gives
>>> vsignif(0.123123, 2)
0.12
>>> vsignif([[0.123123, 123.2]], 2)
array([[ 0.12, 120. ]])
>>> vsignif([[0.123123, 123.2]], [2, 1])
array([[ 0.12, 100. ]])
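The trick here is the empty-tuple index [()]: on a 0-d array it extracts the contained scalar, while on an array with one or more dimensions it simply returns the array (a view), so only the scalar case is changed. For example:

>>> np.array(0.12)[()]            # 0-d array -> scalar
0.12
>>> np.array([0.12, 120.0])[()]   # 1-d array -> unchanged
array([  0.12, 120.  ])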

Related

Why is numpy.vectorize() changing the division output of a scalar function?

I'm obtaining a strange result when I vectorise a function with numpy.
import numpy as np

def scalar_function(x, y):
    """ A function that returns x*y if x<y and x/y otherwise
    """
    if x < y:
        out = x * y
    else:
        out = x/y
    return out

def vector_function(x, y):
    """
    Make it possible to accept vectors as input
    """
    v_scalar_function = np.vectorize(scalar_function)
    return v_scalar_function(x, y)
we do have
scalar_function(4,3)
# 1.3333333333333333
Why is the vectorized version giving this strange output?
vector_function(np.array([3,4]), np.array([4,3]))
[12 1]
While this call to the vectorized version works fine:
vector_function(np.array([4,4]), np.array([4,3]))
[1. 1.33333333]
Reading numpy.divide:
Notes
The floor division operator // was added in Python 2.2 making // and / equivalent operators. The default floor division operation of / can be replaced by true division with from __future__ import division.
In Python 3.0, // is the floor division operator and / the true division operator. The true_divide(x1, x2) function is equivalent to true division in Python.
This makes me think it might be a leftover issue related to Python 2?
But I'm using Python 3!
The docs for numpy.vectorize state:
The output type is determined by evaluating the first element of the
input, unless it is specified
Since you did not specify a return data type, and the first evaluated element is an integer multiplication, the output array is also of integer type, so the result of the later division is cast back down to an integer. Conversely, when the first operation is a division, the datatype is automatically upcast to float. You can fix your code by specifying a dtype in vector_function (which doesn't necessarily have to be as big as 64-bit for this problem):
def vector_function(x, y):
    """
    Make it possible to accept vectors as input
    """
    v_scalar_function = np.vectorize(scalar_function, otypes=[np.float64])
    return v_scalar_function(x, y)
Separately, you should also note from that same documentation that numpy.vectorize is a convenience function: it essentially wraps a Python for loop, so it is not vectorized in the sense of providing any real performance gain.
For a binary choice like this, a better overall approach would be:
def vectorized_scalar_function(arr_1, arr_2):
    return np.where(arr_1 < arr_2, arr_1 * arr_2, arr_1 / arr_2)

print(vectorized_scalar_function(np.array([4,4]), np.array([4,3])))
print(vectorized_scalar_function(np.array([3,4]), np.array([4,3])))
The above should be orders of magnitude faster, and it also avoids the type-casting issue: np.where computes both branches and promotes the result to their common dtype, and arr_1 / arr_2 is a float array under true division, so the output is float regardless of which element is evaluated first.
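If you want to check the speed difference on your own machine, a rough comparison along these lines should do (this reuses scalar_function and vectorized_scalar_function from above; the array size and repetition count are arbitrary, and exact timings will vary):

import timeit
import numpy as np

x = np.random.rand(100000)
y = np.random.rand(100000) + 0.1   # keep y away from zero

v_scalar = np.vectorize(scalar_function, otypes=[np.float64])

print(timeit.timeit(lambda: v_scalar(x, y), number=10))                     # np.vectorize loop
print(timeit.timeit(lambda: vectorized_scalar_function(x, y), number=10))   # np.where version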
Checking which statements are triggered:
import numpy as np

def scalar_function(x, y):
    """ A function that returns x*y if x<y and x/y otherwise
    """
    if x < y:
        print('if x: ', x)
        print('if y: ', y)
        out = x * y
        print('if out', out)
    else:
        print('else x: ', x)
        print('else y: ', y)
        out = x/y
        print('else out', out)
    return out

def vector_function(x, y):
    """
    Make it possible to accept vectors as input
    """
    v_scalar_function = np.vectorize(scalar_function)
    return v_scalar_function(x, y)

vector_function(np.array([3,4]), np.array([4,3]))
if x: 3
if y: 4
if out 12
if x: 3
if y: 4
if out 12
else x: 4
else y: 3
else out 1.3333333333333333 # <-- seems that the value is calculated correctly, but the wrong dtype is returned
So, alternatively, you can rewrite the scalar function so it always returns a float:
def scalar_function(x, y):
    """ A function that returns x*y if x<y and x/y otherwise
    """
    if x < y:
        out = x * y
    else:
        out = x/y
    return float(out)

vector_function(np.array([3,4]), np.array([4,3]))
# array([12.        ,  1.33333333])

passing in initial/boundary conditions for a function in scipy.optimize.root as the args argument

I am trying to solve a nonlinear system. Here is the code for a toy problem.
import collections.abc
import numpy as np
import scipy.optimize

def flat(x):
    ''' flattens a shallow list
    ex: [[1,2,3],[4,5],[6]] ----> flattens to [1,2,3,4,5,6]
    numpy flatten does not work on lists.
    '''
    if isinstance(x, collections.abc.Iterable):
        return [a for i in x for a in flat(i)]
    else:
        return [x]

def func(X):
    '''sets up the matrix dynamic equation and the set of constraints
    '''
    A = [[0,1,0,1],[2,1,0,4],[1,4,1,3],[3,2,1,0]]
    A1 = [[1,0,1,-1],[0,-1,2,1],[1,2,0,1],[1,2,0,-2]]
    x = X[:-1]
    alpha = X[-1]
    x0 = [1,2,3,4]
    y = x - x0
    # x[0] = 0.5
    # x[3] = 0.3
    dyneqn = np.dot(A, y) + alpha * np.dot(A1, x)
    cons = (1/2.0)*np.dot(x.T, np.dot(A1, x)) + np.dot([-1,1,2,-3], x) + 0.5
    return flat([dyneqn, cons])

sol = scipy.optimize.root(func, [1,-1,2,0,-1])
sol.x
Problem Statement
The argument X of the objective function func has five unknowns that we are solving for. I want to fix the first parameter, i.e., X[0] = 0.5, and the fourth parameter, i.e., X[3] = 0.3, and solve for the remaining three unknowns. Let us assume for simplicity that such a solution exists and my initial guess is somehow a good one.
Attempt:
I know I should probably pass these arguments to the args=() argument in scipy.optimize.root. I tried setting
args = (X[0]=0.5, X[3]=0.3)
init_guess = [0.5,-1,2,0.3,-1]
scipy.optimize.root(func,init_guess, args=args)
This is obviously wrong.
Question: How can I fix this?
Note: I added the flat function so that the code is self-contained. It has nothing to do with this question.
Typically, scipy functions like root, minimize, etc. follow the pattern
root(func, x0, args=(a, b, c, ...))
and require a func that accepts
def func(x0, a, b, c, ...):
    # do something with those arguments
    return value
x0 is the value that root varies; a, b, c are the args values, which are passed unchanged to your function. Depending on the problem, x0 may be an array. The nature of the args is entirely up to you.
From your example I reconstruct that you want to solve for the second and third components of some vector x as well as the parameter alpha. With the args keyword of scipy.optimize.root that would look something like
def func(x_solve, x0, x3):
    # x_solve.size should be 3
    x = np.empty(4)
    x[0], x[3] = x0, x3
    x[1:3] = x_solve[:2]
    alpha = x_solve[2]
    ...

scipy.optimize.root(func, [-1, 2, -1], args=(.5, .3))
As Azat and kazemakase pointed out, I'm also not sure if you actually want to use root, but the usage of scipy.optimize.minimize is pretty much the same.
Edit: It should be possible to have a flexible set of fixed variables by using a dictionary as an additional argument which specifies those:
def func(x_solve, fixed):
    x = x_solve[:-1]              # last value is alpha
    for idx in fixed.keys():      # overwrite fixed entries
        x[idx] = fixed[idx]
    alpha = x_solve[-1]
    # ... build and return the residuals as in the original func

# fixed variables, key is the index
fixed_vars = {0: .5, 3: .3}

# find roots
scipy.optimize.root(func,
                    [.5, -1, 2, .3, -1],
                    args=(fixed_vars,))
That way, when the optimizer in root numerically evaluates the Jacobian it obtains zero for the fixed variables and should therefore leave those invariant. However, that might lead to complications in the convergence of the algorithm.

Vectorized Partial Derivative of Multivariate Function in Python

There was a phenomenal answer posted by alko for computing a partial derivative of a multivariate function numerically in this thread.
I have a follow-up question now about enhancing this function to accept an array of input values. I have some code where I'm looping through a big long list of n-dimensional points, calculating the partial derivatives with respect to each variable, and this becomes quite computationally expensive.
It's easy enough to vectorize the function in question with np.vectorize, but it causes issues with the partial_derivative wrapper:
from scipy.misc import derivative
import numpy as np

def foo(x, y):
    return x**2 + y**3

def partial_derivative(func, var=0, point=[]):
    args = point[:]
    def wraps(x):
        args[var] = x
        return func(*args)
    return derivative(wraps, point[var], dx=1e-6)
vfoo = np.vectorize(foo)

>>> foo(3, 1)
10
>>> vfoo([3, 3], [1, 1])
array([10, 10])
>>> partial_derivative(foo, 0, [3, 1])
6.0
>>> partial_derivative(vfoo, 0, [[3, 3], [1, 1]])
TypeError: can only concatenate list (not "float") to list
The last line should ideally return [6.0, 6.0]. In this case the two arrays supplied to the vectorized function vfoo are essentially zipped up pairwise, so ([3,3], [1,1]) gets transformed into two points, [3,1] and [3,1]. This seems to get mangled when it gets passed to the function wraps. The point that it ends up passing to the function derivative is [3,3]. In addition, there's obviously the TypeError thrown up.
Does anyone have any recommendations or suggestions? Has anyone ever had a need to do something similar?
Edit
Sometimes I think posting on SO is just what it takes to break a mental block. I think I've got it working for anyone who might be interested:
vfoo = np.vectorize(foo)

foo(3, 1)

X = np.array([3, 3])
Y = np.array([1, 1])
vfoo(X, Y)

partial_derivative(foo, 0, [3, 1])
partial_derivative(vfoo, 0, [X, Y])
And the last line now returns array([ 6.,  6.]). The difference is that X and Y are numpy arrays rather than lists: inside scipy's derivative, point[var] + dx gets evaluated, which works elementwise for an ndarray but raises the TypeError above for a plain Python list.
I have a small problem with args[var] = x: this might permanently change args[var] (and therefore point[var]), because if point is a numpy array, args = point[:] is a view rather than a copy; everything is passed by reference, however small your change is. So you might not get the exact answer you are looking for. Here is an example:
In [67]: a = np.arange(9).reshape(3,3)
In [68]: b = a[:]
In [69]: b[0,0] = 42
In [70]: a
Out[70]:
array([[42,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8]])
you need to fix it by e.g.:
def wraps(x):
    tmp = args[var]
    args[var] = x
    ret = func(*args)
    args[var] = tmp
    return ret
Also, you can use numdifftools. They seem to know what they are doing. This will do all the partial derivatives:
import numpy as np
import numdifftools as nd

def partial_function(f___, input, pos, value):
    tmp = input[pos]
    input[pos] = value
    ret = f___(*input)
    input[pos] = tmp
    return ret

def partial_derivative(f, input):
    ret = np.empty(len(input))
    for i in range(len(input)):
        fg = lambda x: partial_function(f, input, i, x)
        ret[i] = nd.Derivative(fg)(input[i])
    return ret

if __name__ == "__main__":
    f = lambda x, y: x*x*x + y*y
    input = np.array([1.0, 1.0])
    print('partial_derivative of f() at: ' + str(input))
    print(partial_derivative(f, input))
Finally: if you want your function to take an array of the parameters, e.g.:
f = lambda x: x[0]*x[0]*x[0]+x[1]*x[1]
then replace the respective line with (removed the '*')
ret = f___(input)

How to index 0-d array in Python?

This may be a well-known question answered in some FAQ, but I can't google the solution. I'm trying to write a scalar function of a scalar argument that also allows for an ndarray argument. The function should check its argument for domain correctness, because a domain violation may cause an exception. This example demonstrates what I tried to do:
import numpy as np

def f(x):
    x = np.asarray(x)
    y = np.zeros_like(x)
    y[x > 0.0] = 1.0/x
    return y

print(f(1.0))
On the assignment y[x > 0.0] = ..., Python says that 0-d arrays can't be indexed.
What's the right way to handle this?
This will work fine in NumPy >= 1.9 (not released as of writing this). On previous versions you can work around it with an extra np.asarray call:
x[np.asarray(x > 0)] = 0
Could you call f([1.0]) instead?
Otherwise you can do:
x = np.asarray(x)
if x.ndim == 0:
    x = x[..., None]
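For what it's worth, np.atleast_1d does the same asarray-plus-promote step in a single call: it converts its argument to an array and makes sure it has at least one dimension.

x = np.atleast_1d(x)   # equivalent to the three lines above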

How to write a function to process arrays as input element by element and returns array

I am trying to write a function that returns one element when the input is a single element, and an array of outputs when the input is an array, such that each element of the output array corresponds to the same position in the input array. I am giving a dummy example:
import numpy as np

def f(a):
    if a < 5:
        print(a)

f(np.arange(11))
This code returns the error:
if a<5:
ValueError: The truth value of an array with more than one element is
ambiguous. Use a.any() or a.all()
I expect the output to be:
0
1
2
3
4
How can I make it work the way I explained? I believe many Python functions work in this way.
Thanks.
When I have had to deal with this kind of thing, I usually start by calling np.asarray on the input, setting a flag if it is 0-dimensional (i.e. a scalar), promoting it to 1-D, running the function on the array, and converting the result back to a scalar before returning if the flag was set. With your example:
def f(a):
    a = np.asarray(a)
    is_scalar = False if a.ndim > 0 else True
    a.shape = (1,)*(1 - a.ndim) + a.shape
    less_than_5 = a[a < 5]
    return (less_than_5 if not is_scalar else
            (less_than_5[0] if less_than_5 else None))
>>> f(4)
4
>>> f(5)
>>> f([3,4,5,6])
array([3, 4])
>>> f([5,6,7])
array([], dtype=int32)
If you do this very often, you could add all that handling in a function decorator.
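As a rough illustration (not tested beyond the example above, and the decorator name is arbitrary), such a decorator might look like:

import functools
import numpy as np

def scalar_aware(func):
    """Promote a scalar first argument to 1-D, and demote the result back."""
    @functools.wraps(func)
    def wrapper(a, *args, **kwargs):
        a = np.asarray(a)
        is_scalar = a.ndim == 0
        if is_scalar:
            a = a[np.newaxis]                 # promote to shape (1,)
        result = func(a, *args, **kwargs)
        if is_scalar:
            return result[0] if len(result) else None
        return result
    return wrapper

@scalar_aware
def f(a):
    return a[a < 5]

With that in place, f(4) returns 4, f(5) returns None, and f([3, 4, 5, 6]) still returns array([3, 4]).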
If you want the function to react depending upon whether the given input is a list or just an int, use:

def f(a):
    if type(a) == type([]):
        pass    # do stuff for a list
    elif type(a) == type(5):
        pass    # do stuff for an int
    else:
        print("Enter an int or list")

In the above, the function first checks whether the given input is a list; if that condition is true, the first block is used. The next branch checks whether the input is an int. Otherwise, the else block is executed.
import numpy as np

def f(a):
    result = a[a < 5]
    return result

def report(arr):
    for elt in arr:
        print(elt)

report(f(np.arange(11)))
Generally speaking, I dislike putting print statements in functions (unless the function does nothing but print). If you keep the I/O separate from the computation, then your functions will be more re-usable.
It is also generally a bad idea to write a function that returns different types of output, such as a scalar for some input and an array for other input. If you do that, then all subsequent code that uses this function will have to check whether the output is a scalar or an array. Or, the code will have to be written very carefully to control what kind of input is sent to the function. The code can become very complicated or very buggy if you do this.
Write simple functions -- ones that either always return an array, or always return a scalar.
You can use isinstance to check the type of an argument, and then have your function take the appropriate action:
In [15]: a = np.arange(11)
In [16]: isinstance(a, np.ndarray)
Out[16]: True
In [17]: b = 12.7
In [18]: isinstance(b, float)
Out[18]: True
In [19]: c = 3
In [20]: isinstance(c, int)
Out[20]: True
In [21]: d = '43.1'
In [23]: isinstance(d, str)
Out[23]: True
In [24]: float(d)
Out[24]: 43.1
In [25]: float('a3')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-25-caad719e0e75> in <module>()
----> 1 float('a3')
ValueError: could not convert string to float: a3
This way you can create a function that does the right thing whether it is given a str, a float, an int, a list or a numpy.ndarray.
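For instance, a small dispatching function along these lines (just one possible sketch, reusing the "elements less than 5" example from the question) could look like:

import numpy as np

def f(a):
    if isinstance(a, np.ndarray):
        return a[a < 5]                    # already an array
    if isinstance(a, (list, tuple)):
        arr = np.asarray(a)
        return arr[arr < 5]                # convert, then filter
    if isinstance(a, (int, float)):
        return a if a < 5 else None        # scalar in, scalar out
    if isinstance(a, str):
        return f(float(a))                 # may raise ValueError, as above
    raise TypeError("unsupported input type: %r" % type(a))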
