Strange numpy divide behaviour for scalars - python

I have been trying to upgrade a library of geometric operations written for scalars so that it also works with numpy arrays. While doing this I noticed some strange behaviour with numpy divide.
The original code computes a normalised difference between two variables when neither variable is zero; translated to numpy, this ended up looking something like:
import numpy as np
a = np.array([0, 1, 2, 3, 4])
b = np.array([1, 2, 3, 0, 4])
o = np.zeros(len(a))
o = np.divide(np.subtract(a, b), b, out=o, where=np.logical_and(a != 0, b != 0))
print(f'First implementation: {o}')
where I passed in a output buffer initialised to zero for instances which could not be calculated; this returns:
First implementation: [ 0. -0.5 -0.33333333 0. 0. ]
I had to modify this slightly for scalars, since out requires an array, but it seemed fine.
a = 0
b = 4
o = None if np.isscalar(a) else np.zeros(len(a))
o = np.divide(np.subtract(a, b), b, out=o, where=np.logical_and(b != 0, a != 0))
print(f'Modified out for scalar: {o}')
returns
Modified out for scalar: 0.0
Then I ran this through some test functions and found that a lot of them failed. Digging into this, I found that the first time I call divide on a scalar with where set to False, the function returns zero, but if I call it again, the second time it returns something unpredictable.
a = 0
b = 4
print(f'First divide: {np.divide(b, a, where=False)}')
print(f'Second divide: {np.divide(b, a, where=False)}')
returns
First divide: 0.0
Second divide: 4.0
Looking at the documentation, it says "locations within it where the condition is False will remain uninitialized", so I guess numpy has some internal buffer which is initially zero and subsequently ends up carrying over an earlier intermediate value.
I am struggling to see how I can use divide with or without a where clause: if I use where I get unpredictable output, and if I don't I can't protect against divide by zero. Am I missing something, or do I just need a different code path in these cases? I realise I'm halfway to a different code path already with the out variable.
I would be grateful for any advice.

It looks like a bug to me. But I think you'd want to short-circuit the calls to ufuncs in the case of scalars for performance reasons anyway, so it's a question of keeping it from being too messy. Since either a or b could be a scalar, you need to check both. Put that check into a function that conditionally returns an output array or None, and you could do:
def scalar_test_np_zeros(a, b):
    """Return np.zeros for the length of the arguments, or None if
    both arguments are scalar."""
    if (a_is := np.isscalar(a)) and np.isscalar(b):
        return None
    return np.zeros(len(b) if a_is else len(a))

a = 0
b = 4
if (o := scalar_test_np_zeros(a, b)) is None:
    o = (a - b) / b if a and b else 0.0
else:
    np.divide(np.subtract(a, b), b, out=o,
              where=np.logical_and(b != 0, a != 0))
The scalar test would be useful in other code with similar problems.

For what it's worth, in case it helps anyone, I have come to the conclusion that I need to wrap np.divide to use it safely in functions which can take both arrays and scalars. This is my wrapping function:
import numpy as np
def divide_where(a, b, where, out=None, fill=0):
    """Wrap numpy divide to safely handle the where clause for both arrays and scalars.

    - a: dividend array or scalar
    - b: divisor array or scalar
    - where: locations where a/b will be computed
    - out: location the result is written to; if None, an output array is created using the fill value
    - fill: fill value; for scalar input it is returned when where is False, otherwise it initialises the output array when out is not given
    """
    if (a_is_scalar := np.isscalar(a)) and np.isscalar(b):
        return fill if not where else a / b
    if out is None:
        out = np.full_like(b if a_is_scalar else a, fill)
    return np.divide(a, b, out=out, where=where)
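As a quick sanity check, the wrapper behaves consistently for both kinds of input. A sketch, restating the wrapper so the snippet runs standalone (float arrays are used so the fill-initialised output buffer has a float dtype):

```python
import numpy as np

def divide_where(a, b, where, out=None, fill=0):
    """Wrapper from the answer above, repeated so this snippet runs standalone."""
    if (a_is_scalar := np.isscalar(a)) and np.isscalar(b):
        return fill if not where else a / b
    if out is None:
        out = np.full_like(b if a_is_scalar else a, fill)
    return np.divide(a, b, out=out, where=where)

# scalar inputs: the ufunc (and its uninitialized-buffer issue) is bypassed
print(divide_where(0, 4, where=False))  # -> 0 (the fill value)
print(divide_where(8, 4, where=True))   # -> 2.0

# array inputs: masked-out positions hold the fill value, not garbage
a = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
b = np.array([1.0, 2.0, 3.0, 0.0, 4.0])
print(divide_where(np.subtract(a, b), b, where=(a != 0) & (b != 0)))
```

The array case reproduces the question's first implementation: positions where either input is zero stay at the fill value.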

Related

Use a for loop to check each element in a matrix in python

A = np.arange(2,42).reshape(5,8)
B = np.arange(4,68).reshape(8,8)
C = np.dot(A,B)
How do I use a for loop to check whether each element in C is larger than 100, with the output being True or False?
I have no idea how, because it is a matrix, not a number.
Can someone help me please?
Do you want to return True if EVERY element of C is >100? Or do you want to create a matrix whose entries are True or False depending on whether the entries of C are >100 or <100?
In both cases I would recommend not using for-loops. For case 1, you can try:
print(min(C.flatten()) < 100)
which will print False if all elements of C are bigger than 100, and True otherwise. (Note that .flatten just temporarily rewrites the 2D array as a 1D one; the shape of C stays in its original state.)
For case 2, you can just type
print(C < 100)
and it will print a matrix with entries True or False, based on whether C is > or < 100 at that entry.
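For reference, both cases can also be written with numpy's built-in reductions and comparisons. A sketch, recomputing C so it runs on its own:

```python
import numpy as np

A = np.arange(2, 42).reshape(5, 8)
B = np.arange(4, 68).reshape(8, 8)
C = np.dot(A, B)

# case 1: a single boolean for the whole matrix
print((C > 100).all())

# case 2: an elementwise boolean matrix
print(C > 100)
```

Note the sign is flipped relative to min(C.flatten()) < 100 above: (C > 100).all() is True exactly when every entry exceeds 100.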
if you want to use for-loops: First note that the shape of C is (5,8), meaning that C is a 2D object. Now, in order to access all entries of C via for-loops, you can write something like this:
import numpy as np

A = np.arange(2, 42).reshape(5, 8)
B = np.arange(4, 68).reshape(8, 8)
C = np.dot(A, B)
D = np.zeros(C.shape, dtype=bool)
for i in range(C.shape[0]):      # meaning in range 5
    for j in range(C.shape[1]):  # meaning in range 8
        if C[i, j] > 100:
            D[i, j] = False
        else:
            D[i, j] = True
print(D)
where I introduced a new matrix D in the same shape as C, which we consecutively fill with True or False based on whether C is > or < 100 at that entry. This code is equivalent to, but slower and more complicated than, the one I proposed above.
I hope this answers your question sufficiently. If you have any more questions on details etc., don't hesitate to ask ;).
You should use numpy's filtering (boolean masking) instead. The loop-based approaches described above are slow; numpy's filtering methods are far better optimised:
import numpy as np
filter_arr = C > 100
newarr = C[filter_arr]
print(newarr)

Is there a way to get every element of a list without using loops?

I found this task in a book of my prof:
def f(x):
    return log(1 + exp(x))

def problem(M: List):
    return np.array([f(x) for x in M])
How do I implement a solution?
Numpy is all about performing operations on entire arrays. Your professor is expecting you to use that functionality.
Start by converting your list M into array z:
z = np.array(M)
Now you can do elementwise operations like exp and log:
e = np.exp(z)
f = 1 + e
g = np.log(f)
The functions np.exp and np.log are applied to each element of an array. If the input is not an array, it will be converted into one.
Operations like 1 + e work on an entire array as well, in this case using the magic of broadcasting. Since 1 is a scalar, it can unambiguously be expanded to the same shape as e, and added as if by np.add.
Normally, the sequence of operations can be compactified into a single line, similarly to what you did in your initial attempt. You can reduce the number of operations slightly by using np.log1p:
def f(x):
    return np.log1p(np.exp(x))
Notice that I did not convert x to an array first since np.exp will do that for you.
A fundamental problem with this naive approach is that np.exp will overflow for values that we would expect to get reasonable results. This can be solved using the technique in this answer:
def f(x):
    return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0)
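A quick demonstration of why the rewrite matters: the naive form overflows for large positive inputs, while the stable form stays finite:

```python
import numpy as np

def f_naive(x):
    return np.log1p(np.exp(x))

def f_stable(x):
    return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0)

x = np.array([-1000.0, 0.0, 1000.0])
with np.errstate(over='ignore'):
    print(f_naive(x))   # exp(1000) overflows -> inf in the last slot
print(f_stable(x))      # stays finite: [0., log(2), 1000.]
```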

Evaluating a function using numpy

What is the significance of the return part when evaluating functions? Why is this necessary?
Your assumption is right: dfdx[0] is indeed the first value in that array, so according to your code that would correspond to evaluating the derivative at x=-1.0.
To know the correct index where x is equal to 0, you will have to look for it in the x array.
One way to find it is the following, where we use argmin to locate the index at which |x-0| is minimal (essentially where x=0, but float arithmetic requires taking some precautions):
index0 = np.argmin(np.abs(x-0))
And we then get what we want, dfdx at the index where x is 0:
print(dfdx[index0])
Another way, less robust with regard to float-arithmetic trickery, is the following:
# make a boolean array that is True where x is zero and False everywhere else
bool_array = (x == 0)
# Numpy allows using a boolean array to index another array.
# Doing so gets you all the values of dfdx where bool_array is True.
# In our case that will hopefully give us dfdx where x=0
print(dfdx[bool_array])
# same thing as a one-liner
print(dfdx[x == 0])
You give the answer yourself: x[0] is -1.0, and you want the value at the middle of the array. np.linspace is the right function to build such a series of values:
import numpy as np

def f1(x):
    return np.sin(np.pi * np.exp(-x))

n = 1001  # odd!
x = np.linspace(-1, 1, n)  # x[n//2] is 0
f1x = f1(x)
df1 = np.diff(f1x, 1)
dx = np.diff(x)
# analytic derivative; discard the last element to match np.diff's length
df1dx = (-np.pi * np.exp(-x) * np.cos(np.pi * np.exp(-x)))[:-1]
# In [3]: np.allclose(df1/dx, df1dx, atol=0.1)
# Out[3]: True
As another tip: numpy arrays are used more efficiently, and more readably, without loops.

How to code a function that accepts float, list or numpy.array?

I have the following simple Python function:
def get_lerp_factor( a, x, b ):
    if x <= a: return 0.
    if x >= b: return 1.
    return (x - a) / (b - a)
Many numpy functions, like numpy.sin(x) can handle a float or an array.
So how can I extend this in the same manner, so that it can also handle a numpy array for x?
def get_lerp_factor( a, x_maybe_array, b ):
    out = (x_maybe_array - a) / (b - a)  # this should work...
    # but now I have to clamp each element of out between 0 and 1
Would I have to specifically check the type of x and branch accordingly?
How about:
def get_lerp_factor( a, x_anything, b ):
    x = np.array( x_anything )
    out = ...(x)
    # now typecast out back into the same type as x... will this work?
?
You need numpy.asarray. This takes as its first argument:
Input data, in any form that can be converted to an array. This includes lists, lists of tuples, tuples, tuples of tuples, tuples of lists and ndarrays.
and it returns:
Array interpretation of a. No copy is performed if the input is already an ndarray.
So you can implement your function like this:
import numpy as np
def get_lerp_factor(a, x, b):
    a, x, b = np.asarray(a), np.asarray(x), np.asarray(b)
    return ((x - a) / (b - a)).clip(0, 1)
This works for scalars:
>>> get_lerp_factor(0, 9, 16)
0.5625
and also for iterables:
>>> get_lerp_factor(2, range(8), 6)
array([ 0. , 0. , 0. , 0.25, 0.5 , 0.75, 1. , 1. ])
Actually, as long as the numpy array has the semantics you want for the operators you're using (-, /, <=, >=), then it already works. This is called "duck typing", where you don't really care what type your arguments have, just that they behave in a specific way.
Of course, a list is not going to behave like that. And a numpy array might not behave entirely like that either (the <= operator works, but it results in an array of booleans rather than a single truth value that an if statement can act on). So, you will have to check the type at runtime. One way to do this is to check whether the __len__ method is supported:
try:
    if len(a) == len(b) == len(x):
        pass  # handle list / array case
except TypeError:
    pass  # oops, was a float
Note that it is generally a very bad idea to check the type explicitly (with isinstance(o, t)) since you want to maintain as much duck type semantics as possible. But sometimes you need to do just that.
Note also that this is only really acceptable when the "semantics" of the function remain the same regardless of the input type. If you are changing the meaning of the function based on input types, then you're going to have a bad time! Or worse: your users (the consumers of your function) are going to have a bad time.

python printing a generator list after vectorization?

I am new with vectorization and generators. So far I have created the following function:
import numpy as np
def ismember(a, b):
    for i in a:
        if len(np.where(b == i)[0]) == 0:
            lv_var = 0
        else:
            lv_var = int(np.where(b == i)[0][0])
        yield lv_var

vect = np.vectorize(ismember)
A = np.arange(700000)
B = np.arange(700000)
lv_result = vect(A, B)
When I try to cast lv_result as a list or loop through the resulting numpy array, I get a list of generator objects. I need to somehow get the actual results. Calling next() on a generator doesn't seem to do the job.
Could someone tell me what I am doing wrong, or how I could reconfigure the code to achieve the end goal?
---------------------------------------------------
OK so I understand the vectorize part now (thanks Viet Nguyen for the example).
I was also able to print the generator object results. The code has been modified. Please see below.
For the generator part:
What I am trying to do is mimic a MATLAB function called ismember (the one with the signature [Lia,Locb] = ismember(A,B)). I am only trying to get the Locb part.
From MATLAB: Locb contains the lowest index in B for each value in A that is a member of B. The output array, Locb, contains 0 wherever A is not a member of B.
One of the main problems is that I need to be able to perform this operation as efficient as possible. For testing I have two arrays of 700k elements. Creating a generator and going through the values of the generator doesn't seem to get any better performance wise.
To print the Generator, I have created function f() below.
import numpy as np
def ismember(a, b):
    for i in a:
        index = np.where(b == i)[0]
        if len(index) == 0:
            yield 0
        else:
            yield index[0]

def f(A, gen_obj):
    my_array = np.arange(len(A))
    for i in my_array:
        my_array[i] = next(gen_obj)
    return my_array

A = np.arange(700000)
B = np.arange(700000)
gen_obj = ismember(A, B)
f(A, gen_obj)
print('done')
Note: if we were to try the above code with smaller arrays:
Let's say:
A = np.array([3,4,4,3,6])
B = np.array([2,5,2,6,3])
The result will be an array of : [4 0 0 4 3]
Just like matlabs function: the goal is to get the lowest index in B for each value in A that is a member of B. The output array, Locb, contains 0 wherever A is not a member of B.
Numpy's intersection function doesn't help me achieve the goal. Also, the returned array needs to stay the same size as array A.
So far this process takes forever (for arrays of 700k elements). Unfortunately I haven't found the best solution yet. Any input on how I could reconfigure the code to achieve the end goal with the best performance would be much appreciated.
Optimization Problem solved in:
python-run-generator-using-multiple-cores-for-optimization
I believe you've misunderstood the inputs to a numpy.vectorize function. The "vectorized" function operates on the arrays on a per-element basis (see numpy.vectorize reference). Your function ismember seems to presume that the inputs a and b are arrays. Instead, think of the function as something you would use with built-in map().
>>> import numpy as np
>>> def mask(a, b):
...     return 1 if a == b else 0
>>> a = np.array([1, 2, 3, 4])
>>> b = np.array([1, 3, 4, 5])
>>> maskv = np.vectorize(mask)
>>> maskv(a, b)
array([1, 0, 0, 0])
Also, if I'm understanding your intention correctly, NumPy comes with an intersection function.
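For completeness, here is a loop-free sketch of the Locb lookup itself, using np.searchsorted on a sorted copy of B. This is an alternative approach, not part of the original answers; note that, as with the MATLAB convention quoted above, 0 doubles as both "not found" and a genuine match at index 0:

```python
import numpy as np

def ismember_locb(A, B):
    """Lowest 0-based index in B for each value of A; 0 where absent."""
    order = np.argsort(B, kind='stable')  # stable sort keeps the lowest index first among duplicates
    sorted_B = B[order]
    pos = np.searchsorted(sorted_B, A)    # leftmost insertion point for each value of A
    pos = np.clip(pos, 0, len(B) - 1)     # keep positions in range for values beyond max(B)
    found = sorted_B[pos] == A
    return np.where(found, order[pos], 0)

A = np.array([3, 4, 4, 3, 6])
B = np.array([2, 5, 2, 6, 3])
print(ismember_locb(A, B))  # -> [4 0 0 4 3]
```

Because sorting is O(n log n) and the lookup is fully vectorized, this scales to the 700k-element arrays mentioned above far better than a per-element np.where scan.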
