comparing numpy arrays element-wise setting an element-wise result - python

Possibly this has been asked before, but I'm having a hard time finding a matching solution because I don't know the right keywords to search for.
One huge advantage of numpy arrays that I like is that they are transparent for many operations.
However in my code I have a function that has a conditional statement in the form (minimal working example):
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([1, 1, 3])
def func(x, y):
    if x > y:
        z = 1
    else:
        z = 2
    return z
func(arr1, arr2) obviously results in an error message:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I do understand what the problem here is and why it can't work like this.
What I would like is for x > y to be evaluated for each element, and for an array z to be returned with the corresponding result of each comparison. (The arrays of course need to be of equal length, but that's not a problem here.)
I know that I could do this by changing func such that it iterates over the elements, but since I'm trying to improve my numpy skills: is there a way to do this without explicit iteration?

arr1 > arr2 does exactly what you'd hope it does: it compares each element of the two arrays and builds a boolean array with the results. That result can be used to index into either of the two arrays, should you need to. The equivalent of your function, however, can be done with np.where:
res = np.where(arr1 > arr2, 1, 2)
or equivalently (but slightly less efficiently), using the boolean array directly:
res = np.ones(arr1.shape, dtype=int) # dtype=int so the result holds integers, as with np.where above
res[arr1 <= arr2] = 2 # note that I have to invert the condition to select the cells where you want the 2
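As a quick sanity check, the sketch below compares np.where against an explicit element-by-element loop on the arrays from the question. The helper name func_loop is made up for this comparison and is not part of the original code.

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([1, 1, 3])

def func_loop(x, y):
    # Hypothetical reference version: explicit Python loop over paired elements.
    return np.array([1 if xi > yi else 2 for xi, yi in zip(x, y)])

# Vectorized version: 1 where arr1 > arr2, 2 everywhere else.
res = np.where(arr1 > arr2, 1, 2)
print(res)  # [2 1 2]
```

Both versions should agree; the np.where call simply avoids the Python-level loop.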

Related

2 different specified elements from 2 numpy arrays

I have two numpy arrays with 0s and 1s in them. How can I find the indexes with 1 in the first array and 0 in the second?
I tried np.logical_and
But I got an error message ('builtin_function_or_method' object is not subscriptable)
Use np.where(arr1==1) and np.where(arr2==0)
import numpy as np
array1 = np.array([0,0,0,1,1,0,1])
array2 = np.array([0,1,0,0,1,0,1])
ones = np.where(array1 == 1)
zeroes = np.where(array2 == 0)
print("array 1 has 1 at",ones)
print("array 2 has 0 at",zeroes)
returns:
array 1 has 1 at (array([3, 4, 6]),)
array 2 has 0 at (array([0, 2, 3, 5]),)
I'm not sure if there's some built-in numpy function that will do this for you, since it's a fairly specific problem. EDIT: there is, see the edit below.
Nonetheless, if one were to exist, it would have to be a linear time algorithm if you're passing in a bare numpy array, so writing your own isn't difficult.
If I have any numpy array (or Python list) myarray, and I want a collection of the indices where some object myobject appears, I can do this in one line using a list comprehension:
indices = [i for i in range(len(myarray)) if myarray[i] == myobject]
So what's going on here?
A list comprehension works in the following format:
[<output> for <input> in <iterable> if <condition>]
In our case, <input> and <output> are the indices of myarray, and the <condition> block checks if the value at the current index is equal to that of our desired value.
Edit: as White_Sirilo helpfully pointed out, numpy.where does the same thing, I stand corrected
Let's say your arrays are called j and k. The following code returns all indices where j[index] == 1 and k[index] == 0, provided both arrays are 1-dimensional. It also works if j and k are different sizes.
idx_1 = np.where(j == 1)[0]
idx_2 = np.where(k == 0)[0]
final_indices = np.intersect1d(idx_1, idx_2, return_indices=False)
If your array is 2-dimensional, you can use the above code in a function and then go row-by-row. There are almost definitely better ways to do this, but this works in a pinch.
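One possible alternative, assuming the two arrays have the same shape, is to combine the two conditions into a single boolean mask. This is only a sketch: the sample data is taken from the other answer, and the 2-D arrays j2 and k2 are made up for illustration.

```python
import numpy as np

# 1-D case: same sample data as in the answer above.
j = np.array([0, 0, 0, 1, 1, 0, 1])
k = np.array([0, 1, 0, 0, 1, 0, 1])

# True only where j is 1 AND k is 0 at the same position.
mask = (j == 1) & (k == 0)
indices_1d = np.flatnonzero(mask)
print(indices_1d)  # [3]

# 2-D case (hypothetical data): np.argwhere returns one (row, col) pair per match.
j2 = np.array([[1, 0], [1, 1]])
k2 = np.array([[0, 0], [1, 0]])
pairs = np.argwhere((j2 == 1) & (k2 == 0))
print(pairs)  # rows [0 0] and [1 1]
```

Unlike the np.intersect1d version, this raises an error (or broadcasts) when the shapes differ, so it only fits the equal-shape case.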
Two numpy arrays are given in the problem,
array1 and array2.
Just use
one_index=np.where(array1==1)
and
zero_index=np.where(array2==0)

Python: general sum over numpy rows

I want to sum all the rows of a matrix: if I have an n x 2 matrix, the result should be a 1 x 2 vector with all rows summed. I can do something like that with np.sum( arg, axis=1 ) but I get an error if I supply a vector as argument. Is there any more general sum function which doesn't throw an error when a vector is supplied? Note: This was never a problem in MATLAB.
Background: I wrote a function which calculates some stuff and sums over all rows of the matrix. Depending on the number of inputs, the matrix has a different number of rows and the number of rows is >= 1
According to numpy.sum documentation, you cannot specify axis=1 for vectors as you would get a numpy AxisError saying axis 1 is out of bounds for array of dimension 1.
A possible workaround could be, for example, writing a dedicated function that checks the size before performing the sum. Please find below a possible implementation:
import numpy as np
M = np.array([[1, 4],
              [2, 3]])
v = np.array([1, 4])
def sum_over_columns(input_arr):
    if len(input_arr.shape) > 1:
        return input_arr.sum(axis=1)
    return input_arr.sum()
print(sum_over_columns(M))
print(sum_over_columns(v))
In a more pythonic way (not necessarily more readable):
def oneliner_sum(input_arr):
    return input_arr.sum(axis=(1 if len(input_arr.shape) > 1 else None))
You can do
np.sum(np.atleast_2d(x), axis=1)
This will first convert vectors to singleton-dimensional 2D matrices if necessary.
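To illustrate what that does, here is a small sketch using the M and v arrays from the other answer. The variable names row_sums_M and row_sums_v are just for this example.

```python
import numpy as np

M = np.array([[1, 4],
              [2, 3]])
v = np.array([1, 4])

# A 2-D input passes through np.atleast_2d unchanged.
row_sums_M = np.sum(np.atleast_2d(M), axis=1)
# A 1-D input of shape (2,) becomes shape (1, 2), so axis=1 exists.
row_sums_v = np.sum(np.atleast_2d(v), axis=1)

print(row_sums_M)  # [5 5]
print(row_sums_v)  # [5]
```

Note that for a vector the result is a one-element array rather than a scalar, which differs from what sum_over_columns returns.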

If I have a function of two variables, how do I efficiently create a two dimensional array out of the functional values in python? [duplicate]

If I have an arbitrary function like func(x,y) = cos(x) + sin(y) + x*y, how can I apply it to all the pairs of elements in 2 arrays?
I found https://docs.scipy.org/doc/numpy/reference/generated/numpy.outer.html and
discovered that there are outer functions for all the basic operations. But what if I want to do it with a custom function?
Imagine array1 is [1,2], array2 is [3,4] and the function I wanted to apply is called f(float, float)
The expected output would be
[f(1,3) f(1,4)
f(2,3) f(2,4)]
As long as you make sure to write your function in such a way that it broadcasts properly, you can do
func(x_arr[:, None], y_arr)
to apply it to all pairs of elements in two 1-dimensional arrays x_arr and y_arr.
For example, to write your example function in a way that broadcasts, you'd write it as
def func(x, y):
    return np.cos(x) + np.sin(y) + x*y
since np.cos, np.sin, +, and * broadcast and vectorize across arrays.
As for if it doesn't broadcast? Well, some might suggest np.vectorize, but that has a lot of tricky things you have to keep in mind, like maintaining a consistent output dtype and not having side effects. If your function doesn't broadcast, I'd recommend just using list comprehensions:
np.array([[func(xval, yval) for yval in y_arr] for xval in x_arr])
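To make the output-dtype caveat concrete, here is a small sketch. The function piecewise is made up for illustration; the point is that, without otypes, np.vectorize infers the output dtype from the result of calling the function on the first element only.

```python
import numpy as np

def piecewise(x):
    # Hypothetical function: returns a Python int for x < 1, a float otherwise.
    return 0 if x < 1 else x * 0.5

values = np.array([0, 1, 2])

# Output dtype is inferred from piecewise(values[0]), which is an int,
# so the later float results are cast to integers.
naive = np.vectorize(piecewise)(values)

# Stating the output type explicitly avoids the silent cast.
fixed = np.vectorize(piecewise, otypes=[float])(values)

print(naive.dtype, fixed.dtype)
print(fixed)
```

The list comprehension above does not have this problem, because np.array inspects all the results before choosing a dtype.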
One way to solve this is broadcasting:
import numpy as np
def func(x, y):
    x, y = np.asarray(x)[:, None], np.asarray(y)
    return np.cos(x) + np.sin(y) + x*y
The [:, None] adds another dimension to the array and therefore triggers NumPy's broadcasting.
>>> func([1,2], [3,4])
array([[ 3.68142231, 3.78349981],
[ 5.72497317, 6.82705067]])
You could do this using a nested loop: take a single element of one array, iterate through all the elements of the other array, then proceed to the next element of the first array and repeat.
This is simple if you just want to print the results. If you want to store the results in another array, it still works; it just uses a lot of resources.

Evaluating a function using numpy

What is the significance of the return part when evaluating functions? Why is this necessary?
Your assumption is right: dfdx[0] is indeed the first value in that array, so according to your code that would correspond to evaluating the derivative at x=-1.0.
To know the correct index where x is equal to 0, you will have to look for it in the x array.
One way to find this is the following, where we use argmin to find the index at which |x-0| is minimal (essentially where x=0, but floating-point arithmetic requires taking some precautions):
index0 = np.argmin(np.abs(x-0))
And we then get what we want, dfdx at the index where x is 0:
print(dfdx[index0])
Another way, though less robust with respect to floating-point arithmetic, is the following:
# we make a boolean array that is True where x is zero and False everywhere else
bool_array = (x==0)
# Numpy allows using a boolean array as a way to index an array
# Doing so will get you all the values of dfdx where bool_array is True
# In our case that will hopefully give us dfdx where x=0
print(dfdx[bool_array])
# same thing as a one-liner
print(dfdx[x==0])
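A middle ground between the two approaches, sketched below, is np.isclose, which builds a boolean mask with a tolerance instead of testing exact equality. The x grid and the dfdx values here are placeholders chosen for illustration, not the arrays from the question.

```python
import numpy as np

# Placeholder data: a symmetric grid and a stand-in "derivative" array.
x = np.linspace(-1, 1, 1001)
dfdx = 2 * x + 3

# Index of the grid point closest to zero.
index0 = np.argmin(np.abs(x))
print(index0)  # 500

# Boolean mask that tolerates tiny floating-point deviations from zero.
near_zero = np.isclose(x, 0.0)
print(dfdx[near_zero])
```

With the default tolerances of np.isclose, only the middle grid point is selected here, since its neighbours are 0.002 away from zero.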
You have given the answer yourself: x[0] is -1.0, and you want the value at the middle of the array. np.linspace is a good function for building such a series of values:
import math
import numpy as np
def f1(x):
    g = np.sin(math.pi*np.exp(-x))
    return g
n = 1001 # odd !
x = np.linspace(-1,1,n) #x[n//2] is 0
f1x = f1(x)
df1 = np.diff(f1(x),1)
dx = np.diff(x)
# parenthesize the whole product before slicing, so that [:-1] discards the last element of the result
df1dx = (-math.pi*np.exp(-x)*np.cos(math.pi*np.exp(-x)))[:-1]
# In [3]: np.allclose(df1/dx,df1dx,atol=dx[0])
# Out[3]: True
As another tip, numpy arrays are used more efficiently and more readably without loops.

python printing a generator list after vectorization?

I am new with vectorization and generators. So far I have created the following function:
import numpy as np
def ismember(a,b):
    for i in a:
        if len(np.where(b==i)[0]) == 0:
            lv_var = 0
        else:
            lv_var = np.int(np.where(b==i)[0])
        yield lv_var
vect = np.vectorize(ismember)
A = np.array(xrange(700000))
B = np.array(xrange(700000))
lv_result = vect(A,B)
When I try to cast lv_result as a list or loop through the resulting numpy array I get a list of generator objects. I need to somehow get the actual result. How do I print the actual results from this function ? .next() on generator doesn't seem to do the job.
Could someone tell me what is that I am doing wrong or how could I reconfigure the code to achieve the end goal?
---------------------------------------------------
OK so I understand the vectorize part now (thanks Viet Nguyen for the example).
I was also able to print the generator object results. The code has been modified. Please see below.
For the generator part:
What I am trying to do is to mimic a MATLAB function called ismember (the one that is formatted as: [Lia,Locb] = ismember(A,B)). I am just trying to get the Locb part.
From Matlab: Locb, contain the lowest index in B for each value in A that is a member of B. The output array, Locb, contains 0 wherever A is not a member of B
One of the main problems is that I need to be able to perform this operation as efficiently as possible. For testing I have two arrays of 700k elements. Creating a generator and going through the values of the generator doesn't seem to improve performance.
To print the Generator, I have created function f() below.
import numpy as np
def ismember(a,b):
    for i in a:
        index = np.where(b==i)[0]
        if len(index) == 0:
            yield 0
        else:
            yield index
def f(A, gen_obj):
    my_array = np.arange(len(A))
    for i in my_array:
        my_array[i] = gen_obj.next()
    return my_array
A = np.arange(700000)
B = np.arange(700000)
gen_obj = ismember(A,B)
f(A, gen_obj)
print 'done'
Note: if we were to try the above code with smaller arrays, let's say:
A = np.array([3,4,4,3,6])
B = np.array([2,5,2,6,3])
The result will be the array [4 0 0 4 3].
Just like MATLAB's function, the goal is to get the lowest index in B for each value in A that is a member of B. The output array, Locb, contains 0 wherever A is not a member of B.
Numpy's intersection function doesn't help me to achieve the goal. Also the size of the returning array needs to be kept the same size as the size of array A.
So far this process is taking forever(for arrays of 700k elements). Unfortunately I haven't been able to find the best solution yet. Any inputs on how could I reconfigure the code to achieve the end goal, with the best performance, will be much appreciated.
Optimization Problem solved in:
python-run-generator-using-multiple-cores-for-optimization
I believe you've misunderstood the inputs to a numpy.vectorize function. The "vectorized" function operates on the arrays on a per-element basis (see numpy.vectorize reference). Your function ismember seems to presume that the inputs a and b are arrays. Instead, think of the function as something you would use with built-in map().
> import numpy as np
> def mask(a, b):
>     return 1 if a == b else 0
> a = np.array([1, 2, 3, 4])
> b = np.array([1, 3, 4, 5])
> maskv = np.vectorize(mask)
> maskv(a, b)
array([1, 0, 0, 0])
Also, if I'm understanding your intention correctly, NumPy comes with an intersection function.
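For the Locb part specifically, one possible loop-free approach is to combine np.unique with np.searchsorted. The sketch below is not MATLAB's implementation: the function name ismember_locb and the missing parameter are made up here, it assumes b is non-empty, and it uses -1 rather than 0 for "not found" because 0 is a valid index in Python.

```python
import numpy as np

def ismember_locb(a, b, missing=-1):
    """Lowest 0-based index in b of each element of a, or `missing` if absent.

    Hypothetical helper; assumes b is non-empty.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    # Sorted unique values of b and the index of each value's first occurrence.
    b_unique, first_idx = np.unique(b, return_index=True)
    # Position at which each element of a would be inserted into b_unique.
    pos = np.searchsorted(b_unique, a)
    # Elements larger than everything in b land past the end; clip for indexing.
    pos = np.minimum(pos, len(b_unique) - 1)
    # An element of a is a member of b only if the value at that position matches.
    found = b_unique[pos] == a
    return np.where(found, first_idx[pos], missing)

A = np.array([3, 4, 4, 3, 6])
B = np.array([2, 5, 2, 6, 3])
locb = ismember_locb(A, B)
print(locb)  # [ 4 -1 -1  4  3]
```

The cost is dominated by sorting b once and a binary search per element of a, instead of scanning all of b for every element of a, which is why it should scale much better to arrays with 700k elements.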
