Numpy apply along axis based on row index - python

I am trying to apply NumPy's built-in function apply_along_axis based on row index position.
import numpy as np
sa = np.array(np.arange(4))
sa_changed = (np.repeat(sa.reshape(1,len(sa)),repeats=2,axis=0))
print (sa_changed)
Output:
[[0 1 2 3]
 [0 1 2 3]]
The function:
np.apply_along_axis(lambda x: x+10,0,sa_changed)
Output:
array([[10, 11, 12, 13],
       [10, 11, 12, 13]])
But is there a way to use this function based on row index position? For example, if it's an even row index then add 10, and if it's an odd row index then add 50.
Sample:
def func(x):
    if x.index//2==0:
        x = x+10
    else:
        x = x+50
    return x

When iterating on an array, directly or with apply_along_axis, the subarray does not have a .index attribute. So you have to pass an explicit index value to your function:
In [248]: def func(i,x):
     ...:     if i//2==0:
     ...:         x = x+10
     ...:     else:
     ...:         x = x+50
     ...:     return x
     ...:
In [249]: arr = np.arange(10).reshape(5,2)
apply_along_axis doesn't have a way to pass this index, so instead we have to use an explicit iteration.
In [250]: np.array([func(i,v) for i,v in enumerate(arr)])
Out[250]:
array([[10, 11],
[12, 13],
[54, 55],
[56, 57],
[58, 59]])
Replacing // with % gives the even/odd test:
In [251]: def func(i,x):
     ...:     if i%2==0:
     ...:         x = x+10
     ...:     else:
     ...:         x = x+50
     ...:     return x
     ...:
In [252]: np.array([func(i,v) for i,v in enumerate(arr)])
Out[252]:
array([[10, 11],
[52, 53],
[14, 15],
[56, 57],
[18, 19]])
But a better way is to skip the iteration entirely:
Make an array of the row additions:
In [253]: np.where(np.arange(5)%2,50,10)
Out[253]: array([10, 50, 10, 50, 10])
Apply it via broadcasting:
In [256]: arr+np.where(np.arange(5)%2,50,10)[:,None]
Out[256]:
array([[10, 11],
[52, 53],
[14, 15],
[56, 57],
[18, 19]])
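The broadcasting approach can be wrapped in a small helper. This is only a minimal sketch: the name add_by_row_parity and its parameters are made up for illustration (they are not part of NumPy), and it assumes a 2D input.

```python
import numpy as np

def add_by_row_parity(arr, even_add=10, odd_add=50):
    """Add even_add to even-indexed rows and odd_add to odd-indexed rows.

    Hypothetical helper; assumes arr is a 2D array.
    """
    rows = np.arange(arr.shape[0])
    # One offset per row: rows % 2 is 1 (truthy) for odd rows, 0 for even rows.
    offsets = np.where(rows % 2, odd_add, even_add)
    # Turn the offsets into a column so they broadcast across every column of arr.
    return arr + offsets[:, None]

arr = np.arange(10).reshape(5, 2)
result = add_by_row_parity(arr)
print(result)
```

No Python-level loop over rows is involved, so the cost stays low even for arrays with many rows.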

Here's one way to do this, using slices with a step of 2:
import numpy as np
x = np.array([[0, 1, 2, 3],
              [0, 1, 2, 3]])
y = x.copy() # if you don't wish to modify x
For even row indices:
y[::2] = y[::2] + 10
And for odd row indices:
y[1::2] = y[1::2] + 50
Output:
array([[10, 11, 12, 13],
[50, 51, 52, 53]])

Related

Numpy: Apply function pairwise on two arrays of different length

I am trying to make use of numpy vectorized operations. But I struggle on the following task: The setting is two arrays of different length (X1, X2). I want to apply a method to each pair (e.g. X1[0] with X2[0], X2[1], etc). I wrote the following working code using loops, but I'd like to get rid of the loops.
result = []
for i in range(len(X1)):
    result.append([])
    for j in range(len(X2)):
        tmp = my_method(X1[i] - X2[j])
        result[i].append(tmp)
result = np.asarray(result)
You can reshape one of your vectors to be (N, 1) and then use vectorize which will broadcast the operation as normal:
import numpy as np
X1 = np.arange(5)
X2 = np.arange(3)
print(X1, X2)
# [0 1 2 3 4] [0 1 2]
def my_op(x, y):
    return x + y
np.vectorize(my_op)(X1[:, np.newaxis], X2)
# array([[0, 1, 2],
# [1, 2, 3],
# [2, 3, 4],
# [3, 4, 5],
# [4, 5, 6]])
Note that my_op is just defined as an example; if your function is actually anything included in numpy's vectorized operations, it'd be much faster to just use that directly, e.g.:
X1[:, np.newaxis] + X2
itertools.product might be what you're looking for:
from itertools import product
import numpy as np
x1 = np.array(...)
x2 = np.array(...)
result = np.array([my_method(x_1 - x_2) for x_1, x_2 in product(x1,x2)])
Alternatively you could also use a double list comprehension:
result = np.array([my_method(x_1 - x_2) for x_1 in x1 for x_2 in x2])
This obviously depends on what my_method is doing and operating on and what you have stored in x1 and x2.
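One detail worth noting: both the product version and the double list comprehension return a flat 1D result, whereas the loop in the question builds a 2D array. A minimal sketch of recovering the 2D layout with reshape follows; my_method here is a hypothetical stand-in (it squares its argument), since the real function is not shown in the question.

```python
from itertools import product
import numpy as np

def my_method(d):
    # Hypothetical stand-in for the asker's function: squares the difference.
    return d ** 2

x1 = np.arange(5)
x2 = np.arange(3)

# product() yields pairs in row-major order: (x1[0], x2[0]), (x1[0], x2[1]), ...
flat = np.array([my_method(a - b) for a, b in product(x1, x2)])

# Reshape to recover the (len(x1), len(x2)) layout of the nested-loop version.
result = flat.reshape(len(x1), len(x2))

# Nested-loop reference with the same structure as the code in the question.
loop_result = np.array([[my_method(a - b) for b in x2] for a in x1])
print(np.array_equal(result, loop_result))
```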
Assuming a simple function my_method(a, b), which adds the two numbers.
And this input:
X1 = np.arange(10)
X2 = np.arange(10,60,10)
Your code is:
result = []
for i in range(len(X1)):
    result.append([])
    for j in range(len(X2)):
        tmp = my_method(X1[i], X2[j])
        result[i].append(tmp)
result = np.asarray(result)
You can replace it with broadcasting:
X1[:,None]+X2
output:
array([[10, 20, 30, 40, 50],
[11, 21, 31, 41, 51],
[12, 22, 32, 42, 52],
[13, 23, 33, 43, 53],
[14, 24, 34, 44, 54],
[15, 25, 35, 45, 55],
[16, 26, 36, 46, 56],
[17, 27, 37, 47, 57],
[18, 28, 38, 48, 58],
[19, 29, 39, 49, 59]])
Now you need to see if your operation can be vectorized… please share details on what you want to achieve. Functions can be vectorized using numpy.vectorize, but this is not a magic tool as it will loop on the elements, which can be slow. The best is to have a true vector operation.
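To illustrate the difference between numpy.vectorize and a true vector operation, here is a small sketch. The function my_method and the input arrays are assumptions chosen for the example: it contains a branch, so it cannot be applied to a whole array as written.

```python
import numpy as np

X1 = np.arange(6)
X2 = np.arange(0, 9, 3)

def my_method(d):
    # Hypothetical scalar function: square negative values, keep the rest.
    return d * d if d < 0 else d

# All pairwise differences via broadcasting, shape (len(X1), len(X2)).
diff = X1[:, None] - X2

# np.vectorize is convenient, but still calls my_method once per element.
via_vectorize = np.vectorize(my_method)(diff)

# The same logic as a true vector operation: np.where picks element-wise.
via_where = np.where(diff < 0, diff * diff, diff)

print(np.array_equal(via_vectorize, via_where))
```

Both give the same array; the np.where form avoids the per-element Python call, which is what usually matters for large inputs.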

How to create an NumPy array based on the index stored in another array?

Let's say I have this NumPy array
A =
array([[0, 1, 3],
[1, 2, 4]])
I have another array
B =
array([[10, 41, 26, 50, 12, 24],
[20, 15, 42, 40, 41, 62]])
I wanted to create another array, where it selects the element in B using the index of the column in A. That is
C =
array([[10, 41, 50],
[15, 42, 41]])
Try:
B[[[0],[1]], A]
Or more generally:
B[np.arange(A.shape[0])[:,None], A]
Output:
array([[10, 41, 50],
[15, 42, 41]])
You can use np.take_along_axis
np.take_along_axis(B, A, axis=1)
output:
array([[10, 41, 50],
[15, 42, 41]])
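As a quick check, the two approaches above can be run side by side on the arrays from the question. This is a sketch; the variable names rows, C_fancy and C_take are only for illustration.

```python
import numpy as np

A = np.array([[0, 1, 3],
              [1, 2, 4]])
B = np.array([[10, 41, 26, 50, 12, 24],
              [20, 15, 42, 40, 41, 62]])

# Column of row numbers, shape (2, 1); it broadcasts against A, shape (2, 3),
# so element (i, j) of the result is B[i, A[i, j]].
rows = np.arange(A.shape[0])[:, None]
C_fancy = B[rows, A]

# Same selection, with the row bookkeeping handled by NumPy.
C_take = np.take_along_axis(B, A, axis=1)

print(np.array_equal(C_fancy, C_take))
```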
This can also be done with a plain loop rather than NumPy indexing; the result can be converted to a NumPy array at the end.
Code:
import numpy as np
#to make it simpler take a 1d list
a = [0,1,3]
b = [10, 41, 26, 50, 12, 24]
c = []
a = np.array(a)
b = np.array(b)
#loop over the indices stored in a and append the matching element of b to c
for i in range(len(a)):
    print(i)
    i = a[i]
    c.append(b[i])
print(c)
c = np.array(c)
print(type(c))
#To make it more fun, you can use the random module to get random digits

choose rows from two matrices

I am trying to solve the following problem. I have two matrices A and B and I want to create a new matrix C which consists of the rows of the matrices A and B depending on some condition which is encoded in the array v, i.e. if the i'th entry of v is a one then I want the i'th row of C to be the i'th row of B and if it is a zero then it should be the i'th row of A. I came up with the following solution
C = np.choose(v,[A.T,B.T]).T
but it is too slow. One obvious problem is the two transposes, but since np.choose does not take an axis argument I don't know how to get rid of them. Any ideas for a fast solution to this problem?
For example, let
A = np.arange(20).reshape([4,5])
and
B = 10 - A
Then one could imagine that one wants the matrix C to be the matrix of rows with smallest maximum norm. So we let
v = np.sum(A,axis=1)<np.sum(B,axis=1)
and then C is the matrix
C = np.choose(v,[A.T,B.T]).T
which is
array([[10, 9, 8, 7, 6],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]])
This looks like a good fit for np.where, which does the choosing based on the mask/binary input data -
C = np.where(v[:,None],B,A)
The v[:,None] part extends v to a shape that broadcasts against A and B, so that the choice is made row by row, i.e. along axis=0 of the two 2D arrays.
Sample run -
In [58]: A
Out[58]:
array([[82, 78, 57],
[14, 97, 32],
[72, 11, 49],
[98, 34, 41],
[89, 71, 52],
[34, 51, 55],
[26, 92, 59]])
In [59]: B
Out[59]:
array([[55, 67, 50],
[49, 64, 21],
[34, 18, 72],
[24, 61, 65],
[56, 59, 23],
[44, 77, 13],
[56, 55, 58]])
In [62]: v
Out[62]: array([1, 0, 0, 0, 0, 1, 1])
In [63]: np.where(v[:,None],B,A)
Out[63]:
array([[55, 67, 50],
[14, 97, 32],
[72, 11, 49],
[98, 34, 41],
[89, 71, 52],
[44, 77, 13],
[56, 55, 58]])
If v doesn't strictly consist of 0s and 1s only, use v[:,None]==1 as the first argument with np.where.
Another approach would be with boolean-indexing -
C = A.copy()
mask = v==1
C[mask] = B[mask]
Note : If v is already a boolean array, skip the comparison against 1 for the mask creation.
Runtime test -
In [77]: A = np.random.randint(11,99,(10000,3))
In [78]: B = np.random.randint(11,99,(10000,3))
In [79]: v = np.random.rand(A.shape[0])>0.5
In [82]: def choose_rows_copy(A, B, v):
    ...:     C = A.copy()
    ...:     C[v] = B[v]
    ...:     return C
    ...:
In [83]: %timeit np.where(v[:,None],B,A)
10000 loops, best of 3: 107 µs per loop
In [84]: %timeit choose_rows_copy(A, B, v)
1000 loops, best of 3: 226 µs per loop
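As a sanity check, the three approaches (np.choose from the question, np.where, and boolean indexing) can be compared on the small example from the question. This is a sketch; v is cast to int for np.choose as a precaution, and the names C_choose, C_where and C_mask are only for illustration.

```python
import numpy as np

A = np.arange(20).reshape(4, 5)
B = 10 - A
v = A.sum(axis=1) < B.sum(axis=1)   # boolean, one entry per row

# Approach from the question: transpose so that v lines up with the last axis.
C_choose = np.choose(v.astype(int), [A.T, B.T]).T

# np.where with v as a column, shape (4, 1), broadcast against A and B.
C_where = np.where(v[:, None], B, A)

# Boolean indexing: start from A and overwrite the selected rows with B's rows.
C_mask = A.copy()
C_mask[v] = B[v]

print(np.array_equal(C_choose, C_where) and np.array_equal(C_where, C_mask))
```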

Remove elements when satisfying certain condition

I have a large list:
a=[[4,34,1], [5,87,2], [2,76,9],...]
I want to compare all pairs of sub-lists, such that if
a[i][0]>a[j][0] and a[i][1]>a[j][1]
then the sub-list a[i] should be removed.
How could I achieve this goal in Python 2.7?
Here's a slightly more idiomatic way of implementing @MisterMiyagi's approach:
import itertools
drop = set()
# permutations yields every ordered pair (i, j) with i != j, so each
# sub-list is compared against all the others in both directions.
for i, j in itertools.permutations(range(len(a)), 2):
    # I would've used ``enumerate`` here as well, but it is
    # easier to see the filtering criteria with explicit
    # indexing.
    if a[i][0] > a[j][0] and a[i][1] > a[j][1]:
        drop.add(i)
a = [value for idx, value in enumerate(a) if idx not in drop]
print(a)
How is it more idiomatic?
A combinatorial iterator from itertools instead of a double for loop.
No extra 0: in slices.
enumerate instead of explicit indexing to build the answer.
P.S. This is an O(N^2) solution, so it might take a while for large inputs.
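For moderately sized inputs, the same pairwise test can also be written with NumPy broadcasting. This is an alternative sketch, not part of the answer above; the function name drop_dominating is made up for illustration. It is still O(N^2) in time and also needs O(N^2) memory for the comparison matrix.

```python
import numpy as np

def drop_dominating(a):
    """Keep every a[i] for which no j has a[i][0] > a[j][0] and a[i][1] > a[j][1].

    Hypothetical helper; a is a list of equal-length sub-lists.
    """
    if not a:
        return []
    pts = np.asarray(a)[:, :2]
    x, y = pts[:, 0], pts[:, 1]
    # bigger[i, j] is True when row i exceeds row j in both of the first two columns.
    bigger = (x[:, None] > x[None, :]) & (y[:, None] > y[None, :])
    # Row i is dropped if it exceeds at least one other row.
    drop = bigger.any(axis=1)
    return [row for row, d in zip(a, drop) if not d]

print(drop_dominating([[4, 34, 1], [5, 87, 2], [2, 76, 9]]))
# [[4, 34, 1], [2, 76, 9]]
```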
If you sort the list first (an O(n log n) operation), then you can identify
the items to keep (or reject) in one pass by comparing neighbors (an O(n)
operation). So for long lists this should be much faster than comparing all
pairs (an O(n**2) operation).
At the bottom of the post you'll find the code for using_sort:
In [22]: using_sort([[4,34,1], [5,87,2], [2,76,9]])
Out[22]: [[2, 76, 9], [4, 34, 1]]
In [23]: using_sort([[4, 34, 1], [5, 87, 2], [2, 76, 9], [4, 56, 12], [9, 34, 76]])
Out[23]: [[2, 76, 9], [4, 56, 12], [4, 34, 1], [9, 34, 76]]
We can compare that against an O(n**2) algorithm, using_product, based on Sergei Lebedev's answer.
First, let's check that they give the same result:
import numpy as np
tests = [
    [[4, 34, 1], [5, 87, 2], [2, 76, 9], [4, 56, 12], [9, 34, 76]],
    [[87, 26, 37], [50, 37, 23], [70, 97, 19], [86, 91, 55], [57, 55, 68],
     [25, 35, 64], [82, 79, 66], [1, 30, 75], [16, 14, 71], [32, 89, 6]],
    np.random.randint(100, size=(10, 3)).tolist(),
    np.random.randint(100, size=(50, 3)).tolist(),
    np.random.randint(100, size=(100, 3)).tolist()]
assert all([sorted(using_product(test)) == sorted(using_sort(test))
            for test in tests])
Here is a benchmark showing using_sort is much faster than using_product.
Since using_sort is O(n log n) while using_product is O(n**2),
the speed advantage increases with the length of a.
In [17]: a = np.random.randint(100, size=(10**4, 3)).tolist()
In [20]: %timeit using_sort(a)
100 loops, best of 3: 9.44 ms per loop
In [21]: %timeit using_product(a)
1 loops, best of 3: 6.17 s per loop
I found visualizing the solution helpful. For each point in the result there is
a blue rectangular region emanating from it with the given point in the lower
left corner. This rectangular region depicts the set of points which can be
eliminated due to that point being in the result.
With using_sort, each time a point is found in the result, it keeps checking subsequent points in the sorted list against this point until it finds the next point in the result.
import itertools as IT
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
np.random.seed(2016)
def using_sort(a):
    if len(a) == 0: return []
    a = sorted(a, key=lambda x: (x[0], -x[1]))
    result = []
    pt = a[0]
    nextpt = pt
    for key, grp in IT.groupby(a, key=lambda x: x[0]):
        for item in grp:
            if not (item[0] > pt[0] and item[1] > pt[1]):
                result.append(item)
                nextpt = item
        pt = nextpt
    return result
def using_product(a):
    drop = set()
    for i, j in IT.product(range(len(a)), repeat=2):
        if (i != j
                and i not in drop
                and j not in drop
                and a[i][0] > a[j][0]
                and a[i][1] > a[j][1]):
            drop.add(i)
    a = [value for idx, value in enumerate(a) if idx not in drop]
    return a
def show(a, *args, **kwargs):
    a = sorted(a, key=lambda x: (x[0], -x[1]))
    points = np.array(a)[:, :2]
    ax = kwargs.pop('ax', plt.gca())
    xmax, ymax = kwargs.pop('rects', [None, None])
    ax.plot(points[:, 0], points[:, 1], *args, **kwargs)
    if xmax:
        for x, y in points:
            rect = mpatches.Rectangle((x, y), xmax-x, ymax-y, color="blue", alpha=0.1)
            ax.add_patch(rect)
tests = [
    [[4, 34, 1], [5, 87, 2], [2, 76, 9], [4, 56, 12], [9, 34, 76]],
    [[87, 26, 37], [50, 37, 23], [70, 97, 19], [86, 91, 55], [57, 55, 68],
     [25, 35, 64], [82, 79, 66], [1, 30, 75], [16, 14, 71], [32, 89, 6]],
    np.random.randint(100, size=(10, 3)).tolist(),
    np.random.randint(100, size=(50, 3)).tolist(),
    np.random.randint(100, size=(100, 3)).tolist()]
assert all([sorted(using_product(test)) == sorted(using_sort(test))
            for test in tests])
for test in tests:
    print('test: {}'.format(test))
    show(test, 'o', label='test')
    for func, s in [('using_product', 20), ('using_sort', 10)]:
        result = locals()[func](test)
        print('{}: {}'.format(func, result))
        xmax, ymax = np.array(test)[:, :2].max(axis=0)
        show(result, 'o--', label=func, markersize=s, alpha=0.5, rects=[xmax, ymax])
    print('-'*80)
    plt.legend()
    plt.show()
Does this work?
a=[[4,94,1], [3,67,2], [2,76,9]]
b = a
c = []
for lista in a:
    condition = False
    for listb in b:
        if (lista[0] > listb[0] and lista[1] > listb[1]):
            condition = True
            break
    if not condition:
        c.append(lista)
c will then contain the list of lists you want.
EDIT: Changed boolean condition based on Sergei's comment.

2D numpy argsort index returns 3D when used in the original matrix

I am trying to obtain the top 2 values from each row in a matrix using argsort. The indexing is working, as in argsort is returning the correct values. However, when I put the argsort result as an index, it returns a 3 dimensional result.
For example:
test_mat = np.matrix([[0 for i in range(5)] for j in range(5)])
for i in range(5):
    for j in range(5):
        test_mat[i, j] = i * j
test_mat[range(2,3)] = test_mat[range(2,3)] * -1
last_two = range(-1, -3, -1)
index = np.argsort(test_mat, axis=1)
index = index[:, last_k]
This gives:
index.shape
Out[402]: (5L, 5L)
test_mat[index].shape
Out[403]: (5L, 5L, 5L)
Python is new to me and I find indexing to be very confusing in general even after reading the various array manuals. I spend more time trying to get the right values out of objects than actually solving problems. I'd welcome any tips on where to properly learn what is going on. Thanks.
You can use linear indexing to solve your case, like so -
# Say A is your 2D input array
# Get sort indices for the top 2 values in each row
idx = A.argsort(1)[:,::-1][:,:2]
# Get row offset numbers
row_offset = A.shape[1]*np.arange(A.shape[0])[:,None]
# Add row offsets with top2 sort indices giving us linear indices of
# top 2 elements in each row. Index into input array with those for output.
out = np.take( A, idx + row_offset )
Here's a step-by-step sample run -
In [88]: A
Out[88]:
array([[34, 45, 16, 20, 24],
[37, 13, 49, 37, 21],
[42, 36, 35, 24, 18],
[26, 28, 21, 13, 44]])
In [89]: idx = A.argsort(1)[:,::-1][:,:2]
In [90]: idx
Out[90]:
array([[1, 0],
[2, 3],
[0, 1],
[4, 1]])
In [91]: row_offset = A.shape[1]*np.arange(A.shape[0])[:,None]
In [92]: row_offset
Out[92]:
array([[ 0],
[ 5],
[10],
[15]])
In [93]: np.take( A, idx + row_offset )
Out[93]:
array([[45, 34],
[49, 37],
[42, 36],
[44, 28]])
You can directly get the top 2 values from each row with just sorting along the second axis and some slicing, like so -
out = np.sort(A,1)[:,:-3:-1]
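The extra dimension in the question comes from how integer array indexing works: indexing with a single 2D integer array treats every entry as a row number and returns that whole row. The sketch below shows this on a plain ndarray (an assumption; the question uses np.matrix) built the same way as test_mat, and uses np.take_along_axis as another way to pick the values per row.

```python
import numpy as np

# Same values as test_mat in the question, but as a plain ndarray.
A = np.outer(np.arange(5), np.arange(5))
A[2] *= -1

# Column indices of the two largest values in each row, largest first.
idx = np.argsort(A, axis=1)[:, ::-1][:, :2]

# A[idx] uses each entry of idx as a ROW number and returns that full row,
# so the result has shape idx.shape + A.shape[1:].
wrong = A[idx]
print(wrong.shape)

# take_along_axis pairs row i of idx with row i of A instead.
top2 = np.take_along_axis(A, idx, axis=1)
print(top2)
```

The values in top2 match np.sort(A,1)[:,:-3:-1]; keeping idx is only needed when the column positions themselves are of interest.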
