I am trying to make use of numpy vectorized operations. But I struggle on the following task: The setting is two arrays of different length (X1, X2). I want to apply a method to each pair (e.g. X1[0] with X2[0], X2[1], etc). I wrote the following working code using loops, but I'd like to get rid of the loops.
result = []
for i in range(len(X1)):
    result.append([])
    for j in range(len(X2)):
        tmp = my_method(X1[i] - X2[j])
        result[i].append(tmp)
result = np.asarray(result)
You can reshape one of your vectors to shape (N, 1) and then use np.vectorize, which broadcasts the operation as usual:
import numpy as np
X1 = np.arange(5)
X2 = np.arange(3)
print(X1, X2)
# [0 1 2 3 4] [0 1 2]
def my_op(x, y):
    return x + y
np.vectorize(my_op)(X1[:, np.newaxis], X2)
# array([[0, 1, 2],
# [1, 2, 3],
# [2, 3, 4],
# [3, 4, 5],
# [4, 5, 6]])
Note that my_op is just defined as an example; if your function is actually anything included in numpy's vectorized operations, it'd be much faster to just use that directly, e.g.:
X1[:, np.newaxis] + X2
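For the loop in the question, which calls my_method on the difference X1[i] - X2[j], the same reshaping trick could look roughly like this. The body of my_method here is a hypothetical stand-in (squaring), and the sketch only applies if the real function accepts whole arrays:

```python
import numpy as np

def my_method(d):
    # Hypothetical stand-in for the asker's function; assumed to work on arrays.
    return d ** 2

X1 = np.arange(5)
X2 = np.arange(3)

# Column vector minus row vector broadcasts to a (len(X1), len(X2)) grid.
diff = X1[:, np.newaxis] - X2
result = my_method(diff)

# Reference: the explicit double loop from the question.
loop = np.asarray([[my_method(x1 - x2) for x2 in X2] for x1 in X1])
assert np.array_equal(result, loop)
```

If my_method only accepts scalars, wrapping it in np.vectorize as above keeps the same call shape, at loop speed.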
itertools.product might be what you're looking for:
from itertools import product
import numpy as np
x1 = np.array(...)
x2 = np.array(...)
result = np.array([my_method(x_1 - x_2) for x_1, x_2 in product(x1,x2)])
Alternatively you could also use a double list comprehension:
result = np.array([my_method(x_1 - x_2) for x_1 in x1 for x_2 in x2])
This obviously depends on what my_method is doing and operating on and what you have stored in x1 and x2.
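Note that both variants above produce a flat sequence, whereas the loop in the question builds a 2D result. A minimal sketch of recovering the grid, assuming my_method returns one scalar per pair (the function body and the input values are made up for illustration):

```python
from itertools import product
import numpy as np

def my_method(d):
    # Hypothetical scalar function standing in for the asker's method.
    return abs(d)

x1 = np.array([1, 5, 9])
x2 = np.array([2, 4])

# product iterates x1 in the outer position, so the flat result is row-major.
flat = np.array([my_method(a - b) for a, b in product(x1, x2)])
grid = flat.reshape(len(x1), len(x2))
print(grid)
```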
Assume a simple function my_method(a, b) that adds the two numbers,
and this input:
X1 = np.arange(10)
X2 = np.arange(10,60,10)
Your code is:
result = []
for i in range(len(X1)):
    result.append([])
    for j in range(len(X2)):
        tmp = my_method(X1[i], X2[j])
        result[i].append(tmp)
result = np.asarray(result)
You can replace it with broadcasting:
X1[:,None]+X2
output:
array([[10, 20, 30, 40, 50],
[11, 21, 31, 41, 51],
[12, 22, 32, 42, 52],
[13, 23, 33, 43, 53],
[14, 24, 34, 44, 54],
[15, 25, 35, 45, 55],
[16, 26, 36, 46, 56],
[17, 27, 37, 47, 57],
[18, 28, 38, 48, 58],
[19, 29, 39, 49, 59]])
Now you need to see if your operation can be vectorized… please share details on what you want to achieve. Functions can be vectorized using numpy.vectorize, but this is not a magic tool as it will loop on the elements, which can be slow. The best is to have a true vector operation.
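As an illustration of the difference, here is a sketch with a made-up scalar function containing a branch. np.vectorize can wrap it, but np.where expresses the same rule as a true array operation. The function scalar_op and the input values are assumptions for this example only:

```python
import numpy as np

def scalar_op(a, b):
    # Hypothetical scalar-only function: positive difference, else 0.
    return a - b if a > b else 0

X1 = np.array([1, 20, 35])
X2 = np.array([10, 30])

# np.vectorize still calls scalar_op once per pair (a Python-level loop).
slow = np.vectorize(scalar_op)(X1[:, None], X2)

# The same rule written with array operations only.
diff = X1[:, None] - X2
fast = np.where(diff > 0, diff, 0)

assert np.array_equal(slow, fast)
```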
Let's say I have this NumPy array
A =
array([[0, 1, 3],
[1, 2, 4]])
I have another array
B =
array([[10, 41, 26, 50, 12, 24],
[20, 15, 42, 40, 41, 62]])
I want to create another array that, for each row, selects the elements of B at the column indices given in A. That is
C =
array([[10, 41, 50],
[15, 42, 41]])
Try:
B[[[0],[1]], A]
Or more generally:
B[np.arange(A.shape[0])[:,None], A]
Output:
array([[10, 41, 50],
[15, 42, 41]])
You can use np.take_along_axis
np.take_along_axis(B, A, axis=1)
output:
array([[10, 41, 50],
[15, 42, 41]])
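A self-contained check that the row-index approach and np.take_along_axis agree on the arrays from the question (the variable names rows, C_fancy and C_take are just for this sketch):

```python
import numpy as np

A = np.array([[0, 1, 3],
              [1, 2, 4]])
B = np.array([[10, 41, 26, 50, 12, 24],
              [20, 15, 42, 40, 41, 62]])

# Row indices as a column vector broadcast against A's column indices.
rows = np.arange(A.shape[0])[:, None]
C_fancy = B[rows, A]

# take_along_axis picks, per row, the entries of B at the positions in A.
C_take = np.take_along_axis(B, A, axis=1)

assert np.array_equal(C_fancy, C_take)
print(C_take)
```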
This can also be done with a plain loop over lists rather than NumPy indexing,
and the result can be converted to a NumPy array at the end.
Code:
import numpy as np
# to keep it simple, take 1D lists
a = [0,1,3]
b = [10, 41, 26, 50, 12, 24]
c = []
a = np.array(a)
b = np.array(b)
# loop over the indices stored in a and append the matching element of b to c
for i in range(len(a)):
    print(i)
    i = a[i]
    c.append(b[i])
print(c)
c = np.array(c)
print(type(c))
#To make it more fun, you can use the random module to get random digits
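The snippet above handles a single row. A possible extension to the 2D case from the question, still with plain lists, pairs each row of A with the matching row of B (a sketch, not part of the original answer):

```python
A = [[0, 1, 3],
     [1, 2, 4]]
B = [[10, 41, 26, 50, 12, 24],
     [20, 15, 42, 40, 41, 62]]

# For each (row of A, row of B) pair, look up B's values at A's indices.
C = [[row_b[i] for i in row_a] for row_a, row_b in zip(A, B)]
print(C)
```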
I saw a lot of articles and answers to other questions about slicing 3D lists in python, but I can't apply those methods to my case.
I have a 3D list:
list = [
[[0, 56, 78], [4, 86, 90], [7, 87, 34]],
[[1, 49, 76], [0, 76, 78], [8, 60, 7]],
[[9, 6, 58], [6, 57, 78], [10, 46, 2]]
]
The last 2 values of the 3rd dimension stay constant, but change every time I rerun the code. What the code needs to do is find 2 specific pairs of those last 2 values and slice from one pair to the other. So for example:
pair1 = 86, 90
pair2 = 76, 78
The output should be:
[4, 86, 90], [7, 87, 34], [1, 49, 76], [0, 76, 78]
I know how to find the 2 pairs, I'm just not sure how to slice the list. Thanks in advance for your help and leave a comment if something is unclear.
Using numpy, you can slice the array to keep the last two elements on the last axis, find the indices where each pair takes place, flatten the result and use it to slice the array:
a = np.array(my_list) # don't call your list "list"
a_sliced = a[...,1:]
ix1 = np.flatnonzero((a_sliced == pair1).all(-1).ravel()).item()
ix2 = np.flatnonzero((a_sliced == pair2).all(-1).ravel()).item()
np.concatenate(a)[ix1:ix2+1]
array([[ 4, 86, 90],
[ 7, 87, 34],
[ 1, 49, 76],
[ 0, 76, 78]])
a being defined as a numpy array, and both pairs defined as tuples:
pair1 = 86, 90
pair2 = 76, 78
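For comparison, the same slicing can be sketched without NumPy. This is a different technique from the answer above: it flattens the outer level and uses list.index, and it assumes each pair occurs exactly once and that pair1 comes before pair2:

```python
my_list = [
    [[0, 56, 78], [4, 86, 90], [7, 87, 34]],
    [[1, 49, 76], [0, 76, 78], [8, 60, 7]],
    [[9, 6, 58], [6, 57, 78], [10, 46, 2]],
]
pair1 = (86, 90)
pair2 = (76, 78)

# Flatten the outer level so the triples form one sequence.
flat = [triple for block in my_list for triple in block]

# The last two values of each triple, as tuples comparable to the pairs.
tails = [tuple(triple[1:]) for triple in flat]

start = tails.index(pair1)  # raises ValueError if the pair is absent
stop = tails.index(pair2)
result = flat[start:stop + 1]
print(result)
```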
I have a large list:
a=[[4,34,1], [5,87,2], [2,76,9],...]
I want to compare all pairs of sub-lists, such that if
a[i][0]>a[j][0] and a[i][1]>a[j][1]
then the sub-list a[i] should be removed.
How could I achieve this goal in Python 2.7?
Here's a slightly more idiomatic way of implementing @MisterMiyagi's approach:
import itertools
drop = set()
# permutations yields ordered pairs, so both (i, j) and (j, i) are checked
for i, j in itertools.permutations(range(len(a)), 2):
    # I would've used ``enumerate`` here as well, but it is
    # easier to see the filtering criteria with explicit
    # indexing.
    if a[i][0] > a[j][0] and a[i][1] > a[j][1]:
        drop.add(i)
a = [value for idx, value in enumerate(a) if idx not in drop]
print(a)
How is it more idiomatic?
Combinatorial iterator from itertools instead of a double for loop.
No extra 0: in slices.
enumerate instead of explicit indexing to build the answer.
P.S. This is an O(N^2) solution, so it might take a while for large inputs.
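The same all-pairs comparison can also be sketched with NumPy broadcasting. It is still O(N^2), now in memory as well as time, so it is only a rough alternative for moderately sized inputs. The sample data and variable names below are assumptions for this example:

```python
import numpy as np

a = [[4, 34, 1], [5, 87, 2], [2, 76, 9]]
arr = np.array(a)
x, y = arr[:, 0], arr[:, 1]

# exceeds[i, j] is True when a[i] is larger than a[j] in both coordinates.
exceeds = (x[:, None] > x[None, :]) & (y[:, None] > y[None, :])

# Drop every row that exceeds at least one other row.
drop = exceeds.any(axis=1)
kept = arr[~drop].tolist()
print(kept)
```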
If you sort the list first (an O(n log n) operation), then you can identify
the items to keep (or reject) in one pass by comparing neighbors (an O(n)
operation). So for long lists this should be much faster than comparing all
pairs (an O(n**2) operation).
At the bottom of the post you'll find the code for using_sort:
In [22]: using_sort([[4,34,1], [5,87,2], [2,76,9]])
Out[22]: [[2, 76, 9], [4, 34, 1]]
In [23]: using_sort([[4, 34, 1], [5, 87, 2], [2, 76, 9], [4, 56, 12], [9, 34, 76]])
Out[23]: [[2, 76, 9], [4, 56, 12], [4, 34, 1], [9, 34, 76]]
We can compare that against a O(n**2) algorithm, using_product, based on Sergei Lebedev's answer.
First, let's check that they give the same result:
import numpy as np
tests = [
    [[4, 34, 1], [5, 87, 2], [2, 76, 9], [4, 56, 12], [9, 34, 76]],
    [[87, 26, 37], [50, 37, 23], [70, 97, 19], [86, 91, 55], [57, 55, 68],
     [25, 35, 64], [82, 79, 66], [1, 30, 75], [16, 14, 71], [32, 89, 6]],
    np.random.randint(100, size=(10, 3)).tolist(),
    np.random.randint(100, size=(50, 3)).tolist(),
    np.random.randint(100, size=(100, 3)).tolist()]
assert all([sorted(using_product(test)) == sorted(using_sort(test))
            for test in tests])
Here is a benchmark showing using_sort is much faster than using_product.
Since using_sort is O(n log n) while using_product is O(n**2),
the speed advantage increases with the length of a.
In [17]: a = np.random.randint(100, size=(10**4, 3)).tolist()
In [20]: %timeit using_sort(a)
100 loops, best of 3: 9.44 ms per loop
In [21]: %timeit using_product(a)
1 loops, best of 3: 6.17 s per loop
I found visualizing the solution helpful. For each point in the result there is
a blue rectangular region emanating from it with the given point in the lower
left corner. This rectangular region depicts the set of points which can be
eliminated due to that point being in the result.
With using_sort, each time a point is found in the result, it keeps checking subsequent points in the sorted list against this point until it finds the next point in the result.
import itertools as IT
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
np.random.seed(2016)
def using_sort(a):
    if len(a) == 0: return []
    a = sorted(a, key=lambda x: (x[0], -x[1]))
    result = []
    pt = a[0]
    nextpt = pt
    for key, grp in IT.groupby(a, key=lambda x: x[0]):
        for item in grp:
            if not (item[0] > pt[0] and item[1] > pt[1]):
                result.append(item)
                nextpt = item
        pt = nextpt
    return result
def using_product(a):
    drop = set()
    for i, j in IT.product(range(len(a)), repeat=2):
        if (i != j
                and i not in drop
                and j not in drop
                and a[i][0] > a[j][0]
                and a[i][1] > a[j][1]):
            drop.add(i)
    a = [value for idx, value in enumerate(a) if idx not in drop]
    return a
def show(a, *args, **kwargs):
    a = sorted(a, key=lambda x: (x[0], -x[1]))
    points = np.array(a)[:, :2]
    ax = kwargs.pop('ax', plt.gca())
    xmax, ymax = kwargs.pop('rects', [None, None])
    ax.plot(points[:, 0], points[:, 1], *args, **kwargs)
    if xmax:
        for x, y in points:
            rect = mpatches.Rectangle((x, y), xmax-x, ymax-y, color="blue", alpha=0.1)
            ax.add_patch(rect)
tests = [
    [[4, 34, 1], [5, 87, 2], [2, 76, 9], [4, 56, 12], [9, 34, 76]],
    [[87, 26, 37], [50, 37, 23], [70, 97, 19], [86, 91, 55], [57, 55, 68],
     [25, 35, 64], [82, 79, 66], [1, 30, 75], [16, 14, 71], [32, 89, 6]],
    np.random.randint(100, size=(10, 3)).tolist(),
    np.random.randint(100, size=(50, 3)).tolist(),
    np.random.randint(100, size=(100, 3)).tolist()]
assert all([sorted(using_product(test)) == sorted(using_sort(test))
            for test in tests])
for test in tests:
    print('test: {}'.format(test))
    show(test, 'o', label='test')
    for func, s in [('using_product', 20), ('using_sort', 10)]:
        result = locals()[func](test)
        print('{}: {}'.format(func, result))
        xmax, ymax = np.array(test)[:, :2].max(axis=0)
        show(result, 'o--', label=func, markersize=s, alpha=0.5, rects=[xmax, ymax])
        print('-'*80)
    plt.legend()
    plt.show()
Does this work?
a=[[4,94,1], [3,67,2], [2,76,9]]
b = a
c = []
for lista in a:
    condition = False
    for listb in b:
        if (lista[0] > listb[0] and lista[1] > listb[1]):
            condition = True
            break
    if not condition:
        c.append(lista)
c will then contain the list of lists you want.
EDIT: Changed boolean condition based on Sergei's comment.
This question already has answers here:
Get intersecting rows across two 2D numpy arrays
I have two large lists of points in 2D and I want to find their common sublists, if they have some. Both of the lists are quite large and efficiency is an issue.
t1 = [[3, 41], [5, 82], [10, 31], [11, 34], [14, 54]]
t2 = [[161, 160], [169, 260], [187, 540], [192, 10], [205, 23]]
I tried itertools like below, but I get "ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()".
for i in itertools.chain.from_iterable(t1):
    if i in t2:
        print "yes",i
I tried the first answer from here too, but I get 'numpy.int64' object is not iterable.
Also, I think this simple code would work, but it takes so much time:
intersection = [i for i in t1 if i in t2]
Any advice? Thanks.
Lists are not hashable, so we need to convert each inner list to a tuple; then we can use set intersection to find the common elements.
t1 = [[3, 41], [5, 82], [10, 31], [11, 34], [14, 54]]
t2 = [[161, 160], [169, 260], [187, 540], [192, 10], [205, 23], [3,41]]
nt1 = map(tuple, t1)
nt2 = map(tuple, t2)
st1 = set(nt1)
st2 = set(nt2)
print st1.intersection(st2)
Output
set([(3, 41)])
Since we are turning the lists into sets, we are not accounting for repetitions. Consider the following inputs:
t1 = [[3, 41], [3, 41], [5, 82], [10, 31], [11, 34], [14, 54]]
t2 = [[3,41], [3,41], [161, 160], [169, 260], [187, 540], [192, 10], [205, 23]]
[3,41] appears twice in each list, but the previous program outputs it only once. The following program handles duplicate entries by counting them first and repeating them afterwards.
t1 = [[3, 41], [3, 41], [5, 82], [10, 31], [11, 34], [14, 54]]
t2 = [[3,41], [3,41], [161, 160], [169, 260], [187, 540], [192, 10], [205, 23]]
nt1 = map(tuple, t1)
nt2 = map(tuple, t2)
st1 = set(nt1)
st2 = set(nt2)
from collections import defaultdict
d1 = defaultdict(int)
d2 = defaultdict(int)
for i in nt1:
    d1[i] += 1  # counting element occurrence from first list
for i in nt2:
    d2[i] += 1  # counting element occurrence from second list
result_list = []
for i in st1.intersection(st2):
    min_count = min(d1[i], d2[i])  # selecting the minimum one to multiply
    result_list += map(lambda x: list(i), xrange(0, min_count))
print result_list
Output
[[3, 41], [3, 41]]
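The program above is written for Python 2 (print statement, xrange, list-returning map). In Python 3, where map returns a one-shot iterator, the same counting idea can be sketched with collections.Counter, whose & operator keeps the minimum count of each common element. This is an alternative sketch, not the original author's code:

```python
from collections import Counter

t1 = [[3, 41], [3, 41], [5, 82], [10, 31], [11, 34], [14, 54]]
t2 = [[3, 41], [3, 41], [161, 160], [169, 260], [187, 540], [192, 10], [205, 23]]

# Count each point; tuples are hashable, lists are not.
c1 = Counter(map(tuple, t1))
c2 = Counter(map(tuple, t2))

# Multiset intersection: each common point with min(count1, count2).
common = c1 & c2

# elements() repeats each key as many times as its count.
result_list = [list(point) for point in common.elements()]
print(result_list)
```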
If you are really only using lists, you can build a set from each list and use set().intersection() for your case:
l1 = [[1,2],[2,3]]
l2 = [[3,4],[2,3]]
list(set(map(tuple,l1)).intersection(set(map(tuple,l2))))
>> [(2, 3)]
But with very large lists this method may be slow.
EDIT : Using map function.
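If the order of t1 should be preserved, one possible variant keeps the list comprehension from the question but tests membership against a set of tuples, so each lookup takes constant time on average instead of scanning t2. The sample data here is invented so that the two lists overlap:

```python
t1 = [[3, 41], [5, 82], [10, 31]]
t2 = [[161, 160], [5, 82], [3, 41]]

# Build the lookup set once; tuples are hashable, lists are not.
seen = set(map(tuple, t2))

# Keep the points of t1, in their original order, that also occur in t2.
intersection = [point for point in t1 if tuple(point) in seen]
print(intersection)
```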