I have a large list:
a=[[4,34,1], [5,87,2], [2,76,9],...]
I want to compare all pairs of sub-lists, such that if
a[i][0]>a[j][0] and a[i][1]>a[j][1]
then the sub-list a[i] should be removed.
How could I achieve this goal in Python 2.7?
Here's a slightly more idiomatic way of implementing @MisterMiyagi's approach:
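import itertools
drop = set()
# permutations yields every ordered pair (i, j) with i != j, so each
# sub-list is compared against all the others, not only the later ones.
for i, j in itertools.permutations(range(len(a)), 2):
    # I would've used ``enumerate`` here as well, but it is
    # easier to see the filtering criteria with explicit
    # indexing.
    if a[i][0] > a[j][0] and a[i][1] > a[j][1]:
        drop.add(i)
a = [value for idx, value in enumerate(a) if idx not in drop]
print(a)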
How is it more idiomatic?
Combinatorial iterator from itertools instead of a double for loop.
No extra 0: in slices.
enumerate instead of explicit indexing to build the answer.
P.S. This is an O(N^2) solution, so it might take a while for large inputs.
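For checking any faster version on small inputs, a brute-force reference can help. The sketch below is only an illustration and not part of the answer above: the helper name remove_dominated is made up, and it applies the question's rule to every ordered pair of sub-lists.

```python
def remove_dominated(a):
    """Drop every p for which some q is smaller in both the first and second element."""
    return [
        p for p in a
        if not any(p[0] > q[0] and p[1] > q[1] for q in a)
    ]

a = [[4, 34, 1], [5, 87, 2], [2, 76, 9]]
print(remove_dominated(a))  # [[4, 34, 1], [2, 76, 9]]
```

Comparing a sub-list with itself is harmless here because the comparisons are strict.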
If you sort the list first (an O(n log n) operation), then you can identify
the items to keep (or reject) in one pass by comparing neighbors (an O(n)
operation). So for long lists this should be much faster than comparing all
pairs (an O(n**2) operation).
At the bottom of the post you'll find the code for using_sort:
In [22]: using_sort([[4,34,1], [5,87,2], [2,76,9]])
Out[22]: [[2, 76, 9], [4, 34, 1]]
In [23]: using_sort([[4, 34, 1], [5, 87, 2], [2, 76, 9], [4, 56, 12], [9, 34, 76]])
Out[23]: [[2, 76, 9], [4, 56, 12], [4, 34, 1], [9, 34, 76]]
We can compare that against an O(n**2) algorithm, using_product, based on Sergei Lebedev's answer.
First, let's check that they give the same result:
import numpy as np
tests = [
    [[4, 34, 1], [5, 87, 2], [2, 76, 9], [4, 56, 12], [9, 34, 76]],
    [[87, 26, 37], [50, 37, 23], [70, 97, 19], [86, 91, 55], [57, 55, 68],
     [25, 35, 64], [82, 79, 66], [1, 30, 75], [16, 14, 71], [32, 89, 6]],
    np.random.randint(100, size=(10, 3)).tolist(),
    np.random.randint(100, size=(50, 3)).tolist(),
    np.random.randint(100, size=(100, 3)).tolist()]
assert all([sorted(using_product(test)) == sorted(using_sort(test))
            for test in tests])
Here is a benchmark showing using_sort is much faster than using_product.
Since using_sort is O(n log n) while using_product is O(n**2),
the speed advantage increases with the length of a.
In [17]: a = np.random.randint(100, size=(10**4, 3)).tolist()
In [20]: %timeit using_sort(a)
100 loops, best of 3: 9.44 ms per loop
In [21]: %timeit using_product(a)
1 loops, best of 3: 6.17 s per loop
I found visualizing the solution helpful. For each point in the result there is
a blue rectangular region emanating from it with the given point in the lower
left corner. This rectangular region depicts the set of points which can be
eliminated due to that point being in the result.
With using_sort, each time a point is found in the result, it keeps checking subsequent points in the sorted list against this point until it finds the next point in the result.
import itertools as IT
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
np.random.seed(2016)
def using_sort(a):
    if len(a) == 0: return []
    a = sorted(a, key=lambda x: (x[0], -x[1]))
    result = []
    pt = a[0]
    nextpt = pt
    for key, grp in IT.groupby(a, key=lambda x: x[0]):
        for item in grp:
            if not (item[0] > pt[0] and item[1] > pt[1]):
                result.append(item)
                nextpt = item
        pt = nextpt
    return result
def using_product(a):
    drop = set()
    for i, j in IT.product(range(len(a)), repeat=2):
        if (i != j
                and i not in drop
                and j not in drop
                and a[i][0] > a[j][0]
                and a[i][1] > a[j][1]):
            drop.add(i)
    a = [value for idx, value in enumerate(a) if idx not in drop]
    return a
def show(a, *args, **kwargs):
    a = sorted(a, key=lambda x: (x[0], -x[1]))
    points = np.array(a)[:, :2]
    ax = kwargs.pop('ax', plt.gca())
    xmax, ymax = kwargs.pop('rects', [None, None])
    ax.plot(points[:, 0], points[:, 1], *args, **kwargs)
    if xmax:
        for x, y in points:
            rect = mpatches.Rectangle((x, y), xmax-x, ymax-y, color="blue", alpha=0.1)
            ax.add_patch(rect)
tests = [
    [[4, 34, 1], [5, 87, 2], [2, 76, 9], [4, 56, 12], [9, 34, 76]],
    [[87, 26, 37], [50, 37, 23], [70, 97, 19], [86, 91, 55], [57, 55, 68],
     [25, 35, 64], [82, 79, 66], [1, 30, 75], [16, 14, 71], [32, 89, 6]],
    np.random.randint(100, size=(10, 3)).tolist(),
    np.random.randint(100, size=(50, 3)).tolist(),
    np.random.randint(100, size=(100, 3)).tolist()]
assert all([sorted(using_product(test)) == sorted(using_sort(test))
            for test in tests])
for test in tests:
    print('test: {}'.format(test))
    show(test, 'o', label='test')
    for func, s in [('using_product', 20), ('using_sort', 10)]:
        result = locals()[func](test)
        print('{}: {}'.format(func, result))
        xmax, ymax = np.array(test)[:, :2].max(axis=0)
        show(result, 'o--', label=func, markersize=s, alpha=0.5, rects=[xmax, ymax])
    print('-'*80)
    plt.legend()
    plt.show()
Does this work?
a=[[4,94,1], [3,67,2], [2,76,9]]
b = a
c = []
for lista in a:
    condition = False
    for listb in b:
        if (lista[0] > listb[0] and lista[1] > listb[1]):
            condition = True
            break
    if not condition:
        c.append(lista)
c will then contain the list of lists you want.
EDIT: Changed boolean condition based on Sergei's comment.
Related
m1 = [[['64,56'], ['77,9'], ['3,55,44,22,11']]]
m2 = [[[64, 56], [77, 9], [3, 55, 44, 22, 11]]]
How do I go from "m1" to "m2"?
You can use list comprehension with split:
m1 = [[['64,56'], ['77,9'], ['3,55,44,22,11']]]
# the outer brackets restore the top-level nesting that m1[0] strips off
m2 = [[[int(x) for x in lst[0].split(',')] for lst in m1[0]]]
print(m2) # [[[64, 56], [77, 9], [3, 55, 44, 22, 11]]]
Try this:
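m1 = [[['64,56'], ['77,9'], ['3,55,44,22,11']]]
def to_int(l):
    for x in range(0,len(l)):
        if isinstance(l[x][0],list):
            # still a list of lists: go one level deeper
            to_int(l[x])
        else:
            # l[x] is a leaf such as ['64,56']: replace it with [64, 56]
            l[x]=[int(y) for y in l[x][0].split(",")]
to_int(m1)
print(m1)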
Because the function recurses, it doesn't matter how many levels of nesting there are: every comma-separated string is converted to a list of ints in place.
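If modifying m1 in place is not wanted, a non-mutating variant is one option. This is a rough sketch, not the answer's code: the name parse_nested is made up, and it assumes that a one-element list holding a string (such as ['64,56']) should collapse into the parsed list of ints.

```python
def parse_nested(obj):
    """Return a new structure with every 'a,b,c' string parsed into ints."""
    if isinstance(obj, str):
        return [int(tok) for tok in obj.split(",")]
    if len(obj) == 1 and isinstance(obj[0], str):
        # assumption: a leaf such as ['64,56'] collapses to [64, 56]
        return parse_nested(obj[0])
    return [parse_nested(item) for item in obj]

m1 = [[['64,56'], ['77,9'], ['3,55,44,22,11']]]
m2 = parse_nested(m1)
print(m2)  # [[[64, 56], [77, 9], [3, 55, 44, 22, 11]]]
```

m1 is left untouched, which makes it easier to compare input and output while debugging.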
I am trying to make use of numpy vectorized operations. But I struggle on the following task: The setting is two arrays of different length (X1, X2). I want to apply a method to each pair (e.g. X1[0] with X2[0], X2[1], etc). I wrote the following working code using loops, but I'd like to get rid of the loops.
result = []
for i in range(len(X1)):
    result.append([])
    for j in range(len(X2)):
        tmp = my_method(X1[i] - X2[j])
        result[i].append(tmp)
result = np.asarray(result)
You can reshape one of your vectors to be (N, 1) and then use vectorize which will broadcast the operation as normal:
import numpy as np
X1 = np.arange(5)
X2 = np.arange(3)
print(X1, X2)
# [0 1 2 3 4] [0 1 2]
def my_op(x, y):
    return x + y
np.vectorize(my_op)(X1[:, np.newaxis], X2)
# array([[0, 1, 2],
# [1, 2, 3],
# [2, 3, 4],
# [3, 4, 5],
# [4, 5, 6]])
Note that my_op is just defined as an example; if your function is actually anything included in numpy's vectorized operations, it'd be much faster to just use that directly, e.g.:
X1[:, np.newaxis] + X2
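To check that broadcasting reproduces the question's double loop, a small comparison can be run. This is a sketch under an assumption: my_method is not shown in the question, so np.abs stands in for it here.

```python
import numpy as np

def my_method(d):
    # hypothetical stand-in for the question's my_method
    return np.abs(d)

X1 = np.arange(5)
X2 = np.arange(3)

# the question's double loop, written as a nested comprehension
looped = np.asarray([[my_method(x1 - x2) for x2 in X2] for x1 in X1])

# broadcasting: (5, 1) - (3,) -> (5, 3), then one call on the whole array
broadcast = my_method(X1[:, np.newaxis] - X2)

print(broadcast.shape)                    # (5, 3)
print(np.array_equal(looped, broadcast))  # True
```

This only works as written when my_method accepts whole arrays; otherwise np.vectorize is the fallback.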
itertools.product might be what you're looking for:
from itertools import product
import numpy as np
x1 = np.array(...)
x2 = np.array(...)
result = np.array([my_method(x_1 - x_2) for x_1, x_2 in product(x1,x2)])
Alternatively you could also use a double list comprehension:
result = np.array([my_method(x_1 - x_2) for x_1 in x1 for x_2 in x2])
This obviously depends on what my_method is doing and operating on and what you have stored in x1 and x2.
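Note that both comprehensions produce a flat result of length len(x1) * len(x2), whereas the loop in the question builds a 2-D array. A minimal sketch of recovering that layout follows; the sample arrays and the squaring my_method are made up for illustration.

```python
from itertools import product
import numpy as np

def my_method(d):
    # hypothetical stand-in for the question's my_method
    return d ** 2

x1 = np.array([1, 2, 3, 4])
x2 = np.array([10, 20, 30])

flat = np.array([my_method(a - b) for a, b in product(x1, x2)])
print(flat.shape)  # (12,)

# product varies x2 fastest, so a row-major reshape recovers the 2-D layout
grid = flat.reshape(len(x1), len(x2))
print(grid.shape)  # (4, 3)
print(grid[1, 2] == my_method(x1[1] - x2[2]))  # True
```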
Assume a simple function my_method(a, b) that adds the two numbers, and this input:
X1 = np.arange(10)
X2 = np.arange(10,60,10)
Your code is:
result = []
for i in range(len(X1)):
    result.append([])
    for j in range(len(X2)):
        tmp = my_method(X1[i], X2[j])
        result[i].append(tmp)
result = np.asarray(result)
You can replace it with broadcasting:
X1[:,None]+X2
output:
array([[10, 20, 30, 40, 50],
[11, 21, 31, 41, 51],
[12, 22, 32, 42, 52],
[13, 23, 33, 43, 53],
[14, 24, 34, 44, 54],
[15, 25, 35, 45, 55],
[16, 26, 36, 46, 56],
[17, 27, 37, 47, 57],
[18, 28, 38, 48, 58],
[19, 29, 39, 49, 59]])
Now you need to see if your operation can be vectorized… please share details on what you want to achieve. Functions can be vectorized using numpy.vectorize, but this is not a magic tool as it will loop on the elements, which can be slow. The best is to have a true vector operation.
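As a rough illustration of the difference, the sketch below applies a scalar-only function through numpy.vectorize and compares it with the matching ufunc. The function scalar_only is hypothetical; math.hypot and np.hypot were picked only because they compute the same thing.

```python
import math
import numpy as np

def scalar_only(x, y):
    # hypothetical function that only accepts plain scalars
    return math.hypot(x, y)

X1 = np.arange(10)
X2 = np.arange(10, 60, 10)

# np.vectorize still calls scalar_only once per element pair
via_vectorize = np.vectorize(scalar_only)(X1[:, None], X2)

# np.hypot is a true ufunc, so the looping happens in compiled code
via_ufunc = np.hypot(X1[:, None], X2)

print(via_vectorize.shape)                     # (10, 5)
print(np.allclose(via_vectorize, via_ufunc))   # True
```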
I am trying to apply NumPy's built-in function apply_along_axis based on the row index position:
import numpy as np
sa = np.array(np.arange(4))
sa_changed = (np.repeat(sa.reshape(1,len(sa)),repeats=2,axis=0))
print (sa_changed)
Output:
[[0 1 2 3]
[0 1 2 3]]
The function:
np.apply_along_axis(lambda x: x+10,0,sa_changed)
Output:
array([[10, 11, 12, 13],
[10, 11, 12, 13]])
But is there a way to use this function based on the row index position? For example, if it's an even row index then add 10, and if it's an odd row index then add 50.
Sample:
def func(x):
    if x.index//2==0:
        x = x+10
    else:
        x = x+50
    return x
When iterating on array, directly or with apply_along_axis, the subarray does not have a .index attribute. So we have to pass an explicit index value to your function:
In [248]: def func(i,x):
     ...:     if i//2==0:
     ...:         x = x+10
     ...:     else:
     ...:         x = x+50
     ...:     return x
     ...:
In [249]: arr = np.arange(10).reshape(5,2)
apply doesn't have a way to add this index, so instead we have to use an explicit iteration.
In [250]: np.array([func(i,v) for i,v in enumerate(arr)])
Out[250]:
array([[10, 11],
[12, 13],
[54, 55],
[56, 57],
[58, 59]])
Replacing // with % gives the even/odd test you described:
In [251]: def func(i,x):
     ...:     if i%2==0:
     ...:         x = x+10
     ...:     else:
     ...:         x = x+50
     ...:     return x
     ...:
In [252]: np.array([func(i,v) for i,v in enumerate(arr)])
Out[252]:
array([[10, 11],
[52, 53],
[14, 15],
[56, 57],
[18, 19]])
But a better way is to skip the iteration entirely:
Make an array of the row additions:
In [253]: np.where(np.arange(5)%2,50,10)
Out[253]: array([10, 50, 10, 50, 10])
Apply it via broadcasting:
In [256]: arr+np.where(np.arange(5)%2,50,10)[:,None]
Out[256]:
array([[10, 11],
[52, 53],
[14, 15],
[56, 57],
[18, 19]])
Here's one way to do this:
import numpy as np
x = np.array([[0, 1, 2, 3],
[0, 1, 2, 3]])
y = x.copy() # if you don't wish to modify x
For even row indices:
y[::2] = y[::2] + 10
And for odd row indices:
y[1::2] = y[1::2] + 50
Output:
array([[10, 11, 12, 13],
[50, 51, 52, 53]])
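The same result can also be written as a single broadcast addition. The sketch below uses a made-up three-row array and checks that slice assignment and a per-row offset built with np.where agree.

```python
import numpy as np

x = np.array([[0, 1, 2, 3],
              [0, 1, 2, 3],
              [0, 1, 2, 3]])

# slice assignment on a copy
y = x.copy()
y[::2] += 10    # rows 0, 2, ...
y[1::2] += 50   # rows 1, 3, ...

# one offset per row, broadcast across the columns
offsets = np.where(np.arange(x.shape[0]) % 2, 50, 10)
z = x + offsets[:, None]

print(np.array_equal(y, z))  # True
```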
I am trying to solve the following problem. I have two matrices A and B and I want to create a new matrix C which consists of the rows of the matrices A and B depending on some condition which is encoded in the array v, i.e. if the i'th entry of v is a one then I want the i'th row of C to be the i'th row of B and if it is a zero then it should be the i'th row of A. I came up with the following solution
C = np.choose(v,[A.T,B.T]).T
but it is too slow. One obvious bad thing are the two transposes, but since np.choose does not take an axis argument I don't know how to get rid of them. Any ideas for a fast solution of this problem?
For example, let
A = np.arange(20).reshape([4,5])
and
B = 10 - A
Then one could imagine that one wants the matrix C to be the matrix of rows with smallest maximum norm. So we let
v = np.sum(A,axis=1)<np.sum(B,axis=1)
and then C is the matrix
C = np.choose(v,[A.T,B.T]).T
which is
array([[10, 9, 8, 7, 6],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]])
This seems like a good setup for np.where to do the choosing based on the mask/binary input data -
C = np.where(v[:,None],B,A)
The v[:,None] part extends v to a shape that is broadcastable against A and B, so that the choosing works along the appropriate axis, axis=0 in this case for the two 2D arrays.
Sample run -
In [58]: A
Out[58]:
array([[82, 78, 57],
[14, 97, 32],
[72, 11, 49],
[98, 34, 41],
[89, 71, 52],
[34, 51, 55],
[26, 92, 59]])
In [59]: B
Out[59]:
array([[55, 67, 50],
[49, 64, 21],
[34, 18, 72],
[24, 61, 65],
[56, 59, 23],
[44, 77, 13],
[56, 55, 58]])
In [62]: v
Out[62]: array([1, 0, 0, 0, 0, 1, 1])
In [63]: np.where(v[:,None],B,A)
Out[63]:
array([[55, 67, 50],
[14, 97, 32],
[72, 11, 49],
[98, 34, 41],
[89, 71, 52],
[44, 77, 13],
[56, 55, 58]])
If v doesn't strictly consist of 0s and 1s only, use v[:,None]==1 as the first argument with np.where.
Another approach would be with boolean-indexing -
C = A.copy()
mask = v==1
C[mask] = B[mask]
Note : If v is already a boolean array, skip the comparison against 1 for the mask creation.
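As a sanity check, the two approaches can be compared with the np.choose expression on the small example from the question. This is only a sketch; the variable names via_choose, via_where and via_mask are made up, and v is cast to int for np.choose to keep the index dtype explicit.

```python
import numpy as np

A = np.arange(20).reshape(4, 5)
B = 10 - A
v = A.sum(axis=1) < B.sum(axis=1)   # only the first row of B is selected

via_choose = np.choose(v.astype(int), [A.T, B.T]).T
via_where = np.where(v[:, None], B, A)

via_mask = A.copy()
via_mask[v] = B[v]

print(np.array_equal(via_choose, via_where))  # True
print(np.array_equal(via_where, via_mask))    # True
```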
Runtime test -
In [77]: A = np.random.randint(11,99,(10000,3))
In [78]: B = np.random.randint(11,99,(10000,3))
In [79]: v = np.random.rand(A.shape[0])>0.5
In [82]: def choose_rows_copy(A, B, v):
    ...:     C = A.copy()
    ...:     C[v] = B[v]
    ...:     return C
    ...:
In [83]: %timeit np.where(v[:,None],B,A)
10000 loops, best of 3: 107 µs per loop
In [84]: %timeit choose_rows_copy(A, B, v)
1000 loops, best of 3: 226 µs per loop
I have two large lists of points in 2D and I want to find their common sublists, if they have some. Both of the lists are quite large and efficiency is an issue.
t1 = [[3, 41], [5, 82], [10, 31], [11, 34], [14, 54]]
t2 = [[161, 160], [169, 260], [187, 540], [192, 10], [205, 23]]
I tried itertools like below, but I get "ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()".
for i in itertools.chain.from_iterable(t1):
    if i in t2:
        print "yes",i
I tried the first answer from here too, but I get 'numpy.int64' object is not iterable.
Also, I think this simple code would work, but it takes so much time:
intersection = [i for i in t1 if i in t2]
Any advice? Thanks.
Lists are not hashable, so we need to convert each inner list to a tuple; then we can use set intersection to find the common elements.
t1 = [[3, 41], [5, 82], [10, 31], [11, 34], [14, 54]]
t2 = [[161, 160], [169, 260], [187, 540], [192, 10], [205, 23], [3,41]]
nt1 = map(tuple, t1)
nt2 = map(tuple, t2)
st1 = set(nt1)
st2 = set(nt2)
print st1.intersection(st2)
Output
set([(3, 41)])
Since we are turning the lists into sets, we are not accounting for repetitions. Consider the following inputs:
t1 = [[3, 41], [3, 41], [5, 82], [10, 31], [11, 34], [14, 54]]
t2 = [[3,41], [3,41], [161, 160], [169, 260], [187, 540], [192, 10], [205, 23]]
Both lists contain [3,41] twice, but the previous program outputs it only once. The following program handles duplicate entries by counting them first and repeating them afterwards.
t1 = [[3, 41], [3, 41], [5, 82], [10, 31], [11, 34], [14, 54]]
t2 = [[3,41], [3,41], [161, 160], [169, 260], [187, 540], [192, 10], [205, 23]]
nt1 = map(tuple, t1)
nt2 = map(tuple, t2)
st1 = set(nt1)
st2 = set(nt2)
from collections import defaultdict
d1 = defaultdict(int)
d2 = defaultdict(int)
for i in nt1:
    d1[i] += 1  # counting element occurrence from first list
for i in nt2:
    d2[i] += 1  # counting element occurrence from second list
result_list = []
for i in st1.intersection(st2):
    min_count = min(d1[i], d2[i])  # selecting the minimum one to multiply
    result_list += map(lambda x: list(i), xrange(0, min_count))
print result_list
Output
[[3, 41], [3, 41]]
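The counting step above can also be expressed with collections.Counter, whose & operator keeps the minimum count for each key. This is an alternative sketch rather than the answer's own code, written with print() so it runs on Python 2.7 and 3.

```python
from collections import Counter

t1 = [[3, 41], [3, 41], [5, 82], [10, 31], [11, 34], [14, 54]]
t2 = [[3, 41], [3, 41], [161, 160], [169, 260], [187, 540], [192, 10], [205, 23]]

c1 = Counter(map(tuple, t1))
c2 = Counter(map(tuple, t2))

# & keeps each key with the minimum of its two counts
common = c1 & c2
result = [list(p) for p in common.elements()]
print(result)  # [[3, 41], [3, 41]]
```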
If you are really only using lists, you can convert them to sets of tuples and use set().intersection() for your case -
l1 = [[1,2],[2,3]]
l2 = [[3,4],[2,3]]
list(set(map(tuple,l1)).intersection(set(map(tuple,l2))))
>> [(2, 3)]
But with very large lists this method may be slow.
EDIT : Using map function.
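If the order of t1 (and any repeats in it) should be preserved, one option is to keep the question's list comprehension but test membership against a set of tuples instead of the list t2. A minimal sketch, with the name seen made up and [3, 41] appended to t2 so that there is something in common:

```python
t1 = [[3, 41], [5, 82], [10, 31], [11, 34], [14, 54]]
t2 = [[161, 160], [169, 260], [187, 540], [192, 10], [205, 23], [3, 41]]

# build the lookup set once; membership tests are then O(1) on average
seen = set(map(tuple, t2))
intersection = [p for p in t1 if tuple(p) in seen]
print(intersection)  # [[3, 41]]
```

The comprehension in the question scans t2 for every point of t1; here each lookup is a hash probe, so the total work is roughly proportional to len(t1) + len(t2).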