Find overlapping elements in a list of tuples? - python

From my understanding of the intersection function, it finds complete overlap between elements in a list. For example:
tup_1 = [(1,2,3),(4,5,6)]
tup_2 = [(4,5,6)]
ol_tup = set(tup_1).intersection(tup_2)
print ol_tup
would yield:
set([(4, 5, 6)])
However, suppose my list of tuples are set up as this:
tup_1 = [(1,2,3),(4,5,5)]
tup_2 = [(4,5,6)]
Here there is an overlap in 2 elements between the 2nd tuple in tup_1 and the 1st tuple in tup_2. If I want Python to return these 2 tuples, (4,5,5) and (4,5,6), is there an easier way than the nested for loop below?
for single_tuple_1 in tup_1:
    for single_tuple_2 in tup_2:
        if single_tuple_1[0] == single_tuple_2[0] and single_tuple_1[1] == single_tuple_2[1]:
            print single_tuple_1,single_tuple_2
EDIT:
For this case, suppose order matters and suppose the tuples contain 5 elements:
tup_1 = [(1,2,3,4,5),(4,5,6,7,8),(11,12,13,14,15)]
tup_2 = [(1,2,3,4,8),(4,5,1,7,8),(11,12,13,14,-5)]
And I would like to find the tuples that intersect with each other in their respective first 4 elements. So the result should be:
[(1,2,3,4,5),(1,2,3,4,8),(11,12,13,14,15),(11,12,13,14,-5)]
How would the code change to accommodate this?

If you want to return all the pairs of "overlapping" tuples there's no way around comparing all the pairs, i.e. a quadratic algorithm. But you could make the code a bit more elegant using a list comprehension, product for the combinations and zip and sum for the comparison:
>>> import itertools
>>> tup_1 = [(1,2,3),(4,5,5),(7,8,9)]
>>> tup_2 = [(4,5,6),(0,5,5),(9,8,7)]
>>> [(a, b) for (a, b) in itertools.product(tup_1, tup_2)
... if sum(1 for ai, bi in zip(a, b) if ai == bi) >= 2]
[((4, 5, 5), (4, 5, 6)), ((4, 5, 5), (0, 5, 5))]
Note: This checks whether two tuples have the same element in at least two positions, i.e. order matters. If order should not matter, you can convert a and b to sets instead and check the size of their intersection, but that can undercount when numbers repeat: the set intersection of (1,1,2) and (1,1,3) has size 1 instead of 2.
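If order should not matter but repeated values should still count, one possible workaround is a multiset intersection with collections.Counter. This is only a sketch; the helper name multiset_overlap is made up for illustration and is not part of the answer above.

```python
from collections import Counter

def multiset_overlap(a, b):
    """Count shared elements, ignoring position but respecting repeats."""
    # Counter & Counter keeps each element with the smaller of its two counts.
    return sum((Counter(a) & Counter(b)).values())

# The plain set intersection collapses the repeated 1 into a single element.
set_size = len(set((1, 1, 2)) & set((1, 1, 3)))
# The multiset intersection counts both 1s.
multiset_size = multiset_overlap((1, 1, 2), (1, 1, 3))
print(set_size, multiset_size)
```

The comprehension above could then use multiset_overlap(a, b) >= 2 as its condition.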
If you only want to match the first two, or the first two or the last two elements, you can compare slices of the tuples and combine the conditions with or:
>>> [(a, b) for (a, b) in itertools.product(tup_1, tup_2)
... if a[:2] == b[:2]]
[((4, 5, 5), (4, 5, 6))]
>>> [(a, b) for (a, b) in itertools.product(tup_1, tup_2)
... if a[:2] == b[:2] or a[-2:] == b[-2:]]
[((4, 5, 5), (4, 5, 6)), ((4, 5, 5), (0, 5, 5))]
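For the case in the question's edit (tuples that agree in their first 4 elements), the slice can also serve as a dictionary key, which avoids comparing every pair when only a fixed prefix has to match. The following is a minimal sketch under that assumption; match_by_prefix is a hypothetical helper name, not something from the answer.

```python
from collections import defaultdict

def match_by_prefix(left, right, n):
    """Return (a, b) pairs whose first n elements are equal."""
    # Index the right-hand tuples by their first n elements.
    index = defaultdict(list)
    for b in right:
        index[b[:n]].append(b)
    # Look up each left-hand tuple's prefix instead of scanning all of right.
    return [(a, b) for a in left for b in index.get(a[:n], [])]

tup_1 = [(1, 2, 3, 4, 5), (4, 5, 6, 7, 8), (11, 12, 13, 14, 15)]
tup_2 = [(1, 2, 3, 4, 8), (4, 5, 1, 7, 8), (11, 12, 13, 14, -5)]

pairs = match_by_prefix(tup_1, tup_2, 4)
# Flatten the pairs to get the list layout asked for in the question.
flat = [t for pair in pairs for t in pair]
print(flat)
```

This only helps for exact matches on a fixed slice; the "at least two equal positions" test still needs the pairwise comparison.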

This is one way using a list comprehension. The logic as written checks for an overlap of at least 2 elements.
Note that if there is no overlap, the result still contains the single element of tup_2 (it always matches itself), but that case is trivial to detect.
from itertools import chain
tup_1 = [(1,2,3),(4,5,5)]
tup_2 = [(4,5,6)]
y = sorted(tup_2[0])
# a, b are used inside sum() to avoid shadowing the outer loop variable i
res = [i for i in chain(tup_1, tup_2) if
       sum(a == b for a, b in zip(sorted(i), y)) > 1]
print res
[(4, 5, 5), (4, 5, 6)]

Related

How can we count text 'changes' in a list?

The list below has 0 'changes'. Everything is the same.
['Comcast', 'Comcast', 'Comcast', 'Comcast', 'Comcast', 'Comcast']
The list below has 3 changes. First change is Comcast>Sprint, second change is Sprint>AT&T and third change is AT&T>Comcast.
['Comcast', 'Comcast', 'Sprint', 'AT&T', 'Comcast', 'Comcast']
I Googled this before posting here. Finding unique items seems pretty easy. Finding changes, or switches, seems not so easy.
One option is to use itertools.groupby. This counts the number of "chunks" and subtracts one (to get the number of "boundaries").
from itertools import groupby
lst = ['Comcast', 'Comcast', 'Sprint', 'AT&T', 'Comcast', 'Comcast']
output = sum(1 for _ in groupby(lst)) - 1
print(output) # 3
You want to compare elements pairwise, so you can create an iterator to pair up adjacent elements:
>>> l = [ 1, 1, 2, 3, 1, 1, 1, ]
>>> list(zip(l[:-1], l[1:]))
[(1, 1), (1, 2), (2, 3), (3, 1), (1, 1), (1, 1)]
Then iterate over them and test if they're pairwise equal:
>>> [x == y for (x, y) in zip(l[0:-1], l[1:])]
[True, False, False, False, True, True]
Then count them where they are not equal:
>>> sum(1 for (x, y) in zip(l[0:-1], l[1:]) if x != y)
3
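The pairwise approach can be wrapped in a small function, sketched below. The name count_changes is an assumption for illustration, and islice is used only to avoid copying the list with a slice. Unlike the groupby version, it returns 0 rather than -1 for an empty list.

```python
from itertools import islice

def count_changes(items):
    """Count positions where an element differs from its predecessor."""
    # zip stops at the shorter input, so pairing items with
    # islice(items, 1, None) yields every adjacent pair exactly once.
    return sum(1 for prev, cur in zip(items, islice(items, 1, None)) if prev != cur)

same = ['Comcast', 'Comcast', 'Comcast', 'Comcast', 'Comcast', 'Comcast']
mixed = ['Comcast', 'Comcast', 'Sprint', 'AT&T', 'Comcast', 'Comcast']

print(count_changes(same))
print(count_changes(mixed))
```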

Is There A Universal Selector Option For if...in Clauses?

I have a "large" list of tuples:
thelist=[(1,2),(1,3),(2,3)]
I want to check whether any tuple in the list starts with a 1, and if it does, print "aaa":
templist = []
for i in thelist:
    templist.append((i[0],i))
for i in templist:
    if i[0]==1:
        print("aaa")
        break
This is rather arduous, as I have to create the templist. Is there any way I can do this:
if (1,_) in thelist:
    print("aaa")
Where _ is the universal selector. Note that the list would be very large and thus it is very costly to implement another list.
There isn't, but you can use any() with a generator expression:
any(i[0] == 1 for i in thelist)  # True if any tuple's first element is 1
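Because any() consumes a generator expression lazily, it builds no second list and stops at the first match. The sketch below makes that visible by recording which tuples were actually inspected; the helper first_elements and the seen list exist only for this demonstration.

```python
def first_elements(pairs, seen):
    """Yield each tuple's first element, recording which tuples were inspected."""
    for pair in pairs:
        seen.append(pair)
        yield pair[0]

the_list = [(2, 3), (1, 2), (1, 3), (4, 5)]
seen = []

# any() stops pulling from the generator as soon as one comparison is True.
found = any(first == 1 for first in first_elements(the_list, seen))

print(found)
print(seen)  # only the tuples up to and including the first match
```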
If you don’t actually need the actual tuple, like you do in your example, then you can actually use tuple unpacking for exactly that purpose:
>>> the_list = [(1, 2), (1, 3), (2, 3)]
>>> for x, y in the_list:
...     if x == 1:
...         print('aaa')
...         break
...
aaa
If you add a * in front of the y, you can also unpack tuples of different sizes, collecting the remainder of the tuple:
>>> other_list = [(1, 2, 3, 4, 5), (1, 3), (2, 3)]
>>> for x, *y in other_list:
...     if x == 1:
...         print(y)
...         break
...
[2, 3, 4, 5]
Otherwise, if you just want to filter your list based on some premise and then do something on those filtered items, you can use filter with a custom function:
>>> def starts_with_one(x):
...     return x[0] == 1
...
>>> the_list = [(1, 2), (1, 3), (2, 3)]
>>> for x in filter(starts_with_one, the_list):
...     print(x)
...
(1, 2)
(1, 3)
This is probably the most flexible way, and it also avoids creating a separate list in memory, as the elements are filtered lazily when you iterate over the result with your loop.
Finally, if you just want to figure out if any of your items starts with a 1, like you do in your example code, then you could just do it like this:
>>> if any(filter(starts_with_one, the_list)):
...     print('aaa')
...
aaa
But I assume that this was just an oversimplified example.

How to isolate specific rows of a Cartesian Product - Python

I've created a cartesian product and printed from the following code:
from itertools import product
A = [0, 3, 5]
B = [5, 10, 15]
C = product(A, B)
for n in C:
    print(n)
And we have a result of
(0, 5)
(0, 10)
(0, 15)
(3, 5)
(3, 10)
(3, 15)
(5, 5)
(5, 10)
(5, 15)
Is it possible to check the sum of each set within the cartesian product to a target value of 10? Can we then return or set aside the sets that match?
Result should be:
(0, 10)
(5, 5)
Finally, how can I count the frequency of numbers in my resulting subset to show that 0 appeared once, 10 appeared once, and 5 appeared twice? I would appreciate any feedback or guidance.
You can use a list comprehension like this:
C = [p for p in product(A, B) if sum(p) == 10]
And to count the frequency of the numbers in the tuples, as @Amadan suggests, you can use collections.Counter after using itertools.chain.from_iterable to chain the tuples into one sequence:
from collections import Counter
from itertools import chain
Counter(chain.from_iterable(C))
which returns, given your sample input:
Counter({5: 2, 0: 1, 10: 1})
which is an instance of a dict subclass, so you can iterate over its items for output:
for n, freq in Counter({5: 2, 0: 1, 10: 1}).items():
    print('{}: {}'.format(n, freq))
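Put together, the steps above might look like the following self-contained script. The variable names target, matches and freq are chosen here for illustration and are not from the question.

```python
from collections import Counter
from itertools import chain, product

A = [0, 3, 5]
B = [5, 10, 15]
target = 10

# Keep only the pairs of the Cartesian product whose sum hits the target.
matches = [p for p in product(A, B) if sum(p) == target]

# Flatten the matching pairs and count how often each number occurs.
freq = Counter(chain.from_iterable(matches))

print(matches)
for n, count in sorted(freq.items()):
    print('{}: {}'.format(n, count))
```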

Eliminating tuples from list of tuples based on a given criterion

So the problem is essentially this: I have a list of tuples made up of n ints, and tuples have to be eliminated if they don't fit a certain criterion. The criterion is that each element of the tuple must be less than or equal to the int at the same position in another list (let's call this list f).
So, an example:
Assuming I have a list of tuples called wk, made up of tuples of ints of length 3, and a list f composed of 3 ints. Like so:
wk = [(1,3,8),(8,9,1),(1,1,1)]
f = [2,5,8]
=== After applying the function ===
wk_result = [(1,3,8),(1,1,1)]
The rationale is that in the first tuple of wk, (1,3,8), the first element is smaller than the first element of f. The second element also complies with the rule, and so does the third. This does not hold for the second tuple, though, given that its first and second elements (8 and 9) are bigger than the first and second elements of f (2 and 5).
Here's the code I have:
for i,z in enumerate(wk):
    for j,k in enumerate(z):
        if k <= f[j]:
            pass
        else:
            del wk[i]
When I run this it is not eliminating the tuples from wk. What could I be doing wrong?
EDIT
One of the answers provided by user @James actually made it a whole lot simpler to do what I need to do:
[t for t in wk if t<=tuple(f)]
#returns:
[(1, 3, 8), (1, 1, 1)]
The thing is in my particular case it is not getting the job done, so I assume it might have to do with the previous steps of the process which I will post below:
max_i = max(f)
siz = len(f)
flist = [i for i in range(1,max_i +1)]
def cartesian_power(seq, p):
    if p == 0:
        return [()]
    else:
        result = []
        for x1 in seq:
            for x2 in cartesian_power(seq, p - 1):
                result.append((x1,) + x2)
        return result
wk = cartesian_power(flist, siz)
wk = [i for i in wk if i <= tuple(f) and max(i) == max_i]
What is happening is the following: I cannot use the itertools library to do permutations, that is why I am using a function that gets the job done. Once I produce a list of tuples (wk) with all possible permutations, I filter this list using two parameters: the one that brought me here originally and another one not relevant for the discussion.
I'll show an example of the results with numbers, given f = [2,5,8]:
[(1, 1, 8), (1, 2, 8), (1, 3, 8), (1, 4, 8), (1, 5, 8), (1, 6, 8), (1, 7, 8), (1, 8, 1), (1, 8, 2), (1, 8, 3), (1, 8, 4), (1, 8, 5), (1, 8, 6), (1, 8, 7), (1, 8, 8), (2, 1, 8), (2, 2, 8), (2, 3, 8), (2, 4, 8), (2, 5, 8)]
As you can see, there are instances where the ints in the tuple are bigger than the corresponding position in the f list, like (1,6,8) where the second position of the tuple (6) is bigger than the number in the second position of f (5).
You can use list comprehension with a (short-circuiting) predicate over each tuple zipped with the list f.
wk = [(1, 3, 8), (8, 9, 1), (1, 1, 1), (1, 9, 1)]
f = [2, 5, 8] # In this contrived example, f could preferably be a 3-tuple as well.
filtered = [t for t in wk if all(a <= b for (a, b) in zip(t, f))]
print(filtered) # [(1, 3, 8), (1, 1, 1)]
Here, all() has been used to specify a predicate that all tuple members must be less or equal to the corresponding element in the list f; all() will short-circuit its testing of a tuple as soon as one of its members does not pass the tuple member/list member <= sub-predicate.
Note that I added a (1, 9, 1) tuple for an example where the first tuple element passes the sub-predicate (<= corresponding element in f) whereas the 2nd tuple element does not (9 > 5).
You can do this with a list comprehension. It iterates over the list of tuples and checks that all of the elements of the tuple are less than or equal to the corresponding elements in f. Note that comparing tuples directly with <= is lexicographic rather than element-wise, so the check has to be made per element:
[t for t in wk if all(x<=y for x,y in zip(t,f))]
# returns:
[(1, 3, 8), (1, 1, 1)]
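The difference between the two comparisons can be seen on (1, 6, 8), one of the tuples the question reports as slipping through. This is a small illustrative sketch; the variable names are chosen here.

```python
f = [2, 5, 8]
t = (1, 6, 8)

# Tuple comparison is lexicographic: it is decided at the first position
# where the tuples differ (1 < 2), and the later elements are never compared.
lexicographic = t <= tuple(f)

# The element-wise test checks every position, so 6 <= 5 makes it fail.
elementwise = all(a <= b for a, b in zip(t, f))

print(lexicographic, elementwise)
```

This is why filtering with t <= tuple(f) keeps tuples such as (1, 6, 8), while the all(...) version removes them.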
Here is a recursive solution without an explicit loop, which compares each element of the tuple:
wk_1 = [(1,3,8),(8,9,1),(1,1,1)]
f = [2,5,8]
final_input=[]
def comparison(wk, target):
    if not wk:
        return 0
    else:
        data=wk[0]
        if data[0]<=target[0] and data[1]<=target[1] and data[2]<=target[2]:
            final_input.append(data)
        comparison(wk[1:],target)
comparison(wk_1,f)
print(final_input)
output:
[(1, 3, 8), (1, 1, 1)]
P.S.: I don't know whether you want "less than or equal" or strictly "less than", so modify the condition according to your need.

Python: Compare two lists and create a dictionary

In Python, I have a list of pairs (A) and a list of integers (B). A and B always have the same length. I want to know of a fast way of finding all the elements (pairs) of A that correspond to the same value in B (by comparison of indices of A and B) and then store the values in a dictionary (C) (the keys of the dictionary would correspond to elements of B). As an example, if
A = [(0, 0), (0, 1), (0, 3), (0, 6), (0, 7), (1, 3), (1, 7)]
B = [ 2, 5, 5, 1, 5, 4, 1 ]
then
C = {1: [(0,6),(1,7)], 2: [(0,0)], 4: [(1,3)], 5: [(0,1), (0,3), (0,7)]}
Presently, I am trying this approach:
C = {}
for a, b in zip(A, B):
    C.setdefault(b, [])
    C[b].append(a)
While this approach gives me the desired result, I would like some approach which will be way faster (since I need to work with big datasets). I will be thankful if anyone can suggest a fast way to implement this (i.e. find the dictionary C once one is in knowledge of lists A and B).
I would have suggested
C2 = {}
for i in range(0, len(B)):
    C2.setdefault(B[i], [])
    C2[B[i]].append(A[i])
This avoids the zip(A, B) step.
import collections
C = collections.defaultdict(list)
for ind, key in enumerate(B):
    C[key].append(A[ind])
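As a runnable sketch with the question's sample data, the defaultdict approach can be combined with zip so that no index lookups are needed. Whether this is measurably faster on large inputs is not claimed here and would need timing on the real data.

```python
from collections import defaultdict

A = [(0, 0), (0, 1), (0, 3), (0, 6), (0, 7), (1, 3), (1, 7)]
B = [2, 5, 5, 1, 5, 4, 1]

# defaultdict creates the empty list on first access, so no setdefault call
# is needed per element.
C = defaultdict(list)
for pair, key in zip(A, B):
    C[key].append(pair)

# Convert back to a plain dict if later code should not auto-create keys.
C = dict(C)
print(C)
```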
