Longest continuous pairs - python

In a sorted list of pairs, the 1st pair's 1st element should be less than the 2nd pair's 1st element, and the same must hold for the 2nd elements individually.
xlist = [(3, 9), (4, 6), (5, 7), (6, 0)]  # sorted by first element of pair
ylist = [(j, i) for i, j in sorted((y, x) for x, y in xlist)]  # sorted by second element of pair: [(6, 0), (4, 6), (5, 7), (3, 9)]
What I want is to find the longest run of pairs that is continuous, i.e. (4, 6), (5, 7).
PS. There can be other continuous runs like that, but is there a way to extract the longest one?
(4, 6), (5, 7) is determined to be the longest run because the next pair's 1st element (5) is greater than the current one (4), and the next pair's 2nd element (7) is greater than the current one (6) (basically 5 > 4 and 7 > 6). Now let's add another element to that list, say (8, 10); it is added to the output sequence as well, since 8 > 5 and 10 > 7. So the longest run becomes (4, 6), (5, 7), (8, 10).

If you mean the longest common contiguous run (block) of the two lists, here is code using difflib that does what you want. I don't know exactly how SequenceMatcher is implemented, but it seems quite optimized, since it avoids explicit for-loops over the whole lists:
from difflib import SequenceMatcher
xlist = [(3, 9), (4, 6), (5, 7), (6, 0), (7, 8)]
ylist = [(6, 0), (4, 6), (5, 7), (7, 8), (3, 9)]
out = SequenceMatcher(None, xlist, ylist).get_matching_blocks()
max_block = max(out, key=lambda x: x.size)
start, end = max_block.a, max_block.a + max_block.size
out = xlist[start:end]
print(out) # [(4, 6), (5, 7)]
If you mean the longest increasing subsequence along the second coordinate of xlist (same as before, but allowing "skips" in the sequence), you can go with:
from typing import List, Tuple

xlist = [(3, 9), (4, 6), (5, 7), (6, 0), (7, 8)]

def find_lis_2nd_coord(pairs: List[Tuple]) -> List[Tuple]:
    """Find longest increasing subsequence (LIS).
    LIS is determined along 2nd coordinate of input pairs.
    """
    if not pairs:
        return []
    # lis[i] stores the longest increasing subsequence of sublist
    # `pairs[0…i][1]` that ends with `pairs[i][1]`
    lis = [[] for _ in range(len(pairs))]
    # lis[0] denotes the longest increasing subsequence ending at `pairs[0][1]`
    lis[0].append(pairs[0])
    # Start from the second element in the list
    for i in range(1, len(pairs)):
        # Do for each element in sublist `pairs[0…i-1][1]`
        for j in range(i):
            # Find the longest increasing subsequence that ends with
            # `pairs[j][1]` where it is less than the current element
            # `pairs[i][1]`
            if pairs[j][1] < pairs[i][1] and len(lis[j]) > len(lis[i]):
                lis[i] = lis[j].copy()
        # include `pairs[i]` in `lis[i]`
        lis[i].append(pairs[i])
    return max(lis, key=len)

print(find_lis_2nd_coord(xlist))  # [(4, 6), (5, 7), (7, 8)]
Disclaimer: this version is O(n^2), but I didn't find a more optimized idea or implementation. At least it works.
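If the O(n^2) cost matters, the standard patience-sorting technique (binary search over the smallest tail value for each run length, using the standard-library bisect module) finds a longest strictly increasing subsequence in O(n log n). This is a sketch, not part of the answer above: `find_lis_2nd_coord_fast` is a hypothetical name, and like the original it only compares second coordinates, assuming the list is already sorted by first element. When several subsequences share the maximal length, it may return a different one than the O(n^2) version.

```python
from bisect import bisect_left
from typing import List, Tuple


def find_lis_2nd_coord_fast(pairs: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Longest strictly increasing subsequence along the 2nd coordinate, O(n log n)."""
    tail_vals: List[int] = []  # tail_vals[k]: smallest 2nd coord ending a run of length k + 1
    tail_idx: List[int] = []   # tail_idx[k]: index in `pairs` of that tail element
    prev = [-1] * len(pairs)   # back-pointers used to rebuild the subsequence
    for i, (_, y) in enumerate(pairs):
        # bisect_left gives strict increase: y replaces the first tail >= y
        k = bisect_left(tail_vals, y)
        if k > 0:
            prev[i] = tail_idx[k - 1]
        if k == len(tail_vals):
            tail_vals.append(y)
            tail_idx.append(i)
        else:
            tail_vals[k] = y
            tail_idx[k] = i
    # Follow the back-pointers from the tail of the longest run
    result = []
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:
        result.append(pairs[i])
        i = prev[i]
    return result[::-1]


xlist = [(3, 9), (4, 6), (5, 7), (6, 0), (7, 8)]
print(find_lis_2nd_coord_fast(xlist))  # [(4, 6), (5, 7), (7, 8)]
```

With the question's list plus (8, 10), i.e. [(3, 9), (4, 6), (5, 7), (6, 0), (8, 10)], it returns [(4, 6), (5, 7), (8, 10)].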

Related

Merge/Fuse a list of tuples in a certain way

Given a list of tuples that represent edges:
edges = [(2, 4), (3, 4), (6, 8), (6, 9), (7, 10), (11, 13)]
I want to merge or blend those edges to get a list of merged tuples, for example (2, 4), (3, 4) will be merged into (2, 4).
The final output of the list above should look like:
[(2, 4), (6, 10), (11, 13)]
My idea is to use a double for loop to iterate over the list, find intersections, and substitute the 2 edges with (min(e1[0], e2[0]), max(e1[1], e2[1])), but this method won't work for all cases.
Any good thoughts?
Here's my solution:
edges = [(2, 4), (3, 4), (6, 8), (6, 9), (7, 10), (11, 13)]
edges = sorted(edges, key=lambda x: (x[0], -x[1]))
fused = []
i = 0
while i < len(edges):
    start, end = edges[i]
    j = i + 1
    # Absorb every following range that starts before (or exactly at) the current end
    while j < len(edges) and edges[j][0] <= end:
        # edges[j] is included in the fused range
        # Update end to the greater value
        end = max(edges[j][1], end)
        j += 1
    fused.append((start, end))
    # Drop the ranges just fused; edges[i] is now the next unfused range
    del edges[i:j]
print(fused)  # [(2, 4), (6, 10), (11, 13)]
Explanation:
The logic works as follows: we sort the list in ascending order of the start values. If two ranges have the same start value, we arrange them in descending order of their end values. This way, two ranges with the same start value will be 'eaten up' by the range with the farther end value.
Now that the list is sorted this way, there's a nice little property: starting from the first range, you only need to decide whether to fuse it with the next range. If you do, update the end of the current range so it also covers the 'fusable' range. If you don't, everything from the first range up to this point is fused into one range and added to the new list.
edges = sorted(edges, key=lambda x:(x[0], -x[1]))
Sorts edges in ascending order of start values and, for equal start values, in descending order of end values.
del edges[i:j]
Deletes all the fused ranges from the original list. This is important because i always points to the new range that we'll start fusing from.
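Because `del edges[i:j]` shifts every remaining element, each deletion costs O(n). A variant that only compares each range with the last fused one avoids the deletions, so the total cost is dominated by the O(n log n) sort. With this formulation a plain sort is enough, because taking `max` of the ends makes the descending-end tie-break unnecessary. This is a sketch with a hypothetical function name, `fuse_ranges`; like the loop above, it treats touching ranges such as (2, 4) and (4, 6) as overlapping.

```python
from typing import List, Tuple


def fuse_ranges(edges: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping or touching (start, end) ranges."""
    fused: List[Tuple[int, int]] = []
    for start, end in sorted(edges):
        if fused and start <= fused[-1][1]:
            # Overlaps (or touches) the last fused range: extend its end if needed
            fused[-1] = (fused[-1][0], max(fused[-1][1], end))
        else:
            fused.append((start, end))
    return fused


print(fuse_ranges([(2, 4), (3, 4), (6, 8), (6, 9), (7, 10), (11, 13)]))
# [(2, 4), (6, 10), (11, 13)]
```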

How to remove duplicate from list of tuple when order is important

I have seen some similar answers, but I can't find something specific for this case.
I have a list of tuples:
[(5, 0), (3, 1), (3, 2), (5, 3), (6, 4)]
What I want is to remove a tuple from this list only when its first element has already occurred in the list, and the tuple that remains should be the one with the smallest second element.
So the output should look like this:
[(5, 0), (3, 1), (6, 4)]
Here's a linear time approach that requires two iterations over your original list.
t = [(5, 0), (3, 1), (3, 2), (5, 3), (6, 4)]  # test case 1
# t = [(5, 3), (3, 1), (3, 2), (5, 0), (6, 4)]  # test case 2
smallest = {}
inf = float('inf')
for first, second in t:
    if smallest.get(first, inf) > second:
        smallest[first] = second
result = []
seen = set()
for first, second in t:
    if first not in seen and second == smallest[first]:
        seen.add(first)
        result.append((first, second))
print(result)  # [(5, 0), (3, 1), (6, 4)] for test case 1
               # [(3, 1), (5, 0), (6, 4)] for test case 2
Here is a compact version I came up with, using OrderedDict and skipping the replacement if the new value is larger than the old one.
from collections import OrderedDict

a = [(5, 3), (3, 1), (3, 2), (5, 0), (6, 4)]
d = OrderedDict()
for item in a:
    # Get old value in dictionary if exist
    old = d.get(item[0])
    # Skip if new item is larger than old
    if old:
        if item[1] > old[1]:
            continue
        #else:
        #    del d[item[0]]
    # Assign
    d[item[0]] = item
list(d.values())
Returns:
[(5, 0), (3, 1), (6, 4)]
Or if you use the else-statement (commented out):
[(3, 1), (5, 0), (6, 4)]
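Since Python 3.7, a regular dict is guaranteed to preserve insertion order, so the same idea also works without OrderedDict. In this sketch (`dedupe_keep_min` is a hypothetical name), updating an existing key keeps its original position, so the output follows the order in which each first element first appears, like the version above without the else branch.

```python
from typing import Dict, List, Tuple


def dedupe_keep_min(pairs: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Keep one tuple per first element, with the smallest second element."""
    best: Dict[int, int] = {}  # insertion-ordered in Python 3.7+
    for first, second in pairs:
        if first not in best or second < best[first]:
            # Updating an existing key does not move it, so first-appearance order is kept
            best[first] = second
    return list(best.items())


print(dedupe_keep_min([(5, 3), (3, 1), (3, 2), (5, 0), (6, 4)]))  # [(5, 0), (3, 1), (6, 4)]
```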
Seems to me that you need to know two things:
1. The tuple that has the smallest second element for each first element.
2. The order in which each first element should appear in the new list.
We can get #1 by using itertools.groupby and a min function.
import itertools
import operator

lst = [(3, 1), (5, 3), (5, 0), (3, 2), (6, 4)]
# I changed this slightly to make it harder to accidentally succeed.
# correct final order should be [(3, 1), (5, 0), (6, 4)]
tmplst = sorted(lst, key=operator.itemgetter(0))
groups = itertools.groupby(tmplst, operator.itemgetter(0))
# group by first element, in this case this looks like:
# [(3, [(3, 1), (3, 2)]), (5, [(5, 3), (5, 0)]), (6, [(6, 4)])]
# note that groupby only groups consecutive items with equal keys, so we need to sort first
min_tuples = {min(v, key=operator.itemgetter(1)) for _, v in groups}
# give the best possible result for each first tuple. In this case:
# {(3, 1), (5, 0), (6, 4)}
# (note that this is a set comprehension for faster lookups later)
Now that we know what our result set looks like, we can re-tackle lst to get them in the right order.
seen = set()
result = []
for el in lst:
    if el not in min_tuples:  # don't add to result
        continue
    elif el not in seen:  # add to result and mark as seen
        result.append(el)
        seen.add(el)
print(result)  # [(3, 1), (5, 0), (6, 4)]
This will do what you need:
# I switched (5, 3) and (5, 0) to demonstrate sorting capabilities.
list_a = [(5, 3), (3, 1), (3, 2), (5, 0), (6, 4)]
# Create a list to contain the results
list_b = []
# Create a list to check for duplicates
l = []
# Sort list_a by the second element of each tuple to ensure the smallest numbers
# (so the result follows the order of the second elements, not the original positions)
list_a.sort(key=lambda i: i[1])
# Iterate through every tuple in list_a
for i in list_a:
    # Check if the 0th element of the tuple is in the duplicates list; if not:
    if i[0] not in l:
        # Add the tuple the loop is currently on to the results; and
        list_b.append(i)
        # Add the 0th element of the tuple to the duplicates list
        l.append(i[0])
print(list_b)  # [(5, 0), (3, 1), (6, 4)]
Hope this helped!
Using enumerate() and list comprehension:
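def remove_if_first_index(l):
    # Keeps the first tuple seen for each first element
    # (not necessarily the one with the smallest second element)
    return [item for index, item in enumerate(l) if item[0] not in [value[0] for value in l[0:index]]]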
Using enumerate() and a for loop:
def remove_if_first_index(l):
    # Keeps the first tuple seen for each first element
    # (not necessarily the one with the smallest second element)
    # The list to store the return value
    ret = []
    # Get each index and item from the list passed
    for index, item in enumerate(l):
        # Get the first number in each tuple up to the index we're currently at
        previous_values = [value[0] for value in l[0:index]]
        # If the item's first number is not in the list of previously encountered first numbers
        if item[0] not in previous_values:
            # Append it to the return list
            ret.append(item)
    return ret
Testing
some_list = [(5, 0), (3, 1), (3, 2), (5, 3), (6, 4)]
print(remove_if_first_index(some_list))
# [(5, 0), (3, 1), (6, 4)]
I had this idea without seeing Anton vBR's answer.
import collections

inp = [(5, 0), (3, 1), (3, 2), (5, 3), (6, 4)]
od = collections.OrderedDict()
for i1, i2 in inp:
    if i2 <= od.get(i1, i2):
        od.pop(i1, None)
        od[i1] = i2
outp = list(od.items())
print(outp)  # [(5, 0), (3, 1), (6, 4)]

Generate ordered tuples of infinite sequences

I have two generators genA and genB and each of them generates an infinite, strictly monotonically increasing sequence of integers.
Now I need a generator that generates all tuples (a, b) such that a is produced by genA and b is produced by genB and a < b, ordered by a + b ascending. In case of ambiguity the ordering is of no importance, i.e. if a + b == c + d, it doesn't matter if it generates (a, b) first or (c, d) first.
For instance, if both genA and genB generate the prime numbers, then the new generator should generate:
(2, 3), (2, 5), (3, 5), (2, 7), (3, 7), (5, 7), (2, 11), ...
If genA and genB were finite lists, building all pairs (a, b) with a < b and then sorting them by a + b would do the trick.
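For finite inputs, that brute-force idea can be sketched with itertools.product: build every pair, keep those with a < b, and sort by a + b. `ordered_pairs_finite` is a hypothetical helper, useful mainly as a reference to check a lazy generator against; it cannot work on infinite generators, because it has to consume all values first.

```python
from itertools import product
from typing import Iterable, List, Tuple


def ordered_pairs_finite(a_values: Iterable[int], b_values: Iterable[int]) -> List[Tuple[int, int]]:
    """Brute force for finite inputs: all (a, b) with a < b, sorted by a + b."""
    pairs = [(a, b) for a, b in product(a_values, b_values) if a < b]
    # sorted() is stable, so pairs with equal sums keep their product() order
    return sorted(pairs, key=lambda p: p[0] + p[1])


primes = [2, 3, 5, 7]
print(ordered_pairs_finite(primes, primes))
# [(2, 3), (2, 5), (3, 5), (2, 7), (3, 7), (5, 7)]
```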
Apparently, for all tuples of the form (x, b) the following holds: first(genA) <= x <= max(genA, b) < b, where first(genA) is the first element generated by genA and max(genA, b) is the last element generated by genA that is less than b.
This is how far I have gotten. Any ideas of how to combine two generators in the described manner?
I don't think it is possible to do this without saving all the results from genA. A solution might look something like this:
import heapq

def gen_weird_sequence(genA, genB):
    heap = []
    a0 = next_a = next(genA)
    saved_a = []
    for b in genB:
        # Assumes genA keeps producing values (it is infinite in the question); in
        # Python 3.7+ a StopIteration raised here would become a RuntimeError (PEP 479)
        while next_a < b:
            saved_a.append(next_a)
            next_a = next(genA)
        # saved_a now contains all a < b
        for a in saved_a:
            heapq.heappush(heap, (a + b, a, b))  # decorate pair with sorting key a+b
        # (minimum sum in the next round) > b + a0, so yield everything up to b + a0
        while heap and heap[0][0] <= b + a0:
            yield heapq.heappop(heap)[1:]  # pop smallest and undecorate
Explanation: The main loop simply iterates over all elements of genB, fetches every element from genA that is smaller than b, and saves them in a list. It then generates all the tuples (a0, b), (a1, b), ..., (a_n, b) and stores them in a min-heap, which is an efficient data structure when you only need to extract the minimum value of a collection. As with sorting, instead of storing the pairs themselves, you can decorate them by prepending the value you want to sort on (a+b), since tuple comparison starts with the first item. Finally, it pops off the heap all elements whose sum is guaranteed to be smaller than the sum of any pair generated for the next b, and yields them.
Note that both heap and saved_a will grow while you are generating results, I guess proportionally to the square root of the number of elements generated so far.
Quick test with some primes:
In [2]: genA = (a for a in [2, 3, 5, 7, 11, 13, 17, 19])
In [3]: genB = (b for b in [2, 3, 5, 7, 11, 13, 17, 19])
In [4]: for pair in gen_weird_sequence(genA, genB): print(pair)
(2, 3)
(2, 5)
(3, 5)
(2, 7)
(3, 7)
(5, 7)
(2, 11)
(3, 11)
(2, 13)
(3, 13)
(5, 11)
(5, 13)
(7, 11)
(2, 17)
(3, 17)
(7, 13)
As expected. Test with infinite generators:
In [11]: from itertools import count, islice
In [12]: list(islice(gen_weird_sequence(count(), count()), 16))
Out[12]: [(0, 1), (0, 2), (0, 3), (1, 2), (0, 4), (1, 3), (0, 5), (1, 4),
(2, 3), (0, 6), (1, 5), (2, 4), (0, 7), (1, 6), (2, 5), (3, 4)]

How to generate list of tuples relating records

I need to generate a list from the list of tuples:
a = [(1, 2), (1, 3), (2, 3), (2, 5), (2, 6), (3, 4), (3, 6), (4, 7), (5, 6), (5, 9), (5, 10), (6, 7),
     (6, 10), (6, 11), (7, 8), (7, 12), (8, 12), (9, 10), (10, 11)]
The rule is:
- Start from any record (begin = random.choice(a))
- Items from the new list must have the following relationship:
the last item of each tuple in the list must be equal to the first item of the next tuple to be inserted.
Example of a valid output (starting with the tuple (3, 1)):
[(3, 1), (1, 2), (2, 3), (3, 4), (4, 7), (7, 8), (8, 12), (12, 7), (7, 6), (6, 2), (2, 5), (5, 6), (6, 10), (10, 5), (5, 9), (9, 10), (10, 11), (11, 6), (6, 3)]
How can I do this? Can it be done using list comprehensions?
Thanks!
Here, lisb will be populated with tuples in the order that you seek. This assumes, of course, that lisa provides appropriate tuples (i.e., each tuple's second value, at index 1, matches another tuple's first value, at index 0). Your sample list will not work, regardless of the implementation, because not all the values match up (for example, no tuple has 12 as its first element, so a tuple ending in 12 can't be connected forward to any other tuple)...so you should come up with a better sample list.
Tested, working.
import random

lisa = [(1, 2), (3, 4), (2, 3), (4, 0), (0, 9), (9, 1)]
lisb = []
current = random.choice(lisa)
while True:
    lisa.remove(current)
    lisb.append(current)
    # First remaining tuple whose 0th value matches the current tuple's 1st value
    current = next((y for y in lisa if y[0] == current[1]), None)
    if current is None:
        break
print(lisb)
If you don't want to delete items from lisa, just slice a new list.
As a generator function:
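import random

def chained_tuples(x):
    oldlist = x[:]  # work on a copy so the caller's list is untouched
    item = random.choice(oldlist)
    oldlist.remove(item)
    yield item
    while oldlist:
        item = next((next_item for next_item in oldlist if next_item[0] == item[1]), None)
        if item is None:
            return  # chain is broken; stop instead of leaking StopIteration (PEP 479)
        oldlist.remove(item)
        yield item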
As noted, you'll get an incomplete response if your list isn't actually chainable all the way through, like your example list.
Just to add another way of solving this problem:
import random
from collections import defaultdict

lisa = [(1, 2), (3, 4), (2, 3), (4, 0), (0, 9), (9, 1)]
current_start, current_end = random.choice(lisa)
# Map each start value to the end values that are still unused
starts = defaultdict(list)
for start, end in lisa:
    starts[start].append(end)
# The starting tuple is already in the chain, so mark it as used
starts[current_start].remove(current_end)
lisb = [(current_start, current_end)]
while starts[current_end]:
    current_start, current_end = current_end, starts[current_end].pop()
    lisb.append((current_start, current_end))
print(lisb)
Note: You have to make sure lisa is not empty.
I think all of the answers so far are missing the requirement (at least based on your example output) that the longest chain be found.
My suggested solution is to recursively parse all possible chains that can be constructed, and return the longest result. The function looks like this:
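def generateTuples(pairs, offset, value=None):
    # Use the tuple at `offset` as the chain head unless `value` overrides it
    if value is None:
        value = pairs[offset]
    # Remove the head so it cannot be reused further down the chain
    pairs = pairs[:offset] + pairs[offset + 1:]
    res = []
    for i, (a, b) in enumerate(pairs):
        # A tuple can follow `value` in either orientation
        if value[1] in (a, b):
            if value[1] == a:
                subres = generateTuples(pairs, i, (a, b))
            else:
                subres = generateTuples(pairs, i, (b, a))
            # Keep the longest chain found so far
            if len(subres) > len(res):
                res = subres
    return [value] + res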
And you would call it like this:
results = generateTuples(a, 1, (3,1))
Producing the list:
[(3, 1), (1, 2), (2, 3), (3, 4), (4, 7), (7, 8), (8, 12), (12, 7), (7, 6),
(6, 2), (2, 5), (5, 6), (6, 10), (10, 5), (5, 9), (9, 10), (10, 11),
(11, 6), (6, 3)]
The first parameter of the function is the source list of tuples, the second parameter is the offset of the first element to use, the third parameter is optional, but allows you to override the value of the first element. The latter is useful when you want to start with a tuple in its reversed order as you have done in your example.
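Whichever approach is used, the chaining rule from the question (the last item of each tuple equals the first item of the next one) can be checked with a short helper, which is handy for testing any of the answers above. `is_valid_chain` is a hypothetical name; it only checks adjacency, not which tuples were used or whether the chain is the longest possible.

```python
from typing import List, Tuple


def is_valid_chain(chain: List[Tuple[int, int]]) -> bool:
    """True if each tuple's last item equals the next tuple's first item."""
    # zip pairs every tuple with its successor
    return all(prev[1] == nxt[0] for prev, nxt in zip(chain, chain[1:]))


example = [(3, 1), (1, 2), (2, 3), (3, 4), (4, 7), (7, 8), (8, 12), (12, 7), (7, 6),
           (6, 2), (2, 5), (5, 6), (6, 10), (10, 5), (5, 9), (9, 10), (10, 11),
           (11, 6), (6, 3)]
print(is_valid_chain(example))           # True
print(is_valid_chain([(1, 2), (3, 4)]))  # False
```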

Collapse a list of range tuples into the overlapping ranges

I'm looking for the most memory efficient way to solve this problem.
I have a list of tuples representing partial string matches in a sentence:
[(0, 2), (1, 2), (0, 4), (2,6), (23, 2), (22, 6), (26, 2), (26, 2), (26, 2)]
The first value of each tuple is the start position for the match, the second value is the length.
The idea is to collapse the list so that only the longest continuous string match is reported. In this case it would be:
[(0,4), (2,6), (22,6)]
I do not want just the longest range, as in "algorithm to find longest non-overlapping sequences"; I want all the ranges collapsed into the longest ones.
In case you're wondering, I am using a pure Python implementation of Aho-Corasick for matching terms in a static dictionary to the given text snippet.
EDIT: Due to the nature of these tuple lists, overlapping but not self-contained ranges should be printed out individually. For example, having the words betaz and zeta in the dictionary, the matches for betazeta are [(0,5),(4,4)]. Since these ranges overlap, but neither is contained in the other, the answer should be [(0,5),(4,4)]. I have also modified the input dataset above so that this case is covered.
Thanks!
import operator

lst = [(0, 2), (1, 2), (0, 4), (2, 6), (23, 2), (22, 6), (26, 2), (26, 2), (26, 2)]
lst.sort(key=operator.itemgetter(1))  # sort by length
# Walk backwards; drop a range if a later (equal or longer) range contains it
for i in reversed(range(len(lst) - 1)):
    start, length = lst[i]
    for j in range(i + 1, len(lst)):
        lstart, llength = lst[j]
        if start >= lstart and start + length <= lstart + llength:
            del lst[i]
            break
print(lst)
#[(0, 4), (2, 6), (22, 6)]
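The nested loop above is O(n^2). If that matters, one O(n log n) way to meet the EDIT requirement (drop ranges contained in another range, keep ranges that merely overlap) is to convert each match to a (start, end) span, sort by start ascending and end descending, and keep a span only if its end reaches past every end seen so far. This is a sketch, not the answer's code; `drop_contained` is a hypothetical name, and it also drops exact duplicates.

```python
from typing import List, Tuple


def drop_contained(matches: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Keep (start, length) matches that are not contained in another match."""
    # Convert to (start, end) spans with an exclusive end; the set drops exact duplicates.
    # Sort by start ascending and, for equal starts, by end descending, so any span
    # that could contain the current one has already been seen.
    spans = sorted({(s, s + n) for s, n in matches}, key=lambda r: (r[0], -r[1]))
    kept = []
    max_end = float("-inf")
    for start, end in spans:
        if end > max_end:  # reaches past every earlier span, so nothing contains it
            kept.append((start, end - start))
            max_end = end
    return kept


lst = [(0, 2), (1, 2), (0, 4), (2, 6), (23, 2), (22, 6), (26, 2), (26, 2), (26, 2)]
print(drop_contained(lst))  # [(0, 4), (2, 6), (22, 6)]
```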
a = [(0, 2), (1, 2), (0, 4), (23, 2), (22, 6), (26, 2), (26, 2), (26, 2)]
b = [set(range(i, i + j)) for i, j in a]
c = b.pop().union(*b)
collapsed = sorted(c)
print(collapsed)
# Maybe this is useful?:
# [0, 1, 2, 3, 22, 23, 24, 25, 26, 27]
# But if you want the requested format, then do this:
d = []
start = collapsed[0]
length = 0
for val in collapsed:
    if start + length < val:
        # gap found: close the current run and start a new one at val
        d.append((start, length))
        start = val
        length = 0
    length += 1
d.append((start, length))  # close the last run
print(d)
# Output:
# [(0, 4), (22, 6)]
So, taking you at your word that your main interest is space efficiency, here's one way to do what you want:
lst = [(0, 2), (1, 2), (0, 4), (23, 2), (22, 6), (26, 2), (26, 2), (26, 2)]
lst.sort()
start, length = lst.pop(0)
i = 0
while i < len(lst):
    x, l = lst[i]
    if start + length < x:
        lst[i] = (start, length)
        i += 1
        start, length = x, l
    else:
        length = max(length, x + l - start)
        lst.pop(i)
lst.append((start, length))
This modifies the list in place, never makes the list longer, only uses a small handful of variables to keep state, and needs only one pass through the list.
A much faster algorithm is possible if you don't need to modify the list in place: popping an item from the middle of a list is O(n), so on a long list with many merges the repeated pops make this loop quadratic.
One reasonable optimization would be to keep a list of which indices you're going to remove, and then come back and rebuild the list in a second pass, that way you could rebuild the whole list in one go and avoid the pop overhead. But that would use more memory!
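As a sketch of the non-in-place alternative mentioned above (not the answerer's code; `merge_matches` is a hypothetical name), building a new list avoids the O(n) cost of `pop(i)`, so after the sort the loop is linear. Like the in-place version, it treats ranges that merely touch (one ends where the next starts) as a single range. The trade-off is a second list, which is exactly the extra memory the answer warns about.

```python
from typing import List, Tuple


def merge_matches(matches: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping or touching (start, length) ranges into a new list."""
    merged: List[List[int]] = []  # [start, end] pairs, end exclusive
    for start, length in sorted(matches):
        end = start + length
        if merged and start <= merged[-1][1]:
            # overlaps or touches the last merged range: extend it
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e - s) for s, e in merged]


lst = [(0, 2), (1, 2), (0, 4), (23, 2), (22, 6), (26, 2), (26, 2), (26, 2)]
print(merge_matches(lst))  # [(0, 4), (22, 6)]
```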
