Generate ordered tuples of infinite sequences - python

I have two generators genA and genB and each of them generates an infinite, strictly monotonically increasing sequence of integers.
Now I need a generator that generates all tuples (a, b) such that a is produced by genA and b is produced by genB and a < b, ordered by a + b ascending. In case of ambiguity the ordering is of no importance, i.e. if a + b == c + d, it doesn't matter if it generates (a, b) first or (c, d) first.
For instance, if both genA and genB generate the prime numbers, then the new generator should generate:
(2, 3), (2, 5), (3, 5), (2, 7), (3, 7), (5, 7), (2, 11), ...
If genA and genB were finite lists, building all pairs and then sorting them would do the trick.
Apparently, for all tuples of the form (x, b) the following holds: first(genA) <= x <= max(genA,b) <= b, where first(genA) is the first element generated by genA and max(genA,b) is the last element generated by genA which is less than b.
This is how far I have gotten. Any ideas of how to combine two generators in the described manner?

I don't think it is possible to do this without saving all the results from genA. A solution might look something like this:
    import heapq

    def gen_weird_sequence(genA, genB):
        heap = []
        a0 = next_a = next(genA)
        saved_a = []
        for b in genB:
            while next_a < b:
                saved_a.append(next_a)
                next_a = next(genA)
            # saved_a now contains all a < b
            for a in saved_a:
                heapq.heappush(heap, (a+b, a, b))  # decorate pair with sorting key a+b
            # (minimum sum in the next round) > b + a0, so yield everything smaller
            while heap and heap[0][0] <= b + a0:
                yield heapq.heappop(heap)[1:]  # pop smallest and undecorate
Explanation: The main loop simply iterates over all elements of genB; for each b it fetches all elements from genA that are smaller than b and saves them in a list. It then generates all the tuples (a0, b), (a1, b), ..., (a_n, b) and stores them in a min-heap, which is an efficient data structure when you are only interested in extracting the minimum value of a collection. As with sorting, the trick is not to store the pairs themselves but to prepend the value you want to sort on (a+b), since comparisons between tuples start by comparing the first item. Finally, it pops and yields every element whose sum is guaranteed to be smaller than the sum of any pair generated for the next b.
Note that both heap and saved_a will increase while you are generating results, I guess proportionally to the square root of the number of elements generated so far.
Quick test with some primes:
In [2]: genA = (a for a in [2,3,5,7,11,13,17,19])
In [3]: genB = (b for b in [2,3,5,7,11,13,17,19])
In [4]: for pair in gen_weird_sequence(genA, genB): print(pair)
(2, 3)
(2, 5)
(3, 5)
(2, 7)
(3, 7)
(5, 7)
(2, 11)
(3, 11)
(2, 13)
(3, 13)
(5, 11)
(5, 13)
(7, 11)
(2, 17)
(3, 17)
(7, 13)
as expected. Test with infinite generators:
In [11]: from itertools import *
In [12]: list(islice(gen_weird_sequence(count(), count()), 16))
Out[12]: [(0, 1), (0, 2), (0, 3), (1, 2), (0, 4), (1, 3), (0, 5), (1, 4),
(2, 3), (0, 6), (1, 5), (2, 4), (0, 7), (1, 6), (2, 5), (3, 4)]
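The remark in the question about finite lists suggests a simple sanity check: compare a prefix of the generator's output with a brute-force sort over a finite range. The sketch below repeats the generator from the answer so it runs on its own; brute_force_pairs and the limit of 60 are assumptions made for this illustration. Only the sums are compared, because the order among pairs with equal sums is allowed to differ.

```python
import heapq
from itertools import count, islice

def gen_weird_sequence(genA, genB):
    """Same generator as in the answer, repeated so the sketch is self-contained."""
    heap = []
    a0 = next_a = next(genA)
    saved_a = []
    for b in genB:
        while next_a < b:
            saved_a.append(next_a)
            next_a = next(genA)
        for a in saved_a:
            heapq.heappush(heap, (a + b, a, b))
        while heap and heap[0][0] <= b + a0:
            yield heapq.heappop(heap)[1:]

def brute_force_pairs(limit):
    """Hypothetical helper: all (a, b) with a < b from range(limit), by ascending sum."""
    pairs = [(a, b) for a in range(limit) for b in range(limit) if a < b]
    return sorted(pairs, key=sum)

n = 200
got = list(islice(gen_weird_sequence(count(), count()), n))
expected = brute_force_pairs(60)[:n]  # 60 is large enough to cover the first 200 pairs

# Ties may be ordered differently, so compare the sequence of sums only.
sums_match = [a + b for a, b in got] == [a + b for a, b in expected]
print(sums_match)
```

A check like this only covers the prefix that was sampled; it is not a proof that the generator is correct for arbitrary inputs.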

Related

Longest continuous pairs

In a sorted list of pairs, the first pair's 1st element should be less than the second pair's 1st element, and the same should hold for the 2nd elements.
xlist = [(3, 9), (4, 6), (5, 7), (6, 0)] # sorted by first element of pair
ylist = [(j, i) for i, j in sorted((y, x) for x, y in xlist)] # sorted by second element of pair: [(6, 0), (4, 6), (5, 7), (3, 9)]
What I want is to find the longest run of pairs that is continuous, i.e. (4, 6), (5, 7)
PS. there can be other continuous pairs like that, but is there a way to extract the longest continuous pairs?
(4, 6), (5, 7) is determined as the longest run based on the fact that the next pair's 1st element (5) is greater than the current one (4), and the next pair's 2nd element (7) is greater than the current one (6) (basically 5 > 4 and 7 > 6). And let's add another element to that list, say (8, 10); this is added to the output sequence as well, as 8 > 5 and 10 > 7. So the longest pairs become (4, 6), (5, 7), (8, 10)
If you mean the maximal common subsequence of the two lists, here is some code using difflib that does what you want. I don't know the exact implementation of SequenceMatcher, but it seems quite optimized, as it avoids explicit for-loops over the whole lists:
from difflib import SequenceMatcher
xlist = [(3, 9), (4, 6), (5, 7), (6, 0), (7, 8)]
ylist = [(6, 0), (4, 6), (5, 7), (7, 8), (3, 9)]
out = SequenceMatcher(None, xlist, ylist).get_matching_blocks()
max_block = max(out, key=lambda x: x.size)
start, end = max_block.a, max_block.a + max_block.size
out = xlist[start:end]
print(out) # [(4, 6), (5, 7)]
If you mean the longest increasing sequence of the second coordinate in xlist (same as previously but allowing "skips" in sequence), you can go with:
    from typing import List, Tuple

    xlist = [(3, 9), (4, 6), (5, 7), (6, 0), (7, 8)]

    def find_lis_2nd_coord(pairs: List[Tuple]) -> List[Tuple]:
        """Find longest increasing subsequence (LIS).
        LIS is determined along 2nd coordinate of input pairs.
        """
        # lis[i] stores the longest increasing subsequence of sublist
        # `pairs[0…i][1]` that ends with `pairs[i][1]`
        lis = [[] for _ in range(len(pairs))]
        # lis[0] denotes the longest increasing subsequence ending at `pairs[0][1]`
        lis[0].append(pairs[0])
        # Start from the second element in the list
        for i in range(1, len(pairs)):
            # Do for each element in sublist `pairs[0…i-1][1]`
            for j in range(i):
                # Find the longest increasing subsequence that ends with
                # `pairs[j][1]` where it is less than the current element
                # `pairs[i][1]`
                if pairs[j][1] < pairs[i][1] and len(lis[j]) > len(lis[i]):
                    lis[i] = lis[j].copy()
            # include `pairs[i]` in `lis[i]`
            lis[i].append(pairs[i])
        return max(lis, key=len)

    print(find_lis_2nd_coord(xlist)) # [(4, 6), (5, 7), (7, 8)]
Disclaimer: this version is O(n^2), and I did not find a more optimized idea or implementation. At least it works.
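For what it's worth, the quadratic loop can be swapped for the well-known patience-sorting approach with binary search, which runs in O(n log n). This is a different technique from the answer's code, shown only as a sketch; the function name lis_by_second and its helper lists are made up for this example. When several longest subsequences exist it may pick a different one than the O(n^2) version.

```python
from bisect import bisect_left

def lis_by_second(pairs):
    """Longest strictly increasing subsequence along the 2nd coordinate."""
    tail_keys = []   # tail_keys[k]: smallest 2nd coordinate ending a run of length k + 1
    tail_idx = []    # tail_idx[k]: index in `pairs` of that tail element
    prev = [None] * len(pairs)  # predecessor index used to rebuild the answer
    for i, (_, y) in enumerate(pairs):
        k = bisect_left(tail_keys, y)  # bisect_left keeps the increase strict
        if k == len(tail_keys):
            tail_keys.append(y)
            tail_idx.append(i)
        else:
            tail_keys[k] = y
            tail_idx[k] = i
        prev[i] = tail_idx[k - 1] if k > 0 else None
    # Walk the predecessor links back from the tail of the longest run.
    result = []
    i = tail_idx[-1] if tail_idx else None
    while i is not None:
        result.append(pairs[i])
        i = prev[i]
    return result[::-1]

xlist = [(3, 9), (4, 6), (5, 7), (6, 0), (7, 8)]
print(lis_by_second(xlist))
```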

How do I impose a condition that must be satisfied by *any two* members of a set?

I want to use python to define one set in terms of another, as follows: For some set N that consists of sets, define C as the set such that an element n of N is in C just in case any two elements of n both satisfy some specific condition.
Here is the particular problem I need to solve. Consider the power set N of the set of ordered pairs of elements in x={1,2,3,4,5,6}, and the following subsets of N:
i = {{1,1},{2,2},{3,4},{4,3},{5,6},{6,5}}
j = {{3,3},{4,4},{1,2},{2,1},{5,6},{6,5}}
k = {{5,5},{6,6},{1,2},{2,1},{3,4},{4,3}}
Using python, I want to define a special class of subsets of N: the subsets of N such that any two of their members are both either in i, j, or k.
More explicitly, in set terms, I want to define: C = {n in N| for all a, b in n, either a and b are both in i or a and b are both in j or a and b are both in k}.
I'm attaching what I tried to do in Python. But this doesn't give me the right result: the set C I'm defining here is not such that any two of its members are both either in i, j, or k.
Any leads would be much appreciated!
    from itertools import chain, combinations

    def powerset(iterable):
        s = list(iterable)
        return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

    x = [1,2,3,4,5,6]
    ordered_pairs = [[j,k] for j in x for k in x if k>=j]
    powers = list(powerset(ordered_pairs))
    i = [[1,1],[2,2],[3,4],[4,3],[5,6],[6,5]]
    j = [[3,3],[4,4],[1,2],[2,1],[5,6],[6,5]]
    k = [[5,5],[6,6],[1,2],[2,1],[3,4],[4,3]]
    M = [i,j,k]
    C = []
    for n in powers:
        for a in n:
            for b in n:
                for m in M:
                    if a in m:
                        if b in m:
                            if a != b:
                                C.append(n)
        if len(n) == 1:
            C.append(n)
First of all, note that the ordered pairs you list are sets, not pairs. Use tuples, since they're hashable, and you'll be able to easily generate the power set using itertools. With that done, you have an easier time identifying the qualifying subsets.
The code below implements that much of the process. You can accumulate the hits at the HIT line of code. Even better, you can collapse the loop into a nested comprehension using any to iterate over the three zone sets.
    test_list = [
        set(((1,1),(2,2))), # trivial hit on i
        set(), # trivial miss
        set(((1, 1), (4, 4), (6, 6))), # one element in each target set
        set(((3, 3), (6, 2), (4, 4), (2, 2))), # two elements in j
    ]
    i = set(((1,1),(2,2),(3,4),(4,3),(5,6),(6,5)))
    j = set(((3,3),(4,4),(1,2),(2,1),(5,6),(6,5)))
    k = set(((5,5),(6,6),(1,2),(2,1),(3,4),(4,3)))
    zone = [i, j, k]
    for candidate in test_list:
        for target in zone:
            overlap = candidate.intersection(target)
            if len(overlap) >= 2:
                print("HIT", candidate, target)
                break
        else:
            print("MISS", candidate)
Output:
HIT {(1, 1), (2, 2)} {(5, 6), (4, 3), (2, 2), (3, 4), (1, 1), (6, 5)}
MISS set()
MISS {(4, 4), (1, 1), (6, 6)}
HIT {(6, 2), (3, 3), (4, 4), (2, 2)} {(1, 2), (3, 3), (5, 6), (4, 4), (2, 1), (6, 5)}
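The comprehension mentioned in the answer could look roughly like the sketch below. It shows two variants: hits uses the same criterion as the loop in the answer (at least two members of the candidate fall in one zone), while pairwise_ok is a hypothetical helper that follows the question's definition literally (every two distinct members share some zone). Under the literal definition the empty set and singletons qualify vacuously.

```python
from itertools import combinations

i = {(1, 1), (2, 2), (3, 4), (4, 3), (5, 6), (6, 5)}
j = {(3, 3), (4, 4), (1, 2), (2, 1), (5, 6), (6, 5)}
k = {(5, 5), (6, 6), (1, 2), (2, 1), (3, 4), (4, 3)}
zone = [i, j, k]

test_list = [
    {(1, 1), (2, 2)},
    set(),
    {(1, 1), (4, 4), (6, 6)},
    {(3, 3), (6, 2), (4, 4), (2, 2)},
]

# Same criterion as the loop in the answer: two or more members in one zone.
hits = [c for c in test_list if any(len(c & target) >= 2 for target in zone)]

def pairwise_ok(candidate):
    """Literal reading of the question: every two distinct members share a zone."""
    return all(
        any(a in target and b in target for target in zone)
        for a, b in combinations(candidate, 2)
    )

strict = [c for c in test_list if pairwise_ok(c)]
print(hits)
print(strict)
```

The two criteria differ on the last candidate: it has two members in j, so it is a hit, but (6, 2) is in no zone, so it fails the pairwise test.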

Remove element from itertools.combinations while iterating?

Given a list l and all combinations of the list elements is it possible to remove any combination containing x while iterating over all combinations, so that you never consider a combination containing x during the iteration after it is removed?
for a, b in itertools.combinations(l, 2):
if some_function(a,b):
remove_any_tup_with_a_or_b(a, b)
My list l is pretty big so I don't want to keep the combinations in memory.
A cheap trick is to filter with a disjointness test against a dynamically updated set of excluded values. It does not avoid generating the combinations you wish to exclude, so it is not a major performance benefit. Still, filtering with a C built-in such as isdisjoint is typically faster than Python-level if checks with continue statements, because the filter work is pushed to the C layer:
    from future_builtins import filter  # Py2 only; omit on Py3, where filter is already lazy
    import itertools

    blacklist = set()
    for a, b in filter(blacklist.isdisjoint, itertools.combinations(l, 2)):
        if some_function(a,b):
            blacklist.update((a, b))
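If skipping the generation itself matters, one option is to replace itertools.combinations with explicit nested index loops, so that removed elements are never paired again. This is only a sketch: scan_pairs and its return values are invented for the illustration, and some_function is the predicate from the question. Memory use stays proportional to the number of removed indices, not to the number of combinations.

```python
def scan_pairs(items, some_function):
    """Visit pairs in combinations order, never pairing an already removed element."""
    removed = set()   # indices of elements excluded so far
    visited = []      # pairs actually examined, kept here only for demonstration
    for i in range(len(items)):
        if i in removed:
            continue  # every pair starting with items[i] is skipped at once
        for j in range(i + 1, len(items)):
            if j in removed:
                continue
            a, b = items[i], items[j]
            visited.append((a, b))
            if some_function(a, b):
                removed.update((i, j))
                break  # a is gone, so stop pairing it with later elements
    return visited, removed

visited, removed = scan_pairs([1, 2, 3, 4], lambda a, b: a + b == 3)
print(visited)
print(removed)
```

In this small run, (1, 2) triggers the removal, so no further pair containing 1 or 2 is ever built.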
If you want to remove all tuples containing the number x from the list of combinations itertools.combinations(l, 2), consider that there is a one-to-one mapping (mathematically speaking) from the set itertools.combinations([i for i in range(1, len(l))], 2) to the combinations in itertools.combinations(l, 2) that don't contain the number x.
Example:
The set of all of combinations from itertools.combinations([1,2,3,4], 2) that don't contain the number 1 is given by [(2, 3), (2, 4), (3, 4)]. Notice that the number of elements in this list is equal to the number of elements of combinations in the list itertools.combinations([1,2,3], 2)=[(1, 2), (1, 3), (2, 3)].
Since order doesn't matter in combinations, you can map 1 to 4 in [(1, 2), (1, 3), (2, 3)] to get [(1, 2), (1, 3), (2, 3)]=[(4, 2), (4, 3), (2, 3)]=[(2, 4), (3, 4), (2, 3)]=[(2, 3), (2, 4), (3, 4)].
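In code, the simplest way to use this observation is to drop x from the list before building the combinations, as in the sketch below. It assumes x is known before the iteration starts, which is a narrower case than removing elements discovered during the loop.

```python
import itertools

l = [1, 2, 3, 4]
x = 1

# Combinations of the list without x are exactly the combinations
# of the full list that do not contain x.
remaining = [value for value in l if value != x]
pairs = list(itertools.combinations(remaining, 2))
print(pairs)
```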

How to generate list of tuples relating records

I need to generate a list from the list of tuples:
a = [(1,2), (1,3), (2,3), (2,5), (2,6), (3,4), (3,6), (4,7), (5,6), (5,9), (5,10), (6,7),
(6,10), (6,11), (7,8), (7,12), (8,12), (9,10), (10,11)]
The rule is:
- I start from any record (begin = random.choice(a))
- Items in the new list must have the following relationship:
the last item of each tuple in the list must be equal to the first item of the next tuple to be inserted.
Example of a valid output (starting with the tuple (3, 1)):
[(3, 1), (1, 2), (2, 3), (3, 4), (4, 7), (7, 8), (8, 12), (12, 7), (7, 6), (6, 2), (2, 5), (5, 6), (6, 10), (10, 5), (5, 9), (9, 10), (10, 11), (11, 6), (6, 3)]
How can I do this? Can it be done using list comprehensions?
Thanks!
Here, lisb will be populated with tuples in the order that you seek. This is, of course, if lisa provides appropriate tuples (i.e., each tuple's value at index 1 matches another tuple's value at index 0). Your sample list will not work, regardless of the implementation, because not all the values match up (for example, no tuple has 12 at index 0, so that tuple can't be connected forward to any other tuple), so you should come up with a better sample list.
Tested, working.
    import random

    lisa = [(1, 2), (3, 4), (2, 3), (4, 0), (0, 9), (9, 1)]
    lisb = []
    current = random.choice(lisa)
    while True:
        lisa.remove(current)
        lisb.append(current)
        # next tuple whose first item equals the current tuple's last item
        current = next((y for y in lisa if y[0] == current[1]), None)
        if current is None:
            break
    print(lisb)
If you don't want to delete items from lisa, just slice a new list.
As a generator function:
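    import random

    def chained_tuples(x):
        oldlist = x[::]  # work on a copy so the caller's list is untouched
        item = random.choice(oldlist)
        oldlist.remove(item)
        yield item
        while oldlist:
            item = next((next_item for next_item in oldlist if next_item[0] == item[1]), None)
            if item is None:
                return  # chain is broken; stop instead of raising inside the generator
            oldlist.remove(item)
            yield item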
As noted, you'll get an incomplete response if your list isn't actually chainable all the way through, like your example list.
Just to add another way of solving this problem:
    import random
    from collections import defaultdict

    lisa = [(1, 2), (3, 4), (2, 3), (4, 0), (0, 9), (9, 1)]
    current_start, current_end = lisa[random.randint(0, len(lisa) - 1)]
    starts = defaultdict(list)
    lisb = [(current_start, current_end)]
    for start, end in lisa:
        starts[start].append(end)
    while True:
        if not starts[current_end]:
            break
        current_start, current_end = current_end, starts[current_end].pop()
        lisb.append((current_start, current_end))
Note: You have to make sure lisa is not empty.
I think all of the answers so far are missing the requirement (at least based on your example output) that the longest chain be found.
My suggested solution is to recursively parse all possible chains that can be constructed, and return the longest result. The function looks like this:
    def generateTuples(tuples, offset, value=None):
        if value is None:
            value = tuples[offset]
        # remove the tuple being used from the candidates
        tuples = tuples[:offset] + tuples[offset+1:]
        res = []
        for i, (a, b) in enumerate(tuples):
            if value[1] in (a, b):
                if value[1] == a:
                    subres = generateTuples(tuples, i, (a, b))
                else:
                    subres = generateTuples(tuples, i, (b, a))
                if len(subres) > len(res):
                    res = subres
        return [value] + res
And you would call it like this:
results = generateTuples(a, 1, (3,1))
Producing the list:
[(3, 1), (1, 2), (2, 3), (3, 4), (4, 7), (7, 8), (8, 12), (12, 7), (7, 6),
(6, 2), (2, 5), (5, 6), (6, 10), (10, 5), (5, 9), (9, 10), (10, 11),
(11, 6), (6, 3)]
The first parameter of the function is the source list of tuples, the second parameter is the offset of the first element to use, the third parameter is optional, but allows you to override the value of the first element. The latter is useful when you want to start with a tuple in its reversed order as you have done in your example.

Collapse a list of range tuples into the overlapping ranges

I'm looking for the most memory efficient way to solve this problem.
I have a list of tuples representing partial string matches in a sentence:
[(0, 2), (1, 2), (0, 4), (2,6), (23, 2), (22, 6), (26, 2), (26, 2), (26, 2)]
The first value of each tuple is the start position for the match, the second value is the length.
The idea is to collapse the list so that only the longest contiguous string match is reported. In this case it would be:
[(0,4), (2,6), (22,6)]
I do not want just the longest range, like in algorithm to find longest non-overlapping sequences, but I want all the ranges collapsed by the longest.
In case you're wondering, I am using a pure Python implementation of Aho-Corasick for matching terms in a static dictionary to the given text snippet.
EDIT: Due to the nature of these tuple lists, overlapping but not self-contained ranges should be printed out individually. For example, having the words betaz and zeta in the dictionary, the matches for betazeta are [(0,5),(4,8)]. Since these ranges overlap, but none is contained in the other, the answer should be [(0,5),(4,8)]. I have also modified the input dataset above so that this case is covered.
Thanks!
    import operator

    lst = [(0, 2), (1, 2), (0, 4), (2,6), (23, 2), (22, 6), (26, 2), (26, 2), (26, 2)]
    lst.sort(key=operator.itemgetter(1))  # shortest matches first
    for i in reversed(range(len(lst)-1)):
        start, length = lst[i]
        for j in range(i+1, len(lst)):
            lstart, llength = lst[j]
            # drop lst[i] if it lies entirely inside a later (longer or equal) match
            if start >= lstart and start + length <= lstart + llength:
                del lst[i]
                break
    print(lst)
    #[(0, 4), (2, 6), (22, 6)]
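The same containment rule can probably also be applied with one sort and one sweep instead of the nested loops. The sketch below is an alternative, not the answer's method, and drop_contained is a name made up here. It sorts by start position, with longer matches first among equal starts, and keeps a match only if it ends beyond everything seen so far.

```python
def drop_contained(matches):
    """Remove (start, length) matches that lie entirely inside another match."""
    kept = []
    max_end = float("-inf")  # furthest end position among matches seen so far
    # Ascending start; for equal starts the longest match comes first.
    for start, length in sorted(matches, key=lambda m: (m[0], -m[1])):
        end = start + length
        if end > max_end:
            # No earlier match reaches this far, so this one is not contained.
            kept.append((start, length))
            max_end = end
    return kept

lst = [(0, 2), (1, 2), (0, 4), (2, 6), (23, 2), (22, 6), (26, 2), (26, 2), (26, 2)]
print(drop_contained(lst))
```

Note that sorted builds a copy of the input; if memory is the main concern, lst.sort with the same key followed by the sweep avoids that copy.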
    a = [(0, 2), (1, 2), (0, 4), (23, 2), (22, 6), (26, 2), (26, 2), (26, 2)]
    b = [set(range(i, i + j)) for i, j in a]
    c = b.pop().union(*b)
    collapsed = sorted(c)
    print(collapsed)
    # Maybe this is useful?:
    # [0, 1, 2, 3, 22, 23, 24, 25, 26, 27]
    # But if you want the requested format, then do this:
    d = []
    start = collapsed[0]
    length = 0
    for val in collapsed:
        if start + length < val:
            d.append((start, length))
            start = val
            length = 0
        elif val == collapsed[-1]:
            d.append((start, length + 1))
        length += 1
    print(d)
    # Output:
    # [(0, 4), (22, 6)]
So, taking you at your word that your main interest is space efficiency, here's one way to do what you want:
    lst = [(0, 2), (1, 2), (0, 4), (23, 2), (22, 6), (26, 2), (26, 2), (26, 2)]
    lst.sort()
    start, length = lst.pop(0)
    i = 0
    while i < len(lst):
        x, l = lst[i]
        if start + length < x:
            lst[i] = (start, length)
            i += 1
            start, length = x, l
        else:
            length = max(length, x + l - start)
            lst.pop(i)
    lst.append((start, length))
This modifies the list in place, never makes the list longer, only uses a small handful of variables to keep state, and only needs one pass through the list.
A much faster algorithm is possible if you don't want to modify the list in place - popping items from the middle of a list can be slow, especially if the list is long.
One reasonable optimization would be to keep a list of which indices you're going to remove, and then come back and rebuild the list in a second pass, that way you could rebuild the whole list in one go and avoid the pop overhead. But that would use more memory!
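A version that does not modify the list in place could be sketched as follows. It keeps the merging rule of the in-place code (ranges that overlap or touch are merged) but appends to a new list instead of popping from the middle; merge_ranges is a hypothetical name, and the extra list is the memory cost mentioned above.

```python
def merge_ranges(ranges):
    """Merge overlapping or touching (start, length) ranges into a new list."""
    merged = []
    for start, length in sorted(ranges):
        if merged and start <= merged[-1][0] + merged[-1][1]:
            # Overlaps or touches the previous range: extend it if needed.
            prev_start, prev_length = merged[-1]
            merged[-1] = (prev_start, max(prev_length, start + length - prev_start))
        else:
            merged.append((start, length))
    return merged

lst = [(0, 2), (1, 2), (0, 4), (23, 2), (22, 6), (26, 2), (26, 2), (26, 2)]
print(merge_ranges(lst))
```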
