Merge/Fuse a list of tuples in a certain way - python

Given a list of tuples that represent edges:
edges = [(2, 4), (3, 4), (6, 8), (6, 9), (7, 10), (11, 13)]
I want to merge or blend those edges to get a list of merged tuples, for example (2, 4), (3, 4) will be merged into (2, 4).
The final output for the list above should look like:
[(2, 4), (6, 10), (11, 13)]
My idea is to use a double for loop to iterate over the list, find intersections, and substitute the 2 edges with (min(e1[0], e2[0]), max(e1[1], e2[1])), but this method won't work for all cases.
Any good thoughts?

Here's my solution:
edges = [(2, 4), (3, 4), (6, 8), (6, 9), (7, 10), (11, 13)]
edges = sorted(edges, key=lambda x: (x[0], -x[1]))
fused = []
i = 0
while i < len(edges):
    start, end = edges[i]
    for j in range(i + 1, len(edges)):
        s, e = edges[j]
        if s <= end:
            # edges[j] is included in the fused range
            # Update end to the greater value
            end = max(e, end)
        else:
            break
    else:
        # No break: every remaining range was fused (or none were left),
        # so delete through the end of the list
        j = len(edges)
    fused.append((start, end))
    del edges[i:j]
print(fused)
Explanation:
The logic works as follows: we sort the list in ascending order of the start values. If two ranges have the same start value, we arrange them in descending order of their end elements. This way two ranges with the same start value will be 'eaten up' by the range with the farther end value.
Now that the list is sorted in this unique way, there's a nice little property here: if you start from the first range, you can decide whether or not to fuse with the next range. If you do fuse with it, update the end of the first range to cover the 'fusable' range. If you decide NOT to fuse with it, then everything from the first range up to this point is fused and added to the new list.
edges = sorted(edges, key=lambda x:(x[0], -x[1]))
Sorts edges in ascending order of the start values and descending order of end values.
del edges[i:j]
Deletes all the fused ranges from the original list. This is important because i always points to the new range that we'll start fusing from.
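The same sort-then-sweep idea can also be written as a single pass that never deletes from the list. This is only a sketch: the function name merge_ranges is made up for illustration, and it assumes that ranges sharing an endpoint, such as (2, 4) and (4, 5), should be fused, just as the `s <= end` test above does.

```python
def merge_ranges(edges):
    """Fuse overlapping (start, end) tuples; hypothetical helper name."""
    fused = []
    for start, end in sorted(edges):
        if fused and start <= fused[-1][1]:
            # Overlaps the last fused range: extend its end if needed
            last_start, last_end = fused[-1]
            fused[-1] = (last_start, max(last_end, end))
        else:
            # No overlap: start a new fused range
            fused.append((start, end))
    return fused


print(merge_ranges([(2, 4), (3, 4), (6, 8), (6, 9), (7, 10), (11, 13)]))
# [(2, 4), (6, 10), (11, 13)]
```

Because only the last fused range is ever compared, a plain ascending sort is enough here and the secondary descending sort on the end value is not required.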

Related

Longest continuous pairs

The 1st pair's 1st element should be less than the 2nd pair's 1st element, and the same should hold for the 2nd elements, in a sorted list of pairs.
xlist = [(3, 9), (4, 6), (5, 7), (6, 0)] # sorted by first element of pair
ylist = [(j,i) for i,j in (sorted([(y,x) for x,y in xlist]))] # [(6, 0), (4, 6), (5, 7), (3, 9)], sorted by second element of pair
What I want is to find the longest pair that is continuous, i.e. (4, 6), (5, 7)
PS. there can be other continuous pairs like that, but is there a way to extract the longest continuous pairs?
(4, 6), (5, 7) is determined as the longest pair because the next pair's 1st element (5) is greater than the current one (4), and the next pair's 2nd element (7) is greater than the current one (6) (basically 5 > 4 and 7 > 6). Now let's add another element to that list, say (8, 10); it is added to the output sequence as well, since 8 > 5 and 10 > 7. So the longest pairs become (4, 6), (5, 7), (8, 10).
If you mean the maximal common subsequence of the two lists, here is some code using difflib that does what you want. I don't know the exact implementation of SequenceMatcher, but it seems quite optimized as it avoids explicit for loops over the whole lists:
from difflib import SequenceMatcher
xlist = [(3, 9), (4, 6), (5, 7), (6, 0), (7, 8)]
ylist = [(6, 0), (4, 6), (5, 7), (7, 8), (3, 9)]
out = SequenceMatcher(None, xlist, ylist).get_matching_blocks()
max_block = max(out, key=lambda x: x.size)
start, end = max_block.a, max_block.a + max_block.size
out = xlist[start:end]
print(out) # [(4, 6), (5, 7)]
If you mean the longest increasing sequence of the second coordinate in xlist (same as previously but allowing "skips" in sequence), you can go with:
from typing import List, Tuple
xlist = [(3, 9), (4, 6), (5, 7), (6, 0), (7, 8)]
def find_lis_2nd_coord(pairs: List[Tuple]) -> List[Tuple]:
    """Find longest increasing subsequence (LIS).
    LIS is determined along 2nd coordinate of input pairs.
    """
    # Nothing to search in an empty list
    if not pairs:
        return []
    # lis[i] stores the longest increasing subsequence of sublist
    # `pairs[0…i][1]` that ends with `pairs[i][1]`
    lis = [[] for _ in range(len(pairs))]
    # lis[0] denotes the longest increasing subsequence ending at `pairs[0][1]`
    lis[0].append(pairs[0])
    # Start from the second element in the list
    for i in range(1, len(pairs)):
        # Do for each element in sublist `pairs[0…i-1][1]`
        for j in range(i):
            # Find the longest increasing subsequence that ends with
            # `pairs[j][1]` where it is less than the current element
            # `pairs[i][1]`
            if pairs[j][1] < pairs[i][1] and len(lis[j]) > len(lis[i]):
                lis[i] = lis[j].copy()
        # include `pairs[i]` in `lis[i]`
        lis[i].append(pairs[i])
    return max(lis, key=len)
print(find_lis_2nd_coord(xlist)) # [(4, 6), (5, 7), (7, 8)]
Disclaimer: this version is O(n^2), but I didn't find a more optimized idea or implementation. At least it works.
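For what it's worth, the longest increasing subsequence can also be found in O(n log n) with a binary search over the current "tails" (the technique often called patience sorting). The sketch below is not part of the answer above: the function name lis_2nd_coord_fast is made up, it assumes every element is a 2-tuple, and when several subsequences share the maximal length it may return a different one than the O(n^2) version.

```python
from bisect import bisect_left
from typing import List, Tuple


def lis_2nd_coord_fast(pairs: List[Tuple]) -> List[Tuple]:
    """Strictly increasing subsequence of maximal length along pairs[i][1]."""
    tail_values = []  # tail_values[k]: smallest tail of an increasing run of length k + 1
    tail_index = []   # index in `pairs` of that tail
    prev = [-1] * len(pairs)  # predecessor links used to rebuild the answer
    for i, (_, value) in enumerate(pairs):
        # bisect_left keeps the subsequence strictly increasing
        k = bisect_left(tail_values, value)
        if k == len(tail_values):
            tail_values.append(value)
            tail_index.append(i)
        else:
            tail_values[k] = value
            tail_index[k] = i
        prev[i] = tail_index[k - 1] if k > 0 else -1
    # Walk the predecessor links back from the tail of the longest run
    result = []
    i = tail_index[-1] if tail_index else -1
    while i != -1:
        result.append(pairs[i])
        i = prev[i]
    return result[::-1]


xlist = [(3, 9), (4, 6), (5, 7), (6, 0), (7, 8)]
print(lis_2nd_coord_fast(xlist))  # [(4, 6), (5, 7), (7, 8)]
```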

Need clarification on Sort by Frequency of second element in Tuple List program

from collections import Counter
test_list = [(6, 5), (2, 7), (2, 5), (8, 7), (9, 8), (3, 7)]
freq_2ndEle=Counter(val for key,val in test_list)
res=sorted(test_list,key=lambda ele:freq_2ndEle[ele[1]],reverse=True)
print(res)
Input : test_list = [(6, 5), (1, 7), (2, 5), (8, 7), (9, 8), (3, 7)]
Output : [(1, 7), (8, 7), (3, 7), (6, 5), (2, 5), (9, 8)]
Explanation : 7 occurs 3 times as 2nd element, hence all tuples with 7, are aligned first.
please clarify how the code is working especially, this part
res=sorted(test_list,key=lambda ele:freq_2ndEle[ele[1]],reverse=True)
I have confusion on ele:freq_2ndEle[ele[1]].
Here is an explanation - in the future, you should try following similar steps, including reading the documentation:
Counter takes an iterable or a map as an argument. In your case, val for key,val in test_list is an iterable. You fetch values from test_list and feed them to Counter.
You don't need the key, val semantics, it is confusing in this context, as it suggests you are looping through a dictionary. Instead, you are looping through a list of tuples so freq_2ndEle=Counter(tp[1] for tp in test_list) is much clearer - here you access the second tuple element, indexed with 1.
Counter gives you number of occurrences of each of the second tuple elements. If you print freq_2ndEle, you will see this:
Counter({7: 3, 5: 2, 8: 1}), which maps each second element to the number of times it appears in the list.
In the last step you're sorting the original list by the frequency of the second element using sorted,
res=sorted(test_list,key=lambda ele:freq_2ndEle[ele[1]],reverse=True)
So you pass test_list as the argument to sorted, and then you specify the key by which you want to sort: in your case the key is the number of times the second tuple element occurred.
freq_2ndEle stores key-value pairs of the form second element name: number of times it occurred in test_list. It is a dictionary in a way (Counter is a dict subclass), so you access it as you would access a dictionary, that is, you get the value that corresponds to ele[1], which is the (name) of the second tuple element. Name is not the proper term, but I thought it may be clearer. The value you fetch with freq_2ndEle[ele[1]] is exactly the number of times ele[1] occurred in test_list.
Lastly, you sort the keys, but in reverse order - that is, descending, highest to lowest, [(2, 7), (8, 7), (3, 7), (6, 5), (2, 5), (9, 8)] with the values that have the same keys (like 7 and 5) grouped together. Note, according to the documentation sorted is stable, meaning it will preserve the order of elements from input, and this is why when the keys are the same, you get them in the order as in test_list i.e. (2,7) goes first and (3,7) last in the "7" group.
freq_2ndEle is a dictionary that contains the second elements of the tuple as keys, and their frequencies as values. Passing this frequency as a return value of lambda in the key argument of the function sorted will sort the list by this return value of lambda (which is the frequency).
If your question is about how lambda works, you can refer to this brief explanation which is pretty simple.
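To make the explanations above concrete, here is a small sketch that prints the intermediate sort keys. The named function sort_key is introduced only for illustration; it does the same job as the lambda in the question.

```python
from collections import Counter

test_list = [(6, 5), (2, 7), (2, 5), (8, 7), (9, 8), (3, 7)]
freq = Counter(tp[1] for tp in test_list)


def sort_key(ele):
    # Same job as `lambda ele: freq_2ndEle[ele[1]]`:
    # look up how often this tuple's second element occurs
    return freq[ele[1]]


# The key computed for each tuple, in input order
keys = [sort_key(ele) for ele in test_list]
print(keys)  # [2, 3, 2, 3, 1, 3]

# sorted() orders the tuples by those keys, highest first;
# tuples with equal keys keep their input order (stable sort)
res = sorted(test_list, key=sort_key, reverse=True)
print(res)  # [(2, 7), (8, 7), (3, 7), (6, 5), (2, 5), (9, 8)]
```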

Python 3 - Reducing a list by cyclical reconstructing a given set

I am looking for an algorithm that cyclically reduces a list of tuples by reconstructing a given set as a pattern.
Each tuple contains an id and a set, like (1, {'xy'}).
Example
query = {'xyz'}
my_dict = [(1, {'x'}), (2, {'yx'}), (3, {'yz'}),
           (4, {'z'}), (5, {'x'}), (6, {'y'}),
           (7, {'xyz'}), (8, {'xy'}), (9, {'x'}),]
The goal is to recreate the pattern xyz as often as possible given the second value of the tuples in my_dict. Remaining elements from which the query set can not be completely reconstructed shall be cut off, hence 'reduce'.
my_dict contains in total: 6 times x, 5 times y, 3 times z.
Considering the my_dict, valid solutions would be for example:
result_1 = [(7, {'xyz'}), (8, {'xy'}), (4, {'z'}), (1, {'x'}), (3, {'yz'})]
result_2 = [(7, {'xyz'}), (2, {'yx'}), (4, {'z'}), (1, {'x'}), (3, {'yz'})]
result_3 = [(7, {'xyz'}), (9, {'x'}), (6, {'y'}), (4, {'z'}), (1, {'x'}), (3, {'yz'})]
The order of the tuples in the list is NOT important; I sorted them in the order of the query pattern xyz for the purpose of illustration.
Goal
The goal is to have a list of tuples in which the total number of occurrences of the elements from the query set is as evenly distributed as possible.
result_1, result_2 and result_3 all contain in total: 3 times x, 3 times y, 3 times z.
Does anyone know a way/ approach how to do this?
Thanks for your help!
Depending on your application context, a naive brute-force approach might be enough: using the powerset function from this SO answer,
def find_solutions(query, supply):
    for subset in powerset(supply):
        if is_solution(query, subset):
            yield subset
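You would need to implement a function is_solution(query, subset) that returns True when the given subset of the supply is a valid solution for the given query. Note that my_dict in the question is a list of tuples rather than a dict, so the supply would be my_dict itself (or the sets taken from it), not my_dict.values().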
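A minimal sketch of how the pieces could fit together is shown below. Several things here are assumptions rather than part of the answer: powerset follows the itertools recipe, letter_counts is a made-up helper, and is_solution encodes one possible reading of the question (every letter of the query occurs equally often, and no other letter occurs). The search visits 2^n subsets, so it is only practical for small inputs.

```python
from collections import Counter
from itertools import chain, combinations


def powerset(iterable):
    """All subsets of `iterable`, following the itertools recipe."""
    items = list(iterable)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))


def letter_counts(subset):
    """Count every character over all (id, {string}) tuples in `subset`."""
    counts = Counter()
    for _id, strings in subset:
        for s in strings:
            counts.update(s)
    return counts


def is_solution(query, subset):
    """Assumed rule: only query letters occur, and all equally often."""
    wanted = set("".join(query))
    counts = letter_counts(subset)
    per_letter = {counts[ch] for ch in wanted}
    return set(counts) == wanted and len(per_letter) == 1


def find_solutions(query, supply):
    for subset in powerset(supply):
        if is_solution(query, subset):
            yield subset


query = {'xyz'}
my_dict = [(1, {'x'}), (2, {'yx'}), (3, {'yz'}),
           (4, {'z'}), (5, {'x'}), (6, {'y'}),
           (7, {'xyz'}), (8, {'xy'}), (9, {'x'})]

# Keep the solution that rebuilds the pattern most often
best = max(find_solutions(query, my_dict),
           key=lambda s: sum(letter_counts(s).values()))
print(sorted(letter_counts(best).items()))  # [('x', 3), ('y', 3), ('z', 3)]
```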

How to select increasing elements of a list of tuples?

I have the following list of tuples:
a = [(1, 2), (2, 4), (3, 1), (4, 4), (5, 2), (6, 8), (7, -1)]
I would like to select the elements whose second value in the tuple is greater than that of the previous one. For example, I would select (2, 4) because 4 is greater than 2; in the same manner I would select (4, 4) and (6, 8).
Can this be done in a more elegant way than a loop starting explicitly on the second element ?
To clarify, I want to select the tuples whose second element is greater than the second element of the prior tuple.
>>> [right for left, right in pairwise(a) if right[1] > left[1]]
[(2, 4), (4, 4), (6, 8)]
Where pairwise is an itertools recipe you can find here.
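For reference, the recipe looks roughly like the sketch below. From Python 3.10 on, the standard library also ships itertools.pairwise, so the hand-written version is only needed on older interpreters.

```python
from itertools import tee


def pairwise(iterable):
    """s -> (s0, s1), (s1, s2), (s2, s3), ..."""
    first, second = tee(iterable)
    next(second, None)  # advance the second iterator by one element
    return zip(first, second)


a = [(1, 2), (2, 4), (3, 1), (4, 4), (5, 2), (6, 8), (7, -1)]
selected = [right for left, right in pairwise(a) if right[1] > left[1]]
print(selected)  # [(2, 4), (4, 4), (6, 8)]
```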
You can use a list comprehension to do this fairly easily:
a = [a[i] for i in range(1, len(a)) if a[i][1] > a[i-1][1]]
This uses range(1, len(a)) to start from the second element in the list, then compares the second value in each tuple with the second value from the preceding tuple to determine whether it should be in the new list.
Alternatively, you could use zip to generate pairs of neighbouring tuples:
a = [two for one, two in zip(a, a[1:]) if two[1] > one[1]]
You can use enumerate to derive indices and then make list comprehension:
a = [t for i, t in enumerate(a[1:]) if t[1] > a[i][1]]
You can use list comprehension
[i for i in a if (i[0] < i[1])]
Returns
[(1, 2), (2, 4), (6, 8)]
Edit: I was incorrect in my understanding of the question. The above code will return all tuples in which the second element is greater than the first. This is not the answer to the question OP asked.

Sort dict of tuples by either tuple value with customised comparator

I'm working on some Python dicts of tuples. Each tuple contains 2 ints. The first number in the tuple is referred to as value and the second number is referred to as work. I have 3 different comparators and I need to sort the dicts into descending order. This order should be determined by which comparator is called, i.e. the dict can be sorted 3 different ways. I've tried as many different ways as I could find to get this to work. I can do it without using the comparator by just breaking it up into a list and sorting by slicing the tuples, but if anyone can shed some light on the syntax to sort using the comparators it would be greatly appreciated. Mine seems to be returning correctly for cmpWork but the other 2 aren't reversed.
Also it would be great if I could get the dict sorted by the tuple values.
I got a sort working with
sortedSubjects = sorted(tmpSubjects.iteritems(), key = operator.itemgetter(1), reverse = True)
but this doesn't let me slice the tuples.
First time posting noob so apologies for any mistakes.
def cmpValue(subInfo1, subInfo2):
    return cmp(subInfo2[0] , subInfo1[0])
def cmpWork(subInfo1, subInfo2):
    return cmp(subInfo1[1] , subInfo2[1])
def cmpRatio(subInfo1, subInfo2):
    return cmp((float(subInfo2[0]) / subInfo2[1]) , (float(subInfo1[0]) / subInfo1[1]))
def greedyAdvisor(subjects, comparator):
    tmpSubjects = subjects.copy()
    sortedSubjects = sorted(tmpSubjects.values(), comparator, reverse = True)
    print sortedSubjects
smallCatalog = {'6.00': (16, 8),'1.00': (7, 7),'6.01': (5, 3),'15.01': (9, 6)}
greedyAdvisor(smallCatalog, cmpRatio)
greedyAdvisor(smallCatalog, cmpValue)
greedyAdvisor(smallCatalog, cmpWork)
[(7, 7), (9, 6), (5, 3), (16, 8)]
[(5, 3), (7, 7), (9, 6), (16, 8)]
[(16, 8), (7, 7), (9, 6), (5, 3)]
ps
The line
sortedSubjects = sorted(tmpSubjects.iteritems(), key = operator.itemgetter(1), reverse = True)
returns
[('6.00', (16, 8)), ('15.01', (9, 6)), ('1.00', (7, 7)), ('6.01', (5, 3))]
which is almost exactly what I'm looking for except that I can't sort by the second value in the tuple and I can't sort by cmpRatio either.
but this doesn't let me slice the tuples
Starting with your example:
sortedSubjects = sorted(tmpSubjects.iteritems(),
                        key=operator.itemgetter(1),
                        cmp=comparator, # What about specifying the comparison?
                        reverse=True)
If you need a sorted dictionary, use collections.OrderedDict.
E.g., sort by 1st element of tuple
OrderedDict(sorted(smallCatalog.items(), key=lambda e:e[1][0]))
Out[109]: OrderedDict([('6.01', (5, 3)), ('1.00', (7, 7)), ('15.01', (9, 6)), ('6.00', (16, 8))])
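The code in the question is Python 2. In Python 3, the built-in cmp(), the cmp argument of sorted and dict.iteritems() no longer exist, but functools.cmp_to_key can wrap an old-style comparator. The sketch below is one possible port, not the asker's or answerers' code: the snake_case names are renamed for illustration and cmp is a small stand-in. Note that cmpValue and cmpRatio in the question already order from highest to lowest, so combining them with reverse=True flips them to ascending, which matches the output the asker saw; the sketch therefore drops reverse=True.

```python
from functools import cmp_to_key


def cmp(a, b):
    """Stand-in for the Python 2 built-in cmp()."""
    return (a > b) - (a < b)


def cmp_value(sub1, sub2):
    return cmp(sub2[0], sub1[0])  # higher value first


def cmp_work(sub1, sub2):
    return cmp(sub1[1], sub2[1])  # lower work first


def cmp_ratio(sub1, sub2):
    return cmp(sub2[0] / sub2[1], sub1[0] / sub1[1])  # higher value/work ratio first


def greedy_advisor(subjects, comparator):
    # Compare the (value, work) tuples but keep the (name, tuple) items
    key = cmp_to_key(lambda a, b: comparator(a[1], b[1]))
    return sorted(subjects.items(), key=key)


small_catalog = {'6.00': (16, 8), '1.00': (7, 7), '6.01': (5, 3), '15.01': (9, 6)}
print(greedy_advisor(small_catalog, cmp_value))
# [('6.00', (16, 8)), ('15.01', (9, 6)), ('1.00', (7, 7)), ('6.01', (5, 3))]
```

Sorting the items rather than the values keeps the subject names attached to each tuple, which is what the OrderedDict answer relies on as well.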
