How to use bisect.insort_left with a key?

The docs are lacking an example. How do you use bisect.insort_left based on a key?
I am trying to insert based on a key.
bisect.insort_left(data, ('brown', 7))
puts the item at data[0].
From docs...
bisect.insort_left(a, x, lo=0, hi=len(a))
Insert x in a in sorted order. This is equivalent to a.insert(bisect.bisect_left(a, x, lo, hi), x) assuming that a is already sorted. Keep in mind that the O(log n) search is dominated by the slow O(n) insertion step.
Sample usage:
>>> data = [('red', 5), ('blue', 1), ('yellow', 8), ('black', 0)]
>>> data.sort(key=lambda r: r[1])
>>> keys = [r[1] for r in data] # precomputed list of keys
>>> data[bisect_left(keys, 0)]
('black', 0)
>>> data[bisect_left(keys, 1)]
('blue', 1)
>>> data[bisect_left(keys, 5)]
('red', 5)
>>> data[bisect_left(keys, 8)]
('yellow', 8)
>>>
I want to put ('brown', 7) after ('red', 5) in the sorted list in data using bisect.insort_left. Right now bisect.insort_left(data, ('brown', 7)) puts ('brown', 7) at data[0], because I am not using the keys to do the insert. The docs don't show how to do inserts using the keys.

You could wrap your iterable in a class that implements __getitem__ and __len__. If the class takes the iterable and a key function as arguments, this gives you the opportunity to use a key with bisect_left.
To extend this to be usable with insort_left, you also have to implement an insert method. The problem is that insort_left will then try to insert your key argument into the list containing the objects of which the key is a member.
An example is clearer:
from bisect import bisect_left, insort_left
class KeyWrapper:
    def __init__(self, iterable, key):
        self.it = iterable
        self.key = key
    def __getitem__(self, i):
        return self.key(self.it[i])
    def __len__(self):
        return len(self.it)
    def insert(self, index, item):
        print('asked to insert %s at index%d' % (item, index))
        self.it.insert(index, {"time":item})
timetable = [{"time": "0150"}, {"time": "0250"}, {"time": "0350"}, {"time": "0450"}, {"time": "0550"}, {"time": "0650"}, {"time": "0750"}]
bslindex = bisect_left(KeyWrapper(timetable, key=lambda t: t["time"]), "0359")
islindex = insort_left(KeyWrapper(timetable, key=lambda t: t["time"]), "0359")
See how I had to make my insert method specific to the timetable dictionaries? Otherwise insort_left would try to insert "0359" where it should insert {"time": "0359"}.
Ways round this could be to construct a dummy object for the comparison, inherit from KeyWrapper and override insert or pass some sort of factory function to create the object. None of these ways are particularly desirable from an idiomatic python point of view.
So the easiest way is to just use the KeyWrapper with bisect_left, which returns you the insert index and then do the insert yourself. You could easily wrap this in a dedicated function.
e.g.
bslindex = bisect_left(KeyWrapper(timetable, key=lambda t: t["time"]), "0359")
timetable.insert(bslindex, {"time":"0359"})
In this case ensure you don't implement insert, so you will be immediately aware if you accidentally pass a KeyWrapper to a mutating function like insort_left which probably wouldn't do the right thing.
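One way to wrap this in a dedicated function, as suggested above, might look like the sketch below. The helper name insort_left_by_key is hypothetical (it is not part of the bisect module), and the KeyWrapper here is the read-only variant without insert.

```python
from bisect import bisect_left


class KeyWrapper:
    """Read-only view of a sequence that exposes key(item) at each index."""

    def __init__(self, iterable, key):
        self.it = iterable
        self.key = key

    def __getitem__(self, i):
        return self.key(self.it[i])

    def __len__(self):
        return len(self.it)


def insort_left_by_key(seq, item, key):
    """Hypothetical helper: insert item into seq, which is sorted by key."""
    # Search over the keys, but insert the whole item into the real list.
    index = bisect_left(KeyWrapper(seq, key), key(item))
    seq.insert(index, item)
    return index


timetable = [{"time": "0150"}, {"time": "0250"}, {"time": "0350"}, {"time": "0450"}]
insort_left_by_key(timetable, {"time": "0359"}, key=lambda t: t["time"])
print(timetable)
```

Because the helper receives the whole item and the key function separately, it never has to guess how to rebuild an object from a bare key.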
To use your example data
from bisect import bisect_left
class KeyWrapper:
    def __init__(self, iterable, key):
        self.it = iterable
        self.key = key
    def __getitem__(self, i):
        return self.key(self.it[i])
    def __len__(self):
        return len(self.it)
data = [('red', 5), ('blue', 1), ('yellow', 8), ('black', 0)]
data.sort(key=lambda c: c[1])
newcol = ('brown', 7)
bslindex = bisect_left(KeyWrapper(data, key=lambda c: c[1]), newcol[1])
data.insert(bslindex, newcol)
print(data)
Here is the class with proper typing:
from typing import TypeVar, Generic, Sequence, Callable
T = TypeVar('T')
V = TypeVar('V')
class KeyWrapper(Generic[T, V]):
    def __init__(self, iterable: Sequence[T], key: Callable[[T], V]):
        self.it = iterable
        self.key = key
    def __getitem__(self, i: int) -> V:
        return self.key(self.it[i])
    def __len__(self) -> int:
        return len(self.it)

This does essentially the same thing the SortedCollection recipe does that the bisect documentation mentions in its See also: section at the end, but unlike the insert() method in the recipe, the function shown supports a key-function.
What's being done is a separate sorted keys list is maintained in parallel with the sorted data list to improve performance (it's faster than creating the keys list before each insertion, but keeping it around and updating it isn't strictly required). The ActiveState recipe encapsulated this for you within a class, but in the code below they're just two separate independent lists being passed around (so it'd be easier for them to get out of sync than it would be if they were both held in an instance of the recipe's class).
from bisect import bisect_left
def insert(seq, keys, item, keyfunc=lambda v: v):
    """Insert an item into a sorted list using a separate corresponding
    sorted keys list and a keyfunc() to extract the key from each item.
    Based on insert() method in SortedCollection recipe:
    http://code.activestate.com/recipes/577197-sortedcollection/
    """
    k = keyfunc(item) # Get key.
    i = bisect_left(keys, k) # Determine where to insert item.
    keys.insert(i, k) # Insert key of item to keys list.
    seq.insert(i, item) # Insert the item itself in the corresponding place.
# Initialize the sorted data and keys lists.
data = [('red', 5), ('blue', 1), ('yellow', 8), ('black', 0)]
data.sort(key=lambda r: r[1]) # Sort data by key value
keys = [r[1] for r in data] # Initialize keys list
print(data) # -> [('black', 0), ('blue', 1), ('red', 5), ('yellow', 8)]
insert(data, keys, ('brown', 7), keyfunc=lambda x: x[1])
print(data) # -> [('black', 0), ('blue', 1), ('red', 5), ('brown', 7), ('yellow', 8)]
Follow-on question:
    Can bisect.insort_left be used?
No, you can't simply use the bisect.insort_left() function to do this because it wasn't written in a way that supports a key-function—instead it just compares the whole item passed to it to insert, x, with one of the whole items in the array in its if a[mid] < x: statement. You can see what I mean by looking at the source for the bisect module in Lib/bisect.py.
Here's the relevant excerpt:
def insort_left(a, x, lo=0, hi=None):
    """Insert item x in list a, and keep it sorted assuming a is sorted.
    If x is already in a, insert it to the left of the leftmost x.
    Optional args lo (default 0) and hi (default len(a)) bound the
    slice of a to be searched.
    """
    if lo < 0:
        raise ValueError('lo must be non-negative')
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo+hi)//2
        if a[mid] < x: lo = mid+1
        else: hi = mid
    a.insert(lo, x)
You could modify the above to accept an optional key-function argument and use it:
def my_insort_left(a, x, lo=0, hi=None, keyfunc=lambda v: v):
    x_key = keyfunc(x) # Get comparison value.
    . . .
        if keyfunc(a[mid]) < x_key: # Compare key values.
            lo = mid+1
    . . .
...and call it like this:
my_insort_left(data, ('brown', 7), keyfunc=lambda v: v[1])
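For reference, combining the standard-library excerpt above with the two modified lines gives something like the following. This is a sketch of one way to fill in the elided parts, not code taken from the bisect module itself.

```python
def my_insort_left(a, x, lo=0, hi=None, keyfunc=lambda v: v):
    """Insert x into a, which is assumed to be sorted by keyfunc."""
    x_key = keyfunc(x)  # Compute the new item's key once.
    if lo < 0:
        raise ValueError('lo must be non-negative')
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if keyfunc(a[mid]) < x_key:  # Compare key values, not whole items.
            lo = mid + 1
        else:
            hi = mid
    a.insert(lo, x)


data = [('black', 0), ('blue', 1), ('red', 5), ('yellow', 8)]
my_insort_left(data, ('brown', 7), keyfunc=lambda v: v[1])
print(data)
```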
Actually, if you're going to write a custom function, you could trade generality you don't need for efficiency: dispense with the generic key-function argument and hardcode everything to operate on the data format you have. This avoids the overhead of repeated calls to a key function during insertion.
def my_insort_left(a, x, lo=0, hi=None):
    x_key = x[1] # Key on second element of each item in sequence.
    . . .
        if a[mid][1] < x_key: lo = mid+1 # Compare second element to key.
    . . .
...called this way without passing keyfunc:
my_insort_left(data, ('brown', 7))

Add comparison methods to your class
Sometimes this is the least painful way, especially if you already have a class and just want to sort by a key from it:
#!/usr/bin/env python3
import bisect
import functools
@functools.total_ordering
class MyData:
    def __init__(self, color, number):
        self.color = color
        self.number = number
    def __lt__(self, other):
        return self.number < other.number
    def __str__(self):
        return '{} {}'.format(self.color, self.number)
mydatas = [
    MyData('red', 5),
    MyData('blue', 1),
    MyData('yellow', 8),
    MyData('black', 0),
]
mydatas_sorted = []
for mydata in mydatas:
    bisect.insort(mydatas_sorted, mydata)
for mydata in mydatas_sorted:
    print(mydata)
Output:
black 0
blue 1
red 5
yellow 8
See also: "Enabling" comparison for classes
Tested in Python 3.5.2.
Upstream requests/patches
I get the feeling this is going to happen sooner or later ;-)
https://github.com/python/cpython/pull/13970
https://bugs.python.org/issue4356

As of Python 3.10, all the binary search helpers in the bisect module now accept a key argument:
key specifies a key function of one argument that is used to extract a
comparison key from each input element. The default value is None
(compare the elements directly).
Therefore, you can pass the same function you used to sort the data:
>>> import bisect
>>> data = [('red', 5), ('blue', 1), ('yellow', 8), ('black', 0)]
>>> data.sort(key=lambda r: r[1])
>>> data
[('black', 0), ('blue', 1), ('red', 5), ('yellow', 8)]
>>> bisect.insort_left(data, ('brown', 7), key=lambda r: r[1])
>>> data
[('black', 0), ('blue', 1), ('red', 5), ('brown', 7), ('yellow', 8)]

If your goal is to maintain a list sorted by key while performing the usual operations like bisect insert, delete and update, I think sortedcontainers should suit your needs as well, and you'll avoid O(n) inserts.

From Python version 3.10, the key argument has been added.
It will be something like:
import bisect
# The sorted list comes first. For bisect_left, key is not applied to the searched value, so pass the key itself.
bisect.bisect_left(data, 7, key=lambda r: r[1])
Sources:
GitHub feature request
Documentation for version 3.10
Note that the documentation for version 3.9 does not have the key argument.

Related

Sort list of objects based on multiple attributes of the objects? [duplicate]

I have a list of lists:
[[12, 'tall', 'blue', 1],
[2, 'short', 'red', 9],
[4, 'tall', 'blue', 13]]
If I wanted to sort by one element, say the tall/short element, I could do it via s = sorted(s, key = itemgetter(1)).
If I wanted to sort by both tall/short and colour, I could do the sort twice, once for each element, but is there a quicker way?

A key can be a function that returns a tuple:
s = sorted(s, key = lambda x: (x[1], x[2]))
Or you can achieve the same using itemgetter (which is faster and avoids a Python function call):
import operator
s = sorted(s, key = operator.itemgetter(1, 2))
And notice that here you can use sort instead of using sorted and then reassigning:
s.sort(key = operator.itemgetter(1, 2))

I'm not sure if this is the most pythonic method ...
I had a list of tuples that needed sorting 1st by descending integer values and 2nd alphabetically. This required reversing the integer sort but not the alphabetical sort. Here was my solution: (on the fly in an exam btw, I was not even aware you could 'nest' sorted functions)
a = [('Al', 2),('Bill', 1),('Carol', 2), ('Abel', 3), ('Zeke', 2), ('Chris', 1)]
b = sorted(sorted(a, key = lambda x : x[0]), key = lambda x : x[1], reverse = True)
print(b)
[('Abel', 3), ('Al', 2), ('Carol', 2), ('Zeke', 2), ('Bill', 1), ('Chris', 1)]
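When the descending criterion is numeric, the same ordering can also be obtained in a single sort by negating that value inside the key tuple. A minimal sketch using the same data:

```python
a = [('Al', 2), ('Bill', 1), ('Carol', 2), ('Abel', 3), ('Zeke', 2), ('Chris', 1)]

# Negating the integer sorts it in descending order, while the name
# (second element of the key tuple) still sorts ascending.
b = sorted(a, key=lambda x: (-x[1], x[0]))
print(b)
```

This only works for values that can be negated; for strings or other types, the two-pass approach that relies on sort stability is still needed.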

Several years late to the party, but I want to both sort on 2 criteria and use reverse=True. In case someone else wants to know how, you can wrap your criteria (functions) in parentheses to form a tuple:
s = sorted(my_list, key=lambda i: ( criteria_1(i), criteria_2(i) ), reverse=True)

It appears you could use a list instead of a tuple.
This becomes more important I think when you are grabbing attributes instead of 'magic indexes' of a list/tuple.
In my case I wanted to sort by multiple attributes of a class, where the incoming keys were strings. I needed different sorting in different places, and I wanted a common default sort for the parent class that clients were interacting with; only having to override the 'sorting keys' when I really 'needed to', but also in a way that I could store them as lists that the class could share
So first I defined a helper method
def attr_sort(self, attrs=['someAttributeString']):
    '''helper to sort by the attributes named by strings of attrs in order'''
    return lambda k: [ getattr(k, attr) for attr in attrs ]
then to use it
# would be defined elsewhere, but shown here for conciseness
self.SortListA = ['attrA', 'attrB']
self.SortListB = ['attrC', 'attrA']
records = .... #list of my objects to sort
records.sort(key=self.attr_sort(attrs=self.SortListA))
# perhaps later nearby or in another function
more_records = .... #another list
more_records.sort(key=self.attr_sort(attrs=self.SortListB))
This will use the generated lambda function to sort the list by object.attrA and then object.attrB, assuming object has an attribute for each of the string names provided. The second case would sort by object.attrC and then object.attrA.
This also lets you expose sorting choices outward, so that a consumer or a unit test can share them, or so that callers can tell you how they want sorting done for some operation in your API by giving you only a list, without being coupled to your back-end implementation.
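The standard library's operator.attrgetter builds the same kind of key function from attribute names. A small sketch follows; the Record class and its sample values are made up for illustration, and attr_sort is written here as a plain function rather than a method.

```python
from operator import attrgetter


class Record:
    """Hypothetical record type used only for this illustration."""

    def __init__(self, attrA, attrB, attrC):
        self.attrA = attrA
        self.attrB = attrB
        self.attrC = attrC


def attr_sort(attrs):
    """Return a key function that reads the named attributes in order."""
    getter = attrgetter(*attrs)
    if len(attrs) == 1:
        # attrgetter with one name returns a bare value; wrap it in a tuple
        # so the key has the same shape regardless of the number of names.
        return lambda obj: (getter(obj),)
    return getter


records = [Record(2, 'b', 9), Record(1, 'z', 3), Record(1, 'a', 7)]
records.sort(key=attr_sort(['attrA', 'attrB']))
print([(r.attrA, r.attrB) for r in records])
```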

Convert the list of lists into a list of tuples, then sort the tuples by multiple fields.
data=[[12, 'tall', 'blue', 1],[2, 'short', 'red', 9],[4, 'tall', 'blue', 13]]
data=[tuple(x) for x in data]
result = sorted(data, key = lambda x: (x[1], x[2]))
print(result)
output:
[(2, 'short', 'red', 9), (12, 'tall', 'blue', 1), (4, 'tall', 'blue', 13)]

Here's one way: you basically rewrite your sort function to take a list of comparison functions, each of which compares one of the attributes you want to test. On each comparison you check whether the cmp function returns a non-zero value; if so, you break out and return that value.
You call it by calling a lambda of a function of a list of lambdas.
Its advantage is that it makes a single pass through the data rather than sorting the result of a previous sort, as other methods do. Another thing is that it sorts in place, whereas sorted makes a copy.
I used it to write a rank function that ranks a list of objects, where each object is in a group and has a score, but you can add any list of attributes.
Note the un-lambda-like, though hackish, use of a lambda to call a setter.
The rank part won't work for an array of lists, but the sort will. The code below is Python 2: it relies on the built-in cmp and on passing a comparison function to list.sort.
#First, here's a pure list version
my_sortLambdaLst = [lambda x,y:cmp(x[0], y[0]), lambda x,y:cmp(x[1], y[1])]
def multi_attribute_sort(x,y):
    r = 0
    for l in my_sortLambdaLst:
        r = l(x,y)
        if r!=0: return r #keep looping till you see a difference
    return r
Lst = [(4, 2.0), (4, 0.01), (4, 0.9), (4, 0.999),(4, 0.2), (1, 2.0), (1, 0.01), (1, 0.9), (1, 0.999), (1, 0.2) ]
Lst.sort(lambda x,y:multi_attribute_sort(x,y)) #The Lambda of the Lambda
for rec in Lst: print str(rec)
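In Python 3 the built-in cmp and the comparison-function argument to sort no longer exist. The same chained-comparator idea can be kept with functools.cmp_to_key, as in this sketch; the local cmp helper and the shortened sample list are assumptions made for the illustration.

```python
from functools import cmp_to_key


def cmp(x, y):
    """Python 3 replacement for the removed built-in cmp()."""
    return (x > y) - (x < y)


# One comparator per attribute, tried in order.
comparators = [
    lambda x, y: cmp(x[0], y[0]),
    lambda x, y: cmp(x[1], y[1]),
]


def multi_attribute_cmp(x, y):
    for compare in comparators:
        result = compare(x, y)
        if result != 0:
            return result  # the first difference decides the order
    return 0


lst = [(4, 2.0), (4, 0.01), (1, 0.9), (1, 2.0), (4, 0.2), (1, 0.01)]
lst.sort(key=cmp_to_key(multi_attribute_cmp))
print(lst)
```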
Here's a way to rank a list of objects
class probe:
    def __init__(self, group, score):
        self.group = group
        self.score = score
        self.rank =-1
    def set_rank(self, r):
        self.rank = r
    def __str__(self):
        return '\t'.join([str(self.group), str(self.score), str(self.rank)])
def RankLst(inLst, group_lambda= lambda x:x.group, sortLambdaLst = [lambda x,y:cmp(x.group, y.group), lambda x,y:cmp(x.score, y.score)], SetRank_Lambda = lambda x, rank:x.set_rank(rank)):
    #Inner function is the only way (I could think of) to pass the sortLambdaLst into a sort function
    def multi_attribute_sort(x,y):
        r = 0
        for l in sortLambdaLst:
            r = l(x,y)
            if r!=0: return r #keep looping till you see a difference
        return r
    inLst.sort(lambda x,y:multi_attribute_sort(x,y))
    #Now Rank your probes
    rank = 0
    last_group = group_lambda(inLst[0])
    for i in range(len(inLst)):
        rec = inLst[i]
        group = group_lambda(rec)
        if last_group == group:
            rank+=1
        else:
            rank=1
            last_group = group
        SetRank_Lambda(inLst[i], rank) #This is pure evil!! The lambda purists are gnashing their teeth
Lst = [probe(4, 2.0), probe(4, 0.01), probe(4, 0.9), probe(4, 0.999), probe(4, 0.2), probe(1, 2.0), probe(1, 0.01), probe(1, 0.9), probe(1, 0.999), probe(1, 0.2) ]
RankLst(Lst, group_lambda= lambda x:x.group, sortLambdaLst = [lambda x,y:cmp(x.group, y.group), lambda x,y:cmp(x.score, y.score)], SetRank_Lambda = lambda x, rank:x.set_rank(rank))
print '\t'.join(['group', 'score', 'rank'])
for r in Lst: print r

There is a < operator between lists, e.g.:
[12, 'tall', 'blue', 1] < [4, 'tall', 'blue', 13]
will give
False
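Lists compare element by element from the left, which is why the comparison above is decided by 12 versus 4. A short sketch of how that relates to sorting the rows from the question:

```python
rows = [[12, 'tall', 'blue', 1], [2, 'short', 'red', 9], [4, 'tall', 'blue', 13]]

# The first elements differ (12 vs 4), so the later elements are never compared.
print(rows[0] < rows[2])

# Using a slice as the key compares only the height and colour columns.
by_height_colour = sorted(rows, key=lambda r: r[1:3])
print(by_height_colour)
```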

Converting 2 list and one string to dictionary

P.S.: Thank you, everybody, especially Matthias Fripp. I just reviewed the question and you are right, I made a mistake: the string is the value, not the key.
num=[1,2,3,4,5,6]
pow=[1,4,9,16,25,36]
s= ":subtraction"
dic={1:1 ,0:s , 2:4,2:s, 3:9,6:s, 4:16,12:s.......}
There is an easy way to convert two lists to a dictionary:
newdic=dict(zip(list1,list2))
but for this problem I have no clue, even with a comprehension:
print({num[i]:pow[i] for i in range(len(num))})

As others have said, dict cannot contain duplicate keys. You can make key duplicate with a little bit of tweaking. I used OrderedDict to keep order of inserted keys:
from pprint import pprint
from collections import OrderedDict
num=[1,2,3,4,5,6]
pow=[1,4,9,16,25,36]
pprint(OrderedDict(sum([[[a, b], ['substraction ({}-{}):'.format(a, b), a-b]] for a, b in zip(num, pow)], [])))
Prints:
OrderedDict([(1, 1),
('substraction (1-1):', 0),
(2, 4),
('substraction (2-4):', -2),
(3, 9),
('substraction (3-9):', -6),
(4, 16),
('substraction (4-16):', -12),
(5, 25),
('substraction (5-25):', -20),
(6, 36),
('substraction (6-36):', -30)])

In principle, this would do what you want:
nums = [(n, p) for (n, p) in zip(num, pow)]
diffs = [('subtraction', p-n) for (n, p) in zip(num, pow)]
items = nums + diffs
dic = dict(items)
However, a dictionary cannot have multiple items with the same key, so each of your "subtraction" items will be replaced by the next one added to the dictionary, and you'll only get the last one. So you might prefer to work with the items list directly.
If you need the items list sorted as you've shown, that will take a little more work. Maybe something like this:
items = []
for n, p in zip(num, pow):
    items.append((n, p))
    items.append(('subtraction', p-n))
# the next line will drop most 'subtraction' entries, but on
# Python 3.7+, it will at least preserve the order (not possible
# with earlier versions of Python)
dic = dict(items)
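If the aim is only to keep each difference next to its number, one alternative is to store both values under the number itself, so that no key repeats. This is a sketch under that assumption about the goal; the names powers and table are made up here.

```python
num = [1, 2, 3, 4, 5, 6]
powers = [1, 4, 9, 16, 25, 36]

# Each number maps to a (power, power - number) pair, so every key is unique.
table = {n: (p, p - n) for n, p in zip(num, powers)}
print(table[3])
```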

How can you slice with string keys instead of integers on a python OrderedDict?

Since an OrderedDict has the features of both a list (with ordered elements), and a dictionary (with keys instead of indexes), it would seem natural that you could slice using keys.
>>> from collections import OrderedDict
>>> cities = OrderedDict((('san francisco', 650), ('new york', 212), ('shanghai', 8621), ('barcelona', 42423)))
>>> cities['shanghai':] # I want all the cities from shanghai to the end of the list
TypeError: unhashable type
What's interesting about this is that it's not the error you'd see due to OrderedDict.__getslice__ not being implemented. I tried adding my own __getslice__ method to OrderedDict, but I keep running into this TypeError problem. It seems like Python is doing some kind of type checking to enforce that slice keys are only integers, before they even get passed to the __getslice__ function, how unpythonic!
>>> class BetterOrderedDict(OrderedDict):
...     def __getslice__(self, start=None, end=None, step=1):
...         return 'potato'
>>> test = BetterOrderedDict((('one', 1), ('two', 2), ('three', 3), ('four', 4)))
>>> print test[1:4]
'potato' # ok this makes sense so far
>>> test['one':'four']
TypeError: unhashable type # WTF, strings are hashable!
So my question is, why can't I implement non-int slices, what kind of type-checking is preventing the slice keys from even reaching my __getslice__ function, and can I override it by implementing my BetterOrderedDict in C with bindings?

__getslice__ is a deprecated way of implementing slicing. Instead, you should handle slice objects in __getitem__:
from collections import OrderedDict
class SlicableDict(OrderedDict):
    def __getitem__(self, key):
        if isinstance(key, slice):
            return 'potato({},{},{})'.format(key.start, key.stop, key.step)
        return super(SlicableDict, self).__getitem__(key)
>>> s = SlicableDict(a=1, b=2, c=3)
>>> s
SlicableDict([('a', 1), ('c', 3), ('b', 2)])
>>> s['a']
1
>>> s['a':'c']
'potato(a,c,None)'
And if you need more than potato, then you can implement all three slicing operations in the following way:
def _key_slice_to_index_slice(items, key_slice):
    try:
        if key_slice.start is None:
            start = None
        else:
            start = next(idx for idx, (key, value) in enumerate(items)
                         if key == key_slice.start)
        if key_slice.stop is None:
            stop = None
        else:
            stop = next(idx for idx, (key, value) in enumerate(items)
                        if key == key_slice.stop)
    except StopIteration:
        raise KeyError
    return slice(start, stop, key_slice.step)
class SlicableDict(OrderedDict):
    def __getitem__(self, key):
        if isinstance(key, slice):
            items = self.items()
            index_slice = _key_slice_to_index_slice(items, key)
            return SlicableDict(items[index_slice])
        return super(SlicableDict, self).__getitem__(key)
    def __setitem__(self, key, value):
        if isinstance(key, slice):
            items = self.items()
            index_slice = _key_slice_to_index_slice(items, key)
            items[index_slice] = value.items()
            self.clear()
            self.update(items)
            return
        return super(SlicableDict, self).__setitem__(key, value)
    def __delitem__(self, key):
        if isinstance(key, slice):
            items = self.items()
            index_slice = _key_slice_to_index_slice(items, key)
            del items[index_slice]
            self.clear()
            self.update(items)
            return
        return super(SlicableDict, self).__delitem__(key)
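The code above slices the result of items(), which is a list in Python 2 but a view in Python 3, where it cannot be sliced directly. A read-only Python 3 sketch of the same idea, covering only __getitem__ and reusing the cities data from the question, could look like this:

```python
from collections import OrderedDict


class SlicableDict(OrderedDict):
    def __getitem__(self, key):
        if not isinstance(key, slice):
            return super().__getitem__(key)
        keys = list(self)  # keys in insertion order
        try:
            # Translate the key bounds into integer positions.
            start = None if key.start is None else keys.index(key.start)
            stop = None if key.stop is None else keys.index(key.stop)
        except ValueError:
            raise KeyError('slice bound is not a key') from None
        result = SlicableDict()
        for k in keys[start:stop:key.step]:
            result[k] = self[k]
        return result


cities = SlicableDict([('san francisco', 650), ('new york', 212),
                       ('shanghai', 8621), ('barcelona', 42423)])
print(list(cities['shanghai':]))
```

As with list slicing, the stop key is excluded from the result.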

This is the actual implementation of the slicing feature you are expecting.
OrderedDict internally maintains the order of the keys in the form of a doubly linked list. Quoting the actual comment from Python 2.7.9,
# The internal self.__map dict maps keys to links in a doubly linked list.
# The circular doubly linked list starts and ends with a sentinel element.
# The sentinel element never gets deleted (this simplifies the algorithm).
# Each link is stored as a list of length three: [PREV, NEXT, KEY].
Now, to slice the dictionary, we need to iterate the doubly linked list, __root, which is actually a private variable, protected by the name mangling mechanism.
Note: This involves hacky name unmangling to use the OrderedDict's internal data structures.
from collections import OrderedDict
class SlicableDict(OrderedDict):
    def __getitem__(self, key):
        if isinstance(key, slice):
            # Unmangle `__root` to access the doubly linked list
            root = getattr(self, "_OrderedDict__root")
            # By default, make `start` as the first element, `end` as the last
            start, end = root[1][2], root[0][2]
            start = key.start or start
            end = key.stop or end
            step = key.step or 1
            curr, result, begun, counter = root[1], [], False, 0
            # Begin iterating
            while curr is not root:
                # If the end value is reached, `break` and `return`
                if curr[2] == end:
                    break
                # If starting value is matched, start appending to `result`
                if curr[2] == start:
                    begun = True
                if begun:
                    if counter % step == 0:
                        result.append((curr[2], self[curr[2]]))
                    counter += 1
                # Make the `curr` point to the next element
                curr = curr[1]
            return result
        return super(SlicableDict, self).__getitem__(key)
A few sample runs:
>>> s = SlicableDict(a=1, b=2, c=3, d=4, e=5, f=6)
>>> s
SlicableDict([('a', 1), ('c', 3), ('b', 2), ('e', 5), ('d', 4), ('f', 6)])
>>> s['a':'c']
[('a', 1)]
>>> s['a':]
[('a', 1), ('c', 3), ('b', 2), ('e', 5), ('d', 4)]
>>> s[:'a']
[]
>>> s['a':'f':2]
[('a', 1), ('b', 2), ('d', 4)]

Try this (very ugly) implementation
class SliceOrdered(OrderedDict):
    def __getitem__(self, key):
        if isinstance(key, slice):
            tmp = OrderedDict()
            i_self = iter(self)
            for k in i_self:
                if key.start <= k <= key.stop:
                    tmp[k] = self[k]
                    if key.step is not None and key.step > 1:
                        for _ in range(key.step-1):
                            try:
                                next(i_self)
                            except StopIteration:
                                break
            return tmp
        else:
            return super(SliceOrdered, self).__getitem__(key)
DEMO (Python3.4)
>>> s = SliceOrdered([('a',2), ('b',2), ('c',3), ('d',4)])
>>> s['a':'c']
OrderedDict([('a', 2), ('b', 2), ('c', 3)])
>>> s['a':'d':2]
OrderedDict([('a', 2), ('c', 3)])
N.B. this probably only works because in this example, the OrderedDict was not only ordered, but also sorted. In an unsorted dictionary the slice 'a':'c' does not necessarily contain 'b', so my if key.start <= k <= key.stop logic probably fails. The following code should respect that:
class SliceOrdered(OrderedDict):
    def __getitem__(self, key):
        if not isinstance(key, slice):
            return super(SliceOrdered,self).__getitem__(key)
        tmp = OrderedDict()
        step = key.step or 1
        accumulating = False
        i_self = iter(self)
        for k in i_self:
            if k == key.start:
                accumulating = True
            if accumulating:
                tmp[k] = self[k]
                for _ in range(step-1):
                    next(i_self)
            if k == key.stop:
                accumulating = False
                break
        return tmp

Merging a list of time-range tuples that have overlapping time-ranges

I have a list of tuples where each tuple is a (start-time, end-time). I am trying to merge all overlapping time ranges and return a list of distinct time ranges.
For example
[(1, 5), (2, 4), (3, 6)] ---> [(1,6)]
[(1, 3), (2, 4), (5, 8)] ---> [(1, 4), (5,8)]
Here is how I implemented it.
# Algorithm
# initialranges: [(a,b), (c,d), (e,f), ...]
# First we sort each tuple then whole list.
# This will ensure that a<b, c<d, e<f ... and a < c < e ...
# BUT the order of b, d, f ... is still random
# Now we have only 3 possibilities
#================================================
# b<c<d:  a-------b            Ans: [(a,b),(c,d)]
#                    c---d
# c<=b<d: a-------b            Ans: [(a,d)]
#               c---d
# c<d<b:  a-------b            Ans: [(a,b)]
#            c---d
#================================================
def mergeoverlapping(initialranges):
    i = sorted(set([tuple(sorted(x)) for x in initialranges]))
    # initialize final ranges to [(a,b)]
    f = [i[0]]
    for c, d in i[1:]:
        a, b = f[-1]
        if c<=b<d:
            f[-1] = a, d
        elif b<c<d:
            f.append((c,d))
        else:
            # else case included for clarity. Since
            # we already sorted the tuples and the list
            # only remaining possibility is c<d<b
            # in which case we can silently pass
            pass
    return f
I am trying to figure out:
Is there a built-in function in some Python module that can do this more efficiently? or
Is there a more Pythonic way of accomplishing the same goal?
Your help is appreciated. Thanks!

A few ways to make it more efficient and Pythonic:
Eliminate the set() construction, since the algorithm should prune out duplicates in the main loop.
If you just need to iterate over the results, use yield to generate the values.
Reduce construction of intermediate objects, for example: move the tuple() call to the point where the final values are produced, saving you from having to construct and throw away extra tuples, and reuse a list saved for storing the current time range for comparison.
Code:
def merge(times):
    # Sort once up front so that saved starts from the earliest range.
    times = sorted(sorted(t) for t in times)
    saved = list(times[0])
    for st, en in times:
        if st <= saved[1]:
            saved[1] = max(saved[1], en)
        else:
            yield tuple(saved)
            saved[0] = st
            saved[1] = en
    yield tuple(saved)
data = [
    [(1, 5), (2, 4), (3, 6)],
    [(1, 3), (2, 4), (5, 8)]
]
for times in data:
    print(list(merge(times)))

Sort tuples then list, if t1.right>=t2.left => merge
and restart with the new list, ...
-->
def f(l, sort = True):
    if sort:
        sl = sorted(tuple(sorted(i)) for i in l)
    else:
        sl = l
    if len(sl) > 1:
        if sl[0][1] >= sl[1][0]:
            sl[0] = (sl[0][0], sl[1][1])
            del sl[1]
        if len(sl) < len(l):
            return f(sl, False)
    return sl

The sort part: use standard sorting, it compares tuples the right way already.
sorted_tuples = sorted(initial_ranges)
The merge part. It eliminates duplicate ranges, too, so no need for a set. Suppose you have current_tuple and next_tuple.
c_start, c_end = current_tuple
n_start, n_end = next_tuple
if n_start <= c_end:
    merged_tuple = min(c_start, n_start), max(c_end, n_end)
I hope the logic is clear enough.
To peek next tuple, you can use indexed access to sorted tuples; it's a wholly known sequence anyway.
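Put together, the sort step and the merge step might look like the sketch below. The function name merge_overlapping is made up here, each tuple is assumed to already satisfy start <= end, and instead of peeking ahead by index it keeps the current range as the last element of the result list.

```python
def merge_overlapping(initial_ranges):
    """Merge overlapping (start, end) tuples; assumes start <= end in each."""
    merged = []
    for n_start, n_end in sorted(initial_ranges):
        if merged and n_start <= merged[-1][1]:
            # The next range starts inside the current one: extend it.
            c_start, c_end = merged[-1]
            merged[-1] = (c_start, max(c_end, n_end))
        else:
            # No overlap: the next range becomes the new current range.
            merged.append((n_start, n_end))
    return merged


print(merge_overlapping([(1, 5), (2, 4), (3, 6)]))
print(merge_overlapping([(1, 3), (2, 4), (5, 8)]))
```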

Sort all boundaries then take all pairs where a boundary end is followed by a boundary start.
def mergeOverlapping(initialranges):
    def allBoundaries():
        for r in initialranges:
            yield r[0], True
            yield r[1], False
    def getBoundaries(boundaries):
        yield boundaries[0][0]
        for i in range(1, len(boundaries) - 1):
            if not boundaries[i][1] and boundaries[i + 1][1]:
                yield boundaries[i][0]
                yield boundaries[i + 1][0]
        yield boundaries[-1][0]
    return getBoundaries(sorted(allBoundaries()))
Hm, not that beautiful but was fun to write at least!
EDIT: Years later, after an upvote, I realised my code was wrong! This is the new version just for fun:
def mergeOverlapping(initialRanges):
    def allBoundaries():
        for r in initialRanges:
            yield r[0], -1
            yield r[1], 1
    def getBoundaries(boundaries):
        openrange = 0
        for value, boundary in boundaries:
            if not openrange:
                yield value
            openrange += boundary
            if not openrange:
                yield value
    def outputAsRanges(b):
        while b:
            yield (b.next(), b.next())
    return outputAsRanges(getBoundaries(sorted(allBoundaries())))
Basically I mark the boundaries with -1 or 1 and then sort them by value and only output them when the balance between open and closed braces is zero.

Late, but might help someone looking for this. I had a similar problem but with dictionaries. Given a list of time ranges, I wanted to find overlaps and merge them when possible. A small modification to @samplebias's answer led me to this:
Merge function:
def merge_range(ranges: list, start_key: str, end_key: str):
    ranges = sorted(ranges, key=lambda x: x[start_key])
    saved = dict(ranges[0])
    for range_set in ranges:  # already sorted by start_key above
        if range_set[start_key] <= saved[end_key]:
            saved[end_key] = max(saved[end_key], range_set[end_key])
        else:
            yield dict(saved)
            saved[start_key] = range_set[start_key]
            saved[end_key] = range_set[end_key]
    yield dict(saved)
Data:
data = [
{'start_time': '09:00:00', 'end_time': '11:30:00'},
{'start_time': '15:00:00', 'end_time': '15:30:00'},
{'start_time': '11:00:00', 'end_time': '14:30:00'},
{'start_time': '09:30:00', 'end_time': '14:00:00'}
]
Execution:
print(list(merge_range(ranges=data, start_key='start_time', end_key='end_time')))
Output:
[
{'start_time': '09:00:00', 'end_time': '14:30:00'},
{'start_time': '15:00:00', 'end_time': '15:30:00'}
]

When using Python 3.7, following the suggestion given in “RuntimeError: generator raised StopIteration” every time I try to run app, the method outputAsRanges from @UncleZeiv should be:
def outputAsRanges(b):
    while b:
        try:
            yield (next(b), next(b))
        except StopIteration:
            return

Sort a list by multiple attributes?

I have a list of lists:
[[12, 'tall', 'blue', 1],
[2, 'short', 'red', 9],
[4, 'tall', 'blue', 13]]
If I wanted to sort by one element, say the tall/short element, I could do it via s = sorted(s, key = itemgetter(1)).
If I wanted to sort by both tall/short and colour, I could do the sort twice, once for each element, but is there a quicker way?

A key can be a function that returns a tuple:
s = sorted(s, key = lambda x: (x[1], x[2]))
Or you can achieve the same using itemgetter (which is faster and avoids a Python function call):
import operator
s = sorted(s, key = operator.itemgetter(1, 2))
And notice that here you can use sort instead of using sorted and then reassigning:
s.sort(key = operator.itemgetter(1, 2))

I'm not sure if this is the most pythonic method ...
I had a list of tuples that needed sorting 1st by descending integer values and 2nd alphabetically. This required reversing the integer sort but not the alphabetical sort. Here was my solution: (on the fly in an exam btw, I was not even aware you could 'nest' sorted functions)
a = [('Al', 2),('Bill', 1),('Carol', 2), ('Abel', 3), ('Zeke', 2), ('Chris', 1)]
b = sorted(sorted(a, key = lambda x : x[0]), key = lambda x : x[1], reverse = True)
print(b)
[('Abel', 3), ('Al', 2), ('Carol', 2), ('Zeke', 2), ('Bill', 1), ('Chris', 1)]

Several years late to the party but I want to both sort on 2 criteria and use reverse=True. In case someone else wants to know how, you can wrap your criteria (functions) in parenthesis:
s = sorted(my_list, key=lambda i: ( criteria_1(i), criteria_2(i) ), reverse=True)

It appears you could use a list instead of a tuple.
This becomes more important I think when you are grabbing attributes instead of 'magic indexes' of a list/tuple.
In my case I wanted to sort by multiple attributes of a class, where the incoming keys were strings. I needed different sorting in different places, and I wanted a common default sort for the parent class that clients were interacting with, only having to override the 'sorting keys' when I really needed to, but also in a way that I could store them as lists that the class could share.
So first I defined a helper method
def attr_sort(self, attrs=['someAttributeString']):
    '''helper to sort by the attributes named by strings of attrs in order'''
    return lambda k: [getattr(k, attr) for attr in attrs]
then to use it
# would be defined elsewhere, but shown here for conciseness
self.SortListA = ['attrA', 'attrB']
self.SortListB = ['attrC', 'attrA']
records = .... #list of my objects to sort
records.sort(key=self.attr_sort(attrs=self.SortListA))
# perhaps later nearby or in another function
more_records = .... #another list
more_records.sort(key=self.attr_sort(attrs=self.SortListB))
This will use the generated lambda function to sort the list by object.attrA and then object.attrB, assuming object has attributes corresponding to the string names provided. The second case would sort by object.attrC and then object.attrA.
This also allows you to expose sorting choices outward, to be shared alike by a consumer or a unit test, or to let callers tell you how they want sorting done for some operation in your API by only having to give you a list, without coupling them to your back-end implementation.
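For comparison, the standard library's operator.attrgetter can build a similar key from a list of attribute names. The sketch below assumes a hypothetical Record class and sample values that are not part of the original answer. With several names attrgetter returns a tuple, and with a single name it returns the bare value; either works as a sort key.

```python
from operator import attrgetter

class Record:
    """Hypothetical record type, used only for this illustration."""
    def __init__(self, attrA, attrB, attrC):
        self.attrA = attrA
        self.attrB = attrB
        self.attrC = attrC

    def __repr__(self):
        return 'Record(%r, %r, %r)' % (self.attrA, self.attrB, self.attrC)

sort_list_a = ['attrA', 'attrB']
sort_list_b = ['attrC', 'attrA']

records = [Record(2, 'x', 1), Record(1, 'z', 2), Record(1, 'y', 2)]

# Sort by attrA, then attrB
records.sort(key=attrgetter(*sort_list_a))
print(records)
# [Record(1, 'y', 2), Record(1, 'z', 2), Record(2, 'x', 1)]

# Sort by attrC, then attrA
records.sort(key=attrgetter(*sort_list_b))
print(records)
# [Record(2, 'x', 1), Record(1, 'y', 2), Record(1, 'z', 2)]
```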

Convert the list of lists into a list of tuples, then sort the tuples by multiple fields.
data=[[12, 'tall', 'blue', 1],[2, 'short', 'red', 9],[4, 'tall', 'blue', 13]]
data=[tuple(x) for x in data]
result = sorted(data, key = lambda x: (x[1], x[2]))
print(result)
output:
[(2, 'short', 'red', 9), (12, 'tall', 'blue', 1), (4, 'tall', 'blue', 13)]

Here's one way: you rewrite your sort function to take a list of comparison functions, one per attribute you want to test. For each pair of items, you call the comparison functions in order; as soon as one returns a non-zero value, you stop and return that value.
You call it by calling a lambda of a function of a list of lambdas.
Its advantage is that it makes a single pass through the data rather than sorting the result of a previous sort, as other methods do. Another thing is that it sorts in place, whereas sorted returns a new list.
I used it to write a rank function that ranks a list of objects, where each object is in a group and has a score, but you can add any list of attributes.
Note the un-lambda-like, though hackish, use of a lambda to call a setter.
The rank part won't work for a list of lists, but the sort will.
Note that this is Python 2 code: cmp and the comparison-function argument of list.sort were removed in Python 3.
#First, here's a pure list version
my_sortLambdaLst = [lambda x,y:cmp(x[0], y[0]), lambda x,y:cmp(x[1], y[1])]
def multi_attribute_sort(x,y):
    r = 0
    for l in my_sortLambdaLst:
        r = l(x,y)
        if r!=0: return r #keep looping till you see a difference
    return r
Lst = [(4, 2.0), (4, 0.01), (4, 0.9), (4, 0.999),(4, 0.2), (1, 2.0), (1, 0.01), (1, 0.9), (1, 0.999), (1, 0.2) ]
Lst.sort(lambda x,y:multi_attribute_sort(x,y)) #The Lambda of the Lambda
for rec in Lst: print str(rec)
Here's a way to rank a list of objects
class probe:
    def __init__(self, group, score):
        self.group = group
        self.score = score
        self.rank =-1
    def set_rank(self, r):
        self.rank = r
    def __str__(self):
        return '\t'.join([str(self.group), str(self.score), str(self.rank)])
def RankLst(inLst, group_lambda= lambda x:x.group, sortLambdaLst = [lambda x,y:cmp(x.group, y.group), lambda x,y:cmp(x.score, y.score)], SetRank_Lambda = lambda x, rank:x.set_rank(rank)):
    #Inner function is the only way (I could think of) to pass the sortLambdaLst into a sort function
    def multi_attribute_sort(x,y):
        r = 0
        for l in sortLambdaLst:
            r = l(x,y)
            if r!=0: return r #keep looping till you see a difference
        return r
    inLst.sort(lambda x,y:multi_attribute_sort(x,y))
    #Now Rank your probes
    rank = 0
    last_group = group_lambda(inLst[0])
    for i in range(len(inLst)):
        rec = inLst[i]
        group = group_lambda(rec)
        if last_group == group:
            rank+=1
        else:
            rank=1
            last_group = group
        SetRank_Lambda(inLst[i], rank) #This is pure evil!! The lambda purists are gnashing their teeth
Lst = [probe(4, 2.0), probe(4, 0.01), probe(4, 0.9), probe(4, 0.999), probe(4, 0.2), probe(1, 2.0), probe(1, 0.01), probe(1, 0.9), probe(1, 0.999), probe(1, 0.2) ]
RankLst(Lst, group_lambda= lambda x:x.group, sortLambdaLst = [lambda x,y:cmp(x.group, y.group), lambda x,y:cmp(x.score, y.score)], SetRank_Lambda = lambda x, rank:x.set_rank(rank))
print '\t'.join(['group', 'score', 'rank'])
for r in Lst: print r
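The code in this answer relies on cmp and on passing a comparison function straight to sort, which only works in Python 2. A minimal sketch of the same idea for Python 3 could use functools.cmp_to_key. The cmp helper and the shortened sample list below are stand-ins written for this example and are not part of the original answer.

```python
from functools import cmp_to_key

def cmp(a, b):
    """Stand-in for the built-in cmp that was removed in Python 3."""
    return (a > b) - (a < b)

# One comparison function per attribute, applied in order
comparators = [lambda x, y: cmp(x[0], y[0]), lambda x, y: cmp(x[1], y[1])]

def multi_attribute_cmp(x, y):
    for compare in comparators:
        r = compare(x, y)
        if r != 0:
            return r  # first attribute that differs decides the order
    return 0

lst = [(4, 2.0), (4, 0.01), (1, 0.9), (1, 0.2)]
lst.sort(key=cmp_to_key(multi_attribute_cmp))
print(lst)
# [(1, 0.2), (1, 0.9), (4, 0.01), (4, 2.0)]
```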

There is an operator < between lists, which compares them element by element from left to right, e.g.:
[12, 'tall', 'blue', 1] < [4, 'tall', 'blue', 13]
will give
False
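A short sketch of what this element-by-element comparison implies for sorting the list of lists from the question: calling sorted with no key orders the rows by the first element, then the second, and so on. The second comparison below is an extra example added for illustration.

```python
# The first elements differ, so they decide the result: 12 < 4 is False
print([12, 'tall', 'blue', 1] < [4, 'tall', 'blue', 13])   # False

# The first three elements are equal, so the last one decides: 1 < 13
print([4, 'tall', 'blue', 1] < [4, 'tall', 'blue', 13])    # True

s = [[12, 'tall', 'blue', 1],
     [2, 'short', 'red', 9],
     [4, 'tall', 'blue', 13]]

# Without a key, the rows are compared as whole lists
result = sorted(s)
print(result)
# [[2, 'short', 'red', 9], [4, 'tall', 'blue', 13], [12, 'tall', 'blue', 1]]
```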

How to use bisect.insort_left with a key? - python

If your goal is to maintain a list sorted by key, performing the usual operations like bisect insert, delete and update, I think sortedcontainers should suit your needs as well, and you'll avoid O(n) inserts.

From Python version 3.10, the key argument has been added. It will be something like:
import bisect
bisect.insort_left(data, ('brown', 7), key=lambda r: r[1])
Sources: GitHub feature request; documentation for version 3.10. Note that the documentation for version 3.9 does not have the key argument.
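A minimal sketch of how this could look with the data from the question. It assumes the list is already sorted by the same key. The fallback branch for versions before 3.10 uses the precomputed-keys approach from the documentation sample, and the version check itself is only there for illustration.

```python
import bisect
import sys

data = [('red', 5), ('blue', 1), ('yellow', 8), ('black', 0)]
data.sort(key=lambda r: r[1])  # the list must be sorted by the same key first

new_item = ('brown', 7)

if sys.version_info >= (3, 10):
    # insort_left applies key to new_item and to the existing elements
    bisect.insort_left(data, new_item, key=lambda r: r[1])
else:
    # Older versions: search a precomputed list of keys, then insert by index
    keys = [r[1] for r in data]
    data.insert(bisect.bisect_left(keys, new_item[1]), new_item)

print(data)
# [('black', 0), ('blue', 1), ('red', 5), ('brown', 7), ('yellow', 8)]
```

Note that bisect_left with a key expects the value being searched for to be a key itself (here 7), not a whole record, because the key function is not applied to the search value.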
