Does OrderedDict.items() also keep the order preserved?

Assuming that I'm using the following OrderedDict:
ordered_dict = OrderedDict([("a", 1), ("b", 2), ("c", 3)])
At some point, I would like to get the (key, value) items as an iterator and advance it whenever desired:
ordered_dict_items_iter = iter(ordered_dict.items())
...
key, val = next(ordered_dict_items_iter)
...
I'd like to know whether ordered_dict.items() also preserves the same order.
From what I've observed it seems to preserve the order, but I couldn't prove it.

It does. The idea of OrderedDict is that it behaves exactly like a dictionary, but internally it additionally maintains a doubly linked list of its keys in insertion order, which is how the order is preserved. All dictionary methods, items() included, follow that order.
Note: since Python 3.7, standard dictionaries are also guaranteed to maintain insertion order.
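A quick sanity check (a minimal demonstration of the behavior, not a proof):
from collections import OrderedDict

ordered_dict = OrderedDict([("a", 1), ("b", 2), ("c", 3)])
items_iter = iter(ordered_dict.items())
print(next(items_iter))  # ('a', 1)
print(next(items_iter))  # ('b', 2) -- items come out in insertion order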

Yes, it'll preserve the order you specify while initializing the dictionary.

Yes. OrderedDict.items() will return the items in the order they were inserted.
If you check the implementation of OrderedDict, you can see that items() returns an _OrderedDictItemsView:
class OrderedDict(dict):
    ...
    def items(self):
        "D.items() -> a set-like object providing a view on D's items"
        return _OrderedDictItemsView(self)
And if you dig deep and find the implementation of _OrderedDictItemsView,
class _OrderedDictItemsView(_collections_abc.ItemsView):
    def __reversed__(self):
        for key in reversed(self._mapping):
            yield (key, self._mapping[key])
And if you go deeper to check the _collections_abc.ItemsView, you will see that,
class ItemsView(MappingView, Set):
    ...
    def __iter__(self):
        for key in self._mapping:
            yield (key, self._mapping[key])
And further down the MappingView, you will see,
class MappingView(Sized):
    __slots__ = '_mapping',

    def __init__(self, mapping):
        self._mapping = mapping
Now our journey has reached its destination, and we can see that _mapping is the OrderedDict we created, which is always in order. The __iter__ method of ItemsView just iterates through each key of the OrderedDict and yields the (key, value) pair. Hence the proof :)

Use cases for the 'setdefault' dict method

The addition of collections.defaultdict in Python 2.5 greatly reduced the need for dict's setdefault method. This question is for our collective education:
What is setdefault still useful for, today in Python 2.6/2.7?
What popular use cases of setdefault were superseded with collections.defaultdict?
You could say defaultdict is useful for setting defaults before filling the dict, while setdefault is useful for setting defaults while or after filling the dict.
Probably the most common use case: Grouping items (in unsorted data, else use itertools.groupby)
# really verbose
new = {}
for (key, value) in data:
    if key in new:
        new[key].append(value)
    else:
        new[key] = [value]

# easy with setdefault
new = {}
for (key, value) in data:
    group = new.setdefault(key, [])  # key might exist already
    group.append(value)

# even simpler with defaultdict
from collections import defaultdict
new = defaultdict(list)
for (key, value) in data:
    new[key].append(value)  # all keys have a default already
Sometimes you want to make sure that specific keys exist after creating a dict. defaultdict doesn't work here, because it only creates keys on explicit access. Say you're working with something HTTP-ish with many headers -- some are optional, but you want defaults for them:
headers = parse_headers(msg)  # parse the message, get a dict
# now add all the optional headers
for headername, defaultvalue in optional_headers:
    headers.setdefault(headername, defaultvalue)
I commonly use setdefault for keyword argument dicts, such as in this function:
def notify(self, level, *pargs, **kwargs):
    kwargs.setdefault("persist", level >= DANGER)
    self.__defcon.set(level, **kwargs)
    try:
        kwargs.setdefault("name", self.client.player_entity().name)
    except pytibia.PlayerEntityNotFound:
        pass
    return _notify(level, *pargs, **kwargs)
It's great for tweaking arguments in wrappers around functions that take keyword arguments.
defaultdict is great when the default value is static, like a new list, but not so much if it's dynamic.
For example, I need a dictionary to map strings to unique ints. defaultdict(int) will always use 0 for the default value, which is not unique.
Instead, I used a regular dict:
nextID = intGen()
myDict = {}
for myStr in source_of_strings:  # placeholder for code that generates
                                 # unpredictable, possibly already-seen strings
    strID = myDict.setdefault(myStr, nextID())
Note that dict.get(key, nextID()) is insufficient because I need to be able to refer to these values later as well.
intGen is a tiny class I built that automatically increments an int and returns its value:
class intGen:
    def __init__(self):
        self.i = 0

    def __call__(self):
        self.i += 1
        return self.i
If someone has a way to do this with defaultdict I'd love to see it.
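One way that seems to work: since default_factory is called only for missing keys, a stateful callable can hand out the next id; a sketch using itertools.count instead of the intGen class above:
from collections import defaultdict
from itertools import count

next_id = count(1)
ids = defaultdict(lambda: next(next_id))
print(ids["a"], ids["b"], ids["a"])  # 1 2 1 -- one unique int per distinct key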
As most answers state, setdefault or defaultdict lets you set a default value when a key doesn't exist. However, I would like to point out a small caveat with regard to the use cases of setdefault. When the Python interpreter executes setdefault, it will always evaluate the second argument to the function, even if the key exists in the dictionary. For example:
In: d = {1:5, 2:6}
In: d
Out: {1: 5, 2: 6}
In: d.setdefault(2, 0)
Out: 6
In: d.setdefault(2, print('test'))
test
Out: 6
As you can see, print was also executed even though 2 already existed in the dictionary. This becomes particularly important if you are planning to use setdefault for an optimization like memoization. If you add a recursive function call as the second argument to setdefault, you won't get any performance gain out of it, as Python will always evaluate the call.
Since memoization was mentioned, a better alternative is the functools.lru_cache decorator if you are considering enhancing a function with memoization. lru_cache handles the caching requirements of a recursive function better.
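For instance, a minimal memoization sketch with lru_cache:
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # recursive calls go through the cache, so each fib(k) is computed once
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(100))  # returns instantly; the uncached recursion would take astronomically long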
I use setdefault() when I want a default value in an OrderedDict. There isn't a standard Python collection that does both, but there are ways to implement such a collection.
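For example, one way to build such a collection (a sketch; OrderedDefaultDict is an illustrative name, not a stdlib class):
from collections import OrderedDict

class OrderedDefaultDict(OrderedDict):
    def __init__(self, default_factory=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        # dict.__getitem__ calls __missing__ for absent keys
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = value = self.default_factory()
        return value

d = OrderedDefaultDict(list)
d['b'].append(1)
d['a'].append(2)
print(list(d.items()))  # [('b', [1]), ('a', [2])] -- insertion order kept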
As Muhammad said, there are situations in which you only sometimes wish to set a default value. A great example of this is a data structure which is first populated, then queried.
Consider a trie. When adding a word, if a subnode is needed but not present, it must be created to extend the trie. When querying for the presence of a word, a missing subnode indicates that the word is not present and it should not be created.
A defaultdict cannot do this. Instead, a regular dict with the get and setdefault methods must be used.
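A sketch of such a trie built on plain dicts (WORD_END is an illustrative sentinel):
WORD_END = object()  # marks "a word ends here"

def add_word(trie, word):
    node = trie
    for ch in word:
        node = node.setdefault(ch, {})  # create missing subnodes while adding
    node[WORD_END] = True

def contains(trie, word):
    node = trie
    for ch in word:
        node = node.get(ch)  # missing subnode: word absent, nothing created
        if node is None:
            return False
    return WORD_END in node

trie = {}
add_word(trie, "cat")
print(contains(trie, "cat"), contains(trie, "car"))  # True False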
Theoretically speaking, setdefault would still be handy if you sometimes want to set a default and sometimes not. In real life, I haven't come across such a use case.
However, an interesting use case comes up from the standard library (Python 2.6, _threading_local.py):
>>> mydata = local()
>>> mydata.number = 42
>>> mydata.__dict__
{'number': 42}
>>> mydata.__dict__.setdefault('widgets', [])
[]
>>> mydata.widgets
[]
I would say that using __dict__.setdefault is a pretty useful case.
Edit: As it happens, this is the only example in the standard library, and it is in a comment. So maybe it is not enough of a case to justify the existence of setdefault. Still, here is an explanation:
Objects store their attributes in the __dict__ attribute. As it happens, the __dict__ attribute is writeable at any time after the object's creation. It is also a dictionary, not a defaultdict. It is not sensible for objects in the general case to have __dict__ as a defaultdict, because that would make each object have all legal identifiers as attributes. So I can't foresee any change to Python objects getting rid of __dict__.setdefault, apart from deleting it altogether if it were deemed not useful.
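A tiny illustration of the pattern:
class Thing:
    pass

t = Thing()
t.__dict__.setdefault('widgets', [])  # create the attribute only if absent
t.widgets.append('spam')
t.__dict__.setdefault('widgets', [])  # a second call leaves the value alone
print(t.widgets)  # ['spam']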
I rewrote the accepted answer to make it easier for newbies.
# break it down and understand it intuitively
new = {}
for (key, value) in data:
    if key not in new:
        new[key] = []  # this is the core of setdefault; equal to new.setdefault(key, [])
        new[key].append(value)
    else:
        new[key].append(value)

# easy with setdefault
new = {}
for (key, value) in data:
    group = new.setdefault(key, [])  # sets new[key] = [] only if key is missing, then returns new[key]
    group.append(value)

# even simpler with defaultdict
from collections import defaultdict
new = defaultdict(list)
for (key, value) in data:
    new[key].append(value)  # all keys have a default value of an empty list []
Additionally, I categorized the methods for reference:
dict_methods_11 = {
    'views': ['keys', 'values', 'items'],
    'add': ['update', 'setdefault'],
    'remove': ['pop', 'popitem', 'clear'],
    'retrieve': ['get'],
    'copy': ['copy', 'fromkeys'],
}
One drawback of defaultdict over dict (dict.setdefault) is that a defaultdict object creates a new item EVERY TIME a non-existing key is looked up (e.g. in a comparison or a print). Also, the defaultdict class is generally far less common than the dict class, and it's more difficult to serialize, in my experience.
P.S. IMO, functions/methods not meant to mutate an object should not mutate an object.
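A quick demonstration of that drawback:
from collections import defaultdict

d = defaultdict(list)
_ = d['missing'] == []  # merely reading the key inserts it
print(len(d), dict(d))  # 1 {'missing': []} -- the lookup itself mutated the dict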
Here are some examples of setdefault to show its usefulness:
"""
d = {}
# To add a key->value pair, do the following:
d.setdefault(key, []).append(value)
# To retrieve a list of the values for a key
list_of_values = d[key]
# To remove a key->value pair is still easy, if
# you don't mind leaving empty lists behind when
# the last value for a given key is removed:
d[key].remove(value)
# Despite the empty lists, it's still possible to
# test for the existance of values easily:
if d.has_key(key) and d[key]:
pass # d has some values for key
# Note: Each value can exist multiple times!
"""
e = {}
print(e)
e.setdefault('Cars', []).append('Toyota')
print(e)
e.setdefault('Motorcycles', []).append('Yamaha')
print(e)
e.setdefault('Airplanes', []).append('Boeing')
print(e)
e.setdefault('Cars', []).append('Honda')
print(e)
e.setdefault('Cars', []).append('BMW')
print(e)
e.setdefault('Cars', []).append('Toyota')
print(e)
# NOTE: now e['Cars'] == ['Toyota', 'Honda', 'BMW', 'Toyota']
e['Cars'].remove('Toyota')
print(e)
# NOTE: it's still true that ('Toyota' in e['Cars'])
I use setdefault frequently when, get this, setting a default (!!!) in a dictionary; somewhat commonly the os.environ dictionary:
import os

# Set the venv dir if it isn't already overridden:
os.environ.setdefault('VENV_DIR', '/my/default/path')
Less succinctly, this looks like this:
# Set the venv dir if it isn't already overridden:
if 'VENV_DIR' not in os.environ:
    os.environ['VENV_DIR'] = '/my/default/path'
It's worth noting that you can also use the resulting variable:
venv_dir = os.environ.setdefault('VENV_DIR', '/my/default/path')
But that's less necessary than it was before defaultdicts existed.
Another use case that I don't think was mentioned above.
Sometimes you keep a cache dict of objects keyed by their id, where the primary instance lives in the cache, and you want to set the cache entry when it's missing:
return self.objects_by_id.setdefault(obj.id, obj)
That's useful when you always want to keep a single instance per distinct id, no matter how you obtain obj each time. For example, when object attributes get updated in memory and saving to storage is deferred.
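A sketch of that interning pattern (Registry is an illustrative name; obj is assumed to carry an id attribute):
class Registry:
    def __init__(self):
        self.objects_by_id = {}

    def canonical(self, obj):
        # return the cached instance for obj.id if one exists;
        # otherwise cache this obj and return it
        return self.objects_by_id.setdefault(obj.id, obj)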
One very important use-case I just stumbled across: dict.setdefault() is great for multi-threaded code when you only want a single canonical object (as opposed to multiple objects that happen to be equal).
For example, the (Int)Flag Enum in Python 3.6.0 has a bug: if multiple threads are competing for a composite (Int)Flag member, there may end up being more than one:
from enum import IntFlag, auto
import threading

class TestFlag(IntFlag):
    one = auto()
    two = auto()
    three = auto()
    four = auto()
    five = auto()
    six = auto()
    seven = auto()
    eight = auto()

    def __eq__(self, other):
        return self is other

    def __hash__(self):
        return hash(self.value)

seen = set()

class cycle_enum(threading.Thread):
    def run(self):
        for i in range(256):
            seen.add(TestFlag(i))

threads = []
for i in range(8):
    threads.append(cycle_enum())
for t in threads:
    t.start()
for t in threads:
    t.join()

len(seen)
# 272 (should be 256)
The solution is to use setdefault() as the last step of saving the computed composite member -- if another has already been saved then it is used instead of the new one, guaranteeing unique Enum members.
In addition to what has been suggested, setdefault might be useful in situations where you don't want to modify a value that has already been set. For example, when you have duplicate numbers and you want to treat them as one group: if you encounter a repeated key that has already been set, you won't update its value; you keep the first encountered value, as if you were iterating/updating each repeated key only once.
Here's a code example of recording the index of the first occurrence of each key in a sorted list:
nums = [2, 2, 2, 2, 2]
d = {}
for idx, num in enumerate(sorted(nums)):
    # d[num] = idx would keep the index of the LAST occurrence of each
    # repeated key: sorted_indices == [4, 4, 4, 4, 4]
    # setdefault keeps the index of the FIRST occurrence instead:
    d.setdefault(num, idx)  # sorted_indices == [0, 0, 0, 0, 0]
sorted_indices = [d[i] for i in nums]
[Edit] Very wrong! The setdefault would always trigger long_computation, Python being eager.
Expanding on Tuttle's answer. For me, the best use case is a cache mechanism. Instead of:
if x not in memo:
    memo[x] = long_computation(x)
return memo[x]
which consumes 3 lines and 2 or 3 lookups, I would happily write:
return memo.setdefault(x, long_computation(x))
I like the answer given here:
http://stupidpythonideas.blogspot.com/2013/08/defaultdict-vs-setdefault.html
In short, the decision (in non-performance-critical apps) should be made on the basis of how you want to handle lookup of empty keys downstream (viz. KeyError versus default value).
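Put concretely, a small sketch of the two downstream behaviors:
from collections import defaultdict

plain = {}
plain.setdefault('k', []).append(1)
try:
    plain['missing']  # a plain dict still raises for unknown keys
except KeyError:
    print('KeyError, as expected')

dd = defaultdict(list)
dd['k'].append(1)
print(dd['missing'])  # [] -- no error, and 'missing' is now a real key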
A different use case for setdefault() is when you don't want to overwrite the value of an already-set key. With defaultdict you assign, and assignment overwrites; setdefault() does not. For nested dictionaries it is more often the case that you want to set a default only if the key is not set yet, because you don't want to remove the present sub-dictionary. This is when you use setdefault().
Example with defaultdict:
>>> from collections import defaultdict
>>> foo = defaultdict()
>>> foo['a'] = 4
>>> foo['a'] = 2
>>> print(foo)
defaultdict(None, {'a': 2})
setdefault doesn't overwrite:
>>> bar = dict()
>>> bar.setdefault('a', 4)
4
>>> bar.setdefault('a', 2)
4
>>> print(bar)
{'a': 4}
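And for the nested-dictionary case (a sketch):
events = {}
for user, day, n in [("amy", "mon", 1), ("amy", "tue", 2), ("bob", "mon", 3)]:
    # keep any existing sub-dictionary for the user; never clobber it
    events.setdefault(user, {})[day] = n
print(events)  # {'amy': {'mon': 1, 'tue': 2}, 'bob': {'mon': 3}}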
Another use case for setdefault in CPython is that it is atomic in all cases, whereas defaultdict will not be atomic if you use a default value created from a lambda.
import threading

cache = {}

def get_user_roles(user_id):
    if user_id in cache:
        return cache[user_id]['roles']
    cache.setdefault(user_id, {'lock': threading.Lock()})
    with cache[user_id]['lock']:
        roles = query_roles_from_database(user_id)
        cache[user_id]['roles'] = roles
        return roles
If two threads execute cache.setdefault at the same time, only one of them will be able to create the default value.
If instead you used a defaultdict:
cache = defaultdict(lambda: {'lock': threading.Lock()})
This would result in a race condition. In my example above, the first thread could create a default lock, and the second thread could create another default lock, and then each thread could lock its own default lock, instead of the desired outcome of each thread attempting to lock a single lock.
Conceptually, setdefault basically behaves like this (defaultdict also behaves this way when its factory is a builtin like an empty list, empty dict, or int, rather than user Python code like a lambda):
gil = threading.Lock()

def setdefault(d, key, value):
    # the membership check and the store happen under a single lock,
    # so the check-and-set is atomic
    with gil:
        if key not in d:
            d[key] = value
        return d[key]
Conceptually, defaultdict basically behaves like this (only when the factory is user Python code like a lambda; this is not true if you use an empty list):
gil = threading.Lock()

def getitem_with_default(d, key, value_func):
    with gil:
        if key in d:
            return d[key]
    # the lock is NOT held while the user-supplied factory runs, so two
    # threads can both reach this point and each build their own value
    value = value_func()
    with gil:
        d[key] = value
        return d[key]

How to pass in a dictionary with additional elements in Python?

I have a dictionary:
big_dict = {1: "1",
            2: "2",
            ...
            1000: "1000"}
(Note: My dictionary isn't actually numbers to strings)
I am passing this dictionary into a function that calls for it. I use the dictionary often for different functions. However, on occasion I want to send in big_dict with an extra key: value pair, such that the dictionary I want to send in would be equivalent to:
big_dict[1001]="1001"
But I don't want to actually add the value to the dictionary. I could make a copy of the dictionary and add it there, but I'd like to avoid the memory + CPU cycles this would consume.
The code I currently have is:
big_dict[1001]="1001"
function_that_uses_dict(big_dict)
del big_dict[1001]
While this works, it seems rather kludgy.
If this were a string I'd do:
function_that_uses_string(myString + 'what I want to add on')
Is there any equivalent way of doing this with a dictionary?
As pointed out by Veedrac in his answer, this problem has already been solved in Python 3.3+ in the form of the ChainMap class:
function_that_uses_dict(ChainMap({1001 : "1001"}, big_dict))
If you don't have Python 3.3 you should use a backport, and if for some reason you don't want to, then below you can see how to implement it by yourself :)
You can create a wrapper, similar to this:
class DictAdditionalValueWrapper:
    def __init__(self, baseDict, specialKey, specialValue):
        self.baseDict = baseDict
        self.specialKey = specialKey
        self.specialValue = specialValue

    def __getitem__(self, key):
        if key == self.specialKey:
            return self.specialValue
        return self.baseDict[key]

    # ...
You need to supply all the other dict methods, of course, or use UserDict as a base class, which should simplify this.
and then use it like this:
function_that_uses_dict(DictAdditionalValueWrapper(big_dict, 1001, "1001"))
This can be easily extended to a whole additional dictionary of "special" keys and values, not just single additional element.
You can also extend this approach to reach something similar as in your string example:
class AdditionalKeyValuePair:
    def __init__(self, specialKey, specialValue):
        self.specialKey = specialKey
        self.specialValue = specialValue

    def __add__(self, d):
        if not isinstance(d, dict):
            raise Exception("Not a dict in AdditionalKeyValuePair")
        return DictAdditionalValueWrapper(d, self.specialKey, self.specialValue)
and use it like this:
function_that_uses_dict(AdditionalKeyValuePair(1001, "1001") + big_dict)
If you're on 3.3+, just use ChainMap. Otherwise use a backport.
new_dict = ChainMap({1001: "1001"}, old_dict)
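For illustration, the overlay shadows the big dict without copying it (a minimal sketch):
from collections import ChainMap

big_dict = {1: "1", 2: "2"}
combined = ChainMap({1001: "1001"}, big_dict)
print(combined[2])     # '2'    -- falls through to big_dict
print(combined[1001])  # '1001' -- served by the small overlay
print(big_dict)        # {1: '1', 2: '2'} -- untouched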
You can add the extra key-value pair, leaving the original dictionary as-is, like this:
>>> def function_that_uses_bdict(big_dict):
...     print(big_dict[1001])
...
>>> dct = {1: '1', 2: '2'}
>>> function_that_uses_bdict(dict(list(dct.items()) + [(1001, '1001')]))
1001
>>> dct
{1: '1', 2: '2'}  # original unchanged
This is a bit annoying too, but you could just have the function take two parameters: one of them being big_dict, and the other being a temporary dictionary created just for the function (so something like fxn(big_dict, {1001: '1001'})). Then you could access both dictionaries without changing your first one, and without copying big_dict, as sketched below.
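A sketch of that two-parameter idea (fxn and the lookup helper are illustrative names):
def fxn(big_dict, extra=None):
    extra = extra or {}
    def lookup(key):
        # check the small per-call dict first, then the shared big one
        return extra[key] if key in extra else big_dict[key]
    return lookup(1001)

big_dict = {i: str(i) for i in range(1, 1001)}
print(fxn(big_dict, {1001: '1001'}))  # '1001', and big_dict is untouched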

Shortest way to get first item of `OrderedDict` in Python 3

What's the shortest way to get first item of OrderedDict in Python 3?
My best:
list(ordered_dict.items())[0]
Quite long and ugly.
I can think of:
next(iter(ordered_dict.items())) # Fixed, thanks Ashwini
But it's not very self-describing.
Any better suggestions?
Programming Practices for Readability
In general, if you feel like code is not self-describing, the usual solution is to factor it out into a well-named function:
def first(s):
    '''Return the first element from an ordered collection
       or an arbitrary element from an unordered collection.
       Raise StopIteration if the collection is empty.
    '''
    return next(iter(s))
With that helper function, the subsequent code becomes very readable:
>>> extension = {'xml', 'html', 'css', 'php', 'xhtml'}
>>> one_extension = first(extension)
Patterns for Extracting a Single Value from Collection
The usual ways to get an element from a set, dict, OrderedDict, generator, or other non-indexable collection are:
for value in some_collection:
    break
and:
value = next(iter(some_collection))
The latter is nice because the next() function lets you specify a default value if collection is empty or you can choose to let it raise an exception. The next() function is also explicit that it is asking for the next item.
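For example, a small sketch of the default-value behavior:
from collections import OrderedDict

ordered_dict = OrderedDict([("a", 1)])
print(next(iter(ordered_dict.items()), None))   # ('a', 1)
print(next(iter(OrderedDict().items()), None))  # None instead of StopIteration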
Alternative Approach
If you actually need indexing and slicing and other sequence behaviors (such as indexing multiple elements), it is a simple matter to convert to a list with list(some_collection) or to use itertools.islice():
s = list(some_collection)
print(s[0], s[1])

from itertools import islice
s = list(islice(some_collection, n))
print(s)
Use popitem(last=False), but keep in mind that it removes the entry from the dictionary, i.e. is destructive.
from collections import OrderedDict

o = OrderedDict()
o['first'] = 123
o['second'] = 234
o['third'] = 345

first_item = o.popitem(last=False)  # ('first', 123) -- and 'first' is now removed from o
For more details, have a look at the manual on collections. It also works with Python 2.x.
Subclassing and adding a method to OrderedDict would be the answer to clarity issues:
>>> o = ExtOrderedDict([('a', 1), ('b', 2)])
>>> o.first_item()
('a', 1)
The implementation of ExtOrderedDict:
class ExtOrderedDict(OrderedDict):
    def first_item(self):
        return next(iter(self.items()))
Code that's readable, leaves the OrderedDict unchanged and doesn't needlessly generate a potentially large list just to get the first item:
for item in ordered_dict.items():
    return item
If ordered_dict is empty, None would be returned implicitly.
An alternate version for use inside a stretch of code:
for first in ordered_dict.items():
    break  # Leave the name 'first' bound to the first item
else:
    raise IndexError("Empty ordered dict")
The Python 2.x code corresponding to the first example above would need to use iteritems() instead:
for item in ordered_dict.iteritems():
    return item
You might want to consider using SortedDict from the third-party sortedcontainers package instead of OrderedDict.
It provides SortedDict.peekitem to peek at an item.
Runtime complexity: O(log(n))
>>> sd = SortedDict({'a': 1, 'b': 2, 'c': 3})
>>> sd.peekitem(0)
('a', 1)
If you need a one-liner:
ordered_dict[[*ordered_dict.keys()][0]]
It creates a list of the dict's keys, picks the first, and uses it as the key to access the dictionary value.
First record:
[(key, value) for key, value in ordered_dict.items()][0]
Last record:
[(key, value) for key, value in ordered_dict.items()][-1]

Last element in OrderedDict

I have od of type OrderedDict. I want to access its most recently added (key, value) pair. od.popitem(last = True) would do it, but would also remove the pair from od which I don't want.
What's a good way to do that? Can /should I do this:
class MyOrderedDict(OrderedDict):
    def last(self):
        return next(reversed(self))
Using next(reversed(od)) is a perfect way of accessing the most-recently added element. The class OrderedDict uses a doubly linked list for the dictionary items and implements __reversed__(), so this implementation gives you O(1) access to the desired element. Whether it is worthwhile to subclass OrderedDict() for this simple operation may be questioned, but there's nothing actually wrong with this approach.
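For example (a small sketch):
from collections import OrderedDict

od = OrderedDict([('a', 1), ('b', 2), ('c', 3)])
last_key = next(reversed(od))  # 'c', found in O(1) via the internal linked list
print(last_key, od[last_key])  # c 3 -- and od itself is unchanged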
God, I wish this was all built-in functionality...
Here's something to save you precious time. Tested in Python 3.7. od is your OrderedDict.
# Get first key
next(iter(od))
# Get last key
next(reversed(od))
# Get first value
od[next(iter(od))]
# Get last value
od[next(reversed(od))]
# Get first key-value tuple
next(iter(od.items()))
# Get last key-value tuple
next(reversed(od.items()))
A little magic from timeit can help here...
from collections import OrderedDict

class MyOrderedDict1(OrderedDict):
    def last(self):
        k = next(reversed(self))
        return (k, self[k])

class MyOrderedDict2(OrderedDict):
    def last(self):
        out = self.popitem()
        self[out[0]] = out[1]
        return out

class MyOrderedDict3(OrderedDict):
    def last(self):
        k = (list(self.keys()))[-1]
        return (k, self[k])

if __name__ == "__main__":
    from timeit import Timer
    N = 100

    d1 = MyOrderedDict1()
    for i in range(N): d1[i] = i
    print("d1", d1.last())

    d2 = MyOrderedDict2()
    for i in range(N): d2[i] = i
    print("d2", d2.last())

    d3 = MyOrderedDict3()
    for i in range(N): d3[i] = i
    print("d3", d3.last())

    t = Timer("d1.last()", 'from __main__ import d1')
    print("OrderedDict1", t.timeit())
    t = Timer("d2.last()", 'from __main__ import d2')
    print("OrderedDict2", t.timeit())
    t = Timer("d3.last()", 'from __main__ import d3')
    print("OrderedDict3", t.timeit())
results in:
d1 (99, 99)
d2 (99, 99)
d3 (99, 99)
OrderedDict1 1.159217119216919
OrderedDict2 3.3667118549346924
OrderedDict3 24.030261993408203
(Tested on python3.2, Ubuntu Linux).
As pointed out by @SvenMarnach, the method you described is quite efficient compared to the other two ways I could cook up.
Your idea is fine, however the default iterator is only over the keys, so your example will only return the last key. What you actually want is:
class MyOrderedDict(OrderedDict):
    def last(self):
        return list(self.items())[-1]
This gives the (key, value) pairs, not just the keys, as you wanted.
Note that on pre-3.x versions of Python, OrderedDict.items() returns a list, so you don't need the list() call, but later versions return a dictionary view object, so you will.
Edit: As noted in the comments, the quicker operation is to do:
class MyOrderedDict(OrderedDict):
    def last(self):
        key = next(reversed(self))
        return (key, self[key])
Although I must admit I find this uglier in the code (I never liked getting the key and then doing x[key] to get the value separately; I prefer getting the (key, value) tuple). Depending on the importance of speed and your preferences, you may wish to pick the former option.

Python: update a list of tuples... fastest method

This question is in relation to another question asked here:
Sorting 1M records
I have since figured out the problem I was having with sorting. I was sorting items from a dictionary into a list every time I updated the data. I have since realized that a lot of the power of Python's sort resides in the fact that it sorts data more quickly when that data is already partially sorted.
So, here is the question. Suppose I have the following as a sample set:
self.sorted_records = [(1, 1234567890), (20, 1245678903),
                       (40, 1256789034), (70, 1278903456)]
t[1] of each tuple in the list is a unique id. Now I want to update this list with the following:
updated_records = {1245678903:45, 1278903456:76}
What is the fastest way for me to do so ending up with
self.sorted_records = [(1, 1234567890), (45, 1245678903),
                       (40, 1256789034), (76, 1278903456)]
Currently I am doing something like this:
updated_keys = updated_records.keys()
for i, record in enumerate(self.sorted_data):
    if record[1] in updated_keys:
        updated_keys.remove(record[1])
        self.sorted_data[i] = (updated_records[record[1]], record[1])
But I am sure there is a faster, more elegant solution out there.
Any help?
* edit *
It turns out I used bad examples for the ids, since they end up in sorted order when I do my update. I am actually interested in t[0] being in sorted order. After I do the update I was intending to re-sort with the updated data, but it looks like bisect might be the ticket to insert in sorted order.
* end edit *
You're scanning through all n records. You could instead do a binary search, which would be O(log(n)) instead of O(n). You can use the bisect module to do this.
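A sketch of that idea against the sample data, assuming you search on the id in t[1] (which happens to be in sorted order here):
import bisect

sorted_records = [(1, 1234567890), (20, 1245678903),
                  (40, 1256789034), (70, 1278903456)]
ids = [rec[1] for rec in sorted_records]  # parallel, sorted list of the ids
pos = bisect.bisect_left(ids, 1245678903)
if pos < len(ids) and ids[pos] == 1245678903:
    sorted_records[pos] = (45, 1245678903)  # O(log n) to find, O(1) to replace
print(sorted_records[1])  # (45, 1245678903)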
Since apparently you don't care about the ending value of self.sorted_records actually being sorted (you have values in the order 1, 45, 40, 76 -- that's NOT sorted!), and you only appear to care about ids in updated_records that are also in self.sorted_data, a list comprehension (with side effects, if you want to change updated_records on the fly) would serve you well, i.e.:
self.sorted_data = [(updated_records.pop(recid, value), recid)
                    for (value, recid) in self.sorted_data]
The .pop call removes from updated_records the keys (and corresponding values) that end up in the new self.sorted_data (the "previous value for that recid", value, is supplied as the second argument to pop to ensure no change where a recid is NOT in updated_records). This leaves in updated_records only the "new" stuff, so you can e.g. append it to self.sorted_data before re-sorting, i.e. I suspect you want to continue with something like
self.sorted_data.extend((value, recid)
                        for recid, value in updated_records.iteritems())
self.sorted_data.sort()
though this part DOES go beyond the question you're actually asking (and I'm giving it only because I've seen your previous questions;-).
You'd probably be best served by some form of tree here (preserving sorted order while allowing O(log n) replacements). There is no builtin balanced tree type, but you can find many third-party examples. Alternatively, you could either:
- Use a binary search to find the node. The bisect module will do this, but it compares based on the normal Python comparison order, whereas you seem to be sorting based on the second element of each tuple. You could reverse this, or just write your own binary search (or simply take the code from bisect_left and modify it).
- Use both a dict and a list. The list contains the sorted keys only. You can wrap the dict class easily enough to ensure this is kept in sync. This allows fast dict updating while maintaining sort order of the keys, and should prevent your problem of losing sort performance due to constant conversion between dict and list.
Here's a quick implementation of such a thing:
import bisect

class SortedDict(dict):
    """Dictionary which is iterable in sorted order.

    O(n) sorted iteration
    O(1) lookup
    O(log n) replacement (but O(n) insertion of new items)
    """
    def __init__(self, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)
        self._keys = sorted(dict.iterkeys(self))

    def __setitem__(self, key, val):
        if key not in self:
            # New key - need to add to list of keys.
            pos = bisect.bisect_left(self._keys, key)
            self._keys.insert(pos, key)
        dict.__setitem__(self, key, val)

    def __delitem__(self, key):
        if key in self:
            pos = bisect.bisect_left(self._keys, key)
            del self._keys[pos]
        dict.__delitem__(self, key)

    def __iter__(self):
        for k in self._keys:
            yield k
    iterkeys = __iter__

    def iteritems(self):
        for k in self._keys:
            yield (k, self[k])

    def itervalues(self):
        for k in self._keys:
            yield self[k]

    def update(self, other):
        dict.update(self, other)
        # Rebuild the key list. Faster if lots of changes were made; may be
        # slower for only minor changes to a large dict.
        self._keys = sorted(dict.iterkeys(self))

    def keys(self):
        return list(self.iterkeys())

    def values(self):
        return list(self.itervalues())

    def items(self):
        return list(self.iteritems())

    def __repr__(self):
        return "%s(%s)" % (self.__class__.__name__,
                           ', '.join("%s=%r" % (k, self[k]) for k in self))
Since you want to replace by dictionary key, but the array is sorted by dictionary value, you definitely need a linear search for the key. In that sense, your algorithm is the best you can hope for.
If you preserved the old dictionary value, then you could use a binary search for the value, and then locate the key in the proximity of where the binary search led you.
