As far as I know, you can't assume that a data structure (such as a dictionary) will store its values in the same order in which you initialized it.
For example:
d = {1:10,2:20,3:30}
When you print it inside a for loop, the result could be:
{2:20,1:10,3:30}
Why does this happen? Why won't a dictionary (or another data structure) keep its values in a specific order?
Is it true only for dictionaries?
Among the Python builtin types, it's true for dictionaries and sets. Lists and tuples preserve order. There is collections.OrderedDict for an ordered version of a dictionary. For other types (e.g., ones from libraries that aren't built into Python), you just have to read the documentation. There's no general rule for what a "data structure" does in Python. You have to look at the documentation of each type to understand what behavior it does or doesn't define.
Python does define the notion of "sequence", which is defined to have order (lists and tuples are sequences). A dictionary is a "mapping", which needn't have order. (See the Python glossary and the collections module for more info.)
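If it helps, the sequence/mapping distinction can be checked directly with the abstract base classes in collections.abc. This is just a small sketch, assuming Python 3; the `samples` and `categories` names are only for illustration.

```python
from collections.abc import Mapping, Sequence, Set

samples = {
    "list": [1, 2, 3],
    "tuple": (1, 2, 3),
    "dict": {1: 10},
    "set": {1, 2, 3},
}

# For each sample, record whether it counts as a sequence, a mapping, or a set.
categories = {
    name: (isinstance(obj, Sequence), isinstance(obj, Mapping), isinstance(obj, Set))
    for name, obj in samples.items()
}

for name, (is_seq, is_map, is_set) in categories.items():
    print(name, "sequence:", is_seq, "mapping:", is_map, "set:", is_set)
```

Only the list and the tuple register as sequences, which is the category that is defined to have positional order.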
As to why, it's just how dictionaries were implemented. Basically they can be faster if they don't have to keep track of order, and in many cases you don't care about the order, so they were implemented as unordered collections for efficiency.
Both dicts and sets in python lose order. This is because they are implemented as hash tables and therefore more concerned with faster lookup times than with order preservation.
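To make the hash-table point concrete, here is a toy sketch. It is not CPython's actual implementation (which is far more refined), and `ToyHashTable` and its method names are made up for illustration. It only shows how placing each key in a bucket chosen by its hash gives fast lookups while the iteration order ends up unrelated to the insertion order.

```python
class ToyHashTable:
    """Hypothetical, simplified hash table: a fixed list of buckets."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # The bucket is chosen by the key's hash, not by insertion time.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))

    def get(self, key):
        # Only one bucket is searched, which is why lookups are fast.
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        raise KeyError(key)

    def keys(self):
        # Iteration walks the buckets in slot order.
        return [key for bucket in self.buckets for key, _ in bucket]


table = ToyHashTable()
for key in (10, 3, 1):  # CPython hashes small non-negative ints to themselves
    table.put(key, key * 10)

toy_keys = table.keys()
print(toy_keys)  # [1, 10, 3], not the insertion order 10, 3, 1
```

Here 10 lands in slot 2, 3 in slot 3 and 1 in slot 1, so walking the slots yields 1, 10, 3.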
If you are looking for a data structure that's geared towards order-preservation, then you should look at lists. In your case, you could use a list of tuples as follows:
In [255]: L = []
In [256]: L.append((1,10))
In [257]: L.append((2,20))
In [258]: L.append((3,30))
In [259]: L
Out[259]: [(1, 10), (2, 20), (3, 30)]
If however, you want to preserve order and want faster lookup times than what list has to offer, then you're likely better off with an OrderedDict:
In [265]: d = collections.OrderedDict()
In [266]: d[1]=10
In [267]: d
Out[267]: OrderedDict([(1, 10)])
In [268]: d[2]=20
In [269]: d
Out[269]: OrderedDict([(1, 10), (2, 20)])
In [270]: d[3]=30
In [271]: d
Out[271]: OrderedDict([(1, 10), (2, 20), (3, 30)])
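One more thing that may be worth knowing: because an OrderedDict treats order as part of its contents, comparing two OrderedDicts is order-sensitive, while comparing plain dicts is not. A short sketch, assuming Python 3 (the variable names are only for illustration):

```python
from collections import OrderedDict

a = OrderedDict([(1, 10), (2, 20), (3, 30)])
b = OrderedDict([(3, 30), (2, 20), (1, 10)])

# Same key/value pairs, different insertion order.
same_as_ordered = (a == b)            # order matters between OrderedDicts
same_as_plain = (dict(a) == dict(b))  # order is ignored between plain dicts

# Lookup by key still works like a normal dictionary.
lookup = a[2]
first_key = next(iter(a))

print(same_as_ordered, same_as_plain, lookup, first_key)
```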
Hope this helps
It is true for dictionaries (and for sets, which are built the same way). Underneath, a dictionary does not store its entries in the order you added them; it places each key-value pair in a slot determined by the hash of the key. This allows very fast lookups. Lists and tuples maintain order.
Dictionaries will order their entries to make searching for keys efficient. If you want to keep your keys in the same order they were added, try an OrderedDict.
I'm messing around with dictionaries for the first time and something's coming up that's confusing me. Using two lists to create a new dictionary, the order of the list terms for the key part seems to be wrong. Here's my code:
list1 = ["a", "b", "c", "d"]
list2 = [5,3,7,3]
newDict = {list1[c]: number for c, number in enumerate(list2)}
print(newDict)
This gives me the following:
{'a': 5, 'd': 3, 'c': 7, 'b': 3}
Why is this happening? Surely the 'c' value getting terms from the list is going from 0 and upwards, so why isn't it creating the dictionary with the letters in the same order?
Thanks.
For purposes of efficiency, traditional python dictionaries are unordered. If you need order, then you need OrderedDict:
>>> from collections import OrderedDict
>>> newDict = OrderedDict((list1[c], number) for c, number in enumerate(list2))
>>> print(newDict)
OrderedDict([('a', 5), ('b', 3), ('c', 7), ('d', 3)])
In Python 3.7, ordinary python dictionaries, implemented using a new algorithm, will be ordered. Until then, if you need order, use OrderedDict.
Python dictionaries don't preserve their order, but there's another data type that does: OrderedDict, from the collections module.
Dictionaries are unordered. In fact, if you run your program on a different computer, you might get a different key ordering. This is an intentional feature of the built-in dictionary in python.
To understand why, take a look at this stackoverflow question.
As of Python 3.7, dictionaries are insertion ordered.
See this stackoverflow discussion Are dictionaries ordered in Python 3.6+?
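On a current interpreter this is easy to check. A minimal sketch, assuming Python 3.7 or later and reusing the two lists from the question (the `new_dict` and `keys_*` names are only for illustration):

```python
list1 = ["a", "b", "c", "d"]
list2 = [5, 3, 7, 3]

# Pair the two lists up; on Python 3.7+ the dict keeps insertion order.
new_dict = dict(zip(list1, list2))
keys_in_order = list(new_dict)
print(new_dict)

# "Insertion order" means exactly that: a key that is deleted and then
# re-added moves to the end.
del new_dict["a"]
new_dict["a"] = 5
keys_after_reinsert = list(new_dict)
print(keys_after_reinsert)
```

On older interpreters the order is not guaranteed, so code that must run there should still use OrderedDict.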
What's the correct way to initialize an ordered dictionary (OD) so that it retains the order of initial data?
from collections import OrderedDict
# Obviously wrong because regular dict loses order
d = OrderedDict({'b':2, 'a':1})
# An OD is represented by a list of tuples, so would this work?
d = OrderedDict([('b',2), ('a', 1)])
# What about using a list comprehension, will 'd' preserve the order of 'l'
l = ['b', 'a', 'c', 'aa']
d = OrderedDict([(i,i) for i in l])
Question:
Will an OrderedDict preserve the order of a list of tuples, or tuple of tuples or tuple of lists or list of lists etc. passed at the time of initialization (2nd & 3rd example above)?
How does one go about verifying if OrderedDict actually maintains an order? Since a dict has an unpredictable order, what if my test vectors luckily have the same initial order as the unpredictable order of a dict? For example, if instead of d = OrderedDict({'b':2, 'a':1}) I write d = OrderedDict({'a':1, 'b':2}), I can wrongly conclude that the order is preserved. In this case, I found out that a dict is ordered alphabetically, but that may not be always true. What's a reliable way to use a counterexample to verify whether a data structure preserves order or not, short of trying test vectors repeatedly until one breaks?
P.S. I'll just leave this here for reference: "The OrderedDict constructor and update() method both accept keyword arguments, but their order is lost because Python’s function call semantics pass-in keyword arguments using a regular unordered dictionary"
P.P.S : Hopefully, in future, OrderedDict will preserve the order of kwargs also (example 1): http://bugs.python.org/issue16991
The OrderedDict will preserve any order that it has access to. The only way to pass ordered data to it to initialize is to pass a list (or, more generally, an iterable) of key-value pairs, as in your last two examples. As the documentation you linked to says, the OrderedDict does not have access to any order when you pass in keyword arguments or a dict argument, since any order there is removed before the OrderedDict constructor sees it.
Note that using a list comprehension in your last example doesn't change anything. There's no difference between OrderedDict([(i,i) for i in l]) and OrderedDict([('b', 'b'), ('a', 'a'), ('c', 'c'), ('aa', 'aa')]). The list comprehension is evaluated and creates the list and it is passed in; OrderedDict knows nothing about how it was created.
# An OD is represented by a list of tuples, so would this work?
d = OrderedDict([('b', 2), ('a', 1)])
Yes, that will work. By definition, a list is always ordered the way it is written. This goes for a list comprehension too: the generated list follows the order of the source data (if the source is a list, the result is deterministic; if the source is a set or a dict, not so much).
How does one go about verifying if OrderedDict actually maintains an order? Since a dict has an unpredictable order, what if my test vectors luckily have the same initial order as the unpredictable order of a dict? For example, if instead of d = OrderedDict({'b':2, 'a':1}) I write d = OrderedDict({'a':1, 'b':2}), I can wrongly conclude that the order is preserved. In this case, I found out that a dict is ordered alphabetically, but that may not be always true. What's a reliable way to use a counterexample to verify whether a data structure preserves order or not, short of trying test vectors repeatedly until one breaks?
You keep your source list of 2-tuple around for reference, and use that as your test data for your test cases when you do unit tests. Iterate through them and ensure the order is maintained.
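A rough sketch of such a check, assuming Python 3. The helper name `preserves_order` and the sample data are made up for illustration. Testing both an order and its reverse helps rule out a lucky match, since a container whose order depended only on its contents could match at most one of the two.

```python
from collections import OrderedDict


def preserves_order(mapping_type, pairs):
    """Return True if mapping_type iterates in the same order as pairs.

    Assumes the keys in pairs are unique.
    """
    return list(mapping_type(pairs).items()) == list(pairs)


# Keys are deliberately neither sorted nor reverse-sorted.
pairs = [("b", 2), ("aa", 4), ("c", 3), ("a", 1)]
reversed_pairs = pairs[::-1]

ok_forward = preserves_order(OrderedDict, pairs)
ok_reverse = preserves_order(OrderedDict, reversed_pairs)
print(ok_forward, ok_reverse)
```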
It is also possible (and a little more efficient) to use a generator expression:
d = OrderedDict((i, i) for i in l)
Obviously, the benefit is negligible in this trivial case for l, but if l were an iterator or yielded results from a generator, e.g. one used to parse and iterate through a large file, then the difference could be very substantial (e.g. it avoids loading the entire contents into memory). For example:
def mygen(filepath):
    with open(filepath, 'r') as f:
        for line in f:
            # Parse each whitespace-separated field on the line as an integer
            yield [int(field) for field in line.split()]

d = OrderedDict((i, sum(numbers)) for i, numbers in enumerate(mygen(filepath)))
I checked on this link that set is mutable https://docs.python.org/3/library/stdtypes.html#frozenset while frozenset is immutable and hence hashable. So how is the set implemented in Python, and what is the element lookup time? Actually I had a list of tuples [(1,2),(3,4),(2,1)] where each entry in a tuple is an id, and I wanted to create a set/frozenset out of this list. In this case the set should contain (1,2,3,4) as elements. Can I use a frozenset and insert elements into it one by one from the list of tuples, or can I only use a set?
You can instantiate a frozenset from a generator expression or other iterable. It's not immutable until it's finished being instantiated.
>>> L = [(1,2),(3,4),(2,1)]
>>> from itertools import chain
>>> frozenset(chain.from_iterable(L))
frozenset([1, 2, 3, 4])
Python3.3 also has an optimisation that turns set literals such as {1, 2, 3, 4} into precomputed frozensets when used as the right-hand side of an in operator.
Sets and frozensets are implemented the same way, as hashtables. (Why else would they require their elements to implement __hash__?) In fact, if you look at Objects/setobject.c, they share almost all their code. This means that as long as hash collisions don't get out of hand, lookup and deletion are O(1) and insertion is amortized O(1).
The usual way to create a frozenset is to initialize it with some other iterable. As gnibbler suggested, the best fit here would probably be itertools.chain.from_iterable:
>>> L = [(1,2),(3,4),(2,1)]
>>> from itertools import chain
>>> frozenset(chain.from_iterable(L))
frozenset([1, 2, 3, 4])
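The practical difference between the two types can be seen in a few lines. This is only a sketch, assuming Python 3; the `groups` dictionary and the other names are hypothetical.

```python
ids = frozenset([1, 2, 3, 4])

# A frozenset is hashable, so it can be a dictionary key or a set member.
groups = {ids: "first group"}
found = groups[frozenset([4, 3, 2, 1])]  # equality ignores element order

# A plain set is mutable and therefore unhashable.
try:
    hash({1, 2, 3, 4})
    set_is_hashable = True
except TypeError:
    set_is_hashable = False

# A frozenset has no mutating methods such as add().
has_add = hasattr(ids, "add")

print(found, set_is_hashable, has_add, 3 in ids)
```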
As for your first question, I haven't actually checked the source, but it seems safe to assume from the fact that sets need to contain objects of hashable types, that it is implemented using a hash table, and that its lookup time is, therefore, O(1).
As for your second question, you cannot insert the elements into a frozenset one by one (obviously, since it's immutable), but there's no reason to use a set instead; just construct it from a list (or other iterable) of the constituent values, e.g. like this:
from functools import reduce  # reduce is no longer a builtin in Python 3
data = [(1, 2), (3, 4), (2, 1)]
result = frozenset(reduce(list.__add__, [list(x) for x in data], []))
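If the ids really do arrive one at a time, a possible alternative (a sketch, assuming Python 3, with illustrative variable names) is to either flatten with a generator expression or collect into an ordinary set and freeze it at the end:

```python
data = [(1, 2), (3, 4), (2, 1)]

# Option 1: flatten the tuples with a generator expression.
flat = frozenset(x for pair in data for x in pair)

# Option 2: accumulate in a mutable set, then freeze it once it is complete.
seen = set()
for pair in data:
    seen.update(pair)  # add both ids from this tuple
frozen = frozenset(seen)

print(sorted(flat), flat == frozen)
```

Both avoid building the intermediate lists that repeated list concatenation creates.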
I have written some code that tries to sort a dictionary by its values rather than its keys.
""" This module sorts a dictionary based on the values of the keys"""
adict={1:1,2:2,5:1,10:2,44:3,67:2} #adict is an input dictionary
items=adict.items()## converts the dictionary into a list of tuples
##print items
list_value_key=[ [d[1],d[0]] for d in items] ## interchanges the position of the key and the values
list_value_key.sort()
print list_value_key
key_list=[ list_value_key[i][1] for i in range(0,len(list_value_key))]
print key_list ## list of keys sorted on the basis of values
sorted_adict={}
for key in key_list:
    sorted_adict.update({key:adict[key]})
    print key,adict[key]
print sorted_adict
So when I print key_list I get the expected answer, but in the last part of the code, where I try to update the dictionary, the order is not what it should be. Below are the results obtained. I am not sure why the "update" method is not working. Any help or pointers are appreciated.
result:
sorted_adict={1: 1, 2: 2, 67: 2, 5: 1, 10: 2, 44: 3}
Python dictionaries, no matter how you insert into them, are unordered. This is the nature of hash tables, in general.
Instead, perhaps you should keep a list of keys in the order in which their values are sorted, something like: [ 5, 1, 44, ...]
This way, you can access your dictionary in sorted order at a later time.
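One way to build such a list, sketched for Python 3. Ties between equal values are broken by the key itself here, which is an arbitrary choice made so the result is deterministic; `keys_by_value` is just an illustrative name.

```python
adict = {1: 1, 2: 2, 5: 1, 10: 2, 44: 3, 67: 2}

# Sort the keys by their value, breaking ties by the key itself.
keys_by_value = sorted(adict, key=lambda k: (adict[k], k))

# Walk the dictionary in value order without modifying it.
for key in keys_by_value:
    print(key, adict[key])
```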
Don't sort like that.
import operator
adict={1:1,2:2,5:1,10:2,44:3,67:2}
sorted_adict = sorted(adict.iteritems(), key=operator.itemgetter(1))
If you need a dictionary that retains its order, there's a class called OrderedDict in the collections module. You can use the recipes on that page to sort a dictionary and create a new OrderedDict that retains the sort order. The OrderedDict class is available in Python 2.7 or 3.1.
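For what it's worth, a Python 3 version of the same idea might look like the sketch below, where items() takes the place of iteritems() and the sorted pairs are fed straight into an OrderedDict. The variable names are only for illustration.

```python
import operator
from collections import OrderedDict

adict = {1: 1, 2: 2, 5: 1, 10: 2, 44: 3, 67: 2}

# Sort the (key, value) pairs by value; itemgetter(1) picks the value.
sorted_pairs = sorted(adict.items(), key=operator.itemgetter(1))

# An OrderedDict built from the pairs remembers that order.
sorted_adict = OrderedDict(sorted_pairs)
values_in_order = list(sorted_adict.values())

print(sorted_adict)
```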
To sort your dictionary, you could also use:
adict={1:1,2:2,5:1,10:2,44:3,67:2}
k = adict.keys()
k.sort(cmp=lambda k1,k2: cmp(adict[k1],adict[k2]))
And by the way, it's useless to rebuild a dictionary after that, because there is no order in a dict (dicts are just mapping types, and you can have keys of different types that are not "comparable").
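Note that the snippet above is Python 2 only: in Python 3 the cmp builtin and the cmp= argument are gone, and keys() returns a view that has no sort() method. A possible port, as a sketch, uses functools.cmp_to_key; the helper name `compare_by_value` is made up for illustration.

```python
from functools import cmp_to_key

adict = {1: 1, 2: 2, 5: 1, 10: 2, 44: 3, 67: 2}


def compare_by_value(k1, k2):
    """Old-style comparison: negative, zero or positive, like cmp()."""
    v1, v2 = adict[k1], adict[k2]
    return (v1 > v2) - (v1 < v2)


# sorted() accepts the dict directly and returns a new list of keys.
k = sorted(adict, key=cmp_to_key(compare_by_value))
values_in_order = [adict[key] for key in k]

print(k)
```

In new code a plain key function such as key=adict.get is usually simpler; cmp_to_key is mainly useful when an existing comparison function has to be kept.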
One problem is that ordinary dictionaries can't be sorted because of the way they're implemented internally. Python 2.7 and 3.1 had a new class named OrderedDict added to their collections module, as @kindall mentioned in his answer. While they can't be sorted exactly either, they do retain or remember the order in which keys and associated values were added to them, regardless of how it was done (including via the update() method). This means that you can achieve what you want by adding everything from the input dictionary to an OrderedDict output dictionary in the desired order.
To do that, the code you had was on the right track in the sense of creating what you called the list_value_key list and sorting it. There's a slightly simpler and faster way to create the initial unsorted version of that list than what you were doing, by using the built-in zip() function. Below is code illustrating how to do that:
from collections import OrderedDict
adict = {1:1, 2:2, 5:1, 10:2, 44:3, 67:2} # input dictionary
# zip together and sort pairs by first item (value)
value_keys_list = sorted(zip(adict.values(), adict.keys()))
sorted_adict = OrderedDict() # value sorted output dictionary
for pair in value_keys_list:
    sorted_adict[pair[1]] = pair[0]
print sorted_adict
# OrderedDict([(1, 1), (5, 1), (2, 2), (10, 2), (67, 2), (44, 3)])
The above can be rewritten as a fairly elegant one-liner:
sorted_adict = OrderedDict((pair[1], pair[0])
for pair in sorted(zip(adict.values(), adict.keys())))
I have this dict in python;
d={}
d['b']='beta'
d['g']='gamma'
d['a']='alpha'
When I print the dict:
for k,v in d.items():
    print k
I get this:
a
b
g
It seems like Python sorts the dict automatically! How can I get the original unsorted list?
Gath
Dicts don't work like that:
CPython implementation detail: Keys and values are listed in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary’s history of insertions and deletions.
You could use a list with 2-tuples instead:
d = [('b', 'beta'), ('g', 'gamma'), ('a', 'alpha')]
A similar but better solution is outlined in Wayne's answer.
As has been mentioned, dicts don't order or unorder the items you put in. It's "magic" as to how it's ordered when you retrieve it. If you want to keep an order -sorted or not- you need to also bind a list or tuple.
This will give you the same dict result with a list that retains order:
greek = ['beta', 'gamma', 'alpha']
d = {}
for x in greek:
    d[x[0]] = x
Simply change [] to () if you have no need to change the original list/order.
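To actually read the dict back in the original order, the key order has to be recorded somewhere. A small sketch of that idea, assuming Python 3, where `order` is a hypothetical companion list kept next to the dict:

```python
greek = ("beta", "gamma", "alpha")

d = {}
order = []  # remembers the order in which keys were first added
for word in greek:
    key = word[0]
    if key not in d:
        order.append(key)
    d[key] = word

# Read the dictionary back in insertion order or in sorted order, as needed.
in_insertion_order = [(key, d[key]) for key in order]
in_sorted_order = [(key, d[key]) for key in sorted(d)]

print(in_insertion_order)
print(in_sorted_order)
```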
Don't use a dictionary. Or use the Python 2.7/3.1 OrderedDict type.
There is no order in dictionaries to speak of, there is no original unsorted list.
No, python does not sort dict, it would be too expensive. The order of items() is arbitrary. From python docs:
CPython implementation detail: Keys and values are listed in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary’s history of insertions and deletions.