Python 3.7+: Access elements of a dictionary by order

So I understand that, since Python 3.7, dicts are ordered, but the documentation doesn't seem to list methods that make use of that ordering. For example, how do I access, say, the first element of the dict by order, independent of keys? What operations can I do with the ordered dict?
As an example, I am working on implementing a Least Frequently Used cache, where I need to track not only the number of uses of a key, but also use Least Recently Used information as a tie break. I could use a dict of queues to implement a priority queue, but then I lose O(1) lookup within the queue. If I use a dict of dicts I can retain the advantages of a hashed set AND implement it as a priority queue... I think. I just need to be able to pop the first element of the dictionary. Alas, there is no dict.popleft().
For now I am converting the keys to a list and just using the first element of the list, but while this works (the dict is keeping ordering), the conversion is costly.
import collections

LFU_queue = collections.defaultdict(collections.defaultdict)
LFU_queue[1].update({"key_1": None})
LFU_queue[1].update({"key_32": None})
LFU_queue[1].update({"key_6": None})
# Inspecting this, I DO get what I expect, which is a
# dict of dicts with the given ordering:
# {1: {"key_1": None, "key_32": None, "key_6": None}}
# Here is where I'd love to be able to do something like
# LFU_queue[1].popleft() to return {"key_1": None}
list(LFU_queue[1])[0] works, but is less than ideal.

As others commented, using OrderedDict is the right choice for that problem. But you can also use a plain dict together with items(), like this:
d = {}
d['a'] = 1
d['b'] = 2
first_pair = next(iter(d.items()), None)
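If you also need to remove that first entry (the popleft() the question asks for), here is a minimal sketch, where dict_popleft is a hypothetical helper name and the dict is assumed non-empty:
def dict_popleft(d):
    # next(iter(d)) yields the oldest key without materializing list(d),
    # avoiding the costly conversion noted in the question.
    first_key = next(iter(d))
    return first_key, d.pop(first_key)

d = {'a': 1, 'b': 2}
assert dict_popleft(d) == ('a', 1)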
If you are interested, see PEP 372 and PEP 468 for the history of dictionary ordering.
With PEP 468, you can implement syntactic sugar like this:
from ctypes import Structure, c_int, c_ulong

def c_fields(**kwargs):
    return list(kwargs.items())

class XClientMessageEvent(Structure):
    _fields_ = c_fields(
        type = c_int,
        serial = c_ulong,
        send_event = c_int,
        ...
    )

Related

Use of dict instead of class

I don't understand when I should use dictionaries instead of classes in Python. If a class can do the same as a dictionary and more, why do we use dictionaries? For example, the following two snippets do the same thing:
Using class:
class PiggyBank:
    def __init__(self, dollars, cents):
        self.dollars = dollars
        self.cents = cents

piggy1 = PiggyBank(2, 2)
print(piggy1.dollars)   # 2
print(piggy1.__dict__)  # {'dollars': 2, 'cents': 2}
Using dictionary:
def create_piggy(dollars, cents):
    return {'dollars': dollars, 'cents': cents}

piggy2 = create_piggy(2, 2)
print(piggy2['dollars'])  # 2
print(piggy2)             # {'dollars': 2, 'cents': 2}
So at the end, I am creating two static objects with the same information. When should I use a class or a function for creating an instance?
You can use a dictionary if it suffices for your use case and you are not forced to use a class. But if you need some of the added benefits of a class (like the cases below; a short sketch follows the list), you may write it as a class.
Some use cases where you might need a class:
When you need to update the code frequently
When you need encapsulation
When you need to reuse the same interface, code, or logic
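As an illustration of those cases (a sketch of my own, not from the question; the deposit method and its carry logic are invented for the example), a class lets you attach behavior and invariants to the data, which a bare dict cannot:
class PiggyBank:
    def __init__(self, dollars, cents):
        self.dollars = dollars
        self.cents = cents

    def deposit(self, dollars, cents):
        # Encapsulated behavior: cents carry over into dollars automatically.
        total_cents = (self.dollars + dollars) * 100 + self.cents + cents
        self.dollars, self.cents = divmod(total_cents, 100)

piggy = PiggyBank(2, 2)
piggy.deposit(0, 99)
print(piggy.dollars, piggy.cents)  # 3 1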
dicts are often called "associative arrays". They're like regular lists in Python, except that instead of using an integer as an index, you can use strings (or any hashable type).
You can certainly implement list-like and dict-like objects (as well as set-like objects and str-like objects) with Python classes, but why would you if there's already an object type that does exactly what you need?
So if all you're looking for is an associative array, a dict will most likely serve you just fine. But if you jump through hoops to implement a class that already does what a dict does, you'll be "re-inventing the wheel," as well as giving the future readers of your code extra work trying to figure out that all your class is doing is re-implementing a dict with no real extra functionality.
Basically, if all that you need is a dict, just use a dict. Otherwise, you'll be writing a lot of extra code (that may be prone to bugs) for no real gain.
You would typically want to use a dictionary when you're doing something dynamic, something where you don't know all of the keys or information when you're writing the code. Note also that class attribute names must be valid identifiers, so, unlike dict keys, they can't be numbers.
As a classic example (which has better solutions using collections.Counter, but we'll still keep it for its educational value), if you have a list of values that you want to count you can use a dictionary to do so efficiently:
items_to_count = ['a', 'a', 'a', 'b', 5, 5]
count = {}
for item in items_to_count:
    if item in count:
        count[item] += 1
    else:
        count[item] = 1
# count == {'a': 3, 'b': 1, 5: 2}
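For reference, the collections.Counter version mentioned above does the same counting in one line:
from collections import Counter

count = Counter(['a', 'a', 'a', 'b', 5, 5])
# Counter({'a': 3, 5: 2, 'b': 1})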
why do we use dictionaries?
Dictionaries are built-in containers with quite a bit of functionality that can be used at no extra coding cost to us. Since they are built-in a lot of that functionality is probably optimized.

Python - Versioned list instead of immutable list?

Update:
As of CPython 3.6, dictionaries have a version (thank you pylang for showing this to me).
If they added the same version to list and made it public, all 3 asserts from my original post would pass! It would definitely meet my needs. Their implementation differs from what I envisioned, but I like it.
As it is, I don't feel I can use dictionary version:
It isn't public. Jake VanderPlas shows how to expose it in a blog post, but he cautions that it is "definitely not code you should use for any purpose beyond simply having fun." I agree with his reasons.
In all of my use cases, the data is conceptually arrays of elements each of which has the same structure. A list of tuples is a natural fit. Using a dictionary would make the code less natural and probably more cumbersome.
Does anyone know if there are plans to add version to list?
Are there plans to make it public?
If there are plans to add version to list and make it public, I would feel awkward putting forward an incompatible VersionedList now. I would just implement the bare minimum I need and get by.
Original post below
Turns out that many of the times I wanted an immutable list, a VersionedList would have worked almost as well (sometimes even better).
Has anyone implemented a versioned list?
Is there a better, more Pythonic, concept that meets my needs? (See motivation below.)
What I mean by a versioned list is:
A class that behaves like a list
Any change to an instance or elements in the instance results in instance.version() being updated. So, if alist is a normal list:
a = VersionedList(alist)
a_version = a.version()
change(a)
assert a_version != a.version()
reverse_last_change(a)
If a list was hashable, hash() would achieve the above and meet all the needs identified in the motivation below. We need to define 'version()' in a way that doesn't have all of the same problems as 'hash()'.
If identical data in two lists is highly unlikely to ever happen except at initialization, we aren't going to have a reason to test for deep equality. From the data model docs (https://docs.python.org/3.5/reference/datamodel.html#object.hash): "The only required property is that objects which compare equal have the same hash value." If we don't impose this requirement on version(), it seems likely that version() won't have the problems that make lists unhashable. So, unlike hash(), identical contents do not imply the same version:
#contents of 'a' are now identical to original, but...
assert a_version != a.version()
b = VersionedList(alist)
c = VersionedList(alist)
assert b.version() != c.version()
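A minimal sketch of those semantics (my own illustration, not a complete implementation: only a couple of mutating methods are shown, and it ignores the deep-immutability concern discussed next):
import itertools

class VersionedList:
    _counter = itertools.count()  # class-wide, so two instances never share a version

    def __init__(self, iterable=()):
        self._data = list(iterable)
        self._bump()

    def _bump(self):
        self._version = next(VersionedList._counter)

    def version(self):
        return self._version

    def __getitem__(self, index):
        return self._data[index]

    def __setitem__(self, index, value):
        self._data[index] = value
        self._bump()

    def append(self, value):
        self._data.append(value)
        self._bump()

    def __len__(self):
        return len(self._data)
Because the counter is shared across the class, two instances built from identical data still get distinct versions, satisfying the asserts above.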
For VersionedList, it would be good if any attempt to modify the result of __getitem__ automatically resulted in a copy, instead of modifying the underlying implementation data. I think the only other option would be to have __getitem__ always return a copy of the elements, which would be very inefficient for all of the use cases I can think of. I think we need to restrict the elements to immutable objects (deeply immutable; for example, exclude tuples with list elements). I can think of 3 ways to achieve this:
Only allow elements that can't contain mutable elements (int, str, etc are fine, but exclude tuples). (This is far too limiting for my cases)
Add code to __init__, __setitem__, etc. to traverse inputs and deeply check for mutable sub-elements. (Expensive; any way to avoid this? A sketch of such a check follows this list.)
Also allow more complex elements, but require that they are deeply immutable. Perhaps require that they expose a deeply_immutable attribute. (This turns out to be easy for all the use cases I have)
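For option 2, a sketch of such a deep check (is_deeply_immutable is a hypothetical helper; it whitelists common immutable leaf types and recurses into the immutable containers):
def is_deeply_immutable(obj):
    # Leaf types that are immutable by construction.
    if isinstance(obj, (int, float, complex, bool, str, bytes, type(None))):
        return True
    # Containers that are themselves immutable but may hold mutable elements.
    if isinstance(obj, (tuple, frozenset)):
        return all(is_deeply_immutable(item) for item in obj)
    # Anything else (lists, dicts, arbitrary objects) is treated as mutable.
    return False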
Motivation:
If I am analyzing a dataset, I often have to perform multiple steps that return large datasets (note: since the dataset is ordered, it is best represented by a List not a set).
If at the end of several steps (ex: 5) it turns out that I need to perform different analysis (ex: back at step 4), I want to know that the dataset from step 3 hasn't accidentally been changed. That way I can start at step 4 instead of repeating steps 1-3.
I have functions (control-points, first-derivative, second-derivative, offset, outline, etc) that depend on and return array-valued objects (in the linear algebra sense). The base 'array' is knots.
control-points() depends on: knots, algorithm_enum
first-derivative() depends on: control-points(), knots
offset() depends on: first-derivative(), control-points(), knots, offset_distance
outline() depends on: offset(), end_type_enum
If offset_distance changes, I want to avoid having to recalculate first-derivative() and control-points(). To avoid recalculation, I need to know that nothing has accidentally changed the resultant 'arrays'.
If 'knots' changes, I need to recalculate everything and not depend on the previous resultant 'arrays'.
To achieve this, knots and all of the 'array-valued' objects could be VersionedList.
FYI: I had hoped to take advantage of an efficient class like numpy.ndarray. In most of my use cases, the elements logically have structure. Having to mentally keep track of multi-dimensions of indexes meant implementing and debugging the algorithms was many times more difficult with ndarray. An implementation based on lists of namedtuples of namedtuples turned out to be much more sustainable.
Private dict versions in 3.6
In CPython 3.6, dictionaries gained a private version (PEP 509) and a compact layout (issue 27350), which track changes and preserve insertion order, respectively. Both are implementation details of CPython 3.6. Jake VanderPlas gives a detailed demonstration in his blog post of exposing this versioning feature from within normal Python. We can use his approach to:
determine when a dictionary has been updated
preserve the order
Example
import numpy as np

d = {"a": np.array([1, 2, 3]),
     "c": np.array([1, 2, 3]),
     "b": np.array([8, 9, 10]),
     }

for i in range(3):
    print(d.get_version())  # monkey-patched method (see Details below)
# 524938
# 524938
# 524938
Notice the version number does not change until the dictionary is updated, as shown below:
d.update({"c": np.array([10, 11, 12])})
d.get_version()
# 534448
In addition, the insertion order is preserved (the following was tested in restarted sessions of Python 3.5 and 3.6):
list(d.keys())
# ['a', 'c', 'b']
You may be able to take advantage of this new dictionary behavior, saving you from implementing a new datatype.
Details
For those interested, get_version() is a method monkey-patched onto all dictionaries, implemented in Python 3.6 using the following code, adapted from Jake VanderPlas' blog post. This code was run prior to calling get_version().
import types
import ctypes
import sys

assert (3, 6) <= sys.version_info < (3, 7)  # valid only in Python 3.6

py_ssize_t = ctypes.c_ssize_t

# Emulate the PyObject struct from CPython
class PyObjectStruct(ctypes.Structure):
    _fields_ = [('ob_refcnt', py_ssize_t),
                ('ob_type', ctypes.c_void_p)]

# Create a DictStruct class to wrap existing dictionaries
class DictStruct(PyObjectStruct):
    _fields_ = [("ma_used", py_ssize_t),
                ("ma_version_tag", ctypes.c_uint64),
                ("ma_keys", ctypes.c_void_p),
                ("ma_values", ctypes.c_void_p),
                ]

    def __repr__(self):
        return (f"DictStruct(size={self.ma_used}, "
                f"refcount={self.ob_refcnt}, "
                f"version={self.ma_version_tag})")

    @classmethod
    def wrap(cls, obj):
        assert isinstance(obj, dict)
        return cls.from_address(id(obj))

assert object.__basicsize__ == ctypes.sizeof(PyObjectStruct)
assert dict.__basicsize__ == ctypes.sizeof(DictStruct)

# Code for monkey-patching existing dictionaries
class MappingProxyStruct(PyObjectStruct):
    _fields_ = [("mapping", ctypes.POINTER(DictStruct))]

    @classmethod
    def wrap(cls, D):
        assert isinstance(D, types.MappingProxyType)
        return cls.from_address(id(D))

assert types.MappingProxyType.__basicsize__ == ctypes.sizeof(MappingProxyStruct)

def mappingproxy_setitem(obj, key, val):
    """Set an item in a read-only mapping proxy."""
    proxy = MappingProxyStruct.wrap(obj)
    ctypes.pythonapi.PyDict_SetItem(proxy.mapping,
                                    ctypes.py_object(key),
                                    ctypes.py_object(val))

mappingproxy_setitem(dict.__dict__,
                     'get_version',
                     lambda self: DictStruct.wrap(self).ma_version_tag)

Why use dict.keys?

I recently wrote some code that looked something like this:
# dct is a dictionary
if "key" in dct.keys():
However, I later found that I could achieve the same results with:
if "key" in dct:
This discovery got me thinking, and I began to run some tests to see if there could be a scenario where I must use the keys method of a dictionary. My conclusion, however, is that there is not.
If I want the keys in a list, I can do:
keys_list = list(dct)
If I want to iterate over the keys, I can do:
for key in dct:
...
Lastly, if I want to test if a key is in dct, I can use in as I did above.
Summed up, my question is: am I missing something? Could there ever be a scenario where I must use the keys method? Or is it simply a leftover from an earlier version of Python that should be ignored?
On Python 3, use dct.keys() to get a dictionary view object, which lets you do set operations on just the keys:
>>> for sharedkey in dct1.keys() & dct2.keys(): # intersection of two dictionaries
... print(dct1[sharedkey], dct2[sharedkey])
In Python 2.7, you'd use dct.viewkeys() for that.
In Python 2, dct.keys() returns a list, a copy of the keys in the dictionary. This can be passed around as a separate object that can be manipulated in its own right, including removing elements without affecting the dictionary itself; however, you can create the same list with list(dct), which works in both Python 2 and 3.
You indeed don't want any of these for iteration or membership testing; always use for key in dct and key in dct for those, respectively.
Source: PEP 234, PEP 3106
Python 2's relatively useless dict.keys method exists for historical reasons. Originally, dicts weren't iterable. In fact, there was no such thing as an iterator; iterating over sequences worked by calling __getitem__, the element access method, with increasing integer indices until an IndexError was raised. To iterate over the keys of a dict, you had to call the keys method to get an explicit list of keys and iterate over that.
When iterators went in, dicts became iterable, because it was more convenient, faster, and all around better to say
for key in d:
than
for key in d.keys()
This had the side-effect of making d.keys() utterly superfluous; list(d) and iter(d) now did everything d.keys() did in a cleaner, more general way. They couldn't get rid of keys, though, since so much code already called it.
(At this time, dicts also got a __contains__ method, so you could say key in d instead of d.has_key(key). This was shorter and nicely symmetrical with for key in d; the symmetry is also why iterating over a dict gives the keys instead of (key, value) pairs.)
In Python 3, taking inspiration from the Java Collections Framework, the keys, values, and items methods of dicts were changed. Instead of returning lists, they would return views of the original dict. The key and item views would support set-like operations, and all views would be wrappers around the underlying dict, reflecting any changes to the dict. This made keys useful again.
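A quick illustration of that live-view behavior:
>>> d = {'a': 1}
>>> keys = d.keys()
>>> d['b'] = 2
>>> keys  # the view reflects the later insertion
dict_keys(['a', 'b'])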
On Python 2, list(dct) is equivalent to dct.keys(). Which one you use is a matter of personal preference. I personally think dct.keys() is slightly clearer, but to each their own.
In any case, there isn't a scenario where you "need" to use dct.keys() per se.
In Python 3, dct.keys() returns a "dictionary view object", so if you need to get a hold of an unmaterialized view to the keys (which could be useful for huge dictionaries) outside of a for loop context, you'd need to use dct.keys().
key in dct
is much faster than checking
key in dct.keys()
(especially on Python 2, where keys() builds a whole new list; on Python 3 the difference is just the overhead of the extra method call and view object).
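If you want to check this on your own interpreter, here is a timeit sketch (no numbers quoted, since they vary by Python version and dict size):
import timeit

setup = "d = {i: None for i in range(1000)}"
print(timeit.timeit("500 in d", setup=setup))         # key in d
print(timeit.timeit("500 in d.keys()", setup=setup))  # key in d.keys()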

What is a flexible, hybrid python collection object?

As a way to get used to Python, I am trying to translate some of my code from AutoHotkey_L to Python.
I am immediately running into tons of choices for collection objects. Can you help me figure out a built-in type, or a 3rd-party contributed type, that has as much as possible of the functionality of the AutoHotkey_L object type and its methods?
AutoHotkey_L Objects have features of a python dict, list, and a class instance.
I understand that there are tradeoffs for space and speed, but I am just interested in functionality rather than optimization issues.
Don't write Python as <another-language>. Write Python as Python.
The data structure should be chosen to have just the minimal abilities you actually need.
list — an ordered sequence of elements, with 1 flexible end.
collections.deque — an ordered sequence of elements, with 2 flexible ends (e.g. a queue).
set / frozenset — an unordered sequence of unique elements.
collections.Counter — an unordered sequence of non-unique elements.
dict — an unordered key-value relationship.
collections.OrderedDict — an ordered key-value relationship.
bytes / bytearray — a list of bytes.
array.array — a homogeneous list of primitive types.
Looking at the interface of Object,
dict would be the most suitable for finding a value by key
collections.OrderedDict would be the most suitable for the push/pop stuff.
For MinIndex / MaxIndex, a sorted key-value relationship (e.g. a red-black tree) is required. There's no such type in the standard library, but there are 3rd-party implementations (see the sketch below).
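For example, assuming the 3rd-party sortedcontainers package, its SortedDict keeps keys in sorted order and exposes the smallest and largest entries directly:
from sortedcontainers import SortedDict  # pip install sortedcontainers

sd = SortedDict({'b': 2, 'c': 3, 'a': 1})
print(sd.peekitem(0))   # ('a', 1), the MinIndex analogue
print(sd.peekitem(-1))  # ('c', 3), the MaxIndex analogue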
It would be impossible to recommend a particular class without knowing how you intend on using it. If you are using this particular object as an ordered sequence where elements can be repeated, then you should use a list; if you are looking up values by their key, then use a dictionary. You will get very different algorithmic runtime complexity with the different data types. It really does not take that much time to determine when to use which type.... I suggest you give it some further consideration.
If you really can't decide, though, here is a possibility:
class AutoHotKeyObject(object):
    def __init__(self):
        self.list_value = []
        self.dict_value = {}

    def getDict(self):
        return self.dict_value

    def getList(self):
        return self.list_value
With the above, you could use both the list and dictionary features, like so:
obj = AutoHotKeyObject()
obj.getList().append(1)
obj.getList().append(2)
obj.getList().append(3)
print(obj.getList())  # Prints [1, 2, 3]

obj.getDict()['a'] = 1
obj.getDict()['b'] = 2
print(obj.getDict())  # Prints {'a': 1, 'b': 2}

Is there a way to set multiple defaults on a Python dict using another dict?

Suppose I've got two dicts in Python:
mydict = {'a': 0}
defaults = {
    'a': 5,
    'b': 10,
    'c': 15,
}
I want to be able to expand mydict using the default values from defaults, such that 'a' remains the same but 'b' and 'c' are filled in. I know about dict.setdefault() and dict.update(), but each only does half of what I want: with dict.setdefault(), I have to loop over each variable in defaults; but with dict.update(), defaults will blow away any pre-existing values in mydict.
Is there some functionality I'm not finding built into Python that can do this? And if not, is there a more Pythonic way of writing a loop to repeatedly call dict.setdefault() than this:
for key in defaults.keys():
    mydict.setdefault(key, defaults[key])
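A slightly tidier spelling of the same loop, iterating over pairs instead of indexing back into defaults (the behavior is identical):
for key, value in defaults.items():
    mydict.setdefault(key, value)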
Context: I'm writing up some data in Python that controls how to parse an XML tree. There's a dict for each node (i.e., how to process each node), and I'd rather the data I write up be sparse, but filled in with defaults. The example code is just an example... real code has many more key/value pairs in the default dict.
(I realize this whole question is but a minor quibble, but it's been bothering me, so I was wondering if there was a better way to do this that I am not aware of.)
Couldn't you make mydict a copy of defaults? That way, mydict would have all the correct values to start with:
mydict = defaults.copy()
If you don't mind creating a new dictionary in the process, this will do the trick:
newdict = dict(defaults)
newdict.update(mydict)
Now newdict contains what you need.
Since Python 3.9, you can do:
mydict = defaults | mydict
I found this solution to be the most elegant in my use case.
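With the dict union operator, the right-hand operand wins on key collisions, which is exactly why mydict goes on the right:
>>> {'a': 5, 'b': 10, 'c': 15} | {'a': 0}
{'a': 0, 'b': 10, 'c': 15}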
You can do this the same way Python's collections.defaultdict works:
class MultiDefaultDict(dict):
    def __init__(self, defaults, **kwargs):
        self.defaults = defaults
        self.update(kwargs)

    def __missing__(self, key):
        return self.defaults[key]
>>> mydict2 = MultiDefaultDict(defaults, a=0)
>>> mydict2['a']
0
>>> mydict2['b']
10
>>> mydict2
{'a': 0}
The other solutions posted so far duplicate all the default values; this one shares them, as requested. You may or may not want to override other dict methods like __contains__(), __iter__(), items(), keys(), values() -- this class as defined here iterates over the non-default items only.
defaults.update(mydict)
(Note that this merges in place: the combined result ends up in defaults, with mydict's values winning.)
Personally I like to subclass the dict object. It works mostly like a dictionary except that you have to create the object first.
class d_dict(dict):
    'Dictionary object with easy defaults.'
    def __init__(self, defaults={}):
        self.setdefault(defaults)

    def setdefault(self, defaults):
        for key, value in defaults.items():
            if key not in self:
                dict.__setitem__(self, key, value)
This provides the exact same functionality as the dict type except that it overrides the setdefault() method and will take a dictionary containing one or more items. You can set the defaults at creation.
This is just a personal preference. As I understand it, all that dict.setdefault() does is set the items which haven't been set yet. So probably the simplest option is:
new_dict = default_dict.copy()
new_dict.update({'a':0})
However, if you do this more than once you might make a function out of this. At this point it may just be easier to use a custom dict object, rather than constantly adding defaults to your dictionaries.
