Dictionary view objects vs sets

I have been reading about the dictionary view objects returned by the likes of dict.keys(), including the posts on here about the subject. I understand they act as windows onto the dictionary's contents without storing an explicit copy of those contents, and so are more efficient than a dynamically updated list of keys. I also found they are containers (they allow use of the in operator) but not sequences (they are not indexable), although they are iterable.
Overall this sounds to me like a set, since they have access to the dictionary's hash table they even offer the use of set-like operations like intersection/difference. One difference I can think of is that a set, while mutable like these view objects, can only store immutable (and therefore hashable) objects.
However, since a dictionary value doesn't have to be immutable, the values and items view objects are essentially sets with mutable contents, expectedly not supportive of set-like operations (subtraction/intersection). This makes me sceptical of considering these view objects as "a set with a reference to the dictionary".
My question is: are these view objects entirely different to sets but happen to have similar properties? Or are they implemented using sets? Any other major differences between the two? And most importantly - can it be damaging to consider them as "basically sets"?

The implicit point of your comparison is that neither dict.keys() nor a set can contain duplicates. However, the set-like dictionary view obtained from the keys still retains order, while a set does not.
Duplicate dictionary keys:
If a key occurs more than once, the last value for that key becomes the corresponding value in the new dictionary.
Duplicate set elements:
A set object is an unordered collection of distinct hashable objects.
From the above, sets are unordered while in the current Python version dictionaries maintain insertion order:
Changed in version 3.7: Dictionary order is guaranteed to be insertion order.
Because dictionaries have an insertion order they can be reversed, while such operation in a set would be meaningless:
Dictionaries and dictionary views are reversible.
Finally, a set can be modified: elements can be added to it and removed from it. A dictionary view object only allows looking at contents, not changing them.
My question is, are these view objects entirely different to sets but happen to have similar properties? Or are they implemented using sets?
The documentation makes no claim about implementation details.
Any other major differences between the two?
The documentation distinguishes keys views from items views and values views:
Keys views are set-like (...)
If all values are hashable, so that (key, value) pairs are unique and hashable, then the items view is also set-like.
(Values views are not treated as set-like (...))
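To make the points above concrete, here is a minimal sketch of how a keys view differs from a set. The dictionary contents and variable names are made up for illustration, and reversing a view assumes Python 3.8 or later.

```python
d = {"a": 1, "b": 2, "c": 3}
keys = d.keys()

# Set-like operations on a keys view work, and they return plain sets.
common = keys & {"a", "c", "z"}
remaining = keys - {"a"}
result_type = type(common)

# The view is live: it reflects later changes to the dictionary.
d["d"] = 4
sees_new_key = "d" in keys

# Unlike a set, the view follows insertion order and can be reversed
# (reversed() on views assumes Python 3.8+).
newest_first = list(reversed(keys))

# Unlike a set, the view is read-only: it has no add() or remove().
can_mutate = hasattr(keys, "add")

# A values view is not set-like, so set operators raise TypeError.
try:
    d.values() & {1}
    values_are_set_like = True
except TypeError:
    values_are_set_like = False
```

So a keys view behaves like a read-only, ordered, always-current projection of the dict rather than an independent set.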

Related

Dictionary versus Nested list/array?

In Python, dictionaries are used for key/value pairs. Nested lists, or arrays, however, can do the same thing with two-value lists inside a big list, for example [[1, 2], [3, 4]].
Arrays have more uses and are actually faster, but dictionaries are more straightforward. What are the pros and cons of using a dictionary versus an array?

If you use a list of key/value pairs, getting the value corresponding to a key requires a linear search, which is O(n). If the list is sorted by the keys you can improve that to O(log n), but then adding to the list becomes more complicated and expensive since you have to keep it sorted.
Dictionaries are implemented as hash tables, so getting the value corresponding to a key takes constant time on average.
Furthermore, Python provides convenient syntax for looking up keys in a dictionary. You can write dictname[key]. Since lists aren't intended to be used as lookup tables, there's no corresponding syntax for finding a value by key there. listname[index] gets an element by its numeric position, not looking up the key in a key/value pair.
Of course, if you want to use an association list, there's nothing stopping you from writing your own functions to do so. You could also embed them in a class and define appropriate methods so you can use [] syntax to get and set them.
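As a rough sketch of that last idea, the class below wraps a list of key/value pairs and supports [] syntax. The name AssocList and its methods are hypothetical, chosen only for illustration; every lookup is still a linear scan.

```python
class AssocList:
    """Hypothetical association list with dict-style [] access (O(n) lookups)."""

    def __init__(self, pairs=()):
        # Store each pair as a two-item list so values can be updated in place.
        self._pairs = [list(pair) for pair in pairs]

    def __getitem__(self, key):
        # Linear search: compare against every stored key until one matches.
        for k, v in self._pairs:
            if k == key:
                return v
        raise KeyError(key)

    def __setitem__(self, key, value):
        # Update an existing pair, or append a new one if the key is absent.
        for pair in self._pairs:
            if pair[0] == key:
                pair[1] = value
                return
        self._pairs.append([key, value])


table = AssocList([[1, 2], [3, 4]])
table[5] = 6      # adds a new pair
table[1] = 20     # updates the existing pair

# The built-in dict offers the same syntax with hash-based lookups.
lookup = {1: 20, 3: 4, 5: 6}
```

Both objects answer table[3] and lookup[3] the same way; the difference is how much work each lookup does as the collection grows.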

Maintaining a sorted view of dict values?

I have a dict containing about 50,000 integer values, and a set that contains the keys of 100 of them. My inner loop increments or decrements the values of the dict items in an unpredictable way.
Periodically I need to replace one member of the set with the key of the largest element not already in the set. As an aside, if the dict items were sorted, the order of the sort would change slightly, not dramatically, between invocations of this routine.
Re-sorting the entire dict every time seems wasteful, although maybe less so given that it's already "almost" sorted. While I may be guilty of premature optimization, performance will matter as this will run a very large number of iterations, so I thought it worth asking my betters whether there's an obviously more efficient and pythonic approach.
I'm aware of the notion of dict "views" - Windows onto the contents which are updated as the contents change. Is there such a thing as a "sorted view"?

Instead of using a dict you could use a Counter object, which has a neat most_common(n) method that will:
Return a list of the n most common elements and their counts from the most common to the least.
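A minimal sketch of how this could be applied to the question, assuming small made-up data. The helper best_unselected is hypothetical, not part of the collections module; it asks for one more entry than the size of the selected set, since at most that many of the top entries can already be selected.

```python
from collections import Counter

scores = Counter({"a": 5, "b": 9, "c": 7, "d": 1})
selected = {"b"}

# The inner loop can keep adjusting values exactly as with a plain dict.
scores["a"] += 10
scores["c"] -= 3


def best_unselected(counter, chosen):
    """Return the key with the largest value that is not in `chosen`."""
    # Only len(chosen) + 1 top entries are needed to find one outside `chosen`.
    for key, _value in counter.most_common(len(chosen) + 1):
        if key not in chosen:
            return key
    return None  # every key is already selected


replacement = best_unselected(scores, selected)
```

This avoids keeping the whole mapping sorted; each call inspects only the top few entries.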

difference between python set and dict "internally"

Can anybody tell me how the internal implementation of set and dict is different in python? Do they use the same data structure in the background?
++ In theory, one can use dict to achieve set functionality.

In CPython, sets and dicts use the same basic data structure. Sets tune it slightly differently, but it is basically a hash table just like dictionaries.
You can take a look at the implementation details in the C code: setobject.c and dictobject.c. The implementations are very close; setobject.c was originally started as a copy of dictobject.c. dictobject.c has more implementation notes and tracing calls, but the actual implementations of the core functions differ only in details.
The most obvious difference is that the keys in the hash table are not used to reference values, like in dictionaries, so a setentry struct only has a cached hash and key, the dictentry struct adds a value pointer.
Before we had the built-in set, we had the sets module, a pure-Python implementation that used dict objects to track the set values as keys. And in Python versions before the sets module was available, we did just that: use dict objects with keys as the set values, to track unique, unordered values.
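For illustration, here is a small sketch of that dict-as-set idiom with made-up data. The dummy value None is an arbitrary choice; only the keys matter.

```python
words = ["spam", "eggs", "spam", "ham"]

# Use the keys of a dict as the "set" and ignore the values.
seen = {}
for word in words:
    seen[word] = None  # re-inserting an existing key changes nothing

is_member = "eggs" in seen   # hash-based membership test, like a set
unique_words = list(seen)

# The built-in set expresses the same thing directly.
same_members = set(seen) == set(words)
```

In Python 3.7 and later, the dict version also happens to preserve insertion order, which a set does not promise.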

These two use the same data structure in the backend. For example, in a set you cannot store duplicate values, but in a dict you can store the same value multiple times (under different keys), and you can turn a dict into a set by changing the behavior of the dict.

Set of non hashable objects in python

Is there an equivalent to python set for non-hashable objects? (For instance custom class that can be compared to one another but not hashed?)

If your values are not hashable, then there is no point in using set.
Just use a list instead. If all your objects can do is test for equality, then you'd have to scan each element every time to test for membership. obj in listvalue does just that, scan the list until an equality match is found:
    if someobj not in somelist:
        somelist.append(someobj)
would give you a list of 'unique' values.
Yes, this is going to be slower than a set, but sets can only achieve O(1) complexity through hashes.
If your objects are orderable, you could speed up operations by using the bisect module to bring down tests to O(log N) complexity, perhaps. Make sure you insert new values using the information gleaned from the bisection test to retain the order.
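A minimal sketch of the bisect approach, using lists as stand-ins for objects that are orderable but not hashable. The helper name add_unique is hypothetical. Note that only the search is O(log N); list.insert still shifts elements, so insertion itself remains O(N).

```python
import bisect


def add_unique(sorted_values, obj):
    """Insert obj into the sorted list unless an equal item is already present."""
    # Binary search for the leftmost position where obj could go.
    index = bisect.bisect_left(sorted_values, obj)
    if index < len(sorted_values) and sorted_values[index] == obj:
        return False  # an equal object is already stored
    # Reuse the position found by the search so the list stays sorted.
    sorted_values.insert(index, obj)
    return True


values = []
for item in [[3, 1], [1, 2], [3, 1], [2, 0]]:
    add_unique(values, item)
```

After the loop, values holds each distinct list once, in sorted order.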

There is the sortedset class from the blist library, which offers a set-like api for comparable (and potentially non-hashable) objects, using a storage mechanism based on a sorted list.

Why can tuples contain mutable items?

If a tuple is immutable then why can it contain mutable items?
It is seemingly a contradiction that when a mutable item such as a list does get modified, the tuple it belongs to maintains being immutable.

That's an excellent question.
The key insight is that tuples have no way of knowing whether the objects inside them are mutable. The only thing that makes an object mutable is to have a method that alters its data. In general, there is no way to detect this.
Another insight is that Python's containers don't actually contain anything. Instead, they keep references to other objects. Likewise, Python's variables aren't like variables in compiled languages; instead the variable names are just keys in a namespace dictionary where they are associated with a corresponding object. Ned Batchelder explains this nicely in his blog post. Either way, objects only know their reference count; they don't know what those references are (variables, containers, or the Python internals).
Together, these two insights explain your mystery (why an immutable tuple "containing" a list seems to change when the underlying list changes). In fact, the tuple did not change (it still has the same references to other objects that it did before). The tuple could not change (because it did not have mutating methods). When the list changed, the tuple didn't get notified of the change (the list doesn't know whether it is referred to by a variable, a tuple, or another list).
While we're on the topic, here are a few other thoughts to help complete your mental model of what tuples are, how they work, and their intended use:
Tuples are characterized less by their immutability and more by their intended purpose.
Tuples are Python's way of collecting heterogeneous pieces of information under one roof. For example,
s = ('www.python.org', 80)
brings together a string and a number so that the host/port pair can be passed around as a socket, a composite object. Viewed in that light, it is perfectly reasonable to have mutable components.
Immutability goes hand-in-hand with another property, hashability. But hashability isn't an absolute property. If one of the tuple's components isn't hashable, then the overall tuple isn't hashable either. For example, t = ('red', [10, 20, 30]) isn't hashable.
The last example shows a 2-tuple that contains a string and a list. The tuple itself isn't mutable (i.e. it doesn't have any methods for changing its contents). Likewise, the string is immutable because strings don't have any mutating methods. The list object does have mutating methods, so it can be changed. This shows that mutability is a property of an object type -- some objects have mutating methods and some don't. This doesn't change just because the objects are nested.
Remember two things. First, immutability is not magic -- it is merely the absence of mutating methods. Second, objects don't know what variables or containers refer to them -- they only know the reference count.
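The points above can be checked with a short sketch that reuses the ('red', [10, 20, 30]) example. The variable names are chosen here only for illustration.

```python
t = ('red', [10, 20, 30])
inner_id = id(t[1])

# The list has mutating methods, so it can change in place...
t[1].append(40)
# ...but the tuple still refers to the very same list object.
same_object = id(t[1]) == inner_id

# The tuple itself offers no way to rebind one of its slots.
try:
    t[1] = []
    rebinding_blocked = False
except TypeError:
    rebinding_blocked = True

# Hashing a tuple hashes its components, so an unhashable
# component makes the whole tuple unhashable.
try:
    hash(t)
    tuple_hashable = True
except TypeError:
    tuple_hashable = False
```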
Hope, this was useful to you :-)

That's because tuples don't contain lists, strings or numbers. They contain references to other objects. [1] The inability to change the sequence of references a tuple contains doesn't mean that you can't mutate the objects associated with those references. [2]
[1] Objects, values and types (see: second to last paragraph)
[2] The standard type hierarchy (see: "Immutable sequences")

As I understand it, this question needs to be rephrased as a question about design decisions: Why did the designers of Python choose to create an immutable sequence type that can contain mutable objects?
To answer this question, we have to think about the purpose tuples serve: they serve as fast, general-purpose sequences. With that in mind, it becomes quite obvious why tuples are immutable but can contain mutable objects. To wit:
Tuples are fast and memory efficient: Tuples are faster to create than lists because they are immutable. Immutability means that tuples can be created as constants and loaded as such, using constant folding. It also means they're faster and more memory efficient to create because there's no need for overallocation, etc. They're a bit slower than lists for random item access, but faster again for unpacking (at least on my machine). If tuples were mutable, then they wouldn't be as fast for purposes such as these.
Tuples are general-purpose: Tuples need to be able to contain any kind of object. They're used to (quickly) do things like variable-length argument lists (via the * operator in function definitions). If tuples couldn't hold mutable objects, they would be useless for things like this. Python would have to use lists, which would probably slow things down, and would certainly be less memory efficient.
So you see, in order to fulfill their purpose, tuples must be immutable, but also must be able to contain mutable objects. If the designers of Python wanted to create an immutable object that guarantees that all the objects it "contains" are also immutable, they would have to create a third sequence type. The gain is not worth the extra complexity.
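As a small illustration of the variable-length argument case mentioned above, the sketch below uses a hypothetical function named collect. It shows that the packed arguments arrive as a tuple and that the tuple can hold a mutable object.

```python
def collect(*args):
    # Extra positional arguments are packed into a tuple.
    return args


shopping = ["eggs"]
packed = collect("host", 80, shopping)

# The tuple holds a reference to the caller's list, not a copy,
# so mutating the list through the tuple is visible to the caller.
packed[2].append("milk")
```

If tuples refused mutable items, a call like this could not pass a list as an argument.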

First of all, the word "immutable" can mean many different things to different people. I particularly like how Eric Lippert categorized immutability in his blog post [archive 2012-03-12]. There, he lists these kinds of immutability:
Realio-trulio immutability
Write-once immutability
Popsicle immutability
Shallow vs deep immutability
Immutable facades
Observational immutability
These can be combined in various ways to make even more kinds of immutability, and I'm sure more exist. The kind of immutability you seem interested in is deep (also known as transitive) immutability, in which immutable objects can only contain other immutable objects.
The key point of this is that deep immutability is only one of many, many kinds of immutability. You can adopt whichever kind you prefer, as long as you are aware that your notion of "immutable" probably differs from someone else's notion of "immutable".

You cannot change the id of its items. So it will always contain the same items.
$ python
>>> t = (1, [2, 3])
>>> id(t[1])
12371368
>>> t[1].append(4)
>>> id(t[1])
12371368

I'll go out on a limb here and say that the relevant part here is that while you can change the contents of a list, or the state of an object, contained within a tuple, what you can't change is that the object or list is there. If you had something that depended on thing[3] being a list, even if empty, then I could see this being useful.

One reason is that there is no general way in Python to convert a mutable type into an immutable one (see the rejected PEP 351, and the linked discussion for why it was rejected). Thus, it would be impossible to put various types of objects in tuples if it had this restriction, including just about any user-created non-hashable object.
The only reason that dictionaries and sets have this restriction is that they require the objects to be hashable, since they are internally implemented as hash tables. But note that, ironically, dictionaries and sets themselves are not immutable (or hashable). Tuples do not use an object's hash, so its mutability does not matter.
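A brief sketch of that contrast, assuming a helper named try_build that exists only for this example: sets and dicts hash what you put in them and so reject unhashable objects, while a tuple accepts anything.

```python
item = [1, 2]  # a list is mutable and unhashable


def try_build(factory):
    """Return True if calling factory() succeeds, False on TypeError."""
    try:
        factory()
    except TypeError:
        return False
    return True


# A tuple never hashes its items when it is created.
as_tuple = (item,)

# Sets and dicts hash elements and keys on insertion.
set_ok = try_build(lambda: {item})
dict_ok = try_build(lambda: {item: "value"})

# Sets are themselves unhashable, so a set cannot hold a set,
# but the immutable frozenset works.
set_in_set_ok = try_build(lambda: {set(item)})
frozenset_in_set_ok = try_build(lambda: {frozenset(item)})
```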

A tuple is immutable in the sense that the tuple itself cannot grow or shrink; it does not mean that the items it contains are themselves immutable. Otherwise tuples would be dull.
