Examining the working of cmp function for dictionaries - python

Consider two dictionaries as follows:
d1={"Name":"John","Age":47}
d2={"Name":"Margaret","Age":35}
On executing the following statement:
>>>cmp(d1,d2)
1
That suggests that, since the keys are identical, it compares the values and gives priority to the value associated with the "Age" key (perhaps because it comes first lexicographically). This seems to be supported by what happens when I alter the dictionaries:
d1={"Name":"John","Age":47}
d2={"Name":"Jack","Age":47}
The statement returns 1, which I put down to the sum of the ASCII values being greater for d1.
But consider this pair of dictionaries:
d1={"Name":"John","Age":47}
d2={"Name":"Jzan","Age":47}
Now the statement returns -1.
Why is that? Is it that instead of comparing the sum of the ASCII values, it compares each character's value, one by one?
Also, if the keys themselves are different, on what basis does the function compare?

Most programming languages implement string comparison in dictionary (lexicographic) order, the way words are ordered in a dictionary: the characters are compared one by one, and the first position at which they differ decides the result.
If the keys themselves are different, the return value actually depends on the implementation. You can find more information here: Is there a description of how __cmp__ works for dict objects in Python 2?. However, it is not recommended to rely on this behaviour in your code.
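A quick way to see the character-by-character behaviour directly (Python 2 only, since cmp() was removed in Python 3) is to compare the name strings themselves; the characters at index 1 ('o' vs 'a', and 'o' vs 'z') decide the result, not any sum of character codes:
>>> cmp("John", "Jack")
1
>>> cmp("John", "Jzan")
-1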

Related

Time-complexity of checking if two frozensets are equal in Python

I couldn't find the details of this anywhere online: when comparing two frozensets, does Python iterate through the elements of one of the sets, or does it just compare the hash values of the frozensets, since frozensets are hashable?
Since the reference docs don't say anything about this, it's implementation-dependent, so there is no answer short of looking at the source code for the version of Python you're using (in your CPython distribution's Objects/setobject.c). Looking at the source for Python 3.7.0, the answer is "maybe" ;-)
Equality first checks whether the frozensets have the same size (len()). If not, they can't be equal, so False is returned at once.
Otherwise, if both hash codes have already been computed, they are compared, and False is returned at once if they aren't equal. Failing that, element-by-element code is invoked to check whether one is a subset of the other.
A hash code for a frozenset isn't computed just for the heck of it - that would be an expense that may not pay off. So something has to force it. The primary use case for frozensets at the start was to allow sets of sets, and in that context hash codes will be computed as a normal part of adding a frozenset to a containing set. The C-level set implementation contains a slot to record the hash if and when it's computed, which is initialized to -1 (a reserved value that means "no hash code known" internally).
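Put together, the equality logic described above looks roughly like this at the Python level (a sketch only; the real check lives in the C code of Objects/setobject.c, and the function name here is made up):
def frozensets_equal(a, b):
    if len(a) != len(b):      # different sizes: cannot be equal
        return False
    # (CPython would also compare the cached hash codes here, when both are known)
    return all(elem in b for elem in a)   # same size, so a subset check suffices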
hash(x) == hash(y) does not imply that x == y:
>>> help(hash)
hash(...)
hash(object) -> integer
Return a hash value for the object. Two objects with the same value have
the same hash value. The reverse is not necessarily true, but likely.
so to compare two frozenset values for equality, you still need to check that both sets have the same size, then check if every element in one is also in the other.
I leave it as an exercise for the reader with lots of spare time to find two different frozensets with the same hash value.
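For plain integers, by contrast, such a collision is easy to exhibit: in CPython, -1 and -2 happen to share a hash value (because -1 is reserved as an error marker at the C level), yet they are obviously not equal:
>>> hash(-1) == hash(-2)
True
>>> -1 == -2
False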

Using tuples in a linked list in python

My teacher wants us to recreate the dict class in Python using tuples and linked lists (for collisions). One of the methods is used to return a value given a key. I know how to do this with a tuple (find the key at location[0] and return location[1]), but I have no idea how I would do this in the case of a collision. Any suggestions? If more info is needed, please let me know.
It sounds like you have some sort of hash to get a shortlist of possibilities: you hash your key to a small-ish number, e.g. 0-255 (as an example, it might hash to 63). You can then go directly to your data at index 63. Because more than one key might hash to 63, your entry for 63 will contain a list of (key, value) pairs that you have to search one by one - effectively, you've reduced your search area by 255/256ths of the full list. Optionally, when the collisions for a particular slot exceed a threshold, you could repeat the process, so you get mydict[63][92], again reducing the problem size by the same factor. You could repeat this indefinitely.
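A minimal sketch of that idea, assuming a fixed-size table of buckets where each bucket is a plain Python list of (key, value) tuples standing in for the linked list the assignment asks for (the class and method names are just placeholders):
class TupleDict(object):
    def __init__(self, size=256):
        self.size = size
        self.buckets = [[] for _ in range(self.size)]   # one bucket per hash slot

    def put(self, key, value):
        bucket = self.buckets[hash(key) % self.size]
        for i, (k, v) in enumerate(bucket):
            if k == key:                  # key already present: overwrite its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))       # new key (possibly a collision): append

    def get(self, key):
        bucket = self.buckets[hash(key) % self.size]
        for k, v in bucket:               # walk only this bucket's (key, value) pairs
            if k == key:
                return v
        raise KeyError(key)
get() only has to scan the handful of pairs that landed in the same bucket, which is exactly the collision case the question asks about.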

Why does a set display in same order if sets are unordered?

I'm taking a first look at the python language from Python wikibook.
For sets the following is mentioned:
We can also have a loop move over each of the items in a set. However, since sets are unordered, it is undefined which order the iteration will follow.
and the code example given is :
s = set("blerg")
for letter in s:
print letter
Output:
r b e l g
When I run the program I get the results in the same order, no matter how many times I run it. If sets are unordered and the order of iteration is undefined, why does it return the set in the same order? And what is the basis of that order?
They are not randomly ordered; they are arbitrarily ordered. That means you should not count on the insertion order being maintained, because the internal implementation details determine the order instead.
The order depends on the insertion and deletion history of the set.
In CPython, sets use a hash table: inserted values are slotted into a sparse table based on the value returned by the hash() function, taken modulo the table size, together with a collision handling algorithm. Listing the set contents then returns the values in the order they sit in this table.
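A CPython-specific illustration of "the table determines the order": small integers hash to their own value, and a small set's table has 8 slots, so each item lands in slot value % 8 and iteration walks the slots in index order (this is an implementation detail, not something to rely on):
>>> list(set([3, 2, 1]))
[1, 2, 3]
>>> list(set([8, 3, 2, 1]))
[8, 1, 2, 3]
The 8 wraps around to slot 0 and is therefore listed first; neither result has anything to do with the order in which the items were inserted.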
If you want to go into the nitty-gritty technical details then look at Why is the order in dictionaries and sets arbitrary?; sets are, at their core, dictionaries where the keys are the set values and there are no associated dictionary values. The actual implementation is a little more complicated, as always, but that answer will suffice to get you most of the way there. Then look at the C source code for set for the rest of those details.
Compare this to lists, which do have a fixed order that you can influence; you can move items around in the list and the new ordering would be maintained for you.

Python Efficiency of the in statement

Just a quick question, I know that when looking up entries in a dictionary there's a fast efficient way of doing it:
(Assuming the dictionary is ordered in some way using collections.OrderedDict())
You start at the middle of the dictionary and check whether the desired key falls in one half or the other (or, in rare cases, is hit dead on), the way you would test the position of a name in an alphabetically ordered dictionary. You then repeat the process on the relevant half, and continue this pattern until the item is found (meaning that with a dictionary of 1,000,000 keys you could effectively find any key within about 20 iterations of this algorithm).
So I was wondering, if I were to use an in statement (i.e. if a in somedict:), would it use this same method of checking for the desired key? Does it use a faster/slower algorithm?
Nope. Python's dictionaries basically use a hash table (actually a modified hash table tuned for speed; I won't bother explaining hash tables here, since the linked Wikipedia article describes them well), which is a neat structure that allows ~O(1) (very fast) access. in looks up the object (the same thing that dict[object] does), except that it doesn't return the object's value - which is already the optimal way of doing it.
The C code behind in for dictionaries contains this line; dk_lookup() returns a hash table entry if it exists, and NULL otherwise (C's rough equivalent of None, often used to signal an error):
ep = (mp->ma_keys->dk_lookup)(mp, key, hash, &value_addr);
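A rough way to see the ~O(1) behaviour for yourself (a sketch using the standard library's timeit module; the variable names and sizes are arbitrary) is to time membership tests against a small and a large dictionary - the two timings come out roughly the same even though the second dict holds a thousand times as many keys, which is what constant-time lookup predicts:
import timeit

small = dict.fromkeys(range(1000))
big = dict.fromkeys(range(1000000))

print(timeit.timeit(lambda: 999 in small, number=100000))
print(timeit.timeit(lambda: 999999 in big, number=100000))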

Dictionary into dictionary in python

Ok, this one should be simple. I have 3 dictionaries. They are all made, ordered, and filled to my satisfaction but I would like to put them all in an overarching dictionary so I can reference and manipulate them more easily and efficiently.
Layer0 = {}
Layer1 = {}
Layer2 = {}
Here they are when created; afterwards I feebly tried different things based on SO questions:
Layers = {Layer0, Layer1, Layer2}
which raised a syntax error
Layers = {'Layer0', 'Layer1', 'Layer2'}
which raised another syntax error
(Layers is the Dictionary I'm trying to create that will have all the previously made dictionaries within it)
All the other examples I found on SO have been related to creating dictionaries within dictionaries in order to fill them (or filling them simultaneously) and since I already coded a large number of lines to make these dictionaries, I'd rather put them into a dictionary after the fact instead of re-writing code.
It would be best if the order of the dictionaries is preserved when they are put into Layers.
Does anyone know if this is possible and how I should do it?
Dictionary items have both a key and a value.
Layers = {'Layer0': Layer0, 'Layer1': Layer1, 'Layer2': Layer2}
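You can then reach each inner dictionary (and whatever it contains) through the outer key; for example, using a made-up 'width' key purely for illustration:
>>> Layers['Layer0']['width'] = 100
>>> Layers['Layer0']
{'width': 100}
>>> Layers['Layer0'] is Layer0
True
The outer dictionary stores references to the same dictionary objects, so changes made through Layers are visible through Layer0 and vice versa.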
Keep in mind that dictionaries don't have an order, since a dictionary is a hash table (keys are stored in slots determined by their hash values, not in the order you inserted them). Using .keys() or .values() generates a list, which does have an order, but the dictionary itself doesn't.
So when you say "It would be best if the order of the dictionaries is preserved when put into Layers" - this doesn't really mean anything for a plain dict. For example, if you rename your dictionaries from "Layer0, Layer1, Layer2" to "A, B, C", you may well see Layers.keys() print in the order "A, C, B". This is true regardless of the order you used when building the dictionary. All it shows is that "C" happened to land in an earlier slot of the hash table than "B"; it tells you nothing about the structure of your dictionary.
This is also why you can't iterate over a dictionary in any guaranteed order or index it by position (if you need a particular order, iterate over e.g. a sorted list of the keys).
As a side note, this hash function is what allows a dictionary to do crazy fast lookups. A good hash function will give you constant time [O(1)] lookup, meaning you can check if a given item is in your dictionary in the same amount of time whether the dictionary contains ten items or ten million. Pretty cool.
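If the insertion order of the three layers really does matter, one option (a sketch assuming Python 2.7+, where collections.OrderedDict is available) is to use an ordered mapping as the outer container, since a plain dict here won't preserve it:
>>> from collections import OrderedDict
>>> Layers = OrderedDict([('Layer0', Layer0), ('Layer1', Layer1), ('Layer2', Layer2)])
>>> Layers.keys()
['Layer0', 'Layer1', 'Layer2']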
