Strange Python set and hash behaviour - how does this work?


I have a class called GraphEdge which I would like to be uniquely defined within a set (the built-in set type) by its tail and head members, which are set via __init__.
If I do not define __hash__, I see the following behaviour:
>>> E = GraphEdge('A', 'B')
>>> H = GraphEdge('A', 'B')
>>> hash(E)
139731804758160
>>> hash(H)
139731804760784
>>> S = set()
>>> S.add(E)
>>> S.add(H)
>>> S
set([('A', 'B'), ('A', 'B')])
The set has no way to know that E and H are the same by my definition, since they have differing hashes (which is what the set type uses to determine uniqueness, to my knowledge), so it adds both as distinct elements. So I define a rather naive hash function for GraphEdge like so:
def __hash__(self):
    return hash(self.tail) ^ hash(self.head)
Now the above works as expected:
>>> E = GraphEdge('A', 'B')
>>> H = GraphEdge('A', 'B')
>>> hash(E)
409150083
>>> hash(H)
409150083
>>> S = set()
>>> S.add(E)
>>> S.add(H)
>>> S
set([('A', 'B')])
But clearly, ('A', 'B') and ('B', 'A') in this case will return the same hash, so I would expect that I could not add ('B', 'A') to a set already containing ('A', 'B'). But this is not what happens:
>>> E = GraphEdge('A', 'B')
>>> H = GraphEdge('B', 'A')
>>> hash(E)
409150083
>>> hash(H)
409150083
>>> S = set()
>>> S.add(E)
>>> S.add(H)
>>> S
set([('A', 'B'), ('B', 'A')])
So is the set type using the hash or not? If so, how is the last scenario possible? If not, why doesn't the first scenario (no __hash__ defined) work? Am I missing something?
Edit: For reference for future readers, I already had __eq__ defined (also based on tail and head).

You have a hash collision. On a hash collision, the set uses the == operator (your __eq__) to check whether the objects are truly equal to each other.

It's important to understand how hash and == work together, because both are used by sets. For two values x and y, the important rule is that:
x == y ==> hash(x) == hash(y)
(x equals y implies that x and y's hashes are equal). But, the inverse is not true: two unequal values can have the same hash.
Sets (and dicts) will use the hash to get an approximation to equality, but will use the real equality operation to figure out if two values are the same or not.
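A minimal sketch of that rule, using a hypothetical Collider class whose instances all share one hash on purpose. The set keeps unequal objects even when their hashes collide, and only drops an object that is both hash-equal and ==-equal to an existing member.

```python
class Collider:
    """Hypothetical class: every instance has the same hash; equality is by `name`."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 42  # deliberately constant, so every pair of instances collides

    def __eq__(self, other):
        return isinstance(other, Collider) and self.name == other.name


s = set()
s.add(Collider('x'))
s.add(Collider('y'))  # same hash, but == is False, so it is kept
s.add(Collider('x'))  # same hash and == is True, so it is treated as a duplicate
print(len(s))  # 2
```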

You should always define both __eq__() and __hash__() if you need at least one of them. If the hashes of two objects are equal, an extra __eq__() check is done to verify uniqueness.
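A sketch of a GraphEdge that follows this advice, assuming the tail and head attributes from the question and directed edges (so ('A', 'B') and ('B', 'A') are different edges). The _key and __repr__ helpers are illustrative additions, not the asker's code. Hashing the tuple (tail, head) is a common idiom; unlike the XOR hash, it is order-sensitive, so reversed edges usually get different hashes too, and __eq__ settles any remaining collision.

```python
class GraphEdge:
    """Sketch of a directed edge identified by (tail, head)."""

    def __init__(self, tail, head):
        self.tail = tail
        self.head = head

    def _key(self):
        return (self.tail, self.head)

    def __eq__(self, other):
        if not isinstance(other, GraphEdge):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Hashing the same tuple that __eq__ compares keeps the two consistent:
        # equal edges always produce equal hashes.
        return hash(self._key())

    def __repr__(self):
        return repr(self._key())


S = {GraphEdge('A', 'B'), GraphEdge('A', 'B'), GraphEdge('B', 'A')}
print(len(S))  # 2
```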

Related

Why can't you use a memory address of an unhashable type as a key for a dict?

I understand that since unhashable types like lists are mutable, they cannot be used as keys for hashing. However, I don't see why their memory address (which I don't believe changes) can't be used instead.
For example:
my_list = [1,2,3]
my_dict = {my_list: 1}  # TypeError: unhashable type: 'list'
my_dict = {id(my_list): 1}  # no error

You actually can use the memory address of an object as a hash function if you extend list, set, etc.
The primary reason using a memory address for a hash is bad is because if two objects are equal (a == b evaluates to True) we also want their hashes to be equal (hash(a) == hash(b) to be True). Otherwise, we could get unintended behavior.
To see an example of this, let's create our own class that extends list and use the memory address of the object as a hash function.
>>> class HashableList(list):
...     def __hash__(self):
...         return id(self)  # in CPython, id() is the object's memory address
...
Now we can create two hashable lists! Our HashableList uses the same constructor as Python's built-in list.
>>> a = HashableList((1, 2, 3))
>>> b = HashableList((1, 2, 3))
Sure enough, as we would expect, we get
>>> a == b
True
And we can hash our lists!
>>> hash(a)
1728723187976
>>> hash(b)
1728723187816
>>> hash(a) == hash(b)
False
If you look at the last 3 digits, you'll see a and b are close to each other in memory, but aren't in the same location. Since we're using the memory address as our hash, that also means their hashes aren't equal.
What happens if we compare the built-in hashes of two equal tuples (or any other hashable objects)?
>>> y = ('foo', 'bar')
>>> z = ('foo', 'bar')
>>> y == z
True
>>> hash(y)
-1256824942587948134
>>> hash(z)
-1256824942587948134
>>> hash(y) == hash(z)
True
If you try this on your own, your hash of ('foo', 'bar') won't match mine, since string hashes are randomized each time a new Python session starts (unless PYTHONHASHSEED is set). The important thing is that, within the same session, hash(y) will always equal hash(z).
Let's see what happens if we make a set, and play around with the HashableList objects and the tuples we made.
>>> s = set()
>>> s.add(a)
>>> s.add(y)
>>> s
{[1, 2, 3], ('foo', 'bar')}
>>> a in s # Since hash(a) == hash(a), we can find a in our set
True
>>> y in s # Since hash(y) == hash(y), we can find y in our set
True
>>> b in s
False
>>> z in s
True
Even though a == b, we couldn't find b in the set: hash(b) doesn't equal hash(a), so the set never got as far as comparing them with ==, and our equivalent list went unfound!
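To contrast identity-based and value-based keys for the original list example, here is a small sketch. Converting the list to a tuple is a common idiom, not something the answer above proposes; it makes equal contents map to the same key.

```python
a = [1, 2, 3]
b = [1, 2, 3]  # equal to a, but a different object

# Identity-based key: while both lists are alive, id(a) != id(b).
by_id = {id(a): 'found'}
print(id(b) in by_id)  # False

# Value-based key: tuple() makes an immutable, hashable copy of the contents.
by_value = {tuple(a): 'found'}
print(tuple(b) in by_value)  # True
```

Note that the tuple is a snapshot: mutating a afterwards does not update the key. Also, an id is only unique among objects alive at the same time; once an object is garbage-collected its id can be reused, which is another reason raw ids make fragile keys.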

(python) How do I obtain specific entries from a dictionary (using keys) as I do with an array?

With an array (list) x=['A','B','C'], I can obtain several elements from it by slicing: e.g. print(x[0:2]) yields ['A','B'].
Now for a similar (ordered) dictionary x={1:'A', 2:'B', 3:'C'}, how would I obtain 'A' and 'B' in the same way, by referencing the keys 1 and 2? Trying a method similar to the array above gives me an error:
TypeError: unhashable type: 'slice'
Note that the keys tied to the entries are important, so converting the dictionary into a list won't help.
Also, I plan on doing this to a lot of entries (>100), so calling each individual one won't be useful. My real program will involve numbered keys starting from 100 and calling keys 200 to 300, for example.

The way to retrieve a value from a dictionary is dict_name[key]:
print(x[1], x[2])
>> A B
Note that if the key doesn't exist, this will raise a KeyError.
A way around it is to use get(key, default_value):
print(x[9])
>> KeyError: 9
print(x.get(9, None))
>> None
You can use a for loop to check multiple keys, skipping the ones that are missing:
for potential_key in range(10):
    if potential_key in x:
        print(x[potential_key])

You can use operator.itemgetter:
>>> from operator import itemgetter
>>> x = {1:'A', 2:'B', 3:'C'}
>>> itemgetter(1, 2)(x)
('A', 'B')
>>> get_1_2 = itemgetter(1, 2)  # Alternative: save the getter function
>>> get_1_2(x) # call it later
('A', 'B')

You can map x.get over an iterable of the keys you want, such as range() (whose syntax and results are similar to slicing):
>>> x={1:'A', 2:'B', 3:'C'}
>>> print(*map(x.get, range(1,3)))
A B
Or a generator expression instead of map():
>>> x={1:'A', 2:'B', 3:'C'}
>>> print(*(x.get(item) for item in range(1,3)))
A B
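Since the asker needs to keep the keys, a dict comprehension over a key range is another option. This sketch assumes integer keys as described in the question (100 and up, selecting 200 to 300); the data itself is made up for illustration.

```python
# Hypothetical data: keys 100..399, as in the question's description.
x = {k: 'item %d' % k for k in range(100, 400)}

# Keep the keys: build a new dict from the wanted key range.
# `if k in x` skips keys that happen to be missing instead of raising KeyError.
subset = {k: x[k] for k in range(200, 301) if k in x}
print(len(subset), subset[200], subset[300])  # 101 item 200 item 300
```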

>>> list(x.values())[0:2]
This gives the output: ['A', 'B']
Since you mentioned an 'ordered' dictionary, this could be a possible solution, but it selects entries by position rather than by key (plain dicts preserve insertion order as of Python 3.7).
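If slicing by position is acceptable, itertools.islice can take the first few (key, value) pairs without copying all values into a list first, and dict() turns them back into a dict so the keys are kept. This is a sketch relying on insertion order, as in the answer above.

```python
from itertools import islice

x = {1: 'A', 2: 'B', 3: 'C'}

# islice(iterable, start, stop) works on any iterable, including dict views.
first_two = dict(islice(x.items(), 0, 2))
print(first_two)  # {1: 'A', 2: 'B'}
```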

Replace one python object with another everywhere

How do I replace a python object everywhere with another object?
I have two classes, SimpleObject and FancyObject. I've created a SimpleObject, and have several references to it. Now I'd like to create a FancyObject, and make all those references point to the new object.
a = SimpleObject()
some_list.append(a)
b = FancyObject()
a = b is not what I want, it just changes what a points to. I read the following would work, but doesn't. I get an error "Attribute __dict__ is not writable":
a.__dict__ = b.__dict__
What I want is the equivalent of (pseudo-C):
*a = *b
I know this is hacky, but is there any way to accomplish this?

There's no way. It'd let you mutate immutable objects and cause all sorts of nastiness.
x = 1
y = (x,)
z = {x: 3}
magic_replace(x, [1])  # hypothetical function; nothing like it exists in Python
# x is now a list!
# The contents of y have changed, and z now has an unhashable key.
x = 1 + 1
# Is x 2, or [1, 1], or something stranger?

You can put that object in the global namespace of a separate module and then monkey-patch it when needed.
objstore.py:
replaceable = object()
sample.py:
import objstore
b = object()
def isB():
    return objstore.replaceable is b
if __name__ == '__main__':
    print(isB())  # False
    objstore.replaceable = b
    print(isB())  # True
P.S. Relying on monkey patching is a symptom of bad design.

PyJack has a function replace_all_refs that replaces all references to an object in memory.
An example from the docs:
>>> item = (100, 'one hundred')
>>> data = {item: True, 'itemdata': item}
>>>
>>> class Foobar(object):
...     the_item = item
...
>>> def outer(datum):
...     def inner():
...         return ("Here is the datum:", datum,)
...
...     return inner
...
>>> inner = outer(item)
>>>
>>> print item
(100, 'one hundred')
>>> print data
{'itemdata': (100, 'one hundred'), (100, 'one hundred'): True}
>>> print Foobar.the_item
(100, 'one hundred')
>>> print inner()
('Here is the datum:', (100, 'one hundred'))
Calling replace_all_refs:
>>> import pyjack
>>> new = (101, 'one hundred and one')
>>> org_item = pyjack.replace_all_refs(item, new)
>>>
>>> print item
(101, 'one hundred and one')
>>> print data
{'itemdata': (101, 'one hundred and one'), (101, 'one hundred and one'): True}
>>> print Foobar.the_item
(101, 'one hundred and one')
>>> print inner()
('Here is the datum:', (101, 'one hundred and one'))

You have a number of options:
1. Design this in from the beginning, using the Facade pattern (i.e. every object in your main code is a proxy for something else), or a single mutable container (i.e. every variable holds a list; you can change the contents of the list through any such reference). The advantages are that it works with the execution machinery of the language and is relatively easily discoverable from the affected code. The downside: more code.
2. Always refer to the same single variable. This is one implementation of the above. Clean, nothing fancy, clear in code. I would recommend this by far.
3. Use the debug, gc, and introspection features to hunt down every object meeting your criterion and alter the variables while running. The disadvantage is that the values of variables will change during code execution without this being discoverable from the affected code. Even if the change is atomic (eliminating one class of errors), it can change the type of a variable after code has already determined that the value was of a different type, which introduces errors that cannot reasonably be anticipated in that code. For example:
    a = iter(b)  # raises TypeError if b is not iterable, so past this line b is known to be iterable
    [x for x in b]  # but if b was swapped for an int between the two lines, this now raises TypeError
   More subtly, code that flattens a nested structure usually has to tell strings apart from other sequences (iterating a string yields strings, which are themselves iterable); such code may also break.
Another answer mentions PyJack, which implements option 3. Although it may work, it has all of the problems mentioned above, so it is likely appropriate only for debugging and development.
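To make option 3 concrete, here is a deliberately partial sketch using the standard gc.get_referrers to find containers that hold the old object. The function name replace_in_containers is hypothetical, and it only patches list slots and dict values; tuples, sets, dict keys, closures and local variables of running functions are left alone, which illustrates why this approach is fragile and best kept to debugging.

```python
import gc


def replace_in_containers(old, new):
    """Rebind list slots and dict values that hold `old` so they hold `new`.

    Debugging-only sketch: tuples, sets, dict keys, closures and the local
    variables of running functions are NOT updated.
    """
    replaced = 0
    for referrer in gc.get_referrers(old):
        if isinstance(referrer, list):
            for i, item in enumerate(referrer):
                if item is old:
                    referrer[i] = new
                    replaced += 1
        elif isinstance(referrer, dict):
            for key, value in list(referrer.items()):
                if value is old:
                    referrer[key] = new
                    replaced += 1
    return replaced


class SimpleObject:
    pass


class FancyObject:
    pass


def demo():
    a = SimpleObject()
    some_list = [a]
    registry = {'current': a}
    replace_in_containers(a, FancyObject())
    # The containers now hold the FancyObject, but the local name `a` does not.
    return a, some_list, registry


a, some_list, registry = demo()
print(type(a).__name__, type(some_list[0]).__name__, type(registry['current']).__name__)
# SimpleObject FancyObject FancyObject
```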

Take advantage of mutable objects such as a list.
a = [SimpleObject()]
some_list.append(a)
b = FancyObject()
a[0] = b
Proof that this works:
class SimpleObject():
    def Who(self):
        print('SimpleObject')
class FancyObject():
    def Who(self):
        print('FancyObject')
>>> a = [SimpleObject()]
>>> a[0].Who()
SimpleObject
>>> some_list = []
>>> some_list.append(a)
>>> some_list[0][0].Who()
SimpleObject
>>> b = FancyObject()
>>> b.Who()
FancyObject
>>> a[0] = b
>>> some_list[0][0].Who()
FancyObject
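The Facade/proxy variant of option 1 from the earlier answer can be sketched in a similar spirit: callers hold a wrapper instead of a list. The Proxy class and its swap method are hypothetical names; __getattr__ is the standard hook that Python calls only when normal attribute lookup fails.

```python
class Proxy:
    """Hypothetical forwarding wrapper: attribute lookups go to the current target."""

    def __init__(self, target):
        self._target = target

    def swap(self, new_target):
        # Every holder of this Proxy sees the new target from now on.
        self._target = new_target

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for names not defined on Proxy.
        return getattr(self._target, name)


class SimpleObject:
    def who(self):
        return 'SimpleObject'


class FancyObject:
    def who(self):
        return 'FancyObject'


a = Proxy(SimpleObject())
some_list = [a]
print(some_list[0].who())  # SimpleObject
a.swap(FancyObject())
print(some_list[0].who())  # FancyObject
```

One design caveat: special methods such as __len__ or __eq__ are looked up on the type, not via __getattr__, so a proxy like this only forwards ordinary attribute and method access.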

How to reference ints by name in Python

I want to have a reference that reads as "whatever variable of name 'x' is pointing to" with ints, so that it behaves as:
>>> a = 1
>>> b = 2
>>> c = (a, b)
>>> c
(1, 2)
>>> a = 3
>>> c
(3, 2)
I know I could do something similar with lists by doing:
>>> a = [1]
>>> b = [2]
>>> c = (a, b)
>>> c
([1], [2])
>>> a[0] = 3
>>> c
([3], [2])
but this can be easily lost if one assigns a or b to something instead of their elements.
Is there a simple way to do this?

No, there isn't a direct way to do this in Python. The reason is that both scalar values (numbers) and tuples are immutable. Once you have established a binding from a name to an immutable value (such as the name c with the tuple (1, 2)), nothing you do except reassigning c can change the value it's bound to.
Note that in your second example, although the tuple is itself immutable, it contains references to mutable values. So it appears as though the tuple changes, but the identity of the tuple remains constant and only the mutable parts are changing.

Whatever solution you come up with, the second-to-last line will always destroy it:
a = 3
This will assign a completely new value to the variable. Unless a stands for an attribute of an object or something similar (or an element of a list, as in your own example), you won't be able to keep a relation between the first and the last a.

If you just need the current values to be placed in a tuple on the fly you could use a lambda. You'll have to call c, not just return it or use it, but that may be acceptable in your situation. Something like this:
>>> a = 1
>>> b = 2
>>> c = lambda: (a, b)
>>> c()
(1, 2)
>>> a = 3
>>> c()
(3, 2)

There isn't a way in Python, not only because numbers are immutable, but also because you don't have pointers. Wrapping the value in a list simulates that you have pointers, so that's the best you can do.

class ByRefValue(object):
    def __init__(self, value):
        self.value = value
Pass it around wherever you like, remembering that you need to access the value member rather than the entire object.
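A short usage sketch of ByRefValue applied to the question's example: the tuple c holds the holder objects, so mutating a holder's value attribute is visible through c, while rebinding a itself would not be.

```python
class ByRefValue(object):
    def __init__(self, value):
        self.value = value


a = ByRefValue(1)
b = ByRefValue(2)
c = (a, b)

a.value = 3  # mutate the holder; rebinding `a = 3` would break the link again
print(tuple(ref.value for ref in c))  # (3, 2)
```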
Alternatively, globals().get('a', 0) will return the value of a if it is in the global namespace (or zero if it isn't).
Finally:
import threading
tls = threading.local()
tls.a = 1
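If you import tls into every module where you need it, every module will see the same tls.a within a given thread; each thread, however, gets its own separate copy, so other threads will not see the a = 1 set here. Depending on how your program is set up, this may be acceptable, ideal or useless.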

You can try creating your own pointer class and your own pointer storage object to emulate the system's internal stack.

Python: return the index of the first element of a list which makes a passed function true

The list.index(x) function returns the index in the list of the first item whose value is x.
Is there a function, list_func_index(), similar to index(), that takes a function f() as a parameter? f() would be run on every element e of the list until f(e) returns True, and list_func_index() would then return the index of e.
Codewise:
>>> def list_func_index(lst, func):
...     for i in range(len(lst)):
...         if func(lst[i]):
...             return i
...     raise ValueError('no element making func True')
...
>>> l = [8,10,4,5,7]
>>> def is_odd(x): return x % 2 != 0
>>> list_func_index(l,is_odd)
3
Is there a more elegant solution? (and a better name for the function)

You could do that in a one-liner using generators:
next(i for i,v in enumerate(l) if is_odd(v))
The nice thing about generators is that they only compute up to the requested amount. So requesting the first two indices is (almost) just as easy:
y = (i for i,v in enumerate(l) if is_odd(v))
x1 = next(y)
x2 = next(y)
Though, expect a StopIteration exception after the last index (that is how generators work). This is also convenient in your "take-first" approach, since it tells you that no such value was found; list.index() would raise ValueError here.
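If an exception is not wanted at all, the built-in next() also accepts a default as its second argument, which gives the "return None when nothing matches" behaviour without a helper function. A small sketch using the question's list and predicate:

```python
l = [8, 10, 4, 5, 7]


def is_odd(x):
    return x % 2 != 0


# next() returns the default instead of raising StopIteration when nothing matches.
first = next((i for i, v in enumerate(l) if is_odd(v)), None)
missing = next((i for i, v in enumerate([2, 4, 6]) if is_odd(v)), None)
print(first, missing)  # 3 None
```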

One possibility is the built-in enumerate function:
def index_of_first(lst, pred):
    for i, v in enumerate(lst):
        if pred(v):
            return i
    return None
It's typical to refer to a function like the one you describe as a "predicate"; it returns true or false for some question. That's why I call it pred in my example.
I also think it would be better form to return None, since that's the real answer to the question. The caller can choose to explode on None, if required.

@Paul's accepted answer is best, but here's a little lateral-thinking variant, mostly for amusement and instruction purposes:
>>> class X(object):
...     def __init__(self, pred): self.pred = pred
...     def __eq__(self, other): return self.pred(other)
...
>>> l = [8,10,4,5,7]
>>> def is_odd(x): return x % 2 != 0
...
>>> l.index(X(is_odd))
3
Essentially, X's purpose is to change the meaning of "equality" from the normal one to "satisfies this predicate", thereby allowing the use of predicates in all kinds of situations that are defined as checking for equality -- for example, it would also let you code, instead of if any(is_odd(x) for x in l):, the shorter if X(is_odd) in l:, and so forth.
Worth using? Not when a more explicit approach like that taken by @Paul is just as handy (especially when changed to use the new, shiny built-in next function rather than the older, less appropriate .next method, as I suggest in a comment to that answer), but there are other situations where it (or other variants of the idea "tweak the meaning of equality", and maybe other comparators and/or hashing) may be appropriate. Mostly, worth knowing about the idea, to avoid having to invent it from scratch one day;-).
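A quick sketch checking the if X(is_odd) in l: idea against the explicit any() form. It works because int.__eq__ returns NotImplemented when compared with an X instance, so Python falls back to X.__eq__. In Python 3, X is unhashable (it defines __eq__ without __hash__), so this trick applies to lists and tuples, not to sets or dict keys.

```python
class X(object):
    def __init__(self, pred):
        self.pred = pred

    def __eq__(self, other):
        # Reached via the reflected comparison when a list element is compared with X.
        return self.pred(other)


def is_odd(x):
    return x % 2 != 0


l = [8, 10, 4, 5, 7]
print(X(is_odd) in l)             # True
print(any(is_odd(v) for v in l))  # True, the explicit equivalent
print(X(is_odd) in [2, 4, 6])     # False
```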

Not one single function, but you can do it pretty easily in Python 2, where map returns a list:
>>> test = lambda c: c == 'x'
>>> data = ['a', 'b', 'c', 'x', 'y', 'z', 'x']
>>> map(test, data).index(True)
3
>>>
If you don't want to evaluate the entire list at once you can use itertools, but it's not as pretty:
>>> from itertools import imap, ifilter
>>> from operator import itemgetter
>>> test = lambda c: c == 'x'
>>> data = ['a', 'b', 'c', 'x', 'y', 'z']
>>> ifilter(itemgetter(1), enumerate(imap(test, data))).next()[0]
3
>>>
Just using a generator expression is probably more readable than itertools though.
Note that in Python 3, map and filter return lazy iterators, so you can just use:
from operator import itemgetter
test = lambda c: c == 'x'
data = ['a', 'b', 'c', 'x', 'y', 'z']
next(filter(itemgetter(1), enumerate(map(test, data))))[0] # 3

A variation on Alex's answer. This avoids having to type X every time you want to use is_odd or any other predicate:
>>> class X(object):
...     def __init__(self, pred): self.pred = pred
...     def __eq__(self, other): return self.pred(other)
...
>>> L = [8,10,4,5,7]
>>> is_odd = X(lambda x: x%2 != 0)
>>> L.index(is_odd)
3
>>> less_than_six = X(lambda x: x<6)
>>> L.index(less_than_six)
2

You could do this with a list comprehension:
l = [8,10,4,5,7]
filterl = [a for a in l if a % 2 != 0]
Then filterl will contain all members of the list satisfying the expression a % 2 != 0. I would say this is a more elegant method...

Intuitive one-liner solution:
i = list(map(lambda value: value > 0, data)).index(True)
Explanation:
We use map to produce True or False for each element of the list, depending on whether it passes the condition in the lambda.
Then we convert the map output to a list.
Finally, index(True) returns the index of the first True, which is the index of the first value passing the condition (it raises ValueError if no value passes).
