Following this question, we know that two different dictionaries, dict_1 and dict_2 for example, use the exact same hash function.
Is there any way to alter the hash function used by a dictionary? Negative answers are also accepted!
You can't change the hash function: the dict will call hash() on the keys it's supposed to insert, and that's that.
However, you can wrap the keys to provide different __hash__ and __eq__ methods.
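class MyHash(object):
    """Wraps a key so the dict sees a different hash for it."""
    def __init__(self, v):
        self._v = v
    def __hash__(self):
        # Negate the built-in hash; equal wrappers still get equal hashes.
        return hash(self._v) * -1
    def __eq__(self, other):
        # Only compare against other wrappers.
        if not isinstance(other, MyHash):
            return NotImplemented
        return self._v == other._v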
I doubt this actually helps with your original problem, though; a custom array/list-based data structure might be the better answer. Or not.
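A minimal sketch of how such a wrapper is used as dictionary keys (the sample keys and values are made up). Lookups must wrap the key too: a bare 5 is a different key, because its hash differs and this version of __eq__ only compares against other wrappers.

```python
class MyHash(object):
    """Wraps a key so the dict sees a different hash for it."""
    def __init__(self, v):
        self._v = v
    def __hash__(self):
        # Negate the built-in hash; equal wrappers still get equal hashes.
        return hash(self._v) * -1
    def __eq__(self, other):
        if not isinstance(other, MyHash):
            return NotImplemented
        return self._v == other._v

d = {}
d[MyHash(5)] = "five"
d[MyHash("abc")] = "letters"
print(d[MyHash(5)])             # five
print(hash(MyHash(5)))          # -5 (small ints hash to themselves in CPython)
print(MyHash(5) in d, 5 in d)   # True False
```

The dict itself still calls hash(); the wrapper only changes what that call returns.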
Here is a "hash table" on top of a list of lists, where each hash table object is associated with a particular hashing function.
class HashTable(object):
    def __init__(self, hash_function, size=256):
        self.hash_function = hash_function
        self.buckets = [list() for i in range(size)]
        self.size = size

    def __getitem__(self, key):
        # Only the bucket chosen by this table's hash function is searched.
        hash_value = self.hash_function(key) % self.size
        bucket = self.buckets[hash_value]
        for stored_key, stored_value in bucket:
            if stored_key == key:
                return stored_value
        raise KeyError(key)

    def __setitem__(self, key, value):
        hash_value = self.hash_function(key) % self.size
        bucket = self.buckets[hash_value]
        # Replace the pair if the key is already present, otherwise append it.
        for i, (stored_key, _) in enumerate(bucket):
            if stored_key == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
The rest of your application can still see the underlying list of buckets. Your application might require additional metadata to be associated with each bucket, but that would be as simple as defining a new class for the elements of the bucket list instead of a plain list.
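To show that two tables really can use different hash functions, here is a self-contained sketch with the class above plus a deliberately weak, hypothetical char_sum hash (anagrams collide). The collision ends up in one bucket, and lookups still work because each bucket is searched by key equality.

```python
class HashTable(object):
    """Separate-chaining table whose bucket index comes from hash_function."""
    def __init__(self, hash_function, size=256):
        self.hash_function = hash_function
        self.buckets = [[] for _ in range(size)]
        self.size = size

    def __getitem__(self, key):
        bucket = self.buckets[self.hash_function(key) % self.size]
        for stored_key, stored_value in bucket:
            if stored_key == key:
                return stored_value
        raise KeyError(key)

    def __setitem__(self, key, value):
        bucket = self.buckets[self.hash_function(key) % self.size]
        for i, (stored_key, _) in enumerate(bucket):
            if stored_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

def char_sum(key):
    # Hypothetical weak hash: anagrams such as "abc" and "cba" collide.
    return sum(ord(c) for c in str(key))

t1 = HashTable(hash)               # built-in hash
t2 = HashTable(char_sum, size=16)  # custom hash
for t in (t1, t2):
    t["abc"] = 1
    t["cba"] = 2
    t["abc"] = 3
print(t2["abc"], t2["cba"])                 # 3 2
print(t2.buckets[char_sum("abc") % 16])     # [('abc', 3), ('cba', 2)]
```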
I think what you want is a way to create buckets. Based on this, I recommend collections.defaultdict with set as the factory for each "bucket" (though it depends on what you're using it for).
Here is a sample:
#!/usr/bin/env python
from collections import defaultdict
from itertools import combinations

d = defaultdict(set)
strs = ["str", "abc", "rts"]
for s in strs:
    # File each string under its own hash and under the hash of its reverse.
    d[hash(s)].add(s)
    d[hash(''.join(reversed(s)))].add(s)
for combination in combinations(d.values(), r=2):
    matches = combination[0] & combination[1]
    if len(matches) > 1:
        print(matches)
# output: {'str', 'rts'} (set order may vary)
Two strings that end up in the same buckets here are very likely matches. I created the hash collision on purpose: each string is filed under its own hash and under the hash of its reverse.
Note that the set intersection uses full equality comparison, but it should still be very fast.
Don't hash too many values without draining the sets.
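The bucket key does not have to come from Python's hash(); with defaultdict(set) you can file values under any key you compute. A minimal sketch, assuming the same goal as the sample above (grouping a string with its reverse); reversal_key is a hypothetical helper, not part of any library.

```python
from collections import defaultdict

def reversal_key(s):
    # Hypothetical bucket key: a string and its reverse map to the same key.
    return min(s, s[::-1])

buckets = defaultdict(set)
for s in ["str", "abc", "rts"]:
    buckets[reversal_key(s)].add(s)

# Buckets holding more than one string are the matches.
groups = [b for b in buckets.values() if len(b) > 1]
print(groups)  # [{'str', 'rts'}] (set order may vary)
```

Because the key is deterministic, this avoids relying on hash() values and needs no pairwise comparison of buckets.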
I solved a simple LeetCode problem (find two numbers in a list that sum to a target number) in Python. When I changed just one line, the one that looks up hashMap, my code stopped working because of a KeyError. I don't understand why the 'correct' way doesn't raise a KeyError.
The way that works
def twoSum(self, nums, target):
    """
    :type nums: List[int]
    :type target: int
    :rtype: List[int]
    """
    hashMap = {}
    for ind, val in enumerate(nums):
        busq = target - val
        if busq in hashMap:  # This line is the only thing that changes
            return [hashMap[busq], ind]
        hashMap[val] = ind
This doesn't work
def twoSum(self, nums, target):
    hashMap = {}
    for ind, val in enumerate(nums):
        busq = target - val
        if(hashMap[busq]):  # This doesn't work
            return [hashMap[busq], ind]
        hashMap[val] = ind
You can't read a nonexistent key from a Python dictionary with square-bracket notation like hashMap[busq]; that raises KeyError. Instead, check that the key exists with the in operator before accessing it, as in the first version.
In Python 2 you could also write if hashMap.has_key(busq):, but dict.has_key() was removed in Python 3, so use busq in hashMap there.
Maybe use something like this instead, as the body of the for loop after busq is computed:
try:
    return [hashMap[busq], ind]
except KeyError:
    hashMap[val] = ind
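When the key is not in the dict, hashMap[busq] does not return False; it raises KeyError, so the if never gets to evaluate anything. (Even for a key that is present, testing the value's truthiness would be wrong here, because a stored index of 0 is falsy.)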
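A third option is dict.get(), which returns None for a missing key instead of raising KeyError. A minimal sketch; two_sum is a standalone rewrite of the method above, not LeetCode's required signature. The explicit is not None test matters: with a plain if seen.get(...), a match stored at index 0 would be treated as missing.

```python
def two_sum(nums, target):
    seen = {}  # value -> index where it was seen
    for ind, val in enumerate(nums):
        # dict.get returns None instead of raising KeyError for a missing key.
        match = seen.get(target - val)
        # Compare with None explicitly: index 0 is a valid but falsy result.
        if match is not None:
            return [match, ind]
        seen[val] = ind
    return None  # no pair sums to target

print(two_sum([2, 7, 11, 15], 9))  # [0, 1]
```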
I'm having a hard time finding the differences between two lists in Python.
CMDB list:
ABC:NL1:SB6
ABC:NL2:SB6
ABC:NL3:SB6
ABC:NL4:SB6
NL9:SB9
NL5:SB4
NL6:SB7
DB list:
NL1:SB6
NL2:SB6
ABC:NL3:SB6
ABC:NL4:SB6
ABC:NL8:SB8
ABC:NL5:SB4
ABC:NL6:SB7
I would like output that shows only the differences:
NL9:SB9
ABC:NL8:SB8
I have tried
cmdb_fin = set(cmdb)
db_fin = set(db)
equal = db_fin.symmetric_difference(cmdb_fin)
but the output looks like the following, because it compares the exact strings to each other rather than matching them as "patterns":
ABC:NL5:SB4
NL6:SB7
ABC:NL2:SB6
NL2:SB6
ABC:NL8:SB8
NL5:SB4
ABC:NL6:SB7
NL9:SB9
ABC:NL1:SB6
NL1:SB6
Is there any way to get the output I expect?
Criteria:
if any given string (block of characters) in the CMDB list also exists in the DB list (possibly as only part of a string there), it should not be in the output, since it effectively exists in both lists. The same applies in the other direction, DB compared to CMDB.
For example, NL5:SB4 from the CMDB list matches ABC:NL5:SB4 from DB.
To define a custom equality comparison for Python sets, you need a custom class that defines __eq__ and __hash__ (plus __ne__ on Python 2; Python 3 derives __ne__ from __eq__). Below is an example of how this could be achieved in your case, using the last two colon-separated fields of each line to decide whether two elements are equivalent.
Code:
class Line(object):
    def __init__(self, s):
        self.s = s
        # Equality key: the last two colon-separated fields, e.g. 'NL5:SB4'.
        self.key = ':'.join(s.split(':')[-2:])
    def __repr__(self):
        return self.s
    def __eq__(self, other):
        if isinstance(other, Line):
            return self.key == other.key
        else:
            return False
    def __ne__(self, other):
        return not self.__eq__(other)
    def __hash__(self):
        # Must agree with __eq__: equal keys give equal hashes.
        return hash(self.key)

cmdb = ['ABC:NL1:SB6', 'ABC:NL2:SB6', 'ABC:NL3:SB6', 'ABC:NL4:SB6', 'NL9:SB9',
        'NL5:SB4', 'NL6:SB7']
db = ['NL1:SB6', 'NL2:SB6', 'ABC:NL3:SB6', 'ABC:NL4:SB6', 'ABC:NL8:SB8',
      'ABC:NL5:SB4', 'ABC:NL6:SB7']
cmdb_fin = set(Line(l) for l in cmdb)
db_fin = set(Line(l) for l in db)
equal = db_fin.symmetric_difference(cmdb_fin)
Output:
>>> equal
{ABC:NL8:SB8, NL9:SB9}
Usage:
>>> Line('NL5:SB4') == Line('ABC:NL5:SB4')
True
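If the match must follow the question's literal criterion (one string contained in the other) rather than the last-two-fields key used above, it cannot be expressed as a hash, so sets won't help directly. A brute-force sketch under that assumption; unmatched is a hypothetical helper. It compares every pair, O(len(a) * len(b)), and plain substring tests can over-match (for example, NL1:SB6 is also a substring of ABC:NL1:SB66).

```python
def unmatched(a, b):
    """Items of a that neither contain nor are contained in any item of b."""
    return [x for x in a if not any(x in y or y in x for y in b)]

cmdb = ['ABC:NL1:SB6', 'ABC:NL2:SB6', 'ABC:NL3:SB6', 'ABC:NL4:SB6', 'NL9:SB9',
        'NL5:SB4', 'NL6:SB7']
db = ['NL1:SB6', 'NL2:SB6', 'ABC:NL3:SB6', 'ABC:NL4:SB6', 'ABC:NL8:SB8',
      'ABC:NL5:SB4', 'ABC:NL6:SB7']

# Check both directions, as the question's criteria require.
print(unmatched(cmdb, db) + unmatched(db, cmdb))  # ['NL9:SB9', 'ABC:NL8:SB8']
```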
Given a basic class Item:
class Item(object):
    def __init__(self, val):
        self.val = val
a list of objects of this class (the number of items can be much larger):
items = [ Item(0), Item(11), Item(25), Item(16), Item(31) ]
and a function compute that processes a value and returns a result.
How can I find two items in this list for which compute returns the same value when applied to the attribute val? If nothing is found, an exception should be raised. If more than two items match, simply return any two of them.
For example, let's define compute:
def compute(x):
    return x % 10
The expected pair would be: (Item(11), Item(31)).
You can check the length of the set of resulting values:
class Item(object):
    def __init__(self, val):
        self.val = val
    def __repr__(self):
        return f'Item({self.val})'

def compute(x):
    return x % 10

items = [Item(0), Item(11), Item(25), Item(16), Item(31)]
c = list(map(lambda x: compute(x.val), items))
if len(set(c)) == len(c):  # no two or more equal values exist in the list
    raise Exception("All elements have unique computational results")
To find the items whose computational results are equal, a dictionary can be used:
from collections import Counter
new_d = {i:compute(i.val) for i in items}
d = Counter(new_d.values())
multiple = [a for a, b in new_d.items() if d[b] > 1]
Output:
[Item(11), Item(31)]
A slightly more efficient way to check whether multiple objects share a computational value is to use all(), which needs a single pass over the Counter object and stops at the first count above 1, whereas using a set with len requires several iterations:
if all(b == 1 for b in d.values()):
    raise Exception("All elements have unique computational results")
Assuming the values returned by compute are hashable (e.g., float values), you can use a dict to store results.
And you don't need to do anything fancy, like a multidict storing all items that produce a result. As soon as you see a duplicate, you're done. Besides being simpler, this also means we short-circuit the search as soon as we find a match, without even calling compute on the rest of the elements.
def find_pair(items, compute):
    results = {}
    for item in items:
        result = compute(item.val)
        if result in results:
            return results[result], item
        results[result] = item
    raise ValueError('No pair of items')
A dictionary val_to_it that contains Items keyed by computed val can be used:
val_to_it = {}
for it in items:
    computed_val = compute(it.val)
    # Check if an Item in val_to_it has the same computed val
    dict_it = val_to_it.get(computed_val)
    if dict_it is None:
        # If not, add it to val_to_it so it can be referred to
        val_to_it[computed_val] = it
    else:
        # We found the two elements!
        res = [dict_it, it]
        break
else:
    raise Exception("Can't find two items")
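The for block can be rewritten to handle any number n of matching elements:
n = 2  # set to how many items with the same computed val you need
val_to_it = {}  # now maps computed val -> list of Items
for it in items:
    computed_val = compute(it.val)
    dict_lit = val_to_it.get(computed_val)
    if dict_lit is None:
        val_to_it[computed_val] = [it]
    else:
        dict_lit.append(it)
        # Check if we have the expected number of elements
        if len(dict_lit) == n:
            # Found n elements!
            res = dict_lit
            break
else:
    raise Exception("Can't find %d items" % n)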
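The same idea can be packaged as a reusable function. A sketch that swaps the manual dict.get bookkeeping for collections.defaultdict(list); find_n is a hypothetical name, and the extra Item(41) is sample data added for illustration. Like find_pair above, it returns as soon as any computed value has been seen n times, so compute is not called on the remaining items.

```python
from collections import defaultdict

class Item(object):
    def __init__(self, val):
        self.val = val
    def __repr__(self):
        return f'Item({self.val})'

def compute(x):
    return x % 10

def find_n(items, compute, n):
    """Return the first n items that share a computed value."""
    groups = defaultdict(list)  # computed value -> items seen so far
    for item in items:
        group = groups[compute(item.val)]
        group.append(item)
        if len(group) == n:
            return group
    raise ValueError(f"No {n} items share a computed value")

items = [Item(0), Item(11), Item(25), Item(16), Item(31), Item(41)]
print(find_n(items, compute, 3))  # [Item(11), Item(31), Item(41)]
```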
My hash table only returns a result when the whole title is typed.
I want get() to also show results when only a keyword (of at least two characters) is typed.
My hash table code:
class hashin:
    def __init__(self):
        self.size = 217  # size of hash table
        self.map = [None] * self.size

    def _get_hash(self, key):
        # Sum the ASCII values of the characters in str(key)
        hash = 0
        for char in str(key):
            hash += ord(char)
        return hash % self.size

    def add(self, key, value):  # add item to list
        key_hash = self._get_hash(key)
        key_value = [key, value]
        if self.map[key_hash] is None:
            self.map[key_hash] = list([key_value])
            return True
        else:
            for pair in self.map[key_hash]:
                if pair[0] == key:
                    pair[1] = value
                    return True
            self.map[key_hash].append(key_value)
            return True

    def get(self, key):  # search for item
        key_hash = self._get_hash(key)
        if self.map[key_hash] is not None:
            for pair in self.map[key_hash]:  # find pair of words
                if pair[0] == key:  # if pair is equal to the whole title of the word
                    return pair[0] + " - " + pair[1]
        return "Error no results for %s \nEnter the correct word." % (key)
Sample outputs:
When the whole title was typed:
When a keyword was typed (I need results to show even when only a keyword is typed):
What I need is:
Output:
Cheater - Kygos
and the other titles with chea in their name
A hash table isn't the right data structure for this task. The purpose of a hash value is to narrow the search to a small subset of the possibilities. Since the hash value is dependent on the entire string, using just a portion of the string will give the wrong subset.
A better data structure for this task is a trie (sometimes called a "prefix tree"). While it is not difficult to write this data structure on your own, there are many tested, ready-to-use modules already available on PyPI.
See:
https://pypi.python.org/pypi?%3Aaction=search&term=trie&submit=search
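If you do want to write it yourself, a minimal trie sketch for the prefix search looks like this. The class names, the lowercasing (so chea matches Cheater), and the sample entries other than "Cheater - Kygos" are made up for illustration. A trie answers prefix queries; matching chea anywhere inside a title would need another approach, such as also inserting every suffix of each title.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # character -> TrieNode
        self.values = []    # (title, artist) pairs whose title ends here

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def add(self, key, value):
        node = self.root
        for ch in key.lower():  # lowercase so "chea" matches "Cheater"
            node = node.children.setdefault(ch, TrieNode())
        node.values.append((key, value))

    def search_prefix(self, prefix):
        node = self.root
        for ch in prefix.lower():
            node = node.children.get(ch)
            if node is None:
                return []
        # Collect every entry stored in the subtree below the prefix.
        results, stack = [], [node]
        while stack:
            current = stack.pop()
            results.extend(current.values)
            stack.extend(current.children.values())
        return results

t = Trie()
t.add("Cheater", "Kygos")
t.add("Cheap Talk", "Artist B")
t.add("Chandelier", "Artist C")
for title, artist in sorted(t.search_prefix("chea")):
    print(title + " - " + artist)
# Cheap Talk - Artist B
# Cheater - Kygos
```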
I'm doing an exercise from a programming book.
I've written the code, but there are some steps I can't understand.
This is the code I've created, a module called hashmap:
def new(num_buckets=256):
    """Initializes a Map with the given number of buckets."""
    aMap = []
    for i in range(0, num_buckets):
        aMap.append([])
    return aMap

def hash_key(aMap, key):
    """Given a key this will create a number and then convert it to
    an index for the aMap's buckets."""
    return hash(key) % len(aMap)

def get_bucket(aMap, key):
    """Given a key, find the bucket where it would go."""
    bucket_id = hash_key(aMap, key)
    return aMap[bucket_id]

def get_slot(aMap, key, default=None):
    """
    Returns the index, key, and value of a slot found in a bucket.
    Returns -1, key, and default (None if not set) when not found.
    """
    bucket = get_bucket(aMap, key)
    for i, kv in enumerate(bucket):
        k, v = kv
        if key == k:
            return i, k, v
    return -1, key, default

def get(aMap, key, default=None):
    """Gets the value in a bucket for the given key, or the default."""
    i, k, v = get_slot(aMap, key, default=default)
    return v

def set(aMap, key, value):
    """Sets the key to the value, replacing any existing value."""
    bucket = get_bucket(aMap, key)
    i, k, v = get_slot(aMap, key)
    if i >= 0:
        # the key exists, replace it
        bucket[i] = (key, value)
    else:
        # the key does not, append to create it
        bucket.append((key, value))

def delete(aMap, key):
    """Deletes the given key from the Map."""
    bucket = get_bucket(aMap, key)
    for i in range(len(bucket)):  # xrange in Python 2
        k, v = bucket[i]
        if key == k:
            del bucket[i]
            break

def list(aMap):
    """Prints out what's in the Map."""
    for bucket in aMap:
        if bucket:
            for k, v in bucket:
                print(k, v)
1) Why is there a keyword parameter with a default value in the function new(num_buckets=256)?
What if I set num_buckets as a variable with the value 256 in the middle of the function instead? Does it matter where it is set?
def new():
    """Initializes a Map with the given number of buckets."""
    aMap = []
    num_buckets = 256  # <--- this line
    for i in range(0, num_buckets):
        aMap.append([])
    return aMap
2) Why is the size of aMap 256? Is it chosen on purpose or just an arbitrary number?
3) What's the point of the hash_key(aMap, key) function?
This approach doesn't guarantee that the key will be in the bucket at the "remainder-index".
For example:
aMap = [[(9, 'nine')], [(10, 'ten')], [(11, 'eleven')]]
After running the function hash_key, the "remainder-index" will be 1. But key 10 isn't in the first bucket.
I'm new to Python, and I'd appreciate your help.
It's just a parameter with a default value. The default is bound once, when the def statement executes, and lets the function's caller omit the argument. If you hard-code num_buckets = 256 inside the function instead, callers can no longer choose a different number of buckets.
256 is just a number; a power of two that fits comfortably in memory is a common choice. Java's HashMap, for instance, also sizes its bucket array in powers of two.
I don't really understand your example. The hash key is used both when inserting into and when retrieving from the map, just to find the right bucket. Within that bucket you then compare keys as you would in a list (since the buckets are actually lists).
(1) & (2)
This is an optional input parameter, a guess at an appropriate number of buckets for general use. If "new" is called with no argument, then num_buckets will be 256. If an argument is supplied, then num_buckets will take on that value.
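A small sketch contrasting the two versions of new from question 1 (new_fixed is a hypothetical name for the hard-coded variant): the default can be overridden by the caller, while the local variable cannot.

```python
def new(num_buckets=256):
    """num_buckets falls back to 256 only when the caller omits it."""
    return [[] for _ in range(num_buckets)]

def new_fixed():
    num_buckets = 256  # hard-coded: the caller has no way to change it
    return [[] for _ in range(num_buckets)]

print(len(new()), len(new(16)), len(new_fixed()))  # 256 16 256
try:
    new_fixed(16)  # the hard-coded version accepts no arguments at all
except TypeError as exc:
    print("TypeError:", exc)
```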
(3) I think you may have a little confusion about hashing. The purpose of a hash function is to provide an integer that encodes the key. The hash values should spread the set of keys throughout the given integer range. For example, "nine" might map to 12; "ten" might map to 301. The latter would be converted to 45 (301 % 256).
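According to the data you presented, the key is the integer 10, whose hash in CPython is 10 itself. With three buckets, 10 % 3 is 1, so hash_key returns index 1. List indices start at 0, so index 1 is the second bucket, which is exactly where (10, 'ten') is.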
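A compact sketch, assuming Python 3 and the same bucket logic as the module in the question (set_item and get_item are hypothetical renamed stand-ins for set and get, to avoid shadowing the built-ins). Filling a three-bucket map through the setter places each key exactly where hash_key says, and the getter looks in that same bucket.

```python
def new(num_buckets=256):
    return [[] for _ in range(num_buckets)]

def hash_key(aMap, key):
    # The same key always yields the same bucket index.
    return hash(key) % len(aMap)

def set_item(aMap, key, value):
    bucket = aMap[hash_key(aMap, key)]
    for i, (k, _) in enumerate(bucket):
        if k == key:
            bucket[i] = (key, value)  # replace an existing key
            return
    bucket.append((key, value))

def get_item(aMap, key, default=None):
    # Only the bucket chosen by hash_key is searched.
    for k, v in aMap[hash_key(aMap, key)]:
        if k == key:
            return v
    return default

aMap = new(3)
for key, name in [(9, 'nine'), (10, 'ten'), (11, 'eleven')]:
    set_item(aMap, key, name)
print(aMap)                # [[(9, 'nine')], [(10, 'ten')], [(11, 'eleven')]]
print(hash_key(aMap, 10))  # 1
print(get_item(aMap, 10))  # ten
```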