How can I solve some questions about a hashmap module? - python

I'm doing an exercise from a programming book.
I've written some code, but there are some steps I can't understand.
The code I've written is a module called hashmap:
def new(num_buckets=256):
    """Initializes a Map with the given number of buckets."""
    aMap = []
    for i in range(0, num_buckets):
        aMap.append([])
    return aMap
def hash_key(aMap, key):
    """Given a key this will create a number and then convert it to
    an index for the aMap's buckets."""
    return hash(key) % len(aMap)
def get_bucket(aMap, key):
    """Given a key, find the bucket where it would go."""
    bucket_id = hash_key(aMap, key)
    return aMap[bucket_id]
def get_slot(aMap, key, default=None):
    """
    Returns the index, key, and value of a slot found in a bucket.
    Returns -1, key, and default (None if not set) when not found.
    """
    bucket = get_bucket(aMap, key)
    for i, kv in enumerate(bucket):
        k, v = kv
        if key == k:
            return i, k, v
    return -1, key, default
def get(aMap, key, default=None):
    """Gets the value in a bucket for the given key, or the default."""
    i, k, v = get_slot(aMap, key, default=default)
    return v
def set(aMap, key, value):
    """Sets the key to the value, replacing any existing value."""
    bucket = get_bucket(aMap, key)
    i, k, v = get_slot(aMap, key)
    if i >= 0:
        # the key exists, replace it
        bucket[i] = (key, value)
    else:
        # the key does not, append to create it
        bucket.append((key, value))
def delete(aMap, key):
    """Deletes the given key from the Map."""
    bucket = get_bucket(aMap, key)
    for i in xrange(len(bucket)):
        k, v = bucket[i]
        if key == k:
            del bucket[i]
            break
def list(aMap):
    """Prints out what's in the Map."""
    for bucket in aMap:
        if bucket:
            for k, v in bucket:
                print k, v
1) Why is there a keyword parameter in the function new(num_buckets=256)?
What if I set num_buckets as a variable with the value 256 inside the function body instead? Does it matter where I set it?
def new():
    """Initializes a Map with the given number of buckets."""
    aMap = []
    num_buckets = 256 # <--- this line
    for i in range(0, num_buckets):
        aMap.append([])
    return aMap
2) Why is the size of aMap 256? Is it deliberate or just an arbitrary number?
3) What's the point of the hash_key(aMap, key) function?
It doesn't seem to guarantee that the key will be in the bucket at the "remainder-index".
For example:
aMap = [[(9, 'nine')], [(10, 'ten')], [(11, 'eleven')]]
After running hash_key, the "remainder-index" will be 1. But key 10 isn't in the first bucket.
I'm new to Python. I hope you can help.

It's just a default parameter value. The default is evaluated once, when the def statement executes, and it lets callers of the function leave the argument out. If you make it a parameter without a default, callers have to pass it on every call; if you hard-code it inside the function body, as in your version, callers can't change it at all.
256 is just a number. It may have been picked because it fits in memory nicely. I remember Java also uses 2^n sizes for HashMap's backing buckets, but don't take my word for it.
I don't really understand your example. You use the hash key both when inserting into and when retrieving from the map, just to get the right bucket. You then search that bucket as you would any list (since the buckets are actually lists).
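A quick way to see the difference described above between a default parameter and a hard-coded size, sketched in Python 3 with two hypothetical helpers, new_default and new_fixed, written only for this comparison:

```python
def new_default(num_buckets=256):
    """Default parameter: callers may omit it or pass their own size."""
    # One independent empty list per bucket.
    return [[] for _ in range(num_buckets)]

def new_fixed():
    """Hard-coded size: callers have no way to change it."""
    num_buckets = 256
    return [[] for _ in range(num_buckets)]

print(len(new_default()))    # 256: the default is used
print(len(new_default(8)))   # 8: the caller overrides it
print(len(new_fixed()))      # 256: always, since new_fixed() takes no arguments
```

Note the list comprehension: [[]] * num_buckets looks equivalent, but it would make every bucket the same list object.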

(1) & (2)
This is an optional input parameter, a guess at an appropriate number of buckets for general use. If "new" is called with no argument, then num_buckets will be 256. If an argument is supplied, then num_buckets will take on that value.
(3) I think you may be a little confused about hashing. The purpose of a hash function is to provide an integer that encodes the key. The hash values should spread the set of keys throughout the given integer range. For example, "nine" might map to 12; "ten" might map to 301. The latter would be converted to 45 (301 % 256).
According to the data you presented, "ten" will map to key 10, not 1. Can you explain how you got 1 for the remainder-index?
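To see why the bucket index always matches, here is a Python 3 port of the module's core functions, with set and get renamed to the hypothetical set_item and get_item so they don't shadow built-ins. It uses small integer keys because CPython hashes those to themselves, whereas str hashes are randomized per process in Python 3, so bucket positions for strings change between runs.

```python
def new(num_buckets=256):
    """One independent empty list per bucket."""
    return [[] for _ in range(num_buckets)]

def hash_key(a_map, key):
    # The same formula is used for every insert and every lookup.
    return hash(key) % len(a_map)

def set_item(a_map, key, value):
    bucket = a_map[hash_key(a_map, key)]
    for i, (k, _) in enumerate(bucket):
        if k == key:
            bucket[i] = (key, value)  # key exists: replace it
            return
    bucket.append((key, value))       # key is new: append it

def get_item(a_map, key, default=None):
    bucket = a_map[hash_key(a_map, key)]
    for k, v in bucket:
        if k == key:
            return v
    return default

a_map = new(3)
for key, word in [(9, "nine"), (10, "ten"), (11, "eleven")]:
    set_item(a_map, key, word)

print(a_map)                 # [[(9, 'nine')], [(10, 'ten')], [(11, 'eleven')]]
print(get_item(a_map, 10))   # ten
print(get_item(a_map, 12))   # None
```

Because set_item and get_item both go through hash_key, a key is always stored in, and looked up from, the same bucket. With three buckets, key 10 lands at index 10 % 3 == 1, which is the second bucket, since list indices start at 0.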

Related

Append to a list in Python [duplicate]

I have some code that prints data from a global dictionary named cal:
def show_todo():
    for key, value in cal.items():
        print(value[0], key)
However, I want to use this code as part of a Discord bot. In order for the bot to work properly, I need to return the data to another function that will actually send the message to the Discord chat. Using print like above means that the message is displayed in my local console window, and the chat just sees None.
I tried to fix it by using return instead:
def show_todo():
    for key, value in cal.items():
        return(value[0], key)
but this way, the for loop does not work properly. I only get at most one key-value pair from the dictionary.
How can I fix this so that all of the data is returned?
Using return inside a loop will break out of it and exit the function, even if the iteration hasn't finished.
For example:
def num():
    # Here there will be only one iteration
    # For number == 1 => 1 % 2 = 1
    # So, break the loop and return the number
    for number in range(1, 10):
        if number % 2:
            return number
>>> num()
1
In some cases we need to break the loop if some conditions are met. However, in your current code, breaking the loop before finishing it is unintentional.
Instead of that, you can use a different approach:
Yielding your data
def show_todo():
    # Create a generator
    for key, value in cal.items():
        yield value[0], key
You can call it like:
a = list(show_todo()) # or tuple(show_todo())
or you can iterate through it:
for v, k in show_todo(): ...
Putting your data into a list or other container
Append your data to a list, then return it after the end of your loop:
def show_todo():
    my_list = []
    for key, value in cal.items():
        my_list.append((value[0], key))
    return my_list
Or use a list comprehension:
def show_todo():
    return [(value[0], key) for key, value in cal.items()]
Or use generator syntax (there's an excellent explanation on SO here):
def show_todo():
    for key, value in cal.items():
        yield value[0], key
for value, key in show_todo():
    print(value, key)
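If the bot ultimately needs one message for the chat, one option is to build the list as above and join it into a single string. The cal contents and the format_todo name below are made up for illustration; the only thing carried over from the question is that each value is a sequence whose first element is what gets shown.

```python
# Hypothetical sample data in the same shape as the question's `cal`:
# each value is a sequence whose first item is displayed next to the key.
cal = {
    "mon": ["Buy milk", "high"],
    "tue": ["Call Bob", "low"],
}

def show_todo():
    # Collect every pair; a return inside the loop would stop after the first.
    return [(value[0], key) for key, value in cal.items()]

def format_todo():
    # Turn all pairs into one string that could be sent as a single message.
    return "\n".join(f"{item} {key}" for item, key in show_todo())

print(format_todo())
# Buy milk mon
# Call Bob tue
```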

Function that looks up keys in a dictionary until there are no more associated values

I need help creating a function that goes through a given dictionary. The value associated with a key may itself be another key in the dictionary. I need the function to keep looking up keys until it reaches a key that has no associated value.
def follow_me(d, s):
    while d:
        if s in d:
            return d[s]
I can return the value that s maps to, but I have no idea how to keep iterating until I reach a value that has no associated value. For example, I can get that 'badger' maps to 'doe', but how do I keep going from 'doe' to 'fox', then from 'fox' to 'hen', and so on?
d = {'badger':'doe', 'doe':'fox', 'fox':'hen','hen':'flea',
'sparrow':'spider', 'zebra':'lion', 'lion':'zebra'}
print(follow_me(d, 'badger'))
print(follow_me(d, 'fox'))
print(follow_me(d, 'sparrow'))
print(follow_me(d, 'zebra'))
print(follow_me(d, 'aardvark'))
And this is what I currently have for the function. It's the only version that makes sense to me, because everything else I've tried is just wrong.
def follow_me(d, s):
    while d:
        if s in d:
            return d[s]
and the output needs to be:
flea
flea
spider
aardvark
but my code right now is producing:
doe
hen
spider
lion
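To extend the other answers, which are still valid: if you have a very large dictionary, checking key not in dic.keys() (or k in d) on every step adds work to each iteration, because the loop then does two lookups per step (the membership test, then dic[key]), plus a method call for .keys(). In Python 3 these membership tests are hash lookups rather than scans over all keys, but the extra work still adds up.
To get around this, you can use try/except: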
def follow_me(dic, key):
    while True:
        if key not in dic.keys():
            return key
        key = dic[key]
def follow_me2(dic, key):
    try:
        while True:
            key = dic[key]
    except KeyError:
        # key has no entry, so it is the end of the chain
        return key
import time
d = { i: (i+1) for i in range(10000000) }
start = time.time()
follow_me(d, 0)
print("Using 'in' takes", time.time() - start,"s")
start = time.time()
follow_me2(d, 0)
print("Using 'try' takes", time.time() - start,"s")
gives the output:
Using 'in' takes 2.476428747177124 s
Using 'try' takes 0.9100546836853027 s
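A third variant, not from the original answer and not benchmarked here: dict.get with a private sentinel object also does a single lookup per step, without relying on exceptions. The name follow_me3 is hypothetical, and like the versions above it loops forever if the links form a cycle.

```python
_MISSING = object()  # private sentinel, distinct from any value a caller stores

def follow_me3(dic, key):
    """Follow key -> value links with one dict lookup per step."""
    while True:
        nxt = dic.get(key, _MISSING)
        if nxt is _MISSING:
            return key  # key has no entry: end of the chain
        key = nxt
```

Using a sentinel rather than None means a stored value of None is still followed as a key, exactly as the try/except version would do.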
I think this is what you are looking for, though your problem description is very unclear:
def follow_me(d, k):
    while k in d:
        k = d[k]
    return k
Note that the loop in this function will run forever if there is a cycle between keys and values in your dictionary. Your example has one between 'lion' and 'zebra', and it's not entirely clear how you intend such a cycle to be broken. If you want to expand each key only once, you could handle it by keeping track of the values you've seen so far in a set:
def follow_me(d, k):
    seen = set()
    while k in d and k not in seen:
        seen.add(k)
        k = d[k]
    return k
This will return whichever key in the cycle you reach first (so follow_me(d, 'zebra') with your example dictionary will return 'zebra' after going zebra => lion => zebra). If you want some other outcome, you'd need different logic and it might be tricky to do.
If you request a key that's not in the dictionary (like 'aardvark' in your example), the requested key will be returned immediately. You could add special handling for the first key you look up, but it would again make things more complicated.
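Running the set-based version on the dictionary from the question gives the four expected answers, plus 'zebra' for the cycle case:

```python
def follow_me(d, k):
    seen = set()
    # Stop when k has no entry, or when we come back to a key already expanded.
    while k in d and k not in seen:
        seen.add(k)
        k = d[k]
    return k

d = {'badger': 'doe', 'doe': 'fox', 'fox': 'hen', 'hen': 'flea',
     'sparrow': 'spider', 'zebra': 'lion', 'lion': 'zebra'}

for start in ['badger', 'fox', 'sparrow', 'zebra', 'aardvark']:
    print(start, '->', follow_me(d, start))
# badger -> flea
# fox -> flea
# sparrow -> spider
# zebra -> zebra
# aardvark -> aardvark
```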
Because infinite loops are possible, they have to be handled. Your description isn't clear about what should happen in that case.
def follow_me(d, key):
    visited_keys = []
    # d.get() avoids a KeyError when key (e.g. 'flea' or 'aardvark') has no entry
    while key not in visited_keys and d.get(key):
        visited_keys.append(key)
        key = d[key]
    if not d.get(key):
        return key
    return "this hunt has no end"

Negatively updating a Python dict [NOT "key"]

I am looking for a way to update/access a Python dictionary by addressing all keys that do NOT match the key given.
That is, instead of the usual dict[key], I want to do something like dict[!key]. I found a workaround, but figured there must be a better way which I cannot figure out at the moment.
# I have a dictionary of counts
dicti = {"male": 1, "female": 200, "other": 0}
# Problem: I encounter a record (cannot reproduce here) that
# requires me to add 1 to every key in dicti that is NOT "male",
# i.e. dicti["female"], and dicti["other"],
# and other keys I might add later
# Here is what I am doing and I don't like it
dicti.update({k: v + 1 for k,v in dicti.items() if k != "male"})
dicti.update({k: v + 1 for k,v in dicti.items() if k != "male"})
That creates a sub-dictionary (hashing, memory overhead) and then passes it to the old dictionary: more hashing and reference copying.
Why not use a good old loop over the keys (since the values aren't mutable)?
for k in dicti:
    if k != "male":
        dicti[k] += 1
This may be faster if there are many keys and only one key to skip: add to all the keys, then undo the operation on the one key you want to skip (this saves a lot of string comparisons):
for k in dicti:
    dicti[k] += 1
dicti["male"] -= 1
If the values were mutable (e.g. lists), we would avoid one hash lookup per key and mutate the value in place instead:
for k,v in dicti.items():
    if k != "male":
        v.append("something")
One-liners are cool, but sometimes it's better to avoid them (for performance and readability, in this case).
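The plain loop generalizes easily if more than one key ever needs to be skipped, and it automatically covers keys added later. A small sketch; add_to_others is a hypothetical helper name, and excluded is assumed to be any iterable of keys:

```python
def add_to_others(counts, excluded, amount=1):
    """Add `amount` to every count whose key is not in `excluded` (in place)."""
    excluded = set(excluded)  # O(1) membership tests even with many exclusions
    for key in counts:        # updating values (not adding keys) is safe while iterating
        if key not in excluded:
            counts[key] += amount

dicti = {"male": 1, "female": 200, "other": 0}
add_to_others(dicti, {"male"})
print(dicti)  # {'male': 1, 'female': 201, 'other': 1}
```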
If you have to perform this "add to others" operation more often, and if all the values are numeric, you could also subtract from the given key and add the same value to some global variable counting towards all the values (including that same key). For example, as a wrapper class:
import collections
class Wrapper:
    def __init__(self, **values):
        self.d = collections.Counter(values)
        self.n = 0  # offset that applies to every key
    def add(self, key, value):
        self.d[key] += value
    def add_others(self, key, value):
        # net effect: +value for every key except this one
        self.d[key] -= value
        self.n += value
    def get(self, key):
        return self.d[key] + self.n
    def to_dict(self):
        if self.n != 0: # recompute dict and reset global offset
            # stay a Counter so add() and get() keep working for missing keys
            self.d = collections.Counter({k: v + self.n for k, v in self.d.items()})
            self.n = 0
        return dict(self.d)
Example:
>>> dicti = Wrapper(**{"male": 1, "female": 200, "other": 0})
>>> dicti.add("male", 2)
>>> dicti.add_others("male", 5)
>>> dicti.get("male")
3
>>> dicti.to_dict()
{'other': 5, 'female': 205, 'male': 3}
The advantage is that both the add and the add_others operations are O(1), and the values are only updated with the global offset when you actually need them. Of course, the to_dict operation is still O(n), but the updated dict can be saved and only recomputed when add_others has been called again in between.

My Lempel-Ziv implementation makes the encoding longer

I can't work out why my implementation creates a longer string than the input.
It is implemented according to the description in this document and only this description.
It is designed to act on binary strings only. If anyone can shed some light on why this creates a longer string than it started with, I'd be very grateful!
Main Encoding
def LZ_encode(uncompressed):
    m=uncompressed
    dictionary=dict_gen(m)
    list=[int(bin(i)[2:]) for i in range(1,len(dictionary))]
    pointer_bit=[]
    for k in list:
        pointer_bit=pointer_bit+[(str(chopped_lookup(k,dictionary)),dictionary[k][-1])]
    new_pointer_bit=pointer_length_correct(pointer_bit)
    list_output=[i for sub in new_pointer_bit for i in sub]
    if list_output[-1]=='$':
        output=''.join(list_output[:-1])
    else:
        output=''.join(list_output)
    return output
Component Functions
from math import ceil, log  # used by pointer_length_correct
def dict_gen(m): # Generates Dictionary
    dictionary={0:""}
    j=1
    w=""
    iterator=0
    l=len(m)
    for c in m:
        iterator+=1
        wc= str(str(w) + str(c))
        if wc in dictionary.values():
            w=wc
            if iterator==l:
                dictionary.update({int(bin(j)[2:]): wc+'$'})
        else:
            dictionary.update({int(bin(j)[2:]): wc})
            w=""
            j+=1
    return dictionary
def chopped_lookup(k,dictionary): # Returns entry number of shortened source string
    cut_source_string=dictionary[k][:-1]
    for key, value in dictionary.iteritems():
        if value == cut_source_string:
            return key
def pointer_length_correct(lst): # Takes the (pointer,bit) list and corrects the length of the pointer
    new_pointer_bit=[]
    for pair in lst:
        n=lst.index(pair)
        if len(str(pair[0]))>ceil(log(n+1,2)):
            while len(str(pair[0]))!=ceil(log(n+1,2)):
                pair = (str(pair[0])[1:],pair[1])
        if len(str(pair[0]))<ceil(log(n+1,2)):
            while len(str(pair[0]))!=ceil(log(n+1,2)):
                pair = (str('0'+str(pair[0])),pair[1])
        new_pointer_bit=new_pointer_bit+[pair]
    return new_pointer_bit
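For comparison, here is a minimal LZ78-style sketch of the same idea in Python 3. It is not a fix of the code above, and it assumes the common textbook scheme: each phrase is sent as (number of its longest previously seen prefix, one new bit), and phrase p's pointer takes ceil(log2(p)) bits, which is the width pointer_length_correct aims for with ceil(log(n+1,2)). The names lz78_parse and lz78_encode are hypothetical. Mapping phrase to number in a dict makes each lookup O(1), instead of scanning dictionary.values().

```python
def lz78_parse(bits):
    """Split a binary string into LZ78 phrases as (prefix_number, last_bit) pairs."""
    phrases = {"": 0}  # phrase -> phrase number; number 0 is the empty phrase
    pairs = []
    w = ""
    for c in bits:
        if w + c in phrases:
            w += c  # still a known phrase: keep extending it
        else:
            pairs.append((phrases[w], c))
            phrases[w + c] = len(phrases)
            w = ""
    if w:
        # Input ended inside a known phrase. Every prefix of a known phrase
        # is also known, so emit it as (its prefix, its last bit).
        pairs.append((phrases[w[:-1]], w[-1]))
    return pairs

def lz78_encode(bits):
    """Concatenate fixed-width pointers and bits; phrase p uses ceil(log2(p)) pointer bits."""
    out = []
    for p, (prefix, bit) in enumerate(lz78_parse(bits), start=1):
        width = (p - 1).bit_length()  # equals ceil(log2(p)) for p >= 1
        pointer = format(prefix, "b").zfill(width) if width else ""
        out.append(pointer + bit)
    return "".join(out)

s = "1011010100010"
print(lz78_parse(s))
# [(0, '1'), (0, '0'), (1, '1'), (2, '1'), (4, '0'), (2, '0'), (1, '0')]
print(lz78_encode(s), len(lz78_encode(s)), "bits vs", len(s))
# 100011101100001000010 21 bits vs 13
```

On this 13-bit input the sketch produces 21 bits: with only a handful of short phrases, the pointer bits cost more than the repetition saves. LZ78 typically only starts to win once phrases become long relative to their pointers, so a longer output on short inputs is not, by itself, a sign of a bug.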

Alter the hash function of a dictionary

Following this question, we know that two different dictionaries, dict_1 and dict_2 for example, use the exact same hash function.
Is there any way to alter the hash function used by the dictionary? Negative answers are also accepted!
You can't change the hash function: the dict will call hash on the keys it's supposed to insert, and that's that.
However, you can wrap the keys to provide different __hash__ and __eq__ methods.
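class MyHash(object):
    def __init__(self, v):
        self._v = v
    def __hash__(self):
        return hash(self._v) * -1
    def __eq__(self, other):
        # type check avoids AttributeError when compared with a non-wrapped key
        return isinstance(other, MyHash) and self._v == other._v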
I doubt this actually helps with your original problem/question, though; it seems a custom array/list-based data structure might be the answer. Or not.
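A short usage sketch of the wrapper idea in Python 3. The class is repeated so the snippet runs on its own, with an isinstance check in __eq__ so that comparing against a non-wrapped key returns False instead of raising AttributeError:

```python
class MyHash(object):
    """Wrap a key so the dict sees a different hash for it."""
    def __init__(self, v):
        self._v = v

    def __hash__(self):
        # Any function of the wrapped value works, as long as equal
        # wrappers produce equal hashes.
        return hash(self._v) * -1

    def __eq__(self, other):
        return isinstance(other, MyHash) and self._v == other._v

d = {}
d[MyHash("apple")] = 1
print(d[MyHash("apple")])  # 1: a fresh but equal wrapper finds the entry
print(MyHash("apple") in d, "apple" in d)  # True False: the raw string is a different key
```

The dict itself still calls hash() and ==; the wrapper only changes what those calls return, which is why every key has to be wrapped consistently.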
Here is a "hash table" on top of a list of lists, where each hash table object is associated with a particular hashing function.
class HashTable(object):
    def __init__(self, hash_function, size=256):
        self.hash_function = hash_function
        self.buckets = [list() for i in range(size)]
        self.size = size
    def __getitem__(self, key):
        hash_value = self.hash_function(key) % self.size
        bucket = self.buckets[hash_value]
        for stored_key, stored_value in bucket:
            if stored_key == key:
                return stored_value
        raise KeyError(key)
    def __setitem__(self, key, value):
        hash_value = self.hash_function(key) % self.size
        bucket = self.buckets[hash_value]
        i = 0
        found = False
        for stored_key, stored_value in bucket:
            if stored_key == key:
                found = True
                break
            i += 1
        if found:
            bucket[i] = (key, value)
        else:
            bucket.append((key, value))
The rest of your application can still see the underlying list of buckets. Your application might require additional metadata to be associated with each bucket, but that would be as simple as defining a new class for the elements of the bucket list instead of a plain list.
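A condensed sketch of the same class in use, showing that the table really calls whatever hash function it was given. The _bucket helper is a hypothetical refactoring of the duplicated lookup, and len is used only as a deliberately poor hash so that collisions are easy to see:

```python
class HashTable(object):
    """Condensed copy of the answer's class, using enumerate in __setitem__."""
    def __init__(self, hash_function, size=256):
        self.hash_function = hash_function
        self.buckets = [list() for _ in range(size)]
        self.size = size

    def _bucket(self, key):
        # The custom hash function decides which bucket a key belongs to.
        return self.buckets[self.hash_function(key) % self.size]

    def __getitem__(self, key):
        for stored_key, stored_value in self._bucket(key):
            if stored_key == key:
                return stored_value
        raise KeyError(key)

    def __setitem__(self, key, value):
        bucket = self._bucket(key)
        for i, (stored_key, _) in enumerate(bucket):
            if stored_key == key:
                bucket[i] = (key, value)  # existing key: replace
                return
        bucket.append((key, value))       # new key: append

# len() as a hash: every 3-letter key lands in bucket 3.
table = HashTable(len, size=8)
table["cat"] = 1
table["dog"] = 2
table["cat"] = 3
print(table["cat"], table["dog"])  # 3 2
print(table.buckets[3])            # [('cat', 3), ('dog', 2)]
```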
I think what you want is a way to create buckets. Based on this, I recommend collections.defaultdict with a set initializer as the "bucket" (depending on what you're using it for).
Here is a sample:
#!/usr/bin/env python
from collections import defaultdict
from itertools import combinations
d = defaultdict(set)
strs = ["str", "abc", "rts"]
for s in strs:
    d[hash(s)].add(s)
    d[hash(''.join(reversed(s)))].add(s)
for combination in combinations(d.values(), r=2):
    matches = combination[0] & combination[1]
    if len(matches) > 1:
        print matches
# output: set(['str', 'rts'])
Two strings that end up in the same buckets here are very likely the same. I've created a hash collision deliberately by storing each string under both its own hash and the hash of its reverse.
Note that the set will use full comparison but should do it very fast.
Don't hash too many values without draining the sets.
