My Lempel-Ziv implementation makes the encoding longer - python

I can't work out why my implementation creates a longer string than its input.
It is implemented according to the description in this document, and only that description.
It is designed to act on binary strings only. If anyone can shed some light on why this creates a longer string than it started with, I'd be very grateful!
Main Encoding
    def LZ_encode(uncompressed):
        m=uncompressed
        dictionary=dict_gen(m)
        list=[int(bin(i)[2:]) for i in range(1,len(dictionary))]
        pointer_bit=[]
        for k in list:
            pointer_bit=pointer_bit+[(str(chopped_lookup(k,dictionary)),dictionary[k][-1])]
        new_pointer_bit=pointer_length_correct(pointer_bit)
        list_output=[i for sub in new_pointer_bit for i in sub]
        if list_output[-1]=='$':
            output=''.join(list_output[:-1])
        else:
            output=''.join(list_output)
        return output
Component Functions
    def dict_gen(m): # Generates Dictionary
        dictionary={0:""}
        j=1
        w=""
        iterator=0
        l=len(m)
        for c in m:
            iterator+=1
            wc= str(str(w) + str(c))
            if wc in dictionary.values():
                w=wc
                if iterator==l:
                    dictionary.update({int(bin(j)[2:]): wc+'$'})
            else:
                dictionary.update({int(bin(j)[2:]): wc})
                w=""
                j+=1
        return dictionary
    def chopped_lookup(k,dictionary): # Returns entry number of shortened source string
        cut_source_string=dictionary[k][:-1]
        for key, value in dictionary.iteritems():
            if value == cut_source_string:
                return key
    from math import ceil, log  # needed by pointer_length_correct

    def pointer_length_correct(lst): # Takes the (pointer,bit) list and corrects the length of the pointer
        new_pointer_bit=[]
        for pair in lst:
            n=lst.index(pair)
            if len(str(pair[0]))>ceil(log(n+1,2)):
                while len(str(pair[0]))!=ceil(log(n+1,2)):
                    pair = (str(pair[0])[1:],pair[1])
            if len(str(pair[0]))<ceil(log(n+1,2)):
                while len(str(pair[0]))!=ceil(log(n+1,2)):
                    pair = (str('0'+str(pair[0])),pair[1])
            new_pointer_bit=new_pointer_bit+[pair]
        return new_pointer_bit
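For comparison, here is a minimal Python 3 sketch of the (pointer, bit) scheme the code above appears to follow. It assumes that each pointer is written with ceil(log2(n)) bits, where n is the number of phrases already in the dictionary (counting the empty phrase), and that a leftover phrase at the end of the input is emitted as a bare pointer. The name lz_encode_sketch and both of those choices are assumptions, not taken from the linked description.

```python
def lz_encode_sketch(bits):
    """Encode a string of '0'/'1' characters as concatenated (pointer, bit) pairs."""
    phrases = {"": 0}          # phrase -> index; index 0 is the empty phrase
    pieces = []
    w = ""                     # longest known phrase matched so far

    def pointer(index):
        # ceil(log2(n)) bits for n known phrases, computed with integers
        # to avoid floating-point rounding.
        width = (len(phrases) - 1).bit_length()
        return format(index, "b").zfill(width) if width else ""

    for c in bits:
        if w + c in phrases:
            w += c             # keep extending the current match
            continue
        pieces.append(pointer(phrases[w]) + c)
        phrases[w + c] = len(phrases)
        w = ""

    if w:                      # assumption: leftover phrase -> pointer only
        pieces.append(pointer(phrases[w]))
    return "".join(pieces)


encoded = lz_encode_sketch("1011010100010")
print(encoded, len(encoded))
```

Under these assumptions the 13-character input above encodes to 21 characters. On short inputs the pointers cost more than the repetition saves, so a longer output does not by itself indicate a bug; a gain only appears on longer, more repetitive inputs (for example, a run of 1000 zeros encodes to far fewer than 1000 characters with this sketch).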

Related

function that looks up keys in a dictionary until there is no more associated values

I need help creating a function that walks through a given dictionary. The value associated with a key may itself be another key of the dictionary. I need the function to keep looking up keys until it reaches one that has no associated value.
    def follow_me(d, s):
        while d:
            if s in d:
                return d[s]
I can return the value that s maps to, but I have no idea how to keep going until I reach a value that is not itself a key. So I can get from badger to doe, but how do I continue through the dictionary from doe to fox, then from fox to hen, and so on?
d = {'badger':'doe', 'doe':'fox', 'fox':'hen','hen':'flea',
'sparrow':'spider', 'zebra':'lion', 'lion':'zebra'}
print(follow_me(d, 'badger'))
print(follow_me(d, 'fox'))
print(follow_me(d, 'sparrow'))
print(follow_me(d, 'zebra'))
print(follow_me(d, 'aardvark'))
And this is what I currently have of the function; it is the only version that makes sense to me, because everything else I've tried is just wrong.
    def follow_me(d, s):
        while d:
            if s in d:
                return d[s]
and the output needs to be:
flea
flea
spider
aardvark
but my code right now is producing:
doe
hen
spider
lion
To extend the other answers, which are still valid: the membership test costs an extra lookup on every step. In Python 2, key not in dic.keys() is worse still, because keys() builds a list that is then scanned linearly; k in d itself is a constant-time hash lookup.
To avoid the extra test, one can use try/except:
    def follow_me(dic, key):
        while True:
            if key not in dic.keys():
                return key
            key = dic[key]

    def follow_me2(dic, key):
        try:
            while True:
                key = dic[key]
        except KeyError:  # key has no associated value: end of the chain
            return key

    import time
    d = { i: (i+1) for i in range(10000000) }
    start = time.time()
    follow_me(d, 0)
    print("Using 'in' takes", time.time() - start,"s")
    start = time.time()
    follow_me2(d, 0)
    print("Using 'try' takes", time.time() - start,"s")
gives the output:
Using 'in' takes 2.476428747177124 s
Using 'try' takes 0.9100546836853027 s
I think this is what you are looking for, though your problem description is very unclear:
    def follow_me(d, k):
        while k in d:
            k = d[k]
        return k
Note that the loop in this function will run forever if there is a cycle between keys and values in your dictionary. Your example has one between 'lion' and 'zebra', and it's not entirely clear how you intend such a cycle to be broken. If you want to expand each key only once, you could handle it by keeping track of the values you've seen so far in a set:
    def follow_me(d, k):
        seen = set()
        while k in d and k not in seen:
            seen.add(k)
            k = d[k]
        return k
This will return whichever key in the cycle you reach first (so follow_me(d, 'zebra') with your example dictionary will return 'zebra' after going zebra => lion => zebra). If you want some other outcome, you'd need different logic and it might be tricky to do.
If you request a key that's not in the dictionary (like 'aardvark' in your example), the requested key will be returned immediately. You could add special handling for the first key you look up, but it would again make things more complicated.
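As a quick check of the behaviour described above, the cycle-safe version can be run against the example dictionary from the question. This is only a usage sketch; it repeats the function so that it runs on its own.

```python
def follow_me(d, k):
    seen = set()                      # keys already expanded
    while k in d and k not in seen:
        seen.add(k)
        k = d[k]
    return k

d = {'badger': 'doe', 'doe': 'fox', 'fox': 'hen', 'hen': 'flea',
     'sparrow': 'spider', 'zebra': 'lion', 'lion': 'zebra'}

for start in ('badger', 'fox', 'sparrow', 'zebra', 'aardvark'):
    print(start, '->', follow_me(d, start))
```

The chain from badger and from fox both end at flea, sparrow ends at spider, the zebra/lion cycle stops at zebra, and aardvark comes back unchanged because it is not a key.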
Infinite loops are possible here, so they have to be handled. Your description isn't clear about what should happen in that case.
    def follow_me(d, key):
        visited_keys = []
        # d.get(key) returns None for a missing key instead of raising KeyError
        while key not in visited_keys and d.get(key):
            visited_keys.append(key)
            key = d[key]
        if not d.get(key):
            return key
        return "this hunt has no end"

Trying to understand the function of reduce in python

I recently received an answer to my previous question from a fellow Stack Overflow user. I asked a follow-up to understand the function better but got no response, so I am asking here.
I would like to know what the k and v used in the lambda represent. I thought they stood for:
k = dictionary ?
v = string ? # Did I understand it correctly?
dictionary = {"test":"1", "card":"2"}
string = "There istest at the cardboards"
from functools import reduce
res = reduce(lambda k, v: k.replace(v, dictionary[v]), dictionary, string)
Since we use a lambda, it loops over each element of both of these variables. But why k.replace? Isn't k a dictionary? Shouldn't it be v.replace? Somehow this method works. I wish someone could explain how this works, in as much detail as possible. Thank you!
reduce is equivalent to repeatedly calling a function.
The function in this case is a lambda, but a lambda is just an anonymous function:
    def f(k, v):
        return k.replace(v, dictionary[v])
The definition of reduce itself is (almost—the None default here is not quite right, nor the len test):
    def reduce(func, seq, initial=None):
        if initial is not None:
            ret = initial
            for i in seq:
                ret = func(ret, i)
            return ret
        # initial not supplied, so sequence must be non-empty
        if len(seq) == 0:
            raise TypeError("reduce() of empty sequence with no initial value")
        first = True
        for i in seq:
            if first:
                ret = i
                first = False
            else:
                ret = func(ret, i)
        return ret
So, ask yourself what this would do when called on your lambda function. The:
for i in dictionary
loop will iterate over each key in the dictionary. It will pass that key, along with the stored ret (or the initial argument for the first call), to your function. So you'll get each key, plus the string value that's initially "There istest at the cardboards", as your v (key from dictionary, called i in the expansion of reduce) and k (long string, called ret in the expansion of reduce) arguments.
Note that k is the full text string, not the string used as the key in the dictionary, while v is the word that is the key in the dictionary. I've used the variable names k and v here only because you did too. As noted in a comment, text and word might be better variable names in either the expanded def f(...) or the original lambda function.
Trace your code execution
Try the same code, except that instead of just:
    def f(k, v):
        return k.replace(v, dictionary[v])
you write it as:
    def f(text, word):
        print("f(text={!r}, word={!r})".format(text, word))
        replacement = dictionary[word]
        print(" I will now replace {!r} with {!r}".format(word, replacement))
        result = text.replace(word, replacement)
        print(" I got: {!r}".format(result))
        return result
Run the functools.reduce function over function f with dictionary and string as the other two arguments and observe the output.
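Put together, a self-contained version of that experiment might look like the sketch below. It uses the dictionary and string from the question and a slightly condensed tracing function; nothing here goes beyond functools.reduce and str.replace.

```python
from functools import reduce

dictionary = {"test": "1", "card": "2"}
string = "There istest at the cardboards"

def f(text, word):
    # text is the string accumulated so far; word is the current dictionary key
    result = text.replace(word, dictionary[word])
    print("f(text={!r}, word={!r}) -> {!r}".format(text, word, result))
    return result

# Iterating over a dict yields its keys, so f is called once per key,
# each time receiving the result of the previous call as text.
res = reduce(f, dictionary, string)
print(res)
```

Each call receives the string produced by the previous call, which is why k (here text) is the string and not the dictionary.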

Search for keyword instead of whole word - py

My hash table lookup only returns a result when the whole title is typed.
I want it to show results when only a keyword (part of the title) is typed,
once at least 2 letters have been entered (this concerns the get function).
My hash code
    class hashin:
        def __init__(self):
            self.size = 217 # size of hash table
            self.map = [None] * self.size

        def _get_hash(self, key):
            # sum of the character codes of str(key), reduced modulo the table size
            hash = 0
            for char in str(key):
                hash += ord(char)
            return hash % self.size

        def add(self, key, value): # add item to list
            key_hash = self._get_hash(key)
            key_value = [key, value]
            if self.map[key_hash] is None:
                self.map[key_hash] = list([key_value])
                return True
            else:
                for pair in self.map[key_hash]:
                    if pair[0] == key:
                        pair[1] = value
                        return True
                self.map[key_hash].append(key_value)
                return True

        def get(self, key): # search for item
            key_hash = self._get_hash(key)
            if self.map[key_hash] is not None:
                for pair in self.map[key_hash]: # find pair of words
                    if pair[0] == key: # if pair is equal to the whole title of the word
                        return pair[0] + " - " + pair[1]
            return "Error no results for %s \nEnter the correct word." % (key)
Sample outputs:
When the whole title was typed
When a keyword was typed (I need to show the results even when only a keyword was typed)
What I need is:
Output:
Cheater - Kygos
and the other words with chea in their name
A hash table isn't the right data structure for this task. The purpose of a hash value is to narrow the search to a small subset of the possibilities. Since the hash value is dependent on the entire string, using just a portion of the string will give the wrong subset.
A better data structure for this task is a trie (sometimes called a "prefix tree"). While it is not difficult to write this data structure on your own, there are many tested, ready-to-use modules available on PyPI.
See:
https://pypi.python.org/pypi?%3Aaction=search&term=trie&submit=search
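To illustrate the idea, here is a minimal hand-written trie, offered as a sketch and not as a replacement for a tested package. The class name PrefixTrie, the case-insensitive matching, and every sample entry except "Cheater - Kygos" are assumptions made up for the example.

```python
class PrefixTrie:
    """Hypothetical minimal trie mapping titles to values, searched by prefix."""

    def __init__(self):
        self.children = {}     # next character -> PrefixTrie
        self.entries = []      # (title, value) pairs that end at this node

    def add(self, title, value):
        node = self
        for char in title.lower():          # assumption: case-insensitive
            node = node.children.setdefault(char, PrefixTrie())
        node.entries.append((title, value))

    def search(self, prefix):
        node = self
        for char in prefix.lower():
            node = node.children.get(char)
            if node is None:
                return []                   # no title starts with this prefix
        # Collect every entry stored at or below the node we reached.
        found, stack = [], [node]
        while stack:
            current = stack.pop()
            found.extend(current.entries)
            stack.extend(current.children.values())
        return sorted("%s - %s" % pair for pair in found)


trie = PrefixTrie()
trie.add("Cheater", "Kygos")
trie.add("Cheap Seats", "Example Artist")   # made-up sample entry
trie.add("Other", "Someone")                # made-up sample entry
print(trie.search("chea"))
```

Note that a trie only matches from the start of a title. Finding "chea" in the middle of a title would need a different structure, or a plain scan over all titles.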

Recursively Generating a List of n choose k combinations in Python - BUT return a list

I'm attempting to generate all n choose k combinations of a list (not checking for uniqueness) recursively, by following the strategy of either including or not including an element in each recursive call. I can print out the combinations, but for the life of me I cannot figure out how to return the correct list in Python. Here are some attempts:
    class getCombinationsClass:
        def __init__(self,array,k):
            #initialize empty array
            self.new_array = []
            for i in xrange(k):
                self.new_array.append(0)
            self.final = []
            self.combinationUtil(array,0,self.new_array,0,k)

        def combinationUtil(self,array,array_index,current_combo, current_combo_index,k):
            if current_combo_index == k:
                self.final.append(current_combo)
                return
            if array_index >= len(array):
                return
            current_combo[current_combo_index] = array[array_index]
            #if current item included
            self.combinationUtil(array,array_index+1,current_combo,current_combo_index+1,k)
            #if current item not included
            self.combinationUtil(array,array_index+1,current_combo,current_combo_index,k)
In the above example I tried to append the result to an external list which didn't seem to work. I also tried implementing this by recursively constructing a list which is finally returned:
    def getCombinations(array,k):
        #initialize empty array
        new_array = []
        for i in xrange(k):
            new_array.append(0)
        return getCombinationsUtil(array,0,new_array,0,k)

    def getCombinationsUtil(array,array_index,current_combo, current_combo_index,k):
        if current_combo_index == k:
            return [current_combo]
        if array_index >= len(array):
            return []
        current_combo[current_combo_index] = array[array_index]
        #if current item included & not included
        return getCombinationsUtil(array,array_index+1,current_combo,current_combo_index+1,k) + getCombinationsUtil(array,array_index+1,current_combo,current_combo_index,k)
When I tested this with the list [1,2,3] and k = 2, both implementations returned [[3,3],[3,3],[3,3]]. However, if I print the current_combo variable inside the inner if statement (current_combo_index == k), the correct combinations are printed. What gives? Am I misunderstanding something about variable scope or Python lists?
The second method goes wrong because the line
    return [current_combo]
returns a list containing a reference to current_combo, not a copy of it. The recursion keeps mutating that one list, so every combination returned ends up referring to the same object, which holds whatever values were written last.
You can fix this by making a copy of current_combo, changing the line to:
    return [current_combo[:]]
The first method fails for the same reason. You need to change:
    self.final.append(current_combo)
to
    self.final.append(current_combo[:])
Check this out: itertools.combinations. You can take a look at the implementation as well.
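The following sketch combines both answers: a compact include/exclude recursion that copies the working list before storing it, checked against itertools.combinations. The name get_combinations and its start/current parameters are illustrative choices, not the asker's original signature.

```python
from itertools import combinations

def get_combinations(array, k, start=0, current=None):
    """Return all k-element combinations of array as a list of lists."""
    if current is None:
        current = []
    if len(current) == k:
        return [current[:]]            # copy, so later mutation cannot change it
    if start >= len(array):
        return []
    current.append(array[start])       # branch 1: include array[start]
    with_item = get_combinations(array, k, start + 1, current)
    current.pop()                      # branch 2: exclude array[start]
    without_item = get_combinations(array, k, start + 1, current)
    return with_item + without_item

result = get_combinations([1, 2, 3], 2)
print(result)
print(result == [list(c) for c in combinations([1, 2, 3], 2)])
```

Replacing current[:] with current in the base case reproduces the aliasing problem: every stored entry would then be the same list object.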

Finding keys for a value in a dictionary of integers

The problem is to write a Python function that returns a list of keys in aDict with the value target. All keys and values in the dictionary are integers and the keys in the list we return must be in increasing order.
This is the work I have so far:
    def keysWithValue(aDict, target):
        '''
        aDict: a dictionary
        target: an integer
        '''
        ans = []
        if target not in aDict.values():
            return ans
        else:
            for key in aDict.keys():
                if target in aDict[key]:
                    ans+=[key]
            return ans.sort()
I keep on getting:
"TypeError: argument of type 'int' is not iterable"
but I don't really understand what that means, and how to fix it. If anyone could help, I'd be really grateful!
The issue is here:
    if target in aDict[key]:
The in operator needs a container or iterable on its right-hand side, and aDict[key] is an integer, so this won't work.
You should instead use
    if target == aDict[key]:
You can refactor your code like this. I have made an assumption about what your input data looks like. If I'm wrong I can adjust my answer.
    d = {1:1, 2:2, 3:3, 4:3, 5:3}
    def keysWithValue(aDict, target):
        ans = []
        #for k, v in aDict.iteritems():
        for k, v in aDict.items():
            if v == target:
                ans.append(k)
        return sorted(ans)
    print(keysWithValue(d, 3))
The commented line is what should be used for Python 2.x instead of the line below it.
Your aDict[key] is an int and is therefore not iterable, i.e. you cannot write if target in aDict[key]:. For a substring-style match you would have to convert the values to strings:
    if str(target) in str(aDict[key])
or, for an exact match,
    if target == aDict[key]
It is unclear which of the two you are after.
When you are doing
if target in aDict[key]
You are basically checking if target is in some sort of array, but aDict[key] is not an array, it is an integer! You want to check if target is equal to aDict[key], not if it is contained by aDict[key].
Hence, do
if target == aDict[key]
Iteration is when you access every element of a collection one by one. However, an integer is not a collection, it is a number!
    def keysWithValue(aDict, target):
        '''
        aDict: a dictionary
        target: an integer
        '''
        ans = []
        if target not in aDict.values():
            return ans
        else:
            for key in aDict.keys():
                if target == aDict[key]:
                    ans.append(key)
            ans.sort()
            return ans
This will do your task. :)
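One more detail from the original attempt is worth spelling out: return ans.sort() returns None, because list.sort() sorts in place, whereas sorted() returns a new list. A compact sketch using sorted() could look like this; the name keys_with_value is a hypothetical rename, not part of the exercise.

```python
def keys_with_value(a_dict, target):
    # sorted() builds and returns a new list; list.sort() would return None
    return sorted(key for key, value in a_dict.items() if value == target)

d = {1: 1, 2: 2, 3: 3, 4: 3, 5: 3}
print(keys_with_value(d, 3))   # keys whose value is 3, in increasing order
print(keys_with_value(d, 7))   # no match gives an empty list
print([5, 3, 4].sort())        # in-place sort returns None
```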
