Cannot write a function to retrieve all words in a trie - python

I have a following Trie implementation:
class TrieNode:
def __init__(self):
self.nodes = defaultdict(TrieNode)
self.is_fullpath = False
class Trie:
def __init__(self):
self.root = TrieNode()
def insert(self, word):
curr = self.root
for char in word:
curr = curr.nodes[char]
curr.is_fullpath = True
I'm trying to write a method to retrieve a list of all words in my trie.
t = Trie()
t.insert('a')
t.insert('ab')
print(t.paths()) # ---> ['a', 'ab']
My current implementation looks like this:
def paths(self, node=None):
if node is None:
node = self.root
result = []
for k, v in node.nodes.items():
if not node.is_fullpath:
for el in self.paths(v):
result.append(str(k) + el)
else:
result.append('')
return result
But it does not seem to return full list of words.

Here are the issues in your code:
It doesn't look further when is_fullpath is True. But you should also look deeper (for longer words) in that case.
It should not check node.is_fullpath but v.is_fullpath.
result.append('') is not correct. It should be result.append(str(k))
So your for loop body could look like this:
if v.is_fullpath:
result.append(str(k))
for el in self.paths(v):
result.append(str(k) + el)
I would however do it like this:
Define this recursive generator method on your TrieNode class:
def paths(self, prefix=""):
if self.is_fullpath:
yield prefix
for chr, node in self.nodes.items():
yield from node.paths(prefix + chr)
Note how this passes the collected characters on the path to the recursive call. If at any time the is_fullpath boolean is True, we yield that path. Always we continue the search recursively via child nodes.
The method on the Trie class is then quite simple:
def paths(self):
return list(self.root.paths())

Related

Passing a list of strings to be put into trie

I have the code that can build a trie data structure when it is given one string. When I am trying to pass a list of strings, it combines the words into one
class TrieNode:
def __init__(self):
self.end = False
self.children = {}
def all_words(self, prefix):
if self.end:
yield prefix
for letter, child in self.children.items():
yield from child.all_words(prefix + letter)
class Trie:
def __init__(self):
self.root = TrieNode()
def __init__(self):
self.root = TrieNode()
def insert(self, words):
curr = self.root
#the line I added to read the words from a list is below
for word in words:
for letter in word:
node = curr.children.get(letter)
if not node:
node = TrieNode()
curr.children[letter] = node
curr = node
curr.end = True
def all_words_beginning_with_prefix(self, prefix):
cur = self.root
for c in prefix:
cur = cur.children.get(c)
if cur is None:
return # No words with given prefix
yield from cur.all_words(prefix)
This is the code I use to insert everything into the tree:
lst = ['foo', 'foob', 'foobar', 'foof']
trie = Trie()
trie.insert(lst)
The output I get is
['foo', 'foofoob', 'foofoobfoobar', 'foofoobfoobarfoof']
The output I would like to get is
['foo', 'foob', 'foobar', 'foof']
This is the line I used to get the output (for reproducibility, in case you will need to run the code) - it returns all the words that start with a particular prefix:
print(list(trie.all_words_beginning_with_prefix('foo')))
How do I fix it?
You aren't resetting curr back to the root after each insert, so you're inserting the next word where the last one left off. You'd want something like:
def insert(self, words):
curr = self.root
for word in words:
for letter in word:
node = curr.children.get(letter)
if not node:
node = TrieNode()
curr.children[letter] = node
curr = node
curr.end = True
curr = self.root # Reset back to the root
I'd break this up though. I think your insert function is doing too much, and shouldn't be dealing with multiple strings. I'd change it to something like:
def insert(self, word):
curr = self.root
for letter in word:
node = curr.children.get(letter)
if not node:
node = TrieNode()
curr.children[letter] = node
curr = node
curr.end = True
def insert_many(self, words):
for word in words:
self.insert(word) # Just loop over self.insert
Now that's a non-problem since each insert is an independent call, and you can't forget to reset curr.

Trie Implementation in Python -- Print Keys

I Implemented a Trie data structure using python, now the problem is it doesn't display the keys that Trie is stored in its data structure.
class Node:
def __init__(self):
self.children = [None] * 26
self.endOfTheWord = False
class Trie:
def __init__(self):
self.root = self.getNode()
def getNode(self):
return Node()
def charToIndex(self ,ch):
return ord(ch) - ord('a')
def insert(self ,word):
current = self.root
for i in range(len(word)):
index = self.charToIndex(word[i])
if current.children[index] is None:
current.children[index] = self.getNode()
current = current.children[index]
current.endOfTheWord = True
def printKeys(self):
str = []
self.printKeysUtil(self.root ,str)
def printKeysUtil(self ,root ,str):
if root.endOfTheWord == True:
print(''.join(str))
return
for i in range(26):
if root.children[i] is not None:
ch = chr(97) + chr(i)
str.append(ch)
self.printKeysUtil(root.children[i] ,str)
str.pop()
You could perform a pre-order traversal of the nodes, and wherever you find an end-of-word marker, you zoom up to the root, capturing the letters as you go, in order to get the full word... except that to accomplish this, you would need to store the parent node in each node.
def printKeysUtil(self ,root ,str):
if root.endOfTheWord == True:
print(''.join(str))
return
for i in range(26):
if root.children[i] is not None:
ch = chr(97+i)
str.append(ch)
self.printKeysUtil(root.children[i] ,str)
str.pop()

Traversing over dictionary using variable number of keys [duplicate]

I'm interested in tries and DAWGs (direct acyclic word graph) and I've been reading a lot about them but I don't understand what should the output trie or DAWG file look like.
Should a trie be an object of nested dictionaries? Where each letter is divided in to letters and so on?
Would a lookup performed on such a dictionary be fast if there are 100k or 500k entries?
How to implement word-blocks consisting of more than one word separated with - or space?
How to link prefix or suffix of a word to another part in the structure? (for DAWG)
I want to understand the best output structure in order to figure out how to create and use one.
I would also appreciate what should be the output of a DAWG along with trie.
I do not want to see graphical representations with bubbles linked to each other, I want to know the output object once a set of words are turned into tries or DAWGs.
Unwind is essentially correct that there are many different ways to implement a trie; and for a large, scalable trie, nested dictionaries might become cumbersome -- or at least space inefficient. But since you're just getting started, I think that's the easiest approach; you could code up a simple trie in just a few lines. First, a function to construct the trie:
>>> _end = '_end_'
>>>
>>> def make_trie(*words):
... root = dict()
... for word in words:
... current_dict = root
... for letter in word:
... current_dict = current_dict.setdefault(letter, {})
... current_dict[_end] = _end
... return root
...
>>> make_trie('foo', 'bar', 'baz', 'barz')
{'b': {'a': {'r': {'_end_': '_end_', 'z': {'_end_': '_end_'}},
'z': {'_end_': '_end_'}}},
'f': {'o': {'o': {'_end_': '_end_'}}}}
If you're not familiar with setdefault, it simply looks up a key in the dictionary (here, letter or _end). If the key is present, it returns the associated value; if not, it assigns a default value to that key and returns the value ({} or _end). (It's like a version of get that also updates the dictionary.)
Next, a function to test whether the word is in the trie:
>>> def in_trie(trie, word):
... current_dict = trie
... for letter in word:
... if letter not in current_dict:
... return False
... current_dict = current_dict[letter]
... return _end in current_dict
...
>>> in_trie(make_trie('foo', 'bar', 'baz', 'barz'), 'baz')
True
>>> in_trie(make_trie('foo', 'bar', 'baz', 'barz'), 'barz')
True
>>> in_trie(make_trie('foo', 'bar', 'baz', 'barz'), 'barzz')
False
>>> in_trie(make_trie('foo', 'bar', 'baz', 'barz'), 'bart')
False
>>> in_trie(make_trie('foo', 'bar', 'baz', 'barz'), 'ba')
False
I'll leave insertion and removal to you as an exercise.
Of course, Unwind's suggestion wouldn't be much harder. There might be a slight speed disadvantage in that finding the correct sub-node would require a linear search. But the search would be limited to the number of possible characters -- 27 if we include _end. Also, there's nothing to be gained by creating a massive list of nodes and accessing them by index as he suggests; you might as well just nest the lists.
Finally, I'll add that creating a directed acyclic word graph (DAWG) would be a bit more complex, because you have to detect situations in which your current word shares a suffix with another word in the structure. In fact, this can get rather complex, depending on how you want to structure the DAWG! You may have to learn some stuff about Levenshtein distance to get it right.
Here is a list of python packages that implement Trie:
marisa-trie - a C++ based implementation.
python-trie - a simple pure python implementation.
PyTrie - a more advanced pure python implementation.
pygtrie - a pure python implementation by Google.
datrie - a double array trie implementation based on libdatrie.
Have a look at this:
https://github.com/kmike/marisa-trie
Static memory-efficient Trie structures for Python (2.x and 3.x).
String data in a MARISA-trie may take up to 50x-100x less memory than
in a standard Python dict; the raw lookup speed is comparable; trie
also provides fast advanced methods like prefix search.
Based on marisa-trie C++ library.
Here's a blog post from a company using marisa trie successfully:
https://www.repustate.com/blog/sharing-large-data-structure-across-processes-python/
At Repustate, much of our data models we use in our text analysis can be represented as simple key-value pairs, or dictionaries in Python lingo. In our particular case, our dictionaries are massive, a few hundred MB each, and they need to be accessed constantly. In fact for a given HTTP request, 4 or 5 models might be accessed, each doing 20-30 lookups. So the problem we face is how do we keep things fast for the client as well as light as possible for the server.
...
I found this package, marisa tries, which is a Python wrapper around a C++ implementation of a marisa trie. “Marisa” is an acronym for Matching Algorithm with Recursively Implemented StorAge. What’s great about marisa tries is the storage mechanism really shrinks how much memory you need. The author of the Python plugin claimed 50-100X reduction in size – our experience is similar.
What’s great about the marisa trie package is that the underlying trie structure can be written to disk and then read in via a memory mapped object. With a memory mapped marisa trie, all of our requirements are now met. Our server’s memory usage went down dramatically, by about 40%, and our performance was unchanged from when we used Python’s dictionary implementation.
There are also a couple of pure-python implementations, though unless you're on a restricted platform you'd want to use the C++ backed implementation above for best performance:
https://github.com/bdimmick/python-trie
https://pypi.python.org/pypi/PyTrie
Modified from senderle's method (above). I found that Python's defaultdict is ideal for creating a trie or a prefix tree.
from collections import defaultdict
class Trie:
"""
Implement a trie with insert, search, and startsWith methods.
"""
def __init__(self):
self.root = defaultdict()
# #param {string} word
# #return {void}
# Inserts a word into the trie.
def insert(self, word):
current = self.root
for letter in word:
current = current.setdefault(letter, {})
current.setdefault("_end")
# #param {string} word
# #return {boolean}
# Returns if the word is in the trie.
def search(self, word):
current = self.root
for letter in word:
if letter not in current:
return False
current = current[letter]
if "_end" in current:
return True
return False
# #param {string} prefix
# #return {boolean}
# Returns if there is any word in the trie
# that starts with the given prefix.
def startsWith(self, prefix):
current = self.root
for letter in prefix:
if letter not in current:
return False
current = current[letter]
return True
# Now test the class
test = Trie()
test.insert('helloworld')
test.insert('ilikeapple')
test.insert('helloz')
print test.search('hello')
print test.startsWith('hello')
print test.search('ilikeapple')
There's no "should"; it's up to you. Various implementations will have different performance characteristics, take various amounts of time to implement, understand, and get right. This is typical for software development as a whole, in my opinion.
I would probably first try having a global list of all trie nodes so far created, and representing the child-pointers in each node as a list of indices into the global list. Having a dictionary just to represent the child linking feels too heavy-weight, to me.
Using defaultdict and reduce function.
Create Trie
from functools import reduce
from collections import defaultdict
T = lambda : defaultdict(T)
trie = T()
reduce(dict.__getitem__,'how',trie)['isEnd'] = True
Trie :
defaultdict(<function __main__.<lambda>()>,
{'h': defaultdict(<function __main__.<lambda>()>,
{'o': defaultdict(<function __main__.<lambda>()>,
{'w': defaultdict(<function __main__.<lambda>()>,
{'isEnd': True})})})})
Search In Trie :
curr = trie
for w in 'how':
if w in curr:
curr = curr[w]
else:
print("Not Found")
break
if curr['isEnd']:
print('Found')
from collections import defaultdict
Define Trie:
_trie = lambda: defaultdict(_trie)
Create Trie:
trie = _trie()
for s in ["cat", "bat", "rat", "cam"]:
curr = trie
for c in s:
curr = curr[c]
curr.setdefault("_end")
Lookup:
def word_exist(trie, word):
curr = trie
for w in word:
if w not in curr:
return False
curr = curr[w]
return '_end' in curr
Test:
print(word_exist(trie, 'cam'))
Here is full code using a TrieNode class. Also implemented auto_complete method to return the matching words with a prefix.
Since we are using dictionary to store children, there is no need to convert char to integer and vice versa and don't need to allocate array memory in advance.
class TrieNode:
def __init__(self):
#Dict: Key = letter, Item = TrieNode
self.children = {}
self.end = False
class Trie:
def __init__(self):
self.root = TrieNode()
def build_trie(self,words):
for word in words:
self.insert(word)
def insert(self,word):
node = self.root
for char in word:
if char not in node.children:
node.children[char] = TrieNode()
node = node.children[char]
node.end = True
def search(self, word):
node = self.root
for char in word:
if char in node.children:
node = node.children[char]
else:
return False
return node.end
def _walk_trie(self, node, word, word_list):
if node.children:
for char in node.children:
word_new = word + char
if node.children[char].end:
# if node.end:
word_list.append( word_new)
# word_list.append( word)
self._walk_trie(node.children[char], word_new , word_list)
def auto_complete(self, partial_word):
node = self.root
word_list = [ ]
#find the node for last char of word
for char in partial_word:
if char in node.children:
node = node.children[char]
else:
# partial_word not found return
return word_list
if node.end:
word_list.append(partial_word)
# word_list will be created in this method for suggestions that start with partial_word
self._walk_trie(node, partial_word, word_list)
return word_list
create a Trie
t = Trie()
words = ['hi', 'hieght', 'rat', 'ram', 'rattle', 'hill']
t.build_trie(words)
Search for word
words = ['hi', 'hello']
for word in words:
print(word, t.search(word))
hi True
hel False
search for words using prefix
partial_word = 'ra'
t.auto_complete(partial_word)
['rat', 'rattle', 'ram']
If you want a TRIE implemented as a Python class, here is something I wrote after reading about them:
class Trie:
def __init__(self):
self.__final = False
self.__nodes = {}
def __repr__(self):
return 'Trie<len={}, final={}>'.format(len(self), self.__final)
def __getstate__(self):
return self.__final, self.__nodes
def __setstate__(self, state):
self.__final, self.__nodes = state
def __len__(self):
return len(self.__nodes)
def __bool__(self):
return self.__final
def __contains__(self, array):
try:
return self[array]
except KeyError:
return False
def __iter__(self):
yield self
for node in self.__nodes.values():
yield from node
def __getitem__(self, array):
return self.__get(array, False)
def create(self, array):
self.__get(array, True).__final = True
def read(self):
yield from self.__read([])
def update(self, array):
self[array].__final = True
def delete(self, array):
self[array].__final = False
def prune(self):
for key, value in tuple(self.__nodes.items()):
if not value.prune():
del self.__nodes[key]
if not len(self):
self.delete([])
return self
def __get(self, array, create):
if array:
head, *tail = array
if create and head not in self.__nodes:
self.__nodes[head] = Trie()
return self.__nodes[head].__get(tail, create)
return self
def __read(self, name):
if self.__final:
yield name
for key, value in self.__nodes.items():
yield from value.__read(name + [key])
This version is using recursion
import pprint
from collections import deque
pp = pprint.PrettyPrinter(indent=4)
inp = raw_input("Enter a sentence to show as trie\n")
words = inp.split(" ")
trie = {}
def trie_recursion(trie_ds, word):
try:
letter = word.popleft()
out = trie_recursion(trie_ds.get(letter, {}), word)
except IndexError:
# End of the word
return {}
# Dont update if letter already present
if not trie_ds.has_key(letter):
trie_ds[letter] = out
return trie_ds
for word in words:
# Go through each word
trie = trie_recursion(trie, deque(word))
pprint.pprint(trie)
Output:
Coool👾 <algos>🚸 python trie.py
Enter a sentence to show as trie
foo bar baz fun
{
'b': {
'a': {
'r': {},
'z': {}
}
},
'f': {
'o': {
'o': {}
},
'u': {
'n': {}
}
}
}
This is much like a previous answer but simpler to read:
def make_trie(words):
trie = {}
for word in words:
head = trie
for char in word:
if char not in head:
head[char] = {}
head = head[char]
head["_end_"] = "_end_"
return trie
class TrieNode:
def __init__(self):
self.keys = {}
self.end = False
class Trie:
def __init__(self):
self.root = TrieNode()
def insert(self, word: str, node=None) -> None:
if node == None:
node = self.root
# insertion is a recursive operation
# this is base case to exit the recursion
if len(word) == 0:
node.end = True
return
# if this key does not exist create a new node
elif word[0] not in node.keys:
node.keys[word[0]] = TrieNode()
self.insert(word[1:], node.keys[word[0]])
# that means key exists
else:
self.insert(word[1:], node.keys[word[0]])
def search(self, word: str, node=None) -> bool:
if node == None:
node = self.root
# this is positive base case to exit the recursion
if len(word) == 0 and node.end == True:
return True
elif len(word) == 0:
return False
elif word[0] not in node.keys:
return False
else:
return self.search(word[1:], node.keys[word[0]])
def startsWith(self, prefix: str, node=None) -> bool:
if node == None:
node = self.root
if len(prefix) == 0:
return True
elif prefix[0] not in node.keys:
return False
else:
return self.startsWith(prefix[1:], node.keys[prefix[0]])
class Trie:
head = {}
def add(self,word):
cur = self.head
for ch in word:
if ch not in cur:
cur[ch] = {}
cur = cur[ch]
cur['*'] = True
def search(self,word):
cur = self.head
for ch in word:
if ch not in cur:
return False
cur = cur[ch]
if '*' in cur:
return True
else:
return False
def printf(self):
print (self.head)
dictionary = Trie()
dictionary.add("hi")
#dictionary.add("hello")
#dictionary.add("eye")
#dictionary.add("hey")
print(dictionary.search("hi"))
print(dictionary.search("hello"))
print(dictionary.search("hel"))
print(dictionary.search("he"))
dictionary.printf()
Out
True
False
False
False
{'h': {'i': {'*': True}}}
Python Class for Trie
Trie Data Structure can be used to store data in O(L) where L is the length of the string so for inserting N strings time complexity would be O(NL) the string can be searched in O(L) only same goes for deletion.
Can be clone from https://github.com/Parikshit22/pytrie.git
class Node:
def __init__(self):
self.children = [None]*26
self.isend = False
class trie:
def __init__(self,):
self.__root = Node()
def __len__(self,):
return len(self.search_byprefix(''))
def __str__(self):
ll = self.search_byprefix('')
string = ''
for i in ll:
string+=i
string+='\n'
return string
def chartoint(self,character):
return ord(character)-ord('a')
def remove(self,string):
ptr = self.__root
length = len(string)
for idx in range(length):
i = self.chartoint(string[idx])
if ptr.children[i] is not None:
ptr = ptr.children[i]
else:
raise ValueError("Keyword doesn't exist in trie")
if ptr.isend is not True:
raise ValueError("Keyword doesn't exist in trie")
ptr.isend = False
return
def insert(self,string):
ptr = self.__root
length = len(string)
for idx in range(length):
i = self.chartoint(string[idx])
if ptr.children[i] is not None:
ptr = ptr.children[i]
else:
ptr.children[i] = Node()
ptr = ptr.children[i]
ptr.isend = True
def search(self,string):
ptr = self.__root
length = len(string)
for idx in range(length):
i = self.chartoint(string[idx])
if ptr.children[i] is not None:
ptr = ptr.children[i]
else:
return False
if ptr.isend is not True:
return False
return True
def __getall(self,ptr,key,key_list):
if ptr is None:
key_list.append(key)
return
if ptr.isend==True:
key_list.append(key)
for i in range(26):
if ptr.children[i] is not None:
self.__getall(ptr.children[i],key+chr(ord('a')+i),key_list)
def search_byprefix(self,key):
ptr = self.__root
key_list = []
length = len(key)
for idx in range(length):
i = self.chartoint(key[idx])
if ptr.children[i] is not None:
ptr = ptr.children[i]
else:
return None
self.__getall(ptr,key,key_list)
return key_list
t = trie()
t.insert("shubham")
t.insert("shubhi")
t.insert("minhaj")
t.insert("parikshit")
t.insert("pari")
t.insert("shubh")
t.insert("minakshi")
print(t.search("minhaj"))
print(t.search("shubhk"))
print(t.search_byprefix('m'))
print(len(t))
print(t.remove("minhaj"))
print(t)
Code Oputpt
True
False
['minakshi', 'minhaj']
7
minakshi
minhajsir
pari
parikshit
shubh
shubham
shubhi
With prefix search
Here is #senderle's answer, slightly modified to accept prefix search (and not only whole-word matching):
_end = '_end_'
def make_trie(words):
root = dict()
for word in words:
current_dict = root
for letter in word:
current_dict = current_dict.setdefault(letter, {})
current_dict[_end] = _end
return root
def in_trie(trie, word):
current_dict = trie
for letter in word:
if _end in current_dict:
return True
if letter not in current_dict:
return False
current_dict = current_dict[letter]
t = make_trie(['hello', 'hi', 'foo', 'bar'])
print(in_trie(t, 'hello world'))
# True
In response to #basj
The following code will capture \b (end of word) letters.
_end = '_end_'
def make_trie(words):
root = dict()
for word in words:
current_dict = root
for letter in word:
current_dict = current_dict.setdefault(letter, {})
current_dict[_end] = _end
return root
def in_trie(trie, word):
current_dict = trie
for letter in word:
if letter not in current_dict: # Adjusted the
return False # order of letter
if _end in current_dict[letter]: # checks to capture
return True # the last letter.
current_dict = current_dict[letter]
t = make_trie(['hello', 'hi', 'foo', 'bar'])
>>> print(in_trie(t, 'hi'))
True
>>> print(in_trie(t, 'hola'))
False
>>> print(in_trie(t, 'hello friend'))
True
>>> print(in_trie(t, 'hel'))
None

Implementation of a Trie in Python

I programmed a Trie as a class in python. The search and insert function are clear, but now i tried to programm the python function __str__, that i can print it on the screen. But my function doesn't work!
class Trie(object):
def __init__(self):
self.children = {}
self.val = None
def __str__(self):
s = ''
if self.children == {}: return ' | '
for i in self.children:
s = s + i + self.children[i].__str__()
return s
def insert(self, key, val):
if not key:
self.val = val
return
elif key[0] not in self.children:
self.children[key[0]] = Trie()
self.children[key[0]].insert(key[1:], val)
Now if I create a Object of Trie:
tr = Trie()
tr.insert('hallo', 54)
tr.insert('hello', 69)
tr.insert('hellas', 99)
And when i now print the Trie, occures the problem that the entries hello and hellas aren't completely.
print tr
hallo | ellas | o
How can i solve that problem?.
Why not have str actually dump out the data in the format that it is stored:
def __str__(self):
if self.children == {}:
s = str(self.val)
else:
s = '{'
comma = False
for i in self.children:
if comma:
s = s + ','
else:
comma = True
s = s + "'" + i + "':" + self.children[i].__str__()
s = s + '}'
return s
Which results in:
{'h':{'a':{'l':{'l':{'o':54}}},'e':{'l':{'l':{'a':{'s':99},'o':69}}}}}
There are several issues you're running into. The first is that if you have several children at the same level, you'll only be prefixing one of them with the initial part of the string, and just showing the suffix of the others. Another issue is that you're only showing leaf nodes, even though you can have terminal values that are not at a leaf (consider what happens when you use both "foo" and "foobar" as keys into a Trie). Finally, you're not outputting the values at all.
To solve the first issue, I suggest using a recursive generator that does the traversal of the Trie. Separating the traversal from __str__ makes things easier since the generator can simply yield each value we come across, rather than needing to build up a string as we go. The __str__ method can assemble the final result easily using str.join.
For the second issue, you should yield the current node's key and value whenever self.val is not None, rather than only at leaf nodes. As long as you don't have any way to remove values, all leaf nodes will have a value, but we don't actually need any special casing to detect that.
And for the final issue, I suggest using string formatting to make a key:value pair. (I suppose you can skip this if you really don't need the values.)
Here's some code:
def traverse(self, prefix=""):
if self.val is not None:
yield "{}:{}".format(prefix, self.val)
for letter, child in self.children.items():
yield from child.traverse(prefix + letter)
def __str__(self):
return " | ".join(self.traverse())
If you're using a version of Python before 3.3, you'll need to replace the yield from statement with an explicit loop to yield the items from the recursive calls:
for item in child.traverse(prefix + letter)
yield item
Example output:
>>> t = Trie()
>>> t.insert("foo", 5)
>>> t.insert("bar", 10)
>>> t.insert("foobar", 100)
>>> str(t)
'bar:10 | foo:5 | foobar:100'
You could go with a simpler representation that just provides a summary of what the structure contains:
class Trie:
def __init__(self):
self.__final = False
self.__nodes = {}
def __repr__(self):
return 'Trie<len={}, final={}>'.format(len(self), self.__final)
def __getstate__(self):
return self.__final, self.__nodes
def __setstate__(self, state):
self.__final, self.__nodes = state
def __len__(self):
return len(self.__nodes)
def __bool__(self):
return self.__final
def __contains__(self, array):
try:
return self[array]
except KeyError:
return False
def __iter__(self):
yield self
for node in self.__nodes.values():
yield from node
def __getitem__(self, array):
return self.__get(array, False)
def create(self, array):
self.__get(array, True).__final = True
def read(self):
yield from self.__read([])
def update(self, array):
self[array].__final = True
def delete(self, array):
self[array].__final = False
def prune(self):
for key, value in tuple(self.__nodes.items()):
if not value.prune():
del self.__nodes[key]
if not len(self):
self.delete([])
return self
def __get(self, array, create):
if array:
head, *tail = array
if create and head not in self.__nodes:
self.__nodes[head] = Trie()
return self.__nodes[head].__get(tail, create)
return self
def __read(self, name):
if self.__final:
yield name
for key, value in self.__nodes.items():
yield from value.__read(name + [key])
Instead of your current strategy for printing, I suggest the following strategy instead:
Keep a list of all characters in order that you have traversed so far. When descending to one of your children, push its character on the end of its list. When returning, pop the end character off of the list. When you are at a leaf node, print the contents of the list as a string.
So say you have a trie built out of hello and hellas. This means that as you descend to hello, you build a list h, e, l, l, o, and at the leaf node you print hello, return once to get (hell), push a, s and at the next leaf you print hellas. This way you re-print letters earlier in the tree rather than having no memory of what they were and missing them.
(Another possiblity is to just descend the tree, and whenever you reach a leaf node go to your parent, your parent's parent, your parent's parent's parent... etc, keeping track of what letters you encounter, reversing the list you make and printing that out. But it may be less efficient.)

Trees in Python

Please help me to understand trees in Python. This is an example of tree implementation I found in the Internet.
from collections import deque
class EmptyTree(object):
"""Represents an empty tree."""
# Supported methods
def isEmpty(self):
return True
def __str__(self):
return ""
def __iter__(self):
"""Iterator for the tree."""
return iter([])
def preorder(self, lyst):
return
def inorder(self, lyst):
return
def postorder(self, lyst):
return
class BinaryTree(object):
"""Represents a nonempty binary tree."""
# Singleton for all empty tree objects
THE_EMPTY_TREE = EmptyTree()
def __init__(self, item):
"""Creates a tree with
the given item at the root."""
self._root = item
self._left = BinaryTree.THE_EMPTY_TREE
self._right = BinaryTree.THE_EMPTY_TREE
def isEmpty(self):
return False
def getRoot(self):
return self._root
def getLeft(self):
return self._left
def getRight(self):
return self._right
def setRoot(self, item):
self._root = item
def setLeft(self, tree):
self._left = tree
def setRight(self, tree):
self._right = tree
def removeLeft(self):
left = self._left
self._left = BinaryTree.THE_EMPTY_TREE
return left
def removeRight(self):
right = self._right
self._right = BinaryTree.THE_EMPTY_TREE
return right
def __str__(self):
"""Returns a string representation of the tree
rotated 90 degrees to the left."""
def strHelper(tree, level):
result = ""
if not tree.isEmpty():
result += strHelper(tree.getRight(), level + 1)
result += " " * level
result += str(tree.getRoot()) + "\n"
result += strHelper(tree.getLeft(), level + 1)
return result
return strHelper(self, 0)
def __iter__(self):
"""Iterator for the tree."""
lyst = []
self.inorder(lyst)
return iter(lyst)
def preorder(self, lyst):
"""Adds items to lyst during
a preorder traversal."""
lyst.append(self.getRoot())
self.getLeft().preorder(lyst)
self.getRight().preorder(lyst)
def inorder(self, lyst):
"""Adds items to lyst during
an inorder traversal."""
self.getLeft().inorder(lyst)
lyst.append(self.getRoot())
self.getRight().inorder(lyst)
def postorder(self, lyst):
"""Adds items to lystduring
a postorder traversal."""
self.getLeft().postorder(lyst)
self.getRight().postorder(lyst)
lyst.append(self.getRoot())
def levelorder(self, lyst):
"""Adds items to lyst during
a levelorder traversal."""
# levelsQueue = LinkedQueue()
levelsQueue = deque ([])
levelsQueue.append(self)
while levelsQueue != deque():
node = levelsQueue.popleft()
lyst.append(node.getRoot())
left = node.getLeft()
right = node.getRight()
if not left.isEmpty():
levelsQueue.append(left)
if not right.isEmpty():
levelsQueue.append(right)
This is programm that makes the small tree.
"""
File: testbinarytree.py
Builds a full binary tree with 7 nodes.
"""
from binarytree import BinaryTree
lst = ["5", "+", "2"]
for i in range(len(lst)):
b = BinaryTree(lst[0])
d = BinaryTree(lst[1])
f = BinaryTree(lst[2])
# Build the tree from the bottom up, where
# d is the root node of the entire tree
d.setLeft(b)
d.setRight(f)
def size(tree):
if tree.isEmpty():
return 0
else:
return 1 + size(tree.getLeft()) + size(tree.getRight())
def frontier(tree):
"""Returns a list containing the leaf nodes
of tree."""
if tree.isEmpty():
return []
elif tree.getLeft().isEmpty() and tree.getRight().isEmpty():
return [tree.getRoot()]
else:
return frontier(tree.getLeft()) + frontier(tree.getRight())
print ("Size:", size(d))
print ("String:")
print (d)
How can I make a class that will count the value of the expression, such that the answer = 7 (5+2). I really want to understand the concept with a small example.
It sounds like your problem isn't trees, which are a much more general (and simple) concept, but in how to properly populate and/or evaluate an expression tree.
If you have your operators specified in post-fix order, it becomes a lot easier.
See this wikipedia article on how to deal with infix notation when parsing input to a desktop calculator. It is called the shunting-yard algorithm.
You should do function that walks a tree in depth first order, calculating value of each node, either just taking value of it (if it is "5" for example), or making calculation (if it is "+" for example) - by walking the tree in depth first order you are sure that all subnodes of given node will be calculated when you are calculating that node (for example "5" and "2" will be calculated when you are calculating "+").
Then, at the root of the tree you'll get the result of the whole tree.
First of all, I'm not going to give much detail in case this is homework, which it sounds a bit like.
You need a method on your tree class that evaluates the tree. I suppose it'll assume that the "root" value of each tree node is a number (when the node is a leaf, i.e. when it has no children) or the name of an operator (When the node has children).
Your method will be recursive: the value of a tree-node with children is determined by (1) the value of its left subtree, (2) the value of its right subtree, and (3) the operator in its "root".
You'll probably want a table -- maybe stored in a dict -- mapping operator names like "+" to actual functions like operator.add (or, if you prefer, lambda x,y: x+y).

Categories