As part of a beginners' university Python project, I am currently creating a database of words, be it Nouns, Verbs, Determiners, Adjectives.
Now the problem I am having is that the words being read into the program via the lexicon.readfromfile method are being put into the dictionary via an instance of a class ( be it noun, verb or adjective ). This created the problem that I have absolutely no idea how to call these objects from the dictionary since they do not have variables as keys, but rather memory locations (see the following):
{<__main__.Verb object at 0x02F4F110>, <__main__.Noun object at 0x02F4F130>, <__main__.Adjective object at 0x02F4F1D0>, <__main__.Noun object at 0x02F4F170>}
Does anyone have any idea how I can call these keys in such a way that I can make them usable in my code?
Here is the part I'm stuck on:
Add a method getPast() to the Verb class, which returns the past tense of the Verb. Your getPast() method can simply work by retrieving the value of ‘past’ from the attributes.
Here is the majority of the code, leaving out the Noun and Adjective classes:
class Lexicon(object):
    'A container class for word objects'
    def __init__(self):
        self.words = {}
    def addword(self, word):
        self.words[word.stringrep] = word
    def removeword(self, word):
        # word is the string key; delete the entry itself, not the local name
        if word in self.words:
            del self.words[word]
            print('Word has been deleted from the Lexicon')
        else:
            print('That word is not in the Lexicon')
    def getword(self, wordstring):
        if wordstring in self.words:
            return self.words[wordstring]
        else:
            return None
    def containsword(self, string):
        if string in self.words:
            return True
        else:
            return False
    def getallwords(self):
        allwordslist = []
        for w in self.words:
            allwordslist.append(self.words[w])
        return set(allwordslist)
    def readfromfile(self, x):
        filehandle = open(x, 'r')
        while True:
            line = filehandle.readline()
            if line == '':
                break
            line = line.strip()
            info = line.split(',')
            # note: bool() of any non-empty string is True, even 'False'
            if info[1] == 'CN' or info[1] == 'PN':
                noun = Noun(info[0], info[1])
                noun.setattribute('regular', bool(info[2]))
                self.addword(noun)
            elif info[1] == 'A':
                adjective = Adjective(info[0], info[1])
                adjective.setattribute('comparative', bool(info[2]))
                self.addword(adjective)
            elif info[1] == 'V':
                verb = Verb(info[0], info[1])
                verb.setattribute('transitive', bool(info[2]))
                verb.setattribute('past', info[3])
                self.addword(verb)
        filehandle.close()
    def writetofile(self, x):
        filehandle = open(x, 'w')
        for t in self.words.values():
            filehandle.write(t.getFormattedString() + '\n')
        filehandle.close()
#---------------------------------------------------------------------------#
class Word(object):
    'A word of any category'
    def __init__(self, stringrep, category):
        self.wordattribute = {}
        self.stringrep = stringrep
        self.category = category
    def setattribute(self, attributename, attributevalue):
        self.wordattribute[attributename] = attributevalue
    def getvalue(self, name):
        if name in self.wordattribute:
            return self.wordattribute[name]
        else:
            return None
    def __str__(self):
        return self.stringrep + ':' + self.category
    def __lt__(self, otherword):
        return self.stringrep < otherword.stringrep
class Verb(Word):
    'Represents a Verb.'
    def __init__(self, stringrep, category):
        super().__init__(stringrep, category)
    def istransitive(self):
        # 'transitive' lives in the attribute dict, not as an instance attribute
        return self.getvalue('transitive')
    def getFormattedString(self):
        n = '{stringrep},{category}'
        n = n.format(stringrep=self.stringrep, category=self.category)
        for v, b in self.wordattribute.items():
            n = n + ',' + str(b)
        return n
You have a set there, not a dictionary. A set will let you check to see whether a given instance is in the set quickly and easily, but, as you have found, you can't easily get a specific value back out unless you already know what it is. That's OK because that's not what the set is for.
With a dictionary, you associate a key with a value when you add it to the dictionary. Then you use the key to get the value back out. So make a dictionary rather than a set, and use meaningful keys so you can easily get the value back.
Or, since I see you are already making a list before converting it to a set, just return that; you can easily access the items in the list by index. In other words, don't create the problem in the first place, and you won't have it.
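To make the dictionary suggestion concrete, here is a minimal sketch of looking a word up by its string and reading the past tense from the attributes. It assumes trimmed-down copies of the Word and Verb classes from the question; the word 'run' and its past tense 'ran' are made-up sample data, and getPast() is one possible way to write the method the assignment asks for.

```python
class Word(object):
    def __init__(self, stringrep, category):
        self.stringrep = stringrep
        self.category = category
        self.wordattribute = {}

    def setattribute(self, name, value):
        self.wordattribute[name] = value

    def getvalue(self, name):
        # dict.get returns None when the attribute is missing
        return self.wordattribute.get(name)


class Verb(Word):
    def getPast(self):
        # The past tense is stored under the 'past' key (as readfromfile does).
        return self.getvalue('past')


# A dict keyed by the word's string, like Lexicon.words in the question.
words = {}
run = Verb('run', 'V')
run.setattribute('past', 'ran')
words[run.stringrep] = run

# Look the object up by a meaningful key instead of iterating over a set.
print(words['run'].getPast())  # ran
```

With the question's Lexicon, the same lookup would go through getword('run'), which already returns the stored object.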
Although I've seen similar questions about this on here, none have really explained it in a way I think applies to me. I'm working on an RPG game in Python and I store my character's inventory in a text file. However, when I try to return these inventory items as Item() class objects I'm having issues. Each item is stored as: 'level 10 armor of water' or something along these lines. They are stored as the item's name, which contains all the information needed for the object. --> Item(item_type, item_level, item_element, name). Is there any way to extract the needed data from the object's name in string form?
#inventory.txt:
['', '', '', 'level 10 armor of water', '', '', '', '', '', '']
#Item() constructor
class Item(object):
    def __init__(self, item_type, item_level, item_element, name):
        self.item_type = item_type
        self.item_level = item_level
        self.item_element = item_element
        self.name = name
#Inventory Constructor
class Inventory(object):
    item_slot1 = ""
    item_slot2 = ""
    item_slot3 = ""
    item_slot4 = ""
    item_slot5 = ""
    item_slot6 = ""
    item_slot7 = ""
    item_slot8 = ""
    item_slot9 = ""
    item_slot10 = ""
    slots = [item_slot1, item_slot2, item_slot3, item_slot4, item_slot5, item_slot6, item_slot7, item_slot8, item_slot9, item_slot10]
I realize this isn't the most efficient way of doing things but all help is appreciated.
You can do this:
with open("inventory.txt", "r") as f:
    arr = eval(f.read())
for item_string in arr:
    item = Item.from_string(item_string)
Writing Item.from_string is going to be a bit cumbersome though, since the name doesn't appear to lend itself well to parsing (e.g. "level 10 armor of water" instead of "level 10|armor|of water" or something easier like that). I'd redesign your storage format, but if that isn't an option, you could use regular expressions, like so:
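import re
class Item:
    # (__init__ as in the question)
    @staticmethod
    def from_string(item_string):
        # re.search scans the whole string; re.match only matches at the start
        level_match = re.search(r"level (\d+)", item_string)
        item_level = int(level_match.group(1))
        type_match = re.search(r"(armor|sword|backpack|etc)", item_string)
        item_type = type_match.group(1)
        element_match = re.search(r"of (\w+)", item_string)
        item_element = element_match.group(1)
        return Item(item_type, item_level, item_element, item_string)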
Also, note that eval will execute whatever code is contained in inventory.txt. But since the game is in Python, somebody could just as easily edit the source code of the game itself. Realistically it isn't a problem, imho, but keep it in mind.
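If executing the file's contents is a concern, one alternative (not part of the answer above) is ast.literal_eval from the standard library, which only accepts Python literals such as lists and strings. The sketch below assumes the file holds a single list literal, as in the question; the temporary file merely stands in for inventory.txt, and read_inventory is a hypothetical helper name.

```python
import ast
import os
import tempfile


def read_inventory(path):
    """Parse a file containing a Python list literal without executing code."""
    with open(path, "r") as f:
        return ast.literal_eval(f.read())


# Demo with a throwaway file standing in for inventory.txt.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "inventory.txt")
    with open(demo_path, "w") as f:
        f.write("['', '', '', 'level 10 armor of water', '', '']")
    slots = read_inventory(demo_path)

print(slots[3])  # level 10 armor of water
```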
Parsing text in this way, instead of using a structured format like json, will lead to problems.
But, in the meantime, you can load the attributes from the str as long as it has a structured/predictable format.
For instance, if we assume that the first two words are the string level followed by the level number then you can use that pattern so long as it's true 100% of the time.
class Item:
    def __init__(self, item_type, item_level, item_element, name):
        self.item_type = item_type
        self.item_level = item_level
        self.item_element = item_element
        self.name = name
    # A @classmethod is good for defining another type of constructor.
    # In this example, the @classmethod is what builds the class out of
    # the name str.
    @classmethod
    def load_from_name(cls, name_text):
        name_text = name_text.strip()  # remove all surrounding whitespace
        if not name_text:
            return None  # the text is empty
        words = name_text.split()  # split the text into word tokens
        if words[0] != "level":
            raise ValueError("Must start with 'level'")
        try:
            level = int(words[1])
        except ValueError:
            raise ValueError("Second word must be valid int")
        # Now we want all of the words before "of" to be the item_type
        # and all of the words after "of" to be the element.
        if "of" not in words:
            raise ValueError("Missing 'of'")
        item_type, element = " ".join(words[2:]).split(" of ", 1)
        # Finally we assemble the instance and return it
        return cls(item_type, level, element, name_text)
Notice how many conditions we have to check for, and there are definitely still checks and errors missing. Here's what a structured format looks like:
class Item:
    def __init__(self, item_type, item_level, item_element, name):
        self.item_type = item_type
        self.item_level = item_level
        self.item_element = item_element
        self.name = name
    @classmethod
    def load_from_save_state(cls, state):
        return cls(state["type"], state["level"], state["element"], state["name"])
Now, the data can be loaded from a json/yaml/whatever structured format super easily.
import json
item_config_json = """
{
    "type": "armor",
    "level": 10,
    "element": "water",
    "name": "level 10 armor of water"
}
"""
# In a real scenario, this would probably get a path name,
# and the json would contain a list of many dict objects.
def load_item_from_json(json_text):
    state = json.loads(json_text)
    return Item.load_from_save_state(state)
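As a rough illustration of the "list of many dict objects" case mentioned in the comment, the sketch below loads a whole inventory from one JSON array. The layout (null for an empty slot, one object per filled slot) and the sample items are assumptions made for this example, not a format from the question, and load_inventory is a hypothetical helper.

```python
import json


class Item:
    def __init__(self, item_type, item_level, item_element, name):
        self.item_type = item_type
        self.item_level = item_level
        self.item_element = item_element
        self.name = name

    @classmethod
    def load_from_save_state(cls, state):
        return cls(state["type"], state["level"], state["element"], state["name"])


# Assumed save format: null marks an empty slot.
inventory_json = """
[
    null,
    {"type": "armor", "level": 10, "element": "water",
     "name": "level 10 armor of water"},
    {"type": "sword", "level": 3, "element": "fire",
     "name": "level 3 sword of fire"}
]
"""


def load_inventory(json_text):
    """Return a list with an Item per filled slot and None per empty slot."""
    return [
        Item.load_from_save_state(state) if state is not None else None
        for state in json.loads(json_text)
    ]


inventory = load_inventory(inventory_json)
print(inventory[1].item_type, inventory[1].item_level)  # armor 10
```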
I am trying to build a dictionary database, like an actual dictionary. The user inputs a key word and its meaning, and the program saves them in the database. For example, input word: rain, input meaning of the word: water droplets falling from the clouds; the program then stores this in a dictionary. So far I can manage to do this, but it doesn't work the way I want.
class Mydictionary:
    def __init__(self):
        self.key=input("Please input word: ")
        self.value=input("Please input meaning of the word: ")
    def mydictionary(self):
        self.dic={self.key:self.value}
Mydic=Mydictionary()
Mydic.mydictionary()
It works only once. I want to save as many keywords and values as I want. I want to create a dictionary database.
As far as I could see, it is working perfectly as you explained...
If you want to insert many values into a single object, this won't work, because you only read one input, in the constructor.
You have to implement it like this:
import json
class Mydictionary:
    def __init__(self):
        self.dic = {}
    def mydictionary(self):
        self.key=input("Please input word: ")
        self.value=input("Please input meaning of the word: ")
        self.dic[self.key] = self.value
    def save(self, json_file):
        with open(json_file, "w") as f:
            json.dump(self.dic, f)
Mydic=Mydictionary()
Mydic.mydictionary()
Mydic.mydictionary()
# to save it in a JSON file
Mydic.save("mydict.json")
Now you can call the method n times to add n entries...
You can look at the answer by @arsho below, which I would consider good practice. Naming functions after what they actually do is important.
To insert new key - value pair to your dictionary, you need to create a method to get data from the user.
In __init__ you can declare an empty dictionary and then in insert method you can get a new entry from the user.
Moreover, to display the current elements of the dictionary you can create a separate method with name display.
The built-in json module can directly write dictionary data to a JSON file and read it back. You can read about json in the official documentation.
import json
import os
class Mydictionary:
    def __init__(self, file_name):
        self.json_file = file_name
        if os.path.exists(file_name):
            with open(self.json_file, "r") as json_output:
                self.data = json.load(json_output)
        else:
            self.data = {}
    def insert(self):
        user_key = input("Please input word: ")
        user_value = input("Please input meaning of the word: ")
        self.data[user_key] = user_value
        with open(self.json_file, "w") as json_output:
            json.dump(self.data, json_output)
    def display(self):
        if os.path.exists(self.json_file):
            with open(self.json_file, "r") as json_output:
                print(json.load(json_output))
        else:
            print("{} is not created yet".format(self.json_file))
Mydic=Mydictionary("data.json")
Mydic.display()
Mydic.insert()
Mydic.insert()
Mydic.display()
Output:
data.json is not created yet
Please input word: rain
Please input meaning of the word: water droplets falling from the clouds
Please input word: fire
Please input meaning of the word: Fire is a chemical reaction that releases light and heat
{'rain': 'water droplets falling from the clouds', 'fire': 'Fire is a chemical reaction that releases light and heat'}
Disclaimer: This is just a concept of class and method declaration and usage. You can improvise this approach.
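For reference, the save-and-reload round trip that this approach relies on can be shown without any user input. This is only a sketch: the file lives in a temporary directory, and the entries are sample data taken from the output above.

```python
import json
import os
import tempfile

entries = {"rain": "water droplets falling from the clouds"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.json")  # throwaway stand-in for data.json

    # Write the dictionary out...
    with open(path, "w") as f:
        json.dump(entries, f)

    # ...and read it back, as a later run of the program would.
    with open(path, "r") as f:
        restored = json.load(f)

# New entries go into the restored dict, so earlier ones are kept.
restored["fire"] = "a chemical reaction that releases light and heat"
print(sorted(restored))  # ['fire', 'rain']
```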
Try:
import json
class MyDictionary:
    __slots__ = "dic",
    def __init__(self):
        self.dic = {}
    def addvalue(self):
        """Adds a value into the dictionary."""
        key=input("Please input word: ")
        value=input("Please input meaning of the word: ")
        self.dic[key] = value
    def save(self, json_file):
        """Saves the dictionary into a json file."""
        with open(json_file, "w") as f:
            json.dump(self.dic, f)
# Testing
MyDic = MyDictionary()
MyDic.addvalue()
MyDic.addvalue()
print(MyDic.dic) # Two elements
MyDic.save("json_file.json") # Save the file
class dictionary():
    def __init__(self):
        self.dictionary={}
    def insert_word(self,word):
        self.dictionary.update(word)
    def get_word(self):
        word=input("enter a word or enter nothing to exit: ")
        if word=="":
            return None
        meaning=input("enter the meaning: ")
        return {word:meaning}
    def get_dict(self):
        return self.dictionary
if __name__ == "__main__":
    mydict=dictionary()
    word=mydict.get_word()
    while word:
        mydict.insert_word(word)
        word=mydict.get_word()
    print(mydict.get_dict())
This will keep taking input until you enter an empty string, and then print out the dictionary when you stop.
I am looking for some tips about how to decrease memory usage for python. I am using this piece of code as the main structure to hold my data:
http://stevehanov.ca/blog/index.php?id=114
I need it to serve proximity word matching from a Flask server. I need to store well over 20 million different strings (and the number will grow). Right now I get a MemoryError when trying to put around 14 million into the Trie.
I just added a dictionary to hold a value with quick access (I need it, but it can be considered a kind of ID of appearance; it is not directly related to the word).
class TrieNode:
    values = {}
    def __init__(self):
        self.word = None
        self.children = {}
        global NodeCount
        NodeCount += 1
    def insert( self, word, value):
        node = self
        for letter in word:
            if letter not in node.children:
                node.children[letter] = TrieNode()
            node = node.children[letter]
        TrieNode.values[word] = value
        node.word = word
I am not familiar with Python optimization; is there any way to make the "letter" object smaller to save some memory?
Please note that my difficulty comes from the fact that this letter is not only [a-z] but needs to handle the whole Unicode range (accented characters, but not only those). BTW it is a single character, so it should be quite light in terms of memory footprint. How can I use the code point instead of the string object (and will that be more memory efficient)?
EDIT: adding some more information following the reply from @juanpa-arrivillaga
First, I see no difference using the slots construct: on my computer I see the same memory usage with or without __slots__.
With __slots__:
>>> class TrieNode:
        NodeCount = 0
        __slots__ = "word", "children"
        def __init__(self):
            self.word = None
            self.children = {}
            #global NodeCount # my goal is to encapsulate the NodeCount in the class itself
            TrieNode.NodeCount += 1
>>> tn = TrieNode()
>>> sys.getsizeof(tn) + sys.getsizeof(tn.__dict__)
176
Without __slots__:
>>> class TrieNode:
        NodeCount = 0
        def __init__(self):
            self.word = None
            self.children = {}
            #global NodeCount
            TrieNode.NodeCount += 1
>>> tn = TrieNode()
>>> sys.getsizeof(tn) + sys.getsizeof(tn.__dict__)
176
So I do not understand why. Where am I going wrong?
Here is something else I tried, using the intern built-in, because this value is a string holding an "id" (and so is not related to Unicode, unlike the letter):
BTW, my goal with values and NodeCount was to get the equivalent of class/static variables, so that each of them is shared by all instances of the small created objects. I thought this would save memory and avoid duplicates, but I may be misunderstanding the "static-like" concept in Python.
class TrieNode:
    values = {} # shared among all instances so only one structure?
    NodeCount = 0
    __slots__ = "word", "children"
    def __init__(self):
        self.word = None
        self.children = {}
        #global NodeCount
        TrieNode.NodeCount += 1
    def insert( self, word, value = None):
        # value is a string id like "XYZ999999999"
        node = self
        for letter in word:
            codepoint = ord(letter)
            if codepoint not in node.children:
                node.children[codepoint] = TrieNode()
            node = node.children[codepoint]
        node.word = word
        if value is not None:
            lost = TrieNode.values.setdefault(word, [])
            TrieNode.values[word].append(intern(str(value)))
ADDED:
Last, I should have specified that I am using the Python 2.7.x family.
I was wondering whether any fixed-length data types from a library like numpy could help me save some memory; again, as I am new to this, I do not know where to look. BTW the "words" are not real natural-language words but arbitrary-length sequences of characters, and they can also be very long.
From your reply, I agree that not storing the word in each node would be efficient, but you need to have a look at the linked article/piece of code. The main goal is not to reconstruct this word but to do efficient, very fast approximate string matching using this word, and then get the "value" related to each of the closest matches. I am not sure I understood the goal of the path down the tree (not reaching the complete tree?); when matched, we just need to get the original word that was matched (but my understanding may be wrong at this point).
So I need to have this huge dict somewhere, and I wanted to encapsulate it in the class for convenience. But maybe it is too costly in terms of memory?
I also noticed that I already get lower memory usage than your sample (I do not know why for now); here is an example value of "letter" contained in the structure.
>>> s = u"\u266f"
>>> ord(s)
9839
>>> sys.getsizeof(s)
28
>>> sys.getsizeof(ord(s))
12
>>> print s
♯
>>> repr(s)
"u'\\u266f'"
Low hanging fruit: use __slots__ in your node class, otherwise, each TrieNode object is carrying around a dict.
class TrieNode:
    __slots__ = "word", "children"
    def __init__(self):
        self.word = None
        self.children = {}
Now, each TrieNode object will not carry around an attribute dict. Compare the sizes:
>>> class TrieNode:
...     def __init__(self):
...         self.word = None
...         self.children = {}
...
>>> tn = TrieNode()
>>> sys.getsizeof(tn) + sys.getsizeof(tn.__dict__)
168
Vs:
>>> class TrieNode:
...     __slots__ = "word", "children"
...     def __init__(self):
...         self.word = None
...         self.children = {}
...
>>> tn = TrieNode()
>>> sys.getsizeof(tn)
56
>>> tn.__dict__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'TrieNode' object has no attribute '__dict__'
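One detail worth checking if __slots__ appears to make no difference: on Python 2.7, a bare "class TrieNode:" is an old-style class, and __slots__ only takes effect on new-style classes, that is, classes that inherit from object. The sketch below spells out the base class so it behaves the same on both major versions; PlainNode and SlottedNode are illustrative names, and no byte counts are quoted because they vary by interpreter version and platform.

```python
import sys


class PlainNode(object):
    def __init__(self):
        self.word = None
        self.children = {}


class SlottedNode(object):
    # Inheriting from object matters on Python 2.7: without it the class
    # is old-style and __slots__ is silently ignored.
    __slots__ = ("word", "children")

    def __init__(self):
        self.word = None
        self.children = {}


plain = PlainNode()
slotted = SlottedNode()

print(hasattr(plain, "__dict__"))    # True
print(hasattr(slotted, "__dict__"))  # False

# Per-instance cost: object plus attribute dict vs. slotted object alone.
print(sys.getsizeof(plain) + sys.getsizeof(plain.__dict__))
print(sys.getsizeof(slotted))
```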
Another optimization, use int objects. Small int objects are cached, it is probable most of your characters will be in that range anyway, but even if they aren't, an int, while still beefy in Python, is smaller than even a single character string:
>>> 'ñ'
'ñ'
>>> ord('ñ')
241
>>> sys.getsizeof('ñ')
74
>>> sys.getsizeof(ord('ñ'))
28
So you can do something like:
def insert( self, word, value):
    node = self
    for letter in word:
        code_point = ord(letter)
        if code_point not in node.children:
            node.children[code_point] = TrieNode()
        node = node.children[code_point]
    # Assumes "is_word" replaces "word" in __slots__.
    node.is_word = True #Don't save the word, simply a reference to a singleton
Also, you are keeping around a class-variable values dict that is growing enormously, but that information is redundant. You say:
I just add a dictionary to hold some value with quick access (I need it)
You can reconstruct the words from the path. It should be relatively fast; I would seriously consider dropping this dict. Check out how much memory it requires simply to hold a million short strings:
>>> from sys import getsizeof as sizeof
>>> d = {str(i):i for i in range(1000000)}
>>> (sum(sizeof(k)+sizeof(v) for k,v in d.items()) + sizeof(d)) * 1e-9
0.12483203000000001
You could do something like:
class TrieNode:
    __slots__ = "value", "children"
    def __init__(self):
        self.value = None
        self.children = {}
    def insert( self, word, value):
        node = self
        for letter in word:
            code_point = ord(letter)
            if code_point not in node.children:
                node.children[code_point] = TrieNode()
            node = node.children[code_point]
        node.value = value #this serves as a signal that it is a word
    def get(self, word, default=None):
        val = self._get_value(word)
        if val is None:
            return default
        else:
            return val
    def _get_value(self, word):
        node = self
        for letter in word:
            code_point = ord(letter)
            try:
                node = node.children[code_point]
            except KeyError:
                return None
        return node.value
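To illustrate the claim that words can be rebuilt from the path, here is a sketch of a traversal that yields every (word, value) pair without storing the word in any node. It is written for Python 3 (on 2.7, unichr would take the place of chr), the items method and the sample ids are additions made for this example, and the recursion depth grows with word length, so very long keys would need an explicit stack instead.

```python
class TrieNode(object):
    __slots__ = ("value", "children")

    def __init__(self):
        self.value = None
        self.children = {}

    def insert(self, word, value):
        node = self
        for letter in word:
            code_point = ord(letter)
            if code_point not in node.children:
                node.children[code_point] = TrieNode()
            node = node.children[code_point]
        node.value = value  # a non-None value marks the end of a word

    def items(self, prefix=""):
        """Yield (word, value) pairs, rebuilding each word from its path."""
        if self.value is not None:
            yield prefix, self.value
        for code_point, child in self.children.items():
            # chr() turns the stored code point back into a character
            yield from child.items(prefix + chr(code_point))


root = TrieNode()
root.insert("caf\u00e9", "XYZ1")
root.insert("cat", "XYZ2")
print(sorted(root.items()))  # [('café', 'XYZ1'), ('cat', 'XYZ2')]
```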
class MySong:
    _songTitle = "Song Title"
    _artistName = "Artist Name"
    _likeIndicator = -1
    def setTitleAndArtist(self, songTitle, artistName):
        self._songTitle = songTitle
        self._artistName = artistName
    def setLike(self, likeIndicator):
        self._likeIndicator = likeIndicator
    def undoSetLike(self, songTitle):
        pass  # not implemented yet
    def getTitle(self):
        return self._songTitle
    def getArtist(self):
        return self._artistName
    def getLikeIndicator(self):
        return self._likeIndicator
class MyPlaylist:
    _mySongs = []
    def add(self, song):
        self._mySongs.append(song)
    def showTitles(self):
        index = 0
        titlesList = []
        while index != len(self._mySongs):
            titlesList.append(self._mySongs[index].getTitle())
            index = index + 1
        return titlesList
    def remove(self):
        remindex = 0
        while remindex != len(self._mySongs):
            if (self._mySongs[index].getTitle()) == remChoice :
                return("Song FOUND debug!")
                self._mySongs.remove(index)
            else:
                remindex = remindex + 1
        return("Song NOT FOUND debug!")
    def getMySong(self):
        pass  # not implemented yet
There is a list of song objects inside of _mySongs = []. I'm trying to remove one, based on the title variable of that object.
In a separate (unshown) part of the program, the user is asked to enter the title of the song they want removed as a string. This is saved as remChoice.
I'm not entirely sure how to remove the song based on the title.
I've tried for a while to get it going, obviously we find the index of the song in the list by matching it to the title (by calling the getTitle method), then removing that index when it's found.
This isn't working. Where am I going wrong?
If you want to delete an item from a list knowing its index, use:
del xs[i]
where i is the index (e.g. your song's index based on your search).
list.remove() is used for removing a matching element from the list, not the "ith" item.
You might also find that a list is not a suitable data structure here. Perhaps you could try storing key/value pairs in a dict, e.g.:
my_songs = {}
my_songs["My Song Title"] = MySong(title, description, length)
You can later delete songs via their keys:
del my_songs["My Song Title"]
where titles are your keys. This saves you from doing O(n) searching.
Update:
Your .remove() method should look more like the following:
def remove(self, title):
    for i, song in enumerate(self._mySongs):
        if song.getTitle() == title:
            del self._mySongs[i]
            return
    print("Song not found!")
Here we're using the list's iteration protocol with for x in xs: rather than a while loop with manual bookkeeping. The built-in function enumerate() is also used to give us an index into the list we're iterating over (i.e. its position in the sequence).
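Putting the dict suggestion together, a minimal sketch of a title-keyed playlist could look like the following. The Song and Playlist classes are simplified stand-ins for MySong and MyPlaylist, and the sketch assumes titles are unique. The container is created in __init__ rather than at class level, because a class-level list or dict such as _mySongs = [] is shared by every instance.

```python
class Song(object):
    def __init__(self, title, artist):
        self.title = title
        self.artist = artist


class Playlist(object):
    def __init__(self):
        # Created per instance, so two playlists never share one dict.
        self._songs = {}

    def add(self, song):
        self._songs[song.title] = song

    def remove(self, title):
        """Remove the song with this title; return True if one was removed."""
        return self._songs.pop(title, None) is not None

    def titles(self):
        return list(self._songs)


playlist = Playlist()
playlist.add(Song("Song A", "Artist 1"))
playlist.add(Song("Song B", "Artist 2"))
print(playlist.remove("Song A"))   # True
print(playlist.remove("Missing"))  # False
print(playlist.titles())           # ['Song B']
```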
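Try
self._mySongs.remove(song)
where song is the MySong object itself. list.remove() compares elements for equality, so passing only the title string will not match a MySong object.
(Or from another object: replace self by whatever your object name is)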
I am trying to write a function which cleans up URLs (strips them of anything like "www.", "http://" etc.) to create a list that I can sort alphabetically.
I have tried to do this by creating a class including a method to detect the term I would like to remove from the URL-string, and remove it. The bit where I am struggling is that I want to add the modified URLs to a new list called new_strings, and then use that new list when I call the method for a second time on a different term, so that step by step I can remove all unwanted elements from the URL-string.
For some reason my current code returns an empty list, and I am also struggling to understand whether new_strings should be passed to __init__ or not? I guess I am a bit confused with global vs. local variables, and some help and explanation would be greatly appreciated. :)
Thanks! Code below.
class URL_Cleaner(object):
    def __init__(self, old_strings, new_strings, term):
        self.old_strings = old_strings
        self.new_strings = new_strings
        self.term = term
    new_strings = []
    def delete_term(self, new_strings):
        for self.string in self.old_strings:
            if self.term in string:
                new_string = string.replace(term, "")
                self.new_strings.append(new_string)
            else:
                self.new_strings.append(string)
        return self.new_strings
    print "\n" .join(new_strings) #for checking; will be removed later
strings = ["www.google.com", "http://www.google.com", "https://www.google.com"]
new_strings = []
www = URL_Cleaner(strings, new_strings, "www.")
Why are we making a class to do this?
for i, string in enumerate(strings):
    # str.replace returns a new string, so store the result
    strings[i] = string.replace("www.", "")
Isn't that what you're trying to accomplish?
Regardless the problem is in your class definition. Pay attention to scopes:
class URL_Cleaner(object):
    def __init__(self, old_strings, new_strings, term):
        """These are all instance objects"""
        self.old_strings = old_strings
        self.new_strings = new_strings
        self.term = term
    new_strings = [] # this is a class object
    def delete_term(self, new_strings):
        """You never actually call this function! It never does anything!"""
        for self.string in self.old_strings:
            if self.term in string:
                new_string = string.replace(term, "")
                self.new_strings.append(new_string)
            else:
                self.new_strings.append(string)
        return self.new_strings
    print "\n" .join(new_strings) #for checking; will be removed later
    # this is referring the class object, and will be evaluated when
    # the class is defined, NOT when the object is created!
I've commented your code with the reasons. To fix:
class URL_Cleaner(object):
    def __init__(self, old_strings):
        """Cleans URL of 'http://www.'"""
        self.old_strings = old_strings
        # keep the result on the instance so it can be used later
        self.cleaned_strings = self.clean_strings()
    def clean_strings(self):
        """Clean the strings"""
        accumulator = []
        for string in self.old_strings:
            string = string.replace("https://", "").replace("http://", "").replace("www.", "")
            # this might be better as string = re.sub(r"https?://(?:www\.)?", "", string)
            # but I'm not going to introduce re yet.
            accumulator.append(string)
        return accumulator
        # this whole function is just:
        ## return [re.sub(r"https?://(?:www\.)?", "", string, flags=re.I) for string in self.old_strings]
        # but that's not as readable imo.
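For completeness, here is a sketch of the regular-expression route hinted at in the comments, written as a plain function and including the alphabetical sort the question asks for. The pattern treats both the scheme and the "www." prefix as optional, which goes slightly beyond the expression in the comments; clean_urls and the example.com-style addresses are placeholders chosen for this example.

```python
import re

# Optional scheme followed by an optional "www.", anchored at the start.
_PREFIX = re.compile(r"^(?:https?://)?(?:www\.)?", flags=re.IGNORECASE)


def clean_urls(urls):
    """Return the URLs with scheme and 'www.' stripped, sorted alphabetically."""
    return sorted(_PREFIX.sub("", url) for url in urls)


strings = ["www.example.org", "http://www.example.net", "https://www.example.com"]
print(clean_urls(strings))
# ['example.com', 'example.net', 'example.org']
```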
You just need to define new_strings as
self.new_strings = []
and remove new_strings argument from the constructor.
The 'new_strings' and 'self.new_strings' are two different lists.