Three-way language dictionary - Python

se_eng_fr_dict = {'School': ['Skola', 'Ecole'], 'Ball': ['Boll', 'Ballon']}
choose_language = raw_input("Type 'English' for English. Skriv 'svenska' för svenska. Pour francais, ecrit 'francais'. ")
if choose_language == 'English':
    word = raw_input("Type in a word:")
    swe_word = se_eng_fr_dict[word][0]
    fra_word = se_eng_fr_dict[word][1]
    print word, ":", swe_word, "på svenska,", fra_word, "en francais."
elif choose_language == 'Svenska':
    word = raw_input("Vilket ord:")
    for key, value in se_eng_fr_dict.iteritems():
        if value == word:
            print key
I want to create a dictionary (to be stored locally as a txt file) where the user can choose between entering a word in English, Swedish or French to get the translation of the word in the other two languages. The user should also be able to add data to the dictionary.
The code works when I look up the Swedish and French words via the English word. But how can I get the key and the second value if I only have the first value?
Is there a way or should I try to approach this problem in a different way?
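For the immediate reverse-lookup question, a minimal sketch that scans the existing structure (illustrative only; lookup_by_swedish is not from any answer below, and the scan is linear in the number of entries):

def lookup_by_swedish(word):
    # Each value is a [swedish, french] pair keyed by the English word.
    for eng, (swe, fra) in se_eng_fr_dict.items():
        if swe == word:
            return eng, fra
    return None

# lookup_by_swedish('Boll') -> ('Ball', 'Ballon')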

A good option would be to store None for a translation that hasn't been set. You could go a step further and store the language itself with each entry, although it would increase the amount of memory required.
Example:
se_eng_fr_dict = {'pencil': {'se': None, 'fr': 'crayon'}}

def translate(word, lang):
    # If dict.get() finds no value for `word` it returns
    # None by default. We override that with an empty dictionary `{}`
    # so we can always call `.get` on the result.
    translated = se_eng_fr_dict.get(word, {}).get(lang)
    if translated is None:
        print("No {lang} translation found for {word}".format(**locals()))
    else:
        print("{} is {} in {}".format(word, translated, lang))

translate('pencil', 'fr')
translate('pencil', 'se')
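The question also asks about letting the user add data; with this nested layout that is a small helper (a sketch, with add_word as an assumed name, not part of the original answer):

def add_word(word, lang, translation):
    # Create the entry on first sight of `word`, then fill in one language slot.
    se_eng_fr_dict.setdefault(word, {'se': None, 'fr': None})[lang] = translation

add_word('pencil', 'se', 'penna')
translate('pencil', 'se')  # now prints: pencil is penna in se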

I hope there is a better solution, but here is mine:
class Word:
    def __init__(self, en, fr, se):
        self.en = en
        self.fr = fr
        self.se = se

    def __str__(self):
        return '<%s,%s,%s>' % (self.en, self.fr, self.se)
Then you dump all these Words into a mapping data structure. You can use a dictionary, but if you have a huge data set it's better to use a BST; have a look at https://pypi.python.org/pypi/bintrees/2.0.1
Let's say you have all these Words loaded in a list named words; then:
en_words = {w.en: w for w in words}
fr_words = {w.fr: w for w in words}
se_words = {w.se: w for w in words}
Again, a BST is recommended here.
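A quick usage sketch of the three maps, with made-up sample data:

words = [Word('School', 'Ecole', 'Skola'), Word('Ball', 'Ballon', 'Boll')]
# build en_words / fr_words / se_words as above, then:
w = se_words['Boll']    # look up by the Swedish form
print(w.en, w.fr)       # -> Ball Ballon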

Maybe a set of nested lists would be better for this:
>>> my_list = [
...     ["School", "Skola", "Ecole"],
...     ["Ball", "Boll", "Ballon"],
... ]
Then you can find the index of the group holding a word (value below stands for the word you are searching for):
>>> position = [index for index, item in enumerate(my_list) for subitem in item if value == subitem][0]
This returns the index of the list, which you can grab:
>>> sub_list = my_list[position]
And the sublist will have all the translations in order.
For example:
>>> position = [index for index, item in enumerate(my_list) for subitem in item if "Ball" == subitem][0]
>>> print position
1
>>> my_list[position]
['Ball', 'Boll', 'Ballon']

In order to speed up word lookups and achieve good flexibility, I'd choose a dictionary of subdictionaries: each subdictionary translates the words of one language into all the available languages, and the top-level dictionary maps each language to the corresponding subdictionary.
For example, if multidict is the top-level dictionary, then multidict['english']['ball'] returns the (sub)dictionary:
{'english': 'ball', 'francais': 'ballon', 'svenska': 'boll'}
Below is a class Multidictionary implementing such an idea.
For convenience it assumes that all the translations are stored in a text file in CSV format, which is read at initialization time, e.g.:
english,svenska,francais,italiano
school,skola,ecole,scuola
ball,boll,ballon,palla
Any number of languages can be easily added to the CSV file.
class Multidictionary(object):
    def __init__(self, fname=None):
        '''Init a multidictionary from a CSV file.

        The file describes a word per line, separating all the available
        translations with a comma.
        First file line must list the corresponding languages.
        For example:

            english,svenska,francais,italiano
            school,skola,ecole,scuola
            ball,boll,ballon,palla
        '''
        self.fname = fname
        self.multidictionary = {}
        if fname is not None:
            import csv
            with open(fname) as csvfile:
                reader = csv.DictReader(csvfile)
                for translations in reader:
                    for lang, word in translations.iteritems():
                        self.multidictionary.setdefault(lang, {})[word] = translations

    def get_available_languages(self):
        '''Return the list of available languages.'''
        return sorted(self.multidictionary)

    def translate(self, word, language):
        '''Return a dictionary containing the translations of a word (in a
        specified language) into all the available languages.
        '''
        if language in self.get_available_languages():
            translations = self.multidictionary[language].get(word)
        else:
            print 'Invalid language %r selected' % language
            translations = None
        return translations

    def get_translations(self, word, language):
        '''Generate the string containing the translations of a word in a
        language into all the other available languages.
        '''
        translations = self.translate(word, language)
        if translations:
            other_langs = (lang for lang in translations if lang != language)
            lang_trans = ('%s in %s' % (translations[lang], lang) for lang in other_langs)
            s = '%s: %s' % (word, ', '.join(lang_trans))
        else:
            print '%s word %r not found' % (language, word)
            s = None
        return s


if __name__ == '__main__':
    multidict = Multidictionary('multidictionary.csv')
    print 'Available languages:', ', '.join(multidict.get_available_languages())
    language = raw_input('Choose the input language: ')
    word = raw_input('Type a word: ')
    translations = multidict.get_translations(word, language)
    if translations:
        print translations

NameError: name 'lemma_from_key' is not defined

I am trying to run WordNet from NLTK, but wordnet.py raises "NameError: name 'lemma_from_key' is not defined" at line 1680, even though the function lemma_from_key() is defined in the same class, _WordNetObject. The relevant portion of the code follows:
class _WordNetObject:
    def lemma(self, name, lang="eng"):
        """Return lemma object that matches the name"""
        # cannot simply split on first '.',
        # e.g.: '.45_caliber.a.01..45_caliber'
        separator = SENSENUM_RE.search(name).end()
        synset_name, lemma_name = name[: separator - 1], name[separator:]
        synset = self.synset(synset_name)
        for lemma in synset.lemmas(lang):
            if lemma._name == lemma_name:
                return lemma
        raise WordNetError(f"no lemma {lemma_name!r} in {synset_name!r}")

    def lemma_from_key(self, key):
        # Keys are case sensitive and always lower-case
        key = key.lower()
        lemma_name, lex_sense = key.split("%")
        pos_number, lexname_index, lex_id, _, _ = lex_sense.split(":")
        pos = self._pos_names[int(pos_number)]
        # open the key -> synset file if necessary
        if self._key_synset_file is None:
            self._key_synset_file = self.open("index.sense")
        # Find the synset for the lemma.
        synset_line = _binary_search_file(self._key_synset_file, key)
        if not synset_line:
            raise WordNetError("No synset found for key %r" % key)
        offset = int(synset_line.split()[1])
        synset = self.synset_from_pos_and_offset(pos, offset)
        # return the corresponding lemma
        for lemma in synset._lemmas:
            if lemma._key == key:
                return lemma
        raise WordNetError("No lemma found for for key %r" % key)

    #############################################################
    # Loading Synsets
    #############################################################

    def synset(self, name):
        # split name into lemma, part of speech and synset number
        lemma, pos, synset_index_str = name.lower().rsplit(".", 2)
        synset_index = int(synset_index_str) - 1

        # get the offset for this synset
        try:
            offset = self._lemma_pos_offset_map[lemma][pos][synset_index]
        except KeyError as e:
            message = "no lemma %r with part of speech %r"
            raise WordNetError(message % (lemma, pos)) from e
        except IndexError as e:
            n_senses = len(self._lemma_pos_offset_map[lemma][pos])
            message = "lemma %r with part of speech %r has only %i %s"
            if n_senses == 1:
                tup = lemma, pos, n_senses, "sense"
            else:
                tup = lemma, pos, n_senses, "senses"
            raise WordNetError(message % tup) from e

        # load synset information from the appropriate file
        synset = self.synset_from_pos_and_offset(pos, offset)

        # some basic sanity checks on loaded attributes
        if pos == "s" and synset._pos == "a":
            message = (
                "adjective satellite requested but only plain "
                "adjective found for lemma %r"
            )
            raise WordNetError(message % lemma)
        assert synset._pos == pos or (pos == "a" and synset._pos == "s")

        # Return the synset object.
        return synset

    def _data_file(self, pos):
        """
        Return an open file pointer for the data file for the given
        part of speech.
        """
        if pos == ADJ_SAT:
            pos = ADJ
        if self._data_file_map.get(pos) is None:
            fileid = "data.%s" % self._FILEMAP[pos]
            self._data_file_map[pos] = self.open(fileid)
        return self._data_file_map[pos]

    def synset_from_pos_and_offset(self, pos, offset):
        """
        - pos: The synset's part of speech, matching one of the module level
          attributes ADJ, ADJ_SAT, ADV, NOUN or VERB ('a', 's', 'r', 'n', or 'v').
        - offset: The byte offset of this synset in the WordNet dict file
          for this pos.

        >>> from nltk.corpus import wordnet as wn
        >>> print(wn.synset_from_pos_and_offset('n', 1740))
        Synset('entity.n.01')
        """
        # Check to see if the synset is in the cache
        if offset in self._synset_offset_cache[pos]:
            return self._synset_offset_cache[pos][offset]

        data_file = self._data_file(pos)
        data_file.seek(offset)
        data_file_line = data_file.readline()
        # If valid, the offset equals the 8-digit 0-padded integer found at the start of the line:
        line_offset = data_file_line[:8]
        if line_offset.isalnum() and offset == int(line_offset):
            synset = self._synset_from_pos_and_line(pos, data_file_line)
            assert synset._offset == offset
            self._synset_offset_cache[pos][offset] = synset
        else:
            synset = None
            raise WordNetError(
                f"No WordNet synset found for pos={pos} at offset={offset}."
            )
        data_file.seek(0)
        return synset

    @deprecated("Use public method synset_from_pos_and_offset() instead")
    def _synset_from_pos_and_offset(self, *args, **kwargs):
        """
        Hack to help people like the readers of
        https://stackoverflow.com/a/27145655/1709587
        who were using this function before it was officially a public method
        """
        return self.synset_from_pos_and_offset(*args, **kwargs)

    def _synset_from_pos_and_line(self, pos, data_file_line):
        # Construct a new (empty) synset.
        synset = Synset(self)

        # parse the entry for this synset
        try:
            # parse out the definitions and examples from the gloss
            columns_str, gloss = data_file_line.strip().split("|")
            definition = re.sub(r"[\"].*?[\"]", "", gloss).strip()
            examples = re.findall(r'"([^"]*)"', gloss)
            for example in examples:
                synset._examples.append(example)

            synset._definition = definition.strip("; ")

            # split the other info into fields
            _iter = iter(columns_str.split())

            def _next_token():
                return next(_iter)

            # get the offset
            synset._offset = int(_next_token())

            # determine the lexicographer file name
            lexname_index = int(_next_token())
            synset._lexname = self._lexnames[lexname_index]

            # get the part of speech
            synset._pos = _next_token()

            # create Lemma objects for each lemma
            n_lemmas = int(_next_token(), 16)
            for _ in range(n_lemmas):
                # get the lemma name
                lemma_name = _next_token()
                # get the lex_id (used for sense_keys)
                lex_id = int(_next_token(), 16)
                # If the lemma has a syntactic marker, extract it.
                m = re.match(r"(.*?)(\(.*\))?$", lemma_name)
                lemma_name, syn_mark = m.groups()
                # create the lemma object
                lemma = Lemma(self, synset, lemma_name, lexname_index, lex_id, syn_mark)
                synset._lemmas.append(lemma)
                synset._lemma_names.append(lemma._name)

            # collect the pointer tuples
            n_pointers = int(_next_token())
            for _ in range(n_pointers):
                symbol = _next_token()
                offset = int(_next_token())
                pos = _next_token()
                lemma_ids_str = _next_token()
                if lemma_ids_str == "0000":
                    synset._pointers[symbol].add((pos, offset))
                else:
                    source_index = int(lemma_ids_str[:2], 16) - 1
                    target_index = int(lemma_ids_str[2:], 16) - 1
                    source_lemma_name = synset._lemmas[source_index]._name
                    lemma_pointers = synset._lemma_pointers
                    tups = lemma_pointers[source_lemma_name, symbol]
                    tups.append((pos, offset, target_index))

            # read the verb frames
            try:
                frame_count = int(_next_token())
            except StopIteration:
                pass
            else:
                for _ in range(frame_count):
                    # read the plus sign
                    plus = _next_token()
                    assert plus == "+"
                    # read the frame and lemma number
                    frame_number = int(_next_token())
                    frame_string_fmt = VERB_FRAME_STRINGS[frame_number]
                    lemma_number = int(_next_token(), 16)
                    # lemma number of 00 means all words in the synset
                    if lemma_number == 0:
                        synset._frame_ids.append(frame_number)
                        for lemma in synset._lemmas:
                            lemma._frame_ids.append(frame_number)
                            lemma._frame_strings.append(frame_string_fmt % lemma._name)
                    # only a specific word in the synset
                    else:
                        lemma = synset._lemmas[lemma_number - 1]
                        lemma._frame_ids.append(frame_number)
                        lemma._frame_strings.append(frame_string_fmt % lemma._name)

        # raise a more informative error with line text
        except ValueError as e:
            raise WordNetError(f"line {data_file_line!r}: {e}") from e

        # set sense keys for Lemma objects - note that this has to be
        # done afterwards so that the relations are available
        for lemma in synset._lemmas:
            if synset._pos == ADJ_SAT:
                head_lemma = synset.similar_tos()[0]._lemmas[0]
                head_name = head_lemma._name
                head_id = "%02d" % head_lemma._lex_id
            else:
                head_name = head_id = ""
            tup = (
                lemma._name,
                WordNetCorpusReader._pos_numbers[synset._pos],
                lemma._lexname_index,
                lemma._lex_id,
                head_name,
                head_id,
            )
            lemma._key = ("%s%%%d:%02d:%02d:%s:%s" % tup).lower()

        # the canonical name is based on the first lemma
        lemma_name = synset._lemmas[0]._name.lower()
        offsets = self._lemma_pos_offset_map[lemma_name][synset._pos]
        sense_index = offsets.index(synset._offset)
        tup = lemma_name, synset._pos, sense_index + 1
        synset._name = "%s.%s.%02i" % tup

        return synset

    def synset_from_sense_key(self, sense_key):
        """
        Retrieves synset based on a given sense_key. Sense keys can be
        obtained from lemma.key()

        From https://wordnet.princeton.edu/documentation/senseidx5wn:
        A sense_key is represented as::

            lemma % lex_sense (e.g. 'dog%1:18:01::')

        where lex_sense is encoded as::

            ss_type:lex_filenum:lex_id:head_word:head_id

        :lemma:       ASCII text of word/collocation, in lower case
        :ss_type:     synset type for the sense (1 digit int)
                      The synset type is encoded as follows::

                          1    NOUN
                          2    VERB
                          3    ADJECTIVE
                          4    ADVERB
                          5    ADJECTIVE SATELLITE
        :lex_filenum: name of lexicographer file containing the synset for the sense (2 digit int)
        :lex_id:      when paired with lemma, uniquely identifies a sense in the lexicographer file (2 digit int)
        :head_word:   lemma of the first word in satellite's head synset
                      Only used if sense is in an adjective satellite synset
        :head_id:     uniquely identifies sense in a lexicographer file when paired with head_word
                      Only used if head_word is present (2 digit int)

        >>> import nltk
        >>> from nltk.corpus import wordnet as wn
        >>> print(wn.synset_from_sense_key("drive%1:04:03::"))
        Synset('drive.n.06')

        >>> print(wn.synset_from_sense_key("driving%1:04:03::"))
        Synset('drive.n.06')
        """
        return lemma_from_key(sense_key).synset()  # line 1680
The full code can be found in the NLTK documentation.
I was trying to run the WordNet code to implement BERT. I installed NLTK with pip install nltk from the Anaconda command prompt, but the code gives me the error: NameError: name 'lemma_from_key' is not defined.
Since you installed using pip install nltk, it likely installed the latest published version of the code. There seems to be a bug there, as can be seen in the latest version (3.7) source code here.
The issue in version 3.7 is that on line 1680 the function lemma_from_key is called, but no plain function of that name exists in scope. To call the class method lemma_from_key, one needs to use self.lemma_from_key.
You can try using an older version, 3.6.5, which does not have this issue. Install it by:
pip install nltk==3.6.5
I can also see that the develop branch of nltk has fixed this issue. I assume that this will be resolved in a future release, which you can later upgrade to.
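Until a release with the fix is published, a possible workaround is to bypass the broken wrapper and call lemma_from_key directly (a sketch, not from the original answer; lemma_from_key itself works in 3.7):

from nltk.corpus import wordnet as wn

def synset_from_sense_key(key):
    # Same two steps the broken method performs, with the receiver spelled out.
    return wn.lemma_from_key(key).synset()

print(synset_from_sense_key("drive%1:04:03::"))  # Synset('drive.n.06')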

Building Abbreviations Dictionary from Text File

I would like to build a dictionary of abbreviations.
I have a text file with a lot of abbreviations. I read it in like this:
with open('abreviations.txt') as ab:
    ab_words = ab.read().splitlines()
An extract of the resulting list:
'ACE',
'Access Control Entry',
'ACK',
'Acknowledgement',
'ACORN',
'A Completely Obsessive Really Nutty person',
Now I want to build the dictionary, with every odd line as a dictionary key and every even line as the dictionary value.
Hence I should be able to write at the end:
ab_dict['ACE']
and get the result:
'Access Control Entry'
Also, how can I make it case-insensitive?
ab_dict['ace']
should yield the same result
'Access Control Entry'
In fact, it would be perfect if the output were also lower case:
'access control entry'
Here is a link to the text file: https://www.dropbox.com/s/91afgnupk686p9y/abreviations.txt?dl=0
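The odd/even pairing itself can be done in two lines; a minimal sketch reusing ab_words from the question (the answers below give fuller solutions):

ab_dict = dict(zip(ab_words[::2], ab_words[1::2]))  # odd lines -> keys, even lines -> values
ab_dict_lower = {k.lower(): v.lower() for k, v in ab_dict.items()}  # case-insensitive variant
print(ab_dict_lower['ace'])  # access control entry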
A complete solution with a custom ABDict class and Python's generator functionality:
class ABDict(dict):
    '''Class representing a dictionary of abbreviations.'''
    def __getitem__(self, key):
        v = dict.__getitem__(self, key.upper())
        return v.lower() if key.islower() else v

with open('abreviations.txt') as ab:
    ab_dict = ABDict()
    while True:
        try:
            k = next(ab).strip()  # `key` line
            v = next(ab).strip()  # `value` line
            ab_dict[k] = v
        except StopIteration:
            break
Now, testing (the case of the result follows the case of the key):
print(ab_dict['ACE'])
print(ab_dict['ace'])
print('*' * 10)
print(ab_dict['WYTB'])
print(ab_dict['wytb'])
The output (consecutively):
Access Control Entry
access control entry
**********
Wish You The Best
wish you the best
Here's another solution based on the pairwise function from this solution:
from requests.structures import CaseInsensitiveDict

def pairwise(iterable):
    "s -> (s0, s1), (s2, s3), (s4, s5), ..."
    a = iter(iterable)
    return zip(a, a)

with open('abreviations.txt') as reader:
    abr_dict = CaseInsensitiveDict()
    for abr, full in pairwise(reader):
        abr_dict[abr.strip()] = full.strip()
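To see why the pairwise trick works: zip(a, a) pulls from the same iterator twice per tuple, so consecutive lines come out as (abbreviation, expansion) pairs. A quick check:

print(list(pairwise(['ACE', 'Access Control Entry', 'ACK', 'Acknowledgement'])))
# [('ACE', 'Access Control Entry'), ('ACK', 'Acknowledgement')]
print(abr_dict['ace'])  # CaseInsensitiveDict lookup ignores the key's case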
Here is an answer that also allows sentences to be replaced with words from the dictionary:
import re
from requests.structures import CaseInsensitiveDict

def read_file_dict(filename):
    """
    Reads file data into a CaseInsensitiveDict
    """
    # lists for keys and values
    keys = []
    values = []
    # case-insensitive dict
    data = CaseInsensitiveDict()
    # count used for deciding which line we're on
    count = 1
    with open(filename) as file:
        temp = file.read().splitlines()
        for line in temp:
            # if the line count is even, a value is being read
            if count % 2 == 0:
                values.append(line)
            # otherwise, a key is being read
            else:
                keys.append(line)
            count += 1
    # Add to dictionary
    # perhaps some error checking here would be good
    for key, value in zip(keys, values):
        data[key] = value
    return data

def replace_word(ab_dict, sentence):
    """
    Replaces abbreviations in a sentence with their dictionary expansions
    """
    # not necessarily words, but you get the idea
    words = re.findall(r"[\w']+|[.,!?; ]", sentence)
    new_words = []
    for word in words:
        # if word is in dictionary, replace it and add it to resulting list
        if word in ab_dict:
            new_words.append(ab_dict[word])
        # otherwise add it as normally
        else:
            new_words.append(word)
    # return sentence with replaced words
    return "".join(x for x in new_words)

def main():
    ab_dict = read_file_dict("abreviations.txt")
    print(ab_dict)
    print(ab_dict['ACE'])
    print(ab_dict['Ace'])
    print(ab_dict['ace'])
    print(replace_word(ab_dict, "The ACE is not easy to understand"))

if __name__ == '__main__':
    main()
Which outputs:
{'ACE': 'Access Control Entry', 'ACK': 'Acknowledgement', 'ACORN': 'A Completely Obsessive Really Nutty person'}
Access Control Entry
Access Control Entry
Access Control Entry
The Access Control Entry is not easy to understand

Wikipedia Infobox parser with Multi-Language Support

I am trying to develop an Infobox parser in Python which supports all the languages of Wikipedia. The parser will get the infobox data and return it in a dictionary.
The keys of the dictionary will be the properties being described (e.g. population, city name, etc.).
The problem is that Wikipedia has slightly different page contents for each language. More importantly, the API response structure for each language can also be different.
For example, the API response for 'Paris' in English contains this Infobox:
{{Infobox French commune |name = Paris |commune status = [[Communes of France|Commune]] and [[Departments of France|department]] |image = <imagemap> File:Paris montage.jpg|275px|alt=Paris montage
and in Greek, the corresponding part for 'Παρίσι' is:
[...] {{Πόλη (Γαλλία) | Πόλη = Παρίσι | Έμβλημα =Blason paris 75.svg | Σημαία =Mairie De Paris (SVG).svg | Πλάτος Σημαίας =120px | Εικόνα =Paris - Eiffelturm und Marsfeld2.jpg [...]
In the second example, there isn't any 'Infobox' occurrence after the {{. Also, in the API response the name = Paris is not the exact translation for Πόλη = Παρίσι. (Πόλη means city, not name)
Because of such differences between the responses, my code fails.
Here is the code:
# Assumed imports (Python 2); lxml and guess_language are third-party packages.
import urllib
from lxml import etree
import guess_language

class WikipediaInfobox():
    # Class to get and parse the Wikipedia Infobox Data
    infoboxArrayUnprocessed = []  # Maintains the order which the data is displayed.
    infoboxDictUnprocessed = {}  # Still contains brackets and wikitext coding. Will be processed more later...
    language = "en"

    def getInfoboxDict(self, infoboxRaw):  # Get the Infobox in Dict and Array form (unprocessed)
        if infoboxRaw.strip() == "":
            return {}
        boxLines = [line.strip().replace(" ", " ") for line in infoboxRaw.splitlines()]
        wikiObjectType = boxLines[0]
        infoboxData = [line[1:] for line in boxLines[1:]]
        toReturn = {"wiki_type": wikiObjectType}
        for i in infoboxData:
            key = i.split("=")[0].strip()
            value = ""
            if i.strip() != key + "=":
                value = i.split("=")[1].strip()
            self.infoboxArrayUnprocessed.append({key: value})
            toReturn[key] = value
        self.infoboxDictUnprocessed = toReturn
        return toReturn

    def getInfoboxRaw(self, pageTitle, followRedirect=False, resetOld=True):  # Get Infobox in Raw Text
        if resetOld:
            infoboxDict = {}
            infoboxDictUnprocessed = {}
            infoboxArray = []
            infoboxArrayUnprocessed = []
        params = {"format": "xml", "action": "query", "prop": "revisions",
                  "rvprop": "timestamp|user|comment|content"}
        params["titles"] = "%s" % urllib.quote(pageTitle.encode("utf8"))
        qs = "&".join("%s=%s" % (k, v) for k, v in params.items())
        url = "http://" + self.language + ".wikipedia.org/w/api.php?%s" % qs
        tree = etree.parse(urllib.urlopen(url))
        revs = tree.xpath('//rev')
        if len(revs) == 0:
            return ""
        if "#REDIRECT" in revs[-1].text and followRedirect == True:
            redirectPage = revs[-1].text[revs[-1].text.find("[[") + 2:revs[-1].text.find("]]")]
            return self.getInfoboxRaw(redirectPage, followRedirect, resetOld)
        elif "#REDIRECT" in revs[-1].text and followRedirect == False:
            return ""
        infoboxRaw = ""
        if "{{Infobox" in revs[-1].text:  # -> No Multi-language support:
            infoboxRaw = revs[-1].text.split("{{Infobox")[1].split("}}")[0]
        return infoboxRaw

    def __init__(self, pageTitle="", followRedirect=False):  # Constructor
        if pageTitle != "":
            self.language = guess_language.guessLanguage(pageTitle)
            if self.language == "UNKNOWN":
                self.language = "en"
            infoboxRaw = self.getInfoboxRaw(pageTitle, followRedirect)
            self.getInfoboxDict(infoboxRaw)  # Now the parsed data is in self.infoboxDictUnprocessed
Some parts of this code were found on this blog...
I don't want to reinvent the wheel, so maybe someone has a nice solution for multi-language support and neat parsing of the Infobox section of Wikipedia.
I have seen many alternatives, like DBpedia or some other parsers that MediaWiki recommends, but I haven't found anything that suits my needs yet. I also want to avoid scraping the page with BeautifulSoup, because it can fail in some cases, but if it is necessary, it will do.
If something isn't clear enough, please ask. I want to help as much as I can.
Wikidata is definitely the first choice these days if you want structured data. Anyway, if in the future you need to parse data from Wikipedia articles, especially as you are using Python, I can recommend mwparserfromhell, a Python library aimed at parsing wikitext that has an option to extract templates and their attributes. That won't directly fix your issue, as the multiple templates in multiple languages will definitely be different, but it might be useful if you continue trying to parse wikitext.
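As a sketch of what template extraction with mwparserfromhell looks like (illustrative wikitext, not the answerer's code):

import mwparserfromhell

wikitext = "{{Infobox French commune |name = Paris |image = Paris montage.jpg}}"
code = mwparserfromhell.parse(wikitext)
for template in code.filter_templates():
    print(str(template.name).strip())  # e.g. 'Infobox French commune'
    for param in template.params:
        # The same calls work whatever language the template is written in.
        print(str(param.name).strip(), '=', str(param.value).strip())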

How do I instantiate a group of objects from a text file?

I have some log files that look like many lines of the following:
<tickPrice tickerId=0, field=2, price=201.81, canAutoExecute=1>
<tickSize tickerId=0, field=3, size=25>
<tickSize tickerId=0, field=8, size=534349>
<tickPrice tickerId=0, field=2, price=201.82, canAutoExecute=1>
I need to define a class of type tickPrice or tickSize. I will need to decide which to use before doing the definition.
What would be the Pythonic way to grab these values? In other words, I need an effective way to reverse str() on a class.
The classes are already defined and just contain the presented variables, e.g., tickPrice.tickerId. I'm trying to find a way to extract these values from the text and set the instance attributes to match.
Edit: Answer
This is what I ended up doing:
with open(commandLineOptions.simulationFilename, "r") as simulationFileHandle:
    for simulationFileLine in simulationFileHandle:
        (date, time, msgString) = simulationFileLine.split("\t")
        if ("tickPrice" in msgString):
            msgStringCleaned = msgString.translate(None, ''.join("<>,"))
            msgList = msgStringCleaned.split(" ")
            msg = message.tickPrice()
            msg.tickerId = int(msgList[1][9:])
            msg.field = int(msgList[2][6:])
            msg.price = float(msgList[3][6:])
            msg.canAutoExecute = int(msgList[4][15:])
        elif ("tickSize" in msgString):
            msgStringCleaned = msgString.translate(None, ''.join("<>,"))
            msgList = msgStringCleaned.split(" ")
            msg = message.tickSize()
            msg.tickerId = int(msgList[1][9:])
            msg.field = int(msgList[2][6:])
            msg.size = int(msgList[3][5:])
        else:
            print "Unsupported tick message type"
I'm not sure how you want to dynamically create objects in your namespace, but the following will at least dynamically create objects based on your loglines:
Take your line:
line = '<tickPrice tickerId=0, field=2, price=201.81, canAutoExecute=1>'
Remove chars that aren't interesting to us, then split the line into a list:
line = line.translate(None, ''.join('<>,'))
line = line.split(' ')
Name the potential class attributes for convenience:
line_attrs = line[1:]
Then create your object (name, base tuple, dictionary of attrs):
tickPriceObject = type(line[0], (object,), { key:value for key,value in [at.split('=') for at in line_attrs]})()
Prove it works as we'd expect:
print(tickPriceObject.field)
# 2
Approaching the problem with regex, but with the same result as tristan's excellent answer (and stealing his use of the type constructor that I will never be able to remember)
import re

class_instance_re = re.compile(r"""
    <(?P<classname>\w[a-zA-Z0-9]*)[ ]
    (?P<arguments>
        (?:\w[a-zA-Z0-9]*=[0-9.]+[, ]*)+
    )>""", re.X)

objects = []
for line in whatever_file:
    result = class_instance_re.match(line)
    classname = result.group('classname')
    arguments = result.group('arguments')
    # type(name, bases, dict) builds the class; the trailing () makes an instance
    new_obj = type(classname, (object,),
                   dict([s.split('=') for s in arguments.split(', ')]))()
    objects.append(new_obj)

Python dictionary editing entries

def replace_acronym():  # function not yet implemented
    # FIND
    for abbr, text in acronyms.items():
        if abbr == acronym_edit.get():
            textadd.insert(0, text)
    # DELETE
    name = acronym_edit.get().upper()
    name.upper()
    r = dict(acronyms)
    del r[name]
    with open('acronym_dict.py', 'w') as outfile:
        outfile.write(str(r))
    outfile.close()  # unnecessary explicit closure since `with` was used...
    message = '{0} {1} {2} \n '.format('Removed', name, 'with its text from the database.')
    display.insert('0.0', message)
    # ADD
    abbr_in = acronym_edit.get()
    text_in = add_expansion.get()
    acronyms[abbr_in] = text_in
    # write amended dictionary
    with open('acronym_dict.py', 'w') as outfile:
        outfile.write(str(acronyms))
    outfile.close()
    message = '{0} {1}:{2}{3}\n '.format('Modified entry', abbr_in, text_in, 'added')
    display.insert('0.0', message)
I am trying to add the functionality of editing my dictionary entries in my tkinter widget. The dictionary is in the format {ACRONYM: text, ACRONYM2: text2...}
What I thought the function would achieve is to find the entry in the dictionary, delete both the acronym and its associated text, and then add whatever the acronym and text have been changed to. What happens instead: if I have an entry TEST: test and I want to modify it to TEXT: abc, the function returns TEXT: testabc, appending the changed text even though I have (I thought) overwritten the file.
What am I doing wrong?
That's a pretty messy lookin' function. The acronym replacement itself can be done pretty simply:
acronyms = {'SONAR': 'SOund Navigation And Ranging',
            'HTML': 'HyperText Markup Language',
            'CSS': 'Cascading Style Sheets',
            'TEST': 'test',
            'SCUBA': 'Self Contained Underwater Breathing Apparatus',
            'RADAR': 'RAdio Detection And Ranging',
            }

def replace_acronym(a_dict, check_for, replacement_key, replacement_text):
    c = a_dict.get(check_for)
    if c is not None:
        del a_dict[check_for]
        a_dict[replacement_key] = replacement_text
    return a_dict

new_acronyms = replace_acronym(acronyms, 'TEST', 'TEXT', 'abc')
That works perfectly for me (in Python 3). You could call this in another function that writes the new_acronyms dict to the file, or do whatever else you want with it, 'cause it's no longer tied to just being written to the file.
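For the write-back step the answer leaves open, one hedged option is to store the dict as JSON instead of writing str(acronyms) into a .py file; a sketch (assuming a file name acronym_dict.json):

import json

def save_acronyms(a_dict, path='acronym_dict.json'):
    # json.load reverses this exactly; str(dict) written to a file does not.
    with open(path, 'w') as outfile:
        json.dump(a_dict, outfile, indent=2)

save_acronyms(new_acronyms)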
