Building Abbreviations Dictionary from Text file


I would like to build a dictionary of abbreviations.
I have a text file with a lot of abbreviations. The text file looks like this (after import):
with open('abreviations.txt') as ab:
    ab_words = ab.read().splitlines()
An extract:
'ACE',
'Access Control Entry',
'ACK',
'Acknowledgement',
'ACORN',
'A Completely Obsessive Really Nutty person',
Now I want to build the dictionary, where every odd line is a dictionary key and every even line is the corresponding dictionary value.
Hence I should be able to write at the end:
ab_dict['ACE']
and get the result:
'Access Control Entry'
Also, how can I make it case-insensitive?
ab_dict['ace']
should yield the same result
'Access Control Entry'
In fact, it would be perfect if the output were also lowercase:
'access control entry'
Here is a link to the text file: https://www.dropbox.com/s/91afgnupk686p9y/abreviations.txt?dl=0

A complete solution with a custom ABDict class and Python's iterator protocol (calling next() on the file object):
class ABDict(dict):
    ''' Class representing a dictionary of abbreviations'''
    def __getitem__(self, key):
        # keys are stored as they appear in the file (upper case)
        v = dict.__getitem__(self, key.upper())
        # a lower-case query gets a lower-case expansion
        return v.lower() if key.islower() else v
with open('abreviations.txt') as ab:
    ab_dict = ABDict()
    while True:
        try:
            k = next(ab).strip()  # `key` line
            v = next(ab).strip()  # `value` line
            ab_dict[k] = v
        except StopIteration:
            break
Now, testing (the case of the requested key determines the case of the result):
print(ab_dict['ACE'])
print(ab_dict['ace'])
print('*' * 10)
print(ab_dict['WYTB'])
print(ab_dict['wytb'])
The output:
Access Control Entry
access control entry
**********
Wish You The Best
wish you the best
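If you don't need ABDict's case-preserving behaviour, a plain dict may be enough. This is a minimal sketch, assuming the file strictly alternates abbreviation and expansion lines with no blank lines; the helper name build_abbrev_dict is hypothetical. It lower-cases keys and values once at load time, which covers both the case-insensitive lookup and the lowercase output asked for in the question.

```python
def build_abbrev_dict(lines):
    """Map each odd line (abbreviation) to the following even line (expansion).

    Keys and values are lower-cased once, at load time, so a lookup
    only needs to lower-case the query.
    """
    lines = [line.strip() for line in lines]
    # lines[0::2] are the abbreviations, lines[1::2] their expansions;
    # zip() stops at the shorter slice, so a dangling final line is ignored.
    return {abbr.lower(): full.lower()
            for abbr, full in zip(lines[0::2], lines[1::2])}


sample = ["ACE", "Access Control Entry", "ACK", "Acknowledgement"]
ab_dict = build_abbrev_dict(sample)
query = "ACE"
print(ab_dict[query.lower()])  # access control entry
```

With the real file you would pass the open file object instead of the sample list, e.g. build_abbrev_dict(open('abreviations.txt')) inside a with block.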

Here's another solution based on the pairwise function from another answer:
from requests.structures import CaseInsensitiveDict
def pairwise(iterable):
    "s -> (s0, s1), (s2, s3), (s4, s5), ..."
    a = iter(iterable)
    return zip(a, a)
with open('abreviations.txt') as reader:
    abr_dict = CaseInsensitiveDict()
    for abr, full in pairwise(reader):
        abr_dict[abr.strip()] = full.strip()
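The zip(a, a) trick works because both arguments are the same iterator, so each tuple consumes two consecutive lines. One consequence is that an odd trailing line is silently dropped. A small sketch of a stricter variant that fails loudly instead, assuming a malformed file should be treated as an error; pairwise_strict is a hypothetical name and an in-memory list stands in for the file.

```python
def pairwise(iterable):
    "s -> (s0, s1), (s2, s3), (s4, s5), ..."
    a = iter(iterable)
    return zip(a, a)


def pairwise_strict(iterable):
    """Like pairwise(), but raise ValueError on an unpaired final item."""
    it = iter(iterable)
    for first in it:
        try:
            second = next(it)
        except StopIteration:
            raise ValueError("odd number of lines; unpaired item: %r" % first) from None
        yield first, second


lines = ["ACE", "Access Control Entry", "ACK"]
print(list(pairwise(lines)))  # [('ACE', 'Access Control Entry')]
```

Calling list(pairwise_strict(lines)) on the same three lines raises ValueError instead of losing "ACK".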

Here is an answer that also replaces abbreviations in a sentence with their expansions from the dictionary:
import re
from requests.structures import CaseInsensitiveDict
def read_file_dict(filename):
    """
    Reads file data into CaseInsensitiveDict
    """
    # lists for keys and values
    keys = []
    values = []
    # case-insensitive dict
    data = CaseInsensitiveDict()
    # count used for deciding which line we're on
    count = 1
    with open(filename) as file:
        temp = file.read().splitlines()
    for line in temp:
        # if the line count is even, a value is being read
        if count % 2 == 0:
            values.append(line)
        # otherwise, a key is being read
        else:
            keys.append(line)
        count += 1
    # Add to dictionary
    # perhaps some error checking here would be good
    for key, value in zip(keys, values):
        data[key] = value
    return data
def replace_word(ab_dict, sentence):
    """
    Replaces words in the sentence that are found in the dictionary
    """
    # not necessarily words, but you get the idea
    words = re.findall(r"[\w']+|[.,!?; ]", sentence)
    new_words = []
    for word in words:
        # if word is in dictionary, replace it and add it to resulting list
        if word in ab_dict:
            new_words.append(ab_dict[word])
        # otherwise keep the original word
        else:
            new_words.append(word)
    # return sentence with replaced words
    return "".join(new_words)
def main():
    ab_dict = read_file_dict("abreviations.txt")
    print(ab_dict)
    print(ab_dict['ACE'])
    print(ab_dict['Ace'])
    print(ab_dict['ace'])
    print(replace_word(ab_dict, "The ACE is not easy to understand"))
if __name__ == '__main__':
    main()
Which outputs:
{'ACE': 'Access Control Entry', 'ACK': 'Acknowledgement', 'ACORN': 'A Completely Obsessive Really Nutty person'}
Access Control Entry
Access Control Entry
Access Control Entry
The Access Control Entry is not easy to understand
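An alternative to tokenising and re-joining is to let re.sub call a function for each word-like match, which leaves punctuation and spacing untouched (the findall pattern above silently drops characters such as "-" or ":"). A sketch, assuming a plain dict whose keys are upper-case abbreviations, so that it runs without requests; expand_abbreviations and WORD_RE are hypothetical names.

```python
import re

# A "word" is a run of word characters or apostrophes.
WORD_RE = re.compile(r"[\w']+")


def expand_abbreviations(ab_dict, sentence):
    """Replace every word found in ab_dict, matching case-insensitively."""
    def replace(match):
        word = match.group(0)
        # Keys are upper case; fall back to the original word if absent.
        return ab_dict.get(word.upper(), word)
    return WORD_RE.sub(replace, sentence)


ab_dict = {"ACE": "Access Control Entry", "ACK": "Acknowledgement"}
print(expand_abbreviations(ab_dict, "The ace is not easy to understand."))
# The Access Control Entry is not easy to understand.
```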

Related

How do I instantiate a group of objects from a text file?

I have some log files with many lines like the following:
<tickPrice tickerId=0, field=2, price=201.81, canAutoExecute=1>
<tickSize tickerId=0, field=3, size=25>
<tickSize tickerId=0, field=8, size=534349>
<tickPrice tickerId=0, field=2, price=201.82, canAutoExecute=1>
I need to create an object of class tickPrice or tickSize, and I will need to decide which class to use before creating it.
What would be the Pythonic way to grab these values? In other words, I need an effective way to reverse str() on a class.
The classes are already defined and just contain the presented variables, e.g., tickPrice.tickerId. I'm trying to find a way to extract these values from the text and set the instance attributes to match.
Edit: Answer
This is what I ended up doing:
with open(commandLineOptions.simulationFilename, "r") as simulationFileHandle:
    for simulationFileLine in simulationFileHandle:
        (date, time, msgString) = simulationFileLine.split("\t")
        if ("tickPrice" in msgString):
            msgStringCleaned = msgString.translate(None, ''.join("<>,"))
            msgList = msgStringCleaned.split(" ")
            msg = message.tickPrice()
            msg.tickerId = int(msgList[1][9:])
            msg.field = int(msgList[2][6:])
            msg.price = float(msgList[3][6:])
            msg.canAutoExecute = int(msgList[4][15:])
        elif ("tickSize" in msgString):
            msgStringCleaned = msgString.translate(None, ''.join("<>,"))
            msgList = msgStringCleaned.split(" ")
            msg = message.tickSize()
            msg.tickerId = int(msgList[1][9:])
            msg.field = int(msgList[2][6:])
            msg.size = int(msgList[3][5:])
        else:
            print "Unsupported tick message type"

I'm not sure how you want to dynamically create objects in your namespace, but the following will at least dynamically create objects based on your log lines:
Take your line:
line = '<tickPrice tickerId=0, field=2, price=201.81, canAutoExecute=1>'
Remove chars that aren't interesting to us, then split the line into a list:
line = line.translate(None, ''.join('<>,'))
line = line.split(' ')
Name the potential class attributes for convenience:
line_attrs = line[1:]
Then create your object (name, base tuple, dictionary of attrs):
tickPriceObject = type(line[0], (object,), { key:value for key,value in [at.split('=') for at in line_attrs]})()
Prove it works as we'd expect:
print(tickPriceObject.field)
# 2
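Note that type(...) builds a brand-new class for every line, and all attribute values stay strings ('2', not 2). If you only need an object with attribute access, types.SimpleNamespace (Python 3.3+) is a lighter sketch of the same idea. The numeric conversion below assumes every value is an int or a float, and line_to_object and the kind attribute are hypothetical names.

```python
from types import SimpleNamespace


def line_to_object(line):
    """Turn '<name k1=v1, k2=v2>' into a SimpleNamespace with numeric attributes."""
    name, _, body = line.strip().strip("<>").partition(" ")
    attrs = {}
    for pair in body.split(", "):
        key, value = pair.split("=")
        # Assumption: values containing '.' are floats, all others ints.
        attrs[key] = float(value) if "." in value else int(value)
    obj = SimpleNamespace(**attrs)
    obj.kind = name  # hypothetical attribute recording the message type
    return obj


tick = line_to_object("<tickPrice tickerId=0, field=2, price=201.81, canAutoExecute=1>")
print(tick.kind, tick.field, tick.price)  # tickPrice 2 201.81
```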

Approaching the problem with a regex, but with the same result as tristan's excellent answer (and borrowing his use of the type constructor, which I will never be able to remember):
import re
class_instance_re = re.compile(r"""
    <(?P<classname>\w[a-zA-Z0-9]*)[ ]
    (?P<arguments>
        (?:\w[a-zA-Z0-9]*=[0-9.]+[, ]*)+
    )>""", re.X)
objects = []
for line in whatever_file:
    result = class_instance_re.match(line)
    if result is None:  # skip lines that aren't <name key=value, ...>
        continue
    classname = result.group('classname')
    arguments = result.group('arguments')
    new_cls = type(classname, (object,),
                   dict([s.split('=') for s in arguments.split(', ')]))
    objects.append(new_cls())  # instantiate the generated class
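The question says the classes already exist, and the fixed slice offsets in the asker's own solution ([9:], [6:], ...) break as soon as a field name changes. A sketch that instead splits each key=value pair and sets the attribute by name with setattr, assuming the log format shown in the question and that it is given the msgString part after the tab split. tickPrice and tickSize here are empty stand-ins for the real classes in the asker's message module, and parse_message, to_number, MESSAGE_CLASSES and LINE_RE are hypothetical names.

```python
import re


# Hypothetical stand-ins for message.tickPrice / message.tickSize.
class tickPrice(object):
    pass


class tickSize(object):
    pass


MESSAGE_CLASSES = {"tickPrice": tickPrice, "tickSize": tickSize}
LINE_RE = re.compile(r"<(\w+) (.*)>")


def to_number(text):
    """Convert '0' -> 0 and '201.81' -> 201.81."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def parse_message(msg_string):
    """Instantiate the class named in the message and set its attributes."""
    match = LINE_RE.match(msg_string.strip())
    if match is None:
        raise ValueError("not a message: %r" % msg_string)
    name, body = match.groups()
    cls = MESSAGE_CLASSES.get(name)
    if cls is None:
        raise ValueError("Unsupported tick message type: %s" % name)
    msg = cls()
    for pair in body.split(", "):
        key, value = pair.split("=")
        setattr(msg, key, to_number(value))
    return msg


msg = parse_message("<tickPrice tickerId=0, field=2, price=201.81, canAutoExecute=1>")
print(msg.price)  # 201.81
```

Adding a new message type then only needs a new entry in MESSAGE_CLASSES rather than another elif branch with its own offsets.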

python - Looking up a dictionary key in another file with two criteria

At the end of my code, I have a dictionary like this:
{'"WS1"': 1475.9778073075058, '"BRO"': 1554.1437268304624, '"CHA"': 1552.228925324831}
What I want to do is to find each of the keys in a separate file, teams.txt, which is formatted like this:
1901,'BRO','LAD'
1901,'CHA','CHW'
1901,'WS1','MIN'
Using the year, which is 1901, and the team, which is the key of each item in the dictionary, I want to create a new dictionary where the key is the third column in teams.txt if the year and team both match, and the value is the value of the team in the first dictionary.
I figured this would be easiest if I created a function to look up the year and the team and return the franchise ID ("franch"), and then applied that function to each key in the dictionary. This is what I have so far, but it gives me a KeyError:
def franch(year, team_str):
    team_str = str(team_str)
    with open('teams.txt') as imp_file:
        teams = imp_file.readlines()
        for team in teams:
            (yearID, teamID, franchID) = team.split(',')
            yearID = int(yearID)
            if yearID == year:
                if teamID == team_str:
                    break
    franchID = franchID[1:4]
    return franchID
And in the other function with the dictionary that I want to apply this function to:
franch_teams={}
for team in teams:
    team = team.replace('"', "'")
    franch_teams[franch(year, team)] = teams[team]
The ideal output of what I am trying to accomplish would look like:
{'"MIN"': 1475.9778073075058, '"LAD"': 1554.1437268304624, '"CHW"': 1552.228925324831}
Thanks!

Does this code suit your needs?
I am doing an extra equality check because different parts of your code use different quote characters.
def almost_equals(one, two):
    one = one.replace('"', '').replace("'", "")
    two = two.replace('"', '').replace("'", "")
    return one == two
def create_data(year, data, text_content):
    """ This function returns a new dictionary. """
    content = [line.split(',') for line in text_content.split('\n')]
    res = {}
    for key in data.keys():
        for one_list in content:
            if year == one_list[0] and almost_equals(key, one_list[1]):
                res[one_list[2]] = data[key]
    return res
teams_txt = """1901,'BRO','LAD'
1901,'CHA','CHW'
1901,'WS1','MIN'"""
year = '1901'
data = { '"WS1"': 1475.9778073075058, '"BRO"': 1554.1437268304624, '"CHA"': 1552.228925324831 }
result = create_data(year, data, teams_txt)
And the output:
{"'CHW'": 1552.228925324831, "'LAD'": 1554.1437268304624, "'MIN'": 1475.9778073075058}
Update:
To read from a text file, use this function:
def read_text_file(filename):
    with open(filename) as file_object:
        result = file_object.read()
    return result
teams_txt = read_text_file('teams.txt')

You may try something like:
#!/usr/bin/env python
def clean(_str):
    return _str.strip('"').strip("'")
first = {'"WS1"': 1475.9778073075058, '"BRO"': 1554.1437268304624, '"CHA"': 1552.228925324831}
clean_first = dict()
second = dict()
for k,v in first.items():
    clean_first[clean(k)] = v
with open("teams.txt", "r") as _file:
    lines = _file.readlines()
    for line in lines:
        _,old,new = line.split(",")
        second[new.strip()] = clean_first[clean(old)]
print(second)
Which gives the expected result:
{"'CHW'": 1552.228925324831, "'LAD'": 1554.1437268304624, "'MIN'": 1475.9778073075058}
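Because the single quotes in teams.txt are just CSV quoting, the csv module can strip them for you if you tell it the quote character is '. A sketch, assuming the file has exactly the three columns shown and that the dictionary keys carry literal double quotes as in the question; franchise_totals is a hypothetical name and io.StringIO stands in for the real file. Indexing the file by (year, team) also means it is read once, rather than once per key, and both criteria from the question are checked.

```python
import csv
import io


def franchise_totals(ratings, year, teams_file):
    """Map '"FRANCH"' keys to ratings, matching on both year and team ID."""
    # (year, teamID) -> franchID, built in a single pass over the file.
    lookup = {}
    for row in csv.reader(teams_file, quotechar="'"):
        if len(row) == 3:  # skip blank or malformed lines
            year_id, team_id, franch_id = row
            lookup[(int(year_id), team_id)] = franch_id
    result = {}
    for key, value in ratings.items():
        team_id = key.strip('"')  # '"WS1"' -> 'WS1'
        franch_id = lookup.get((year, team_id))
        if franch_id is not None:
            result['"%s"' % franch_id] = value
    return result


teams_txt = io.StringIO("1901,'BRO','LAD'\n1901,'CHA','CHW'\n1901,'WS1','MIN'\n")
ratings = {'"WS1"': 1475.9778073075058, '"BRO"': 1554.1437268304624, '"CHA"': 1552.228925324831}
print(franchise_totals(ratings, 1901, teams_txt))
```

With the real file, open it as open('teams.txt', newline='') and pass the file object in place of the StringIO.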

Three-way language dictionary

se_eng_fr_dict = {'School': ['Skola', 'Ecole'], 'Ball': ['Boll', 'Ballon']}
choose_language = raw_input("Type 'English', for English. Skriv 'svenska' fo:r svenska. Pour francais, ecrit 'francais'. ")
if choose_language == 'English':
    word = raw_input("Type in a word:")
    swe_word = se_eng_fr_dict[word][0]
    fra_word = se_eng_fr_dict[word][1]
    print word, ":", swe_word, "pa. svenska," , fra_word, "en francais."
elif choose_language == 'Svenska':
    word = raw_input("Vilket ord:")
    for key, value in se_eng_fr_dict.iteritems():
        if value == word:
            print key
I want to create a dictionary (to be stored locally as a txt file) in which the user can enter a word in English, Swedish or French and get the translation of the word in the other two languages. The user should also be able to add data to the dictionary.
The code works when I look up the Swedish and French words with the English word. But how can I get the key and the second value if I only have the first value (for example, the English and French words given the Swedish word)?
Is there a way or should I try to approach this problem in a different way?

A good option would be to store None for a translation that hasn't been set yet. While this increases the amount of memory required, you could go a step further and key each translation by its language.
Example:
se_eng_fr_dict = {'pencil': {'se': None, 'fr': 'crayon'}}
def translate(word, lang):
    # If dict.get() finds no value with `word` it will return
    # None by default. We override it with an empty dictionary `{}`
    # so we can always call `.get` on the result.
    translated = se_eng_fr_dict.get(word, {}).get(lang)
    if translated is None:
        print("No {lang} translation found for {word}".format(lang=lang, word=word))
    else:
        print("{} is {} in {}".format(word, translated, lang))
translate('pencil', 'fr')
translate('pencil', 'se')

I hope there is a better solution, but here is mine:
class Word:
    def __init__(self, en, fr, se):
        self.en = en
        self.fr = fr
        self.se = se
    def __str__(self):
        return '<%s,%s,%s>' % (self.en, self.fr, self.se)
Then you put all these Words into a mapping data structure. You can use a dictionary, but if you have a huge data set, it's better to use a BST; have a look at https://pypi.python.org/pypi/bintrees/2.0.1
Let's say you have all these Words loaded in a list named words; then:
en_words = {w.en: w for w in words}
fr_words = {w.fr: w for w in words}
se_words = {w.se: w for w in words}
Again, a BST is recommended here.

Maybe a list of nested lists would be better for this:
>>> my_list = [
...     ["School", "Skola", "Ecole"],
...     ["Ball", "Boll", "Ballon"],
... ]
Then, with value holding the word you are looking up, you can find the position of its set of translations by doing:
>>> position = [index for index, item in enumerate(my_list) for subitem in item if value == subitem][0]
This returns the index of the matching sublist, which you can then grab:
>>> sub_list = my_list[position]
And the sublist will have all the translations in order.
For example:
>>> position = [index for index, item in enumerate(my_list) for subitem in item if "Ball" == subitem][0]
>>> print position
1
>>> my_list[position]
['Ball', 'Boll', 'Ballon']
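The list comprehension above scans every word of every entry on each lookup. If lookups are frequent, one option is to build an index once that maps every word, in any language, to its row. This sketch assumes no spelling appears in two different rows; index and lookup are hypothetical names. It addresses the original question directly: given the Swedish word, you get the English and French ones.

```python
my_list = [
    ["School", "Skola", "Ecole"],
    ["Ball", "Boll", "Ballon"],
]

# Every word (lower-cased) points at its full [english, swedish, french] row.
index = {word.lower(): row for row in my_list for word in row}


def lookup(word):
    """Return [english, swedish, french] for a word in any language, or None."""
    return index.get(word.lower())


print(lookup("boll"))  # ['Ball', 'Boll', 'Ballon']
```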

To speed up word lookups and gain flexibility, I'd choose a dictionary of subdictionaries: each subdictionary translates the words of one language into all the available languages, and the top-level dictionary maps each language to the corresponding subdictionary.
For example, if multidict is the top-level dictionary, then multidict['english']['ball'] returns the (sub)dictionary:
{'english':'ball', 'francais':'ballon', 'svenska':'boll'}
Below is a class Multidictionary implementing such an idea.
For convenience, it assumes that all the translations are stored in a text file in CSV format, which is read at initialization time, e.g.:
english,svenska,francais,italiano
school,skola,ecole,scuola
ball,boll,ballon,palla
Any number of languages can be easily added to the CSV file.
class Multidictionary(object):
    def __init__(self, fname=None):
        '''Init a multidictionary from a CSV file.
        The file describes a word per line, separating all the available
        translations with a comma.
        First file line must list the corresponding languages.
        For example:
        english,svenska,francais,italiano
        school,skola,ecole,scuola
        ball,boll,ballon,palla
        '''
        self.fname = fname
        self.multidictionary = {}
        if fname is not None:
            import csv
            with open(fname) as csvfile:
                reader = csv.DictReader(csvfile)
                for translations in reader:
                    for lang, word in translations.iteritems():
                        self.multidictionary.setdefault(lang, {})[word] = translations
    def get_available_languages(self):
        '''Return the list of available languages.'''
        return sorted(self.multidictionary)
    def translate(self, word, language):
        '''Return a dictionary containing the translations of a word (in a
        specified language) into all the available languages.
        '''
        if language in self.get_available_languages():
            translations = self.multidictionary[language].get(word)
        else:
            print 'Invalid language %r selected' % language
            translations = None
        return translations
    def get_translations(self, word, language):
        '''Generate the string containing the translations of a word in a
        language into all the other available languages.
        '''
        translations = self.translate(word, language)
        if translations:
            other_langs = (lang for lang in translations if lang != language)
            lang_trans = ('%s in %s' % (translations[lang], lang) for lang in other_langs)
            s = '%s: %s' % (word, ', '.join(lang_trans))
        else:
            print '%s word %r not found' % (language, word)
            s = None
        return s
if __name__ == '__main__':
    multidict = Multidictionary('multidictionary.csv')
    print 'Available languages:', ', '.join(multidict.get_available_languages())
    language = raw_input('Choose the input language: ')
    word = raw_input('Type a word: ')
    translations = multidict.get_translations(word, language)
    if translations:
        print translations
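The class above is Python 2 (iteritems, print statements, raw_input). The core indexing step carries over to Python 3 almost unchanged; here is a minimal sketch of just that part, reading the example CSV from an io.StringIO instead of multidictionary.csv. build_multidictionary and CSV_TEXT are hypothetical names.

```python
import csv
import io

CSV_TEXT = """english,svenska,francais,italiano
school,skola,ecole,scuola
ball,boll,ballon,palla
"""


def build_multidictionary(csvfile):
    """Return {language: {word: {language: translation, ...}}}."""
    multidictionary = {}
    for translations in csv.DictReader(csvfile):
        # Every word in the row points at the same row dict.
        for lang, word in translations.items():
            multidictionary.setdefault(lang, {})[word] = translations
    return multidictionary


multidict = build_multidictionary(io.StringIO(CSV_TEXT))
print(multidict["svenska"]["boll"]["francais"])  # ballon
```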

Printing values from dictionary in specific form

I have a dictionary with keys relating to various reactions and their data, i.e. exponentn, comment, etc. I want to search for and print a list of reactions involving the atom 'BR'. My code currently prints all reactions for 'BR', but the data comes out in an arbitrary order, so I am not sure which data corresponds to which reaction.
I've tried using the __repr__ method to output the data in the format "reactionName : exponentn comment", but I'm not having much luck. I also tried to replicate the approach from another question, "printing values and keys from a dictionary in a specific format (python)", but was not able to get it to work.
class SourceNotDefinedException(Exception):
    def __init__(self, message):
        super(SourceNotDefinedException, self).__init__(message)
class tvorechoObject(object):
    """The class stores a pair of objects, "tv" objects, and "echo" objects. They are accessed simply by doing .tv, or .echo. If it does not exist, it will fall back to the other variable. If neither are present, it returns None."""
    def __init__(self, echo=None, tv=None):
        self.tv = tv
        self.echo = echo
    def __repr__(self):
        return str({"echo": self.echo, "tv": self.tv}) # Returns the respective strings
    def __getattribute__(self, item):
        """Altered __getattribute__() function to return the alternative of .echo / .tv if the requested attribute is None."""
        if item in ["echo", "tv"]:
            if object.__getattribute__(self,"echo") is None: # Echo data not present
                return object.__getattribute__(self,"tv") # Select TV data
            elif object.__getattribute__(self,"tv") is None: # TV data not present
                return object.__getattribute__(self,"echo") # Select Echo data
            else:
                return object.__getattribute__(self,item) # Return all data
        else:
            return object.__getattribute__(self,item) # Return all data
class Reaction(object):
    def __init__(self, inputLine, sourceType=None):
        #self.reactionName = tvorechoObject()
        self.exponentn = tvorechoObject()
        self.comment = tvorechoObject()
        self.readIn(inputLine, sourceType=sourceType)
        products, reactants = self.reactionName.split(">")
        self.products = [product.strip() for product in products.split("+")]
        self.reactants = [reactant.strip() for reactant in reactants.split("+")]
    def readIn(self, inputLine, sourceType=None):
        if sourceType == "echo": # Parsed reaction line for combined format
            echoPart = inputLine.split("|")[0]
            reactionName = inputLine.split(":")[0].strip()
            exponentn = echoPart.split("[")[1].split("]")[0].strip() # inputLine.split("[")[1].split("]")[0].strip()
            comment = "%".join(echoPart.split("%")[1:]).strip() # "%".join(inputLine.split("%")[1:]).strip()
            # Store the objects
            self.reactionName = reactionName
            self.exponentn.echo = exponentn
            self.comment.echo = comment
        elif sourceType == "tv": # Parsed reaction line for combined format
            tvPart = inputLine.split("|")[1]
            reactionName = inputLine.split(":")[0].strip()
            comment = "%".join(tvPart.split("!")[1:]).strip() # "%".join(inputLine.split("!")[1:]).strip()
            # Store the objects
            self.reactionName = reactionName
            self.comment.tv = comment
        elif sourceType.lower() == "unified":
            reaction = inputLine.split(":")[0]
            echoInput, tvInput = ":".join(inputLine.split(":")[1:]).split("|")
            echoInput = reaction + ":" + echoInput
            tvInput = reaction + ":" + tvInput
            if "Not present in TV" not in tvInput:
                self.readIn(inputLine, sourceType="tv")
            if "Not present in Echo" not in echoInput:
                self.readIn(inputLine, sourceType="echo")
        else:
            raise SourceNotDefinedException("'%s' is not a valid 'sourceType'" % sourceType) # Otherwise print
    def __repr__(self):
        return str({"reactionName": self.reactionName, "exponentn": self.exponentn, "comment": self.comment, })
        return str(self.reactionName) # Returns all relevant reactions
        keykeyDict = {}
        for key in reactionDict.keys():
            keykeyDict[key] = key
        formatString = "{reactionName:<40s} {comment:<10s}" # TV format
        formatString = "{reactionName:<40s} {exponentn:<10s} {comment:<10s}" # Echo format
        return formatString.format(**keykeyDict)
        return formatString.format(**reactionDict)
    def toDict(self, priority="tv"):
        """Returns a dictionary of all the variables, in the form {"comment":<>, "exponentn":<>, ...}. Design used is to be passed into the echo and tv style line format statements."""
        if priority in ["echo", "tv"]: # Creating the dictionary by a large, horrible, list comprehension, to avoid even more repeated text
            return dict([("reactionName", self.reactionName)] + [(attributeName, self.__getattribute__(attributeName).__getattribute__(priority))
                for attributeName in ["exponentn", "comment"]])
        else:
            raise SourceNotDefinedException("{0} source type not recognised.".format(priority)) # Otherwise print
def find_allReactions(allReactions, reactant_set):
    """
    reactant_set is the set of reactants that you want to grab all reactions which are relevant allReactions is just the set of reactions you're considering. Need to repeatedly loop through all reactions. If the current reaction only contains reactants in the reactant_set, then add all its products to the reactant set. Repeat this until reactant_set does not get larger.
    """
    reactant_set = set(reactant_set) # this means that we can pass a list, but it will always be treated as a set.
    #Initialise the list of reactions that we'll eventually return
    relevant_reactions = []
    previous_reactant_count = None
    while len(reactant_set) != previous_reactant_count:
        previous_reactant_count = len(reactant_set)
        for reaction in allReactions:
            if set(reaction.reactants).issubset(reactant_set):
                relevant_reactions.append(reaction)
                reactant_set = reactant_set.union(set(reaction.products))
    return relevant_reactions
print find_allReactions(allReactions, ["BR"])
Current output:
'{'exponentn': {'tv': '0', 'echo': '0'}, 'comment': {'tv': 'BR-NOT USED', 'echo': 'BR-NOT USED'},'reactionName': 'E + BR > BR* + E', {'exponentn': {'qvt': '0', 'qp': '0'}, 'comment': {'qvt': 'BR+ -RECOMBINATION', 'qp': 'BR+ -RECOMBINATION'},'reactionName': 'E + BR* > BR* + E'
Desired output:
reactionName         exponentn  comment
E + BR > BR* + E     0          BR+ -RECOMBINATION
E + BR* > BR* + E    0          BR-NOT USED

If your data is added to the dict in a certain order and you want to preserve that order, collections.OrderedDict is what you're looking for.
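A sketch of how that could combine with the column layout the question asks for, assuming each reaction is stored as a dict with exponentn and comment keys; the data is copied from the desired output, and ROW and format_reactions are hypothetical names. On Python 3.7+ a plain dict also preserves insertion order, so OrderedDict mainly matters on older versions.

```python
from collections import OrderedDict

reactions = OrderedDict()
reactions["E + BR > BR* + E"] = {"exponentn": "0", "comment": "BR+ -RECOMBINATION"}
reactions["E + BR* > BR* + E"] = {"exponentn": "0", "comment": "BR-NOT USED"}

# Name padded to 40 characters, exponent to 10, comment unpadded.
ROW = "{name:<40s} {exponentn:<10s} {comment}"


def format_reactions(reactions):
    """Return a header line plus one aligned line per reaction, in order."""
    lines = [ROW.format(name="reactionName", exponentn="exponentn", comment="comment")]
    for name, data in reactions.items():
        lines.append(ROW.format(name=name, **data))
    return lines


print("\n".join(format_reactions(reactions)))
```

Keeping each reaction's fields together in one record is what guarantees that every comment is printed on the same line as its own reaction name.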

Adding external information to ParseResults before return

I want to add external information to a ParseResults before returning it. I return the results of parsing using asXML(). The external data is represented as a dictionary, and I want it to be rendered as XML in the final output.
This is the code before adding the external data:
from pyparsing import *
# a hypothetical outer parser, with an unparsed SkipTo element
color = oneOf("red orange yellow green blue purple")
expression = SkipTo("XXX") + Literal("XXX").setResultsName('ex') + color.setResultsName('color')
data = "JUNK 100 200 10 XXX green"
print expression.parseString(data).dump()
# main grammar
def minorgrammar(toks):
    # a simple inner grammar
    integer = Word(nums)
    grammar2 = integer("A").setResultsName('A') + integer("B").setResultsName('B') + integer("C").setResultsName('C')
    # use scanString to find the inner grammar
    # (since we just want the first occurrence, we can use next
    # instead of a for loop with a break)
    t,s,e = next(grammar2.scanString(toks[0],maxMatches=1))
    # remove 0'th element from toks
    del toks[0]
    # return a new ParseResults, the sum of t and everything
    # in toks after toks[0] was removed
    return t + toks
grammar1 = expression.setParseAction(minorgrammar)
x = grammar1.parseString(data).asXML("main")
print x
The output is:
<main>
<A>100</A>
<B>200</B>
<C>10</C>
<ex>XXX</ex>
<color>green</color>
</main>
The code after adding the external data:
...
    external_data = {'name':'omar', 'age':'40'}
    return t + toks + ParseResults(external_data)
grammar1 = expression.setParseAction(minorgrammar)
x = grammar1.parseString(data).asXML("main")
print x
The output:
<main>
<A>100</A>
<B>200</B>
<C>10</C>
<ex>XXX</ex>
<color>green</color>
<ITEM>{&apos;age&apos;: &apos;40&apos;, &apos;name&apos;: &apos;omar&apos;}</ITEM>
</main>
I want the output in this form:
<main>
<A>100</A>
<B>200</B>
<C>10</C>
<ex>XXX</ex>
<color>green</color>
<name>omar</name>
<age>40</age>
</main>
What is the error in that code? Thanks.

One problem is this fragment:
external_data = {'name':'omar', 'age':'40'}
return t + toks + ParseResults(external_data)
ParseResults will take a dict as a constructor argument, but I don't think it will do what you want: it just assigns the dict as its 0th element and does not assign any results names.
You can assign named values into a ParseResults by using its dict-style assignment:
pr = ParseResults(['omar','40'])
for k,v in external_data.items():
    pr[k] = v
See if this gets you closer to your desired format.
EDIT: Hmm, it seems asXML is fussier about how named results get added to the ParseResults than just setting the name. This will work better:
def addNamedResult(pr, value, name):
    addpr = ParseResults([value])
    addpr[name] = value
    pr += addpr
And then in your parse action, add the values with their names using:
addNamedResult(toks, 'omar', 'name')
addNamedResult(toks, '40', 'age')

Thanks very much, Paul. I modified your function to add a whole dictionary of data:
...
    external_data = {'name':'omar', 'age':'40'}
    return t + toks + addDicResult(external_data)
...
def addDicResult(data):
    pr = ParseResults([])
    for k, v in data.items():
        addpr = ParseResults([v])
        addpr[k] = v
        pr += addpr
    return pr
The output:
<main>
<A>100</A>
<B>200</B>
<C>10</C>
<ex>XXX</ex>
<color>green</color>
<age>40</age>
<name>omar</name>
</main>
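If you would rather not build named ParseResults at all, another option is to post-process the XML string: parse the asXML() output with the standard library's xml.etree.ElementTree and append one child element per dictionary entry. A sketch, assuming the asXML() output is well-formed XML with a single root element; append_fields is a hypothetical name, and the XML string is copied from the question's output rather than produced by pyparsing here. The appended elements are not indented like the originals.

```python
import xml.etree.ElementTree as ET


def append_fields(xml_text, extra):
    """Return xml_text with one <key>value</key> child per dict entry."""
    root = ET.fromstring(xml_text)
    for key, value in extra.items():
        child = ET.SubElement(root, key)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


xml_text = """<main>
  <A>100</A>
  <B>200</B>
  <C>10</C>
  <ex>XXX</ex>
  <color>green</color>
</main>"""
print(append_fields(xml_text, {"name": "omar", "age": "40"}))
```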
