Unexpected String Translation from Dictionary - python

I'd like to write a program that reads in a file and translates a short string of text 4 characters long to a new string of 4 characters. Currently, I read in a tab-delimited text file containing two columns: an "old tag" and a "new tag". I'm able to successfully build a dictionary that maps the "old tag" as the key and the "new tag" as the value.
My problem comes in when I attempt to use maketrans() and str.translate(). Somehow my "old_tag" is getting converted to a "new_tag" that isn't even in my dictionary!
"P020" should get converted to "AGAC", as outlined in my dictionary, but it's instead getting converted to "ACAC" (look at variable "new_tag"). I don't even have ACAC in my translation table!
Here's my function that does the string translate:
from string import maketrans  # Python 2; in Python 3 this is str.maketrans

def translate_tag(f_in, old_tag, trn_dict):
    """Function to convert any old tags to their new format based on the translation dictionary (variable "trn_dict")."""
    try:
        # tag_lookup = trn_dict[old_tag]
        # trans = maketrans(old_tag, tag_lookup)
        trans = maketrans(old_tag, trn_dict[old_tag])  # Just did the above two lines on one line
    except KeyError:
        print("Error in file {}! The tag {} wasn't found in the translation table. "
              "Make sure the translation table is up to date. "
              "The program will continue with the rest of the file, but this tag will be skipped!".format(f_in, old_tag))
        return None
    new_tag = old_tag.translate(trans)
    return new_tag
Here's my translation table. It's a tab-delimited text file, and the old tag is column 1, and the new tag is column 2. I translate from old tag to new tag.
The strange thing is that it converts just fine for some tags. For example, "P010" gets translated correctly. What could be causing the problem?

You should not use maketrans, as it works on individual characters (per the official documentation). Make it a dictionary, with your original text (1st column) as the key and the new text (2nd column) as its value.
Then you can look up any tag x with trn_dict[x], either wrapped in a try/except or after first testing if x in trn_dict.
database = """P001 AAAA
P002 AAAT
P003 AAAG
P004 AAAC
P005 AATA
P006 AATT
P007 AATG
P008 AATC
P009 ATAA
P010 ATAT
P011 ATAG
P012 ATAC
P013 ATTA
P014 ATTT
P015 ATTG
P016 ATTC
P017 AGAA
P018 AGAT
P019 AGAG
P020 AGAC
P021 AGTA
P022 AGTT
P023 AGTG
P024 AGTC
""".splitlines()
trn_dict = dict(line.split() for line in database)  # avoids shadowing the built-in str
def translate_tag(old_tag, trn_dict):
    """Function to convert any old tags to their new format based on the translation dictionary (variable "trn_dict")."""
    try:
        return trn_dict[old_tag]
    except KeyError:
        print("Error! The tag {} wasn't found in the translation table. "
              "Make sure the translation table is up to date. "
              "The program will continue with the rest of the file, but this tag will be skipped!".format(old_tag))
        return None
print(translate_tag('P020', trn_dict))
shows the expected value AGAC.
(That string-to-list-to-dict code is a quick hack to get the data in the program and is not really part of this how-to.)
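To see why "P020" turned into "ACAC" specifically: maketrans pairs characters up position by position. "P020" contains '0' twice, so '0' is first mapped to 'G' and then remapped to 'C'; the later pair wins, and both zeros become 'C'. "P010" only worked by coincidence, because both of its zeros map to 'T'. A minimal demonstration using Python 3's str.maketrans (the Python 2 string.maketrans behaves the same way for this case):

```python
# Character-level translation: each character gets exactly one replacement.
trans = str.maketrans("P020", "AGAC")
# The table maps ordinals to ordinals; the second '0' -> 'C' overwrote '0' -> 'G'.
print(trans[ord("0")] == ord("C"))  # True
print("P020".translate(trans))      # ACAC, not AGAC

# "P010" only looks right because both zeros happen to map to the same letter.
print("P010".translate(str.maketrans("P010", "ATAT")))  # ATAT

# Looking up the whole tag in a dict has no such problem.
trn_dict = {"P010": "ATAT", "P020": "AGAC"}
print(trn_dict["P020"])  # AGAC
```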

Related

Check if multiple dictionary keys are located in a string

Imagine a .txt file containing something like
5843092xxx289421xxx832175xxx...
You have a dictionary with keys corresponding to letters.
I am trying to search for each key within the string to output a message.
decoder = {5843092: 'a', 289421: 'b'}
with open("code.txt", "r") as fileTxt:
    fileTxt = fileTxt.readlines()
b = []
for key in decoder.keys():
    if key in fileTxt:
        b.append(decoder[key])
print(b)
This is what I have. I feel like I'm on the right track, but I'm missing how to do each iteration, maybe?
The goal output in this example would be either a list or a string of ab...
There are two problems here:
You have a list of strings, and you're treating it as if it's one string.
You're building your output based on the order the keys appear in the decoder dictionary, rather than the order they appear in the input text. That means the message will be all scrambled.
If the input text actually separates each key with a fixed string like xxx, the straightforward solution is to split on that string:
for line in fileTxt:
    print(' '.join(decoder.get(int(key), '?') for key in line.strip().split('xxx') if key))
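As a self-contained sketch of the same split-on-separator idea, here is a small helper. The name decode_line is hypothetical, and it assumes every non-empty piece is a number. It strips the trailing newline and skips empty pieces, since int('\n') or int('') would otherwise raise ValueError when a line ends with "xxx". It joins with an empty string so the output reads "ab" as in the question; unknown codes become '?':

```python
import io


def decode_line(line, decoder, sep="xxx"):
    """Decode one line of separator-delimited numeric codes (hypothetical helper).

    Assumes every non-empty piece is an integer; codes missing from
    decoder become '?'. Empty pieces (e.g. after a trailing separator
    or on a blank line) are skipped.
    """
    pieces = line.strip().split(sep)
    return "".join(decoder.get(int(piece), "?") for piece in pieces if piece)


decoder = {5843092: "a", 289421: "b"}

# io.StringIO stands in for open("code.txt"); iterating yields one line at a time.
code_file = io.StringIO("5843092xxx289421xxx832175xxx\n289421xxx5843092\n")
for line in code_file:
    print(decode_line(line, decoder))  # "ab?" then "ba"
```

Because the pieces are processed in the order they appear in the line, the message comes out in input order rather than dictionary order.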

Removing '\n' from a string without using .translate, .replace or strip()

I'm making a simple text-based game as a learning project. I'm trying to add a feature where the user can input 'save' and their stats will be written to a txt file named 'save.txt', so that after the program has been stopped, the player can load their previous stats and play from where they left off.
Here is the code for the saving:
The user inputs 'save', and the class attributes are saved to the text file as text, one per line (player1.char_type has the value 'Wizard'):
elif first_step == 'save':
    f = open("save.txt", "w")
    f.write(f'''{player1.name}
{player1.char_type}
{player1.life}
{player1.energy}
{player1.strength}
{player1.money}
{player1.weapon_lvl}
{player1.wakefulness}
{player1.days_left}
{player1.battle_count}''')
    f.close()
But, I also need the user to be able to load their saved stats next time they run the game. So they would enter 'load' and their stats will be updated.
I'm trying to read the text file one line at a time, and the value of each line would become the value of the relevant class attribute, in order. If I do this without first converting it to a string, I get issues, such as some lines being skipped because Python reads two lines as one and puts them together in a list.
So, I tried the following:
In the example below, I'm only showing the data for the class attributes 'player1.name' and 'player1.char_type' from above, to keep this question as short as possible.
elif first_step == 'load':
    f = open("save.txt", 'r')
    player1.name_saved = f.readline()  # reads the first line of the text file and assigns its value to player1.name_saved
    player1.name_saved2 = str(player1.name_saved)  # converts the value of player1.name_saved to a string and saves that string in player1.name_saved2
    player1.name = player1.name_saved2  # assigns the value of player1.name_saved2 to the class attribute player1.name
    player1.char_type_saved = f.readlines(1)  # reads the second line of the txt file and saves it in player1.char_type_saved
    player1.char_type_saved2 = str(player1.char_type_saved)  # converts the value of player1.char_type_saved into a string and assigns that value to player1.char_type_saved2
At this point, I would assign the value of player1.char_type_saved2 to the class attribute player1.char_type so that the value of player1.char_type enables the player to load the previous character type from the last time they played the game. This should make the value of player1.char_type = 'Wizard' but I'm getting '['Wizard\n']'
I tried the following to remove the brackets and \n:
final_player1.char_type = player1.char_type_saved2.translate({ord(c): None for c in "[']\n" }) #this is intended to remove everything from the string except for Wizard
For some reason, the above only removes the square brackets and punctuation marks but not \n from the end.
I then tried the following to remove \n:
final_player1.char_type = final_player1.char_type.replace("\n", "")
final_player1.char_type is still 'Wizard\n'
I've also tried using strip() but I've been unsuccessful.
If anyone could help me with this I would greatly appreciate it. Sorry if I have overcomplicated this question but it's hard to articulate it without lots of info. Let me know if this is too much or if more info is needed to answer.
If '\n' is always at the end it may be best to use:
s = 'wizard\n'
s = s[:-1]
print(s, s)
Output:
wizard wizard
But I still think strip() is best:
s = 'wizard\n'
s = s.strip()
print(s, s)
Output:
wizard wizard
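It may help to see why translate(), replace() and strip() all seemed to fail in the question. f.readlines(1) returns a list, and str() of a list produces its repr, in which the newline is spelled as two ordinary characters: a backslash and an 'n'. All three calls look for the real newline character, so none of them touch that pair. A short demonstration, with io.StringIO standing in for save.txt (the file contents are illustrative):

```python
import io

# io.StringIO stands in for open("save.txt"); the name line is made up.
save_file = io.StringIO("Merlin\nWizard\n")
save_file.readline()              # consume the name line

as_list = save_file.readlines(1)  # ['Wizard\n']: a list, not a string
as_text = str(as_list)            # the list's repr: the newline becomes backslash + 'n'
print("\n" in as_text)            # False: no real newline left to strip
cleaned = as_text.translate({ord(c): None for c in "[']\n"})
print(cleaned)                    # Wizard\n  (a literal backslash and 'n')

# The fix: read the line as a string and remove the real newline.
save_file.seek(0)
save_file.readline()
char_type = save_file.readline().rstrip("\n")
print(repr(char_type))            # 'Wizard'
```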
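Normally it should work with just:
char_type = "Wizard\n"
char_type = char_type.replace("\n", "")
print(char_type)
The output will be "Wizard". Note that replace() returns a new string rather than modifying char_type in place, so the result has to be assigned back.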
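Putting it together, a minimal save/load sketch. The Player class, its three attributes and the sample values are hypothetical stand-ins for the game's real class. The idea is to write one value per line, then read each line back with readline() and drop the newline with rstrip("\n"), converting numeric fields back with int() because the file stores everything as text:

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class Player:  # hypothetical stand-in for the game's player class
    name: str
    char_type: str
    life: int


def save_player(player, path):
    # Write one value per line; every line ends with "\n".
    with open(path, "w") as f:
        f.write(f"{player.name}\n{player.char_type}\n{player.life}\n")


def load_player(path):
    # Read the values back in the same order, dropping each trailing newline.
    with open(path) as f:
        name = f.readline().rstrip("\n")
        char_type = f.readline().rstrip("\n")
        life = int(f.readline())  # int() ignores surrounding whitespace, including "\n"
    return Player(name, char_type, life)


# A temporary directory keeps the demo self-contained; the game would use "save.txt".
save_path = os.path.join(tempfile.mkdtemp(), "save.txt")
save_player(Player("Merlin", "Wizard", 100), save_path)
print(load_player(save_path))  # Player(name='Merlin', char_type='Wizard', life=100)
```

Using with open(...) also closes the file automatically, which the original load code never did.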

extract a certain quote after a keyword has been detected in Python 3

I'm trying to make a multi-term definer to quicken the process of searching for the definitions individually.
After python loads a webpage, it saves the page as a temporary text file.
Sample of saved page: ..."A","Answer":"","Abstract":"Harriet Tubman was an American abolitionist.","ImageIs...
In this sample, I'm after the string that contains the definition, in this case "Harriet Tubman was an American abolitionist." The string "Abstract": always comes right before the definition of the term.
What I need is a way to scan the text file for "Abstract":. Once that has been detected, look for an opening ". Then, copy and save all text to another text file until reaching the end ".
If you just wanted to find the string following "Abstract": you could take a substring.
page = '..."A","Answer":"","Abstract":"Harriet Tubman was an American abolitionist.","ImageIs...'
i = page.index("Abstract") + 11  # skip past Abstract":" (11 characters) to the start of the value
defn = page[i: page.index("\"", i)]
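The same substring idea can be wrapped so it also covers the file handling the question asks for. In this sketch, extract_abstract and copy_abstract are hypothetical names, and the source and destination paths are whatever the program uses. Searching for the full marker '"Abstract":"' avoids the magic offset, and returning None when the marker is missing avoids the ValueError that page.index() would raise. One limitation remains: a definition containing an escaped quote (\") would be cut off at that quote.

```python
def extract_abstract(page):
    """Return the text between the quotes that follow "Abstract": in page, or None if absent."""
    marker = '"Abstract":"'
    start = page.find(marker)
    if start == -1:
        return None
    start += len(marker)  # equivalent to the + 11 offset, without the magic number
    end = page.find('"', start)
    return page[start:end] if end != -1 else None


def copy_abstract(src_path, dest_path):
    """Read a saved page from src_path and append its abstract, if any, to dest_path."""
    with open(src_path, encoding="utf-8") as src:
        definition = extract_abstract(src.read())
    if definition is not None:
        with open(dest_path, "a", encoding="utf-8") as dest:
            dest.write(definition + "\n")
    return definition


page = '..."A","Answer":"","Abstract":"Harriet Tubman was an American abolitionist.","ImageIs...'
print(extract_abstract(page))  # Harriet Tubman was an American abolitionist.
```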
If you wanted to extract multiple parts of the page, you could try the following.
dict_str = '"Answer":"","Abstract":"Harriet Tubman was an American abolitionist."'
definitions = {}
for kv in dict_str.split(","):
    parts = kv.replace("\"", "").split(":")
    if len(parts) != 2:
        continue
    definitions[parts[0]] = parts[1]
definitions['Abstract']  # 'Harriet Tubman was an American abolitionist.'
definitions["Answer"]  # ''
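Splitting on "," and ":" by hand breaks as soon as a value itself contains a comma or a colon. The sample looks like JSON, so if the saved file holds the complete response as one valid JSON object (an assumption, since only a fragment is shown), the standard json module is a sturdier option. load_abstract is a hypothetical helper name:

```python
import json


def load_abstract(path):
    """Return the "Abstract" field of a saved page.

    Assumes the file holds one complete, valid JSON object.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return data.get("Abstract", "")


# The same parsing on an in-memory string; commas and colons inside values survive.
page = '{"Answer": "", "Abstract": "A definition, with a comma: and a colon."}'
data = json.loads(page)
print(data["Abstract"])      # A definition, with a comma: and a colon.
print(repr(data["Answer"]))  # ''
```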

Searching in a .txt file (JSON format) for a particular string and then printing a specific key

I have to take input from the user in the form of strings and then search for it in a .txt file which is in JSON format. If the text matches, X has to be done, otherwise Y. For example, if the user enters 'mac', my code should display the complete name(s) of the items whose names contain the search term 'mac'.
My JSON file currently has Big Mac as an item, and when I search for 'mac' it shows nothing, whereas it should display (0 Big Mac). 0 is the index number, which is also required.
if option == 's':
    if 'name' in open('data.txt').read():
        sea = input("Type a menu item name to search for: ")
        with open('data.txt', 'r') as data_file:
            data = json.load(data_file)
        for line in data:
            if sea in line:
                print(data[index]['name'])
            else:
                print('No such item exist')
    else:
        print("The list is empty")
main()
I have applied a number of solutions but none works.
See How to search if dictionary value contains certain string with Python.
Since you know you are looking for the string within the value stored against the 'name' key, you can just change:
if sea in line
to:
if sea in line['name']
(or if sea in line.get('name', '') if there is a risk that one of your dictionaries might not have a 'name' key; without the '' default, get() returns None and the in test raises a TypeError).
However, you're attempting to use index without having set that up anywhere. If you need to keep track of where you are in the list, you'd be better off using enumerate:
for index, line in enumerate(data):
    if sea.lower() in line['name'].lower():
        print((index, line['name']))
If you want 'mac' to match 'Big Mac', you need case-insensitive matching ('m' is not the same as 'M'). The loop above handles this by converting both strings to lower case before comparing.
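A self-contained sketch combining both fixes (enumerate for the index, lower() for case-insensitive matching). search_items is a hypothetical helper, and the data is assumed to be a list of dicts with a "name" key, as json.load() returns for a JSON array. It collects the matches first, so "No such item exists" is printed once instead of once for every non-matching item, as the else inside the question's loop would do:

```python
import json


def search_items(items, term):
    """Return (index, name) pairs whose 'name' contains term, ignoring case."""
    term = term.lower()
    return [(index, item.get("name", ""))
            for index, item in enumerate(items)
            if term in item.get("name", "").lower()]


# json.loads on a literal stands in for json.load(open('data.txt')).
data = json.loads('[{"name": "Big Mac"}, {"name": "Fries"}]')
matches = search_items(data, "mac")
if matches:
    for index, name in matches:
        print(index, name)  # 0 Big Mac
else:
    print("No such item exists")
```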

Separating comma delimited text within a field using Python

I'm currently trying to convert a table into RDF using Python and attach the values from each cell to the end of a URL (e.g. E00 becomes statistics.data.gov.uk/id/statistical-geography/E00).
I can do this for cells containing a single value using this script:
FirstCode = row[11]
if row[11] != '':
    RDF = RDF + '<http://statistics.data.gov.uk/id/statistical-geography/' + FirstCode + '>.\n'
One field within the database contains multiple values that are comma delimited.
The code above therefore returns all the codes appended to the URL
e.g. http://statistics.data.gov.uk/id/statistical-geography/E00,W00,S00
Whereas I'd like it to return three values
statistics.data.gov.uk/id/statistical-geography/E00
statistics.data.gov.uk/id/statistical-geography/W00
statistics.data.gov.uk/id/statistical-geography/S00
Is there some code that will allow me to separate these out?
Yes, there is the split method.
FirstCode.split(",")
will return a list: ['E00', 'W00', 'S00']
You can then iterate over the items in the list:
for i in FirstCode.split(","):
    print(i)
Will print out:
E00
W00
S00
Applied to your RDF code, that becomes:
for i in FirstCode.split(','):
    RDF = RDF + '<http://statistics.data.gov.uk/id/statistical-geography/' + i + '>.\n'
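A slightly more defensive sketch of the same loop, written as a hypothetical helper codes_to_rdf. It strips spaces around each code (in case the field is written as "E00, W00, S00"), skips empty pieces so an empty cell or a trailing comma produces no bare URL, and builds the output with "".join(), which avoids repeated string concatenation:

```python
BASE = "http://statistics.data.gov.uk/id/statistical-geography/"


def codes_to_rdf(cell):
    """Turn a comma-separated cell such as 'E00,W00,S00' into one '<URI>.' line per code."""
    lines = []
    for code in cell.split(","):
        code = code.strip()  # tolerate spaces after commas
        if code:             # skip empty cells and trailing commas
            lines.append(f"<{BASE}{code}>.\n")
    return "".join(lines)


print(codes_to_rdf("E00,W00,S00"), end="")
```

Inside the original loop over rows, this would be used as RDF += codes_to_rdf(row[11]).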
