Check if multiple dictionary keys are located in a string - python

Imagine having a txt file with content like
5843092xxx289421xxx832175xxx...
You have a dictionary with keys corresponding to letters.
I am trying to search for each key within the string to output a message.
decoder = {5843092:'a', 289421:'b'}
with open( "code.txt","r") as fileTxt:
    fileTxt = fileTxt.readlines()
b = []
for key in decoder.keys():
    if key in fileTxt:
        b.append(decoder[key])
print(b)
This is what I have. I feel like I'm on the right track, but maybe I am missing how to do each iteration?
The goal output in this example would be either a list or a string: ab...

There are two problems here:
You have a list of strings, and you're treating it as if it's one string.
You're building your output based on the order the keys appear in the decoder dictionary, rather than the order they appear in the input text. That means the message will be all scrambled.
If the input text actually separates each key with a fixed string like xxx, the straightforward solution is to split on that string:
for line in fileTxt:
    print(' '.join(decoder.get(int(key), '?') for key in line.split('xxx')))
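A slightly more defensive version of the split approach might look like the sketch below. The helper name decode_line and its sep parameter are made up for illustration, and it assumes the keys are integers as in the question. It skips empty pieces (for example after a trailing xxx) and maps anything it cannot decode to '?'.

```python
decoder = {5843092: 'a', 289421: 'b'}

def decode_line(line, table, sep='xxx'):
    """Decode one separator-delimited line, keeping the input order."""
    letters = []
    for token in line.strip().split(sep):
        if not token:
            continue  # empty piece, e.g. produced by a trailing separator
        # The dictionary keys are ints, so convert; non-numeric tokens get no key
        key = int(token) if token.isdigit() else None
        letters.append(table.get(key, '?'))
    return ''.join(letters)

message = decode_line('5843092xxx289421xxx832175xxx', decoder)
print(message)
```

Joining with '' gives a single string; use a different joiner, or return the list itself, if you prefer separated letters.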


Converting a file in list format into a dictionary with multiple conditions. (python)

Disclaimer: sorry if I have not expressed my issue clearly. The terminology is still new to me. Thank you in advance for reading.
I have a function named
def pluralize(word)
The aim is to pluralize all nouns within a file. The output I want is: {'plural': word_in_plural, 'status': x}
word_in_plural is the pluralized version of the input argument (word), and x is a string that can have one of the following values: 'empty_string', 'proper_noun', 'already_in_plural', 'success'.
My code so far looks like this:
filepath = '/proper_noun.txt'
def pluralize(word):
    proper_nouns = [line.strip() for line in open(filepath)]  ### reads in file as list when function is called
    dictionary = {'plural': word_in_plural, 'status': x}  ### defined dictionary
    if word == '':  ### if word is an empty string, return values: word_in_plural = '' and x = 'empty_string'
        dictionary['plural'] = ''
        dictionary['status'] = 'empty_string'
        return dictionary
What you can see above is my attempt at a condition that returns the specified values if the word is an empty string.
The next goal is a condition for words that are already plural (assuming they end with 's', 'es', 'ies', etc.): the function should return a dictionary with the values word_in_plural = word and x = 'already_in_plural', so the input word remains untouched, e.g. (input: apartments, output: apartments).
if word ### is already in plural (ending with a plural suffix), function returns a dictionary with values: word_in_plural = word and x = 'already_in_plural'
Any ideas on how to read the last characters of the string to implement the rules? I also very much doubt my logic.
Thank you for your input, SOF community.
You can index the word with -1 to get its last character. You can slice a string to get the last two ([-2:]) or last three ([-3:]) characters:
last_char = word[-1]
last_three_char = word[-3:]
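Putting the indexing and slicing to work, the function from the question could be sketched roughly as follows. Several things here are assumptions rather than part of the question: the proper-noun list is passed in as a parameter instead of being read from a file, a proper noun is returned unchanged, any word ending in 's' counts as already plural (the asker's own simplification), and the rules in the 'success' branch are a simplified set of English rules picked for illustration.

```python
def pluralize(word, proper_nouns=()):
    """Return {'plural': ..., 'status': ...} for a single word."""
    if word == '':
        return {'plural': '', 'status': 'empty_string'}
    if word in proper_nouns:
        # Assumption: proper nouns are returned untouched
        return {'plural': word, 'status': 'proper_noun'}
    if word[-1] == 's':
        # 's', 'es' and 'ies' all end in 's', so one check covers them
        return {'plural': word, 'status': 'already_in_plural'}
    # Simplified rules, chosen for illustration only
    if word[-1] == 'y' and word[-2:-1] not in ('', 'a', 'e', 'i', 'o', 'u'):
        plural = word[:-1] + 'ies'   # consonant + y: city -> cities
    elif word[-1] in ('x', 'z') or word[-2:] in ('ch', 'sh'):
        plural = word + 'es'         # box -> boxes, church -> churches
    else:
        plural = word + 's'          # apartment -> apartments
    return {'plural': plural, 'status': 'success'}

print(pluralize('apartments'))
print(pluralize('city'))
```

Returning a fresh dictionary from each branch avoids having to define word_in_plural and x before they are known.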

How to Ignore returning a key value in Python Dictionary

I need to write a function create_dictionary(filename) that reads the named file and returns a dictionary mapping from object names to occurrence counts (the number of times the particular object was guessed). For example, given a file mydata.txt containing the following:
abacus
calculator
modern computer
abacus
modern computer
large white thing
modern computer
Here's my program, and it works fine for non-empty text files.
from collections import Counter
def create_dictionary(filename):
    """Cool Program"""
    keys = Counter()
    s = open(filename,'r').read().strip()
    keys = (Counter(s.split('\n')))
    return keys
dictionary = create_dictionary('mydata.txt')
for key in dictionary:
    print(key + ': ' + str(dictionary[key]))
The output will be:
{'abacus': 2, 'calculator': 1, 'modern computer': 3, 'large white thing': 1}
When I have an empty file (e.g. blank.txt), the function must ignore any and all blank lines, so the print loop should print nothing. But I am getting ': 1' no matter what I try.
Here are some constraints:
You may assume the given file exists, but it may be empty (i.e. containing no lines).
Keys must be inserted into the dictionary in the order in which they appear in the input file.
Leading and trailing whitespace should be stripped from object names
Empty object names (e.g. blank lines or lines with only whitespace) should be ignored.
Any advice?
lines = open(filename,'r').readlines()
keys = Counter([line.strip() for line in lines if line.strip()])
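For context, here is how those two lines might fit into the complete function. This is a sketch: the demo file contents are made up, and it relies on Counter (a dict subclass) keeping insertion order, which holds on Python 3.7 and later. The ': 1' in the question comes from ''.split('\n') returning [''], so an empty file still yields one empty key; filtering on the stripped line removes it.

```python
import os
import tempfile
from collections import Counter

def create_dictionary(filename):
    """Map each non-blank, stripped line of the file to its occurrence count."""
    with open(filename, 'r') as f:
        # Strip every line once; blank lines become empty strings
        names = [line.strip() for line in f]
    # Drop the empty names and count the rest in order of first appearance
    return Counter(name for name in names if name)

# Demo with throwaway files (sample contents invented for illustration)
with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, 'mydata.txt')
    with open(data_path, 'w') as f:
        f.write('abacus\n\n  calculator \nabacus\n   \n')
    blank_path = os.path.join(tmp, 'blank.txt')
    open(blank_path, 'w').close()  # create an empty file

    counts = create_dictionary(data_path)
    empty_counts = create_dictionary(blank_path)

for key in counts:
    print(key + ': ' + str(counts[key]))
```

With the empty file, the loop body never runs, so nothing is printed.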

python: Dictionary Operation, KeyError: '0'

I want to extract information from a text file and store the data in a dictionary. The file records colon-separated values. That is, it records one property per line, and the key and the value are separated by a colon followed by a space.
def info_file_parser(info_file):
    f=open(info_file,'rb')
    info={}
    for line in f:
        line=line.rstrip()
        mlist=line.split(": ")
        if len(mlist)==2:
            info[mlist[0]]=info[mlist[1]]
if __name__=='__main__':
    fname="/Users/me/info.txt"
    info_file_parser(fname)
This raises KeyError: '0'. What's wrong with that? Why can't I create keys by assignment?
You are trying to read a value from your dictionary using a key that doesn't exist yet.
In the if statement of your function, shouldn't it be:
if len(mlist)==2:
    info[mlist[0]]=mlist[1]
This line:
info[mlist[0]]=info[mlist[1]]
tries to look up the value stored in the info dictionary under the key mlist[1], which has not been set, so it raises the KeyError. Try this instead:
info[mlist[0]]=mlist[1]
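With that fix applied, the parser could look something like this sketch. It takes any iterable of text lines (an open file works too) so it can be tried without a file on disk; the function name and the sample lines are invented for illustration. It also returns the dictionary, which the original function never did.

```python
def parse_info_lines(lines):
    """Build a dict from lines of the form 'key: value'."""
    info = {}
    for line in lines:
        parts = line.rstrip().split(': ')
        if len(parts) == 2:
            key, value = parts
            info[key] = value  # assign the parsed value, not info[value]
    return info

# Hypothetical sample data standing in for the real info file
sample = ['width: 0\n', 'name: demo\n', 'malformed line\n']
info = parse_info_lines(sample)
print(info)
```

To parse a real file, pass the open file object, for example parse_info_lines(open(fname)) in text mode.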

Using a dictionary as regex in Python

I had a Python question I was hoping for some help on.
Let's start with the important part, here is my current code:
import re #for regex
import numpy as np #for matrix
f1 = open('file-to-analyze.txt','r') #file to analyze
#convert files of words into arrays.
#These words are used to be matched against in the "file-to-analyze"
math = open('sample_math.txt','r')
matharray = list(math.read().split())
math.close()
logic = open('sample_logic.txt','r')
logicarray = list(logic.read().split())
logic.close()
priv = open ('sample_priv.txt','r')
privarray = list(priv.read().split())
priv.close()
... Read in 5 more files and make associated arrays
#convert arrays into dictionaries
math_dict = dict()
math_dict.update(dict.fromkeys(matharray,0))
logic_dict = dict()
logic_dict.update(dict.fromkeys(logicarray,1))
...Make more dictionaries from the arrays (8 total dictionaries - the same number as there are arrays)
#create big dictionary of all keys
word_set = dict(math_dict.items() + logic_dict.items() + priv_dict.items() ... )
statelist = list()
for line in f1:
    for word in word_set:
        for m in re.finditer(word, line):
            print word.value()
The goal of the program is to take a large text file and perform analysis on it. Essentially, I want the program to loop through the text file and match words found in Python dictionaries and associate them with a category and keep track of it in a list.
So for example, let's say I was parsing through the file and I ran across the word "ADD". ADD is listed under the "math" or '0' category of words. The program should then add it to a list that it ran across a 0 category and then continue to parse the file. Essentially generating a large list that looks like [0,4,6,7,4,3,4,1,2,7,1,2,2,2,4...] with each of the numbers corresponding to a particular state or category of words as illustrated above. For the sake of understanding, we'll call this large list 'statelist'
As you can tell from my code, so far I can take as input the file to analyze, take and store the text files that contain the list of words into arrays and from there into dictionaries with their correct corresponding list value (a numerical value from 0 - 7). However, I'm having trouble with the analysis portion.
As you can tell from my code, I'm trying to go line by line through the text file and regex any of the found words with the dictionaries. This is done through a loop and regexing with an additional, 9th dictionary that is more or less a "super" dictionary to help simplify the parsing.
However, I'm having trouble matching all the words in the file and, when I find a word, mapping it to its dictionary value rather than the key. That is, when the program runs across "ADD", it should add 0 to the list, because "ADD" is part of the 0 or "math" category.
Would someone be able to help me figure out how to write this script? I really appreciate it! Sorry for the long post, but the code requires a lot of explanation so you know what's going on. Thank you so much in advance for your help!
The simplest change to your existing code would be to keep track of both the word and the category in the loop:
for line in f1:
    for word, category in word_set.iteritems():
        for m in re.finditer(word, line):
            print word, category
            statelist.append(category)
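One thing to watch with the loop above: within a line, categories are appended in dictionary iteration order, not in the order the words occur in the text, and a plain pattern like ADD also matches inside ADDED. A possible alternative, sketched here in Python 3 with a made-up word list, is to compile all the words into one pattern with word boundaries and walk the matches in text order.

```python
import re

# Hypothetical word -> category mapping standing in for the real word files
word_set = {'ADD': 0, 'SUB': 0, 'AND': 1, 'OR': 1}

# One alternation of all words, escaped so they are matched literally;
# longer words are listed first, and \b restricts matches to whole words
pattern = re.compile(
    r'\b('
    + '|'.join(re.escape(w) for w in sorted(word_set, key=len, reverse=True))
    + r')\b'
)

def categories_in(lines):
    """Return the category of every matched word, in order of appearance."""
    statelist = []
    for line in lines:
        for m in pattern.finditer(line):
            statelist.append(word_set[m.group(1)])
    return statelist

statelist = categories_in(['ADD r1 AND r2', 'x OR y SUB z ADDED'])
print(statelist)
```

This also scans each line once instead of once per word, which may matter for a large file.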

Scan through txt, append certain data to an empty list in Python

I have a text file that I am reading in Python. I'm trying to extract certain elements that follow keywords in the text file and append them to empty lists. The file looks like this:
So I want to make two empty lists:
The 1st list will hold the sequence names.
The 2nd list will be a list of lists in the format [Bacteria, Phylum, Class, Order, Family, Genus, Species].
Most of the organisms will be Uncultured bacterium. I am trying to add the Uncultured bacterium with the following IDs that are separated by ;
Is there any way to scan for a certain word and, when the word is found, take the word that comes after it (separated by a '\t')?
I need this to create a dictionary that translates each sequence name to its taxonomic data.
I know I will need an empty list to append the names to:
seq_names=[ ]
A second list to put the taxonomy lists into:
taxonomy=[ ]
And a 3rd list that will be reset after every iteration:
temp = [ ]
I'm sure it can be done in Biopython, but I'm working on my Python skills.
Yes, there is a way.
You can split the string you get from reading the file into a list using the built-in split method. From this you can find the index of the word you are looking for and then use this index plus one to get the word after it. For example, using a text file called test.txt that looks like this (the formatting is a bit weird because SO doesn't seem to like hard tabs):
one two three four five six seven eight nine
The following code
f = open('test.txt','r')
string = f.read()
words = string.split('\t')
ind = words.index('seven')
desired = words[ind+1]
will return desired as 'eight'
Edit: To return every following word in the list
f = open('test.txt','r')
string = f.read()
words = string.split('\t')
desired = [words[ind+1] for ind, word in enumerate(words) if word == "seven"]
This uses a list comprehension. It enumerates the list of words and, wherever the word is the one you are looking for, includes the word at the next index in the result.
Edit2: To split on both newlines and tabs, you can use regular expressions
import re
f = open('testtest.txt','r')
string = f.read()
words = re.split('\t|\n',string)
desired = [words[ind+1] for ind, word in enumerate(words) if word == "seven"]
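Note that words[ind+1] raises an IndexError if the keyword happens to be the very last word. A guarded variant might look like the sketch below; the function name words_after and the sample text are invented for illustration.

```python
import re

def words_after(text, keyword):
    """Return the word following each occurrence of keyword."""
    words = re.split(r'[\t\n]', text)  # split on tabs and newlines
    # Iterate over all but the last word so that i + 1 is always valid
    return [words[i + 1] for i, word in enumerate(words[:-1]) if word == keyword]

sample = 'one\ttwo\tseven\teight\nseven\tnine\nseven'
desired = words_after(sample, 'seven')
print(desired)
```

The final 'seven' has nothing after it, so it simply contributes no entry.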
It sounds like you might want a dictionary indexed by sequence name. For instance,
my_data = {
'some_sequence': [Bacteria,Phylum,Class,Order, Family, Genus, Species],
'some_other_sequence': [Bacteria,Phylum,Class,Order, Family, Genus, Species]
}
Then, you'd just access my_data['some_sequence'] to pull up the data about that sequence.
To populate your data structure, I would just loop over the lines of the files, .split('\t') to break them into "columns" and then do something like my_data[the_row[0]] = [the_row[10], the_row[11], the_row[13]...] to load the row into the dictionary.
So,
for row in inp_file.readlines():
    row = row.split('\t')
    my_data[row[0]] = [row[10], row[11], row[13], ...]
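As a rough, runnable version of that loop: the real file layout is not shown in the question, so the sketch below assumes a hypothetical layout in which the first tab-separated column is the sequence name and the second is a ;-separated taxonomy string. The column indices, the function name and the sample rows are all assumptions to be adjusted to the actual file.

```python
def build_taxonomy(lines, name_col=0, tax_col=1):
    """Map sequence name -> list of taxonomy levels (hypothetical layout)."""
    my_data = {}
    for line in lines:
        row = line.rstrip('\n').split('\t')
        if len(row) <= max(name_col, tax_col):
            continue  # skip rows that lack the expected columns
        # Split the taxonomy field on ';' and drop empty pieces
        levels = [part.strip() for part in row[tax_col].split(';') if part.strip()]
        my_data[row[name_col]] = levels
    return my_data

# Invented sample rows, for illustration only
sample = [
    'seq1\tBacteria;Firmicutes;Bacilli\n',
    'seq2\tBacteria;Proteobacteria\n',
    'broken line\n',
]
my_data = build_taxonomy(sample)
print(my_data['seq1'])
```

A dictionary keyed by sequence name replaces the separate seq_names and taxonomy lists, since the names are the keys and the taxonomy lists are the values.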
