I have the following text in a text file:
InfoType 0 :
string1
string2
string3
InfoType 1 :
string1
string2
string3
InfoType 3 :
string1
string2
string3
Is there a way to create a dictionary that would look like this:
{'InfoType 0':'string1,string2,string3', 'InfoType 1':'string1,string2,string3', 'InfoType 3':'string1,string2,string3'}
Something like this should work:
def my_parser(fh, key_pattern):
    d = {}
    for line in fh:
        if line.startswith(key_pattern):
            name = line.strip()
            break
    # This list will hold the lines
    lines = []
    # Now iterate to find the lines
    for line in fh:
        line = line.strip()
        if not line:
            continue
        if line.startswith(key_pattern):
            # When in this block we have reached
            # the next record
            # Add to the dict
            d[name] = ",".join(lines)
            # Reset the lines and save the
            # name of the next record
            lines = []
            name = line
            # skip to next line
            continue
        lines.append(line)
    d[name] = ",".join(lines)
    return d
Use like so:
with open("myfile.txt", "r") as fh:
    d = my_parser(fh, "InfoType")
# {'InfoType 0 :': 'string1,string2,string3',
#  'InfoType 1 :': 'string1,string2,string3',
#  'InfoType 3 :': 'string1,string2,string3'}
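There are limitations, such as:
Duplicate keys: a later record with the same name overwrites the earlier one.
The key needs processing: it still carries the trailing " :".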
You could get around these by making the function a generator and yielding name, str pairs and processing them as you read the file.
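A minimal sketch of that generator approach follows. The name `iter_records` is hypothetical, and stripping the trailing " :" from the key is an assumption about how the key should be processed. Because pairs are yielded one at a time, the caller decides what to do with duplicate names.

```python
import io


def iter_records(fh, key_pattern):
    """Yield (name, joined_lines) pairs, one per record, in file order."""
    name = None
    lines = []
    for raw in fh:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(key_pattern):
            if name is not None:
                # A new header closes the previous record
                yield name, ",".join(lines)
            # Assumption: drop the trailing " :" so the key reads "InfoType 0"
            name = line.rstrip(" :")
            lines = []
        elif name is not None:
            lines.append(line)
    if name is not None:
        yield name, ",".join(lines)


# In-memory sample with a repeated "InfoType 0" header
sample = io.StringIO(
    "InfoType 0 :\nstring1\nstring2\n\nInfoType 1 :\nstring3\nInfoType 0 :\nstring4\n"
)
records = list(iter_records(sample, "InfoType"))
```

Here `records` keeps both "InfoType 0" entries as separate pairs, so nothing is silently overwritten; building a dict from them is then an explicit choice by the caller.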
This will do:
dictionary = {}
# Replace ``file.txt`` with the path of your text file.
with open('file.txt', 'r') as file:
    for line in file:
        if not line.strip():
            continue
        if line.startswith('InfoType'):
            key = line.rstrip('\n :')
            dictionary[key] = ''
        else:
            value = line.strip('\n')
            # Add a comma only between values, so there is no trailing comma
            dictionary[key] += (',' if dictionary[key] else '') + value
I want to know if it's possible to save the output of this code into a dictionary (maybe that's also the wrong data type). I'm not experienced in coding yet, so I can't think of a way it could work.
I want to create a dictionary that has the lines of the .txt file in it alongside the value of the corresponding line. In the end, I want to write code where the user can search for a word in the lines through an input, and the output should return the corresponding line. Does anyone have a suggestion? Thanks in advance! Cheers!
filepath = 'myfile.txt'
with open(filepath) as fp:
    line = fp.readline()
    cnt = 1
    while line:
        print("Line {}: {}".format(cnt, line.strip()))
        line = fp.readline()
        cnt += 1
This should do it (using the code you provided as a framework, it only takes one extra line to store it in a dictionary):
my_dict = {}
filepath = 'myfile.txt'
with open(filepath) as fp:
    line = fp.readline()
    cnt = 1
    while line:
        # print("Line {}: {}".format(cnt, line.strip()))
        my_dict[str(line.strip())] = cnt
        line = fp.readline()
        cnt += 1
Then, you can prompt for user input like this:
usr_in = input('enter text to search: ')
print('That text is found at line(s) {}'.format(
    [v for k, v in my_dict.items() if usr_in in k]))
To store each line's text as the dictionary key and its line number as the value, you can try something like:
filepath = 'myfile.txt'
result_dict = {}
with open(filepath) as fp:
    for line_num, line in enumerate(fp.readlines()):
        result_dict[line.strip()] = line_num+1
Or, using a dictionary comprehension, the above code becomes:
filepath = 'myfile.txt'
with open(filepath) as fp:
    result_dict = {line.strip(): line_num+1
                   for line_num, line in enumerate(fp.readlines())}
Now, to search for and return all the lines containing a word (here search_word, which you would set from the user's input):
search_result = [{key: value} for key, value in result_dict.items()
                 if search_word in key]
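One caveat with using the line text as the key: if the same text appears on several lines, later line numbers overwrite earlier ones. A possible variation, sketched here under the assumption that you want every occurrence, stores a list of line numbers per text. The names `index_lines` and `search` are hypothetical, and the sample text is made up for illustration.

```python
import io
from collections import defaultdict


def index_lines(fp):
    """Map each stripped line to the list of 1-based line numbers where it occurs."""
    index = defaultdict(list)
    for line_num, line in enumerate(fp, start=1):
        index[line.strip()].append(line_num)
    return index


def search(index, search_word):
    """Return {line_text: [line numbers]} for every line containing search_word."""
    return {text: nums for text, nums in index.items() if search_word in text}


# Made-up sample standing in for open('myfile.txt')
sample = io.StringIO("apple pie\nbanana\napple pie\ncherry apple\n")
index = index_lines(sample)
hits = search(index, "apple")
```

With a real file you would pass the object returned by `open(filepath)` instead of the `io.StringIO` sample.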
I have many lines like the following:
>ENSG00000003137|ENST00000001146|CYP26B1|72374964|72375167|4732
CGTCGTTAACCGCCGCCATGGCTCCCGCAGAGGCCGAGT
>ENSG00000001630|ENST00000003100|CYP51A1|91763679|91763844|3210
TCCCGGGAGCGCGCTTCTGCGGGATGCTGGGGCGCGAGCGGGACTGTTGACTAAGCTTCG
>ENSG00000003137|ENST00000412253|CYP26B1|72370133;72362405|72370213;72362548|4025
AGCCTTTTTCTTCGACGATTTCCG
In this example, ENSG00000003137 is the name and 4732, the last field, is the length. As you can see, some names are repeated but have different lengths.
I want to make a new file that keeps only the entry with the longest length for each name, meaning the result would look like this:
>ENSG00000003137|ENST00000001146|CYP26B1|72374964|72375167|4732
CGTCGTTAACCGCCGCCATGGCTCCCGCAGAGGCCGAGT
>ENSG00000001630|ENST00000003100|CYP51A1|91763679|91763844|3210
TCCCGGGAGCGCGCTTCTGCGGGATGCTGGGGCGCGAGCGGGACTGTTGACTAAGCTTCG
I have made this code to split but don't know how to make the file I want:
file = open("file.txt", "r")
for line in file:
    if line.startswith(">"):
        line = line.split("|")
You'll need to read the file twice; the first time round, track the largest size per entry:
largest = {}
with open(inputfile) as f:
    for line in f:
        if line.startswith('>'):
            parts = line.split('|')
            name, length = parts[0][1:], int(parts[-1])
            largest[name] = max(length, largest.get(name, -1))
Then write out the copy in a second pass, but only those sections whose name and length match the largest length extracted in the first pass:
with open(inputfile) as f, open(outputfile, 'w') as out:
    copying = False
    for line in f:
        if line.startswith('>'):
            parts = line.split('|')
            name, length = parts[0][1:], int(parts[-1])
            copying = largest[name] == length
        if copying:
            out.write(line)
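If the file fits comfortably in memory, a single pass is a possible alternative to the two-pass approach above. This is only a sketch under that assumption: `longest_records` is a hypothetical helper that keeps the longest record seen so far for each name, and the sample is the data from the question held in an `io.StringIO` instead of a real file.

```python
import io


def longest_records(fh):
    """Return {name: (length, header, sequence_lines)} for the longest record per name."""
    best = {}
    current = None  # (name, length, header, sequence_lines) of the record being read

    def keep_if_longest(record):
        if record is None:
            return
        name, length, header, seq = record
        if name not in best or length > best[name][0]:
            best[name] = (length, header, seq)

    for line in fh:
        if line.startswith(">"):
            # A new header closes the previous record
            keep_if_longest(current)
            parts = line.rstrip("\n").split("|")
            current = (parts[0][1:], int(parts[-1]), line, [])
        elif current is not None:
            current[3].append(line)
    keep_if_longest(current)
    return best


sample = io.StringIO(
    ">ENSG00000003137|ENST00000001146|CYP26B1|72374964|72375167|4732\n"
    "CGTCGTTAACCGCCGCCATGGCTCCCGCAGAGGCCGAGT\n"
    ">ENSG00000001630|ENST00000003100|CYP51A1|91763679|91763844|3210\n"
    "TCCCGGGAGCGCGCTTCTGCGGGATGCTGGGGCGCGAGCGGGACTGTTGACTAAGCTTCG\n"
    ">ENSG00000003137|ENST00000412253|CYP26B1|72370133;72362405|72370213;72362548|4025\n"
    "AGCCTTTTTCTTCGACGATTTCCG\n"
)
best = longest_records(sample)
# Text that would be written to the output file
output = "".join(header + "".join(seq) for _, header, seq in best.values())
```

One behavioural difference to be aware of: when two records with the same name tie on length, this sketch keeps only the first, whereas the two-pass version copies every record that matches the largest length.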
You have to do two types of handling in the loop: one that compares your 'length', and one that stores the sequence (the CGTA line) when it's needed. I wrote an example for you that reads those into dicts:
file = open("file.txt", "r")
myDict = {}
myValueDict = {}
action = 'remember'
geneDict = {}
for line in file:
    if line.startswith(">"):
        line = line.rstrip().split("|")
        line_name = line[0]
        line_number = int(line[-1])
        if line_name in myValueDict:
            if myValueDict[line_name] < line_number:
                action = 'remember'
                myValueDict[line_name] = line_number
                myDict[line_name] = line
            else:
                action = 'forget'
        else:
            # First time this name is seen: always keep it
            action = 'remember'
            myDict[line_name] = line
            myValueDict[line_name] = line_number
    else:
        if action == 'remember':
            geneDict[line_name] = line.rstrip()
file.close()
for key in myDict:
    print(myDict[key])
for key in geneDict:
    print(geneDict[key])
This ignores the lower-length items. You can now store those dicts any way you want.
I would like to format the values of a dictionary in Python. Here is the script that I used to generate the output:
entries = {}
entries1 = {}
with open('no_dup.txt', 'r') as fh_in:
    for line in fh_in:
        if line.startswith('E'):
            line = line.strip()
            line = line.split()
            entry = line[0]
            if entry in entries:
                entries[entry].append(line)
            else:
                entries[entry] = [line]
with open('no_dup_out.txt', 'w') as fh_out:
    for kee, val in entries.iteritems():
        if len(val) == 1:
            fh_out.write("{} \n".format(val))
with open('no_dup_out.txt', 'r') as fh_in2:
    for line in fh_in2:
        line = line.strip()
        line = line.split()
        entry = line[1]
        if entry in entries1:
            entries1[entry].append(line)
        else:
            entries1[entry] = [line]
with open('no_dup_out_final.txt', 'w') as fh_out2:
    for kee, val in entries1.iteritems():
        if len(val) == 1:
            fh_out2.write("{} \n".format(val))
For example, running the above script generated the following output:
[["[['ENSGMOG00000003747',", "'ENSORLG00000006947']]"]]
[["[['ENSGMOG00000003752',", "'ENSORLG00000005385']]"]]
[["[['ENSGMOG00000003760',", "'ENSORLG00000005379']]"]]
[["[['ENSGMOG00000003748',", "'ENSORLG00000004636']]"]]
[["[['ENSGMOG00000003761',", "'ENSORLG00000005382']]"]]
I would like to format it in such a way that I remove all the brackets, quotes and commas (e.g. ENSGMOG00000003747 ENSORLG00000006947) and output the rest as it is, in tab-delimited format. How can I do that?
If your list of lists is full_list, then you could have the following code give your desired output:
desired_list = ['\t'.join([element.split('\'')[1] for element in list_item[0]]) for list_item in full_list]
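The brackets and quotes appear because the script writes the string form of a list to the file. As an alternative to cleaning them up afterwards, the fields could be joined with tabs at the point of writing. This is a sketch, not the answer's method: `format_unique` is a hypothetical helper, it assumes each value in `entries` is a list of already-split rows as in the question's script, and the second sample entry is made up to show a duplicate being skipped.

```python
def format_unique(entries):
    """Return tab-delimited lines for keys that have exactly one row."""
    return ["\t".join(rows[0]) for rows in entries.values() if len(rows) == 1]


entries = {
    "ENSGMOG00000003747": [["ENSGMOG00000003747", "ENSORLG00000006947"]],
    # Made-up duplicate entry: two rows share the same first column
    "ENSGMOG00000009999": [
        ["ENSGMOG00000009999", "ENSORLG00000000001"],
        ["ENSGMOG00000009999", "ENSORLG00000000002"],
    ],
}
lines = format_unique(entries)
```

Each element of `lines` can then be written with `fh_out.write(line + "\n")`, which keeps the intermediate file tab-delimited so the second stage can split it without stray brackets.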
I want to return a dictionary that a file contains. What I have is this code:
def read_report(filename):
    new_report = {}
    input_filename = open(filename)
    for line in input_filename:
        lines = line[:-1]
        new_report.append(lines)
    input_filename.close()
    return new_report
It says I can't append to a dictionary. So how would I go about adding lines from the file to the dictionary? Let's say my file contains this:
shorts: a, b, c, d
longs: a, b, c, d
mosts: a
count: 11
avglen: 1.0
a 5
b 3
c 2
d 1
I'm assuming the last lines of your files (the ones that don't contain :) are to be ignored.
from collections import defaultdict
d = defaultdict(list)
with open('somefile.txt') as f:
    for line in f:
        if ':' in line:
            key, val = line.split(':')
            # Strip each item so the values are 'a', 'b', ... without leading spaces
            d[key.strip()] += [v.strip() for v in val.split(',')]
def read_line(filename):
    lines = []
    new_report = {}
    file_name = open(filename)
    for i in file_name:
        lines.append(i[:-1])
    for i in range(len(lines)):
        new_report[i] = lines[i]
    file_name.close()
    return new_report
If you rewrite your input file to have uniform lines like the first and the second, you could try this:
EDIT: modified the code to also support lines with a space separator instead of a colon (:)
def read_report(filename):
    new_report = {}
    f = open(filename)
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.count(':') == 1:
            key, value = line.split(':')
        else:
            key, value = line.split(' ')
        # Strip each item so values carry no stray spaces or newlines
        new_report[key] = [v.strip() for v in value.split(',')]
    f.close()
    return new_report
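A possible variation that reads the sample file as it stands, without rewriting it, is sketched below. It assumes that lines containing a colon are "key: comma-separated values" and that the remaining lines are "letter count" pairs. `parse_report` and the `"counts"` key are hypothetical names chosen for illustration, and the function takes an open file object so it can be tried on an in-memory sample.

```python
import io


def parse_report(fh):
    """Parse 'key: a, b' lines into lists and 'letter count' lines into a sub-dict."""
    report = {"counts": {}}  # "counts" is a made-up key for the trailing lines
    for line in fh:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if ":" in line:
            key, value = line.split(":", 1)
            report[key.strip()] = [v.strip() for v in value.split(",")]
        else:
            letter, count = line.split()
            report["counts"][letter] = int(count)
    return report


sample = io.StringIO(
    "shorts: a, b, c, d\nlongs: a, b, c, d\nmosts: a\ncount: 11\navglen: 1.0\n"
    "a 5\nb 3\nc 2\nd 1\n"
)
report = parse_report(sample)
```

Values from the colon lines stay as lists of strings (for example `report["count"]` is `["11"]`); converting them to numbers is left to the caller.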
I have a plain text file with some data in it, that I'm trying to open and read using a Python (ver 3.2) program, and trying to load that data into a data structure within the program.
Here's what my text file looks like (the file is called "data.txt"):
NAME: Joe Smith
CLASS: Fighter
STR: 14
DEX: 7
Here's what my program looks like:
player_name = None
player_class = None
player_STR = None
player_DEX = None
f = open("data.txt")
data = f.readlines()
for d in data:
    # parse input, assign values to variables
    print(d)
f.close()
My question is, how do I assign the values to the variables (something like setting player_STR = 14 within the program)?
player = {}
f = open("data.txt")
data = f.readlines()
for line in data:
    # parse input, assign values to variables
    key, value = line.split(":")
    player[key.strip()] = value.strip()
f.close()
Now the name of your player will be player['NAME'], and the same goes for all other properties in your file.
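import re
# [\w ]+ matches word characters and spaces only, so a value stops at the end of its line
pattern = re.compile(r'(\w+): ([\w ]+)')
f = open("data.txt")
v = dict(pattern.findall(f.read()))
# Keys are upper-case, exactly as they appear in the file
player_name = v.get("NAME")
player_class = v.get("CLASS")
# ...
f.close()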
The most direct way to do it is to assign the variables one at a time:
f = open("data.txt")
for line in f:                       # loop over the file directly
    line = line.rstrip()             # remove the trailing newline
    if line.startswith('NAME: '):
        player_name = line[6:]
    elif line.startswith('CLASS: '):
        player_class = line[7:]
    elif line.startswith('STR: '):
        player_strength = int(line[5:])
    elif line.startswith('DEX: '):
        player_dexterity = int(line[5:])
    else:
        raise ValueError('Unknown attribute: %r' % line)
f.close()
That said, most Python programmers would store the values in a dictionary rather than in variables. The fields can be stripped (removing the line endings) and split with: characteristic, value = line.rstrip().split(':'). If the value should be a number instead of a string, convert it with float() or int().
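A minimal sketch of that dictionary approach follows. `parse_player` and its `int_fields` parameter are hypothetical names, and treating STR and DEX as the integer fields is an assumption based on the sample file.

```python
import io


def parse_player(fh, int_fields=("STR", "DEX")):
    """Read 'KEY: value' lines into a dict, converting the named fields to int."""
    player = {}
    for line in fh:
        line = line.rstrip()
        if not line:
            continue  # skip blank lines
        characteristic, value = line.split(":", 1)
        characteristic, value = characteristic.strip(), value.strip()
        if characteristic in int_fields:
            value = int(value)
        player[characteristic] = value
    return player


# In-memory copy of data.txt from the question
sample = io.StringIO("NAME: Joe Smith\nCLASS: Fighter\nSTR: 14\nDEX: 7\n")
player = parse_player(sample)
```

With the real file you would call `parse_player(open("data.txt"))` inside a `with` block; the strength is then available as `player["STR"]`, already an integer.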