Parsing Input File in Python

I have a plain text file with some data in it that I'm trying to open and read with a Python (version 3.2) program, loading the data into a data structure within the program.
Here's what my text file looks like (the file is called "data.txt"):
NAME: Joe Smith
CLASS: Fighter
STR: 14
DEX: 7
Here's what my program looks like:
player_name = None
player_class = None
player_STR = None
player_DEX = None
f = open("data.txt")
data = f.readlines()
for d in data:
    # parse input, assign values to variables
    print(d)
f.close()
My question is, how do I assign the values to the variables (something like setting player_STR = 14 within the program)?

player = {}
f = open("data.txt")
data = f.readlines()
for line in data:
    # parse input, assign values to variables
    key, value = line.split(":")
    player[key.strip()] = value.strip()
f.close()
Now the name of your player is available as player['NAME'] (the keys keep the capitalization used in the file), and the same goes for all the other properties in your file.

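import re
# Capture the key before ": " and the rest of the line as the value.
# "." does not match a newline, so each value stops at the end of its line.
pattern = re.compile(r'(\w+): (.+)')
f = open("data.txt")
v = dict(pattern.findall(f.read()))
player_name = v.get("NAME")
player_class = v.get("CLASS")
# ...
f.close()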

The most direct way to do it is to assign the variables one at a time:
f = open("data.txt")
for line in f: # loop over the file directly
    line = line.rstrip() # remove the trailing newline
    if line.startswith('NAME: '):
        player_name = line[6:]
    elif line.startswith('CLASS: '):
        player_class = line[7:]
    elif line.startswith('STR: '):
        player_strength = int(line[5:])
    elif line.startswith('DEX: '):
        player_dexterity = int(line[5:])
    else:
        raise ValueError('Unknown attribute: %r' % line)
f.close()
That said, most Python programmers would store the values in a dictionary rather than in separate variables. The fields can be stripped (removing the line endings) and split with: characteristic, value = line.rstrip().split(':'). The value keeps the space that follows the colon, so call value.strip() before using it as a string. If the value should be a number instead of a string, convert it with float() or int().
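A minimal sketch of the dictionary approach described above, assuming every non-blank line has the form KEY: value. The helper name parse_player and the NUMERIC_FIELDS set are illustrative assumptions, not part of the original answers.

```python
import io

# Hypothetical set of fields that should be converted to integers.
NUMERIC_FIELDS = {"STR", "DEX"}


def parse_player(lines):
    """Build a dict from an iterable of 'KEY: value' lines."""
    player = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        player[key] = int(value) if key in NUMERIC_FIELDS else value
    return player


# io.StringIO stands in for open("data.txt") so the sketch runs on its own.
sample = io.StringIO("NAME: Joe Smith\nCLASS: Fighter\nSTR: 14\nDEX: 7\n")
player = parse_player(sample)
print(player["NAME"], player["STR"])
```

With a real file, the same function can be called as parse_player(f) inside a with open("data.txt") as f: block.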

Related

Return a dictionary of a function

I want to define a function that reads a table from a text file into a dictionary and then use it to return specific values. The keys are chemical symbols (like "He" for helium). The values are the corresponding atomic masses.
I don't understand what I have to do.
The first five lines of the textfile read:
H,1.008
He,4.0026
Li,6.94
Be,9.0122
B,10.81
Here are my attempts: (I don't know where to place the parameter key so that I can define it)
def read_masses():
    atom_masses = {}
    with open["average_mass.csv") as f:
        for line in f:
            (key, value) = line.split(",")
            atom_masses[key] = value
            return(value)
m = read_masses("average_mass.csv)
print(m["N"]) #for the mass of nitrogen
Once return is called, the code below it doesn't execute. You need to return atom_masses, not value, and the return has to be placed outside the for loop:
def read_masses(file):
    atom_masses = {}
    with open(file) as f:
        for line in f:
            (key, value) = line.split(",")
            atom_masses[key] = value.strip()  # drop the trailing newline
    return atom_masses
m = read_masses("average_mass.csv")
print(m["H"])
>>> 1.008
Try:
def read_masses(name):
    data = {}
    with open(name, "r") as f_in:
        for line in map(str.strip, f_in):
            if line == "":
                continue
            a, b = map(str.strip, line.split(",", maxsplit=1))
            data[a] = float(b)
    return data
m = read_masses("your_file.txt")
print(m.get("He"))
Prints:
4.0026

How to create a dictionary based on a string from a file

I have the following text in a file:
InfoType 0 :
string1
string2
string3
InfoType 1 :
string1
string2
string3
InfoType 3 :
string1
string2
string3
Is there a way to create a dictionary that would look like this:
{'InfoType 0':'string1,string2,string3', 'InfoType 1':'string1,string2,string3', 'InfoType 3':'string1,string2,string3'}
Something like this should work:
def my_parser(fh, key_pattern):
    d = {}
    for line in fh:
        if line.startswith(key_pattern):
            name = line.strip()
            break
    # This list will hold the lines
    lines = []
    # Now iterate to find the lines
    for line in fh:
        line = line.strip()
        if not line:
            continue
        if line.startswith(key_pattern):
            # When in this block we have reached
            # the next record
            # Add to the dict
            d[name] = ",".join(lines)
            # Reset the lines and save the
            # name of the next record
            lines = []
            name = line
            # skip to next line
            continue
        lines.append(line)
    d[name] = ",".join(lines)
    return d
Use like so:
with open("myfile.txt", "r") as fh:
    d = my_parser(fh, "InfoType")
# {'InfoType 0 :': 'string1,string2,string3',
# 'InfoType 1 :': 'string1,string2,string3',
# 'InfoType 3 :': 'string1,string2,string3'}
There are limitations, such as:
Duplicate keys
The key needs processing
You could get around these by making the function a generator and yielding name, str pairs and processing them as you read the file.
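The generator idea mentioned above could look roughly like this. It is a sketch, not the answerer's code: the name iter_records, the key clean-up with rstrip(" :"), and the choice to merge duplicate keys are all assumptions made for illustration.

```python
import io


def iter_records(fh, key_pattern):
    """Yield (name, lines) pairs, one per record, as the file is read."""
    name = None
    lines = []
    for line in fh:
        line = line.strip()
        if not line:
            continue
        if line.startswith(key_pattern):
            if name is not None:
                yield name, lines
            # Process the key here: drop the trailing " :".
            name = line.rstrip(" :")
            lines = []
        elif name is not None:
            lines.append(line)
    if name is not None:
        yield name, lines


# io.StringIO stands in for an open file; the content is a made-up sample
# that repeats "InfoType 0" to show duplicate handling.
sample = io.StringIO(
    "InfoType 0 :\nstring1\nstring2\n\nInfoType 1 :\nstring3\nInfoType 0 :\nstring4\n"
)
merged = {}
for name, lines in iter_records(sample, "InfoType"):
    # The caller decides what to do with duplicates; here they are merged.
    merged.setdefault(name, []).extend(lines)
result = {name: ",".join(values) for name, values in merged.items()}
print(result)
```

Because the generator only yields pairs, the policy for duplicate keys (merge, overwrite, or raise) stays with the caller.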
This will do:
dictionary = {}
# Replace ``file.txt`` with the path of your text file.
with open('file.txt', 'r') as file:
    for line in file:
        if not line.strip():
            continue
        if line.startswith('InfoType'):
            key = line.rstrip('\n :')
            dictionary[key] = ''
        else:
            value = line.strip()
            # Add a comma only between values, so no trailing comma is left
            if dictionary[key]:
                dictionary[key] += ','
            dictionary[key] += value

TypeError: 'type' object is not subscriptable. How can I get this to remove an array from a 2d array?

I have had a look at answers to similar questions but I just can't make this work. I am quite new to python.
def read():
    set = []
    f = open("error set 1.txt", "r")
    replace = f.read()
    f.close()
    f = open("Test1_Votes.txt", "w")
    replaced = replace.replace(",", "")
    f.write(replaced)
    f.close()
    f = open("Test1_Votes.txt", "r")
    for line in f:
        ballot = []
        for ch in line:
            vote = ch
            ballot.append(vote)
        print(ballot)
        set.append(ballot)
    """print(set)"""
    remove()
def remove():
    for i in range(70):
        x = i - 1
        check = set[x]
        if 1 not in check:
            set.remove[x]
    print(set)
The error is line 37, check = set[x]
I'm unsure of what is actually causing the error
In the remove function, you have not defined set, so Python assumes you mean the built-in type set, which is not subscriptable.
Pass your list to the remove function and, preferably, give it another name.
Your remove function can't see your set variable (which is a list; avoid using built-in names as variable names) because it is local to the read function.
Define this variable before the read function or pass it as an argument to the remove function, and it should work.
def read():
    set = []
    f = open("error set 1.txt", "r")
    replace = f.read()
    f.close()
    f = open("Test1_Votes.txt", "w")
    replaced = replace.replace(",", "")
    f.write(replaced)
    f.close()
    f = open("Test1_Votes.txt", "r")
    for line in f:
        ballot = []
        for ch in line:
            vote = ch
            ballot.append(vote)
        print(ballot)
        set.append(ballot)
    """print(set)"""
    remove(set)
def remove(set):
    for i in range(70):
        x = i - 1
        check = set[x]
        if 1 not in check:
            set.remove(x)
    print(set)
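Even with the list passed in, the loop above still has problems worth noting: list.remove(x) removes the first element equal to x rather than the element at index x, the ballots hold characters such as '1' rather than the integer 1, and deleting from a list while indexing into it shifts the remaining items. A minimal sketch of one way around all three, assuming the goal is to drop every ballot that has no '1' vote. The function name remove_invalid and the sample ballots are made up for illustration.

```python
def remove_invalid(ballots):
    """Return a new list holding only the ballots that contain a '1' vote."""
    # Votes were read character by character, so they are strings, not ints.
    return [ballot for ballot in ballots if "1" in ballot]


# Hypothetical ballots, shaped like the lists built in read().
ballots = [["1", "2", "3"], ["2", "3", "\n"], ["3", "1", "2"]]
kept = remove_invalid(ballots)
print(kept)
```

Building a new list sidesteps the hard-coded range(70) and leaves the original list untouched.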

Python- How do I update an index of a for loop that iterates over lines in a file?

Using a for loop, I'm iterating over the lines in a file. Given this line:
line = ['641', '"Tornadus', ' (Incarnate Form)"', '"Flying"', '""', '5', '"TRUE"']
I need to reformat index [6] from '"TRUE"' to the boolean True.
Full expected output: d = {'Tornadus, (Incarnate Form)': (641, 'Flying', None, 5, True)}
I used:
if "T" in line[6]: # format legendary if TRUE
    line[6] = True
But I get this error:
Traceback (most recent call last):
  File "tester5p.py", line 305, in test_read_info_file_05
    self.assertEqual(read_info_file(DATAFILE),info_db5())
  File "/Users/kgreenwo/Desktop/student.py", line 52, in read_info_file
    line[5] = False
TypeError: 'str' object does not support item assignment
How can I assign it WITHIN the for loop?
To see my full code:
def read_info_file(filename):
    f = open(filename, 'r') # open file in read mode
    d = {} # initialize as empty
    count = 0 # helps to skip first line
    key = ""
    for line in f: # get each line from file
        if count != 0: # skip first line
            # 1___________________________________________open file,read, skip 1st line
            id_num = int(line[0]) # make id an integer
            # 2________________________________________________
            if ',' in line[1]: # two parts to fullname, changes indexes
                part1 = line[1].strip('"') # get format first part of name
                part2 = line[2].strip() # get format second part of name
                # 3______________
                fullname = part1 + part2
                key = fullname
                # 4______________
                type1 = line[3].strip('"')
                # 5--------------
                if line[4] == "": # check if there is not a second type
                    type2 = None # correct format
                else: # is a second type
                    type2 = line[4].strip('"') # format second type
                # 6______________
                generation = line[5] # format generation
                # 7_____________
                if "T" in line[6]: # format legendary if TRUE
                    line[6] = True
                    legendary = line[6]
                else: # format legendary if FALSE
                    line[6] = False
                    legendary = line[6]
            # 8______________________________________________one part to name
            else: # one part to name
                fullname = line[1].strip('"')
                # 9______________
                type1 = line[2].strip('"')
                # 10_____________
                if line[3] == "": # if no second type
                    type2 = None
                else:
                    type2 = line[3].strip('"') # there is a second type
                # 11_____________
                generation = line[4] # format generation
                # 12_____________
                if "T" in line[5]: # format legendary if TRUE
                    line[5] = True
                    legendary = line[5]
                else: # format legendary if FALSE
                    line[5] = False
                    legendary = line[5]
            value = (id_num, type1, type2, generation, legendary)
            d.update([(key, value)])
        count += 1
    return d
Reproducible example:
Input (don't forget to skip the first line!):
info_file1 = '''"ID","Name","Type 1","Type 2","Generation","Legendary"
1,"Bulbasaur","Grass","Poison",1,"FALSE"
'''
Output:
d = {'Bulbasaur': (1, 'Grass', 'Poison', 1, False)}
It is quite unclear from your example, but my guess is that you never split the line into a list. Inside the loop, line is a str, so start with:
for line in f:
    line = line.split(',')
Now line is a list, so you can work with its indexes and see whether you have more errors.
And if you use:
if "T" in line[6]: # format legendary if TRUE
    line[6] = True
It will work.
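The TypeError arises because iterating over a file yields each line as a str, and strings are immutable. A short sketch of the difference, using a sample line taken from the question (the variable names error and fields are just for illustration):

```python
line = '1,"Bulbasaur","Grass","Poison",1,"FALSE"\n'

# Indexing a str gives single characters, and item assignment is not allowed.
try:
    line[5] = False
    error = None
except TypeError as exc:
    error = str(exc)
print(error)

# After split(), the fields live in a list, which can be modified in place.
fields = line.rstrip("\n").split(",")
fields[5] = "T" in fields[5]  # '"FALSE"' contains no "T", so this is False
print(fields)
```

Note that a plain split(',') also breaks names that contain a comma inside quotes, which is why the csv module is the more robust choice for this file.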
Your input file looks like a comma-separated values file. If it is, what you want is pretty easy.
Let's suppose your input file is literally this:
Input_file-43644346.txt
info_file1 = '''"ID","Name","Type 1","Type 2","Generation","Legendary"
1,"Bulbasaur","Grass","Poison",1,"FALSE"
641,"Tornadus', ' (Incarnate Form)","Flying",,5,"TRUE"
You could do something like this:
#!/usr/bin/env python3
import csv
input_file_name = "Input_file-43644346.txt"
with open(input_file_name, newline='') as input_file:
    next(input_file) # skip first line
    record_extractor = csv.reader(input_file)
    d = {}
    for row in record_extractor:
        key = row[1].strip()
        row_truth = row[5] == "TRUE" # simplifying the boolean retrieving
        # Using conditional expressions
        row_second_type = row[3].strip() if row[3] else None
        # csv yields strings, so convert ID and generation to integers
        output_row = (int(row[0]), row[2], row_second_type, int(row[4]), row_truth)
        d[key] = output_row
print("d=", d)
Here are some key points of this solution:
This example uses Python 3 syntax.
Using with makes sure that the input file is closed promptly, even if an error occurs.
Since a file object is also an iterator, you can skip the first line by using next().
csv.reader() gives you a list of strings for each row. It handles quoted strings as you would expect, including commas inside quotes.
The expression row[5] == "TRUE" evaluates to a boolean, so you don't need an if statement.
An empty string is falsy; any non-empty string is truthy.
Conditional expressions can be used to change an empty string to None, as you wanted.
dict.update() is useful when you already have a dictionary or a list of tuples whose values you want to merge into a dictionary, but for a single entry you are better off using d[key] = value.
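A few of the points above can be checked in isolation. This is a small sketch with made-up sample rows; io.StringIO stands in for the opened file so the snippet runs on its own.

```python
import csv
import io

# Sample data shaped like the question's file (the rows are illustrative).
sample = io.StringIO(
    '"ID","Name","Type 1","Type 2","Generation","Legendary"\n'
    '641,"Tornadus, (Incarnate Form)","Flying",,5,"TRUE"\n'
)
next(sample)  # a file-like object is an iterator, so this skips the header
row = next(csv.reader(sample))
print(row)  # the quoted comma stays inside the name field

second_type = row[3] if row[3] else None  # empty string is falsy -> None
legendary = row[5] == "TRUE"              # the comparison is already a bool
print(second_type, legendary)
```

The quoted name survives as a single field, which is what a manual split(',') would get wrong.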
But my guess is that your file looks more like this:
Input_file-43644346b.txt
"ID","Name","Type 1","Type 2","Generation","Legendary"
1,"Bulbasaur","Grass","Poison",1,"FALSE"
641,"Tornadus', ' (Incarnate Form)","Flying",,5,"TRUE"
You can then use csv.DictReader to read your data:
#!/usr/bin/env python3
import csv
input_file_name = "Input_file-43644346b.txt"
with open(input_file_name, newline='') as input_file:
    record_extractor = csv.DictReader(input_file)
    d = {}
    for row in record_extractor:
        key = row["Name"].strip()
        row_truth = row["Legendary"] == "TRUE"
        row_second_type = row["Type 2"].strip() if row["Type 2"] else None
        # csv yields strings, so convert ID and generation to integers
        output_row = (int(row["ID"]), row["Type 1"],
                      row_second_type, int(row["Generation"]), row_truth)
        d[key] = output_row
print("d=", d)
That lets you use the column names to identify the different parts of each row.
You can simplify your code even more by using a dictionary comprehension:
#!/usr/bin/env python3
import csv
input_file_name = "Input_file-43644346.txt"
with open(input_file_name, newline='') as input_file:
    next(input_file) # skip first line
    record_extractor = csv.reader(input_file)
    d = {row[1]: (int(row[0]),  # csv yields strings; convert the ID
                  row[2],
                  row[3].strip() if row[3] else None,
                  int(row[4]),  # convert the generation as well
                  row[5] == "TRUE")
         for row in record_extractor}
print("d=", d)
Instead of reassigning it, I just did this and it worked:
if "T" in line[6]: # format legendary if TRUE
    legendary = True
else: # format legendary if FALSE
    legendary = False

Filtering using separators in Python

I have many lines like the following:
>ENSG00000003137|ENST00000001146|CYP26B1|72374964|72375167|4732
CGTCGTTAACCGCCGCCATGGCTCCCGCAGAGGCCGAGT
>ENSG00000001630|ENST00000003100|CYP51A1|91763679|91763844|3210
TCCCGGGAGCGCGCTTCTGCGGGATGCTGGGGCGCGAGCGGGACTGTTGACTAAGCTTCG
>ENSG00000003137|ENST00000412253|CYP26B1|72370133;72362405|72370213;72362548|4025
AGCCTTTTTCTTCGACGATTTCCG
In this example, ENSG00000003137 is the name and 4732, the last field, is the length. As you can see, some names are repeated, but with different lengths.
I want to make a new file that contains only the entry with the greatest length for each name, meaning the result would look like this:
>ENSG00000003137|ENST00000001146|CYP26B1|72374964|72375167|4732
CGTCGTTAACCGCCGCCATGGCTCCCGCAGAGGCCGAGT
>ENSG00000001630|ENST00000003100|CYP51A1|91763679|91763844|3210
TCCCGGGAGCGCGCTTCTGCGGGATGCTGGGGCGCGAGCGGGACTGTTGACTAAGCTTCG
I have made this code to split but don't know how to make the file I want:
file = open("file.txt", "r")
for line in file:
    if line.startswith(">"):
        line = line.split("|")
You'll need to read the file twice; the first time round, track the largest size per entry:
largest = {}
with open(inputfile) as f:
    for line in f:
        if line.startswith('>'):
            parts = line.split('|')
            name, length = parts[0][1:], int(parts[-1])
            largest[name] = max(length, largest.get(name, -1))
Then write out the copy in a second pass, but only those sections whose name and length match the largest length extracted in the first pass:
with open(inputfile) as f, open(outputfile, 'w') as out:
    copying = False
    for line in f:
        if line.startswith('>'):
            parts = line.split('|')
            name, length = parts[0][1:], int(parts[-1])
            copying = largest[name] == length
        if copying:
            out.write(line)
You have to do two kinds of handling in the loop: one that compares the length, and one that stores the sequence line when it is needed. I wrote an example for you that reads those into dicts:
file = open("file.txt", "r")
myDict = {}
myValueDict = {}
action = 'remember'
geneDict = {}
for line in file:
    if line.startswith(">"):
        line = line.rstrip().split("|")
        line_name = line[0]
        line_number = int(line[-1])
        if line_name in myValueDict:
            if myValueDict[line_name] < line_number:
                action = 'remember'
                myValueDict[line_name] = line_number
                myDict[line_name] = line
            else:
                action = 'forget'
        else:
            # First time this name is seen: always keep its sequence
            action = 'remember'
            myDict[line_name] = line
            myValueDict[line_name] = line_number
    else:
        if action == 'remember':
            geneDict[line_name] = line.rstrip()
for key in myDict:
    print(myDict[key])
for key in geneDict:
    print(geneDict[key])
This ignores the entries with a lower length. You can now store those dicts any way you want.
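If the file fits in memory, a single pass is also possible by keeping the best record seen so far for each name. The following is a sketch under that assumption, not a replacement for the two-pass answer: the function name longest_records is hypothetical, and the sample sequences are shortened versions of the ones in the question.

```python
import io


def longest_records(fh):
    """Return {name: [length, header, sequence_lines]} for the longest entry per name."""
    best = {}
    current = None  # record whose sequence lines are being collected
    for line in fh:
        line = line.rstrip("\n")
        if line.startswith(">"):
            parts = line.split("|")
            name, length = parts[0][1:], int(parts[-1])
            if name not in best or length > best[name][0]:
                current = [length, line, []]
                best[name] = current
            else:
                current = None  # shorter duplicate: ignore its sequence lines
        elif current is not None and line:
            current[2].append(line)
    return best


# io.StringIO stands in for the input file.
sample = io.StringIO(
    ">ENSG00000003137|ENST00000001146|CYP26B1|72374964|72375167|4732\n"
    "CGTCGTTAACC\n"
    ">ENSG00000001630|ENST00000003100|CYP51A1|91763679|91763844|3210\n"
    "TCCCGGGAGC\n"
    ">ENSG00000003137|ENST00000412253|CYP26B1|72370133;72362405|72370213;72362548|4025\n"
    "AGCCTTTTTC\n"
)
best = longest_records(sample)
for length, header, sequence in best.values():
    print(header)
    print("\n".join(sequence))
```

Unlike the last answer above, this keeps every sequence line of a record, so multi-line sequences are preserved.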
