How can I parse a formatted file into variables using Python?

I have a pre-formatted text file with some variables in it, like this:
header one
name = "this is my name"
last_name = "this is my last name"
addr = "somewhere"
addr_no = 35
header
header two
first_var = 1.002E-3
second_var = -2.002E-8
header
As you can see, each scope starts with the string header followed by the name of the scope (one, two, etc.).
I can't figure out how to programmatically parse those options using Python so that they would be accessible to my script in this manner:
one.name = "this is my name"
one.last_name = "this is my last name"
two.first_var = 1.002E-3
Can anyone point me to a tutorial or a library or to a specific part of the docs that would help me achieve my goal?

I'd parse that with a generator, yielding sections as you parse the file. ast.literal_eval() takes care of interpreting the value as a Python literal:
import ast
def load_sections(filename):
    with open(filename, 'r') as infile:
        for line in infile:
            if not line.startswith('header'):
                continue  # skip to the next line until we find a header
            sectionname = line.split(None, 1)[-1].strip()
            section = {}
            for line in infile:
                if line.startswith('header'):
                    break  # end of section
                line = line.strip()
                key, value = line.split(' = ', 1)
                section[key] = ast.literal_eval(value)
            yield sectionname, section
Loop over the above function to receive (name, section_dict) tuples:
for name, section in load_sections(somefilename):
    print name, section
For your sample input data, that results in:
>>> for name, section in load_sections('/tmp/example'):
...     print name, section
...
one {'last_name': 'this is my last name', 'name': 'this is my name', 'addr_no': 35, 'addr': 'somewhere'}
two {'first_var': 0.001002, 'second_var': -2.002e-08}
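If attribute-style access such as one.name is the goal, the parsed dictionaries can be wrapped in types.SimpleNamespace. The following is only a minimal Python 3 sketch under a few assumptions: parse_sections and SAMPLE are illustrative names that do not appear in the answer above, and the input is read from a string rather than a file so that the example is self-contained.

```python
import ast
from types import SimpleNamespace

# Sample data copied from the question; normally this would come from a file.
SAMPLE = '''header one
name = "this is my name"
last_name = "this is my last name"
addr = "somewhere"
addr_no = 35
header
header two
first_var = 1.002E-3
second_var = -2.002E-8
header
'''

def parse_sections(text):
    """Return {section_name: SimpleNamespace} for the header-delimited format."""
    sections = {}
    current = None  # dict of the section being filled, or None between sections
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith('header'):
            parts = line.split(None, 1)
            if len(parts) == 2:      # "header <name>" opens a section
                current = {}
                sections[parts[1]] = current
            else:                    # a bare "header" closes the section
                current = None
        elif current is not None:
            key, value = line.split('=', 1)
            current[key.strip()] = ast.literal_eval(value.strip())
    return {name: SimpleNamespace(**values) for name, values in sections.items()}

sections = parse_sections(SAMPLE)
one, two = sections['one'], sections['two']
print(one.name)
print(two.first_var)
```

Keeping the sections in a dictionary and pulling out one and two explicitly avoids creating variables dynamically, which tends to be harder to debug.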

Martijn Pieters's answer is correct for your pre-formatted file, but if you can control how the file is formatted in the first place, you will avoid a lot of potential bugs. If I were you, I would look into getting the file formatted as JSON (or XML), because then Python's json (or XML) libraries can do the work for you: http://docs.python.org/2/library/json.html. Unless you're working with really bad legacy code or a system that you don't have access to, you should be able to go into the code that produces the file and make it emit a better format.
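To illustrate the JSON suggestion, here is a small sketch. The JSON layout below is an assumption (one possible way the question's sample data could be represented), not something given in the question, and TEXT and config are illustrative names.

```python
import json

# Hypothetical JSON equivalent of the sample file from the question.
TEXT = '''
{
    "one": {"name": "this is my name", "last_name": "this is my last name",
            "addr": "somewhere", "addr_no": 35},
    "two": {"first_var": 1.002E-3, "second_var": -2.002E-8}
}
'''

# json.loads turns the text into nested dicts with proper int/float values.
config = json.loads(TEXT)
print(config["one"]["name"])
print(config["two"]["first_var"])
```

With a real file you would use json.load on the open file object instead of json.loads on a string.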

def get_section(f):
    # Collect stripped lines up to and including the closing 'header' line.
    section = []
    for line in f:
        section += [line.strip("\n ")]
        if section[-1] == 'header': break
    return section
sections = dict()
with open('input') as f:
    while True:
        section = get_section(f)
        if not section: break
        section_dict = dict()
        section_dict['sname'] = section[0].split()[1]
        # section[0] is 'header <name>' and section[-1] is the closing 'header'
        for param in section[1:-1]:
            k, v = [x.strip() for x in param.split('=', 1)]
            section_dict[k] = v
        sections[section_dict['sname']] = section_dict
print sections['one']['name']
You can also access these sections as attributes:
class Section:
    def __init__(self, d):
        self.__dict__ = d
one = Section(sections['one'])
print one.name

Related

Read file and format it into dictionary

How can I capture the contents of function.py and map each function definition, such as def step1(), to the functions it calls, such as create() and login(), in dictionary format? (The format I want to achieve is below.)
function.py
#!C:\Python\Python39\python.exe
# print ('Content-type: text/html\n\n')
def step1():
    create()
    login()
def step2():
    authenticate()
def step3():
    send()
Expected output
thisdict = {
    'def step1()': ['create(),login()'],
    'def step2():':['authenticate()'],
    'def step3():': ['send()']
}

You can read the file function.py, split it in order to separate the different functions, and then for each function, split it once more to get the signature as key and the commands as values:
with open('function.py', 'r') as inFile:
    funcs = inFile.read().split('\n\n')[1:]
result = {}
for elem in funcs:
    sign, commands = elem.split(':')
    commands = list(map(str.strip, commands.split('\n')))[1:]
    result.update({sign : commands})
print(result)
This will return:
{'def step1()': ['create()', 'login()'], 'def step2()': ['authenticate()'], 'def step3()': ['send()']}

You could use a regex that finds each method and its content: (def \w+\(.*\):)((?:\n[ \t]+.+)+)
(def \w+\(.*\):) for the method definition
\n[ \t]+.+ for each method row (with the previous \n)
import json
import re
with open("function.py") as fic:
    content = fic.read()
groups = re.findall(r"(def \w+\(.*\):)((?:\n[ \t]+.+)+)", content)
result = {key: [",".join(map(str.strip, val.strip().splitlines()))]
          for key, val in groups}
print(json.dumps(result, indent=4))
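As a hedged alternative to text splitting or a regex, the standard-library ast module can parse the source and report which functions each definition calls. This is a different technique from the ones in the answers above. SOURCE and calls_by_function are illustrative names, and the source is inlined as a string (instead of reading function.py) so that the sketch is self-contained. It only picks up plain calls written as statements at the top level of each function body.

```python
import ast

# Inlined stand-in for the contents of function.py from the question.
SOURCE = '''
def step1():
    create()
    login()

def step2():
    authenticate()

def step3():
    send()
'''

def calls_by_function(source):
    """Map 'def name()' to the list of simple calls made in that function's body."""
    tree = ast.parse(source)  # parses only; nothing in the source is executed
    result = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            calls = []
            for stmt in node.body:
                # Keep statements of the form `some_name(...)`
                if (isinstance(stmt, ast.Expr)
                        and isinstance(stmt.value, ast.Call)
                        and isinstance(stmt.value.func, ast.Name)):
                    calls.append(stmt.value.func.id + "()")
            result["def {}()".format(node.name)] = calls
    return result

result = calls_by_function(SOURCE)
print(result)
```

Because the file is parsed rather than pattern-matched, blank lines, comments, and indentation style do not affect the result.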

You can do something like this:
with open('function.py', 'r') as f:
    file = f.readlines()
thisdict = {}
temp = []
a = '_start_'  # key for any lines that come before the first function
for line in file:
    if line.startswith('def'):  # you might want to handle leading whitespace
        thisdict[a] = temp
        a = line[3:]
        temp = []
    else:
        temp.append(line)
thisdict[a] = temp
print(thisdict)
This clearly isn't the best code, but it's easy to understand and easy to implement :)

Try statement not running as I expect

I have three functions: readHeader, which reads the header of a txt file; readExpertsFile, which reads the contents of the file; and exceptionNH, which compares the file name and header and raises an exception if the two are not compatible (e.g. if the date in the name is not the same as in the header).
Here are the three functions and a txt example:
def readHeader(fileName):
    fileIn = open(fileName, "r")
    fileIn.readline()
    day = fileIn.readline().replace("\n", "")
    fileIn.readline()
    time = fileIn.readline().replace("\n", "")
    fileIn.readline()
    company = fileIn.readline().replace("\n", "")
    scope = fileIn.readline().replace(":", "").replace("\n", "")
    fileIn.close()
    return (day, time, company, scope)
def readFile(fileName):
    expertsList = []
    expertsList.append(readHeader(fileName))
    fileIn = open(fileName, "r")
    for line_counter in range(LNHEADER):
        fileIn.readline()
    fileIn.close()
    return expertsList
def exceptionNH(fileName):
    try:
        assert fileName[10:17] == readFile(fileName)[3][0].lower().replace(":", "")
    except AssertionError:
        print("Error in input file: inconsistent name and header in file", fileName,".")
        exit()
fileName = "file.txt"
exceptionNH("2018y03m28experts10h30.txt")
2018y03m28experts10h30.txt:
Day:
2018-03-28
Time:
10:30
Company:
XXX
Experts:
...
...
My problem is that I expect the assert in the try statement to evaluate the comparison as True and skip the except clause, but this is not happening.
I suspect that .lower() is not working, but I can't understand why.
If you see other things that could be better, feel free to share, as I'm new to Python and want to improve.

I've found the error. I thought that to get an item from the first tuple inside a list, I needed to write list[position of item][position of tuple], when it is actually the inverse: list[position of tuple][position of item].
Following mkrieger1's advice, I printed fileName[10:17] and readFile(fileName)[3][0].lower().replace(":", ""). The first was fine, but the second was not selecting the item at index 3 of the first tuple (the one from readHeader); it was selecting the first item of the element at index 3 of the list.
I've changed from readFile(fileName)[3][0].lower().replace(":", "") to readFile(fileName)[0][3].lower().replace(":", "") and it's working now. Thank you for the help.
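The indexing order can be checked with a tiny example. The values below are made up to mirror what readHeader would return for the sample file; experts_list, header, and scope are illustrative names.

```python
# A list whose first element is a 4-tuple, shaped like readHeader's return value.
experts_list = [("2018-03-28", "10:30", "XXX", "Experts")]

header = experts_list[0]   # first index: pick the tuple out of the list
scope = header[3]          # second index: pick the item out of the tuple
print(scope.lower())

# experts_list[3][0] would instead ask for the element at index 3 of the list,
# which does not exist in this one-element list.
```

Printing each intermediate step, as above, is usually the quickest way to spot swapped indices.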

I need a shortcut

I'm trying to make a simple script that filters out emails from certain domains. It works well, but I need a shortcut because I don't want to write if and elif statements many times. Can anyone tell me how to write my script with a function so that it becomes shorter and easier? Thanks in advance. The script is below:
f_location = 'C:/Users/Jack The Reaper/Desktop/mix.txt'
text = open(f_location)
good = open('C:/Users/Jack The Reaper/Desktop/good.txt','w')
for line in text:
    if '#yahoo' in line:
        yahoo = None
    elif '#gmail' in line:
        gmail = None
    elif '#yahoo' in line:
        yahoo = None
    elif '#live' in line:
        live = None
    elif '#outlook' in line:
        outlook = None
    elif '#hotmail' in line:
        hotmail = None
    elif '#aol' in line:
        aol = None
    else:
        if ' ' in line:
            good.write(line.strip(' '))
        elif '' in line:
            good.write(line.strip(''))
        else:
            good.write(line)
text.close()
good.close()

I would suggest using a dict for this instead of having separate variables for all the cases.
my_dict = {}
...
if '#yahoo' in line:
    my_dict['yahoo'] = None
But if you want to do it the way you described in the question, you can do the following:
email_domains = ['#yahoo', '#gmail', '#live', '#outlook', '#hotmail', '#aol']
for e in email_domains:
    if e in line:
        locals()[e[1:]] = None
        # if you use a dict, use the line below instead
        # my_dict[e[1:]] = None
locals() returns a dictionary of the current namespace. The keys in this dict are the variable names and the values are the values of the variables.
So at module level, locals()['gmail'] = None creates a variable named gmail (if it doesn't exist) and assigns it None. Inside a function, writes to the locals() dictionary are not guaranteed to affect the actual local variables, which is another reason to prefer the dict.
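Since the variables in the original script are never used, the whole if/elif chain can arguably be reduced to a single membership test. The sketch below is one way to do that, assuming the goal is simply to keep lines that mention none of the listed domains. BLOCKED, keep_line, filter_lines, and the sample addresses are illustrative, and '#' is kept as the separator because that is what the question's data uses.

```python
# Domains to filter out, copied from the question's if/elif chain.
BLOCKED = ('#yahoo', '#gmail', '#live', '#outlook', '#hotmail', '#aol')

def keep_line(line, blocked=BLOCKED):
    """Return True if the line mentions none of the blocked domains."""
    return not any(domain in line for domain in blocked)

def filter_lines(lines):
    """Return stripped, non-empty lines that pass keep_line."""
    return [line.strip() for line in lines if line.strip() and keep_line(line)]

# Made-up sample input standing in for the lines of mix.txt.
sample = ['a#gmail.com\n', 'nic-os9#gmx.de\n', ' b#yahoo.fr\n', 'nannik#interia.pl\n']
kept = filter_lines(sample)
print(kept)
```

To process a real file, pass the open file object to filter_lines and write the result with '\n'.join(...).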

Based on the problem as you stated it and the sample file you provided, I have two solutions: a one-line solution and a more detailed one.
First, import the re module and define the regex pattern:
import re
pattern=r'.+#(?!gmail|yahoo|aol|hotmail|live|outlook).+'
Now the detailed version:
emails=[]
with open('emails.txt','r') as f:
    for line in f:
        match=re.finditer(pattern,line)
        for find in match:
            emails.append(find.group())
with open('result.txt','w') as f:
    f.write('\n'.join(emails))
Output in the result.txt file:
nic-os9#gmx.de
angelique.charuel#sfr.fr
nannik#interia.pl
l.andrioli#freenet.de
kamil_sieminski8#o2.pl
hugo.lebrun.basket#orange.fr
One-line solution, if you want something shorter:
with open('results.txt','w') as file:
    file.write('\n'.join([find.group() for line in open('emails.txt','r') for find in re.finditer(pattern,line)]))
Output:
nic-os9#gmx.de
angelique.charuel#sfr.fr
nannik#interia.pl
l.andrioli#freenet.de
kamil_sieminski8#o2.pl
hugo.lebrun.basket#orange.fr
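P.S.: In the one-line solution, the input file opened inside the comprehension is not closed explicitly. Python normally cleans it up when the file object is garbage-collected, so it is usually not a big issue, but that is not guaranteed; if it matters, open the input file in a with block as well.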

Copying string from a specific index from one file to pasting that string on a specific place in another file

My intention is to copy the string that follows either a colon or an equals sign in File 1 and paste it into File 2 at the corresponding location, again after either a colon or an equals sign.
For instance, if File 1 has:
username: Stack
and File 2 initially has no value:
username=
I want Stack to be copied over to File 2 after username=. Currently, I'm stuck and not sure what to do. The program I wrote below doesn't copy the username. I would greatly appreciate any input!
with open("C:/Users/SO//Downloads//f1.txt", "r") as f1:
    with open("C:/Users/SO//Downloads//f2.txt", "r+") as f2:
        searchlines = f1.readlines()
        searchlines_f2=f2.readlines()
        for i, line in enumerate(searchlines):
            if 'username' in line:
                for l in searchlines[i:i+1]:
                    ind = max(l.find(':'), l.find('='), 0) #finding index of specific characters
                    copy_string=l[ind+1:].strip() #copying string for file 2
                    for l in searchlines_f2[i:i+1]:
                        if 'username' in line:
                            f2.write(copy_string)

I think something like this will get you what you need in a more maintainable and Pythonic way.
Note the use of regex as well as some string methods (e.g., startswith).
import re
SOURCE_PATH = "C:/Users/SO//Downloads//f1.txt"
TARGET_PATH = "C:/Users/SO//Downloads//f2.txt"
def _get_lines(filepath):
    """ read `filepath` and return a list of strings """
    with open(filepath, "r") as fh:
        return fh.readlines()
def _get_value(fieldname, text):
    """ parse `text` to get the value of `fieldname` """
    try:
        pattern = r'%s[:=]\s?(.*)' % fieldname
        return re.match(pattern, text).group(1)
    except AttributeError:
        # re.match returned None (no match); you may want to handle this differently!
        return None
def _write_target(filepath, trgt_lines):
    """ write `trgt_lines` to `filepath` """
    with open(filepath, "w+") as fh:
        fh.writelines(trgt_lines)
src_lines = _get_lines(SOURCE_PATH)
trgt_lines = _get_lines(TARGET_PATH)
# extract field values from source file
fields = ['username', 'id', 'location']
for field in fields:
    value = None
    for cur_src in src_lines:
        if cur_src.startswith(field):
            value = _get_value(field, cur_src)
            break
    # update target_file w/ value (if we were able to find it)
    if value is not None:
        for i, cur_trgt in enumerate(trgt_lines):
            if cur_trgt.startswith('{0}='.format(field)):
                # keep the trailing newline so the next line is not merged into this one
                trgt_lines[i] = '{0}={1}\n'.format(field, value)
                break
_write_target(TARGET_PATH, trgt_lines)

How do I remove a particular line from a file but keep other lines intact?

I want to learn Python so I started writing my first program which is a phone book directory.
It has the options to add a name and phone number, remove numbers, and search for them.
I've been stuck on the remove part for about 2 days now and just can't get it working correctly. I've been in the Python IRC and everything, but haven't been able to figure it out.
Basically, my program stores the numbers to a list in a file. I cannot figure out how to remove a particular line in the file but keep the rest of the file intact. Can someone please help me with this?
Some people have advised that it will be easier to do if I create a temp file, copy every line except the one to be removed from the original file over to the temp file, and then overwrite the original file with the temp file. So I have been trying this...
if ui == 'remove':
    coname = raw_input('What company do you want to remove? ') # company name
    f = open('codilist.txt', 'r') # original phone number listing
    f1 = open('codilist.tmp', 'a') # open a tmp file
    for line in f:
        if line.strip() != coname.strip():
            for line in f:
                f1.write(line)
            break # WILL LATER OVERWRITE THE codilist.txt WITH THE TMP FILE
        else:
            f1.write(line)
    else:
        print 'Error: That company is not listed.'
    f1.close()
    f.close()
    continue

I assume your file contains something like <name><whitespace><number> on each line? If that's the case, you could use something like this for your if statement (error handling not included!):
name, num = line.strip().split()
if name != coname.strip():
    # write to file
Suggestion:
Unless there is some specific reason for you to use a custom format, the file format json is quite good for this kind of task. Also note the use of the 'with' statement in these examples, which saves you having to explicitly close the file.
To write the information:
import json
# Somehow build a dict of {coname: num,...}
info = {'companyA': '0123456789', 'companyB': '0987654321'}
with open('codilist.txt', 'w') as f:
    json.dump(info, f, indent=4) # Using indent for prettier files
To read/amend the file:
import json
with open('codilist.txt', 'r+') as f:
    info = json.load(f)
    # Remove coname
    if coname in info:
        info.pop(coname)
    else:
        print 'No record exists for ' + coname
    # Add 'companyC'
    info['companyC'] = '0112233445'
    # Write back to file: rewind first, then drop any leftover old content
    f.seek(0)
    json.dump(info, f, indent=4)
    f.truncate()
You'll need Python 2.6 or later for these examples. If you're on 2.5, you'll need these imports (the __future__ import must come first in the file):
from __future__ import with_statement
import simplejson as json
Hope that helps!

Here is a pretty extensively rewritten version:
all the phone data is wrapped into a Phonebook class; data is kept in memory (instead of being saved and reloaded for every call)
it uses the csv module to load and save data
individual actions are turned into short functions or methods (instead of One Big Block of Code)
commands are abstracted into a function-dispatch dictionary (instead of a cascade of if/then tests)
This should be much easier to understand and maintain.
import csv
def show_help():
    print('\n'.join([
        "Commands:",
        "  help                  shows this screen",
        "  load [file]           loads the phonebook (file name is optional)",
        "  save [file]           saves the phonebook (file name is optional)",
        "  add {name} {number}   adds an entry to the phonebook",
        "  remove {name}         removes an entry from the phonebook",
        "  search {name}         displays matching entries",
        "  list                  show all entries",
        "  quit                  exits the program"
    ]))
def getparam(val, prompt):
    if val is None:
        return raw_input(prompt).strip()
    else:
        return val
class Phonebook(object):
    def __init__(self, fname):
        self.fname = fname
        self.data = []
        self.load()
    def load(self, fname=None):
        if fname is None:
            fname = self.fname
        try:
            with open(fname, 'rb') as inf:
                self.data = list(csv.reader(inf))
            print("Phonebook loaded")
        except IOError:
            print("Couldn't open '{}'".format(fname))
    def save(self, fname=None):
        if fname is None:
            fname = self.fname
        with open(fname, 'wb') as outf:
            csv.writer(outf).writerows(self.data)
        print("Phonebook saved")
    def add(self, name=None, number=None):
        name = getparam(name, 'Company name? ')
        number = getparam(number, 'Company number? ')
        self.data.append([name,number])
        print("Company added")
    def remove(self, name=None):
        name = getparam(name, 'Company name? ')
        before = len(self.data)
        self.data = [d for d in self.data if d[0] != name]
        after = len(self.data)
        print("Deleted {} entries".format(before-after))
    def search(self, name=None):
        name = getparam(name, 'Company name? ')
        found = 0
        for c,n in self.data:
            if c.startswith(name):
                found += 1
                print("{:<20} {:<15}".format(c,n))
        print("Found {} entries".format(found))
    def list(self):
        for c,n in self.data:
            print("{:<20} {:<15}".format(c,n))
        print("Listed {} entries".format(len(self.data)))
def main():
    pb = Phonebook('phonebook.csv')
    commands = {
        'help': show_help,
        'load': pb.load,
        'save': pb.save,
        'add': pb.add,
        'remove': pb.remove,
        'search': pb.search,
        'list': pb.list
    }
    goodbyes = set(['quit','bye','exit'])
    while True:
        # get user input
        inp = raw_input("#> ").split()
        # if something was typed in
        if inp:
            # first word entered is the command; anything after that is a parameter
            cmd,args = inp[0],inp[1:]
            if cmd in goodbyes:
                # exit the program (can't be delegated to a function)
                print 'Goodbye.'
                break
            elif cmd in commands:
                # "I know how to do this..."
                try:
                    # call the appropriate function, and pass any parameters
                    commands[cmd](*args)
                except TypeError:
                    print("Wrong number of arguments (type 'help' for commands)")
            else:
                print("I didn't understand that (type 'help' for commands)")
if __name__=="__main__":
    main()

Something simple like this will read all of f, and write out all the lines that don't match:
for line in f:
    if line.strip() != coname.strip():
        f1.write(line)

Ned's answer looks like it should work. If you haven't tried this already, you can set python's interactive debugger above the line in question. Then you can print out the values of line.strip() and coname.strip() to verify you are comparing apples to apples.
for line in f:
    import pdb
    pdb.set_trace()
    if line.strip() != coname.strip():
        f1.write(line)
Here's a list of pdb commands.

You probably don't want to open the temp file in append ('a') mode:
f1 = open('codilist.tmp', 'a') # open a tmp file
Also, be aware that
for line in f:
    ...
    f1.write(line)
writes each line exactly as it was read: lines obtained by iterating over a file keep their trailing newline, so nothing is added or lost. You only need to append '\n' yourself if you strip the line first.
The basic structure you want is:
for line in myfile:
    if not <line-matches-company>:
        tmpfile.write(line)  # line still ends with its own newline
You'll have to implement <line-matches-company> (there isn't enough information in the question to know what it should be; perhaps you could show a couple of lines from your data file?).

I got this working...
# requires `import os` at the top of the script
if ui == 'remove':
    coname = raw_input('What company do you want to remove? ') # company name
    f = open('codilist.txt')
    tmpfile = open('codilist.tmp', 'w')
    for line in f:
        if coname in line:
            print coname + ' has been removed.'
        else:
            tmpfile.write(line)
    f.close()
    tmpfile.close()
    os.rename('codilist.tmp', 'codilist.txt')
    continue
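For reference, the same temp-file approach can be sketched as a reusable Python 3 function. This is a sketch under assumptions rather than a drop-in replacement: remove_matching_lines is a hypothetical helper name, it matches by substring just like the code above, and the demo data is made up. It uses os.replace, which overwrites an existing destination file, including on Windows.

```python
import os
import tempfile

def remove_matching_lines(path, needle):
    """Rewrite `path` without the lines containing `needle`; return how many were dropped."""
    removed = 0
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the original so os.replace stays on one filesystem.
    with open(path) as src, tempfile.NamedTemporaryFile(
            'w', dir=directory, delete=False) as tmp:
        for line in src:
            if needle in line:
                removed += 1
            else:
                tmp.write(line)  # line still carries its own newline
    os.replace(tmp.name, path)  # swap the filtered copy in place of the original
    return removed

# Demo on a throwaway file with made-up entries.
with tempfile.TemporaryDirectory() as workdir:
    demo = os.path.join(workdir, 'codilist.txt')
    with open(demo, 'w') as f:
        f.write('companyA 0123456789\ncompanyB 0987654321\n')
    dropped = remove_matching_lines(demo, 'companyA')
    with open(demo) as f:
        remaining = f.read()
print(dropped, repr(remaining))
```

Returning the number of removed lines lets the caller report "not listed" when the count is zero.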
