How to skip blocks of text when writing a file in Python

Is it possible to use python to skip blocks of text when writing a file from another file?
For example, let's say the input file is:
This is the file I would like to write this line
I would like to skip this line
and this one...
and this one...
and this one...
but I want to write this one
and this one...
How can I write a script that skips certain lines, which differ in content and size, and resumes writing lines to another file once it recognizes a certain line?
My code reads through the lines, doesn't write duplicate lines and performs some operation on the line by using dictionaries and regex.

def is_wanted(line):
    #
    # You have to define this!
    #
    # return True to keep the line, or False to discard it
    raise NotImplementedError
def copy_some_lines(infname, outfname, wanted_fn=is_wanted):
    with open(infname) as inf, open(outfname, "w") as outf:
        outf.writelines(line for line in inf if wanted_fn(line))
copy_some_lines("file_a.txt", "some_of_a.txt")
To extend this to multi-line blocks, you can implement a finite state machine, which would turn into something like:
class BlockState:
    GOOD_BLOCK = True
    BAD_BLOCK = False
    def __init__(self):
        self.state = self.GOOD_BLOCK
    def is_bad(self, line):
        # *** Implement this! ***
        # return True if line is bad
        raise NotImplementedError
    def is_good(self, line):
        # *** Implement this! ***
        # return True if line is good
        raise NotImplementedError
    def __call__(self, line):
        if self.state == self.GOOD_BLOCK:
            if self.is_bad(line):
                self.state = self.BAD_BLOCK
        else:
            if self.is_good(line):
                self.state = self.GOOD_BLOCK
        return self.state
then
copy_some_lines("file_a.txt", "some_of_a.txt", BlockState())
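As a concrete illustration of the state-machine idea, here is a minimal sketch that works on any iterable of lines. The marker strings `SKIP_FROM` and `RESUME_AT` and the helper name `filter_blocks` are hypothetical, chosen only to match the sample input in the question; substitute whatever test identifies the block boundaries in your data.

```python
# Hypothetical markers, taken from the sample input in the question.
SKIP_FROM = "I would like to skip this line"   # first line of a block to drop
RESUME_AT = "but I want to write this one"     # first line to keep again


def filter_blocks(lines, skip_from=SKIP_FROM, resume_at=RESUME_AT):
    """Yield lines, dropping everything from skip_from up to (not including) resume_at."""
    skipping = False
    for line in lines:
        text = line.rstrip("\n")
        if not skipping and text == skip_from:
            skipping = True
        elif skipping and text == resume_at:
            skipping = False
        if not skipping:
            yield line


sample = [
    "This is the file I would like to write this line\n",
    "I would like to skip this line\n",
    "and this one...\n",
    "but I want to write this one\n",
    "and this one...\n",
]
kept = list(filter_blocks(sample))
```

To use it with files, pass the open input file as `lines` and hand the generator to `outf.writelines(...)`.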

Pseudo-code:
# Open input and output files, and declare the unwanted function
for line in file1:
    if unwanted(line):
        continue
    file2.write(line)
# Close files etc...

You can read the file line by line and control what happens to each line you read:
with open(<your_file>, 'r') as lines:
    for line in lines:
        # skip this line
        # but not this one
Note that if you want to read all the lines regardless of their content and only then manipulate them, you can:
with open(<your_file>) as fil:
    lines = fil.readlines()

This should work:
SIZE_TO_SKIP = ?  # placeholder: the line length to skip
CONTENT_TO_SKIP = "skip it"
with open("my/input/file") as input_file:
    with open("my/output/file", 'w') as output_file:
        for line in input_file:
            text = line.rstrip('\n')  # compare without the trailing newline
            if len(text) != SIZE_TO_SKIP and text != CONTENT_TO_SKIP:
                output_file.write(line)

Related

Unable to move to second function after the result of the first

Program Goal: Search a defined YAML file (scan_dcn.yaml) and return all lines matching the search criteria defined in the function_search_search_key() and function_search_event_type() functions.
Input File - scan_dcn.yaml:
search_dict:
  [
    {search_key: ["Failed to Process the file"],
     event_type: "evttyp_repl_dcn_error",
     event_description: "Failure to process DCN file",
     priority: 50,
     scan_interval: 1,
     remove_dups: True,
     category: "dcn",
     context_begin: 0,
     context_end: 1,
     reportable: False,
     offset: 0
    },
Problem:
My program will return function_search_search_key() but will not proceed to function_search_event_type().
I would think that my problem is that I have no logic to proceed to the second function after the completion of the first.
Do I need to return a value in each function to proceed?
Python Source Code
yamlfile = open('scan_dcn.yaml', 'r')
def function_search_search_key():
    search_search_key = ['{search_key:']
    for line in yamlfile.readlines():
        for word in search_search_key:
            if word in line:
                print(line)
def function_search_event_type():
    search_event_type = ['event_type:']
    for line in yamlfile.readlines():
        for word in search_event_type:
            if word in line:
                print(line)
def main():
    function_search_search_key()
    function_search_event_type()
main()
In your first function you read the whole file with readlines. When you use readlines again in your second function you're already at the end of the file and there is no more data to read, so the for loop is not even entered.
But there's no need to read the file again for every function. Read the file outside of the functions and put it in a list. Then add a parameter to each of those functions that takes this list. In the function you can loop over the list.
def function_search_search_key(lines):
    search_search_key = ['{search_key:']
    for line in lines:
        for word in search_search_key:
            if word in line:
                print(line)
def function_search_event_type(lines):
    search_event_type = ['event_type:']
    for line in lines:
        for word in search_event_type:
            if word in line:
                print(line)
def main():
    with open('scan_dcn.yaml', 'r') as yamlfile:
        lines = yamlfile.readlines()
    function_search_search_key(lines)
    function_search_event_type(lines)
if __name__ == '__main__':
    main()
If you ever need to change the name of the file, you can do it in one place. If you open and read the file in every single function, you would have to change all occurrences of the file name.
Your second function is being entered. It must be, if the call above it has finished.
You aren't seeing anything printed because you're attempting to loop through the same file more than once. Once you've read the file, it's exhausted. You can just re-read the file as a simple fix:
def function_search_search_key():
    with open('scan_dcn.yaml', 'r') as yamlfile:
        search_search_key = ['{search_key:']
        for line in yamlfile.readlines():
            for word in search_search_key:
                if word in line:
                    print(line)
def function_search_event_type():
    with open('scan_dcn.yaml', 'r') as yamlfile: # Read the file again
        search_event_type = ['event_type:']
        for line in yamlfile.readlines():
            for word in search_event_type:
                if word in line:
                    print(line)
You can read through an open file only once (unless you seek back to the start), so you may open your file in each function:
def function_search_search_key():
    search_search_key = ['{search_key:']
    with open('scan_dcn.yaml') as fic:
        for line in fic:
            for word in search_search_key:
                if word in line:
                    print(line)
def function_search_event_type():
    search_event_type = ['event_type:']
    with open('scan_dcn.yaml') as fic:
        for line in fic:
            for word in search_event_type:
                if word in line:
                    print(line)
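The "exhausted file" behaviour described in these answers, and the seek-to-start alternative, can be observed directly. This is a small sketch that writes its own throwaway file (the contents and the variable names are made up for the demonstration) rather than relying on scan_dcn.yaml being present.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".yaml", delete=False) as tmp:
    tmp.write("search_key: a\nevent_type: b\n")
    path = tmp.name

with open(path) as f:
    first = f.readlines()    # reads everything; the position is now at end of file
    second = f.readlines()   # nothing left to read, so this is an empty list
    f.seek(0)                # rewind to the beginning
    third = f.readlines()    # the full contents again

os.remove(path)
```

Rewinding with `seek(0)` works, but reading the lines once into a list and passing that list around is usually simpler.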

Writing text in between two strings on a single line

def getlink():
    with open('findlink.txt') as infile, open('extractlink.txt', 'w') as outfile:
        copy = False
        for line in infile:
            if "](" in line.strip():
                copy = True
            if copy:
                outfile.write(line)
            if ")" in line.strip():
                copy = False
    print("extractlink written.")
def part3():
    with open ('findlink.txt', 'w') as findlink:
        findlink.write("[Testing](Test)")
    print("findlink written and closed.")
    getlink()
def run_bot():
    getlink() #Already have findlink.txt written
When part3() is activated, the text is written to findlink.txt as expected, but when getlink() is activated, the extractlink.txt is never written to.
I've gathered my current code from a post back in 2013/2016. Does anyone have any idea why this may not be working?
The current goal is to have "Test" from findlink copied to extractlink.txt, not the entire line.
Someone edited your post to fix this, but you should understand that it is unnecessary to explicitly close a file when using a "with" statement AKA a context manager, because they handle this automatically for you.
A simple parser:
def getlink():
    with open('findlink.txt') as infile, open('extractlink.txt', 'w') as outfile:
        for line in infile:
            start = line.find("](")
            if start == -1:
                continue  # no "](" on this line
            begin = start + 2
            end = line.find(")", begin)  # first ")" after the "]("
            if end != -1:
                outfile.write(line[begin:end]+'\n')
According to this tutorial (https://www.tutorialspoint.com/python/string_strip.htm), line.strip(chars) removes the given characters from both ends of the string, and line.strip() with no argument removes the surrounding whitespace.
So when you call line.strip() == "](" you are comparing the whitespace-stripped line with "](", which is always false unless the input line was just "](". So this condition is not really doing anything.
To get the Test from the file, I used line.find(), which gives the index of a substring in the string.
# with line = "[Testing](Test)"
print(line.find("]("))
print(line.find(")"))
print(line[line.find("](")+2: line.find(")")])
output:
8
14
Test
So then you could just do this for getlink().
def getlink():
    with open('findlink.txt') as infile, open('extractlink.txt', 'w') as outfile:
        for line in infile:
            outfile.write(line[line.find("](")+2: line.find(")")])
    print("extractlink written.")
    # No explicit close() calls are needed: the with statement closes both files.
extractlink.txt:
Test
This is just one simple solution. You could implement it differently, but you may want to use line.find() instead of line.strip().
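As an alternative to index arithmetic with line.find(), the same extraction can be sketched with a regular expression. This is not what either answer above used; the names `LINK_TARGET` and `extract_targets` are hypothetical, and the pattern assumes the target itself contains no ")" character.

```python
import re

# Matches "](" followed by any characters other than ")" and then ")".
LINK_TARGET = re.compile(r"\]\(([^)]*)\)")


def extract_targets(line):
    """Return the text between each "](" and the next ")" in line."""
    return LINK_TARGET.findall(line)


targets = extract_targets("[Testing](Test) and [Other](https://example.com)")
no_targets = extract_targets("a line without any link")
```

Unlike the single find() call, this handles several links on one line and returns an empty list when there is nothing to extract.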

Nested loops iterating on a single file

I want to delete some specific lines in a file.
The part I want to delete is enclosed between two lines (that will be deleted too), named STARTING_LINE and CLOSING_LINE. If there is no closing line before the end of the file, then the operation should stop.
Example:
...blabla...
[Start] <-- # STARTING_LINE
This is the body that I want to delete
[End] <-- # CLOSING_LINE
...blabla...
I came up with three different ways to achieve the same thing (plus one provided by tdelaney's answer below), but I am wondering which one is the best. Please note that I am not looking for a subjective opinion: I would like to know if there are some real reasons why I should choose one method over another.
1. A lot of if conditions (just one for loop):
def delete_lines(filename):
    with open(filename, 'r+') as my_file:
        text = ''
        found_start = False
        found_end = False
        for line in my_file:
            if not found_start and line.strip() == STARTING_LINE.strip():
                found_start = True
            elif found_start and not found_end:
                if line.strip() == CLOSING_LINE.strip():
                    found_end = True
                continue
            else:
                print(line)
                text += line
        # Go to the top and write the new text
        my_file.seek(0)
        my_file.truncate()
        my_file.write(text)
2. Nested for loops on the open file:
def delete_lines(filename):
    with open(filename, 'r+') as my_file:
        text = ''
        for line in my_file:
            if line.strip() == STARTING_LINE.strip():
                # Skip lines until we reach the end of the function
                # Note: the next `for` loop iterates on the following lines, not
                # on the entire my_file (i.e. it is not starting from the first
                # line). This will allow us to avoid manually handling the
                # StopIteration exception.
                found_end = False
                for function_line in my_file:
                    if function_line.strip() == CLOSING_LINE.strip():
                        print("stop")
                        found_end = True
                        break
                if not found_end:
                    print("There is no closing line. Stopping")
                    return False
            else:
                text += line
        # Go to the top and write the new text
        my_file.seek(0)
        my_file.truncate()
        my_file.write(text)
3. while True and next() (with StopIteration exception)
def delete_lines(filename):
    with open(filename, 'r+') as my_file:
        text = ''
        for line in my_file:
            if line.strip() == STARTING_LINE.strip():
                # Skip lines until we reach the end of the function
                while True:
                    try:
                        line = next(my_file)
                        if line.strip() == CLOSING_LINE.strip():
                            print("stop")
                            break
                    except StopIteration as ex:
                        print("There is no closing line.")
                        # Stop here; otherwise the while loop never ends
                        return False
            else:
                text += line
        # Go to the top and write the new text
        my_file.seek(0)
        my_file.truncate()
        my_file.write(text)
4. itertools (from tdelaney's answer):
import itertools
def delete_lines_iter(filename):
    with open(filename, 'r+') as wrfile:
        with open(filename, 'r') as rdfile:
            # write everything before startline
            wrfile.writelines(itertools.takewhile(lambda l: l.strip() != STARTING_LINE.strip(), rdfile))
            # drop everything before stopline.. and the stopline itself
            try:
                next(itertools.dropwhile(lambda l: l.strip() != CLOSING_LINE.strip(), rdfile))
            except StopIteration:
                pass
            # include everything after
            wrfile.writelines(rdfile)
        wrfile.truncate()
It seems that these four implementations achieve the same result. So...
Question: which one should I use? Which one is the most Pythonic? Which one is the most efficient?
Is there a better solution instead?
Edit: I tried to evaluate the methods on a big file using timeit. In order to have the same file on each iteration, I removed the writing parts of each code; this means that the evaluation mostly regards the reading (and file opening) task.
import timeit
t_if = timeit.Timer("delete_lines_if('test.txt')", "from __main__ import delete_lines_if")
t_for = timeit.Timer("delete_lines_for('test.txt')", "from __main__ import delete_lines_for")
t_while = timeit.Timer("delete_lines_while('test.txt')", "from __main__ import delete_lines_while")
t_iter = timeit.Timer("delete_lines_iter('test.txt')", "from __main__ import delete_lines_iter")
print(t_if.repeat(3, 4000))
print(t_for.repeat(3, 4000))
print(t_while.repeat(3, 4000))
print(t_iter.repeat(3, 4000))
Result:
# Using IF statements:
[13.85873354100022, 13.858520206999856, 13.851908310999988]
# Using nested FOR:
[13.22578497800032, 13.178281234999758, 13.155530822999935]
# Using while:
[13.254994718000034, 13.193942980999964, 13.20395484699975]
# Using itertools:
[10.547019549000197, 10.506679693000024, 10.512742852999963]
You can make it fancy with itertools. I'd be interested in how timing compares.
import itertools
def delete_lines(filename):
    with open(filename, 'r+') as wrfile:
        with open(filename, 'r') as rdfile:
            # write everything before startline
            wrfile.writelines(itertools.takewhile(lambda l: l.strip() != STARTING_LINE.strip(), rdfile))
            # drop everything before stopline.. and the stopline itself
            next(itertools.dropwhile(lambda l: l.strip() != CLOSING_LINE.strip(), rdfile))
            # include everything after
            wrfile.writelines(rdfile)
        wrfile.truncate()
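All of the approaches above rely on one idea: an inner loop can keep consuming the same iterator that the outer loop is using. A minimal in-memory sketch makes this easy to experiment with without modifying a file. The function name `drop_block` and the choice to return None when the closing line is missing are assumptions made for this illustration, not part of any of the versions above.

```python
def drop_block(lines, start, end):
    """Return lines with every start..end block (markers included) removed.

    Returns None when a start marker is found but no end marker follows,
    mirroring the "stop if there is no closing line" requirement.
    """
    it = iter(lines)          # one shared iterator, like the open file object
    kept = []
    for line in it:
        if line.strip() == start:
            # Keep consuming the same iterator until the closing line.
            for inner in it:
                if inner.strip() == end:
                    break
            else:
                return None   # the iterator ran out before end was seen
        else:
            kept.append(line)
    return kept


sample = ["a\n", "[Start]\n", "body\n", "[End]\n", "b\n"]
cleaned = drop_block(sample, "[Start]", "[End]")
unclosed = drop_block(["a\n", "[Start]\n", "body\n"], "[Start]", "[End]")
```

The `for ... else` clause runs only when the inner loop finishes without hitting `break`, which is exactly the "no closing line" case.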

How to Open a file through python

I am very new to programming and the python language.
I know how to open a file in python, but the question is how can I open the file as a parameter of a function?
example:
function(parameter)
Here is how I have written out the code:
def function(file):
    with open('file.txt', 'r') as f:
        contents = f.readlines()
    lines = []
    for line in f:
        lines.append(line)
    print(contents)
You can easily pass the file object.
with open('file.txt', 'r') as f: #open the file
    contents = function(f) #put the lines to a variable.
and in your function, return the list of lines
def function(f):
    lines = []
    for line in f:
        lines.append(line)
    return lines
Another trick, python file objects actually have a method to read the lines of the file. Like this:
with open('file.txt', 'r') as f: #open the file
    contents = f.readlines() #put the lines to a variable (list).
With the second method, readlines is like your function. You don't have to call it again.
Update
Here is how you should write your code:
First method:
def function(f):
    lines = []
    for line in f:
        lines.append(line)
    return lines
with open('file.txt', 'r') as f: #open the file
    contents = function(f) #put the lines to a variable (list).
print(contents)
Second one:
with open('file.txt', 'r') as f: #open the file
    contents = f.readlines() #put the lines to a variable (list).
print(contents)
Hope this helps!
Python allows you to put multiple open() calls in a single with statement, separated by commas. Your code would then be:
def filter(txt, oldfile, newfile):
    '''\
    Read a list of names from a file line by line into an output file.
    If a line begins with a particular name, insert a string of text
    after the name before appending the line to the output file.
    '''
    with open(newfile, 'w') as outfile, open(oldfile, 'r', encoding='utf-8') as infile:
        for line in infile:
            if line.startswith(txt):
                line = line[0:len(txt)] + ' - Truly a great person!\n'
            outfile.write(line)
# input the name you want to check against
text = input('Please enter the name of a great person: ')
letsgo = filter(text,'Spanish', 'Spanish2')
And no, you don't gain anything by putting an explicit return at the end of your function. You can use return to exit early, but you had it at the end, and the function will exit without it. (Of course with functions that return a value, you use the return to specify the value to return.)
def fun(file):
    contents = None
    with open(file, 'r') as fp:
        contents = fp.readlines()
    ## if you want to eliminate all blank lines uncomment the next line
    #contents = [line for line in ''.join(contents).splitlines() if line]
    return contents
print(fun('test_file.txt'))
Or you can modify this so that it takes a file object as the function argument instead.
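A sketch of the file-object variant mentioned above might look like the following. The name `read_lines` and the `skip_blank` parameter are hypothetical, and `io.StringIO` stands in for a real file so the example runs without anything on disk.

```python
import io


def read_lines(fp, skip_blank=False):
    """Return the lines (without newlines) of an already-open text file object."""
    lines = fp.read().splitlines()
    if skip_blank:
        lines = [line for line in lines if line]
    return lines


# io.StringIO behaves like a file opened with open(...) in text mode.
fake_file = io.StringIO("first\n\nsecond\n")
all_lines = read_lines(fake_file)
fake_file.seek(0)                      # rewind before reading a second time
non_blank = read_lines(fake_file, skip_blank=True)
```

With a real file the call would be `with open('test_file.txt') as fp: read_lines(fp)`, which leaves opening and closing the file to the caller.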
Here's a much simpler way of opening a file without defining your own function in Python 3.4:
var=open("A_blank_text_document_you_created","mode") #"mode" is one of the modes listed below
var.write("what you want to write") #needs a mode that allows writing
print (var.read()) #needs a mode that allows reading; outputs the contents from the current position
var.close() #closing the file
Here are the modes:
"r": just to read a file
"w": just to write a file (an existing file is emptied first)
"r+": a special mode which allows both reading and writing of the file
For more information see this cheatsheet.
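The three modes can be tried out as follows. This is only a sketch: the file name demo.txt and the temporary directory are made up so that the example does not touch any real file.

```python
import os
import tempfile

# A throwaway path so the example does not touch any real file.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "demo.txt")

with open(path, "w") as f:       # "w": create or truncate, write only
    f.write("hello\n")

with open(path, "r") as f:       # "r": read only
    after_write = f.read()

with open(path, "r+") as f:      # "r+": read and write an existing file
    f.seek(0, os.SEEK_END)       # move to the end before adding text
    f.write("world\n")
    f.seek(0)                    # rewind so the read starts at the beginning
    after_update = f.read()

# Clean up the temporary file and directory.
os.remove(path)
os.rmdir(folder)
```

Note that after a write the file position sits at the end of the written text, so a read without `seek(0)` would return an empty string.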
def main():
    file=open("chirag.txt","r")
    for n in file:
        print (n.strip("t"))
    file.close()
if __name__== "__main__":
    main()
the other method is
with open("chirag.txt","r") as f:
    for n in f:
        print(n)

How to access a file concurrently to add/edit/delete data?

I want to create a text file and add data to it, line by line. If a data line already exists in the file, it should be ignored. Otherwise, it should be appended to the file.
You are almost certainly better off reading the file and writing a new, changed version. In most circumstances it will be quicker, easier, less error-prone and more extensible.
If your file isn't that big, you could just do something like this:
added = set()
def add_line(line):
    if line not in added:
        f = open('myfile.txt', 'a')
        f.write(line + '\n')
        added.add(line)
        f.close()
But this isn't a great idea if you have to worry about concurrency, large amounts of data being stored in the file, or basically anything other than something quick and one-off.
I did it like this,
def retrieveFileData():
    """Retrieve Location/Upstream data from files"""
    lines = set()
    with open(LOCATION_FILE) as f:
        for line in f:
            lines.add(line.strip())
    return lines
def add_line(line):
    """Add new entry to file"""
    # 'a' creates the file if it is missing; the with block always closes it
    with open(LOCATION_FILE, 'a') as f:
        lines = retrieveFileData()
        print(lines)
        if line not in lines:
            f.write(line + '\n')
            lines.add(line)
        else:
            print("entry already exists")
if __name__ == "__main__":
    while True:
        line = input("Enter line manually: ")
        add_line(line)
        if line == 'quit':
            break
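The read-then-append logic can also be done with a single open() call by using the "a+" mode. The sketch below assumes a hypothetical helper name `add_line_once` and a temporary file; it adds no locking, so on its own it is not safe when several processes write at the same time.

```python
import os
import tempfile


def add_line_once(path, line):
    """Append line to the file at path unless an identical line is already there.

    Returns True if the line was written, False if it already existed.
    """
    # "a+" creates the file if needed, allows reading, and always writes at the end.
    with open(path, "a+") as f:
        f.seek(0)                                  # "a+" starts at the end; rewind to read
        existing = {l.rstrip("\n") for l in f}
        if line in existing:
            return False
        f.write(line + "\n")
        return True


path = os.path.join(tempfile.mkdtemp(), "locations.txt")
results = [add_line_once(path, s) for s in ["a", "b", "a"]]
with open(path) as f:
    contents = f.read()
```

Re-reading the whole file on every call keeps the example short; for large files, keeping the set of known lines in memory between calls would avoid the repeated scan.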
