I have a file in which each line contains a sentence. Some sentences, however, are empty, i.e. the line contains just the "\n" newline character.
What I want to do is: if I find an empty sentence, I want to replace it with some placeholder symbol, e.g. "empty".
If I simply replace "\n", it will be replaced everywhere in the file.
However, I am not sure how to do this:
import sys

f = open(sys.argv[1], "wr")
for line in f:
    if len(line.strip()) == 0:
        line.replace("\n", "empty")
        # Then write the line back on the file
        f.write(line + "\n")  # Will this replace the line in the file?
Is the above code correct? Can I simultaneously read the line and edit it too?
This is a quick way of solving the problem, but not the ideal way if you have memory constraints.
f = open(sys.argv[1], "r")
lines = f.readlines()
f.close()
lines = ['empty' if i == '\n' else i for i in lines]
f = open(sys.argv[1], "w")
f.writelines(lines)
f.close()
Should you have memory restrictions, creating a function utilising yield would be the best way to go about this
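For example, a minimal sketch of that idea (my own illustration, untested, and assuming it is acceptable to stream the result into a temporary file and then swap it over the original):

import os
import sys
import tempfile

def fixed_lines(path):
    # Yield each line, replacing blank lines with "empty".
    with open(path) as f:
        for line in f:
            yield "empty\n" if not line.strip() else line

def rewrite(path):
    # Only one line is held in memory at a time.
    directory = os.path.dirname(os.path.abspath(path))
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as tmp:
        tmp.writelines(fixed_lines(path))
    os.replace(tmp.name, path)  # atomically replace the original file

rewrite(sys.argv[1])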
Edit: I should also say that, unless there has been an update, I don't believe it is possible to overwrite a specific line in a file using Python without rewriting the entire file.
I am trying to delete all the lines in a text file after a line that contains a specific string. What I am trying to do is find the number of the line in said file and rewrite the whole text up until that line.
The code that I'm trying is the following:
import itertools as it

with open('sampletext.txt', "r") as rf:
    for num, line in enumerate(rf, 1):  # Finds the number of the line in which a specific string is contained
        if 'string' in line:
            print(num)

with open('sampletext_copy.txt', "w") as wf:
    for line in it.islice(rf, 0, num):
        wf.write(line)
Also would appreciate any tips on how to do this. Thank you!
You could do it like this:
with open('sampletext.txt', "r") as rf, open('sampletext_copy.txt', "w") as wf:
    for line in rf:
        if 'string' in line:
            break
        wf.write(line)
Basically, you open both files at the same time, then read the input file line-by-line. If string is in the line, then you're done - otherwise, write it to the output file.
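If you then want the trimmed content to end up under the original name (my addition, not part of the answer above), you can swap the copy over the original afterwards:

import os

# After writing sampletext_copy.txt as shown above,
# atomically replace the original file with it.
os.replace('sampletext_copy.txt', 'sampletext.txt')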
If you want to apply the changes to the original file, it's possible to do so using the .truncate() method of the file object:
with open(r"sampletext.txt", "r+") as f:
while line := f.readline():
if line.rstrip() == "string": # line.startswith("string")
f.truncate(f.tell()) # removes all content after current position
break
Here we iterate over the file until we reach that specific line, then resize the stream to the number of bytes we've already read (obtained with .tell()).
Just to complement Donut's answer, if you want to modify the file in place, there's a much more efficient solution:
with open('sampletext.txt', "r+") as f:
    for line in iter(f.readline, ''):  # Can't use "for line in f:" because it disables
                                       # tell() for text files
        # Or for walrus lovers:
        # while line := f.readline():
        if 'string' in line:
            f.seek(0, 1)  # Needed to ensure the underlying handle matches the logical read
                          # position; f.seek(f.tell()) is logically equivalent
            f.truncate()
            break
If issue #26158 is ever fixed (so calling truncate on a file actually truncates at the logical position, not the arbitrary position of the underlying raw handle that's likely advanced a great deal due to buffering), this simpler code would work:
with open('sampletext.txt', "r+") as f:
    for line in f:
        if 'string' in line:
            f.truncate()
            break
I am writing in Python 3.6 and am having trouble making my code match strings in a short text document. This is a simple example of the exact logic that is breaking my bigger program:
PATH = "C:\\Users\\JoshLaptop\\PycharmProjects\\practice\\commented.txt"
file = open(PATH, 'r')
words = ['bah', 'dah', 'gah', "fah", 'mah']
print(file.read().splitlines())
if 'bah' not in file.read().splitlines():
print("fail")
with the text document formatted like so:
bah
gah
fah
dah
mah
and it is indeed printing out fail each time I run this. Am I using the incorrect method of reading the data from the text document?
The issue is that you first call print(file.read().splitlines()), which exhausts the file, so the next call to file.read().splitlines() returns an empty list...
A better way to "grep" your pattern would be to iterate on the file lines instead of reading it fully. So if you find the string early in the file, you save time:
with open(PATH, 'r') as f:
    for line in f:
        if line.rstrip() == "bah":
            break
    else:
        # else is reached when no break is called from the for loop: fail
        print("fail")
The small catch here is not to forget to call line.rstrip(), because the file iterator yields each line with its line terminator. Also, if there's a trailing space in your file, this code will still match the word (use strip() instead if you also want to match despite leading blanks).
If you want to match a lot of words, consider creating a set of lines:
lines = {line.rstrip() for line in f}
so your in lines call will be a lot faster.
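For instance, a small sketch of that idea (reusing PATH and the word list from the question):

# PATH as defined in the question
with open(PATH, 'r') as f:
    lines = {line.rstrip() for line in f}

words = ['bah', 'dah', 'gah', 'fah', 'mah']
missing = [word for word in words if word not in lines]
if missing:
    print("fail:", missing)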
Try it:
PATH = "C:\\Users\\JoshLaptop\\PycharmProjects\\practice\\commented.txt"
file = open(PATH, 'r')
words = file.read().splitlines()
print(words)
if 'bah' not in words:
print("fail")
You can't read the file twice. When you call print(file.read().splitlines()), the whole file is read, and the next call to this function will return nothing because you are already at the end of the file.
PATH = "your_file"
file = open(PATH, 'r')
words = ['bah', 'dah', 'gah', "fah", 'mah']
if 'bah' not in (file.read().splitlines()) :
print("fail")
As you can see, the output is not 'fail': you must call file.read().splitlines() only once in your code, or save its result in another variable; otherwise you will get the 'fail' message.
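Alternatively (my own aside, not part of the answer above), you can rewind the file before reading it a second time:

with open(PATH, 'r') as file:
    print(file.read().splitlines())
    file.seek(0)  # rewind to the beginning so the second read sees the whole file
    if 'bah' not in file.read().splitlines():
        print("fail")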
I have three short JSON text files. I want to combine them with Python, and while it works and creates an output file with everything in the right place, the last line ends with a comma, which I would like to replace with }. I have come up with this code:
def join_json_file (file_name_list,output_file_name):
    with open(output_file_name,"w") as file_out:
        file_out.write('{')
        for filename in file_name_list:
            with open(filename) as infile:
                file_out.write(infile.read()[1:-1] + ",")
    with open(output_file_name,"r") as file_out:
        lines = file_out.readlines()
        print lines[-1]
        lines[-1] = lines[-1].replace(",","")
but it doesn't replace the last line. Could somebody help me? I am new to Python and I can't find the solution by myself.
You are writing all of the files, and then loading it back in to change the last line. The change though will only be in memory, not in the file itself. The better approach would be to avoid writing the extra , in the first place. For example:
def join_json_file (file_name_list, output_file_name):
    with open(output_file_name, "w") as file_out:
        file_out.write('{')
        for filename in file_name_list[:-1]:
            with open(filename) as infile:
                file_out.write(infile.read()[1:-1] + ",")
        with open(file_name_list[-1]) as infile:
            file_out.write(infile.read()[1:-1])
This first writes all but the last file with the extra comma, and then writes the last file separately. You might also want to check for the case of a single file.
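Another option (my own variation, not from the answer above) is to collect the trimmed contents first and let str.join place the commas; this also handles the single-file case naturally and writes the closing brace the question asks for:

def join_json_file(file_name_list, output_file_name):
    # Read each file, strip its outer braces, and join the pieces with commas.
    parts = []
    for filename in file_name_list:
        with open(filename) as infile:
            parts.append(infile.read().strip()[1:-1])
    with open(output_file_name, "w") as file_out:
        file_out.write("{" + ",".join(parts) + "}")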
As a practice exercise, I am learning to read a file.
As is hopefully obvious from the code, I have a file in the working/root directory. I need to read it and print it.
my_file = open("new.txt", "r")
lengt = sum(1 for line in my_file)
for i in range(0, lengt - 1):
    myline = my_file.readlines(1)[0]
    print(myline)
my_file.close()
This returns an error saying the index is out of range.
The text file simply contains statements like
line one
line two
line three
.
.
.
Everything else the same, I tried myline = my_file.readline(). I get 7 empty lines.
My guess is that while using for line in my_file, I already read up all the lines and so reached the end of the document. How do I overcome this to get the result I want?
P.S. If it matters, it's Python 3.3.
No need to count along. Python does it for you:
my_file = open("new.txt", "r")
for myline in my_file:
    print(myline)
Details:
my_file is an iterator. This is a special object that you can iterate over.
You can also access a single line:
line1 = next(my_file)
gives you the first line assuming you just opened the file. Doing it again:
line2 = next(my_file)
you get the second line. If you now iterate over it:
for myline in my_file:
    # do something
it will start at line 3.
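Putting those pieces together, a small sketch (using the same file as above):

my_file = open("new.txt", "r")
line1 = next(my_file)      # first line
line2 = next(my_file)      # second line
for myline in my_file:     # continues from line 3
    print(myline, end='')  # end='' avoids doubled newlines, see below
my_file.close()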
Strange extra lines?
print(myline)
will likely print an extra empty line. This is due to a newline read from the file and a newline added by print(). Solution:
Python 3:
print(myline, end='')
Python 2:
print myline, # note the trailing comma.
Playing it safe
Using the with statement like this:
with open("new.txt", "r") as my_file:
for myline in my_file:
print(myline)
# my_file is open here
# my_file is closed here
you don't need to close the file, as this is done as soon as you leave the context, i.e. as soon as you continue with your code at the same level as the with statement.
You can actually take care of all of this at once by iterating over the file contents:
my_file = open("new.txt", "r")
length = 0
for line in my_file:
    length += 1
    print(line)
my_file.close()
At the end, you will have printed all of the lines, and length will contain the number of lines in the file. (If you don't specifically need to know length, there's really no need for it!)
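If you do want the count, enumerate is a common alternative (my own aside, not part of the answer above):

length = 0
with open("new.txt", "r") as my_file:
    for length, line in enumerate(my_file, start=1):
        print(line, end='')
print("number of lines:", length)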
Another way to do it, which will close the file for you (and, in fact, will even close the file if an exception is raised):
length = 0
with open("new.txt", "r") as my_file:
    for line in my_file:
        length += 1
        print(line)
I have a file named a.txt which looks like this:
I'm the first line
I'm the second line.
There may be more lines here.
I'm below an empty line.
I'm a line.
More lines here.
Now, I want to remove the contents above the empty line(including the empty line itself).
How could I do this in a Pythonic way?
Basically you can't delete stuff from the beginning of a file, so you will have to write to a new file.
I think the pythonic way looks like this:
# get an iterator over the lines in the file:
with open("input.txt", 'rt') as lines:
    # while the line is not empty, drop it
    for line in lines:
        if not line.strip():
            break
    # now lines is at the point after the first paragraph,
    # so write out everything from here
    with open("output.txt", 'wt') as out:
        out.writelines(lines)
Here are some simpler versions of this, without with for older Python versions:
lines = open("input.txt", 'rt')
for line in lines:
    if not line.strip():
        break
open("output.txt", 'wt').writelines(lines)
and a very straightforward version that simply splits the file at the empty line:
# first, read everything from the old file
text = open("input.txt", 'rt').read()
# split it at the first empty line ("\n\n")
first, rest = text.split('\n\n', 1)
# make a new file and write the rest
open("output.txt", 'wt').write(rest)
Note that this can be pretty fragile; for example, Windows often uses \r\n as a single line break, so an empty line would be \r\n\r\n instead. But often you know that the file uses only one kind of line break, so this can be fine.
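If you do need to cope with either style of line ending, one option (my addition, not part of the answer above) is to split on a regular expression instead:

import re

text = open("input.txt", 'rt').read()
# split at the first blank line, whether lines end with \n or \r\n
first, rest = re.split(r'\r?\n\r?\n', text, maxsplit=1)
open("output.txt", 'wt').write(rest)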
Naive approach by iterating over the lines in the file one by one top to bottom:
#!/usr/bin/env python

with open("4692065.txt", 'r') as src, open("4692065.cut.txt", "w") as dest:
    keep = False
    for line in src:
        if keep: dest.write(line)
        if line.strip() == '': keep = True
The fileinput module (from the standard library) is convenient for this kind of thing. It sets things up so you can act as though you are editing the file "in-place":
import fileinput
import sys

fileobj = iter(fileinput.input(['a.txt'], inplace=True))

# iterate through the file until you find an empty line.
for line in fileobj:
    if not line.strip():
        break

# Iterators (like `fileobj`) pick up where they left off.
# Starting a new for-loop saves you one `if` statement and boolean variable.
for line in fileobj:
    sys.stdout.write(line)
Any idea how big the file is going to be?
You could read the file into memory:
f = open('your_file', 'r')
lines = f.readlines()
which will read the file line by line and store those lines in a list (lines).
Then, close the file and reopen with 'w':
f.close()
f = open('your_file', 'w')
for line in lines:
    if your_if_here:
        f.write(line)
This will overwrite the current file. Then you can pick and choose which lines from the list you want to write back in. It's probably not a very good idea if the file gets too large, though, since the entire file has to reside in memory. But it doesn't require that you create a second file for your output.
from itertools import dropwhile, islice

def content_after_emptyline(file_object):
    return islice(dropwhile(lambda line: line.strip(), file_object), 1, None)

with open("filename") as f:
    for line in content_after_emptyline(f):
        print line,
You could do a little something like this:
with open('a.txt', 'r') as file:
    lines = file.readlines()

blank_line = lines.index('\n')   # index of the first blank line
lines = lines[blank_line + 1:]   # keep everything after it

with open('a.txt', 'w') as file:
    file.writelines(lines)       # the lines already end with '\n'
and that makes the job much simpler.