Reading a string from an opened text file [duplicate]

I have seen these two ways to process a file:
file = open("file.txt")
for line in file:
    # do something
    pass

file = open("file.txt")
contents = file.read()
for line in contents:
    # do something
    pass
I know that in the first case, the file will act like a list, so the for loop iterates over the file as if it were a list. What exactly happens in the second case, where we read the file and then iterate over the contents? What are the consequences of taking each approach, and how should I choose between them?

In the first one you are iterating over the file line by line. The entire file is not read into memory at once: Python reads it in buffered chunks and hands you one line per iteration, so memory use stays small. This is useful for handling very large files, and a safe default when you don't know in advance whether the file will be large.
In the second one, file.read() returns the complete file data as a string. When you are iterating over it, you are actually iterating over the file's data character by character. This reads the complete file data into memory.
Here's an example to show this behavior.
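The file a.txt contains:
Hello
Bye
Code:
>>> f = open('a.txt', 'r')
>>> for l in f:
...     print(l.rstrip('\n'))  # the line keeps its '\n'; strip it so print doesn't add a blank line
...
Hello
Bye
>>> f = open('a.txt', 'r')
>>> r = f.read()
>>> print(repr(r))
'Hello\nBye'
>>> for c in r:
...     print(repr(c))  # repr makes the newline character visible
...
'H'
'e'
'l'
'l'
'o'
'\n'
'B'
'y'
'e'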
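To see the difference without a prepared a.txt, here is a minimal sketch that writes the same two lines to a throwaway temporary file (the temporary path is only for the demo) and collects what each loop actually yields:

```python
import os
import tempfile

# Create a throwaway file with the same contents as a.txt in the example above.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("Hello\nBye")

# Approach 1: iterate over the file object; each step yields one line.
with open(path) as f:
    lines = [line for line in f]

# Approach 2: read everything into one string, then iterate over the string.
with open(path) as f:
    contents = f.read()
chars = [c for c in contents]

os.remove(path)

print(lines)       # ['Hello\n', 'Bye']
print(len(chars))  # 9: five letters, one '\n', three letters
```

The first list has one entry per line (each keeping its newline, except the last one here); the second has one entry per character.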

The second case reads in the contents of the file into one big string. If you iterate over a string, you get each character in turn. If you want to get each line in turn, you can do this:
for line in contents.split('\n'):
    # do something
    pass
Or you can read in the contents as a list of lines using readlines() instead of read().
with open('file.txt', 'r') as fin:
    lines = fin.readlines()
for line in lines:
    # do something
    pass
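One detail worth knowing about split('\n'): if the file ends with a newline, the result has an extra empty string at the end. This small sketch compares it with str.splitlines() and readlines(), using io.StringIO as an in-memory stand-in for the file:

```python
import io

text = "first\nsecond\n"  # typical file contents: the last line ends with '\n'

by_split = text.split('\n')                   # ['first', 'second', ''] (note the trailing '')
by_splitlines = text.splitlines()             # ['first', 'second']
by_readlines = io.StringIO(text).readlines()  # ['first\n', 'second\n'] (newlines kept)

for line in by_splitlines:
    print(line)
```

splitlines() is usually the more convenient choice when you want lines without their newline characters.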

Related

Python: How to delete line from text file [duplicate]

Let's say I have a text file full of nicknames. How can I delete a specific nickname from this file, using Python?

First, open the file and get all your lines from the file. Then reopen the file in write mode and write your lines back, except for the line you want to delete:
with open("yourfile.txt", "r") as f:
    lines = f.readlines()
with open("yourfile.txt", "w") as f:
    for line in lines:
        if line.strip("\n") != "nickname_to_delete":
            f.write(line)
You need strip("\n") in the comparison because every line returned by readlines() keeps its trailing newline, except possibly the last one if the file doesn't end with a newline; stripping it makes the comparison work in both cases.

Solution to this problem with only a single open:
with open("target.txt", "r+") as f:
    d = f.readlines()
    f.seek(0)
    for i in d:
        # compare without the trailing newline that readlines() keeps
        if i.rstrip("\n") != "line you want to remove...":
            f.write(i)
    f.truncate()
This solution opens the file in read/write mode ("r+"), uses seek(0) to move the file position back to the start, and then calls truncate() to remove everything after the last write.
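The truncate() call matters because the rewritten content is shorter than the original; without it, the old bytes past the last write would stay at the end of the file. A minimal sketch of the single-open approach as a function (delete_line_in_place is a hypothetical name; the demo uses a temporary file):

```python
import os
import tempfile

def delete_line_in_place(path, target):
    """Remove lines equal to target using a single r+ open."""
    with open(path, "r+") as f:
        lines = f.readlines()
        f.seek(0)                        # move the file position back to the start
        for line in lines:
            if line.rstrip("\n") != target:
                f.write(line)
        f.truncate()                     # cut off the old bytes left after the last write

fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("keep\nline you want to remove...\nkeep too\n")
delete_line_in_place(path, "line you want to remove...")
with open(path) as f:
    result = f.read()
os.remove(path)
print(result)
```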

In my opinion, the best and fastest option, rather than storing everything in a list and reopening the file to write it, is to write the result to a different file:
with open("yourfile.txt", "r") as file_input:
    with open("newfile.txt", "w") as output:
        for line in file_input:
            if line.strip("\n") != "nickname_to_delete":
                output.write(line)
That's it! A single loop does the whole job, and because only one line is held in memory at a time, it also works for files too large to load at once.
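If you want the result to end up under the original file name, one common pattern is to stream into a temporary file in the same directory and then swap it into place with os.replace. This is a sketch under those assumptions; delete_line_streaming is a hypothetical name:

```python
import os
import tempfile

def delete_line_streaming(path, target):
    """Copy every line except target to a temp file, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory, so os.replace does not have to cross filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as dst, open(path, "r") as src:
            for line in src:             # only one line in memory at a time
                if line.rstrip("\n") != target:
                    dst.write(line)
        os.replace(tmp_path, path)       # put the filtered file under the original name
    except BaseException:
        os.remove(tmp_path)              # don't leave the partial temp file behind
        raise

with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "yourfile.txt")
    with open(p, "w") as f:
        f.write("a\nnickname_to_delete\nb\n")
    delete_line_streaming(p, "nickname_to_delete")
    with open(p) as f:
        result = f.read()
print(result)
```

Note that mkstemp creates the new file with restrictive permissions; if the original's mode matters, shutil.copymode(path, tmp_path) before the replace would carry it over.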

This is a "fork" of @Lother's answer (which I believe should be considered the correct answer).
For a file like this:
$ cat file.txt
1: october rust
2: november rain
3: december snow
This fork from Lother's solution works fine:
#!/usr/bin/python3.4
with open("file.txt", "r+") as f:
    new_f = f.readlines()
    f.seek(0)
    for line in new_f:
        if "snow" not in line:
            f.write(line)
    f.truncate()
Improvements:
using with open, which removes the need for f.close()
a clearer if statement for checking whether the string is not present in the current line

The issue with reading all the lines in a first pass and deleting specific lines in a second pass is that if your files are huge, you will run out of RAM. A better approach is to read the lines one by one and write them to a separate file, skipping the ones you don't need. I have run this approach on files as large as 12-50 GB, and RAM usage stayed almost constant; only the CPU usage showed that processing was in progress.

I liked the fileinput approach as explained in this answer:
Deleting a line from a text file (python)
For example, say I have a file that contains empty lines and I want to remove them. Here's how I solved it:
import fileinput
import sys
for line in fileinput.input('file1.txt', inplace=True):
    # with inplace=True, stdout is redirected into the file being rewritten
    if len(line) > 1:
        sys.stdout.write(line)
Note: the empty lines in my case had length 1 (just the '\n' character).
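A self-contained version of the fileinput approach is sketched below. It uses line.strip() instead of the length check, which assumes that lines containing only whitespace should also count as empty; the temporary file stands in for file1.txt:

```python
import fileinput
import os
import tempfile

fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("one\n\ntwo\n   \nthree\n")

# With inplace=True, anything printed to stdout replaces the file's contents.
with fileinput.input(path, inplace=True) as lines:
    for line in lines:
        if line.strip():          # skip empty and whitespace-only lines
            print(line, end="")   # line already ends with '\n'

with open(path) as f:
    result = f.read()
os.remove(path)
```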

If you use Linux, you can try the following approach (it relies on GNU sed's -i option for in-place editing).
Suppose you have a text file named animal.txt:
$ cat animal.txt
dog
pig
cat
monkey
elephant
Delete the line containing "dog" (here, the first line):
>>> import subprocess
>>> subprocess.call(['sed','-i','/.*dog.*/d','animal.txt'])
then
$ cat animal.txt
pig
cat
monkey
elephant

You have probably already got a correct answer, but here is mine.
Instead of using a list to collect the unfiltered data (which is what readlines() does), I use two files: one holds the main data, and the second is a copy used for filtering when you delete a specific string. Here is the code:
with open('data_base.txt') as f:  # your main database file
    main_data = f.read()
with open('filter_base.txt', 'w') as filter_file:
    filter_file.write(main_data)  # copy the data into the filter file
with open('data_base.txt', 'w') as main_file, open('filter_base.txt') as filter_file:
    for line in filter_file:
        if 'your data to delete' not in line:  # remove a specific string
            main_file.write(line)  # put all lines back into your db except the deleted one
Hope you will find this useful! :)

If you read the file into a list, you can iterate over the list to find the nicknames you want to get rid of. This avoids creating additional files, but you'll have to write the result back to the source file.
Here's how I might do this:
import os, csv  # and other imports you need
nicknames_to_delete = ['Nick', 'Stephen', 'Mark']
I'm assuming nicknames.csv contains data like:
Nick
Maria
James
Chris
Mario
Stephen
Isabella
Ahmed
Julia
Mark
...
Then load the file into the list:
with open("nicknames.csv") as sourceFile:
    nicknames = sourceFile.read().splitlines()
Next, iterate over the list of nicknames to delete and remove each one that is present:
for nick in nicknames_to_delete:
    if nick in nicknames:
        nicknames.remove(nick)  # removes the first occurrence only
    else:
        print(nick + " is not found in the file")
Lastly, write the result back to file:
with open("nicknames.csv", "w", newline="") as nicknamesFile:
    # mode "w" already empties the file, and the with block closes it
    nicknamesWriter = csv.writer(nicknamesFile)
    for name in nicknames:
        nicknamesWriter.writerow([name])
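The same idea can also be written with a set of names to delete and a list comprehension that keeps everything else, which drops every occurrence rather than only the first. This sketch writes plain text lines instead of using csv.writer, assuming each line holds a single nickname; remove_nicknames is a hypothetical helper name:

```python
import os
import tempfile

def remove_nicknames(path, to_delete):
    """Rewrite path without any line listed in to_delete.

    Returns the requested nicknames that were not found in the file.
    """
    unwanted = set(to_delete)                                # fast membership tests
    with open(path) as f:
        names = f.read().splitlines()
    kept = [name for name in names if name not in unwanted]  # drops every occurrence
    with open(path, "w") as f:
        f.writelines(name + "\n" for name in kept)
    return sorted(unwanted.difference(names))

fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w") as tmp:
    tmp.write("Nick\nMaria\nJames\nStephen\nMark\nNick\n")
missing = remove_nicknames(path, ["Nick", "Stephen", "Mark", "Zed"])
with open(path) as f:
    result = f.read()
os.remove(path)
print(result, missing)
```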

In general, you can't; you have to write the whole file again (at least from the point of change to the end).
In some specific cases you can do better than this -
if all your data elements are the same length and in no specific order, and you know the offset of the one you want to get rid of, you could copy the last item over the one to be deleted and truncate the file before the last item;
or you could just overwrite the data chunk with a 'this is bad data, skip it' value or keep a 'this item has been deleted' flag in your saved data elements such that you can mark it deleted without otherwise modifying the file.
This is probably overkill for short documents (anything under 100 KB?).
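The first special case (equal-length records, order irrelevant) can be sketched with binary file access. Everything here is an assumption for illustration: the 8-byte RECORD_SIZE, the record layout, and the delete_record helper are hypothetical:

```python
import os
import tempfile

RECORD_SIZE = 8  # hypothetical: every record is exactly 8 bytes

def delete_record(path, index):
    """Delete record `index` by moving the last record into its slot (order not kept)."""
    with open(path, "r+b") as f:
        f.seek(0, os.SEEK_END)
        count = f.tell() // RECORD_SIZE
        if not 0 <= index < count:
            raise IndexError(index)
        last = count - 1
        if index != last:
            f.seek(last * RECORD_SIZE)
            last_record = f.read(RECORD_SIZE)
            f.seek(index * RECORD_SIZE)
            f.write(last_record)         # overwrite the deleted record in place
        f.truncate(last * RECORD_SIZE)   # the old copy of the last record is now redundant

fd, path = tempfile.mkstemp(suffix=".dat")
with os.fdopen(fd, "wb") as tmp:
    tmp.write(b"alice   " + b"bob     " + b"carol   " + b"dave    ")
delete_record(path, 1)                   # remove "bob"
with open(path, "rb") as f:
    result = f.read()
os.remove(path)
print(result)
```

Only two records' worth of bytes are touched, regardless of file size; the price is that record order is not preserved.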

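I like this method using fileinput with inplace=True:
import fileinput
fname = "yourfile.txt"  # path to the file you want to edit
for line in fileinput.input(fname, inplace=True):
    # with inplace=True, whatever is printed replaces the file's contents
    if 'UnwantedWord' not in line:
        print(line, end='')  # line already ends with '\n'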
It's a little less wordy than the other answers and is fast enough for

Save the file's lines in a list, remove the line you want to delete from the list, and write the remaining lines to a new file:
with open("file_name.txt", "r") as f:
    lines = f.readlines()
lines.remove("Line you want to delete\n")  # raises ValueError if the line isn't present
with open("new_file.txt", "w") as new_f:
    for line in lines:
        new_f.write(line)

Here's another method to remove one or more lines from a file:
src_file = "zzzz.txt"
idx = 0  # number of the line to remove, counting from 0
with open(src_file, "r") as f:
    contents = f.readlines()
contents.pop(idx)  # remove the line item from the list
with open(src_file, "w") as f:
    f.write("".join(contents))

You can use the re module.
This assumes you can load the whole text file into memory. Define a list of unwanted nicknames and substitute them with an empty string "":
import re
# Read as bytes, then decode (this also works on Python 2)
path_to_file = 'data/nicknames.txt'
with open(path_to_file, 'rb') as f:
    text = f.read().decode(encoding='utf-8')
# Define unwanted nicknames and substitute them (re.escape guards against regex metacharacters)
unwanted_nickname_list = ['SourDough']
text = re.sub("|".join(re.escape(n) for n in unwanted_nickname_list), "", text)
Write text back to the file afterwards to save the change.
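The substitution above removes the nickname wherever it appears, including inside longer words, and leaves an empty line behind. If you want to drop whole lines that match exactly, a multiline pattern can do it. This is a sketch; remove_nickname_lines is a hypothetical helper:

```python
import re

def remove_nickname_lines(text, nicknames):
    """Drop whole lines that exactly match one of the nicknames."""
    alternatives = "|".join(re.escape(name) for name in nicknames)
    # With re.MULTILINE, ^ and $ anchor at line boundaries; \n? also eats the line break.
    pattern = re.compile(r"^(?:" + alternatives + r")$\n?", re.MULTILINE)
    return pattern.sub("", text)

text = "Nick\nSourDough\nMaria\nSourDoughnut\n"
cleaned = remove_nickname_lines(text, ["SourDough"])
print(cleaned)
```

"SourDoughnut" survives because $ requires the match to end at the end of the line.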

To remove specific lines from a file, use this short and simple snippet: it removes any line that starts with a given sentence or prefix (symbol).
with open("file_name.txt", "r") as f:
    lines = f.readlines()
with open("new_file.txt", "w") as new_f:
    for line in lines:
        if not line.startswith("write any sentence or symbol to remove line"):
            new_f.write(line)

To delete a specific line of a file by its line number:
Replace the values of filename and line_to_delete with the name of your file and the line number (counting from 1) you want to delete.
filename = 'foo.txt'
line_to_delete = 3
with open(filename) as f:
    content = f.readlines()
with open(filename, "w") as f:
    # enumerate numbers the lines from 1; every other line is written back unchanged
    for line_number, line in enumerate(content, start=1):
        if line_number != line_to_delete:
            f.write(line)
print('Deleted line: {}'.format(line_to_delete))
Example output:
Deleted line: 3

Read the contents of the file and split them by newline into a list (a tuple can't be modified). Then delete the entry for your line number, join the remaining entries, and overwrite the file with the result.
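The split, delete, join, overwrite steps can be sketched like this; delete_line_number is a hypothetical helper, and the demo uses a temporary file:

```python
import os
import tempfile

def delete_line_number(path, line_number):
    """Delete the 1-based line_number: split into a list, delete, join, overwrite."""
    if line_number < 1:
        raise ValueError("line numbers start at 1")
    with open(path) as f:
        lines = f.read().split("\n")   # a list, unlike a tuple, can be modified
    del lines[line_number - 1]         # IndexError if the file is too short
    with open(path, "w") as f:
        f.write("\n".join(lines))

fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("one\ntwo\nthree\n")
delete_line_number(path, 2)
with open(path) as f:
    result = f.read()
os.remove(path)
print(result)
```

Splitting on "\n" and joining with "\n" preserves a trailing newline, because the final empty string from split comes back as the last separator.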

How to use read next() starting from any line in python?

I'm trying to start reading some file from line 3, but I can't.
I've tried to use readlines() + the index number of the line, as seen below:
x = 2
f = open('urls.txt', "r+").readlines( )[x]
line = next(f)
print(line)
but I get this result:
Traceback (most recent call last):
File "test.py", line 441, in <module>
line = next(f)
TypeError: 'str' object is not an iterator
I would like to be able to set any line, as a variable, and from there, all the time that I use next() it goes to the next line.
IMPORTANT: as this is a new feature and all my code already uses next(f), the solution needs to be able to work with it.

Try this (uses itertools.islice):
from itertools import islice
f = open('urls.txt', 'r+')
start_at = 3
file_iterator = islice(f, start_at - 1, None)
# to demonstrate
while True:
    try:
        print(next(file_iterator), end='')
    except StopIteration:
        print('End of file!')
        break
f.close()
urls.txt:
1
2
3
4
5
Output:
3
4
5
End of file!
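This solution is better than readlines because it doesn't load the entire file into memory; lines are read only as they are needed. islice also skips the earlier lines without a Python-level loop, which may make it somewhat faster than calling next() repeatedly as in @MadPhysicist's answer, although the skipped lines still have to be read (a text file has no index of where each line starts).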
Also, consider using the with syntax to guarantee the file gets closed:
with open('urls.txt', 'r+') as f:
    # do whatever
    pass
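Combining islice with the with statement, and rebinding the name f so that existing next(f) calls keep working, might look like the sketch below (a temporary file stands in for urls.txt):

```python
import os
import tempfile
from itertools import islice

# Throwaway stand-in for urls.txt
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("1\n2\n3\n4\n5\n")

start_at = 3  # 1-based number of the first line you want
with open(path) as f:
    # Rebinding f means code that already calls next(f) needs no change;
    # the with block still closes the real file object on exit.
    f = islice(f, start_at - 1, None)
    first = next(f)
    rest = list(f)
os.remove(path)
print(first, rest)
```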

The readlines method returns a list of strings for the lines. So when you take readlines()[2] you're getting the third line, as a string. Calling next on that string then makes no sense, so you get an error.
The easiest way to do this is to slice the list: readlines()[x:] gives a list of everything from line x onwards. Then you can use that list however you like.
If you have your heart set on an iterator, you can turn a list (or pretty much anything) into an iterator with the iter builtin function. Then you can next it to your heart's content.
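Slicing the list and wrapping it with iter() looks like this; io.StringIO stands in for the opened urls.txt:

```python
import io

f = io.StringIO("url1\nurl2\nurl3\nurl4\n")  # stands in for open('urls.txt')
x = 2
lines = iter(f.readlines()[x:])  # slice from line x (counting from 0), then make an iterator
first = next(lines)
second = next(lines)
print(first, second)
```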

The following code will allow you to use an iterator to print the first line:
In [1]: path = '<path to text file>'
In [2]: f = open(path, "r+")
In [3]: line = next(f)
In [4]: print(line)
This code will allow you to print the lines starting from the xth line:
In [1]: path = '<path to text file>'
In [2]: x = 2
In [3]: f = iter(open(path, "r+").readlines()[x:])
In [4]: line = next(f)
In [5]: print(line)
Edit: Edited the solution based on @Tomothy32's observation.

The expression you wrote returns a string:
open('urls.txt', "r+").readlines()[x]
open returns a file object. Its readlines method returns a list of strings. Indexing with [x] returns the third line in the file as a single string.
The first problem is that you open the file without closing it. The second is that your index doesn't specify a range of lines until the end. Here's an incremental improvement:
with open('urls.txt', 'r+') as f:
    lines = f.readlines()[x:]
Now lines is a list of all the lines you want. But you first read the whole file into memory, then discarded the first two lines. Also, a list is an iterable, not an iterator, so to use next on it effectively, you'd need to take an extra step:
lines = iter(lines)
If you want to harness the fact that the file is already a rather efficient iterator, apply next to it as many times as you need to discard unwanted lines:
with open('urls.txt', 'r+') as f:
    for _ in range(x):
        next(f)
    # now use the file
    print(next(f))
After the for loop, any read operation you do on the file will start from the third line, whether it be next(f), f.readline(), etc.
There are a few other ways to strip the first lines. In all cases, including the example above, next(f) can be replaced with f.readline():
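for n, _ in enumerate(f, start=1):
    if n == x:  # stop after exactly x lines have been consumed (x >= 1)
        break
or
for _ in zip(range(x), f): pass  # range goes first, so zip stops without pulling an extra line from f
After you run either of these loops, next(f) will return line x (counting from 0), just as in the example above.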
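A quick check with io.StringIO standing in for the file shows which loop forms skip exactly x lines. It also shows why argument order matters in zip: zip(f, range(x)) pulls one more line from f before it notices that range is exhausted.

```python
import io

x = 2
data = "line0\nline1\nline2\nline3\n"

# enumerate from 1 and stop after exactly x lines
f = io.StringIO(data)
for n, _ in enumerate(f, start=1):
    if n == x:
        break
after_enumerate = next(f)

# range first: zip checks range before touching f
f = io.StringIO(data)
for _ in zip(range(x), f):
    pass
after_zip = next(f)

# file first: zip reads one extra line before range runs out
f = io.StringIO(data)
for _ in zip(f, range(x)):
    pass
after_wrong_order = next(f)

print(after_enumerate, after_zip, after_wrong_order)
```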

Just call next(f) as many times as you need to. (There's no need to overcomplicate this with itertools, nor to slurp the entire file with readlines.)
lines_to_skip = 3
with open('urls.txt') as f:
    for _ in range(lines_to_skip):
        next(f)
    for line in f:
        print(line.strip())
Output:
% cat urls.txt
url1
url2
url3
url4
url5
% python3 test.py
url4
url5
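If the file might have fewer lines than you want to skip, a bare next(f) raises StopIteration. Passing a default to next() avoids that; skip_lines below is a hypothetical helper, and io.StringIO stands in for the file:

```python
import io

def skip_lines(f, n):
    """Advance f past n lines; stop quietly if it has fewer."""
    for _ in range(n):
        if next(f, None) is None:  # the default avoids StopIteration at end of file
            break
    return f

f = io.StringIO("url1\nurl2\nurl3\nurl4\nurl5\n")
remaining = [line.strip() for line in skip_lines(f, 3)]

short = io.StringIO("url1\n")
leftover = list(skip_lines(short, 3))  # no exception, nothing left
print(remaining, leftover)
```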

Modify a string in a text file

In a file I have the names of planets:
sun moon jupiter saturn uranus neptune venus
I would like to say "replace saturn with sun". I have tried to write it as a list. I've tried different modes (write, append etc.)
I think I am struggling to understand the concept of iteration, especially when it comes to iterating over a list, dict, or str in file. I know it can be done using csv or json or even pickle module. But my objective is to get the grasp of iteration using for...loop to modify a txt file. And I want to do that using .txt file only.
with open('planets.txt', 'r+') as myfile:
    for line in myfile.readlines():
        if 'saturn' in line:
            a = line.replace('saturn', 'sun')
            myfile.write(str(a))
        else:
            print(line.strip())

Try this, but keep in mind that str.replace also replaces the text inside other words (for example, testsaturntest becomes testsuntest); use a regex instead if that matters:
In [1]: cat planets.txt
saturn
In [2]: s = open("planets.txt").read()
In [3]: s = s.replace('saturn', 'sun')
In [4]: f = open("planets.txt", 'w')
In [5]: f.write(s)
In [6]: f.close()
In [7]: cat planets.txt
sun
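A regex with \b word boundaries avoids replacing "saturn" inside a longer word. A minimal sketch; replace_word is a hypothetical helper:

```python
import re

def replace_word(text, old, new):
    """Replace old only where it appears as a whole word."""
    return re.sub(r"\b" + re.escape(old) + r"\b", new, text)

line = "sun moon jupiter saturn testsaturntest\n"
replaced = replace_word(line, "saturn", "sun")
print(replaced, end="")
```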

This replaces the data in the file with the replacement you want and prints the values out:
with open('planets.txt', 'r+') as myfile:
    lines = myfile.readlines()
modified_lines = map(lambda line: line.replace('saturn', 'sun'), lines)
with open('planets.txt', 'w') as f:
    for line in modified_lines:
        f.write(line)
        print(line.strip())
Replacing lines in place inside the file is quite tricky, so instead I read the file, replaced the word, and wrote the lines back to the file.

If you just want to replace the word in the file, you can do it like this:
import re
with open('planets.txt', 'r') as f:
    lines = f.readlines()
newlines = [re.sub(r'\bsaturn\b', 'sun', l) for l in lines]
with open('planets.txt', 'w') as f:
    f.writelines(newlines)

f = open("planets.txt", "r+")
lines = f.readlines()  # read all lines
f.seek(0, 0)  # go back to the first character
for line in lines:  # take each line
    f.write(line.replace("saturn", "sun"))  # replace and write
f.truncate()  # "sun" is shorter than "saturn", so cut off the leftover old text
f.close()
I think it's a clear guide :)

I have not tested your code, but the issue with r+ is that you need to keep track of where you are in the file, so that you can reset the file position and overwrite the current line instead of writing the replacement after it. I suggest creating a variable that records your position so that you can call myfile.seek().

Remove substring from a string in python

I have a file in Python with filenames. I want to delete some lines and a substring of each filename using Python code. My file format is the following:
img/1.jpg
img/10.jpg
img/100.jpg 0 143 84 227
...
I want to delete the img/ substring from the whole file, and the lines where the coordinates are missing. For the second task I did the following:
for con in content:
    if ".jpg\n" in con:
        content.remove(con)
for con in content:
    print con
However content didn't change.

You're attempting to modify the list content while iterating over it. This will very quickly bite you in the knees.
Instead, in python you generate a new list:
>>> content = [fn for fn in content if not fn.endswith(".jpg\n")]
>>>
After this you can overwrite the file you read from with the new content. The above example assumes there is no whitespace to accommodate between the filename and the newline.

If content is a single string (for example, the result of f.read()), the error in your current method is that you are iterating through it letter by letter: for l in somestring: goes one character at a time. A ".jpg\n" can never be inside a single letter, so you never reach content.remove(con).
I would suggest a slightly different approach:
with open("fileofdata.txt", 'r') as f:
    content = [line for line in f.readlines() if len(line.split()) > 1]
Using len(line.split()) is more robust than line.endswith() because it allows for whitespace between .jpg and \n.
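The question also asks to remove the img/ prefix. Both tasks can be done in one pass, sketched below with the sample lines from the question; clean_lines is a hypothetical helper, and slicing is used rather than str.removeprefix so it also runs on Python versions before 3.9:

```python
def clean_lines(lines):
    """Keep lines that have coordinates and strip the leading 'img/' prefix."""
    cleaned = []
    for line in lines:
        if len(line.split()) > 1:            # filename plus at least one coordinate
            if line.startswith("img/"):
                line = line[len("img/"):]
            cleaned.append(line)
    return cleaned

content = ["img/1.jpg\n", "img/10.jpg\n", "img/100.jpg 0 143 84 227\n"]
result = clean_lines(content)
print(result)
```

Building a new list this way also sidesteps the problem of removing items from a list while iterating over it.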
