How to loop through consecutive lines in python? - python

I have a text file that, for the sake of simplicity, contains:
cat
dog
goat
giraffe
walrus
elephant
How can I create a script that would set a variable, animal in this case, to the first line in the text file, print animal, but then do the whole thing again, but make animal set to the next line (in this instance, dog).
Here's what I've tried so far:
while True:
with open('./text.txt','r') as f:
for i in enumerate('./text.txt'):
if i in lines:
print(lines)

If you want to read the file one line at a time (which may be needed for large files):
with open('./text.txt','r') as f:
line = True
# this will stop when there is nothing left to read, as line will be ''
# note that an 'empty' line will still have a line ending, i.e. '\n'
while line:
line = f.readline()
print(line)
If you don't care about the size, you can read all lines at once with .readlines() and just loop over the returned values from that.

Use readlines to store each line in a list.
with open('file.txt','r') as f:
animals = f.readlines()
for animal in animals:
print(animal.strip())

You could try the following:
with open('./text.txt') as f:
for animal in f.readlines():
print(animal.strip())

Related

Issue with a skiping of first row

I understand that it skips the first row in the file because of header, but how can I avoid it? The syntax must be exactly the same as it is below.
File contains: Rabbit, Pig, Dog, Horse, Bird
try:
file = open("file.txt")
line = file.readline()
animals = []
for line in file:
animals.append(line.rstrip())
animals.sort()
print(animals)
finally:
file.close()
Output is ['Bird', 'Dog', 'Horse', 'Pig']
Every time readline() is called, it reads a line from a file and moves the cursor to the beggining of the next line.
In your code, line = file.readline() reads the first line and moves the cursor to the second line. As a result, your for loop starts from the second line of the file. If there is no particular reason you need it, just delete it. If you do need it, just append the line variable in the list and then do the sorting.
The problems is the line variable, it reads a line and it is never used. You should consider using another name for the variable file because it is a keyword. Also it is good practice to open files in this format:
with open("file.txt") as f:
f.read()
This should work.
try:
f = open("file.txt")
animals = []
for line in f:
animals.append(line.rstrip())
animals.sort()
print(animals)
finally:
f.close()
you are reading the first line and you are not doing something, you can include the first line in your output list: animals = [line.rstrip()]
or you can use the context manager:
with open('file.txt') as fp:
animals = sorted(l.rstrip() for l in fp)

Python: How to delete line from text file [duplicate]

Let's say I have a text file full of nicknames. How can I delete a specific nickname from this file, using Python?
First, open the file and get all your lines from the file. Then reopen the file in write mode and write your lines back, except for the line you want to delete:
with open("yourfile.txt", "r") as f:
lines = f.readlines()
with open("yourfile.txt", "w") as f:
for line in lines:
if line.strip("\n") != "nickname_to_delete":
f.write(line)
You need to strip("\n") the newline character in the comparison because if your file doesn't end with a newline character the very last line won't either.
Solution to this problem with only a single open:
with open("target.txt", "r+") as f:
d = f.readlines()
f.seek(0)
for i in d:
if i != "line you want to remove...":
f.write(i)
f.truncate()
This solution opens the file in r/w mode ("r+") and makes use of seek to reset the f-pointer then truncate to remove everything after the last write.
The best and fastest option, rather than storing everything in a list and re-opening the file to write it, is in my opinion to re-write the file elsewhere.
with open("yourfile.txt", "r") as file_input:
with open("newfile.txt", "w") as output:
for line in file_input:
if line.strip("\n") != "nickname_to_delete":
output.write(line)
That's it! In one loop and one only you can do the same thing. It will be much faster.
This is a "fork" from #Lother's answer (which I believe that should be considered the right answer).
For a file like this:
$ cat file.txt
1: october rust
2: november rain
3: december snow
This fork from Lother's solution works fine:
#!/usr/bin/python3.4
with open("file.txt","r+") as f:
new_f = f.readlines()
f.seek(0)
for line in new_f:
if "snow" not in line:
f.write(line)
f.truncate()
Improvements:
with open, which discard the usage of f.close()
more clearer if/else for evaluating if string is not present in the current line
The issue with reading lines in first pass and making changes (deleting specific lines) in the second pass is that if you file sizes are huge, you will run out of RAM. Instead, a better approach is to read lines, one by one, and write them into a separate file, eliminating the ones you don't need. I have run this approach with files as big as 12-50 GB, and the RAM usage remains almost constant. Only CPU cycles show processing in progress.
I liked the fileinput approach as explained in this answer:
Deleting a line from a text file (python)
Say for example I have a file which has empty lines in it and I want to remove empty lines, here's how I solved it:
import fileinput
import sys
for line_number, line in enumerate(fileinput.input('file1.txt', inplace=1)):
if len(line) > 1:
sys.stdout.write(line)
Note: The empty lines in my case had length 1
If you use Linux, you can try the following approach.
Suppose you have a text file named animal.txt:
$ cat animal.txt
dog
pig
cat
monkey
elephant
Delete the first line:
>>> import subprocess
>>> subprocess.call(['sed','-i','/.*dog.*/d','animal.txt'])
then
$ cat animal.txt
pig
cat
monkey
elephant
Probably, you already got a correct answer, but here is mine.
Instead of using a list to collect unfiltered data (what readlines() method does), I use two files. One is for hold a main data, and the second is for filtering the data when you delete a specific string. Here is a code:
main_file = open('data_base.txt').read() # your main dataBase file
filter_file = open('filter_base.txt', 'w')
filter_file.write(main_file)
filter_file.close()
main_file = open('data_base.txt', 'w')
for line in open('filter_base'):
if 'your data to delete' not in line: # remove a specific string
main_file.write(line) # put all strings back to your db except deleted
else: pass
main_file.close()
Hope you will find this useful! :)
I think if you read the file into a list, then do the you can iterate over the list to look for the nickname you want to get rid of. You can do it much efficiently without creating additional files, but you'll have to write the result back to the source file.
Here's how I might do this:
import, os, csv # and other imports you need
nicknames_to_delete = ['Nick', 'Stephen', 'Mark']
I'm assuming nicknames.csv contains data like:
Nick
Maria
James
Chris
Mario
Stephen
Isabella
Ahmed
Julia
Mark
...
Then load the file into the list:
nicknames = None
with open("nicknames.csv") as sourceFile:
nicknames = sourceFile.read().splitlines()
Next, iterate over to list to match your inputs to delete:
for nick in nicknames_to_delete:
try:
if nick in nicknames:
nicknames.pop(nicknames.index(nick))
else:
print(nick + " is not found in the file")
except ValueError:
pass
Lastly, write the result back to file:
with open("nicknames.csv", "a") as nicknamesFile:
nicknamesFile.seek(0)
nicknamesFile.truncate()
nicknamesWriter = csv.writer(nicknamesFile)
for name in nicknames:
nicknamesWriter.writeRow([str(name)])
nicknamesFile.close()
In general, you can't; you have to write the whole file again (at least from the point of change to the end).
In some specific cases you can do better than this -
if all your data elements are the same length and in no specific order, and you know the offset of the one you want to get rid of, you could copy the last item over the one to be deleted and truncate the file before the last item;
or you could just overwrite the data chunk with a 'this is bad data, skip it' value or keep a 'this item has been deleted' flag in your saved data elements such that you can mark it deleted without otherwise modifying the file.
This is probably overkill for short documents (anything under 100 KB?).
I like this method using fileinput and the 'inplace' method:
import fileinput
for line in fileinput.input(fname, inplace =1):
line = line.strip()
if not 'UnwantedWord' in line:
print(line)
It's a little less wordy than the other answers and is fast enough for
Save the file lines in a list, then remove of the list the line you want to delete and write the remain lines to a new file
with open("file_name.txt", "r") as f:
lines = f.readlines()
lines.remove("Line you want to delete\n")
with open("new_file.txt", "w") as new_f:
for line in lines:
new_f.write(line)
here's some other method to remove a/some line(s) from a file:
src_file = zzzz.txt
f = open(src_file, "r")
contents = f.readlines()
f.close()
contents.pop(idx) # remove the line item from list, by line number, starts from 0
f = open(src_file, "w")
contents = "".join(contents)
f.write(contents)
f.close()
You can use the re library
Assuming that you are able to load your full txt-file. You then define a list of unwanted nicknames and then substitute them with an empty string "".
# Delete unwanted characters
import re
# Read, then decode for py2 compat.
path_to_file = 'data/nicknames.txt'
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')
# Define unwanted nicknames and substitute them
unwanted_nickname_list = ['SourDough']
text = re.sub("|".join(unwanted_nickname_list), "", text)
Do you want to remove a specific line from file so use this snippet short and simple code you can easily remove any line with sentence or prefix(Symbol).
with open("file_name.txt", "r") as f:
lines = f.readlines()
with open("new_file.txt", "w") as new_f:
for line in lines:
if not line.startswith("write any sentence or symbol to remove line"):
new_f.write(line)
To delete a specific line of a file by its line number:
Replace variables filename and line_to_delete with the name of your file and the line number you want to delete.
filename = 'foo.txt'
line_to_delete = 3
initial_line = 1
file_lines = {}
with open(filename) as f:
content = f.readlines()
for line in content:
file_lines[initial_line] = line.strip()
initial_line += 1
f = open(filename, "w")
for line_number, line_content in file_lines.items():
if line_number != line_to_delete:
f.write('{}\n'.format(line_content))
f.close()
print('Deleted line: {}'.format(line_to_delete))
Example output:
Deleted line: 3
Take the contents of the file, split it by newline into a tuple. Then, access your tuple's line number, join your result tuple, and overwrite to the file.

How can I write the lines from the first text file that are not present in the second text file?

I would like to compare two text files. The first text file has lines that aren't in the second text file. I would like to copy these lines and write them to a new txt file. I would like a Python script for this as I do this a lot and do not want to go online constantly to find these new lines. I do not need to acknowledge if there is something in file2 that is not in file1.
I have wrote some code that seems to work inconsistently. I am unsure what I am doing wrong.
newLines = open("file1.txt", "r")
originalLines = open("file2.txt", "r")
output = open("output.txt", "w")
lines1 = newLines.readlines()
lines2 = originalLines.readlines()
newLines.close()
originalLines.close()
duplicate = False
for line in lines1:
if line.isspace():
continue
for line2 in lines2:
if line == line2:
duplicate = True
break
if duplicate == False:
output.write(line)
else:
duplicate = False
output.close()
For file1.txt:
Man
Dog
Axe
Cat
Potato
Farmer
file2.txt:
Man
Dog
Axe
Cat
The output.txt should be:
Potato
Farmer
but it is instead this:
Cat
Potato
Farmer
Any help would be much appreciated!
Based on behavior, file2.txt doesn't end with a newline, so the contents of lines2 is ['Man\n', 'Dog\n', 'Axe\n', 'Cat']. Note the lack of a newline for 'Cat'.
I'd suggest normalizing your lines so they don't have newlines, replacing:
lines1 = newLines.readlines()
lines2 = originalLines.readlines()
with:
lines1 = [line.rstrip('\n') for line in newLines]
# Set comprehension makes lookup cheaper and dedupes
lines2 = {line.rstrip('\n') for line in originalLines}
and changing:
output.write(line)
to:
print(line, file=output)
which will add the newline for you. Really, the best solution is to avoid the inner loop entirely, changing all of this:
for line2 in lines2:
if line == line2:
duplicate = True
break
if duplicate == False:
output.write(line)
else:
duplicate = False
to just:
if line not in lines2:
print(line, file=output)
which, if you use a set for lines2 as I suggest, makes the cost of the test drop from linear in the number of lines in file2.txt to roughly constant no matter the size of file2.txt (as long as the set of unique lines can fit in memory at all).
Even better, use with statements for your open files, and stream file1.txt rather than holding it in memory at all, and you end up with:
with open("file2.txt") as origlines:
lines2 = {line.rstrip('\n') for line in origlines}
with open("file1.txt") as newlines, open("output.txt", "w") as output:
for line in newlines:
line = line.rstrip('\n')
if not line.isspace() and line not in lines2:
print(line, file=output)
You can use numpy for smaller and faster solution.
Here we are using these numpy methods
np.loadtxt Docs: https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html
np.setdiff1d Docs:https://docs.scipy.org/doc/numpy-1.14.5/reference/generated/numpy.setdiff1d.html
np.savetxt Docs: https://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html
import numpy as np
arr=np.setdiff1d(np.loadtxt('file1.txt',dtype=str),np.loadtxt('file2.txt',dtype=str))
np.savetxt('output.txt',b,fmt='%s')

Deleting specific line from a text file in Python [duplicate]

Let's say I have a text file full of nicknames. How can I delete a specific nickname from this file, using Python?
First, open the file and get all your lines from the file. Then reopen the file in write mode and write your lines back, except for the line you want to delete:
with open("yourfile.txt", "r") as f:
lines = f.readlines()
with open("yourfile.txt", "w") as f:
for line in lines:
if line.strip("\n") != "nickname_to_delete":
f.write(line)
You need to strip("\n") the newline character in the comparison because if your file doesn't end with a newline character the very last line won't either.
Solution to this problem with only a single open:
with open("target.txt", "r+") as f:
d = f.readlines()
f.seek(0)
for i in d:
if i != "line you want to remove...":
f.write(i)
f.truncate()
This solution opens the file in r/w mode ("r+") and makes use of seek to reset the f-pointer then truncate to remove everything after the last write.
The best and fastest option, rather than storing everything in a list and re-opening the file to write it, is in my opinion to re-write the file elsewhere.
with open("yourfile.txt", "r") as file_input:
with open("newfile.txt", "w") as output:
for line in file_input:
if line.strip("\n") != "nickname_to_delete":
output.write(line)
That's it! In one loop and one only you can do the same thing. It will be much faster.
This is a "fork" from #Lother's answer (which I believe that should be considered the right answer).
For a file like this:
$ cat file.txt
1: october rust
2: november rain
3: december snow
This fork from Lother's solution works fine:
#!/usr/bin/python3.4
with open("file.txt","r+") as f:
new_f = f.readlines()
f.seek(0)
for line in new_f:
if "snow" not in line:
f.write(line)
f.truncate()
Improvements:
with open, which discard the usage of f.close()
more clearer if/else for evaluating if string is not present in the current line
The issue with reading lines in first pass and making changes (deleting specific lines) in the second pass is that if you file sizes are huge, you will run out of RAM. Instead, a better approach is to read lines, one by one, and write them into a separate file, eliminating the ones you don't need. I have run this approach with files as big as 12-50 GB, and the RAM usage remains almost constant. Only CPU cycles show processing in progress.
I liked the fileinput approach as explained in this answer:
Deleting a line from a text file (python)
Say for example I have a file which has empty lines in it and I want to remove empty lines, here's how I solved it:
import fileinput
import sys
for line_number, line in enumerate(fileinput.input('file1.txt', inplace=1)):
if len(line) > 1:
sys.stdout.write(line)
Note: The empty lines in my case had length 1
If you use Linux, you can try the following approach.
Suppose you have a text file named animal.txt:
$ cat animal.txt
dog
pig
cat
monkey
elephant
Delete the first line:
>>> import subprocess
>>> subprocess.call(['sed','-i','/.*dog.*/d','animal.txt'])
then
$ cat animal.txt
pig
cat
monkey
elephant
Probably, you already got a correct answer, but here is mine.
Instead of using a list to collect unfiltered data (what readlines() method does), I use two files. One is for hold a main data, and the second is for filtering the data when you delete a specific string. Here is a code:
main_file = open('data_base.txt').read() # your main dataBase file
filter_file = open('filter_base.txt', 'w')
filter_file.write(main_file)
filter_file.close()
main_file = open('data_base.txt', 'w')
for line in open('filter_base'):
if 'your data to delete' not in line: # remove a specific string
main_file.write(line) # put all strings back to your db except deleted
else: pass
main_file.close()
Hope you will find this useful! :)
I think if you read the file into a list, then do the you can iterate over the list to look for the nickname you want to get rid of. You can do it much efficiently without creating additional files, but you'll have to write the result back to the source file.
Here's how I might do this:
import, os, csv # and other imports you need
nicknames_to_delete = ['Nick', 'Stephen', 'Mark']
I'm assuming nicknames.csv contains data like:
Nick
Maria
James
Chris
Mario
Stephen
Isabella
Ahmed
Julia
Mark
...
Then load the file into the list:
nicknames = None
with open("nicknames.csv") as sourceFile:
nicknames = sourceFile.read().splitlines()
Next, iterate over to list to match your inputs to delete:
for nick in nicknames_to_delete:
try:
if nick in nicknames:
nicknames.pop(nicknames.index(nick))
else:
print(nick + " is not found in the file")
except ValueError:
pass
Lastly, write the result back to file:
with open("nicknames.csv", "a") as nicknamesFile:
nicknamesFile.seek(0)
nicknamesFile.truncate()
nicknamesWriter = csv.writer(nicknamesFile)
for name in nicknames:
nicknamesWriter.writeRow([str(name)])
nicknamesFile.close()
In general, you can't; you have to write the whole file again (at least from the point of change to the end).
In some specific cases you can do better than this -
if all your data elements are the same length and in no specific order, and you know the offset of the one you want to get rid of, you could copy the last item over the one to be deleted and truncate the file before the last item;
or you could just overwrite the data chunk with a 'this is bad data, skip it' value or keep a 'this item has been deleted' flag in your saved data elements such that you can mark it deleted without otherwise modifying the file.
This is probably overkill for short documents (anything under 100 KB?).
I like this method using fileinput and the 'inplace' method:
import fileinput
for line in fileinput.input(fname, inplace =1):
line = line.strip()
if not 'UnwantedWord' in line:
print(line)
It's a little less wordy than the other answers and is fast enough for
Save the file lines in a list, then remove of the list the line you want to delete and write the remain lines to a new file
with open("file_name.txt", "r") as f:
lines = f.readlines()
lines.remove("Line you want to delete\n")
with open("new_file.txt", "w") as new_f:
for line in lines:
new_f.write(line)
here's some other method to remove a/some line(s) from a file:
src_file = zzzz.txt
f = open(src_file, "r")
contents = f.readlines()
f.close()
contents.pop(idx) # remove the line item from list, by line number, starts from 0
f = open(src_file, "w")
contents = "".join(contents)
f.write(contents)
f.close()
You can use the re library
Assuming that you are able to load your full txt-file. You then define a list of unwanted nicknames and then substitute them with an empty string "".
# Delete unwanted characters
import re
# Read, then decode for py2 compat.
path_to_file = 'data/nicknames.txt'
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')
# Define unwanted nicknames and substitute them
unwanted_nickname_list = ['SourDough']
text = re.sub("|".join(unwanted_nickname_list), "", text)
Do you want to remove a specific line from file so use this snippet short and simple code you can easily remove any line with sentence or prefix(Symbol).
with open("file_name.txt", "r") as f:
lines = f.readlines()
with open("new_file.txt", "w") as new_f:
for line in lines:
if not line.startswith("write any sentence or symbol to remove line"):
new_f.write(line)
To delete a specific line of a file by its line number:
Replace variables filename and line_to_delete with the name of your file and the line number you want to delete.
filename = 'foo.txt'
line_to_delete = 3
initial_line = 1
file_lines = {}
with open(filename) as f:
content = f.readlines()
for line in content:
file_lines[initial_line] = line.strip()
initial_line += 1
f = open(filename, "w")
for line_number, line_content in file_lines.items():
if line_number != line_to_delete:
f.write('{}\n'.format(line_content))
f.close()
print('Deleted line: {}'.format(line_to_delete))
Example output:
Deleted line: 3
Take the contents of the file, split it by newline into a tuple. Then, access your tuple's line number, join your result tuple, and overwrite to the file.

How to open a text file and traverse through multiple lines concurrently?

I have a text file I want to open and do something to a line based on the next line.
For example if I have the following lines:
(a) A dog jump over the fire.
(1) A fire jump over a dog.
(b) A cat jump over the fire.
(c) A horse jump over a dog.
My code would be something like this:
with open("dog.txt") as f:
lines = filter(None, (line.rstrip() for line in f))
for value in lines:
if value has letter enclosed in parenthesis
do something
then if next line has a number enclosed in parenthesis
do something
EDIT: here is the solution I used.
for i in range(len(lines)) :
if re.search('^\([a-z]', lines[i-1]) :
print lines[i-1]
if re.search('\([0-9]', lines[i]) :
print lines[i]
Store the previous line and process it after reading the next:
file = open("file.txt")
previous = ""
for line in file:
# Don't do anything for the first line, as there is no previous line.
if previous != "":
if previous[0] == "(": # Or any other type of check you want to do.
# Process the 'line' variable here.
pass
previous = line
file.close()
you should use python's iter:
with open('file.txt') as f:
for line in f:
prev_line = line # current line.
next_line = next(f) # when call next(f), the loop will go to next line.
do_something(prev_line, next_line)

Categories