If a line in a file starts with a number (xyz), I need to print (or write to a file) that line and the next xyz+1 lines.
What's the best way to do this?
So far, I've been able to print the line that starts with an int. How do I print the next lines?
import glob, os, sys
import subprocess
file = 'filename.txt'
with open(file,'r') as f:
    data = f.readlines()
for line in data:
    if line[0].isdigit():
        print int(line)
If I made an iterator out of data, the print function skips a line every time.
with open(file,'r') as f:
    data = f.readlines()
x = iter(data)
for line in x:
    if line[0].isdigit():
        print int(line)
        for i in range(int(line)):
            print x.next()
How could I make it stop skipping lines?
Use a flag: when you find the line, set the flag to True, then use it to write that line and every line after it:
can_write = False
with open('source.txt') as f, open('destination.txt', 'w') as fw:
    for line in f:
        if line.startswith(xyz):
            can_write = True
        if can_write:
            fw.write(line)
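For the original question (the numbered line plus a fixed number of following lines), one possible sketch is to let itertools.islice pull from the same iterator that the loop uses, so the outer loop resumes right after each block. The names numbered_blocks, extra and sample are made up for illustration, and the sketch assumes the whole line is the number, as int(line) in the question implies.

```python
from itertools import islice


def numbered_blocks(lines, extra=1):
    """Yield each all-digit line followed by the next int(line) + extra lines."""
    it = iter(lines)
    for line in it:
        stripped = line.strip()
        if stripped.isdigit():
            yield line
            # islice consumes from the shared iterator, so these lines
            # are not visited again by the outer loop.
            yield from islice(it, int(stripped) + extra)


# Hypothetical sample data standing in for the lines of a file.
sample = ["header\n", "2\n", "a\n", "b\n", "c\n", "d\n", "1\n", "e\n", "f\n", "g\n"]
result = list(numbered_blocks(sample))
```

With a real file, passing the open file object instead of sample should work the same way, since a file object is also an iterator over its lines.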
Related
Say I have a file my_file, and I want to search for a certain word x on every line of the file and, if the word exists, attach my variable y to the left and right side of the word. Then I want to replace the old line with the new, modified line in my_new_file. How do I do this? So far I have:
output = open(my_new_file, "w")
for line in open(my_file):
    if (" " + x + "") in line:
You can try this:
y = "someword"
x = "target_string"
lines = [i.strip('\n') for i in open('filename.txt')]
# wrap every line that contains x with y on both sides
final_lines = ["{}{}{}".format(y, i, y) if x in i else i for i in lines]
f = open(my_new_file, "w")
for i in final_lines:
    f.write("{}\n".format(i))
f.close()
with open('inputfile.txt', 'r') as infile:
    with open('outfile.txt', 'w') as outfile:
        for line in infile:
            # y is the text to attach on both sides of the word
            outfile.write(line.replace('string', y + 'string' + y))
Try This:
with open("my_file", "r") as my_file:
    raw_data = my_file.read()
    # READ YOUR FILE
new_data = raw_data.split("\n")
for line in new_data:
    if "sd" in line:
        my_new_line = "y" + line + "y"
        raw_data = raw_data.replace(line, my_new_line)
print(raw_data)
It's tough to replace a line in a file while reading it, for the same reason that it's tough to safely modify a list as you iterate over it.
It's much better to read through the file, collect a list of lines, then overwrite the original. If the file is particularly large (such that it would be infeasible to hold it all in memory at once), you can write to disk twice.
import tempfile
y = "***"
your_word = "Whatever you're filtering by"
with tempfile.TemporaryFile(mode="w+") as tmpf:
    with open(my_file, 'r') as f:
        for line in f:
            if your_word in line:
                line = f"{y}{line.strip()}{y}\n"
            tmpf.write(line)  # write to the temp file
    tmpf.seek(0)  # move back to the beginning of the tempfile
    with open(my_file, 'w') as f:
        for line in tmpf:  # reading from tempfile now
            f.write(line)
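A variation on the same idea, as a rough sketch only: write the modified lines to a named temporary file next to the original and then swap it into place with os.replace, so the original is only replaced once the new copy is complete. The function name wrap_word_in_file and its parameters are hypothetical, and it wraps each occurrence of the word (as the question asks) rather than the whole line.

```python
import os
import tempfile


def wrap_word_in_file(path, word, marker):
    """Rewrite `path`, surrounding every occurrence of `word` with `marker`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the final rename
    # stays on one filesystem.
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as tmp:
        with open(path, "r") as src:
            for line in src:  # one line in memory at a time
                tmp.write(line.replace(word, marker + word + marker))
    # Swap the finished copy over the original.
    os.replace(tmp.name, path)
```

Note that str.replace also matches the word inside longer words; a regular expression with word boundaries would be needed if that matters.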
How could I print the final line of a text file read in with python?
fi=open(inputFile,"r")
for line in fi:
    #go to last line and print it
One option is to use file.readlines():
f1 = open(inputFile, "r")
last_line = f1.readlines()[-1]
f1.close()
If you don't need the file afterwards, though, it is recommended to open it in a with block, so that the file is closed automatically:
with open(inputFile, "r") as f1:
    last_line = f1.readlines()[-1]
If you need to be efficient and avoid reading all the lines into memory at once, you can iterate over the file object instead.
with open(inputFile, "r") as f:
    for line in f:
        pass
print(line)  # this is the last line of the file
Three ways to read the last line of a file:
For a small file, read the entire file into memory
with open("file.txt") as file:
    lines = file.readlines()
print(lines[-1])
For a big file, read line by line and print the last line
with open("file.txt") as file:
    for line in file:
        pass
print(line)
For an efficient approach, go directly to the last line
import os
with open("file.txt", "rb") as file:
    # Go to the end of the file before the last break-line
    file.seek(-2, os.SEEK_END)
    # Keep reading backward until you find the next break-line
    while file.read(1) != b'\n':
        file.seek(-2, os.SEEK_CUR)
    print(file.readline().decode())
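The backward-seeking approach raises an error when the file has only one line (the seek runs past the start) or is empty. A possible, more defensive sketch is below; the function name last_line and the encoding parameter are assumptions made for illustration, and it reads one byte at a time for clarity rather than speed.

```python
import os


def last_line(path, encoding="utf-8"):
    """Return the last line of `path` without its trailing newline."""
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        end = fh.tell()
        if end == 0:
            return ""  # empty file
        # Ignore a single trailing newline, if there is one.
        fh.seek(end - 1)
        if fh.read(1) == b"\n":
            end -= 1
        # Walk backwards until the previous newline or the start of the file.
        start = end
        while start > 0:
            fh.seek(start - 1)
            if fh.read(1) == b"\n":
                break
            start -= 1
        fh.seek(start)
        return fh.read(end - start).decode(encoding)
```

On files with Windows line endings the result would still carry a trailing "\r", which can be removed with rstrip("\r").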
If you can afford to read the entire file into memory (that is, if the file size is considerably less than the total memory), you can use the readlines() method as mentioned in one of the other answers, but if the file is large, a better way to do it is:
fi = open(inputFile, 'r')
lastline = ""
for line in fi:
    lastline = line
fi.close()
print(lastline)
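The same streaming idea can be written more compactly with collections.deque, which keeps only the most recent item when given maxlen=1. This is a small sketch; the function name last_line_streaming is made up for illustration.

```python
from collections import deque


def last_line_streaming(path):
    """Return the last line of `path`, holding one line in memory at a time."""
    with open(path, "r") as fh:
        # The deque discards older lines as new ones arrive.
        tail = deque(fh, maxlen=1)
    return tail[0] if tail else ""
```

Raising maxlen to n would give the last n lines in the same single pass.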
You could use csv.reader() to read your file as a list and print the last line.
Cons: This method loads the whole file into a new list (not an ideal memory-saver for very large files).
Pros: List lookups take O(1) time, and you can easily manipulate a list if you happen to want to modify your inputFile, as well as read the final line.
import csv
with open(inputFile) as f:
    lis = list(csv.reader(f))
print(lis[-1])  # prints final line as a list of strings
If you care about memory this should help you.
import os
last_line = b''
# binary mode: Python 3 text files do not allow seeking relative to the end
with open(inputfile, "rb") as f:
    f.seek(-2, os.SEEK_END)  # -2 because last character is likely \n
    cur_char = f.read(1)
    while cur_char != b'\n':
        last_line = cur_char + last_line
        f.seek(-2, os.SEEK_CUR)
        cur_char = f.read(1)
print(last_line.decode())
This might help you.
class FileRead(object):
    def __init__(self, file_to_read=None,file_open_mode=None,stream_size=100):
        super(FileRead, self).__init__()
        self.file_to_read = file_to_read
        self.file_to_write='test.txt'
        self.file_mode=file_open_mode
        self.stream_size=stream_size
    def file_read(self):
        try:
            with open(self.file_to_read,self.file_mode) as file_context:
                contents=file_context.read(self.stream_size)
                while len(contents)>0:
                    yield contents
                    contents=file_context.read(self.stream_size)
        except Exception as e:
            if type(e).__name__=='IOError':
                output="You have a file input/output error {}".format(e.args[1])
                raise Exception (output)
            else:
                output="You have a file error {} {} ".format(file_context.name,e.args)
                raise Exception (output)
b=FileRead("read.txt",'r')
contents=b.file_read()
lastline = ""
for content in contents:
    # print '-------'
    lastline = content
# note: this is the last chunk of up to stream_size characters, not necessarily a whole line
print(lastline)
I use the pandas module for its convenience (often to extract the last value).
Here is the example for the last row:
import pandas as pd
df = pd.read_csv('inputFile.csv')
last_value = df.iloc[-1]
The return is a pandas Series of the last row.
The advantage of this is that you also get the entire contents as a pandas DataFrame.
I'm trying to delete a specific line (10884121) in a text file that is about 30 million lines long. This is the method I first attempted; however, when I execute it, it runs for about 20 seconds and then gives me a "memory error". Is there a better way to do this? Thanks!
import fileinput
import sys
f_in = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned2.txt'
f_out = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned3.txt'
with open(f_in, 'r') as fin:
    with open(f_out, 'w') as fout:
        linenums = [10884121]
        s = [y for x, y in enumerate(fin) if x not in [line - 1 for line in linenums]]
        fin.seek(0)
        fin.write(''.join(s))
        fin.truncate(fin.tell())
First of all, you were not using the imports, you were trying to write to the input file, and your code read the whole file into memory.
Something like this might do the trick with less hassle: we read line by line, use enumerate to count the line numbers, and write each line to the output if its number is not in the list of ignored lines:
f_in = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned2.txt'
f_out = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned3.txt'
ignored_lines = [10884121]
with open(f_in, 'r') as fin, open(f_out, 'w') as fout:
    for lineno, line in enumerate(fin, 1):
        if lineno not in ignored_lines:
            fout.write(line)
Please try to use:
import fileinput
f_in = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned2.txt'
f_out = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned3.txt'
f = open(f_out,'w')
counter=0
for line in fileinput.input([f_in]):
    counter=counter+1
    if counter != 10884121:
        f.write(line)  # line already ends with its newline, so nothing needs to be added
f.close()  # you can omit in most cases as the destructor will call it
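If a separate output file is not needed, fileinput can also rewrite the file in place: with inplace=True, anything printed inside the loop is written back to the file being read. The following is a minimal sketch under that assumption; the function name delete_line_in_place and its parameters are hypothetical.

```python
import fileinput


def delete_line_in_place(path, target_lineno):
    """Remove line number `target_lineno` (1-based) from `path`, in place."""
    # inplace=True redirects standard output into the file while iterating.
    with fileinput.input(files=[path], inplace=True) as stream:
        for line in stream:
            if stream.lineno() != target_lineno:
                print(line, end="")  # line keeps its own newline
```

Only one line is held in memory at a time, although the file is still copied in full behind the scenes.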
There is a high chance that you run out of memory because you are trying to store the whole file in a list.
Try this below:
f_in = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned2.txt'
f_out = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned3.txt'
_fileOne = open(f_in,'r')
_fileTwo = open(f_out,'w')
linenums = set([10884121])
# start counting at 1 so the numbers match the line numbers in the file
for lineNumber, line in enumerate(_fileOne, 1):
    if lineNumber not in linenums:
        _fileTwo.write(line)
_fileOne.close()
_fileTwo.close()
Here we read the file line by line and skip the lines that are not needed, so it should not run out of memory.
You can also try reading the file using buffering.
Hope this helps.
How about a generic file filter function?
def file_filter(file_path, condition=None):
    """Yield lines from a file if condition(n, line) is true.
    The condition parameter is a callback that receives two
    parameters: the line number (first line is 1) and the
    line content."""
    if condition is None:
        condition = lambda n, line: True
    with open(file_path) as source:
        for n, line in enumerate(source):
            if condition(n + 1, line):
                yield line
with open(f_out, 'w') as destination:
    condition = lambda n, line: n != 10884121
    for line in file_filter(f_in, condition):
        destination.write(line)
How is this wrong? It seems like I am doing this right, but I get an error every time. I have tried changing the readline part to read, but that didn't work.
Here is my code:
f = open("pg1062.txt","r").read()
print f.readline(1)
print f.readline(2)
print f.readline(3)
Here is the error I get:
print f.readline(1)
AttributeError: 'str' object has no attribute 'readline'
This uses a loop to print your lines.
f = open("pg1062.txt", 'r')
while True:
    line = f.readline()
    if line == "":
        break
    print(line)
If you want to only print a specific number of lines, then do something like this:
f = open("pg1062.txt", 'r')
count = 1
while count < 4:
    line = f.readline()
    if line == "":
        break
    print(line)
    count += 1
Your problem is at this line:
f = open("pg1062.txt","r").read()
Calling .read() returns the file's contents as a string, and a string has no readline method. Just remove .read() and your problem will be fixed. Your final code should look like this:
f = open("pg1062.txt","r")
print(f.readline())
print(f.readline())
print(f.readline())
And if you want to print all lines from the text file, see the code below:
f = open("pg1062.txt","r")
for line in f:
    print(line)
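One more detail about the question's code: the argument to readline is not a line number. It is a size limit, so readline(1) would return at most one character of the current line. A small sketch with an in-memory file shows the behaviour; the variable names and the sample text are made up for illustration.

```python
import io

# io.StringIO behaves like a text file opened for reading.
fake_file = io.StringIO("first line\nsecond line\n")

partial = fake_file.readline(3)   # at most 3 characters of the current line
rest = fake_file.readline()       # the remainder of that same line
following = fake_file.readline()  # the next full line

print(repr(partial), repr(rest), repr(following))
```

So calling readline() three times without arguments is what reads the first three lines.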
This is certainly a duplicate. At any rate, anything above Python 2.4 should use a with block.
with open("pg1062.txt", "r") as fin:
    for line in fin:
        print(line)
If you happen to want them in a list, use one of the following (not both, since the first one consumes the file):
with open("pg1062.txt", "r") as fin:
    lines = [line for line in fin]  # keeps newlines at the end
    # lines = [line.rstrip("\n") for line in fin]  # alternative: deletes the newlines
or more or less equivalently
with open("pg1062.txt", "r") as fin:
    lines = fin.readlines()  # keeps newlines at the end
    # lines = fin.read().splitlines()  # alternative: deletes the newlines
I am dealing with a large text file on my Windows machine, and I want a script that can print out lines from said text file starting with a given line index.
with open('big_file.txt', 'r') as f:
    for line in f[1000]:
        print line
Something like the above except that actually works.
Use itertools.islice:
import itertools
with open('big_file.txt', 'r') as f:
    for line in itertools.islice(f, 1000, None):
        print(line)
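The islice call can be wrapped in a small helper that also accepts an optional number of lines to return. This is only a sketch; lines_from, start and count are hypothetical names, and start is a zero-based index, as in the snippet above.

```python
from itertools import islice


def lines_from(path, start, count=None):
    """Yield lines of `path` beginning at zero-based index `start`.

    If `count` is given, stop after that many lines.
    """
    stop = None if count is None else start + count
    with open(path, "r") as fh:
        # islice skips the first `start` lines without storing them.
        yield from islice(fh, start, stop)
```

For example, lines_from('big_file.txt', 1000, 20) would yield 20 lines starting at index 1000. The skipped lines are still read from disk, but they are never kept in memory.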
You can use next(f) to skip one line at a time, then continue with a for loop.
with open('big_file.txt', 'r') as f:
    for x in range(1000):
        next(f)
    for line in f:
        print(line)
I would use readlines and then start iterating from the given index:
with open('big_file.txt', 'r') as f:
    for line in f.readlines()[1000:]:
        print(line)
Hope this helps.