Print Last Line of File Read In with Python

How could I print the final line of a text file read in with Python?
fi = open(inputFile, "r")
for line in fi:
    # go to last line and print it

One option is to use file.readlines():
f1 = open(inputFile, "r")
last_line = f1.readlines()[-1]
f1.close()
If you don't need the file afterwards, though, it is better to use a with block, so that the file is closed automatically when the block ends:
with open(inputFile, "r") as f1:
    last_line = f1.readlines()[-1]

Do you need to be efficient by not reading all the lines into memory at once? Instead, you can iterate over the file object:
with open(inputfile, "r") as f:
    for line in f:
        pass
print(line)  # this is the last line of the file
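The same iterate-and-discard idea can be written with collections.deque: a deque with maxlen=1 keeps only the most recent line it has seen (a sketch; it assumes the file has at least one line):

```python
import collections

def last_line(path):
    # a deque with maxlen=1 discards everything but the most recent item,
    # so after iterating it holds only the file's final line
    with open(path) as f:
        return collections.deque(f, maxlen=1)[0]
```

This still reads every line, but the deque holds at most one of them in memory at a time.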

Three ways to read the last line of a file:
For a small file, read the entire file into memory:
with open("file.txt") as file:
    lines = file.readlines()
print(lines[-1])
For a big file, read line by line and print the last line:
with open("file.txt") as file:
    for line in file:
        pass
print(line)
For an efficient approach, go directly to the last line:
import os
with open("file.txt", "rb") as file:
    # Go to the end of the file, just before the last line break
    file.seek(-2, os.SEEK_END)
    # Keep reading backward until you find the next line break
    while file.read(1) != b'\n':
        file.seek(-2, os.SEEK_CUR)
    print(file.readline().decode())
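One caveat with the seek-based version: on a file that contains no newline before its last line (a one-line file), seeking back past position 0 raises OSError. A hedged variant that falls back to reading from the start (the function name is illustrative):

```python
import os

def tail_line(path):
    """Return the last line of a file, seeking backwards when possible."""
    with open(path, "rb") as f:
        try:
            f.seek(-2, os.SEEK_END)    # skip the trailing newline, if any
            while f.read(1) != b"\n":  # scan backwards for the previous newline
                f.seek(-2, os.SEEK_CUR)
        except OSError:
            f.seek(0)                  # one-line (or empty) file: read from the start
        return f.readline().decode()
```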

If you can afford to read the entire file into memory (if the file size is considerably less than the total memory), you can use the readlines() method as mentioned in one of the other answers, but if the file is large, the best way to do it is:
fi = open(inputFile, 'r')
lastline = ""
for line in fi:
    lastline = line
print(lastline)

You could use csv.reader() to read your file as a list and print the last line.
Cons: this method allocates a new list (not an ideal memory-saver for very large files).
Pros: list lookups take O(1) time, and you can easily manipulate the list if you happen to want to modify your inputFile, as well as read the final line.
import csv
lis = list(csv.reader(open(inputFile)))
print(lis[-1])  # prints final line as a list of strings

If you care about memory, this should help you.
import os

last_line = b''
with open(inputfile, "rb") as f:  # binary mode: relative seeks are only allowed on binary files
    f.seek(-2, os.SEEK_END)  # -2 because the last character is likely \n
    cur_char = f.read(1)
    while cur_char != b'\n':
        last_line = cur_char + last_line
        f.seek(-2, os.SEEK_CUR)
        cur_char = f.read(1)
print(last_line.decode())

This might help you.
class FileRead(object):
    def __init__(self, file_to_read=None, file_open_mode=None, stream_size=100):
        super(FileRead, self).__init__()
        self.file_to_read = file_to_read
        self.file_to_write = 'test.txt'
        self.file_mode = file_open_mode
        self.stream_size = stream_size

    def file_read(self):
        try:
            with open(self.file_to_read, self.file_mode) as file_context:
                contents = file_context.read(self.stream_size)
                while len(contents) > 0:
                    yield contents
                    contents = file_context.read(self.stream_size)
        except Exception as e:
            if type(e).__name__ == 'IOError':
                output = "You have a file input/output error {}".format(e.args[1])
                raise Exception(output)
            else:
                output = "You have a file error {} {}".format(file_context.name, e.args)
                raise Exception(output)

b = FileRead("read.txt", 'r')
contents = b.file_read()
lastline = ""
for content in contents:
    lastline = content
print(lastline)

I use the pandas module for its convenience (often to extract the last value).
Here is the example for the last row:
import pandas as pd
df = pd.read_csv('inputFile.csv')
last_value = df.iloc[-1]
The return is a pandas Series of the last row.
The advantage of this is that you also get the entire contents as a pandas DataFrame.
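If you only need the last row, DataFrame.tail(1) is an equivalent spelling that returns a one-row DataFrame instead of a Series; a minimal sketch with inline stand-in data (assumes pandas is installed; the column names are illustrative):

```python
import io

import pandas as pd

# inline stand-in for inputFile.csv
csv_data = io.StringIO("name,score\nalice,1\nbob,2\n")
df = pd.read_csv(csv_data)

last_value = df.iloc[-1]   # last row as a Series
last_row = df.tail(1)      # last row as a one-row DataFrame
print(last_value["name"])  # bob
```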

Reading a file without newlines

In Python, calling e.g. temp = open(filename,'r').readlines() results in a list in which each element is a line from the file. However, these strings have a newline character at the end, which I don't want.
How can I get the data without the newlines?
You can read the whole file and split lines using str.splitlines:
temp = file.read().splitlines()
Or you can strip the newline by hand:
temp = [line[:-1] for line in file]
Note: this last solution only works if the file ends with a newline, otherwise the last line will lose a character.
This assumption is true in most cases (especially for files created by text editors, which often do add an ending newline anyway).
If you want to avoid this you can add a newline at the end of file:
with open(the_file, 'rb+') as f:  # binary mode: relative seeks need it in Python 3
    f.seek(-1, 2)  # go to the last byte of the file
    if f.read(1) != b'\n':
        # add missing newline if not already present
        f.write(b'\n')
        f.flush()
    f.seek(0)
    lines = [line.decode()[:-1] for line in f]
Or a simpler alternative is to strip the newline instead:
[line.rstrip('\n') for line in file]
Or even, although pretty unreadable:
[line[:-(line[-1] == '\n') or len(line)+1] for line in file]
Which exploits the fact that the return value of or isn't a boolean, but the object that was evaluated true or false.
The readlines method is actually equivalent to:
def readlines(self):
    lines = []
    for line in iter(self.readline, ''):
        lines.append(line)
    return lines

# or equivalently

def readlines(self):
    lines = []
    while True:
        line = self.readline()
        if not line:
            break
        lines.append(line)
    return lines
Since readline() keeps the newline, readlines() keeps it as well.
Note: for symmetry to readlines() the writelines() method does not add ending newlines, so f2.writelines(f.readlines()) produces an exact copy of f in f2.
temp = open(filename,'r').read().split('\n')
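Note the difference from splitlines() here: if the file ends with a newline, split('\n') produces a trailing empty string. For example:

```python
text = "a\nb\nc\n"
# split('\n') keeps an empty element after the final newline
print(text.split('\n'))    # ['a', 'b', 'c', '']
# splitlines() does not
print(text.splitlines())   # ['a', 'b', 'c']
```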
Read the file one row at a time, removing unwanted characters from the end of the string with str.rstrip(chars).
with open(filename, 'r') as fileobj:
    for row in fileobj:
        print(row.rstrip('\n'))
See also str.strip([chars]) and str.lstrip([chars]).
I think this is the best option.
temp = [line.strip() for line in file.readlines()]
temp = open(filename,'r').read().splitlines()
My preferred one-liner -- if you don't count from pathlib import Path :)
lines = Path(filename).read_text().splitlines()
It also auto-closes the file, so there is no need for with open()...
Added in Python 3.5.
https://docs.python.org/3/library/pathlib.html#pathlib.Path.read_text
Try this:
u=open("url.txt","r")
url=u.read().replace('\n','')
print(url)
To get rid of trailing end-of-line (\n) characters and of empty list values (''), try:
f = open(path_sample, "r")
lines = [line.rstrip('\n') for line in f.readlines() if line.strip() != '']
You can read the file as a list easily using a list comprehension
with open("foo.txt", 'r') as f:
    lst = [row.rstrip('\n') for row in f]
my_file = open("first_file.txt", "r")
for line in my_file.readlines():
    if line[-1:] == "\n":
        print(line[:-1])
    else:
        print(line)
my_file.close()
This script takes the lines from file and saves every line, without its newline and with ",0" appended, in file2.
file = open("temp.txt", "r")
file2 = open("res.txt", "w")
for line in file:
    file2.writelines(f"{line.splitlines()[0]},0\n")
file.close()
file2.close()
If you looked at line, its value would be 'data\n', so we use splitlines() to turn it into a list and [0] to pick out the word 'data'.
import csv

with open(filename) as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        print(line[0])

Deleting a line stored in a variable in Python

The global variable originalInfo contains
Joe;Bloggs;j.bloggs#anemail.com;0715491874;1
I have written a function to delete that line in a text file containing more information of this type. It works, but it is really clunky and inelegant.
f = open("input.txt", 'r')  # Input file
t = open("output.txt", 'w')  # Temp output file
for line in f:
    if line != originalInfo:
        t.write(line)
f.close()
t.close()
os.remove("input.txt")
os.rename('output.txt', 'input.txt')
Is there a more efficient way of doing this? Thanks
Your solution nearly works, but you need to take care of the trailing newline. This is a bit shorter version that does what you intend:
import shutil

with open("input.txt", 'r') as fin, open("output.txt", 'w') as fout:
    for line in fin:
        if line.strip() != originalInfo:
            fout.write(line)
shutil.move('output.txt', 'input.txt')
The strip() is a bit of extra effort, but it also strips away extra whitespace.
Alternatively, you could do:
originalInfo += '\n'
and later in the loop:
if line != originalInfo:
You can open the file, read it with readlines(), close it, and open it again for writing. This way you don't have to create an output file:
with open('input.txt') as file:
    lines = file.readlines()
with open('input.txt', 'w') as file:
    for line in lines:
        if line != originalInfo:
            file.write(line)
But if you want to have an output:
with open('input.txt') as input:
    with open('output.txt', 'w') as output:
        for line in input:
            if line != originalInfo:
                output.write(line)
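The rewrite-and-swap pattern can also be made atomic by writing to a temporary file in the same directory and swapping it in with os.replace, so the original is never left half-written. A sketch (the helper name and the rstrip-based comparison are illustrative):

```python
import os
import tempfile

def delete_matching_line(path, unwanted):
    # write survivors to a temp file in the same directory, then atomically
    # replace the original so readers never see a partially written file
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, text=True)
    with os.fdopen(fd, "w") as out, open(path) as src:
        for line in src:
            if line.rstrip("\n") != unwanted:
                out.write(line)
    os.replace(tmp_path, path)  # atomic on POSIX when on the same filesystem
```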

Python Delete a specific Line number

I'm trying to delete a specific line (10884121) in a text file that is about 30 million lines long. This is the method I first attempted; however, when I execute it, it runs for about 20 seconds and then gives me a "memory error". Is there a better way to do this? Thanks!
import fileinput
import sys
f_in = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned2.txt'
f_out = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned3.txt'
with open(f_in, 'r') as fin:
    with open(f_out, 'w') as fout:
        linenums = [10884121]
        s = [y for x, y in enumerate(fin) if x not in [line - 1 for line in linenums]]
        fin.seek(0)
        fin.write(''.join(s))
        fin.truncate(fin.tell())
First of all, you were not using the imports; you were trying to write to the input file; and your code read the whole file into memory.
Something like this might do the trick with less hassle: we read line by line,
use enumerate to count the line numbers, and for each line we write it to the output if its number is not in the list of ignored lines:
f_in = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned2.txt'
f_out = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned3.txt'
ignored_lines = [10884121]
with open(f_in, 'r') as fin, open(f_out, 'w') as fout:
    for lineno, line in enumerate(fin, 1):
        if lineno not in ignored_lines:
            fout.write(line)
Please try to use:
import fileinput

f_in = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned2.txt'
f_out = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned3.txt'
f = open(f_out, 'w')
counter = 0
for line in fileinput.input([f_in]):
    counter = counter + 1
    if counter != 10884121:
        f.write(line)  # Python converts \n to os.linesep on write
f.close()  # you can omit this in most cases, as the destructor will call it
There is a high chance that you run out of memory since you are trying to store the file in a list. Try this instead:
f_in = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned2.txt'
f_out = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned3.txt'
_fileOne = open(f_in, 'r')
_fileTwo = open(f_out, 'w')
linenums = set([10884121])
for lineNumber, line in enumerate(_fileOne, 1):
    if lineNumber not in linenums:
        _fileTwo.write(line)
_fileOne.close()
_fileTwo.close()
Here we read the file line by line and exclude the lines that are not needed, so this should not run out of memory. You can also try reading the file with buffering. Hope this helps.
How about a generic file filter function?
def file_filter(file_path, condition=None):
    """Yield lines from a file if condition(n, line) is true.

    The condition parameter is a callback that receives two
    parameters: the line number (first line is 1) and the
    line content."""
    if condition is None:
        condition = lambda n, line: True
    with open(file_path) as source:
        for n, line in enumerate(source):
            if condition(n + 1, line):
                yield line

with open(f_out, 'w') as destination:
    condition = lambda n, line: n != 10884121
    for line in file_filter(f_in, condition):
        destination.write(line)

parsing/extracting data from a text file. Unable to make it work

I have a file which I'm trying to extract information from, the file has the information in it and is in a neat line by line format, the information is separated by commas.
I want to put it in a list, or do whatever I can, to extract information from a specific index. The file is huge, with over 1000000000 lines, and I have to extract the same index from every line in order to get the same piece of information. These are HASHES I want from the files, so I was wondering how I'd find all the occurrences of hashes based on length.
import os
os.chdir('C:\HashFiles')
f = open('Part1.txt', 'r')
file_contents = f.readlines()
def linesA():
    for line in file_contents:
        lista = line.split(',')
print linesA()
This is all I have so far, and it just puts everything in a list which I can index from; but I want to output the data from those indexes to another file, and I am unable to because of the for statement. How can I get around this?
Wow, you guys are great. Now I have a problem: the file where this info is stored starts with information about the sponsor who provided it. How do I bypass those lines and start from another line, since the lines I need start about 100 lines down the file? At the moment I get an index error and am unable to figure out how to set a condition to counter it. I tried this condition but it didn't work: if line[:] != 15: continue
Most recent code to work with:
import csv
with open('c:/HashFiles/search_engine_primary.sql') as inf, open('c:/HashFiles/hashes.txt','w') as outf:
    for i in xrange(47):
        inf.next()  # skip a line
    for line in inf:
        data = line.split(',')
        if str(line[0]) == 'GO':
            continue
        hash = data[15]
        outf.write(hash + '\n')
You could try to process the file line by line
with open('Part1.txt') as inf:
    for line in inf:
        # do your processing
        # ... line.split(',') etc...
rather than using readlines(), which reads all of the data into memory at once.
Also, depending on what you are doing, a list comprehension could be helpful in creating your desired output list from the file you are reading.
NOTE: The benefit of using with to open the file is that it will automatically close it for you when you are done, or when an exception is encountered.
UPDATE:
To skip the first N lines of your input file you can change your code to this:
N = 100
with open('Part1.txt') as inf:
    for i, line in enumerate(inf, 1):
        if i <= N:        # still within the first N lines
            continue      # skip the processing
        print(line)       # process the line
I am using enumerate() to automatically generate line numbers, starting the counter at 1 (the default is 0 if not specified).
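The skip can also be expressed with itertools.islice, which consumes the first N lines and yields the rest; a minimal sketch (the function name is illustrative):

```python
from itertools import islice

def lines_after(path, n):
    # islice(inf, n, None) skips the first n lines and yields the remainder
    with open(path) as inf:
        yield from islice(inf, n, None)
```

For example, list(lines_after('Part1.txt', 100)) would give every line from line 101 onwards.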
You can process the file line by line, like so:
with open('c:/HashFiles/Part1.txt') as inf, open('c:/HashFiles/hashes.txt', 'w') as outf:
    for line in inf:
        data = line.split(',')
        hash = data[4]
        outf.write(hash + '\n')
If you want to separate the hashes by length, maybe something like:
class HashStorage(object):
    def __init__(self, fname_fmt):
        self.fname_fmt = fname_fmt
        self.hashfile = {}

    def thefile(self, hash):
        hashlen = len(hash)
        try:
            return self.hashfile[hashlen]
        except KeyError:
            newfile = open(self.fname_fmt.format(hashlen), 'w')
            self.hashfile[hashlen] = newfile
            return newfile

    def write(self, hash):
        self.thefile(hash).write(hash + '\n')

    def __del__(self):
        for f in self.hashfile.values():
            f.close()
        del self.hashfile

store = HashStorage('c:/HashFiles/hashes{}.txt')
with open('c:/HashFiles/Part1.txt') as inf:
    for line in inf:
        data = line.split(',')
        hash = data[4]
        store.write(hash)
Edit: is there any way to identify sponsor lines - for example, do they start with "#"? If so, you could filter like
with open('c:/HashFiles/Part1.txt') as inf, open('c:/HashFiles/hashes.txt', 'w') as outf:
    for line in inf:
        if not line.startswith('#'):
            data = line.split(',')
            hash = data[4]
            outf.write(hash + '\n')
otherwise, if you have to skip N lines - this is nasty, because what if the number changes? - you can instead
with open('c:/HashFiles/Part1.txt') as inf, open('c:/HashFiles/hashes.txt', 'w') as outf:
    for i in range(N):
        next(inf)  # skip a line
    for line in inf:
        data = line.split(',')
        hash = data[4]
        outf.write(hash + '\n')
Edit 2:
with open('c:/HashFiles/search_engine_primary.sql') as inf, open('c:/HashFiles/hashes.txt', 'w') as outf:
    for i in range(47):
        next(inf)  # skip a line
    for line in inf:
        data = line.split(',')
        if len(data) > 15:  # skip any line without enough data items
            hash = data[15]
            outf.write(hash + '\n')
Does this still give you errors?
import csv
import os

with open(os.path.join(r'C:\HashFiles', 'Part1.txt'), newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)

Better way to remove a line in a file with python?

I want to remove a line in a file; currently I'm creating a new file, copying every line except the one I want to remove, deleting the old file and renaming the new one as the same filename as the old one. Is there a better way to remove a line?
f = open('./todo.txt', 'r')
newF = open('./todo-run.txt', 'a')
lines = f.readlines()
cLine = lines[int(index) - 1]
for line in lines:
    if line != cLine:
        newF.write(line)
f.close()
newF.close()
os.remove('./todo.txt')
shutil.move('./todo-run.txt', './todo.txt')
A solution in sed, which you might call using subprocess. For example, to delete line 18, do:
sed -i '18 d' filename
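Invoked from Python via subprocess, that might look like the sketch below (assumes GNU sed, whose -i flag takes no argument; BSD sed differs, and the function name is illustrative):

```python
import subprocess

def delete_line_with_sed(filename, lineno):
    # GNU sed: -i edits in place, '<N> d' deletes line N
    subprocess.run(["sed", "-i", f"{lineno} d", filename], check=True)
```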
Better in what way? You could, for instance, shuffle the data within the file and then truncate it, using less memory but more seeking (particularly if you adapt it to not read the latter part in one chunk):
def cutfile(file, startcut, endcut):
    # file must be opened for both reading and writing, e.g. 'rb+'
    file.seek(endcut)
    dataafter = file.read()
    file.seek(startcut)
    file.write(dataafter)
    file.truncate()
Or you could not remove the old file before renaming, to get atomic updates. It really depends on your goals.
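As a quick illustration of cutfile (repeated here so the snippet is self-contained), removing two bytes from the middle of a six-byte file; the file must be opened in a read/write mode such as 'rb+', and the filename is illustrative:

```python
def cutfile(file, startcut, endcut):
    # as above: move the tail over the removed range, then truncate
    file.seek(endcut)
    dataafter = file.read()
    file.seek(startcut)
    file.write(dataafter)
    file.truncate()

with open("sample.bin", "wb") as f:
    f.write(b"abcdef")
with open("sample.bin", "rb+") as f:
    cutfile(f, 2, 4)   # drop b"cd"
with open("sample.bin", "rb") as f:
    print(f.read())    # b'abef'
```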
It's not much better than yours, but (since your file seems to fit in main memory) you might try this:
f = open(filepath, 'r')
lines = [line.rstrip('\n') for line in f if not <CONDITION>]
f.close()
f = open(filepath, 'w')
f.write('\n'.join(lines))
f.close()
You could move the lines after the unwanted line up by overwriting one at a time. Not much better than what you're currently doing though. This code acts a little funny if the file doesn't end with a newline. I tested it on Win7 64-bit, Python 2.7.
move_lines.py:
f = open('todo.txt', 'r+')
line_index = 0
prev_line_head = 0
remove_line_index = 3
move_lines = False
while True:
    line_head = f.tell()
    line = f.readline()
    if line == '':  # EOF
        f.seek(prev_line_head)
        f.truncate()
        break
    if move_lines:
        f.seek(prev_line_head)
        f.write(line)
        f.flush()
        line_head = f.tell()
        line = f.readline()  # read past the line we already read to start this iteration
    elif line_index == remove_line_index:
        move_lines = True
    prev_line_head = line_head
    line_index += 1
f.close()
