How to extract last line of text in Python (excluding new lines)? - python

Textfile:
1
2
3
4
5
6
\n
\n
I know lines[-1] gets you the last line, but I want to disregard any new lines and get the last line of text (6 in this case).

The best approach regarding memory is to exhaust the file. Something like this:
with open('file.txt') as f:
last = None
for line in (line for line in f if line.rstrip('\n')):
last = line
print last
It can be done more elegantly though. A slightly different approach:
with open('file.txt') as f:
last = None
for last in (line for line in f if line.rstrip('\n')):
pass
print last

For a small file you can just read all of the lines, discarding any empty ones. Notice that I've used an inner generator to strip the lines before excluding them in the outer one.
with open(textfile) as fp:
last_line = [l2 for l2 in (l1.strip() for l1 in fp) if l2][-1]

with open('file') as f:
print([i for i in f.read().split('\n') if i != ''][-1])
This is just an edit to Avinash Raj's answer (but since I'm a new account, I can't comment on it). This will preserve any None values in your data (i.e. if the data in your last line is "None" it will work, though depending on your input this may not be an issue).

with open('path/to/file') as infile:
for line in infile:
if not line.strip('\n'):
continue
answer = line
print(answer)
This will print 6 with a newline at the end. You can decide how to strip that. Following are some options:
answer.rstrip('\n') removes trailing newlines
answer.rstrip() removes trailing whitespaces
answer.strip() removes any surrounding whitespaces

with open ('file.txt') as myfile:
for num,line in enumerate(myfile):
pass
print num

Related

extract the dimensions from the head lines of text file

Please see following attached image showing the format of the text file. I need to extract the dimensions of data matrix indicated by the first line in the file, here 49 * 70 * 1 for the case shown by the image. Note that the length of name "gd_fac" can be varying. How can I extract these numbers as integers? I am using Python 3.6.
Specification is not very clear. I am assuming that the information you want will always be in the first line, and always be in parenthesis. After that:
with open(filename) as infile:
line = infile.readline()
string = line[line.find('(')+1:line.find(')')]
lst = string.split('x')
This will create the list lst = [49, 70, 1].
What is happening here:
First I open the file (you will need to replace filename with the name of your file, as a string. The with ... as ... structure ensures that the file is closed after use. Then I read the first line. After that. I select only the parts of that line that fall after the open paren (, and before the close paren ). Finally, I break the string into parts, with the character x as the separator. This creates a list that contains the values in the first line of the file, which fall between parenthesis, and are separated by x.
Since you have mentioned that length of 'gd_fac' van be variable, best solution will be using Regular Expression.
import re
with open("a.txt") as fh:
for line in fh:
if '(' in line and ')' in line:
dimension = re.findall(r'.*\((.*)\)',line)[0]
break
print dimension
Output:
'49x70x1'
What this does is it looks for "gd_fac"
then if it's there is removes all the unneeded stuff and replaces it with just what you want.
with open('test.txt', 'r') as infile:
for line in infile:
if("gd_fac" in line):
line = line.replace("gd_fac", "")
line = line.replace("x", "*")
line = line.replace("(","")
line = line.replace(")","")
print (line)
break
OUTPUT: "49x70x1"

how can i convert surname:name to name:surname? [duplicate]

In Python, calling e.g. temp = open(filename,'r').readlines() results in a list in which each element is a line from the file. However, these strings have a newline character at the end, which I don't want.
How can I get the data without the newlines?
You can read the whole file and split lines using str.splitlines:
temp = file.read().splitlines()
Or you can strip the newline by hand:
temp = [line[:-1] for line in file]
Note: this last solution only works if the file ends with a newline, otherwise the last line will lose a character.
This assumption is true in most cases (especially for files created by text editors, which often do add an ending newline anyway).
If you want to avoid this you can add a newline at the end of file:
with open(the_file, 'r+') as f:
f.seek(-1, 2) # go at the end of the file
if f.read(1) != '\n':
# add missing newline if not already present
f.write('\n')
f.flush()
f.seek(0)
lines = [line[:-1] for line in f]
Or a simpler alternative is to strip the newline instead:
[line.rstrip('\n') for line in file]
Or even, although pretty unreadable:
[line[:-(line[-1] == '\n') or len(line)+1] for line in file]
Which exploits the fact that the return value of or isn't a boolean, but the object that was evaluated true or false.
The readlines method is actually equivalent to:
def readlines(self):
lines = []
for line in iter(self.readline, ''):
lines.append(line)
return lines
# or equivalently
def readlines(self):
lines = []
while True:
line = self.readline()
if not line:
break
lines.append(line)
return lines
Since readline() keeps the newline also readlines() keeps it.
Note: for symmetry to readlines() the writelines() method does not add ending newlines, so f2.writelines(f.readlines()) produces an exact copy of f in f2.
temp = open(filename,'r').read().split('\n')
Reading file one row at the time. Removing unwanted chars from end of the string with str.rstrip(chars).
with open(filename, 'r') as fileobj:
for row in fileobj:
print(row.rstrip('\n'))
See also str.strip([chars]) and str.lstrip([chars]).
I think this is the best option.
temp = [line.strip() for line in file.readlines()]
temp = open(filename,'r').read().splitlines()
My preferred one-liner -- if you don't count from pathlib import Path :)
lines = Path(filename).read_text().splitlines()
This it auto-closes the file, no need for with open()...
Added in Python 3.5.
https://docs.python.org/3/library/pathlib.html#pathlib.Path.read_text
Try this:
u=open("url.txt","r")
url=u.read().replace('\n','')
print(url)
To get rid of trailing end-of-line (/n) characters and of empty list values (''), try:
f = open(path_sample, "r")
lines = [line.rstrip('\n') for line in f.readlines() if line.strip() != '']
You can read the file as a list easily using a list comprehension
with open("foo.txt", 'r') as f:
lst = [row.rstrip('\n') for row in f]
my_file = open("first_file.txt", "r")
for line in my_file.readlines():
if line[-1:] == "\n":
print(line[:-1])
else:
print(line)
my_file.close()
This script here will take lines from file and save every line without newline with ,0 at the end in file2.
file = open("temp.txt", "+r")
file2 = open("res.txt", "+w")
for line in file:
file2.writelines(f"{line.splitlines()[0]},0\n")
file2.close()
if you looked at line, this value is data\n, so we put splitlines()
to make it as an array and [0] to choose the only word data
import csv
with open(filename) as f:
csvreader = csv.reader(f)
for line in csvreader:
print(line[0])

\n appending at the end of each line

I am writing lines one by one to an external files. Each line has 9 columns separated by Tab delimiter. If i split each line in that file and output last column, i can see \n being appended to the end of the 9 column. My code is:
#!/usr/bin/python
with open("temp", "r") as f:
for lines in f:
hashes = lines.split("\t")
print hashes[8]
The last column values are integers, either 1 or 2. When i run this program, the output i get is,
['1\n']
['2\n']
I should only get 1 or 2. Why is '\n' being appended here?
I tried the following check to remove the problem.
with open("temp", "r") as f:
for lines in f:
if lines != '\n':
hashes = lines.split("\t")
print hashes[8]
This too is not working. I tried if lines != ' '. How can i make this go away? Thanks in advance.
Try using strip on the lines to remove the \n (the new line character). strip removes the leading and trailing whitespace characters.
with open("temp", "r") as f:
for lines in f.readlines():
if lines.strip():
hashes = lines.split("\t")
print hashes[8]
\n is the newline character, it is how the computer knows to display the data on the next line. If you modify the last item in the array hashes[-1] to remove the last character, then that should be fine.
Depending on the platform, your line ending may be more than just one character. Dos/Windows uses "\r\n" for example.
def clean(file_handle):
for line in file_handle:
yield line.rstrip()
with open('temp', 'r') as f:
for line in clean(f):
hashes = line.split('\t')
print hashes[-1]
I prefer rstrip() for times when I want to preserve leading whitespace. That and using generator functions to clean up my input.
Because each line has 9 columns, the 8th index (which is the 9th object) has a line break, since the next line starts. Just take that away:
print hashes[8][:-1]

Splitting lines in python based on some character

Input:
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/1
2/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:14,000.
0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W
55.576,+0013!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013!,A,56
281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34
:18,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:19,000.0,0,37N22.
Output:
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:19,000.0,0,37N22.
'!' is the starting character and +0013 should be the ending of each line (if present).
Problem which I am getting:
Output is like :
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/1
2/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:14,000.
0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W
Any help would be highly appreciated...!!!
My code:
file_open= open('sample.txt','r')
file_read= file_open.read()
file_open2= open('output.txt','w+')
counter =0
for i in file_read:
if '!' in i:
if counter == 1:
file_open2.write('\n')
counter= counter -1
counter= counter +1
file_open2.write(i)
You can try something like this:
with open("abc.txt") as f:
data=f.read().replace("\r\n","") #replace the newlines with ""
#the newline can be "\n" in your system instead of "\r\n"
ans=filter(None,data.split("!")) #split the data at '!', then filter out empty lines
for x in ans:
print "!"+x #or write to some other file
.....:
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:19,000.0,0,37N22.
Could you just use str.split?
lines = file_read.split('!')
Now lines is a list which holds the split data. This is almost the lines you want to write -- The only difference is that they don't have trailing newlines and they don't have '!' at the start. We can put those in easily with string formatting -- e.g. '!{0}\n'.format(line). Then we can put that whole thing in a generator expression which we'll pass to file.writelines to put the data in a new file:
file_open2.writelines('!{0}\n'.format(line) for line in lines)
You might need:
file_open2.writelines('!{0}\n'.format(line.replace('\n','')) for line in lines)
if you find that you're getting more newlines than you wanted in the output.
A few other points, when opening files, it's nice to use a context manager -- This makes sure that the file is closed properly:
with open('inputfile') as fin:
lines = fin.read()
with open('outputfile','w') as fout:
fout.writelines('!{0}\n'.format(line.replace('\n','')) for line in lines)
Another option, using replace instead of split, since you know the starting and ending characters of each line:
In [14]: data = """!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/1
2/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:14,000.
0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W
55.576,+0013!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013!,A,56
281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34
:18,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:19,000.0,0,37N22.""".replace('\n', '')
In [15]: print data.replace('+0013!', "+0013\n!")
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:19,000.0,0,37N22.
Just for some variance, here is a regular expression answer:
import re
outputFile = open('output.txt', 'w+')
with open('sample.txt', 'r') as f:
for line in re.findall("!.+?(?=!|$)", f.read(), re.DOTALL):
outputFile.write(line.replace("\n", "") + '\n')
outputFile.close()
It will open the output file, get the contents of the input file, and loop through all the matches using the regular expression !.+?(?=!|$) with the re.DOTALL flag. The regular expression explanation & what it matches can be found here: http://regex101.com/r/aK6aV4
After we have a match, we strip out the new lines from the match, and write it to the file.
Let's try to add a \n before every "!"; then let python splitlines :-) :
file_read.replace("!", "!\n").splitlines()
I will actually implement as a generator so that you can work on the data stream rather than the entire content of the file. This will be quite memory friendly if working with huge files
>>> def split_on_stream(it,sep="!"):
prev = ""
for line in it:
line = (prev + line.strip()).split(sep)
for parts in line[:-1]:
yield parts
prev = line[-1]
yield prev
>>> with open("test.txt") as fin:
for parts in split_on_stream(fin):
print parts
,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:19,000.0,0,37N22.

Spliting a file into lines in Python using re.split

I'm trying to split a file with a list comprehension using code similar to:
lines = [x for x in re.split(r"\n+", file.read()) if not re.match(r"com", x)]
However, the lines list always has an empty string as the last element. Does anyone know a way to avoid this (excluding the cludge of putting a pop() afterwards)?
Put the regular expression hammer away :-)
You can iterate over a file directly; readlines() is almost obsolete these days.
Read about str.strip() (and its friends, lstrip() and rstrip()).
Don't use file as a variable name. It's bad form, because file is a built-in function.
You can write your code as:
lines = []
f = open(filename)
for line in f:
if not line.startswith('com'):
lines.append(line.strip())
If you are still getting blank lines in there, you can add in a test:
lines = []
f = open(filename)
for line in f:
if line.strip() and not line.startswith('com'):
lines.append(line.strip())
If you really want it in one line:
lines = [line.strip() for line in open(filename) if line.strip() and not line.startswith('com')]
Finally, if you're on python 2.6, look at the with statement to improve things a little more.
lines = file.readlines()
edit:
or if you didnt want blank lines in there, you can do
lines = filter(lambda a:(a!='\n'), file.readlines())
edit^2:
to remove trailing newines, you can do
lines = [re.sub('\n','',line) for line in filter(lambda a:(a!='\n'), file.readlines())]
another handy trick, especially when you need the line number, is to use enumerate:
fp = open("myfile.txt", "r")
for n, line in enumerate(fp.readlines()):
dosomethingwith(n, line)
i only found out about enumerate quite recently but it has come in handy quite a few times since then.
This should work, and eliminate the regular expressions as well:
all_lines = (line.rstrip()
for line in open(filename)
if "com" not in line)
# filter out the empty lines
lines = filter(lambda x : x, all_lines)
Since you're using a list comprehension and not a generator expression (so the whole file gets loaded into memory anyway), here's a shortcut that avoids code to filter out empty lines:
lines = [line
for line in open(filename).read().splitlines()
if "com" not in line]

Categories