\n appending at the end of each line - python

I am writing lines one by one to an external file. Each line has 9 columns separated by a tab delimiter. If I split each line in that file and output the last column, I can see \n appended to the end of the 9th column. My code is:
#!/usr/bin/python
with open("temp", "r") as f:
    for lines in f:
        hashes = lines.split("\t")
        print(hashes[8])
The last column values are integers, either 1 or 2. When I run this program, the output I get is:
['1\n']
['2\n']
I should only get 1 or 2. Why is '\n' being appended here?
I tried the following check to remove the problem.
with open("temp", "r") as f:
    for lines in f:
        if lines != '\n':
            hashes = lines.split("\t")
            print(hashes[8])
This is not working either. I also tried if lines != ' '. How can I make this go away? Thanks in advance.

Try using strip on the lines to remove the \n (the newline character). strip removes leading and trailing whitespace characters.
with open("temp", "r") as f:
    for lines in f:
        if lines.strip():  # skip blank lines
            hashes = lines.strip().split("\t")  # strip before splitting so the last column loses its '\n'
            print(hashes[8])
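Stripping has to happen before the split for the last column to come out clean. A small sketch with made-up sample rows shows the difference, plus one pitfall of strip() on tab-delimited data: it also removes leading tabs, so an empty first column would vanish and shift every index. rstrip('\n') avoids that.

```python
# Made-up sample row: nine tab-separated columns, newline-terminated.
row = "a\tb\tc\td\te\tf\tg\th\t1\n"

unstripped = row.split("\t")
stripped = row.strip().split("\t")
print(repr(unstripped[8]))  # '1\n'  (newline still attached)
print(repr(stripped[8]))    # '1'

# Pitfall: strip() also removes a leading tab, so an empty first column is lost.
row_empty_first = "\tb\tc\td\te\tf\tg\th\t2\n"
print(len(row_empty_first.strip().split("\t")))       # 8 columns: one went missing
print(len(row_empty_first.rstrip("\n").split("\t")))  # 9 columns, as expected
```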

\n is the newline character; it is how the computer knows to display what follows on the next line. If you modify the last item in the list, hashes[-1], to remove its last character, that should be fine.
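A minimal sketch of that suggestion, with one guard added: the last line of a file often has no trailing newline, so blindly dropping the final character would cut off a digit there. The function name trim_last_field and the sample rows are hypothetical.

```python
def trim_last_field(line):
    """Split a tab-delimited line and drop a trailing newline from the last field only."""
    hashes = line.split("\t")
    # Only remove the final character if it really is a newline.
    if hashes[-1].endswith("\n"):
        hashes[-1] = hashes[-1][:-1]
    return hashes

print(trim_last_field("x\ty\t1\n"))  # ['x', 'y', '1']
print(trim_last_field("x\ty\t2"))    # ['x', 'y', '2'] (final line without a newline)
```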

Depending on the platform, your line ending may be more than one character; DOS/Windows uses "\r\n", for example.
def clean(file_handle):
    for line in file_handle:
        yield line.rstrip()

with open('temp', 'r') as f:
    for line in clean(f):
        hashes = line.split('\t')
        print(hashes[-1])
I prefer rstrip() for times when I want to preserve leading whitespace. That, and using generator functions to clean up my input.
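To isolate the line-ending point: in Python 3, files opened in text mode translate "\r\n" to "\n" by default (universal newlines), but a "\r" can survive if the file is opened with newline='' or is read by Python 2 on a Unix-like system. Passing both characters to rstrip covers either case. The sketch below is a variant of the clean generator above that uses rstrip('\r\n'), with io.StringIO(newline="") standing in for such a file; the sample data is made up.

```python
import io

# Simulated file with Windows line endings; newline="" disables translation,
# so each line keeps its "\r\n".
data = io.StringIO("a\tb\t1\r\nc\td\t2\r\n", newline="")

def clean(file_handle):
    for line in file_handle:
        # Remove only line-ending characters, leaving other whitespace alone.
        yield line.rstrip("\r\n")

last_fields = [line.split("\t")[-1] for line in clean(data)]
print(last_fields)                 # ['1', '2']
print(repr("1\r\n".rstrip("\n")))  # '1\r': stripping only '\n' leaves the '\r'
```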

Because each line has 9 columns, index 8 (the 9th item) ends with a line break, since a new line starts after it. Just take that away:
print(hashes[8][:-1])

Related

I'm trying to solve this Python exercise but I have no idea how to do it: get the first character of a line from a file + the length of the line

I am learning Python on an app called SoloLearn and got to this exercise. I cannot see the solution or the comments; I don't need to solve it to continue, but I'd like to know how to do it.
Book Titles: You have been asked to make a special book categorization program, which assigns each book a special code based on its title.
The code is equal to the first letter of the book, followed by the number of characters in the title.
For example, for the book "Harry Potter", the code would be: H12, as it contains 12 characters (including the space).
You are provided a books.txt file, which includes the book titles, each one written on a separate line.
Read the title one by one and output the code for each book on a separate line.
For example, if the books.txt file contains:
Some book
Another book
Your program should output:
S9
A12
Recall the readlines() method, which returns a list containing the lines of the file.
Also, remember that all lines, except the last one, contain a \n at the end, which should not be included in the character count.
I tried:
file = open("books.txt","r")
for line in file:
    for i in range(len(file.readlines())):
        title = line[0]+str(len(line)-1)
        print(titulo)
    title = line[0]+str(len(line)-1)
    print(title)
file.close
I also tried with range() and readlines(), but I don't know how to solve it.
This uses readlines():
with open('books.txt') as f:  # Open file
    for line in f.readlines():  # Iterate through lines
        if line[-1] == '\n':  # Check if there is '\n' at end of line
            line = line[:-1]  # If there is, ignore it
        print(line[0], len(line), sep='')  # Output first character and length
But I think splitlines() is easier, as it doesn't have the trailing '\n':
with open('books.txt') as f:  # Open file
    for line in f.read().splitlines():  # Iterate through lines
        # No need to check for trailing '\n'
        print(line[0], len(line), sep='')  # Output first character and length
You can use "with" to handle file opening and closing.
Use rstrip to get rid of '\n'.
with open('books.txt') as f:
    lines = f.readlines()  # the handle is named f here, not file
for line in lines:
    print(line[0] + str(len(line.rstrip())))
This is the same:
file = open('books.txt')
lines = file.readlines()
for line in lines:
    print(line[0] + str(len(line.rstrip())))
file.close()
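All of these approaches apply the same per-line rule. As a sketch, that rule can be isolated in a small helper (the name book_code is hypothetical) and checked against the examples from the exercise. It uses rstrip('\n') rather than rstrip(), so a title that genuinely ends in a space would keep it in the count; whether that matters depends on the input.

```python
def book_code(line):
    """Return the first letter of a title plus its length, ignoring a trailing newline."""
    title = line.rstrip("\n")  # the last line may or may not end with "\n"
    return title[0] + str(len(title))

for line in ["Some book\n", "Another book\n", "Harry Potter"]:
    print(book_code(line))
# S9
# A12
# H12
```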

Removes white spaces while reading in a file

with open(filename, "r") as f:
    for line in f:
        line = (' '.join(line.strip().split())).split()
Can anyone break down the line where whitespaces get removed?
I understand line.strip().split() first removes leading and trailing spaces from line then the resulting string gets split on whitespaces and stores all words in a list.
But what does the remaining code do?
The expression ' '.join(line.strip().split()) creates a string consisting of all the list elements separated by exactly one space. Applying the split() method to this string again returns a list containing all the words in the string that were separated by whitespace.
Here's a breakdown:
# Opens the file
with open(filename, "r") as f:
    # Iterates through each line
    for line in f:
        # Rewriting this line, below:
        # line = (' '.join(line.strip().split())).split()
        # Assuming line was " foo bar quux "
        stripped_line = line.strip()    # "foo bar quux"
        parts = stripped_line.split()   # ["foo", "bar", "quux"]
        joined = ' '.join(parts)        # "foo bar quux"
        parts_again = joined.split()    # ["foo", "bar", "quux"]
Is this what you were looking for?
That code is pointlessly complicated is what it is.
There is no need to strip if you're calling split() with no arguments next (a no-argument split drops leading and trailing whitespace as a side effect), so line.strip().split() can be simplified to line.split().
The join and re-split don't change a thing: join sticks the pieces from the first split back together with spaces, then split re-splits on those very same spaces. So you could save the time spent joining only to split again, and just keep the results of the first split, changing it to:
line = line.split()
and it would be functionally identical to the original:
line = (' '.join(line.strip().split())).split()
and faster to boot. I'm guessing the code you were handed was written by someone who didn't understand splitting and joining either, and just threw stuff at their problem without understanding what it did.
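The claim that the two forms are equivalent is easy to check directly. A quick sketch over a few made-up inputs mixing spaces, tabs, empty strings and trailing newlines:

```python
# Sample inputs made up for the comparison.
samples = ["  foo   bar quux  \n", "\tone\ttwo\n", "", "   \n", "single"]

for s in samples:
    original = (' '.join(s.strip().split())).split()
    simplified = s.split()
    assert original == simplified, s  # fails loudly if the forms ever differ
print("all equal")
```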
Here is an explanation of the code:
with open(filename, "r") as f:
    for line in f:
        line = (' '.join(line.strip().split())).split()
First, line.strip() removes leading and trailing whitespace from line, and .split() breaks it into a list on whitespace.
Then ' '.join converts that list back into a single string separated by single spaces. Finally, .split() converts it to a list again.
This code, line = (' '.join(line.strip().split())).split(), is superfluous. It should be:
line = line.split()
If you want to strip each item again, use:
line = list(map(str.strip, line.split()))
I think they are doing this to maintain a constant amount of whitespace. The strip is removing all whitespace (could be 5 spaces and a tab), and then they are adding back in the single space in its place.

How to extract last line of text in Python (excluding new lines)?

Textfile:
1
2
3
4
5
6
\n
\n
I know lines[-1] gets you the last line, but I want to disregard any new lines and get the last line of text (6 in this case).
For memory use, the best approach is to iterate through the whole file, keeping only the last non-blank line rather than reading everything in. Something like this:
with open('file.txt') as f:
    last = None
    for line in (line for line in f if line.rstrip('\n')):
        last = line
print(last)
It can be done more elegantly, though, with a slightly different approach:
with open('file.txt') as f:
    last = None
    for last in (line for line in f if line.rstrip('\n')):
        pass
print(last)
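Another way to exhaust the iterator while keeping only its last item is collections.deque with maxlen=1, a standard-library idiom swapped in here rather than taken from the answers above. In this sketch, the function name last_nonblank_line is hypothetical and io.StringIO stands in for the file.

```python
import io
from collections import deque

def last_nonblank_line(lines):
    """Return the last line containing more than just a newline, or None."""
    # A deque with maxlen=1 keeps only the most recent item while consuming the iterator.
    tail = deque((line for line in lines if line.rstrip("\n")), maxlen=1)
    return tail[0].rstrip("\n") if tail else None

sample = io.StringIO("1\n2\n3\n4\n5\n6\n\n\n")
print(last_nonblank_line(sample))  # 6
```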
For a small file you can just read all of the lines, discarding any empty ones. Notice that I've used an inner generator to strip the lines before excluding them in the outer one.
with open(textfile) as fp:
    last_line = [l2 for l2 in (l1.strip() for l1 in fp) if l2][-1]
with open('file') as f:
    print([i for i in f.read().split('\n') if i != ''][-1])
This is just an edit to Avinash Raj's answer (but since I'm a new account, I can't comment on it). This will preserve any None values in your data (i.e. if the data in your last line is "None" it will work, though depending on your input this may not be an issue).
with open('path/to/file') as infile:
    for line in infile:
        if not line.strip('\n'):
            continue
        answer = line
print(answer)
This will print 6 with a newline at the end. You can decide how to strip that. Following are some options:
answer.rstrip('\n') removes trailing newlines
answer.rstrip() removes trailing whitespaces
answer.strip() removes any surrounding whitespaces
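The difference between the three options shows up once the line carries spaces as well as the newline. A tiny sketch with a made-up value for answer:

```python
answer = "  6 \n"  # a made-up last line with surrounding spaces

print(repr(answer.rstrip('\n')))  # '  6 '
print(repr(answer.rstrip()))      # '  6'
print(repr(answer.strip()))       # '6'
```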
with open('file.txt') as myfile:
    for num, line in enumerate(myfile):
        pass
print(num)

appending a semicolon to end of a line

I was interested to know if there is an easy way to append a semicolon to the end of each line. I tried, but it always prints the semicolon on the next line.
rom_data.txt
0123
1
253
3
my_script.py
input_file = open("rom_data.txt", "r")
for line in input_file:
    final = line + ';'
    print(final)
expected output
0123;
1;
253;
3;
output obtained
0123
;
1
;
253
;
3
;
Could anybody tell me where I am going wrong?
So your text file consists of lines. If you ask a viewer to show non-printable characters, it will show something like:
0123\n
1\n
253\n
3\n
(or some other symbol marking the line break)
for line in input_file:  # so here "line" is "0123\n"
    final = line + ';'  # here you append a semicolon, so it becomes "0123\n;"
    print(final)  # print adds another line break, so the output is "0123\n;\n"
A common solution is to strip the line breaks first thing in the loop:
for line in input_file:
    line = line.strip()  # here
    final = line + ';'
    print(final)
@Pavel's solution points out the problem, but note that it can cause another one.
If a row in rom_data begins or ends with whitespace, strip() will remove all of it, which may not be what you expect.
For example:
rom_data.txt:
0123 \n
123 \n
456\n
you may get the output below:
0123\n
123\n
456\n
If you want to keep the blanks, you should strip only the last character:
for line in input_file:
    print(line[:-1])  # strips only the final character, normally the '\n'
This may be more exact.
The problem is that there is a newline character present in line. You will need to strip the newline character using rstrip. Refer below code:
input_file = open("rom_data.txt", "r")
for line in input_file:
    final = line.rstrip('\n') + ';'
    print(final)
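A small self-contained check of the rstrip('\n') approach, with io.StringIO standing in for rom_data.txt (the sample contents, including a trailing space and a missing final newline, are made up). Unlike line[:-1], rstrip('\n') does not cut off a real character when the last line has no newline, and unlike strip() it keeps trailing spaces.

```python
import io

rom_data = io.StringIO("0123 \n1\n253\n3")  # last line has no trailing newline

with_semicolons = [line.rstrip('\n') + ';' for line in rom_data]
print(with_semicolons)  # ['0123 ;', '1;', '253;', '3;']

# For comparison, slicing off the last character damages the final line:
rom_data.seek(0)
print([line[:-1] + ';' for line in rom_data])  # ['0123 ;', '1;', '253;', ';']
```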

Python - delete blank lines of text at the end of the file

I am writing a script that modifies text files. It replaces whitespace-only lines with blank lines and erases the blank lines at the end of the file. The image shows the output I want.
I am able to get very close to the desired output. The problem is that I cannot get rid of the last blank line. I think this has something to do with the last line: e.g. 'the lines below me should be gone' actually looks like 'the lines below me should be gone\n'. It looks like new lines are created on the previous line, e.g. if line 4 ends with \n, then line 5 will actually be the blank line, not line 4.
I should note that I can't use rstrip or strip.
My code so far.
def clean_file(filename):
    # function to check if the line can be deleted
    def is_all_whitespace(line):
        for char in line:
            if char != ' ' and char != '\n':
                return False
        return True

    # generates the new lines
    with open(filename, 'r') as file:
        file_out = []
        for line in file:
            if is_all_whitespace(line):
                line = '\n'
            file_out.append(line)

    # removes whitespace at the end of file
    while file_out[-1] == '\n':  # while the last item in the list is blank
        file_out.pop(-1)  # removes last element

    # writes the new output to file
    with open(filename, 'w') as file:
        file.write(''.join(file_out))

clean_file('test.txt')
The \n essentially means "create another line".
So when you've removed all the lines that are just \n, there's still the preceding line:
the lines below me should be gone\n
which again means "create another line", beyond the ones you've already removed.
Since you say you can't use rstrip, you could follow the loop with
file_out[-1] = file_out[-1].strip('\n')
to remove \n from the last element. Because \n can't appear anywhere else in a line, rstrip and strip will have the same effect here.
Or without any strip or endswith:
if file_out[-1][-1] == '\n':
    file_out[-1] = file_out[-1][:-1]
Note that \n is a single character, ordinal 0x0a in hex, not the two characters \ and n (ordinals 0x5c and 0x6e). That is why we use -1 and not -2.
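Putting the pieces together, here is a sketch of the whole cleanup that avoids strip, rstrip and endswith. It works on a list of lines instead of a file, so it can be checked without touching disk. The name clean_lines and the sample lines are hypothetical. is_all_whitespace is the asker's helper. The "out and" guards are an addition that prevents an IndexError when every line is blank.

```python
def clean_lines(lines):
    """Blank out whitespace-only lines, drop trailing blank lines, and remove the
    final newline, without using strip/rstrip/endswith."""
    def is_all_whitespace(line):
        for char in line:
            if char != ' ' and char != '\n':
                return False
        return True

    out = ['\n' if is_all_whitespace(line) else line for line in lines]
    # Drop blank lines at the end; "out and" avoids IndexError on an all-blank input.
    while out and out[-1] == '\n':
        out.pop()
    # The remaining last line still ends in "\n"; remove that single character.
    if out and out[-1][-1] == '\n':
        out[-1] = out[-1][:-1]
    return ''.join(out)

sample = ["keep me\n", "   \n", "the lines below me should be gone\n", "  \n", "\n"]
print(repr(clean_lines(sample)))
# 'keep me\n\nthe lines below me should be gone'
```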
