removing blank lines from text file output python 3 - python

I wrote a program in python 3 that edits a text file, and outputs the edited version to a new text file. But the new file has blank lines that I can't have, and I can't figure out how to get rid of them.
Thanks in advance.
import sys
newData = ""
i=0
run=1
j=0
k=1
seqFile = open('temp100.txt', 'r')
seqData = seqFile.readlines()
while i < 26:
    sLine = seqData[j]
    editLine = seqData[k]
    tempLine = editLine[0:20]
    newLine = editLine.replace(editLine, tempLine)
    newData = newData+sLine+'\n'+newLine+'\n'
    i=i+1
    j=j+2
    k=k+2
    run=run+1
seqFile.close()
new100 = open("new100a.fastq", "w")
sys.stdout = new100
print(newData)

The problem is this line:
newData = newData+sLine+'\n'+newLine+'\n'
sLine already ends with a newline character, so you should remove the first '\n'. If the line being edited is 20 characters or shorter (counting its newline), newLine also keeps its newline. Otherwise the slice cuts the newline off and you need to add one yourself.
Try this:
newData = newData + sLine + newLine
if len(seqData[k]) > 20:
    newData += '\n'

sLine already ends with a newline. newLine will also end with one if editLine is 20 characters long or shorter. You can change
newData = newData+sLine+'\n'+newLine+'\n'
to
newData = newData+sLine+newLine
In cases where editLine is longer than 20 characters, the trailing newline will be cut off when you do tempLine = editLine[0:20] and you will need to append a newline to newData yourself.
According to the Python documentation on readline (which is used by readlines), trailing newlines are kept in each line:
Read one entire line from the file. A trailing newline character is
kept in the string (but may be absent when a file ends with an
incomplete line). If the size argument is present and
non-negative, it is a maximum byte count (including the trailing
newline) and an incomplete line may be returned. When size is not 0,
an empty string is returned only when EOF is encountered immediately.
In general, you can often get a long way in debugging a program by printing the values of your variables when you get unexpected behaviour. For instance, printing sLine with print(repr(sLine)) would have shown you that there was a trailing newline in there.
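One way to sidestep the question of which lines still carry a newline is to strip it from every line first and then add exactly one back. The sketch below is only an illustration of that idea: the function name truncate_every_second_line and the sample data are made up, and it processes every line it is given rather than the first 26 pairs as the original loop does.

```python
def truncate_every_second_line(lines, width=20):
    """Return the lines with every second one cut to `width` characters.

    `lines` is expected to look like the result of readlines(): each
    element may or may not end with a newline.
    """
    out = []
    for index, line in enumerate(lines):
        text = line.rstrip("\n")      # drop the newline readlines() kept
        if index % 2 == 1:            # second line of each pair
            text = text[:width]
        out.append(text + "\n")       # add exactly one newline back
    return "".join(out)


# Hypothetical sample standing in for seqFile.readlines()
sample = ["@read1\n", "ACGTACGTACGTACGTACGTACGTACGT\n", "@read2\n", "ACGT\n"]
result = truncate_every_second_line(sample)
print(repr(result))
```

Writing the result with new100.write(result) instead of print also avoids the extra newline that print appends at the end.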

Related

Why do two of the same strings not return as being the same when compared?

I have the following code:
file = open('AdjectivesList.txt', 'r')
lines = file.readlines()
file.close()
for word in words:
    wordLowercase = word.lower()
    for x, lol in enumerate(lines):
        gg = (lines[x].lower())
        if wordLowercase == gg:
            print('identified')
Even when wordLowercase does equal gg, the string "identified" is not being printed. Why is this the case?
.readlines() includes the newline character at the end of every line in the text file. This is most likely the cause of your problem. You can remove the newline character (and any whitespace characters from the left and right of the string) by using .strip().
gg = lines[x].lower().strip()
Reference
https://www.tutorialspoint.com/python/file_readlines.htm
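As a rough sketch of the same fix applied to the whole lookup, the lines can be stripped and lowercased once into a set, so each word needs only a membership test. The function name find_known_words and the sample lists are assumptions for illustration, not part of the original code.

```python
def find_known_words(words, lines):
    """Return the words that appear in `lines`, ignoring case and newlines."""
    # strip() removes the trailing "\n" that readlines() leaves on each line
    known = {line.strip().lower() for line in lines}
    return [word for word in words if word.lower() in known]


lines = ["Happy\n", "sad\n", "Tall"]   # stand-in for file.readlines()
matches = find_known_words(["happy", "TALL", "green"], lines)
print(matches)
```

Without the strip() call, "happy" would be compared against "happy\n" and never match.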

How to remove extra space from end of the line before newline in python?

I'm quite new to python. I have a program which reads an input file with different characters and then writes all unique characters from that file into an output file with a single space between each of them. The problem is that after the last character there is one extra space (before the newline). How can I remove it?
My code:
import sys
inputName = sys.argv[1]
outputName = sys.argv[2]
infile = open(inputName,"r",encoding="utf-8")
outfile = open(outputName,"w",encoding="utf-8")
result = []
for line in infile:
    for c in line:
        if c not in result:
            result.append(c)
            outfile.write(c.strip())
            if(c == ' '):
                pass
            else:
                outfile.write(' ')
outfile.write('\n')
With the line outfile.write(' '), you write a space after each character (unless the character is a space). So you'll have to avoid writing the last space. Now, you can't tell whether any given character is the last one until you're done reading, so it's not like you can just put in an if statement to test that, but there are a few ways to get around that:
Write the space before the character c instead of after it. That way the space you have to skip is the one before the first character, and that you definitely can identify with an if statement and a boolean variable. If you do this, make sure to check that you get the right result if the first or second c is itself a space.
Alternatively, you can avoid writing anything until the very end. Just save up all the characters you see - you already do this in the list result - and write them all in one go. You can use
' '.join(strings)
to join together a list of strings (in this case, your characters) with spaces between them, and this will automatically omit a trailing space.
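The second approach might look roughly like the sketch below. It assumes whitespace characters (spaces and newlines) should be skipped entirely, which differs slightly from the original code; the function name unique_characters and the sample input are made up for illustration.

```python
def unique_characters(lines):
    """Collect each non-whitespace character once, in order of first appearance."""
    seen = []
    for line in lines:
        for c in line:
            if not c.isspace() and c not in seen:
                seen.append(c)
    return seen


chars = unique_characters(["hello world\n", "low\n"])   # stand-in for the input file
joined = " ".join(chars)    # separators go between items only, never after the last
print(repr(joined))
```

The joined string can then be written in one call, for example outfile.write(joined + '\n').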
Why are you adding that if block on the end?
Your program is adding the extra space on the end.
import sys
inputName = sys.argv[1]
outputName = sys.argv[2]
infile = open(inputName,"r",encoding="utf-8")
outfile = open(outputName,"w",encoding="utf-8")
result = []
for line in infile:
    charno = 0
    for c in line:
        if c not in result:
            result.append(c)
            outfile.write(c.strip())
            charno += 1
            if (c == ' '):
                pass
            elif charno >= len(line):
                pass
            else:
                outfile.write(' ')
outfile.write('\n')

python: file i/o counting characters without new lines

I have a text file named number.txt. It contains the following:
0
1
2
3
My code:
def main():
    inFile = open("number.txt", "r")
    text = inFile.read()
    inFile.close()
    print(len(text))
main()
I have tried to use the above code to print out how many characters are in the file. It prints out 8, but there are only 4 characters.
I know that when python reads in the file it adds a newline after each line, and this could be extra characters. How do I get rid of this?
The file contains a newline between each line. To filter it out, you can either recreate the string without those newlines with replace, split, or similar, or count the newlines and subtract them from the length (which is faster/more efficient).
with open("number.txt", "r") as file:
    text = file.read()
length_without_newlines = len(text) - text.count('\n')
Edit: As @lvc says, when a file is opened in text mode Python converts all line endings to '\n' (0x0A), including Windows newlines ('\r\n' or [0x0D, 0x0A]), so one need only search for '\n' when finding newline characters.
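A small experiment can make the line-ending conversion visible. This is only a sketch for Python 3 with the default newline handling of open(); the temporary file and its contents are invented for the demonstration.

```python
import os
import tempfile

# Write a small file with Windows-style line endings, byte for byte.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(b"0\r\n1\r\n2\r\n3\r\n")
    path = tmp.name

# Text mode with the default newline handling translates "\r\n" to "\n".
with open(path, "r") as f:
    text = f.read()

# Binary mode returns the bytes exactly as stored.
with open(path, "rb") as f:
    raw = f.read()

os.remove(path)

length_without_newlines = len(text) - text.count("\n")
print(repr(text), len(text), length_without_newlines, len(raw))
```

In text mode the string holds 8 characters, 4 of them newlines, while the binary read still sees all 12 bytes.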
As Antonio said in the comment, the newline characters are in the file.
If you want, you can remove them:
def main():
    inFile = open("number.txt", "r")
    text = inFile.read()
    inFile.close()
    text = text.replace('\n', '') # Replace new lines with nothing (empty string).
    print(len(text))
main()
The answer your script gives is correct: newlines are characters too (they are just invisible).
To omit the newline characters (written in strings as \n or \r\n), replace them with an empty string.
See this code:
def main():
    inFile = open("number.txt", "r")
    text = inFile.read()
    text = text.replace("\r\n","") # on Windows, new lines are usually these two characters
    text = text.replace("\n","")
    inFile.close()
    print(len(text))
main()
For more information about what \r\n and \n are, see: http://en.wikipedia.org/wiki/Newline
Try this:
if __name__ == '__main__':
    with open('number.txt', 'rb') as in_file:
        # total bytes minus one per line; assumes every line ends in a single '\n'
        print(abs(len(in_file.readlines()) - in_file.tell()))
Use string.rstrip('\n'). This will remove newlines from the right side of the string, and nothing else. Note that Python should convert all newline characters to \n, regardless of platform. I would also recommend iterating over the lines of the file, rather than reading it all into memory, in case you have a large file.
Example code:
if __name__ == '__main__':
    count = 0
    with open("number.txt", "r") as fin:
        for line in fin:
            text = line.rstrip('\n')
            count += len(text)
    print(count)
Do it in the print line, like this:
print(len(text.replace("\n", "")))

Python - delete blank lines of text at the end of the file

I am writing a script that modifies any text files. It replaces white space lines with blank lines. It erases the blank lines at the end of the file. The image shows the output I want.
I am able to get very close to the desired output. The problem is that I cannot get rid of the last blank line. I think this has something to do with the last line of text: 'the lines below me should be gone' actually looks like 'the lines below me should be gone\n'. It looks like new lines are created on the previous line, e.g. if line 4 ends with \n then line 5 will actually be the blank line, not line 4.
I should note that I can't use rstrip or strip.
My code so far:
def clean_file(filename):
    # function to check if the line can be deleted
    def is_all_whitespace(line):
        for char in line:
            if char != ' ' and char != '\n':
                return False
        return True
    # generates the new lines
    with open(filename, 'r') as file:
        file_out = []
        for line in file:
            if is_all_whitespace(line):
                line = '\n'
            file_out.append(line)
    # removes whitespaces at the end of file
    while file_out[-1] == '\n': # while the last item in lst is blank
        file_out.pop(-1) # removes last element
    # writes the new the output to file
    with open(filename, 'w') as file:
        file.write(''.join(file_out))
clean_file('test.txt')
The \n essentially means "create another line".
So when you've removed all the lines that are just \n, there's still the preceding line:
the lines below me should be gone\n
which again means "create another line", beyond the ones you've already removed.
Since you say you can't use rstrip, you could follow the loop with
file_out[-1] = file_out[-1].strip('\n')
to remove \n from the last element. Because \n can only appear at the end of a line, rstrip and strip will have the same effect.
Or without any strip or endswith:
if file_out[-1][-1] == '\n':
    file_out[-1] = file_out[-1][:-1]
Note that \n is a single character, ordinal 0x0a in hex, not the two characters \ and n, ordinals 0x5c and 0x6e. That is why we use -1 and not -2.
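Putting these pieces together, the cleaning step could be sketched as a function that works on a list of lines and never calls strip or rstrip. The name drop_trailing_blank_lines and the sample list are assumptions for illustration; the `while cleaned and ...` guard is an addition that avoids an IndexError when every line is blank.

```python
def drop_trailing_blank_lines(lines):
    """Blank out whitespace-only lines and remove blank lines at the end.

    Avoids str.strip/rstrip, as the question requires.
    """
    def is_all_whitespace(line):
        return all(char in " \n" for char in line)

    cleaned = ["\n" if is_all_whitespace(line) else line for line in lines]
    # Remove blank lines at the end; the length check guards an all-blank file.
    while cleaned and cleaned[-1] == "\n":
        cleaned.pop()
    # Drop the final newline so no empty last line is left.
    if cleaned and cleaned[-1][-1] == "\n":
        cleaned[-1] = cleaned[-1][:-1]
    return cleaned


sample = ["first\n", "   \n", "last line\n", "  \n", "\n"]   # stand-in for the file's lines
result = "".join(drop_trailing_blank_lines(sample))
print(repr(result))
```

The whitespace-only line in the middle is kept as an empty line, while everything after the last line of text is gone.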

\n appending at the end of each line

I am writing lines one by one to an external file. Each line has 9 columns separated by tabs. If I split each line in that file and output the last column, I can see \n appended to the end of the 9th column. My code is:
#!/usr/bin/python
with open("temp", "r") as f:
    for lines in f:
        hashes = lines.split("\t")
        print hashes[8]
The last column values are integers, either 1 or 2. When i run this program, the output i get is,
['1\n']
['2\n']
I should only get 1 or 2. Why is '\n' being appended here?
I tried the following check to remove the problem.
with open("temp", "r") as f:
    for lines in f:
        if lines != '\n':
            hashes = lines.split("\t")
            print hashes[8]
This is not working either. I also tried if lines != ' '. How can I make this go away? Thanks in advance.
Try using strip on the lines to remove the \n (the newline character). strip removes leading and trailing whitespace characters.
with open("temp", "r") as f:
    for lines in f.readlines():
        lines = lines.strip()  # drop the trailing newline before splitting
        if lines:
            hashes = lines.split("\t")
            print hashes[8]
\n is the newline character; it is how the computer knows to display the data on the next line. If you remove the last character from the last item in the list, hashes[-1], the output should be fine.
Depending on the platform, your line ending may be more than just one character. Dos/Windows uses "\r\n" for example.
def clean(file_handle):
    for line in file_handle:
        yield line.rstrip()
with open('temp', 'r') as f:
    for line in clean(f):
        hashes = line.split('\t')
        print hashes[-1]
I prefer rstrip() for times when I want to preserve leading whitespace. That and using generator functions to clean up my input.
Each line has 9 columns, so the item at index 8 (the 9th item) is the last one on the line and carries the line break that ends it. Just slice that away:
print hashes[8][:-1]
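The answers above can be combined into a small generator that removes only the line ending before splitting, so tabs and empty columns are left alone. This is a sketch in Python 3 syntax (the thread itself uses Python 2 print statements); the name last_column and the sample rows are made up for illustration.

```python
def last_column(lines, delimiter="\t"):
    """Yield the last field of each non-blank line, without its line ending."""
    for line in lines:
        line = line.rstrip("\r\n")    # removes "\n" and "\r\n", keeps tabs
        if line:
            yield line.split(delimiter)[-1]


# Hypothetical rows standing in for the lines of the file "temp"
rows = ["a\tb\tc\td\te\tf\tg\th\t1\n", "\n", "a\tb\tc\td\te\tf\tg\th\t2\r\n"]
values = list(last_column(rows))
print(values)
```

Unlike slicing with [:-1], this does not cut a real character off a final line that has no newline.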
