I am writing a script that modifies text files. It replaces whitespace-only lines with blank lines and erases the blank lines at the end of the file. The image shows the output I want.
I can get very close to the desired output, but I cannot get rid of the last blank line. I think this has something to do with the last line: 'the lines below me should be gone' actually looks like 'the lines below me should be gone\n'. It looks like new lines are created by the previous line, e.g. if line 4 ends with \n, then line 5 will be the blank line, not line 4.
I should note that I can't use rstrip or strip.
My code so far.
def clean_file(filename):
    # function to check if the line can be deleted
    def is_all_whitespace(line):
        for char in line:
            if char != ' ' and char != '\n':
                return False
        return True

    # generates the new lines
    with open(filename, 'r') as file:
        file_out = []
        for line in file:
            if is_all_whitespace(line):
                line = '\n'
            file_out.append(line)

    # removes whitespaces at the end of file
    while file_out[-1] == '\n':  # while the last item in lst is blank
        file_out.pop(-1)  # removes last element

    # writes the new output to file
    with open(filename, 'w') as file:
        file.write(''.join(file_out))

clean_file('test.txt')
The \n essentially means "create another line"
So when you've removed all the lines that are \n, there's still the preceding line
the lines below me should be gone\n
Which again means "create another line", beyond the ones you've already removed
Since you say you can't use rstrip, you could end the loop with
file_out[-1] = file_out[-1].strip('\n')
to remove \n from the last element. Because \n can't exist anywhere else in a line, rstrip and strip will have the same effect
Or without any strip or endswith:
if file_out[-1][-1] == '\n':
    file_out[-1] = file_out[-1][:-1]
Note that \n is a single character, ordinal 0x0a as hex, not two characters \ and n, ordinals 0x5c and 0x6e. That is why we use -1 and not -2
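Putting the pieces together, here is a minimal sketch of the whole cleanup without strip, rstrip, or endswith. The helper name clean_lines is hypothetical, and it works on a list of lines rather than a file so it can be tried without touching the disk. Like the question's code, it assumes only spaces and '\n' count as whitespace.

```python
def clean_lines(lines):
    """Blank out whitespace-only lines, drop trailing blank lines, and
    remove the final newline. Hypothetical helper, not from the question."""
    out = []
    for line in lines:
        # A line counts as blank if it holds nothing but spaces and '\n'.
        if all(ch in ' \n' for ch in line):
            line = '\n'
        out.append(line)

    # Drop blank lines at the end; the `out and` guard avoids an
    # IndexError when every line was blank.
    while out and out[-1] == '\n':
        out.pop()

    # The last kept line still carries its own '\n'; slice it off.
    if out and out[-1][-1] == '\n':
        out[-1] = out[-1][:-1]
    return out


sample = ['first\n', '   \n', 'the lines below me should be gone\n', ' \n', '\n']
print(repr(''.join(clean_lines(sample))))
# → 'first\n\nthe lines below me should be gone'
```

To use it on a file, read the lines with a for loop as in the question and write ''.join(clean_lines(file_out)) back out.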
Related
I'm trying to transform the text in a file according the following rule: for each line, if the line does not begin with "https", add that word to the beginning of subsequent lines until you hit another line with a non-https word.
For example, given this file:
Fruit
https://www.apple.com//
https://www.banana.com//
Vegetable
https://www.cucumber.com//
https://www.lettuce.com//
I want
Fruit-https://www.apple.com//
Fruit-https://www.banana.com//
Vegetable-https://www.cucumber.com//
Vegetable-https://www.lettuce.com//
Here is my attempt:
one = open("links.txt", "r")
for two in one.readlines():
    if "https" not in two:
        sitex = two
    else:
        print (sitex + "-" +two)
Here is the output of that program, using the above sample input file:
Fruit
-https://www.apple.com//
Fruit
-https://www.banana.com//
Vegetable
-https://www.cucumber.com//
Vegetable
-https://www.lettuce.com//
What is wrong with my code?
First, sitex still ends with the newline character read from the file, so call rstrip() on it to remove that trailing newline (credit to BrokenBenchmark).
Second, print appends a newline every time it's called by default, so pass the end="" parameter to suppress it.
So your code should look like this
one = open("links.txt", "r")
for two in one.readlines():
    if "https" not in two:
        sitex = two.rstrip()
    else:
        print (sitex + "-" +two,end="")
one.close()
Also always close the file when you are done.
Lines in your file end on "\n" - the newline character.
You can remove whitespaces (includes "\n") from a string using strip() (both ends) or rstrip() / lstrip() (remove at one end).
print() adds a "\n" at its end by default, you can omit this using
print("something", end=" ")
print("more")  # ==> 'something more' in one line
Fix for your code:
# use a context manager for better file handling
with open("data.txt", "w") as f:
    f.write("""Fruit
https://www.apple.com//
https://www.banana.com//
Vegetable
https://www.cucumber.com//
https://www.lettuce.com//
""")

with open("data.txt") as f:
    what = ""
    # iterate file line by line instead of reading all at once
    for line in f:
        # remove whitespace from current line, including \n
        # front AND back - you could use rstrip here as well
        line = line.strip()
        # only do something for non-empty lines (your file does not
        # contain empty lines, but the last line may be empty)
        if line:
            # easier to understand condition without negation
            if line.startswith("http"):
                # printing adds a \n at the end
                print(f"{what}-{line}")  # line & what are stripped
            else:
                what = line
Output:
Fruit-https://www.apple.com//
Fruit-https://www.banana.com//
Vegetable-https://www.cucumber.com//
Vegetable-https://www.lettuce.com//
See:
str.lstrip([chars])
str.rstrip([chars])
str.strip([chars])
[chars] are optional - if not given, whitespaces are removed.
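As a quick illustration of the difference between the three methods, with and without the optional chars argument (the sample string is made up for this sketch):

```python
line = "  Fruit \n"

print(repr(line.strip()))       # → 'Fruit'
print(repr(line.rstrip()))      # → '  Fruit'
print(repr(line.lstrip()))      # → 'Fruit \n'
print(repr(line.rstrip("\n")))  # → '  Fruit '
```

Passing "\n" as chars removes only the newline and leaves other trailing whitespace alone.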
You need to strip the trailing newline from the line if it doesn't contain 'https':
sitex = two
should be
sitex = two.rstrip()
You need to do something similar for the else block as well, as ShadowRanger points out:
print (sitex + "-" +two)
should be
print (sitex + "-" + two.rstrip())
I have loaded a file into a list
line_storage = []
try:
    with open(file_name_and_path, 'r') as f:
        for line in f:
            line_storage.append(line)  # store in list
But when trying to convert it to string ("stringify" it):
total_number_of_lines = len(line_storage)
lineBuffer = ""
for line_index in xrange(0, total_number_of_lines):
    lineBuffer += line_storage[line_index].rstrip('\n')  # append line after removing newline
Printing lineBuffer does not show the full content, only the last line, even though len(lineBuffer) is correct.
The file contents are:
....
[04.01] Test 1:
You should be able to read this.
[04.02] Test 2:
....
=========================================================== EOF
How can I work around this?
Your text lines probably end in \r\n, not just \n. By removing the \n, you are leaving the \r at the end of each line. When you print this to the terminal, each line will overwrite the previous line because \r only moves the cursor back to the beginning of the current line.
The solution is probably to use .rstrip('\r\n').
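A small sketch of the difference, using a made-up list that stands in for lines read with Windows line endings. Printing with repr makes the leftover '\r' characters visible instead of letting the terminal act on them.

```python
# Simulated file content with "\r\n" line endings (an assumption for the demo).
line_storage = ["[04.01] Test 1:\r\n", "You should be able to read this.\r\n"]

only_lf = "".join(line.rstrip("\n") for line in line_storage)
both = "".join(line.rstrip("\r\n") for line in line_storage)

print(repr(only_lf))  # the '\r' characters are still in the string
print(repr(both))     # → '[04.01] Test 1:You should be able to read this.'
```

Note that rstrip('\r\n') treats its argument as a set of characters, so it removes any trailing mix of '\r' and '\n'.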
I would like to know if there is an easy way to append a semicolon to the end of each line. I tried, but the semicolon always gets printed on the next line.
rom_data.txt
0123
1
253
3
my_script.py
input_file = open("rom_data.txt", "r")
for line in input_file:
    final = line + ';'
    print final
expected output
0123;
1;
253;
3;
output obtained
0123
;
1
;
253
;
3
;
Could anybody tell me where I am going wrong?
Your text file consists of lines. If you ask a viewer to show non-printable characters, it will show something like
0123\n
1\n
253\n
3\n
(or some other symbol marking the line break)
for line in input_file:  # so here "line" is "0123\n"
    final = line + ';'   # here you append a semicolon, so it becomes "0123\n;"
    print final          # print adds another line break, so the output is "0123\n;\n"
A common solution is to strip the line breaks first thing in the loop:
for line in input_file:
    line = line.strip()  # here
    final = line + ';'
    print final
@Pavel's answer pinpoints the problem, but note that it can have a side effect.
If a row in your rom_data file begins or ends with blank characters, strip() will remove them all, which may not be what you expect.
For example:
rom_data.txt:
0123 \n
123 \n
456\n
may obtain output below:
0123\n
123\n
456\n
If you want to keep the blanks, you should only strip the last character:
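for line in input_file:
    print(line[:-1])  # slice off only the last character, i.e. the '\n'
This is more exact, provided every line, including the last one, really ends with '\n'; otherwise the slice cuts off a real character.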
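A slightly more defensive sketch of the same idea, written for Python 3. The helper name add_semicolon is hypothetical; it removes one trailing newline only when one is present, so a last line without '\n' keeps all of its characters, and other blanks are preserved.

```python
def add_semicolon(line):
    """Append ';' after removing one trailing newline, if there is one."""
    if line.endswith('\n'):
        line = line[:-1]
    return line + ';'


# Made-up sample lines; the last one has no trailing newline.
for line in ["0123 \n", " 123\n", "456"]:
    print(repr(add_semicolon(line)))
# → '0123 ;'
# → ' 123;'
# → '456;'
```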
The problem is that there is a newline character present in line. You will need to strip the newline character using rstrip. Refer below code:
input_file = open("rom_data.txt", "r")
for line in input_file:
    final = line.rstrip('\n') + ';'
    print final
I wrote a program in python 3 that edits a text file, and outputs the edited version to a new text file. But the new file has blank lines that I can't have, and I can't figure out how to get rid of them.
Thanks in advance.
import sys

newData = ""
i=0
run=1
j=0
k=1
seqFile = open('temp100.txt', 'r')
seqData = seqFile.readlines()
while i < 26:
    sLine = seqData[j]
    editLine = seqData[k]
    tempLine = editLine[0:20]
    newLine = editLine.replace(editLine, tempLine)
    newData = newData+sLine+'\n'+newLine+'\n'
    i=i+1
    j=j+2
    k=k+2
    run=run+1
seqFile.close()
new100 = open("new100a.fastq", "w")
sys.stdout = new100
print(newData)
Problem is at this line:
newData = newData+sLine+'\n'+newLine+'\n'
sLine already ends with a newline character, so you should remove the first '\n'. If editLine is at most 20 characters long (newline included), the slice keeps its newline, so newLine ends with one too. Otherwise the slice cuts the newline off and you should add it back yourself.
Try this:
newData = newData + sLine + newLine
if len(seqData[k]) > 20:
    newData += '\n'
sLine already ends with a newline. newLine will also end with one if editLine is at most 20 characters long (newline included). You can change
newData = newData+sLine+'\n'+newLine+'\n'
to
newData = newData+sLine+newLine
In cases where editLine is longer than 20 characters, the trailing newline will be cut off when you do tempLine = editLine[0:20] and you will need to append a newline to newData yourself.
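Both cases can be handled in one place. The following is a minimal sketch; the helper truncate_line and the sample records are made up for illustration and are not from the question's file.

```python
def truncate_line(line, width=20):
    """Cut a line to `width` characters and end it with exactly one newline."""
    # Set the newline aside first so it never counts toward the width.
    body = line[:-1] if line.endswith('\n') else line
    return body[:width] + '\n'


# Made-up records: a header line followed by a line to truncate.
seq_data = ['@read1\n', 'ACGTACGTACGTACGTACGTACGTA\n', '@read2\n', 'ACGT\n']

new_data = ''
for j in range(0, len(seq_data), 2):
    new_data += seq_data[j] + truncate_line(seq_data[j + 1])

print(repr(new_data))
# → '@read1\nACGTACGTACGTACGTACGT\n@read2\nACGT\n'
```

When saving the result, writing it with the file's write method avoids the one extra newline that print would add at the end.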
According to the python documentation on readline (which is used by readlines), trailing newlines are kept in each line:
Read one entire line from the file. A trailing newline character is
kept in the string (but may be absent when a file ends with an
incomplete line). If the size argument is present and
non-negative, it is a maximum byte count (including the trailing
newline) and an incomplete line may be returned. When size is not 0,
an empty string is returned only when EOF is encountered immediately.
In general, you can often get a long way in debugging a program by printing the values of your variables when you get unexpected behaviour. For instance, printing sLine with print(repr(sLine)) would have shown you that there was a trailing newline in there.
I am writing lines one by one to an external file. Each line has 9 columns separated by a tab delimiter. If I split each line in that file and output the last column, I can see \n appended to the end of the 9th column. My code is:
#!/usr/bin/python
with open("temp", "r") as f:
    for lines in f:
        hashes = lines.split("\t")
        print hashes[8]
The last column values are integers, either 1 or 2. When I run this program, the output I get is:
['1\n']
['2\n']
I should only get 1 or 2. Why is '\n' being appended here?
I tried the following check to remove the problem.
with open("temp", "r") as f:
    for lines in f:
        if lines != '\n':
            hashes = lines.split("\t")
            print hashes[8]
This is not working either. I also tried if lines != ' '. How can I make this go away? Thanks in advance.
Try using strip on the lines to remove the \n (the new line character). strip removes the leading and trailing whitespace characters.
with open("temp", "r") as f:
    for lines in f.readlines():
        if lines.strip():  # skip blank lines
            # strip() drops the trailing \n before the split
            hashes = lines.strip().split("\t")
            print hashes[8]
\n is the newline character; it is how the computer knows to display the data on the next line. If you modify the last item in the list, hashes[-1], to remove its last character, then that should be fine.
Depending on the platform, your line ending may be more than just one character. DOS/Windows uses "\r\n", for example.
def clean(file_handle):
    for line in file_handle:
        yield line.rstrip()

with open('temp', 'r') as f:
    for line in clean(f):
        hashes = line.split('\t')
        print hashes[-1]
I prefer rstrip() for times when I want to preserve leading whitespace. That and using generator functions to clean up my input.
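The same generator idea in Python 3, as a hedged sketch: the function name last_columns and the sample rows are hypothetical, and it removes only the line ending ('\r' and '\n') so that leading whitespace and empty tab-separated fields survive.

```python
def last_columns(lines):
    """Yield the last tab-separated field of every non-blank line."""
    for line in lines:
        # Remove only the line ending; tabs and spaces stay intact.
        line = line.rstrip('\r\n')
        if not line:
            continue  # skip blank lines
        yield line.split('\t')[-1]


# Made-up rows standing in for the lines of the file.
rows = ["a\tb\t1\n", "\n", "c\td\t2\r\n"]
print(list(last_columns(rows)))  # → ['1', '2']
```

An open file object can be passed in place of rows, since iterating over it yields lines in the same way.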
Each line has 9 columns, so the item at index 8 (the 9th column) is the last one on the line and carries the line break that precedes the next line. Just slice it off:
print hashes[8][:-1]