python string assignment - python

I have a StringIO object that is filled correctly. I then have the following code:
val = log_fp.getvalue()
lines = val.split('\n')
newval = ''
for line in lines:
    if (not line.startswith('[output]')):
        newval = line
        print 'test1'+newval
print 'test2' +newval
In the loop, the correct value for newval is printed, but in the last print I get an empty string. Any ideas what I am doing wrong? What I need is to extract one of the lines in the StringIO object that is marked [output], but newval seems to be empty in 'test2'.

Splitting on '\n' for a string such as 'foo\n' will produce ['foo', ''].
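A quick check of that behaviour, written for Python 3 with a made-up sample string. It also shows that str.splitlines() does not produce the trailing empty element, which may be the simpler way to avoid the problem:

```python
# Hypothetical sample text that ends with a newline
text = 'foo\nbar\n'

with_split = text.split('\n')        # keeps an empty string after the final newline
with_splitlines = text.splitlines()  # no trailing empty string

print(with_split)
print(with_splitlines)
```

Because the loop in the question assigns every non-[output] line to newval, that final empty string is what is left over after the loop ends.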

"What I need is to extract one of the lines in the StringIO object that is marked [output],"
Untested:
content = log_fp.getvalue().splitlines()
output_lines = [x for x in content if x.startswith('[output]')]
Then get the first element of output_lines, if that is what you need.
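A minimal sketch of that approach in Python 3. The log contents here are invented to stand in for the real log_fp, and first_output is a name introduced only for illustration:

```python
import io

# Hypothetical log contents standing in for the real log_fp
log_fp = io.StringIO('[debug] starting\n[output] 42\n[debug] done\n')

# Keep only the lines marked [output]
output_lines = [line for line in log_fp.getvalue().splitlines()
                if line.startswith('[output]')]

# First marked line, or None if no line is marked
first_output = output_lines[0] if output_lines else None
print(first_output)
```

Guarding the index with a conditional avoids an IndexError when no line carries the marker.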

Is log_fp a text file?
If so, the last value in lines will be everything after the last newline character. Your file probably terminates in a newline, or a newline and some whitespace.
For the former case, the last value of line will be an empty string.
For the latter case, the last value of line will be the whitespace.
To avoid this, you could add a new clause to the if statement to check that the trimmed string is not empty, e.g.:
val = log_fp.getvalue()
lines = val.split('\n')
newval = ''
for line in lines:
    if ( len(line.strip()) > 0):
        if (not line.startswith('[output]')):
            newval = line
            print 'test1'+newval
print 'test2' +newval
(I haven't tried running this, but it should give you the idea)

Related

Return value in a quite nested for-loop

I want nested loops to test whether all elements match the condition and then to return True. Example:
There's a given text file: file.txt, which includes lines of this pattern:
aaa:bb3:3
fff:cc3:4
Letters, colon, alphanumeric, colon, integer, newline.
Generally, I want to test whether all lines match this pattern. However, in this function I would like to check whether the first column includes only letters.
def opener(file):
    #Opens a file and creates a list of lines
    fi=open(file).read().splitlines()
    import string
    res = True
    for i in fi:
        #Checks whether any characters in the first column is not a letter
        if any(j not in string.ascii_letters for j in i.split(':')[0]):
            res = False
        else:
            continue
    return res
However, the function returns False even if all characters in the first column are letters. I would like to ask you for the explanation, too.
Your code evaluates the empty line after your data, hence False:
Your file contains a newline after its last line, so your code checks the line after your last data line, which does not fulfill your test. That is why you get False no matter the input:
aaa:bb3:3
fff:cc3:4
empty line that does not start with only letters
You can fix it by treating empty lines specially when they occur at the end. If there is an empty line between filled ones, the function still returns False:
with open("t.txt","w") as f:
    f.write("""aaa:bb3:3
fff:cc3:4
""")
import string
def opener(file):
    letters = string.ascii_letters
    # Opens a file and creates a list of lines
    with open(file) as fi:
        res = True
        empty_line_found = False
        for i in fi:
            if i.strip(): # only check line if not empty
                if empty_line_found: # we had an empty line and now a filled line: error
                    return False
                #Checks whether any characters in the first column is not a letter
                if any(j not in letters for j in i.strip().split(':')[0]):
                    return False # immediately exit - no need to test the rest of the file
            else:
                empty_line_found = True
    return res # or True
print (opener("t.txt"))
Output:
True
If you use
# example with a file that contains an empty line between data lines - NOT ok
with open("t.txt","w") as f:
    f.write("""aaa:bb3:3

fff:cc3:4
""")
or
# example for file that contains empty line after data - which is ok
with open("t.txt","w") as f:
    f.write("""aaa:bb3:3
ff2f:cc3:4
""")
you get: False
Colonoscopy
ASCII, and UNICODE, both define character 0x3A as COLON. This character looks like two dots, one over the other: :
ASCII, and UNICODE, both define character 0x3B as SEMICOLON. This character looks like a dot over a comma: ;
You were consistent in your use of the colon in your example: fff:cc3:4 and you were consistent in your use of the word semicolon in your descriptive text: Letters, semicolon, alphanumeric, semicolon, integer, newline.
I'm going to assume you meant colon (':') since that is the character you typed. If not, you should change it to a semicolon (';') everywhere necessary.
Your Code
Here is your code, for reference:
def opener(file):
    #Opens a file and creates a list of lines
    fi=open(file).read().splitlines()
    import string
    res = True
    for i in fi:
        #Checks whether any characters in the first column is not a letter
        if any(j not in string.ascii_letters for j in i.split(':')[0]):
            res = False
        else:
            continue
    return res
Your Problem
The problem you asked about was the function always returning false. The example you gave included a blank line between the first example and the second. I would caution you to watch out for spaces or tabs in those blank lines. You can fix this by explicitly catching blank lines and skipping over them:
for i in fi:
    if not i.strip():
        # skip blank lines (empty or whitespace-only; ''.isspace() is False)
        continue
Some Other Problems
Now here are some other things you might not have noticed:
You provided a nice comment in your function. That should have been a docstring:
def opener(file):
    """ Opens a file and creates a list of lines.
    """
You import string in the middle of your function. Don't do that. Move the import up to the top of the module:
import string # at top of file
def opener(file): # Not at top of file
You opened the file with open() and never closed it. This is exactly why the with keyword was added to python:
with open(file) as infile:
    fi = infile.read().splitlines()
You opened the file, read its entire contents into memory, then split it into lines, discarding the newlines at the end. All so that you could split it by colons and ignore everything but the first field.
It would have been simpler to just call readlines() on the file:
with open(file) as infile:
    fi = infile.readlines()
res = True
for i in fi:
It would have been even easier and even simpler to just iterate on the file directly:
with open(file) as infile:
    res = True
    for i in infile:
It seems like you are building up towards checking the entire format you gave at the beginning. I suspect a regular expression would be (1) easier to write and maintain; (2) easier to understand later; and (3) faster to execute. Both now, for this simple case, and later when you have more rules in place:
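import logging
import re
# Assumed pattern for: letters, colon, alphanumeric, colon, integer, newline
valid_line = re.compile(r'[A-Za-z]+:[A-Za-z0-9]+:[0-9]+$')
def opener(file):
    bad_lines = 0
    with open(file) as infile:
        for line in infile:
            if line.isspace():
                continue
            if re.match(valid_line, line):
                continue
            logging.warning(f"Bad line: {line}")
            bad_lines += 1
    return bad_lines == 0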
Your names are bad. Your function includes the names file, fi, i, j, and res. The only one that barely makes sense is file.
Considering that you are asking people to read your code and help you find a problem, please, please use better names. If you just replaced those names with file (same), infile, line, ch, and result the code gets more readable. If you restructured the code using standard Python best practices, like with, it gets even more readable. (And has fewer bugs!)
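For illustration, here is one way the original letter check might look with the suggested names and a with block. This is a sketch, not the only reasonable layout; the function name first_column_is_letters and the sample file contents are made up for the example:

```python
import os
import string
import tempfile

def first_column_is_letters(file):
    """Return True if the first colon-separated field of every
    non-blank line consists only of ASCII letters."""
    with open(file) as infile:
        for line in infile:
            if not line.strip():
                continue  # skip blank or whitespace-only lines
            first_field = line.strip().split(':')[0]
            if any(ch not in string.ascii_letters for ch in first_field):
                return False  # no need to read the rest of the file
    return True

# Demo on a throwaway file (contents are hypothetical)
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('aaa:bb3:3\nfff:cc3:4\n\n')
result = first_column_is_letters(tmp.name)
os.remove(tmp.name)
print(result)
```

Note that a line with an empty first field (for example ":bb3:3") passes this check, just as it does in the original code.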

delete empty spaces of files python

I have a file with several lines, and some of them have empty spaces.
x=20
y=3
z = 1.5
v = 0.1
I want to delete those spaces and get each line into a dictionary, where the element before the '=' sign will be the key, and the element after the '=' sign will be its value.
However, my code is not working, at least the "delete empty spaces" part. Here's the code:
def copyFile(filename):
    """
    function's contract
    """
    with open(filename, 'r') as inFile:
        for line in inFile:
            cleanedLine = line.strip()
            if cleanedLine:
                firstPart, secondPart = line.split('=')
                dic[firstPart] = float(secondPart)
    inFile.close()
    return dic
After clearing the empty spaces, my file is supposed to get like this
x=20
y=3
z=1.5
v=0.1
But it is not working. What am I doing wrong?
You need to strip after splitting the string. That's assuming that the only unwanted spaces are around the = or before or after the contents of the line.
from ast import literal_eval
def copyFile(filename):
    with open(filename, 'r') as inFile:
        split_lines = (line.split('=', 1) for line in inFile)
        d = {key.strip(): literal_eval(value.strip()) for key, value in split_lines}
    return d
There are a few issues with your code.
For one, you never define dic so when you try to add keys to it you get a NameError.
Second, you don't need to inFile.close() because you're opening it in a with which will always close it outside the block.
Third, your function and variable names are not PEP8 standard.
Fourth, you need to strip each part.
Here's some code that works and looks nice:
def copy_file(filename):
    """
    function's contract
    """
    dic = {}
    with open(filename, 'r') as in_file:
        for line in in_file:
            cleaned_line = line.strip()
            if cleaned_line:
                first_part, second_part = line.split('=')
                dic[first_part.strip()] = float(second_part.strip())
    return dic
You have two problems:
The reason you're not removing the white space is that you're calling .strip() on the entire line. strip() removes white space at the beginning and end of the string, not in the middle. Instead, call .strip() on firstPart and secondPart.
That will fix the in-memory dictionary that you're creating but it won't make any changes to the file since you're never writing to the file. You'll want to create a second copy of the file into which you write your strip()ed values and then, at the end, replace the original file with the new file.
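A rough sketch of that write-then-replace idea in Python 3. The function name clean_file and the demo file are assumptions made for the example, and it assumes every non-blank line contains an '=' followed by a number:

```python
import os
import tempfile

def clean_file(filename):
    """Rewrite filename with spaces around '=' removed; return the parsed dict."""
    values = {}
    directory = os.path.dirname(os.path.abspath(filename))
    with open(filename) as in_file, \
         tempfile.NamedTemporaryFile('w', dir=directory, delete=False) as out_file:
        for line in in_file:
            if not line.strip():
                continue  # drop blank lines
            key, value = (part.strip() for part in line.split('=', 1))
            values[key] = float(value)
            out_file.write(f'{key}={value}\n')
    os.replace(out_file.name, filename)  # swap the cleaned copy into place
    return values

# Demo on a throwaway file with the contents from the question
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(demo_path, 'w') as f:
    f.write('x=20\ny=3\nz = 1.5\nv = 0.1\n')
parsed = clean_file(demo_path)
with open(demo_path) as f:
    cleaned_text = f.read()
print(parsed)
print(cleaned_text)
```

Writing to a temporary file in the same directory and then calling os.replace means the original is only overwritten once the cleaned copy is complete.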
To remove the whitespace, try .replace(" ", "") instead of .strip().

rstrip not working as expected (Python 2.7)

I have the following code:
file = open("file", "r")
array = file.readlines()
stats = [1, 1, 1, 1, 1] # creating an array to fill
print array
sh1 = array[1] # breaking the array extracted from the text file up for editing
sh2 = array[2]
sh3 = array[3]
sh4 = array[4]
stats[0] = string.rstrip(sh1[1])
stats[1] = string.rstrip(sh2[1])
stats[2] = string.rstrip(sh3[1])
stats[3] = string.rstrip(sh4[1])
print stats
I was expecting it to strip the newlines from the array extracted from the text file and place the new data into a separate array. What is instead happening is I'm having a seemingly random amount of characters stripped from either end of my variables. Please could someone explain what I've done wrong?
sh1,sh2,sh3,sh4 are strings, so sh1[1] is the second character from the string.
rstrip will remove trailing whitespace, so you will put either 1 or 0 character strings into your result array.
I suspect you want something like:
stats = []
for line in open("file").readlines():
    line = line.rstrip()
    stats.append(line)
print stats
or all on one line:
print [ l.rstrip() for l in open("file").readlines() ]
Use list-comprehension.
array = file.readlines()
print [i.rstrip() for i in array]
You should open the file using with, you don't need to call readlines first. You can simply iterate over the file object in a list comprehension calling rstrip on each line:
with open("file") as f: # with closes your file automatically
    stats = [line.rstrip() for line in f]
The reason your code removes random characters is that you are passing random characters to remove: you are passing the second character from the second, third, fourth and fifth lines respectively to rstrip and stripping from lines 1, 2, 3 and 4, so different characters will be removed depending on what the strings end with and what you passed. You can pass no argument to remove any trailing whitespace, or specify certain characters:
In [3]: "foobar".rstrip("bar")
Out[3]: 'foo'
In [4]: "foobar \n".rstrip()
Out[4]: 'foobar'
There is also no way you are removing data from the front of the string unless you are completely stripping the string. Lastly if you actually want to skip the first line and start at line 2 you would simply have to call next(f) on the file object before you iterate in the comprehension.
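Two small Python 3 checks of the points above. The argument to rstrip() is treated as a set of characters rather than a suffix, and next() can skip a header line before the comprehension runs. The io.StringIO object and its contents are stand-ins for a real file:

```python
import io

# The argument is a set of characters, so the order does not matter
stripped_chars = 'foobar'.rstrip('rab')
print(stripped_chars)

# With no argument, trailing whitespace (including newlines) is removed
stripped_space = 'foobar \n'.rstrip()
print(stripped_space)

# Skipping the first line before building the list
f = io.StringIO('header\nline 1\nline 2\n')  # hypothetical file contents
next(f)                                      # discard the first line
stats = [line.rstrip() for line in f]
print(stats)
```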

removing blank lines from text file output python 3

I wrote a program in python 3 that edits a text file, and outputs the edited version to a new text file. But the new file has blank lines that I can't have, and I can't figure out how to get rid of them.
Thanks in advance.
newData = ""
i=0
run=1
j=0
k=1
seqFile = open('temp100.txt', 'r')
seqData = seqFile.readlines()
while i < 26:
    sLine = seqData[j]
    editLine = seqData[k]
    tempLine = editLine[0:20]
    newLine = editLine.replace(editLine, tempLine)
    newData = newData+sLine+'\n'+newLine+'\n'
    i=i+1
    j=j+2
    k=k+2
    run=run+1
seqFile.close()
new100 = open("new100a.fastq", "w")
sys.stdout = new100
print(newData)
Problem is at this line:
newData = newData+sLine+'\n'+newLine+'\n'
sLine already contains a newline character, so you should remove the first '\n'. If editLine is no longer than 20 characters, newLine still contains its newline as well. Otherwise the newline was cut off and you should add it back yourself.
Try this:
newData = newData + sLine + newLine
if len(seqData[k]) > 20:
    newData += '\n'
sLine already contains a newline. newLine will also contain a newline if editLine is 20 characters long or shorter. You can change
newData = newData+sLine+'\n'+newLine+'\n'
to
newData = newData+sLine+newLine
In cases where editLine is longer than 20 characters, the trailing newline will be cut off when you do tempLine = editLine[0:20] and you will need to append a newline to newData yourself.
According to the python documentation on readline (which is used by readlines), trailing newlines are kept in each line:
Read one entire line from the file. A trailing newline character is
kept in the string (but may be absent when a file ends with an
incomplete line). If the size argument is present and
non-negative, it is a maximum byte count (including the trailing
newline) and an incomplete line may be returned. When size is not 0,
an empty string is returned only when EOF is encountered immediately.
In general, you can often get a long way in debugging a program by printing the values of your variables when you get unexpected behaviour. For instance, printing sLine with print(repr(sLine)) would have shown you that there was a trailing newline in there.
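Putting these points together, one possible shape for the whole task is sketched below. It is not the asker's program: the function name truncate_every_second_line is invented, the sample data is made up, and unlike the original loop (which stops after 26 pairs) it processes every line in the file:

```python
import os
import tempfile

def truncate_every_second_line(in_path, out_path, width=20):
    """Copy in_path to out_path, cutting every second line to width characters."""
    with open(in_path) as src, open(out_path, 'w') as dst:
        for index, line in enumerate(src):
            text = line.rstrip('\n')   # drop the newline that file iteration keeps
            if index % 2 == 1:         # lines 2, 4, 6, ... are the ones to cut
                text = text[:width]
            dst.write(text + '\n')     # exactly one newline per output line

# Demo with hypothetical data in a throwaway directory
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, 'in.txt')
dst_path = os.path.join(tmp_dir, 'out.txt')
with open(src_path, 'w') as f:
    f.write('@seq1\n' + 'A' * 30 + '\n@seq2\n' + 'C' * 10 + '\n')
truncate_every_second_line(src_path, dst_path)
with open(dst_path) as f:
    result = f.read()
print(repr(result))
```

Stripping the newline first and adding it back once per line removes the need to reason about whether the slice kept it.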

Python rstrip() (for tabs) not working as expected

I was trying out the rstrip() function, but it doesn't work as expected.
For example, if I run this:
lines = ['tra\tla\tla\t\t\t\n', 'tri\tli\tli\t\t\t\n', 'tro\tlo\tlo\t\t\t\n']
for line in lines:
    line.rstrip('\t')
print lines
It returns
['tra\tla\tla\t\t\t\n', 'tri\tli\tli\t\t\t\n', 'tro\tlo\tlo\t\t\t\n']
whereas I want it to return:
['tra\tla\tla\n', 'tri\tli\tli\n', 'tro\tlo\tlo\n']
What is the problem here?
The function returns the new, stripped string, but you discard that return value.
Use a list comprehension instead to replace the whole lines list; you'll need to ignore the newlines at the end as well; the .rstrip() method won't ignore those for you.
lines = [line[:-1].rstrip('\t') + '\n' for line in lines]
Demo:
>>> lines = ['tra\tla\tla\t\t\t\n', 'tri\tli\tli\t\t\t\n', 'tro\tlo\tlo\t\t\t\n']
>>> [line[:-1].rstrip('\t') + '\n' for line in lines]
['tra\tla\tla\n', 'tri\tli\tli\n', 'tro\tlo\tlo\n']
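A variation on the same idea, sketched for Python 3: strip the tabs and the newline in a single rstrip() call, then append one newline. Unlike slicing with [:-1], this does not assume that every line ends with a newline:

```python
lines = ['tra\tla\tla\t\t\t\n', 'tri\tli\tli\t\t\t\n', 'tro\tlo\tlo\t\t\t\n']

# Strip trailing tabs and newlines together, then put one newline back
stripped = [line.rstrip('\t\n') + '\n' for line in lines]
print(stripped)
```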
