Function to read FASTA files not working after updating Python

I have some good code for reading FASTA files:
import gzip
from itertools import groupby
def is_header(line):
    return line[0] == '>'
def parse_fasta(filename):
    if filename.endswith('.gz'):
        opener = lambda filename: gzip.open(filename, 'rb')
    else:
        opener = lambda filename: open(filename, 'r')
    with opener(filename) as f:
        fasta_iter = (it[1] for it in groupby(f, is_header))
        for name in fasta_iter:
            name = name.__next__()[1:].strip()
            sequences = ''.join(seq.strip() for seq in fasta_iter.__next__())
            yield name, sequences.upper()
It worked fine until I updated to Python 3.10.4. Now when I try to use it, I get this error:
Traceback (most recent call last):
  File "/media/paulosschlogl/Paulo/pauloscchlogl/Genome_kmers/fasta_parser.py", line 21, in parse_fasta
    sequences = ''.join(seq.strip() for seq in fasta_iter.__next__())
StopIteration
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/media/paulosschlogl/Paulo/pauloscchlogl/Genome_kmers/fasta_split_chr_pls.py", line 112, in <module>
    sys.exit(main())
  File "/media/paulosschlogl/Paulo/pauloscchlogl/Genome_kmers/fasta_split_chr_pls.py", line 81, in main
    plasmid, chromosome = split_sequences_from_fasta_file(filename)
  File "/media/paulosschlogl/Paulo/pauloscchlogl/Genome_kmers/fasta_parser.py", line 111, in split_sequences_from_fasta_file
    for header, seq in parse_fasta(filename)
RuntimeError: generator raised StopIteration
I run my code in a conda environment (conda 4.13.0). I tried to debug the function, but I got stuck.
I don't want to give up this function and switch to Biopython.
If you have any idea how to fix it, I would really appreciate it.
Thanks
Paulo
Example of fasta file:
> seq_name
AGGTAGGGA
The funny thing is that when I run the function in the Python interpreter at the command line, everything works fine. The errors only appear when I import the function and call it from a script.
>>> import gzip
>>> from itertools import groupby
>>> def is_header(line):
...     return line[0] == '>'
...
>>> for name, seq in parse_fasta("/media/paulosschlogl/Paulo/pauloscchlogl/Genome_kmers/Genomes/Archaea/Asgard/Chr/GCA_008000775.1_ASM800077v1_genomic.fna.gz"):
...     print(name, seq[:50])
...
CP042905.1 Candidatus Prometheoarchaeum syntrophicum strain MK-D1 chromosome, complete genome TAAATATTATAGCCCGTAATAGCAGAGTCACCAACACTTAAAGGTGCATC
>>> quit()

The exception you're getting is because you're manually calling the __next__ method on various iterators in your code. Eventually you do that on an iterator that doesn't have any values left, and you'll get a StopIteration exception raised.
In much older versions of Python, that was OK to leave uncaught in a generator function. The StopIteration exception would continue to bubble up just like any other exception, and the code consuming the generator would treat it as the normal end of iteration. For a generator function, raising StopIteration is an expected part of its behavior (it happens automatically when the function ends, either with a return statement or by reaching the end of its code). PEP 479 changed this: a StopIteration that goes uncaught inside a generator is now converted into a RuntimeError. The new behavior was opt-in starting with Python 3.5 (via from __future__ import generator_stop) and became the default in Python 3.7.
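To make the PEP 479 behavior concrete, here is a minimal sketch. The names leaky_generator and consume are made up for illustration, and it assumes Python 3.7 or later, where the conversion to RuntimeError is on by default.

```python
def leaky_generator():
    """Generator that lets a StopIteration escape from its own body."""
    exhausted = iter([])
    yield "first"
    # next() on an exhausted iterator raises StopIteration inside the generator.
    yield next(exhausted)


def consume(gen):
    """Collect the items and report how the generator ended."""
    items = []
    try:
        for item in gen:
            items.append(item)
    except RuntimeError as exc:
        return items, "RuntimeError: {}".format(exc)
    return items, "finished normally"


result = consume(leaky_generator())
print(result)
```

Before PEP 479 took effect, the same loop would simply have ended after "first", which is why this kind of bug could go unnoticed for a long time.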
Now, given the logic of your code, I'm not exactly sure why you're getting empty iterators. If the file is in the format you describe, the __next__ calls should always have a value to get, and the StopIteration that comes when there are no values left will be received by the for loop, which will suppress it (and just end the loop). Perhaps some of your files are not correctly formatted, with a header line by itself with no subsequent sequences?
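One thing that may be worth checking, offered as a possibility rather than a confirmed diagnosis: the posted code opens .gz files in 'rb' mode, so the lines are bytes. Indexing bytes gives an integer, so line[0] == '>' is never true and groupby produces a single group, leaving nothing for the second __next__ call. The sketch below compares 'rb' with 'rt' on a tiny in-memory file built from the example in the question.

```python
import gzip
import io
from itertools import groupby


def is_header(line):
    return line[0] == '>'


# A tiny gzip-compressed FASTA file held in memory (made-up test data).
raw = gzip.compress(b">seq_name\nAGGTAGGGA\n")

# Binary mode: each line is bytes, so line[0] is the integer 62, never '>'.
with gzip.open(io.BytesIO(raw), 'rb') as f:
    binary_groups = [(key, list(group)) for key, group in groupby(f, is_header)]

# Text mode: each line is str, so the header is recognised.
with gzip.open(io.BytesIO(raw), 'rt') as f:
    text_groups = [(key, list(group)) for key, group in groupby(f, is_header)]

print(len(binary_groups), len(text_groups))
```

If this turns out to be the cause, opening the compressed file with mode 'rt' should give the parser the str lines it expects.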
In any case, you can better diagnose the issue if you catch the StopIteration and print out some diagnostic information. I'd try:
with opener(filename) as f:
    fasta_iter = (it[1] for it in groupby(f, is_header))
    for name in fasta_iter:
        name = name.__next__()[1:].strip()
        try:
            sequences = ''.join(seq.strip() for seq in fasta_iter.__next__())
        except StopIteration as si:
            print(f'no sequence for {name}')
            raise ValueError() from si
        yield name, sequences.upper()
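If you find that the missing sequence is a normal thing and happens at the end of every one of your files, then you could suppress the error by just putting a return statement in the except block, or maybe by using zip in your for loop (for name_iter, sequences_iter in zip(fasta_iter, fasta_iter):). Note that zip fetches both groups before the loop body runs, and advancing groupby invalidates the previous group, so with zip each group has to be copied into a list as it is produced (for example, list(it[1]) in the generator expression). I'd hesitate to jump to either of these as the first solution though, since they will throw away a header if there is an extra one, and silently losing data is generally a bad idea.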
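A rough sketch of the zip variant follows. It is an illustration, not the asker's parser: parse_fasta_lines is a hypothetical name, it works on any iterable of text lines, and it assumes the input starts with a header line. Each group is copied into a list because groupby discards a group as soon as the next one is requested.

```python
from itertools import groupby


def is_header(line):
    return line.startswith('>')


def parse_fasta_lines(lines):
    """Yield (name, sequence) pairs from an iterable of text lines."""
    # list(group) keeps the lines alive after groupby moves on.
    groups = (list(group) for _, group in groupby(lines, is_header))
    # zip takes two consecutive groups per step: header lines, then sequence
    # lines. It stops quietly when a final header has no sequence after it.
    for header_lines, sequence_lines in zip(groups, groups):
        name = header_lines[0][1:].strip()
        sequence = ''.join(line.strip() for line in sequence_lines)
        yield name, sequence.upper()


sample = [">seq1 first\n", "aggt\n", "aggga\n", ">seq2\n", "ttaa\n", ">orphan\n"]
records = list(parse_fasta_lines(sample))
print(records)
```

The ">orphan" header is dropped without any warning, which is exactly the silent data loss the paragraph above cautions against.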

Related

Setting a global variable at the end of a generator, persistence of loop variables

I want to know the number of items that a generator has generated.
I'm trying to do this by using the output of enumerate to set a global variable. It works on simple tests but goes wrong once I try to adapt the technique to my real application case.
The following script tests first a generator based on an iteration over the lines of a file, then a generator based on the parsing of a file using a bioinformatics library I want to use:
#!/usr/bin/env python3
def test1(delete=False):
    # I have to comment the following otherwise I get:
    # $ ./test.py
    # Traceback (most recent call last):
    #   File "./test.py", line 60, in <module>
    #     test1()
    #   File "./test.py", line 31, in test1
    #     print(nb_things)
    # UnboundLocalError: local variable 'nb_things' referenced before assignment
    # if delete:
    #     try:
    #         del nb_things
    #         print("deleted nb_things")
    #     except NameError:
    #         pass
    with open("test.py") as this_file:
        def my_gen():
            for i, thing in enumerate(this_file, start=1):
                yield "just_to_test"
            global nb_things
            nb_things = i
            return
        g = my_gen()
        for _ in g:
            pass
    print(nb_things)
    return 0
import pysam
def test2(delete=False):
    if delete:
        try:
            del nb_things
            print("deleted nb_things")
        except NameError:
            pass
    with pysam.AlignmentFile("/path/to/a/bam/file", "rb") as bamfile:
        def my_gen():
            for i, thing in enumerate(bamfile.fetch(), start=1):
                yield "just_to_test"
            global nb_things
            nb_things = i
            return
        g = my_gen()
        for _ in g:
            pass
    print(nb_things)
    return 0
if __name__ == "__main__":
    test1()
    print("end of test 1")
    test2()
    print("end of test 2")
(As you can see in the comment in the above script, very strange things happen if I include code that mentions my global variable, even when that code is never executed.)
When I execute the above code, the first test succeeds, but not the second, despite a very similar code structure:
$ ./test.py
63
end of test 1
Traceback (most recent call last):
  File "./test.py", line 62, in <module>
    test2()
  File "./test.py", line 53, in test2
    for _ in g:
  File "./test.py", line 49, in my_gen
    nb_things = i
UnboundLocalError: local variable 'i' referenced before assignment
My main question is:
Why does the enumeration counter still exist after the end of the for loop in the first case and not in the second?
I suspect that this has to do with the way the iteration is stopped. In the second case, the generator somehow causes the enumerate result to cease to exist once the internal iterator stops.
What could cause such a difference?
A second question that occurred to me while designing the above test script is the following:
Why is the global variable nb_things considered local if I put code referencing it but not even executed? (note the delete=False, and the absence of a message mentioning the deletion)
I'm using python 3.6 and pysam version 0.10.0.
For an earlier version of the real code (but the essential approach is there), and clues regarding why I ended up defining my generator in the main function, see this question. (Essentially, the reason is that the generator actually uses a function that is defined depending on command-line options.)
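Both symptoms can be reproduced without pysam. The sketch below rests on an assumption the post does not confirm: that bamfile.fetch() yielded no records, so the loop body never ran and i was never bound. It also shows that a del statement anywhere in a function makes the name local to that function, even if the statement never executes. All function names here are made up for illustration.

```python
def count_with_loop_variable(iterable):
    for i, _ in enumerate(iterable, start=1):
        pass
    return i  # i is unbound if the iterable was empty


def count_with_default(iterable):
    count = 0  # bound before the loop, so an empty iterable gives 0
    for count, _ in enumerate(iterable, start=1):
        pass
    return count


nb_things = 63


def read_global(delete=False):
    if delete:
        del nb_things  # never runs here, yet it makes nb_things local
    return nb_things


errors = []
for attempt in (lambda: count_with_loop_variable([]), read_global):
    try:
        attempt()
    except UnboundLocalError as exc:
        errors.append(type(exc).__name__)

print(errors, count_with_default([]))
```

Initialising the counter before the loop, as in count_with_default, sidesteps the first problem regardless of what the iterator returns.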

with open() throwing errors on a json file

So I'm super new to coding and I wanted to design a text based RPG as sort of a fun way to learn some stuff and I picked out the language Python because it was named after Monty Python. How perfect right? Well, that is what I thought until trying to get rooms to load.
I am using json files to store my room names, descriptions, and exits then trying to call them in python via a method I saw on YouTube, here is the code:
def getRoom(id):
    ret = None
    with open(str(id)+".json", "r") as f:
        jsontext = f.read()
        d = json.loads(jsontext)
        d['id'] = id
        ret = Room(**d)
This threw an IOError (no such file or directory), so I added a try statement like so:
def getRoom(id):
    ret = None
    try:
        with open(str(id)+".json", "r") as f:
            jsontext = f.read()
            d = json.loads(jsontext)
            d['id'] = id
            ret = Room(**d)
    except IOError:
        print("An error occured")
However now I am getting an "AttributeError: 'NoneType' object has no attribute 'name'" off my look command which I have coded like so:
def look(player, args):
    print(player.loc.name)
    print("")
    print (player.loc.description)
In case this matters here is my json file that I have named 1.json:
{
    "name": "A Small Bedroom",
    "description": "The old bed room has probably seen many people over the years as the inn sits along a major trade route. The floor boards show wear and creak as you walk over them.",
    "neighbors": {"w":2}
}
EDIT:
Full traceback:
Traceback (most recent call last):
  File "game.py", line 79, in <module>
    main(player)
  File "game.py", line 68, in main
    player.loc = getRoom(1)
  File "/home/illyduss/Urth/Locations/room.py", line 6, in getRoom
    with open(str(id)+".json", "r") as f:
IOError: [Errno 2] No such file or directory: '1.json'
The error clearly says that the file cannot be found. Try the following.
1. Make sure that the file 1.json is reachable from the directory where you start the Python interpreter. For example, if you run $ python game/game.py, the file must be in the current working directory, not in the game directory.
2. Use absolute paths if you can:
import os
base_dir = "/path/to/json/dir"
filename = str(id) + ".json"
abs_file = os.path.join(base_dir, filename)
with open(abs_file, "r") as f:
    jsontext = f.read()  # do stuff with the file here
If you need the JSON files to be located relative to the game.py file and still want to be able to run the game from elsewhere, a good practice is to define base_dir using the __file__ attribute of the Python file:
base_dir = os.path.dirname(__file__)
The reason you're getting the NoneType error is that the loc variable is somehow being set to None, which means that you are passing None to the Player's constructor. Since you haven't provided the code where you initialize player, I am assuming that you pass the result of getRoom() as loc to the constructor. If that is the case, make sure that the value returned by getRoom is not None. You need an explicit return statement (return ret) at the end of the function; by default, any function without a return statement returns None. That could be your issue.
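Putting both points together, a loader could look roughly like the sketch below. The function name load_room_data and the base_dir parameter are made up for illustration, and it returns a plain dict because the Room class is not shown in the question (the original code would build Room(**data) at that point). It is written for Python 3, where FileNotFoundError is the specific form of the IOError in the traceback.

```python
import json
import os
import tempfile


def load_room_data(room_id, base_dir):
    """Return the room's JSON data as a dict, or None if the file is missing."""
    path = os.path.join(base_dir, str(room_id) + ".json")
    try:
        with open(path, "r") as f:
            data = json.load(f)
    except FileNotFoundError:
        print("room file not found:", path)
        return None
    data["id"] = room_id
    return data  # without an explicit return the caller would get None


# Demonstration with a throwaway directory standing in for the game folder.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "1.json"), "w") as f:
        json.dump({"name": "A Small Bedroom", "neighbors": {"w": 2}}, f)
    found = load_room_data(1, demo_dir)
    missing = load_room_data(2, demo_dir)

print(found, missing)
```

Returning None for a missing file only moves the problem to the caller, so the calling code still needs to check the result before touching attributes such as name.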

implementing a deferred exception in Python

I would like to implement a deferred exception in Python that is OK to store somewhere but as soon as it is used in any way, it raises the exception that was deferred. Something like this:
# this doesn't work but it's a start
class DeferredException(object):
    def __init__(self, exc):
        self.exc = exc
    def __getattr__(self, key):
        raise self.exc
# example:
mydict = {'foo': 3}
try:
    myval = obtain_some_number()
except Exception as e:
    myval = DeferredException(e)
mydict['myval'] = myval
def plus_two(x):
    print x+2
# later on...
plus_two(mydict['foo']) # prints 5
we_dont_use_this_val = mydict['myval'] # Always ok to store this value if not used
plus_two(mydict['myval']) # If obtain_some_number() failed earlier,
                          # re-raises the exception, otherwise prints the value + 2.
The use case is that I want to write code to analyze some values from incoming data; if this code fails but the results are never used, I want it to fail quietly; if it fails but the results are used later, then I'd like the failure to propagate.
Any suggestions on how to do this? If I use my DeferredException class I get this result:
>>> ke = KeyError('something')
>>> de = DeferredException(ke)
>>> de.bang # yay, this works
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 6, in __getattr__
KeyError: 'something'
>>> de+2 # boo, this doesn't
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'DeferredException' and 'int'
Read section 3.4.12 of the docs, "Special method lookup for new-style classes." It explains exactly the problem you have encountered. The normal attribute lookup is bypassed by the interpreter for certain operators, such as addition (as you found out the hard way). Thus the expression de+2 in your code never calls your __getattr__ method.
The only solution, according to that section, is to ensure that "the special method must be set on the class object itself in order to be consistently invoked by the interpreter."
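As a minimal sketch of what "set on the class object itself" means here, the version below defines a few special methods directly on the class so that operators trigger the stored exception. It is written for Python 3, covers only addition and str() as examples, and the helper name _reraise is made up; a full solution would need one entry per operator you want to cover.

```python
class DeferredException:
    def __init__(self, exc):
        self.exc = exc

    def _reraise(self, *args, **kwargs):
        raise self.exc

    def __getattr__(self, name):
        # Only reached for attributes that normal lookup does not find.
        raise self.exc

    # Operators look these up on the class, bypassing __getattr__,
    # so each one has to be defined here explicitly.
    __add__ = __radd__ = __str__ = _reraise


deferred = DeferredException(KeyError('something'))
stored = {'myval': deferred}  # storing the object raises nothing

raised = None
try:
    stored['myval'] + 2
except KeyError as exc:
    raised = exc

print(type(raised).__name__)
```

Comparisons, truth testing and similar operations would still slip through unless their special methods are added in the same way.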
Perhaps you'd be better off storing all your deferred exceptions in a global list, wrapping your entire program in a try:finally: statement, and printing out the whole list in the finally block.

Safest way to open a file in python 3.4

I was expecting the following would work but PyDev is returning an error:
try fh = open(myFile):
    logging.info("success")
except Exception as e:
    logging.critical("failed because:")
    logging.critical(e)
gives
Encountered "fh" at line 237, column 5. Was expecting: ":"...
I've looked around and cannot find a safe way to open a filehandle for reading in Python 3.4 and report errors properly. Can someone point me in the correct direction please?
You misplaced the :; it comes directly after try; it is better to put that on its own, separate line:
try:
    fh = open(myFile)
    logging.info("success")
except Exception as e:
    logging.critical("failed because:")
    logging.critical(e)
You placed the : after the open() call instead.
Instead of passing in e as a separate argument, you can tell logging to pick up the exception automatically:
try:
    fh = open(myFile)
    logging.info("success")
except Exception:
    logging.critical("failed because:", exc_info=True)
and a full traceback will be included in the log. This is what the logging.exception() function does; it'll call logging.error() with exc_info set to true, producing a message at log level ERROR plus a traceback.
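For comparison, here is a small sketch that uses logging.exception() together with a with block, so the file is also closed automatically. The function name read_text, the logger name and the path are placeholders, and catching OSError rather than Exception is a choice made for this example, not something the answer prescribes.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("file_demo")


def read_text(path):
    """Return the file's contents, or None if it cannot be opened or read."""
    try:
        with open(path) as fh:
            data = fh.read()
    except OSError:
        # Logs at ERROR level and appends the traceback automatically.
        logger.exception("failed to open %s", path)
        return None
    logger.info("success")
    return data


contents = read_text("/nonexistent/dir/file.txt")
print(contents)
```

If the CRITICAL level from the original code matters, logger.critical(..., exc_info=True) gives the same traceback at that level.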

TypeError: in Python

I have an issue, where a function returns a number. When I then try to assemble a URL that includes that number I am met with failure.
Specifically the error I get is
TypeError: cannot concatenate 'str' and 'NoneType' objects
Not sure where to go from here.
Here is the relevant piece of code:
# Get the raw ID number of the current configuration
configurationID = generate_configurationID()
# Update config name in Cloud
updateConfigLog = open(logBase+'change_config_name_log.xml', 'w')
# Redirect stdout to file
sys.stdout = updateConfigLog
rest.rest(('put', baseURL+'configurations/'+configurationID+'?name=this_is_a_test_', user, token))
sys.stdout = sys.__stdout__
It works perfectly if I manually type the following into rest.rest()
rest.rest(('put', http://myurl.com/configurations/123456?name=this_is_a_test_, myusername, mypassword))
I have tried str(configurationID) and it spits back a number, but I no longer get the rest of the URL...
Ideas? Help?
OK... In an attempt to show my baseURL and my configurationID here is what I did.
print 'baseURL: '+baseURL
print 'configurationID: '+configurationID
and here is what I got back
it-tone:trunk USER$ ./skynet.py fresh
baseURL: https://myurl.com/
369596
Traceback (most recent call last):
  File "./skynet.py", line 173, in <module>
    main()
  File "./skynet.py", line 30, in main
    fresh()
  File "./skynet.py", line 162, in fresh
    updateConfiguration()
  File "./skynet.py", line 78, in updateConfiguration
    print 'configurationID: '+configurationID
TypeError: cannot concatenate 'str' and 'NoneType' objects
it-tone:trunk USER$
What is interesting to me is that the 369596 is the config ID, but like before it seems to clobber everything called up around it.
As kindall pointed out below, my generate_configurationID was not returning the value, but rather it was printing it.
# from generate_configurationID
def generate_configurationID():
    dom = parse(logBase+'provision_template_log.xml')
    name = dom.getElementsByTagName('id')
    p = name[0].firstChild.nodeValue
    print p
    return p
Your configurationID is None. This likely means that generate_configurationID() is not returning a value. There is no way in Python for a variable name to "lose" its value. The only way, in the code you posted, for configurationID to be None is for generate_configurationID() to return None which is what will happen if you don't explicitly return any value.
"But it prints the configurationID right on the screen!" you may object. Sure, but that's probably in generate_configurationID() where you are printing it to make sure it's right but forgetting to return it.
You may prove me wrong by posting generate_configurationID() in its entirety, and I will admit that your program is magic.
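The difference between printing a value and returning it can be seen in a few lines. This is a Python 3 sketch with made-up function names (show_id, get_id), reusing the ID and base URL from the question only as sample data.

```python
def show_id(value):
    print(value)  # displays the value, but the function returns None


def get_id(value):
    return value  # hands the value back to the caller


shown = show_id("369596")
fetched = get_id("369596")

# Works: fetched is a str.
url = "https://myurl.com/" + "configurations/" + fetched

# Fails: shown is None, so concatenation raises TypeError.
try:
    "configurationID: " + shown
    failed = False
except TypeError:
    failed = True

print(url, failed)
```

The number appears on screen in both cases, which is what makes the missing return easy to overlook.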
