Indexing for SeqRecords

I would like to get a list of indexes for SeqRecords that are in list f. I tried this:
for x in f:
    ind = f.index(x)
    print(ind)
But I get the error:
0
Traceback (most recent call last):
  File "C:\Users\Adrian\Desktop\Sekwencje\skrypt 1.py", line 43, in <module>
    ind = f.index(x)
  File "C:\Users\Adrian\anaconda3\lib\site-packages\Bio\SeqRecord.py", line 803, in __eq__
    raise NotImplementedError(_NO_SEQRECORD_COMPARISON)
NotImplementedError: SeqRecord comparison is deliberately not implemented. Explicitly compare the attributes of interest.
Thanks for any answer.

Explanation
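You can't get the index of a SeqRecord in a list this way because, as the NotImplementedError message says, "SeqRecord comparison is deliberately not implemented". The index() method returns the lowest index at which an object equal to its argument appears in the list, and to test for equality it calls the element's __eq__ method, which the SeqRecord class deliberately leaves unimplemented. The first iteration still prints 0 because the list checks identity before equality, and f[0] is the very object being searched for; the second iteration has to compare f[0] with a different record, and that comparison raises.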
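If patching the class feels too heavy, the positions can also be found by comparing an attribute explicitly, which is what the error message asks for. The sketch below uses a hypothetical stand-in class called Record (it is not Biopython's SeqRecord, it only refuses == in the same way) and assumes that the id attribute is what identifies a record for you:

```python
class Record:
    """Hypothetical stand-in for SeqRecord: it refuses == comparison."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq

    def __eq__(self, other):
        raise NotImplementedError("comparison is deliberately not implemented")


def index_by_id(records, target_id):
    """Return the position of the first record whose id matches target_id."""
    for position, record in enumerate(records):
        if record.id == target_id:  # compare the attribute, not the object
            return position
    raise ValueError(f"{target_id!r} is not in the list")


records = [Record("seq1", "ATGCGCAT"), Record("seq2", "GACGATCA")]

# enumerate() yields every index directly, with no comparison at all.
all_indexes = [position for position, _ in enumerate(records)]
second = index_by_id(records, "seq2")
```

If all you need is the index of each record while looping, enumerate() alone is probably enough.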
Hard method
Since Python is a dynamic language, you can add a comparison method to the class. The error message even points to the answer ("Explicitly compare the attributes of interest"). Here is the code:
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

def equal_seqs(self, other):
    # Returning NotImplemented (rather than raising) lets Python fall back
    # to its default handling when the other object is not a SeqRecord.
    if not isinstance(other, SeqRecord):
        return NotImplemented
    return self.seq == other.seq  # You can change it to whatever you want.

# Patch the class so that == and list.index() use the function above.
SeqRecord.__eq__ = equal_seqs

foo = SeqRecord(Seq('ATGCGCAT'))
bar = SeqRecord(Seq('GACGATCA'))
print(foo == bar)
# False
l = [foo, bar]
print(l.index(bar))
# 1
Other possibility
I don't know if I understood you correctly, but if you wanted to print the ID of each sequence, you can do it as follows:
for seq in sequences:
    print(f'{seq.id} {seq.name}')
Is that what you wanted?
More info
If you want to read more about rich comparison methods, they are described in the Python data model documentation.

Related

Name error: 'self' not defined - when calling a function to create in-class variables

I have the following class:
class Documents:
    def __init__(self, input_file):
        self.input_file_ = input_file #List in which each element is a list of tokens
        assert type(self.input_file_) is list, 'Input file is not a list'
        assert type(self.input_file_[0]) is list, 'Elements in input file are not lists' #Only checks first instance, not all. But should suffice
    def get_vocabulary(self):
        vocabulary = set([el for lis in self.input_file_ for el in lis])
        return vocabulary, len(vocabulary)
    vocabulary, vocabulary_size = self.get_vocabulary()
But when I try to execute it, I get the following error:
Traceback (most recent call last):
File "<ipython-input-34-4268f473c299>", line 1, in <module>
class Documents:
File "<ipython-input-34-4268f473c299>", line 30, in Documents
vocabulary, vocabulary_size = self.get_vocabulary()
NameError: name 'self' is not defined
This is a common error on SO. However, I have not found an answer in which the code has a similar structure.
Can someone explain to me why I get this error and how I can change my code so that I don't get an error?

The way you have it, vocabulary, vocabulary_size = self.get_vocabulary() is being executed when the class is being defined, so there is no self. The latter is the name of the first argument passed to methods of a class and is the instance of the class (that was previously created) on which to operate.
The proper way to do this would be to call get_vocabulary() method from the __init__() method when an instance of the class exists and is being initialized.
Here's what I mean:
class Documents:
    def __init__(self, input_file):
        self.input_file_ = input_file # List in which each element is a list of tokens
        self.vocabulary, self.vocabulary_size = self.get_vocabulary()
        assert type(self.input_file_) is list, 'Input file is not a list'
        assert type(self.input_file_[0]) is list, 'Elements in input file are not lists' # Only checks first instance, not all. But should suffice
    def get_vocabulary(self):
        vocabulary = set([el for lis in self.input_file_ for el in lis])
        return vocabulary, len(vocabulary)
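To make the timing visible, here is a small sketch (the Demo class and the events list are made up for illustration). It records when the class body runs and when an instance is initialised; statements placed directly in the class body execute once, at definition time, before any instance exists:

```python
events = []


class Demo:
    # This statement runs while the class body is being executed,
    # before any instance exists, so there is no `self` to refer to here.
    events.append("class body executed")

    def __init__(self):
        # `self` is simply the first parameter of the method; Python binds
        # it to the instance when the method is called.
        events.append("instance initialised")
        self.value = self.compute()

    def compute(self):
        return 42


before = list(events)  # snapshot taken before any instance is created
demo = Demo()
```

The snapshot already contains the class-body entry, even though no Demo object has been created at that point.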
Comment (off-topic):
In languages like Python that have classes and support object-oriented code, it's usually best to avoid type-checking as much as possible. A check such as type(x) is list also rejects subclasses of list, so it doesn't support subtyping; when a check is needed, you can overcome that limitation by using the built-in isinstance() function.
This implies that it would probably be better to do the following in your __init__() method:
assert isinstance(self.input_file_, list), 'Input file is not a list'
assert isinstance(self.input_file_[0], list), 'Elements in input file are not lists' # Only checks first instance, not all. But should suffice
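A quick way to see the difference between the two kinds of check, using a hypothetical list subclass named TokenList that exists only for this illustration:

```python
class TokenList(list):
    """Hypothetical list subclass, used only for this illustration."""


tokens = TokenList(["a", "b"])

exact_type_check = type(tokens) is list         # the subclass is rejected
subtype_aware_check = isinstance(tokens, list)  # the subclass is accepted
```

With the type(...) is list form, passing a TokenList to Documents would trip the assertion even though it behaves like a list in every other respect.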

Extract value from a dictionary using key that is a list

I have the following programme:
import QuantLib as ql
deposits = {ql.Period(1,ql.Weeks): 0.0023,
            ql.Period(1,ql.Months): 0.0032,
            ql.Period(3,ql.Months): 0.0045,
            ql.Period(6,ql.Months): 0.0056}
for n, unit in [(1,ql.Weeks),(1,ql.Months),(3,ql.Months),(6,ql.Months)]:
    print deposits([n,unit])
What I expect this programme to do is: it loops through the dictionary keys, which comprises an embedded list of a 'number' (i.e. 1,1,3,6) and 'unit' (i.e. weeks and months), and extracts the correct value (or rate). Currently I get an error with the line print deposits([n,unit]).
Here is the error I get:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 699, in runfile
execfile(filename, namespace)
File "C:\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 74, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
File "TestFunction.py", line 16, in <module>
print deposits([n,unit])
TypeError: 'dict' object is not callable
The name of my file is TestFunction.py
I know a way round this issue, which is where I convert the dictionary into two lists as follows:
depoMaturities = [ql.Period(1,ql.Weeks),
                  ql.Period(1,ql.Months),
                  ql.Period(3,ql.Months),
                  ql.Period(6,ql.Months)]
depoRates = [0.0023,
             0.0032,
             0.0045,
             0.0056]
But then it does not look as tidy or as sophisticated. I'd be really grateful for your advice.

Update per comments: It looks like the Period class implemented __hash__ incorrectly, so it doesn't obey the hash invariant required by Python (specifically, objects that compare equal should hash to the same value). Per your comment, when you run:
p1 = ql.Period(1,ql.Weeks)
p2 = ql.Period(1,ql.Weeks)
if (p1 == p2): k = 5*2
else: k = 0
you get 10, so p1==p2 is True.
When you run:
if (hash(p1) == hash(p2)): b = 5*2
else: b = 0
you get 0, so hash(p1) == hash(p2) is False. This is a clear violation of the Python rules, which makes the type appear to be a legal key for a dict (or value in a set), but behave incorrectly. Basically, you can't use Periods as keys without having the QuantLib folks fix this, or doing terrible things to work around it (and really terrible things if Period is a C extension type, which seems likely since QuantLib is apparently a SWIG wrapper).
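To see why a broken hash makes dictionary lookups fail, here is a minimal sketch with a made-up class, BrokenKey, that reproduces the symptom. It is not QuantLib code; it only assumes the same combination of value-based equality and identity-based hashing:

```python
class BrokenKey:
    """Hypothetical class: equal objects do NOT share a hash."""

    def __init__(self, n):
        self.n = n

    def __eq__(self, other):
        return isinstance(other, BrokenKey) and self.n == other.n

    def __hash__(self):
        return id(self)  # identity-based hash breaks the invariant


k1, k2 = BrokenKey(1), BrokenKey(1)
table = {k1: "rate"}

equal = k1 == k2                  # value-based equality succeeds
same_hash = hash(k1) == hash(k2)  # but the two hashes differ
found = k2 in table               # so the dict never gets to call __eq__
```

The dict compares hashes first and only falls back to == when they match, which is why an equal but distinct key is never found.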
If the Period units behave properly, I'd recommend working with tuples of the paired counts and units most of the time, and only converting to Periods when you have need of a particular Period feature. So your dict would be:
deposits = {(1,ql.Weeks): 0.0023,
            (1,ql.Months): 0.0032,
            (3,ql.Months): 0.0045,
            (6,ql.Months): 0.0056}
and your loop would be:
for n, unit in [(1,ql.Weeks),(1,ql.Months),(3,ql.Months),(6,ql.Months)]:
    print deposits[n, unit]
If that still fails, then even the basic unit types are broken, and you just can't use them at all.
If the keys are ql.Periods, you need to look up using ql.Periods (unless Period is a tuple subclass). You also need to use brackets for dict lookup, not parentheses.
If ql.Period is a namedtuple or the like, you can just do tuple lookup (lists can't be dict keys, because they're mutable and therefore unhashable):
for n, unit in [(1,ql.Weeks),(1,ql.Months),(3,ql.Months),(6,ql.Months)]:
    print deposits[n, unit]
If ql.Period isn't a tuple subclass, you can do:
for n, unit in [(1,ql.Weeks),(1,ql.Months),(3,ql.Months),(6,ql.Months)]:
    print deposits[ql.Period(n, unit)]
or, to construct the periods in the loop header itself:
import itertools
for period in itertools.starmap(ql.Period, [(1,ql.Weeks),(1,ql.Months),(3,ql.Months),(6,ql.Months)]):
    print deposits[period]

deposits is a dictionary with keys and values. A value is looked up in a dictionary with
value = mydict[key]
Thus, given n and unit, ql.Period(n, unit) returns an object of type <class 'QuantLib.QuantLib.Period'>. The result of ql.Period(1, ql.Weeks), for example, prints as 1W.
It would appear that if it is converted to a string, it is usable as a key:
deposits = {str(ql.Period(1,ql.Weeks)): 0.0023,
            str(ql.Period(1,ql.Months)): 0.0032,
            str(ql.Period(3,ql.Months)): 0.0045,
            str(ql.Period(6,ql.Months)): 0.0056}
value = deposits[str(ql.Period(n, unit))]
print value

In addition to the syntax problems others have identified, my guess is that your ql.Period object is not hashable; the keys for dictionaries need to be hashable objects. Here's a direct copy-and-paste from this answer, which explains the situation nicely.
>>> a = {}
>>> b = ['some', 'list']
>>> hash(b)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: list objects are unhashable
>>> a[b] = 'some'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: list objects are unhashable
What happens when you try hash(ql.Period(1,ql.Weeks))? A similar TypeError? If you had control over QuantLib, you could possibly add a __hash__ method, so that they could be used in dictionaries. But I see that such a module exists on pypi, so I guess you're just using it rather than writing it.
You may still be able to monkey patch these objects to give them a __hash__ method:
# First define a function to add on as a method
def hash_method(period):
    # Placeholder, assuming that equal periods always have the same string
    # form; replace it with a value derived from whatever data determines
    # equality, so that periods which compare equal get equal hashes.
    hash_value = hash(str(period))
    return hash_value
# Now, monkey patch the ql.Period object by giving it this method
ql.Period.__hash__ = hash_method
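For reference, this is roughly what a well-behaved key type looks like. Tenor below is a hypothetical stand-in, not the QuantLib class, and the string units are an assumption made for the example. The point is that __eq__ and __hash__ are derived from the same data:

```python
class Tenor:
    """Hypothetical stand-in for a period type; not the QuantLib class."""

    def __init__(self, length, unit):
        self.length = length
        self.unit = unit

    def _key(self):
        # Everything that takes part in equality goes into the key.
        return (self.length, self.unit)

    def __eq__(self, other):
        if not isinstance(other, Tenor):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())


deposits = {Tenor(1, "W"): 0.0023, Tenor(3, "M"): 0.0045}
rate = deposits[Tenor(3, "M")]  # a fresh but equal object finds the entry
```

Because both methods go through _key(), two objects that compare equal can never end up with different hashes.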

string[i:length] giving memory error

def suffix(stng):
    list = []
    length = len(stng)
    for i in range(length):
        x = stng[i:length] ## This gives a Memory Error..See below
        list.append(x)
    return list
This piece of code is part of my solution to a problem on interviewstreet.com, but when I submit it I get a MemoryError. How can I correct it?
This is the traceback:
Original exception was:
Traceback (most recent call last):
File "/run-1342184337-542152202/solution.py", line 35, in
listofsuffix=suffix(var)
File "/run-1342184337-542152202/solution.py", line 13, in suffix
x=stng[i:length]
MemoryError

A MemoryError means that Python could not allocate the memory it needed. You are creating a list containing every suffix of the original string; for a string of length n those suffixes hold n(n+1)/2 characters in total, so a long input consumes a great deal of memory.
One possibility is to use a generator to produce the suffixes one at a time instead of creating a list of all of them:
def suffixes(stng):
    for i in xrange(len(stng)):
        yield stng[i:]
If the caller of suffixes simply iterates over the result, you don't even have to change the caller. If you truly needed an explicit list, then you'll need a different solution.
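On Python 3 the same idea might look like the sketch below (range is already lazy there, so xrange is not needed). The "banana" input and the counting step are only an illustration of a caller that iterates without ever holding all suffixes at once:

```python
def suffixes(text):
    """Yield each suffix of text, longest first, one at a time."""
    for start in range(len(text)):  # range is lazy in Python 3
        yield text[start:]


# Only one suffix is alive at a time while the generator is consumed.
count_a = sum(1 for s in suffixes("banana") if s.startswith("a"))

# Materialising the list is fine for short inputs, but brings the
# original memory problem back for long ones.
small = list(suffixes("abc"))
```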

"I need to return a list" -- This is highly unlikely. You just need to return an object which looks enough like a list to make it work.
class FakeList(object):
    def __init__(self, strng):
        self.string = strng
        self._idx = 0
    def __getitem__(self, i):
        # Item i is the suffix starting at position i, as in the original list
        if not 0 <= i < len(self.string):
            raise IndexError('FakeList index out of range')
        return self.string[i:]
    def __len__(self):
        return len(self.string)
    def __iter__(self):
        return self
    def __contains__(self, other):
        # Only non-empty suffixes of the string are members
        return bool(other) and self.string.endswith(other)
    def next(self):
        if self._idx < len(self):
            self._idx += 1
            return self[self._idx - 1]
        else:
            raise StopIteration
a = FakeList("My String")
print a[3]
print a[4]
for i in a:
    print i
This creates an object which you can access randomly and iterate over like a list (once, since the iteration state lives on the object itself). It also will allow you to call len(my_fake_list). It doesn't support slicing or the myriad other list methods (pop, append, extend, ...). Which of those you need to add depends on which ones you use.
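On Python 3, a similar object can be sketched on top of collections.abc.Sequence, which supplies iteration, membership tests, index() and count() once __getitem__ and __len__ are defined. SuffixView is a made-up name, and leaving slicing unsupported is a simplification chosen for this example:

```python
from collections.abc import Sequence


class SuffixView(Sequence):
    """Read-only, list-like view of the suffixes of a string."""

    def __init__(self, text):
        self._text = text

    def __len__(self):
        return len(self._text)

    def __getitem__(self, index):
        if isinstance(index, slice):
            raise TypeError("slicing is not supported in this sketch")
        # range() normalises negative indexes and raises IndexError
        # when the index is out of bounds.
        start = range(len(self._text))[index]
        return self._text[start:]


view = SuffixView("abc")
items = list(view)          # iteration comes from the Sequence mixin
has_bc = "bc" in view       # so do membership tests
position = view.index("c")  # and index()/count()
```

Only the original string is stored; each suffix is built on demand when it is requested.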

python unittest failing assertion with overloaded repr

In my code, I've defined a class with its own repr method. The representation of the class should be a list.
def __repr__(self):
    if self.front <= self.tail:
        q = self._queue[self.front:self.tail+1]
    elif self.front > self.tail:
        q = self._queue[self.front:]
        q.extend(self._queue[:self.tail + 1])
    return (q.__repr__())
I've written the following unittest to test this method.
def test_enqueue(self):
    q = BoundedQueue(1)
    q.enqueue(1)
    self.assertEqual(q, [1])
However, I end up with an assertion error:
Traceback (most recent call last):
File "test_internmatch_queue.py", line 13, in test_enqueue
self.assertEqual(q, [1])
AssertionError: [1] != [1]
I'm not sure what the problem is... to my human eyes, [1]==[1]! I've tried several other variations in my repr method (below), and they all returned errors as well.
return repr(q)
return str(q)

q is a BoundedQueue. [1] is a list. They can't be equal unless you override __eq__. repr is not used for equality testing; the failure message reads [1] != [1] only because assertEqual formats both operands with repr().

As recursive states, __repr__ isn't used to detect whether two values are equal.
You have several options:
Define __eq__, which is what Python calls to check equality. I don't recommend this, as I don't really see a BoundedQueue as being equal to a list.
Define an items() method which returns the list, then check for equality against that list.
Add a way to build a BoundedQueue from a list, then write __eq__ to check for equality between two BoundedQueues.
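The second option, with a queue-to-queue __eq__ added for good measure, could be sketched like this. The BoundedQueue below is a simplified, hypothetical stand-in (a plain list with a capacity check), not the implementation from the question:

```python
import unittest


class BoundedQueue:
    """Hypothetical, simplified queue; not the asker's implementation."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._items = []

    def enqueue(self, value):
        if len(self._items) >= self._capacity:
            raise OverflowError("queue is full")
        self._items.append(value)

    def items(self):
        return list(self._items)  # copy, so callers cannot mutate the queue

    def __eq__(self, other):
        if not isinstance(other, BoundedQueue):
            return NotImplemented
        return self.items() == other.items()

    def __repr__(self):
        return repr(self._items)


class TestBoundedQueue(unittest.TestCase):
    def test_enqueue(self):
        q = BoundedQueue(1)
        q.enqueue(1)
        self.assertEqual(q.items(), [1])  # compare a list with a list
        self.assertNotEqual(q, [1])       # the queue itself is still not a list
```

The test now compares like with like, and the repr stays free to look however is most readable.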

Given a Python list, how to write a function that returns a given range of elements?

I have
def findfreq(nltktext, atitem)
    fdistscan = FreqDist(nltktext)
    distlist = fdistscan.keys()
    return distlist[:atitem]
which relies on FreqDist from the NLTK package, and does not work. The problem seems to be the part of the function where I try to return only the first n items of the list, using the variable atitem. So I generalize this function like so
def giveup(listname, lowerbound, upperbound)
    return listname[lowerbound:upperbound]
returning the usual error
>>> import bookroutines
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "bookroutines.py", line 70
def giveup(listname, lowerbound, upperbound)
^
SyntaxError: invalid syntax
but hopefully also an answer from some kind person whose Python is much more fluent than mine.

You need a colon (:) at the end of the def line.
def findfreq(nltktext, atitem):
    fdistscan = FreqDist(nltktext)
    distlist = fdistscan.keys()
    return distlist[:atitem]
Python's function declaration syntax is:
def FuncName(Args):
    # code

operator.itemgetter() will return a function that slices a sequence if you pass it a slice object.
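A small sketch of that approach, with a made-up word list standing in for the FreqDist keys:

```python
from operator import itemgetter

first_three = itemgetter(slice(None, 3))  # same as seq[:3]
middle = itemgetter(slice(1, 4))          # same as seq[1:4]

words = ["the", "of", "and", "to", "a"]
top = first_three(words)
window = middle(words)
```

The returned callables work on any object that supports slicing, so the same first_three can be reused for lists, tuples and strings.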
