How to override Python list(iterator) behaviour? - python

Running this:
class DontList(object):
    def __getitem__(self, key):
        print 'Getting item %s' % key
        if key == 10: raise KeyError("You get the idea.")
        return None
    def __getattr__(self, name):
        print 'Getting attr %s' % name
        return None
list(DontList())
Produces this:
Getting attr __length_hint__
Getting item 0
Getting item 1
Getting item 2
Getting item 3
Getting item 4
Getting item 5
Getting item 6
Getting item 7
Getting item 8
Getting item 9
Getting item 10
Traceback (most recent call last):
  File "list.py", line 11, in <module>
    list(DontList())
  File "list.py", line 4, in __getitem__
    if key == 10: raise KeyError("You get the idea.")
KeyError: 'You get the idea.'
How can I change that so that I'll get [], while still allowing access to those keys [1] etc.?
(I've tried putting in def __length_hint__(self): return 0, but it doesn't help.)
My real use case: (for perusal if it'll be useful; feel free to ignore past this point)
After applying a certain patch to iniparse, I've found a nasty side effect: my Undefined class defines __getattr__, which returns a new Undefined object. Unfortunately, this means that list(iniconfig.invalid_section) (where isinstance(iniconfig, iniparse.INIConfig)) does this (I put simple prints in __getattr__ and __getitem__):
Getting attr __length_hint__
Getting item 0
Getting item 1
Getting item 2
Getting item 3
Getting item 4
Et cetera ad infinitum.

If you want to override iteration, just define the __iter__ method in your class.

As @Sven says, that's the wrong error to raise. But that's not the point. The point is that this is broken because it's not something you should do: preventing __getattr__ from raising AttributeError means that you have overridden Python's default methodology for testing whether an object has an attribute and replaced it with a new one (ini_defined(foo.bar)).
But Python already has hasattr! Why not use that?
>>> class Foo:
...     bar = None
...
>>> hasattr(Foo, "bar")
True
>>> hasattr(Foo, "baz")
False
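To make the point about hasattr concrete, here is a minimal Python 3 sketch. The class names Undefined and Strict are hypothetical stand-ins, not the iniparse classes; the only thing it relies on is that hasattr() reports False when the lookup raises AttributeError.

```python
class Undefined:
    """Hypothetical stand-in: __getattr__ never raises AttributeError."""

    def __getattr__(self, name):
        # Called only when normal lookup fails; returning a value here
        # makes every attribute name appear to exist.
        return Undefined()


class Strict:
    """Hypothetical counterpart that follows the usual protocol."""

    def __getattr__(self, name):
        raise AttributeError(name)


print(hasattr(Undefined(), "anything"))  # True
print(hasattr(Strict(), "anything"))     # False
```

With the first class, hasattr can no longer distinguish defined names from undefined ones, which is why a separate helper such as ini_defined becomes necessary.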

Just raise IndexError instead of KeyError. KeyError is meant for mapping-like classes (e.g. dict), while IndexError is meant for sequences.
If you define the __getitem__() method on your class, Python will automatically generate an iterator from it, and that iterator terminates on IndexError; see PEP 234.
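As a small illustration of that fallback, the following Python 3 sketch defines only __getitem__ and lets list() drive it. The class name FirstTen and its contents are made up for the example.

```python
class FirstTen:
    """Hypothetical sequence-like class with no __iter__ method."""

    def __getitem__(self, index):
        # IndexError tells the automatically created iterator to stop.
        if index >= 10:
            raise IndexError(index)
        return index * index


squares = list(FirstTen())
print(squares)
```

Raising KeyError at the same point would instead propagate out of list(), as in the question's traceback.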

Override how your class is iterated by implementing an __iter__() method. Iterators signal that they're finished by raising a StopIteration exception, which is part of the normal iterator protocol and is not propagated further. Here's one way of applying that to your example class:
class DontList(object):
    def __getitem__(self, key):
        print 'Getting item %s' % key
        if key == 10: raise KeyError("You get the idea.")
        return None
    def __iter__(self):
        class iterator(object):
            def __init__(self, obj):
                self.obj = obj
                self.index = -1
            def __iter__(self):
                return self
            def next(self):
                if self.index < 9:
                    self.index += 1
                    return self.obj[self.index]
                else:
                    raise StopIteration
        return iterator(self)
list(DontList())
print 'done'
# Getting item 0
# Getting item 1
# ...
# Getting item 8
# Getting item 9
# done

I think that using return iter([]) is the right way, but let's start by thinking about how list() works:
It gets an iterator from __iter__ and keeps requesting elements from it until it receives a StopIteration, at which point it stops.
So you just have to return an empty iterator from __iter__, for example (x for x in xrange(0, 0)), or simply iter([]).
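A minimal Python 3 sketch of that suggestion, assuming a simplified version of the question's class (the returned strings are placeholders): __iter__ hands list() an empty iterator, while __getitem__ keeps answering for any key.

```python
class DontList:
    """Sketch of the question's class with an explicit, empty __iter__."""

    def __getitem__(self, key):
        # Indexing keeps working for any key.
        return "item %s" % key

    def __iter__(self):
        # list() asks for this iterator instead of probing __getitem__.
        return iter([])


d = DontList()
print(list(d))  # []
print(d[1])     # item 1
```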

Related

Python 3.x - Run function before appending item to list

In Python 3.x, is it possible to run a function before an item gets appended to a list?
I have a class which inherits from list, with some additional custom functions. I would like a series of checks to be performed on the data of any element which gets added to this list. If an added element does not meet certain criteria, the list will raise an error.
class ListWithExtraFunctions(list):
    def __beforeappend__(self):
        ... run some code ...
        ... perform checks ...
        ... raise error if checks fail ...
Define ListWithExtraFunctions.append and call super().append(value) if value passes all the checks:
class ListWithExtraFunctions(list):
    def append(self, value):
        if okay():
            return super().append(value)
        else:
            raise NotOkay()
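For a version that runs as-is, here is a hedged sketch with a concrete check filled in. The class name PositiveIntList and the "positive integers only" rule are assumptions chosen for illustration; they stand in for the placeholder okay() and NotOkay above.

```python
class PositiveIntList(list):
    """Hypothetical list subclass that validates items passed to append()."""

    @staticmethod
    def _check(value):
        # Example rule (an assumption): only positive integers are accepted.
        if not isinstance(value, int) or value <= 0:
            raise ValueError("expected a positive int, got %r" % (value,))
        return value

    def append(self, value):
        super().append(self._check(value))


numbers = PositiveIntList()
numbers.append(3)
try:
    numbers.append(-1)
except ValueError as exc:
    print("rejected:", exc)
print(numbers)  # [3]
```

Note that list.extend, list.insert, item assignment and += do not go through append, so they would need the same treatment if they must be checked too.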
This option is very similar to the solution that Vaultah wrote. It only adds a try...except, which lets you handle the exception in a specific way.
class Nw_list(list):
    def val_check(self, value):
        # Accepts only integer
        if type(value) == int:
            return value
        else:
            # Any other input type will raise exception
            raise ValueError
    def append(self, value):
        try:
            # Try to append checked value
            super().append(self.val_check(value))
        except ValueError:
            # If value error is raised prints msg
            print("You can append only int values")

Any way to bypass namedtuple 255 arguments limitation?

I'm using a namedtuple to hold sets of strings and their corresponding values.
I'm not using a dictionary, because I want the strings accessible as attributes.
Here's my code:
from collections import namedtuple
# Shortened for readability :-)
strings = namedtuple("strings", ['a0', 'a1', 'a2', ..., 'a400'])
my_strings = strings(value0, value1, value2, ..., value400)
Ideally, once my_strings is initialized, I should be able to do this:
print(my_strings.a1)
and get value1 printed back.
However, I get the following error instead:
strings(value0, value1, value2, ...value400)
^
SyntaxError: more than 255 arguments
It seems Python functions (including namedtuple's init()) do not accept more than 255 arguments when called.
Is there any way to bypass this issue and have named tuples with more than 255 items? Why is there a 255 arguments limit anyway?
This is a limit to CPython function definitions; in versions before Python 3.7, you cannot specify more than 255 explicit arguments to a callable. This applies to any function definition, not just named tuples.
Note that this limit has been lifted in Python 3.7 and newer: more than 255 arguments can be passed to a function, and a function can have more than 255 parameters. See What is a maximum number of arguments in a Python function?
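On Python 3.7 or newer, the plain namedtuple therefore works for this case. A minimal sketch, assuming the a0 .. a399 field names from the question and using the integers 0 .. 399 as stand-in values:

```python
import sys
from collections import namedtuple

# Field names a0 .. a399, built programmatically instead of typed out.
field_names = ["a{}".format(i) for i in range(400)]

if sys.version_info >= (3, 7):
    Strings = namedtuple("Strings", field_names)
    # _make() builds an instance from any iterable, so no long call is needed.
    s = Strings._make(range(400))
    print(s.a391)  # 391
```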
It is the generated code for the class that is hitting this limit. You cannot define a function with more than 255 arguments; the __new__ class method of the resulting class is thus not achievable in the CPython implementation.
You'll have to ask yourself, however, if you really should be using a different structure instead. It looks like you have a list-like piece of data to me; 400 numbered names is a sure sign of your data bleeding into your names.
You can work around this by creating your own subclass, manually:
from operator import itemgetter
from collections import OrderedDict
class strings(tuple):
    __slots__ = ()
    _fields = tuple('a{}'.format(i) for i in range(400))
    def __new__(cls, *args, **kwargs):
        req = len(cls._fields)
        if len(args) + len(kwargs) > req:
            raise TypeError(
                '__new__() takes {} positional arguments but {} were given'.format(
                    req, len(args) + len(kwargs)))
        # any keyword that is not a field name is an error
        if kwargs.keys() - set(cls._fields):
            raise TypeError(
                '__new__() got an unexpected keyword argument {!r}'.format(
                    (kwargs.keys() - set(cls._fields)).pop()))
        missing = req - len(args)
        if kwargs.keys() & set(cls._fields[:-missing]):
            raise TypeError(
                '__new__() got multiple values for argument {!r}'.format(
                    (kwargs.keys() & set(cls._fields[:-missing])).pop()))
        try:
            for field in cls._fields[-missing:]:
                args += (kwargs[field],)
                missing -= 1
        except KeyError:
            pass
        if len(args) < req:
            raise TypeError('__new__() missing {} positional argument{}: {}'.format(
                missing, 's' if missing > 1 else '',
                ' and '.join(filter(None, [', '.join(map(repr, cls._fields[-missing:-1])), repr(cls._fields[-1])]))))
        return tuple.__new__(cls, args)
    @classmethod
    def _make(cls, iterable, new=tuple.__new__, len=len):
        'Make a new strings object from a sequence or iterable'
        result = new(cls, iterable)
        if len(result) != len(cls._fields):
            raise TypeError('Expected %d arguments, got %d' % (len(cls._fields), len(result)))
        return result
    def __repr__(self):
        'Return a nicely formatted representation string'
        format = '{}({})'.format(self.__class__.__name__, ', '.join('{}=%r'.format(n) for n in self._fields))
        return format % self
    def _asdict(self):
        'Return a new OrderedDict which maps field names to their values'
        return OrderedDict(zip(self._fields, self))
    __dict__ = property(_asdict)
    def _replace(self, **kwds):
        'Return a new strings object replacing specified fields with new values'
        result = self._make(map(kwds.pop, self._fields, self))
        if kwds:
            raise ValueError('Got unexpected field names: %r' % list(kwds))
        return result
    def __getnewargs__(self):
        'Return self as a plain tuple. Used by copy and pickle.'
        return tuple(self)
    def __getstate__(self):
        'Exclude the OrderedDict from pickling'
        return None
for i, name in enumerate(strings._fields):
    setattr(strings, name,
            property(itemgetter(i), doc='Alias for field number {}'.format(i)))
This version of the named tuple avoids the long argument lists altogether, but otherwise behaves exactly like the original. The somewhat verbose __new__ method is not strictly needed but does closely emulate the original behaviour when arguments are incomplete. Note the construction of the _fields attribute; replace this with your own to name your tuple fields.
Unpack a generator expression into the call to set your arguments:
s = strings(*(i for i in range(400)))
or if you have a list of values:
s = strings(*list_of_values)
Either technique bypasses the limits on function signatures and function call argument counts.
Demo:
>>> s = strings(*(i for i in range(400)))
>>> s
strings(a0=0, a1=1, a2=2, a3=3, a4=4, a5=5, a6=6, a7=7, a8=8, a9=9, a10=10, a11=11, a12=12, a13=13, a14=14, a15=15, a16=16, a17=17, a18=18, a19=19, a20=20, a21=21, a22=22, a23=23, a24=24, a25=25, a26=26, a27=27, a28=28, a29=29, a30=30, a31=31, a32=32, a33=33, a34=34, a35=35, a36=36, a37=37, a38=38, a39=39, a40=40, a41=41, a42=42, a43=43, a44=44, a45=45, a46=46, a47=47, a48=48, a49=49, a50=50, a51=51, a52=52, a53=53, a54=54, a55=55, a56=56, a57=57, a58=58, a59=59, a60=60, a61=61, a62=62, a63=63, a64=64, a65=65, a66=66, a67=67, a68=68, a69=69, a70=70, a71=71, a72=72, a73=73, a74=74, a75=75, a76=76, a77=77, a78=78, a79=79, a80=80, a81=81, a82=82, a83=83, a84=84, a85=85, a86=86, a87=87, a88=88, a89=89, a90=90, a91=91, a92=92, a93=93, a94=94, a95=95, a96=96, a97=97, a98=98, a99=99, a100=100, a101=101, a102=102, a103=103, a104=104, a105=105, a106=106, a107=107, a108=108, a109=109, a110=110, a111=111, a112=112, a113=113, a114=114, a115=115, a116=116, a117=117, a118=118, a119=119, a120=120, a121=121, a122=122, a123=123, a124=124, a125=125, a126=126, a127=127, a128=128, a129=129, a130=130, a131=131, a132=132, a133=133, a134=134, a135=135, a136=136, a137=137, a138=138, a139=139, a140=140, a141=141, a142=142, a143=143, a144=144, a145=145, a146=146, a147=147, a148=148, a149=149, a150=150, a151=151, a152=152, a153=153, a154=154, a155=155, a156=156, a157=157, a158=158, a159=159, a160=160, a161=161, a162=162, a163=163, a164=164, a165=165, a166=166, a167=167, a168=168, a169=169, a170=170, a171=171, a172=172, a173=173, a174=174, a175=175, a176=176, a177=177, a178=178, a179=179, a180=180, a181=181, a182=182, a183=183, a184=184, a185=185, a186=186, a187=187, a188=188, a189=189, a190=190, a191=191, a192=192, a193=193, a194=194, a195=195, a196=196, a197=197, a198=198, a199=199, a200=200, a201=201, a202=202, a203=203, a204=204, a205=205, a206=206, a207=207, a208=208, a209=209, a210=210, a211=211, a212=212, a213=213, a214=214, a215=215, a216=216, a217=217, a218=218, a219=219, a220=220, 
a221=221, a222=222, a223=223, a224=224, a225=225, a226=226, a227=227, a228=228, a229=229, a230=230, a231=231, a232=232, a233=233, a234=234, a235=235, a236=236, a237=237, a238=238, a239=239, a240=240, a241=241, a242=242, a243=243, a244=244, a245=245, a246=246, a247=247, a248=248, a249=249, a250=250, a251=251, a252=252, a253=253, a254=254, a255=255, a256=256, a257=257, a258=258, a259=259, a260=260, a261=261, a262=262, a263=263, a264=264, a265=265, a266=266, a267=267, a268=268, a269=269, a270=270, a271=271, a272=272, a273=273, a274=274, a275=275, a276=276, a277=277, a278=278, a279=279, a280=280, a281=281, a282=282, a283=283, a284=284, a285=285, a286=286, a287=287, a288=288, a289=289, a290=290, a291=291, a292=292, a293=293, a294=294, a295=295, a296=296, a297=297, a298=298, a299=299, a300=300, a301=301, a302=302, a303=303, a304=304, a305=305, a306=306, a307=307, a308=308, a309=309, a310=310, a311=311, a312=312, a313=313, a314=314, a315=315, a316=316, a317=317, a318=318, a319=319, a320=320, a321=321, a322=322, a323=323, a324=324, a325=325, a326=326, a327=327, a328=328, a329=329, a330=330, a331=331, a332=332, a333=333, a334=334, a335=335, a336=336, a337=337, a338=338, a339=339, a340=340, a341=341, a342=342, a343=343, a344=344, a345=345, a346=346, a347=347, a348=348, a349=349, a350=350, a351=351, a352=352, a353=353, a354=354, a355=355, a356=356, a357=357, a358=358, a359=359, a360=360, a361=361, a362=362, a363=363, a364=364, a365=365, a366=366, a367=367, a368=368, a369=369, a370=370, a371=371, a372=372, a373=373, a374=374, a375=375, a376=376, a377=377, a378=378, a379=379, a380=380, a381=381, a382=382, a383=383, a384=384, a385=385, a386=386, a387=387, a388=388, a389=389, a390=390, a391=391, a392=392, a393=393, a394=394, a395=395, a396=396, a397=397, a398=398, a399=399)
>>> s.a391
391
namedtuple out of the box doesn't support what you are trying to do.
The following might achieve the goal; the number of fields can change from 400 to 450, or to something smaller and saner.
def customtuple(*keys):
    class string:
        _keys = keys
        _dict = {}
        def __init__(self, *args):
            args = list(args)
            if len(args) != len(self._keys):
                raise Exception("No go forward")
            for key in range(len(args)):
                self._dict[self._keys[key]] = args[key]
        def __setattr__(self, *args):
            raise BaseException("Not allowed")
        def __getattr__(self, arg):
            try:
                return self._dict[arg]
            except:
                raise BaseException("Name not defined")
        def __repr__(self):
            return ("string(%s)"
                    %(", ".join(["%s=%r"
                                 %(self._keys[key],
                                   self._dict[self._keys[key]])
                                 for key in range(len(self._dict))])))
    return string
>>> strings = customtuple(*['a'+str(x) for x in range(1, 401)])
>>> s = strings(*['a'+str(x) for x in range(2, 402)])
>>> s.a1
'a2'
>>> s.a1 = 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/hus787/p.py", line 15, in __setattr__
    def __setattr__(self, *args):
BaseException: Not allowed
For more light on the subject.
Here is my version of a replacement for namedtuple that supports more than 255 arguments. The idea was not to be functionally equivalent but rather to improve on some aspects (in my opinion). This is for Python 3.4+ only:
class SequenceAttrReader(object):
    """ Class to function similar to collections.namedtuple but allowing more than 255 keys.
    Initialize with attribute string (space separated), then load in data via a sequence, then access the list keys as properties
    i.e.
    csv_line = SequenceAttrReader('a b c')
    csv_line.load_data([1, 2, 3])
    print(csv_line.b)
    >> 2
    """
    _attr_string = None
    _attr_list = []
    _data_list = []
    def __init__(self, attr_string):
        if not attr_string:
            raise AttributeError('SequenceAttrReader not properly initialized, please use a non-empty string')
        self._attr_string = attr_string
        self._attr_list = attr_string.split(' ')
    def __getattr__(self, name):
        if not self._attr_string or not self._attr_list or not self._data_list:
            raise AttributeError('SequenceAttrReader not properly initialized or loaded')
        try:
            index = self._attr_list.index(name)
        except ValueError:
            raise AttributeError("'{name}' not in attribute string".format(name=name)) from None
        try:
            value = self._data_list[index]
        except IndexError:
            raise AttributeError("No attribute named '{name}'".format(name=name)) from None
        return value
    def __str__(self):
        return str(self._data_list)
    def __repr__(self):
        return 'SequenceAttrReader("{attr_string}")'.format(attr_string=self._attr_string)
    def load_data(self, data_list):
        if not self._attr_list:
            raise AttributeError('SequenceAttrReader not properly initialized')
        if not data_list:
            raise AttributeError('SequenceAttrReader needs to load a non-empty sequence')
        self._data_list = data_list
This is probably not the most efficient way if you are doing a lot of individual lookups; converting it internally to a dict may be better. I'll work on an optimized version once I have more time, or at least see what the performance difference is.

Multiprocessing in Python: impossible to get back my results (get()) (happens rarely)

I use Multiprocessing in Python in order to do several requests to a database (and other stuff):
po = multiprocessing.Pool()
for element in setOfElements:
    results.append(po.apply_async(myDBRequestModule, (element, other stuff...)))
po.close()
po.join()
for r in results:
    newSet.add(r.get())
myDBRequestModule returns an object I defined, made of a list and two numbers. I redefined the hash function, in order to define what I mean by equality in my sets of these objects:
class myObject:
    def __init__(self, aList, aNumber, anotherNumber):
        self.list = aList
        self.number1 = aNumber
        self.number2 = anotherNumber
    def __hash__(self):
        # turn elements of list into a string, in order to hash the string
        hash_text = ""
        for element in self.list:
            hash_text += str(element.x.id) # I use the ID of the element of my list...
        return hash(hash_text)
    def __eq__(self, other):
        self_hash_text = ""
        other_hash_text = ""
        for element in self.list:
            self_hash_text += str(element.x.id)
        for element in other.listDest:
            other_hash_text += str(element.x.id)
        return self_hash_text == other_hash_text
And in most cases it works as it should. Twice, for no known reason and in exactly the same context, I had a bug:
    newSet.add(r.get())
  File "/usr/lib/python2.6/multiprocessing/pool.py", line 422, in get
    raise self._value
TypeError: 'str' object does not support item assignment
It comes from the get method (last line):
def get(self, timeout=None):
    self.wait(timeout)
    if not self._ready:
        raise TimeoutError
    if self._success:
        return self._value
    else:
        raise self._value
The first time this error occurred it went away on its own, so I stopped investigating, but it recently happened a second time, and I really don't know how to track this bug down.
In particular, it's difficult for me to tell why it happens almost never, and usually works perfectly fine.
multiprocessing is not the issue here.
You have not given us the right code to diagnose the issue. At some point you have assigned a caught exception to self._value. That is where the error is occurring. Look at everywhere that self._value is assigned and you will be on your way to finding this error.
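To illustrate how such a TypeError can surface from get(), here is a minimal Python 3 sketch. It swaps in multiprocessing.pool.ThreadPool, which offers the same apply_async()/get() interface as Pool, so that it runs without spawning processes; the function faulty_worker is hypothetical and only stands in for code like myDBRequestModule.

```python
from multiprocessing.pool import ThreadPool


def faulty_worker(text):
    # Hypothetical worker: strings are immutable, so this raises TypeError.
    text[0] = "x"
    return text


with ThreadPool(2) as pool:
    result = pool.apply_async(faulty_worker, ("abc",))
    try:
        result.get(timeout=5)
    except TypeError as exc:
        # The exception raised inside the worker resurfaces here.
        print("get() re-raised:", exc)
        caught = exc
```

The traceback points at get(), but the failing statement lives in the function that was submitted to the pool, which is where the search for an item assignment on a string would start.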

List callbacks?

Is there any way to make a list call a function every time the list is modified?
For example:
>>> l = [1, 2, 3]
>>> def callback():
...     print "list changed"
...
>>> apply_callback(l, callback) # Possible?
>>> l.append(4)
list changed
>>> l[0] = 5
list changed
>>> l.pop(0)
list changed
5
Borrowing from the suggestion by @sr2222, here's my attempt. (I'll use a decorator without the syntactic sugar):
import sys
_pyversion = sys.version_info[0]
def callback_method(func):
    def notify(self,*args,**kwargs):
        for _,callback in self._callbacks:
            callback()
        return func(self,*args,**kwargs)
    return notify
class NotifyList(list):
    extend = callback_method(list.extend)
    append = callback_method(list.append)
    remove = callback_method(list.remove)
    pop = callback_method(list.pop)
    __delitem__ = callback_method(list.__delitem__)
    __setitem__ = callback_method(list.__setitem__)
    __iadd__ = callback_method(list.__iadd__)
    __imul__ = callback_method(list.__imul__)
    #Take care to return a new NotifyList if we slice it.
    if _pyversion < 3:
        __setslice__ = callback_method(list.__setslice__)
        __delslice__ = callback_method(list.__delslice__)
        def __getslice__(self,*args):
            return self.__class__(list.__getslice__(self,*args))
    def __getitem__(self,item):
        if isinstance(item,slice):
            return self.__class__(list.__getitem__(self,item))
        else:
            return list.__getitem__(self,item)
    def __init__(self,*args):
        list.__init__(self,*args)
        self._callbacks = []
        self._callback_cntr = 0
    def register_callback(self,cb):
        self._callbacks.append((self._callback_cntr,cb))
        self._callback_cntr += 1
        return self._callback_cntr - 1
    def unregister_callback(self,cbid):
        for idx,(i,cb) in enumerate(self._callbacks):
            if i == cbid:
                self._callbacks.pop(idx)
                return cb
        else:
            return None
if __name__ == '__main__':
    A = NotifyList(range(10))
    def cb():
        print ("Modify!")
    #register a callback
    cbid = A.register_callback(cb)
    A.append('Foo')
    A += [1,2,3]
    A *= 3
    A[1:2] = [5]
    del A[1:2]
    #Add another callback. They'll be called in order (oldest first)
    def cb2():
        print ("Modify2")
    A.register_callback(cb2)
    print ("-"*80)
    A[5] = 'baz'
    print ("-"*80)
    #unregister the first callback
    A.unregister_callback(cbid)
    A[5] = 'qux'
    print ("-"*80)
    print (A)
    print (type(A[1:3]))
    print (type(A[1:3:2]))
    print (type(A[5]))
The great thing about this is if you realize you forgot to consider a particular method, it's just 1 line of code to add it. (For example, I forgot __iadd__ and __imul__ until just now :)
EDIT
I've updated the code slightly to be py2k and py3k compatible. Additionally, slicing creates a new object of the same type as the parent. Please feel free to continue poking holes in this recipe so I can make it better. This actually seems like a pretty neat thing to have on hand ...
You'd have to subclass list and modify __setitem__.
class NotifyingList(list):
    def __init__(self, *args, **kwargs):
        # initialise the underlying list before adding our own state
        super(NotifyingList, self).__init__(*args, **kwargs)
        self.on_change_callbacks = []
    def __setitem__(self, index, value):
        for callback in self.on_change_callbacks:
            callback(self, index, value)
        super(NotifyingList, self).__setitem__(index, value)
notifying_list = NotifyingList()
def print_change(list_, index, value):
    print 'Changing index %d to %s' % (index, value)
notifying_list.on_change_callbacks.append(print_change)
As noted in comments, it's more than just __setitem__.
You might even be better served by building an object that implements the list interface and dynamically adds and removes descriptors to and from itself in place of the normal list machinery. Then you can reduce your callback calls to just the descriptor's __get__, __set__, and __delete__.
I'm almost certain this can't be done with the standard list.
I think the cleanest way would be to write your own class to do this (perhaps inheriting from list).
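One way to write such a class without wrapping every list method by hand is to build on collections.abc.MutableSequence, which derives append, extend, pop, remove and += from five core methods. The following Python 3 sketch assumes a hypothetical ObservedList class with a single optional callback; it is an illustration of the idea, not a drop-in replacement for list.

```python
from collections.abc import MutableSequence


class ObservedList(MutableSequence):
    """Hypothetical list-like class that reports every modification."""

    def __init__(self, items=(), callback=None):
        self._items = list(items)
        self._callback = callback

    def _changed(self):
        if self._callback is not None:
            self._callback()

    # The five methods below are all MutableSequence needs; append, extend,
    # pop, remove, += and the rest are mixins built on top of them.
    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)

    def __setitem__(self, index, value):
        self._items[index] = value
        self._changed()

    def __delitem__(self, index):
        del self._items[index]
        self._changed()

    def insert(self, index, value):
        self._items.insert(index, value)
        self._changed()


events = []
l = ObservedList([1, 2, 3], callback=lambda: events.append("list changed"))
l.append(4)   # routed through insert()
l[0] = 5      # __setitem__
l.pop(0)      # routed through __delitem__
print(events)
```

The trade-off is that the object is no longer a list subclass, so isinstance(l, list) is False and code that requires a real list needs list(l).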

Can't iterate over a list class in Python

I'm trying to write a simple GUI front end for Plurk using pyplurk.
I have successfully got it to create the API connection, log in, and retrieve and display a list of friends. Now I'm trying to retrieve and display a list of Plurks.
pyplurk provides a GetNewPlurks function as follows:
def GetNewPlurks(self, since):
    '''Get new plurks since the specified time.
    Args:
      since: [datetime.datetime] the timestamp criterion.
    Returns:
      A PlurkPostList object or None.
    '''
    offset = jsonizer.conv_datetime(since)
    status_code, result = self._CallAPI('/Polling/getPlurks', offset=offset)
    return None if status_code != 200 else \
        PlurkPostList(result['plurks'], result['plurk_users'].values())
As you can see this returns a PlurkPostList, which in turn is defined as follows:
class PlurkPostList:
    '''A list of plurks and the set of users that posted them.'''
    def __init__(self, plurk_json_list, user_json_list=[]):
        self._plurks = [PlurkPost(p) for p in plurk_json_list]
        self._users = [PlurkUser(u) for u in user_json_list]
    def __iter__(self):
        return self._plurks
    def GetUsers(self):
        return self._users
    def __eq__(self, other):
        if other.__class__ != PlurkPostList: return False
        if self._plurks != other._plurks: return False
        if self._users != other._users: return False
        return True
Now I expected to be able to do something like this:
api = plurk_api_urllib2.PlurkAPI(open('api.key').read().strip(), debug_level=1)
plurkproxy = PlurkProxy(api, json.loads)
user = plurkproxy.Login('my_user', 'my_pass')
ps = plurkproxy.GetNewPlurks(datetime.datetime(2009, 12, 12, 0, 0, 0))
print ps
for p in ps:
    print str(p)
When I run this, what I actually get is:
<plurk.PlurkPostList instance at 0x01E8D738>
from the "print ps", then:
for p in ps:
TypeError: __iter__ returned non-iterator of type 'list'
I don't understand - surely a list is iterable? Where am I going wrong - how do I access the Plurks in the PlurkPostList?
When you define your own __iter__ method, you should realize that that __iter__ method should return an iterator, not an iterable. You are returning a list, not an iterator to a list, so it fails. You can fix it by doing return iter(self._plurks), for example.
If you wanted to do something a little more complex, like process each item in self._plurks as it's being iterated over, the usual trick is to make your __iter__ method a generator. That way, the return value of the call to __iter__ is the generator, which is an iterator:
def __iter__(self):
    for item in self._plurks:
        yield process(item)
The __iter__ method should return an object which implements the next() method.
A list does not have a next() method, but it has an __iter__ method, which returns a listiterator object. The listiterator object has a next() method.
You should write:
def __iter__(self):
    return iter(self._plurks)
As an alternative, you can also define the next() function and have __iter__() return self. See Build a Basic Python Iterator for a nice example.
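The difference between returning an iterable and returning an iterator can be reproduced in a few lines. This Python 3 sketch uses hypothetical PostList and BrokenPostList classes, loosely modelled on PlurkPostList, rather than the pyplurk code itself (in Python 3 the iterator method is spelled __next__ instead of next()).

```python
class PostList:
    """Hypothetical container, loosely modelled on PlurkPostList."""

    def __init__(self, posts):
        self._posts = list(posts)

    def __iter__(self):
        # iter() turns the list (an iterable) into a list iterator.
        return iter(self._posts)


class BrokenPostList(PostList):
    def __iter__(self):
        # Returning the list itself reproduces the TypeError.
        return self._posts


print([p for p in PostList(["a", "b"])])  # ['a', 'b']
try:
    iter(BrokenPostList(["a", "b"]))
except TypeError as exc:
    print("TypeError:", exc)
```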
