Have Python iterators got a has_next method?
There's an alternative to the StopIteration by using next(iterator, default_value).
For example:
>>> a = iter('hi')
>>> print next(a, None)
h
>>> print next(a, None)
i
>>> print next(a, None)
None
So you can test for None, or any other pre-specified value, to detect the end of the iterator if you would rather not handle the exception.
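One caveat with the default-value approach is that None may itself be a legitimate item. A minimal sketch of a workaround, assuming a private sentinel object (the names _MISSING and drain are illustrative, not from the answer above):

```python
_MISSING = object()  # hypothetical sentinel; no real item can be identical to it

def drain(iterable):
    """Collect every item, including None values, without catching StopIteration."""
    it = iter(iterable)
    items = []
    while True:
        item = next(it, _MISSING)
        if item is _MISSING:  # the iterator is exhausted
            break
        items.append(item)
    return items

print(drain([1, None, 2]))
```

Comparing with `is` rather than `==` keeps the check safe even for items that define unusual equality.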
No, there is no such method. The end of iteration is indicated by an exception. See the documentation.
If you really need a has-next functionality, it's easy to obtain it with a little wrapper class. For example:
class hn_wrapper(object):
    def __init__(self, it):
        self.it = iter(it)
        self._hasnext = None
    def __iter__(self): return self
    def next(self):
        if self._hasnext:
            result = self._thenext
        else:
            result = next(self.it)
        self._hasnext = None
        return result
    def hasnext(self):
        if self._hasnext is None:
            try: self._thenext = next(self.it)
            except StopIteration: self._hasnext = False
            else: self._hasnext = True
        return self._hasnext
now something like
x = hn_wrapper('ciao')
while x.hasnext(): print next(x)
emits
c
i
a
o
as required.
Note that the use of next(self.it) as a built-in requires Python 2.6 or better; if you're using an older version of Python, use self.it.next() instead (and similarly for next(x) in the example usage). [[You might reasonably think this note is redundant, since Python 2.6 has been around for over a year now -- but more often than not when I use Python 2.6 features in a response, some commenter or other feels duty-bound to point out that they are 2.6 features, thus I'm trying to forestall such comments for once;-)]]
===
For Python3, you would make the following changes:
from collections.abc import Iterator # since python 3.3 Iterator is here
class hn_wrapper(Iterator): # need to subclass Iterator rather than object
    def __init__(self, it):
        self.it = iter(it)
        self._hasnext = None
    def __iter__(self):
        return self
    def __next__(self): # __next__ vs next in python 2
        if self._hasnext:
            result = self._thenext
        else:
            result = next(self.it)
        self._hasnext = None
        return result
    def hasnext(self):
        if self._hasnext is None:
            try:
                self._thenext = next(self.it)
            except StopIteration:
                self._hasnext = False
            else: self._hasnext = True
        return self._hasnext
In addition to all the mentions of StopIteration, the Python "for" loop simply does what you want:
>>> it = iter("hello")
>>> for i in it:
... print i
...
h
e
l
l
o
Try the __length_hint__() method from any iterator object:
iter(...).__length_hint__() > 0
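Be aware that the length hint is only an estimate and is not defined on every iterator. A small sketch, using the standard operator.length_hint helper rather than calling the dunder method directly, shows where the approach breaks down:

```python
import operator

# A list iterator reports how many items remain.
it = iter([1, 2, 3])
next(it)
remaining = operator.length_hint(it)

# A generator defines no __length_hint__, so the default (0) is returned
# even though items are still pending.
gen = (x for x in [1, 2, 3])
gen_hint = operator.length_hint(gen)

print(remaining, gen_hint)
```

So a hint of 0 does not prove that an arbitrary iterator is exhausted; calling gen.__length_hint__() directly on a generator would raise AttributeError.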
You can tee the iterator using itertools.tee and check for StopIteration on the teed iterator.
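A rough sketch of the tee idea, assuming a hypothetical helper named has_next. Because the original iterator must not be used after it has been passed to tee, the helper returns a replacement iterator along with the flag:

```python
import itertools

def has_next(it):
    """Return (flag, iterator); use the returned iterator from now on."""
    it, probe = itertools.tee(it)
    try:
        next(probe)  # advance only the probe copy
    except StopIteration:
        return False, it
    return True, it

it = iter("hi")
seen = []
ok, it = has_next(it)
while ok:
    seen.append(next(it))
    ok, it = has_next(it)
print(seen)
```

The item pulled by the probe stays buffered inside tee, so the returned iterator still yields it.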
hasNext somewhat translates to the StopIteration exception, e.g.:
>>> it = iter("hello")
>>> it.next()
'h'
>>> it.next()
'e'
>>> it.next()
'l'
>>> it.next()
'l'
>>> it.next()
'o'
>>> it.next()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
StopIteration docs: http://docs.python.org/library/exceptions.html#exceptions.StopIteration
Some article about iterators and generator in python: http://www.ibm.com/developerworks/library/l-pycon.html
No. The most similar concept is most likely a StopIteration exception.
I believe Python just has next() and, according to the docs, it raises an exception if there are no more elements.
http://docs.python.org/library/stdtypes.html#iterator-types
The use case that led me to search for this is the following:
def setfrom(self,f):
    """Set from iterable f"""
    fi = iter(f)
    for i in range(self.n):
        try:
            x = next(fi)
        except StopIteration:
            fi = iter(f)
            x = next(fi)
        self.a[i] = x
Where hasnext() is available, one could do
def setfrom(self,f):
    """Set from iterable f"""
    fi = iter(f)
    for i in range(self.n):
        if not hasnext(fi):
            fi = iter(f) # restart
        self.a[i] = next(fi)
which to me is cleaner. Obviously you can work around issues by defining utility classes, but what then happens is you have a proliferation of twenty-odd different almost-equivalent workarounds each with their quirks, and if you wish to reuse code that uses different workarounds, you have to either have multiple near-equivalent in your single application, or go around picking through and rewriting code to use the same approach. The 'do it once and do it well' maxim fails badly.
Furthermore, the iterator itself needs to have an internal 'hasnext' check to run to see if it needs to raise an exception. This internal check is then hidden so that it needs to be tested by trying to get an item, catching the exception and running the handler if thrown. This is unnecessary hiding IMO.
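For this particular use case, the standard library may already cover the restart logic. A sketch, assuming the goal is simply "take n items from f, starting over when f runs out" (fill_cyclically is an illustrative name, not part of the code above):

```python
from itertools import cycle, islice

def fill_cyclically(f, n):
    """Return n items taken from f, starting over whenever f runs out."""
    return list(islice(cycle(f), n))

print(fill_cyclically("abc", 7))
```

Two differences from the hand-written loop are worth noting: cycle replays items it saved during the first pass instead of calling iter(f) again, and an empty f produces an empty result rather than raising StopIteration.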
Maybe it's just me, but while I like https://stackoverflow.com/users/95810/alex-martelli 's answer, I find this a bit easier to read:
from collections.abc import Iterator # since python 3.3 Iterator is here
class MyIterator(Iterator): # need to subclass Iterator rather than object
def __init__(self, it):
self._iter = iter(it)
self._sentinel = object()
self._next = next(self._iter, self._sentinel)
def __iter__(self):
return self
def __next__(self): # __next__ vs next in python 2
if not self.has_next():
next(self._iter) # raises StopIteration
val = self._next
self._next = next(self._iter, self._sentinel)
return val
def has_next(self):
return self._next is not self._sentinel
No, there is no such method. The end of iteration is indicated by a StopIteration (more on that here).
This follows the python principle EAFP (easier to ask for forgiveness than permission). A has_next method would follow the principle of LBYL (look before you leap) and contradicts this core python principle.
This interesting article explains the two concepts in more detail.
Suggested way is StopIteration.
Please see Fibonacci example from tutorialspoint
#!/usr/bin/python3
import sys
def fibonacci(n): #generator function
    a, b, counter = 0, 1, 0
    while True:
        if (counter > n):
            return
        yield a
        a, b = b, a + b
        counter += 1
f = fibonacci(5) #f is iterator object
while True:
    try:
        print (next(f), end=" ")
    except StopIteration:
        sys.exit()
It is also possible to implement a helper generator that wraps any iterator and reports, for each item, whether another value follows it:
def has_next(it):
    first = True
    for e in it:
        if not first:
            yield True, prev
        else:
            first = False
        prev = e
    if not first:
        yield False, prev
for has_next_, e in has_next(range(4)):
    print(has_next_, e)
Which outputs:
True 0
True 1
True 2
False 3
The main and probably only drawback of this method is that it reads ahead one more element. For most tasks that is perfectly fine, but for some it may be unacceptable, especially if the user of has_next() is not aware of the read-ahead logic and misuses it.
Code above works for infinite iterators too.
Actually for all cases that I ever programmed such kind of has_next() was totally enough and didn't cause any problems and in fact was very helpful. You just have to be aware of its read-ahead logic.
The way I solved it, based on handling the StopIteration exception, is pretty straightforward and reads all the items:
end_cursor = False
while not end_cursor:
    try:
        print(cursor.next())
    except StopIteration:
        print('end loop')
        end_cursor = True
    except:
        print('other exceptions to manage')
        end_cursor = True
I think there are valid use cases for when you may want some sort of has_next functionality, in which case you should decorate an iterator with a has_next defined.
Combining concepts from the answers to this question here is my implementation of that which feels like a nice concise solution to me (python 3.9):
from typing import Iterator, TypeVar
_T = TypeVar("_T")
_EMPTY_BUF = object()
class BufferedIterator(Iterator[_T]):
    def __init__(self, real_it: Iterator[_T]):
        self._real_it = real_it
        self._buf = next(self._real_it, _EMPTY_BUF)
    def has_next(self):
        return self._buf is not _EMPTY_BUF
    def __next__(self) -> _T:
        v = self._buf
        self._buf = next(self._real_it, _EMPTY_BUF)
        if v is _EMPTY_BUF:
            raise StopIteration()
        return v
The main difference is that has_next is just a boolean expression, and also handles iterators with None values.
Added this to a gist here with tests and example usage.
With a 'for' loop you can implement your own version of 'next' that avoids the exception (note that a None result is ambiguous if the iterator can yield None):
def my_next(it):
    for x in it:
        return x
    return None
Very interesting question. This "hasNext" design also appears in a LeetCode problem:
https://leetcode.com/problems/iterator-for-combination/
here is my implementation:
class CombinationIterator:
    def __init__(self, characters: str, combinationLength: int):
        from itertools import combinations
        from collections import deque
        self.iter = combinations(characters, combinationLength)
        self.res = deque()
    def next(self) -> str:
        if len(self.res) == 0:
            return ''.join(next(self.iter))
        else:
            return ''.join(self.res.pop())
    def hasNext(self) -> bool:
        try:
            # read one item ahead and queue it for the next call to next()
            self.res.insert(0, next(self.iter))
            return True
        except StopIteration:
            return len(self.res) > 0
The way I solved my problem is to keep the count of the number of objects iterated over, so far. I wanted to iterate over a set using calls to an instance method. Since I knew the length of the set, and the number of items counted so far, I effectively had an hasNext method.
A simple version of my code:
class Iterator:
    # s is a string, say
    def __init__(self, s):
        self.s = set(s)
        self.done = (len(self.s) == 0)  # an empty set has nothing to return
        self.iter = iter(self.s)  # iterate over the set, whose length we count against
        self.charCount = 0
    def next(self):
        if self.done:
            return None
        self.char = next(self.iter)
        self.charCount += 1
        # done once every element of the set has been returned
        self.done = (self.charCount >= len(self.s))
        return self.char
    def hasMore(self):
        return not self.done
Of course, the example is a toy one, but you get the idea. This won't work in cases where there is no way to get the length of the iterable, like a generator etc.
I'm receiving an unknown number of records for background processing from generators. If there is a more important job, I have to stop to release the process.
The main process is best described as:
def main():
    generator_source = generator_for_test_data() # 1. contact server to get data.
    uw = UploadWrapper(generator_source) # 2. wrap the data.
    while not interrupt(): # 3. check for interrupts.
        row = next(uw)
        if row is None:
            return
        print(long_running_job(row)) # 4. do the work.
Is there a way to get to __next__ without having to plug __iter__?
Having two steps - (1) make an iterator, then (2) iterate over it, just seems clumsy.
There are many cases where I'd prefer to submit a function to a function manager (mapreduce style), but in this case I need an instantiated class with some settings. Registering a single function can therefore only work if that function alone is __next__.
class UploadWrapper(object):
    def __init__(self, generator):
        self.generator = generator
        self._iterator = None
    def __iter__(self):
        for page in self.generator:
            yield from page.data
    def __next__(self):
        if self._iterator is None: # ugly bit.
            self._iterator = self.__iter__() #
        try:
            return next(self._iterator)
        except StopIteration:
            return None
Q: Is there a simpler way?
Working sample added for completeness:
import time
import random
class Page(object):
    def __init__(self, data):
        self.data = data
def generator_for_test_data():
    for t in range(10):
        page = Page(data=[(t, i) for i in range(100, 110)])
        yield page
def long_running_job(row):
    time.sleep(random.randint(1,10)/100)
    assert len(row) == 2
    assert row[0] in range(10)
    assert row[1] in range(100, 110)
    return row
def interrupt(): # interrupt check
    if random.randint(1,50) == 1:
        print("INTERRUPT SIGNAL!")
        return True
    return False
class UploadWrapper(object):
    def __init__(self, generator):
        self.generator = generator
        self._iterator = None
    def __iter__(self):
        for ft in self.generator:
            yield from ft.data
    def __next__(self):
        if self._iterator is None:
            self._iterator = self.__iter__()
        try:
            return next(self._iterator)
        except StopIteration:
            return None
def main():
    gen = generator_for_test_data()
    uw = UploadWrapper(gen)
    while not interrupt(): # check for job interrupt.
        row = next(uw)
        if row is None:
            return
        print(long_running_job(row))
if __name__ == "__main__":
    main()
Your UploadWrapper seems overly complex, and there is more than one simpler solution.
My first thought is to ditch the class altogether and just use a function instead:
def uploadwrapper(page_gen):
    for page in page_gen:
        yield from page.data
Just replace uw = UploadWrapper(gen) with uw = uploadwrapper(gen), and that'll work.
If you insist on the class, you can just get rid of the __next__() and replace uw = UploadWrapper(gen) with uw = iter(UploadWrapper(gen)), and it'll work.
In either case, you must also catch the StopIteration in the caller. __next__() is supposed to raise StopIteration when it's done, and not return None, like yours does. Otherwise, it won't work with things expecting a well-behaving iterator, eg. for loops.
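To illustrate what "catch the StopIteration in the caller" could look like, here is a minimal sketch. The names process_rows, should_stop, handle and flatten are placeholders for this example, and plain lists stand in for page.data:

```python
def process_rows(rows, should_stop, handle):
    """Handle rows until the source is exhausted or should_stop() is true.

    Returns True if every row was handled, False if interrupted.
    """
    while not should_stop():
        try:
            row = next(rows)
        except StopIteration:
            return True  # the source ran dry
        handle(row)
    return False  # stopped early by the interrupt check

def flatten(pages):
    # Each page here is just a list of rows (an assumption for the demo).
    for page in pages:
        yield from page

handled = []
finished = process_rows(flatten([[1, 2], [3]]), lambda: False, handled.append)
print(finished, handled)
```

Because exhaustion is signalled by the exception rather than by None, a row whose value happens to be None is still processed correctly.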
I think you might have some misconceptions about how it all is supposed to fit together, so I'll try my best to explain how it's supposed to work, to the best of my knowledge:
The point of __iter__() is that if you have eg. a list, you can get multiple independent iterators by calling iter(). When you have a for loop, you're essentially first getting an iterator with iter() and then calling next() on it on every loop iteration. If you have two nested loops that use the same list, the iterators and their positions are still separate so there's no conflict. __iter__() is supposed to return an iterator for the container it's on, or if it's called on an iterator, it's supposed to just return self. In that sense, it's kind of wrong for UploadWrapper not to return self in __iter__(), since it wraps a generator and so can't really give independent iterators. As for why leaving out __next__() works, it's because when you define a generator (ie. use yield in a function), the generator has an __iter__() (that returns self, as it should) and __next__() that does what you'd expect. In your original code, you're not really using __iter__() at all for what it's supposed to be used: the code works even if you rename it to something else! This is because you never call iter() on the instance, and just directly call next().
If you wanted to do it "properly" as a class, I think something like this might suffice:
class UploadWrapper(object):
    def __init__(self, generator):
        self.generator = generator
        self.subgen = iter(next(generator).data)
    def __iter__(self):
        return self
    def __next__(self):
        while True:
            try:
                return next(self.subgen)
            except StopIteration:
                self.subgen = iter(next(self.generator).data)
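If no per-instance settings are needed, the same flattening can probably be had from the standard library alone. A sketch using itertools.chain.from_iterable, with a stand-in Page class and a hypothetical pages() generator in place of the real data source:

```python
from itertools import chain

class Page:
    """Stand-in for the Page class in the question."""
    def __init__(self, data):
        self.data = data

def pages():
    # Hypothetical source: three pages of two rows each.
    for t in range(3):
        yield Page([(t, i) for i in range(2)])

# chain.from_iterable lazily walks each page's data in turn.
rows = chain.from_iterable(page.data for page in pages())
first = next(rows)
rest = list(rows)
print(first, rest)
```

The result is a regular iterator, so it works with next(), for loops, and anything else that expects StopIteration at the end.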
In Clojure I can do something like this:
(-> path
clojure.java.io/resource
slurp
read-string)
instead of doing this:
(read-string (slurp (clojure.java.io/resource path)))
This is called threading in Clojure terminology and helps getting rid of a lot of parentheses.
In Python if I try to use functional constructs like map, any, or filter I have to nest them to each other. Is there a construct in Python with which I can do something similar to threading (or piping) in Clojure?
I'm not looking for a fully featured version since there are no macros in Python, I just want to do away with a lot of parentheses when I'm doing functional programming in Python.
Edit: I ended up using toolz, which supports piping.
Here is a simple implementation of @deceze's idea (although, as @Carcigenicate points out, it is at best a partial solution):
import functools
def apply(x,f): return f(x)
def thread(*args):
return functools.reduce(apply,args)
For example:
def f(x): return 2*x+1
def g(x): return x**2
thread(5,f,g) #evaluates to 121
I wanted to take this to the extreme and do it all dynamically.
Basically, the below Chain class lets you chain functions together similar to Clojure's -> and ->> macros. It supports both threading into the first and last arguments.
Functions are resolved in this order:
Object method
Local defined variable
Built-in variable
The code:
class Chain(object):
    def __init__(self, value, index=0):
        self.value = value
        self.index = index
    def __getattr__(self, item):
        append_arg = True
        try:
            prop = getattr(self.value, item)
            append_arg = False
        except AttributeError:
            try:
                prop = locals()[item]
            except KeyError:
                prop = getattr(__builtins__, item)
        if callable(prop):
            def fn(*args, **kwargs):
                orig = list(args)
                if append_arg:
                    if self.index == -1:
                        orig.append(self.value)
                    else:
                        orig.insert(self.index, self.value)
                return Chain(prop(*orig, **kwargs), index=self.index)
            return fn
        else:
            return Chain(prop, index=self.index)
Thread each result as first arg
file = Chain(__file__).open('r').readlines().value
Thread each result as last arg
result = Chain(range(0, 100), index=-1).map(lambda x: x * x).reduce(lambda x, y: x + y).value
Here is a pattern I often use:
last_value = None
while <some_condition>:
    <get current_value from somewhere>
    if last_value != current_value:
        <do something>
    last_value = current_value
One application example would be to print headings in a report when, say, a person's last name changes.
The whole last_value/current_value thing has always seemed clumsy to me. Is there a better way to code this in Python?
I agree that your pattern makes a lot of sense.
But for fun, you could do something like:
class ValueCache(object):
    def __init__(self, val=None):
        self.val = val
    def update(self, new):
        if self.val == new:
            return False
        else:
            self.val = new
            return True
Then your loop would look like:
val = ValueCache()
while <some_condition>:
    if val.update(<get current_value from somewhere>):
        <do something>
For example
import time
t = ValueCache()
while True:
    if t.update(time.time()):
        print("Cache Updated!")
If you changed time.time() to some static object like "Foo", you'd see that "Cache Updated!" would only appear once (when it is initially set from None to "Foo").
Obligatory realistic programmer's note: Don't do this. I can't easily find a good reason to do this in practice. It not only adds to the line count but to the complexity.
(Inspired by Alex Martelli's Assign and Test Recipe)
I think the pattern is very clear, but you can use a generator function to hide the last_value/current_value thing.
def value_change_iterator(iterable):
    last_x = None
    for x in iterable:
        if x != last_x:
            yield x
        last_x = x
for x in value_change_iterator([1, 1, 2, 2, 3, 3, 4]):
    print(x)
prints
1
2
3
4
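The standard library offers a close relative of this generator. A sketch using itertools.groupby, which groups consecutive equal items (value_changes is an illustrative name); unlike a version that starts from last_x = None, it does not skip a leading None:

```python
from itertools import groupby

def value_changes(iterable):
    """Yield one item per run of consecutive equal items."""
    for key, _group in groupby(iterable):
        yield key

print(list(value_changes([1, 1, 2, 2, 3, 3, 4])))
print(list(value_changes([None, None, 1])))
```

For the report-heading use case, groupby also accepts a key function, so rows could be grouped by last name directly.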
Another alternative inspired by @jedwards' answer inspired by Alex Martelli's recipe (this one keeps around the current and last values, and lets you use None as an initial value if you're so inclined, also changes the semantics from semantics I don't particularly like to other semantics I'm not sure I much like either):
class undefined:
    pass
class ValueCache:
    def __init__(self, value=undefined):
        self.current_value = value
        self.last_value = undefined
        self._is_changed = False
    @property
    def is_changed(self):
        is_changed = self._is_changed
        self._is_changed = False
        return is_changed
    def update(self, new_value):
        self._is_changed = (new_value != self.current_value)
        if self._is_changed:
            self.last_value = self.current_value
            self.current_value = new_value
Example:
>>> v = ValueCache()
>>> v.update(1)
>>> v.is_changed
True
>>> v.is_changed
False
>>> v.update(2)
>>> v.is_changed
True
>>> v.is_changed
False
Or in your case:
t = ValueCache()
while True:
    t.update(time.time())
    if t.is_changed:
        print("Cache updated!")
Same obligatory realistic programmer's note applies.
Is there any way to make a list call a function every time the list is modified?
For example:
>>>l = [1, 2, 3]
>>>def callback():
print "list changed"
>>>apply_callback(l, callback) # Possible?
>>>l.append(4)
list changed
>>>l[0] = 5
list changed
>>>l.pop(0)
list changed
5
Borrowing from the suggestion by @sr2222, here's my attempt. (I'll use a decorator without the syntactic sugar):
import sys
_pyversion = sys.version_info[0]
def callback_method(func):
    def notify(self,*args,**kwargs):
        for _,callback in self._callbacks:
            callback()
        return func(self,*args,**kwargs)
    return notify
class NotifyList(list):
    extend = callback_method(list.extend)
    append = callback_method(list.append)
    remove = callback_method(list.remove)
    pop = callback_method(list.pop)
    __delitem__ = callback_method(list.__delitem__)
    __setitem__ = callback_method(list.__setitem__)
    __iadd__ = callback_method(list.__iadd__)
    __imul__ = callback_method(list.__imul__)
    #Take care to return a new NotifyList if we slice it.
    if _pyversion < 3:
        __setslice__ = callback_method(list.__setslice__)
        __delslice__ = callback_method(list.__delslice__)
        def __getslice__(self,*args):
            return self.__class__(list.__getslice__(self,*args))
    def __getitem__(self,item):
        if isinstance(item,slice):
            return self.__class__(list.__getitem__(self,item))
        else:
            return list.__getitem__(self,item)
    def __init__(self,*args):
        list.__init__(self,*args)
        self._callbacks = []
        self._callback_cntr = 0
    def register_callback(self,cb):
        self._callbacks.append((self._callback_cntr,cb))
        self._callback_cntr += 1
        return self._callback_cntr - 1
    def unregister_callback(self,cbid):
        for idx,(i,cb) in enumerate(self._callbacks):
            if i == cbid:
                self._callbacks.pop(idx)
                return cb
        else:
            return None
if __name__ == '__main__':
    A = NotifyList(range(10))
    def cb():
        print ("Modify!")
    #register a callback
    cbid = A.register_callback(cb)
    A.append('Foo')
    A += [1,2,3]
    A *= 3
    A[1:2] = [5]
    del A[1:2]
    #Add another callback. They'll be called in order (oldest first)
    def cb2():
        print ("Modify2")
    A.register_callback(cb2)
    print ("-"*80)
    A[5] = 'baz'
    print ("-"*80)
    #unregister the first callback
    A.unregister_callback(cbid)
    A[5] = 'qux'
    print ("-"*80)
    print (A)
    print (type(A[1:3]))
    print (type(A[1:3:2]))
    print (type(A[5]))
The great thing about this is if you realize you forgot to consider a particular method, it's just 1 line of code to add it. (For example, I forgot __iadd__ and __imul__ until just now :)
EDIT
I've updated the code slightly to be py2k and py3k compatible. Additionally, slicing creates a new object of the same type as the parent. Please feel free to continue poking holes in this recipe so I can make it better. This actually seems like a pretty neat thing to have on hand ...
You'd have to subclass list and modify __setitem__.
class NotifyingList(list):
    def __init__(self, *args):
        # initialise the underlying list before adding our own attribute
        super(NotifyingList, self).__init__(*args)
        self.on_change_callbacks = []
    def __setitem__(self, index, value):
        for callback in self.on_change_callbacks:
            callback(self, index, value)
        super(NotifyingList, self).__setitem__(index, value)
notifying_list = NotifyingList()
def print_change(list_, index, value):
    print('Changing index %d to %s' % (index, value))
notifying_list.on_change_callbacks.append(print_change)
As noted in comments, it's more than just __setitem__.
You might even be better served by building an object that implements the list interface and dynamically adds and removes descriptors to and from itself in place of the normal list machinery. Then you can reduce your callback calls to just the descriptor's __get__, __set__, and __delete__.
I'm almost certain this can't be done with the standard list.
I think the cleanest way would be to write your own class to do this (perhaps inheriting from list).
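One way to write such a class without wrapping every list method by hand is to build on collections.abc.MutableSequence, which derives append, pop, extend, remove and the in-place operators from five core methods. A minimal sketch, where ObservedList and on_change are illustrative names rather than an existing API:

```python
from collections.abc import MutableSequence

class ObservedList(MutableSequence):
    """List-like container that calls on_change() after every mutation."""

    def __init__(self, items=(), on_change=lambda: None):
        self._items = list(items)
        self._on_change = on_change

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)

    def __setitem__(self, index, value):
        self._items[index] = value
        self._on_change()

    def __delitem__(self, index):
        del self._items[index]
        self._on_change()

    def insert(self, index, value):
        self._items.insert(index, value)
        self._on_change()

events = []
l = ObservedList([1, 2, 3], on_change=lambda: events.append("list changed"))
l.append(4)        # MutableSequence.append calls insert()
l[0] = 5
popped = l.pop(0)  # pop() reads the item, then calls __delitem__
print(popped, list(l), len(events))
```

The trade-off is that the object is not a list subclass, so isinstance(l, list) is False, and derived methods such as extend notify once per inserted item.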
There is a check I need to perform after each subsequent step in a function, so I wanted to define that step as a function within a function.
>>> def gs(a,b):
...     def ry():
...         if a==b:
...             return a
...
...     ry()
...
...     a += 1
...     ry()
...
...     b*=2
...     ry()
...
>>> gs(1,2) # should return 2
>>> gs(1,1) # should return 1
>>> gs(5,3) # should return 6
>>> gs(2,3) # should return 3
so how do I get gs to return 'a' from within ry? I thought of using super but think that's only for classes.
Thanks
There's been a little confusion... I only want to return a if a==b. if a!=b, then I don't want gs to return anything yet.
edit: I now think decorators might be the best solution.
Do you mean?
def gs(a,b):
    def ry():
        if a==b:
            return a
    return ry()
As you mention "steps" in a function, it almost seems like you want a generator:
def gs(a,b):
    def ry():
        if a==b:
            yield a
        # If a != b, ry does not "generate" any output
    for i in ry():
        yield i
    # Continue doing stuff...
    yield 'some other value'
    # Do more stuff.
    yield 'yet another value'
(Generators can now also act as coroutines, since Python 2.5, using the new yield syntax.)
This should allow you to keep checking the state and return from the outer function if a and b ever end up the same:
def gs(a,b):
    class SameEvent(Exception):
        pass
    def ry():
        if a==b:
            raise SameEvent(a)
    try:
        # Do stuff here, and call ry whenever you want to return if they are the same.
        ry()
        # It will now return 3.
        a = b = 3
        ry()
    except SameEvent as e:
        return e.args[0]
There's been a little confusion... I
only want to return a if a==b. if
a!=b, then I don't want gs to return
anything yet.
Check for that then:
def gs(a,b):
    def ry():
        if a==b:
            return a
    ret = ry()
    if ret: return ret
    # do other stuff
You return the result of ry() explicitly instead of just calling it.
I had a similar problem, but solved it by simply changing the order of the call.
def ry():
    if a==b:
        gs()
in some languages like javascript you can even pass a function as a variable in a function:
function gs(a, b, callback) {
if (a==b) callback();
}
gs(a, b, ry);
I came here looking for an answer to the same type of problem. Instead I worked out a solution that (at least for me) makes the intent a bit more clear: define your steps as lambdas or defs, and place them in an array. Then you can simply loop and handle them as needed.
I adapted my solution to the OP's question below:
def gs(a,b):
    def step1():
        nonlocal a
        a += 1
    def step2():
        nonlocal b
        b *= 2
    if a == b:
        return a
    steps = [step1, step2]
    for step in steps:
        step()
        if a == b:
            return a
    # anything else you want to do