How to ignore the first yield value? - python

The following code is for illustrative purposes only.
def get_messages_from_redis():
    for item in self.pubsub.listen():
        yield (item['channel'], item['data'])  # how to ignore the first yield?
I know the following way can ignore the first yield value:
g = get_messages_from_redis()
next(g)
But how to ignore this in get_messages_from_redis()?
(counter can be used to control whether to yield, but is there a better way?)

Iterate inside your function before yielding. I'm not sure what your iterable is exactly, but here's a generic example assuming a list.
def get_messages_from_redis():
    for item in self.pubsub.listen()[1:]:
        yield item['channel'], item['data']
For a more universal solution, you could create an iterator of your iterable, iterate over the first one, then loop and yield from there. Note: This is mostly for broader coverage, I'm not sure what negative consequences this might have with certain iterables.
def iter_skip_first(i):
    iterable = iter(i)
    next(iterable)
    for i in iterable:
        yield i
li = [1, 2, 3, 4]
d = {"one": 1, "two": 2, "three": 3, "four": 4}
print(*iter_skip_first(li))
print(*iter_skip_first(d))
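Another way to skip leading items, sketched here in Python 3, is itertools.islice, which works on any iterable, including generators that cannot be sliced with [1:]. The names skip_first and numbers are made up for this illustration.

```python
from itertools import islice

def skip_first(iterable, n=1):
    """Return an iterator over iterable that skips its first n items."""
    return islice(iterable, n, None)

def numbers():
    # Stand-in generator; a real source could be any iterable
    yield from (10, 20, 30)

remaining = list(skip_first(numbers()))
empty = list(skip_first(iter([])))
print(remaining, empty)
```

Unlike a bare next() call inside a generator, islice does not raise when the source is empty; it simply yields nothing.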

Related

'yield from' substitute in Python 2

My code uses yield from in Python 3 in recursive calls and it works perfectly fine. The problem is that yield from was introduced by PEP 380 in Python 3.3, and I need the code to work in Python 2.7. I read a few articles, but none of them were detailed enough or simple enough.
A few of the articles I referred to:
Converting “yield from” statement to Python 2.7 code
yield from and Python 2.7
and a few others.
I have written a small sample (it takes a multi-level list and returns a flattened list) that is minimal compared to my real requirements.
#python 3
def foo(obj):
    for ele in obj:
        if isinstance(ele, list):
            yield from foo(ele)
        else:
            yield ele
#driver values :
>>> l = [1, [2, 3, [4,5]]]
>>> list(foo(l))
=> [1, 2, 3, 4, 5]
The same converted does not work in python 2.7 due to the non-availability of yield from.
You still need to loop. It doesn't matter that you have recursion here.
You need to loop over the generator produced by the recursive call and yield the results:
def foo(obj):
    for ele in obj:
        if isinstance(ele, list):
            for res in foo(ele):
                yield res
        else:
            yield ele
Your recursive call produces a generator, and you need to pass the results of the generator onwards. You do so by looping over the generator and yielding the individual values.
There are no better options, other than upgrading to Python 3.
yield from essentially passes on the responsibility to loop over to the caller, and passes back any generator.send() and generator.throw() calls to the delegated generator. You don't have any need to pass on .send() or .throw(), so what remains is taking responsibility to do the looping yourself.
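To make the send() difference concrete, here is a small Python 3 sketch. The generators echo, delegating and looping are hypothetical names invented for this illustration; they are not part of the question's code.

```python
def echo():
    """Yield back whatever the caller last sent in."""
    received = None
    while True:
        received = yield received

def delegating():
    # yield from forwards send() calls to the sub-generator
    yield from echo()

def looping():
    # a plain loop only passes the yielded values outward
    for value in echo():
        yield value

d = delegating()
next(d)                        # prime the generator
via_yield_from = d.send("hi")  # reaches echo(), which yields it back

l = looping()
next(l)
via_loop = l.send("hi")        # the sent value is dropped by the loop
print(via_yield_from, via_loop)
```

With the plain loop, the value passed to send() never reaches the inner generator, which is fine for the flattening example because nothing is ever sent in.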
Demo:
>>> import sys
>>> sys.version_info
sys.version_info(major=2, minor=7, micro=14, releaselevel='final', serial=0)
>>> def foo(obj):
...     for ele in obj:
...         if isinstance(ele, list):
...             for res in foo(ele):
...                 yield res
...         else:
...             yield ele
...
>>> l = [1, [2, 3, [4,5]]]
>>> list(foo(l))
[1, 2, 3, 4, 5]
yield from was introduced in PEP 380 -- Syntax for Delegating to a Subgenerator (not PEP 342), specifically because a loop over the sub-generator would not delegate generator.throw() and generator.send() information.
The PEP explicitly states:
If yielding of values is the only concern, this can be performed without much difficulty using a loop such as
for v in g:
    yield v
The Formal Semantics has a Python implementation equivalent that may look intimidating at first, but you can still pick out that it loops (with while 1:, looping ends when there is an exception or StopIteration is handled, new values are retrieved with next() or generator.send(..)), and yields the results (with yield _y).
Why do you say "my code cannot work with loops and needs to be recursive"? You can easily use a loop in a recursive generator:
def foo(obj):
    for ele in obj:
        if isinstance(ele, list):
            #yield from foo(ele)
            for t in foo(ele):
                yield t
        else:
            yield ele
l = [1, [2, 3, [4, 5]]]
print list(foo(l))
output
[1, 2, 3, 4, 5]

How do I determine whether a container is infinitely recursive and find its smallest unique container?

I was reading Flatten (an irregular) list of lists and decided to adopt it as a Python exercise - a small function I'll occasionally rewrite without referring to the original, just for practice. The first time I tried this, I had something like the following:
def flat(iterable):
    try:
        iter(iterable)
    except TypeError:
        yield iterable
    else:
        for item in iterable:
            yield from flat(item)
This works fine for basic structures like nested lists containing numbers, but strings crash it because the first element of a string is a single-character string, the first element of which is itself, the first element of which is itself again, and so on. Checking the question linked above, I realized that that explains the check for strings. That gave me the following:
def flatter(iterable):
    try:
        iter(iterable)
        if isinstance(iterable, str):
            raise TypeError
    except TypeError:
        yield iterable
    else:
        for item in iterable:
            yield from flatter(item)
Now it works for strings as well. However, I then recalled that a list can contain references to itself.
>>> lst = []
>>> lst.append(lst)
>>> lst
[[...]]
>>> lst[0][0][0][0] is lst
True
So, a string isn't the only type that could cause this sort of problem. At this point, I started looking for a way to guard against this issue without explicit type-checking.
The following flattener.py ensued. flattish() is a version that just checks for strings. flatten_notype() checks whether an object's first item's first item is equal to itself to determine recursion. flatten() does this and then checks whether either the object or its first item's first item is an instance of the other's type. The Fake class basically just defines a wrapper for sequences. The comments on the lines that test each function describe the results, in the form should be `desired_result` [> `undesired_actual_result`]. As you can see, each fails in various ways on Fake wrapped around a string, Fake wrapped around a list of integers, single-character strings, and multiple-character strings.
def flattish(*i):
    for item in i:
        try: iter(item)
        except: yield item
        else:
            if isinstance(item, str): yield item
            else: yield from flattish(*item)
class Fake:
    def __init__(self, l):
        self.l = l
        self.index = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.index >= len(self.l):
            raise StopIteration
        else:
            self.index +=1
            return self.l[self.index-1]
    def __str__(self):
        return str(self.l)
def flatten_notype(*i):
    for item in i:
        try:
            n = next(iter(item))
            try:
                n2 = next(iter(n))
                recur = n == n2
            except TypeError:
                yield from flatten(*item)
            else:
                if recur:
                    yield item
                else:
                    yield from flatten(*item)
        except TypeError:
            yield item
def flatten(*i):
    for item in i:
        try:
            n = next(iter(item))
            try:
                n2 = next(iter(n))
                recur = n == n2
            except TypeError:
                yield from flatten(*item)
            else:
                if recur:
                    yield item if isinstance(n2, type(item)) or isinstance(item, type(n2)) else n2
                else:
                    yield from flatten(*item)
        except TypeError:
            yield item
f = Fake('abc')
print(*flattish(f)) # should be `abc`
print(*flattish((f,))) # should be `abc` > ``
print(*flattish(1, ('a',), ('bc',))) # should be `1 a bc`
f = Fake([1, 2, 3])
print(*flattish(f)) # should be `1 2 3`
print(*flattish((f,))) # should be `1 2 3` > ``
print(*flattish(1, ('a',), ('bc',))) # should be `1 a bc`
f = Fake('abc')
print(*flatten_notype(f)) # should be `abc`
print(*flatten_notype((f,))) # should be `abc` > `c`
print(*flatten_notype(1, ('a',), ('bc',))) # should be `1 a bc` > `1 ('a',) bc`
f = Fake([1, 2, 3])
print(*flatten_notype(f)) # should be `1 2 3` > `2 3`
print(*flatten_notype((f,))) # should be `1 2 3` > ``
print(*flatten_notype(1, ('a',), ('bc',))) # should be `1 a bc` > `1 ('a',) bc`
f = Fake('abc')
print(*flatten(f)) # should be `abc` > `a`
print(*flatten((f,))) # should be `abc` > `c`
print(*flatten(1, ('a',), ('bc',))) # should be `1 a bc`
f = Fake([1, 2, 3])
print(*flatten(f)) # should be `1 2 3` > `2 3`
print(*flatten((f,))) # should be `1 2 3` > ``
print(*flatten(1, ('a',), ('bc',))) # should be `1 a bc`
I've also tried the following with the recursive lst defined above and flatten():
>>> print(*flatten(lst))
[[...]]
>>> lst.append(0)
>>> print(*flatten(lst))
[[...], 0]
>>> print(*list(flatten(lst))[0])
[[...], 0] 0
As you can see, it fails similarly to 1 ('a',) bc as well as in its own special way.
I read how can python function access its own attributes? thinking that maybe the function could keep track of every object it had seen, but that wouldn't work either because our lst contains an object with matching identity and equality, strings contain objects that may only have matching equality, and equality isn't enough due to the possibility of something like flatten([1, 2], [1, 2]).
Is there any reliable way (i.e. doesn't simply check known types, doesn't require that a recursive container and its containers all be of the same type, etc.) to check whether a container holds iterable objects with potential infinite recursion, and reliably determine the smallest unique container? If there is, please explain how it can be done, why it is reliable, and how it handles various recursive circumstances. If not, please explain why this is logically impossible.
I don't think there's a reliable way to find out whether an arbitrary iterable is infinite. The best we can do is to yield primitives indefinitely from such an iterable without exhausting the stack, for example:
from collections import deque
def flat(iterable):
    d = deque([iterable])
    def _primitive(x):
        return type(x) in (int, float, bool, str, unicode)
    def _next():
        x = d.popleft()
        if _primitive(x):
            return True, x
        d.extend(x)
        return False, None
    while d:
        ok, x = _next()
        if ok:
            yield x
xs = [1,[2], 'abc']
xs.insert(0, xs)
for p in flat(xs):
    print p
The above definition of "primitive" is, well, primitive, but that surely can be improved.
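For readers on Python 3, the same breadth-first idea can be sketched as follows. The function name flat_bfs and the primitives parameter are assumptions made for this illustration, and islice is used only to cut the endless stream short.

```python
from collections import deque
from itertools import islice

def flat_bfs(iterable, primitives=(int, float, bool, str, bytes)):
    """Yield primitives breadth-first; never terminates on self-containing input."""
    queue = deque([iterable])
    while queue:
        x = queue.popleft()
        if isinstance(x, primitives):
            yield x
        else:
            # Anything else is assumed to be iterable and is expanded later
            queue.extend(x)

xs = [1, [2], 'abc']
xs.insert(0, xs)            # xs now contains itself
first_five = list(islice(flat_bfs(xs), 5))
print(first_five)
```

The queue keeps growing for a self-containing list, so this trades stack depth for memory; it avoids the RecursionError but does not detect the cycle.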
The scenario you ask about is very loosely defined. As defined in your question, it is logically impossible "to check whether a container holds iterable objects with potential infinite recursion[.]" The only limit on the scope of your question is "iterable" object. The official Python documentation defines "iterable" as follows:
An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() or __getitem__() method. [...]
The key phrase here is "any classes [defined] with an __iter__() or __getitem__() method." This allows for "iterable" objects with members that are generated on demand. For example, suppose that someone seeks to use a bunch of string objects that automatically sort and compare in chronological order based on the time at which the particular string was created. They either subclass str or reimplement its functionality, adding a timestamp associated with each pointer to a timestampedString( ) object, and adjust the comparison methods accordingly.
Accessing a substring by index location is a way of creating a new string, so a timestampedString( ) of len( ) == 1 could legitimately return a timestampedString( ) of len( ) == 1 with the same character but a new timestamp when you access timestampedString( )[0:1]. Because the timestamp is part of the specific object instance, there is no kind of identity test that would say that the two objects are the same unless any two strings consisting of the same character are considered to be the same. You state in your question that this should not be the case.
To detect infinite recursion, you first need to add a constraint to the scope of your question that the container only contain static, i.e. pre-generated, objects. With this constraint, any legal object in the container can be converted to some byte-string representation of the object. A simple way to do this would be to pickle each object in the container as you reach it, and maintain a stack of the byte-string representations that result from pickling. If you allow any arbitrary static object, nothing less than a raw-byte interpretation of the objects is going to work.
However, algorithmically enforcing the constraint that the container only contain static objects presents another problem: it requires type-checking against some pre-approved list of types such as some notion of primitives. Two categories of objects can then be accommodated: single objects of a known-static type (e.g. primitives) and containers for which the number of contained items can be determined in advance. The latter category can then be shown to be finite when that many contained objects have been iterated through and all have been shown to be finite. Containers within the container can be handled recursively. The known-static type single objects are the recursive base-case.
If the container produces more objects, then it violates the definition of this category of object. The problem with allowing arbitrary objects in Python is that these objects can be defined in Python code that can use components written in C code and any other language that C can be linked to. There is no way to evaluate this code to determine if it actually complies with the static requirement.
There's an issue with your test code that's unrelated to the recursive container issue you're trying to solve. The issue is that your Fake class is an iterator and can only be used once. After you iterate over all its values, it will always raise StopIteration when you try to iterate on it again.
So if you do multiple operations on the same Fake instance, you shouldn't expect to get anything but empty output after the first operation has consumed the iterator. If you recreate the iterator before each operation, you won't have that problem (and you can actually try addressing the recursion issue).
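One way to avoid recreating the wrapper each time, sketched in Python 3 under the assumption that the wrapped sequence can itself be iterated repeatedly, is to make the class an iterable rather than an iterator. Reiterable is a hypothetical replacement for Fake, not code from the question.

```python
class Reiterable:
    """Wrapper whose __iter__ returns a fresh iterator on every call."""
    def __init__(self, items):
        self.items = items

    def __iter__(self):
        # A new iterator each time, so earlier passes do not exhaust later ones
        return iter(self.items)

    def __str__(self):
        return str(self.items)

f = Reiterable('abc')
first_pass = list(f)
second_pass = list(f)
print(first_pass, second_pass)
```

Both passes see all three characters, whereas an object that returns self from __iter__ would be empty on the second pass.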
So on to that issue. One way to avoid infinite recursion is to maintain a stack with the objects that you're currently nested in. If the next value you see is already on the stack somewhere, you know it's recursive and can skip it. Here's an implementation of this using a list as the stack:
def flatten(obj, stack=None):
    if stack is None:
        stack = []
    if obj in stack:
        # Already nested inside this object: yield it as-is and stop descending
        yield obj
        return
    try:
        it = iter(obj)
    except TypeError:
        yield obj
    else:
        stack.append(obj)
        for item in it:
            yield from flatten(item, stack)
        stack.pop()
Note that this can still yield values from the same container more than once, as long as it's not nested within itself (e.g. for x=[1, 2]; y=[x, 3, x]; print(*flatten(y)) will print 1 2 3 1 2).
It also recurses into strings, but only for one level, so flatten("foo") will yield the letters 'f', 'o' and 'o' in turn. If you want to avoid that, you probably do need the function to be type aware, since from the iteration protocol's perspective a string is no different from an iterable container of its letters. It's only single-character strings that recursively contain themselves.
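As a variation on the stack idea, here is a Python 3 sketch that tracks the containers currently being descended into by identity (id()) instead of by equality, and treats strings as atoms. The name flatten_by_identity and the _ancestors parameter are assumptions for this illustration; it is a type-aware variant, not the answer's exact method.

```python
def flatten_by_identity(obj, _ancestors=None):
    """Flatten nested iterables; yield a container as-is if it is nested in itself."""
    if _ancestors is None:
        _ancestors = set()
    if isinstance(obj, (str, bytes)) or id(obj) in _ancestors:
        # Strings are treated as atoms; a revisited ancestor marks a cycle
        yield obj
        return
    try:
        it = iter(obj)
    except TypeError:
        yield obj
        return
    _ancestors.add(id(obj))
    try:
        for item in it:
            yield from flatten_by_identity(item, _ancestors)
    finally:
        _ancestors.discard(id(obj))

lst = [1]
lst.append(lst)
cyclic = list(flatten_by_identity(lst))

x = [1, 2]
repeated = list(flatten_by_identity([x, 3, x]))
print(repeated)
```

Using identity avoids comparing deeply recursive structures with ==, at the cost of not recognising equal-but-distinct objects as the same container.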
What about something like this:
def flat(obj, used=[], old=None):
    #Note: the default list for used is shared between calls; pass used=[] to start fresh
    #This is to get inf. recurrences
    if obj==old:
        if obj not in used:
            used.append(obj)
            yield obj
        #A bare return ends the generator (raise StopIteration is an error since Python 3.7)
        return
    try:
        #Get strings
        if isinstance(obj, str):
            raise TypeError
        #Try to iterate the obj
        for item in obj:
            yield from flat(item, used, obj)
    except TypeError:
        #Get non-iterable items
        if obj not in used:
            used.append(obj)
            yield obj
After a finite number of recursion steps, a list will contain at most itself as an iterable element (since it has to be generated in finitely many steps). That's what we test for with obj==old, where obj is an element of old.
The list used keeps track of all elements, since we want each element only once. We could remove it, but then we'd get ugly (and, more importantly, not well-defined) behaviour regarding how often each element is yielded.
The drawback is that, by the end, the entire flattened result is stored in the list used.
Testing this with some lists seems to work:
>> lst = [1]
>> lst.append(lst)
>> print('\nList1: ', lst)
>> print([x for x in flat(lst)])
List1: [1, [...]]
Elements: [1, [1, [...]]]
#We'd need to reset the iterator here!
>> lst2 = []
>> lst2.append(lst2)
>> lst2.append((1,'ab'))
>> lst2.append(lst)
>> lst2.append(3)
>> print('\nList2: ', lst2)
>> print([x for x in flat(lst2)])
List2: [[...], (1, 'ab'), [1, [...]], 3]
Elements: [[[...], (1, 'ab'), [1, [...]], 3], 1, 'ab', [1, [...]], 3]
Note: It actually makes sense that the infinite lists [[...], (1, 'ab'), [1, [...]], 3] and [1, [...]] are considered as elements since these actually contain themselves but if that's not desired one can comment out the first yield in the code above.
Just avoid flattening recurring containers. In the example below keepobj keeps track of them and keepcls ignores containers of a certain type. I believe this works down to python 2.3.
def flatten(item, keepcls=(), keepobj=()):
    if not hasattr(item, '__iter__') or isinstance(item, keepcls) or item in keepobj:
        yield item
    else:
        for i in item:
            for j in flatten(i, keepcls, keepobj + (item,)):
                yield j
It can flatten circular lists like lst = [1, 2, [5, 6, {'a': 1, 'b': 2}, 7, 'string'], [...]] and keep some containers like strings and dicts un-flattened.
>>> list(flatten(lst, keepcls=(dict, str)))
[1, 2, 5, 6, {'a': 1, 'b': 2}, 7, 'string', [1, 2, [5, 6, {'a': 1, 'b': 2}, 7, 'string'], [...]]]
It also works with the following case:
>>> list(flatten([[1,2],[1,[1,2]],[1,2]]))
[1, 2, 1, 1, 2, 1, 2]
You may want to keep some default classes in keepcls to make calling the function more terse.

Conditionally yield nothing in one line in python

I have generator like
def not_nones(some_iterable):
    for item in some_iterable:
        if item is not None:
            yield item
But since "flat is better than nested", I would like to do this in one line, like:
def not_nones(some_iterable):
    for item in some_iterable:
        yield item if item is not None else None
But this will actually make None an item of the generator.
Is it possible to yield nothing in a one-liner anyway?
You could just return a generator expression:
def not_nones(iterable):
    return (item for item in iterable if item is not None)
Or for a real one-liner:
not_nones = lambda it: (i for i in it if i is not None)
which at this point is getting more into code-golf territory.
But really, there's not much wrong with your current code; it does what it needs to do, in a reasonable way. Your code is what I would have written in this situation.
You could use itertools.ifilter() (Python 2 only; in Python 3 the built-in filter() is already lazy). Given the right predicate function, it provides exactly the functionality you are implementing here.
Example:
import itertools
# make up data
l = [1, None, 2, None, 3]
# predicate function
not_none = lambda x: x is not None
# filter out None values
not_nones = itertools.ifilter(not_none, l)
print list(not_nones) # prints [1, 2, 3]
For reference:
https://docs.python.org/2/library/itertools.html#itertools.ifilter
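A rough Python 3 counterpart, assuming only the standard built-in filter(), could look like the sketch below; the sample data is made up.

```python
def not_nones(iterable):
    """Lazily drop None items while keeping other falsy values such as 0 or ''."""
    return filter(lambda item: item is not None, iterable)

data = [1, None, 2, None, 3]
kept = list(not_nones(data))
falsy_kept = list(not_nones([0, None, '', None]))
print(kept, falsy_kept)
```

The explicit "is not None" predicate matters here: filter(None, iterable) would also drop 0 and empty strings.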

How can I get a Python generator to return None rather than StopIteration?

I am using generators to perform searches in lists like this simple example:
>>> a = [1,2,3,4]
>>> (i for i, v in enumerate(a) if v == 4).next()
3
(Just to frame the example a bit, I am using very much longer lists compared to the one above, and the entries are a little bit more complicated than int. I do it this way so the entire lists won't be traversed each time I search them)
Now if I instead changed that to v == 666, it would raise StopIteration because it can't find any 666 entry in a.
How can I make it return None instead? I could of course wrap it in a try ... except clause, but is there a more pythonic way to do it?
If you are using Python 2.6+ you should use the next built-in function, not the next method (which was replaced with __next__ in 3.x). The next built-in takes an optional default argument to return if the iterator is exhausted, instead of raising StopIteration:
next((i for i, v in enumerate(a) if v == 666), None)
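A runnable sketch of both outcomes, using the list from the question and Python 3 syntax; the variable names found and missing are illustrative.

```python
a = [1, 2, 3, 4]

# Index of the first entry equal to 4, or None if there is no such entry
found = next((i for i, v in enumerate(a) if v == 4), None)

# No entry equals 666, so the default is returned instead of raising StopIteration
missing = next((i for i, v in enumerate(a) if v == 666), None)

print(found, missing)
```

The generator is still consumed lazily, so the search stops at the first match.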
You can chain the generator with (None,):
from itertools import chain
a = [1,2,3,4]
print chain((i for i, v in enumerate(a) if v == 6), (None,)).next()
But I think a.index(2) does not traverse the full list either: once 2 is found, the search stops. You can test this:
>>> import timeit
>>> timeit.timeit("a.index(0)", "a=range(10)")
0.19335955439601094
>>> timeit.timeit("a.index(99)", "a=range(100)")
2.1938486138533335

Python: add a value to the end of the inner-most right nested list

What I'm trying to do, is, given a list with an arbitrary number of other nested lists, recursively descend through the last value in the nested lists until I've reached the maximum depth, and then append a value to that list. An example might make this clearer:
>>> nested_list1 = [1, 2, 3, [4, 5, 6]]
>>> last_inner_append(nested_list1, 7)
[1, 2, 3, [4, 5, 6, 7]]
>>> nested_list2 = [1, 2, [3, 4], 5, 6]
>>> last_inner_append(nested_list2, 7)
[1, 2, [3, 4], 5, 6, 7]
The following code works, but it seems excessively tricky to me:
def add_to_inner_last(nested, item):
    nest_levels = [nested]
    try:
        nest_levels.append(nested[-1])
    except IndexError: # The empty list case
        nested.append(item)
        return
    while type(nest_levels[-1]) == list:
        try:
            nest_levels.append(nest_levels[-1][-1])
        except IndexError: # The empty inner list case
            nest_levels[-1].append(item)
            return
    nest_levels[-2].append(item)
    return
Some things I like about it:
It works
It handles the cases of strings at the end of lists, and the cases of empty lists
Some things I don't like about it:
I have to check the type of objects, because strings are also indexable
The indexing system feels too magical--I won't be able to understand this tomorrow
It feels excessively clever to use the fact that appending to a referenced list affects all references
Some general questions I have about it:
At first I was worried that appending to nest_levels was space inefficient, but then I realized that this is probably just a reference, and a new object is not created, right?
This code is purely side effect producing (It always returns None). Should I be concerned about that?
Basically, while this code works (I think...), I'm wondering if there's a better way to do this. By better I mean clearer or more pythonic. Potentially something with more explicit recursion? I had trouble defining a stopping point or a way to do this without producing side effects.
Edit:
To be clear, this method also needs to handle:
>>> last_inner_append([1,[2,[3,[4]]]], 5)
[1,[2,[3,[4,5]]]]
and:
>>> last_inner_append([1,[2,[3,[4,[]]]]], 5)
[1,[2,[3,[4,[5]]]]]
How about this:
def last_inner_append(x, y):
    try:
        if isinstance(x[-1], list):
            last_inner_append(x[-1], y)
            return x
    except IndexError:
        pass
    x.append(y)
    return x
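The same descent can also be written without recursion. The following is a sketch of an iterative variant, checked against the examples in the question; it is not taken from any of the answers.

```python
def last_inner_append(nested, item):
    """Append item to the innermost list reached by following last elements."""
    target = nested
    # Keep descending while the current list is non-empty and ends in a list
    while target and isinstance(target[-1], list):
        target = target[-1]
    target.append(item)
    return nested

print(last_inner_append([1, 2, 3, [4, 5, 6]], 7))
print(last_inner_append([1, 2, [3, 4], 5, 6], 7))
print(last_inner_append([1, [2, [3, [4, []]]]], 5))
```

Like the recursive version, it mutates the list in place and returns the outer list for convenience.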
This function returns the deepest inner list:
def get_deepest_list(lst, depth = 0):
    deepest_list = lst
    max_depth = depth
    for li in lst:
        if type(li) == list:
            tmp_deepest_list, tmp_max_depth = get_deepest_list(li, depth + 1)
            if max_depth < tmp_max_depth: # change to <= to get the rightmost inner list
                max_depth = tmp_max_depth
                deepest_list = tmp_deepest_list
    return deepest_list, max_depth
And then use it as:
def add_to_deepest_inner(lst, item):
    inner_lst, depth = get_deepest_list(lst)
    inner_lst.append(item)
Here is my take:
def last_inner_append(cont, el):
    if type(cont) == list:
        if not len(cont) or type(cont[-1]) != list:
            cont.append(el)
        else:
            last_inner_append(cont[-1], el)
I think it's nice and clear, and passes all your tests.
It is also pure side-effect; if you want to change this, I suggest you go with BasicWolf's approach and create a 'selector' and an 'update' function, where the latter uses the former.
It's the same recursion scheme as Phil H's, but handles empty lists.
I don't think there is a good way around the two type tests, however you approach them (e.g. with 'type' or checking for 'append'...).
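You can test whether append is callable, rather than using try/except, and recurse:
def add_to_inner_last(nested, item):
    # Treat anything with a callable append() as a list-like container
    if callable(getattr(nested, 'append', None)):
        # Guard against empty lists before looking at the last element
        if nested and callable(getattr(nested[-1], 'append', None)):
            return add_to_inner_last(nested[-1], item)
        else:
            nested.append(item)
            return True
    else:
        return False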
It's slightly annoying to have to have two callable tests, but the alternative is to pass a reference to the parent as well as the child.
def last_inner_append(sequence, element):
    def helper(tmp, seq, elem=element):
        if type(seq) != list:
            tmp.append(elem)
        elif len(seq):
            helper(seq, seq[-1])
        else:
            seq.append(elem)
    helper(sequence, sequence)
