How to identify a generator vs list comprehension - python

I have this:
>>> sum( i*i for i in xrange(5))
My question is, in this case am I passing a list comprehension or a generator object to sum? How do I tell? Is there a general rule for this?
Also remember sum by itself needs a pair of parentheses to surround its arguments. I'd think that the parentheses above are for sum and not for creating a generator object. Wouldn't you agree?

You are passing in a generator expression.
A list comprehension is specified with square brackets ([...]). A list comprehension builds a list object first, so it uses syntax closely related to the list literal syntax:
list_literal = [1, 2, 3]
list_comprehension = [i for i in range(4) if i > 0]
A generator expression, on the other hand, creates an iterator object. Only when iterating over that object is the contained loop executed and are items produced. The generator expression does not retain those items; there is no list object being built.
A generator expression is always written with round parentheses (...), but when it is the only argument to a call, the extra parentheses can be omitted; the following two expressions are equivalent:
sum((i*i for i in xrange(5))) # with parentheses
sum(i*i for i in xrange(5)) # without parentheses around the generator
Quoting from the generator expression documentation:
The parentheses can be omitted on calls with only one argument. See section Calls for the detail.
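One way to check which kind of object you have is to inspect its type. The following is only a minimal sketch, assuming Python 3 (where range plays the role of xrange); the variable names are made up for illustration:

```python
import types

squares_list = [i * i for i in range(5)]   # square brackets: list comprehension
squares_gen = (i * i for i in range(5))    # parentheses: generator expression

print(type(squares_list).__name__)                      # list
print(isinstance(squares_gen, types.GeneratorType))     # True

# As the only argument, the call's own parentheses are enough.
total = sum(i * i for i in range(5))

# With a second argument, the generator expression needs its own parentheses;
# sum(i * i for i in range(5), 100) would be a SyntaxError.
total_with_start = sum((i * i for i in range(5)), 100)
```

So the parentheses in the question do double duty: they delimit the call and the generator expression at the same time.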

List comprehensions are enclosed in []:
>>> [i*i for i in xrange(5)] # list comprehension
[0, 1, 4, 9, 16]
>>> (i*i for i in xrange(5)) # generator
<generator object <genexpr> at 0x2cee40>
You are passing a generator.

That is a generator:
>>> (i*i for i in xrange(5))
<generator object <genexpr> at 0x01A27A08>
>>>
List comprehensions are enclosed in [].

You might also be asking, "does this syntax truly cause sum to consume a generator one item at a time, or does it secretly create a list of every item in the generator first"? One way to check this is to try it on a very large range and watch memory usage:
sum(i for i in xrange(int(1e8)))
Memory usage for this case is constant, whereas range(int(1e8)) creates the full list and consumes several hundred MB of RAM.
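A rough way to see the difference without watching a process monitor is sys.getsizeof. This is a sketch under the assumption of Python 3 and a smaller n than above so that it runs quickly; note that getsizeof reports only the container itself, not the integers it refers to:

```python
import sys

n = 100_000
as_list = [i for i in range(n)]   # every item exists up front
as_gen = (i for i in range(n))    # nothing has been computed yet

list_bytes = sys.getsizeof(as_list)   # grows with n
gen_bytes = sys.getsizeof(as_gen)     # small, independent of n

print(gen_bytes < list_bytes)   # True

# sum() pulls the items out of the generator one at a time.
total = sum(as_gen)
```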
You can test that the parentheses are optional:
def print_it(obj):
    print obj
print_it(i for i in xrange(5))
# prints <generator object <genexpr> at 0x03853C60>

I tried this:
#!/usr/bin/env python
class myclass:
    def __init__(self, arg):
        self.p = arg
        print type(self.p)
        print self.p
if __name__ == '__main__':
    c = myclass(i*i for i in xrange(5))
And this prints:
$ ./genexprorlistcomp.py
<type 'generator'>
<generator object <genexpr> at 0x7f5344c7cf00>
Which is consistent with what Martin and mdscruggs explained in their post.

You are passing a generator object; a list comprehension is surrounded by [].

Related

String join() method used with for loop

I need your help because I don't understand why it is possible to use the join() method with a for loop as an argument.
Ex:
" ".join(str(x) for x in list)
Python Documentation:
str.join(iterable)
Return a string which is the concatenation of the strings in iterable.
A TypeError will be raised if there are any non-string values in iterable, including bytes objects. The separator between elements is the string providing this method.
Can somebody please explain?
The expression (str(x) for x in list) is called a generator expression:
>>> (str(x) for x in [1,2,3])
<generator object <genexpr> at 0x7fc916f01d20>
What this does is create an object that can be iterated over exactly once and that produces its elements one at a time, as they are requested. You can iterate over it as you would a list, like this:
>>> gen = (str(x) for x in [1,2,3])
>>> for s in gen:
...     print s
...
1
2
3
A generator expression is iterable, so what the join function does is iterate over it and join its values.
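The "exactly once" part is easy to miss. A minimal sketch (assuming Python 3; the variable names are illustrative) showing that join() consumes the generator, so a second pass finds nothing left:

```python
numbers = [1, 2, 3]
gen = (str(x) for x in numbers)

first = " ".join(gen)    # join() iterates over the generator: "1 2 3"
second = " ".join(gen)   # the generator is now exhausted: ""

print(repr(first))
print(repr(second))
```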

<generator object <genexpr> at 0x11ad5dbf8> instead of word stems? [duplicate]

I got an error from a simple print statement. What could the error be? I have tried changing to float, but the error still persists.
if __name__ == '__main__':
    print (i*i for i in range(5))
error:
<generator object <genexpr> at 0x0000000002731828>
Thanks in advance...
In Python 3, print() is a function, not a statement.
A generator expression is like a list comprehension, except it creates an object that produces results when you iterate over it, not when you create it. For example,
[i*i for i in range(5)]
produces a list, [0, 1, 4, 9, 16], while
(i*i for i in range(5))
produces a generator object that will produce those numbers when you iterate over it.
If you give a function only one argument and it is a generator expression, you can omit the parentheses around the generator expression, so you do not have to do myfunc((i + 1 for i in something)).
So you are creating a generator object, and passing it to the print() function, which prints its representation. It’s doing exactly what you asked for, just not what you meant to ask for.
You can initialize a list from a generator expression:
print(list(i*i for i in range(5)))
but it is easier to use the list comprehension:
print([i*i for i in range(5)])
A simple example of how you might use the generator object is:
for value in (i * i for i in range(5)):
    print(value)
although in that simple example it would obviously be easier to write:
for i in range(5):
    print(i * i)
There is no error. I think you are simply trying to print a list. Use [] to get a list instead of a generator:
if __name__ == '__main__':
    print([i*i for i in range(5)])
Output:
[0, 1, 4, 9, 16]
To print on separate lines, you would do:
if __name__ == '__main__':
    print('\n'.join([str(i*i) for i in range(5)]))
This uses the 'delimiter'.join(list) approach to join all the elements of the list with the specified delimiter (in this case a newline: \n).
Output:
0
1
4
9
16
Or as @MartijnPieters suggested (for Python 3 only), you can also do:
print(*(i*i for i in range(5)), sep='\n')

What is a better pythonic version of this conditional deleting?

I am refreshing my Python (2.7) and I am discovering iterators and generators.
As I understand it, they are an efficient way of navigating over values without consuming too much memory.
The following code does a kind of logical indexing on a list:
it removes the values of a list L for which a condition, represented here by the function f, evaluates to False.
I am not satisfied with my code because I feel it is not optimal, for three reasons:
I read somewhere that it is better to use a for loop than a while loop.
However, in the usual for i in range(10), I can't modify the value of i because the iteration ignores the change.
Logical indexing is pretty strong in matrix-oriented languages, and there should be a way to do the same in Python (by hand, granted, but maybe better than my code).
The third reason is just that I want to use a generator/iterator on this example to help me understand.
TL;DR: Is this code a good Pythonic way to do logical indexing?
# f: string -> bool
def f(s):
    return 'c' in s
L = ['', 'a', 'ab', 'abc', 'abcd', 'abcde', 'abde']  # example
length = len(L)
i = 0
while i < length:
    if not f(L[i]):  # f is the condition (input string, output bool)
        del L[i]
        length -= 1  # cut and push leftwise
    else:
        i += 1
print 'Updated list is :', L
print length
This code has a few problems, but the main one is that you must never modify a list you're iterating over. Rather, you create a new list from the elements that match your condition. This can be done simply in a for loop:
newlist = []
for item in L:
    if f(item):
        newlist.append(item)
which can be shortened to a simple list comprehension:
newlist = [item for item in L if f(item)]
It looks like filter() is what you're after:
newlist = filter(f, L)
filter() filters an iterable and keeps only the items for which a predicate returns True. Your loop deletes the items for which f(...) is False, so f itself is the predicate.
If you instead wanted the items that your loop deletes, negate the predicate:
removed = filter(lambda x: not f(x), L)
See: https://docs.python.org/2/library/functions.html#filter
Never modify a list with del, pop or other methods that mutate the length of the list while iterating over it. Read this for more information.
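To see what goes wrong, here is a small sketch that reuses f and the example list from the question (the names broken and fixed are made up for illustration; the code runs on Python 2 and 3 alike). Removing an item shifts the later items left, so the loop silently skips the one that slides into the current position:

```python
def f(s):
    return 'c' in s

L = ['', 'a', 'ab', 'abc', 'abcd', 'abcde', 'abde']

broken = list(L)  # work on a copy so L stays intact
for item in broken:
    if not f(item):
        broken.remove(item)  # mutates the list being iterated over

# 'a' is never examined, so it survives even though f('a') is False.
print(broken)  # ['a', 'abc', 'abcd', 'abcde']

# Building a new list examines every item exactly once.
fixed = [item for item in L if f(item)]
print(fixed)   # ['abc', 'abcd', 'abcde']
```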
The "pythonic" way to filter a list is to use reassignment and either a list comprehension or the built-in filter function:
List comprehension:
>>> [item for item in L if f(item)]
['abc', 'abcd', 'abcde']
i want to use generator/iterator on this example to help me understand
The for item in L part is implicitly making use of the iterator protocol. Python lists are iterable, and iter(somelist) returns an iterator:
>>> from collections import Iterable, Iterator
>>> isinstance([], Iterable)
True
>>> isinstance([], Iterator)
False
>>> isinstance(iter([]), Iterator)
True
__iter__ is not only being called when using a traditional for-loop, but also when you use a list comprehension:
>>> class mylist(list):
...     def __iter__(self):
...         print('iter has been called')
...         return super(mylist, self).__iter__()
...
>>> m = mylist([1,2,3])
>>> [x for x in m]
iter has been called
[1, 2, 3]
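Roughly speaking, a for loop is shorthand for calling iter() once and then next() until StopIteration is raised. The following is a hand-written approximation of that protocol, not the interpreter's actual implementation; the names are illustrative:

```python
items = ['abc', 'abcd', 'abcde']

it = iter(items)   # what `for item in items` does first
seen = []
while True:
    try:
        item = next(it)       # fetch the next value
    except StopIteration:     # the iterator signals that it is exhausted
        break
    seen.append(item)         # the loop body

print(seen)            # ['abc', 'abcd', 'abcde']
print(iter(it) is it)  # True: an iterator returns itself from iter()
```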
Filtering:
>>> filter(f, L)
['abc', 'abcd', 'abcde']
In Python3, use list(filter(f, L)) to get a list.
Of course, to filter a list, Python needs to iterate over it, too:
>>> filter(None, mylist())
iter has been called
[]
"The python way" to do it would be to use a generator expression:
# list comprehension
L = [l for l in L if f(l)]
# alternative: generator expression
L = (l for l in L if f(l))
Whether a list or a generator is "better" depends on your context (see e.g. this SO question). Because your source data already comes from a list, there is no real benefit to using a generator here.
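One practical consequence of choosing the generator form is sketched below (f and the list come from the question; the other names are made up for illustration). A generator has no length and can only be consumed once, which matters if later code expects a real list:

```python
def f(s):
    return 'c' in s

L = ['', 'a', 'ab', 'abc', 'abcd', 'abcde', 'abde']

as_list = [s for s in L if f(s)]
as_gen = (s for s in L if f(s))

list_len = len(as_list)   # fine: 3

try:
    len(as_gen)           # generators do not support len()
    gen_has_len = True
except TypeError:
    gen_has_len = False

first_pass = list(as_gen)    # consumes the generator
second_pass = list(as_gen)   # nothing left the second time
```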
For simply deleting elements, especially if the original list is no longer needed, just iterate backwards:
Python 2.x:
for i in xrange(len(L) - 1, -1, -1):
    if not f(L[i]):
        del L[i]
Python 3.x:
for i in range(len(L) - 1, -1, -1):
    if not f(L[i]):
        del L[i]
By iterating from the end, the "next" index does not change after deletion and a for loop is possible. Note that you should use xrange in Python 2, or range in Python 3, to save memory*: both are lazy sequence objects (not generators, strictly speaking) that do not build a list of indices.
In cases where you must iterate forward, use your given solution above.
*Note that CPython 2's xrange only accepts values that fit in a C long, so it raises OverflowError for larger lengths (the exact limit depends on the platform). Python 3's range, as well as the less memory-efficient Python 2 range, does not have this limitation.
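The backward loop can be wrapped in a small helper. This is only a sketch: delete_in_place is a hypothetical name, not a standard function, and the code assumes Python 3's range. Unlike reassigning L to a new list, it mutates the existing list, so every other reference to it sees the change:

```python
def delete_in_place(items, keep):
    """Delete every element of items for which keep(element) is false."""
    # Walk from the last index down to 0 so that a deletion never shifts
    # an element that has not been examined yet.
    for i in range(len(items) - 1, -1, -1):
        if not keep(items[i]):
            del items[i]

L = ['', 'a', 'ab', 'abc', 'abcd', 'abcde', 'abde']
alias = L   # a second reference to the same list object

delete_in_place(L, lambda s: 'c' in s)

print(L)           # ['abc', 'abcd', 'abcde']
print(alias is L)  # True: the same object was modified
```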

How to use python generator expressions to create a oneliner to run a function multiple times and get a list output

I am wondering if there is a simple Pythonic way (maybe using generators) to run a function over each item in a list and get a list of the return values?
Example:
def square_it(x):
    return x*x
x_set = [0,1,2,3,4]
squared_set = square_it(x for x in x_set)
I notice that when I do a line by line debug on this, the object that gets passed into the function is a generator.
Because of this, I get an error:
TypeError: unsupported operand type(s) for *: 'generator' and 'generator'
I understand that this generator expression created a generator to be passed into the function, but I am wondering if there is a cool way to accomplish running the function multiple times only by specifying an iterable as the argument? (without modifying the function to expect an iterable).
It seems to me that this ability would be really useful to cut down on lines of code, because you would not need to create a loop to run the function and a variable to save the output in a list.
Thanks!
You want a list comprehension:
squared_set = [square_it(x) for x in x_set]
There's a builtin function, map(), for this common problem.
>>> map(square_it, x_set)
[0, 1, 4, 9, 16] # On Python 3, map() returns a lazy map object (an iterator) instead; wrap it in list() to get a list.
Alternatively, one can use a generator expression, which is memory-efficient but lazy (meaning the values will not be computed now, only when needed):
>>> (square_it(x) for x in x_set)
<generator object <genexpr> at ...>
Similarly, one can also use a list comprehension, which computes all the values upon creation, returning a list.
Additionally, here's a comparison of generator expressions and list comprehensions.
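For Python 3 specifically, the map() version could look like the sketch below (square_it and x_set come from the question; the name lazy is made up for illustration). Like a generator, the map object can only be consumed once:

```python
def square_it(x):
    return x * x

x_set = [0, 1, 2, 3, 4]

lazy = map(square_it, x_set)   # nothing has been computed yet
squared_set = list(lazy)       # forces evaluation and builds the list

print(squared_set)   # [0, 1, 4, 9, 16]
print(list(lazy))    # []: the map object is already exhausted
```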
You want to call the square_it function inside the generator, not on the generator.
squared_set = (square_it(x) for x in x_set)
As the other answers have suggested, I think it is best (most "pythonic") to call your function explicitly on each element, using a list or generator comprehension.
To actually answer the question though, you can wrap your function that operates over scalars with a function that sniffs the input and behaves differently depending on what it sees. For example:
>>> import types
>>> def scaler_over_generator(f):
...     def wrapper(x):
...         if isinstance(x, types.GeneratorType):
...             return [f(i) for i in x]
...         return f(x)
...     return wrapper
...
>>> def square_it(x):
...     return x * x
...
>>> square_it_maybe_over = scaler_over_generator(square_it)
>>> square_it_maybe_over(10)
100
>>> square_it_maybe_over(x for x in range(5))
[0, 1, 4, 9, 16]
I wouldn't use this idiom in my code, but it is possible to do.
You could also code it up with a decorator, like so:
>>> @scaler_over_generator
... def square_it(x):
...     return x * x
...
>>> square_it(x for x in range(5))
[0, 1, 4, 9, 16]
This works if you don't want or need a handle to the original function.
Note that there is a difference between a list comprehension, which returns a list:
squared_set = [square_it(x) for x in x_set]
and a generator expression, which returns a generator that you can iterate over:
squared_set = (square_it(x) for x in x_set)

Python `for` syntax: block code vs single line generator expressions

I'm familiar with the for loop in a block-code context, e.g.:
for c in "word":
    print c
I just came across some examples that use for differently. Rather than beginning with the for statement, they tag it onto the end of an expression (and don't involve an indented code block), e.g.:
sum(x*x for x in range(10))
Can anyone point me to some documentation that outlines this use of for? I've been able to find examples, but not explanations. All the for documentation I've been able to find describes the previous use (block-code example). I'm not even sure what to call this use, so I apologize if my question's title is unclear.
What you are pointing to is a generator expression in Python. Take a look at:
http://wiki.python.org/moin/Generators
http://www.python.org/dev/peps/pep-0255/
http://docs.python.org/whatsnew/2.5.html#pep-342-new-generator-features
See the documentation on generator expressions, which contains exactly the example you posted.
From the documentation:
Generators are a simple and powerful tool for creating iterators. They
are written like regular functions but use the yield statement
whenever they want to return data. Each time next() is called, the
generator resumes where it left-off (it remembers all the data values
and which statement was last executed)
Generator expressions are similar to list comprehensions, but they are written with parentheses instead of square brackets, and they are more memory efficient. They don't return the complete list of results at once; they return a generator object. Whenever you invoke next() on the generator object, it computes and returns the next value.
A list comprehension for the above code would look like:
[x * x for x in range(10)]
You can also add conditions to filter out results at the end of the for.
[x * x for x in range(10) if x % 2 != 0]
This returns a list of the squares of the odd numbers in the range 0 to 9: [1, 9, 25, 49, 81].
An example of a generator function that uses yield:
def city_generator():
    yield "Konstanz"
    yield "Zurich"
    yield "Schaffhausen"
    yield "Stuttgart"
>>> x = city_generator()
>>> next(x)
'Konstanz'
>>> next(x)
'Zurich'
>>> next(x)
'Schaffhausen'
>>> next(x)
'Stuttgart'
>>> next(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
So you see that every call to next() runs the generator up to its next yield, and once the generator is exhausted it raises StopIteration.
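To connect this back to the question: the one-line form behaves much like a small generator function written with yield. A sketch, where squares is a hypothetical helper name chosen for illustration:

```python
def squares(n):
    """Yield x*x for x in range(n), one value per next() call."""
    for x in range(n):
        yield x * x

# Both calls hand sum() a generator that produces the same values lazily.
total_from_function = sum(squares(10))
total_from_expression = sum(x * x for x in range(10))

print(total_from_function)     # 285
print(total_from_expression)   # 285
```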
Those are generator expressions, and they are related to list comprehensions.
List comprehensions allow for the easy creation of lists. For example, if you wanted to create a list of perfect squares you could do this:
>>> squares = []
>>> for x in range(10):
...     squares.append(x**2)
...
>>> squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
But instead you could use a list comprehension:
squares = [x**2 for x in range(10)]
Generator expressions are like list comprehensions, except they return a generator object instead of a list. You can iterate over this generator object much as you would over a list, but you don't have to store the whole list in memory at once, as you would if you built it with a list comprehension.
Documentation for generator expressions is here https://www.python.org/dev/peps/pep-0289/
The following code uses a generator expression:
list(x**2 for x in range(0,10))
Your specific example is called a generator expression. List comprehensions, dictionary comprehensions, and set comprehensions are similar in meaning (different result types, and generator expressions are lazy) and have the same syntax, modulo being inside other kinds of brackets, and in the case of a dict comprehension having expr1: expr2 instead of a single expression (x*x in your example).
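Side by side, the four forms look like this. It is a small illustrative sketch; the input values are arbitrary and chosen so that the set and dict results visibly differ from the list:

```python
values = [-2, -1, 0, 1, 2]

as_list = [x * x for x in values]      # list comprehension: [4, 1, 0, 1, 4]
as_set = {x * x for x in values}       # set comprehension: duplicates collapse
as_dict = {x: x * x for x in values}   # dict comprehension: expr1: expr2
as_gen = (x * x for x in values)       # generator expression: lazy

print(as_list)
print(sorted(as_set))   # [0, 1, 4]
print(as_dict[-2])      # 4
print(next(as_gen))     # 4, computed only when requested
```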
