How exactly does a generator comprehension work? - python

What does a generator comprehension do? How does it work? I couldn't find a tutorial about it.

Do you understand list comprehensions? If so, a generator expression is like a list comprehension, but instead of finding all the items you're interested in and packing them into a list, it waits and yields each item out of the expression, one by one.
>>> my_list = [1, 3, 5, 9, 2, 6]
>>> filtered_list = [item for item in my_list if item > 3]
>>> print(filtered_list)
[5, 9, 6]
>>> len(filtered_list)
3
>>> # compare to generator expression
...
>>> filtered_gen = (item for item in my_list if item > 3)
>>> print(filtered_gen) # notice it's a generator object
<generator object <genexpr> at 0x7f2ad75f89e0>
>>> len(filtered_gen) # So technically, it has no length
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: object of type 'generator' has no len()
>>> # We extract each item out individually. We'll do it manually first.
...
>>> next(filtered_gen)
5
>>> next(filtered_gen)
9
>>> next(filtered_gen)
6
>>> next(filtered_gen) # Should be all out of items and give an error
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>> # Yup, the generator is spent. No values for you!
...
>>> # Let's prove it gives the same results as our list comprehension
...
>>> filtered_gen = (item for item in my_list if item > 3)
>>> gen_to_list = list(filtered_gen)
>>> print(gen_to_list)
[5, 9, 6]
>>> filtered_list == gen_to_list
True
>>>
Because a generator expression only has to yield one item at a time, it can lead to big savings in memory usage. Generator expressions make the most sense in scenarios where you need to take one item at a time, do a lot of calculations based on that item, and then move on to the next item. If you need more than one value, you can also use a generator expression and grab a few at a time. If you need all the values before your program proceeds, use a list comprehension instead.
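For example, here is a minimal sketch (illustrative only) of the memory difference: summing a generator expression never materializes the full sequence of squares, while the list comprehension allocates the whole list first.
>>> total = sum(x * x for x in range(10_000_000))     # lazy: only one square exists in memory at a time
>>> total == sum([x * x for x in range(10_000_000)])  # same result, but this builds a 10-million-item list first
True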

A generator comprehension is the lazy version of a list comprehension.
It is just like a list comprehension, except that it returns an iterator instead of a list, i.e. an object with a next() method (__next__() in Python 3) that yields the next element.
If you are not familiar with list comprehensions, see here; for generators, see here.
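A quick sketch (Python 3 REPL, output illustrative) showing that a generator expression is itself an iterator:
>>> gen = (n * 2 for n in [1, 2, 3])
>>> iter(gen) is gen      # a generator is its own iterator
True
>>> next(gen)             # same as gen.__next__()
2
>>> list(gen)             # consumes whatever is left
[4, 6]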

A list/generator comprehension is a construct you can use to create a new list/generator from an existing iterable.
Let's say you want to generate the list of squares of each number from 1 to 10. You can do this in Python:
>>> [x**2 for x in range(1,11)]
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
Here, range(1,11) generates the list [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]; in Python 2, range returns an actual list, and the square brackets make this construct a list comprehension.
If I wanted to create a generator that does the same thing, I could do it like this:
>>> (x**2 for x in xrange(1,11))
<generator object at 0x7f0a79273488>
In Python 3, however, range is itself lazy (it returns a range object rather than a list), so whether you get a list or a generator depends only on the syntax you use (square brackets or round brackets).
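For example, in Python 3 (where xrange is gone) the two spellings differ only in the brackets; a quick illustrative sketch:
>>> [x**2 for x in range(1, 11)]      # list comprehension: builds the list immediately
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
>>> g = (x**2 for x in range(1, 11))  # generator expression: computes lazily
>>> list(g)
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]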

Generator comprehension is an easy way of creating generators with a certain structure. Let's say you want a generator that outputs, one by one, all the even numbers in your_list. If you create it in the function style, it would look like this:
def allEvens(L):
    for number in L:
        if number % 2 == 0:
            yield number

evens = allEvens(your_list)
You could achieve the same result with this generator comprehension expression:
evens = ( number for number in your_list if number % 2 == 0 )
In both cases, when you call next(evens) you get the next even number in your_list.
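A short sketch of that, assuming a small example list:
>>> your_list = [1, 4, 7, 8, 10]
>>> evens = (number for number in your_list if number % 2 == 0)
>>> next(evens)
4
>>> next(evens)
8
>>> next(evens)
10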

Generator comprehension is a way to create iterables that behave like a cursor moving over a resource. If you know MySQL or MongoDB cursors, you may be aware that the actual data is never loaded into memory all at once, but one row at a time. The cursor moves along, but there is always only one row/element in memory.
In short, with generator comprehensions you can easily create such cursors in Python.
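As a sketch of that cursor idea (the file name and its contents are hypothetical), a generator expression can filter a huge file one line at a time:
with open("huge_log.txt") as f:                        # hypothetical file
    error_lines = (line for line in f if "ERROR" in line)
    for line in error_lines:                           # only one line is held in memory at a time
        print(line.strip())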

Another example of a generator comprehension:
print('Generator comprehensions')

def sq_num(n):
    for num in (x**2 for x in range(n)):
        yield num

for x in sq_num(10):
    print(x)

Generators are much like lists; the minor difference is that with lists we get all the required items at once, whereas a generator yields the required items one at a time. So to get all the required items we have to iterate over the generator, for example with a for loop.
# to get all the even numbers in a given range
def allevens(n):
    for x in range(2, n):
        if x % 2 == 0:
            yield x

for x in allevens(10):
    print(x)

# output
2
4
6
8
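The same output can be produced with a one-line generator comprehension instead of the generator function, for example:
evens = (x for x in range(2, 10) if x % 2 == 0)
for x in evens:
    print(x)
# prints 2, 4, 6, 8 as above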

We can understand this as the generator version of a list comprehension. With a list comprehension we create a list using a one-liner; with a generator comprehension we create a generator with the same kind of one-liner.
They have the same syntax; just replace the [] (square brackets) with () (parentheses).
generator_composition_object = (num**3 for num in range(5))
print(generator_composition_object)
This prints the repr of a generator object (its type and memory address). We can also call next() on it, just like any other generator.
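For instance, continuing the example above, you could pull the cubes out one at a time (a sketch; the address printed by the repr will differ):
next(generator_composition_object)   # 0
next(generator_composition_object)   # 1
list(generator_composition_object)   # [8, 27, 64] -- the remaining values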

Related

Why complain that 'tuple' object does not support item assignment when extending a list in a tuple? [duplicate]

This question already has answers here:
a mutable type inside an immutable container
(3 answers)
Closed 6 years ago.
So I have this code:
tup = ([1,2,3],[7,8,9])
tup[0] += (4,5,6)
which generates this error:
TypeError: 'tuple' object does not support item assignment
While this code:
tup = ([1,2,3],[7,8,9])
try:
tup[0] += (4,5,6)
except TypeError:
print tup
prints this:
([1, 2, 3, 4, 5, 6], [7, 8, 9])
Is this behavior expected?
Note
I realize this is not a very common use case. However, while the error is expected, I did not expect the list change.
Yes it's expected.
A tuple cannot be changed. A tuple, like a list, is a structure that points to other objects. It doesn't care about what those objects are. They could be strings, numbers, tuples, lists, or other objects.
So doing anything to one of the objects contained in the tuple, including appending to that object if it's a list, isn't relevant to the semantics of the tuple.
(Imagine if you wrote a class that had methods on it that cause its internal state to change. You wouldn't expect it to be impossible to call those methods on an object based on where it's stored).
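A tiny made-up class (purely illustrative) makes the point concrete:
class Counter:
    """A minimal object whose method mutates its internal state."""
    def __init__(self):
        self.count = 0
    def increment(self):
        self.count += 1

pair = (Counter(), Counter())   # a tuple holding two mutable objects
pair[0].increment()             # allowed: we mutate the object, not the tuple
print(pair[0].count)            # 1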
Or another example:
>>> l1 = [1, 2, 3]
>>> l2 = [4, 5, 6]
>>> t = (l1, l2)
>>> l3 = [l1, l2]
>>> l3[1].append(7)
Two mutable lists referenced both by a list and by a tuple. Should I be able to do the last line? (Answer: yes.) If you think the answer is no, why not? Should t change the semantics of l3? (Answer: no.)
If you want an immutable object of sequential structures, it should be tuples all the way down.
Why does it error?
This example uses the in-place (augmented assignment) operator. From the operator module documentation:
Many operations have an "in-place" version. The following functions provide a more primitive access to in-place operators than the usual syntax does; for example, the statement x += y is equivalent to x = operator.iadd(x, y). Another way to put it is to say that z = operator.iadd(x, y) is equivalent to the compound statement z = x; z += y.
https://docs.python.org/2/library/operator.html
So this:
l = [1, 2, 3]
tup = (l,)
tup[0] += (4,5,6)
is equivalent to this:
l = [1, 2, 3]
tup = (l,)
x = tup[0]
x = x.__iadd__([4, 5, 6]) # like extend, but returns x instead of None
tup[0] = x
The __iadd__ line succeeds and modifies the first list, so the list has been changed; the __iadd__ call returns the mutated list.
The last line, tup[0] = x, then tries to assign that list back into the tuple, and this fails.
So, at the end of the program, the list has been extended but the second part of the += operation failed. For the specifics, see this question.
Well I guess tup[0] += (4, 5, 6) is translated to:
tup[0] = tup[0].__iadd__((4,5,6))
tup[0].__iadd__((4,5,6)) is executed normally, changing the list in the first element. But the subsequent assignment fails since tuples are immutable.
Tuples cannot be changed directly, correct. Yet, you may change a tuple's element by reference. Like:
>>> tup = ([1,2,3],[7,8,9])
>>> l = tup[0]
>>> l += (4,5,6)
>>> tup
([1, 2, 3, 4, 5, 6], [7, 8, 9])
The Python developers wrote an official explanation about why it happens here: https://docs.python.org/2/faq/programming.html#why-does-a-tuple-i-item-raise-an-exception-when-the-addition-works
The short version is that += actually does two things, one right after the other:
Run the expression on the right.
Assign the result to the variable on the left.
In this case, step 1 works because you’re allowed to add stuff to lists (they’re mutable), but step 2 fails because you can’t put stuff into tuples after creating them (tuples are immutable).
In a real program, I would suggest you avoid the try/except approach, because tup[0].extend([4,5,6]) does the exact same thing without raising.
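For example (a sketch using the same tuple as above), extend mutates the list without ever assigning back into the tuple, so no exception is raised:
tup = ([1, 2, 3], [7, 8, 9])
tup[0].extend([4, 5, 6])   # mutates the list in place; no tuple item assignment happens
print(tup)                 # ([1, 2, 3, 4, 5, 6], [7, 8, 9])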

sort(), sorted(): where to put sort? [duplicate]

list.sort() sorts the list and replaces the original list, whereas sorted(list) returns a sorted copy of the list, without changing the original list.
When is one preferred over the other?
Which is more efficient? By how much?
Can a list be reverted to the unsorted state after list.sort() has been performed?
sorted() returns a new sorted list, leaving the original list unaffected. list.sort() sorts the list in-place, mutating it, and returns None (like all in-place operations).
sorted() works on any iterable, not just lists. Strings, tuples, dictionaries (you'll get the keys), generators, etc., returning a list containing all elements, sorted.
Use list.sort() when you want to mutate the list, sorted() when you want a new sorted object back. Use sorted() when you want to sort something that is an iterable, not a list yet.
For lists, list.sort() is faster than sorted() because it doesn't have to create a copy. For any other iterable, you have no choice.
No, you cannot retrieve the original positions. Once you have called list.sort(), the original order is gone.
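For example (a small sketch of sorted() on non-list iterables, as mentioned above):
>>> sorted("bca")                     # a string is an iterable of characters
['a', 'b', 'c']
>>> sorted({"b": 1, "a": 2})          # iterating a dict yields its keys
['a', 'b']
>>> sorted(x * x for x in (3, 1, 2))  # works on a generator expression too
[1, 4, 9]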
What is the difference between sorted(list) vs list.sort()?
list.sort mutates the list in-place & returns None
sorted takes any iterable & returns a new list, sorted.
sorted is equivalent to this Python implementation, but the CPython builtin function should run measurably faster as it is written in C:
def sorted(iterable, key=None):
    new_list = list(iterable)  # make a new list
    new_list.sort(key=key)     # sort it
    return new_list            # return it
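For example, both accept the same key argument; a brief sketch:
words = ["banana", "Apple", "cherry"]
print(sorted(words, key=str.lower))   # ['Apple', 'banana', 'cherry'], words unchanged
words.sort(key=str.lower)             # same ordering, in place, returns None
print(words)                          # ['Apple', 'banana', 'cherry']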
When to use which?
Use list.sort when you do not wish to retain the original sort order (thus you will be able to reuse the list in-place in memory) and when you are the sole owner of the list (if the list is shared by other code and you mutate it, you could introduce bugs where that list is used).
Use sorted when you want to retain the original sort order or when you wish to create a new list that only your local code owns.
Can a list's original positions be retrieved after list.sort()?
No - unless you made a copy yourself, that information is lost because the sort is done in-place.
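If you will need the original order later, take a copy before sorting; a minimal sketch:
data = [3, 1, 2]
original = data[:]   # shallow copy keeps the unsorted order
data.sort()
print(data)          # [1, 2, 3]
print(original)      # [3, 1, 2]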
"And which is faster? And how much faster?"
To illustrate the penalty of creating a new list, use the timeit module, here's our setup:
import timeit
setup = """
import random
lists = [list(range(10000)) for _ in range(1000)] # list of lists
for l in lists:
    random.shuffle(l) # shuffle each list
shuffled_iter = iter(lists) # wrap as iterator so next() yields one at a time
"""
And here are our results for lists of 10,000 randomly arranged integers; as we can see, they contradict the older claim that creating the new list is a significant expense:
Python 2.7
>>> timeit.repeat("next(shuffled_iter).sort()", setup=setup, number = 1000)
[3.75168503401801, 3.7473005310166627, 3.753129180986434]
>>> timeit.repeat("sorted(next(shuffled_iter))", setup=setup, number = 1000)
[3.702025591977872, 3.709248117986135, 3.71071034099441]
Python 3
>>> timeit.repeat("next(shuffled_iter).sort()", setup=setup, number = 1000)
[2.797430992126465, 2.796825885772705, 2.7744789123535156]
>>> timeit.repeat("sorted(next(shuffled_iter))", setup=setup, number = 1000)
[2.675589084625244, 2.8019039630889893, 2.849375009536743]
After some feedback, I decided another test with different characteristics would be desirable. Here I provide the same randomly ordered list of length 100,000 for each iteration, repeated 1,000 times.
import timeit
setup = """
import random
random.seed(0)
lst = list(range(100000))
random.shuffle(lst)
"""
I interpret the difference in this larger sort as coming from the copying mentioned by Martijn, but it does not dominate to the degree stated in the older, more popular answer here; the increase in time is only about 10%:
>>> timeit.repeat("lst[:].sort()", setup=setup, number = 10000)
[572.919036605, 573.1384446719999, 568.5923951]
>>> timeit.repeat("sorted(lst[:])", setup=setup, number = 10000)
[647.0584738299999, 653.4040515829997, 657.9457361929999]
I also ran the above on a much smaller sort, and saw that the new sorted copy version still takes about 2% longer running time on a sort of 1000 length.
Poke ran his own code as well; here's the code (with the timeit import added so it runs standalone):
from timeit import timeit

setup = '''
import random
random.seed(12122353453462456)
lst = list(range({length}))
random.shuffle(lst)
lists = [lst[:] for _ in range({repeats})]
it = iter(lists)
'''
t1 = 'l = next(it); l.sort()'
t2 = 'l = next(it); sorted(l)'
length = 10 ** 7
repeats = 10 ** 2
print(length, repeats)
for t in t1, t2:
    print(t)
    print(timeit(t, setup=setup.format(length=length, repeats=repeats), number=repeats))
He found a similar result for a sort of length 10,000,000 (run 100 times), with only about a 5% increase in time; here's the output:
10000000 100
l = next(it); l.sort()
610.5015971539542
l = next(it); sorted(l)
646.7786222379655
Conclusion:
For a large list, the copy that sorted makes accounts for most of the difference, but the sort itself still dominates the operation, so organizing your code around these differences would be premature optimization. I would use sorted when I need a new sorted list of the data, and list.sort when I need to sort a list in-place, and let that determine my usage.
The main difference is that sorted(some_list) returns a new list:
a = [3, 2, 1]
print sorted(a) # new list
print a # is not modified
and some_list.sort(), sorts the list in place:
a = [3, 2, 1]
print a.sort() # in place
print a # it's modified
Note that since a.sort() doesn't return anything, print a.sort() will print None.
Can a list original positions be retrieved after list.sort()?
No, because it modifies the original list.
Here are a few simple examples to see the difference in action:
See the list of numbers here:
nums = [1, 9, -3, 4, 8, 5, 7, 14]
When calling sorted on this list, sorted will make a copy of the list. (Meaning your original list will remain unchanged.)
Let's see.
sorted(nums)
returns
[-3, 1, 4, 5, 7, 8, 9, 14]
Looking at the nums again
nums
We see the original list (unaltered and NOT sorted); sorted did not change the original list:
[1, 9, -3, 4, 8, 5, 7, 14]
Taking the same nums list and applying the sort method to it will change the actual list.
Let's see.
Starting with our nums list to make sure the content is still the same:
nums
[1, 9, -3, 4, 8, 5, 7, 14]
nums.sort()
Now the original nums list is changed; looking at nums, we see our original list has changed and is now sorted:
nums
[-3, 1, 4, 5, 7, 8, 9, 14]
Note: The simplest difference between sort() and sorted() is: sort() doesn't return any value, while sorted() returns a new list.
sort() doesn't return any value.
The sort() method just sorts the elements of a given list in a specific order - Ascending or Descending without returning any value.
The syntax of sort() method is:
list.sort(key=..., reverse=...)
Alternatively, you can also use Python's built-in function sorted() for the same purpose; the sorted function returns a new sorted list:
list = sorted(list, key=..., reverse=...)
The .sort() method stores the sorted result directly back in the list variable, so the answer to your third question is NO: the original order cannot be recovered.
If you use sorted(list) instead, the result is not stored back into the list variable, so you keep the original; you have to store the value of sorted(list) in a variable explicitly.
For short lists the speed difference is negligible, but for long lists .sort() is the faster choice because it avoids the copy; the trade-off is that the in-place sort is irreversible.
With list.sort() you are altering the list variable but with sorted(list) you are not altering the variable.
Using sort:
list = [4, 5, 20, 1, 3, 2]
list.sort()
print(list)
print(type(list))
print(type(list.sort()))
Should return this:
[1, 2, 3, 4, 5, 20]
<class 'list'>
<class 'NoneType'>
But using sorted():
list = [4, 5, 20, 1, 3, 2]
print(sorted(list))
print(list)
print(type(sorted(list)))
Should return this:
[1, 2, 3, 4, 5, 20]
[4, 5, 20, 1, 3, 2]
<class 'list'>

For as expression

The for operator can also be used as an expression, like in
print(c for c in iter)
The python language reference makes no mention of this, or at least I could not find it.
Is the value of this expression well defined, and is there a point in using it?
EDIT: I wrote this from my smartphone, but now that I'm back at the code I saw this in, I noticed an error as pointed out in the comments; I have added the c in front of for.
This form is called a generator expression; it evaluates to a generator object. From here:
Generator functions allow you to declare a function that behaves like an iterator, i.e. it can be used in a for loop.
Also
Generator expressions provide an additional shortcut to build generators out of expressions similar to that of list comprehensions. In fact, we can turn a list comprehension into a generator expression by replacing the square brackets ("[ ]") with parentheses. Alternately, we can think of list comprehensions as generator expressions wrapped in a list constructor.
example:
>>> (x for x in range(10))
<generator object <genexpr> at 0x0000014DEA749E08>
You can use the * (iterable unpacking) operator in the call to print all the values of an iterator.
Code:
my_list = [1, 2, 3, 4, 5]
print(*my_list, sep="\n")
Output:
>>> python3 test.py
1
2
3
4
5
Or you can build more complex expressions with the * character and a list comprehension.
Code:
my_list = [1, 2, 3, 4, 5]
print(*[x for x in my_list if x > 2], sep="\n")
Output:
>>> python3 test.py
3
4
5
NOTE:
You can print the generator object as well, like this:
Code:
my_list = [1, 2, 3, 4, 5]
print(x for x in my_list)
Output:
>>> python3 test.py
<generator object <genexpr> at 0x7f614ded7d58>

printing items in a list represented by bit list

I have this problem of writing a Python function which takes a bit list as input and prints the items represented by this bit list.
The question is about Knapsack, and it is relatively simple and straightforward, as I'm new to the Python language too.
Technically, the items can be named in a list [1,2,3,4], which corresponds to Type 1, Type 2, Type 3 and so on, but we won't be needing the "type". I represented the solution as a bit list [0,1,1,1], where 0 means not taken and 1 means taken; in other words, the item of type 1 is not taken but the rest are taken.
Now we are required to write a Python function which takes the bit list as input and prints the items corresponding to it; in this case the function should print out [2,3,4], leaving out the 1 since its bit is 0. Any help on this? It is a 2-mark question, but I still couldn't figure it out.
def printItems(l):
    for x in range(len(l)):
        if x == 0:
            return False
        elif x == 1:
            return l
I tried something like that, but it is wrong. Any help is much appreciated.
You can do this with the zip function, which takes two iterables and returns their elements in pairs:
for bit_item, item in zip(bit_list, item_list):
    if bit_item:
        print(item)
Or if you need a list rather than printing them, you can use a list comprehension:
[item for bit_item, item in zip(bit_list, item_list) if bit_item]
You can use itertools.compress for a quick solution:
>>> import itertools
>>> list(itertools.compress(itertools.count(1), [0, 1, 1, 1]))
[2, 3, 4]
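The same itertools.compress approach works with an explicit item list instead of itertools.count (the item names here are made up):
>>> items = ['apple', 'banana', 'cherry', 'date']
>>> list(itertools.compress(items, [0, 1, 1, 1]))
['banana', 'cherry', 'date']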
The reason your solution doesn't work is that you are using return in your function where you need to use print, and you are testing the loop index x rather than the element l[x]; make sure you are iterating over your list correctly. In this case, enumerate simplifies things, but there are many similar approaches that would work:
>>> def print_items(l):
... for i,b in enumerate(l,1):
... if b:
... print(i)
...
>>> print_items([0,1,1,1])
2
3
4
>>>
You may do it using list comprehension with enumerate() as:
>>> my_list = [0, 1, 1, 1]
>>> taken_list = [i for i, item in enumerate(my_list, 1) if item]
>>> taken_list # by default start with 0 ^
[2, 3, 4]
Alternatively, in case you do not want to use any built-in function and want to create your own, you may modify your code as:
def printItems(l):
    new_list = []
    for x in range(len(l)):
        if l[x] == 1:
            new_list.append(x + 1)  # "x+1" because the index starts at 0 and you need the position
    return new_list
Sample run:
>>> printItems([0, 1, 1, 1])
[2, 3, 4]
