Why is next() sometimes called implicitly on a generator?

I am just learning generators in Python. It seems that if you assign a generator to a tuple, then next() is called silently behind the scenes, as if the unpacking were forcing that call. But if you assign to a tuple with a single value, you get the generator object itself. Do I get it right?
Trivial code follows:
def generator(n):
    x = 0
    while x < n:
        yield x
        x = x + 1
(x,*foo) = generator(1)
print(x, foo)
(x,*foo) = generator(3)
print(x, foo)
(x) = generator(1)
print(x)
Output is:
0 []
0 [1, 2]
<generator object generator at 0x05F06900>

The syntax
(x) = generator(1)
is not a tuple of one item. You want:
(x,) = generator(1)
or
x, = generator(1)
Then you'll find the generator is called, just as in your other examples, due to "unpacking".
Note that in the expression (x, y) it is not the () that makes it a tuple, it is the comma. The parentheses simply group the expression x, y.

The first two assignments use tuple packing/unpacking: they pull all values from the generator and assign them to x and foo. This is because (x, *foo) is syntax for a tuple.
However, (x) is just syntax for a variable. You would get what you expected if you wrote (x,), which is how tuples of size 1 are created.
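A minimal sketch contrasting the two cases, reusing the question's generator:
def generator(n):
    x = 0
    while x < n:
        yield x
        x = x + 1

(x,) = generator(1)  # unpacking: the generator is consumed
print(x)             # 0

(x) = generator(1)   # plain assignment: the parens are just grouping
print(x)             # <generator object generator at 0x...>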

What is the difference between return x,y and return (x,y) in a function?

I am watching Introduction to Computer Science and Programming in Python from MIT OpenCourseWare. In Lecture 5, which introduces tuples, the professor says that you are only allowed to return one object from a function, and then says that tuples are handy for returning more than one value. (For anyone who wants to watch this short part of the lecture, it starts at 06:15.) The question is regarding the MIT lecture:
Yes, we can use tuples for returning and collecting a couple of values. On the other hand, a function in Python can apparently already return more than one value by separating them with commas, so I am confused by what the lecturer said. Does returning 2 or more values separated by commas mean that these objects become items of a tuple, even though it is not declared explicitly? What am I missing? In other words, what is the difference between separating with a comma, as in x,y, and using parentheses, as in (x,y)? You can explain using the function below. Thanks.
def smth(x, y):
    x = y % 2
    y = x % 5
    return x, y
Yes, they become a tuple.
If you run the following code you will find that the type is tuple:
def smth(x, y):
    x = y % 2
    y = x % 5
    return x, y

x = smth(10, 20)
print(x)
print(type(x))
Your return statement return x, y is actually just a shortcut for return (x, y).
The Python interpreter treats them identically.
>>> def smth(x, y):
...     x = y % 2
...     y = x % 2
...     return x, y
...
>>> smth(1, 2)
(0, 0)
>>> type(smth(1, 2))
<class 'tuple'>
The automatic unpacking of the return value works the same way:
a, b = smth(1, 2) and (a, b) = smth(1, 2) are equivalent code.
However, the first one is much more readable.
A concrete explanation of why those two statements are equal can be found in this answer to a similar question.
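If you want to verify the equivalence yourself, the dis module can show that both assignment forms compile to the same bytecode (a sketch; the exact opcodes vary across Python versions):
import dis

# smth is only compiled against here, never actually called
dis.dis(compile("a, b = smth(1, 2)", "<test>", "exec"))
dis.dis(compile("(a, b) = smth(1, 2)", "<test>", "exec"))
# both disassemblies list the same instructions, ending in
# UNPACK_SEQUENCE followed by two stores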
The function you wrote is, in fact, returning a tuple. Parentheses are only required for the empty tuple. See this spot in the docs:
6.2.3. Parenthesized forms
If you define a function like you mentioned, with
return x, y
or
return (x, y)
the Python interpreter will recognize them the same way:
type(smth(21,11))
will be
<class 'tuple'>
By separating them with a comma, a tuple is created. However, you can't create an empty tuple this way; for that you need parentheses.
Note that when you print a tuple, Python will output it surrounded by parens whether you created it using parens or not:
>>> mytuple = (1, 2)
>>> print(mytuple)
(1, 2)
>>> mytuple = ()
>>> print(mytuple)
()
>>> mytuple = 1,2
>>> print(mytuple)
(1, 2)
>>> mytuple = 1,
>>> print(mytuple)
(1,)
For this reason, I personally prefer to use parens in most places: it's just more obvious and consistent. But others may disagree. In any case, it's not always required.
So, in your code, whether you use:
return x, y
or
return (x, y)
...the result is that a tuple with two elements will be returned.
When calling your function, you can unpack the tuple or keep it as-is:
With unpacking:
x_value, y_value = smth(x, y)
Without unpacking:
xy_tuple = smth(x, y)
x_value = xy_tuple[0]
y_value = xy_tuple[1]

Why does the UnboundLocalError occur on the second variable of the flat comprehension?

I answered a question here: "comprehension list in python2 works fine but i get an error in python3".
OP's error was using the same variables for the range bounds and the loop indices:
x = 12
y = 10
z = 12
n = 100
ret_list = [ (x,y,z) for x in range(x+1) for y in range(y+1) for z in range(z+1) if x+y+z!=n ]
This is a Python 3-only error, related to the scope that was added to comprehensions to keep the variables defined there from "leaking". Changing the variable names fixes it.
The error is:
UnboundLocalError: local variable 'y' referenced before assignment
because the outer, global y is shadowed by the comprehension's local scope.
My question is: why do I get the error on y and not on z or x ?
EDIT: If I remove the loop on x, the error moves to z:
>>> ret_list = [ (x,y,z) for y in range(y+1) for z in range(z+1) if x+y+z!=n ]
UnboundLocalError: local variable 'z' referenced before assignment
If I just do one loop:
ret_list = [ (x,y,z) for y in range(y+1) if x+y+z!=n ]
it works. So I'm suspecting that the first range function is evaluated before all the other expressions, which leaves the value of x intact. But the exact reason is still to be found. Using Python 3.4.3.
This behaviour is (implicitly) described in the reference documentation (emphasis mine).
However, aside from the iterable expression in the leftmost for clause, the comprehension is executed in a separate implicitly nested scope. This ensures that names assigned to in the target list don’t “leak” into the enclosing scope.
The iterable expression in the leftmost for clause is evaluated directly in the enclosing scope and then passed as an argument to the implictly [sic] nested scope. Subsequent for clauses and any filter condition in the leftmost for clause cannot be evaluated in the enclosing scope as they may depend on the values obtained from the leftmost iterable. For example: [x*y for x in range(10) for y in range(x, x+10)].
This means that:
list_ = [(x, y) for x in range(x) for y in range(y)]
is equivalent to:
def f(iter_):
    for x in iter_:
        for y in range(y):
            yield x, y

list_ = list(f(iter(range(x))))
As the name x in the leftmost iterable is read in the enclosing scope, as opposed to the nested scope, there is no name conflict between these two uses of x. The same is not true for y, which is why that is where the UnboundLocalError occurs.
As to why this happens: a list comprehension is more-or-less syntactic sugar for list(<generator expression>), so it will use the same code path as a generator expression (or at least behave in the same way). Generator expressions evaluate the iterable expression in the leftmost for clause eagerly to make error handling somewhat saner when the generator expression is created. Consider the following code:
y = None # line 1
gen = (x + 1 for x in range(y + 1)) # line 2
item = next(gen) # line 3
y is clearly the wrong type, and so the addition will raise a TypeError. By evaluating range(y + 1) immediately, that type error is raised on line 2 rather than line 3. Thus, it is easier to diagnose where and why the problem occurred. Had it occurred on line 3, you might mistakenly assume that it was the x + 1 expression that caused the error.
There is a bug report here that mentions this behaviour. It was resolved as "not a bug", on the grounds that it is desirable for list comprehensions and generator expressions to have the same behaviour.
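For completeness, here is a sketch of the rename fix mentioned in the question, using fresh loop names so that nothing shadows the enclosing variables:
x, y, z, n = 12, 10, 12, 100
ret_list = [(i, j, k)
            for i in range(x + 1)
            for j in range(y + 1)
            for k in range(z + 1)
            if i + j + k != n]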

Using *args in a for loop doesn't take into account changing variables

I have long loops for deduping names, and I was hoping to simplify things by using a function rather than repeating all the carry-on effects of a match each time. I'm starting with the following test case, but it doesn't work like I expect:
x, y = 0, 0

def testFunc(*args):
    global x, y
    for arg in args:
        if arg:
            x += 1
        else:
            y += 1
    return (x, y)
When I run it:
>>> testFunc(x==y, x==y, x==y)
(3, 0)
>>> testFunc(x==y)
(3, 1)
>>> testFunc(x!=y, x!=y, x!=y)
(6, 1)
Basically, the arguments seem to be converted to booleans before any operation happens. Is there a way to avoid that? I would have expected:
>>> testFunc(x==y, x==y, x==y)
(2, 1)
>>> testFunc(x==y)
(2, 2)
>>> testFunc(x!=y, x!=y, x!=y)
(3, 4)
Basically, the arguments seem to be converted to booleans before any operation happens.
Python first evaluates the arguments before calling the function, so this is the expected behaviour. The equality of two ints is a boolean, so it first evaluates x == y three times, each time yielding the same result, and then calls the function as testFunc(True, True, True).
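You can observe this eager evaluation directly; a minimal sketch:
x, y = 0, 0
args = (x == y, x == y, x == y)  # the comparisons happen here, before any call
print(args)                      # (True, True, True)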
Is there a way to avoid that?
You can make each argument a callable, and thus postpone evaluation, with:
def testFunc(*args):
    global x, y
    for arg in args:
        if arg():  # we call arg
            x += 1
        else:
            y += 1
    return (x, y)
and then calling it with:
>>> eq = lambda: x == y
>>> neq = lambda: x != y
>>> testFunc(eq, eq, eq)
(2, 1)
>>> testFunc(eq)
(2, 2)
>>> testFunc(neq, neq, neq)
(3, 4)
Here we thus do not pass the result of x == y; we pass a function that, when called, calculates the result of x == y. As a result, the values of x and y are looked up at the moment the call is made.
The expression x==y tests for equality and will only yield a boolean. To assign to your variable, just use one = sign.
For example:
1==3 yields False, therefore x==y will do the same unless x and y are equal
testFunc(x=y) should solve your problem. Furthermore, *args implies a list of arguments:
def myFunc(*args):
    for arg in args:
        print(arg)
This will take myFunc(*[1,2,3,4,5,6,7]) and print each member of that list. Note that you need the *<list> syntax; otherwise myFunc(<list>) will receive the list as its first and only argument. The * unpacks the values from the list.
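For instance, a small sketch of the unpacking behaviour just described:
def myFunc(*args):
    for arg in args:
        print(arg)

myFunc(*[1, 2, 3])  # unpacked: prints 1, 2 and 3 on separate lines
myFunc([1, 2, 3])   # not unpacked: prints the whole list as one argument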

TypeError: 'int' object is not iterable when using reduce

I am trying to learn Python's reduce function.
This is some code that does not make sense to me:
>>> x = [1,2,3]
>>> reduce(sum, [b for b in x if b > 0])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'int' object is not iterable
>>> reduce(sum, [b for b in x if b > 2])
3
Why does it work when b > 2 but not when b > 0?
The code seems almost exactly the same.
In the second case, the list you created with the list comprehension has only one element, so there is nothing to reduce: the function sum is never executed, and therefore never fails. The result is simply the single integer that was already in the list.
The function sum is described in the documentation. As you can see, it is supposed to sum values in some iterable container, such as a list or tuple. You could write sum([1,2,3]) and it would do exactly what you want this reduction to do. If you want to achieve this result using reduce, you have to supply a function that takes two integers and returns their sum.
Here is such a function:
def sum2(x, y):
    return x + y
reduce with such a function will give you the expected result. The same function can be written more compactly using lambda notation, which might be nice to look at sometime in the future if you don't know it yet:
sum2 = lambda x, y: x + y
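For instance, a quick check (a sketch; note that in Python 3 reduce has to be imported from functools):
from functools import reduce

def sum2(x, y):
    return x + y

x = [1, 2, 3]
print(reduce(sum2, [b for b in x if b > 0]))  # 6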
From the docs about reduce:
Apply function of two arguments cumulatively to the items of iterable,
from left to right, so as to reduce the iterable to a single value
So in the first case, you have used reduce wrongly by passing the builtin sum, which does not take two arguments but an iterable. sum consumes the iterable in the first iteration of reduce and returns an int, which is not further reducible.
In the second case, sum is not even called on the iterable:
If initializer is not given and iterable contains only one item, the
first item is returned.
Since the iterable is of length 1, reduce returns the one item in it and sum is never called.
The input in the second code snippet is a single-element list: it's a degenerate case where the callable isn't even invoked.
Consider following example:
def my_sum(*args):
    print "my_sum called with", args
    return sum(*args)

x = [1, 2, 3]
reduce(my_sum, [b for b in x if b > 2])
# nothing gets printed; the value 3 is returned, since it's the only element in the list
Now consider your failed case:
reduce(my_sum, [b for b in x if b > 0])
Output is:
my_sum called with (1, 2)
[exception traceback]
The equivalent call to the builtin sum is sum(1, 2), which results in the same exception you quoted.
sum's signature does not follow the rules for a reduce function: sum(iterable[, start]) does take two arguments, as expected, but the first of them has to be an iterable, and the second is an optional initial value.
reduce requires function to take two arguments, where (quoting docs):
The left argument, x, is the accumulated value and the right argument,
y, is the update value from the iterable.
We can clearly see that these interfaces do not conform to each other. A correct callable would be something like:
def reduce_sum(accumulator, new_value):
    return accumulator + new_value

reduce(reduce_sum, [b for b in x if b > 0])
As Moses points out, "sum consumes the iterable in the first iteration", so let's play with it:
>>> sum(1,2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'int' object is not iterable
>>> sum([1,2])
3
>>> sum(x)
6
>>> sum([b for b in x if b >0])
6
>>> reduce(lambda x, y: x+y, x)
6
>>> reduce(lambda x, y: x+y, [b for b in x if b > 0])
6
>>> reduce(lambda x, y: x+y, [b for b in x if b > 1])
5
The reduce function takes in a function and an iterable.
It then applies the function cumulatively to the elements of that iterable.
What your first b > 0 version is doing is trying to run sum with 2 arguments: one being the accumulated total, and the other being an element of the input list. For example, the first call would look like sum(1, 2).
This is stated in the documentation as:
Apply function of two arguments cumulatively to the items of iterable, from left to right, so as to reduce the iterable to a single value. ... The left argument, is the accumulated value and the right argument, is the update value from the iterable.
What your second b > 2 version is doing is simply running reduce with an iterable containing one item, so it does not attempt to apply sum at all, as stated in the documentation:
If initializer is not given and iterable contains only one item, the first item is returned.
But since sum takes in an iterable, you could just do:
sum([b for b in x if b > 0])
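As a side note, not taken from the answers above: the standard library already ships a two-argument addition function that fits reduce's interface, operator.add:
from functools import reduce
from operator import add

x = [1, 2, 3]
print(reduce(add, [b for b in x if b > 0]))  # 6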

Lazy evaluation in Python

What is lazy evaluation in Python?
One website said:
In Python 3.x the range() function returns a special range object which computes elements of the list on demand (lazy or deferred evaluation):
>>> r = range(10)
>>> print(r)
range(0, 10)
>>> print(r[3])
3
What is meant by this?
The object returned by range() (or xrange() in Python 2.x) is known as a lazy iterable.
Instead of storing the entire range [0, 1, 2, ..., 9] in memory, it stores a definition for (i = 0; i < 10; i += 1) and computes the next value only when needed (a.k.a. lazy evaluation).
Essentially, a generator allows you to return a list-like structure, but here are some differences:
A list stores all elements when it is created. A generator generates the next element when it is needed.
A list can be iterated over as much as you need, a generator can only be iterated over exactly once.
A list can get elements by index, a generator cannot -- it only generates values once, from start to end.
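A quick sketch of those differences, using a throwaway generator expression:
gen = (n * n for n in range(3))
print(list(gen))   # [0, 1, 4]
print(list(gen))   # []: the generator is already exhausted

lst = [n * n for n in range(3)]
print(lst[1])      # 1: a list supports indexing and repeated iteration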
A generator can be created in two ways:
(1) Very similar to a list comprehension:
# this is a list: all 5000000 x/2 values are created immediately; uses []
lis = [x/2 for x in range(5000000)]
# this is a generator: each x/2 value is created only when needed; uses ()
gen = (x/2 for x in range(5000000))
(2) As a function, using yield to return the next value:
# this is also a generator: it will run until a yield occurs and return that result;
# on the next call it picks up where it left off and continues until the next yield...
def divby2(n):
    num = 0
    while num < n:
        yield num/2
        num += 1

# same as (x/2 for x in range(5000000))
print divby2(5000000)
Note: even though range(5000000) is lazy in Python 3.x, [x/2 for x in range(5000000)] is still a list. range(...) does its job and generates x one at a time, but the entire list of x/2 values is computed when the list is created.
In a nutshell, lazy evaluation means that the object is evaluated when it is needed, not when it is created.
In Python 2, range returns a list. This means that if you give it a large number, it will calculate the entire range and return it at creation time:
>>> i = range(100)
>>> type(i)
<type 'list'>
In Python 3, however, you get a special range object:
>>> i = range(100)
>>> type(i)
<class 'range'>
Only when you consume it will it actually be evaluated. In other words, it will only return the numbers in the range when you actually need them.
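One way to see the laziness concretely is memory use; sys.getsizeof is a rough sketch here, and the exact numbers vary by platform and Python version:
import sys

r = range(1000000)        # lazy: items are computed on demand
l = list(range(1000000))  # eager: one million ints are materialized
print(sys.getsizeof(r))   # tiny, e.g. 48 bytes
print(sys.getsizeof(l))   # several megabytes (the list object alone)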
A GitHub repo named python-patterns and Wikipedia tell us what lazy evaluation is:
Delays the eval of an expr until its value is needed and avoids repeated evals.
range in Python 3 is not complete lazy evaluation, because it doesn't avoid repeated evaluation.
A more classic example of lazy evaluation is cached_property:
import functools

class cached_property(object):
    def __init__(self, function):
        self.function = function
        functools.update_wrapper(self, function)

    def __get__(self, obj, type_):
        if obj is None:
            return self
        val = self.function(obj)
        obj.__dict__[self.function.__name__] = val
        return val
cached_property (a.k.a. lazy_property) is a decorator which converts a function into a lazily evaluated property. The first time the property is accessed, the function is called to get the result, and that cached value is then used on every subsequent access.
e.g.:
class LogHandler:
    def __init__(self, file_path):
        self.file_path = file_path

    @cached_property
    def load_log_file(self):
        with open(self.file_path) as f:
            # the file is so big that it takes 2s to read it all
            return f.read()

log_handler = LogHandler('./sys.log')
# only the first call will cost 2s
print(log_handler.load_log_file)
# the return value is cached on the log_handler obj
print(log_handler.load_log_file)
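Worth noting: since Python 3.8 the standard library ships an equivalent decorator, functools.cached_property. A sketch of the same idea without the hand-rolled descriptor:
from functools import cached_property

class LogHandler:
    def __init__(self, file_path):
        self.file_path = file_path

    @cached_property
    def log_file(self):
        with open(self.file_path) as f:
            return f.read()  # runs once; later accesses reuse the cached value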
To use the proper term, a Python object like range is designed more along the lines of the call-by-need pattern than lazy evaluation.
