Yield only as many as are required from a generator - Python

I wish to yield from a generator only as many items as are required.
In the following code
a, b, c = itertools.count()
I receive this exception:
ValueError: too many values to unpack
I've seen several related questions; however, I have zero interest in the remaining items from the generator. I only wish to receive as many as I ask for, without providing that quantity in advance.
It seems to me that Python determines the number of items you want, but then proceeds to try to read and store more than that number (hence the ValueError).
How can I yield only as many items as I require, without passing in how many items I want?
Update0
To aid in understanding the approximate behaviour I believe should be possible, here's a code sample:
def unpack_seq(ctx, items, seq):
    for name in items:
        setattr(ctx, name, seq.next())
import itertools, sys
unpack_seq(sys.modules[__name__], ["a", "b", "c"], itertools.count())
print a, b, c
If you can improve this code please do.
Alex Martelli's answer suggests to me that the bytecode op UNPACK_SEQUENCE is responsible for the limitation. I don't see why this operation should require that generated sequences also exactly match in length.
Note that Python 3 has extended unpacking syntax, which probably invalidates the technical discussion in this question.

You need a deep bytecode hack - what you request cannot be done at Python source code level, but (if you're willing to commit to a specific version and release of Python) may be doable by post-processing the bytecode after Python has compiled it. Consider, e.g.:
>>> def f(someit):
...     a, b, c = someit()
...
>>> dis.dis(f)
2 0 LOAD_FAST 0 (someit)
3 CALL_FUNCTION 0
6 UNPACK_SEQUENCE 3
9 STORE_FAST 1 (a)
12 STORE_FAST 2 (b)
15 STORE_FAST 3 (c)
18 LOAD_CONST 0 (None)
21 RETURN_VALUE
>>>
As you see, the UNPACK_SEQUENCE 3 bytecode occurs without the iterator someit being given the least indication of it (the iterator has already been called!) -- so you must prefix it in the bytecode with a "get exactly N items" operation, e.g.:
>>> def g(someit):
...     a, b, c = limit(someit(), 3)
...
>>> dis.dis(g)
2 0 LOAD_GLOBAL 0 (limit)
3 LOAD_FAST 0 (someit)
6 CALL_FUNCTION 0
9 LOAD_CONST 1 (3)
12 CALL_FUNCTION 2
15 UNPACK_SEQUENCE 3
18 STORE_FAST 1 (a)
21 STORE_FAST 2 (b)
24 STORE_FAST 3 (c)
27 LOAD_CONST 0 (None)
30 RETURN_VALUE
where limit is your own function, easily implemented (e.g. via itertools.islice). So, the original 2-bytecode "load fast; call function" sequence (just before the unpack-sequence bytecode) must become this kind of 5-bytecode sequence, with a load-global bytecode for limit before the original sequence, and the load-const; call function sequence after it.
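For reference, limit itself is trivial; a minimal sketch (limit is just this answer's placeholder name) via itertools.islice:
import itertools

def limit(iterable, n):
    # Take exactly the first n items as a tuple, leaving the rest
    # of the iterator untouched; UNPACK_SEQUENCE n can then succeed.
    return tuple(itertools.islice(iterable, n))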
You can of course implement this bytecode munging in a decorator.
Alternatively (for generality) you could work on altering the function's original source, e.g. via parsing and alteration of the AST, and recompiling to byte code (but that does require the source to be available at decoration time, of course).
Is this worth it for production use? Absolutely not, of course -- a ridiculous amount of work for a tiny bit of "syntax sugar". However, it can be an instructive project to pursue to gain mastery in bytecode hacking, AST hacking, and other black magic tricks that you will probably never need but are surely cool to know when you want to move beyond the level of mere language wizard to that of world-class guru. Indeed, I suspect that those who are motivated to become gurus are typically people who can't fail to yield to the fascination of such "language internals" mechanics... and those who actually make it to that exalted level are the subset wise enough to realize such endeavors are "just play", and to pursue them as a spare-time activity (the mental equivalent of weight lifting, much like, say, sudoku or crosswords;-) without letting them interfere with the important tasks (delivering value to users by deploying solid, clear, simple, well-performing, well-tested, well-documented code, most often without even the slightest hint of black magic to it;-).

You need to make sure that the number of items on both sides matches. One way is to use islice from the itertools module:
import itertools
from itertools import islice

a, b, c = islice(itertools.count(), 3)

Python does not work the way you desire. In any assignment, like
a, b, c = itertools.count()
the right-hand side is evaluated first, before the left-hand side.
The right-hand side cannot know how many items are on the left-hand side unless you tell it.
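You can see this order of evaluation directly with a small throwaway helper (noisy is just an illustrative name):
def noisy():
    print("right-hand side evaluated")
    return (1, 2, 3)

a, b, c = noisy()  # prints the message before any of a, b, c is bound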

Or use a list comprehension, since you know the desired number of items:
ic = itertools.count()
a, b, c = [ic.next() for i in range(3)]
or even simpler:
a, b, c = range(3)
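Note that in Python 3 generators no longer have a .next() method; use the built-in next() instead:
import itertools

ic = itertools.count()
# A generator expression of exactly 3 items unpacks fine:
a, b, c = (next(ic) for _ in range(3))
print(a, b, c)  # 0 1 2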

Related

In Python on x86-64, are the first 6 arguments typically passed in registers?

To the best of my knowledge, in low-level languages such as C, it is generally advisable to keep the number of arguments to functions at 6 or lower, since then there is no need to pass arguments on the stack (i.e. there are enough registers), and sometimes no need to even create a stack frame for the function.
Does this logic still apply in Python? Or is there any transformation done on function arguments, when the interpreter is called, that makes this point irrelevant/moot?
I'm well aware that, realistically, the performance gains are negligible if they exist at all (and for this type of optimization it's best to just switch to Cython, or something else altogether), but I would like to understand Python better.
On Python 3.8.10 (on an x86-64 machine, Ubuntu 20.04), I tried using dis.dis() to look at the bytecode disassembly of some minimal example:
import random
def foo(a, b, c, d, e, f, g):
    return a + b + c + d + e + f + g
a = random.randint(0, 10)
b = random.randint(0, 10)
c = random.randint(0, 10)
d = random.randint(0, 10)
e = random.randint(0, 10)
f = random.randint(0, 10)
g = random.randint(0, 10)
foo(a, b, c, d, e, f, g)
(using random just to make sure there aren't any optimisation shenanigans).
This resulted in this bytecode for the last line of the code (trimmed for brevity):
...
12 100 LOAD_NAME 1 (foo)
102 LOAD_NAME 3 (a)
104 LOAD_NAME 4 (b)
106 LOAD_NAME 5 (c)
108 LOAD_NAME 6 (d)
110 LOAD_NAME 7 (e)
112 LOAD_NAME 8 (f)
114 LOAD_NAME 9 (g)
116 CALL_FUNCTION 7
118 POP_TOP
120 LOAD_CONST 1 (None)
122 RETURN_VALUE
...
However, I'm not familiar enough with the bytecode, and specifically with LOAD_NAME, to tell whether there is any internal logic that separates loading into registers from loading onto the stack.
No.
Not even close.
Not really
Python code is a very high-level abstraction, very detached from the actual underlying architecture.
Arguments for function calls are collected, each with a couple of Python bytecode instructions - and each bytecode instruction executes at least tens, but typically hundreds, of C-level instructions' worth of work.
Moreover, most calls will even build a temporary tuple object which is then de-structured again (though there are likely optimizations in place to avoid that in pure Python-to-Python calls nowadays).
That said, even when coding in C, that level of micro-optimization is mostly nonsense: a shallow stack would likely live in the L1 CPU cache and make no measurable difference compared to fully in-register parameters on a modern, desktop/notebook-class CPU.
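If you want to convince yourself empirically, here is a rough sketch using timeit (assumes Python 3.5+ for the globals argument; exact numbers are machine-dependent):
import timeit

def one(a):
    return a

def seven(a, b, c, d, e, f, g):
    return a

# Any per-argument cost is buried in the general overhead of
# making a Python-level call at all:
print(timeit.timeit('one(1)', globals=globals()))
print(timeit.timeit('seven(1, 2, 3, 4, 5, 6, 7)', globals=globals()))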

How does Python's re-declaration of variables work internally?

I'm fairly new to Python, so this might seem like a trivial question to some. But I'm curious about how Python works internally when you bind a new object to a variable while referring to the previous object bound to the same variable name. Please see the code below as an example. I understand that Python breaks the bond with the original object 'hello' and binds the name to the new object, but what is the sequence of events here? How does Python break the bond with the original object while also referring to it?
greeting = 'hello'
greeting = f'y{greeting[1:len(greeting)]}'
In addition to the explanation, I would also very much appreciate some context. I understand that strings are immutable, but what about other types like floats and integers?
Does it matter whether I understand how Python operates internally? And if it does, where would be a good place to learn more about how Python works internally?
Hope I'm being clear with my questions.
An explanation through the medium of the disassembly:
>>> dis.dis('''greeting = 'hello'
... greeting = f'y{greeting[1:len(greeting)]}'
... ''')
1 0 LOAD_CONST 0 ('hello')
2 STORE_NAME 0 (greeting)
2 4 LOAD_CONST 1 ('y')
6 LOAD_NAME 0 (greeting)
8 LOAD_CONST 2 (1)
10 LOAD_NAME 1 (len)
12 LOAD_NAME 0 (greeting)
14 CALL_FUNCTION 1
16 BUILD_SLICE 2
18 BINARY_SUBSCR
20 FORMAT_VALUE 0
22 BUILD_STRING 2
24 STORE_NAME 0 (greeting)
26 LOAD_CONST 3 (None)
28 RETURN_VALUE
The number on the far left indicates where the bytecode for a particular line begins. Line 1 is pretty self-explanatory, so I'll explain line 2.
As you might notice, your f-string doesn't survive compilation; it becomes a bunch of raw opcodes mixing the loading of constant segments with the evaluation of formatting placeholders, eventually leading to the stack being topped by all the fragments that will make up the final string. When they're all on the stack, it then puts all the fragments together at the end with BUILD_STRING 2 (which says "Take the top two values off the stack and combine them into a single string").
greeting is just a name holding a binding. It doesn't actually hold a value, just a reference to whatever object it's currently bound to. And the original reference is pushed onto the stack (with LOAD_NAME) entirely before the STORE_NAME that pops the top of the stack and rebinds greeting.
In short, the reason it works is that the value of greeting is no longer needed by the time it's replaced; it's used to make the new string, then discarded in favor of the new string.
In your second line, Python evaluates the right side of the assignment statement, which creates a string that uses the old binding for greeting. Only after evaluating that expression does it handle the assignment operator, which binds that string to the name. It's all very linear.
Floats and integers are also immutable. Lists, dictionaries, and sets, on the other hand, are mutable. In fact, it's not clear how you would modify an integer object in any case; you can't refer to the inside of the object. It's important to remember what happens in this case:
i = 3
j = 4
i = i + j
the last line just creates a new integer/float object and binds it to i. None of this attempts to modify the integer object 3.
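You can watch the rebinding happen with id(), which returns an object's identity (the actual id values will differ from run to run):
i = 3
print(id(i))  # identity of the int object 3
i = i + 4
print(id(i))  # a different identity: i is now bound to a new object, 7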
I wrote this article that tries to delineate the difference between Python objects and the names we use:
https://github.com/timrprobocom/documents/blob/main/UnderstandingPythonObjects.md

Nested if statements or And

Would a nested if statement or the use of and in a single if statement be more efficient?
Basically is:
if condition1:
    if condition2:
        # do stuff
or
if condition1 and condition2:
    # do stuff
more efficient, or are they similar in performance? And for the sake of readability, which should I choose to use?
Short answer: no.
The dis module disassembles a Python function to the bytecode that is executed by the interpreter. For both functions, the number of executed operations is the same (10).
But the and clause is generally preferred because of better code readability and less nesting.
Code:
>>> def test_inner_if(condition1=True, condition2=True):
...     if condition1:
...         if condition2:
...             pass
...
>>> def test_and_if(condition1=True, condition2=True):
...     if condition1 and condition2:
...         pass
...
>>> import dis
>>> dis.dis(test_inner_if)
2 0 LOAD_FAST 0 (condition1)
2 POP_JUMP_IF_FALSE 8
3 4 LOAD_FAST 1 (condition2)
6 POP_JUMP_IF_FALSE 8
4 >> 8 LOAD_CONST 0 (None)
10 RETURN_VALUE
>>> dis.dis(test_and_if)
2 0 LOAD_FAST 0 (condition1)
2 POP_JUMP_IF_FALSE 8
4 LOAD_FAST 1 (condition2)
6 POP_JUMP_IF_FALSE 8
3 >> 8 LOAD_CONST 0 (None)
10 RETURN_VALUE
The second one is generally preferred. In terms of run time they won't make any noticeable difference, but in terms of readability (good code is easily readable) the second one is preferred.
Just go for readability
If you're trying to do something easy like x > 3 and x < 30, just use one if.
If your conditions are function calls, probably use multiple ifs, as it will be easier to read and easier to debug.
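If you want numbers rather than bytecode, a quick timeit sketch (the two should come out nearly identical, which is the point):
import timeit

setup = '''
def nested(a, b):
    if a:
        if b:
            return True

def combined(a, b):
    if a and b:
        return True
'''

# One million calls of each form:
print(timeit.timeit('nested(True, True)', setup=setup))
print(timeit.timeit('combined(True, True)', setup=setup))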
The best piece of advice I have gotten in terms of optimization is,
"find the slowest part of your code and optimize the heck out of that"
The context of the if statements matters. For example, in one case they might be deep inside three or more for loops, where going for good optimization (and commenting your logic) would be worthwhile. In another case, though, you may be determining whether to throw an error at the beginning of a function. In that case readability is critical.
Secondly, how you optimize is important too. The interpreter may well treat both ways as equivalent, which means it is best to go for readability. One easy way to find out is to time the code:
import time

s = time.perf_counter()  # time.clock() was removed in Python 3.8
# your code here
print(time.perf_counter() - s)  # shows how long the code segment took to run
This can be an interesting experiment whenever you have an optimization question.

Python Dictionary vs If Statement Speed

I have found a few links talking about switch cases being faster in C++ than if/else, because switch can be optimized at compilation. I then found some suggestions that using a dictionary may be faster than an if statement. However, most of those conversations are about someone's work and just end up discussing that they should optimize other parts of the code first, and that it won't matter unless you're doing millions of if/else checks. Can anyone explain why this is?
Say I have 100 unique numbers that are going to be streamed into a Python program constantly. I want to check which number it is, then execute something. So I could either do a ton of if/else checks, or I could put each number in a dictionary. For argument's sake, let's say it's a single thread.
Does someone understand the layer between Python and the low-level execution who can explain how this works?
Thanks :)
However, most of those conversations are about someone's work and just end up discussing that they should optimize other parts of the code first, and that it won't matter unless you're doing millions of if/else checks. Can anyone explain why this is?
Generally, you should only bother to optimize code if you really need to, i.e. if the program's performance is unusably slow.
If this is the case, you should use a profiler to determine which parts are actually causing the most problems. For Python, the cProfile module is pretty good for this.
Does someone understand the layer between Python and the low-level execution who can explain how this works?
If you want to get an idea of how your code executes, take a look at the dis module.
A quick example...
import dis

# Here are the things we might want to do
def do_something_a():
    print 'I did a'

def do_something_b():
    print 'I did b'

def do_something_c():
    print 'I did c'

# Case 1
def f1(x):
    if x == 1:
        do_something_a()
    elif x == 2:
        do_something_b()
    elif x == 3:
        do_something_c()

# Case 2
FUNC_MAP = {1: do_something_a, 2: do_something_b, 3: do_something_c}

def f2(x):
    FUNC_MAP[x]()

# Show how the functions execute
print 'Case 1'
dis.dis(f1)

print '\n\nCase 2'
dis.dis(f2)
...which outputs...
Case 1
18 0 LOAD_FAST 0 (x)
3 LOAD_CONST 1 (1)
6 COMPARE_OP 2 (==)
9 POP_JUMP_IF_FALSE 22
19 12 LOAD_GLOBAL 0 (do_something_a)
15 CALL_FUNCTION 0
18 POP_TOP
19 JUMP_FORWARD 44 (to 66)
20 >> 22 LOAD_FAST 0 (x)
25 LOAD_CONST 2 (2)
28 COMPARE_OP 2 (==)
31 POP_JUMP_IF_FALSE 44
21 34 LOAD_GLOBAL 1 (do_something_b)
37 CALL_FUNCTION 0
40 POP_TOP
41 JUMP_FORWARD 22 (to 66)
22 >> 44 LOAD_FAST 0 (x)
47 LOAD_CONST 3 (3)
50 COMPARE_OP 2 (==)
53 POP_JUMP_IF_FALSE 66
23 56 LOAD_GLOBAL 2 (do_something_c)
59 CALL_FUNCTION 0
62 POP_TOP
63 JUMP_FORWARD 0 (to 66)
>> 66 LOAD_CONST 0 (None)
69 RETURN_VALUE
Case 2
29 0 LOAD_GLOBAL 0 (FUNC_MAP)
3 LOAD_FAST 0 (x)
6 BINARY_SUBSCR
7 CALL_FUNCTION 0
10 POP_TOP
11 LOAD_CONST 0 (None)
14 RETURN_VALUE
...so it's pretty easy to see which function has to execute the most instructions.
As for which is actually faster, that's something you'd have to check by profiling the code.
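For instance, a minimal timeit sketch along those lines (a self-contained variant of f1/f2 above that returns values instead of printing; absolute numbers are machine-dependent):
import timeit

FUNC_MAP = {1: lambda: 'a', 2: lambda: 'b', 3: lambda: 'c'}

def f1(x):
    if x == 1:
        return 'a'
    elif x == 2:
        return 'b'
    elif x == 3:
        return 'c'

def f2(x):
    return FUNC_MAP[x]()

# Time the worst case for the if/elif chain (the last branch);
# run as a script so __main__ holds the definitions:
print(timeit.timeit('f1(3)', setup='from __main__ import f1'))
print(timeit.timeit('f2(3)', setup='from __main__ import f2'))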
The if/elif/else structure compares the key it was given to a sequence of possible values one by one until it finds a match in the condition of some if statement, then reads what it is supposed to execute from inside the if block. This can take a long time, because so many checks (n/2 on average, for n possible values) have to be made for every lookup.
The reason that a sequence of if statements is more difficult to optimize than a switch statement is that the condition checks (what's inside the parens in C++) might conceivably change the state of some variable that's involved in the next check, so you have to do them in order. The restrictions on switch statements remove that possibility, so the order doesn't matter (I think).
Python dictionaries are implemented as hash tables. The idea is this: if you could deal with arbitrarily large numbers and had infinite RAM, you could create a huge array of function pointers that is indexed just by casting whatever your lookup value is to an integer and using that as the index. Lookup would be virtually instantaneous.
You can't do that, of course, but you can create an array of some manageable length, pass the lookup value to a hash function (which generates some integer, depending on the lookup value), then % your result with the length of your array to get an index within the bounds of that array. That way, lookup takes as much time as is needed to call the hash function once, take the modulus, and jump to an index. If the amount of different possible lookup values is large enough, the overhead of the hash function becomes negligible compared to those n/2 condition checks.
(Actually, since many different lookup values will inevitably map to the same index, it's not quite that simple. You have to check for and resolve possible conflicts, which can be done in a number of ways. Still, the gist of it is as described above.)
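To make that gist concrete, here is a deliberately naive toy version of the idea (real dicts also handle collisions, resizing, and much more):
table_size = 8
slots = [None] * table_size

def toy_insert(key, value):
    # Hash the key, wrap into the table's bounds, store the pair.
    slots[hash(key) % table_size] = (key, value)  # ignores collisions!

def toy_get(key):
    entry = slots[hash(key) % table_size]
    if entry is not None and entry[0] == key:
        return entry[1]
    raise KeyError(key)

toy_insert(42, 'the answer')
print(toy_get(42))  # found with one hash, one modulus, one index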

What's the difference between "while 1" and "while True"?

I've seen two ways to create an infinite loop in Python:
while 1:
    do_something()

while True:
    do_something()
Is there any difference between these? Is one more pythonic than the other?
Fundamentally it doesn't matter; such minutiae don't really affect whether something is 'pythonic' or not.
If you're interested in trivia however, there are some differences.
The builtin boolean type didn't exist until Python 2.3, so code that was intended to run on ancient versions tends to use the while 1: form. You'll see it in the standard library, for instance.
The True and False builtins are not reserved words prior to Python 3, so they could be assigned to, changing their value. This helps with the case above because code could do True = 1 for backwards compatibility, but it means that the name True needs to be looked up in the globals dictionary every time it is used.
Because of the above restriction, the bytecode the two versions compile to is different in Python 2, as there's an optimisation for constant integers that can't be used for True. Because Python can tell at compile time that 1 is always non-zero, it removes the conditional jump and doesn't load the constant at all:
>>> import dis
>>> def while_1():
...     while 1:
...         pass
...
>>> def while_true():
...     while True:
...         pass
...
>>> dis.dis(while_1)
2 0 SETUP_LOOP 5 (to 8)
3 >> 3 JUMP_ABSOLUTE 3
6 POP_TOP
7 POP_BLOCK
>> 8 LOAD_CONST 0 (None)
11 RETURN_VALUE
>>> dis.dis(while_true)
2 0 SETUP_LOOP 12 (to 15)
>> 3 LOAD_GLOBAL 0 (True)
6 JUMP_IF_FALSE 4 (to 13)
9 POP_TOP
3 10 JUMP_ABSOLUTE 3
>> 13 POP_TOP
14 POP_BLOCK
>> 15 LOAD_CONST 0 (None)
18 RETURN_VALUE
So, while True: is a little easier to read, and while 1: is a bit kinder to old versions of Python. As you're unlikely to need to run on Python 2.2 these days or need to worry about the bytecode count of your loops, the former is marginally preferable.
The most pythonic way will always be the most readable. Use while True:
It doesn't really matter. Neither is hard to read or understand, though personally I'd always use while True, which is a bit more explicit.
More generally, a whole lot of while/break loops people write in Python could be something else. Sometimes I see people write i = 0; while True: i += 1 ..., which can be replaced with for i in itertools.count(), and people writing while True: foo = fun(); if foo is None: break, when this can be written as for foo in iter(fun, None). That requires some learning, but has less boilerplate and less opportunity for silly mistakes.
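For example, a small sketch of both replacements:
import itertools

# Instead of i = 0; while True: i += 1 ...
for i in itertools.count():
    if i >= 3:
        break

# Instead of while True: foo = fun(); if foo is None: break --
# two-argument iter() calls fun() until it returns the sentinel (None):
values = iter([1, 2, None, 4])

def fun():
    return next(values)

for foo in iter(fun, None):
    print(foo)  # prints 1 and 2, then stops at the None sentinel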
Neither.
Both of them mean I have to scan the code looking for the break, instead of being able to see the stop condition right where it belongs.
I try to avoid this kind of thing wherever possible, and if it's not possible, let the code speak for itself like this:
while not found_answer:
    check_number += 1
    if check_number == 42:
        found_answer = True
Edit: It seems that the word "avoid" above wasn't clear enough. Using a basically infinite loop and leaving it from somewhere within the loop (using break) should usually be avoided altogether. Sometimes that isn't possible. In that case, I like to use something like the code above, which, however, still represents the same concept – the above code is nothing more than a compromise – but at least, I can show the purpose of the loop at the beginning – just like I wouldn't call a function do_something_with_args(*args).
IMO the second option is more obvious.
If you could get rid of the while and write more compact code, that might be more pythonic.
For example:
# Get the even numbers in the range 1..10
# Version 1
l = []
n = 1
while 1:
    if n % 2 == 0: l.append(n)
    n += 1
    if n > 10: break
print l

# Version 2
print [i for i in range(1, 11) if i % 2 == 0]

# Version 3
print range(2, 11, 2)
I think this is mostly a matter of style. Both should be easily understandable as an infinite loop.
However, personally I prefer the second option. That's because it just takes a mental micro-step less to understand, especially for programmers without C background.
The first one will also work in those early Python versions where True is not yet defined.
If you have an algorithm that is supposed to terminate in a finite time, I would recommend this, which is always safer than while True:
maxiter = 1000
for i in xrange(maxiter):
    # your code
    # on success:
    break
else:
    # the algorithm has not finished in maxiter steps! do something accordingly
I believe the second expression is more explicit, and thus more pythonic.
This is only a matter of style; any programming beginner will understand either option.
But the second option will only work if True wasn't assigned to False, which was possible until Python 3:
>>> True = False
>>> True
False
The better way is "while True" with a conditional break out of the loop.
