Python slice/index evaluation order (before list definition)

I was wondering if there is a reason why Python evaluates the slicing/indexing after the definition of a list/tuple?
(This question concerns Python Code Golf so I know it's not readable or "good practice", it's just about the accepted syntax and the fundamental behavior of the language.)
We can use index to mimic the ternary operator's behavior:
a if x > 9 else b
(b,a)[x>9] # way shorter
But this has an issue: the content of the tuple is evaluated before the condition in the index.
I created an example to illustrate the point: a function f that recursively shortens a string by one character until the string is empty, then returns 0.
f = lambda s: f(s) if len(s:=s[:-1]) else 0
print(f("abc")) # works fine
f = lambda s: (0, f(s))[len(s:=s[:-1])==0]
print(f("abc")) # max recursion depth error
The recursion depth error occurs because the tuple definition is evaluated before the index. It means that what is in the slice/index doesn't matter, the function will be called again and again.
I don't really understand why python doesn't evaluate the slice/index before because even an obvious case like the following fails:
f = lambda: (0, f())[0]
f() # max recursion depth error
On top of that, it could benefit in terms of memory usage and runtime if we just evaluate the single element (or the slice) we want from the array and not every single element:
x = 2
print([long_computation(), other_long_computation(), 0][x])
Is there any reason not to evaluate the slice/index before the tuple definition?

Humans read left-to-right, the parser reads left-to-right, and the compiler just converts whatever is parsed into bytecode. It makes sense (and is easier) to parse left-to-right rather than to add special cases for something that can already be done properly in left-to-right fashion. Why do you need to do this anyway? You don't: you can already express it correctly with the current parser and compiler. The complexity of this special case and how rarely it would ever be used are together reason enough not to evaluate the slice before the tuple definition.
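If lazy evaluation is what's wanted, the usual workaround (longer, admittedly, which defeats the golfing purpose) is to put both branches behind lambdas and call only the selected one. A sketch:

```python
x = 12
a, b = "big", "small"

# Eager: both tuple elements are built before the index is applied.
eager = (b, a)[x > 9]

# Lazy: wrap each branch in a lambda; only the chosen one is called.
lazy = ((lambda: b), (lambda: a))[x > 9]()

print(eager, lazy)  # big big

# The recursive example from the question, made safe the same way:
f = lambda s: (lambda: 0, lambda: f(s))[len(s := s[:-1]) > 0]()
print(f("abc"))  # 0
```

Because the lambdas delay evaluation, the unselected branch is never executed, so the recursion terminates.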


Evaluation order of augmented operators (delimiters) in python

If I evaluate the following minimal example in python
a = [1, 2, 3]
a[-1] += a.pop()
I get
[1, 6]
So it seems that this is evaluated as
a[-1] = a[-1] + a.pop()
where each expression/operand would be evaluated in the order
third = first + second
so that on the lefthand side a[-1] is the 2nd element while on the righthand side it is the 3rd.
a[1] = a[2] + a.pop()
Can someone explain to me how one could infer this from the docs? Apparently '+=' is lexically a delimiter that also performs an operation (see here). What does that imply for its evaluation order?
EDIT:
I tried to clarify my question in a comment. I'll include it here for reference.
I want to understand if augmented operators have to be treated in a
special way (i.e. by expanding them) during lexical analysis, because
you kind of have to duplicate an expression and evaluate it twice.
This is not clear in the docs and I want to know where this behaviour
is specified. Other lexical delimiters (e.g. '}') behave differently.
Let me start with the question you asked at the end:
I want to understand if augmented operators have to be treated in a special way (i.e. by expanding them) during lexical analysis,
That one is simple; the answer is "no". A token is just a token and the lexical analyser just divides the input into tokens. As far as the lexical analyser is concerned, += is just a token, and that's what it returns for it.
By the way, the Python docs make a distinction between "operators" and "punctuation", but that's not really a significant difference for the current lexical analyser. It might have made sense in some previous incarnation of the parser based on operator-precedence parsing, in which an "operator" is a lexeme with associated precedence and associativity. But I don't know if Python ever used that particular parsing algorithm; in the current parser, both "operators" and "punctuation" are literal lexemes which appear as such in syntax rules. As you might expect, the lexical analyser is more concerned with the length of the tokens (<= and += are both two-character tokens) than with the eventual use inside the parser.
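One can check this directly with the pure-Python tokenize module (a reimplementation of the lexer, but faithful on this point): += comes back as a single OP token, not as + followed by =. A quick sketch:

```python
import io
import tokenize

# Tokenize the statement from the question and collect the operator lexemes.
source = "a[-1] += a.pop()"
tokens = tokenize.generate_tokens(io.StringIO(source).readline)
ops = [tok.string for tok in tokens if tok.type == tokenize.OP]

print(ops)  # '+=' appears as one token; there is no lone '+'
```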
"Desugaring" -- the technical term for source transformations which convert some language construct into a simpler construct -- is not usually performed either in the lexer or in the parser, although the internal workings of compilers are not subject to a Code of Conduct. Whether a language even has a desugaring component is generally considered an implementation detail, and may not be particularly visible; that's certainly true of Python. Python doesn't expose an interface to its tokeniser, either; the tokenize module is a reimplementation in pure Python which does not produce exactly the same behaviour (although it's close enough to be a useful exploratory tool). But the parser is exposed in the ast module, which provides direct access to Python's own parser (at least in the CPython implementation), and that lets us see that no desugaring is done up to the point that the AST is constructed (note: requires Python 3.9 for the indent option):
>>> import ast
>>> def showast(code):
...     print(ast.dump(ast.parse(code), indent=2))
...
>>> showast('a[-1] += a.pop()')
Module(
  body=[
    AugAssign(
      target=Subscript(
        value=Name(id='a', ctx=Load()),
        slice=UnaryOp(
          op=USub(),
          operand=Constant(value=1)),
        ctx=Store()),
      op=Add(),
      value=Call(
        func=Attribute(
          value=Name(id='a', ctx=Load()),
          attr='pop',
          ctx=Load()),
        args=[],
        keywords=[]))],
  type_ignores=[])
This produces exactly the syntax tree you would expect from the grammar, in which "augmented assignment" statements are represented as a specific production within assignment:
assignment:
| single_target augassign ~ (yield_expr | star_expressions)
single_target is a single assignable expression (such as a variable or, as in this case, a subscripted array); augassign is one of the augmented assignment operators, and the rest are alternatives for the right-hand side of the assignment. (You can ignore the "fence" grammar operator ~.) The parse tree produced by ast.dump is pretty close to the grammar, and shows no desugaring at all:
             AugAssign
   ---------------------------
   |             |           |
Subscript       Add        Call
---------             ----------------
 |      |              |      |     |
 a     -1          Attribute [ ]   [ ]
                   ---------
                    |     |
                    a   'pop'
The magic happens afterwards, which we can also see because the Python standard library also includes a disassembler:
>>> import dis
>>> dis.dis(compile('a[-1] += a.pop()', '--', 'exec'))
  1           0 LOAD_NAME                0 (a)
              2 LOAD_CONST               0 (-1)
              4 DUP_TOP_TWO
              6 BINARY_SUBSCR
              8 LOAD_NAME                0 (a)
             10 LOAD_METHOD              1 (pop)
             12 CALL_METHOD              0
             14 INPLACE_ADD
             16 ROT_THREE
             18 STORE_SUBSCR
             20 LOAD_CONST               1 (None)
             22 RETURN_VALUE
As can be seen, trying to summarize the evaluation order of augmented assignment as "left-to-right" is just an approximation. Here's what actually happens, as revealed in the virtual machine code above:
The target aggregate and its index are "computed" (lines 0 and 2), and then these two values are duplicated (line 4). (The duplication means that neither the target nor its subscript are evaluated twice.)
Then the duplicated values are used to lookup the value of the element (line 6). So it's at this point that the value of a[-1] is evaluated.
The right-hand side expression (a.pop()) is then evaluated (lines 8 through 12).
These two values (both 3, in this case) are combined with INPLACE_ADD because this is an ADD augmented assignment. In the case of integers, there's no difference between INPLACE_ADD and ADD, because integers are immutable values. But the compiler doesn't know that the first operand is an integer. a[-1] could be anything, including another list. So it emits an opcode which will trigger the use of the __iadd__ method instead of __add__, in case there is a difference.
The original target and subscript, which have been patiently waiting on the stack since step 1, are then used to perform a subscripted store (lines 16 and 18). The subscript is still the subscript computed at line 2, -1. But at this point a[-1] refers to a different element of a.
The rotate is needed to get the arguments for STORE_SUBSCR into the correct order. Because the normal order of evaluation for assignment is to evaluate the right-hand side first, the virtual machine assumes that the new value will be at the bottom of the stack, followed by the object and its subscript.
Finally, None is returned as the value of the statement.
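The steps above can be sketched in plain Python (an illustrative expansion, not what the compiler literally emits):

```python
a = [1, 2, 3]

# a[-1] += a.pop(), expanded in the order the bytecode performs it:
target, index = a, -1        # step 1: target and subscript, evaluated once
old = target[index]          # step 2: load a[-1] -> 3
rhs = a.pop()                # step 3: evaluate the RHS -> 3; a is now [1, 2]
target[index] = old + rhs    # steps 4-5: store 6 at the *current* a[-1]

print(a)  # [1, 6]
```

Note that the store at the end uses the original index -1, but by then the list has shrunk, which is exactly why the result surprises.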
The precise workings of assignment and augmented assignment statements are documented in the Python reference manual. Another important source of information is the description of the __iadd__ special method. Evaluation (and evaluation order) for augmented assignment operations is sufficiently confusing that there is a Programming FAQ dedicated to it, which is worth reading carefully if you want to understand the exact mechanism.
Interesting though that information is, it's worth adding that writing programs which depend on details of the evaluation order inside an augmented assignment is not conducive to producing readable code. In almost all cases, augmented assignment which relies on non-obvious details of the procedure should be avoided, including statements such as the one that is the target of this question.
rici did a great job showing what's happening under the hood in the CPython reference interpreter, but there's a much simpler "source of truth" here in the language spec, which guarantees this behavior for any Python interpreter (not just CPython, but PyPy, Jython, IronPython, Cython, etc.). In the language spec, under Chapter 6: Expressions, section 6.16, Evaluation Order, it specifies:
Python evaluates expressions from left to right. Notice that while evaluating an assignment, the right-hand side is evaluated before the left-hand side.
That second sentence sounds like an exception to the general rule, but it isn't: assignment with = (including augmented assignment with += or the like) is not an expression in Python, it's a statement, and the assignment statement has its own rules for order of evaluation. (The walrus operator introduced in 3.8 is an expression, but it can only assign to bare names, so there is never anything to "evaluate" on the left side; it purely stores there, never reads from it.) Those rules for assignment specify:
An assignment statement evaluates the expression list (remember that this can be a single expression or a comma-separated list, the latter yielding a tuple) and assigns the single resulting object to each of the target lists, from left to right.
This confirms the second sentence from the Expression Evaluation Order documentation; the expression list (the thing to be assigned) is evaluated first, then assignments to the targets proceed from there. So by the language spec itself, a[-1] += a.pop() must completely evaluate a.pop() first (the "expression list"), then perform assignment.
This behavior is required by the language spec, and has been for some time, so it can be relied on no matter what Python interpreter you're using.
That said, I'd recommend against code that relies on these guarantees from Python. For one, when you switch to other languages, the rules differ (and in some cases, e.g. many similar cases in C and C++, varying by version of the standard, there are no "rules", and trying to mutate the same object in multiple parts of an expression produces undefined behavior), so growing to rely on Python's behavior will hamper your ability to use other languages. Beyond that, it's still going to be confusing as hell, and just slight changes will avoid the confusion, for example, in your case, changing:
a[-1] += a.pop()
to just:
x = a.pop()
a[-1] += x
which, while admittedly a two-liner and therefore inferior!!!, achieves the same result with negligible overhead and greater clarity.
TL;DR: The Python language spec guarantees that the right-hand side of += is fully evaluated before the augmented assignment operation begins and any of the code on the left-hand side is evaluated. But for code clarity, any code that relies on that guarantee should probably be refactored to avoid said reliance.

making python code block with loop faster

Is there a way I can implement the code block below using map or list comprehension or any other faster way, keeping it functionally the same?
def name_check(names, n_val):
    lower_names = names.lower()
    for item in set(n_val):
        if item in lower_names:
            return True
    return False
Any help here is appreciated
A simple implementation would be
return any(character in names_lower for character in n_val)
A naive guess at the complexity would be O(K*2*N) where K is the number of characters in names and N is the number of characters in n_val. We need one "loop" for the call to lower*, one for the inner comprehension, and one for any. Since any is a built-in function and we're using a generator expression, I would expect it to be faster than your manual loop, but as always, profile to be sure.
To be clear, any short-circuits, so that behaviour is preserved
Notes on Your Implementation
On using a set: Your intuition to use a set to reduce the number of checks is a good one (you could add it to my form above, also), but it's a trade-off. In the case that the first element short circuits, the extra call to set is an additional N steps to produce the set expression. In the case where you wind up checking each item, it will save you some time. It depends on your expected inputs. If n_val was originally an iterable, you've lost that benefit and allocated all the memory up front. If you control the input to the function, why not just recommend it's called using lists that don't have duplicates (i.e., call set() on its input), and leave the function general?
* @Neopolitan pointed out that names_lower = names.lower() should be computed outside the generator expression, as your original implementation did; otherwise it may (will?) be called repeatedly.
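Putting the pieces together (with lower() hoisted out of the generator, per the footnote), a sketch of the full function:

```python
def name_check(names, n_val):
    # Compute the lowercased string once, outside the generator.
    names_lower = names.lower()
    # any() short-circuits: it stops at the first character found.
    return any(character in names_lower for character in n_val)

print(name_check("Alice Bob", "zqb"))  # True  ('b' occurs in "alice bob")
print(name_check("Alice Bob", "zqx"))  # False
```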

Limitations of variables in python

I realize this may be a bit broad, but I thought this was an interesting question that I haven't really seen an answer to. It may be hidden in the Python documentation somewhere, but as I'm new to Python I haven't gone through all of it yet.
So.. are there any general rules about things that we cannot store in variables? Everything in Python is an object, and we can use variables for the typical standard purposes: storing strings, integers, lists, aliasing other variables, holding references to classes, and so on. If we're clever, we can even manage something like the following (off the top of my head), wherever it may be useful:
var = lambda: some_function()
storing comparison operators to clean code up such as:
var = some_value < some_value ...
So, that being said I've never come across anything that I couldn't store as a variable if I really wanted to, and was wondering if there really are any limitations?
You can't store syntactical constructs in a variable. For example, you can't do
command = break
while condition:
    if other_condition:
        command
or
operator = +
three = 1 operator 2
You can't really store expressions and statements as objects in Python.
Sure, you can wrap an expression in a lambda, and you can wrap a series of statements in a code object or callable, but you can't easily manipulate them. For instance, changing all instances of addition to multiplication is not readily possible.
To some extent, this can be worked around with the ast module, which provides for parsing Python code into abstract syntax trees. You can then manipulate the trees, instead of the code itself, and pass it to compile() to turn it back into a code object.
However, this is a form of indirection, compensating for a feature Python itself lacks. ast can't really compare to the anything-goes flexibility of (say) Lisp macros.
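For instance, the "addition to multiplication" rewrite mentioned above can be done on the tree with an ast.NodeTransformer. A sketch:

```python
import ast

class AddToMult(ast.NodeTransformer):
    """Rewrite every binary + into a binary *."""
    def visit_BinOp(self, node):
        self.generic_visit(node)  # transform children first
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node

tree = ast.parse("2 + 3 + 4", mode="eval")
tree = ast.fix_missing_locations(AddToMult().visit(tree))
print(eval(compile(tree, "<ast>", "eval")))  # 24, i.e. 2 * 3 * 4
```

This works on the tree, not on the source text, which is exactly the indirection described above.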
According to the Language Reference, the right hand side of an assignment statement can be an 'expression list' or a 'yield expression'. An expression list is a comma-separated list of one or more expressions. You need to follow this through several more tokens to come up with anything concrete, but ultimately you can find that an 'expression' is any number of objects (literals or variable names, or the result of applying a unary operator such as not, ~ or - to a nested expression_list) chained together by any binary operator (such as the arithmetic, comparison or bitwise operators, or logical and and or) or the ternary a if condition else b.
You can also note in other parts of the language reference that an 'expression' is exactly something you can use as an argument to a function, or as the first part (before the for) of a list comprehension or generator expression.
This is a fairly broad definition - in fact, it amounts to "anything Python resolves to an object". But it does leave out a few things - for example, you can't directly store the less-than operator < in a variable, since it isn't a valid expression by itself (it has to be between two other expressions) and you have to put it in a function that uses it instead. Similarly, most of the Python keywords aren't expressions (the exceptions are True, False and None, which are all canonical names for certain objects).
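Concretely (a sketch; operator.lt is the standard library's function form of <):

```python
import operator

# '<' by itself is a syntax error as an expression, but the comparison it
# performs can be stored as a function object:
less = operator.lt
print(less(3, 5))     # True

# Or wrap it yourself:
my_less = lambda a, b: a < b
print(my_less(5, 3))  # False
```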
Note especially that functions are also objects, and hence the name of a function (without calling it) is a valid expression. This means that your example:
var = lambda: some_function()
can be written as:
var = some_function
By definition, a variable is something which can vary, or change. In its broadest sense, a variable is no more than a way of referring to a location in memory in your given program. Another way to think of a variable is as a container to place your information in.
Unlike many statically typed languages, Python does not require variable declaration. You can place pretty much anything in a variable so long as you can come up with a name for it. Furthermore, in addition to the value of a variable in Python being able to change, its type often can as well.
To address your question, I would say the limitations on a variable in Python relate only to a few basic necessary attributes:
A name
A scope
A value
(Usually) a type
As a result, things like operators (+ or * for instance) cannot be stored in a variable as they do not meet these basic requirements, and in general you cannot store expressions themselves as variables (unless you're wrapping them in a lambda expression).
As mentioned by Kevin, it's also worth noting that it is possible to sort of store an operator in a variable using the operator module. However, even then you are really just making an ordinary value assignment (the variable holds a function object), so the operator itself is not subject to any special manipulation. An example of the operator module:
import operator

operations = {"+": operator.add,
              "-": operator.sub}

operator_variable_string = input('Give me an operand:')
operator_function = operations[operator_variable_string]
result = operator_function(8, 4)

Python 'pointer arithmetic' - Quicksort

In idiomatic C fashion, one can implement quicksort in a simple way with two arguments:
void quicksort(int inputArray[], int numelems);
We can safely use two arguments for later subdivisions (i.e. the partitions, as they're commonly called) via pointer arithmetic:
//later on in our quicksort routine...
quicksort(inputArray+last+1, numelems-last-1);
In fact, I even asked about this before on SO because I was untrained in pointer arithmetic at the time: see Passing an array to a function with an odd format - “v+last+1”
Basically, is it possible to replicate the same behavior in Python, and if so, how? I have noticed that lists can be subdivided with the colon inside of square brackets (the slicing operator), but the slice operator does not pass the list from that point on; that is to say that the 1st element (0th index) is still the same in both cases.
As you're aware, Python's slice syntax makes a copy, so in order to manipulate a subsection of a list (not "array", in Python) in place, you need to pass around both the list and the start-index and size (or end-index) of the portion under discussion, much as you could in C. The signature of the recursive function would be something like:
def quicksort( inputList, numElems, startIndex = 0 ):
And the recursive call would be something like:
quicksort( inputList, numElems-last-1, last+1 )
Throughout the function you'd add startIndex to whatever list accesses you would make.
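To make that concrete, here is a minimal in-place quicksort using that signature. The partition scheme (Lomuto, pivot at the end of the range) is my own choice for illustration, not something specified in the question:

```python
def quicksort(inputList, numElems, startIndex=0):
    # Sorts inputList[startIndex : startIndex + numElems] in place.
    if numElems <= 1:
        return
    pivot = inputList[startIndex + numElems - 1]
    last = 0  # count of elements placed before the pivot
    for i in range(numElems - 1):
        if inputList[startIndex + i] < pivot:
            inputList[startIndex + i], inputList[startIndex + last] = \
                inputList[startIndex + last], inputList[startIndex + i]
            last += 1
    # Put the pivot into its final position.
    inputList[startIndex + last], inputList[startIndex + numElems - 1] = \
        inputList[startIndex + numElems - 1], inputList[startIndex + last]
    # Recurse on the two partitions, C-style, by adjusting index and size.
    quicksort(inputList, last, startIndex)
    quicksort(inputList, numElems - last - 1, startIndex + last + 1)

data = [3, 1, 4, 1, 5, 9, 2, 6]
quicksort(data, len(data))
print(data)  # [1, 1, 2, 3, 4, 5, 6, 9]
```

The pair (startIndex, numElems) plays exactly the role of the adjusted pointer and count in the C version.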
I suppose if you want to do something like that you could do the following:
# list we want to mutate
sort_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]

# wrapper just so everything looks pretty; processing could go here if we wanted
def wrapper(a, numlems):
    cut = len(a) - numlems
    # overwrites a part of the list with another one
    a[cut:] = process(a[cut:])

# processing of the slice
def process(a):
    # just to show it works
    a[1] = 15
    return a

wrapper(sort_list, 2)
print(sort_list)
wrapper(sort_list, 4)
print(sort_list)
wrapper(sort_list, 6)
print(sort_list)
This is probably considered pretty evil in python and I wouldn't really recommend it, but it does emulate the functionality you wanted.
For python you only really need:
def quicksort(inputList, startIndex):
Then creating and concatenating slices would work fine without the need for pointer like functionality.

Why is there no explicit emptyness check (for example `is Empty`) in Python

The Zen of Python says "Explicit is better than implicit". Yet the "pythonic" way to check for emptiness is using implicit booleaness:
if not some_sequence:
    some_sequence.fill_sequence()
This will be true if some_sequence is an empty sequence, but also if it is None or 0.
Compare with a theoretical explicit emptiness check:
if some_sequence is Empty:
    some_sequence.fill_sequence()
With some unfavorably chosen variable name the implicit booleaness to check for emptiness gets even more confusing:
if saved:
    mess_up()
Compare with:
if saved is not Empty:
    mess_up()
See also: "Python: What is the best way to check if a list is empty?". I find it ironic that the most voted answer claims that implicit is pythonic.
So is there a higher reason why there is no explicit emptiness check, like for example is Empty in Python?
Polymorphism in if foo: and if not foo: isn't a violation of "implicit vs explicit": it explicitly delegates to the object being checked the task of knowing whether it's true or false. What that means (and how best to check it) obviously does and must depend on the object's type, so the style guide mandates the delegation -- having application-level code arrogantly assert it knows better than the object would be the height of folly.
Moreover, X is Whatever always, invariably means that X is exactly the same object as Whatever. Making a totally unique exception for Empty or any other specific value of Whatever would be absurd -- hard to imagine a more unPythonic approach. And "being exactly the same object" is obviously transitive -- so you could never again have distinct empty lists, empty sets, empty dicts... congratulations, you've just designed a completely unusable and useless language, where every empty container crazily "collapses" to a single empty container object (just imagine the fun when somebody tries to mutate an empty container...?!).
The reason why there is no is Empty is astoundingly simple once you understand what the is operator does.
From the python manual:
The operators is and is not test for object identity: x is y is true
if and only if x and y are the same object. x is not y yields the
inverse truth value.
That means some_sequence is Empty checks whether some_sequence is the same object as Empty. That cannot work the way you suggested.
Consider the following example:
>>> a = []
>>> b = {}
Now let's pretend there is this is Empty construct in python:
>>> a is Empty
True
>>> b is Empty
True
But since the is operator does identity check that means that a and b are identical to Empty. That in turn must mean that a and b are identical, but they are not:
>>> a is b
False
So to answer your question "why is there no is Empty in python?": because is does identity check.
In order to have the is Empty construct you must either hack the is operator to mean something else or create some magical Empty object which somehow detects empty collections and then be identical to them.
Rather than asking why there is no is Empty you should ask why there is no builtin function isempty() which calls the special method __isempty__().
So instead of using implicit booleaness:
if saved:
    mess_up()
we have explicit empty check:
if not isempty(saved):
    mess_up()
where the class of saved has an __isempty__() method implemented to some sane logic.
I find that far better than using implicit booleaness for emptyness check.
Of course you can easily define your own isempty() function:
def isempty(collection):
    try:
        return collection.__isempty__()
    except AttributeError:
        # fall back to implicit booleaness but check for common pitfalls
        if collection is None:
            raise TypeError('None cannot be empty')
        if collection is False:
            raise TypeError('False cannot be empty')
        if collection == 0:
            raise TypeError('0 cannot be empty')
        return bool(collection)
and then define an __isempty__() method which returns a boolean for all your collection classes.
I agree that sometimes if foo: isn't explicit for me when I really want to tell the reader of the code that it's emptiness I'm testing. In those cases, I use if len(foo):. Explicit enough.
I 100% agree with Alex w.r.t. is Empty being unpythonic.
Consider that Lisp has for many years used the empty list () or its symbol NIL as false, and T (or anything not NIL) as true; often the computation that produces a truth value already yields a useful result that need not be recomputed. Look also at the partition method of strings, where the middle result works very nicely as a while-loop control, using the non-empty-is-true convention.
I generally try to avoid using len, as it is often expensive in tight loops; it is frequently worthwhile to maintain the length of a result in the program logic instead of recalculating it.
For my part I would prefer Python to have () or [] as False instead of 0, but that is how it is. Then it would be more natural to use not [] for "not empty". But as it stands, () is not [] is True, so you could use:
emptyset = set([])
if myset == emptyset:
if you want to be explicit about the empty-set case (an identity check like myset is set([]) would not work).
I myself quite like if not myset, as my commenter suggested.
It now comes to mind that maybe this is the closest thing to an explicit not_empty check:
if any(x in myset for x in myset): print "set is not empty"
and so is empty would be:
if not any(x in myset for x in myset): print "set is empty"
There is an explicit emptyness check for iterables in Python. It is spelled not. What's implicit there? not gives True when iterable is empty, and gives False when it is nonempty.
What exactly do you object to? A name? As others have told you, it's certainly better than is Empty. And it's not so ungrammatical: considering how things are usually named in Python, we might imagine a sequence called widgets, containing, surprisingly, some widgets. Then,
if not widgets:
can be read as "if there are no widgets...".
Or do you object to the length? Explicit doesn't mean verbose; those are two different concepts. Python does not have an addition method, it has the + operator, which is completely explicit if you know the type you're applying it to. The same goes for not.
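The delegation works through __bool__ (or, failing that, __len__), so a container you write yourself gets the same treatment. A sketch with a hypothetical Widgets class:

```python
class Widgets:
    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        # With no __bool__ defined, truth testing falls back to __len__.
        return len(self._items)

widgets = Widgets([])
if not widgets:
    print("no widgets")       # prints: no widgets

print(bool(Widgets([1, 2])))  # True
```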
