Is indentation semantically meaningful or syntactically meaningful in Python - python

I came across the sentence,
In Python, indentation is semantically meaningful.
I'm not sure I understand what "semantically meaningful" means here.
Also, since indentation is used to delimit the if and else blocks of conditional statements in Python, wouldn't it be considered part of the language grammar and therefore "syntactically meaningful"? (I cannot find mention of it in the docs for conditional expressions.)

This question is mostly useful for pedantry, since the answer won't change the way you write your code.
However, I would say: in Python, leading spaces are syntactically meaningful, and indentation is semantically meaningful.
The number of spaces at the start of a logical line (in the sense as defined in the documentation) defines the indentation level for that line, by comparing the number of leading spaces to the same number for the previous logical line. It either matches the previous number (continue current block), is greater (increase level and start new block) or matches a previous number of spaces (decrease level, end current block, continue matched block). If it doesn't match a previous level, that's an indentation error. That's the syntactic meaning of leading spaces.
Once Python knows the indentation level, that decides the meaning of the line (i.e. continue a block, start a new one, or end the current block and continue a previous one) - these are the semantics of the indentation level.
In other words: leading spaces are syntax, indentation is semantics.
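To make the "doesn't match a previous level" rule concrete, here is a minimal sketch that asks Python to compile two snippets. The helper name check and the sample strings are made up for illustration.

```python
# A dedent must return to an indentation level that is still open.
consistent = "if True:\n        x = 1\ny = 2\n"        # 8 spaces, then back to 0
inconsistent = "if True:\n        x = 1\n    y = 2\n"  # 8 spaces, then 4: no such level


def check(source):
    """Return 'ok' if the source compiles, else the name of the error raised."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:  # IndentationError is a subclass of SyntaxError
        return type(exc).__name__
    return "ok"


print(check(consistent))    # ok
print(check(inconsistent))  # IndentationError
```

Note that the absolute number of spaces (8 here) is irrelevant; only the comparison with the enclosing levels matters.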

I think they should have said "semantically meaningful", but the distinction is somewhat fuzzy. TechDifferences says:
The syntax of a programming language is a collection of rules to specify the structure or form of code whereas semantics refers to the interpretation of the code or the associated meaning of the symbols, characters or any part of a program.
Since indentation determines things like whether a line is part of a function or loop, and that impacts things like variable scope, it could be considered to affect the "associated meaning" of the symbols.
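As a small illustration of indentation changing what a program means (the function names are invented for this sketch), the two functions below contain the same statements and differ only in the indentation of one line:

```python
def total_inside(values):
    total = 0
    for v in values:
        total += v
        total *= 2   # indented: part of the loop body, runs on every iteration
    return total


def total_outside(values):
    total = 0
    for v in values:
        total += v
    total *= 2       # dedented: runs once, after the loop has finished
    return total


print(total_inside([1, 2, 3]))   # 22
print(total_outside([1, 2, 3]))  # 12
```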

How to do a one line while loop with initialization of the loop variable on the same line? [duplicate]

I can join lines in Python using semi-colon, e.g.
a=5; b=10
But why can't I do the same with for?
x=['a','b']; for i,j in enumerate(x): print(i,":", j)
Because the Python grammar disallows it. See the documentation:
stmt_list ::= simple_stmt (";" simple_stmt)* [";"]
Semicolons can only be used to separate simple statements (not compound statements like for). And, really, there's almost no reason to ever use them even for that. Just use separate lines. Python isn't designed to make it convenient to jam lots of code onto one line.
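You can check the rule without running anything by handing the snippets to compile(). This is only a sketch; the helper name compiles is made up.

```python
def compiles(source):
    """Return True if Python accepts the source as a module, False on a syntax error."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# Two simple statements joined by a semicolon: allowed.
print(compiles("a=5; b=10"))  # True

# A compound statement (for) after a semicolon: rejected by the grammar.
print(compiles("x=['a','b']; for i,j in enumerate(x): print(i, ':', j)"))  # False

# Simple statements after the colon of a one-line compound statement: allowed.
print(compiles("for i in range(3): print(i); print(i * 2)"))  # True
```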
The short (yet valid) answer is simply "because the language grammar isn't defined to allow it". As for why that's the case, it's hard if not impossible to be sure unless you ask whoever came up with that portion of the grammar, but I imagine it's due to readability, which is one of the goals of Python [1].
Why would you ever want to write something obscure like that? Just split it up into multiple lines:
x = ['a','b']
for i,j in enumerate(x):
    print(i, ":", j)
I would argue that this variant is much clearer.
[1] From import this: "Readability counts."
Because Guido van Rossum, the creator of Python, and other developers, don't actually like the "semicolons and braces" style of code formatting.
For a brief look at how they feel about these things, try:
from __future__ import braces
Statements in Python are supposed to be separated by newlines, and compound statements in Python are supposed to be bounded by indentation.
The ; separator and one-line compound statements (e.g. for i in x: print(i)) are meant to be only very limited concessions... and you can't combine them.
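If you would rather not type that import at a prompt, here is a small sketch that captures the response instead. It assumes CPython, where the refusal is reported as a SyntaxError at compile time; the exact wording is an interpreter detail, not a language guarantee.

```python
# Compile the statement instead of executing it, and capture the error message.
try:
    compile("from __future__ import braces", "<example>", "exec")
except SyntaxError as exc:
    message = exc.msg
else:
    message = None  # would mean the import was accepted

print(message)  # not a chance (on CPython)
```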
The grammar of Python does not allow this. It's a good answer, but what's the reason for it?
I think the logic behind the decision is the following: the body of a for loop must be indented in order to be recognized. So, if anything other than a simple_stmt were allowed there, it would require complex and easy-to-break indentation rules.
A compound statement consists of one or more ‘clauses’. A clause consists of a header and a ‘suite.’ The clause headers of a particular compound statement are all at the same indentation level. Each clause header begins with a uniquely identifying keyword and ends with a colon. A suite is a group of statements controlled by a clause. A suite can be one or more semicolon-separated simple statements on the same line as the header, following the header’s colon, or it can be one or more indented statements on subsequent lines.
x=['a','b'];
This does not satisfy the clause definition and thus cannot be used as part of a compound statement. That is why you get an error.
Try this.
x=['a','b']; [print(i,":", j) for i,j in enumerate(x)]

How to tell if the next line should be indented when parsing python

I am writing a simple templating engine in python, and it involves mixing python with other languages, and I need to determine the indentation level of any given line of python code.
I was wondering if it's accurate to say that a new indentation level is always indicated by a colon (:) at the end of the line.
Here's a line of python:
if my_boolean:
Since there is a colon at the end of this line, I would determine that the next line of python should be an indented block. Is this always accurate? Are there cases when I need to indent when a colon is not present?
A colon at the end of the line is the most prevalent example of an indicator that the following line is indented. The other one is any line that has more opening parentheses, braces, or brackets than closing ones. The latter case is more complicated because the order of the brackets matters very much, and also because the following indentation is arbitrary.
Another thing to consider is that you don't have any indication of whether a given line is expected to be unindented until you get to it.
The moral of the story is that you're better off using the existing machinery exposed by the ast module rather than reinventing the wheel. It's an awfully complicated wheel sometimes.
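As one possible sketch of leaning on existing machinery: the example below uses the standard tokenize module rather than ast, because the tokenizer reports block structure directly as INDENT and DEDENT tokens. The function name block_depths and the sample source are made up for illustration.

```python
import io
import tokenize


def block_depths(source):
    """Map the first row of each logical line to its block depth (0 = top level)."""
    depth = 0
    depths = {}
    at_line_start = True
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.INDENT:
            depth += 1
        elif tok.type == tokenize.DEDENT:
            depth -= 1
        elif tok.type == tokenize.NEWLINE:
            at_line_start = True  # a logical line just ended
        elif tok.type in (tokenize.NL, tokenize.COMMENT, tokenize.ENDMARKER):
            continue  # blank lines, comments, and bracket continuations
        elif at_line_start:
            depths[tok.start[0]] = depth
            at_line_start = False
    return depths


source = (
    "if my_boolean:\n"      # row 1
    "    values = [\n"      # row 2: the open bracket continues the logical line
    "        1,\n"          # row 3
    "        2,\n"          # row 4
    "    ]\n"               # row 5
    "    print(values)\n"   # row 6
    "print('done')\n"       # row 7
)
print(block_depths(source))  # {1: 0, 2: 1, 6: 1, 7: 0}
```

Rows 3 to 5 do not appear in the result because they are continuation lines inside brackets, so their indentation carries no block meaning.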

PEP8 - Contradiction between E129 and E127/E128

According to PEP 8, line breaks should come before binary operators. Furthermore, multiline conditions should be enclosed in parentheses to avoid backslashes before newlines. These two conventions lead to the following situation:
if (long_condition_1
    or long_condition_2):
    do_some_function()
This code in turn breaks E129 visually indented line with same indent as next logical line in PEP8. However, the second line must be indented exactly four spaces, as otherwise it breaks E128 or E127 for under-indented or over-indented lines.
How should one format the above so that it conforms to PEP8 standards?
This should work properly
if (long_condition_1 or
        long_condition_2):
    do_some_function()
The answer to this question has changed over time. Due to a change in stance from PEP8, W503 is now widely regarded to go against PEP8.
PEP8 now says it's fine to break before OR after, but to keep it consistent locally.
For newer code, Knuth-style is preferred (which I think refers to breaking before the operator).
if (
    long_condition_1
    or long_condition_2
    or (
        long_condition_3
        and long_condition_4
    )
):
    do_some_function()
if any((long_condition_1,
        long_condition_2)):
    do_some_function()
It's easier to read when both conditions are aligned, too.

Python style for `chained` function calls

More and more we use chained function calls:
value = get_row_data(original_parameters).refine_data(leval=3).transfer_to_style_c()
It can get long. To avoid an overly long line, which is preferred?
value = get_row_data(
    original_parameters).refine_data(
    leval=3).transfer_to_style_c()
or:
value = get_row_data(original_parameters)\
    .refine_data(leval=3)\
    .transfer_to_style_c()
I like using the backslash \ and putting each .function on a new line. This gives each function call its own line, which is easy to read. But many people seem not to prefer this. And when the code has subtle errors that are hard to debug, I always start to worry that it might be a space or something after the backslash (\).
To quote from the Python style guide:
Long lines can be broken over multiple lines by wrapping expressions
in parentheses. These should be used in preference to using a
backslash for line continuation. Make sure to indent the continued
line appropriately. The preferred place to break around a binary
operator is after the operator, not before it.
I tend to prefer the following, which eschews the non-recommended \ at the end of a line, thanks to an opening parenthesis:
value = (get_row_data(original_parameters)
         .refine_data(level=3)
         .transfer_to_style_c())
One advantage of this syntax is that each method call is on its own line.
A similar kind of \-less structure is also often useful with string literals, so that they don't go beyond the recommended 79 character per line limit:
message = ("This is a very long"
           " one-line message put on many"
           " source lines.")
This is a single string literal, which is created efficiently by the Python interpreter (this is much better than summing strings, which creates multiple strings in memory and copies them multiple times until the final string is obtained).
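One way to check the "single string literal" claim is to look at the parsed syntax tree. This is a sketch that assumes Python 3.8 or later, where literals are represented as ast.Constant nodes.

```python
import ast

source = (
    'message = ("This is a very long"\n'
    '           " one-line message put on many"\n'
    '           " source lines.")\n'
)

tree = ast.parse(source)
value = tree.body[0].value  # right-hand side of the assignment

# Adjacent literals are merged while parsing, so the tree holds one constant.
print(type(value).__name__)  # Constant
print(value.value)           # This is a very long one-line message put on many source lines.
```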
Python's code formatting is nice.
What about this option:
value = get_row_data(original_parameters,
    ).refine_data(leval=3,
    ).transfer_to_style_c()
Note that commas are redundant if there are no other parameters but I keep them to maintain consistency.
The answer that neither quotes my own preference (although see the comments on your question :)) nor lists alternatives is:
Stick to the style guidelines on any project you have already - if not stated, then keep as consistent as you can with the rest of the code base in style.
Otherwise, pick a style you like and stick with that - and let others know somehow that's how you'd appreciate chained function calls to be written if not reasonably readable on one-line (or however you wish to describe it).

Python indentation in "empty lines"

Which is preferred ("." indicating whitespace)?
A)
def foo():
    x = 1
    y = 2
....
    if True:
        bar()
B)
def foo():
    x = 1
    y = 2

    if True:
        bar()
My intuition would be B (that's also what vim does for me), but I see people using A) all the time. Is it just because most of the editors out there are broken?
If you use A, you can copy and paste the block into the Python shell; with B you will get an unexpected indentation error.
PEP 8 does not seem to be clear on this issue, although the statements about "blank lines" could be interpreted in favor of B. The PEP 8 style-checker (pep8.py) prefers B and warns if you use A; however, both variations are legal. My own view is that since Python will successfully interpret the code in either case, this doesn't really matter, and trying to enforce it would be a lot of work for very little gain. I suppose if you are very adamantly in favor of one or the other you could automatically convert the one to the other. Trying to fix all such lines manually, though, would be a huge undertaking and really not worth the effort, IMHO.
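To back up "both variations are legal" when the code is read from a file, here is a small sketch comparing the syntax trees of the two styles. The sample strings are made up to mirror A and B from the question.

```python
import ast

# Style A: the "blank" line carries four spaces. Style B: it is truly empty.
style_a = "def foo():\n    x = 1\n    y = 2\n    \n    if True:\n        bar()\n"
style_b = "def foo():\n    x = 1\n    y = 2\n\n    if True:\n        bar()\n"

# ast.dump() leaves out line and column attributes by default,
# so equal dumps mean the two sources have the same structure.
same_tree = ast.dump(ast.parse(style_a)) == ast.dump(ast.parse(style_b))
print(same_tree)  # True
```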
Adding proper indentation to blank lines (style A in the question) vastly improves code readability with display whitespace enabled because it makes it easier to see whether code after a blank line is part of the same indentation block or not.
For a language like Python, where there is no end statement or close bracket, I'm surprised this is not part of PEP. Editing Python with display whitespace on is strongly recommended, to avoid both trailing whitespace and mixed indentation.
Compare reading the following:
A)
def foo():
....x = 1
....y = 2
....
....if True:
........bar()
B)
def foo():
....x = 1
....y = 2

....if True:
........bar()
In A, it is far clearer that the last two lines are part of foo. This is even more useful at higher indentation levels.
That empty line belongs to foo(), so I would consider A to be the most natural. But I guess it's just a matter of opinion.
TextMate breaks block collapsing if you use B, and I prefer A anyway since it's more "logical".
My experience in open-source development is that one should never leave whitespace inside blank lines. Also one should never leave trailing white-space.
It's a matter of coding etiquette.
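If you want to audit a file for this, a rough sketch of a checker could look like the following. The function name whitespace_issues and the sample text are invented here; real projects usually rely on a linter or a pre-commit hook instead.

```python
def whitespace_issues(source):
    """Return (line_number, kind) pairs for whitespace-only lines and trailing whitespace."""
    issues = []
    for number, line in enumerate(source.splitlines(), start=1):
        if line and not line.strip():
            issues.append((number, "whitespace-only line"))
        elif line != line.rstrip():
            issues.append((number, "trailing whitespace"))
    return issues


sample = "def foo():\n    x = 1 \n    \n    return x\n"
print(whitespace_issues(sample))
# [(2, 'trailing whitespace'), (3, 'whitespace-only line')]
```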
I wouldn't necessarily call the first example "broken", because I know some people hate it when the cursor "jumps back" when moving the cursor up or down in code. E.g. Visual Studio (at least 2008) automatically prevents this from happening without using any whitespace characters on those lines.
B is preferred - i.e. no indentation. PEP 8 says:
Avoid trailing whitespace anywhere. Because it's usually invisible, it can be confusing: e.g. a backslash followed by a space and a newline does not count as a line continuation marker. Some editors don't preserve it and many projects (like CPython itself) have pre-commit hooks that reject it.
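The backslash case from that quote is easy to reproduce. In this sketch (the helper name compiles is made up), the only difference between the two sources is one space after the backslash.

```python
def compiles(source):
    """Return True if Python accepts the source as a module, False on a syntax error."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


clean = "total = 1 + \\\n    2\n"            # backslash immediately before the newline
trailing_space = "total = 1 + \\ \n    2\n"  # backslash, then a space, then the newline

print(compiles(clean))           # True
print(compiles(trailing_space))  # False
```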
Emacs does B) for me, but I really don't think it matters. A) means that you can add in a line at the correct indentation without any tabbing.
vi implicitly discourages the behaviour in A because the {/} navigations no longer work as expected. git explicitly discourages it by highlighting it in red when you run git diff. I would also argue that if a line contains spaces it is not a blank line.
For that reason I strongly prefer B. There is nothing worse than expecting to skip six or so lines up with the { motion and ending up at the top of a class def.