PEP8 - Contradiction between E129 and E127/E128 - python

According to the PEP standards, indents should come before binary operators. Furthermore, multiline conditions should be enclosed within parentheses to avoid using backslashes before newlines. These two conventions lead to the following situation
if (long_condition_1
or long_condition_2):
do_some_function()
This code in turn breaks E129 visually indented line with same indent as next logical line in PEP8. However, the second line must be indented exactly four spaces, as otherwise it breaks E128 or E127 for under-indented or over-indented lines.
How should one format the above so that it confirms to PEP8 standards?

This should work properly
if (long_condition_1 or
long_condition_2):
do_some_function()

The answer to this question has changed over time. Due to a change in stance from PEP8, W503 is now widely regarded to go against PEP8.
PEP8 now says it's fine to break before OR after, but to keep it consistent locally.
For newer code, Knuth-style is preferred (which I think refers to breaking before the operator).
if (
long_condition_1
or long_condition_2
or (
long_condition_3
and long_condition4
)
):
do_some_function()

if any((long_condition_1,
long_condition_2)):
do_some_function()
it's better to read when both conditions aligned too ...

Related

Is indentation semantically meaningful or syntactically meaningful in Python

I came across the sentence,
In Python, indentation is semantically meaningful.
I'm not sure I understand what "semantically meaningful" means here.
Also, since indentations are used to delimit the if and else blocks of coniditional expressions in Python, wouldn't they be considered to be part of the language grammar and therefore "syntactically meaningful"? (I cannot find mention of them in the docs for conditional expressions.)
This question is mostly useful for pedantry, since the answer won't change the way you write your code.
However, I would say: in Python, leading spaces are syntactically meaningful, and indentation is semantically meaningful.
The number of spaces at the start of a logical line (in the sense as defined in the documentation) defines the indentation level for that line, by comparing the number of leading spaces to the same number for the previous logical line. It either matches the previous number (continue current block), is greater (increase level and start new block) or matches a previous number of spaces (decrease level, end current block, continue matched block). If it doesn't match a previous level, that's an indentation error. That's the syntactic meaning of leading spaces.
Once Python knows the indentation level, that decides the meaning of the line (i.e. continue a block, start a new one, or continue a previous block and ending the current) - these are the semantics of the indentation level.
In other words: leading spaces are syntax, indentation is semantics.
I think they should have said "semantically meaningful", but the distinction is somewhat fuzzy. TechDifferences says:
The syntax of a programming language is a collection of rules to specify the structure or form of code whereas semantics refers to the interpretation of the code or the associated meaning of the symbols, characters or any part of a program.
Since indentation determines things like whether a line is part of a function or loop, and that impacts things like variable scope, it could be considered to affect the "associated meaning" of the symbols.

How to tell if the next line should be indented when parsing python

I am writing a simple templating engine in python, and it involves mixing python with other languages, and I need to determine the indentation level of any given line of python code.
I was wondering if it's accurate to say that a new indentation level is always indicated by a colon (:) at the end of the line.
Here's a line of python:
if my_boolean:
Since there is a colon at the end of this line, I would determine that the next line of python should be an indented block. Is this always accurate? Are there cases when I need to indent when a colon is not present?
A colon at the end of the line is the most prevalent example of an indicator that the following line is indented. The other one is any line that has more opening parentheses, braces, or brackets than closing ones. The latter case is more complicated because the order of the brackets matters very much, and also because the following indentation is arbitrary.
Another thing to consider is that you don't have any indication that wether a given line is expected to be unindented until you get to it.
The moral of the story is that you're better off using the existing machinery exposed by the ast module rather than reinventing the wheel. It's an awfully complicated wheel sometimes.

Matching indentation level according to PEP8/flake8

The question is how to properly break lines according to PEP8 while using TABs.
So here is a related question How to break a line in a function definition in Python according to PEP8. But the issue is that this only works properly when the length of the definition header def dummy( is an integer multiple of the tab length.
def tes(para1=x,
--->--->para2=y)
Otherwise I end up with a new error and flake8 complains about Error E127 or E128 because the its either over- or under-indented like this:
Under-indented E128
def test(para1=x,
--->--->para2=y)
Over-indented
def te(para1=x,
--->--->para2=y)
A solution where flake8 does not complain is to do:
def test(
--->--->para1=x,
--->--->para2=y
--->--->)
However, when I am programming I don't necessarily know in advance how many parameters I'm gonna use in that test() function. So once I hit the line limit I have rearrange quite a bit.
This obviously does apply to all continuations. Does this mean the cleanest solution is to break the line as soon as possible for every line which final length cannot be said by the time of first writing, or is there another solution.
Tab and space shall not be mixed for the solution.
So now I ask myself what is the legis artis to deal with line continuations?
I'm turning my original comment into an official answer.
The PEP-0008 has a section about whether to use Tabs or Spaces, quoted below (with my emphasis):
Spaces are the preferred indentation method.
Tabs should be used solely to remain consistent with code that is
already indented with tabs.
Python 3 disallows mixing the use of tabs and spaces for indentation.
Python 2 code indented with a mixture of tabs and spaces should be
converted to using spaces exclusively.
When invoking the Python 2 command line interpreter with the -t
option, it issues warnings about code that illegally mixes tabs and
spaces. When using -tt these warnings become errors. These options are
highly recommended!
You're running into issues with tabs, and you don't say whether you're using Python2 or 3, but I'd suggest you stick to the PEP-0008 guidelines.
You should replace tab chars in the file/module with 4 spaces and use spaces exclusively when indenting.
WARNING: Be very careful if you plan to use shell commands to do this for you, as some commands can be dangerous and mangle intended tab chars within strings (i.e. not only indentation tabs) and can break other things, such as repositories -especially if the command is recursive.
PEP8 is very clear:
Tabs or Spaces?
Spaces are the preferred indentation method.
Tabs should be used solely to remain consistent with code that is already indented with tabs.
Python 3 disallows [emphasis added] mixing the use of tabs and spaces for indentation.
Reference: python.org.
So if you're writing new code and want to adhere to standards, just use spaces.

Python style for `chained` function calls

More and more we use chained function calls:
value = get_row_data(original_parameters).refine_data(leval=3).transfer_to_style_c()
It can be long. To save long line in code, which is prefered?
value = get_row_data(
original_parameters).refine_data(
leval=3).transfer_to_style_c()
or:
value = get_row_data(original_parameters)\
.refine_data(leval=3)\
.transfer_to_style_c()
I feel it good to use backslash \, and put .function to new line. This makes each function call has it own line, it's easy to read. But this sounds not preferred by many. And when code makes subtle errors, when it's hard to debug, I always start to worry it might be a space or something after the backslash (\).
To quote from the Python style guide:
Long lines can be broken over multiple lines by wrapping expressions
in parentheses. These should be used in preference to using a
backslash for line continuation. Make sure to indent the continued
line appropriately. The preferred place to break around a binary
operator is after the operator, not before it.
I tend to prefer the following, which eschews the non-recommended \ at the end of a line, thanks to an opening parenthesis:
value = (get_row_data(original_parameters)
.refine_data(level=3)
.transfer_to_style_c())
One advantage of this syntax is that each method call is on its own line.
A similar kind of \-less structure is also often useful with string literals, so that they don't go beyond the recommended 79 character per line limit:
message = ("This is a very long"
" one-line message put on many"
" source lines.")
This is a single string literal, which is created efficiently by the Python interpreter (this is much better than summing strings, which creates multiple strings in memory and copies them multiple times until the final string is obtained).
Python's code formatting is nice.
What about this option:
value = get_row_data(original_parameters,
).refine_data(leval=3,
).transfer_to_style_c()
Note that commas are redundant if there are no other parameters but I keep them to maintain consistency.
The not quoting my own preference (although see comments on your question:)) or alternatives answer to this is:
Stick to the style guidelines on any project you have already - if not stated, then keep as consistent as you can with the rest of the code base in style.
Otherwise, pick a style you like and stick with that - and let others know somehow that's how you'd appreciate chained function calls to be written if not reasonably readable on one-line (or however you wish to describe it).

Python indentation in "empty lines"

Which is preferred ("." indicating whitespace)?
A)
def foo():
x = 1
y = 2
....
if True:
bar()
B)
def foo():
x = 1
y = 2
if True:
bar()
My intuition would be B (that's also what vim does for me), but I see people using A) all the time. Is it just because most of the editors out there are broken?
If you use A, you could copy paste your block in python shell, B will get unexpected indentation error.
The PEP 8 does not seem to be clear on this issue, although the statements about "blank lines" could be interpreted in favor of B. The PEP 8 style-checker (pep8.py) prefers B and warns if you use A; however, both variations are legal. My own view is that since Python will successfully interpret the code in either case that this doesn't really matter, and trying to enforce it would be a lot of work for very little gain. I suppose if you are very adamantly in favor of one or the other you could automatically convert the one to the other. Trying to fix all such lines manually, though, would be a huge undertaking and really not worth the effort, IMHO.
Adding proper indentation to blank lines (style A in the question) vastly improves code readability with display whitespace enabled because it makes it easier to see whether code after a blank line is part of the same indentation block or not.
For a language like Python, where there is no end statement or close bracket, I'm surprised this is not part of PEP. Editing Python with display whitespace on is strongly recommended, to avoid both trailing whitespace and mixed indentation.
Compare reading the following:
A)
def foo():
....x = 1
....y = 2
....
....if True:
........bar()
B)
def foo():
....x = 1
....y = 2
....if True:
........bar()
In A, it is far clearer that the last two lines are part of foo. This is even more useful at higher indentation levels.
That empty line belongs to foo(), so I would consider A to be the most natural. But I guess it's just a matter of opinion.
TextMate breaks block collapsing if you use B, and I prefer A anyway since it's more "logical".
My experience in open-source development is that one should never leave whitespace inside blank lines. Also one should never leave trailing white-space.
It's a matter of coding etiquette.
I wouldn't necessarily call the first example "broken", because I know some people hate it when the cursor "jumps back" when moving the cursor up or down in code. E.g. Visual Studio (at least 2008) automatically prevents this from happening without using any whitespace characters on those lines.
B is preferred - i.e. no indentation. PEP 8 says:
Avoid trailing whitespace anywhere. Because it's usually invisible, it can be confusing: e.g. a backslash followed by a space and a newline does not count as a line continuation marker. Some editors don't preserve it and many projects (like CPython itself) have pre-commit hooks that reject it.
Emacs does B) for me, but I really don't think it matters. A) means that you can add in a line at the correct indentation without any tabbing.
vi implicitly discourages the behaviour in A because the {/} navigations no longer work as expected. git explicitly discourages it by highlighting it in red when you run git diff. I would also argue that if a line contains spaces it is not a blank line.
For that reason I strongly prefer B. There is nothing worse than expecting to skip six or so lines up with the { motion and ending up at the top of a class def.

Categories