Why is imperative mood important for docstrings?

The error code D401 for pydocstyle reads: First line should be in imperative mood.
I often run into cases where I write a docstring, have this error thrown by my linter, and rewrite it -- but the two docstrings are semantically identical. Why is it important to have imperative mood for docstrings?
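For example (reusing the kos_root docstring from PEP 257 that is quoted later on this page; the function bodies are omitted), pydocstyle flags the first variant and accepts the second, even though they say the same thing:

def kos_root():
    """Returns the pathname of the KOS root directory."""  # D401: First line should be in imperative mood
    ...

def kos_root():
    """Return the pathname of the KOS root directory."""  # passes
    ...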

From the docstring of check_imperative_mood itself:
"""D401: First line should be in imperative mood: 'Do', not 'Does'.
[Docstring] prescribes the function or method's effect as a command:
("Do this", "Return that"), not as a description; e.g. don't write
"Returns the pathname ...".
(We'll ignore the irony that this docstring itself would fail the test.)

Consider the following example candidate for a docstring:
Make a row from a given bit string or with the given number of columns.
In English, this is a complete, grammatical sentence that begins with a capital letter and ends with a period. It's a sentence because it has a subject (implicitly, "you"), an object, "row," and a predicate (verb), "make."
Now consider an alternative:
Makes a row from a given bit string or with the given number of columns.
In English, this is ungrammatical. It is an adjectival phrase; therefore it should not begin with a capital letter and should not end with a period. Let's fix that problem:
makes a row from a given bit string or with the given number of columns
As an adjectival phrase, its antecedent --- its target --- is not explicit. Of course, we know it's the item being "docstringed," but, grammatically, it's dangling. That's a problem. Aesthetically, it's ugly, and that's another problem. So we fixed one problem and added two more.
For people who care about clear, unambiguous communication in grammatical English, the first proposal is clearly superior. I will guess that's the reason the Pythonistas chose the first proposal. In summary, "Docstrings shall be complete, grammatical sentences, specifically in the imperative mood."

It is not enough to just say it is about convention or consistency (otherwise, the follow-up question would be, "consistent with what?").
It is actually an explicit requirement from - albeit buried deep down in - the canonical PEP 257 Docstring Conventions. Quoted below:
def kos_root():
    """Return the pathname of the KOS root directory."""
    ...
Notes:
The docstring is a phrase ending in a period. It prescribes the
function or method's effect as a command ("Do this", "Return that"),
not as a description; e.g. don't write "Returns the pathname ...".
That pydocstyle docstring was actually quoted from the PEP 257 paragraph above.

It is more important to have a consistent style within a project or in a company.
The whole idea comes from the PEP-257, which says
The docstring is a phrase ending in a period. It prescribes the
function or method’s effect as a command (“Do this”, “Return that”),
not as a description; e.g. don’t write “Returns the pathname …”.
But for example the Google Python Style Guide states the total opposite of this:
The docstring should be descriptive-style ("""Fetches rows from a
Bigtable.""") rather than imperative-style ("""Fetch rows from a
Bigtable.""").
It is also worth mentioning that both the Oracle Java style guide and the Microsoft .NET guidelines prefer the descriptive style.
Use 3rd person (descriptive) not 2nd person (prescriptive).
The description is in 3rd person declarative rather than 2nd person
imperative.
Gets the label. (preferred)
Get the label. (avoid)
So it looks like this preference of imperative style is Python-specific.

Why is it important? Because that's the explicit convention for Python docstrings, as detailed in PEP 257. There's nothing particularly special about it - it doesn't seem obvious to me that one of "Multiplies two integers and returns the product" and "Multiply two integers and return the product" is clearly better than the other. But it is explicitly specified in the documentation.

For consistency. It might stem from the fact that the commit messages git automatically creates, like for merge commits, also use the imperative mood.

I find the grammatical argument compelling.
I use the imperative style for names and docstrings. In my experience, the imperative style works better in review. People are more likely to comment on an obvious untruth when the "what" of the docstring disagrees with the "how" of the code:
def _update_calories(meal):
    return sum(item.calories for item in meal)  # where is the update?
I use the descriptive style for some inline comments when I want to highlight an unexpected behaviour. In my experience, people tend to read descriptive phrases as authoritative:
def _update_calories(meal):
    meal.calories = sum(item.calories for item in meal)
    return meal.calories

# warning: changes pizza.calories, easy to miss side-effect
calories = _update_calories(pizza)

Related

Is there something similar to __END__ of perl in python?

Am I correct in thinking that Python doesn't have a direct equivalent for Perl's __END__?
print "Perl...\n";
__END__
End of code. I can put anything I want here.
One thought that occurred to me was to use a triple-quoted string. Is there a better way to achieve this in Python?
print "Python..."
"""
End of code. I can put anything I want here.
"""
The __END__ block in perl dates from a time when programmers had to work with data from the outside world and liked to keep examples of it in the program itself.
Hard to imagine I know.
It was useful, for example, if you had a moving target like a hardware log file with mutating messages due to firmware updates, where you wanted to compare old and new versions of a line, keep notes not strictly related to the program's operation ("Code seems slow on day x of month every month"), or, as mentioned above, keep a reference set of data to run the program against. Telcos are an example of an industry where this was a frequent requirement.
Lastly, Python's cult-like restrictiveness seems to have a real and tiresome effect on the mindset of its advocates; if your only response to a question is "Why would you want to do that when you could do X?" when X is not as useful, please keep quiet++.
The triple-quote form you suggested will still create a Python string, whereas Perl's parser simply ignores anything after __END__. And unlike in Perl, anything that comes after the string is still executed, so you can't safely write:
"""
I can put anything in here...
Anything!
"""
import os
os.system("rm -rf /")
Comments are more suitable in my opinion.
#__END__
#Whatever I write here will be ignored
#Woohoo !
What you're asking for does not exist.
Proof: http://www.mail-archive.com/python-list@python.org/msg156396.html
A simple solution is to escape any " as \" and do a normal multi-line string -- see the official docs: http://docs.python.org/tutorial/introduction.html#strings
(Also, atexit doesn't work: http://www.mail-archive.com/python-list@python.org/msg156364.html)
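One way to read that suggestion (a sketch only; the variable name end_notes is made up):

end_notes = """
Anything I want here, including an escaped \"\"\"triple quote\"\"\",
becomes part of one big string constant instead of ending the block early.
"""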
Hm, what about sys.exit(0) ? (assuming you do import sys above it, of course)
As to why it would be useful, sometimes I sit down to do a substantial rewrite of something and want to mark my "good up to this point" place.
By using sys.exit(0) in a temporary manner, I know nothing below that point will get executed, therefore if there's a problem (e.g., server error) I know it had to be above that point.
I like it slightly better than commenting out the rest of the file, just because there are more chances to make a mistake and uncomment something (stray key press at beginning of line), and also because it seems better to insert 1 line (which will later be removed), than to modify X-many lines which will then have to be un-modified later.
But yeah, this is splitting hairs; commenting works great too... assuming your editor supports easily commenting out a region, of course; if not, sys.exit(0) all the way!
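A sketch of that temporary-marker idea (the surrounding function names are hypothetical):

import sys

do_the_part_i_trust()   # everything above this line has been verified

sys.exit(0)             # temporary marker: nothing below this line ever runs

do_the_risky_rewrite()  # work in progress, safely disabled for now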
I use __END__ all the time, for many of the reasons given. I've been doing it for so long now that I put it (usually preceded by an exit('0');), along with BEGIN {} / END {} routines, in by force of habit. It is a shame that Python doesn't have an equivalent, but I just comment out the lines at the bottom: extraneous, but that's about what you get with "one way to rule them all" languages.
Python does not have a direct equivalent to this.
Why do you want it? It doesn't sound like a really great thing to have when there are more consistent ways, like putting the text at the end as comments (that's how we include arbitrary text in Python source files; triple-quoted strings are for making multi-line strings, not for non-code-related text).
Your editor should be able to make using many lines of comments easy for you.

What is the proper level of indent for hanging indent with type hinting in python?

What is the proper syntax for a hanging indent for a method with multiple parameters and type hinting?
Align under first parameter
def get_library_book(self,
                     book_id: str,
                     library_id: str
                     ) -> Book:
Indent one level beneath
def get_library_book(
        self,
        book_id: str,
        library_id: str
) -> Book:
PEP8 supports the Indent one level beneath case, but does not specify if Align under first parameter is allowed. It states:
When using a hanging indent the following should be considered; there
should be no arguments on the first line and further indentation
should be used to clearly distinguish itself as a continuation line.
PEP8 has many good ideas in it, but I wouldn't rely on it to decide this kind of question about whitespace. When I studied PEP8's recommendations on whitespace, I found them to be inconsistent and even contradictory.
Instead, I would look at general principles that apply to nearly all programming languages, not just Python.
The column alignment shown in the first example has many disadvantages, and I don't use or allow it in any of my projects.
Some of the disadvantages:
If you change the function name so its length is different, you must realign all of the parameters (see the short example after this list).
When you do that realignment, your source control diffs are cluttered with unnecessary whitespace changes.
As the code is updated and maintained, it's likely that you'll miss some of the alignment when renaming variables, leading to misaligned code.
You get much longer line lengths.
The alignment doesn't work in a proportional font. (Yes, some developers prefer proportional fonts, and if you avoid column alignment, your code will be equally readable in monospaced or proportional fonts.)
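To illustrate the realignment problem mentioned in the list above (hypothetical names, not from the question):

# Before: continuation lines aligned under the opening parenthesis.
result = make_row(bit_string,
                  column_count,
                  validate=True)

# After renaming make_row to build_table_row, every continuation line
# must be re-indented, so the diff touches lines whose content never changed.
result = build_table_row(bit_string,
                         column_count,
                         validate=True)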
It gets even worse if you use column alignment in more complex cases. Consider this example:
let mut rewrites = try_opt!(subexpr_list.iter()
                                        .rev()
                                        .map(|e| {
                                            rewrite_chain_expr(e,
                                                               total_span,
                                                               context,
                                                               max_width,
                                                               indent)
                                        })
                                        .collect::<Option<Vec<_>>>());
This is Rust code from the Servo browser, whose coding style mandates this kind of column alignment. While it isn't Python code, exactly the same principles apply in Python or nearly any language.
It should be apparent in this code sample how the use of column alignment leads to a bad situation. What if you needed to call another function, or had a longer variable name, inside that nested rewrite_chain_expr call? You're just about out of room unless you want very long lines.
Compare the above with either of these versions which use a purely indentation-based style like your second Python example:
let mut rewrites = try_opt!(
    subexpr_list
        .iter()
        .rev()
        .map( |e| {
            rewrite_chain_expr( e, total_span, context, max_width, indent )
        })
        .collect::<Option<Vec<_>>>()
);
Or, if the parameters to rewrite_chain_expr were longer or if you just wanted shorter lines:
let mut rewrites = try_opt!(
    subexpr_list
        .iter()
        .rev()
        .map( |e| {
            rewrite_chain_expr(
                e,
                total_span,
                context,
                max_width,
                indent
            )
        })
        .collect::<Option<Vec<_>>>()
);
In contrast to the column-aligned style, this pure indentation style has many advantages and no disadvantages at all.
Apart from Terry's answer, take an example from typeshed, which is the project on Python's GitHub for annotating the stdlib with stubs.
For example, in importlib.machinery (and in other cases if you look) annotations are done using your first form, for example:
def find_module(cls, fullname: str,
                path: Optional[Sequence[importlib.abc._Path]]
                ) -> Optional[importlib.abc.Loader]:
Read the previous line of PEP 8 more carefully, the part before "or using a hanging indent".
Continuation lines should align wrapped elements either vertically using Python's implicit line joining inside parentheses, brackets and braces, or using a hanging indent.
This is intended to cover the first "yes" example, and your first example above.
# Aligned with opening delimiter.
foo = long_function_name(var_one, var_two,
                         var_three, var_four)

Python style for `chained` function calls

More and more we use chained function calls:
value = get_row_data(original_parameters).refine_data(leval=3).transfer_to_style_c()
It can get long. To avoid overly long lines in code, which of the following is preferred?
value = get_row_data(
    original_parameters).refine_data(
    leval=3).transfer_to_style_c()
or:
value = get_row_data(original_parameters)\
        .refine_data(leval=3)\
        .transfer_to_style_c()
I feel it is good to use the backslash \ and put each .function call on a new line. That gives every function call its own line, which is easy to read. But this doesn't seem to be preferred by many. And when the code produces subtle errors that are hard to debug, I always start to worry that there might be a space or something after a backslash (\).
To quote from the Python style guide:
Long lines can be broken over multiple lines by wrapping expressions
in parentheses. These should be used in preference to using a
backslash for line continuation. Make sure to indent the continued
line appropriately. The preferred place to break around a binary
operator is after the operator, not before it.
I tend to prefer the following, which eschews the non-recommended \ at the end of a line, thanks to an opening parenthesis:
value = (get_row_data(original_parameters)
         .refine_data(level=3)
         .transfer_to_style_c())
One advantage of this syntax is that each method call is on its own line.
A similar kind of \-less structure is also often useful with string literals, so that they don't go beyond the recommended 79 character per line limit:
message = ("This is a very long"
" one-line message put on many"
" source lines.")
This is a single string literal, which is created efficiently by the Python interpreter (this is much better than summing strings, which creates multiple strings in memory and copies them multiple times until the final string is obtained).
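You can check this in CPython itself (a small demonstration; dis is only used to peek at the compiled constant):

import dis

# Adjacent string literals are merged into one constant at compile time,
# so no concatenation happens at run time:
dis.dis(compile('("This is a very long" " one-line message")', "<demo>", "eval"))
# The output shows the already-joined string loaded as a single constant.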
Python's code formatting is nice.
What about this option:
value = get_row_data(original_parameters,
                     ).refine_data(leval=3,
                                   ).transfer_to_style_c()
Note that commas are redundant if there are no other parameters but I keep them to maintain consistency.
The answer to this, without quoting my own preference (although see the comments on your question :)) or alternatives, is:
Stick to the style guidelines on any project you have already - if not stated, then keep as consistent as you can with the rest of the code base in style.
Otherwise, pick a style you like and stick with that - and let others know somehow that's how you'd appreciate chained function calls to be written if not reasonably readable on one-line (or however you wish to describe it).

lexer error-handling PLY Python

The t_error() function is used to handle lexing errors that occur when illegal characters are detected. My question is: How can I use this function to get more specific information on errors? Like error type, in which rule or section the error appears, etc.
In general, there is only very limited information available to the t_error() function. As input, it receives a token object where the value has been set to the remaining input text. Analysis of that text is entirely up to you. You can use the t.lexer.skip(n) function to have the lexer skip ahead by a certain number of characters and that's about it.
There is no notion of an "error type" other than the fact that there is an input character that does not match the regular expression of any known token. Since the lexer is decoupled from the parser, there is no direct way to get any information about the state of the parsing engine or to find out what grammar rule is being parsed. Even if you could get the state (which would simply be the underlying state number of the LALR state machine), interpretation of it would likely be very difficult since the parser could be in the intermediate stages of matching dozens of possible grammar rules looking for reduce actions.
My advice is as follows: If you need additional information in the t_error() function, you should set up some kind of object that is shared between the lexer and parser components of your code. You should explicitly make different parts of your compiler update that object as needed (e.g., it could be updated in specific grammar rules).
Just as an aside, there are usually very few courses of action for a bad token. Essentially, you're getting input text that doesn't match any known part of the language alphabet (e.g., no known symbol). As such, there's not even any kind of token value you can give to the parser. Usually, the only course of action is to report the bad input, throw it out, and continue.
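A minimal sketch of that "report, discard, continue" approach (the token set and input are made up; only documented PLY attributes such as t.lexer.lineno, t.lexer.lexpos and t.lexer.skip() are used):

import ply.lex as lex

tokens = ("NUMBER",)
t_NUMBER = r"\d+"
t_ignore = " \t"

def t_newline(t):
    r"\n+"
    t.lexer.lineno += len(t.value)

def t_error(t):
    # Report the offending character with line and position, then resume lexing.
    print("Illegal character %r at line %d, position %d"
          % (t.value[0], t.lexer.lineno, t.lexer.lexpos))
    t.lexer.skip(1)

lexer = lex.lex()
lexer.input("12 $ 34")
for tok in lexer:
    print(tok)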
As a followup to Raymond's answer, I would also not advise modifying any attribute of the lexer object in t_error().
Ply includes an example ANSI-C style lexer in a file called cpp.py. It has an example of how to extract some information out of t_error():
def t_error(t):
    t.type = t.value[0]
    t.value = t.value[0]
    t.lexer.skip(1)
    return t
In that function, you can also access the lexer's public attributes:
lineno - Current line number
lexpos - Current position in the input string
There are also some other attributes that aren't listed as public but may provide some useful diagnostics (see the sketch after this list):
lexstate - Current lexer state
lexstatestack - Stack of lexer states
lexstateinfo - State information
lexerrorf - Error rule (if any)
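If you want to log those as well, something along these lines could work (a sketch only; these attributes are internal to PLY and not part of the documented API):

def t_error(t):
    lx = t.lexer
    print("bad char %r in state %r (stack: %r) at line %d, pos %d"
          % (t.value[0], lx.lexstate, lx.lexstatestack, lx.lineno, lx.lexpos))
    lx.skip(1)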
There is indeed a way of managing errors in PLY; take a look at this very interesting presentation:
http://www.slideshare.net/dabeaz/writing-parsers-and-compilers-with-ply
and at chapter 6.8.1 of
http://www.dabeaz.com/ply/ply.html#ply_nn3

## in python using notepad++ syntax coloring

In my editor (notepad++) in Python script edit mode, a line
## is this a special comment or what?
Turns a different color (yellow) than a normal #comment.
What's special about a ##comment vs a #comment?
From the Python point of view, there's no difference. However, Notepad++'s highlighter considers the ## sequence as a STRINGEOL, which is why it colours it this way. See this thread.
I thought the difference had something to do with usage:
#this is a code block header
vs.
##this is a comment
I know Python doesn't care one way or the other, but I thought it was just convention to do it that way.
Also, in a different context:
Comment whose first line is a double hash:
This is used by doxygen and Fredrik Lundh's PythonDoc. In doxygen,
if there's text on the line with the double hash, it is treated as
a summary string. I dislike this convention because it seems too
likely to result in false positives. E.g., if you comment-out a
region with a comment in it, you get a double-hash.
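For example, a doxygen-style Python comment block looks like this (a made-up function; the ## line is what doxygen treats as the start of a documentation block and as the brief/summary string):

## Compute the total calories of a meal.
#  The ##-line above is picked up by doxygen as the brief description;
#  these #-lines become the detailed description.
def total_calories(meal):
    return sum(item.calories for item in meal)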
