I'm studying operators in Python and came across several concepts that determine the order of evaluation when an expression contains multiple operators.
I understand the concept of operator precedence, and came across the operator precedence table in the Python docs. A few things there confused me:
Why are assignment and augmented-assignment operators not included in the list?
What really counts as an operator in Python? (And is there any difference between an operator and a keyword?)
The latter question stems from the categorization of operators I've read in various places on the internet, which puts operators into the following categories:
Arithmetic Operators
Comparison Operators
Assignment Operators
Logical Operators
Bitwise Operators
Membership Operators
Identity Operators
But when I saw keywords like lambda and if-else in the operator precedence table in the Python documentation, it confused me. Moreover, the operator mapping table in the documentation for the operator module includes keywords like del, which appear neither in the usual categorization on the internet nor in the precedence table in the Python docs.
My final question is: "Is there any way to group the categories of operators and their behavior (precedence, chaining, associativity, etc.) in Python? Or should I be studying every operator and its behavior independently?"
Why are assignment and augmented-assignment operators not included in the list?
Because they're not true operators. We sometimes call them operators for convenience, but assignment and augmented assignment are statements: they cannot form expressions and therefore have no precedence relative to real operators. (The one exception is the assignment expression :=, added in Python 3.8, which is an expression and does appear in the table.)
What really counts as an operator in Python? (And is there any difference between an operator and a keyword?)
According to the documentation on operators, an operator seems to be any punctuation that can form an expression. For simplicity, I prefer to define an operator as any punctuation or keyword that can form an expression.
But when I saw keywords like lambda, if-else in the operator precedence table in the Python documentation, it confused me.
Those are keywords that can form expressions, so they must have operator precedence. Note that if-else can be either an expression or a block statement, depending on the syntax:
# Expression
a if condition else b
# Statement
if condition:
    pass
else:
    pass
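Because the expression form can sit inside a larger expression, its precedence decides how far each operand extends. A small sketch (the variable names are made up for illustration):

```python
# The conditional expression binds more loosely than +, so the whole "1 + 2"
# becomes the true-branch operand.
a = 1 + 2 if False else 10          # parsed as (1 + 2) if False else 10
b = 1 + (2 if False else 10)        # parentheses force the other grouping

# lambda binds even more loosely than if-else, so the lambda body is the
# entire conditional expression to its right.
sign = lambda n: "negative" if n < 0 else "non-negative"

print(a, b)                  # 10 11
print(sign(-3), sign(4))     # negative non-negative
```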
Moreover, the operator mapping table in the documentation for the operator module includes keywords like del, which appear neither in the usual categorization on the internet nor in the precedence table in the Python docs.
del is not an operator, because it forms a statement that is used only for its side effect. However, since it can modify an object in place (del obj[key]), it makes sense for the operator module to include a function that does the same thing, operator.delitem. The other use for del is to remove a variable, which is something a function can't do.
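To illustrate the difference, here is a short sketch comparing the del statement with operator.delitem; the data is invented for the example:

```python
import operator

data = {"a": 1, "b": 2, "c": 3}
del data["a"]                    # statement form
operator.delitem(data, "b")      # function form, same effect: del data["b"]

items = [10, 20, 30, 40]
operator.delitem(items, slice(1, 3))   # equivalent to: del items[1:3]

# Because delitem is an ordinary function, it can be stored and passed
# around like any other value, which the del statement cannot.
pending = [(operator.delitem, "c")]
for func, key in pending:
    func(data, key)

print(data, items)   # {} [10, 40]
```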
My final question is: "Is there any way to group the categories of operators and their behavior (precedence, chaining, associativity, etc.) in Python? Or should I be studying every operator and its behavior independently?"
Operators can always be combined to form larger expressions, so they must have precedence and associativity to define the meaning of a non-trivial expression. Non-operator syntax usually forms either a statement or a group of statements.
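A quick sketch of the three properties the question asks about (precedence, associativity, and chaining), using plain integer expressions; the variable names are only labels for the results:

```python
# Precedence: * binds tighter than +.
mixed = 2 + 3 * 4                  # 2 + (3 * 4)
# ** binds tighter than a unary minus on its left.
negative_square = -2 ** 2          # -(2 ** 2)

# Associativity: most binary operators group left to right...
left_grouped = 100 - 10 - 1        # (100 - 10) - 1
# ...but ** groups right to left.
right_grouped = 2 ** 3 ** 2        # 2 ** (3 ** 2), not (2 ** 3) ** 2

# Chaining: comparisons (including in, not in, is, is not) chain rather than group.
x = 5
in_range = 1 < x < 10              # (1 < x) and (x < 10)
out_of_range = 1 < x < 3           # (1 < x) and (x < 3)

# Keyword operators share the same table: not binds more loosely than ==.
negated = not 1 == 2               # not (1 == 2)

print(mixed, negative_square, left_grouped, right_grouped)   # 14 -4 89 512
print(in_range, out_of_range, negated)                       # True False True
```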
I am struggling to understand what the &^ and &^= operators mean in Go. I cannot find an answer either in the documentation (which states it's a bit clear operator, but that does not help me much) or by experimenting.
In particular, I want to know if there is an equivalent in Python.
These are the "AND NOT" or "bit clear" operators, "useful" for clearing those bits of the left-hand operand that are set in the right-hand operand.
I put "useful" in quotes since all other languages that derive their bitwise operations from C do this with bitwise AND (&) and bitwise NOT (~); thus Go's 5 &^ 2 is just 5 & ~2 in Python, and Go's a &^= 3 would be a &= ~3 in Python.
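A minimal Python sketch of the translation; and_not is a hypothetical helper name, not a standard function. It relies on Python treating integers as if they had infinitely many sign bits, so ~b is -b - 1 and a & ~b clears exactly the bits of b from a non-negative a:

```python
def and_not(a: int, b: int) -> int:
    """Python equivalent of Go's a &^ b: clear in a every bit that is set in b."""
    return a & ~b

# Bit 2 is set in both operands, so it is cleared; bits 3 and 0 are kept.
result = and_not(0b1101, 0b0110)
print(bin(result))         # 0b1001

print(and_not(5, 2))       # 5, since 0b101 has no bit in common with 0b010

# Go's augmented form a &^= 3 becomes a &= ~3 in Python.
a = 0b1111
a &= ~3
print(bin(a))              # 0b1100
```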
I know triple-quoted strings are used as docstrings, but is there a real need to have two kinds of string literal?
Are there any use cases where distinguishing between single-line and multi-line literals is useful?
In Clojure we have one string literal, which is multi-line, and we use it as the docstring. So why the difference in Python?
The advantage of having to be explicit about creating a multi-line string literal is probably best demonstrated with an example:
with open("filename.ext) as f:
    for line in f:
        print(line.upper())
Of course, any decent syntax-highlighting editor will catch that, but:
It isn't always the case that you're using a syntax-highlighting editor
Python has no control over what editor you are using.
Two of Python's design principles are that
errors should never pass silently, and
explicit is better than implicit.
Outside docstrings, multi-line strings are rarely used in Python, so the example above is much more likely to occur (everyone mistypes sometimes) than the case where you want a multi-line string, but forgot to explicitly say so by triple-quoting.
It's similar to Python's use of significant whitespace, in that enforcing good, consistent indentation practice means that errors are much more easily caught than in e.g. a brace-delimited language.
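To see the "errors should never pass silently" point in action, this sketch feeds the mistyped snippet to the built-in compile() and shows that it is rejected immediately, while a deliberately triple-quoted literal is allowed to span lines:

```python
# The snippet from above, still missing the closing quote after filename.ext.
broken = 'with open("filename.ext) as f:\n    for line in f:\n        print(line.upper())\n'

error_line = None
try:
    compile(broken, "<example>", "exec")
except SyntaxError as exc:
    # A single-quoted literal may not contain a raw newline, so the
    # tokenizer reports the problem on the line where the string starts.
    error_line = exc.lineno
print("SyntaxError reported on line", error_line)

# Triple quotes say explicitly that the literal is meant to span lines.
ok = 'text = """first line\nsecond line"""\n'
namespace = {}
exec(compile(ok, "<example>", "exec"), namespace)
print(repr(namespace["text"]))   # 'first line\nsecond line'
```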
More and more we use chained function calls:
value = get_row_data(original_parameters).refine_data(level=3).transfer_to_style_c()
The line can get long. To avoid overly long lines, which style is preferred?
value = get_row_data(
    original_parameters).refine_data(
    level=3).transfer_to_style_c()
or:
value = get_row_data(original_parameters)\
    .refine_data(level=3)\
    .transfer_to_style_c()
I like using a backslash (\) and putting each .method() call on a new line. This gives each call its own line, which is easy to read. But many people seem not to prefer this. And when the code has a subtle error that is hard to debug, I always start to worry that it might be a space or something after a backslash.
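To quote from the Python style guide, PEP 8 (this passage is from an older revision; the current text suggests breaking before binary operators in new code):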
Long lines can be broken over multiple lines by wrapping expressions in parentheses. These should be used in preference to using a backslash for line continuation. Make sure to indent the continued line appropriately. The preferred place to break around a binary operator is after the operator, not before it.
I tend to prefer the following, which eschews the non-recommended \ at the end of a line, thanks to an opening parenthesis:
value = (get_row_data(original_parameters)
         .refine_data(level=3)
         .transfer_to_style_c())
One advantage of this syntax is that each method call is on its own line.
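Here is a runnable sketch of that layout. RowData, get_row_data and original_parameters are hypothetical stand-ins for the question's functions, invented only so the chain executes. It also shows two practical differences from the backslash version: inside parentheses each line can carry a comment, and a stray space after a backslash is not a silent bug but a SyntaxError:

```python
class RowData:
    """Hypothetical stand-in for whatever get_row_data returns."""

    def __init__(self, values):
        self.values = list(values)

    def refine_data(self, level):
        # Keep only values at or above the requested level.
        return RowData(v for v in self.values if v >= level)

    def transfer_to_style_c(self):
        return ",".join(str(v) for v in self.values)


def get_row_data(params):
    return RowData(params)


original_parameters = [1, 3, 5, 2, 4]

value = (get_row_data(original_parameters)  # comments are allowed here...
         .refine_data(level=3)              # ...and here, which a trailing
         .transfer_to_style_c())            # backslash would not permit
print(value)   # 3,5,4

# Backslash version with a space after the backslash.
bad = "value = get_row_data(original_parameters)\\ \n    .refine_data(level=3)\n"
try:
    compile(bad, "<example>", "exec")
    backslash_ok = True
except SyntaxError:
    backslash_ok = False
print("trailing space after backslash compiles:", backslash_ok)   # False
```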
A similar kind of \-less structure is also often useful with string literals, so that they don't go beyond the recommended 79-character line limit:
message = ("This is a very long"
           " one-line message put on many"
           " source lines.")
This is a single string literal, which is created efficiently by the Python interpreter (this is much better than summing strings, which creates multiple strings in memory and copies them multiple times until the final string is obtained).
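One way to check that the pieces really become a single literal before the program runs is to look at the syntax tree with the standard ast module. This sketch assumes Python 3.8 or later, where string literals appear as ast.Constant nodes:

```python
import ast

message = ("This is a very long"
           " one-line message put on many"
           " source lines.")
print(message)

# The parser has already merged the adjacent literals into one constant node.
tree = ast.parse('("This is a very long" " one-line message")', mode="eval")
print(type(tree.body).__name__, repr(tree.body.value))
```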
Python's code formatting is nice.
What about this option:
value = get_row_data(original_parameters,
                     ).refine_data(level=3,
                                   ).transfer_to_style_c()
Note that the commas are redundant when there are no other parameters, but I keep them for consistency.
Without pushing my own preference (although see my comments on your question :)) or alternatives, the answer to this is:
Stick to the style guidelines of any project you already have; if none are stated, keep your style as consistent as you can with the rest of the code base.
Otherwise, pick a style you like and stick with it, and somehow let others know that this is how you'd like chained function calls to be written when they aren't reasonably readable on one line (or however you wish to describe it).
I was recently bitten by a subtle bug.
#include <cassert>
#include <string>

int main() {
    const char* int2str[] = {
        "zero",  // 0
        "one",   // 1
        "two"    // 2
        "three", // 3
        nullptr };
    assert( int2str[1] == std::string("one") ); // passes
    assert( int2str[2] == std::string("two") ); // fails
}
If you have godlike code review powers you'll notice I forgot the , after "two".
After the considerable effort it took to find that bug, I've got to ask: why would anyone ever want this behavior?
I can see how this might be useful for macro magic, but then why is this a "feature" in a modern language like Python?
Have you ever used string literal concatenation in production code?
Sure, it's the easy way to make your code look good:
const char *someGlobalString = "very long "
                               "so broken "
                               "onto multiple "
                               "lines";
The best reason, though, is for weird printf formats, like type forcing:
#include <inttypes.h>
#include <stdio.h>
uint64_t num = 5;
printf("Here is a number: %" PRIX64 ", what do you think of that?", num);
There are a bunch of these macros defined in <inttypes.h>, and they can come in handy if you have type-size requirements. A few examples:
PRIo8 PRIoLEAST16 PRIoFAST32 PRIoMAX PRIoPTR
It's a great feature that allows you to combine preprocessor strings with your strings.
#include <stdio.h>
#include <time.h>

// Here we define the correct printf modifier for time_t
#ifdef TIME_T_LONG
#define TIME_T_MOD "l"
#elif defined(TIME_T_LONG_LONG)
#define TIME_T_MOD "ll"
#else
#define TIME_T_MOD ""
#endif

int main(void) {
    // And here we merge the modifier into the rest of our format string
    printf("time is %" TIME_T_MOD "u\n", time(0));
}
I see several C and C++ answers, but none of them really answers why this feature exists or what its rationale was. C++ inherits this feature from C, and we can find its rationale in the Rationale for International Standard - Programming Languages - C, section 6.4.5 String literals, which says:
A string can be continued across multiple lines by using the backslash–newline line continuation, but this requires that the continuation of the string start in the first position of the next line. To permit more flexible layout, and to solve some preprocessing problems (see §6.10.3), the C89 Committee introduced string literal concatenation. Two string literals in a row are pasted together, with no null character in the middle, to make one combined string literal. This addition to the C language allows a programmer to extend a string literal beyond the end of a physical line without having to use the backslash–newline mechanism and thereby destroying the indentation scheme of the program. An explicit concatenation operator was not introduced because the concatenation is a lexical construct rather than a run-time operation.
Python seems to have the same reason: this reduces the need for the ugly \ to continue long string literals. This is covered in section 2.4.2, String literal concatenation, of The Python Language Reference.
Cases where this can be useful:
Generating strings including components defined by the preprocessor (this is perhaps the largest use case in C, and it's one I see very, very frequently).
Splitting string constants over multiple lines
To provide a more concrete example for the former:
// in version.h
#define MYPROG_NAME "FOO"
#define MYPROG_VERSION "0.1.2"

// in main.c
#include <stdio.h>
#include "version.h"
int main(void) {
    puts("Welcome to " MYPROG_NAME " version " MYPROG_VERSION ".");
}
I'm not sure about other programming languages, but for example C# doesn't allow you to do this (and I think this is a good thing). As far as I can tell, most of the examples that show why this is useful in C++ would still work if you could use some special operator for string concatenation:
string someGlobalString = "very long " +
                          "so broken " +
                          "onto multiple " +
                          "lines";
This may not be as comfortable, but it is certainly safer. In your motivating example, the code would be invalid unless you added either , to separate elements or + to concatenate strings...
From the Python lexical analysis reference, section 2.4.2:
This feature can be used to reduce the number of backslashes needed, to split long strings conveniently across long lines, or even to add comments to parts of strings
http://docs.python.org/reference/lexical_analysis.html
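As a sketch of the "add comments to parts of strings" use, here is a regular expression split into commented pieces; the date pattern itself is just an invented example:

```python
import re

# Adjacent literals are joined at compile time, so each piece can carry a comment.
DATE_PATTERN = re.compile(
    r"(?P<year>\d{4})"    # four-digit year
    r"-"                  # separator
    r"(?P<month>\d{2})"   # two-digit month
    r"-"
    r"(?P<day>\d{2})"     # two-digit day
)

match = DATE_PATTERN.fullmatch("2024-03-15")
print(match.group("year"), match.group("month"), match.group("day"))   # 2024 03 15
```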
For rationale, expanding and simplifying Shafik Yaghmour’s answer: string literal concatenation originated in C (hence inherited by C++), as did the term, for two reasons (references are from Rationale for the ANSI C Programming Language):
For formatting: to allow long string literals to span multiple lines with proper indentation – in contrast to line continuation, which destroys the indentation scheme (3.1.4 String literals); and
For macro magic: to allow the construction of string literals by macros (via stringizing) (3.8.3.2 The # operator).
It is included in the modern languages Python and D because they copied it from C, though in both of these it has been proposed for deprecation, as it is bug-prone (as you note) and unnecessary (since one can just have a concatenation operator and constant folding for compile-time evaluation; you can’t do this in C because strings are pointers, and so you can’t add them).
It’s not simple to remove because that breaks compatibility, and you have to be careful about precedence (implicit concatenation happens during lexing, prior to operators, but replacing this with an operator means you need to be careful about precedence), hence why it’s still present.
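The precedence problem is easy to see in Python. Implicit concatenation is resolved while parsing, so the juxtaposed pair behaves like one literal, whereas replacing it with + changes how the surrounding operators group. The last lines check the constant-folding alternative; that folding is a CPython implementation detail, so treat that part as CPython-specific:

```python
implicit_repeat = "ab" "cd" * 2      # ("abcd") * 2
explicit_repeat = "ab" + "cd" * 2    # "ab" + ("cd" * 2), since * binds tighter

implicit_upper = "ab" "cd".upper()   # ("abcd").upper()
explicit_upper = "ab" + "cd".upper() # "ab" + ("cd".upper())

print(implicit_repeat, explicit_repeat)   # abcdabcd abcdcd
print(implicit_upper, explicit_upper)     # ABCD abCD

# CPython folds "ab" + "cd" into a single constant when compiling.
code = compile('x = "ab" + "cd"', "<example>", "exec")
print("abcd" in code.co_consts)           # True on CPython
```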
Yes, it is used in production code. The Google Python Style Guide, in its section on line length, specifies:
When a literal string won't fit on a single line, use parentheses for implicit line joining.
x = ('This will build a very long long '
     'long long long long long long string')
See “String literal concatenation” at Wikipedia for more details and references.
So that you can split long string literals across lines.
And yes, I've seen it in production code.
While people have taken the words out of my mouth about the practical uses of the feature, nobody has so far tried to defend the choice of syntax.
For all I know, the typo that this syntax lets slip through was probably just overlooked. After all, robustness against typos doesn't seem to have been at the front of Dennis Ritchie's mind, as further shown by:
if (a = b);
{
    printf("%d", a);
}
Furthermore, there's the possible view that it wasn't worth using up an extra symbol for concatenation of string literals - after all, there isn't much else that can be done with two of them, and having a symbol there might create temptation to try to use it for runtime string concatenation, which is above the level of C's built-in features.
Some modern, higher-level languages based on C syntax have discarded this notation, presumably because it is typo-prone. But these languages have an operator for string concatenation, such as + (JS, C#), . (Perl, PHP), ~ (D, though this has also kept C's juxtaposition syntax), and constant folding (in compiled languages, anyway) means that there is no runtime performance overhead.
Another sneaky error I've seen in the wild is people presuming that two single quotes are a way to escape the quote (as is commonly done for double quotes in CSV files, for example), so they write things like the following in Python:
print('Beggars can''t be choosers')
which outputs Beggars cant be choosers instead of the Beggars can't be choosers the coder desired.
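A short sketch of what happens and of two ways to get the intended apostrophe:

```python
# 'Beggars can' and 't be choosers' are two adjacent literals, silently joined.
wrong = 'Beggars can''t be choosers'

# Either escape the apostrophe or switch to the other quote character.
escaped = 'Beggars can\'t be choosers'
other_quotes = "Beggars can't be choosers"

print(wrong)                     # Beggars cant be choosers
print(escaped == other_quotes)   # True
```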
As for the original "why" question, why is this a "feature" in a modern language like Python? In my opinion, I concur with the OP: it shouldn't be.