PEP8 E226 recommendation - python

The E226 error code is about "missing whitespace around arithmetic operator".
I use the Anaconda package in Sublime, which highlights this line, for example, as a PEP8 E226 violation:
hypot2 = x*x + y*y
But in Guido's PEP8 style guide that line is actually shown as an example of the recommended use of spaces around operators.
Question: which is the correct guideline? Always spaces around operators or just in some cases (as Guido's recommendation shows)?
Also: who decides what goes into PEP8? I would've thought Guido's recommendation would pretty much determine how that works.

The maintainers of the PEP8 tool decide what goes into it.
As you noticed, these do not always match the PEP8 style guide exactly. In this particular case, I don't know whether it's an oversight by the maintainers or a deliberate decision. You'd have to ask them to find out, or you might find the answer in the commit history.
Guido recently asked the maintainers of pep8 and pep257 tools to rename them, to avoid this confusion. See this issue for example. As a result, the tools are getting renamed to pycodestyle and pydocstyle, respectively.

It says in PEP8:
If operators with different priorities are used, consider adding whitespace around the operators with the lowest priority(ies). Use your own judgment; however, never use more than one space, and always have the same amount of whitespace on both sides of a binary operator.
(Emphasis is my own).
In the listed example, + has a lower priority, so the BDFL elects to use whitespace around it and uses no whitespace around higher priority *.
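To make the priority rule concrete, here is a small sketch built on the spacing examples from PEP8 itself. The variable values are made up for illustration; only the spacing matters.

```python
x, y = 3, 4
a, b = 5, 2

# Spaces go around the lowest-priority operator (+ or -),
# while the tighter-binding * is written without spaces.
hypot2 = x*x + y*y
x2 = x*2 - 1
c = (a+b) * (a-b)

# The uniformly spaced form computes exactly the same thing; the
# difference is purely visual grouping.
hypot2_spaced = x * x + y * y

print(hypot2, x2, c, hypot2_spaced)
```

Both spellings are valid Python and give the same results, so this is a readability choice rather than a correctness one.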

This happened to me as well. We should always have a space between numbers or variables and operators.
example:
a=b*4 wrong
a = b * 4 correct


Center alignment print format specifier implementation detail

In Python (2.7 and above, probably other versions too), it is possible to create a string that is centered by doing something like this:
'{:^10}'.format('abc')
The meaning of 'centered' is pretty clear when the total number of padding characters is even, but what about when it is odd?
When I print the above in vanilla CPython (and IPython), I get
'   abc    '
This appears to put the extra pad character on the right. However, the docs do not explicitly mention a spec for this behavior. Is the behavior of the centering format specifier in the presence of an odd number of padding characters specified somewhere, or is it an implementation detail that is not to be relied on?
You should be able to rely on this. I don't know that it is documented anywhere, but the standard Python test suite asserts that the extra space is added on the right. Since the test package is part of the standard library, it's a good starting point for other Python implementations, and they'll be aiming for compliance with the reference implementation wherever possible.
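As a rough illustration of the behavior described above, the sketch below compares the built-in format specifier with a hand-written helper. center_like_format is a hypothetical name made up for this example; it mirrors what CPython is observed to do, not a documented specification.

```python
def center_like_format(text, width, fill=' '):
    """Hypothetical helper mirroring the observed '{:^N}' behavior."""
    total = max(width - len(text), 0)
    left = total // 2        # the smaller half goes on the left
    right = total - left     # any odd leftover lands on the right
    return fill * left + text + fill * right

for text, width in [('abc', 10), ('ab', 5), ('abcd', 10)]:
    built_in = '{:^{w}}'.format(text, w=width)
    # The helper and the built-in specifier agree on these inputs.
    assert built_in == center_like_format(text, width)
    print(repr(built_in))
```

If the exact placement of the odd padding character matters to you, a small helper like this makes the choice explicit instead of depending on undocumented behavior.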

Techniques for condensing code

As I understand it, Python is specifically designed to force people to use indentation, but is it possible to break this rule? As an example:
y=[1,2,3]
print('ListY:')
for x in y:
    print(x)
Now, I can condense the last two lines as such:
for x in y:print(x)
but I cannot do:
print('ListY');for x in y:print(x)
But is there a way you can?
First of all, I should say that I agree that such tricks may be of some use. Not too often, though. A good example is code in doctests. It is usually clear enough to be readable even when compacted, and making it compact often causes fewer problems than making it "as readable as possible". However, for regular code, joining lines is usually not a good practice. Not being able to set a breakpoint inside an if or for statement is usually a bigger problem than an extra line. Coverage tools also give more information if you do not use such tricks.
However, answering your question, it seems there is no way to do what you want. There are many limitations in using ;. Compound statements can not be used with ;. Usually these limitations are reasonable, but sometimes I also regret they are so strict.
UPD: But if you are very focused on making it a one-liner, there are a lot of tricks. For example, generators and list comprehensions (instead of for), reduce() and so on, and in Python 3 even print() can be used inside them.
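A couple of those tricks, sketched for the list from the question. This assumes Python 3, where print() is a function; it is just one possible way to do it, not a recommendation.

```python
y = [1, 2, 3]

# Argument unpacking prints the header and every element, one per line,
# in a single simple statement.
print('ListY:', *y, sep='\n')

# A list comprehension joined into one string does the same job and
# gives you a value you can reuse or test.
text = '\n'.join(['ListY:'] + [str(x) for x in y])
print(text)
```

Both forms are simple statements, so they can share a line with other simple statements separated by semicolons.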
I won't go into why you would ever want to do that in Python, but no, you can't do that.
There are two types of statements in Python: simple statements that span one line and compound statements that span several lines. You can put several simple statements on one line, separating them with semicolons, but you can't put a compound statement after a simple statement.
Namely (straight from the Python Language Reference):
statement ::= stmt_list NEWLINE | compound_stmt
stmt_list ::= simple_stmt (";" simple_stmt)* [";"]
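To see that grammar rule in action, here is a small sketch that asks the built-in compile() whether a few one-liners are accepted. The compiles helper is a made-up name for illustration.

```python
def compiles(source):
    """Return True if Python accepts `source` as a module body."""
    try:
        compile(source, '<example>', 'exec')
    except SyntaxError:
        return False
    return True

# Two simple statements joined by a semicolon: allowed.
print(compiles("print('ListY'); y = [1, 2, 3]"))

# A compound statement with its body on the same line: allowed.
print(compiles("for x in [1, 2, 3]: print(x)"))

# A compound statement after a semicolon: rejected by the grammar.
print(compiles("print('ListY'); for x in [1, 2, 3]: print(x)"))
```

The third case fails because stmt_list only admits simple statements on either side of the semicolon.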
As the other answers say...
You could (if you really wanted) do something like this, although you wouldn't:
def f(g, xs):
    for x in xs:
        g(x)
print('ListY'); f(print, [1, 2, 3])
Often taking a "functional" approach can shorten code (or at least allow for cleaner reuse of code). Have a look at Python's partial function and others in the functools library.
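For what the functools suggestion might look like in practice, here is a minimal sketch. The names apply_each and print_lines are invented for this example.

```python
from functools import partial

def apply_each(func, items):
    """Call func on every item; the loop lives here, once."""
    for item in items:
        func(item)

# partial() pre-fills keyword arguments, giving a reusable callable.
print_lines = partial(print, sep='\n')

# Each call site is now a single simple statement.
print_lines('ListY:', 1, 2, 3)

collected = []
apply_each(collected.append, [1, 2, 3])
print(collected)
```

The loop still exists, of course. It has only been moved into a helper so that the calling code stays on one line.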

Syntax recognizer in python

I need a module or strategy for detecting that a piece of data is written in a programming language, not syntax highlighting where the user specifically chooses a syntax to highlight. My question has two levels, I would greatly appreciate any help, so:
Is there any package in Python that receives a string (a piece of data) and reports whether it is written in any programming language's syntax?
I don't necessarily need to recognize the syntax, but know if the string is source code or not at all.
Any clues are deeply appreciated.
Maybe you can use existing multi-language syntax highlighters. Many of them can detect language a file is written in.
You could have a look at methods based on Bayesian filtering.
My answer somewhat depends on the amount of code you're going to be given. If you're going to be given 30+ lines of code, it should be fairly easy to identify some features of each language that are both distinctive and fairly common. For example, tell the program that if anything matches an expression like from * import * then it's Python (I'm not 100% sure that phrasing is unique to Python, but you get the gist). Other things you could look at that usually differ slightly are definitions (e.g. a Python class definition always starts with 'class', while a C function starts with its return type, so you could check whether there is a line that starts with a data type and has the formatting of a function declaration), the formatting of conditionals, and so on.
If you wanted to make it more accurate, you could introduce some sort of weighting system: features that are more distinctive and less likely to be the result of a mismatched regexp get a higher weight, features that are commonly mismatched get a lower weight for the language, and at the end you calculate which language has the highest composite score. You could also define features that you feel are 100% unique and tell it to stop parsing as soon as it hits one of those, because it knows the answer (things like the shebang line).
This would, of course, involve you knowing enough about the languages you want to identify to find unique features to look for, or being able to find people that do know unique structures that would help.
If you're given fewer than 30 or so lines of code, your answers from parsing like that are going to be far less accurate. In that case the easiest way to do it would probably be to take an appliance similar to Travis and just run the code in each language (in a VM, of course). If the code runs successfully in a language, you have your answer. If not, you would need a list of errors that are "acceptable" (as in they are errors in the way the code was written, not in the interpreter). It's not a great solution, but at some point your code sample will just be too short to give an accurate answer.
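The weighted-feature idea from this answer could be sketched roughly as follows. Everything here is an assumption for illustration: the FEATURES table, the weights, the threshold and the function names are invented, and a real detector would need far more patterns and languages.

```python
import re

# Hypothetical feature table: (compiled pattern, weight) per language.
FEATURES = {
    'python': [
        (re.compile(r'^\s*from\s+[\w.]+\s+import\s+', re.M), 3),
        (re.compile(r'^\s*def\s+\w+\s*\(.*\)\s*:', re.M), 3),
        (re.compile(r'^\s*class\s+\w+.*:\s*$', re.M), 2),
    ],
    'c': [
        (re.compile(r'^\s*#include\s*[<"]', re.M), 3),
        (re.compile(r'^\s*(?:int|void|char|float|double)\s+\w+\s*\([^)]*\)', re.M), 2),
        (re.compile(r';\s*$', re.M), 1),
    ],
}

def score_languages(text):
    """Sum weight * number of matches for every language."""
    return {
        lang: sum(weight * len(pattern.findall(text))
                  for pattern, weight in features)
        for lang, features in FEATURES.items()
    }

def guess_language(text, threshold=3):
    """Return the best-scoring language, or None if nothing scores enough."""
    scores = score_languages(text)
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

print(guess_language('from os import path\n\ndef f(x):\n    return x\n'))
print(guess_language('#include <stdio.h>\nint main(void) {\n    return 0;\n}\n'))
print(guess_language('Hello, how are you today?'))
```

The threshold is what answers the "is this source code at all?" part of the question: text that matches no feature strongly enough is reported as None rather than forced into a language.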

When should I use underscores between words in Python function names (according to the style guide)?

The style guide says that underscores should be used, but many Python built-in functions do not. What should the criteria be for underscores? I would like to stay consistent with Python style guidelines but this area seems a little vague. Is there a good rule of thumb, is it based on my own judgment, or does it just not really matter either way?
For example, should I name my function isfoo() to match older functions, or should I name it is_foo() to match the style guideline?
The style guide leaves this up to you:
Function names should be lowercase, with words separated by underscores as necessary to improve readability.
In other words, if you feel like adding an underscore to your method name would make it easier to read -- by all means go ahead and throw one (or two!) in there. If you think that there are enough other similar cases in the standard library, then feel free to leave it out. There is no hard rule here (although others may disagree about that point). The only thing which I think is universally accepted is that you shouldn't use "CapWords" or "camelCase" for your methods. "CapWords" should be reserved for classes, and I'm not sure of any precedent for "camelCase" anywhere (though I could be wrong about that) ...
The style guide says that underscores should be used, but many Python built-in functions do not.
Using the built-in isinstance() as an example, this was written in 1997.
PEP 8 -- Style Guide for Python Code wasn't published until 2001, which may account for this discrepancy.
What should the criteria be for underscores? I would like to stay consistent with Python style guidelines but this area seems a little vague. Is there a good rule of thumb, is it based on my own judgment, or does it just not really matter either way?
"When in doubt, use your best judgment."
It's a style guide, not a law! Follow the guide where it is appropriate, bearing in mind the caveats (emphasis my own):
Function names should be lowercase, with words separated by underscores as necessary to improve readability.
mixedCase is allowed only in contexts where that's already the prevailing style (e.g. threading.py), to retain backwards compatibility.
Therefore...
For example, should I name my function isfoo() to match older functions, or should I name it is_foo() to match the style guideline?
You should probably call it isfoo(), to follow the convention of similar functions. It is readable. "Readability counts."

Regular expression implementation details

A question that I answered got me wondering:
How are regular expressions implemented in Python? What sort of efficiency guarantees are there? Is the implementation "standard", or is it subject to change?
I thought that regular expressions would be implemented as DFAs, and therefore were very efficient (requiring at most one scan of the input string). Laurence Gonsalves raised an interesting point that not all Python regular expressions are regular. (His example is r"(a+)b\1", which matches some number of a's, a b, and then the same number of a's as before). This clearly cannot be implemented with a DFA.
So, to reiterate: what are the implementation details and guarantees of Python regular expressions?
It would also be nice if someone could give some sort of explanation (in light of the implementation) as to why the regular expressions "cat|catdog" and "catdog|cat" lead to different search results in the string "catdog", as mentioned in the question that I referenced before.
Python's re module was based on PCRE, but has since moved on to its own implementation.
Here is the link to the C code.
It appears as though the library is based on recursive backtracking when an incorrect path has been taken.
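A quick sketch of what backtracking means for the examples in the question: alternatives are tried left to right and the first one that succeeds wins, and a backreference must repeat whatever the group captured.

```python
import re

# Alternation: the first alternative that matches at a position wins.
first = re.search(r'cat|catdog', 'catdog').group()
second = re.search(r'catdog|cat', 'catdog').group()
print(first)    # 'cat' succeeds, so 'catdog' is never tried
print(second)   # 'catdog' is tried first and succeeds

# Backreference: \1 must repeat exactly what (a+) captured.
backref = re.compile(r'(a+)b\1')
balanced = backref.fullmatch('aabaa') is not None
unbalanced = backref.fullmatch('aaba') is not None
print(balanced, unbalanced)
```

A DFA-based matcher would report the longest match for both alternations, which is why the order-dependent result is a hint that the engine is a backtracking one.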
[Figure: time to match a?^n a^n against a^n, plotted against regular expression and text size n]
Keep in mind that this graph is not representative of normal regex searches.
http://swtch.com/~rsc/regexp/regexp1.html
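If you want to reproduce that kind of measurement yourself, here is a minimal sketch using the pathological pattern from the linked article (a? repeated n times followed by a repeated n times, matched against n a's). The function name and the chosen sizes are arbitrary, and the timings will vary by machine and Python version, so none are quoted here.

```python
import re
import time

def pathological_time(n):
    """Time matching a?^n a^n against a^n; returns (matched, seconds)."""
    pattern = re.compile('a?' * n + 'a' * n)
    text = 'a' * n
    start = time.perf_counter()
    matched = pattern.match(text) is not None
    return matched, time.perf_counter() - start

# Sizes are kept small on purpose: the work grows very quickly with n
# on a backtracking engine.
for n in (4, 8, 12, 16):
    matched, elapsed = pathological_time(n)
    print(n, matched, '{:.6f}s'.format(elapsed))
```

The match always succeeds. What changes is how many failed paths the engine explores before it finds the one where every a? matches the empty string.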
There are no "efficiency guarantees" on Python REs any more than on any other part of the language (C++'s standard library is the only widespread language standard I know that tries to establish such standards -- but there are no standards, even in C++, specifying that, say, multiplying two ints must take constant time, or anything like that); nor is there any guarantee that big optimizations won't be applied at any time.
Today, F. Lundh (originally responsible for implementing Python's current RE module, etc), presenting Unladen Swallow at Pycon Italia, mentioned that one of the avenues they'll be exploring is to compile regular expressions directly to LLVM intermediate code (rather than their own bytecode flavor to be interpreted by an ad-hoc runtime) -- since ordinary Python code is also getting compiled to LLVM (in a soon-forthcoming release of Unladen Swallow), a RE and its surrounding Python code could then be optimized together, even in quite aggressive ways sometimes. I doubt anything like that will be anywhere close to "production-ready" very soon, though;-).
Matching regular expressions with backreferences is NP-hard, which is at least as hard as NP-Complete. That basically means that it's as hard as any problem you're likely to encounter, and most computer scientists think it could require exponential time in the worst case. If you could match such "regular" expressions (which really aren't, in the technical sense) in polynomial time, you could win a million bucks.
