How to wrap big integer definition to comply with pep8? - python

My code contains this line, which is just a (big) integer initialization:
myvalue = 0xcc9e4307e00db722fc71e019c7c74c3cd23e056d0c7cb683b9e3c1549eee3d309a6106f819417701108b9424247cc5e97a8c963a4c493573ab12d890f221d495
When I run pep8 on the script, I get an E501 "line too long" error.
What is the most convenient way to have my code pep8-compliant?

Integer literals can't be broken across multiple lines. Your options are, in order of preference:
Add a pragma (such as a trailing # noqa comment) telling the linter to ignore the line. PEP 8 is a guideline, not a requirement.
Calculate the number from smaller parts.
Convert from a string using int('..', 16), where you break the string over multiple lines.
You should also consider moving the number out of your Python code altogether, for example into a configuration file read at start-up.
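A minimal sketch of the second and third options, using the value from the question; the names `_high`, `_low` and `myvalue_from_str` are only illustrative. For the first option, a trailing `# noqa` comment on the original line is what pycodestyle (the successor of the pep8 tool) and flake8 recognize for E501.

```python
# Option 2: build the number from smaller parts. Each half holds 64 hex
# digits (256 bits), so shifting the high half left by 256 bits and
# OR-ing in the low half reproduces the original literal.
_high = 0xcc9e4307e00db722fc71e019c7c74c3cd23e056d0c7cb683b9e3c1549eee3d30
_low = 0x9a6106f819417701108b9424247cc5e97a8c963a4c493573ab12d890f221d495
myvalue = (_high << 256) | _low

# Option 3: break the hex digits into adjacent string literals, which the
# compiler joins into one string, then parse it with int(..., 16).
myvalue_from_str = int(
    "cc9e4307e00db722fc71e019c7c74c3cd23e056d0c7cb683b9e3c1549eee3d30"
    "9a6106f819417701108b9424247cc5e97a8c963a4c493573ab12d890f221d495",
    16,
)
```

Every line stays well under 79 characters, and both forms produce exactly the same int as the original one-line literal.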

Related

Python: Fast search for a valid substring of a text from a list of substrings

I need a fast and efficient method for finding, in a list of many pattern strings, one that is a valid substring of a given string.
Conditions -
I have a list of 100 pattern strings added in a particular (known) sequence.
The test case file is 35 GB in size and contains long strings on successive lines.
Ask -
I have to traverse the file and, for each line, search for a pattern string that is a valid substring of the line (whichever comes first in the list of 100 pattern strings).
Example -
pattern_strings = ["earth is round and huge","earth is round", "mars is small"]
Testcase file contents -
Among all the planets, the earth is round and mars is small.
..
..
Hence, for the first line, the string at index 1 ("earth is round") should satisfy the condition.
Currently, I am doing a linear search:
def search(line, list_of_patterns):
    for pat in list_of_patterns:
        if pat in line:
            return pat
    return -1
The current run time is 21 minutes, and I want to reduce it further. Any suggestions?
One trick I know of, which requires no changes to your existing code, is to run it with PyPy instead of the standard CPython interpreter. That can significantly speed up execution.
https://www.pypy.org/features.html
Having installed and used it myself, I can tell you that installation is fairly simple.
This is one option if you do not want to change your code.
Another suggestion would be to time your code or use profilers to see where the bottleneck is and what is taking a relatively long amount of time.
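A minimal sketch of timing and profiling the question's search() with the standard library's timeit and cProfile modules. It only uses the example line from the question; a realistic measurement would use a sample of lines from the real file.

```python
import cProfile
import pstats
import timeit

pattern_strings = ["earth is round and huge", "earth is round", "mars is small"]
line = "Among all the planets, the earth is round and mars is small."


def search(line, list_of_patterns):
    # The question's linear search, unchanged apart from indentation.
    for pat in list_of_patterns:
        if pat in line:
            return pat
    return -1


# Wall-clock time for many calls on one representative line.
seconds = timeit.timeit(lambda: search(line, pattern_strings), number=100_000)
print(f"{seconds:.3f} s for 100,000 calls")

# Profile a batch of calls and show the five most expensive entries.
profiler = cProfile.Profile()
profiler.enable()
for _ in range(10_000):
    search(line, pattern_strings)
profiler.disable()
pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```

Scaling the per-call time up to the number of lines in the file gives a rough idea of whether the search itself, or reading the 35 GB file, dominates the 21 minutes.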
Code-wise, you could avoid the explicit for loop and try these methods: https://betterprogramming.pub/how-to-replace-your-python-for-loops-with-map-filter-and-reduce-c1b5fa96f43a
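A sketch of the question's search rewritten without an explicit for statement, using next() over a generator expression. It is still a linear scan over the patterns, so any speed-up is something to measure rather than assume. The file name testcase.txt in the comment is hypothetical; io.StringIO stands in for the real file.

```python
import io

pattern_strings = ["earth is round and huge", "earth is round", "mars is small"]


def search(line, list_of_patterns):
    # next() takes the first pattern that occurs in the line, or returns
    # the default -1 when none does. Patterns are still tested in list
    # order, so "whichever comes first" is preserved.
    return next((pat for pat in list_of_patterns if pat in line), -1)


# Iterating over a file object reads one line at a time, so a 35 GB file
# is never loaded into memory at once. With a real file this would be
# `with open("testcase.txt") as testcase:`.
testcase = io.StringIO(
    "Among all the planets, the earth is round and mars is small.\n"
    "Nothing to see here.\n"
)
results = [search(line, pattern_strings) for line in testcase]
```

Here results is ["earth is round", -1]: the first line matches the pattern at index 1, and the second line matches nothing.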
A final option would be to write that piece of code in a faster, more performant language such as C++ and call the resulting executable (an .exe on Windows) from Python.

What is the proper level of indent for hanging indent with type hinting in python?

What is the proper syntax for a hanging indent for a method with multiple parameters and type hinting?
Align under first parameter
def get_library_book(self,
                     book_id: str,
                     library_id: str
                     ) -> Book:
Indent one level beneath
def get_library_book(
    self,
    book_id: str,
    library_id: str
) -> Book:
PEP8 supports the Indent one level beneath case, but does not specify if Align under first parameter is allowed. It states:
When using a hanging indent the following should be considered; there
should be no arguments on the first line and further indentation
should be used to clearly distinguish itself as a continuation line.
PEP8 has many good ideas in it, but I wouldn't rely on it to decide this kind of question about whitespace. When I studied PEP8's recommendations on whitespace, I found them to be inconsistent and even contradictory.
Instead, I would look at general principles that apply to nearly all programming languages, not just Python.
The column alignment shown in the first example has many disadvantages, and I don't use or allow it in any of my projects.
Some of the disadvantages:
If you change the function name so its length is different, you must realign all of the parameters.
When you do that realignment, your source control diffs are cluttered with unnecessary whitespace changes.
As the code is updated and maintained, it's likely that you'll miss some of the alignment when renaming variables, leading to misaligned code.
You get much longer line lengths.
The alignment doesn't work in a proportional font. (Yes, some developers prefer proportional fonts, and if you avoid column alignment, your code will be equally readable in monospaced or proportional fonts.)
It gets even worse if you use column alignment in more complex cases. Consider this example:
let mut rewrites = try_opt!(subexpr_list.iter()
                                        .rev()
                                        .map(|e| {
                                            rewrite_chain_expr(e,
                                                               total_span,
                                                               context,
                                                               max_width,
                                                               indent)
                                        })
                                        .collect::<Option<Vec<_>>>());
This is Rust code from the Servo browser, whose coding style mandates this kind of column alignment. While it isn't Python code, exactly the same principles apply in Python or nearly any language.
It should be apparent in this code sample how the use of column alignment leads to a bad situation. What if you needed to call another function, or had a longer variable name, inside that nested rewrite_chain_expr call? You're just about out of room unless you want very long lines.
Compare the above with either of these versions which use a purely indentation-based style like your second Python example:
let mut rewrites = try_opt!(
    subexpr_list
        .iter()
        .rev()
        .map( |e| {
            rewrite_chain_expr( e, total_span, context, max_width, indent )
        })
        .collect::<Option<Vec<_>>>()
);
Or, if the parameters to rewrite_chain_expr were longer or if you just wanted shorter lines:
let mut rewrites = try_opt!(
    subexpr_list
        .iter()
        .rev()
        .map( |e| {
            rewrite_chain_expr(
                e,
                total_span,
                context,
                max_width,
                indent
            )
        })
        .collect::<Option<Vec<_>>>()
);
In contrast to the column-aligned style, this pure indentation style has many advantages and no disadvantages at all.
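Carried back to Python, the indentation-only style from the Rust examples is the question's second form. A minimal runnable sketch; the Library class and the fields of Book are hypothetical stand-ins for the question's types.

```python
from dataclasses import dataclass


@dataclass
class Book:
    """Hypothetical return type standing in for the question's Book."""
    book_id: str
    library_id: str


class Library:
    # Hanging indent: nothing follows the opening parenthesis, parameters
    # sit one level deeper, and the closing parenthesis returns to the
    # level of "def". Renaming the method never forces a realignment.
    def get_library_book(
        self,
        book_id: str,
        library_id: str,
    ) -> Book:
        return Book(book_id, library_id)
```

The trailing comma after the last parameter is optional; it keeps the diff to one line when another parameter is added later.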
Apart from Terry's answer, take an example from typeshed, the project on Python's GitHub that annotates the standard library with stubs.
For example, in importlib.machinery (and in other places, if you look), annotations use your first form:
def find_module(cls, fullname: str,
                path: Optional[Sequence[importlib.abc._Path]]
                ) -> Optional[importlib.abc.Loader]:
Read the previous line of PEP 8 more carefully, the part before "or using a hanging indent".
Continuation lines should align wrapped elements either vertically using Python's implicit line joining inside parentheses, brackets and braces, or using a hanging indent.
This is intended to cover the first "yes" example, as well as your first example above.
# Aligned with opening delimiter.
foo = long_function_name(var_one, var_two,
                         var_three, var_four)

How to let pyflakes ignore some errors?

I am using SublimePythonIDE which is using pyflakes.
There are some errors that I would like it to ignore like:
(E501) line too long
(E101) indentation contains mixed spaces and tabs
What is the easiest way to do that?
Configuring a plugin in Sublime almost always uses the same procedure: Click on Preferences -> Package Settings -> Plugin Name -> Settings-Default to open the (surprise surprise) default settings. This file generally contains all the possible settings for the plugin, usually along with comments explaining what each one does. This file cannot be modified, so to customize any settings you open Preferences -> Package Settings -> Plugin Name -> Settings-User. I usually copy the entire contents of the default settings into the user file, then customize as desired, then save and close.
In the case of this particular plugin, while it does use pyflakes (as advertised), it also makes use of pep8, a style checker that makes use of the very same PEP-8 official Python style guide I mentioned in the comments. This knowledge is useful because pyflakes does not make use of specific error codes, while pep8 does.
So, upon examination of the plugin's settings file, we find a "pep8_ignore" option as well as a "pyflakes_ignore" one. Since the error codes are coming from pep8, we'll use that setting:
"pep8_ignore": [
    "E501",  // line too long
    "E303",  // too many blank lines (3)
    "E402"   // module level import not at top of file
]
Please note that codes E121, E123, E126, E133, E226, E241, E242, and E704 are ignored by default because they are not rules unanimously accepted, and PEP 8 does not enforce them.
Regarding long lines:
Sometimes, long lines are unavoidable. PEP 8's recommendation of 79-character lines dates back to when terminals were only 80 characters wide, but it persists for several reasons: it's backwards-compatible with old code, some equipment still has those limitations, it looks good, it makes it easier to have multiple files open side by side on wider displays, and it is readable (something you should always keep in mind when coding). If you prefer a 90- or 100-character limit, that's fine (if your team/project agrees), but use it consistently, and be aware that others may use different values. If you'd like to set pep8 to a larger value than its default of 79, just modify the "pep8_max_line_length" setting.
There are many ways to either decrease the character count of lines to stay within the limit, or split long lines into multiple shorter ones. In the case of your example in the comments:
flag, message = FacebookUserController.AddFBUserToDB(iOSUserId, fburl, fbsecret, code)
you can do a couple of things:
# shorten the module/class name
fbuc = FacebookUserController
# or, if FacebookUserController is a module
import FacebookUserController as fbuc
flag, message = fbuc.AddFBUserToDB(iOSUserId, fburl, fbsecret, code)
# or, again if it is a module, eliminate the prefix altogether
from FacebookUserController import AddFBUserToDB
flag, message = AddFBUserToDB(iOSUserId, fburl, fbsecret, code)
# split the function's arguments onto separate lines
flag, message = FacebookUserController.AddFBUserToDB(iOSUserId,
                                                     fburl,
                                                     fbsecret,
                                                     code)
# There are multiple ways of doing this, just make sure the subsequent
# line(s) are indented. You don't need to escape newlines inside of
# braces, brackets, and parentheses, but you do need to outside of them.
As others suggest, you should probably heed the warnings. But where you can't, you can add # NOQA to the end of the offending lines. Note the two spaces before the #, as that too is a style rule that will be complained about.
If pyflakes is wrapped in flake8, you can also ignore specific error codes.
For example, put (or add) the following in a tox.ini file in the project:
[flake8]
exclude = .tox,./build
filename = *.py
ignore = E501,E101
This is possibly a duplicate of How do I get Pyflakes to ignore a statement?

Regexp to match chords, issue with national accents

I am dealing with the following problem. I have a *.txt file containing tens of songs. Each song might consist of:
name
lines with chords
lines with lyrics
blank lines
I'm writing a Python script that reads the file line by line. I need to recognise the lines with chords. For that purpose I have decided to use regular expressions, since they look like a playful but powerful tool for such tasks. I am new to regexps; I've done this tutorial (which I am rather fond of). I have written something like this:
\b ?\(?([AC-Hac-h]{1})(#|##|b|bb)?(is|mi|maj|sus)?\d?[ \n(/\(\))?]
I am not very happy with it, since it does not do the job properly. One problem is that the language of the songs uses a lot of accented characters. The second one: chords might come in pairs, e.g. C(D), h/e. You can see my approach here.
Note
For better readability in final script I would split the regexp into more variables and those then add together.
Edit
After rereading my question, I thought my goal might not be clear enough. I would like to match different types of chords, for instance:
C, C#, Cis, c#, Cmaj, Cmi, Csus, C7, C#7, Db, Dbsus
Also, sometimes there might be (no more than two) chords next to each other, such as: C7/D7, Cmi(a). The best solution would catch those "pairs" together as one match, i.e. match C7/D7 rather than C7 and D7. I think this additional condition might make the matching a bit more robust, but if it would be unnecessarily difficult, I might go with the (I assume) easier version (matching C7 and D7 instead of C7/D7) and deal with the pairs separately later.
Your Python script reads the text file line by line and you want to find out with a regular expression if the current line is a line with chords or with other information.
Perhaps it is enough to apply the regular expression ^[\t #()/\dAC-Hac-jmsu]+$ on each line. If the regular expression does not return a match, the line contains characters not being allowed in a line with chords. Perhaps this simple regular expression using only a single character class definition is enough.
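A minimal sketch of applying that expression to each line; looks_like_chord_line is a hypothetical helper name. Note that the character class as written contains no b or B, so a flat written like Db (from the question's edit) is rejected; add b to the class if your songs use that spelling.

```python
import re

# The single-character-class rule from the answer, unchanged.
CHORD_CHARS_LINE = re.compile(r"^[\t #()/\dAC-Hac-jmsu]+$")


def looks_like_chord_line(line: str) -> bool:
    # Drop the line ending that iterating over a file leaves on each line.
    return CHORD_CHARS_LINE.match(line.rstrip("\r\n")) is not None
```

For example, looks_like_chord_line("C    G    Ami   F\n") is True, while looks_like_chord_line("Among all the planets") is False because "o", "n", "p" and "t" are not in the class.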
But it could be that a line with a name or lyrics also matches the expression above. For your example this is not the case, but it could happen. In that case I would suggest first calling strip() on every line to remove spaces and tabs from its beginning and end, and then applying the following regular expression:
^(?:[#()/\dAC-Hac-jmsu]{1,6}[\t ]+?)*[#()/\dAC-Hac-jmsu]{1,6}$
The difference is that now each run of characters not containing a space or tab must be 1 to 6 characters long; longer runs are not allowed. With this additional rule, there may be no more false positives when detecting lines with chords.
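The same kind of check with the stricter expression, stripping each line first as suggested; looks_like_chord_line_strict is a hypothetical helper name. The pattern is the answer's expression, split into two adjacent raw string literals only to keep the line short.

```python
import re

# Runs of 1 to 6 allowed characters, separated by spaces or tabs.
CHORD_TOKENS_LINE = re.compile(
    r"^(?:[#()/\dAC-Hac-jmsu]{1,6}[\t ]+?)*"
    r"[#()/\dAC-Hac-jmsu]{1,6}$"
)


def looks_like_chord_line_strict(line: str) -> bool:
    # strip() removes leading/trailing spaces, tabs and the line ending.
    return CHORD_TOKENS_LINE.match(line.strip()) is not None
```

For instance, looks_like_chord_line_strict("Cmaj7sus") is False because the run is 8 characters long, while looks_like_chord_line_strict("acid") is True, since every letter of that word is allowed in chords.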
The remaining problem for the chord-line detection rule is definitely the letters: a name or lyric text consisting only of letters allowed in chords could match too. A solution would be to create a list of the chord strings (built only from the allowed letters) and use them in an OR expression. That would most likely avoid false positives from a name or lyric string. With the complete list of chord strings, it is most likely also possible to define the rule more compactly, without listing every chord string in an OR expression.
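A sketch of that idea, assuming the chord spellings listed in the question's edit are representative; the CHORDS list is a hypothetical starting point (D7 and a come from the question's C7/D7 and Cmi(a)) to be extended with every spelling your songbook uses. It also treats the question's "pairs" as one token by allowing a second chord after / or inside parentheses.

```python
import re

# Hypothetical chord vocabulary taken from the question's examples.
CHORDS = ["C", "C#", "Cis", "c#", "Cmaj", "Cmi", "Csus", "C7", "C#7",
          "Db", "Dbsus", "D7", "a"]

# Alternation of escaped chord strings, longest first so the engine tries
# "Cmaj" before "C".
_chord = "|".join(re.escape(c) for c in sorted(CHORDS, key=len, reverse=True))
# A token is one chord, optionally paired: C7/D7 or Cmi(a).
_token = rf"(?:{_chord})(?:/(?:{_chord})|\((?:{_chord})\))?"
# A chord line is one or more tokens separated by spaces or tabs.
CHORD_LINE = re.compile(rf"{_token}(?:[\t ]+{_token})*")


def is_chord_line(line: str) -> bool:
    # fullmatch() requires the whole stripped line to consist of tokens.
    return CHORD_LINE.fullmatch(line.strip()) is not None
```

With this vocabulary, is_chord_line("C7/D7  Cmi(a)") is True, and a lyric such as "a cafe" is rejected because "cafe" is not in the list, even though all of its letters are allowed in chords.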

Python style for `chained` function calls

More and more we use chained function calls:
value = get_row_data(original_parameters).refine_data(leval=3).transfer_to_style_c()
It can get long. To avoid a long line in the code, which is preferred?
value = get_row_data(
    original_parameters).refine_data(
    level=3).transfer_to_style_c()
or:
value = get_row_data(original_parameters)\
    .refine_data(level=3)\
    .transfer_to_style_c()
I like using the backslash \ and putting each .function call on a new line: each call gets its own line, which is easy to read. But this doesn't seem to be preferred by many. And when the code has a subtle error that is hard to debug, I always start to worry that there might be a space or something after the backslash (\).
To quote from the Python style guide:
Long lines can be broken over multiple lines by wrapping expressions
in parentheses. These should be used in preference to using a
backslash for line continuation. Make sure to indent the continued
line appropriately. The preferred place to break around a binary
operator is after the operator, not before it.
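(Current versions of PEP 8 have reversed that last sentence and now suggest breaking before a binary operator.) I tend to prefer the following, which avoids the non-recommended \ at the end of a line thanks to an opening parenthesis: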
value = (get_row_data(original_parameters)
         .refine_data(level=3)
         .transfer_to_style_c())
One advantage of this syntax is that each method call is on its own line.
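A runnable sketch of that layout. RowData and the bodies of get_row_data, refine_data and transfer_to_style_c are hypothetical stand-ins, since the question does not show them. The sketch also shows a second advantage: a comment may follow any line inside the parentheses, whereas any character after a continuation backslash, even a single space, is a SyntaxError (the failure mode the question worries about).

```python
class RowData:
    """Hypothetical stand-in for whatever get_row_data() returns."""

    def __init__(self, rows):
        self.rows = rows

    def refine_data(self, level):
        # Keep only the rows at or below the requested level.
        return RowData([r for r in self.rows if r <= level])

    def transfer_to_style_c(self):
        return ",".join(str(r) for r in self.rows)


def get_row_data(original_parameters):
    return RowData(list(original_parameters))


original_parameters = [5, 1, 3, 2]
value = (get_row_data(original_parameters)
         .refine_data(level=3)  # a comment is fine here
         .transfer_to_style_c())

# A space after the backslash makes the source fail to compile.
try:
    compile("total = 1 \\ \n    + 2\n", "<demo>", "exec")
    backslash_space_is_error = False
except SyntaxError:
    backslash_space_is_error = True
```

Here value is "1,3,2", and backslash_space_is_error is True.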
A similar kind of \-less structure is also often useful with string literals, so that they don't go beyond the recommended 79-character line limit:
message = ("This is a very long"
           " one-line message put on many"
           " source lines.")
This is a single string literal, joined by the Python compiler (this is better than summing strings at run time, which creates intermediate strings in memory and copies them repeatedly until the final string is obtained).
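A quick check of that behavior. The colors list is a hypothetical example of the same rule causing a bug when a comma is forgotten between two string items.

```python
message = ("This is a very long"
           " one-line message put on many"
           " source lines.")

# Adjacent string literals are joined at compile time into one str.
print(message)

# The same rule applies anywhere literals sit next to each other, so a
# forgotten comma in a list silently merges two items.
colors = ["red" "green", "blue"]
print(colors)  # ['redgreen', 'blue']
```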
Python's code formatting is nice.
What about this option:
value = get_row_data(original_parameters,
    ).refine_data(level=3,
    ).transfer_to_style_c()
Note that commas are redundant if there are no other parameters but I keep them to maintain consistency.
Without quoting my own preference (although see the comments on your question :)) or listing alternatives, the answer to this is:
Stick to the style guidelines on any project you have already - if not stated, then keep as consistent as you can with the rest of the code base in style.
Otherwise, pick a style you like and stick with it, and let others know somehow that this is how you'd like chained function calls to be written when they are not reasonably readable on one line (or however you wish to describe it).
