I'm trying out pylint to check my source code for conventions. Somehow some variable names are matched with the regex for constants (const-rgx) instead of the variable name regex (variable-rgx). How to match the variable name with variable-rgx? Or should I extend const-rgx with my variable-rgx stuff?
e.g.
C0103: 31: Invalid name "settings" (should match (([A-Z_][A-Z1-9_]*)|(__.*__))$)
Somehow some variable names are matched with the regex for constants (const-rgx) instead of the variable name regex (variable-rgx).
Are those variables declared on module level? Maybe that's why they are treated as constants (at least that's how they should be declared, according to PEP-8).
I just disable that warning because I don't follow those naming conventions.
To do that, add this line to the top of your module (recent pylint releases spell the option disable rather than disable-msg):
# pylint: disable-msg=C0103
If you want to disable that globally, then add it to the pylint command:
python lint.py --disable-msg=C0103 ...
(should match (([A-Z_][A-Z1-9_]*)|(__.*__))$)
As you said, that is the const-rgx, which matches only UPPERCASE names or names surrounded by double underscores.
The variable-rgx is
([a-z_][a-z0-9_]{2,30}$)
If your variable is called 'settings', it should indeed match the variable-rgx.
I can think of only two reasons for this:
either settings is a constant, or it is a bug in pylint.
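To check which pattern accepts a given name, the two regexes quoted in this thread can be tried directly with Python's re module. This is only a sketch: the patterns are copied from the messages above, the use of re.match is an assumption about how pylint applies them, and the helper name classify is made up for illustration.

```python
import re

# Patterns copied from the messages quoted in this thread.
CONST_RGX = re.compile(r"(([A-Z_][A-Z1-9_]*)|(__.*__))$")
VARIABLE_RGX = re.compile(r"[a-z_][a-z0-9_]{2,30}$")


def classify(name):
    """Return the names of the patterns that accept `name` (hypothetical helper)."""
    accepted = []
    if CONST_RGX.match(name):
        accepted.append("const-rgx")
    if VARIABLE_RGX.match(name):
        accepted.append("variable-rgx")
    return accepted


for name in ("settings", "SETTINGS", "__version__"):
    print(name, classify(name))
```

A lowercase name such as settings is accepted only by the variable pattern, so the C0103 message depends on where the name is assigned (module level), not on how it is spelled.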
I'm using lark, an excellent python parsing library.
It provides an Earley and LALR(1) parser and is defined through a custom EBNF format. (EBNF stands for Extended Backus–Naur form).
Lowercase definitions are rules, uppercase definitions are terminals. Lark also provides a weight for uppercase definitions to prioritize the matching.
I'm trying to define a grammar but I'm stuck with a behavior I can't seem to balance.
I have some rules with unnamed literals (the strings or characters between double-quotes):
directives: directive+
directive: "#" NAME arguments ?
directive_definition: description? "directive" "#" NAME arguments? "on" directive_locations
directive_locations: "SCALAR" | "OBJECT" | "ENUM"
arguments: "(" argument+ ")"
argument: NAME ":" value
union_type_definition: description? "union" NAME directives? union_member_types?
union_member_types: "=" NAME ("|" NAME)*
description: STRING | LONG_STRING
STRING: /("(?!"").*?(?<!\\)(\\\\)*?"|'(?!'').*?(?<!\\)(\\\\)*?')/i
LONG_STRING: /(""".*?(?<!\\)(\\\\)*?"""|'''.*?(?<!\\)(\\\\)*?''')/is
NAME.2: /[_A-Za-z][_0-9A-Za-z]*/
It works well for 99% of use cases. But if, in my parsed language, I use a directive that is itself called directive, everything breaks:
union Foo #something(test: 42) = Bar | Baz # This works
union Foo #directive(test: 42) = Bar | Baz # This fails
Here, the directive string is matched on the unnamed literal in the directive_definition rule when it should match the NAME.2 terminal.
How can I balance / adjust this so there is no ambiguity possible for the LALR(1) parser?
Author of Lark here.
This misinterpretation happens because "directive" can be two different tokens: The "directive" string, or NAME. By default, Lark's LALR lexer always chooses the more specific one, namely the string.
So how can we let the lexer know that #directive is a name, and not just two constant strings?
Solution 1 - Use the Contextual Lexer
What would probably help in this situation (it's hard to be sure without the full grammar), is to use the contextual lexer, instead of the standard LALR(1) lexer.
The contextual lexer can communicate to some degree with the parser, to figure out which terminal makes more sense at each point. This is an algorithm that is unique to Lark, and you can use it like this:
parser = Lark(grammar, parser="lalr", lexer="contextual")
(This lexer can do anything the standard lexer can do and more, so in future versions it might become the default lexer.)
Solution 2 - Prefix the terminal
If the contextual lexer doesn't solve your collision, a more "classic" solution to this situation would be to define a directive token, something like:
DIRECTIVE: "#" NAME
Unlike your directive rule, this leaves no ambiguity to the lexer. There is a clear distinction between a directive, and the "directive" string (or NAME terminal).
And if all else fails, you can always use the Earley parser, which at the price of performance, will work with any grammar you give it, regardless of how many collisions there might be.
Hope this helps!
Edit: I'd just like to point out that the contextual lexer is the default for LALR now, so it's enough to call:
parser = Lark(grammar, parser="lalr")
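The collision and the "prefix the terminal" fix can be illustrated without Lark at all. The following is a toy, standard-library-only tokenizer, not Lark's implementation: the helper make_lexer and the terminal names KW_DIRECTIVE, HASH and WS are invented for this sketch, and the only thing taken from the thread is the NAME pattern and the idea that the literal wins over NAME when both match.

```python
import re

NAME = r"[_A-Za-z][_0-9A-Za-z]*"  # same pattern as the NAME terminal in the question


def make_lexer(spec):
    """Build a toy lexer: earlier entries in `spec` win when several could match."""
    pattern = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in spec))

    def lex(text):
        return [(m.lastgroup, m.group())
                for m in pattern.finditer(text)
                if m.lastgroup != "WS"]

    return lex


# The keyword literal is listed before NAME, so it is chosen whenever both match.
colliding = make_lexer([
    ("KW_DIRECTIVE", r"directive\b"),
    ("HASH", r"\#"),
    ("NAME", NAME),
    ("WS", r"\s+"),
])

# Solution 2: one terminal that covers "#" together with the name.
prefixed = make_lexer([
    ("DIRECTIVE", r"\#" + NAME),
    ("KW_DIRECTIVE", r"directive\b"),
    ("NAME", NAME),
    ("WS", r"\s+"),
])

print(colliding("#something"))
print(colliding("#directive"))
print(prefixed("#directive"))
```

With the first lexer, #directive comes out as a hash followed by the keyword, which is the misreading described in the question; with the prefixed terminal it is a single token, so the keyword never gets a chance to match.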
I'd like to escape ":" and/or "=" as the name in a configuration file.
Does anyone know how to achieve this?
I try backslash "\", it does not work.
If you're using Python 3, you don't need to. Look at the Python docs section on Customizing Parser Behavior. By default, configparser uses ":" and "=" as delimiters, but you can specify different delimiters when you create the configparser object:
import configparser
parser = configparser.ConfigParser(delimiters=('?', '*'))
In this example, the default delimiters have been replaced with a question mark and an asterisk. You can change the delimiters to whatever characters you want that won't conflict with the information you need to put in the config file.
The method above only works on Python 3, because the Python 2 ConfigParser is hard-coded to recognize equal signs and colons as delimiters. According to this SO question, there is a backported configparser available for the 2.7 interpreter at https://pypi.python.org/pypi/configparser. See if that will work for you.
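As a quick check of the Python 3 approach, here is a minimal sketch that reuses the two delimiters from the answer and reads option names containing ":" and "=". The section name and the keys are made up for illustration.

```python
import configparser

# Same delimiters as in the answer above; ":" and "=" are now ordinary characters.
parser = configparser.ConfigParser(delimiters=("?", "*"))
parser.read_string("""
[urls]
http://example.com/a=b ? first
key:with:colons * second
""")

section = parser["urls"]
print(section["http://example.com/a=b"])
print(section["key:with:colons"])
```

Note that option names are lowercased by default, so lookups should use the lowercase form unless optionxform is overridden.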
I'm setting up my credentials for the library: https://pypi.python.org/pypi/python-amazon-product-api/
Code for the relevant configparser on the project file here.
I'm wondering, what format should the config file variables be? Should strings be inserted inside quotes? Should there be spaces between the variable name and the equal sign?
How does this look?
[Credentials]
access_key=xxxxxxxxxxxxxxxxxxxxx
secret_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
associate_tag=xxxxxxxxxxxx
As taken from the documentation
The configuration file consists of sections, led by a [section] header and followed by name: value entries, with continuations in the style of RFC 822 (see section 3.1.1, “LONG HEADER FIELDS”); name=value is also accepted. Note that leading whitespace is removed from values. The optional values can contain format strings which refer to other values in the same section, or values in a special DEFAULT section. Additional defaults can be provided on initialization and retrieval. Lines beginning with '#' or ';' are ignored and may be used to provide comments.
Configuration files may include comments, prefixed by specific characters (# and ;). Comments may appear on their own in an otherwise empty line, or may be entered in lines holding values or section names. In the latter case, they need to be preceded by a whitespace character to be recognized as a comment. (For backwards compatibility, only ; starts an inline comment, while # does not.)
On top of the core functionality, SafeConfigParser supports interpolation. This means values can contain format strings which refer to other values in the same section, or values in a special DEFAULT section. Additional defaults can be provided on initialization.
For example:
[My Section]
foodir: %(dir)s/whatever
dir=frob
long: this value continues
   in the next line
You are pretty free to write whatever you want in the settings file.
In your particular case you just need to copy & paste your keys and tag and ConfigParser should do the rest.
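The formatting questions (quotes, spaces around the equal sign) are easy to test directly. The sketch below uses the Python 3 configparser module on an in-memory string; the key values are placeholders, and the assumption is that the library's own parser behaves the same way on a real file.

```python
import configparser

SAMPLE = """
[Credentials]
access_key=AKIAEXAMPLE
secret_key = not-a-real-secret
associate_tag: "my-tag"

[My Section]
foodir: %(dir)s/whatever
dir=frob
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)
creds = parser["Credentials"]

print(repr(creds["access_key"]))      # no spaces around "="
print(repr(creds["secret_key"]))      # spaces around "=" are stripped
print(repr(creds["associate_tag"]))   # quotes are kept as part of the value
print(repr(parser["My Section"]["foodir"]))  # %(dir)s is interpolated
```

So spaces around the delimiter are optional, and values should be written without quotes unless the quotes are meant to be part of the string.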
Basically when I have a python file like:
python-code.py
and use:
import (python-code)
the interpreter gives me syntax error.
Any ideas on how to fix it? Are dashes illegal in python file names?
You should check out PEP 8, the Style Guide for Python Code:
Package and Module Names Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability. Python packages should also have short, all-lowercase names, although the use of underscores is discouraged.
Since module names are mapped to file names, and some file systems are case insensitive and truncate long names, it is important that module names be chosen to be fairly short -- this won't be a problem on Unix, but it may be a problem when the code is transported to older Mac or Windows versions, or DOS.
In other words: rename your file :)
One other thing to note in your code is that import is not a function. So import(python-code) should be import python-code which, as some have already mentioned, is interpreted as "import python minus code", not what you intended. If you really need to import a file with a dash in its name, you can do the following:
python_code = __import__('python-code')
But, as also mentioned above, this is not really recommended. You should change the filename if it's something you control.
TLDR
Dashes are not illegal but you should not use them for 3 reasons:
You need special syntax to import files with dashes
Nobody expects a module name with a dash
It's against the recommendations of the Python Style Guide
If you definitely need to import a file name with a dash the special syntax is this:
module_name = __import__('module-name')
Curious about why we need special syntax?
The reason for the special syntax is that when you write import somename you're creating a module object with identifier somename (so you can later use it with e.g. somename.funcname). Of course module-name is not a valid identifier and hence the special syntax that gives a valid one.
You don't get why module-name is not valid identifier?
Don't worry -- I didn't either. Here's a tip to help you: Look at this python line: x=var1-var2. Do you see a subtraction on the right side of the assignment or a variable name with a dash?
PS
Nothing original in my answer except including what I considered to be the most relevant bits of information from all other answers in one place
The problem is that python-code is not an identifier. The parser sees this as python minus code. Of course this won't do what you're asking. You will need to use a filename that is also a valid python identifier. Try replacing the - with an underscore.
On Python 3 use import_module:
from importlib import import_module
python_code = import_module('python-code')
More generally,
import_module('package.subpackage.module')
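For a self-contained check of the import_module approach, the sketch below writes a throwaway python-code.py into a temporary directory and imports it by its dashed name. The file contents and the GREETING attribute are invented for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # A module file whose name is not a valid identifier.
    Path(tmp, "python-code.py").write_text("GREETING = 'hello'\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # The dashed name is passed as a string; we choose the identifier ourselves.
        python_code = importlib.import_module("python-code")
    finally:
        sys.path.remove(tmp)

print(python_code.GREETING)
```

The module is registered in sys.modules under the dashed name, while the local variable python_code is what the rest of the code uses.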
You could probably import it through some __import__ hack, but if you don't already know how, you shouldn't. Python module names should be valid variable names ("identifiers") -- that means if you have a module foo_bar, you can use it from within Python (print foo_bar). You wouldn't be able to do so with a weird name (print foo-bar -> syntax error).
Although proper file naming is the best course, if python-code is not under our control, a hack using __import__ is better than copying, renaming, or otherwise messing around with other authors' code. However, I tried and it didn't work unless I renamed the file adding the .py extension. After looking at the doc to derive how to get a description for .py, I ended up with this:
import imp  # deprecated since Python 3.4 and removed in 3.12
python_code_file = open("python-code")
try:
    python_code = imp.load_module('python_code', python_code_file, './python-code', ('.py', 'U', 1))
finally:
    python_code_file.close()
It created a new file python-codec on the first run.
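On current Python 3 versions, where imp is no longer available, the same effect (loading a source file that has no .py extension) can be sketched with importlib. The helper name load_source and the sample file contents are made up for this example; the importlib calls themselves are standard library.

```python
import tempfile
from importlib.machinery import SourceFileLoader
from importlib.util import module_from_spec, spec_from_loader
from pathlib import Path


def load_source(module_name, path):
    """Load a Python source file from `path`, whatever its extension (hypothetical helper)."""
    loader = SourceFileLoader(module_name, str(path))
    spec = spec_from_loader(module_name, loader)
    module = module_from_spec(spec)
    loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp, "python-code")  # note: no .py suffix
    script.write_text("ANSWER = 42\n")
    python_code = load_source("python_code", script)

print(python_code.ANSWER)
```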
I'm pretty new to Python, and I want to develop my first serious open source project. I want to ask what is the common coding style for python projects. I'll put also what I'm doing right now.
1.- What is the most widely used column width? (the eternal question)
I'm currently sticking to 80 columns (and it's a pain!)
2.- What quotes to use? (I've seen everything and PEP 8 does not mention anything clear)
I'm using single quotes for everything but docstrings, which use triple double quotes.
3.- Where do I put my imports?
I'm putting them at file header in this order.
import sys
import -rest of python modules needed-
import whatever
import -rest of application modules-
<code here>
4.- Can I use "import whatever.function as blah"?
I saw some documents that disregard doing this.
5.- Tabs or spaces for indenting?
Currently using 4 spaces tabs.
6.- Variable naming style?
I'm using lowercase for everything but classes, which I put in camelCase.
Anything you would recommend?
PEP 8 is pretty much "the root" of all common style guides.
Google's Python style guide has some parts that are quite well thought out, but others are idiosyncratic (the two-space indents instead of the popular four-space ones, and the CamelCase style for functions and methods instead of the lowercase_with_underscores style, are pretty major idiosyncrasies).
On to your specific questions:
1.- What is the most widely used column width? (the eternal question) I'm currently sticking to 80 columns (and it's a pain!)
80 columns is most popular
2.- What quotes to use? (I've seen everything and PEP 8 does not mention anything clear) I'm using single quotes for everything but docstrings, which use triple double quotes.
I prefer the style you're using, but even Google was not able to reach a consensus about this:-(
3.- Where do I put my imports? I'm putting them at file header in this order.
import sys
import -rest of python modules needed-
import whatever
import -rest of application modules-
Yes, excellent choice, and popular too.
4.- Can I use "import whatever.function as blah"? I saw some documents that disregard doing this.
I strongly recommend you always import modules -- not specific names from inside a module. This is not just style -- there are strong advantages e.g. in testability in doing that. The as clause is fine, to shorten a module's name or avoid clashes.
5.- Tabs or spaces for indenting? Currently using 4 spaces tabs.
Overwhelmingly most popular.
6.- Variable naming style? I'm using lowercase for everything but classes, which I put in camelCase.
Almost everybody names classes with uppercase initial and constants with all-uppercase.
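The answer to question 4 does not spell out the testability advantage of importing modules rather than names. One possible reading, offered here as an assumption and not as the answerer's own explanation, is that a name looked up through the module at call time sees a patch applied in a test, while a name copied with from ... import does not. The two function names below are invented for the sketch.

```python
import os
from os import getcwd
from unittest import mock


def cwd_via_module():
    # Looks up getcwd on the os module each time it is called.
    return os.getcwd()


def cwd_via_name():
    # Uses the function object that was bound at import time.
    return getcwd()


with mock.patch("os.getcwd", return_value="/patched"):
    via_module = cwd_via_module()
    via_name = cwd_via_name()

print(via_module)
print(via_name)
```

Only the module-style call picks up the replacement, which is why patching code that uses from-imports needs extra care about where the name is patched.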
1.- Almost everyone has a 16:9 or 16:10 monitor nowadays. Even without a wide screen they have lots of pixels, so 80 columns isn't the practical deal-breaker it was when everyone was hacking at the command line in a remote terminal window on a 4:3 monitor at 320 x 240. I usually end the line when it gets too long, which is subjective. I am at 2048 x 1152 on two 23" monitors.
2.- Single quotes by default so you don't have to escape Double quotes, Double quotes when you need to embed single quotes, and Triple quotes for strings with embedded newlines.
3.- Put them at the top of the file; sometimes you put them in the main function if they aren't needed globally in the module.
4.- It is a common idiom to rename some modules. A good example is the following.
import sys

try:
    # for Python 2.6.x
    import json
except ImportError:
    # for previous Pythons
    try:
        import simplejson as json
    except ImportError:
        sys.exit('easy_install simplejson')
But the preferred way to import just a class or function is from module import xxx, with the optional as yyy if needed.
5.- Always use SPACES! 2 or 4 as long as no TABS
6.- Classes should be UpperCaseCamelStyle; variables are lowercase, sometimes lowerCamelCase or sometimes all_lowercase_separated_by_underscores, as are function names. "Constants" should be ALL_UPPER_CASE_SEPARATED_BY_UNDERSCORES.
When in doubt, refer to PEP 8, the Python source, or existing conventions in the code base. But the most important thing is to be as internally consistent as possible. All Python code should look like it was written by the same person whenever possible.
Since I'm really crazy about "styling" I'll write down the guidelines that I currently use in a near 8k SLOC project with about 35 files, most of it matches PEP8.
1.- PEP8 says 79 (WTF?), I go with 80 and I'm used to it now. Less eye movement after all!
2.- Docstrings and stuff that spans multiple lines in '''. Everything else in ''. Also, I don't like double quotes; I use single quotes all the time... I guess that's because I came from the JavaScript corner, where it's just easier to use '', because that way you don't have to escape all the HTML stuff :O
3.- At the head, built-in before custom application code. But I also go with a "fail early" approach, so if there's something that's version dependent (GTK for example) I'd import that first.
4.- Depends, most of the time I go with import foo and from foo import, but there are certain cases (e.g. the name is already defined by another import) where I use from foo import bar as bla too.
5.- 4 spaces. Period. If you really want to use tabs, make sure to convert them to spaces before committing when working with SCM. BUT NEVER(!) MIX TABS AND SPACES!!! It can AND WILL introduce horrible bugs.
6.- some_method or foo_function, a CONSTANT, MyClass.
Also you can argue about indentation in cases where a method call or something spans multiple lines, and you can argue about which line continuation style you will use. Either surround everything with () or do the \ at the end of the line thingy. I do the latter, and I also place operators and other stuff at the start of the next line.
# always insert a newline after a wrapped one
from bla import foo, test, goo, \
                another_thing

def some_method_thats_too_long_for_80_columns(foo_argument, bar_argument, bla_argument,
                                              baz_argument):

    do_something(test, bla, baz)

    value = 123 * foo + ten \
            - bla

    if test > 20 \
       and x < 4:

        test_something()

    elif foo > 7 \
         and bla == 2 \
         or me == blaaaaaa:

        test_the_megamoth()
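For comparison, here is the other continuation style mentioned above, where the wrapped expression is surrounded with parentheses instead of ending lines with a backslash. The function names and values are made up; the point is only the layout, with operators still placed at the start of the continuation line.

```python
from os.path import (basename, dirname,
                     join)


def total(price, quantity, discount):
    # The parentheses let the expression continue without a trailing backslash.
    value = (price * quantity
             - discount)
    return value


def classify(test, x):
    if (test > 20
            and x < 4):
        return "small"
    return "other"


print(total(10, 3, 5))
print(classify(21, 3))
print(join(dirname("/tmp/a/b.txt"), basename("/tmp/c.txt")))
```

A practical difference is that a stray space after a backslash breaks the continuation, while the parenthesized form has no such trap.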
Also I have some guidelines for comparison operations: I always use is (not) to check against None, True and False, and I never do an implicit boolean comparison like if foo:; I always write if foo is True:. Dynamic typing is nice, but in some cases I just want to be sure that the thing does the right thing!
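The difference between the implicit check and the explicit identity check can be seen on a handful of values. This is a small sketch with invented helper names; it only shows what each test accepts, not which style is better.

```python
def is_truthy(value):
    # What `if value:` tests.
    return bool(value)


def is_exactly_true(value):
    # What `if value is True:` tests.
    return value is True


samples = [True, 1, "yes", [0], False, 0, "", None]
rows = [(repr(v), is_truthy(v), is_exactly_true(v)) for v in samples]

for text, truthy, exact in rows:
    print(f"{text:>6}  truthy={truthy!s:<5}  is True={exact}")
```

Values such as 1 or a non-empty string pass the implicit test but not the identity test, which is the strictness the answer is after.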
Another thing that I do is to never use empty strings! They are in a constants file, in the rest of the code I have stuff like username == UNSET_USERNAME or label = UNSET_LABEL it's just more descriptive that way!
I also have some strict whitespace guidelines and other crazy stuff, but I like it(because I'm crazy about it), I even wrote a script which checks my code:
http://github.com/BonsaiDen/Atarashii/blob/master/checkstyle
WARNING(!): It will hurt your feelings! Even more than JSLint does...
But that's just my 2 cents.