Where does the argparse and ConfigParser string replacement syntax come from?

When specifying help in argparse, I often use strings like %(default)s or %(const)s in the help= argument to display default arguments. The syntax is a bit weird, though: I assume it's left over from the days when Python strings were formatted with %, but since Python 2.6 the standard way to format strings has been the format() function.
So are these libraries just using the 'old' replacement syntax, or does this come from somewhere else? It's been stated that the % replacement operator will disappear at some point. Will these libraries change to the '{}'.format() syntax then?

Yes, the argparse and ConfigParser libraries use the old-style % string formatting syntax internally. These libraries were developed before str.format() and format() were available, or in the case of argparse the library authors aimed at compatibility with earlier Python versions.
If the % formatting ever is removed, then those libraries will indeed have to move to using string formatting using {} placeholders.
However, for various reasons, the % old-style string formatting style is here to stay for the foreseeable future; it has been 'un-deprecated'; str.format() is to be preferred but % is kept around for backwards compatibility.

The approved way of customizing the help formatting is to subclass HelpFormatter. A user can do this without waiting for a future Python release.
This formatter implements the {}.format in 2 places.
import argparse

class NewHelpFormatter(argparse.HelpFormatter):
    # _format_usage - format usage, but only uses dict(prog=self._prog)
    def _format_text(self, text):
        # for description, epilog, version
        if '{prog}' in text:
            text = text.format(prog=self._prog)  # change from %
        text_width = self._width - self._current_indent
        indent = ' ' * self._current_indent
        return self._fill_text(text, text_width, indent) + '\n\n'

    def _expand_help(self, action):
        params = dict(vars(action), prog=self._prog)
        for name in list(params):
            if params[name] is argparse.SUPPRESS:
                del params[name]
        for name in list(params):
            if hasattr(params[name], '__name__'):
                params[name] = params[name].__name__
        if params.get('choices') is not None:
            choices_str = ', '.join([str(c) for c in params['choices']])
            params['choices'] = choices_str
        return self._get_help_string(action).format(**params)  # change from %
For example:
parser = argparse.ArgumentParser(prog='NewFormatter',
    formatter_class=NewHelpFormatter,
    description='{prog} description')
parser.add_argument('foo', nargs=3, default=[1, 2, 3],
    help='nargs:{nargs} prog:{prog!r} defaults:{default} last:{default[2]}')
parser.add_argument('--bar', choices=['yes', 'no'],
    help='choices: {choices!r}')
parser.print_help()
produces:
usage: NewFormatter [-h] [--bar {yes,no}] foo foo foo

NewFormatter description

positional arguments:
  foo             nargs:3 prog:'NewFormatter' defaults:[1, 2, 3] last:3

optional arguments:
  -h, --help      show this help message and exit
  --bar {yes,no}  choices: 'yes, no'

Related

PyLint message: logging-format-interpolation

For the following code:
logger.debug('message: {}'.format('test'))
pylint produces the following warning:
logging-format-interpolation (W1202):
Use % formatting in logging functions and pass the % parameters as arguments. Used when a logging statement has a call form of "logging.<logging method>(format_string.format(format_args...))". Such calls should use % formatting instead, but leave interpolation to the logging function by passing the parameters as arguments.
I know I can turn off this warning, but I'd like to understand it. I assumed using format() is the preferred way to print out statements in Python 3. Why is this not true for logger statements?

This is not true for logging statements because logging relies on the older %-style format string to interpolate the message lazily, using the extra arguments passed to the logger call. For instance, instead of doing:
logger.error('oops caused by %s' % exc)
you should do
logger.error('oops caused by %s', exc)
so that the string is only interpolated if the message is actually emitted.
You can't benefit from this functionality when using .format().
Per the Optimization section of the logging docs:
Formatting of message arguments is deferred until it cannot be avoided. However, computing the arguments passed to the logging method can also be expensive, and you may want to avoid doing it if the logger will just throw away your event.
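The point in the quoted passage can be sketched as follows. This is only an illustration, not code from the logging docs: the logger name lazy_demo and the helper expensive_summary are hypothetical. Passing arguments separately defers the string formatting, but the arguments themselves are still evaluated; Logger.isEnabledFor() is one way to skip that work as well.

```python
import logging

# Hypothetical logger for the demo; DEBUG messages are disabled.
logger = logging.getLogger("lazy_demo")
logger.setLevel(logging.WARNING)
logger.addHandler(logging.NullHandler())

calls = []

def expensive_summary():
    """Hypothetical stand-in for a costly computation."""
    calls.append(1)
    return "summary"

# The message is never formatted, but the argument is still computed.
logger.debug("state: %s", expensive_summary())

# Guarding the call avoids computing the argument at all.
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("state: %s", expensive_summary())

print(len(calls))
```

Only the unguarded call runs expensive_summary(), so the counter ends at one.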

Maybe these time differences can help you.
The following description does not answer your question, but it may help others.
If you want to use f-strings (Literal String Interpolation) for logging, you can disable the warning in the .pylintrc file with disable=logging-fstring-interpolation; see the related issue and comment.
You can also disable logging-format-interpolation.
For pylint 2.4:
There are 3 options for the logging style in the .pylintrc file: old, new, fstr.
The fstr option was added in 2.4 and removed in 2.5.
Description from .pylintrc file (v2.4):
[LOGGING]
# Format style used to check logging format string. `old` means using %
# formatting, `new` is for `{}` formatting,and `fstr` is for f-strings.
logging-format-style=old
for old (logging-format-style=old):
foo = "bar"
self.logger.info("foo: %s", foo)
for new (logging-format-style=new):
foo = "bar"
self.logger.info("foo: {}", foo)
# OR
self.logger.info("foo: {foo}", foo=foo)
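Note: you cannot use .format() even if you select the new option. Also keep in mind that the standard library logger still merges arguments with %, so the new style is only meant for loggers or adapters that perform {} substitution themselves.
pylint still gives the same warning for this code: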
self.logger.info("foo: {}".format(foo)) # W1202
# OR
self.logger.info("foo: {foo}".format(foo=foo)) # W1202
for fstr (logging-format-style=fstr):
foo = "bar"
self.logger.info(f"foo: {foo}")
Personally, I prefer the fstr option because of PEP 498.
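With the standard library logger, one way to get deferred {} formatting is a small wrapper object whose __str__ performs the formatting, in the spirit of the logging cookbook. A minimal sketch follows; BraceMessage, ListHandler and the logger name brace_demo are hypothetical names chosen for illustration.

```python
import logging

class BraceMessage:
    """Defers str.format() until the record is turned into text."""
    def __init__(self, fmt, *args, **kwargs):
        self.fmt = fmt
        self.args = args
        self.kwargs = kwargs

    def __str__(self):
        return self.fmt.format(*self.args, **self.kwargs)

class ListHandler(logging.Handler):
    """Collects formatted messages so the result can be inspected."""
    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())

logger = logging.getLogger("brace_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

foo = "bar"
logger.info(BraceMessage("foo: {}", foo))          # emitted and formatted
logger.debug(BraceMessage("foo: {foo}", foo=foo))  # below INFO: never formatted
print(handler.messages)
```

Because the wrapper is passed as the message itself, nothing is formatted for the suppressed debug call.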

In my experience a more compelling reason than optimization (for most use cases) for the lazy interpolation is that it plays nicely with log aggregators like Sentry.
Consider a 'user logged in' log message. If you interpolate the user into the format string, you have as many distinct log messages as there are users. If you use lazy interpolation instead, the log aggregator can more reasonably interpret this as the same log message with a bunch of different instances.
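The effect on grouping can be sketched with a handler that records the raw message template of each record, which is roughly the information an aggregator has to work with. This is an illustration only; TemplateCollector, make_logger and the logger names are hypothetical, and real aggregators may group differently.

```python
import logging

class TemplateCollector(logging.Handler):
    """Remembers the unformatted record.msg of every record it sees."""
    def __init__(self):
        super().__init__()
        self.templates = set()

    def emit(self, record):
        self.templates.add(record.msg)

def make_logger(name):
    collector = TemplateCollector()
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(collector)
    return logger, collector

lazy_logger, lazy = make_logger("aggregation_demo.lazy")
eager_logger, eager = make_logger("aggregation_demo.eager")

for user in ("alice", "bob", "carol"):
    lazy_logger.info("user %s logged in", user)   # one shared template
    eager_logger.info(f"user {user} logged in")   # a new string per user

print(len(lazy.templates), len(eager.templates))
```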

Here is an example of why it's better to use %s instead of f-strings in logging.
>>> import logging
>>> logging.basicConfig(level=logging.INFO)
>>> logger = logging.getLogger('MyLogger')
>>>
>>> class MyClass:
...     def __init__(self, name: str) -> None:
...         self._name = name
...     def __str__(self) -> str:
...         print('GENERATING STRING')
...         return self._name
...
>>> c = MyClass('foo')
>>> logger.debug('Created: %s', c)
>>> logger.debug(f'Created: {c}')
GENERATING STRING
Inspired by Python 3.7 logging: f-strings vs %.

This may be several years late, but I had to deal with this the other day and kept it simple: I just formatted the string before passing it to the logger.
message = 'message: {}'.format('test')
logger.debug(message)
That way there was no need to change any of the logging settings, and if I later want to switch to a normal print, there is no need to change the formatting or the code.

"logging-format-interpolation (W1202)" is another one wrong recommendation from pylint (like many from pep8).
F-strings are described as slow compared with %, but have you checked?
With 500_000 iterations of logging with f-strings vs %: f-string: 23.01 sec., %: 25.43 sec.
So in this test, logging with f-strings was faster than with %.
When you look at the logging source code: log.error() -> self.logger._log() -> self.makeRecord() -> self._logRecordFactory() -> class LogRecord() -> a home-made equivalent to format()
Code:
import logging
import random
import time

loops = 500_000
r_fstr = 0.0
r_format = 0.0

# Note: logging.error() is emitted at the default WARNING level, so both
# loops time messages that are actually formatted and written out.
def test_fstr():
    global loops, r_fstr
    for i in range(0, loops):
        r1 = time.time()
        logging.error(f'test {random.randint(0, 1000)}')
        r2 = time.time()
        r_fstr += r2 - r1

def test_format():
    global loops, r_format
    for i in range(0, loops):
        r1 = time.time()
        logging.error('test %d', random.randint(0, 1000))
        r2 = time.time()
        r_format += r2 - r1

test_fstr()
test_format()
print(f'Logging f-string:{round(r_fstr,2)} sec. , %:{round(r_format,2)} sec.')

Argparse with spaces

I have the following script:
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--arg1', dest='arg1')
parse_results = parser.parse_known_args(['--arg1 = value'])
However I get the result:
(Namespace(arg1=None), ['--arg1 = value'])
Is there a way using argparse to accept this kind of input where there are spaces between argument, = and the value?

The shell processes your argument list before your program ever runs, and that processing is very simple: treat each white-space separated word following the command name as a separate argument. By that logic, --foo = bar is 3 separate arguments, so sys.argv contains ["--foo", "=", "bar"]. There's no avoiding that, since it happens before argparse ever runs.
Given this fact, there's no benefit to writing --foo = bar, since it would be equivalent to --foo=bar or --foo bar, just with extraneous characters (spaces in the first case, = in the second).
As to why --foo=bar is supported at all when it is equivalent to --foo bar, I don't know. It might just be for compatibility with existing practice; it might be useful with different argument parsers, but argparse doesn't seem to need it. At some point in the past, I thought I had come up with a reason why being able to specify the argument to --foo using one shell word instead of two was useful, but I seem to have forgotten it.
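One situation where the single-word form does behave differently in argparse is a value that begins with a dash. The sketch below is only an illustration of that case; the option name --foo and the value -x are arbitrary choices.

```python
import argparse

parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--foo")

# One word: everything after "=" is taken as the value, dash included.
joined = parser.parse_args(["--foo=-x"])
print(joined.foo)

# Two words: "-x" looks like another option, so argparse reports that
# --foo expects an argument and exits.
try:
    parser.parse_args(["--foo", "-x"])
    two_word_form_failed = False
except SystemExit:
    two_word_form_failed = True
print(two_word_form_failed)
```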

It's about how the shell parses arguments:
bruno@bigb:~/Work/playground$ cat argz.py
import sys
print "args:", sys.argv
bruno@bigb:~/Work/playground$ python argz.py --foo = bar --bar=baaz
args: ['argz.py', '--foo', '=', 'bar', '--bar=baaz']
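The same splitting can be reproduced without a shell by using shlex.split from the standard library, which follows POSIX-style word splitting and quoting rules. This is a small sketch for experimenting, not a full model of any particular shell.

```python
import shlex

# Unquoted: whitespace separates the words, so "=" becomes its own argument.
unquoted = shlex.split("--foo = bar --bar=baaz")
print(unquoted)

# Quoted: the whole text arrives as a single argument.
quoted = shlex.split("'--arg1 = value'")
print(quoted)
```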

To start, a problem with your code: A program will see space-separated components as separate arguments. Your example should have read
parse_results = parser.parse_known_args(['--arg1', '=', 'value'])
or you can use split() to generate the array on the fly, as the documentation does it:
parse_results = parser.parse_known_args('--arg1 = value'.split())
The above still doesn't give you what you want:
>>> parser.parse_known_args('--arg1 = value'.split())
(Namespace(arg1='='), ['value'])
The equals-sign is parsed as the argument, because long argument syntax is either --option=value or --option value. The = sign is only there to separate the option name from its argument; there is no point in including it if arguments are separated by spaces, so the syntax --option = value is not supported.
Recommendation: Use only standard long-argument syntax, i.e. --arg1 value or --arg1=value.
But if you absolutely must support the form --arg1 = value, you can filter the argument list to throw out elements that are a single = sign:
>>> cleanup = lambda args: (a for a in args if a != '=')
>>> parser.parse_known_args(cleanup('--arg1 = value'.split()))
(Namespace(arg1='value'), [])
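The filtering idea above can be written as a self-contained script. This is a sketch of the same approach; the helper name drop_bare_equals is hypothetical, and note that it would also discard a lone = that was meant as a real value.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--arg1', dest='arg1')

def drop_bare_equals(args):
    """Return the arguments without any element that is exactly '='."""
    return [a for a in args if a != '=']

namespace, extras = parser.parse_known_args(
    drop_bare_equals('--arg1 = value'.split()))
print(namespace, extras)
```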

When you use
parser.parse_known_args(['--arg1 = value'])
it's equivalent to the commandline:
$ myfun.py '--arg1 = value'
In both cases --arg1 = value is treated as one string. Since it does not match '--arg1' or '--arg1=', it goes into the list of unrecognized strings.
As others have pointed out, without the quotes $ myfun.py --arg1 = value is passed to the parser as ['--arg1', '=', 'value'], which parses to:
(Namespace(arg1='='), ['value'])
If --arg1 is defined with nargs='+', then you'd get
(Namespace(arg1=['=', 'value']), [])
and you could ignore the '='.
But for sake of consistency with other commandline practices, I'd suggest sticking with the --arg1=value, and --arg1 value syntax.
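For completeness, the nargs='+' variant mentioned above might look like this. It is a sketch under the assumption that a bare = is never a legitimate value for the option.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--arg1', nargs='+')

namespace, extras = parser.parse_known_args(['--arg1', '=', 'value'])
print(namespace.arg1)

# Ignore the '=' after parsing.
values = [v for v in namespace.arg1 if v != '=']
print(values)
```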

How to print like printf in Python3?

In Python 2 I used:
print "a=%d,b=%d" % (f(x,n),g(x,n))
I've tried:
print("a=%d,b=%d") % (f(x,n),g(x,n))

In Python2, print was a keyword which introduced a statement:
print "Hi"
In Python3, print is a function which may be invoked:
print ("Hi")
In both versions, % is an operator which requires a string on the left-hand side and a value or a tuple of values or a mapping object (like dict) on the right-hand side.
So, your line ought to look like this:
print("a=%d,b=%d" % (f(x,n),g(x,n)))
Also, the recommendation for Python3 and newer is to use {}-style formatting instead of %-style formatting:
print('a={:d}, b={:d}'.format(f(x,n),g(x,n)))
Python 3.6 introduces yet another string-formatting paradigm: f-strings.
print(f'a={f(x,n):d}, b={g(x,n):d}')
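The three styles can be compared side by side. In this sketch, f, g, x and n are hypothetical stand-ins for the names in the question.

```python
def f(x, n):
    """Hypothetical stand-in for the question's f."""
    return x + n

def g(x, n):
    """Hypothetical stand-in for the question's g."""
    return x * n

x, n = 2, 3

percent_style = "a=%d,b=%d" % (f(x, n), g(x, n))
format_style = "a={:d},b={:d}".format(f(x, n), g(x, n))
fstring_style = f"a={f(x, n):d},b={g(x, n):d}"

print(percent_style)
print(format_style)
print(fstring_style)
```

All three produce the same text, so the choice is mostly about readability and the Python versions you need to support.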

The most recommended way is to use the format method:
a, b = 1, 2
print("a={0},b={1}".format(a, b))

Simple printf() function from O'Reilly's Python Cookbook.
import sys

def printf(format, *args):
    sys.stdout.write(format % args)
Example output:
i = 7
pi = 3.14159265359
printf("hi there, i=%d, pi=%.2f\n", i, pi)
# hi there, i=7, pi=3.14

Python 3.6 introduced f-strings for inline interpolation. What's even nicer is it extended the syntax to also allow format specifiers with interpolation. Something I've been working on while I googled this (and came across this old question!):
print(f'{account:40s} ({ratio:3.2f}) -> AUD {splitAmount}')
PEP 498 has the details. And... it sorted my pet peeve with format specifiers in other langs -- allows for specifiers that themselves can be expressions! Yay! See: Format Specifiers.
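A small sketch of format specifiers that are themselves expressions; the variable names and values are arbitrary.

```python
value = 3.14159
width = 10
precision = 2

# The width and precision inside the specifier are evaluated first,
# so this is equivalent to f"{value:10.2f}".
formatted = f"{value:{width}.{precision}f}"
print(repr(formatted))
```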

Simple Example:
print("foo %d, bar %d" % (1,2))

A simpler one.
def printf(format, *values):
    print(format % values)
Then:
printf("Hello, this is my name %s and my age %d", "Martin", 20)

print("Name={}, balance={}".format(var_name, var_balance))

Because your % is outside the print(...) parentheses, you're trying to insert your variables into the result of your print call. print(...) returns None, so this won't work, and there's also the small matter of you already having printed your template by this time and time travel being prohibited by the laws of the universe we inhabit.
The whole thing you want to print, including the % and its operand, needs to be inside your print(...) call, so that the string can be built before it is printed.
print( "a=%d,b=%d" % (f(x,n), g(x,n)) )
I have added a few extra spaces to make it clearer (though they are not necessary and generally not considered good style).
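The failure described above can be reproduced step by step. In this sketch the values 1 and 2 stand in for the results of f(x,n) and g(x,n).

```python
# print() writes the raw template and returns None.
result = print("a=%d,b=%d")
print(result is None)

# Applying % to that None is what raises the error.
try:
    result % (1, 2)
    error = None
except TypeError as exc:
    error = exc
print(type(error).__name__)

# Building the string first and printing it afterwards works.
line = "a=%d,b=%d" % (1, 2)
print(line)
```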

You can literally use printf in Python through the ctypes module (or even your own C extension module).
On Linux, you should be able to do
import ctypes
libc = ctypes.cdll.LoadLibrary("libc.so.6")
printf = libc.printf
printf(b"Integer: %d\n"
       b"String : %s\n"
       b"Double : %f (%%f style)\n"
       b"Double : %g (%%g style)\n",
       42, b"some string", ctypes.c_double(3.20), ctypes.c_double(3.20))
On Windows, the equivalent would have
libc = ctypes.cdll.msvcrt
printf = libc.printf
As stated in the documentation:
None, integers, bytes objects and (unicode) strings are the only native Python objects that can directly be used as parameters in these function calls. None is passed as a C NULL pointer, bytes objects and strings are passed as pointer to the memory block that contains their data (char* or wchar_t*). Python integers are passed as the platforms default C int type, their value is masked to fit into the C type.
This explains why I wrapped the Python float objects using ctypes.c_double (FYI, a Python float is typically a double in C).
Output
Integer: 42
String : some string
Double : 3.200000 (%f style)
Double : 3.2 (%g style)

print("{:.4f} #\n".format(2))
Formatting is helpful in situations like these.
For further details, refer to the Python string formatting documentation.

Text wrapping from optparse help string on multiple lines

I am trying to make my help strings helpful. To do this I have a function Function() with a doc string something like
def Function(x):
    """ First line describing what Function does
    Keyword Arguments
    x = float -- A description of what x does that may be long
    """
Having done this, I then have something like this at the end:
import optparse
import sys

def parse_command_line(argvs):
    parser = optparse.OptionParser()
    parser.add_option("-f", "--Function", help=Function.__doc__, metavar="Bs")
    (options, arguments) = parser.parse_args(argvs)
    return options, arguments

options, arguments = parse_command_line(sys.argv)
The trouble occurs when calling the program with -h or --help. The output is line-wrapped by optparse, which means the Keyword Arguments are not started on a new line. Is it possible to stop optparse from wrapping the output, or is there a better way to do this?

In case of interest, I have written two such formatter classes for argparse. The first supports much of the mediaWiki, MarkDown, and POD syntaxes (even intermixed):
import MarkupHelpFormatter
MarkupHelpFormatter.InputOptions["mediawiki"] = True
parser = argparse.ArgumentParser(
    description="""...your text, with mediawiki markup...""",
    epilog='...',
    formatter_class=MarkupHelpFormatter.MarkupHelpFormatter
)
The other is called "ParagraphHelpFormatter". It merely wraps text like the default argparse formatter, except that it respects blank lines.
Both are at http://derose.net/steve/utilities/PY/MarkupHelpFormatter.py, and licensed as CCLI Attribution-Share-alike. They format for ANSI terminal interfaces. Not highly polished (for example, auto-numbering is unfinished), but you may find them helpful.

argparse provides you with a formatter for raw formatting, i.e. your lines won't get wrapped:
parser = argparse.ArgumentParser(formatter_class=argparse.RawDescriptionHelpFormatter)
optparse also lets you set the formatter. I suppose you can write your own formatter, but there's nothing provided that I know of.
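As a hedged sketch of the argparse route: RawDescriptionHelpFormatter keeps the description and epilog as written, while argparse.RawTextHelpFormatter also keeps line breaks in per-option help strings, which is the case in the question. The function name and docstring below are illustrative stand-ins for the question's Function.

```python
import argparse

def function(x):
    """First line describing what function does.

Keyword Arguments
x = float -- A description of what x does that may be long
"""

parser = argparse.ArgumentParser(
    prog="demo",
    formatter_class=argparse.RawTextHelpFormatter)  # keep newlines in help=
parser.add_argument("-f", "--function", help=function.__doc__, metavar="Bs")

help_text = parser.format_help()
print(help_text)
```

With this formatter the "Keyword Arguments" heading stays on its own line in the help output instead of being merged into a wrapped paragraph.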

Disable abbreviation in argparse

By default, argparse accepts abbreviated options in unambiguous cases.
I don't want abbreviations and would like to disable them,
but I didn't find how to do so in the documentation.
Is it possible?
Example:
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--send', action='store_true')
parser.parse_args(['--se']) # returns Namespace(send=True)
But I want it to be true only when the full parameter is supplied, to prevent user errors.
UPDATE:
I created a ticket on the Python bug tracker after Vikas' answer, and it has already been processed.

As of Python 3.5.0 you can disable abbreviations by instantiating the ArgumentParser with the following:
parser = argparse.ArgumentParser(allow_abbrev=False)
Also see the documentation.
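A short comparison of the two settings, using parse_known_args so that the unrecognized abbreviation is returned instead of ending the program. The helper build_parser is a hypothetical name used only for this sketch.

```python
import argparse

def build_parser(**kwargs):
    """Build the question's parser, forwarding options to ArgumentParser."""
    parser = argparse.ArgumentParser(prog="demo", **kwargs)
    parser.add_argument("--send", action="store_true")
    return parser

# Default behaviour: the unambiguous prefix --se is accepted as --send.
default_ns, default_extras = build_parser().parse_known_args(["--se"])
print(default_ns.send, default_extras)

# With allow_abbrev=False the prefix is left unrecognized.
strict_ns, strict_extras = build_parser(allow_abbrev=False).parse_known_args(["--se"])
print(strict_ns.send, strict_extras)
```

With parse_args instead of parse_known_args, the second case would end with an "unrecognized arguments" error.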

No, apparently this is not possible. At least in Python 2.7.2.
First, I took a look into the documentation - to no avail.
Then I opened the Lib\argparse.py and looked through the source code. Omitting a lot of details, it seems that each argument is parsed by a regular expression like this (argparse:2152):
# allow one or more arguments
elif nargs == ONE_OR_MORE:
    nargs_pattern = '(-*A[A-]*)'
This regex will successfully parse both '-' and '--', so we have no control over the short and long arguments. Other regexes use the -* construct too, so it does not depend on the type of the parameter (no sub-arguments, 1 sub-argument etc).
Later in the code double dashes are converted to one dash (only for non-optional args), again, without any flags to control by user:
# if this is an optional action, -- is not allowed
if action.option_strings:
    nargs_pattern = nargs_pattern.replace('-*', '')
    nargs_pattern = nargs_pattern.replace('-', '')

No, at least not without ugly hacks.
The code snippet @Vladimir posted is, I suppose, not what you are looking for. The actual code that is doing this is:
def _get_option_tuples(self, option_string):
    ...
    if option_string.startswith(option_prefix):
        ...
Note that the check is startswith, not ==.
You can always extend argparse.ArgumentParser and provide your own _get_option_tuples(self, option_string) to change this behavior. I just did so by replacing the two occurrences of option_string.startswith(option_prefix) with option_string == option_prefix, and:
>>> parser = my_argparse.MyArgparse
>>> parser = my_argparse.MyArgparse()
>>> parser.add_argument('--send', action='store_true')
_StoreTrueAction(option_strings=['--send'], dest='send', nargs=0, const=True, default=False, type=None, choices=None, help=None, metavar=None)
>>> parser.parse_args(['--se'])
usage: [-h] [--send]
: error: unrecognized arguments: --se
A word of caution
The method _get_option_tuples is prefixed with _, which typically means a private method in Python, and it is not a good idea to override a private method.

Another way for Python 2.7. Let's get clunky! Say you want to recognize --dog without abbreviation.
p = argparse.ArgumentParser()
p.add_argument('--dog')
p.add_argument('--dox', help=argparse.SUPPRESS, metavar='IGNORE')
By adding a second argument --dox that differs from the argument you want only in the third letter, --d and --do become ambiguous. Therefore, the parser will refuse to recognize them. You would need to add code to catch the resulting exception and process it according to the context in which you are calling parse_args. You might also need to suppress/tweak the help text.
The help=... keeps the argument out of the option list on the default help message (per this), and metavar='IGNORE' is just to make it clear you really aren't doing anything with this option :) .
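The trick can be checked with a short script. argparse reports the ambiguity through its usual error path, which exits the program, so this sketch catches SystemExit; the value rex is an arbitrary example.

```python
import argparse

p = argparse.ArgumentParser(prog="demo")
p.add_argument("--dog")
p.add_argument("--dox", help=argparse.SUPPRESS, metavar="IGNORE")

# The full option name still works.
full = p.parse_args(["--dog", "rex"])
print(full.dog)

# The abbreviation is now ambiguous between --dog and --dox.
try:
    p.parse_args(["--do", "rex"])
    abbreviation_rejected = False
except SystemExit:
    abbreviation_rejected = True
print(abbreviation_rejected)
```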
