How to truncate and pad in python3 using just str.format?

I would like to do this:
'{pathname:>90}'.format(pathname='abcde')[-2:]
using string formatting alone instead of slicing the result.
The result should be 'de',
or, for pathname='e', ' e' with a leading space. If the slice were [:2] instead, this question would be answered by How to truncate a string using str.format in Python?
I need this in the following example:
import logging
import sys
logging.basicConfig(stream=sys.stdout, level=logging.INFO,style='{',format='{pathname:>90}{lineno:4d}{msg}')
logging.info('k')

The precision trick (using a precision in the format spec) doesn't work here: it only truncates the end of the string.
A workaround would be to slice the string before passing it to str.format:
>>> '{pathname:>2}'.format(pathname='abcde'[-2:])
'de'
>>> '{pathname:>2}'.format(pathname='e'[-2:])
' e'
Since you cannot control the arguments that logging passes to format, you could create a subclass of str and override format so that, when it finds pathname in the keyword arguments, it truncates the value and then calls the original str.format method.
Small self-contained example:
class TruncatePathnameStr(str):
    def format(self, *args, **kwargs):
        if "pathname" in kwargs:
            # truncate
            kwargs["pathname"] = kwargs["pathname"][-2:]
        return str.format(self, *args, **kwargs)
s = TruncatePathnameStr('##{pathname:>4}##')
print(s.format(pathname='abcde'))
That prints:
##  de##
Use it in your real-life example:
logging.basicConfig(stream=sys.stdout, level=logging.INFO, style='{',
                    format=TruncatePathnameStr('{pathname:>90}{lineno:4d}{msg}'))
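The subclass approach can be checked without touching the root logger. The following is only a sketch under stated assumptions: the class name TailTruncatingFormat and its KEEP constant are hypothetical, the format string is shortened for readability, and it relies on logging's '{' style calling the format method of the format string object, which is how CPython's logging module currently behaves.

```python
import logging


class TailTruncatingFormat(str):
    """Hypothetical str subclass: keeps only the last KEEP characters of pathname."""

    KEEP = 4  # illustrative width, not taken from the question

    def format(self, *args, **kwargs):
        if "pathname" in kwargs:
            # kwargs is a fresh dict, so the log record itself is not modified
            kwargs["pathname"] = kwargs["pathname"][-self.KEEP:]
        return super().format(*args, **kwargs)


fmt = TailTruncatingFormat("{pathname:>6}|{lineno:4d}|{msg}")
formatter = logging.Formatter(fmt, style="{")

# Build a record by hand so the example does not depend on where it is run.
record = logging.LogRecord(
    name="demo", level=logging.INFO, pathname="/tmp/abcde.py",
    lineno=7, msg="k", args=None, exc_info=None,
)
formatted = formatter.format(record)
print(repr(formatted))
```

The pathname is cut to its last four characters first and then right-aligned by the ordinary {pathname:>6} field, so truncation and padding stay separate steps.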

Related

Why does f-string literal not work here, whereas %()s formatting does?

I am trying to format my validator message with the min/max values in the actual validator.
Here's my Flask Form:
class MyForm(FlaskForm):
    example = IntegerField(label=('Integer 0-10'),
                           validators=[InputRequired(), NumberRange(min=0, max=10, message="must be between %(min)s and %(max)s!")])
Using message="must be between %(min)s and %(max)s!" gives me the expected output:
must be between 0 and 10!
Whereas using message=f"must be between {min} and {max}!" gives me the output:
must be between <built-in function min> and <built-in function max>!
How can I use f-string formatting for my validator message? Is this something related to f-string evaluating at run-time? I don't fully understand the concept behind it, I just know it's the preferred way to string format.

The f-string literal is evaluated immediately, before being passed to IntegerField.
>>> foo = 3
>>> print(f'{foo}')
3
The other string contains literal %(...) substrings, which are used later with the % operator.
>>> print("%(foo)s")
%(foo)s
>>> print("%(foo)s" % {'foo': 3})
3

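"must be between %(min)s and %(max)s!" is a plain string literal whose placeholders are filled in later (the NumberRange validator applies the % operator with its own min and max), while f"must be between {min} and {max}!" is evaluated on the spot, roughly like "must be between " + str(min) + " and " + str(max) + "!". With no variables named min and max defined, those names refer to the built-in functions, which produces the string you described.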

You must define such variables first, for example:
min = 1
max = 2
print(f"must be between {min} and {max}!")
But please consider using different variable names so that you don't shadow the built-in functions.
OK, I see it now: you wanted to use the string as a kind of template.
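The difference between the two kinds of string can be seen without Flask at all. This is a minimal sketch; the names LOW and HIGH are illustrative and not part of the original form.

```python
# A %-style template is only text until the % operator is applied to it.
template = "must be between %(min)s and %(max)s!"
later = template % {"min": 0, "max": 10}

# An f-string is evaluated where it is written; with no variables of those
# names defined, min and max refer to the built-in functions.
eager = f"must be between {min} and {max}!"

# To use an f-string, the values have to exist first (names are illustrative).
LOW, HIGH = 0, 10
message = f"must be between {LOW} and {HIGH}!"

print(later)
print(eager)
print(message)
```

Keeping the limits in named constants lets the same values be passed both to the validator and to the f-string message.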

How to get the names of the named variables from the python string

Is there a graceful way to get the names of the named %s-like variables in a string object?
Like this:
string = '%(a)s and %(b)s are friends.'
names = get_names(string) # ['a', 'b']
Known alternative ways:
Parse the names using a regular expression, e.g.:
import re
names = re.findall(r'%\((\w+)\)[sdf]', string) # ['a', 'b']
Use .format()-compatible formatting and Formatter().parse(string).
How to get the variable names from the string for the format() method
But what about a string with %s-like variables?
PS: python 2.7

In order to answer this question, you need to define "graceful". Several factors might be worth considering:
Is the code short, easy to remember, easy to write, and self explanatory?
Does it reuse the underlying logic (i.e. follow the DRY principle)?
Does it implement exactly the same parsing logic?
Unfortunately, the "%" formatting for strings is implemented in the C routine "PyString_Format" in stringobject.c. This routine does not provide an API or hooks that allow access to a parsed form of the format string. It simply builds up the result as it is parsing the format string. Thus any solution will need to duplicate the parsing logic from the C routine. This means DRY is not followed and exposes any solution to breaking if a change is made to the formatting specification.
The parsing algorithm in PyString_Format includes a fair bit of complexity, including handling nested parentheses in key names, so it cannot be fully implemented with a regular expression or with string split(). Short of copying the C code from PyString_Format and converting it to Python code, I do not see any remotely easy way of correctly extracting the names of the mapping keys under all circumstances.
So my conclusion is that there is no "graceful" way to obtain the names of the mapping keys for a Python 2.7 "%" format string.
The following code uses a regular expression to provide a partial solution that covers most common usage:
import re
class StringFormattingParser(object):
    # Partial matcher: a key without ")" followed by flags and a conversion type
    __matcher = re.compile(r'(?<!%)%\(([^)]+)\)[-# +0-9.hlL]*[diouxXeEfFgGcrs]')
    @classmethod
    def getKeyNames(klass, formatString):
        return klass.__matcher.findall(formatString)
# Demonstration of use with some sample format strings
for value in [
    '%(a)s and %(b)s are friends.',
    '%%(nomatch)i',
    '%%',
    'Another %(matched)+4.5f%d%% example',
    '(%(should_match(but does not))s',
]:
    print(StringFormattingParser.getKeyNames(value))
# Note the following prints out "really does match"!
print('%(should_match(but does not))s' % {'should_match(but does not)': 'really does match'})
P.S. DRY = Don't Repeat Yourself (https://en.wikipedia.org/wiki/Don%27t_repeat_yourself)

You could also do this:
[y[0] for y in [x.split(')') for x in s.split('%(')] if len(y)>1]

Don't know if this qualifies as graceful in your book, but here's a short function that parses out the names. No error checking, so it will fail for malformed format strings.
def get_names(s):
    i = s.find('%')
    while 0 <= i < len(s) - 3:
        if s[i+1] == '(':
            yield(s[i+2:s.find(')', i)])
        i = s.find('%', i+2)
string = 'abd %(one) %%(two) 99 %%%(three)'
list(get_names(string)) #=> ['one', 'three']

Also, you can reduce this %-task to the Formatter solution.
>>> import re
>>> from string import Formatter
>>>
>>> string = '%(a)s and %(b)s are friends.'
>>>
>>> string = re.sub(r'((?<!%)%(\((\w+)\)s))', r'{\g<3>}', string)
>>>
>>> tuple(fn[1] for fn in Formatter().parse(string) if fn[1] is not None)
('a', 'b')
>>>
In this case you can use both variants of formatting, I suppose.
The regular expression to use depends on what you want.
>>> re.sub(r'((?<!%)%(\((\w+)\)s))', r'{\g<3>}', '%(a)s and %(b)s are %(c)s friends.')
'{a} and {b} are {c} friends.'
>>> re.sub(r'((?<!%)%(\((\w+)\)s))', r'{\g<3>}', '%(a)s and %(b)s are %%(c)s friends.')
'{a} and {b} are %%(c)s friends.'
>>> re.sub(r'((?<!%)%(\((\w+)\)s))', r'{\g<3>}', '%(a)s and %(b)s are %%%(c)s friends.')
'{a} and {b} are %%%(c)s friends.'
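Another possibility, not proposed in the answers above, is to let the % operator do the parsing and simply record which keys it asks for. The sketch below is written for Python 3 and rests on two assumptions: the format string uses mapping keys only (a bare positional specifier such as %d would raise TypeError), and every conversion in it accepts the integer 0. The class name KeyRecorder is hypothetical.

```python
class KeyRecorder(dict):
    """Hypothetical helper: remembers every key the % operator looks up."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def __getitem__(self, key):
        if key not in self.seen:
            self.seen.append(key)
        return 0  # placeholder value; assumed acceptable to the conversions used


def get_names(format_string):
    recorder = KeyRecorder()
    format_string % recorder  # result discarded; only the lookups matter
    return recorder.seen


print(get_names('%(a)s and %(b)s are friends.'))
print(get_names('%%(nomatch)i'))
print(get_names('%(should_match(but does not))s'))
```

Because the interpreter's own formatter extracts the keys, escaped %% sequences and nested parentheses are handled the same way as in real formatting, at the cost of the restrictions named above.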

ConfigParser.get returning a string need to convert it to section

I want to use the return value of RawConfigParser.get('somesection', 'someoption') as the section for another RawConfigParser.get call, but in practice the result is a doubly quoted string.
section = RawConfigParser.get ('somesection', 'someoption')
subsection = RawConfigParser.get (section, 'someotheroption') # INCORRECT RawConfigParser.get ('"somesection"', 'someotheroption')
How do I avoid this?

You have a couple of options, one of which is to use the ast module:
>>> quoted_string = '"this is a quote"'
>>> quoted_string
'"this is a quote"'
>>> import ast
>>> unquoted_string = ast.literal_eval(quoted_string)
>>> unquoted_string
'this is a quote'
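Applied to the configuration case, the idea might look like the sketch below. It assumes Python 3 (where the module is named configparser) and an invented file in which the option value was written with quotes; the section name target and the value 42 are made up for illustration. It strips the quotes with str.strip rather than ast.literal_eval, which is enough when the value is a simple quoted name.

```python
import configparser

parser = configparser.RawConfigParser()
parser.read_string("""
[somesection]
someoption = "target"

[target]
someotheroption = 42
""")

raw = parser.get("somesection", "someoption")  # still contains the quote characters
section = raw.strip('"\'')                     # drop surrounding quotes, if any
value = parser.get(section, "someotheroption")
print(repr(raw), section, value)
```

If the quotes in the file are not needed, removing them from the file itself avoids the extra step.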

You should create a file object and use RawConfigParser.readfp():
>>> help(ConfigParser.RawConfigParser.readfp)
Help on method readfp in module ConfigParser:
readfp(self, fp, filename=None) unbound ConfigParser.RawConfigParser method
Like read() but the argument must be a file-like object.
The `fp' argument must have a `readline' method. Optional
second argument is the `filename', which if not given, is
taken from fp.name. If fp has no `name' attribute, `<???>' is
used.

Python Argparse: Issue with optional arguments which are negative numbers

I'm having a small issue with argparse. I have an option xlim, which is the x range of a plot. I want to be able to pass numbers like -2e-5. However, this does not work: argparse treats the value as an option flag rather than as a number. If I pass -0.00002 it works: argparse reads it as a negative number. Is it possible to have it read in -2e-3?
The code is below, and an example of how I would run it is:
./blaa.py --xlim -2.e-3 1e4
If I do the following it works:
./blaa.py --xlim -0.002 1e4
The code:
parser.add_argument('--xlim', nargs = 2,
                    help = 'X axis limits',
                    action = 'store', type = float,
                    default = [-1.e-3, 1.e-3])
Whilst I can get it to work this way I would really rather be able to use scientific notation. Anyone have any ideas?
Cheers

One workaround I've found is to quote the value and add a leading space. That is,
./blaa.py --xlim " -2.e-3" 1e4
This way argparse won't think -2.e-3 is an option name because the first character is not a hyphen-dash, but it will still be converted properly to a float because float(string) ignores spaces on the left.

As already pointed out in the comments, the problem is that a - prefix is parsed as an option instead of as an argument. One way to work around this is to change the prefix used for options with the prefix_chars argument:
#!/usr/bin/python
import argparse
parser = argparse.ArgumentParser(prefix_chars='#')
parser.add_argument('##xlim', nargs = 2,
                    help = 'X axis limits',
                    action = 'store', type = float,
                    default = [-1.e-3, 1.e-3])
print(parser.parse_args())
Example output:
$ ./blaa.py ##xlim -2.e-3 1e4
Namespace(xlim=[-0.002, 10000.0])
Edit: Alternatively, you can keep using - as the prefix, pass xlim as a single value, and use a function in type to implement your own parsing:
#!/usr/bin/python
import argparse
def two_floats(value):
    values = value.split()
    if len(values) != 2:
        # ArgumentTypeError is the exception argparse expects from a type function
        raise argparse.ArgumentTypeError('expected two whitespace-separated numbers')
    return list(map(float, values))
parser = argparse.ArgumentParser()
parser.add_argument('--xlim',
                    help = 'X axis limits',
                    action = 'store', type=two_floats,
                    default = [-1.e-3, 1.e-3])
print(parser.parse_args())
Example output:
$ ./blaa.py --xlim "-2e-3 1e4"
Namespace(xlim=[-0.002, 10000.0])

If you specify the value for your option with an equals sign, argparse will not treat it as a separate option, even if it starts with -:
./blaa.py --xlim='-0.002 1e4'
# As opposed to --xlim '-0.002 1e4'
And if the value does not have spaces in it (or other special characters given your shell), you can drop the quotes:
./blaa.py --xlim=-0.002
See: https://www.gnu.org/software/guile/manual/html_node/Command-Line-Format.html
With this, there is no need to write your own type= parser or redefine the prefix character from - to # as the accepted answer suggests.

Here is the code that I use. (It is similar to jeremiahbuddha's but it answers the question more directly since it deals with negative numbers.)
Put this before calling argparse.ArgumentParser():
import sys
for i, arg in enumerate(sys.argv):
    if len(arg) > 1 and arg[0] == '-' and arg[1].isdigit():
        sys.argv[i] = ' ' + arg

Another workaround is to pass in the argument using '=' symbol in addition to quoting the argument - i.e., --xlim="-2.3e14"

If you are up to modifying argparse.py itself, you could change the negative number matcher to handle scientific notation:
In class _ActionsContainer.__init__()
self._negative_number_matcher = _re.compile(r'^-(\d+\.?|\d*\.\d+)([eE][+\-]?\d+)?$')
Or after creating the parser, you could set parser._negative_number_matcher to this value. This approach might have problems if you are creating groups or subparsers, but should work with a simple parser.
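A sketch of the second variant, setting the attribute on an already created parser, is shown here. Note the assumption it depends on: _negative_number_matcher is a private attribute of argparse, not documented API, so its existence and behavior may differ between Python versions.

```python
import argparse
import re

parser = argparse.ArgumentParser()
parser.add_argument('--xlim', nargs=2, type=float, default=[-1.e-3, 1.e-3],
                    help='X axis limits')

# Private attribute: not part of argparse's documented API and may change.
parser._negative_number_matcher = re.compile(
    r'^-(\d+\.?|\d*\.\d+)([eE][+\-]?\d+)?$')

# Pass the arguments explicitly instead of reading sys.argv.
args = parser.parse_args(['--xlim', '-2.e-3', '1e4'])
print(args.xlim)
```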

Inspired by andrewfn's approach, I created a separate helper function to do the sys.argv fiddling:
def _tweak_neg_scinot():
    import re
    import sys
    p = re.compile('-\\d*\\.?\\d*e', re.I)
    sys.argv = [' ' + a if p.match(a) else a for a in sys.argv]
The regex looks for:
- : a negative sign
\\d* : zero or more digits (for oddly formatted values like -.5e-2 or -4354.5e-6)
\\.? : an optional period (e.g., -2e-5 is reasonable)
\\d* : another set of zero or more digits (for things like -2e-5 and -7.e-3)
e : to match the exponent marker
re.I makes it match both -2e-5 and -2E-5. Using p.match means that it only searches from the start of each string.
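For testing, the same idea can be written as a function that takes and returns an argument list instead of modifying sys.argv. This is a sketch; the names SCI_NEG and protect_negatives are hypothetical, and the pattern is the one described above.

```python
import argparse
import re

SCI_NEG = re.compile(r'-\d*\.?\d*e', re.I)  # negative number in scientific notation


def protect_negatives(argv):
    """Return a copy of argv with a leading space added to values like -2e-5."""
    return [' ' + a if SCI_NEG.match(a) else a for a in argv]


parser = argparse.ArgumentParser()
parser.add_argument('--xlim', nargs=2, type=float, default=[-1.e-3, 1.e-3])

argv = protect_negatives(['--xlim', '-2.e-3', '1e4'])
args = parser.parse_args(argv)
print(argv, args.xlim)
```

One caveat of this pattern: a short option that begins with -e (or -E) also matches and would be prefixed with a space, so the approach only suits parsers without such options.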

Defining dynamic functions to a string

I have a small Python script that I use every day. It reads a file and applies different string functions, such as strip() and replace(), to each line. I'm constantly editing the script and commenting lines in and out to change the functions, because which functions I use depends on the file I'm dealing with. For example, I have a file where I need to call line.replace(' ','') and line.strip() on each line.
What's the best way to make all of these part of my script, so that I can assign a number to each function and just say, for example, apply functions 1 and 4 to each line?

First of all, many functions in the string module (including strip and replace) are deprecated. The following answer uses string methods instead. (Instead of string.strip(" Hello "), I use the equivalent " Hello ".strip().)
Here's some code that will simplify the job for you. The following code assumes that whatever methods you call on your string, that method will return another string.
class O(object):
    c = str.capitalize
    r = str.replace
    s = str.strip
def process_line(line, *ops):
    i = iter(ops)
    while True:
        try:
            # Consume the arguments two at a time: a method, then its argument list
            op = next(i)
            args = next(i)
        except StopIteration:
            break
        line = op(line, *args)
    return line
The O class exists so that your highly abbreviated method names don't pollute your namespace. When you want to add more string methods, you add them to O in the same format as those given.
The process_line function is where all the interesting things happen. First, here is a description of the argument format:
The first argument is the string to be processed.
The remaining arguments must be given in pairs.
The first argument of the pair is a string method. Use the shortened method names here.
The second argument of the pair is a list representing the arguments to that particular string method.
The process_line function returns the string that emerges after all these operations have been performed.
Here is some example code showing how you would use the above code in your own scripts. I've separated the arguments of process_line across multiple lines to show the grouping of the arguments. Of course, if you're just hacking away and using this code in day-to-day scripts, you can compress all the arguments onto one line; this actually makes it a little easier to read.
f = open("parrot_sketch.txt")
for line in f:
    p = process_line(
        line,
        O.r, ["He's resting...", "This is an ex-parrot!"],
        O.c, [],
        O.s, []
    )
    print(p)
Of course, if you very specifically wanted to use numerals, you could name your functions O.f1, O.f2, O.f3… but I'm assuming that wasn't the spirit of your question.

If you insist on numbers, you can't do much better than a dict (as gimel suggests) or list of functions (with indices zero and up). With names, though, you don't necessarily need an auxiliary data structure (such as gimel's suggested dict), since you can simply use getattr to retrieve the method to call from the object itself or its type. E.g.:
def all_lines(somefile, methods):
    """Apply a sequence of methods to all lines of some file and yield the results.
    Args:
      somefile: an open file or other iterable yielding lines
      methods: a string that's a whitespace-separated sequence of method names.
          (note that the methods must be callable without arguments beyond the
           str to which they're being applied)
    """
    tobecalled = [getattr(str, name) for name in methods.split()]
    for line in somefile:
        for tocall in tobecalled: line = tocall(line)
        yield line
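A short usage sketch of this getattr approach follows. It restates the function under the hypothetical name apply_methods so the example is self-contained, and uses io.StringIO as a stand-in for an open file.

```python
import io


def apply_methods(lines, method_names):
    """Apply the named str methods, in order, to every line."""
    methods = [getattr(str, name) for name in method_names.split()]
    for line in lines:
        for method in methods:
            line = method(line)
        yield line


sample = io.StringIO("  hello world  \n  second LINE\n")
result = list(apply_methods(sample, "strip title"))
print(result)
```

Changing which operations run then only means changing the string of method names, for example one read from the command line.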

It is possible to map string operations to numbers:
>>> import string
>>> ops = {1:string.split, 2:string.replace}
>>> my = "a,b,c"
>>> ops[1](",", my)
[',']
>>> ops[1](my, ",")
['a', 'b', 'c']
>>> ops[2](my, ",", "-")
'a-b-c'
>>>
But maybe string descriptions of the operations will be more readable.
>>> ops2={"split":string.split, "replace":string.replace}
>>> ops2["split"](my, ",")
['a', 'b', 'c']
>>>
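Note:
Instead of using the string module, you can use the str type for the same effect. (The string module functions used above exist only in Python 2; the str methods work in both Python 2 and Python 3.)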
>>> ops={1:str.split, 2:str.replace}

To map names (or numbers) to different string operations, I'd do something like
OPERATIONS = dict(
    strip = str.strip,
    lower = str.lower,
    removespaces = lambda s: s.replace(' ', ''),
    maketitle = lambda s: s.title().center(80, '-'),
    # etc
)
def process(myfile, ops):
    for line in myfile:
        for op in ops:
            line = OPERATIONS[op](line)
        yield line
which you use like this
for line in process(afile, ['strip', 'removespaces']):
    ...
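Since the question asked for numbers ("apply function 1 and 4"), the same registry idea can be keyed by integers. This is a sketch; the particular numbering and the sample text are made up for illustration.

```python
import io

# Numbered registry; the numbers and entries are illustrative.
OPERATIONS = {
    1: str.strip,
    2: str.lower,
    3: str.capitalize,
    4: lambda s: s.replace(' ', ''),
}


def process(lines, op_numbers):
    """Apply the operations with the given numbers, in order, to each line."""
    for line in lines:
        for number in op_numbers:
            line = OPERATIONS[number](line)
        yield line


sample = io.StringIO("  He's resting  \n a b c \n")
result = list(process(sample, [1, 4]))
print(result)
```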
