To begin with, I am trying to get a desired output like this:
*********************************************************************
hello
*********************************************************************
To achieve this, I have assigned the desired output to a variable as a multi-line string and am printing it with format.
$ cat varibale.py
decorator = """ **********************************************************************
{}
********************************************************************** """
print(decorator.format("hello"))
Output:
**********************************************************************
hello
**********************************************************************
The issue with the above approach is the extra spaces in the third line of the output, which look odd.
I am able to achieve this in the following way:
$ cat varibale.py
decorator = """ **********************************************************************
{}
*********************************************************************
"""
print(decorator.format("hello"))
Output:
**********************************************************************
hello
*********************************************************************
But this way my code doesn't look good, as it does not follow the indentation.
Please suggest the right way to achieve the desired output.
One way to make multi-line literal strings look good is to use a backslash to escape the newline, like this:
s = '''\
*********************************************************************
hello
*********************************************************************
'''
print(s)
output
*********************************************************************
hello
*********************************************************************
However, PEP 8 discourages backslash usage like that. It's too fragile: if there's a space between the backslash and the newline then the newline won't get escaped, and the backslash will get printed.
A more versatile approach is to use a function which calculates the amount of padding required to centre the text, and applies it via a nested formatting specifier. For example:
def banner(s, width=69):
    stars = '*' * width
    pad = (width + len(s)) // 2
    return '{0}\n{1:>{2}}\n{0}'.format(stars, s, pad)
print(banner('hello'))
print(banner('Hello, world', width=16))
output
*********************************************************************
                                hello
*********************************************************************
****************
  Hello, world
****************
How it works
That format string is a little dense, so I guess I should try to explain it. ;) For full information on this topic please see Format String Syntax in the docs. The explanation below borrows from & paraphrases those docs.
'{0}\n{1:>{2}}\n{0}'.format(stars, s, pad)
The stuff enclosed in {} in a format string is called a "replacement field". The first item in a replacement field is the optional field name. This lets us identify which arg of .format goes with this replacement field. There are a couple of possible variations for field names; this format string uses numeric names, so it identifies the .format args by their position. That is, 0 corresponds to stars, 1 corresponds to s and 2 corresponds to pad.
If no field names are given they get automatically filled by the numbers 0, 1, 2, ... etc (unless you're using Python 2.6, where field names are mandatory). That's quite useful most of the time, so most format strings don't bother using field names.
After the field name we can give a "format specifier" or "format spec" which describes how the value is to be presented. A colon : separates the field name from the format spec. If you don't supply a format spec then you get a default one, and most of the time that's adequate. But here we do want a little more control, so we need to supply a format spec.
In a format spec the > sign forces the field to be right-aligned within the available space. After the alignment sign we can provide a number to specify the minimum field width; the field will automatically be made larger if necessary.
For example, '{0:>6}'.format('test') says to put argument 0 ('test') in a space that's at least 6 chars wide, aligned to the right, which results in the string '  test'.
But a format spec can actually contain a whole new replacement field! This allows us to supply a variable to control the field width. So in my format string {1:>{2}} says to put arg 1 here (s), right aligned in a field with a width given by arg 2 (pad). Only one level of replacement field nesting is permitted, but it's hard to think of a situation where you'd actually want deeper nesting.
So putting it all together: '{0}\n{1:>{2}}\n{0}' tells .format to build a string that starts with arg 0 (stars) using the default format spec, followed by a newline, followed by arg 1 (s) right aligned in a field of width pad, followed by another newline, finally followed by arg 0 (stars) again.
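The pieces described above can be tried in isolation. The following is a minimal sketch; the variable names are only illustrative and are not part of the original answer.

```python
# Fixed minimum width: right-align 'test' in a field 6 characters wide.
fixed = '{0:>6}'.format('test')

# Nested replacement field: the width is taken from argument 1
# instead of being written as a literal in the format spec.
nested = '{0:>{1}}'.format('test', 6)

# Named fields can be nested in the same way.
named = '{text:>{width}}'.format(text='test', width=6)

# The width is only a minimum: a longer value is never truncated.
too_narrow = '{0:>2}'.format('test')

print(repr(fixed), repr(nested), repr(named), repr(too_narrow))
```

All of the first three calls build the same string; only the way the width is supplied differs.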
I hope that made enough sense. :)
In Python 3.6+, we could use an f-string:
def banner(s, width=69):
    stars = '*' * width
    pad = (width + len(s)) // 2
    return f'{stars}\n{s:>{pad}}\n{stars}'
You could proceed, for example, as follows:
print('*'*80)
print('{msg:^80s}'.format(msg = 'HELLO')) #^ centers the message
print('*'*80)
or if you want to have the text-width dynamic:
def fn(msg, w = 80):
    delim = '*'*w
    fmt = '{msg:^%ds}'%w
    print(delim)
    print(fmt.format(msg=msg))
    print(delim)
fn('hello')
or slightly generalized version should you need to write to a file:
import sys
def fn(msg, w = 80, F = sys.stdout):
    delim = '*'*w
    fmt = '{delim:s}\n{msg:^%ds}\n{delim:s}\n'%w
    F.write(fmt.format(delim = delim, msg = msg))
fn('hello')
Maybe :
print '*' * 80 + '\n' + ' ' * 38 + 'hello' + '\n' + '*' *80
OR
If it is python3
a = lambda x,c,mess: print(c*x + ('\n' if not mess else mess))
a(80, '*', None)
a(38, ' ', 'Hello')
a(80, '*', None)
Related
In a project of mine, I'm passing strings to a Formatter subclass which formats them using the format specifier mini-language. In my case it is customized (using the features of the Formatter class) by adding additional bang converters: !u converts the resulting string to uppercase, !c to titlecase, !q doubles any square bracket (because reasons), and some others.
For example, using a = "toFu", "{a!c}" becomes "Tofu"
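For readers unfamiliar with this kind of customization, here is a minimal sketch of how such a Formatter subclass might look. The class name BangFormatter is hypothetical, and only the !c conversion from the example above is implemented; the real project presumably does more.

```python
import string


class BangFormatter(string.Formatter):
    """Hypothetical Formatter subclass adding a custom '!c' conversion."""

    def convert_field(self, value, conversion):
        # string.Formatter calls this hook for the part after '!'.
        if conversion == 'c':
            return str(value).title()
        # Fall back to the built-in conversions ('s', 'r', 'a', or none).
        return super().convert_field(value, conversion)


fmt = BangFormatter()
result = fmt.format('{a!c}', a='toFu')
print(result)
```

Note that this only handles plain field names such as a; an expression like a+a is not something string.Formatter evaluates, which is what the question is about.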
How could I make my system use f-string syntax, so I can have "{a+a!c}" be turned into "Tofutofu" ?
NB: I'm not asking for a way of making f"{a+a!c}" (note the presence of an f) resolve itself as "Tofutofu", which is what hook into the builtin python f-string format machinery covers, I'm asking if there is a way for a function or any form of python code to turn "{a+a!c}" (note the absence of an f) into "Tofutofu".
Not sure I still fully understand what you need, but from the details given in the question and some comments, here is a function that parses strings with the format you specified and gives the desired results:
import re
def formatter(s):
    def replacement(match):
        expr, frmt = match[1].split('!')
        if frmt == 'c':
            return eval(expr).title()
    return re.sub(r"{([^{]+)}", replacement, s)
a = "toFu"
print(formatter("blah {a!c}"))
print(formatter("{a+a!c}blah"))
Outputs:
blah Tofu
Tofutofublah
This uses the function variation of the repl argument of the re.sub function. This function (replacement) can be further extended to support all other !xs.
Main disadvantages:
Using eval is evil.
This doesn't take regular format specifiers into account, e.g. :0.3
Maybe someone can take it from here and improve.
Evolved from @Tomerikoo's life-saving answer, here's the code:
import re
def formatter(s):
    def replacement(match):
        pre, bangs, suf = match.group(1, 2, 3)
        # pre : the part before the first bang
        # bangs : the bang (if any) and the characters going with it
        # suf : the colon (if any) and the characters going with it
        if not bangs:
            return eval("f\"{" + pre + suf + "}\"")
        conversion = set(bangs[1:]) # the first character is always a bang
        sra = conversion - set("tiqulc")
        conversion = conversion - sra
        if sra:
            sra = "!" + "".join(sra)
        value = eval("f\"{" + pre + (sra or "") + suf + "}\"")
        if "q" in conversion:
            value = value.replace("{", "{{")
        if "u" in conversion:
            value = value.upper()
        if "l" in conversion:
            value = value.lower()
        if "c" in conversion and value:
            value = value.capitalize()
        return value
    return re.sub(r"{([^!:\n]+)((?:![^!:\n]+)?)((?::[^!:\n]+)?)}", replacement, s)
The massive regex results in the three groups I commented about at the top.
Caveat: it still uses eval (no acceptable way around it anyway), it doesn't allow for multiline replacement fields, and it may cause issues and/or discrepancies to put spaces between the ! and the :.
But these are acceptable for the use I have.
Please check the specification:
only these characters are allowed as conversions: 's', 'r', or 'a'
https://peps.python.org/pep-0498/
So I have several examples of raw text from which I have to extract the characters after 'Terms'. The common pattern I see is that the word 'Terms' is followed by a '\n', and there is another '\n' at the end. I want to extract all the characters (words, numbers, symbols) between these two \n, but only after the keyword 'Terms'.
Some examples of text are given below:
1) \nTERMS \nDirect deposit; Routing #256078514, acct. #160935\n\n'
2) \nTerms\nDue on receipt\nDue Date\n1/31/2021
3) \nTERMS: \nNET 30 DAYS\n
The code I have written is given below:
def get_term_regex(s):
    raw_text = s
    term_regex1 = r'(TERMS\s*\\n(.*?)\\n)'
    try:
        if ('TERMS' or 'Terms') in raw_text:
            pattern1 = re.search(term_regex1,raw_text)
            #print(pattern1)
            return pattern1
    except:
        pass
But I am not getting any output, as there is no match.
The expected output is:
1) Direct deposit; Routing #256078514, acct. #160935
2) Due on receipt
3) NET 30 DAYS
Any help would be really appreciated.
Try the following:
import re
text = '''1) \nTERMS \nDirect deposit; Routing #256078514, acct. #160935\n\n'
2) \nTerms\nDue on receipt\nDue Date\n1/31/2021
3) \nTERMS: \nNET 30 DAYS\n''' # \n are real new lines
for m in re.finditer(r'(TERMS|Terms)\W*\n(.*?)\n', text):
    print(m.group(2))
Note that your regex could not deal with the third 'line' because there is a colon : after TERMS. So I replaced \s with \W.
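As a sketch of how the same idea could be packaged for one raw string at a time, the function below uses a case-insensitive variant of the pattern. The names TERMS_RE and extract_terms are made up for illustration, and the pattern assumes the newlines in the input are real newline characters.

```python
import re

# Case-insensitive variant of the pattern above, so 'TERMS' and 'Terms'
# are both covered. \W* skips trailing spaces or a colon after the heading.
TERMS_RE = re.compile(r'terms\W*\n(.*?)\n', re.IGNORECASE)


def extract_terms(raw_text):
    """Return the line that follows a 'Terms' heading, or None if absent."""
    match = TERMS_RE.search(raw_text)
    return match.group(1) if match else None


samples = [
    '\nTERMS \nDirect deposit; Routing #256078514, acct. #160935\n\n',
    '\nTerms\nDue on receipt\nDue Date\n1/31/2021',
    '\nTERMS: \nNET 30 DAYS\n',
]
for sample in samples:
    print(extract_terms(sample))
```

Returning None for text without a heading keeps the caller in charge of how to handle a missing match.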
('TERMS' or 'Terms') in raw_text might not be what you want. It does not raise a syntax error, but it is just the same as 'TERMS' in raw_text: when Python evaluates the parenthesised part, both 'TERMS' and 'Terms' are truthy, and the or operator returns the first truthy operand, i.e., 'TERMS'. The result is that 'Terms' cannot be picked up by that check!
So you might instead want something like ('TERMS' in raw_text) or ('Terms' in raw_text), although it is quite verbose.
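The behaviour of the parenthesised expression is easy to check directly. This is a small sketch; the any() form is just one possible less verbose alternative, not something the original code used.

```python
raw_text = 'Terms\nDue on receipt\n'

# `or` returns its first truthy operand, so this is simply 'TERMS'.
needle = ('TERMS' or 'Terms')

# The original check therefore only ever looks for 'TERMS'.
buggy = needle in raw_text

# Testing each spelling separately gives the intended result.
fixed = any(word in raw_text for word in ('TERMS', 'Terms'))

print(needle, buggy, fixed)
```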
How do I use the string format function to pre-pend a specified number of spaces to a string? Everything I search (e.g. this post and this post) tells me to use something like
"{:>15}".format("Hello")
But that will give me 10 spaces in front. What if I always want to put 4 spaces in front, keeping things left-aligned, when the input strings are of variable length? For example:
    Hello
    Goodbye
I thought of doing
"{:4}{}".format("", "Hello")
Which does work, but then I have to pass in this bogus empty string. Is there a cleaner way to achieve this?
If you have n as your number of spaces required
newString = f'{" "*n}oldstring'
should add n spaces
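If you would rather stay with str.format, one option is to compute the field width from the length of the text, so that right-alignment always leaves exactly n spaces in front. This is a sketch under that assumption; prepend_spaces is a hypothetical helper name.

```python
def prepend_spaces(text, n=4):
    """Return text with exactly n spaces in front, using a nested width."""
    # Right-aligning in a field of len(text) + n leaves n pad characters.
    return '{:>{width}}'.format(text, width=len(text) + n)


for word in ('Hello', 'Goodbye'):
    print(prepend_spaces(word))
```

No placeholder empty string has to be passed in; the padding comes entirely from the computed width.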
You can use a helper function and define the number of spaces or type of indent you want:
def indent(word, n = 1, style = ' '):
    print(f"{style * n}->{word}")
indent('hello', n = 10)
>>          ->hello
indent('hello', n = 10, style = '*')
>>**********->hello
You can change the default value of the n or style keyword according to your needs, so that you won't always have to use f-strings or format on every output.
This doesn't use format, but textwrap.indent() does what you want.
>>> import textwrap
>>> s = 'hello\n\n \nworld'
>>> textwrap.indent(s, ' ' * 4)
'    hello\n\n \n    world'
Python also allows you to define your own formatting options. See this question for an example of how to override it. In this case, it might look like:
import string
import re
class Template(string.Formatter):
    def format_field(self, value, spec):
        # A spec such as '4t' means "prepend 4 spaces"; raw string avoids an invalid-escape warning
        if re.match(r'\d+t', spec):
            value = ' ' * int(spec[:-1]) + value
            spec = ''
        return super(Template, self).format_field(value, spec)
Usage:
>>> fmt = Template().format
>>> fmt('{:4t} {}', 'hello', 'world')
    hello world
Studying the format string language, I do not see a way to do exactly what you want.
x='''
print '\tone \'piece\' of \"metal\" with \\sharp\\ edge '
print '\nthanks'
'''
exec(x)
I have a string variable (x) and I want to use exec() to print it out correctly. If I have access to the place where the variable is defined, I can simply use a raw string (r'...') and the problem is solved, such as:
x=r'''
print '\tone \'piece\' of \"metal\" with \\sharp\\ edge '
print '\nthanks'
'''
exec(x)
But in this case I don't have access to where the variable is defined: the string comes from somewhere else or from another user. So how can I apply the effect of r'' to the existing variable so that I can use exec()?
The correct output should be :
one 'piece' of "metal" with \sharp\ edge
thanks
The real case is :
Inside my software, I can create objects, and each object can have properties such as a text input and a button. I will refer to these properties as object.text_input and object.button. Say I have two objects named AA and BB, and I type an expression/script into AA.text_input. From BB I want to take the expression entered in AA.text_input and execute it using exec(), so in BB.button I write code such as exec(AA.text_input). The data I grab from AA is therefore a string. The problem is that the code typed into AA's text input may contain any character, including escape characters, so when I use exec() in BB I get an error because of those characters. So the question is: how do I bring that string from AA.text_input to BB correctly?
As far as I understand, once 'x' is declared it loses all memory of what characters were typed in, e.g. it replaces '\n' with a newline character.
The only way I can think of reverting this would be to write a statement to replace all the characters in x with the original characters.
For example:
for char in x:
    if char == '\n':
        #replace with '\\n'
Edit: One way to do this, for this example, is
x = x.replace('\\', '\\\\')
x = x.replace('\n', '\\n')
x = x.replace('\t', '\\t')
x = x.replace('\"', '\\\"')
x = x.replace('\'', '\\\'')
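As a possible alternative to the chain of replace calls, the standard unicode_escape codec turns control characters and backslashes back into escape sequences in one step. This is only a sketch: reescape is a hypothetical helper name, the codec does not escape quote characters, and it also rewrites non-ASCII characters as \x or \u escapes, which may or may not be wanted.

```python
def reescape(text):
    """Turn control characters and backslashes back into escape sequences."""
    # 'unicode_escape' produces bytes such as b'\\t'; decode them back to str.
    return text.encode('unicode_escape').decode('ascii')


x = 'one\ttab, a \\backslash\\ and a newline\n'
print(reescape(x))
```

Like the replace chain, this treats every newline the same way, including the ones that separate statements, so it suits a single string literal better than a whole script.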
Pythonistas:
Suppose you want to parse the following string using Pyparsing:
'ABC_123_SPEED_X 123'
where ABC_123 is an identifier, SPEED_X is a parameter, and 123 is a value. I thought of the following BNF using Pyparsing:
Identifier = Word( alphanums + '_' )
Parameter = Keyword('SPEED_X') | Keyword('SPEED_Y') | Keyword('SPEED_Z')
Value = # assume I already have an expression valid for any value
Entry = Identifier + Literal('_') + Parameter + Value
tokens = Entry.parseString('ABC_123_SPEED_X 123')
#Error: pyparsing.ParseException: Expected "_" (at char 16), (line:1, col:17)
If I remove the underscore from the middle (and adjust the Entry definition accordingly) it parses correctly.
How can I make this parser be a bit lazier and wait until it matches the Keyword (as opposed to slurping the entire string as an Identifier and waiting for the _, which does not exist)?
Thank you.
[Note: This is a complete rewrite of my question; I had not realized what the real problem was]
I based my answer off of this one, since what you're trying to do is get a non-greedy match. It seems like this is difficult to make happen in pyparsing, but not impossible with some cleverness and compromise. The following seems to work:
from pyparsing import *
Parameter = Literal('SPEED_X') | Literal('SPEED_Y') | Literal('SPEED_Z')
UndParam = Suppress('_') + Parameter
Identifier = SkipTo(UndParam)
Value = Word(nums)
Entry = Identifier + UndParam + Value
When we run this from the interactive interpreter, we can see the following:
>>> Entry.parseString('ABC_123_SPEED_X 123')
(['ABC_123', 'SPEED_X', '123'], {})
Note that this is a compromise; because I use SkipTo, the Identifier can be full of evil, disgusting characters, not just beautiful alphanums with the occasional underscore.
EDIT: Thanks to Paul McGuire, we can concoct a truly elegant solution by setting Identifier to the following:
Identifier = Combine(Word(alphanums) +
                     ZeroOrMore('_' + ~Parameter + Word(alphanums)))
Let's inspect how this works. First, ignore the outer Combine; we'll get to this later. Starting with Word(alphanums) we know we'll get the 'ABC' part of the reference string, 'ABC_123_SPEED_X 123'. It's important to note that we didn't allow the "word" to contain underscores in this case. We build that separately in to the logic.
Next, we need to capture the '_123' part without also sucking in '_SPEED_X'. Let's also skip over ZeroOrMore at this point and return to it later. We start with the underscore as a Literal, but we can shortcut with just '_', which will get us the leading underscore, but not all of '_123'. Instinctively, we would place another Word(alphanums) to capture the rest, but that's exactly what will get us in trouble by consuming all of the remaining '_123_SPEED_X'. Instead, we say, "So long as what follows the underscore is not the Parameter, parse that as part of my Identifier." We state that in pyparsing terms as '_' + ~Parameter + Word(alphanums). Since we assume we can have an arbitrary number of underscore + WordButNotParameter repeats, we wrap that expression in a ZeroOrMore construct. (If you always expect at least one underscore + WordButNotParameter following the initial Word, you can use OneOrMore.)
Finally, we need to wrap the initial Word and the special underscore + Word repeats together so that it's understood they are contiguous, not separated by whitespace, so we wrap the whole expression up in a Combine construct. This way 'ABC _123_SPEED_X' will raise a parse error, but 'ABC_123_SPEED_X' will parse correctly.
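The same "underscore, then anything that is not the Parameter" idea can be illustrated with a negative lookahead in the standard re module. To be clear, this is an analogy written with re rather than pyparsing, so that it runs without third-party packages; ENTRY_RE and parse_entry are hypothetical names.

```python
import re

ENTRY_RE = re.compile(
    # Identifier: a word, then any number of '_word' repeats, where the
    # lookahead (?!...) plays the role of ~Parameter in the grammar above.
    r'^([A-Za-z0-9]+(?:_(?!SPEED_[XYZ]\b)[A-Za-z0-9]+)*)'
    r'_(SPEED_[XYZ])'   # the parameter, after a separating underscore
    r'\s+(\d+)$'        # the value (digits only in this sketch)
)


def parse_entry(line):
    """Split 'IDENT_PARAM VALUE' into [identifier, parameter, value]."""
    match = ENTRY_RE.match(line)
    if match is None:
        raise ValueError('not a valid entry: {!r}'.format(line))
    return list(match.groups())


print(parse_entry('ABC_123_SPEED_X 123'))
```

As with Combine, whitespace inside the identifier is rejected, so 'ABC _123_SPEED_X 123' raises an error here.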
Note also that I had to change Keyword to Literal because the ways of the former are far too subtle and quick to anger. I do not trust Keywords, nor could I get matching with them.
If you are sure that the identifier never ends with an underscore, you can enforce it in the definition:
from pyparsing import *
my_string = 'ABC_123_SPEED_X 123'
Identifier = Combine(Word(alphanums) + Literal('_') + Word(alphanums))
Parameter = Literal('SPEED_X') | Literal('SPEED_Y') | Literal('SPEED_Z')
Value = Word(nums)
Entry = Identifier + Literal('_').suppress() + Parameter + Value
tokens = Entry.parseString(my_string)
print tokens # prints: ['ABC_123', 'SPEED_X', '123']
If it's not the case but if the identifier length is fixed you can define Identifier like this:
Identifier = Word( alphanums + '_' , exact=7)
You can also parse the identifier and parameter as one token, and split them in a parse action:
from pyparsing import *
import re
def split_ident_and_param(tokens):
    mo = re.match(r"^(.*?_.*?)_(.*?_.*?)$", tokens[0])
    return [mo.group(1), mo.group(2)]
ident_and_param = Word(alphanums + "_").setParseAction(split_ident_and_param)
value = Word(nums)
entry = ident_and_param + value
print entry.parseString("APC_123_SPEED_X 123")
The example above assumes that the identifiers and parameters always have the format XXX_YYY (containing one single underscore).
If this is not the case, you need to adjust the split_ident_and_param() method.
This answers a question that you have probably also asked yourself: "What's a real-world application for reduce?":
>>> keys = ['CAT', 'DOG', 'HORSE', 'DEER', 'RHINOCEROS']
>>> p = reduce(lambda x, y: x | y, [Keyword(x) for x in keys])
>>> p
{{{{"CAT" | "DOG"} | "HORSE"} | "DEER"} | "RHINOCEROS"}
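On Python 3, reduce is no longer a builtin and has to be imported from functools. The sketch below shows the same fold with the | operator, using plain one-element sets as a stand-in for pyparsing Keyword objects so that it runs without third-party packages; with pyparsing installed, the list of Keyword objects from the snippet above could presumably be folded the same way.

```python
from functools import reduce
import operator

keys = ['CAT', 'DOG', 'HORSE', 'DEER', 'RHINOCEROS']

# operator.or_ is the function form of `|`. For sets, `|` means union,
# standing in here for the alternation that `|` builds between parsers.
combined = reduce(operator.or_, ({key} for key in keys))

print(sorted(combined))
```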
Edit:
This was a pretty good answer to the original question. I'll have to work on the new one.
Further edit:
I'm pretty sure you can't do what you're trying to do. The parser that pyparsing creates doesn't do lookahead. So if you tell it to match Word(alphanums + '_'), it's going to keep matching characters until it finds one that's not a letter, number, or underscore.