(pyparsing) delimitedList does not work as expected - python

I have a stringized array that I am receiving from an external system. It is stripped of quotes and separated by a comma and a space.
I'm trying to use pyparsing, but I'm only getting the first element of the array. How do I specify that a word must end in alphanumeric ?
value = '[AaAa=Aaa_xx_12,Bxfm=djfn_13, ldfjk=ddd,ttt=ddfs_ddfj_99]'
LBR, RBR = map(pp.Suppress, "[]")
qs = pp.Word(pp.alphas, pp.alphanums + pp.srange("[_=,]"))
qsList = LBR + pp.delimitedList(qs, delim=', ') + RBR
print(value)
print(qsList.parseString(value).asList())
[AaAa=Aaa_xx_12,Bxfm=djfn_13, ldfjk=ddd,ttt=ddfs_ddfj_99]
# pyparsing.exceptions.ParseException: Expected ']', found 'ldfjk' (at char 30), (line:1, col:31)
BR

Thanks for the mental support :D
LBR, RBR = map(pp.Suppress, "[]")
element = pp.Word(pp.alphanums) + pp.Literal('=') + pp.Word(pp.alphanums + pp.srange("[_]"))
qs = pp.Combine(element + pp.OneOrMore(pp.Optional(pp.Literal(',')) + element))
qsList = LBR + pp.delimitedList(qs, delim= ', ') + RBR
print(value)
print(qsList.parseString(value).asList())
#[AaAa=Aaa_xx_12,Bxfm=djfn_13, ldfjk=ddd,ttt=ddfs_ddfj_99]
#['AaAa=Aaa_xx_12,Bxfm=djfn_13', 'ldfjk=ddd,ttt=ddfs_ddfj_99']

Related

Splits based on pyparsing

so I want to do this (but using pyparsing)
Package:numpy11 Package:scipy
will be split into
[["Package:", "numpy11"], ["Package:", "scipy"]]
My code so far is
package_header = Literal("Package:")
single_package = Word(printables + " ") + ~Literal("Package:")
full_parser = OneOrMore( pp.Group( package_header + single_package ) )
The current output is this
([(['Package:', 'numpy11 Package:scipy'], {})], {})
I was hoping for something like this
([(['Package:', 'numpy11'], {})], [(['Package:', 'scipy'], {})], {})
Essentially the rest of the text matches pp.printables
I am aware that I can use Words but I want to do
all printables but not the Literal
How do I accomplish this? Thank you.
You shouldn't need the negative lookahead, ie. this:
from pyparsing import *
package_header = Literal("Package:")
single_package = Word(printables)
full_parser = OneOrMore( Group( package_header + single_package ) )
print full_parser.parseString("Package:numpy11 Package:scipy")
prints:
[['Package:', 'numpy11'], ['Package:', 'scipy']]
Update: to parse packages delimited by | you can use the delimitedList() function (now you can also have spaces in package names):
from pyparsing import *
package_header = Literal("Package:")
package_name = Regex(r'[^|]+') # | is a printable, so create a regex that excludes it.
package = Group(package_header + package_name)
full_parser = delimitedList(package, delim="|" )
print full_parser.parseString("Package:numpy11 foo|Package:scipy")
prints:
[['Package:', 'numpy11 foo'], ['Package:', 'scipy']]

Best way to write understandable and Python-friendly code [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
I have this code
def testing1(terms, request):
dat = datetime.now(pytz.timezone(geo_timezone(request)))
__start = terms['year']+'-'+terms['month']+'-'+terms['day']+'T'+'00:00:00'+dat.strftime('%z')[:-2]+':'+dat.strftime('%z')[-2:]
__end = terms['year']+'-'+terms['month']+'-'+terms['day']+'T'+'23:59:59'+dat.strftime('%z')[:-2]+':'+dat.strftime('%z')[-2:]
return __start, __end
testing({"month":12,"day":1, "year":"2015"}, request)
But I have a interrogant, whats is the best way to write this code, readable and friendly for the others programers?
Any suggestion for write code extensive in one line like this?
This suggestion is readable?
def testing2(terms, request):
dat = datetime.now(pytz.timezone(geo_timezone(request)))
__start = terms['year'] + '-' + terms['month'] + '-' + terms['day'] + \
'T' + '00:00:00' + dat.strftime('%z')[:-2] + ':' + dat.strftime('%z')[-2:]
__end = terms['year'] + '-' + terms['month'] + '-' + terms['day'] + \
'T' + '23:59:59' + dat.strftime('%z')[:-2] + ':' + dat.strftime('%z')[-2:]
return __start, __end
The only hard part to read is where you build your string so I would use .format(). This way you can see the resulting layout and then all of the corresponding entries.
__start = '{}-{}-{}T00:00:00{}:{}'.format(terms['year'],
terms['month'],
terms['day'],
dat.strftime('%z')[:-2],
dat.strftime('%z')[-2:])
__end = '{}-{}-{}T23:59:59{}:{}'.format(terms['year'],
terms['month'],
terms['day'],
dat.strftime('%z')[:-2],
dat.strftime('%z')[-2:])
Personally, I'd go with that second block and call it a day. If you want, you could try aligning it into groups, or messing with str.format():
__start = terms['year'] + '-' + terms['month'] + '-' + terms['day'] + \
'T00:00:00' + \
dat.strftime('%z')[:-2] + ':' + dat.strftime('%z')[-2:]
__end = terms['year'] + '-' + terms['month'] + '-' + terms['day'] + \
'T23:59:59' + \
dat.strftime('%z')[:-2] + ':' + dat.strftime('%z')[-2:]
__start = ('{}-{}-{}'.format(terms['year'], terms['month'], terms['day']) +
'T00:00:00' +
'{}:{}'.format(dat.strftime('%z')[:-2], dat.strftime('%z')[-2:]))
__end = ('{}-{}-{}'.format(terms['year'], terms['month'], + terms['day']) +
'T23:59:59' +
'{}:{}'.format(dat.strftime('%z')[:-2], dat.strftime('%z')[-2:]))
You can try something like :
__start = ''.join([terms['year'], '-',
terms['month'], '-',
terms['day'], 'T',
'00:00:00',
dat.strftime('%z')[:-2], ':',
dat.strftime('%z')[-2:]
])
Parenthesis, brackets and braces will help you to keep your lines of code < 80 characters
(and here the join method of string objects is more efficient than operator +)
As your post is about coding style it's hard to not mention the PEP8 if you don't know it already.

Change from Combine(Literal('#') + 'spec') to Keyword('#spec') removes whitespace

Why does using Combine(...) preserve the whitespace, whereas Keyword(...) removes thes whitespace?
I need to preserve the whitespace after the matched token.
The test is as follows:
from pyparsing import *
def parse(string, refpattern):
print refpattern.searchString(string)
pattern = StringStart() \
+ SkipTo(refpattern)('previous') \
+ refpattern('ref') \
+ SkipTo(StringEnd())('rest')
print pattern.parseString(string)
string = "With #ref to_something"
identifier = Combine(Word(alphas + '_', alphanums + '_') + Optional('.' + Word(alphas)))
pattern_without_space = (CaselessKeyword('#ref') | CaselessKeyword(r'\ref')).setParseAction(lambda s, l, t: ['ref']) \
+ White().suppress() + identifier
pattern_with_space = Combine((Literal('#') | Literal('\\')).suppress() + 'ref') + White().suppress() + identifier
parse(string, pattern_without_space)
parse(string, pattern_with_space)
will output:
[['ref', 'to_something']]
['With', 'ref', 'to_something', '']
[['ref', 'to_something']]
['With ', 'ref', 'to_something', '']
# ^ space i need is preserved here
The problem happens when using alternation (the | operator) with CaselessKeyword. See these examples:
from pyparsing import *
theString = 'This is #Foo Bar'
identifier = Combine(Word(alphas + '_', alphanums + '_') + Optional('.' + Word(alphas)))
def testParser(p):
q = StringStart() + SkipTo(p)("previous") + p("body") + SkipTo(StringEnd())("rest")
return q.parseString(theString)
def test7():
p0 = (CaselessKeyword('#Foo') | Literal('#qwe')) + White().suppress() + identifier
p1 = (CaselessKeyword('#Foo') | CaselessKeyword('#qwe')) + White().suppress() + identifier
p2 = (Literal('#qwe') | CaselessKeyword('#Foo')) + White().suppress() + identifier
p3 = (CaselessKeyword('#Foo')) + White().suppress() + identifier
p4 = Combine((Literal('#') | Literal('\\')).suppress() + 'Foo') + White().suppress() + identifier
print "p0:", testParser(p0)
print "p1:", testParser(p1)
print "p2:", testParser(p2)
print "p3:", testParser(p3)
print "p4:", testParser(p4)
test7()
The output is:
p0: ['This is', '#Foo', 'Bar', '']
p1: ['This is', '#Foo', 'Bar', '']
p2: ['This is', '#Foo', 'Bar', '']
p3: ['This is ', '#Foo', 'Bar', '']
p4: ['This is ', 'Foo', 'Bar', '']
Perhaps this is a bug?
Update: This is how you could define your own parser to match either #Foo or \Foo as a keyword:
from pyparsing import *
import string
class FooKeyWord(Token):
alphas = string.ascii_lowercase + string.ascii_uppercase
nums = "0123456789"
alphanums = alphas + nums
def __init__(self):
super(FooKeyWord,self).__init__()
self.identChars = alphanums+"_$"
self.name = "#Foo"
def parseImpl(self, instring, loc, doActions = True):
if (instring[loc] in ['#', '\\'] and
instring.startswith('Foo', loc+1) and
(loc+4 >= len(instring) or instring[loc+4] not in self.identChars) and
(loc == 0 or instring[loc-1].upper() not in self.identChars)):
return loc+4, instring[loc] + 'Foo'
raise ParseException(instring, loc, self.errmsg, self)
def test8():
p = FooKeyWord() + White().suppress() + identifier
q = StringStart() + SkipTo(p)("previous") + p("body") + SkipTo(StringEnd())("rest")
print "with #Foo:", q.parseString("This is #Foo Bar")
print "with \\Foo:", q.parseString("This is \\Foo Bar")
And the output:
with #Foo: ['This is ', '#Foo', 'Bar', '']
with \Foo: ['This is ', '\\Foo', 'Bar', '']

How to get pyparser to work in a particular form

Sorry for the sorry title. I could not think of anything better
I am trying to implement a DSL with pyparsing that has the following requirements:
varaibles: All of them begin with v_
Unary operators: +, -
Binary operators: +,-,*,/,%
Constant numbers
Functions, like normal functions when they have just one variable
Functions need to work like this: foo(v_1+v_2) = foo(v_1) + foo(v_2), foo(bar(10*v_6))=foo(bar(10))*foo(bar(v_6)). It should be the case for any binary operation
I am able to get 1-5 working
This is the code I have so far
from pyparsing import *
exprstack = []
#~ #traceParseAction
def pushFirst(tokens):
exprstack.insert(0,tokens[0])
# define grammar
point = Literal( '.' )
plusorminus = Literal( '+' ) | Literal( '-' )
number = Word( nums )
integer = Combine( Optional( plusorminus ) + number )
floatnumber = Combine( integer +
Optional( point + Optional( number ) ) +
Optional( integer )
)
ident = Combine("v_" + Word(nums))
plus = Literal( "+" )
minus = Literal( "-" )
mult = Literal( "*" )
div = Literal( "/" )
cent = Literal( "%" )
lpar = Literal( "(" ).suppress()
rpar = Literal( ")" ).suppress()
addop = plus | minus
multop = mult | div | cent
expop = Literal( "^" )
band = Literal( "#" )
# define expr as Forward, since we will reference it in atom
expr = Forward()
fn = Word( alphas )
atom = ( ( floatnumber | integer | ident | ( fn + lpar + expr + rpar ) ).setParseAction(pushFirst) |
( lpar + expr.suppress() + rpar ))
factor = Forward()
factor << atom + ( ( band + factor ).setParseAction( pushFirst ) | ZeroOrMore( ( expop + factor ).setParseAction( pushFirst ) ) )
term = factor + ZeroOrMore( ( multop + factor ).setParseAction( pushFirst ) )
expr << term + ZeroOrMore( ( addop + term ).setParseAction( pushFirst ) )
print(expr)
bnf = expr
pattern = bnf + StringEnd()
def test(s):
del exprstack[:]
bnf.parseString(s,parseAll=True)
print exprstack
test("avg(+10)")
test("v_1+8")
test("avg(v_1+10)+10")
Here is the what I want.
My functions are of this type:
foo(v_1+v_2) = foo(v_1) + foo(v_2)
The same behaviour is expected for any other binary operation as well. I have no idea how to make the parser do this automatically.
Break out the function call as a separate sub expression:
function_call = fn + lpar + expr + rpar
Then add a parse action to function_call that pops the operators and operands from expr_stack, then pushes them back onto the stack:
if an operand, push operand then function
if an operator, push the operator
Since you are only doing binary operations, you might be better off doing a simple approach first:
expr = Forward()
identifier = Word(alphas+'_', alphanums+'_')
expr = Forward()
function_call = Group(identifier + LPAR + Group(expr) + RPAR)
unop = oneOf("+ -")
binop = oneOf("+ - * / %")
operand = Group(Optional(unop) + (function_call | number | identifier))
binexpr = operand + binop + operand
expr << (binexpr | operand)
bnf = expr
This gives you a simpler structure to work with, without having to mess with exprstack.
def test(s):
exprtokens = bnf.parseString(s,parseAll=True)
print exprtokens
test("10")
test("10+20")
test("avg(10)")
test("avg(+10)")
test("column_1+8")
test("avg(column_1+10)+10")
Gives:
[['10']]
[['10'], '+', ['20']]
[[['avg', [['10']]]]]
[[['avg', [['+', '10']]]]]
[['column_1'], '+', ['8']]
[[['avg', [['column_1'], '+', ['10']]]], '+', ['10']]
You want to expand fn(a op b) to fn(a) op fn(b), but fn(a) should be left alone, so you need to test on the length of the parsed expression argument:
def distribute_function(tokens):
# unpack function name and arguments
fname, args = tokens[0]
# if args contains an expression, expand it
if len(args) > 1:
ret = ParseResults([])
for i,a in enumerate(args):
if i % 2 == 0:
# even args are operands to be wrapped in the function
ret += ParseResults([ParseResults([fname,ParseResults([a])])])
else:
# odd args are operators, just add them to the results
ret += ParseResults([a])
return ParseResults([ret])
function_call.setParseAction(distribute_function)
Now your calls to test will look like:
[['10']]
[['10'], '+', ['20']]
[[['avg', [['10']]]]]
[[['avg', [['+', '10']]]]]
[['column_1'], '+', ['8']]
[[[['avg', [['column_1']]], '+', ['avg', [['10']]]]], '+', ['10']]
This should even work recursively with a call like fna(fnb(3+2)+fnc(4+9)).

Python truncate a long string

How does one truncate a string to 75 characters in Python?
This is how it is done in JavaScript:
var data="saddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddsaddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddsadddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd"
var info = (data.length > 75) ? data.substring[0,75] + '..' : data;
info = (data[:75] + '..') if len(data) > 75 else data
Even more concise:
data = data[:75]
If it is less than 75 characters there will be no change.
Even shorter :
info = data[:75] + (data[75:] and '..')
If you are using Python 3.4+, you can use textwrap.shorten from the standard library:
Collapse and truncate the given text to fit in the given width.
First the whitespace in text is collapsed (all whitespace is replaced
by single spaces). If the result fits in the width, it is returned.
Otherwise, enough words are dropped from the end so that the remaining
words plus the placeholder fit within width:
>>> textwrap.shorten("Hello world!", width=12)
'Hello world!'
>>> textwrap.shorten("Hello world!", width=11)
'Hello [...]'
>>> textwrap.shorten("Hello world", width=10, placeholder="...")
'Hello...'
For a Django solution (which has not been mentioned in the question):
from django.utils.text import Truncator
value = Truncator(value).chars(75)
Have a look at Truncator's source code to appreciate the problem:
https://github.com/django/django/blob/master/django/utils/text.py#L66
Concerning truncation with Django:
Django HTML truncation
With regex:
re.sub(r'^(.{75}).*$', '\g<1>...', data)
Long strings are truncated:
>>> data="11111111112222222222333333333344444444445555555555666666666677777777778888888888"
>>> re.sub(r'^(.{75}).*$', '\g<1>...', data)
'111111111122222222223333333333444444444455555555556666666666777777777788888...'
Shorter strings never get truncated:
>>> data="11111111112222222222333333"
>>> re.sub(r'^(.{75}).*$', '\g<1>...', data)
'11111111112222222222333333'
This way, you can also "cut" the middle part of the string, which is nicer in some cases:
re.sub(r'^(.{5}).*(.{5})$', '\g<1>...\g<2>', data)
>>> data="11111111112222222222333333333344444444445555555555666666666677777777778888888888"
>>> re.sub(r'^(.{5}).*(.{5})$', '\g<1>...\g<2>', data)
'11111...88888'
limit = 75
info = data[:limit] + '..' * (len(data) > limit)
This method doesn't use any if:
data[:75] + bool(data[75:]) * '..'
This just in:
n = 8
s = '123'
print s[:n-3] + (s[n-3:], '...')[len(s) > n]
s = '12345678'
print s[:n-3] + (s[n-3:], '...')[len(s) > n]
s = '123456789'
print s[:n-3] + (s[n-3:], '...')[len(s) > n]
s = '123456789012345'
print s[:n-3] + (s[n-3:], '...')[len(s) > n]
123
12345678
12345...
12345...
info = data[:75] + ('..' if len(data) > 75 else '')
info = data[:min(len(data), 75)
You can't actually "truncate" a Python string like you can do a dynamically allocated C string. Strings in Python are immutable. What you can do is slice a string as described in other answers, yielding a new string containing only the characters defined by the slice offsets and step.
In some (non-practical) cases this can be a little annoying, such as when you choose Python as your interview language and the interviewer asks you to remove duplicate characters from a string in-place. Doh.
Yet another solution. With True and False you get a little feedback about the test at the end.
data = {True: data[:75] + '..', False: data}[len(data) > 75]
Coming very late to the party I want to add my solution to trim text at character level that also handles whitespaces properly.
def trim_string(s: str, limit: int, ellipsis='…') -> str:
s = s.strip()
if len(s) > limit:
return s[:limit-1].strip() + ellipsis
return s
Simple, but it will make sure you that hello world with limit=6 will not result in an ugly hello … but hello… instead.
It also removes leading and trailing whitespaces, but not spaces inside. If you also want to remove spaces inside, checkout this stackoverflow post
>>> info = lambda data: len(data)>10 and data[:10]+'...' or data
>>> info('sdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsdf')
'sdfsdfsdfs...'
>>> info('sdfsdf')
'sdfsdf'
>>>
Simple and short helper function:
def truncate_string(value, max_length=255, suffix='...'):
string_value = str(value)
string_truncated = string_value[:min(len(string_value), (max_length - len(suffix)))]
suffix = (suffix if len(string_value) > max_length else '')
return string_truncated+suffix
Usage examples:
# Example 1 (default):
long_string = ""
for number in range(1, 1000):
long_string += str(number) + ','
result = truncate_string(long_string)
print(result)
# Example 2 (custom length):
short_string = 'Hello world'
result = truncate_string(short_string, 8)
print(result) # > Hello...
# Example 3 (not truncated):
short_string = 'Hello world'
result = truncate_string(short_string)
print(result) # > Hello world
If you wish to do some more sophisticated string truncate you can adopt sklearn approach as implement by:
sklearn.base.BaseEstimator.__repr__
(See Original full code at: https://github.com/scikit-learn/scikit-learn/blob/f3f51f9b6/sklearn/base.py#L262)
It adds benefits such as avoiding truncate in the middle of the word.
def truncate_string(data, N_CHAR_MAX=70):
# N_CHAR_MAX is the (approximate) maximum number of non-blank
# characters to render. We pass it as an optional parameter to ease
# the tests.
lim = N_CHAR_MAX // 2 # apprx number of chars to keep on both ends
regex = r"^(\s*\S){%d}" % lim
# The regex '^(\s*\S){%d}' % n
# matches from the start of the string until the nth non-blank
# character:
# - ^ matches the start of string
# - (pattern){n} matches n repetitions of pattern
# - \s*\S matches a non-blank char following zero or more blanks
left_lim = re.match(regex, data).end()
right_lim = re.match(regex, data[::-1]).end()
if "\n" in data[left_lim:-right_lim]:
# The left side and right side aren't on the same line.
# To avoid weird cuts, e.g.:
# categoric...ore',
# we need to start the right side with an appropriate newline
# character so that it renders properly as:
# categoric...
# handle_unknown='ignore',
# so we add [^\n]*\n which matches until the next \n
regex += r"[^\n]*\n"
right_lim = re.match(regex, data[::-1]).end()
ellipsis = "..."
if left_lim + len(ellipsis) < len(data) - right_lim:
# Only add ellipsis if it results in a shorter repr
data = data[:left_lim] + "..." + data[-right_lim:]
return data
There's no need for a regular expression but you do want to use string formatting rather than the string concatenation in the accepted answer.
This is probably the most canonical, Pythonic way to truncate the string data at 75 characters.
>>> data = "saddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddsaddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddsadddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd"
>>> info = "{}..".format(data[:75]) if len(data) > 75 else data
>>> info
'111111111122222222223333333333444444444455555555556666666666777777777788888...'
Here's a function I made as part of a new String class... It allows adding a suffix ( if the string is size after trimming and adding it is long enough - although you don't need to force the absolute size )
I was in the process of changing a few things around so there are some useless logic costs ( if _truncate ... for instance ) where it is no longer necessary and there is a return at the top...
But, it is still a good function for truncating data...
##
## Truncate characters of a string after _len'nth char, if necessary... If _len is less than 0, don't truncate anything... Note: If you attach a suffix, and you enable absolute max length then the suffix length is subtracted from max length... Note: If the suffix length is longer than the output then no suffix is used...
##
## Usage: Where _text = 'Testing', _width = 4
## _data = String.Truncate( _text, _width ) == Test
## _data = String.Truncate( _text, _width, '..', True ) == Te..
##
## Equivalent Alternates: Where _text = 'Testing', _width = 4
## _data = String.SubStr( _text, 0, _width ) == Test
## _data = _text[ : _width ] == Test
## _data = ( _text )[ : _width ] == Test
##
def Truncate( _text, _max_len = -1, _suffix = False, _absolute_max_len = True ):
## Length of the string we are considering for truncation
_len = len( _text )
## Whether or not we have to truncate
_truncate = ( False, True )[ _len > _max_len ]
## Note: If we don't need to truncate, there's no point in proceeding...
if ( not _truncate ):
return _text
## The suffix in string form
_suffix_str = ( '', str( _suffix ) )[ _truncate and _suffix != False ]
## The suffix length
_len_suffix = len( _suffix_str )
## Whether or not we add the suffix
_add_suffix = ( False, True )[ _truncate and _suffix != False and _max_len > _len_suffix ]
## Suffix Offset
_suffix_offset = _max_len - _len_suffix
_suffix_offset = ( _max_len, _suffix_offset )[ _add_suffix and _absolute_max_len != False and _suffix_offset > 0 ]
## The truncate point.... If not necessary, then length of string.. If necessary then the max length with or without subtracting the suffix length... Note: It may be easier ( less logic cost ) to simply add the suffix to the calculated point, then truncate - if point is negative then the suffix will be destroyed anyway.
## If we don't need to truncate, then the length is the length of the string.. If we do need to truncate, then the length depends on whether we add the suffix and offset the length of the suffix or not...
_len_truncate = ( _len, _max_len )[ _truncate ]
_len_truncate = ( _len_truncate, _max_len )[ _len_truncate <= _max_len ]
## If we add the suffix, add it... Suffix won't be added if the suffix is the same length as the text being output...
if ( _add_suffix ):
_text = _text[ 0 : _suffix_offset ] + _suffix_str + _text[ _suffix_offset: ]
## Return the text after truncating...
return _text[ : _len_truncate ]
Suppose that stryng is a string which we wish to truncate and that nchars is the number of characters desired in the output string.
stryng = "sadddddddddddddddddddddddddddddddddddddddddddddddddd"
nchars = 10
We can truncate the string as follows:
def truncate(stryng:str, nchars:int):
return (stryng[:nchars - 6] + " [...]")[:min(len(stryng), nchars)]
The results for certain test cases are shown below:
s = "sadddddddddddddddddddddddddddddd!"
s = "sa" + 30*"d" + "!"
truncate(s, 2) == sa
truncate(s, 4) == sadd
truncate(s, 10) == sadd [...]
truncate(s, len(s)//2) == sadddddddd [...]
My solution produces reasonable results for the test cases above.
However, some pathological cases are shown below:
Some Pathological Cases!
truncate(s, len(s) - 3)() == sadddddddddddddddddddddd [...]
truncate(s, len(s) - 2)() == saddddddddddddddddddddddd [...]
truncate(s, len(s) - 1)() == sadddddddddddddddddddddddd [...]
truncate(s, len(s) + 0)() == saddddddddddddddddddddddddd [...]
truncate(s, len(s) + 1)() == sadddddddddddddddddddddddddd [...
truncate(s, len(s) + 2)() == saddddddddddddddddddddddddddd [..
truncate(s, len(s) + 3)() == sadddddddddddddddddddddddddddd [.
truncate(s, len(s) + 4)() == saddddddddddddddddddddddddddddd [
truncate(s, len(s) + 5)() == sadddddddddddddddddddddddddddddd
truncate(s, len(s) + 6)() == sadddddddddddddddddddddddddddddd!
truncate(s, len(s) + 7)() == sadddddddddddddddddddddddddddddd!
truncate(s, 9999)() == sadddddddddddddddddddddddddddddd!
Notably,
When the string contains new-line characters (\n) there could be an issue.
When nchars > len(s) we should print string s without trying to print the "[...]"
Below is some more code:
import io
class truncate:
"""
Example of Code Which Uses truncate:
```
s = "\r<class\n 'builtin_function_or_method'>"
s = truncate(s, 10)()
print(s)
```
Examples of Inputs and Outputs:
truncate(s, 2)() == \r
truncate(s, 4)() == \r<c
truncate(s, 10)() == \r<c [...]
truncate(s, 20)() == \r<class\n 'bu [...]
truncate(s, 999)() == \r<class\n 'builtin_function_or_method'>
```
Other Notes:
Returns a modified copy of string input
Does not modify the original string
"""
def __init__(self, x_stryng: str, x_nchars: int) -> str:
"""
This initializer mostly exists to sanitize function inputs
"""
try:
stryng = repr("".join(str(ch) for ch in x_stryng))[1:-1]
nchars = int(str(x_nchars))
except BaseException as exc:
invalid_stryng = str(x_stryng)
invalid_stryng_truncated = repr(type(self)(invalid_stryng, 20)())
invalid_x_nchars = str(x_nchars)
invalid_x_nchars_truncated = repr(type(self)(invalid_x_nchars, 20)())
strm = io.StringIO()
print("Invalid Function Inputs", file=strm)
print(type(self).__name__, "(",
invalid_stryng_truncated,
", ",
invalid_x_nchars_truncated, ")", sep="", file=strm)
msg = strm.getvalue()
raise ValueError(msg) from None
self._stryng = stryng
self._nchars = nchars
def __call__(self) -> str:
stryng = self._stryng
nchars = self._nchars
return (stryng[:nchars - 6] + " [...]")[:min(len(stryng), nchars)]
Here's a simple function that will truncate a given string from either side:
def truncate(string, length=75, beginning=True, insert='..'):
'''Shorten the given string to the given length.
An ellipsis will be added to the section trimmed.
:Parameters:
length (int) = The maximum allowed length before trunicating.
beginning (bool) = Trim starting chars, else; ending.
insert (str) = Chars to add at the trimmed area. (default: ellipsis)
:Return:
(str)
ex. call: truncate('12345678', 4)
returns: '..5678'
'''
if len(string)>length:
if beginning: #trim starting chars.
string = insert+string[-length:]
else: #trim ending chars.
string = string[:length]+insert
return string
Here I use textwrap.shorten and handle more edge cases. also include part of the last word in case this word is more than 50% of the max width.
import textwrap
def shorten(text: str, width=30, placeholder="..."):
"""Collapse and truncate the given text to fit in the given width.
The text first has its whitespace collapsed. If it then fits in the *width*, it is returned as is.
Otherwise, as many words as possible are joined and then the placeholder is appended.
"""
if not text or not isinstance(text, str):
return str(text)
t = text.strip()
if len(t) <= width:
return t
# textwrap.shorten also throws ValueError if placeholder too large for max width
shorten_words = textwrap.shorten(t, width=width, placeholder=placeholder)
# textwrap.shorten doesn't split words, so if the text contains a long word without spaces, the result may be too short without this word.
# Here we use a different way to include the start of this word in case shorten_words is less than 50% of `width`
if len(shorten_words) - len(placeholder) < (width - len(placeholder)) * 0.5:
return t[:width - len(placeholder)].strip() + placeholder
return shorten_words
Tests:
>>> shorten("123 456", width=7, placeholder="...")
'123 456'
>>> shorten("1 23 45 678 9", width=12, placeholder="...")
'1 23 45...'
>>> shorten("1 23 45 678 9", width=10, placeholder="...")
'1 23 45...'
>>> shorten("01 23456789", width=10, placeholder="...")
'01 2345...'
>>> shorten("012 3 45678901234567", width=17, placeholder="...")
'012 3 45678901...'
>>> shorten("1 23 45 678 9", width=9, placeholder="...")
'1 23...'
>>> shorten("1 23456", width=5, placeholder="...")
'1...'
>>> shorten("123 456", width=5, placeholder="...")
'12...'
>>> shorten("123 456", width=6, placeholder="...")
'123...'
>>> shorten("12 3456789", width=9, placeholder="...")
'12 345...'
>>> shorten(" 12 3456789 ", width=9, placeholder="...")
'12 345...'
>>> shorten('123 45', width=4, placeholder="...")
'1...'
>>> shorten('123 45', width=3, placeholder="...")
'...'
>>> shorten("123456", width=3, placeholder="...")
'...'
>>> shorten([1], width=9, placeholder="...")
'[1]'
>>> shorten(None, width=5, placeholder="...")
'None'
>>> shorten("", width=9, placeholder="...")
''

Categories