Thought exercise: What is the "best" way to write a Python function that takes a regex pattern or a string to match exactly:
import re
strings = [...]
def do_search(matcher):
    """
    Returns strings matching matcher, which can be either a string
    (for exact match) or a compiled regular expression object
    (for more complex matches).
    """
    if not is_a_regex_pattern(matcher):
        matcher = re.compile('%s$' % re.escape(matcher))
    for s in strings:
        if matcher.match(s):
            yield s
So, ideas for the implementation of is_a_regex_pattern()?
You can access the _sre.SRE_Pattern type via re._pattern_type:
if not isinstance(matcher, re._pattern_type):
    matcher = re.compile('%s$' % re.escape(matcher))
Below is a demonstration:
>>> import re
>>> re._pattern_type
<class '_sre.SRE_Pattern'>
>>> isinstance(re.compile('abc'), re._pattern_type)
True
>>>
Or, make it quack:
try:
    does_match = matcher.match(s)
except AttributeError:
    does_match = re.match(matcher, s)
if does_match:
    yield s
In other words, treat matcher as if it already were a compiled regular expression. And if that breaks, then treat it like a string that needs to be compiled.
This is called Duck Typing. Not everyone agrees that exceptions should be used like this for routine contingencies. This is the ask-permission versus ask-forgiveness debate. Python is more amenable to forgiveness than most languages.
Not a string:
def is_a_regex_pattern(s):
    return not isinstance(s, basestring)
Is a _sre.SRE_Pattern (though that's not importable, so use a gross string match):
def is_a_regex_pattern(s):
    return s.__class__.__name__ == 'SRE_Pattern'
You can re-compile a SRE_Pattern and it seems to evaluate the same.
def is_a_regex_pattern(s):
    return s == re.compile(s)
You could test whether matcher has a match method:
import re
def do_search(matcher, strings):
    """
    Returns strings matching matcher, which can be either a string
    (for exact match) or a compiled regular expression object
    (for more complex matches).
    """
    if hasattr(matcher, 'match'):
        test = matcher.match
    else:
        test = lambda s: matcher==s
    for s in strings:
        if test(s):
            yield s
You should not use global variables, but use a second parameter.
On Python 3.7, re._pattern_type was renamed to re.Pattern
https://stackoverflow.com/a/27366172/895245 therefore broke at that point, as re._pattern_type is not defined.
While re.Pattern looks nicer and will therefore hopefully be more stable, it is not mentioned at all in the docs: https://docs.python.org/3/library/re.html#regular-expression-objects so maybe it is not a good idea to rely on it.
https://stackoverflow.com/a/46779329/895245 does make some sense. But what if someday the str class adds a .match method and it does something completely different? :-) Ah, the joys of typeless languages.
So I think I'm going with:
import re
_takes_s_or_re_type = type(re.compile(''))
def takes_s_or_re(s_or_re):
    if isinstance(s_or_re, _takes_s_or_re_type):
        return 0
    else:
        return 1
assert takes_s_or_re(re.compile('a.c')) == 0
assert takes_s_or_re('a.c') == 1
as this can only break when a public API breaks.
Tested on Python 3.8.0.
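As a rough end-to-end sketch of this approach, the type check can be combined with the two-parameter do_search suggested earlier. The name PatternType is made up for this illustration, and the plain-string branch uses simple equality rather than compiling an escaped pattern; treat it as one possible arrangement, not a definitive implementation.

```python
import re

# Derive the compiled-pattern type from the public API instead of importing
# a private or undocumented name. PatternType is a hypothetical name.
PatternType = type(re.compile(''))

def do_search(matcher, strings):
    """Yield the strings selected by matcher (a str or a compiled pattern)."""
    if isinstance(matcher, PatternType):
        test = matcher.match            # anchored at the start, as in the question
    else:
        test = lambda s: s == matcher   # plain string: exact match only
    for s in strings:
        if test(s):
            yield s
```

For example, list(do_search('a.c', ['a.c', 'abc'])) keeps only the literal 'a.c', while passing re.compile('a.c') keeps both.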
Related
I'm using a lambda function to extract the number in a string:
text = "some text with a number: 31"
get_number = lambda info,pattern: re.search('{}\s*(\d)'.format(pattern),info.lower()).group(1) if re.search('{}\s*(\d)'.format(pattern),info.lower()) else None
get_number(text,'number:')
How can I avoid performing this operation twice?
re.search('{}\s*(\d)'.format(pattern),info.lower())
You can use findall() instead; it handles the no-match case gracefully by returning an empty list. The or operator is all that is needed to satisfy the return conditions: an empty list is falsy, so the expression falls through to None, while a non-empty list of matches is returned as is.
>>> get_number = lambda info,pattern: re.findall('{}\s*(\d)'.format(pattern),info.lower()) or None
>>> print get_number(text, 'number:')
['3']
>>> print get_number(text, 'Hello World!')
>>>
That being said, I'd recommend defining a regular named function using def instead. You can extract the more complex parts of this code into variables, which makes the algorithm easier to follow. Long anonymous functions are a code smell. Something similar to below:
def get_number(source_text, pattern):
    regex = '{}\s*(\d)'.format(pattern)
    matches = re.findall(regex, source_text.lower())
    return matches or None
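If a single value is wanted rather than a list, as in the original lambda, the same idea works with one re.search call whose result is kept in a local variable. This is only a sketch: the use of re.escape (so that pattern is treated as literal text) and the raw-string prefix are additions that are not in the original code.

```python
import re

def get_number(source_text, pattern):
    """Return the first digit that follows pattern, or None if nothing matches."""
    # Assumption: pattern is meant as literal text, so it is escaped here.
    regex = r'{}\s*(\d)'.format(re.escape(pattern))
    match = re.search(regex, source_text.lower())   # searched only once
    return match.group(1) if match else None

text = "some text with a number: 31"
print(get_number(text, 'number:'))
print(get_number(text, 'hello world!'))
```

Because the match object is stored in match, the search is not repeated in the condition and in the result.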
This is super ugly, not going to lie, but it does work and avoids returning a match object if it's found, but does return None when it's not:
lambda info,pattern: max(re.findall('{}\s*(\d)'.format(pattern),info.lower()),[None],key=lambda x: x != [])[0]
Is there a graceful way to get names of named %s-like variables of string object?
Like this:
string = '%(a)s and %(b)s are friends.'
names = get_names(string) # ['a', 'b']
Known alternative ways:
Parse names using regular expression, e.g.:
import re
names = re.findall(r'%\((\w+)\)[sdf]', string) # ['a', 'b']
Use .format()-compatible formatting and Formatter().parse(string).
How to get the variable names from the string for the format() method
But what about a string with %s-like variables?
PS: python 2.7
In order to answer this question, you need to define "graceful". Several factors might be worth considering:
Is the code short, easy to remember, easy to write, and self explanatory?
Does it reuse the underlying logic (i.e. follow the DRY principle)?
Does it implement exactly the same parsing logic?
Unfortunately, the "%" formatting for strings is implemented in the C routine "PyString_Format" in stringobject.c. This routine does not provide an API or hooks that allow access to a parsed form of the format string. It simply builds up the result as it is parsing the format string. Thus any solution will need to duplicate the parsing logic from the C routine. This means DRY is not followed and exposes any solution to breaking if a change is made to the formatting specification.
The parsing algorithm in PyString_Format includes a fair bit of complexity, including handling nested parentheses in key names, so cannot be fully implemented using regular expression nor using string "split()". Short of copying the C code from PyString_Format and converting it to Python code, I do not see any remotely easy way of correctly extracting the names of the mapping keys under all circumstances.
So my conclusion is that there is no "graceful" way to obtain the names of the mapping keys for a Python 2.7 "%" format string.
The following code uses a regular expression to provide a partial solution that covers most common usage:
import re
class StringFormattingParser(object):
    __matcher = re.compile(r'(?<!%)%\(([^)]+)\)[-# +0-9.hlL]*[diouxXeEfFgGcrs]')
    @classmethod
    def getKeyNames(klass, formatString):
        return klass.__matcher.findall(formatString)
# Demonstration of use with some sample format strings
for value in [
    '%(a)s and %(b)s are friends.',
    '%%(nomatch)i',
    '%%',
    'Another %(matched)+4.5f%d%% example',
    '(%(should_match(but does not))s',
    ]:
    print StringFormattingParser.getKeyNames(value)
# Note the following prints out "really does match"!
print '%(should_match(but does not))s' % {'should_match(but does not)': 'really does match'}
P.S. DRY = Don't Repeat Yourself (https://en.wikipedia.org/wiki/Don%27t_repeat_yourself)
You could also do this:
[y[0] for y in [x.split(')') for x in s.split('%(')] if len(y)>1]
Don't know if this qualifies as graceful in your book, but here's a short function that parses out the names. No error checking, so it will fail for malformed format strings.
def get_names(s):
    i = s.find('%')
    while 0 <= i < len(s) - 3:
        if s[i+1] == '(':
            yield(s[i+2:s.find(')', i)])
        i = s.find('%', i+2)
string = 'abd %(one) %%(two) 99 %%%(three)'
list(get_names(string)) #=> ['one', 'three']
Also, you can reduce this %-task to the Formatter solution.
>>> import re
>>> from string import Formatter
>>>
>>> string = '%(a)s and %(b)s are friends.'
>>>
>>> string = re.sub('((?<!%)%(\((\w)\)s))', '{\g<3>}', string)
>>>
>>> tuple(fn[1] for fn in Formatter().parse(string) if fn[1] is not None)
('a', 'b')
>>>
In this case you can use both variants of formatting, I suppose.
The regular expression to use depends on what you want.
>>> re.sub('((?<!%)%(\((\w)\)s))', '{\g<3>}', '%(a)s and %(b)s are %(c)s friends.')
'{a} and {b} are {c} friends.'
>>> re.sub('((?<!%)%(\((\w)\)s))', '{\g<3>}', '%(a)s and %(b)s are %%(c)s friends.')
'{a} and {b} are %%(c)s friends.'
>>> re.sub('((?<!%)%(\((\w)\)s))', '{\g<3>}', '%(a)s and %(b)s are %%%(c)s friends.')
'{a} and {b} are %%%(c)s friends.'
I love using the expression
if 'MICHAEL89' in USERNAMES:
    ...
where USERNAMES is a list.
Is there any way to match items with case insensitivity or do I need to use a custom method? Just wondering if there is a need to write extra code for this.
username = 'MICHAEL89'
if username.upper() in (name.upper() for name in USERNAMES):
    ...
Alternatively:
if username.upper() in map(str.upper, USERNAMES):
    ...
Or, yes, you can make a custom method.
str.casefold is recommended for case-insensitive string matching. @nmichaels's solution can trivially be adapted.
Use either:
if 'MICHAEL89'.casefold() in (name.casefold() for name in USERNAMES):
Or:
if 'MICHAEL89'.casefold() in map(str.casefold, USERNAMES):
As per the docs:
Casefolding is similar to lowercasing but more aggressive because it
is intended to remove all case distinctions in a string. For example,
the German lowercase letter 'ß' is equivalent to "ss". Since it is
already lowercase, lower() would do nothing to 'ß'; casefold()
converts it to "ss".
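To make the difference concrete, here is a small sketch. The sample USERNAMES list and the helper name contains_caseless are invented for this illustration; casefold() requires Python 3.3 or later.

```python
USERNAMES = ['Michael89', 'Straße']   # hypothetical sample data

def contains_caseless(name, names):
    """Return True if name equals any entry of names, ignoring case."""
    folded = name.casefold()
    return any(folded == n.casefold() for n in names)

print(contains_caseless('MICHAEL89', USERNAMES))
print(contains_caseless('STRASSE', USERNAMES))
# With lower() the 'ß' is left alone, so this comparison does not find a match.
print('STRASSE'.lower() in [n.lower() for n in USERNAMES])
```

The first two checks succeed with casefold(), while the lower()-based check misses 'Straße' because 'ß' is never expanded to "ss".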
I would make a wrapper so you can be non-invasive. Minimally, for example...:
class CaseInsensitively(object):
    def __init__(self, s):
        self.__s = s.lower()
    def __hash__(self):
        return hash(self.__s)
    def __eq__(self, other):
        # ensure proper comparison between instances of this class
        try:
            other = other.__s
        except (TypeError, AttributeError):
            try:
                other = other.lower()
            except:
                pass
        return self.__s == other
Now, if CaseInsensitively('MICHAEL89') in whatever: should behave as required (whether the right-hand side is a list, dict, or set). (It may require more effort to achieve similar results for string inclusion, avoid warnings in some cases involving unicode, etc).
Usually (in oop at least) you shape your object to behave the way you want. name in USERNAMES is not case insensitive, so USERNAMES needs to change:
class NameList(object):
    def __init__(self, names):
        self.names = names
    def __contains__(self, name): # implements `in`
        return name.lower() in (n.lower() for n in self.names)
    def add(self, name):
        self.names.append(name)
# now this works
usernames = NameList(USERNAMES)
print someone in usernames
The great thing about this is that it opens the path for many improvements, without having to change any code outside the class. For example, you could change the self.names to a set for faster lookups, or compute the (n.lower() for n in self.names) only once and store it on the class and so on ...
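One way the improvements mentioned above might look is sketched here: the lowercased names are computed once and kept in a set, so each `in` test is a hash lookup instead of a scan. The attribute name _lowered is an assumption made for this illustration.

```python
class NameList(object):
    def __init__(self, names):
        self.names = list(names)
        # Lowercased copies are computed once, up front (hypothetical attribute).
        self._lowered = set(n.lower() for n in self.names)

    def __contains__(self, name):  # implements `in`
        return name.lower() in self._lowered

    def add(self, name):
        # Keep the original list and the lookup set in sync.
        self.names.append(name)
        self._lowered.add(name.lower())

usernames = NameList(['Michael89', 'alice'])
print('MICHAEL89' in usernames)
```

Code outside the class still just writes name in usernames; only the internals changed.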
Here's one way:
if string1.lower() in string2.lower():
    ...
For this to work, both string1 and string2 objects must be of type string.
I think you have to write some extra code. For example:
if 'MICHAEL89' in map(lambda name: name.upper(), USERNAMES):
    ...
In this case we are forming a new list with all entries in USERNAMES converted to upper case and then comparing against this new list.
Update
As @viraptor says, it is even better to use a generator instead of map. See @Nathon's answer.
You could do
matcher = re.compile('MICHAEL89', re.IGNORECASE)
filter(matcher.match, USERNAMES)
Update: played around a bit and am thinking you could get a better short-circuit type approach using
matcher = re.compile('MICHAEL89', re.IGNORECASE)
if any( ifilter( matcher.match, USERNAMES ) ):
    #your code here
The ifilter function is from itertools, one of my favorite modules within Python. It's faster than a generator but only creates the next item of the list when called upon.
To have it in one line, this is what I did:
if any(([True if 'MICHAEL89' in username.upper() else False for username in USERNAMES])):
    print('username exists in list')
I didn't test it time-wise though. I am not sure how fast/efficient it is.
Example from this tutorial:
list1 = ["Apple", "Lenovo", "HP", "Samsung", "ASUS"]
s = "lenovo"
s_lower = s.lower()
res = s_lower in (string.lower() for string in list1)
print(res)
My 5 (wrong) cents
'a' in "".join(['A']).lower()
UPDATE
Ouch, totally agree @jpp, I'll keep as an example of bad practice :(
I needed this for a dictionary instead of list, Jochen solution was the most elegant for that case so I modded it a bit:
class CaseInsensitiveDict(dict):
    ''' requests special dicts are case insensitive when using the in operator,
    this implements a similar behaviour'''
    def __contains__(self, name): # implements `in`
        return name.casefold() in (n.casefold() for n in self.keys())
Now you can convert a dictionary like so: USERNAMESDICT = CaseInsensitiveDict(USERNAMESDICT), and then use if 'MICHAEL89' in USERNAMESDICT:
I find that in lots of different projects I'm writing a lot of code where I need to evaluate a (moderately complex, possibly costly-to-evaluate) expression and then do something with it (e.g. use it for string formatting), but only if the expression is True/non-None.
For example in lots of places I end up doing something like the following:
result += '%s '%( <complexExpressionForGettingX> ) if <complexExpressionForGettingX> else ''
... which I guess is basically a special-case of the more general problem of wanting to return some function of an expression, but only if that expression is True, i.e.:
f( e() ) if e() else somedefault
but without re-typing the expression (or re-evaluating it, in case it's a costly function call).
Obviously the required logic can be achieved easily enough in various long-winded ways (e.g. by splitting the expression into multiple statements and assigning the expression to a temporary variable), but that's a bit grungy and since this seems like quite a generic problem, and since python is pretty cool (especially for functional stuff) I wondered if there's a nice, elegant, concise way to do it?
My current best options are either defining a short-lived lambda to take care of it (better than multiple statements, but a bit hard to read):
(lambda e: '%s ' % e if e else '')( <complexExpressionForGettingX> )
or writing my own utility function like:
def conditional(expr, formatStringIfTrue, default='')
... but since I'm doing this in lots of different code-bases I'd much rather use a built-in library function or some clever python syntax if such a thing exists
I like one-liners, definitely. But sometimes they are the wrong solution.
In professional software development, if the team size is > 2, you spend more time understanding code someone else wrote than writing new code. The one-liners presented here are definitely confusing, so just use two lines (even though you mentioned multiple statements in your post):
X = <complexExpressionForGettingX>
result += '%s '% X if X else ''
This is clear, concise, and everybody immediately understands what's going on here.
Python doesn't have expression scope (Is there a Python equivalent of the Haskell 'let'), presumably because the abuses and confusion of the syntax outweigh the advantages.
If you absolutely have to use an expression scope, the least worst option is to abuse a generator comprehension:
result += next('%s '%(e) if e else '' for e in (<complexExpressionForGettingX>,))
You could define a conditional formatting function once, and use it repeatedly:
def cond_format(expr, form, alt):
    if expr:
        return form % expr
    else:
        return alt
Usage:
result += cond_format(<costly_expression>, '%s ', '')
After hearing the responses (thanks guys!) I'm now convinced there's no way to achieve what I want in Python without defining a new function (or lambda function) since that's the only way to introduce a new scope.
For clarity I decided to implement this as a reusable function (not a lambda). For the benefit of others, here is the function I finally came up with. It is flexible enough to cope with multiple additional format string arguments (in addition to the main argument used to decide whether to do the formatting at all), and it comes with doctests to show correctness and illustrate usage. (If you're not sure how the **kwargs part works, just ignore it; it's an implementation detail and was the only way I could see to implement an optional defaultValue= kwarg following the variable list of format string arguments.)
def condFormat(formatIfTrue, expr, *otherFormatArgs, **kwargs):
    """ Helper for returning the result of string.format() on a
    specified expression if the expression's bool(expr) is True
    (i.e. it's not None, an empty list or an empty string or the number zero),
    or return a default string (typically '') if not.

    For more complicated cases where the operation on expr is more complicated
    than a format string, or where a different condition is required, use:
    (lambda e=myexpr: '' if not e else '%s ' % e)()

    formatIfTrue -- a format string suitable for use with string.format(), e.g.
    "{}, {}" or "{1}, {0:d}".
    expr -- the expression to evaluate. May be of any type.
    defaultValue -- set this keyword arg to override the value returned when
    expr is false (default '')

    >>> 'x' + condFormat(', {}.', 'foobar')
    'x, foobar.'
    >>> 'x' + condFormat(', {}.', [])
    'x'
    >>> condFormat('{}; {}', 123, 456, defaultValue=None)
    '123; 456'
    >>> condFormat('{0:,d}; {2:d}; {1:d}', 12345, 678, 9, defaultValue=None)
    '12,345; 9; 678'
    >>> condFormat('{}; {}; {}', 0, 678, 9, defaultValue=None) == None
    True
    """
    defaultValue = kwargs.pop('defaultValue','')
    assert not kwargs, 'unexpected kwargs: %s'%kwargs
    if not bool(expr): return defaultValue
    if otherFormatArgs:
        return formatIfTrue.format( *((expr,)+otherFormatArgs) )
    else:
        return formatIfTrue.format(expr)
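For readers on Python 3.8 or later: assignment expressions (the := operator) were added to the language after this discussion, and they can bind a name inside an expression without a helper function. The following is a minimal sketch, where expensive() is a hypothetical stand-in for <complexExpressionForGettingX> that records how often it is called.

```python
calls = []

def expensive(value):
    """Hypothetical stand-in for a costly expression; records each call."""
    calls.append(value)
    return value

result = ''
# The condition is evaluated first, so x is bound before '%s ' % x runs.
result += '%s ' % x if (x := expensive('foo')) else ''
result += '%s ' % x if (x := expensive(None)) else ''
print(repr(result), calls)
```

Each expression is evaluated only once per line. Whether this reads better than the two-statement version is a matter of taste.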
Presumably, you want to do this repeatedly to build up a string. With a more global view, you might find that filter (or itertools.ifilter) does what you want to the collection of values.
You'll wind up with something like this:
' '.join(map(str, filter(None, <iterable of <complexExpressionForGettingX>>)))
Using None as the first argument for filter indicates to accept any true value. As a concrete example with a simple expression:
>>> ' '.join(map(str, filter(None, range(-3, 3))))
'-3 -2 -1 1 2'
Depending on how you're calculating the values, it may be that an equivalent list or generator comprehension would be more readable.
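For comparison, the range example written as a generator expression might look like the sketch below; values is just a placeholder name for whatever iterable is being processed.

```python
values = range(-3, 3)   # placeholder for the real iterable of expressions

# Keep only the true values and convert each one to a string while joining.
joined = ' '.join(str(v) for v in values if v)
print(joined)
```

The `if v` clause plays the role of filter(None, ...), and str(v) replaces the map(str, ...) step.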
Is it possible to convert a string to an operator in python?
I would like to pass a condition to a function
Ideally it would look like this:
def foo(self, attribute, operator_string, right_value):
    left_value = getattr(self, attribute)
    if left_value get_operator(operator_string) right_value:
        return True
    else:
        return False
bar.x = 10
bar.foo('x', '>', 10)
[out] False
bar.foo('x', '>=', 10)
[out] True
I could make a dictionary where keys are strings and values are functions of the operator module.
I would have to change foo definition slightly:
operator_dict = {'>': operator.gt,
                 '>=': operator.ge}
def foo(self, attribute, operator_string, right_value):
    left_value = getattr(self, attribute)
    operator_func = operator_dict[operator_string]
    if operator_func(left_value, right_value):
        return True
    else:
        return False
This means I have to make this dictionary, but is it really necessary?
You can use eval to dynamically build a piece of Python code and execute it, but apart from that there are no real alternatives. The dictionary-based solution is much more elegant and safe, however.
Apart from that, is it really that bad? Why not shorten it a bit …
return operator_dict[operator_string](left_value, right_value)
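Put together, the dictionary-based version could look roughly like this. The Bar class and the OPERATORS name are assumptions made so the sketch is self-contained; the mapping uses only functions that exist in the standard operator module.

```python
import operator

# Map each comparison string to the matching function in the operator module.
OPERATORS = {
    '<': operator.lt,
    '<=': operator.le,
    '==': operator.eq,
    '!=': operator.ne,
    '>=': operator.ge,
    '>': operator.gt,
}

class Bar(object):            # hypothetical class, for illustration only
    def __init__(self, x):
        self.x = x

    def foo(self, attribute, operator_string, right_value):
        left_value = getattr(self, attribute)
        return OPERATORS[operator_string](left_value, right_value)

bar = Bar(10)
print(bar.foo('x', '>', 10))
print(bar.foo('x', '>=', 10))
```

An unknown operator string raises KeyError here, which may or may not be the behaviour you want for untrusted input.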
The way the problem is specified, I don't see why you can't pass operator.ge to the function instead of ">=".
Is this operator_string coming from a database or file or something, or are you passing it around in your code?
bar.foo('x', operator.ge, 10)
Are you just looking to have a convenient shorthand? Then you might do something like:
from operator import ge
bar.foo('x', ge, 10)
If the real problem here is that you have code or business rules coming in from a database or datafile then maybe you actually need to look at writing a little parser that will map your input into these objects and then you could take a look at using a library like pyparsing, ply, codetalker, etc.
# This is very simple to do with eval()
score=1
trigger_condition=">="
trigger_value=4
eval(f"{score}{trigger_condition}{trigger_value}")
# Luckily the f-string also takes care of int/float or other relevant datatypes
operator_str="ge"
import operator
eval(f"operator.{operator_str}({score},{trigger_value})")