I'm using a lambda function to extract the number in a string:
text = "some text with a number: 31"
get_number = lambda info,pattern: re.search('{}\s*(\d)'.format(pattern),info.lower()).group(1) if re.search('{}\s*(\d)'.format(pattern),info.lower()) else None
get_number(text,'number:')
How can I avoid performing this operation twice?
re.search('{}\s*(\d)'.format(pattern),info.lower())
You can use findall() instead; it handles a failed match gracefully by returning an empty list. The or operator is all that is needed to satisfy the return conditions: an empty list is falsy, so None is evaluated last and returned when nothing is found.
>>> get_number = lambda info,pattern: re.findall('{}\s*(\d)'.format(pattern),info.lower()) or None
>>> print get_number(text, 'number:')
['3']
>>> print get_number(text, 'Hello World!')
>>>
That being said, I'd recommend defining a regular named function using def instead. You can extract the more complex parts of this code into variables, which makes the algorithm easier to follow. Long anonymous functions tend to be a code smell. Something like the following:
def get_number(source_text, pattern):
    regex = '{}\s*(\d)'.format(pattern)
    matches = re.findall(regex, source_text.lower())
    return matches or None
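If you would rather keep re.search and return the matched text instead of a list, the search can be run once and its result kept in a variable. The following is a minimal sketch rather than the code from the answers above. It assumes Python 3.8 or later for the assignment expression in the lambda variant, and it deliberately changes two details: \d+ captures every digit instead of one, and re.escape treats the prefix literally. The name get_number_lambda is hypothetical.

```python
import re

def get_number(info, pattern):
    # Run the search once and keep the match object in a local variable.
    match = re.search(r'{}\s*(\d+)'.format(re.escape(pattern)), info.lower())
    return match.group(1) if match else None

# Python 3.8+ only: the same idea inside a lambda, using an assignment
# expression so the match object is bound before .group(1) is evaluated.
get_number_lambda = lambda info, pattern: (
    m.group(1)
    if (m := re.search(r'{}\s*(\d+)'.format(re.escape(pattern)), info.lower()))
    else None
)

text = "some text with a number: 31"
print(get_number(text, 'number:'))
print(get_number_lambda(text, 'hello world'))
```

The named function is usually the easier of the two to read; the lambda is shown only because the question asked for one.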
This is super ugly, not going to lie, but it does work and avoids returning a match object if it's found, but does return None when it's not:
lambda info,pattern: max(re.findall('{}\s*(\d)'.format(pattern),info.lower()),[None],key=lambda x: x != [])[0]
In Python, I'd like to test for the existence of a keyword in the output of a Linux command. The keywords to test for would be passed as a list as shown below. I've not spent a lot of time with Python so brute-force approach is below. Is there a cleaner way to write this?
def test_result (result, mykeys):
    hit = 0
    for keyword in mykeys:
        if keyword in result:
            hit = 1
            print "found a match for " + keyword
    if hit == 1:
        return True
result = "grep says awk"
mykeys = ['sed', 'foo', 'awk']
result = test_result (result, mykeys)
The any built-in will do it.
def test_result(result, mykeys):
    return any(key in result for key in mykeys)
You can use a regular expression to accomplish this. A regular expression of the form a|b|c matches any of a, b or c. So, you'd want something of the form:
import re
p = re.compile('|'.join(mykeys))
return bool(p.search(result))
p.search(result) searches the entire string for a match of the regular expression; it returns a match (which is truth-y) if present and returns None (which is false-y) otherwise. Converting the result to bool gives True if it matches and False otherwise.
Putting this together, you'd have:
import re
def test_result(result, mykeys):
    p = re.compile('|'.join(mykeys))
    return bool(p.search(result))
You can also make this more concise by not pre-compiling the regular expression; this should be fine if it's a one-time use:
def test_result(result, mykeys):
    return bool(re.search('|'.join(mykeys), result))
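One caveat with joining the keys directly: a key containing regex metacharacters (such as + or .) is interpreted as a pattern, not as literal text. The sketch below escapes each key first. The helper name contains_any is hypothetical, and the early return for an empty list is an added assumption, since an empty pattern would otherwise match every string.

```python
import re

def contains_any(result, keys):
    # Hypothetical helper: True if any key occurs literally in result.
    if not keys:
        return False  # '' as a pattern would match any string
    # re.escape makes each key match as plain text.
    pattern = '|'.join(re.escape(key) for key in keys)
    return bool(re.search(pattern, result))

print(contains_any("grep says awk", ['sed', 'foo', 'awk']))
print(contains_any("aab", ['a+b']))
```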
For reference, read about Python's re library.
Your function does two things, printing and returning the result. You could break them up like so:
def test_result(result, mykeys):
    return [k for k in mykeys if k in result]
def print_results(results):
    for result in results:
        print("found a match for " + result)
test_result will return a list with all the found keys, or an empty list. The empty list is falsey, so you can use it for whatever tests you want. The print_results is only needed if you actually want to print, otherwise you can use the result in some other function.
If you only want to check for the presence and don't care about which key you found, you can do something like:
def test_result(result, mykeys):
    return any(map(lambda k: k in result, mykeys))
In Python 3 (which you should be using), map returns a lazy iterator and any stops at the first true value, so only as much of the list as necessary is evaluated.
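The lazy behaviour can be observed directly. This is a small sketch for Python 3; the recording generator and the checked list are hypothetical names added only to show which keys were actually examined.

```python
def contains_any(result, keys):
    # In Python 3, map is lazy and any() stops at the first True.
    return any(map(lambda k: k in result, keys))

checked = []

def recording(keys):
    # Yields each key while noting that it was requested.
    for key in keys:
        checked.append(key)
        yield key

found = contains_any("grep says awk", recording(['grep', 'sed', 'awk']))
print(found, checked)
```

Because 'grep' already matches, the remaining keys are never pulled from the generator.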
See A more concise way to write this Python code for a more concise version of this last function.
To search for an element in a list, you can use a for-else statement. In particular, this lets you return the element that was found.
def test_result (result, mykeys):
    for keyword in mykeys:
        if keyword in result: break
    else:
        return None
    return keyword
print(test_result("grep says awk", ['sed', 'foo', 'awk'])) # 'awk'
print(test_result("grep says awk", ['bar', 'foo'])) # None
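The same "first matching keyword or None" result can also be written with next() and a default value. This is an alternative sketch, not a replacement for the for-else version; the name first_match is hypothetical.

```python
def first_match(result, keys):
    # next() returns the first key found in result, or None if the
    # generator is exhausted without a match.
    return next((key for key in keys if key in result), None)

print(first_match("grep says awk", ['sed', 'foo', 'awk']))
print(first_match("grep says awk", ['bar', 'foo']))
```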
I want to write a simple function that recognizes palindromes:
>>> def palindrome(s):
        return s == s[::-1]
It works fine but it is case sensitive and to fix that I could do:
>>> def palindrome(s):
        return s.lower() == s[::-1].lower()
>>> palindrome('Aba')
True
but I figure it's not very elegant. I tried to lowercase the input by using lambda expressions but I am doing something wrong and don't know how to fix it:
>>> def palindrome(lambda s : s.lower()):
        return s == s[::-1]
SyntaxError: invalid syntax
You cannot use a lambda expression to describe actions that should be performed on input parameters (you can however use lambda to define a default value). You can do two things:
Reassign the parameter at the top of the function body:
def palindrome(s):
    s = s.lower()
    return s == s[::-1]
Use a decorator:
def caseinsensitive(f):
    def helper(s):
        s = s.lower()
        return f(s)
    return helper
and then define your palindrome as:
@caseinsensitive
def palindrome(s):
    return s == s[::-1]
Here you can reuse @caseinsensitive on every function that should lowercase its input as a first step.
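For completeness, here is the decorator approach as one runnable piece. The only addition to the answer's code is functools.wraps, which is optional and simply keeps the wrapped function's name and docstring intact.

```python
import functools

def caseinsensitive(f):
    @functools.wraps(f)  # optional: preserves f.__name__ and f.__doc__
    def helper(s):
        # Lowercase the argument before handing it to the wrapped function.
        return f(s.lower())
    return helper

@caseinsensitive
def palindrome(s):
    return s == s[::-1]

print(palindrome('Aba'))
print(palindrome('Abc'))
```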
Just call lower once, reassign s to the value and forget the lambda:
def palindrome(s):
    s = s.lower()
    return s == s[::-1]
This isn't really idiomatic python, but what you're looking for is something like this:
def palindrome(s):
    return (lambda x: x == x[::-1])(s.lower())
That is, you define a lambda function and immediately invoke it, binding s.lower() to x.
def palindrome(s):
    s = s.lower()
    return s == s[::-1]
This is pretty straightforward and easy to use and understand answer, which is 100% correct and good.
BUT if you want to use lambda expression you must think how and what and why and stuff so let's go into the magical world of FUNCTIONAL PROGRAMMING.
If you don't know what a lambda expression is: basically, when you type the word lambda you are saying that you will supply some value later on. For instance, typing lambda a means you will supply it with 1 value (argument), and typing lambda a, b means you will supply it with 2 values (arguments). So now that this whole thing of "what does this lambda word even mean" is done, let's go deeper into the magical world of FUNCTIONAL PROGRAMMING.
So now when you tell python that it will have to wait some time (or maybe no time at all) for that value so it can do some magic on it, you can tell it what to do with it for instance
some_var = lambda some_string: some_string.lower()
So now this means that it's going to get some value, we expect it to be some sort of string and we can and will hold it in some_var for reasons only PHP programmers and us (me) know.
Next up is really straightforward: we just return the check of whether it is a palindrome or not
return some_var == some_var[::-1]
Let's get some glue and build this lambda beast from the things we have earlier
def palindrome():
    some_var = lambda some_string : some_string.lower()
    return some_var == some_var[::-1]
As you can see we no longer need to declare that we use some puny s in the method, hence we just press DEL and we can go along into the beautiful world of FUNCTIONAL PROGRAMMING.
So let's try to call this function, but the question arises: how do we do it?
palindrome("superpalindrome") == False
It does not run though, because Python thinks we are trying to give the palindrome method some kind of an argument while the definition has none at all. So the correct call of the function should be
palindrome()("superpalindrome") == False
In short, this is just magic. Lambda expressions are in most cases actually worse in terms of time usage, so you should stick to doing stuff in an OOP way or some other pythonic way. If you want to use lambda expressions you should try switching to Haskell (which I strongly advise) or Scala. If you have any further questions, feel free to ask me, I love talking about Haskell. Or FUNCTIONAL PROGRAMMING.
Full answer that is even more simplified
def palindrome():
    return lambda some_str : some_str.lower() == some_str.lower()[::-1]
method = palindrome()
print(method("cococ"))
Maybe you wanted this:
(lambda lstr : lstr == lstr[::-1])((lambda x : x.lower())('abA'))
I love using the expression
if 'MICHAEL89' in USERNAMES:
    ...
where USERNAMES is a list.
Is there any way to match items with case insensitivity or do I need to use a custom method? Just wondering if there is a need to write extra code for this.
username = 'MICHAEL89'
if username.upper() in (name.upper() for name in USERNAMES):
    ...
Alternatively:
if username.upper() in map(str.upper, USERNAMES):
    ...
Or, yes, you can make a custom method.
str.casefold is recommended for case-insensitive string matching. @nmichaels's solution can trivially be adapted.
Use either:
if 'MICHAEL89'.casefold() in (name.casefold() for name in USERNAMES):
Or:
if 'MICHAEL89'.casefold() in map(str.casefold, USERNAMES):
As per the docs:
Casefolding is similar to lowercasing but more aggressive because it
is intended to remove all case distinctions in a string. For example,
the German lowercase letter 'ß' is equivalent to "ss". Since it is
already lowercase, lower() would do nothing to 'ß'; casefold()
converts it to "ss".
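A short sketch of the difference the quoted passage describes, for Python 3 (str.casefold was added in 3.3). The sample list and the helper name contains_casefold are hypothetical.

```python
USERNAMES = ['Michael89', 'STRASSE']  # hypothetical sample data

def contains_casefold(name, names):
    # Compare casefolded forms so that 'ß' and 'ss' are treated as equal.
    key = name.casefold()
    return any(key == other.casefold() for other in names)

print(contains_casefold('MICHAEL89', USERNAMES))
print(contains_casefold('straße', USERNAMES))
# With lower() the second lookup fails, because 'ß' stays 'ß'.
print('straße'.lower() in (n.lower() for n in USERNAMES))
```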
I would make a wrapper so you can be non-invasive. Minimally, for example...:
class CaseInsensitively(object):
    def __init__(self, s):
        self.__s = s.lower()
    def __hash__(self):
        return hash(self.__s)
    def __eq__(self, other):
        # ensure proper comparison between instances of this class
        try:
            other = other.__s
        except (TypeError, AttributeError):
            try:
                other = other.lower()
            except:
                pass
        return self.__s == other
Now, if CaseInsensitively('MICHAEL89') in whatever: should behave as required (whether the right-hand side is a list, dict, or set). (It may require more effort to achieve similar results for string inclusion, avoid warnings in some cases involving unicode, etc).
Usually (in oop at least) you shape your object to behave the way you want. name in USERNAMES is not case insensitive, so USERNAMES needs to change:
class NameList(object):
    def __init__(self, names):
        self.names = names
    def __contains__(self, name): # implements `in`
        return name.lower() in (n.lower() for n in self.names)
    def add(self, name):
        self.names.append(name)
# now this works
usernames = NameList(USERNAMES)
print someone in usernames
The great thing about this is that it opens the path for many improvements, without having to change any code outside the class. For example, you could change the self.names to a set for faster lookups, or compute the (n.lower() for n in self.names) only once and store it on the class and so on ...
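One of the improvements mentioned above, storing the normalized names once in a set, might look like the following sketch. It is written for Python 3 and swaps lower() for casefold(); the _folded attribute is a hypothetical name.

```python
class NameList:
    def __init__(self, names):
        self.names = list(names)
        # Normalize once so that each `in` test is a set lookup.
        self._folded = {n.casefold() for n in self.names}

    def __contains__(self, name):  # implements `in`
        return name.casefold() in self._folded

    def add(self, name):
        self.names.append(name)
        self._folded.add(name.casefold())

usernames = NameList(['Michael89', 'anna'])
print('MICHAEL89' in usernames)
usernames.add('Bob')
print('bob' in usernames)
```

Code outside the class does not change: callers still write name in usernames.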
Here's one way:
if string1.lower() in string2.lower():
    ...
For this to work, both string1 and string2 objects must be of type string.
I think you have to write some extra code. For example:
if 'MICHAEL89' in map(lambda name: name.upper(), USERNAMES):
    ...
In this case we are forming a new list with all entries in USERNAMES converted to upper case and then comparing against this new list.
Update
As @viraptor says, it is even better to use a generator instead of map. See @Nathon's answer.
You could do
matcher = re.compile('MICHAEL89', re.IGNORECASE)
filter(matcher.match, USERNAMES)
Update: played around a bit and am thinking you could get a better short-circuit type approach using
matcher = re.compile('MICHAEL89', re.IGNORECASE)
if any( ifilter( matcher.match, USERNAMES ) ):
    #your code here
The ifilter function is from itertools (Python 2; in Python 3 the built-in filter is already lazy), one of my favorite modules within Python. It only creates the next item when called upon, so any() can stop at the first match.
To have it in one line, this is what I did:
if any(([True if 'MICHAEL89' in username.upper() else False for username in USERNAMES])):
    print('username exists in list')
I didn't test it time-wise though. I am not sure how fast/efficient it is.
Example from this tutorial:
list1 = ["Apple", "Lenovo", "HP", "Samsung", "ASUS"]
s = "lenovo"
s_lower = s.lower()
res = s_lower in (string.lower() for string in list1)
print(res)
My 5 (wrong) cents
'a' in "".join(['A']).lower()
UPDATE
Ouch, totally agree @jpp, I'll keep it as an example of bad practice :(
I needed this for a dictionary instead of a list. Jochen's solution was the most elegant for that case, so I modded it a bit:
class CaseInsensitiveDict(dict):
    ''' requests special dicts are case insensitive when using the in operator,
    this implements a similar behaviour'''
    def __contains__(self, name): # implements `in`
        return name.casefold() in (n.casefold() for n in self.keys())
Now you can convert a dictionary like so USERNAMESDICT = CaseInsensitiveDict(USERNAMESDICT) and use if 'MICHAEL89' in USERNAMESDICT:
Thought exercise: What is the "best" way to write a Python function that takes a regex pattern or a string to match exactly:
import re
strings = [...]
def do_search(matcher):
    """
    Returns strings matching matcher, which can be either a string
    (for exact match) or a compiled regular expression object
    (for more complex matches).
    """
    if not is_a_regex_pattern(matcher):
        matcher = re.compile('%s$' % re.escape(matcher))
    for s in strings:
        if matcher.match(s):
            yield s
So, ideas for the implementation of is_a_regex_pattern()?
You can access the _sre.SRE_Pattern type via re._pattern_type:
if not isinstance(matcher, re._pattern_type):
    matcher = re.compile('%s$' % re.escape(matcher))
Below is a demonstration:
>>> import re
>>> re._pattern_type
<class '_sre.SRE_Pattern'>
>>> isinstance(re.compile('abc'), re._pattern_type)
True
>>>
Or, make it quack:
try:
    does_match = matcher.match(s)
except AttributeError:
    does_match = re.match(matcher, s)
if does_match:
    yield s
In other words, treat matcher as if it already were a compiled regular expression. And if that breaks, then treat it like a string that needs to be compiled.
This is called Duck Typing. Not everyone agrees that exceptions should be used like this for routine contingencies. This is the ask-permission versus ask-forgiveness debate. Python is more amenable to forgiveness than most languages.
Not a string:
def is_a_regex_pattern(s):
    return not isinstance(s, basestring)
Is a _sre.SRE_Pattern (though that's not importable, so use a gross string match):
def is_a_regex_pattern(s):
    return s.__class__.__name__ == 'SRE_Pattern'
You can re-compile a SRE_Pattern and it seems to evaluate the same.
def is_a_regex_pattern(s):
    return s == re.compile(s)
You could test whether matcher has a match method:
import re
def do_search(matcher, strings):
    """
    Returns strings matching matcher, which can be either a string
    (for exact match) or a compiled regular expression object
    (for more complex matches).
    """
    if hasattr(matcher, 'match'):
        test = matcher.match
    else:
        test = lambda s: matcher==s
    for s in strings:
        if test(s):
            yield s
You should not use global variables, but use a second parameter.
On Python 3.7, re._pattern_type was renamed to re.Pattern
https://stackoverflow.com/a/27366172/895245 therefore broke at that point, as re._pattern_type is not defined.
While re.Pattern looks nicer and will therefore hopefully be more stable, it is not mentioned at all in the docs: https://docs.python.org/3/library/re.html#regular-expression-objects so maybe it is not a good idea to rely on it.
https://stackoverflow.com/a/46779329/895245 does make some sense. But what if someday the str class adds a .match method and it does something completely different? :-) Ah, the joys of typeless languages.
So I think I'm going with:
import re
_takes_s_or_re_type = type(re.compile(''))
def takes_s_or_re(s_or_re):
    if isinstance(s_or_re, _takes_s_or_re_type):
        return 0
    else:
        return 1
assert takes_s_or_re(re.compile('a.c')) == 0
assert takes_s_or_re('a.c') == 1
as this can only break when a public API breaks.
Tested on Python 3.8.0.
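Tying this back to the original do_search, here is a hedged sketch that assumes Python 3.8 or later, where re.Pattern is available as noted above. On older versions the type(re.compile('')) trick can be substituted for re.Pattern. Passing strings as a second parameter follows the earlier suggestion to avoid a global.

```python
import re

def do_search(matcher, strings):
    """Yield the strings matched by matcher (a str or a compiled pattern)."""
    if not isinstance(matcher, re.Pattern):
        # Plain string: escape it and anchor the end for an exact match.
        matcher = re.compile('%s$' % re.escape(matcher))
    for s in strings:
        if matcher.match(s):
            yield s

candidates = ['a.c', 'abc', 'a.cd']
print(list(do_search('a.c', candidates)))
print(list(do_search(re.compile('a.c'), candidates)))
```

The string form matches only the literal 'a.c', while the compiled pattern matches anything that starts with a, any character, then c.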
I find that in lots of different projects I'm writing a lot of code where I need to evaluate a (moderately complex, possibly costly-to-evaluate) expression and then do something with it (e.g. use it for string formatting), but only if the expression is True/non-None.
For example in lots of places I end up doing something like the following:
result += '%s '%( <complexExpressionForGettingX> ) if <complexExpressionForGettingX> else ''
... which I guess is basically a special-case of the more general problem of wanting to return some function of an expression, but only if that expression is True, i.e.:
f( e() ) if e() else somedefault
but without re-typing the expression (or re-evaluating it, in case it's a costly function call).
Obviously the required logic can be achieved easily enough in various long-winded ways (e.g. by splitting the expression into multiple statements and assigning the expression to a temporary variable), but that's a bit grungy and since this seems like quite a generic problem, and since python is pretty cool (especially for functional stuff) I wondered if there's a nice, elegant, concise way to do it?
My current best options are either defining a short-lived lambda to take care of it (better than multiple statements, but a bit hard to read):
(lambda e: '%s ' % e if e else '')( <complexExpressionForGettingX> )
or writing my own utility function like:
def conditional(expr, formatStringIfTrue, default='')
... but since I'm doing this in lots of different code-bases I'd much rather use a built-in library function or some clever python syntax if such a thing exists
I like one-liners, definitely. But sometimes they are the wrong solution.
In professional software development, if the team size is > 2, you spend more time understanding code someone else wrote than writing new code. The one-liners presented here are definitely confusing, so just use two lines (even though you mentioned multiple statements in your post):
X = <complexExpressionForGettingX>
result += '%s '% X if X else ''
This is clear, concise, and everybody immediately understands what's going on here.
Python doesn't have expression scope (Is there a Python equivalent of the Haskell 'let'), presumably because the abuses and confusion of the syntax outweigh the advantages.
If you absolutely have to use an expression scope, the least worst option is to abuse a generator comprehension:
result += next('%s '%(e) if e else '' for e in (<complexExpressionForGettingX>,))
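On Python 3.8 or later, an assignment expression is another way to name the value inside a single expression. This is a sketch under that version assumption; describe, costly and calls are hypothetical names, with calls used only to show that the expression is evaluated once.

```python
calls = []

def costly():
    # Stand-in for <complexExpressionForGettingX>; records each evaluation.
    calls.append(1)
    return 'X'

def describe(get_value):
    # The condition is evaluated first, so `value` is bound before the
    # formatting branch uses it.
    return '%s ' % value if (value := get_value()) else ''

result = ''
result += describe(costly)
result += describe(lambda: None)
print(repr(result), len(calls))
```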
You could define a conditional formatting function once, and use it repeatedly:
def cond_format(expr, form, alt):
    if expr:
        return form % expr
    else:
        return alt
Usage:
result += cond_format(<costly_expression>, '%s ', '')
After hearing the responses (thanks guys!) I'm now convinced there's no way to achieve what I want in Python without defining a new function (or lambda function) since that's the only way to introduce a new scope.
For best clarity I decided this needed to be implemented as a reusable function (not a lambda). For the benefit of others, here is the function I finally came up with. It is flexible enough to cope with multiple additional format string arguments (in addition to the main argument used to decide whether to do the formatting at all), and it comes with doctests to show correctness and illustrate usage. If you're not sure how the **kwargs thing works, just ignore it: it's an implementation detail and was the only way I could see to implement an optional defaultValue= kwarg following the variable list of format string arguments.
def condFormat(formatIfTrue, expr, *otherFormatArgs, **kwargs):
    """ Helper for returning the result of string.format() on a
    specified expression if the expression's bool(expr) is True
    (i.e. it's not None, an empty list or an empty string or the number zero),
    or return a default string (typically '') if not.
    For more complicated cases where the operation on expr is more complicated
    than a format string, or where a different condition is required, use:
    (lambda e=myexpr: '' if not e else '%s ' % e)()
    formatIfTrue -- a format string suitable for use with string.format(), e.g.
    "{}, {}" or "{1}, {0:d}".
    expr -- the expression to evaluate. May be of any type.
    defaultValue -- set this keyword arg to override the default string ('').
    >>> 'x' + condFormat(', {}.', 'foobar')
    'x, foobar.'
    >>> 'x' + condFormat(', {}.', [])
    'x'
    >>> condFormat('{}; {}', 123, 456, defaultValue=None)
    '123; 456'
    >>> condFormat('{0:,d}; {2:d}; {1:d}', 12345, 678, 9, defaultValue=None)
    '12,345; 9; 678'
    >>> condFormat('{}; {}; {}', 0, 678, 9, defaultValue=None) == None
    True
    """
    defaultValue = kwargs.pop('defaultValue','')
    assert not kwargs, 'unexpected kwargs: %s'%kwargs
    if not bool(expr): return defaultValue
    if otherFormatArgs:
        return formatIfTrue.format( *((expr,)+otherFormatArgs) )
    else:
        return formatIfTrue.format(expr)
Presumably, you want to do this repeatedly to build up a string. With a more global view, you might find that filter (or itertools.ifilter) does what you want to the collection of values.
You'll wind up with something like this:
' '.join(map(str, filter(None, <iterable of <complexExpressionForGettingX>>)))
Using None as the first argument for filter indicates to accept any true value. As a concrete example with a simple expression:
>>> ' '.join(map(str, filter(None, range(-3, 3))))
'-3 -2 -1 1 2'
Depending on how you're calculating the values, it may be that an equivalent list or generator comprehension would be more readable.
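As a sketch of that last point, here are the filter form and the equivalent generator expression side by side. The values list is hypothetical sample data mixing truthy and falsy items.

```python
values = [None, 'alpha', '', 0, 42, [], 'beta']  # hypothetical sample data

# filter(None, ...) keeps only the truthy items.
joined_filter = ' '.join(map(str, filter(None, values)))

# Equivalent generator expression with the truth test spelled out.
joined_comp = ' '.join(str(v) for v in values if v)

print(joined_filter)
print(joined_comp)
```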