How to use re match objects in a list comprehension - python

I have a function to pick out lumps from a list of strings and return them as another list:
import re

def filterPick(lines, regex):
    result = []
    for l in lines:
        match = re.search(regex, l)
        if match:
            result += [match.group(1)]
    return result
Is there a way to reformulate this as a list comprehension? Obviously it's fairly clear as is; just curious.
Thanks to those who contributed, special mention for @Alex. Here's a condensed version of what I ended up with; the regex search method is passed to filterPick as a "pre-hoisted" parameter:
import re

def filterPick(list, filter):
    return [(l, m.group(1)) for l in list for m in (filter(l),) if m]

theList = ["foo", "bar", "baz", "qurx", "bother"]
searchRegex = re.compile('(a|r$)').search
x = filterPick(theList, searchRegex)
# x == [('bar', 'a'), ('baz', 'a'), ('bother', 'r')]

[m.group(1) for l in lines for m in [regex.search(l)] if m]
The "trick" is the for m in [regex.search(l)] part -- that's how you "assign" a value that you need to use more than once, within a list comprehension -- add just such a clause, where the object "iterates" over a single-item list containing the one value you want to "assign" to it. Some consider this stylistically dubious, but I find it practical sometimes.
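To see the single-item-list clause in action, here is a small self-contained sketch. The sample lines and the pattern are made up for illustration; they are not from the question.

```python
import re

# Hypothetical sample data and pattern, chosen only for illustration.
lines = ["id=12", "no match here", "id=7"]
regex = re.compile(r"id=(\d+)")

# `for m in [regex.search(l)]` runs exactly once per line and binds m,
# so the search result can be both tested (if m) and used (m.group(1)).
picked = [m.group(1) for l in lines for m in [regex.search(l)] if m]
print(picked)  # ['12', '7']
```

The search runs only once per line, which is the point of the extra clause.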

return [m.group(1) for m in (re.search(regex, l) for l in lines) if m]

It could be shortened a little
def filterPick(lines, regex):
    # use search, as in the question; match would only match at the start of each line
    matches = map(re.compile(regex).search, lines)
    return [m.group(1) for m in matches if m]
You could put it all in one line, but that would mean you would have to match every line twice which would be a bit less efficient.

Starting with Python 3.8 and the introduction of assignment expressions (PEP 572, the := operator), it's possible to bind a local variable within a list comprehension to avoid evaluating the same expression more than once:
# items = ["foo", "bar", "baz", "qurx", "bother"]
[(x, match.group(1)) for x in items if (match := re.compile('(a|r$)').search(x))]
# [('bar', 'a'), ('baz', 'a'), ('bother', 'r')]
This:
Names the evaluation of re.compile('(a|r$)').search(x) as a variable match (which is either None or a Match object)
Uses this match named expression in place (either None or a Match) to filter out non matching elements
And re-uses match in the mapped value by extracting the first group (match.group(1)).
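As a variation on the snippet above, the pattern can be compiled once outside the comprehension. This is only a sketch (it needs Python 3.8 or later); the names search and pairs are mine, not from the answer. The re module caches recently compiled patterns, so the saving is modest, but the comprehension reads a little more cleanly.

```python
import re

items = ["foo", "bar", "baz", "qurx", "bother"]

# Compile once and keep the bound search method (an illustrative choice).
search = re.compile(r"(a|r$)").search

# m is assigned and tested in the condition, then reused in the result.
pairs = [(x, m.group(1)) for x in items if (m := search(x))]
print(pairs)  # [('bar', 'a'), ('baz', 'a'), ('bother', 'r')]
```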

>>> "a" in "a visit to the dentist"
True
>>> "a" not in "a visit to the dentist"
False
That also works when you are looking for an item in a list or tuple:
>>> P = 'a', 'b', 'c'
>>> 'b' in P
True

Related

list comprehension without if but with else

My question aims to use the else condition of a for-loop in a list comprehension.
example:
empty_list = []
def example_func(text):
    for a in text.split():
        for b in a.split(","):
            empty_list.append(b)
        else:
            empty_list.append(" ")
I would like to make it cleaner by using a list comprehension with both for-loops.
But how can I do this by including an escape-clause for one of the loops (in this case the 2nd).
I know I can use if with and without else in a list comprehension. But how about using else without an if statement.
Is there a way, so the interpreter will understand it as escape-clause of a for loop?
Any help is much appreciated!
EDIT:
Thanks for the answers! In fact, I'm trying to translate Morse code.
The input is a string containing Morse code.
Each word is separated by 3 spaces. Each letter of each word is separated by 1 space.
def decoder(code):
    str_list = []
    for i in code.split("   "):  # words are separated by three spaces
        for e in i.split():
            str_list.append(morse_code_dic[e])
        else:
            str_list.append(" ")
    return "".join(str_list[:-1]).capitalize()
print(decoder(".. -   .-- .- ...   .-   --. --- --- -..   -.. .- -.--"))
I want to break down the whole sentence into words, then translate each word.
After the inner loop (translation of one word) is finished, it will launch its escape-clause else, adding a space, so that the structure of the whole sentence will be preserved. That way, the 3 Spaces will be translated to one space.
As noted in comments, that else does not really make all that much sense, since the purpose of an else after a for loop is actually to hold code for conditional execution if the loop terminates normally (i.e. not via break), which your loop always does, thus it is always executed.
So this is not really an answer to the question how to do that in a list comprehension, but more of an alternative. Instead of adding spaces after all words, then removing the last space and joining everything together, you could just use two nested join generator expressions, one for the sentence and one for the words:
def decoder(code):
    return " ".join("".join(morse_code_dic[e] for e in i.split())
                    for i in code.split("   ")).capitalize()
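For a version that can be run as is, here is a sketch of the nested-join approach with a deliberately tiny Morse table. The table below is an assumption made for the demo (the question never shows morse_code_dic), and the input follows the stated convention of three spaces between words.

```python
# Hypothetical, deliberately tiny Morse table: just enough letters for the demo.
morse_code_dic = {"..": "i", "-": "t", ".--": "w", ".-": "a", "...": "s"}

def decoder(code):
    # Words are separated by three spaces, letters by one space.
    words = code.split("   ")
    return " ".join(
        "".join(morse_code_dic[e] for e in word.split()) for word in words
    ).capitalize()

print(decoder(".. -   .-- .- ...   .-"))  # It was a
```

Because the outer join puts a single space between words, there is no trailing space to strip afterwards.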
As mentioned in the comments, the else clause in your particular example is pointless because it always runs. Let's contrive an example that would let us investigate the possibility of simulating a break and else.
Take the following string:
s = 'a,b,c b,c,d c,d,e, d,e,f'
Let's say you wanted to split the string by spaces and commas as before, but you only wanted to preserve the elements of the inner split up to the first occurrence of c:
out = []
for i in s.split():
    for e in i.split(','):
        if e == 'c':
            break
        out.append(e)
    else:
        out.append('-')
The break can be simulated using the arcane two-arg form of iter, which accepts a callable and a termination value:
>>> x = list('abcd')
>>> list(iter(iter(x).__next__, 'c'))
['a', 'b']
You can implement the else by chaining the inner iterable with ['-'].
>>> from itertools import chain
>>> x = list('abcd')
>>> list(iter(chain(x, ['-']).__next__, 'c'))
['a', 'b']
>>> y = list('def')
>>> list(iter(chain(y, ['-']).__next__, 'c'))
['d', 'e', 'f', '-']
Notice that the placement of chain is crucial here. If you were to chain the dash to the outer iterator, it would always be appended, not only when c is not encountered:
>>> list(chain(iter(iter(x).__next__, 'c'), ['-']))
['a', 'b', '-']
You can now simulate the entire nested loop with a single expression:
from itertools import chain
out = [e for i in s.split() for e in iter(chain(i.split(','), ['-']).__next__, 'c')]
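As a quick sanity check, the two versions can be wrapped in functions and compared. The function names with_loops and with_comprehension are mine, added only for this sketch.

```python
from itertools import chain

s = 'a,b,c b,c,d c,d,e, d,e,f'

def with_loops(text):
    # Reference version: explicit break and for/else.
    out = []
    for i in text.split():
        for e in i.split(','):
            if e == 'c':
                break
            out.append(e)
        else:
            out.append('-')
    return out

def with_comprehension(text):
    # iter(callable, sentinel) stops at 'c', so the chained '-' is never reached.
    return [e for i in text.split()
            for e in iter(chain(i.split(','), ['-']).__next__, 'c')]

print(with_loops(s))  # ['a', 'b', 'b', 'd', 'e', 'f', '-']
print(with_loops(s) == with_comprehension(s))  # True
```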

List comprehension with regex

I am learning from Jacob Perkins's book. I do not understand this example:
import re
replacement_patterns = [
    (r'won\'t', 'will not'),
    (r'can\'t', 'cannot'),
    (r'i\'m', 'i am'),
    (r'ain\'t', 'is not'),
    # replacement strings are raw so that \g<1> is not treated as a string escape
    (r'(\w+)\'ll', r'\g<1> will'),
    (r'(\w+)n\'t', r'\g<1> not'),
    (r'(\w+)\'ve', r'\g<1> have'),
    (r'(\w+)\'s', r'\g<1> is'),
    (r'(\w+)\'re', r'\g<1> are'),
    (r'(\w+)\'d', r'\g<1> would')
]
Now we have
class RegexpReplacer(object):
    def __init__(self, patterns=replacement_patterns):
        self.patterns = [(re.compile(regex), repl) for (regex, repl) in patterns]
What is this list comprehension for? What does repl stand for?
repl stands for replacement. It is just a variable name; repl has no special meaning.
The (incomplete) code you have provided is presumably going to make a bunch of replacements on a given string. It will replace won't with will not; can't with cannot; i'm with i am; etc.
The more complex replacements, such as (\w+)'d --> \g<1> would are using back-references to capture part of the matched pattern, for use in the replacement.
The code: (re.compile(regex), repl) for (regex, repl) in patterns is using list-comprehension to compile the regular expressions.
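To make the description above concrete, here is a minimal sketch of how the compiled (pattern, replacement) pairs might then be applied. The replace function is a hypothetical helper that does not appear in the question, and only two of the patterns are used.

```python
import re

# Two patterns borrowed from the question's list; the replacement is a raw
# string so that \g<1> reaches the re module unchanged.
patterns = [
    (r"won't", "will not"),
    (r"(\w+)'ll", r"\g<1> will"),
]

# Same comprehension as in the question: compile each regex once.
compiled = [(re.compile(regex), repl) for (regex, repl) in patterns]

def replace(text):
    # Hypothetical helper (not shown in the question): apply each pair in order.
    for pattern, repl in compiled:
        text = pattern.sub(repl, text)
    return text

print(replace("i won't go, but she'll stay"))  # i will not go, but she will stay
```

In the second pattern, \g<1> inserts whatever (\w+) captured, which is how "she'll" becomes "she will".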
repl is just a variable referring to the second element of each tuple. Say you have a list lst = [(1, 2), (3, 4)] and you want a list comprehension that builds a new list by adding 1 to the second number in each tuple; you would write something like:
[(x, y+1) for (x, y) in lst]
I quote:
Python supports a concept called "list comprehensions". It can be used to construct lists in a very natural, easy way, like a mathematician is used to do.
A list comprehension can be with a condition. List comprehensions can have multiple conditions.
The general format for a list comprehension with a if condition is this,
[<expression> for <value> in <iterable> if <condition>]
You can also have an if..else in the comprehension
[<expression> if <condition> else <expression> for <value> in <iterable> ]
NOTE: Your iterable can be list,tuple,set,string,...etc
To make things clear consider this simple example,
>>> v = [1,2,3,4]
>>> v
[1, 2, 3, 4]
v and x are two lists.
>>> x = [1,2]
>>> x
[1, 2]
Now suddenly you decide I want a list new_list which has items from v but not in x. Hmmm... How to do that? Take a look below.
>>> new_list = [item for item in v if item not in x]
>>> new_list
[3, 4]
Notice how I've used item; I just created that name inside the list comprehension. Similarly, repl is just a variable name, short for replacement string.
Why did I tell you all that? You'll see in a moment.
And now we come to re
pattern= r'won\'t' #can also be r"won't" \ just to escape the ' (single quotes)
# then, much later in your code you can do
m = re.match(pattern, input)
#Look how I'm using the pattern
But with re.compile():
pattern = re.compile(r'won\'t')
# then, later in your code
m = pattern.match(input)
You see here we compile the regex pattern and then find a match. In the former we are just giving it as a parameter to re.match().
Note:
def __init__(self, patterns=replacement_patterns):
replacement_patterns --> patterns
(Now patterns and replacement_patterns are both names for your list of tuples)
Both approaches do the same thing. So, coming to your confusion,
[(re.compile(regex), repl) for (regex, repl) in patterns]
This list comprehension goes through every tuple in your list of tuples, known as patterns.
Initially:
(regex, repl)-->(r'won\'t', 'will not')
and so on for every tuple items. And this is converted to:
(r'won\'t', 'will not') --> (re.compile(r'won\'t'),'will not')
So basically your list comprehension converts the
tuple(pattern,replacement_string) to tuple(compiled_re,replacement_string)
Reference:
https://www.python.org/dev/peps/pep-0202/
https://docs.python.org/2/tutorial/datastructures.html#list-comprehensions
To help understand:
test = [('a', 1), ('b', 2), ('c', 3)]
for item in test:
    print(item)
for key, index in test:
    print(key, index)
print([key + str(index) for key, index in test])

Python: matching values from one list to the sequences of values in another list

My original question was asked and answered here: Python: matching values from one list to the sequence of values in another list
I have two lists.
e_list = [('edward', '1.2.3.4.'), ('jane','1.2.3.4.'), ('jackie', '2.3.4.10.')...]
and a_list (the main list to be checked against)
a_list = [('a', '1.2.3.'), ('b', '2.3.'), ('c', '2.3.4.')...]
I'd like a modified output to my original question. I'm unsure of how I'd change the code snippet to solve the same question but to output all possibilities?
e.g.
new_list = [ ('edward', '1.2.3.4', '1.2.3'), ('jane', '1.2.3.4.', '1.2.3'), ('jackie', '2.3.4.10.', '2.3.'), ('jackie', '2.3.4.10.', '2.3.4')]
You need to loop over everything in a_list then deal with the extra case of adding the value in e_list. It ends up looking something like this:
results = []
for name, x in e_list:
    this_name = [name, x]
    for a, b in a_list:
        if x.startswith(b):
            this_name.append(b)
    results.append(tuple(this_name))
print(results)
see this in action here: http://ideone.com/y8uAvC
You can use list comprehension if you want:
res = [(name, digits, match) for name, digits in e_list
       for _, match in a_list
       if digits.startswith(match)]
print(res)
But since it gets complicated, nested loops may be cleaner. You can use yield to get final list:
def get_res():
    for name, digits in e_list:
        for _, match in a_list:
            if digits.startswith(match):
                yield name, digits, match
print(list(get_res()))
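For reference, here is the comprehension run against the entries visible in the question. The real lists are longer (the question elides the rest), so this is only a sketch on the sample data.

```python
# Only the entries visible in the question; the real lists are longer.
e_list = [('edward', '1.2.3.4.'), ('jane', '1.2.3.4.'), ('jackie', '2.3.4.10.')]
a_list = [('a', '1.2.3.'), ('b', '2.3.'), ('c', '2.3.4.')]

# One output row per (e_list entry, matching a_list prefix) pair.
res = [(name, digits, match)
       for name, digits in e_list
       for _, match in a_list
       if digits.startswith(match)]

for row in res:
    print(row)
```

With this data, jackie appears twice, once for each matching prefix, which is the "all possibilities" output the question asks for.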

Filter strings where there are n equal characters in a row

Is there a way to filter, from a list of strings, those that contain, for example, 3 equal characters in a row? I wrote a function that does this, but I'm curious whether there is a more Pythonic, more efficient, or simpler way.
list_of_strings = []
def check_3_in_row(string):
    for ch in set(string):
        if ch*3 in string:
            return True
    return False
new_list = [x for x in list_of_strings if check_3_in_row(x)]
EDIT:
I've just found out one solution:
new_list = [x for x in set(keywords) if any(ch*3 in x for ch in x)]
But I'm not sure which way is faster - regexp or this.
You can use a regular expression (after import re), like this:
>>> list_of_strings = ["aaa", "dasdas", "aaafff", "afff", "abbbc"]
>>> [x for x in list_of_strings if re.search(r'(.)\1{2}', x)]
['aaa', 'aaafff', 'afff', 'abbbc']
Here, . matches any character, and it is captured in a group ((.)). We then check whether the same captured character (the backreference \1 refers to the first captured group) appears two more times ({2} means exactly two repetitions).
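The same idea extends to any run length n by building the repetition count into the pattern. The helper name has_run below is hypothetical, and the sketch assumes n is at least 1. Note that . does not match a newline by default.

```python
import re

def has_run(s, n):
    # Hypothetical helper: True if some character occurs n times in a row.
    # One captured character plus (n - 1) repeats of the backreference.
    return re.search(r'(.)\1{%d}' % (n - 1), s) is not None

list_of_strings = ["aaa", "dasdas", "aaafff", "afff", "abbbc"]
print([x for x in list_of_strings if has_run(x, 3)])
# ['aaa', 'aaafff', 'afff', 'abbbc']
print([x for x in list_of_strings if has_run(x, 4)])
# []
```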

sorting a list in python

My aim is to sort a list of strings alphabetically, except that words starting with "s" should come first (also sorted among themselves), followed by the other words.
The below function does that for me.
def mysort(words):
    mylist1 = sorted([i for i in words if i[:1] == "s"])
    mylist2 = sorted([i for i in words if i[:1] != "s"])
    list = mylist1 + mylist2
    return list
I am just looking for alternative approaches to achieve this or if anyone can find any issues with the code above.
You could do it in one line, with:
sorted(words, key=lambda x: 'a' + x if x.startswith('s') else 'b' + x)
The sorted() function takes a keyword argument key, which is used to translate the values in the list before comparisons are done.
For example:
sorted(words, key=str.lower)
# Will do a sort that ignores the case, since instead
# of checking 'A' vs. 'b' it will check str.lower('A')
# vs. str.lower('b').
sorted(intlist, key=abs)
# Will sort a list of integers by magnitude, regardless
# of whether they're negative or positive:
# >>> sorted([-5,2,1,-8], key=abs)
# [1, 2, -5, -8]
The trick I used translated strings like this when doing the sorting:
"hello" => "bhello"
"steve" => "asteve"
And so "steve" would come before "hello" in the comparisons, since the comparisons are done with the a/b prefix.
Note that this only affects the keys used for comparisons, not the data items that come out of the sort.
1. You can use a generator expression inside sorted.
2. You can use str.startswith.
3. Don't use list as a variable name.
4. Use key=str.lower in sorted.
mylist1 = sorted((i for i in words if i.startswith(("s","S"))),key=str.lower)
mylist2 = sorted((i for i in words if not i.startswith(("s","S"))),key=str.lower)
return mylist1 + mylist2
why str.lower?
>>> "abc" > "BCD"
True
>>> "abc" > "BCD".lower() #fair comparison
False
>>> l = ['z', 'a', 'b', 's', 'sa', 'sb', '', 'sz']
>>> sorted(l, key=lambda x:(x[0].replace('s','\x01').replace('S','\x01') if x else '') + x[1:])
['', 's', 'sa', 'sb', 'sz', 'a', 'b', 'z']
This key function replaces, for the purpose of sorting, every value starting with S or s with a \x01 which sorts before everything else.
Along the lines of Integer's answer, I like using a tuple slightly better because it is cleaner and also more general (works for arbitrary elements, not just strings):
sorted(words, key=lambda x : ((1 if x[:1] in ("S", "s") else 2), x))
Explanation:
The key parameter allows sorting a list based on the values of f(item) instead of on the values of item, where f is an arbitrary function.
In this case the function is anonymous (lambda) and returns a tuple where the first element is the "group" you want your element to end up in (e.g. 1 if the string starts with an "s" and 2 otherwise).
Using a tuple works because tuple comparison is lexicographical on the elements, so in the sorting the group code weighs more than the element itself.
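A short sketch of the tuple key on made-up data (the word list is mine, chosen to include both "S" and "s"):

```python
# Hypothetical sample words, including upper- and lower-case "s".
words = ["zebra", "Sam", "apple", "sun", "box"]

# Group 1 for words starting with S/s, group 2 for the rest; the word itself
# breaks ties within a group.
result = sorted(words, key=lambda x: ((1 if x[:1] in ("S", "s") else 2), x))
print(result)  # ['Sam', 'sun', 'apple', 'box', 'zebra']

# Tuples compare element by element, so the group decides first.
print((1, "z") < (2, "a"))  # True
```

Within the "s" group the comparison is still case-sensitive, so "Sam" sorts before "sun"; using x.lower() as the second tuple element would make it case-insensitive.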
