List comprehension with regex

List comprehension with regex - python

I am learning from Jacob Perkins's book. I do not understand this example:
import re
replacement_patterns = [
    (r'won\'t', 'will not'),
    (r'can\'t', 'cannot'),
    (r'i\'m', 'i am'),
    (r'ain\'t', 'is not'),
    (r'(\w+)\'ll', r'\g<1> will'),
    (r'(\w+)n\'t', r'\g<1> not'),
    (r'(\w+)\'ve', r'\g<1> have'),
    (r'(\w+)\'s', r'\g<1> is'),
    (r'(\w+)\'re', r'\g<1> are'),
    (r'(\w+)\'d', r'\g<1> would')
]
Now we have
class RegexpReplacer(object):
    def __init__(self, patterns=replacement_patterns):
        self.patterns = [(re.compile(regex), repl) for (regex, repl) in patterns]
What is this list comprehension for? What does repl stand for?

repl stands for replacement. It is just a variable name; repl has no special meaning.
The (incomplete) code you have provided is presumably going to make a bunch of replacements on a given string. It will replace won't with will not; can't with cannot; i'm with i am; etc.
The more complex replacements, such as (\w+)'d --> \g<1> would are using back-references to capture part of the matched pattern, for use in the replacement.
The code: (re.compile(regex), repl) for (regex, repl) in patterns is using list-comprehension to compile the regular expressions.
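For context, here is a minimal sketch of how the compiled pairs might be used afterwards. The replace method is an assumption, since the question only shows __init__; it simply applies each compiled pattern in turn. The pattern list is shortened for brevity.

```python
import re

# Shortened version of the pattern list from the question.
replacement_patterns = [
    (r"won't", "will not"),
    (r"can't", "cannot"),
    (r"(\w+)n't", r"\g<1> not"),
    (r"(\w+)'ve", r"\g<1> have"),
]


class RegexpReplacer(object):
    def __init__(self, patterns=replacement_patterns):
        # Compile each regex once, keeping its replacement string next to it.
        self.patterns = [(re.compile(regex), repl) for (regex, repl) in patterns]

    def replace(self, text):
        # Assumed method (not shown in the question): apply every pattern in order.
        for pattern, repl in self.patterns:
            text = pattern.sub(repl, text)
        return text


replacer = RegexpReplacer()
print(replacer.replace("I should've done that thing I didn't do"))
# I should have done that thing I did not do
```

The order of the list matters in this sketch: the specific patterns such as can't come before the general (\w+)n't pattern, so "can't" becomes "cannot" rather than "ca not".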

repl is just a variable referring to the second element of each tuple. Let's say you have a list lst = [(1, 2), (3, 4)] and you want to use a list comprehension to make a new list by adding 1 to the second number in each tuple. You would do something like:
[(x, y+1) for (x, y) in lst]

I quote:
Python supports a concept called "list comprehensions". It can be used to construct lists in a very natural, easy way, like a mathematician is used to do.
A list comprehension can include a condition, or even several conditions.
The general format for a list comprehension with an if condition is this:
[<expression> for <value> in <iterable> if <condition>]
You can also have an if..else in the comprehension
[<expression> if <condition> else <expression> for <value> in <iterable> ]
NOTE: Your iterable can be list,tuple,set,string,...etc
To make things clear consider this simple example,
>>> v = [1,2,3,4]
>>> v
[1, 2, 3, 4]
v and x are two lists.
>>> x = [1,2]
>>> x
[1, 2]
Now suddenly you decide I want a list new_list which has items from v but not in x. Hmmm... How to do that? Take a look below.
>>> new_list = [item for item in v if item not in x]
>>> new_list
[3, 4]
Notice how I've used item: it is a name I made up inside the list comprehension. Similarly, repl is just a variable name, short for replacement string.
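To contrast the two general formats given earlier, here is a small sketch using the same v and x. The names kept and labels are made up for illustration.

```python
v = [1, 2, 3, 4]
x = [1, 2]

# Filtering form: the trailing `if` drops items entirely.
kept = [item for item in v if item not in x]

# Conditional-expression form: every item yields a value, chosen by if/else.
labels = ["in x" if item in x else "not in x" for item in v]

print(kept)    # [3, 4]
print(labels)  # ['in x', 'in x', 'not in x', 'not in x']
```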
Why did I explain all that? You'll see in a moment.
And now we come to re
pattern= r'won\'t' #can also be r"won't" \ just to escape the ' (single quotes)
# then, much later in your code you can do
m = re.match(pattern, input)
#Look how I'm using the pattern
But re.compile()
pattern = re.compile(r'won\'t')
# then, later in your code
m = pattern.match(input)
You see here we compile the regex pattern and then find a match. In the former we are just giving it as a parameter to re.match().
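A quick sketch to show that the two styles find the same match. The sample text is an assumption chosen for illustration.

```python
import re

text = "won't you come in"   # sample input, chosen for illustration

# Module-level function: the pattern string is passed on every call.
m1 = re.match(r"won't", text)

# Pre-compiled pattern object: compile once, then reuse its methods.
pattern = re.compile(r"won't")
m2 = pattern.match(text)

print(m1.group(0), m2.group(0))  # won't won't
print(m1.span() == m2.span())    # True
```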
Note:
def __init__(self, patterns=replacement_patterns):
replacement_patterns --> patterns
(Now patterns and replacement_patterns are both names for your list of tuples.)
Both re.match(pattern, ...) and pattern.match(...) do the same thing. So, coming to your confusion:
[(re.compile(regex), repl) for (regex, repl) in patterns]
This list comprehension goes through every tuple in your list of tuples, known as patterns.
Initially:
(regex, repl)-->(r'won\'t', 'will not')
and so on for every tuple items. And this is converted to:
(r'won\'t', 'will not') --> (re.compile(r'won\'t'),'will not')
So basically your list comprehension converts the
tuple(pattern,replacement_string) to tuple(compiled_re,replacement_string)

Reference:
https://www.python.org/dev/peps/pep-0202/
https://docs.python.org/2/tutorial/datastructures.html#list-comprehensions
To help understand:
test = [('a', 1), ('b', 2), ('c', 3)]
for item in test:
    print(item)
for key, index in test:
    print(key, index)
print([key + str(index) for key, index in test])

Related

Replace dashes with whitespaces for all elements across a tuple?

I'm building off of these two questions because they don't quite answer my question:
How to change values in a tuple?
Python: Replace "-" with whitespace
If I have a tuple like this:
worldstuff = [('Hi', 'Hello-World', 'Earth'), ('Hello-World', 'Hi'), ...]
How do I replace dashes with whitespaces for all of the elements across all lists in a tuple? The previous Stack Overflow question covers changing the specific index of one list in a tuple, but not the case where there are multiple occurrences of an element needing to be replaced.
I've tried doing the following, which doesn't quite work:
worldstuff_new = [x.replace('-', ' ') for x in worldstuff]
But if I do it for a specific list in the tuple, it works for that tuple list. I'm trying to avoid having to do separate lists and instead trying to do it all at once.
worldstuff_new = [x.replace('-', ' ') for x in worldstuff[0]]
I understand that tuples are immutable, which is why I am having trouble figuring this out. Is this possible? Would appreciate any help - thanks.

Correct expression:
a = [('Hi', 'Hello-World', 'Earth'), ('Hello-World', 'Hi')]
b = [tuple([x.replace('-', ' ') for x in tup]) for tup in a]
>>> b
[('Hi', 'Hello World', 'Earth'), ('Hello World', 'Hi')]
A few notes:
Please don't clobber builtins (tuple).
What you have is actually not a tuple, but a list of tuples.
As you note, tuples are immutable; but you can always build new tuples from the original ones.
(Speed) Why tuple([x.replace ...]) (tuple of a list comprehension) instead of tuple(x.replace ...) (tuple of the output of a generator)? Because the former is slightly faster.
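If you want to check the speed claim on your own machine, a rough sketch with timeit could look like this. The function names are made up for illustration, and the numbers will vary by Python version and hardware, so no timings are shown.

```python
import timeit

a = [('Hi', 'Hello-World', 'Earth'), ('Hello-World', 'Hi')]


def with_list():
    # tuple() over a list comprehension
    return [tuple([x.replace('-', ' ') for x in tup]) for tup in a]


def with_generator():
    # tuple() over a generator expression
    return [tuple(x.replace('-', ' ') for x in tup) for tup in a]


# Both variants build the same result; only the timing may differ.
assert with_list() == with_generator()

print("list comprehension:  ", timeit.timeit(with_list, number=10000))
print("generator expression:", timeit.timeit(with_generator, number=10000))
```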

First of all, don't name any variable tuple. It is a built-in, and once you assign to that name you lose access to the built-in.
def changer(data):
    if type(data) == str:
        return data.replace("-", " ")
    elif type(data) == list:
        return [changer(x) for x in data]
    elif type(data) == tuple:
        return tuple(changer(x) for x in data)
tpl = [('Hi', 'Hello-World', 'Earth'), ('Hello-World', 'Hi')]
changer(tpl)
output:
[('Hi', 'Hello World', 'Earth'), ('Hello World', 'Hi')]

tuple_old = [('Hi', 'Hello-World', 'Earth'), ('Hello-World', 'Hi')]
tuple_new = [
    tuple([x.replace('-', ' ') for x in tup]) for tup in tuple_old
]
print(tuple_new)
FWIW, tuples are the things in parentheses. Lists are in square brackets. So you have a list of tuples, not a tuple of lists.

There are a few things that might help you to understand:
You cannot change a tuple or a string. You can only create a new one with different contents.
All the functions that "modify" a string actually create a new string derived from the original. The question you referenced also slightly misunderstood one of Python's quirks: you can iterate over the characters in a string, but because Python has no character datatype, each character is itself a new one-character string. tl;dr: replacing "-" with " " looks just like this:
print("old-str".replace("-", " "))
This will generate a new string with all the dashes replaced.
Now you need to extend this to creating a new tuple of strings. You can create a new tuple with the built-in function tuple (which you had previously overwritten by accident with a variable) by passing in some sort of iterable. In this case I will use a generator expression (similar to a list comprehension but without the square brackets) to create this iterable:
tuple(entry.replace("-", " ") for entry in old_tup)
Finally, you can apply this to each tuple in your list, either by creating a new list or by overwriting the values in the existing list (the example shows creating a new list with a list comprehension):
[tuple(entry.replace("-", " ") for entry in old_tup) for old_tup in worldstuff ]
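The other option mentioned, overwriting the values in the existing list, could be sketched like this. The tuples themselves are still replaced rather than modified; only the list slots are rebound.

```python
worldstuff = [('Hi', 'Hello-World', 'Earth'), ('Hello-World', 'Hi')]

# The list is mutable, so each slot can be rebound to a freshly built tuple.
for i, old_tup in enumerate(worldstuff):
    worldstuff[i] = tuple(entry.replace("-", " ") for entry in old_tup)

print(worldstuff)
# [('Hi', 'Hello World', 'Earth'), ('Hello World', 'Hi')]
```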

This might help:
worldstuff_new = [tuple(x.replace('-', ' ') for x in t) for t in worldstuff]

If you want a different way to do this you could use the map function like so.
tuples = [('Hi','Hello-World', 'Earth'), ('Hello-World', 'Hi'), ('Te-st', 'Te-st2')]
new_tuples = list(map(lambda tup: tuple(item.replace('-', ' ') for item in tup), tuples))
output:
[('Hi', 'Hello World', 'Earth'), ('Hello World', 'Hi'), ('Te st', 'Te st2')]

Remove all words from a string that exist in a list

community.
I need to write a function that goes through a string and checks if each word exists in a list, if the word exists in the (Remove list) it should remove that word if not leave it alone.
i wrote this:
def remove_make(x):
    a = x.split()
    for word in a:
        if word in remove: # True
            a = a.remove(word)
        else:
            pass
        return a
But it returns back the string with the (Remove) word still in there. Any idea how I can achieve this?

A more terse way of doing this would be to form a regex alternation based on the list of words to remove, and then do a single regex substitution:
import re
inp = "one two three four"
remove = ['two', 'four']
regex = r'\s*(?:' + r'|'.join(remove) + r')\s*'
out = re.sub(regex, ' ', inp).strip()
print(out) # prints 'one three'
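One caveat with a bare alternation is that it also matches inside longer words and treats regex metacharacters in the word list as syntax. A possible variant, offered only as a sketch (the function name remove_words_regex is made up here), escapes each word and anchors on word boundaries:

```python
import re


def remove_words_regex(text, words):
    # re.escape guards against words containing regex metacharacters;
    # \b restricts matches to whole words.
    pattern = r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b'
    stripped = re.sub(pattern, '', text)
    # Collapse the whitespace left behind by the removed words.
    return ' '.join(stripped.split())


print(remove_words_regex("one two three four twofold", ['two', 'four']))
# one three twofold
```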

You can try something more simple:
import re
remove_list = ['abc', 'cde', 'edf']
string = 'abc is walking with cde, wishing good luck to edf.'
''.join([x for x in re.split(r'(\W+)', string) if x not in remove_list])
And the result would be:
' is walking with , wishing good luck to .'
The important part is the last line:
''.join([x for x in re.split(r'(\W+)', string) if x not in remove_list])
What it does:
You convert the string to a list of words with re.split(r'(\W+)', string), preserving all the whitespace and punctuation as list items.
You create another list with a list comprehension, keeping only the items that are not in remove_list.
You convert the resulting list back to a string with str.join().
The BNF notation for list comprehensions and a little bit more information on them may be found here
PS: Of course, you can make this a little more readable if you break the one-liner into pieces: assign the result of re.split(r'(\W+)', string) to a variable and decouple the join from the comprehension.
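Broken into steps, the same one-liner might read as follows. The intermediate names tokens, kept and result are made up for illustration.

```python
import re

remove_list = ['abc', 'cde', 'edf']
string = 'abc is walking with cde, wishing good luck to edf.'

# 1. Split into words, keeping the separators because of the capture group.
tokens = re.split(r'(\W+)', string)
# 2. Keep every token that is not in the removal list.
kept = [token for token in tokens if token not in remove_list]
# 3. Glue the remaining tokens back together.
result = ''.join(kept)

print(repr(result))
# ' is walking with , wishing good luck to .'
```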

You can create a new list without the words you want to remove and then use join() function to concatenate all the words in that list. Try
def remove_words(string, rmlist):
    final_list = []
    word_list = string.split()
    for word in word_list:
        if word not in rmlist:
            final_list.append(word)
    return ' '.join(final_list)

list.remove(x) returns None and modifies the list in place by removing x if it exists inside the list. When you do
a = a.remove(word)
you effectively store None in a. That would raise an exception on the next iteration, when a.remove(word) is called again (None.remove(word) is invalid), but you never see it because the function returns immediately after the conditional, on the first pass through the loop. That is also wrong: the return belongs after the loop has finished, outside its body. This is how your function should look (without modifying a list while iterating over it):
remove_words = ["abc", ...] # your list of words to be removed
def remove_make(x):
    a = x.split()
    temp = a[:]
    for word in temp:
        if word in remove_words: # True
            a.remove(word)
    # no need of 'else' also, 'return' outside the loop's scope
    return " ".join(a)
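Both pitfalls described in this answer are easy to reproduce in isolation. The sample lists here are made up for illustration.

```python
words = "keep drop drop keep".split()

# list.remove mutates the list and returns None.
result = words.remove("drop")
print(result)  # None
print(words)   # ['keep', 'drop', 'keep']

# Removing while iterating over the same list skips elements,
# which is why the answer above iterates over a copy.
items = ["drop", "drop", "keep"]
for word in items:
    if word == "drop":
        items.remove(word)
print(items)   # ['drop', 'keep']
```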

Converting list of 2-element lists: [[word, length], [word, length], ...]

I need help editing this function lenumerate() that takes a string (like 's') and returns a list of 2-item lists containing each word and its length: [['But', 3], ['then', 4], ['of', 2], ... ['are', 3], ['nonmigratory', 12]]
lenumerate(s) - convert 's' into list of 2-element lists: [[word,
length],[word, length], ...]
# Define the function first ...
def lenumerate(s):
    l = [] # list for holding your result
    # Convert string s into a list of 2-element-lists
    # Enter your code here
    return l
# ... then call the lenumerate() to test it
text = "But then of course African swallows are nonmigratory"
l = lenumerate(text)
print("version 1", l)
I think I need to split the string and use the len() function, but I am not exactly sure how to go about using both of those in the most efficient way.

Here is the answer you want:
def lenumerate(s):
    l = []
    words = s.split(' ')
    for word in words:
        l.append([word,len(word)])
    return l

def lenumerate(s):
    l = [] # list for holding your result
    for x in s.split(): # split sentence into words using split()
        l.append([x, len(x)]) #append a list to l x and the length of x
    return l

Here is one succinct method:
text = "But then of course African swallows are nonmigratory"
def lenumerate(txt):
    s = txt.split(' ')
    return list(zip(s, map(len, s)))
# [('But', 3), ('then', 4), ('of', 2), ('course', 6), ('African', 7),
# ('swallows', 8), ('are', 3), ('nonmigratory', 12)]

I would use list comprehension here. So:
def lenumerate (s): return [[word, len (word)] for word in s.split()]
Let me explain this nifty one-liner:
You can use def (or anything that needs a colon) on one line. Just keep typing after the colon.
List comprehension means you can create a list in a special way. So instead of defining temporary list l and adding to it later, I create and customize it on the spot by enclosing it in brackets.
I make [word, len (word)] as you suggested, and Python understands that I will define word in my for loop, which:
Comes after the statement. That's why I first made the list, then my for statement
And, like you guessed, the list we are cycling through is s.split() (which splits on whitespace)
Any other questions, just ask!
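One detail worth knowing when choosing between the answers: s.split() and s.split(' ') differ when the text contains consecutive spaces. A small sketch, with a sample string made up for illustration:

```python
def lenumerate(s):
    return [[word, len(word)] for word in s.split()]


text = "But  then of course"   # note the double space after "But"

print(text.split(' '))   # ['But', '', 'then', 'of', 'course']
print(text.split())      # ['But', 'then', 'of', 'course']
print(lenumerate(text))  # [['But', 3], ['then', 4], ['of', 2], ['course', 6]]
```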

append/insert a value/string to a list element dynamically in Python?

my_list = ['cat','cow','dog','rabbit']
but what I want is to append/insert a character(or string) to one or more element(not all).
something like
my_list = ['cat','cow_%s','dog_%s','rabbit']%('gives milk','bark')
now updated list should look like
my_list = ['cat','cow_gives milk','dog_bark','rabbit']
one way is to do this is manually change/update the element one by one
e.g my_list[2]=my_list[2]+"bark"
but I don't want that because my_list is long(around 100s element ) and 40+ need to be changed dynamically.
In my case It is like
my_list = ['cat','cow_%s','dog_%s','rabbit']
for a in xyz: #a is a string and xyz is a list of string
    my_list=my_list%(a,a+'b')
    fun(my_list)

You could do something like this:
changes = {"cow":"gives milk", "dog":"bark"}
my_list = [item if item not in changes else "_".join([item, changes[item]])
           for item in my_list]
For greater efficiency if you will do this repeatedly, as JAB suggests in the comments, build a dictionary to map items to their locations (indices) in the list:
locations = dict((s, i) for i, s in enumerate(my_list))
You can then use this to find the corresponding items.
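A sketch of how such a locations dictionary might then be used, assuming every item in my_list is unique and every key in changes is present in the list:

```python
my_list = ['cat', 'cow', 'dog', 'rabbit']
changes = {"cow": "gives milk", "dog": "bark"}

# Map each item to its index once, so later lookups avoid scanning the list.
locations = dict((s, i) for i, s in enumerate(my_list))

# Update only the slots named in changes.
for item, suffix in changes.items():
    my_list[locations[item]] = "_".join([item, suffix])

print(my_list)  # ['cat', 'cow_gives milk', 'dog_bark', 'rabbit']
```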
If you are stuck with the list of strings, some ending "%s", and list of things to put in them, I guess you will have to do something like:
for i, s in enumerate(my_list): # work through items in list with index
    if s.endswith("%s"): # find an item to update
        my_list[i] = s % xyz.pop(0) # update with first item popped from xyz
Note that xyz will have to be a list for this, not tuple, as you can't pop items from a tuple.

If you have a replacement string in source strings, and an iterable of replacements, then you can do something such as:
import re
def do_replacements(src, rep):
    reps = iter(rep)
    for item in src:
        yield re.sub('%s', lambda m: next(reps), item)
replaces = ('gives milk','bark', 'goes walkies', 'eat stuff')
my_list = ['cat','cow_%s','dog_%s_and_%s','rabbit_%s']
print(list(do_replacements(my_list, replaces)))
# ['cat', 'cow_gives milk', 'dog_bark_and_goes walkies', 'rabbit_eat stuff']
If you don't have enough replacements you'll get a StopIteration (on Python 3.7 and later it surfaces as a RuntimeError, because it is raised inside a generator). You can either consider that an error or provide a default (possibly empty) replacement: lambda m: next(reps, '') for instance.
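The default-value variant could be sketched like this. The default parameter is an addition for illustration and is not part of the original function.

```python
import re


def do_replacements(src, rep, default=''):
    reps = iter(rep)
    for item in src:
        # next() with a second argument returns the default instead of
        # raising StopIteration once the replacements run out.
        yield re.sub('%s', lambda m: next(reps, default), item)


print(list(do_replacements(['cow_%s', 'dog_%s_and_%s'], ['gives milk'])))
# ['cow_gives milk', 'dog__and_']
```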

Here are two more possibilities: Either, you could create a list of verbs alongside your list of animals, with None for animals that shall not have a verb, and merge them together...
animals = ['cat','cow','dog','rabbit']
verbs = [None, 'gives milk', 'barks', None]
def combine(animal, verb):
    return (animal + "_" + verb) if verb else animal
print(list(map(combine, animals, verbs)))
... or, if the animals already have those %s placeholders for verbs, iterate the animals, check if the current animal has a placeholder, and if so, replace it with the next verb. (Similar to Jon's answer, but using % instead of re.sub)
animals = ['cat','cow_%s','dog_%s','rabbit']
verbs = iter(['gives milk', 'barks'])
print([animal % next(verbs) if '%' in animal else animal
       for animal in animals])

How to use re match objects in a list comprehension

I have a function to pick out lumps from a list of strings and return them as another list:
def filterPick(lines,regex):
    result = []
    for l in lines:
        match = re.search(regex,l)
        if match:
            result += [match.group(1)]
    return result
Is there a way to reformulate this as a list comprehension? Obviously it's fairly clear as is; just curious.
Thanks to those who contributed, special mention for #Alex. Here's a condensed version of what I ended up with; the regex match method is passed to filterPick as a "pre-hoisted" parameter:
import re
def filterPick(list,filter):
    return [ ( l, m.group(1) ) for l in list for m in (filter(l),) if m]
theList = ["foo", "bar", "baz", "qurx", "bother"]
searchRegex = re.compile('(a|r$)').search
x = filterPick(theList,searchRegex)
>> [('bar', 'a'), ('baz', 'a'), ('bother', 'r')]

[m.group(1) for l in lines for m in [regex.search(l)] if m]
The "trick" is the for m in [regex.search(l)] part -- that's how you "assign" a value that you need to use more than once, within a list comprehension -- add just such a clause, where the object "iterates" over a single-item list containing the one value you want to "assign" to it. Some consider this stylistically dubious, but I find it practical sometimes.

return [m.group(1) for m in (re.search(regex, l) for l in lines) if m]

It could be shortened a little
def filterPick(lines, regex):
    matches = map(re.compile(regex).match, lines)
    return [m.group(1) for m in matches if m]
You could put it all in one line, but that would mean you would have to match every line twice which would be a bit less efficient.
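Note that this version uses match, which only matches at the start of each line, whereas the function in the question uses re.search. A small sketch of the difference, with sample data made up for illustration:

```python
import re

lines = ["id=42", "x id=7", "none"]   # sample data for illustration
regex = re.compile(r"id=(\d+)")

# match anchors at the beginning of the string; search scans the whole string.
with_match = [m.group(1) for m in map(regex.match, lines) if m]
with_search = [m.group(1) for m in map(regex.search, lines) if m]

print(with_match)   # ['42']
print(with_search)  # ['42', '7']
```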

Starting Python 3.8, and the introduction of assignment expressions (PEP 572) (:= operator), it's possible to use a local variable within a list comprehension in order to avoid calling multiple times the same expression:
# items = ["foo", "bar", "baz", "qurx", "bother"]
[(x, match.group(1)) for x in items if (match := re.compile('(a|r$)').search(x))]
# [('bar', 'a'), ('baz', 'a'), ('bother', 'r')]
This:
Names the evaluation of re.compile('(a|r$)').search(x) as a variable match (which is either None or a Match object)
Uses this match named expression in place (either None or a Match) to filter out non matching elements
And re-uses match in the mapped value by extracting the first group (match.group(1)).
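As a variation, the compiled search method can be hoisted out of the comprehension so it is looked up only once. This is a sketch that requires Python 3.8 or later; the names search and pairs are made up for illustration.

```python
import re

items = ["foo", "bar", "baz", "qurx", "bother"]
search = re.compile('(a|r$)').search   # compiled once, outside the comprehension

# The walrus operator binds the match object so it can be tested and reused.
pairs = [(x, match.group(1)) for x in items if (match := search(x))]
print(pairs)  # [('bar', 'a'), ('baz', 'a'), ('bother', 'r')]
```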

>>> "a" in "a visit to the dentist"
True
>>> "a" not in "a visit to the dentist"
False
That also works with a search query you're hunting down in a list
P = 'a', 'b', 'c'
'b' in P returns True
