Python possible list comprehension - python

I have a text file and two lists of strings.
The first list is the keyword list:
k = ['hi', 'bob']
The second list holds the words I want to replace the keywords with:
r = ['ok', 'bye']
I want to take the text file as input and replace each occurrence of a keyword in k with the corresponding word in r. For example, "hi, how are you bob" would become "ok, how are you bye".

Let's say you have already parsed your sentence:
sentence = ['hi', 'how', 'are', 'you', 'bob']
What you want to do is to check whether each word in this sentence is present in k. If yes, replace it by the corresponding element in r; else, use the actual word. In other words:
if word in k:
    word_index = k.index(word)
    new_word = r[word_index]
else:
    new_word = word
This can be written in a more concise way:
new_word = r[k.index(word)] if word in k else word
Using list comprehensions, here's how you go about processing the whole sentence:
new_sentence = [r[k.index(word)] if word in k else word for word in sentence]
new_sentence is now equal to ['ok', 'how', 'are', 'you', 'bye'] (which is what you want).
Note that in the code above we perform two equivalent search operations: word in k and k.index(word). This is inefficient. These two operations can be reduced to one by catching exceptions from the index method:
def get_new_word(word, k, r):
    try:
        # list.index raises ValueError when word is not in k
        word_index = k.index(word)
        return r[word_index]
    except ValueError:
        return word
new_sentence = [get_new_word(word, k, r) for word in sentence]
Now, you should also note that searching for word in k is a search with O(n) complexity (where n is the number of keywords). Thus the complexity of this algorithm is O(n.m) (where m is the sentence length). You can reduce this complexity to O(m) by using a more appropriate data structure, as suggested by the other comments. This is left as an exercise :-p
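One possible way to do that exercise is sketched below. It assumes k and r have the same length and that k contains no duplicate keywords; the name mapping is only illustrative.

```python
k = ['hi', 'bob']
r = ['ok', 'bye']
sentence = ['hi', 'how', 'are', 'you', 'bob']

# Build the keyword -> replacement mapping once; lookups are O(1) on average.
mapping = dict(zip(k, r))

# dict.get falls back to the word itself when it is not a keyword.
new_sentence = [mapping.get(word, word) for word in sentence]
print(new_sentence)  # ['ok', 'how', 'are', 'you', 'bye']
```

Each word now costs a single dictionary lookup, so the whole sentence is processed in O(m) on average.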

I'll assume you've got the "reading string from file" part covered, so about that "replacing multiple strings" part: First, as suggested by Martijn, you can create a dictionary, mapping keys to replacements, using dict and zip.
>>> k = ["hi", "bob"]
>>> r = ["ok", "bye"]
>>> d = dict(zip(k, r))
Now, one way to replace all those keys at once would be to use a regular expression, being a disjunction of all those keys, i.e. "hi|bob" in your example, and using re.sub with a replacement function, looking up the respective key in that dictionary.
>>> import re
>>> re.sub('|'.join(k), lambda m: d[m.group()], "hi, how are you bob")
'ok, how are you bye'
Alternatively, you can just use a loop to replace each key-replacement pair one after the other:
s = "hi, how are you bob"
for (x, y) in zip(k, r):
    s = s.replace(x, y)
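Note that both variants match substrings, so "hi" inside "this" would be replaced as well, and a keyword containing regex metacharacters would be read as a pattern by re.sub. If that matters for your text, a rough sketch using re.escape and word boundaries could look like this (replace_keywords is a hypothetical helper name, not something from the answers above):

```python
import re

k = ["hi", "bob"]
r = ["ok", "bye"]
d = dict(zip(k, r))

# Escape each keyword and anchor the alternation on word boundaries.
pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, k)) + r")\b")

def replace_keywords(text):
    # Look up every matched keyword in the dictionary.
    return pattern.sub(lambda m: d[m.group()], text)

print(replace_keywords("hi, this is bob"))    # ok, this is bye
print("hi, this is bob".replace("hi", "ok"))  # ok, toks is bob
```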

Related

Remove all words from a string that exist in a list

I need to write a function that goes through a string and checks whether each word exists in a list (the remove list). If the word is in the list, the function should remove it; if not, it should leave the word alone.
I wrote this:
def remove_make(x):
    a = x.split()
    for word in a:
        if word in remove: # True
            a = a.remove(word)
        else:
            pass
        return a
But it returns back the string with the (Remove) word still in there. Any idea how I can achieve this?
A more terse way of doing this would be to form a regex alternation based on the list of words to remove, and then do a single regex substitution:
import re
inp = "one two three four"
remove = ['two', 'four']
regex = r'\s*(?:' + r'|'.join(remove) + r')\s*'
out = re.sub(regex, ' ', inp).strip()
print(out) # prints 'one three'
You can try something simpler:
import re
remove_list = ['abc', 'cde', 'edf']
string = 'abc is walking with cde, wishing good luck to edf.'
''.join([x for x in re.split(r'(\W+)', string) if x not in remove_list])
And the result would be:
' is walking with , wishing good luck to .'
The important part is the last line:
''.join([x for x in re.split(r'(\W+)', string) if x not in remove_list])
What it does:
You convert the string to a list of words with re.split(r'(\W+)', string), preserving all the whitespace and punctuation as list items.
You create another list with a list comprehension, keeping only the items that are not in remove_list.
You convert the resulting list back to a string with str.join().
The BNF notation for list comprehensions, and a little more information on them, can be found in the Python language reference.
PS: Of course, you can make this a little more readable if you break the one-liner into pieces, assign the result of re.split(r'(\W+)', string) to a variable, and decouple the join from the comprehension.
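A sketch of that more readable version, with the intermediate results given names (tokens, kept and result are illustrative names only):

```python
import re

remove_list = ['abc', 'cde', 'edf']
string = 'abc is walking with cde, wishing good luck to edf.'

# The capturing group makes re.split keep the separators in the result.
tokens = re.split(r'(\W+)', string)

# Keep every token (word or separator) that is not in remove_list.
kept = [token for token in tokens if token not in remove_list]

result = ''.join(kept)
print(repr(result))  # ' is walking with , wishing good luck to .'
```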
You can create a new list without the words you want to remove and then use join() function to concatenate all the words in that list. Try
def remove_words(string, rmlist):
    final_list = []
    word_list = string.split()
    for word in word_list:
        if word not in rmlist:
            final_list.append(word)
    return ' '.join(final_list)
list.remove(x) returns None and modifies the list in place by removing the first occurrence of x (it raises ValueError if x is not in the list). When you do
a = a.remove(word)
you are effectively storing None in a, and this would raise an exception in the next iteration when you call a.remove(word) again (None.remove(word) is invalid). You don't see that exception either, because you return immediately after the conditional, which is wrong: you need to return after the loop has finished, outside its scope. This is how your function should look (without modifying a list while iterating over it):
remove_words = ["abc", ...] # your list of words to be removed
def remove_make(x):
    a = x.split()
    temp = a[:]
    for word in temp:
        if word in remove_words: # True
            a.remove(word)
    # no need of 'else' also, 'return' outside the loop's scope
    return " ".join(a)
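The same filtering can also be written without mutating any list, as a minimal sketch. The words in remove_words below are made up for the example, and a set is used so that each membership test is cheap:

```python
remove_words = {"abc", "def"}  # hypothetical words to drop

def remove_make(text):
    # Keep only the words that are not in remove_words, then rejoin them.
    return " ".join(word for word in text.split() if word not in remove_words)

print(remove_make("abc stays def here abc"))  # stays here
```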

Creating a list of all words that are both in a given string and in a given dictionary

I'm working with a string and a dictionary in Python, trying to loop through the string in order to create a list of the words which appear both in the string and amongst the keys of the dictionary. What I have currently is:
## dictionary will be called "dict" below
sentence = "is this is even really a sentence"
wordsinboth = []
for w in sentence.split():
    if w in dict:
        wordsinboth += w
Instead of returning a list of words split by whitespace, however, this code returns a list of every character in the sentence.
The same thing occurs even when I attempt to create a list of split words before looping, as seen below:
sentence = "is this is even really a sentence"
wordsinboth = []
sent = sentence.split()
for w in sent:
    if w in dict:
        wordsinboth += w
I guess I'm not able to specify "if w in dict" and still split by whitespace? Any suggestions on how to fix this?
Use append instead of +=:
sentence = "is this is even really a sentence"
wordsinboth = []
for w in sentence.split():
    if w in dict:
        wordsinboth.append(w)
The += operator doesn't work as you'd expect:
a = []
myString = "hello"
a.append(myString)
print(a) # ['hello']
b = []
b += myString
print(b) # ['h', 'e', 'l', 'l', 'o']
If you're interested in why this happens, the following questions are a good read:
Why does += behave unexpectedly on lists?
What is the difference between Python's list methods append and extend?
Also, note that using list comprehensions might result in a more elegant solution to your problem:
wordsinboth = [word for word in sentence.split() if word in dict]
You can use += on a list, but you must add a list to it, not a single value: += extends the list with every item of the right-hand iterable, so a string is added one character at a time (e.g. 'if' is added as 'i', 'f'). To work around that, wrap the value in a list by putting [] around it:
for w in sentence.split():
    if w in dict:
        wordsinboth += [w]
Using a list comprehension is the shortest and most elegant way in your case:
wordsinboth = [word for word in sentence.split() if word in dict]
The problem in your loop is that you have to use append to add a new item to wordsinboth instead of the += operator. Also keep in mind that this can produce duplicates; if you need unique items, you can build a set instead, which gives you unique words.
Like this:
wordsinboth = {word for word in sentence.split() if word in dict}
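A set does not remember the order in which the words appeared. If the order matters, one option (a sketch, relying on dictionaries preserving insertion order in Python 3.7+) is dict.fromkeys. The dictionary known below is a made-up stand-in for the asker's dictionary:

```python
known = {"is": 1, "a": 2, "sentence": 3}  # hypothetical dictionary
sentence = "is this is even really a sentence"

# All matching words, duplicates included.
matches = [word for word in sentence.split() if word in known]

# dict.fromkeys drops duplicates while keeping first-seen order.
unique_in_order = list(dict.fromkeys(matches))

print(matches)          # ['is', 'is', 'a', 'sentence']
print(unique_in_order)  # ['is', 'a', 'sentence']
```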

Lambda expression in Python doesn't work correctly

I tried to write some lambda expressions. There is no error, but the code doesn't work correctly. Here is my code:
from nltk.tokenize import word_tokenize
list1 = 'gain, archive, win, success'.split(',')
list2 = 'miss, loss, gone, give up'.split(',')
def classify(s, rls):
    for (f, emotion) in rls:
        if f(s):
            return emotion
    return "another"
rules = [(lambda x: (i for i, j in zip(word_tokenize(x),list2) if i == j) != [], "sad"),
         (lambda x: (a for a, b in zip(word_tokenize(x),list1) if a == b) != [], "happy"),]
print classify("I win the game", rules)
print classify("I miss you", rules)
The output is
sad
sad
I have no idea what is wrong with my code. Can someone help me?
Zip iterates through the lists "in parallel", so it returns a list of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. The returned list is truncated in length to the length of the shortest argument sequence. (source) (This describes Python 2; in Python 3, zip returns an iterator of tuples instead of a list.)
So you were checking whether the i-th word of the sentence matches the i-th sentiment word in the list, which I guess is not what you want. Plus, as noted by @juanpa.arrivillaga, you were checking whether a generator was not equal to the empty list, which is always True, for the simple reason that a generator is not a list, regardless of its content.
What you want is to check whether any word in the sentence is in the sentiment list.
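Both problems can be seen in isolation in the following sketch. It uses a pre-split word list instead of word_tokenize so that it runs without NLTK; the variable names are illustrative.

```python
words = ["I", "win", "the", "game"]
keywords = ["miss", "loss", "gone", "give up"]

# zip pairs items by position only: ("I", "miss"), ("win", "loss"), ...
pairs = list(zip(words, keywords))

# A generator object never equals a list, even when it would yield nothing.
matches = (w for w, kw in pairs if w == kw)
generator_differs_from_list = matches != []
actual_matches = list(matches)

print(generator_differs_from_list)  # True
print(actual_matches)               # []
```

The comparison is True although there is not a single match, which is why the first rule always fires.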
Try changing:
lambda x: (i for i, j in zip(word_tokenize(x),list2) if i == j) != []
to:
lambda x: any(word in list2 for word in word_tokenize(x))
So overall you define rules like this:
rules = [(lambda x: any(word in list2 for word in word_tokenize(x)), "sad"),
         (lambda x: any(word in list1 for word in word_tokenize(x)), "happy")]
Also, the words in the sentiment lists contain leading spaces (splitting 'gain, archive' on ',' yields ' archive'), which can make the comparison fail.
Redefine them as follows:
list1 = 'gain,archive,win,success'.split(',')
list2 = 'miss,loss,gone,give up'.split(',')
Or, even better, use strip to remove whitespace from the beginning and end of each word, which is good general practice when working with strings.
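A minimal sketch of the strip variant, keeping the original strings with their spaces (happy_words and sad_words are illustrative names):

```python
# Split on commas, then strip the surrounding whitespace from each word.
happy_words = [w.strip() for w in 'gain, archive, win, success'.split(',')]
sad_words = [w.strip() for w in 'miss, loss, gone, give up'.split(',')]

print(happy_words)  # ['gain', 'archive', 'win', 'success']
print(sad_words)    # ['miss', 'loss', 'gone', 'give up']
```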

Search a word in a dictionary

I want to input a letter and get back all the words that contain that letter. For example:
String: "I saw a frog in my garden"
input: g
output: frog, garden
How could I do this in Python?
I don't know what you mean regarding dictionaries (you may misunderstand them), but I would just split the string into words and then check whether the letter is in each one, within a list comprehension.
>>> String = "I saw a frog in my garden"
>>> letter = 'g'
>>> [w for w in String.split() if letter in w]
['frog', 'garden']
That seems to be what you want.
It is quite useful to know which letter the list represents:
contains = {}
contains[letter] = [w for w in String.split() if letter in w]
I am assuming that you have split the string into a list of words and created a dictionary using those words as keys. Given that, the following function takes a dictionary and a character and returns a list of the keys in that dictionary which contain that character:
def keys_have_char(dict, char):
    return [key for key in dict.keys() if char in key]
Notice that I haven't added any checks, so this assumes that dict is indeed a dictionary and will work not only with single chars, but with any substrings as well.
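As a usage sketch, assuming the dictionary simply counts how often each word occurs (word_counts is a made-up example, and the parameter is renamed to words here to avoid shadowing the built-in dict):

```python
def keys_have_char(words, char):
    # Iterating over a dictionary yields its keys.
    return [key for key in words if char in key]

sentence = "I saw a frog in my garden"

# Hypothetical dictionary: word -> number of occurrences.
word_counts = {}
for w in sentence.split():
    word_counts[w] = word_counts.get(w, 0) + 1

print(keys_have_char(word_counts, 'g'))   # ['frog', 'garden']
print(keys_have_char(word_counts, 'ar'))  # ['garden']
```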

Changing lists with a given string

On Python 3 I am trying to write a function find(string_list, search) that takes a list of strings string_list and a single string search as parameters and returns a list of all those strings in string_list that contain the given search string.
So print(find(['she', 'sells', 'sea', 'shells', 'on', 'the', 'sea-shore'], 'he'))
would print:
['she', 'shells', 'the']
Here's what I tried so far:
def find(string_list, search):
    letters = set(search)
    for word in string_list:
        if letters & set(word):
            return word
    return (object in string_list) in search
running print(find(['she', 'sells', 'sea', 'shells', 'on', 'the', 'sea-shore'], 'he'))
What I expected = [she, shells, the]
what I got = [she]
You can do this with:
def find(string_list, search):
    return [s for s in string_list if search in s]
The main problem with your code example is that you can only return from a function once, at which point the function stops executing. This is why your function returns only one value.
If you wish to return multiple values you must return a container object like a list or a set. Here's how your code might look if you use a list:
def find(string_list, search):
    letters = set(search)
    result = [] # create an empty list
    for word in string_list:
        if letters & set(word):
            # append the word to the end of the list
            result.append(word)
    return result
The if test here is actually not doing quite what your problem statement called for. Since a set is an unordered collection, the & operation can test only if the two sets have any elements in common, not that they appear in the same order as the input. For example:
>>> letters = set("hello")
>>> word = set("olleh")
>>> word & letters
{'h', 'e', 'l', 'o'}
As you can see, the operator returns a set whose elements are those that are common to the two sets. Since a set is truthy if it contains any elements at all, this is actually testing whether any of the letters in the search string appear in a given item, not that they all appear together in the given order.
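To make the difference concrete, here is a small sketch comparing the set test with a plain substring test on a few of the words from the question (the variable names are illustrative):

```python
search = "he"
words = ["she", "sells", "sea", "the"]

# For each word: (word, shares at least one letter, contains search as substring)
results = [(word, bool(set(search) & set(word)), search in word) for word in words]

for word, shares_letter, is_substring in results:
    print(word, shares_letter, is_substring)
# she True True
# sells True False
# sea True False
# the True True
```

The set test accepts "sells" and "sea" because they contain an "e", even though "he" does not occur in them.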
A better approach is to test the strings directly using the in operator, which (when applied to strings) tests if one string is a substring of another, in sequence:
def find(string_list, search):
    result = []
    for word in string_list:
        if search in word:
            result.append(word)
    return result
Since this pattern of iterating over every item in a list and doing a test on it is so common, Python provides a shorter way to write this called a list comprehension, which allows you to do this whole thing in one expression:
def find(string_list, search):
    return [word for word in string_list if search in word]
This executes just as the prior example but is more concise.
