Lambda expression in Python doesn't work correctly - python

I tried to write some lambda expressions. There is no error, but they don't work correctly. Here is my code:
from nltk.tokenize import word_tokenize
list1 = 'gain, archive, win, success'.split(',')
list2 = 'miss, loss, gone, give up'.split(',')
def classify(s, rls):
    for (f, emotion) in rls:
        if f(s):
            return emotion
    return "another"
rules = [(lambda x: (i for i, j in zip(word_tokenize(x),list2) if i == j) != [], "sad"),
         (lambda x: (a for a, b in zip(word_tokenize(x),list1) if a == b) != [], "happy"),]
print classify("I win the game", rules)
print classify("I miss you", rules)
The output is
sad
sad
I have no idea what is wrong with my code. Can someone help me?

zip iterates through the lists "in parallel": in Python 2 it returns a list of tuples (in Python 3, an iterator of tuples), where the i-th tuple contains the i-th element from each of the argument sequences or iterables. The result is truncated to the length of the shortest argument sequence. (source)
So you were checking whether the i-th word of the sentence matches the i-th sentiment word in the list, which I guess is not what you want. Plus, as noted by @juanpa.arrivillaga, you were checking whether a generator was not equal to the empty list, which is always True, simply because a generator is not a list, regardless of its contents.
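Both problems can be reproduced in a few lines. This is a small illustration with made-up word lists, not the question's exact data:

```python
# zip pairs items by position, so only same-index words are ever compared
sentence = ['I', 'miss', 'you']
sad_words = ['miss', 'loss', 'gone']
pairs = list(zip(sentence, sad_words))
print(pairs)  # [('I', 'miss'), ('miss', 'loss'), ('you', 'gone')]

# A generator expression is never equal to a list, even when it yields nothing
empty_gen = (w for w in [])
print(empty_gen != [])  # True
```

Here 'miss' is in the sentence and in the list, yet no pair matches because the two occurrences sit at different positions.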
What you want is checking if any word in the sentence is in the sentiment list.
Try changing:
lambda x: (i for i, j in zip(word_tokenize(x),list2) if i == j) != []
to:
lambda x: any(word in list2 for word in word_tokenize(x))
So overall you define rules like this:
rules = [(lambda x: any(word in list2 for word in word_tokenize(x)), "sad"),
         (lambda x: any(word in list1 for word in word_tokenize(x)), "happy")]
Also, splitting on ',' leaves a leading space in most of the sentiment words (e.g. ' loss'), which makes the comparison fail.
Redefine them as follows:
list1 = 'gain,archive,win,success'.split(',')
list2 = 'miss,loss,gone,give up'.split(',')
Or, better, use str.strip() to remove whitespace from the beginning and end of each word, which is good general practice when working with strings.
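Putting these fixes together, a minimal sketch of the corrected classifier might look like this. To keep it runnable without NLTK's tokenizer data, it uses a hypothetical `tokenize` helper based on `str.split()` as a stand-in for `word_tokenize`; swap `word_tokenize` back in for real text with punctuation.

```python
# strip() removes the spaces that split(',') leaves around each word
list1 = [w.strip() for w in 'gain, archive, win, success'.split(',')]
list2 = [w.strip() for w in 'miss, loss, gone, give up'.split(',')]

def tokenize(text):
    # Hypothetical stand-in for nltk's word_tokenize: plain whitespace split
    return text.split()

def classify(s, rls):
    # Return the emotion of the first rule whose predicate is true
    for f, emotion in rls:
        if f(s):
            return emotion
    return "another"

rules = [
    (lambda x: any(word in list2 for word in tokenize(x)), "sad"),
    (lambda x: any(word in list1 for word in tokenize(x)), "happy"),
]

print(classify("I win the game", rules))  # happy
print(classify("I miss you", rules))      # sad
print(classify("Hello there", rules))     # another
```

Note that 'give up' is two words, so a token-by-token comparison like this can never match it; multi-word phrases would need a separate check.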

Related

Remove all words from a string that exist in a list

I need to write a function that goes through a string and checks whether each word exists in a list: if the word exists in the remove list, it should remove that word; if not, it should leave it alone.
I wrote this:
def remove_make(x):
    a = x.split()
    for word in a:
        if word in remove: # True
            a = a.remove(word)
        else:
            pass
        return a
But it returns back the string with the (Remove) word still in there. Any idea how I can achieve this?
A terser way of doing this is to build a regex alternation from the list of words to remove and then do a single regex substitution:
import re
inp = "one two three four"
remove = ['two', 'four']
regex = r'\s*(?:' + r'|'.join(remove) + r')\s*'
out = re.sub(regex, ' ', inp).strip()
print(out) # prints 'one three'
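One caveat with the pattern above: it matches the words anywhere, so removing 'two' would also eat the start of 'twofold', and words containing regex metacharacters (such as '.' or '+') would be misinterpreted. A hedged variant (the function name remove_words_regex is just for illustration) escapes each word with re.escape and adds \b word boundaries:

```python
import re

def remove_words_regex(text, words):
    # re.escape makes each word match literally; \b stops partial-word matches
    pattern = r'\s*\b(?:' + '|'.join(map(re.escape, words)) + r')\b\s*'
    # Replace each match (plus surrounding spaces) with one space, then trim
    return re.sub(pattern, ' ', text).strip()

print(remove_words_regex("one two three four", ['two', 'four']))  # one three
print(remove_words_regex("twofold two", ['two']))                 # twofold
```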
You can try something simpler:
import re
remove_list = ['abc', 'cde', 'edf']
string = 'abc is walking with cde, wishing good luck to edf.'
''.join([x for x in re.split(r'(\W+)', string) if x not in remove_list])
And the result would be:
' is walking with , wishing good luck to .'
The important part is the last line:
''.join([x for x in re.split(r'(\W+)', string) if x not in remove_list])
What it does:
You convert the string to a list of words with re.split(r'(\W+)', string); the capturing group preserves all the whitespace and punctuation as list items.
You create another list with a list comprehension, keeping only the items that are not in remove_list.
You convert the resulting list back to a string with str.join().
The BNF notation for list comprehensions and a little bit more information on them may be found here
PS: Of course, you can make this a bit more readable by breaking the one-liner into pieces: assign the result of re.split(r'(\W+)', string) to a variable and separate the join from the comprehension.
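For example, the one-liner could be split into named steps like this (the function name remove_listed_words is illustrative):

```python
import re

def remove_listed_words(text, remove_list):
    # Split on runs of non-word characters; the capturing group keeps the
    # separators (spaces, punctuation) as items in the result
    parts = re.split(r'(\W+)', text)
    # Keep every item that is not one of the words to remove
    kept = [part for part in parts if part not in remove_list]
    # Glue the surviving items back together
    return ''.join(kept)

print(remove_listed_words('abc is walking with cde, wishing good luck to edf.',
                          ['abc', 'cde', 'edf']))
```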
You can create a new list without the words you want to remove and then use str.join() to concatenate the remaining words. Try:
def remove_words(string, rmlist):
    final_list = []
    word_list = string.split()
    for word in word_list:
        if word not in rmlist:
            final_list.append(word)
    return ' '.join(final_list)
list.remove(x) returns None and modifies the list in place by removing the first occurrence of x, if it exists in the list (it raises ValueError otherwise). When you do
a = a.remove(word)
you effectively store None in a, and the next iteration would raise an exception when you call a.remove(word) again (None has no remove method, so you would get an AttributeError). You don't see that error either, because you return immediately after the conditional; that is wrong, since you need to return after the loop has finished, outside its body. This is how your function should look (without modifying a list while iterating over it):
remove_words = ["abc", ...] # your list of words to be removed
def remove_make(x):
    a = x.split()
    temp = a[:]
    for word in temp:
        if word in remove_words: # True
            a.remove(word)
    # no need for 'else'; 'return' goes outside the loop's scope
    return " ".join(a)
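Both pitfalls described above are easy to see in isolation. This is a small illustration with arbitrary variable names: list.remove returns None, and removing items from a list while looping over that same list skips elements, which is why the fixed function iterates over the copy temp.

```python
# list.remove mutates the list in place and returns None
a = ['one', 'two', 'three']
result = a.remove('two')
print(result)  # None
print(a)       # ['one', 'three']

# Removing while iterating over the same list skips the element that
# slides into the removed slot
words = ['x', 'x', 'y']
for w in words:
    if w == 'x':
        words.remove(w)
print(words)   # ['x', 'y'], one 'x' survived

# Iterating over a copy (safe[:]) avoids the problem
safe = ['x', 'x', 'y']
for w in safe[:]:
    if w == 'x':
        safe.remove(w)
print(safe)    # ['y']
```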

Creating a list of all words that are both in a given string and in a given dictionary

I'm working with a string and a dictionary in Python, trying to loop through the string in order to create a list of the words which appear both in the string and amongst the keys of the dictionary. What I have currently is:
## dictionary will be called "dict" below
sentence = "is this is even really a sentence"
wordsinboth = []
for w in sentence.split():
    if w in dict:
        wordsinboth += w
Instead of returning a list of words split by whitespace, however, this code returns a list of every character in the sentence.
The same thing occurs even when I attempt to create a list of split words before looping, as seen below:
sentence = "is this is even really a sentence"
wordsinboth = []
sent = sentence.split()
for w in sent:
    if w in dict:
        wordsinboth += w
I guess I'm not able to specify "if w in dict" and still split by whitespace? Any suggestions on how to fix this?
Use append instead of +=:
sentence = "is this is even really a sentence"
wordsinboth = []
for w in sentence.split():
    if w in dict:
        wordsinboth.append(w)
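The += operator doesn't do what you might expect here: on a list it adds each element of the right-hand iterable, and iterating over a string yields its characters: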
a = []
myString = "hello"
a.append(myString)
print(a) # ['hello']
b = []
b += myString
print(b) # ['h', 'e', 'l', 'l', 'o']
If you're interested in why this happens, the following questions are a good read:
Why does += behave unexpectedly on lists?
What is the difference between Python's list methods append and extend?
Also, note that using list comprehensions might result in a more elegant solution to your problem:
wordsinboth = [word for word in sentence.split() if word in dict]
You can use += on a list, but the right-hand side is treated as an iterable and each of its elements is added (a += b behaves like a.extend(b)). In your case, each w string is iterated character by character, so its characters are added individually (e.g. 'if' => ['i', 'f']). To work around that, make the value into a list by adding [] around it:
for w in sentence.split():
    if w in dict:
        wordsinboth += [w]
Use a list comprehension; it's a shorter and more elegant way to handle your case:
wordsinboth = [word for word in sentence.split() if word in dict]
The problem in your loop is that you have to use append to add a new item to wordsinboth instead of the += operator. Also keep in mind that this can produce duplicates; if you need unique items, you can build a set instead, which gives you the unique words.
Like this:
wordsinboth = {word for word in sentence.split() if word in dict}
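Note that a set does not preserve the order in which the words appear in the sentence. If you need unique words in sentence order, one option (a sketch; words_in_both and the sample dictionary are hypothetical) is dict.fromkeys, since dictionaries keep insertion order in Python 3.7+:

```python
def words_in_both(sentence, lookup):
    # Keep the words that are keys of lookup, in sentence order
    matches = [word for word in sentence.split() if word in lookup]
    # dict.fromkeys drops duplicates while keeping first-seen order
    return list(dict.fromkeys(matches))

# Hypothetical dictionary standing in for the question's "dict"
lookup = {'is': 1, 'a': 2, 'sentence': 3}
print(words_in_both("is this is even really a sentence", lookup))
# ['is', 'a', 'sentence']
```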

Find substring in string without white spaces

Let's say I have the following list ['house', 'John', 'garden'] and a string 'MynameisJohn'. Is there a way in Python to check whether any of the words in the list are part of the string even when there are no whitespaces? The final goal would be a function that returns the words that are part of the string and perhaps something that describes where in the string the words start. So something like this:
def function(list, string):
    returns [(word, position in the string)]
I tried some things but essentially nothing works because I don't know how to deal with the missing white spaces... The only method I could think of is checking if any sequence in the string corresponds to one of the words, the problem is I don't know how to implement something like that and it doesn't seem to be very efficient.
I found a question here on StackOverflow which deals with kind of the same problem, but since I have a concrete list to compare the string to, I shouldn't run into the same problem, right?
An IDLE example:
>>> find = ['house', 'John', 'garden']
>>> s = 'MynameisJohn'
>>> results = [item for item in find if item in s]
>>> print( results )
['John']
Explanation:
[item for item in find if item in s] is a list comprehension.
For every item in the list named find, it checks item in s, which is True when item occurs anywhere in s as a substring. If so, that item is included in the results list.
To find the position of one string in another, you can use the str.index() method.
This function() accepts a list and a string and yields each word that occurs in the string, together with the position of its first occurrence:
def function(lst, s):
    for i in lst:
        if i not in s:
            continue
        yield i, s.index(i)
lst = ['house', 'John', 'garden']
s = 'MynameisJohn'
for word, position in function(lst, s):
    print(word, position)
Output:
John 8
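str.index only reports the first occurrence of each word (and raises ValueError if the word is missing). If you need every starting position, a sketch using str.find in a loop could look like this; find_all_words is an illustrative name, and the search is case-sensitive:

```python
def find_all_words(words, s):
    # Collect (word, start) for every occurrence, not just the first
    results = []
    for word in words:
        start = s.find(word)  # -1 means "not found"
        while start != -1:
            results.append((word, start))
            # Resume one character later so overlapping matches are counted
            start = s.find(word, start + 1)
    return results

print(find_all_words(['house', 'John', 'garden'], 'MynameisJohn'))
# [('John', 8)]
```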

Python possible list comprehension

I have a text file and two lists of strings.
The first list is the keyword list
k = ['hi', 'bob']
The second list is the words I want to replace the keywords with
r = ['ok', 'bye']
I want to take the text file as input and replace each occurrence of a word in k with the corresponding word in r, so "hi, how are you bob" would become "ok, how are you bye".
Let's say you have already parsed your sentence:
sentence = ['hi', 'how', 'are', 'you', 'bob']
What you want to do is to check whether each word in this sentence is present in k. If yes, replace it by the corresponding element in r; else, use the actual word. In other words:
if word in k:
    word_index = k.index(word)
    new_word = r[word_index]
This can be written in a more concise way:
new_word = r[k.index(word)] if word in k else word
Using list comprehensions, here's how you go about processing the whole sentence:
new_sentence = [r[k.index(word)] if word in k else word for word in sentence]
new_sentence is now equal to ['ok', 'how', 'are', 'you', 'bye'] (which is what you want).
Note that in the code above we perform two equivalent search operations: word in k and k.index(word). This is inefficient. These two operations can be reduced to one by catching exceptions from the index method:
def get_new_word(word, k, r):
    try:
        word_index = k.index(word)  # raises ValueError if word is not in k
        return r[word_index]
    except ValueError:
        return word
new_sentence = [get_new_word(word, k, r) for word in sentence]
Now, you should also note that searching for word in k is a search with O(n) complexity (where n is the number of keywords). Thus the complexity of this algorithm is O(n·m) (where m is the sentence length). You can reduce this complexity to O(m) by using a more appropriate data structure, as suggested by the other comments. This is left as an exercise :-p
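One way to solve that exercise is with a dictionary, for example as in the sketch below (replace_words is a hypothetical name). Building the mapping costs O(n) once, and each dict.get lookup is O(1) on average, so processing a sentence is O(m):

```python
def replace_words(sentence, k, r):
    # Build the keyword -> replacement mapping once
    mapping = dict(zip(k, r))
    # dict.get returns the replacement, or the word itself if it is not a key
    return [mapping.get(word, word) for word in sentence]

print(replace_words(['hi', 'how', 'are', 'you', 'bob'], ['hi', 'bob'], ['ok', 'bye']))
# ['ok', 'how', 'are', 'you', 'bye']
```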
I'll assume you have the "reading the string from a file" part covered, so on to the "replacing multiple strings" part. First, as suggested by Martijn, you can create a dictionary mapping keys to replacements using dict and zip.
>>> k = ["hi", "bob"]
>>> r = ["ok", "bye"]
>>> d = dict(zip(k, r))
Now, one way to replace all those keys at once would be to use a regular expression, being a disjunction of all those keys, i.e. "hi|bob" in your example, and using re.sub with a replacement function, looking up the respective key in that dictionary.
>>> import re
>>> re.sub('|'.join(k), lambda m: d[m.group()], "hi, how are you bob")
'ok, how are you bye'
Alternatively, you can just use a loop to replace each key-replacement pair one after the other:
s = "hi, how are you bob"
for (x, y) in zip(k, r):
    s = s.replace(x, y)
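Both approaches above match the keys anywhere, including inside other words ('hi' also occurs in 'this'), and the loop version can chain replacements if one replacement contains a later key. A hedged refinement of the regex approach adds \b word boundaries and re.escape; replace_keywords is an illustrative name:

```python
import re

def replace_keywords(text, k, r):
    # Map each keyword to its replacement
    d = dict(zip(k, r))
    # \b restricts matches to whole words; re.escape treats keys literally
    pattern = r'\b(?:' + '|'.join(map(re.escape, k)) + r')\b'
    # Each match is replaced once, so replacements are never re-scanned
    return re.sub(pattern, lambda m: d[m.group()], text)

print(replace_keywords("hi, this is bob", ["hi", "bob"], ["ok", "bye"]))
# ok, this is bye
```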

getting rid of proper nouns in a nested list python

I'm trying to write a program that takes in a nested list and returns a new list with the proper nouns taken out.
Here is an example:
L = [['The', 'name', 'is', 'James'], ['Where', 'is', 'the', 'treasure'], ['Bond', 'cackled', 'insanely']]
I want to return:
['the', 'name', 'is', 'is', 'the', 'treasure', 'cackled', 'insanely']
Note that 'where' is deleted; that is fine, since it does not appear anywhere else in the nested list. Each nested list is a sentence. My approach is to append the first element of each nested list to a newList, then check whether the elements of newList appear elsewhere in the nested list, lowercasing the elements of newList for the comparison. I'm halfway done with this program, but I run into an error when I try to remove an element from newList at the end. Once I have the updated list, I want to delete the items from the nested list that are in newList. Lastly, I'd append all the items in the nested list to a newerList and lowercase them. That should do it.
If someone has a more efficient approach I'd gladly listen.
def lowerCaseFirst(L):
    newList = []
    for nestedList in L:
        newList.append(nestedList[0])
    print newList
    for firstWord in newList:
        sum = 0
        firstWord = firstWord.lower()
        for nestedList in L:
            for word in nestedList[1:]:
                if firstWord == word:
                    print "yes"
                    sum = sum + 1
        print newList
        if sum >= 1:
            firstWord = firstWord.upper()
            newList.remove(firstWord)
    return newList
Note this code is not finished due to the error in the second to last line
Here is the version with the newerList (updatedNewList):
def lowerCaseFirst(L):
    newList = []
    for nestedList in L:
        newList.append(nestedList[0])
    print newList
    updatedNewList = newList
    for firstWord in newList:
        sum = 0
        firstWord = firstWord.lower()
        for nestedList in L:
            for word in nestedList[1:]:
                if firstWord == word:
                    print "yes"
                    sum = sum + 1
        print newList
        if sum >= 1:
            firstWord = firstWord.upper()
            updatedNewList.remove(firstWord)
    return updatedNewList
error message:
Traceback (most recent call last):
File "/Applications/WingIDE.app/Contents/MacOS/src/debug/tserver/_sandbox.py", line 1, in <module>
# Used internally for debug sandbox under external interpreter
File "/Applications/WingIDE.app/Contents/MacOS/src/debug/tserver/_sandbox.py", line 80, in lowerCaseFirst
ValueError: list.remove(x): x not in list
The error in your first function occurs because you try to remove an uppercased version of firstWord (e.g. 'THE') from newList, which contains no all-uppercase words, only capitalized ones such as 'The' (you can see that from the printout). Remember that you store an upper/lowercased version of your words in a new variable, but you don't change the contents of the original list.
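A quick illustration of the casing issue (variable names are arbitrary): str.upper() turns 'the' into 'THE', whereas str.capitalize() gives back 'The', which is what newList actually contains. Separately, and relevant to the second version, updatedNewList = newList does not copy the list, so both names refer to the same object.

```python
first_words = ['The', 'Where', 'Bond']
word = first_words[0].lower()    # 'the'

upper = word.upper()             # 'THE'
capitalized = word.capitalize()  # 'The'
print(upper in first_words)        # False, so remove(upper) raises ValueError
print(capitalized in first_words)  # True

# Plain assignment creates an alias, not a copy
updated = first_words
print(updated is first_words)      # True
# Slicing creates an independent shallow copy
copied = first_words[:]
print(copied is first_words)       # False
```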
I still don't understand your approach. As you describe your task, you want to do two things: 1) flatten a list of lists into a list of elements (always an interesting programming exercise) and 2) remove proper nouns from this list. This means you have to decide what counts as a proper noun. You could do that rudimentarily (all non-initial capitalized words, or an exhaustive list), or you could use a POS tagger (see: Finding Proper Nouns using NLTK WordNet). Unless I misunderstand your task completely, you needn't worry about the casing here.
The first task can be solved in many ways. Here is a nice way that illustrates well what actually happens in the simple case where your list L is a list of lists (and not lists that can be arbitrarily nested):
def flatten(L):
    newList = []
    for sublist in L:
        for elm in sublist:
            newList.append(elm)
    return newList
You could turn this function into flattenAndFilter(L) by checking each element like this:
PN = ['James', 'Bond']
def flattenAndFilter(L):
    newList = []
    for sublist in L:
        for elm in sublist:
            if elm not in PN:
                newList.append(elm)
    return newList
You might not have such a convenient list of proper nouns, though; in that case you would have to extend the check, for instance by parsing the sentence and checking the POS tags.
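As a rudimentary heuristic along the lines described above, here is a sketch that treats every capitalized word that is not sentence-initial as a proper noun, and keeps a sentence-initial word (lowercased) only if it also appears uncapitalized elsewhere, which is the rule the question's expected output implies. The function name is hypothetical, and this will misfire on real text (acronyms, the pronoun 'I', a proper noun that happens to match a common word); a POS tagger is the more robust option.

```python
def flatten_without_proper_nouns(sentences):
    # Uncapitalized words that occur after the first position in any sentence
    non_initial = {
        word for sentence in sentences for word in sentence[1:]
        if not word[:1].isupper()
    }
    result = []
    for sentence in sentences:
        for i, word in enumerate(sentence):
            if i == 0:
                # Sentence-initial word: keep it (lowercased) only if it is
                # also used uncapitalized somewhere else
                if word.lower() in non_initial:
                    result.append(word.lower())
            elif not word[:1].isupper():
                # Capitalized non-initial words are treated as proper nouns
                result.append(word)
    return result

L = [['The', 'name', 'is', 'James'], ['Where', 'is', 'the', 'treasure'],
     ['Bond', 'cackled', 'insanely']]
print(flatten_without_proper_nouns(L))
```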
