I have a nested list (l1) containing several lists of sentences (i) and I want to remove a particular string from the second element in (i) when the split() method is applied to it. However it seems that only half of them are removed and the rest remain.
This is what I have tried:
for i in l1:
    for j in i[1].split():
        if j == 'a':
            i[1].split().remove(j)
I have also tried to replace (j) with an empty string, but it wasn't helpful either.
example input: [[string1, This is a book], [string2, He is a tall man], ,,,]
example output:
This is book, He is tall man
You can't mutate the string, so your example won't work.
But you can mutate lists, so you could split the string on whitespace, ignore 'a' tokens and join it back together again:
for i in l1:
    i[1] = ' '.join(p for p in i[1].split() if p != 'a')
This would eat any extra whitespace in the original string, but I'm assuming that's not a concern for you here.
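To see why the loop in the question has no effect: every call to split() builds a fresh list, so remove() only changes a temporary list that is thrown away immediately. A minimal sketch, using made-up sample data in the shape the question describes:

```python
# Sample data in the shape described in the question (values are illustrative).
l1 = [["string1", "This is a book"], ["string2", "He is a tall man"]]

item = l1[0]
tokens = item[1].split()   # a brand-new list, independent of the stored string
tokens.remove("a")         # changes only this temporary list
unchanged = item[1]        # the string inside l1 is untouched

# Rebuilding the string and assigning it back is what actually updates l1.
for i in l1:
    i[1] = " ".join(p for p in i[1].split() if p != "a")

print(unchanged)  # This is a book
print(l1)         # [['string1', 'This is book'], ['string2', 'He is tall man']]
```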
The following code:
l1 = [
    [None, "This is a dog"],
    [None, "He is a tall man"],
]
for i in l1:
    sentence = i[1]
    new_sentence = []
    for word in sentence.split():
        if word == 'a':
            continue
        new_sentence.append(word)
    i[1] = " ".join(new_sentence)
print(l1)
will result in
[
[None, 'This is dog'],
[None, 'He is tall man']
]
If I understand you properly this should accomplish what you're trying to do.
for i in l1:
    if 'a' in i[1]:
        i[1] = i[1].replace(' a ', ' ')
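One caveat, sketched below with made-up sentences: replace(' a ', ' ') only matches an 'a' that has a space on both sides, so a sentence that starts or ends with the word is left unchanged. The split/join approach handles both cases.

```python
# Illustrative sentences: the word 'a' in the middle, at the start, at the end.
sentences = ["This is a book", "a tall man", "grade a"]

replaced = [s.replace(" a ", " ") for s in sentences]
rejoined = [" ".join(w for w in s.split() if w != "a") for s in sentences]

print(replaced)  # ['This is book', 'a tall man', 'grade a']
print(rejoined)  # ['This is book', 'tall man', 'grade']
```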
A solution that I think is a little more efficient:
word_search = 'a'
for i in l1:
    split_sentence = i[1].split()
    if word_search in split_sentence:
        while word_search in split_sentence:
            split_sentence.remove(word_search)
        i[1] = ' '.join(split_sentence)
This solution doesn't loop over the sentence word by word in Python code, and it skips the join when the searched word is not found in the sentence.
Whether it is actually faster depends, I guess, on how often the searched word appears across the sentences.
Code:
list = ['hello','world']
list2 = ['a','b']
string = 'hello'  # should output a
string_fin = ''
for s in string:
    for i, j in zip(list, list2):
        if s == i:
            string_fin += j
print(string_fin)
I want to write hello or world in string = '' and get a or b as the output.
Instead I get nothing.
I think this happens because hello and world have more characters than a and b; when I try something with the same number of characters as a or b, it works.
Please help
Thanks
Your program's main loop never runs because string is empty! So your program is basically:
list = ['hello','world']
list2 = ['a','b']
string = ''
string_fin = ''
print(string_fin)
Based on how you worded your question, it is hard to understand what you are trying to accomplish, but here is my attempt.
You have two lists: list1 and list2. (Please do not name your list list: that shadows Python's built-in list type. Use list1 instead!)
You want to check whether each word in your string matches any word in your first list.
If it matches, you want to take the corresponding word or letter from your second list and append it to the string string_fin.
Finally, once you have looped through all the words, you print the content of string_fin.
The correct way to do this would be to split your string variable, and get each word stored in it.
string = 'hello or world'
stringWords = string.split()
Now, stringWords contains ['hello', 'or', 'world']. But I think you are not interested in the item or. So you can remove this item from the list, by using remove().
if 'or' in stringWords:
    stringWords.remove('or')
Now you have the words that you are interested in. And we want to check whether any word in the first list matches with these words. (Remember, I renamed the first list from list to list1 to prevent any unexpected behavior.)
for word in stringWords:
    tempIndex = list1.index(word)
    temp = list2[tempIndex]
    string_fin += temp
However, using index raises ValueError if a match is not found, so depending on your program logic, you may need to catch an exception and handle it.
The string string_fin will now contain ab or a or b depending on the value inside string.
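If the input may contain words that are not in list1, the ValueError mentioned above can be handled roughly like this. This is only a sketch: skipping unmatched words is an assumption about the desired behavior, and the sample word 'there' is made up.

```python
list1 = ['hello', 'world']
list2 = ['a', 'b']
string_words = ['hello', 'there', 'world']  # 'there' has no match in list1

string_fin = ''
for word in string_words:
    try:
        temp_index = list1.index(word)
    except ValueError:
        continue  # assumption: words without a match are simply skipped
    string_fin += list2[temp_index]

print(string_fin)  # ab
```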
Now, since you wanted to print something like a or b, you can instead collect the matching items in a list (tempList below) and then join that list using ' or ' as the separator.
string_fin = (' or ').join(tempList)
A complete program now will look like this:
list1 = ['hello', 'world']
list2 = ['a', 'b']
string = 'hello or world'
tempList = []
stringWords = string.split()
if 'or' in stringWords:
    stringWords.remove('or')
for word in stringWords:
    tempIndex = list1.index(word)
    temp = list2[tempIndex]
    tempList.append(temp)
string_fin = ' or '.join(tempList)
print(string_fin)
Better to store your lists as a dictionary, so you can do an easy lookup:
mapping = {'hello':'a', 'world':'b'}
string = 'hello or world'
out = []
for s in string.split():
    out.append(mapping.get(s, s))
print(' '.join(out))
Purists will note that the for loop can be made into a one-liner:
mapping = {'hello':'a', 'world':'b'}
string = 'hello or world'
out = ' '.join(mapping.get(s,s) for s in string.split())
print(out)
I'm working with a string and a dictionary in Python, trying to loop through the string in order to create a list of the words which appear both in the string and amongst the keys of the dictionary. What I have currently is:
## dictionary will be called "dict" below
sentence = "is this is even really a sentence"
wordsinboth = []
for w in sentence.split():
    if w in dict:
        wordsinboth += w
Instead of returning a list of words split by whitespace, however, this code returns a list of every character in the sentence.
The same thing occurs even when I attempt to create a list of split words before looping, as seen below:
sentence = "is this is even really a sentence"
wordsinboth = []
sent = sentence.split()
for w in sent:
    if w in dict:
        wordsinboth += w
I guess I'm not able to specify "if w in dict" and still split by whitespace? Any suggestions on how to fix this?
Use append instead of +=:
sentence = "is this is even really a sentence"
wordsinboth = []
for w in sentence.split():
    if w in dict:
        wordsinboth.append(w)
The += operator doesn't work as you'd expect:
a = []
myString = "hello"
a.append(myString)
print(a) # ['hello']
b = []
b += myString
print(b) # ['h', 'e', 'l', 'l', 'o']
If you're interested in why this happens, the following questions are a good read:
Why does += behave unexpectedly on lists?
What is the difference between Python's list methods append and extend?
Also, note that using list comprehensions might result in a more elegant solution to your problem:
wordsinboth = [word for word in sentence.split() if word in dict]
You can use += on a list, but the right-hand side is treated as an iterable and each of its elements is added separately. In your case, each w string is iterated character by character (e.g. 'if' => ['i', 'f']). To work around that, wrap the value in a list by putting [] around it:
for w in sentence.split():
    if w in dict:
        wordsinboth += [w]
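A small side-by-side sketch of the three spellings may help. The variable names are illustrative only.

```python
chars = []
chars += "if"          # same effect as chars.extend("if")
print(chars)           # ['i', 'f']

wrapped = []
wrapped += ["if"]      # extends with a one-element list
print(wrapped)         # ['if']

appended = []
appended.append("if")  # adds the string as a single item
print(appended)        # ['if']
```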
Use a list comprehension; it's the shortest and most elegant way in your case:
wordsinboth = [word for word in sentence.split() if word in dict]
The problem in your loop is that you have to use append to add a new item to wordsinboth instead of the += operator. Also keep in mind that this can create duplicates; if you need unique items, you can build a set instead, which gives you unique words.
Like this:
wordsinboth = {word for word in sentence.split() if word in dict}
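A set does not keep the words in sentence order. If order matters, one possible sketch is to de-duplicate with dict.fromkeys, which keeps the first occurrence of each key (insertion order is guaranteed from Python 3.7). The lookup dictionary is called vocabulary here, a hypothetical name chosen so that the built-in dict stays usable; its contents are made up.

```python
# Hypothetical stand-in for the question's dictionary.
vocabulary = {"is": 1, "a": 2, "sentence": 3}
sentence = "is this is even really a sentence"

matches = [word for word in sentence.split() if word in vocabulary]
unique_in_order = list(dict.fromkeys(matches))  # drops repeats, keeps order

print(matches)          # ['is', 'is', 'a', 'sentence']
print(unique_in_order)  # ['is', 'a', 'sentence']
```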
I'm writing some code that trims the words in a list of strings. If the last character of a word is 't' or 's' it is removed, and if the first character is 'x' it is removed.
words = ['bees', 'xerez']
should return:
['bee', 'erez']
So far my solution is:
trim_last = [x[:-1] for x in words if x[-1] == 's' or 't']
I think this trims the last characters fine. I then try to trim the first characters if they are 'x' with this line:
trim_first = [x[1:] for x in trim_last if x[0] == 'x']
but this just returns an empty list. Can I somehow incorporate this into one working line?
[v.lstrip('x').rstrip('ts') for v in words]
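Note that lstrip and rstrip remove every leading or trailing character that belongs to the given set, not just one. If exactly one character should go, a slicing helper is an alternative. A rough sketch: trim_one is a hypothetical helper name, and the extra sample words are made up.

```python
words = ['bees', 'xerez', 'xxboss', 'butt']

# strip-based version: removes whole runs of matching characters
stripped = [v.lstrip('x').rstrip('ts') for v in words]

def trim_one(word):
    # Remove at most one leading 'x' and at most one trailing 't' or 's'.
    if word.startswith('x'):
        word = word[1:]
    if word.endswith(('t', 's')):
        word = word[:-1]
    return word

single = [trim_one(v) for v in words]

print(stripped)  # ['bee', 'erez', 'bo', 'bu']
print(single)    # ['bee', 'erez', 'xbos', 'but']
```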
You're doing a filter, not a mapping.
The right way would be
trim_first = [x[1:] if x.startswith('x') else x for x in trim_last]
Also, your solution should not return an empty list since the filter would match on the second element
In one step with re.sub() function:
import re
words = ['bees', 'xerez']
result = [re.sub(r'^x|[ts]$', '', w) for w in words]
print(result)
The output:
['bee', 'erez']
Just to chime in - since this is in fact, a mapping:
list(map(lambda x: x[1:] if x[0] == 'x' else x, words))
If you are looking for a one-liner you can use some arithmetic to play with the list slicing:
words = ['bees', 'xerez', 'xeret']
[w[w[0] == 'x' : len(w) - int(w[-1] in 'st')] for w in words]
# output: ['bee', 'erez', 'ere']
You can try this code:
trim_last = [x.lstrip('x').rstrip('t').rstrip('s') for x in words]
Why use two list comprehensions when you can do it with one?
One-line solution:
words = ['bees', 'xerez','hellot','xnewt']
print([item[:-1] if item.endswith('t') or item.endswith('s') else item for item in [item[1:] if item.startswith('x') else item for item in words]])
output:
['bee', 'erez', 'hello', 'new']
Explanation of above list comprehension :
final=[]
for item in words:
    sub_list=[]
    if item.endswith('t') or item.endswith('s'):
        sub_list.append(item[:-1])
    else:
        sub_list.append(item)
    for item in sub_list:
        if item.startswith('x'):
            final.append(item[1:])
        else:
            final.append(item)
print(final)
Currently I have two lists, ListofComments and ListofWords. Each element of ListofComments contains several words. For example:
ListofComments[0] = 'I love python'
ListofComments[1] = 'I hate python'
But currently I am only able to split the last element of ListofComments into individual words. Below is what I have so far.
for x in range(0, 58196):
    ListofWords = (re.sub("[^\w]", " ", ListofComments[x]).split())
I understand that perhaps another loop is needed, but I can't pinpoint how to go about solving this issue. The desired output would be ListofWords[0] = 'I' ListofWords[1] = 'love' ListofWords[2] = 'python' ListofWords[3] = 'I' ListofWords[4] = 'hate' ListofWords[5] = 'python'
I believe your only problem is you're overwriting your ListofWords at every loop iteration, hence why at the end of the loop you only see words from the last element of ListofComments.
Try this:
ListofWords = []
for x in range(0, 58196):
    ListofWords.extend(re.sub("[^\w]", " ", ListofComments[x]).split())
EDIT:
As others suggested, you want to make sure you avoid a list out of range error. I didn't want to change the rest of your code, just to make evident what had to be changed, for it to work as you expected.
A simpler (and more robust) way to write the above, would be:
ListofWords = []
for comment in ListofComments:
    ListofWords.extend(re.sub(r"[^\w]", " ", comment).split())
If I understand correctly, this would solve your problem:
list_of_words = []
my_list = ["i love python3", "i hate python2"]
for sentence in my_list:
    words = sentence.split(" ")
    for word in words:
        list_of_words.append(word)
Your solution has two problems:
ListofWords is overwritten on every iteration
you may run out of range
Here's my solution
import re
from functools import reduce
# split comments
split_comments = [re.sub(r"[^\w]", " ", c).split() for c in ListofComments]
# >>> [['I', 'love', 'python'], ['I', 'hate', 'python']]
# flatten list of lists
reduce(lambda x, y: x + y, split_comments)
# >>> ['I', 'love', 'python', 'I', 'hate', 'python']
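As an alternative to reduce, the flattening step can also be sketched with itertools.chain.from_iterable from the standard library, which avoids building a new intermediate list for every + operation. The sample comments are the two from the question.

```python
import re
from itertools import chain

ListofComments = ['I love python', 'I hate python']  # sample data from the question

# split each comment into words, then flatten the list of lists in one pass
split_comments = [re.sub(r"[^\w]", " ", c).split() for c in ListofComments]
ListofWords = list(chain.from_iterable(split_comments))

print(ListofWords)  # ['I', 'love', 'python', 'I', 'hate', 'python']
```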
I have a text file and two lists of strings.
The first list is the keyword list
k = ['hi', 'bob']
The second list is the words I want to replace the keywords with
r = ['ok', 'bye']
I want to take the text file as input, where when k appears, it's replaced with r, thus, "hi, how are you bob" would be changed to "ok, how are you bye"
Let's say you have already parsed your sentence:
sentence = ['hi', 'how', 'are', 'you', 'bob']
What you want to do is to check whether each word in this sentence is present in k. If yes, replace it by the corresponding element in r; else, use the actual word. In other words:
if word in k:
    word_index = k.index(word)
    new_word = r[word_index]
else:
    new_word = word
This can be written in a more concise way:
new_word = r[k.index(word)] if word in k else word
Using list comprehensions, here's how you go about processing the whole sentence:
new_sentence = [r[k.index(word)] if word in k else word for word in sentence]
new_sentence is now equal to ['ok', 'how', 'are', 'you', 'bye'] (which is what you want).
Note that in the code above we perform two equivalent search operations: word in k and k.index(word). This is inefficient. These two operations can be reduced to one by catching exceptions from the index method:
def get_new_word(word, k, r):
    try:
        # list.index raises ValueError when word is not in k
        word_index = k.index(word)
        return r[word_index]
    except ValueError:
        return word

new_sentence = [get_new_word(word, k, r) for word in sentence]
Now, you should also note that searching for word in k is a search with O(n) complexity (where n is the number of keywords). Thus the complexity of this algorithm is O(n*m) (where m is the sentence length). You can reduce this complexity to O(m) by using a more appropriate data structure, as suggested by the other comments. This is left as an exercise :-p
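For readers who want to check their answer to the exercise, one possible sketch uses a dictionary built once from k and r, so that each word costs a single average constant-time lookup. The name replacements is illustrative.

```python
k = ['hi', 'bob']
r = ['ok', 'bye']
sentence = ['hi', 'how', 'are', 'you', 'bob']

replacements = dict(zip(k, r))  # built once, O(n)

# dict.get returns the word itself when it is not a keyword
new_sentence = [replacements.get(word, word) for word in sentence]

print(new_sentence)  # ['ok', 'how', 'are', 'you', 'bye']
```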
I'll assume you've got the "reading string from file" part covered, so about that "replacing multiple strings" part: First, as suggested by Martijn, you can create a dictionary, mapping keys to replacements, using dict and zip.
>>> k = ["hi", "bob"]
>>> r = ["ok", "bye"]
>>> d = dict(zip(k, r))
Now, one way to replace all those keys at once would be to use a regular expression, being a disjunction of all those keys, i.e. "hi|bob" in your example, and using re.sub with a replacement function, looking up the respective key in that dictionary.
>>> import re
>>> re.sub('|'.join(k), lambda m: d[m.group()], "hi, how are you bob")
'ok, how are you bye'
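Two caveats with the plain pattern: a key can match inside a longer word, and keys containing regex metacharacters would be misread. A hedged sketch of one way around both, assuming whole-word matching is what you want, uses re.escape and \b word boundaries. The sample text is made up.

```python
import re

k = ["hi", "bob"]
r = ["ok", "bye"]
d = dict(zip(k, r))

text = "hi, this is bob"

# plain alternation: 'hi' also matches inside 'this'
plain = re.sub('|'.join(k), lambda m: d[m.group()], text)

# escaped keys wrapped in word boundaries: only whole words are replaced
pattern = r'\b(?:' + '|'.join(re.escape(key) for key in k) + r')\b'
bounded = re.sub(pattern, lambda m: d[m.group()], text)

print(plain)    # ok, toks is bye
print(bounded)  # ok, this is bye
```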
Alternatively, you can just use a loop to replace each key-replacement pair one after the other:
s = "hi, how are you bob"
for (x, y) in zip(k, r):
    s = s.replace(x, y)
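One thing to watch with the sequential loop: if a replacement is itself one of the keywords, a later pass rewrites it again. The keyword lists below are hypothetical, chosen only to show the effect next to a single-pass re.sub.

```python
import re

k = ["hi", "ok"]
r = ["ok", "bye"]  # hypothetical: the replacement 'ok' is also a keyword

s = "hi bob"
for x, y in zip(k, r):
    s = s.replace(x, y)  # 'hi' -> 'ok', then that 'ok' -> 'bye'

d = dict(zip(k, r))
single_pass = re.sub('|'.join(map(re.escape, k)), lambda m: d[m.group()], "hi bob")

print(s)            # bye bob
print(single_pass)  # ok bob
```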