Splitting list element into individual single elements in another list - python

Currently I have two lists, ListofComments and ListofWords. Each element of ListofComments contains several words. For example:
ListofComments[0] = 'I love python'
ListofComments[1] = 'I hate python'
But so far I have only been able to split the last element of ListofComments into individual words. Below is what I have currently.
for x in range(0, 58196):
    ListofWords = (re.sub(r"[^\w]", " ", ListofComments[x]).split())
I understand that perhaps another loop is needed, but I can't pinpoint how to go about solving this issue. The desired output would be:
ListofWords[0] = 'I'
ListofWords[1] = 'love'
ListofWords[2] = 'python'
ListofWords[3] = 'I'
ListofWords[4] = 'hate'
ListofWords[5] = 'python'

I believe your only problem is that you're overwriting ListofWords at every loop iteration, which is why at the end of the loop you only see words from the last element of ListofComments.
Try this:
ListofWords = []
for x in range(0, 58196):
    ListofWords.extend(re.sub(r"[^\w]", " ", ListofComments[x]).split())
EDIT:
As others suggested, you want to make sure you avoid an index out of range error. I didn't change the rest of your code, so that it is evident what had to be changed for it to work as you expected.
A simpler (and more robust) way to write the above would be:
ListofWords = []
for comment in ListofComments:
    ListofWords.extend(re.sub(r"[^\w]", " ", comment).split())
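To make the difference concrete, here is a minimal, self-contained sketch. The two sample comments come from the question and stand in for the real list (an assumption); the name overwritten is introduced only for illustration. It contrasts plain assignment, which keeps only the last comment's words, with extend, which accumulates them.

```python
import re

# Sample data standing in for the real ListofComments (assumption).
ListofComments = ['I love python', 'I hate python']

# Plain assignment: each iteration replaces the previous result.
overwritten = []
for comment in ListofComments:
    overwritten = re.sub(r"[^\w]", " ", comment).split()

# extend: each iteration adds the new words to the same list.
ListofWords = []
for comment in ListofComments:
    ListofWords.extend(re.sub(r"[^\w]", " ", comment).split())

print(overwritten)  # ['I', 'hate', 'python']
print(ListofWords)  # ['I', 'love', 'python', 'I', 'hate', 'python']
```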

If I understand correctly, this would solve your problem:
list_of_words = []
my_list = ["i love python3", "i hate python2"]
for sentence in my_list:
    words = sentence.split(" ")
    for word in words:
        list_of_words.append(word)
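The nested loop above can also be written as a single comprehension. The sketch below is one possible variant, not the answerer's code; the sample list has a deliberate double space (an assumption) to show that split(" ") produces empty strings where split() with no argument does not.

```python
# Sample data; the double space in the second sentence is deliberate.
my_list = ["i love python3", "i hate  python2"]

# Splitting on a single space keeps an empty string between the two spaces.
with_space = [word for sentence in my_list for word in sentence.split(" ")]

# Splitting with no argument treats any run of whitespace as one separator.
without_arg = [word for sentence in my_list for word in sentence.split()]

print(with_space)   # ['i', 'love', 'python3', 'i', 'hate', '', 'python2']
print(without_arg)  # ['i', 'love', 'python3', 'i', 'hate', 'python2']
```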

Your solution has two problems:
ListofWords is overwritten on every iteration
You may run out of range (the hard-coded 58196 can exceed the length of ListofComments)
Here's my solution:
import re
from functools import reduce
# split comments
split_comments = [re.sub(r"[^\w]", " ", c).split() for c in ListofComments]
# >>> [['I', 'love', 'python'], ['I', 'hate', 'python']]
# flatten list of lists
reduce(lambda x, y: x + y, split_comments)
# >>> ['I', 'love', 'python', 'I', 'hate', 'python']
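As an alternative way to flatten, itertools.chain.from_iterable from the standard library can be used instead of reduce. This is a sketch with the question's two sample comments standing in for the real data (an assumption). Unlike reduce without an initial value, it also works when the list of comments is empty, and it avoids building a new intermediate list at every step.

```python
import re
from itertools import chain

# Sample data standing in for the real ListofComments (assumption).
ListofComments = ['I love python', 'I hate python']

# Split each comment into its words (a list of lists).
split_comments = [re.sub(r"[^\w]", " ", c).split() for c in ListofComments]

# Flatten the list of lists into one list of words.
ListofWords = list(chain.from_iterable(split_comments))
print(ListofWords)  # ['I', 'love', 'python', 'I', 'hate', 'python']

# An empty input simply gives an empty result.
print(list(chain.from_iterable([])))  # []
```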

Related

I Would Like To Replace A Word With A Letter In Python

Code:
list = ['hello', 'world']
list2 = ['a', 'b']
string = 'hello'  # should output a
string_fin = ''
for s in string:
    for i, j in zip(list, list2):
        if s == i:
            string_fin += j
print(string_fin)
I want to put hello or world in string = '' and get the output a or b.
Instead I get an empty string.
I think this happens because hello and world have more characters than a and b; when I try something that has the same number of characters as a or b, it works.
Please help
Thanks
Your program's main loop never runs because string is empty! So your program is basically:
list = ['hello','world']
list2 = ['a','b']
string = ''
string_fin = ''
print(string_fin)
Based on how you worded your question, it is hard to tell what you are trying to accomplish, but here is my attempt.
You have two lists: list1 and list2. (Please do not name your list list. It is not a reserved keyword, but it shadows the built-in list type. Use list1 instead!)
You want to check whether each word in your string matches with any word in your first list.
If it matches you want to take the corresponding word or letter from your second list, and append it into the string string_fin.
Finally, when you have looped through all the words, you print the contents of string_fin.
The correct way to do this would be to split your string variable, and get each word stored in it.
string = 'hello or world'
stringWords = string.split()
Now, stringWords contains ['hello', 'or', 'world']. But I think you are not interested in the item or. So you can remove this item from the list, by using remove().
if 'or' in stringWords:
    stringWords.remove('or')
Now you have the words that you are interested in. And we want to check whether any word in the first list matches with these words. (Remember, I renamed the first list from list to list1 to prevent any unexpected behavior.)
for word in stringWords:
    tempIndex = list1.index(word)
    temp = list2[tempIndex]
    string_fin += temp
However, using index raises ValueError if a match is not found, so depending on your program logic, you may need to catch an exception and handle it.
The string string_fin will now contain ab or a or b depending on the value inside string.
Now, since you wanted to print something like a or b, you can instead create a list (tempList), store the matching words in it, and then join this list using ' or ' as the separator.
string_fin = ' or '.join(tempList)
A complete program now will look like this:
list1 = ['hello', 'world']
list2 = ['a', 'b']
string = 'hello or world'
tempList = []
stringWords = string.split()
if 'or' in stringWords:
    stringWords.remove('or')
for word in stringWords:
    tempIndex = list1.index(word)
    temp = list2[tempIndex]
    tempList.append(temp)
string_fin = ' or '.join(tempList)
print(string_fin)
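Since index raises ValueError for a word that is not in list1, a variant of the complete program that tolerates such words might look like the sketch below. Skipping unknown words is an assumption about the desired behavior; the input 'hello or planet' is made up to trigger that case.

```python
list1 = ['hello', 'world']
list2 = ['a', 'b']
string = 'hello or planet'  # 'planet' is deliberately not in list1

tempList = []
for word in string.split():
    if word == 'or':
        continue  # the separator word is not looked up
    try:
        tempIndex = list1.index(word)
    except ValueError:
        # Assumption: words without a match are silently skipped.
        continue
    tempList.append(list2[tempIndex])

string_fin = ' or '.join(tempList)
print(string_fin)  # a
```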
Better to store your lists as a dictionary, so you can do an easy lookup:
mapping = {'hello':'a', 'world':'b'}
string = 'hello or world'
out = []
for s in string.split():
    out.append(mapping.get(s, s))
print(' '.join(out))
Purists will note that the for loop can be made into a one-liner:
mapping = {'hello':'a', 'world':'b'}
string = 'hello or world'
out = ' '.join(mapping.get(s,s) for s in string.split())
print(out)

Creating a list of all words that are both in a given string and in a given dictionary

I'm working with a string and a dictionary in Python, trying to loop through the string in order to create a list of the words which appear both in the string and amongst the keys of the dictionary. What I have currently is:
## dictionary will be called "dict" below
sentence = "is this is even really a sentence"
wordsinboth = []
for w in sentence.split():
    if w in dict:
        wordsinboth += w
Instead of returning a list of words split by whitespace, however, this code returns a list of every character in the sentence.
The same thing occurs even when I attempt to create a list of split words before looping, as seen below:
sentence = "is this is even really a sentence"
wordsinboth = []
sent = sentence.split()
for w in sent:
    if w in dict:
        wordsinboth += w
I guess I'm not able to specify "if w in dict" and still split by whitespace? Any suggestions on how to fix this?
Use append instead of +=:
sentence = "is this is even really a sentence"
wordsinboth = []
for w in sentence.split():
    if w in dict:
        wordsinboth.append(w)
The += operator doesn't work as you'd expect:
a = []
myString = "hello"
a.append(myString)
print(a) # ['hello']
b = []
b += myString
print(b) # ['h', 'e', 'l', 'l', 'o']
If you're interested in why this happens, the following questions are a good read:
Why does += behave unexpectedly on lists?
What is the difference between Python's list methods append and extend?
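In short, += on a list behaves like extend: it iterates over the right-hand side and adds each item, and iterating over a string yields its characters. The following sketch (variable names are made up for illustration) shows the three cases side by side, plus the fact that the plain + operator refuses a string outright.

```python
# += iterates over the right-hand side, like extend.
chars = []
chars += "hi"
print(chars)  # ['h', 'i']

extended = []
extended.extend("hi")
print(extended)  # ['h', 'i']

# Wrapping the string in a list, or using append, adds it as one item.
words = []
words += ["hi"]
words.append("yo")
print(words)  # ['hi', 'yo']

# Plain + does not accept a string on the right-hand side at all.
try:
    [] + "hi"
    plus_failed = False
except TypeError:
    plus_failed = True
print(plus_failed)  # True
```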
Also, note that using list comprehensions might result in a more elegant solution to your problem:
wordsinboth = [word for word in sentence.split() if word in dict]
You can use += on a list, but you must add a list to it, not a value, otherwise the value gets converted to a list before being added. In your case, the w strings are being converted to a list of all the characters in them (e.g. 'if' => ['i', 'f']). To work around that, make the value into a list by adding [] around it:
for w in sentence.split():
    if w in dict:
        wordsinboth += [w]
A list comprehension is a shorter and more elegant way to do this in your case:
wordsinboth = [word for word in sentence.split() if word in dict]
The problem in your loop is that you have to use append to add a new item to wordsinboth instead of the += operator. Also keep in mind that this can create duplicates; if you need unique items, you can build a set instead, which gives you the unique words.
Like this:
wordsinboth = {word for word in sentence.split() if word in dict}
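Note that a set does not keep the words in sentence order. If order matters, one option is dict.fromkeys, which keeps the first occurrence of each key in insertion order. The sketch below uses a made-up dictionary named known_words (an assumption, since the question does not show its dictionary).

```python
sentence = "is this is even really a sentence"

# Hypothetical dictionary; only its keys matter for the membership test.
known_words = {"is": 1, "a": 2, "sentence": 3}

with_duplicates = [w for w in sentence.split() if w in known_words]
print(with_duplicates)  # ['is', 'is', 'a', 'sentence']

# A set removes duplicates but does not preserve the original order.
unique_unordered = set(with_duplicates)

# dict.fromkeys removes duplicates and keeps first-seen order.
unique_ordered = list(dict.fromkeys(with_duplicates))
print(unique_ordered)  # ['is', 'a', 'sentence']
```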

remove a string in list

I have a nested list (l1) containing several lists of sentences (i) and I want to remove a particular string from the second element in (i) when the split() method is applied to it. However it seems that only half of them are removed and the rest remain.
This is what I have tried:
for i in l1:
    for j in i[1].split():
        if j == 'a':
            i[1].split().remove(j)
I have also tried to replace (j) with an empty string, but it wasn't helpful either.
example input: [[string1, This is a book], [string2, He is a tall man], ...]
example output:
This is book, He is tall man
You can't mutate a string, and each call to i[1].split() builds a new temporary list, so removing items from it never changes i[1]. That is why your example won't work.
But you can mutate lists, so you could split the string on whitespace, ignore 'a' tokens and join it back together again:
for i in l1:
    i[1] = ' '.join(p for p in i[1].split() if p != 'a')
This would eat any extra whitespace in the original string, but I'm assuming that's not a concern for you here.
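If the original spacing does matter, a regular expression can remove the standalone word without re-joining the string. The following is only a sketch: the pattern is an assumption about what counts as a standalone 'a' (an 'a' with whitespace or a string boundary on both sides), and the sample data has a deliberate double space to show the difference.

```python
import re

# Sample data; the double space in the first sentence is deliberate.
l1 = [["string1", "This  is a book"], ["string2", "He is a tall man"]]

# split/join collapses every run of whitespace to a single space.
joined = [' '.join(p for p in s.split() if p != 'a') for _, s in l1]
print(joined)  # ['This is book', 'He is tall man']

# Regex variant: drop a standalone 'a' plus one following whitespace
# character, leaving all other spacing untouched.
kept = [re.sub(r'(?<!\S)a(?!\S)\s?', '', s) for _, s in l1]
print(kept)  # ['This  is book', 'He is tall man']
```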
The following code:
l1 = [
    [None, "This is a dog"],
    [None, "He is a tall man"],
]
for i in l1:
    sentence = i[1]
    new_sentence = []
    for word in sentence.split():
        if word == 'a':
            continue
        new_sentence.append(word)
    i[1] = " ".join(new_sentence)
print(l1)
will result in
[[None, 'This is dog'], [None, 'He is tall man']]
If I understand you properly, this should accomplish what you're trying to do.
for i in l1:
    if 'a' in i[1]:
        i[1] = i[1].replace(' a ', ' ')
A solution that is a little more efficient, I think:
word_search = 'a'
for i in l1:
    split_sentence = i[1].split()
    if word_search in split_sentence:
        while word_search in split_sentence:
            split_sentence.remove(word_search)
        i[1] = ' '.join(split_sentence)
This solution doesn't explicitly loop over the words and skips the join when the searched word is not found in the sentence.
Whether it is actually faster depends on how often the searched word occurs in the sentences, I guess.

Python possible list comprehension

I have a text file and two lists of strings.
The first list is the keyword list
k = ['hi', 'bob']
The second list is the words I want to replace the keywords with
r = ['ok', 'bye']
I want to take the text file as input and replace each keyword in k with the corresponding word in r. Thus, "hi, how are you bob" would be changed to "ok, how are you bye".
Let's say you have already parsed your sentence:
sentence = ['hi', 'how', 'are', 'you', 'bob']
What you want to do is to check whether each word in this sentence is present in k. If yes, replace it by the corresponding element in r; else, use the actual word. In other words:
if word in k:
    word_index = k.index(word)
    new_word = r[word_index]
else:
    new_word = word
This can be written in a more concise way:
new_word = r[k.index(word)] if word in k else word
Using list comprehensions, here's how you go about processing the whole sentence:
new_sentence = [r[k.index(word)] if word in k else word for word in sentence]
new_sentence is now equal to ['ok', 'how', 'are', 'you', 'bye'] (which is what you want).
Note that in the code above we perform two equivalent search operations: word in k and k.index(word). This is inefficient. These two operations can be reduced to one by catching exceptions from the index method:
def get_new_word(word, k, r):
    try:
        # list.index raises ValueError when word is not in k
        word_index = k.index(word)
        return r[word_index]
    except ValueError:
        return word
new_sentence = [get_new_word(word, k, r) for word in sentence]
Now, you should also note that searching for word in k is a search with O(n) complexity (where n is the number of keywords). Thus the complexity of this algorithm is O(n·m) (where m is the sentence length). You can reduce this complexity to O(m) by using a more appropriate data structure, as suggested by the other comments. This is left as an exercise :-p
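One possible take on that exercise, sketched under the assumption that k holds no duplicate keywords: build a dictionary from the two lists once, then use dict.get, which returns the word itself when it is not a keyword. Each lookup then takes constant time on average. The name mapping is introduced here for illustration.

```python
k = ['hi', 'bob']
r = ['ok', 'bye']
sentence = ['hi', 'how', 'are', 'you', 'bob']

# Build the keyword -> replacement lookup table once.
mapping = dict(zip(k, r))

# dict.get falls back to the word itself when it is not a keyword.
new_sentence = [mapping.get(word, word) for word in sentence]
print(new_sentence)  # ['ok', 'how', 'are', 'you', 'bye']
```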
I'll assume you've got the "reading string from file" part covered, so about that "replacing multiple strings" part: First, as suggested by Martijn, you can create a dictionary, mapping keys to replacements, using dict and zip.
>>> k = ["hi", "bob"]
>>> r = ["ok", "bye"]
>>> d = dict(zip(k, r))
Now, one way to replace all those keys at once would be to use a regular expression, being a disjunction of all those keys, i.e. "hi|bob" in your example, and using re.sub with a replacement function, looking up the respective key in that dictionary.
>>> import re
>>> re.sub('|'.join(k), lambda m: d[m.group()], "hi, how are you bob")
'ok, how are you bye'
Alternatively, you can just use a loop to replace each key-replacement pair one after the other:
s = "hi, how are you bob"
for (x, y) in zip(k, r):
    s = s.replace(x, y)
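Both approaches above match substrings, so a keyword such as hi is also replaced inside a longer word such as this. If only whole words should be replaced, one option is to escape the keys and wrap the alternation in \b word boundaries. This is a sketch with a made-up input sentence; the names naive and whole_words are for illustration only.

```python
import re

k = ["hi", "bob"]
r = ["ok", "bye"]
d = dict(zip(k, r))
text = "hi, this is bob"

# Plain alternation also matches the 'hi' inside 'this'.
naive = re.sub('|'.join(k), lambda m: d[m.group()], text)
print(naive)  # ok, toks is bye

# Escaped keys wrapped in word boundaries match whole words only.
pattern = r'\b(?:' + '|'.join(re.escape(key) for key in k) + r')\b'
whole_words = re.sub(pattern, lambda m: d[m.group()], text)
print(whole_words)  # ok, this is bye
```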

Delete strings from a list who do not contain certain words?

For example something like this (although it doesn't work):
list1 = ['hello, how are you?', 'well, who are you', 'what do you want']
desiredwords = ['hello', 'well']
list2 = [x for x in list1 if any(word in list1 for word in desiredwords) in x]
print(list2)
['hello, how are you?', 'well, who are you']  # Desired output
Anyone know how to do this?
You're calling any on the wrong generator expression. You want:
list2 = [x for x in list1 if any(word in x for word in desiredwords)]
The difference here is that in your question you're evaluating whether or not any word in your desired words list is a member of list1 (they're not), and then testing whether False (the output of your any call) is in the element of list that you're testing. This of course does not work.
My any version instead checks words in the desired words list against the element under consideration, using the output of any to filter the list.
Note that the in operator on strings does substring matching, so this approach will count "oilwell" as matching "well". If you want that behavior, fine. If not, it gets harder.
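One way to handle the harder case is a regular expression with \b word boundaries around the desired words. The sketch below adds a fourth, made-up sentence containing "oilwell" to show the difference; the names substring_matches and whole_word_matches are for illustration only.

```python
import re

list1 = ['hello, how are you?', 'well, who are you',
         'what do you want', 'the oilwell is dry']
desiredwords = ['hello', 'well']

# Substring matching: 'oilwell' counts as containing 'well'.
substring_matches = [x for x in list1
                     if any(word in x for word in desiredwords)]

# Whole-word matching: \b requires a word boundary on both sides.
pattern = re.compile(
    r'\b(?:' + '|'.join(map(re.escape, desiredwords)) + r')\b')
whole_word_matches = [x for x in list1 if pattern.search(x)]

print(substring_matches)
# ['hello, how are you?', 'well, who are you', 'the oilwell is dry']
print(whole_word_matches)
# ['hello, how are you?', 'well, who are you']
```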
