Matching String in a List of Strings - python

I want to build a new list 'T' containing every string in the list 'Z' in which each element of the list 'Word' appears as a separate word.
For example, in the following case I want T = ['Hi x']:
Word = ['x']
Z = ['Hi xo xo','Hi x','yoyo','yox']
I tried the following code, but it gives me every sentence containing a word with 'x' in it, whereas I only want the sentences that have 'x' as a separate word.
for i in Z:
    for v in i:
        if v in Word:
            print(i)

Just another pythonic way
[phrase for phrase in Z for w in Word if w in phrase.split()]
['Hi x']

You can do it with list comprehension.
>>> [i for i in Z if any(w.lower() == j.lower() for j in i.split() for w in Word)]
['Hi x']
Edit:
Or you can do:
>>> [i for i in Z for w in Word if w.lower() in map(lambda x:x.lower(),i.split())]
['Hi x']

If you want to print all strings from Z that contain a word from Word:
Word = ['xo']
Z = ['Hi xo xo','Hi x','yoyo','yox']
res = []
for i in Z:
    for v in i.split():
        if v in Word:
            res.append(i)
            break
print(res)
Notice the break. Without it, a string from Z would be added more than once whenever two of its words match, as happens with the two xo in 'Hi xo xo' in this example.
The i.split() expression splits i into words on whitespace.
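The loop above can also be wrapped in a small function. This is only a sketch: the name phrases_with_any_word is made up for illustration, and it swaps the inner loop and break for a set check, which gives the same "at most once per phrase" behaviour.

```python
def phrases_with_any_word(phrases, words):
    """Return the phrases that contain at least one of `words` as a whole word."""
    wanted = set(words)
    # isdisjoint() is False as soon as one token of the phrase is in `wanted`,
    # so each phrase is kept at most once (the same effect as the `break`).
    return [p for p in phrases if not wanted.isdisjoint(p.split())]


Z = ['Hi xo xo', 'Hi x', 'yoyo', 'yox']
print(phrases_with_any_word(Z, ['xo']))  # ['Hi xo xo']
print(phrases_with_any_word(Z, ['x']))   # ['Hi x']
```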

words = ['x']
phrases = ['Hi xo xo','Hi x','yoyo','yox']
for phrase in phrases:
    for word in words:
        if word in phrase.split():
            print(phrase)

If you store Word as a set instead of a list, you can use set operations for the check. The following splits every string on whitespace, builds a set from its words, and checks whether Word is a subset of it.
>>> Z = ['Hi xo xo','Hi x','yoyo','yox']
>>> Word = {'x'}
>>> [s for s in Z if Word <= set(s.split())]
['Hi x']
>>> Word = {'Hi', 'x'}
>>> [s for s in Z if Word <= set(s.split())]
['Hi x']
Above, <= is the same as set.issubset.
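Note that the subset test requires every word in Word to be present. If matching any one word is enough, an intersection test may be closer to what is wanted. A small sketch of the difference, with a Word set chosen here purely for illustration:

```python
Z = ['Hi xo xo', 'Hi x', 'yoyo', 'yox']
Word = {'x', 'yoyo'}

# Subset test: every word in Word must be a token of the string.
all_words = [s for s in Z if Word.issubset(s.split())]

# Intersection test: at least one word in Word must be a token.
any_word = [s for s in Z if Word & set(s.split())]

print(all_words)  # []
print(any_word)   # ['Hi x', 'yoyo']
```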

Related

How to remove a word from a list with a specific character in a specific index position

this is what I have so far:
wlist = [word for word in wlist if not any(map(lambda x: x in word, 'c'))]
This code works, but in its current state it removes every string in wlist that contains 'c'. I would like to be able to specify an index position. For example, if
wlist = ['snake', 'cat', 'shock']
wlist = [word for word in wlist if not any(map(lambda x: x in word, 'c'))]
and I select index position 3, then only 'shock' should be removed, since 'shock' is the only string with 'c' at index 3. The current code removes both 'cat' and 'shock'. I have no idea how to integrate this and would appreciate any help, thanks.
Simply use slicing:
out = [w for w in wlist if w[3:4] != 'c']
Output: ['snake', 'cat']
You could probably use regular expressions, but I don't know them well, so this just iterates through the list of words.
for i in wlist[:]:  # iterate over a copy, because wlist is modified inside the loop
    try:
        if i[3] == 'c':
            wlist.remove(i)
    except IndexError:
        continue
You only need to check the character at index 3 of each string. The slice [<selected_char>:<selected_char>+1] picks out just that one character and, unlike plain indexing, returns an empty string instead of raising IndexError when the string is too short. More about slicing: https://stackoverflow.com/a/509295/11502612
Example code:
checking_char_index = 3
wlist = ["snake", "cat", "shock", "very_long_string_with_c_char", "x", "c"]
out = [word for word in wlist if word[checking_char_index : checking_char_index + 1] != "c"]
print(out)
I have extended your list with some corner-case strings, and it works as expected:
Output:
$ python3 test.py
['snake', 'cat', 'very_long_string_with_c_char', 'x', 'c']
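The slicing approach can be generalised to any character and position. The helper below is a sketch only: the name drop_words_with_char_at is hypothetical, and it assumes a non-negative index.

```python
def drop_words_with_char_at(words, char, index):
    """Keep the words whose character at `index` is not `char`.

    Assumes index >= 0.
    """
    # Slicing never raises IndexError: for a word that is too short,
    # word[index:index + 1] is simply the empty string.
    return [w for w in words if w[index:index + 1] != char]


wlist = ['snake', 'cat', 'shock']
print(drop_words_with_char_at(wlist, 'c', 3))  # ['snake', 'cat']
print(drop_words_with_char_at(wlist, 'c', 0))  # ['snake', 'shock']
```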

I am struggling with understanding the following list comprehension

Can someone please rewrite the following list comprehension as simple for loops and statements?
new_words = ' '.join([word for word in line.split() if not
any([phrase in word for phrase in char_list])])
I wrote the above list comprehension in the following code but it doesn't work.
new_list = []
for line in in_list:
    for word in line.split():
        for phrase in char_list:
            if not phrase in word:
                new_list.append(word)
return new_list
Thanks
new_words = ' '.join(
    [
        word for word in line.split()
        if not any(
            [phrase in word for phrase in char_list]
        )
    ]
)
is more or less equivalent to this:
new_list = []
for word in line.split():
    phrases_in_word = []
    for phrase in char_list:
        # (phrase in word) returns a boolean True or False
        phrases_in_word.append(phrase in word)
    if not any(phrases_in_word):
        new_list.append(word)
new_words = ' '.join(new_list)
new_words = ' '.join([word for word in line.split()
                      if not any([phrase in word for phrase in char_list])])
is the equivalent of:
lst = []
for word in line.split():
    for phrase in char_list:
        if phrase in word:
            break
    else:  # no break: none of the phrases occurs in this word
        lst.append(word)
new_words = ' '.join(lst)
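To check that the comprehension and the for/else loop really agree, both can be run on the same input. The sample line and char_list below are made up for illustration, and the comprehension is written with a generator inside any() so that it can stop at the first match.

```python
line = 'keep this drop#that and $skip'
char_list = ['#', '$']

# Comprehension form; any() stops at the first phrase found in the word.
new_words = ' '.join(
    word for word in line.split()
    if not any(phrase in word for phrase in char_list)
)

# Explicit loop form using for/else.
kept = []
for word in line.split():
    for phrase in char_list:
        if phrase in word:
            break
    else:  # no break: none of the phrases occurs in this word
        kept.append(word)

print(new_words)  # keep this and
```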

Split string into pair

What would be the best/easiest way to split a string into pairs of words?
Ex:
string = "This is a string"
Output:
["This is", "is a", "a string"]
>>> import itertools
>>> a, b = itertools.tee('this is a string'.split());
>>> next(b, None)
>>> [' '.join(words) for words in zip(a, b)]
['this is', 'is a', 'a string']
string_list = string.split()
result = [f'{string_list[i]} {string_list[i+1]}' for i in range(len(string_list) - 1)]
words = string.split()
output = []
i = 1
while i < len(words):
    cur_word = words[i]
    prev_word = words[i - 1]
    output.append(f"{prev_word} {cur_word}")
    i += 1
You can use the zip function, with its first argument the whole list of words and its second the list of words without the first word. Since zip aggregates elements from each of the iterables, this will connect each word with its next in the list:
string = "This is a string"
zipped_lst = zip(string.split(), string.split()[1:])
print(list(zipped_lst))
This outputs
[('This', 'is'), ('is', 'a'), ('a', 'string')]
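The zip call produces tuples, while the question asks for strings. A minimal sketch of the extra step, joining each pair with a space:

```python
string = "This is a string"
words = string.split()

# Pair each word with its successor, then join every pair into one string.
pairs = [' '.join(pair) for pair in zip(words, words[1:])]
print(pairs)  # ['This is', 'is a', 'a string']
```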

Iterating through list and using remove() doesn't produce desired result

I’m a programming neophyte and would like some assistance in understanding why the following algorithm is behaving in a particular manner.
My objective is for the function to read in a text file containing words (can be capitalized), strip the whitespace, split the items into separate lines, convert all capital first characters to lowercase, remove all single characters (e.g., “a”, “b”, “c”, etc.), and add the resulting words to a list. All words are to be a separate item in the list for further processing.
Input file:
A text file (‘sample.txt’) contains the following data - “a apple b Banana c cherry”
Desired output:
[‘apple’, ‘banana’, ‘cherry’]
In my initial attempt I tried to iterate through the list of words to test if their length was equal to 1. If so, the word was to be removed from the list, with the other words remaining in the list. This resulted in the following, non-desired output: [None, None, None]
filename = 'sample.txt'
with open(filename) as input_file:
    word_list = input_file.read().strip().split(' ')
word_list = [word.lower() for word in word_list]
word_list = [word_list.remove(word) for word in word_list if len(word) == 1]
print(word_list)
Produced non-desired output = [None, None, None]
My next attempt was to instead iterate through the list for words to test if their length was greater than 1. If so, the word was to be added to the list (leaving the single characters behind). The desired output was achieved using this method.
filename = 'sample.txt'
with open(filename) as input_file:
    word_list = input_file.read().strip().split(' ')
word_list = [word.lower() for word in word_list]
word_list = [word for word in word_list if len(word) > 1]
print(word_list)
Produced desired Output = [‘apple’, ‘banana’, ‘cherry’]
My questions are:
Why didn’t the initial code produce the desired result when it seemed to be the most logical and most efficient?
What is the best ‘Pythonic’ way to achieve the desired result?
You got that output for two reasons:
You're removing items from the list while you're looping through it.
You're using the return value of list.remove, which modifies the list in place and returns None.
Your last list comprehension (word_list = [word_list.remove(word) for word in word_list if len(word) == 1]) is essentially equivalent to this:
new_word_list = []
for word in word_list:
    if len(word) == 1:
        new_word_list.append(word_list.remove(word))
word_list = new_word_list
And as you loop through it this happens:
# word_list == ['a', 'apple', 'b', 'banana', 'c', 'cherry']
# new_word_list == []
word = word_list[0] # word == 'a'
new_word_list.append(word_list.remove(word))
# word_list == ['apple', 'b', 'banana', 'c', 'cherry']
# new_word_list == [None]
word = word_list[1] # word == 'b'
new_word_list.append(word_list.remove(word))
# word_list == ['apple', 'banana', 'c', 'cherry']
# new_word_list == [None, None]
word = word_list[2] # word == 'c'
new_word_list.append(word_list.remove(word))
# word_list == ['apple', 'banana', 'cherry']
# new_word_list == [None, None, None]
word_list = new_word_list
# word_list == [None, None, None]
The best 'Pythonic' way to do this (in my opinion) would be:
with open('sample.txt') as input_file:
    file_content = input_file.read()
word_list = []
for word in file_content.strip().split(' '):
    if len(word) == 1:
        continue
    word_list.append(word.lower())
print(word_list)
In your first approach, you are storing the result of word_list.remove(word) in the list, and that result is None, because list.remove() returns nothing and only modifies the list it is called on.
Your second approach is the pythonic way to achieve your goal.
The second attempt is the most pythonic. The first one can still be achieved with the following:
filename = 'sample.txt'
with open(filename) as input_file:
    word_list = input_file.read().strip().split(' ')
word_list = [word.lower() for word in word_list]
for word in word_list[:]:  # iterate over a copy so that removals don't skip items
    if len(word) == 1:
        word_list.remove(word)
print(word_list)
Why didn’t the initial code produce the desired result when it seemed to be the most logical and most efficient?
You should never alter a list while iterating over it. The loop walks the list by position, so each time an item is removed the remaining items shift left and the loop skips the item that moved into the current position.
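A short sketch of what can go wrong when items are removed during iteration. The input list here is made up, with two single-character words next to each other so that the problem shows up:

```python
word_list = ['a', 'b', 'apple']
for word in word_list:
    if len(word) == 1:
        word_list.remove(word)
# 'b' slid into position 0 after 'a' was removed, so the loop never saw it.
print(word_list)  # ['b', 'apple']

safe_list = ['a', 'b', 'apple']
for word in safe_list[:]:  # iterate over a copy, mutate the original
    if len(word) == 1:
        safe_list.remove(word)
print(safe_list)  # ['apple']
```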
What is the best ‘Pythonic’ way to achieve the desired result?
Your second attempt. But I'd use better names, and the two comprehensions can be combined, since the first one only lowercases the words:
word_list = input_file.read().strip().split(' ')
filtered_word_list = [word.lower() for word in word_list if len(word) > 1]
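One possible refinement, shown as a sketch: calling split() with no argument splits on any run of whitespace, so double spaces and line breaks in the file do not leak into the words. The file content is inlined here as a made-up string so the example runs without sample.txt.

```python
# Stand-in for input_file.read(); note the double space and the newlines.
file_content = "a apple  b Banana\nc cherry\n"

# split() with no argument handles spaces, tabs and newlines alike,
# and never yields empty strings.
filtered_word_list = [word.lower() for word in file_content.split() if len(word) > 1]
print(filtered_word_list)  # ['apple', 'banana', 'cherry']
```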

Write a for loop to remove punctuation

I've been tasked with writing a for loop to remove some punctuation in a list of strings, storing the answers in a new list. I know how to do this with one string, but not in a loop.
For example: phrases = ['hi there!', 'thanks!'] etc.
import string
new_phrases = []
for i in phrases:
    if i not in string.punctuation
Then I get a bit stuck at this point. Do I append? I've tried yield and return, but realised that's for functions.
You can either update your current list in place or append the new values to another list. Updating in place avoids allocating a second list, whereas appending builds a new list of n items.
phrases = ['hi there!', 'thanks!']
i = 0
for el in phrases:
    new_el = el.replace("!", "")
    phrases[i] = new_el
    i += 1
print(phrases)
This gives the output: ['hi there', 'thanks']
Give this a go:
import re
new_phrases = []
for word in phrases:
    new_phrases.append(re.sub(r'[^\w\s]', '', word))
This uses the re module to replace every character that is neither a word character nor whitespace with an empty string, which effectively removes the punctuation.
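One detail worth knowing about this pattern: \w matches letters, digits and the underscore, so underscores survive even though string.punctuation lists them. A small sketch with an extra made-up phrase to show this:

```python
import re

phrases = ['hi there!', 'thanks!', 'snake_case, really?']

# [^\w\s] matches anything that is not a word character or whitespace;
# the underscore counts as a word character and is therefore kept.
new_phrases = [re.sub(r'[^\w\s]', '', phrase) for phrase in phrases]
print(new_phrases)  # ['hi there', 'thanks', 'snake_case really']
```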
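You can use the re module and a list comprehension to do it in a single line. Note the re.escape call: string.punctuation contains characters such as ], \ and ^ that have a special meaning inside a character class.
phrases = ['hi there!', 'thanks!']
import string
import re
new_phrases = [re.sub('[{}]'.format(re.escape(string.punctuation)), '', i) for i in phrases]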
new_phrases
#['hi there', 'thanks']
If a phrase contains any punctuation character, replace it with "" and then append the cleaned phrase to new_phrases:
import string
new_phrases = []
phrases = ['hi there!', 'thanks!']
for i in phrases:
    for pun in string.punctuation:
        if pun in i:
            i = i.replace(pun, "")
    new_phrases.append(i)
print(new_phrases)
OUTPUT
['hi there', 'thanks']
Following your way of thinking, I would do it like this:
for word in phrases:  # for each phrase
    for punct in string.punctuation:  # for each punctuation character
        word = word.replace(punct, '')  # remove that punctuation character
    new_phrases.append(word)  # add the punctuation-free text to the output
I suggest using the powerful translate() method on each string of your input list, which seems well suited to the task. It gives the following code, which iterates over the input list with a list comprehension and is short and easily readable:
import string
phrases = ['hi there!', 'thanks!']
translationRule = str.maketrans({k:"" for k in string.punctuation})
new_phrases = [phrase.translate(translationRule) for phrase in phrases]
print(new_phrases)
# ['hi there', 'thanks']
Or to only allow spaces and letters:
phrases=[''.join(x for x in i if x.isalpha() or x==' ') for i in phrases]
Now:
print(phrases)
Is:
['hi there', 'thanks']
You should use a list comprehension, where process stands for whatever function cleans a single string:
new_list = [process(string) for string in phrases]
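The answer leaves process undefined. One way it might look, as a sketch only: the body below is an assumption, not the answerer's code, and it uses the three-argument form of str.maketrans, whose third argument lists characters to delete.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans('', '', string.punctuation)


def process(text):
    """Hypothetical helper: return `text` with ASCII punctuation removed."""
    return text.translate(_STRIP_PUNCT)


phrases = ['hi there!', 'thanks!']
new_list = [process(phrase) for phrase in phrases]
print(new_list)  # ['hi there', 'thanks']
```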
