I want to input a letter and return all the words that contain that letter. For example:
String: "I saw a frog in my garden"
input: g
output: frog, garden
How could I do this in Python?
I don't know what you mean regarding dictionaries (you may be misunderstanding them), but I would just split the string into words and then check whether the letter is in each one, using a list comprehension.
>>> String = "I saw a frog in my garden"
>>> letter = 'g'
>>> [w for w in String.split() if letter in w]
['frog', 'garden']
That seems to be what you want.
It is quite useful to know which letter the list represents:
contains = {}
contains[letter] = [w for w in String.split() if letter in w]
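If several letters will be looked up, the same idea can be extended to an index that is built once for the whole sentence. This is only a minimal sketch, assuming words are separated by whitespace and matching is case-sensitive; `build_letter_index` is a hypothetical helper name, not something from the question:

```python
from collections import defaultdict

def build_letter_index(sentence):
    """Map each character to the words of `sentence` that contain it."""
    index = defaultdict(list)
    for word in sentence.split():
        # set(word) avoids listing a word twice when a letter repeats in it
        for ch in set(word):
            index[ch].append(word)
    return index

index = build_letter_index("I saw a frog in my garden")
print(index["g"])  # ['frog', 'garden']
```

Building the index costs one pass over the text; each later lookup is then a single dictionary access.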
I am assuming that you have split the string into a list of words and created a dictionary using those words as keys. Given that, the following function takes a dictionary and a character and returns a list of the keys of that dictionary which contain that character:
def keys_have_char(dict, char):
    return [key for key in dict.keys() if char in key]
Notice that I haven't added any checks, so this assumes that dict is indeed a dictionary and will work not only with single chars, but with any substrings as well.
I have a long list of strings called words. All the strings have the same length, which is 5. For any i in {0,...,4} we can check to see if the i'th letter of a string is also the i'th letter of any other string in the list. We call a letter at the i'th position unique if it is not.
I would like to remove all strings from my list for which there exists any i for which the i'th letter is unique.
As a very simple example consider: words = ["apple", "amber", "bpple", "bmber", "appld"]. The string "appld" should be removed because d is unique at the 4th position.
Is there a neat way to do this?
Using collections.Counter and the zip(*...) transposition idiom:
from collections import Counter
# counts for every index
bypos = [*map(Counter, zip(*words))]
# disallow count 1 for any letter x in its respective position
words = [w for w in words if all(c[x]!=1 for x, c in zip(w, bypos))]
# ['apple', 'amber', 'bpple', 'bmber']
Note that it is better to rebuild the list in a single iteration than to remove elements repeatedly.
Some docs on the utils used here:
collections.Counter
map
zip
all
unpacking operator *
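To make the intermediate values of the Counter/zip approach easier to follow, here is the same computation spelled out step by step on the example list from the question. The variable names `columns` and `kept` are chosen for illustration only:

```python
from collections import Counter

words = ["apple", "amber", "bpple", "bmber", "appld"]

# zip(*words) transposes the list: one tuple of letters per position
columns = list(zip(*words))
print(columns[0])  # ('a', 'a', 'b', 'b', 'a')

# one Counter per position, counting how often each letter occurs there
bypos = [Counter(col) for col in columns]
print(bypos[4]["d"])  # 1, so 'd' is unique at index 4

# keep a word only if none of its letters has count 1 at its position
kept = [w for w in words if all(c[x] != 1 for x, c in zip(w, bypos))]
print(kept)  # ['apple', 'amber', 'bpple', 'bmber']
```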
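This can be done in O(N log N), at the cost of extra memory:
d = {}
for j, w in enumerate(words):
    for i, c in enumerate(w):
        if (i, c) in d:
            d[(i, c)].append(j)
        else:
            d[(i, c)] = [j]
# pop from the highest index down so the remaining indices stay valid;
# a set has no guaranteed order, so sort explicitly
for i in sorted({v[0] for v in d.values() if len(v) == 1}, reverse=True):
    words.pop(i)
print(words)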
I need to write a function that goes through a string and checks whether each word exists in a list (the remove list). If the word is in that list it should be removed; otherwise it should be left alone.
I wrote this:
def remove_make(x):
    a = x.split()
    for word in a:
        if word in remove: # True
            a = a.remove(word)
        else:
            pass
        return a
But it returns back the string with the (Remove) word still in there. Any idea how I can achieve this?
A terser way of doing this would be to form a regex alternation based on the list of words to remove, and then do a single regex substitution:
import re
inp = "one two three four"
remove = ['two', 'four']
# escape each word and anchor on word boundaries so only whole words match
regex = r'\s*\b(?:' + '|'.join(map(re.escape, remove)) + r')\b\s*'
out = re.sub(regex, ' ', inp).strip()
print(out) # prints 'one three'
You can try something simpler:
import re
remove_list = ['abc', 'cde', 'edf']
string = 'abc is walking with cde, wishing good luck to edf.'
''.join([x for x in re.split(r'(\W+)', string) if x not in remove_list])
And the result would be:
' is walking with , wishing good luck to .'
The important part is the last line:
''.join([x for x in re.split(r'(\W+)', string) if x not in remove_list])
What it does:
You convert the string to a list of words with re.split(r'(\W+)', string), preserving all the whitespace and punctuation as list items.
You create another list with a list comprehension, keeping only the items that are not in remove_list.
You convert the resulting list back to a string with str.join().
The BNF notation for list comprehensions and a little bit more information on them may be found here
PS: Of course, you can make this a little more readable by breaking the one-liner into pieces: assign the result of re.split(r'(\W+)', string) to a variable and decouple the join from the comprehension.
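Broken into separate steps, the one-liner could look roughly like this. The intermediate names `tokens`, `kept` and `result` are illustrative and do not appear in the answer above:

```python
import re

remove_list = ['abc', 'cde', 'edf']
string = 'abc is walking with cde, wishing good luck to edf.'

# The capturing group keeps the separators (spaces, punctuation) as list items
tokens = re.split(r'(\W+)', string)

# Drop only the tokens that are exactly one of the unwanted words
kept = [tok for tok in tokens if tok not in remove_list]

# Separators were preserved, so joining on '' restores the original spacing
result = ''.join(kept)
print(repr(result))  # ' is walking with , wishing good luck to .'
```

Note that the surrounding spaces and punctuation of the removed words stay in place, which is why the result starts with a space.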
You can create a new list without the words you want to remove and then use the join() function to concatenate all the words in that list. Try:
def remove_words(string, rmlist):
    final_list = []
    word_list = string.split()
    for word in word_list:
        if word not in rmlist:
            final_list.append(word)
    return ' '.join(final_list)
list.remove(x) returns None and modifies the list in place by removing the first occurrence of x (it raises ValueError if x is not in the list). When you do
a = a.remove(word)
you are effectively storing None in a, and this would raise an exception in the next iteration when you call a.remove(word) again (None.remove(word) is invalid). You don't see that either, because you return immediately after the conditional, inside the loop; the return has to come after the loop has finished, outside its scope. This is how your function should look (without modifying a list while iterating over it):
remove_words = ["abc", ...] # your list of words to be removed
def remove_make(x):
    a = x.split()
    temp = a[:]
    for word in temp:
        if word in remove_words: # True
            a.remove(word)
    # no need of 'else' also, 'return' outside the loop's scope
    return " ".join(a)
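The copy-then-remove loop can also be avoided altogether by building the result directly. A possible sketch, assuming it is acceptable to pass the remove list as a second parameter (the original function reads it from a global):

```python
def remove_make(text, remove_words):
    """Return `text` without the whitespace-separated words in `remove_words`."""
    unwanted = set(remove_words)  # set membership tests are O(1) on average
    return " ".join(word for word in text.split() if word not in unwanted)

print(remove_make("one two three four", ["two", "four"]))  # one three
```

Because nothing is removed from a list while it is being iterated, there is no need for the temporary copy.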
I'm working with a string and a dictionary in Python, trying to loop through the string in order to create a list of the words which appear both in the string and amongst the keys of the dictionary. What I have currently is:
## dictionary will be called "dict" below
sentence = "is this is even really a sentence"
wordsinboth = []
for w in sentence.split():
    if w in dict:
        wordsinboth += w
Instead of returning a list of words split by whitespace, however, this code returns a list of every character in the sentence.
The same thing occurs even when I attempt to create a list of split words before looping, as seen below:
sentence = "is this is even really a sentence"
wordsinboth = []
sent = sentence.split()
for w in sent:
    if w in dict:
        wordsinboth += w
I guess I'm not able to specify "if w in dict" and still split by whitespace? Any suggestions on how to fix this?
Use append instead of +=:
sentence = "is this is even really a sentence"
wordsinboth = []
for w in sentence.split():
    if w in dict:
        wordsinboth.append(w)
The += operator doesn't work as you'd expect:
a = []
myString = "hello"
a.append(myString)
print(a) # ['hello']
b = []
b += myString
print(b) # ['h', 'e', 'l', 'l', 'o']
If you're interested in why this happens, the following questions are a good read:
Why does += behave unexpectedly on lists?
What is the difference between Python's list methods append and extend?
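In short, `+=` on a list behaves like `list.extend`, which iterates over its argument, and iterating over a string yields its characters. A small illustration with made-up variable names:

```python
chars = []
chars += "hi"          # same effect as chars.extend("hi")
print(chars)           # ['h', 'i']

words = []
words.extend(["hi"])   # extending with a one-element list adds the whole string
words.append("there")  # append always adds exactly one item
print(words)           # ['hi', 'there']
```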
Also, note that using list comprehensions might result in a more elegant solution to your problem:
wordsinboth = [word for word in sentence.split() if word in dict]
You can use += on a list, but the right-hand side must be a list (or another iterable) of the items to add, not a bare value: += extends the list with each element of the iterable. In your case, each string w is iterated character by character, so 'if' contributes ['i', 'f']. To work around that, wrap the value in a list by putting [] around it:
for w in sentence.split():
    if w in dict:
        wordsinboth += [w]
A list comprehension is the shortest and most elegant way to do this in your case:
wordsinboth = [word for word in sentence.split() if word in dict]
The problem in your loop is that you have to use append to add a new item to wordsinboth instead of the += operator. Also keep in mind that the result can contain duplicates; if you need unique items, you can build a set instead, which gives you each word only once.
Like this:
wordsinboth = {word for word in sentence.split() if word in dict}
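A set does not keep the words in sentence order. If the order matters, one option is `dict.fromkeys`, which keeps the first occurrence of each key in insertion order (Python 3.7+). The dictionary `lookup` below is a hypothetical stand-in for the asker's dictionary:

```python
lookup = {"is": 1, "a": 2, "sentence": 3}  # hypothetical dictionary for illustration
sentence = "is this is even really a sentence"

# dict.fromkeys drops repeated words but preserves first-seen order
wordsinboth = list(dict.fromkeys(w for w in sentence.split() if w in lookup))
print(wordsinboth)  # ['is', 'a', 'sentence']
```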
I need to find a way to check if given characters are contained in any of the words of a very long list.
I suppose you could do it by checking every index of each word in the list, a bit like so:
for i in list:
    if i[0] == 'a' or 'b':
        found_words.append(i)
    if i[1] == 'a' or 'b':
        found_words.append(i)
But this is neither a very stylish nor a very efficient way of doing it.
Thanks for your help
A more understandable way of doing this is the following:
character = 'e'
found_words = []
for i in list:
    if character in i:
        found_words.append(i)
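If you want to match any of several characters, you can use regular expressions.
import re
pattern = "[ab]" # the characters to check for
for i in lst:
    if re.search(pattern, i): # a match object is truthy, None is falsy
        found_words.append(i)
Set pattern to the characters you want to check for, e.g. "[abcde]", which matches "a", "b", "c", "d", or "e" anywhere in a word, or "[abcde][pqrst]", which matches any of the two-letter combinations "ap", "at", "eq", etc. Note that re.match only tests the beginning of the string, so re.search is the function to use here. Keeping the pattern in a variable makes it easy to change.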
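A runnable version of the regular-expression idea might look like the following. The word list is sample data, and `re.search` is used because `re.match` only looks at the start of each string:

```python
import re

lst = ["apple", "kiwi", "plum", "bee"]  # sample data for illustration
pattern = re.compile(r"[ab]")           # any word containing 'a' or 'b'

found_words = [w for w in lst if pattern.search(w)]
print(found_words)  # ['apple', 'bee']

# re.match only tests the beginning of the string; re.search scans all of it
print(bool(re.match(r"[lm]", "plum")))   # False
print(bool(re.search(r"[lm]", "plum")))  # True
```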
You could do the following:
check = set('ab').intersection # the letters to check against
lst = [...] # the words, do not shadow the built-in 'list'
found_words = [w for w in lst if check(w)]
or shorter:
found_words = list(filter(check, lst))
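The set-based check answers "does the word contain at least one of the letters?". If every letter is required instead, a subset test can be used. A short sketch on sample data, with illustrative names `any_match` and `all_match`:

```python
letters = set("ab")
lst = ["apple", "kiwi", "bat", "bee"]  # sample data for illustration

# words containing at least one of the letters (non-empty intersection)
any_match = [w for w in lst if letters & set(w)]
# words containing every one of the letters (letters is a subset of the word)
all_match = [w for w in lst if letters <= set(w)]

print(any_match)  # ['apple', 'bat', 'bee']
print(all_match)  # ['bat']
```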
I have a text file and two lists of strings.
The first list is the keyword list
k = ["hi", "bob"]
The second list is the words I want to replace the keywords with
r = ["ok", "bye"]
I want to take the text file as input and, wherever a keyword from k appears, replace it with the corresponding word from r. Thus, "hi, how are you bob" would be changed to "ok, how are you bye".
Let's say you have already parsed your sentence:
sentence = ['hi', 'how', 'are', 'you', 'bob']
What you want to do is to check whether each word in this sentence is present in k. If yes, replace it by the corresponding element in r; else, use the actual word. In other words:
if word in k:
    word_index = k.index(word)
    new_word = r[word_index]
else:
    new_word = word
This can be written in a more concise way:
new_word = r[k.index(word)] if word in k else word
Using list comprehensions, here's how you go about processing the whole sentence:
new_sentence = [r[k.index(word)] if word in k else word for word in sentence]
new_sentence is now equal to ['ok', 'how', 'are', 'you', 'bye'] (which is what you want).
Note that in the code above we perform two equivalent search operations: word in k and k.index(word). This is inefficient. These two operations can be reduced to one by catching the exception that the index method raises when the word is not found:
def get_new_word(word, k, r):
    try:
        word_index = k.index(word)
        return r[word_index]
    except ValueError:
        return word
new_sentence = [get_new_word(word, k, r) for word in sentence]
Now, you should also note that searching for word in k is a search with O(n) complexity (where n is the number of keywords). Thus the complexity of this algorithm is O(n*m) (where m is the sentence length). You can reduce this complexity to O(m) by using a more appropriate data structure, as suggested in the comments. This is left as an exercise :-p
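One data structure that fits here is a dictionary mapping each keyword to its replacement, since dictionary lookups take constant time on average. A minimal sketch of that idea, where `replacements` is an illustrative name:

```python
k = ["hi", "bob"]
r = ["ok", "bye"]
sentence = ["hi", "how", "are", "you", "bob"]

# Pair each keyword with its replacement: {'hi': 'ok', 'bob': 'bye'}
replacements = dict(zip(k, r))

# dict.get(word, word) returns the replacement, or the word itself if absent
new_sentence = [replacements.get(word, word) for word in sentence]
print(new_sentence)  # ['ok', 'how', 'are', 'you', 'bye']
```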
I'll assume you've got the "reading string from file" part covered, so about that "replacing multiple strings" part: First, as suggested by Martijn, you can create a dictionary, mapping keys to replacements, using dict and zip.
>>> k = ["hi", "bob"]
>>> r = ["ok", "bye"]
>>> d = dict(zip(k, r))
Now, one way to replace all those keys at once would be to use a regular expression, being a disjunction of all those keys, i.e. "hi|bob" in your example, and using re.sub with a replacement function, looking up the respective key in that dictionary.
>>> import re
>>> re.sub('|'.join(k), lambda m: d[m.group()], "hi, how are you bob")
'ok, how are you bye'
Alternatively, you can just use a loop to replace each key-replacement pair one after the other:
s = "hi, how are you bob"
for (x, y) in zip(k, r):
    s = s.replace(x, y)
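Both approaches replace substrings, so a keyword such as "hi" is also replaced inside a longer word like "this". If only whole words should change, one possible variation is to add word boundaries and escape the keywords. `replace_keywords` is a hypothetical helper name used only for this sketch:

```python
import re

k = ["hi", "bob"]
r = ["ok", "bye"]
d = dict(zip(k, r))

# \b restricts matches to whole words; re.escape guards against regex metacharacters
pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, k)) + r")\b")

def replace_keywords(text):
    """Replace every whole-word keyword in `text` with its mapped value."""
    return pattern.sub(lambda m: d[m.group()], text)

print(replace_keywords("hi, this is bob"))    # ok, this is bye
print("hi, this is bob".replace("hi", "ok"))  # ok, toks is bob
```

The single-pass regex also avoids a second pitfall of the loop: when a replacement word is itself a later keyword, chained `str.replace` calls would replace it again.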