Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I want a function (or some other way) to find where a sentence is located within a list of strings.
sentence = 'The cat went to the pool yesterday'
structure = ['The cat went,', 'to the pool yesterday.','I wonder if you realize the effect you are having on me. It hurts. A lot.']
For example:
def findsentence(sentence, list_of_strings):
    # do something to get the output: a tuple of the positions of the sentence in the list of strings
    return output
findsentence(sentence, structure)
> (0, 1)  # because the sentence is split across two items of the list
Caution!
The challenge is not to find the exact sentence. Look at the example: part of this sentence is at position 0 of structure and the rest is at position 1.
So this is not a simple string-manipulation problem.
Use the following:
sentence = "foo sam bar go"
structure = ["rq", "foo sam", "bar go", "ca", "da"]
def findsentencelist(sentence, list_of_strings):
    l = []
    # enumerate gives each item's own position (list.index would return the first duplicate)
    for i, item in enumerate(list_of_strings):
        if item in sentence:
            l.append(i)
    return l
print(findsentencelist(sentence, structure))  # [1, 2]
Hopefully this will help you, Yahli.
EDIT :
There is a problem with your variables.
Your sentence MUST be a string - not a list.
Edit your variables and try this function again :)
SECOND EDIT:
I think I've finally understood what you're trying to do. Let me know if this one works better.
THIRD EDIT:
Jesus, hopefully this one solves your problem. Let me know if it did the trick :)
I just removed the punctuation from the items in structure to make it work:
sentence = 'The cat went to the pool yesterday'
structure = ['The cat went,', 'to the pool yesterday.','I wonder if you realize the effect you are having on me. It hurts. A lot.','Life is too short as it is. In short, she had a cushion job.']
import string
def findsentence(sentence, list_of_strings):
    # Python 3: build a translation table that deletes every punctuation character
    table = str.maketrans('', '', string.punctuation)
    return tuple(i for i, s in enumerate(list_of_strings) if s.translate(table) in sentence)
print(findsentence(sentence, structure))
# (0, 1)
After removing the punctuation, you can also use this code to print the indices:
for i, j in enumerate(structure):
    if j in sentence:
        print(i)
Hope this solves your problem. There are plenty of other solutions, as Python is flexible.
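If the pieces must be consecutive items that together make up the whole sentence (rather than any item that happens to occur inside it), a stricter check is possible. A minimal sketch, assuming case and punctuation should be ignored; find_sentence_span and _normalize are hypothetical names, not from the question:

```python
import string

def _normalize(text):
    # Lowercase, drop punctuation, and collapse whitespace so "went," matches "went"
    table = str.maketrans('', '', string.punctuation)
    return ' '.join(text.translate(table).lower().split())

def find_sentence_span(sentence, list_of_strings):
    """Return the indices of the first run of consecutive items whose
    joined text equals the sentence, or an empty tuple if none does."""
    target = _normalize(sentence)
    items = [_normalize(s) for s in list_of_strings]
    for start in range(len(items)):
        joined = ''
        for end in range(start, len(items)):
            joined = (joined + ' ' + items[end]).strip()
            if joined == target:
                return tuple(range(start, end + 1))
            if not target.startswith(joined):
                break  # this run can no longer match; try the next start
    return ()

sentence = 'The cat went to the pool yesterday'
structure = ['The cat went,', 'to the pool yesterday.',
             'I wonder if you realize the effect you are having on me. It hurts. A lot.']
print(find_sentence_span(sentence, structure))  # (0, 1)
```

Unlike a plain substring test, this will not report an unrelated short item (say "the") just because it appears somewhere in the sentence.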
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I have a string that looks like:
str_in = """Lemons: J2020, M2021. Eat by 9/03/28
Strawberries: N2023, O2024. Buy by 10/10/20"""
How do I get just "J2020, M2021, N2023, O2024"?
What I have so far is very hardcoded:
str_in = str_in.replace("Lemons:", "")
str_in = str_in.replace("Strawberries:", "")
str_in = str_in.replace("Buy by", "")
I don't know how to get rid of the date when it differs from the one in this example. Is there a regex I could use?
Based on your original post and your follow-up comments, you can explicitly fetch the strings you want to keep with this regex: \b[A-Z]+\d+\b. It matches one or more uppercase letters followed by one or more digits, bounded as a single word. To test it and other regexes in the future, use this great online tool.
The re.findall() function is the best fit here because it returns every non-overlapping match of the pattern as a list. For more on findall() and the other matching methods, check out this tutorial.
Putting all that together, the code would be:
import re
values = re.findall(r'\b[A-Z]+\d+\b', str_in)
# ['J2020', 'M2021', 'N2023', 'O2024'] for the str_in in the question
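To get the exact comma-separated string asked for, the matches can be joined with ', '. A small sketch using the same pattern (CODE_RE is just an illustrative name); the last call shows what the \b boundaries reject:

```python
import re

# One or more uppercase letters followed by one or more digits, as a whole word
CODE_RE = re.compile(r'\b[A-Z]+\d+\b')

str_in = ("Lemons: J2020, M2021. Eat by 9/03/28\n"
          "Strawberries: N2023, O2024. Buy by 10/10/20")

codes = CODE_RE.findall(str_in)
result = ', '.join(codes)
print(result)  # J2020, M2021, N2023, O2024

# The boundaries reject codes glued to other word characters
print(CODE_RE.findall('AB12 xAB12 AB12x N12345'))  # ['AB12', 'N12345']
```

Because the pattern never touches the "Eat by"/"Buy by" dates, it keeps working when those dates change.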
I just saw your edited question, so here's my edited answer:
import re
re_pattern = re.compile(r'(\w+),\s(\w+)\.')
data = ['Lemons: J2020, M2021. Eat by 9/03/28',
        'Strawberries: N2023, O2024. Buy by 10/10/20',
        'Peaches: N12345, O123456. Buy by 10/10/20'
        ]
for line in data:
    # search() finds the first "CODE, CODE." pair on each line
    match = re_pattern.search(line)
    if match:
        print(match.group(1), match.group(2))
import re
text = "Lemons: J2020, M2021. Eat by 9/03/28 Strawberries: N2023, O2024. Buy by 10/10/20"
array = re.findall(r"\b[A-Z]\d{4}\b", text)
# join with ", " (comma and space) to get the format asked for
result = ', '.join(array)
The result string is "J2020, M2021, N2023, O2024"
The array is ['J2020', 'M2021', 'N2023', 'O2024']
The regex allows for one or two uppercase letters at the beginning of the required text, followed by the digits. I think the OP has enough information to test this against their own data.
import re
str_in = "Lemons: J2020, M2021. Eat by 9/03/28 \
Strawberries: N2023, O2024. Buy by 10/10/20"
result = re.findall(r'([A-Z]{1,2}\d+)', str_in)
print(result)
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
I've been looking into developing a discord bot that can reply to messages by reading their contents and checking if they appear in a list.
My problem is, I need to find a reliable way of getting python to look for certain words from a text, see if they appear in the given list and output the words that are detected.
I've managed to get it working somewhat myself with the following code:
if any(word in text for word in word_list):
    print("Word Spotted")
I would really appreciate some help.
Here's some code that does something like what you're describing. But really it sounds like you need to spend a significant amount of time working through some basic Python tutorials before you will be able to implement this.
import re
key_words = set(['foo', 'bar', 'baz'])
typed_str = 'You are such a Foo BAR!'
print(key_words & set(re.findall('[a-z]+', typed_str.lower())))
I'm not sure exactly what is being asked, but here are some things to consider (in no particular order) if you are building a bot that takes in raw user input:
capitalization sensitivity
spell check
understanding intent simplistically
If your environment allows access to libraries, you might consider checking out TextBlob. The following commands install it and download the corpora needed for the example below.
pip install textblob
python -m textblob.download_corpora
core function
from textblob import TextBlob, Word
import copy
def score_intent(rawstring, keywords, weights=None, threshold=0.01, debug=False):
    """
    rawstring: string of text with words that you want to detect
    keywords: list of words that you are looking for
    weights: (optional) dictionary with relative weights of words you want
    threshold: spellcheck confidence threshold
    debug: boolean for extra print statements to help debug
    """
    allwords = TextBlob(rawstring).words
    allwords = [w.upper() for w in allwords]
    keywords = [k.upper() for k in keywords]
    processed_input_as_list = spellcheck_subject_matter_specific(rawstring, keywords, threshold=threshold, debug=debug)
    common_words = intersection(processed_input_as_list, keywords)
    intent_score = len(common_words)
    if weights:
        for special_word in weights.keys():
            if special_word.upper() in common_words:
                # the minus one is so we don't double count a word
                intent_score = intent_score + weights[special_word] - 1
    if debug:
        print("intent score: %s" % intent_score)
        print("words of interest found in text: {}".format(common_words))
    # you could return common_words and score intent based on the list.
    # return common_words, intent_score
    return common_words
utilities for intersection & spellchecking
def intersection(a, b):
    """
    a and b are lists
    function returns a list that is the intersection of the two
    """
    return list(set(a) & set(b))
def spellcheck_subject_matter_specific(rawinput, subject_matter_vector, threshold=0.01, capitalize=True, debug=False):
    """
    rawinput: all the text that you want to check for spelling
    subject_matter_vector: only the words that are worth spellchecking for (since the function can be sort of sensitive it might correct words that you don't want to correct)
    threshold: the spell check confidence needed to update the word to the correct spelling
    capitalize: boolean determining if you want the return string to be capitalized.
    """
    new_input = copy.copy(rawinput)
    for w in TextBlob(rawinput).words:
        spellchecked_vec = w.spellcheck()
        if debug:
            print("Word: %s" % w)
            print("Spellchecked Guesses & Confidences: %s" % spellchecked_vec)
            print("Only spellchecked confidences greater than {} and in this list {} will be included".format(threshold, subject_matter_vector))
        corrected_words = [z[0].upper() for z in spellchecked_vec if z[1] > threshold]
        important_words = intersection(corrected_words, subject_matter_vector)
        # append the corrected keywords so they are found later
        for new_word in important_words:
            new_input = new_input + ' ' + new_word
    inputBlob = TextBlob(new_input)
    processed_input = inputBlob.words
    if capitalize:
        processed_input = [word.upper() for word in processed_input]
    return processed_input
Usage Example
discord_str = "Hi, i want to talk about codee and pYtHon"
words2detect = ["python","code"]
score_intent(rawstring=discord_str,keywords=words2detect,threshold=0.01,debug=True)
output
intent score: 2
words of interest found in text: ['PYTHON', 'CODE']
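If installing TextBlob is not an option, Python's standard difflib module can give rough typo tolerance. Note this is a different technique from TextBlob's spellcheck: it scores string similarity with difflib.get_close_matches. The function name detect_keywords and the 0.8 cutoff are illustrative choices, not tuned values:

```python
import difflib
import re

def detect_keywords(text, keywords, cutoff=0.8):
    """Return the keywords found in text, in order of first appearance.

    Matching is case-insensitive; a word also counts if difflib rates it
    at least `cutoff` similar to a keyword (a rough stand-in for spellchecking).
    """
    wanted = [k.lower() for k in keywords]
    found = []
    for word in re.findall(r"[a-z']+", text.lower()):
        match = difflib.get_close_matches(word, wanted, n=1, cutoff=cutoff)
        if match and match[0] not in found:
            found.append(match[0])
    return found

print(detect_keywords("Hi, i want to talk about codee and pYtHon", ["python", "code"]))
# ['code', 'python']
```

Raising the cutoff makes matching stricter (fewer false positives from short words); lowering it tolerates more typos.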
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 5 years ago.
I want to improve my code's performance and running time, and I'm looking for ways to write my loops better.
For example, I have a dictionary that contains words as keys, and their translation in Spanish as values.
{
'Hello' : 'Hola',
'Goodbye' : 'Adios',
'Cheese' : 'Queso'
}
I also have a given English sentence, and I want to check each word in my dict and, if it appears in the sentence, replace it with its Spanish translation.
For this scenario, I assume that at most one of the words can appear in the given sentence.
I wrote some basic code that does that, but I am not sure it is best practice:
words_list = {
    'Hello': 'Hola',
    'Goodbye': 'Adios',
    'Cheese': 'Queso'
}
sentence = "Hello, I want to talk Spanish"
for english_word in words_list.keys():
    if english_word in sentence:
        sentence = sentence.replace(english_word, words_list[english_word])
        break
print(sentence)
How can I write it better?
Thanks!
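One option, assuming whole-word, case-sensitive matching is what you want, is to build a single regex from the dictionary keys and let re.sub look each match up in the dict. This makes one pass over the sentence and also handles more than one word per sentence; translations, pattern and translate are hypothetical names:

```python
import re

translations = {'Hello': 'Hola', 'Goodbye': 'Adios', 'Cheese': 'Queso'}

# One alternation of all keys, longest first so a longer key wins over a prefix of it;
# the \b boundaries keep "Cheeseburger" from being partially translated.
pattern = re.compile(r'\b(?:' + '|'.join(
    re.escape(k) for k in sorted(translations, key=len, reverse=True)) + r')\b')

def translate(sentence):
    # Replace every dictionary word in a single pass over the sentence
    return pattern.sub(lambda m: translations[m.group(0)], sentence)

print(translate("Hello, I want to talk Spanish"))  # Hola, I want to talk Spanish
```

The pattern is compiled once, so the per-sentence cost is a single scan regardless of how many words the dictionary holds.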
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 5 years ago.
I have a value that looks like this:
<type 'str'>
['zeven', 'nul', 'zeven', 'een', 'pieter', 'marie']
What I would like to do now is to loop over all the elements. However if I do like this:
x = ['zeven', 'nul', 'zeven', 'een', 'pieter', 'marie']
for word in x:
    print(x)
I get:
[
'
z
e
Any thoughts on how I can get just the values (like zeven, nul, etc.)?
You are almost there: instead of printing x, print word for each word in the list.
x = ['zeven', 'nul', 'zeven', 'een', 'pieter', 'marie']
for word in x:
    print(word)
This will give you the following output:
zeven
nul
zeven
een
pieter
marie
It looks like you have something like this:
x = "['zeven', 'nul', 'zeven', 'een', 'pieter', 'marie']"
So you need to turn the string back into a list first, for example with eval():
for word in eval(x):
    print(word)
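If the string could come from an untrusted source, ast.literal_eval from the standard library is a safer way to parse it, since it only evaluates Python literals. (json.loads would not work on this string as-is, because it uses single quotes.) A minimal sketch:

```python
import ast

x = "['zeven', 'nul', 'zeven', 'een', 'pieter', 'marie']"

# literal_eval only accepts Python literals (lists, strings, numbers, ...),
# so unlike eval() it will not run arbitrary code hidden in the string
words = ast.literal_eval(x)
for word in words:
    print(word)
```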
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I have a string of text
text = u"Hey, there, hope you are doing good?????? or maybe not?"
and a tokenized version made with spaCy. I'm using spaCy because I want to be able to use its other features, like part-of-speech tagging and lemmatization. The problem I'd like to solve is removing stop words like ['?', ',', 'you'] from the tokens. The lowercased tokens are saved in toks:
token = nlp(text)
toks = []
for t in token:
    toks.append(t.lower_)
I was thinking of using multiple while loops like this
while "?" in token.text:
    toks.remove("?")
while "," in token.text:
    toks.remove(",")
while "you" in token.text:
    toks.remove("you")
but I keep getting ValueError: list.remove(x): x not in list. That is understandable: token.text never changes, so each loop keeps removing until there is nothing left to remove, and the next remove() raises the error.
However, I found a way to handle the error:
while True:
    try:
        if '?' in token.text:
            toks.remove('?')
    except:
        try:
            if ',' in token.text:
                toks.remove(',')
        except:
            try:
                if 'you' in token.text:
                    toks.remove('you')
            except:
                break
I'm not getting the error any more, but I feel like there should be a better way to solve the problem without nested loops. Can you suggest a cleaner way?
Since you seem to want to exclude all tokens from a given set of tokens, it's easier to just ignore them while creating the toks list:
from spacy.en import English
unwanted_tokens = {'?', ',', 'you'}
text = u"Hey, there, hope you are doing good?????? or maybe not?"
nlp = English()
tokens = nlp(text)
toks = []
for t in tokens:
    if t.lower_ not in unwanted_tokens:
        toks.append(t.lower_)
>>> toks
[u'hey', u'there', u'hope', u'are', u'doing', u'good', u' ', u'or', u'maybe', u'not']
The for loop could be replaced by a list comprehension:
toks = [t.lower_ for t in tokens if t.lower_ not in unwanted_tokens]
If, for reasons that you don't show in your question, you must remove the tokens after toks has been created, then you can just use a list comprehension:
toks = [t for t in toks if t not in unwanted_tokens]
Use the str.replace method, with the empty string as the new string.
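for target in ['?', ',', 'you']:
    text = text.replace(target, '')
This loops through the items that need to be removed and replaces every occurrence of each one with an empty string. Note that it works on the raw text rather than on spaCy's tokens, and it also removes matches inside longer words (for example, the "you" in "your").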
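If you do want to clean the raw text, one way to avoid removing "you" from inside "your" is a regex with word boundaries around the alphabetic targets. A sketch, assuming case-sensitive matching as with str.replace; remove_targets is a hypothetical helper, not part of spaCy:

```python
import re

def remove_targets(text, targets):
    """Delete each target from text; alphabetic targets only match whole words."""
    parts = [r'\b' + re.escape(t) + r'\b' if t.isalnum() else re.escape(t)
             for t in targets]
    pattern = re.compile('|'.join(parts))
    # Remove the matches, then collapse the leftover runs of spaces
    return ' '.join(pattern.sub('', text).split())

text = u"Hey, there, hope you are doing good?????? or maybe not?"
print(remove_targets(text, ['?', ',', 'you']))
# Hey there hope are doing good or maybe not
print(remove_targets("you and yours", ['you']))
# and yours
```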