The output of sentences extracted by key words

I am new to Python and am having trouble formatting the sentences extracted by several keywords. Several sentences are extracted; how can I combine them into one string?
For example:
search_keywords=['my family','love my']
text = "my family is good. I love my family. I am happy."
sentences = text.split(".")
for sentence in sentences:
    if (any(map(lambda word: word in sentence, search_keywords))):
        print (sentence)
        count = len(sentence.split())
        print(count)
The output is:
my family is good
4
I love my family
4
How can I combine the two extracted sentences into one string, so that the total count equals 8, like the following:
my family is good. I love my family.
8
Any help is appreciated.

Here is a corrected version of your Python code:
# your data
search_keywords=['my family','love my']
text = "my family is good. I love my family. I am happy."
sentences = text.split(".")
# initialise
total_count = 0
final_sentence = ""
# loop over every sentence
for sentence in sentences:
    if (any(map(lambda word: word in sentence, search_keywords))):
        # add the count to total_count
        total_count += len(sentence.split())
        # add the sentence to final_sentence
        final_sentence += sentence+'.'
# print the final_sentence and total_count
print(final_sentence)
print(total_count)

How about this:
result = []
result_count = 0
for sentence in sentences:
    if (any(map(lambda word: word in sentence, search_keywords))):
        # strip() drops the leading space left by split("."), so join does not double it
        result.append(sentence.strip())
        result_count += len(sentence.split())
print('. '.join(result) + '.')
print(result_count)
#my family is good. I love my family.
#8

Use the join method for strings:
outp = []
count = 0
for sentence in sentences:
    if (any(map(lambda word: word in sentence, search_keywords))):
        # strip() drops the leading space left by split("."), so join does not double it
        outp.append(sentence.strip())
        count += len(sentence.split())
print('. '.join(outp) + '.')
print(count)
You choose the separator string and call its join method, passing the list of strings to be joined with that separator.
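
As a minimal sketch of the join approach above, the filtering and counting can also be wrapped in a function. The name extract_matching is made up for this illustration, and stripping each sentence before joining is an assumption about the desired spacing.

```python
def extract_matching(text, keywords):
    """Return the matching sentences joined into one string, plus their total word count."""
    # Split on periods, then drop surrounding whitespace and empty pieces.
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    # Keep only the sentences that contain at least one keyword.
    matches = [s for s in sentences if any(k in s for k in keywords)]
    combined = ". ".join(matches) + "." if matches else ""
    total = sum(len(s.split()) for s in matches)
    return combined, total


combined, total = extract_matching(
    "my family is good. I love my family. I am happy.",
    ["my family", "love my"],
)
print(combined)  # my family is good. I love my family.
print(total)     # 8
```

Returning both values instead of printing them makes it easier to reuse the result elsewhere.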

Related

Find the occurrence of a particular word from a file in python [duplicate]

I'm trying to find the number of occurrences of a word in a string.
word = "dog"
str1 = "the dogs barked"
I used the following to count the occurrences:
count = str1.count(word)
The issue is I want an exact match. So the count for this sentence would be 0.
Is that possible?

If you're going for efficiency:
import re
count = sum(1 for _ in re.finditer(r'\b%s\b' % re.escape(word), input_string))
This doesn't need to create any intermediate lists (unlike split()) and thus will work efficiently for large input_string values.
It also has the benefit of working correctly with punctuation: it will properly return 1 as the count for the phrase "Mike saw a dog." (whereas an argumentless split() would not). It uses the \b regex assertion, which matches at word boundaries (transitions between \w characters, which for ASCII text are [a-zA-Z0-9_], and anything else).
If you need to worry about languages beyond the ASCII character set, you may need to adjust the regex to properly match non-word characters in those languages. For many applications this would be an overcomplication. In Python 3, str patterns are Unicode-aware by default, so \w already covers word characters in most scripts; in Python 2, setting the unicode and/or locale flags for the regex would often suffice.
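
A small runnable sketch of the word-boundary approach described above. The function name count_exact is chosen here for illustration and is not part of the original answer.

```python
import re


def count_exact(word, text):
    """Count occurrences of `word` in `text` as a whole word."""
    # re.escape protects any regex metacharacters contained in the word.
    pattern = r"\b" + re.escape(word) + r"\b"
    # finditer yields matches lazily, so no intermediate list is built.
    return sum(1 for _ in re.finditer(pattern, text))


print(count_exact("dog", "the dogs barked"))  # 0
print(count_exact("dog", "Mike saw a dog."))  # 1
```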

You can use str.split() to convert the sentence to a list of words:
a = 'the dogs barked'.split()
This will create the list:
['the', 'dogs', 'barked']
You can then count the number of exact occurrences using list.count():
a.count('dog') # 0
a.count('dogs') # 1
If it needs to work with punctuation, you can use regular expressions. For example:
import re
a = re.split(r'\W', 'the dogs barked.')
a.count('dogs') # 1

Use a generator expression:
>>> word = "dog"
>>> str1 = "the dogs barked"
>>> sum(w == word for w in str1.split())
0
>>> word = 'dog'
>>> str1 = 'the dog barked'
>>> sum(w == word for w in str1.split())
1
split() returns a list of all the words in the sentence. The generator expression yields True for every word equal to the target, and sum() counts those matches.

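import re
word = "dog"
str1 = "the dogs barked"
# \b restricts the match to whole words; without it "dog" would also match inside "dogs"
print(len(re.findall(r'\b' + re.escape(word) + r'\b', str1)))  # 0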

You need to split the sentence into words. For your example you can do that with just
words = str1.split()
But for real-world usage you need something more advanced that also handles punctuation. For most Western languages you can get away with replacing all punctuation with spaces before calling str1.split().
This will work for English in simple cases, but note that "I'm" will be split into two words, "I" and "m", when it should in fact be split into "I" and "am". Handling that may be overkill for this application.
For other cases, such as Asian languages or real-world English text, you might want to use a library that does the word splitting for you.
Then you have a list of words, and you can do
count = words.count(word)
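
The "replace punctuation with spaces, then split" idea can be sketched as follows. This is only an illustration: the helper name count_whole_word is made up, and it assumes ASCII punctuation as listed in string.punctuation, so contractions like "I'm" are still split into "I" and "m".

```python
import string


def count_whole_word(text, word):
    """Count exact occurrences of `word` after replacing punctuation with spaces."""
    # Translation table mapping every ASCII punctuation character to a space.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    words = text.translate(table).split()
    return words.count(word)


print(count_whole_word("The dog barked. Bad dog!", "dog"))  # 2
print(count_whole_word("the dogs barked", "dog"))           # 0
```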

# Count the number of times a word occurs in a text
def count_word(text, word):
    """
    Split the text into words and count the occurrences of the given word.
    input: text and word
    output: number of times the word appears
    """
    answer = text.split(" ")
    count = 0
    for occurrence in answer:
        if word == occurrence:
            count = count + 1
    return count

sentence = "To be a programmer you need to have a sharp thinking brain"
word_count = "a"
print(sentence.split(" "))
print(count_word(sentence, word_count))
Output:
['To', 'be', 'a', 'programmer', 'you', 'need', 'to', 'have', 'a', 'sharp', 'thinking', 'brain']
2
The function takes two inputs: the text of the sentence and the word to count.
It splits the sentence into a list of words,
then compares each word in that list with the target word and returns the number of matches.

If you don't want to use regular expressions, you can use this neat trick. Note that it misses a word at the very start or end of the string, or next to punctuation.
word = " is " # Add a space on the leading and trailing sides.
input_string = "This is some random text and this is str which is mutable"
print("Word count :", input_string.count(word))
Output -- Word count : 3

Below is a simple example that replaces the desired word with a new word, for a desired number of occurrences:
def censor(text, word):
    new_string = text.replace(word, "+" * len(word), text.count(word))
    return new_string
print(censor("hey hey hey", "hey"))
The output will be: +++ +++ +++
The first parameter of replace() is the string to search for.
The second is the new string that replaces the search string.
The third, which is optional, is the maximum number of occurrences to replace.

Let us consider the example s = "suvotisuvojitsuvo".
If you want separate counts for "suvo" and "suvojit", that is, without counting the "suvo" inside "suvojit" as a standalone "suvo", you can use the count() method.
suvocount = s.count("suvo")  # output: 3
suvojitcount = s.count("suvojit")  # output: 1
Then, to find the standalone "suvo" count, subtract the "suvojit" count.
lonelysuvo = suvocount - suvojitcount  # output: 3 - 1 -> 2

This would be my solution, with help from the comments:
word = input("Type the French word chiens in English: ")
str1 = "dogs"
times = str1.count(word)
if times >= 1:
    print("dogs is correct")
else:
    print("you're wrong")

If you want to count the occurrences of a specific word in the string and you don't want to use any count function, you can use the following method. Note that it matches case-insensitively and also counts matches inside longer words.
text = input("Please enter the statement you want to check: ")
word = input("Please enter the word you want to check in the statement: ")
# n is the starting point of the search; it is 0 because we start from the very beginning of the string.
n = 0
# position_word is the starting index of the word in the string
position_word = 0
num_occurrence = 0
if word.upper() in text.upper():
    while position_word != -1:
        position_word = text.upper().find(word.upper(), n, len(text))
        # move the starting point of the search forward to find the next occurrence
        n = (position_word + 1)
        # str.find(sub, start, end) returns -1 if the word is not present in the given range.
        if position_word != -1:
            num_occurrence += 1
    print (f"{word.title()} is present {num_occurrence} times in the provided statement.")
else:
    print (f"{word.title()} is not present in the provided statement.")

This is a simple Python program using the split function:
text = 'apple mango apple orange orange apple guava orange'
print("\n My string ==> "+ text +"\n")
words = text.split()
seen = []
for i in words:
    # print each distinct word once, together with its count
    if i not in seen:
        seen.append(i)
        print(i, words.count(i))

I have just started learning to code and do not know any libraries yet.
s = "the dogs barked"
value = 0
x = 0
y = 3
# slide a three-character window over the string and compare it with "dog"
for alphabet in s:
    if (s[x:y]) == "dog":
        value = value+1
    x += 1
    y += 1
print ("number of dog in the sentence is : ", value)

Another way to do this is to tokenize the string (break it into words)
and use Counter from the collections module of the Python standard library:
from collections import Counter
str1 = "the dogs barked"
# Counter is a dict subclass that maps every word to its count
stringTokenDict = Counter(str1.split())
print(stringTokenDict['dogs'])
# This dictionary contains all words and their respective counts

Find the average length of all words in a sentence

Given a string consisting of words separated by one or more spaces,
find the average length of all words.
Average word length = total number of characters in words (excluding spaces) divided by the number of words.
My attempt:
But the output is incorrect. Can you help me?
sentence = input("sentence: ")
words = sentence.split()
total_number_of_characters = 0
number_of_words = 0
for word in words:
    total_number_of_characters += len(sentence)
    number_of_words += len(words)
average_word_length = total_number_of_characters / number_of_words
print(average_word_length)

When you're stuck, one nice trick is to use very verbose variable names that match the task description as closely as possible, for example:
words = sentence.split()
total_number_of_characters = 0
number_of_words = 0
for word in words:
    total_number_of_characters += WHAT?
    number_of_words += WHAT?
average_word_length = total_number_of_characters / number_of_words
Can you do the rest?

I think maybe it should be
for char in word:
Rather than
for char in words:

You can use the mean() function to calculate the average.
>>> from statistics import mean
>>> sentence = 'The quick brown fox jumps over the lazy dog'
>>> mean(len(word) for word in sentence.split())
3.888888888888889
The statistics library was introduced with Python 3.4.
https://docs.python.org/3/library/statistics.html#statistics.mean

There is a simpler way to solve this problem. You can get the number of words with len(words), and the number of letters by taking the original sentence and removing all spaces from it (check the replace() method).
Now it's your turn to piece this information together!
Edit: Here's an example:
sentence = input("Sentence: ")
words = len(sentence.split())
chars = len(sentence.replace(" ", ""))
print(chars / words)
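
For reference, here is a minimal sketch of the per-word approach from the question, wrapped in a function so it can be tested without input(). The function name and the choice to return 0.0 for an empty sentence are assumptions made for this example.

```python
def average_word_length(sentence):
    """Average number of characters per word; words are separated by whitespace."""
    words = sentence.split()
    if not words:
        # Assumption: treat an empty sentence as average 0.0 instead of dividing by zero.
        return 0.0
    total_number_of_characters = sum(len(word) for word in words)
    return total_number_of_characters / len(words)


print(average_word_length("The quick brown fox jumps over the lazy dog"))
```

Because split() with no argument collapses runs of whitespace, this also handles words separated by more than one space.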

Python encoding/decoding of str, methods

Given a sentence as a string, print the shortest word in the sentence. If there are several such words, output the last one. A word is a sequence of characters that contains no spaces or punctuation marks and is delimited by spaces, punctuation marks, or the beginning/end of a line.
Input: sentence = "I LOVE python version three and point 10"
Output: "I"
My attempt:
sentence = input("sentence: ")
words = sentence.split()
min_word = None
for word in words:
    if len(word) < len(words):
        min_word = word
print(min_word)
But the output is: 10
Can you help me?

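The bug is in if len(word) < len(words):, which compares the length of the word with the number of words. Compare against len(min_word) instead, and start from words[0] to avoid calling len(None). Using <= rather than < keeps the last of several equally short words, as the task requires:
sentence = input("sentence: ")
words = sentence.split()
min_word = words[0]
for word in words:
    if len(word) <= len(min_word):
        min_word = word
print(min_word)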
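
The task statement also says that words are delimited by punctuation, which split() alone does not handle. A hedged sketch of one way to cover that, assuming a "word" is a run of letters, digits, or underscores (the regex \w+); the function name shortest_word is made up for this example.

```python
import re


def shortest_word(sentence):
    """Return the last of the shortest words, or None if there are no words."""
    # Assumption: a word is a maximal run of \w characters.
    words = re.findall(r"\w+", sentence)
    shortest = None
    for word in words:
        # <= keeps the last word among equally short ones.
        if shortest is None or len(word) <= len(shortest):
            shortest = word
    return shortest


print(shortest_word("I LOVE python version three and point 10"))  # I
print(shortest_word("to be, or not to be!"))                      # be
```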

Getting the maximum number of words in a sentence of a paragraph Python

I'm trying to get the maximum number of words in any single sentence of a paragraph, but I just can't see how.
Here is what I tried:
S = input("Enter a paragraph")
def getMaxNum(S):
    if "." in S:
        new_list = S.split(".")[0]
        return len(new_list)
    else "?" in S:
        new_list = S.split("?")[0]
        return len(new_list)
    else "!" in S:
        new_list = S.split("?")[0]
        return len(new_list)
getMaxNum(S)
In the else branches I could be getting the previous sentence's values, but that's not what I need. Any ideas how I can accomplish this?

I'm not 100% certain of what your requirements are, but if I borrow Buoy Rina's input, here's a solution using regular expressions (pattern search strings):
#!/usr/bin/env python3
import re
text = "I will go school tomorrow. I eat apples. Here is a six word sentence."
max_words = 0
sentences = re.split("[.!?]", text)
for sentence in sentences:
    max_words = max( len( sentence.split() ), max_words )
print(f"max_words: {max_words}")
The re.split() breaks the text (or paragraph) into sentences based on "some" end of sentence punctuation. There are likely conditions under which searching for period '.' won't yield a complete sentence, but we'll ignore that for simplicity.
The string function split() then breaks up the sentence into words based on white space (the default of split()). We then get the length of the resultant list to find the word count.
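
The two steps described above (split into sentences, then count words per sentence) can also be sketched as a small reusable function. The name max_words_per_sentence is chosen for illustration only, and the same simplification applies: every '.', '!' or '?' is treated as the end of a sentence.

```python
import re


def max_words_per_sentence(paragraph):
    """Return the largest word count found in any sentence of the paragraph."""
    # Split on end-of-sentence punctuation; abbreviations like "e.g." are not handled.
    sentences = re.split(r"[.!?]", paragraph)
    # split() with no argument breaks on any whitespace and ignores empty pieces.
    return max((len(s.split()) for s in sentences), default=0)


text = "I will go school tomorrow. I eat apples. Here is a six word sentence."
print(max_words_per_sentence(text))  # 6
```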

text = "I will go school tomorrow. I eat apples. I will have a very long sentence. "
def getmaxwordcount(text):
    count_word = 0
    is_start_word = False
    counts = []
    for c in text:
        if c == ' ':
            # a space ends the current word, if one was started
            if is_start_word:
                count_word += 1
                is_start_word = False
        elif c == '!' or c == '.' or c == '?':
            # end of sentence: close the current word and store the sentence's count
            if is_start_word:
                count_word += 1
                is_start_word = False
            counts.append(count_word)
            count_word = 0
        else:
            if c.isalpha():
                if is_start_word == False:
                    is_start_word = True
    return max(counts)
getmaxwordcount(text) # 7

import re
text = "I will go school tomorrow. I eat apples."
def foo(txt):
    max_count = 0
    for i in re.split('[!.?]', txt):
        if len(i.split()) > max_count:
            max_count = len(i.split())
    return max_count
print(foo(text)) # prints 5

Code:
import re
paragraph = "Two words. Three other words? Finally four another words!"
all_lengths_in_paragraph = [f"Length of {n+1}th sentence is {len(list(filter(None, x.split(' '))))}" for n, x in enumerate(list(filter(None, re.split(r'\.|!|\?', paragraph))))]
max_length = max([len(list(filter(None, x.split(' ')))) for x in list(filter(None, re.split(r'\.|!|\?', paragraph)))])
for one_length in all_lengths_in_paragraph:
    print(one_length)
print('maximum length is', max_length)
Output:
Length of 1th sentence is 2
Length of 2th sentence is 3
Length of 3th sentence is 4
maximum length is 4

Python Hacker Rank Two Way Emunese Translator

This problem asks me to write a two-way translator: English to Emunese (a made-up language) and Emunese to English. To translate to Emunese, I have to take the last letter of each word and move it to the front of the word, add mu after each word, and add emu after every three words. For example, 'imu odmu tnomu emu wknomu whomu otmu emu odmu sthimu' is the English sentence 'i do not know how to do this' translated to Emunese; converting to English does the opposite.
At first I thought this would be relatively simple. In my own tests it seems to run fine: change the sentence to Emunese and you'll see that it converts to English. The problem occurs when I plug the code into HackerRank. The code passes the visible test cases (the ones where I can see the input and output), but two of the hidden test cases return 'Wrong Answer'. I've been scratching my head trying to figure out what my code is missing, but I can't quite think of it.
Here is the HackerRank link:
https://www.hackerrank.com/contests/hcpc19-div-i/challenges/hcpc-19-div-i-two-way-emunese-translator
This code can be run anywhere (below it is a version that can be plugged directly into HackerRank):
sentence = ['i', 'dont', 'know','how', 'to', 'do','this']
#Which version you want to convert the sentence to
lang = 'EMU'
n = len(sentence)
if lang == 'ENG':
    #Move the last letter to the front
    sentence = " ".join([words[-1:] + words[:-1] + 'mu' for words in sentence])
    result = []
    #Add emu after every three words
    for idx, word in enumerate(sentence.split()):
        if idx > 1 and idx % 3 == 0:
            result.append("emu")
        result.append(word)
    #If sentence is 3, 6, 9, etc words long add emu to the end
    if n % 3 == 0:
        result.append("emu")
    sentence = " ".join(result)
    print(sentence)
elif lang == 'EMU':
    sentence = ' '.join(sentence)
    #Get rid of mu and emu
    sentence = sentence.replace('mu', '')
    sentence = sentence.replace('e ', '')
    #Move the first letter to the end of the word
    print( " ".join([words[1:] + words[0] for words in sentence.split()]))
This code has to be plugged into HackerRank to work (see above for a version that can be run anywhere):
if lang == 'ENG':
    #Move the last letter to the front
    sentence = " ".join([words[-1:] + words[:-1] + 'mu' for words in sentence])
    result = []
    #Add emu after every three words
    for idx, word in enumerate(sentence.split()):
        if idx > 1 and idx % 3 == 0:
            result.append("emu")
        result.append(word)
    #If sentence is 3, 6, 9, etc words long add emu to the end
    if n % 3 == 0:
        result.append("emu")
    return " ".join(result)
elif lang == 'EMU':
    sentence = ' '.join(sentence)
    #Get rid of mu and emu
    sentence = sentence.replace('mu', '')
    sentence = sentence.replace('e ', '')
    #Move the first letter to the end of the word
    return " ".join([words[1:] + words[0] for words in sentence.split()])

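Here is the fixed code for the elif lang == "EMU" branch:
# drop the "emu" marker that follows every three words
for idx, word in enumerate(sentence):
    if idx > 1 and idx % 3 == 0:
        sentence.remove(word)
# strip the trailing "mu" from every word
sentence = " ".join([words[:-2] for words in sentence])
# move the first letter back to the end of the word
sentence = " ".join([words[1:] + words[0] for words in sentence.split()])
print(sentence)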
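
For comparison, here is a hedged sketch of both directions based only on the rules stated in the question, not on the official problem statement. The function names are made up, and two points are assumptions: an "emu" marker is emitted after every complete group of three words (including a final one), and on decoding every fourth token is taken to be such a marker.

```python
def to_emunese(sentence):
    """English -> Emunese: last letter to the front, append 'mu', 'emu' after every three words."""
    out = []
    for i, word in enumerate(sentence.split(), start=1):
        out.append(word[-1] + word[:-1] + "mu")
        if i % 3 == 0:
            out.append("emu")
    return " ".join(out)


def to_english(sentence):
    """Emunese -> English: drop every fourth token, strip 'mu', first letter to the end."""
    tokens = sentence.split()
    # Tokens at positions 3, 7, 11, ... are assumed to be the 'emu' markers.
    words = [t for i, t in enumerate(tokens) if i % 4 != 3]
    # w[1:-2] removes the moved letter at the front and the 'mu' suffix at the end.
    return " ".join(w[1:-2] + w[0] for w in words)


emunese = to_emunese("i do not know how to do this")
print(emunese)              # imu odmu tnomu emu wknomu whomu otmu emu odmu sthimu
print(to_english(emunese))  # i do not know how to do this
```

Removing markers by position and slicing off the suffix avoids replace('mu', ''), which would also delete "mu" inside ordinary words such as "music".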
