Find the average length of all words in a sentence - python

Given a string consisting of words separated by spaces (one or more).
Find the average length of all words.
Average word length = total number of characters in words (excluding spaces) divided by the number of words.
My attempt is below, but the output is incorrect. Can you help me?
sentence = input("sentence: ")
words = sentence.split()
total_number_of_characters = 0
number_of_words = 0
for word in words:
    total_number_of_characters += len(sentence)
    number_of_words += len(words)
average_word_length = total_number_of_characters / number_of_words
print(average_word_length)

When you're stuck, one nice trick is to use very verbose variable names that match the task description as closely as possible, for example:
words = sentence.split()
total_number_of_characters = 0
number_of_words = 0
for word in words:
    total_number_of_characters += WHAT?
    number_of_words += WHAT?
average_word_length = total_number_of_characters / number_of_words
Can you do the rest?

I think maybe it should be
for char in word:
Rather than
for char in words:

You can use the mean() function from the statistics module to calculate the average.
>>> from statistics import mean
>>> sentence = 'The quick brown fox jumps over the lazy dog'
>>> mean(len(word) for word in sentence.split())
3.888888888888889
The statistics module was added in Python 3.4.
https://docs.python.org/3/library/statistics.html#statistics.mean

There is a simpler way to solve this problem. You can get the number of words with len(words), and the number of letters by taking the original sentence and removing all spaces from it (check the replace() method).
Now it's your turn to piece this information together!
Edit: Here's an example:
sentence = input("Sentence: ")
words = len(sentence.split())
chars = len(sentence.replace(" ", ""))
print(chars / words)
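Putting the pieces from the answers above together, a minimal sketch might look like the following. The function name average_word_length and the choice to return 0.0 for an empty sentence are assumptions made for illustration; the original task does not say what to do with empty input.

```python
def average_word_length(sentence):
    # split() with no arguments treats any run of whitespace as one separator
    words = sentence.split()
    if not words:
        # Assumption: report 0.0 for an empty sentence instead of dividing by zero
        return 0.0
    total_number_of_characters = sum(len(word) for word in words)
    return total_number_of_characters / len(words)


print(average_word_length("The quick brown fox jumps over the lazy dog"))
```

Because the character total is computed from the split words, repeated spaces between words do not affect the result.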

Related

Find the occurrence of a particular word from a file in python [duplicate]

I'm trying to find the number of occurrences of a word in a string.
word = "dog"
str1 = "the dogs barked"
I used the following to count the occurrences:
count = str1.count(word)
The issue is I want an exact match. So the count for this sentence would be 0.
Is that possible?
If you're going for efficiency:
import re
count = sum(1 for _ in re.finditer(r'\b%s\b' % re.escape(word), input_string))
This doesn't need to create any intermediate lists (unlike split()) and thus will work efficiently for large input_string values.
It also has the benefit of working correctly with punctuation: it will properly return 1 as the count for the phrase "Mike saw a dog." (whereas an argumentless split() would not). It uses the \b regex anchor, which matches at word boundaries (transitions between \w, i.e. [a-zA-Z0-9_] in ASCII mode, and anything else).
If you need to worry about languages beyond the ASCII character set, you may need to adjust the regex to properly match non-word characters in those languages, but for many applications this would be an overcomplication. In Python 2, setting the unicode and/or locale flags for the regex is often enough; in Python 3, str patterns are Unicode-aware by default.
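As a quick check of the word-boundary approach, here is a small self-contained sketch. The wrapper name count_exact is hypothetical; the regex is the one from the answer above.

```python
import re


def count_exact(word, input_string):
    # re.escape() guards against regex metacharacters inside the word
    pattern = r"\b" + re.escape(word) + r"\b"
    # finditer() yields matches lazily, so no intermediate list is built
    return sum(1 for _ in re.finditer(pattern, input_string))


print(count_exact("dog", "the dogs barked"))  # 0
print(count_exact("dog", "Mike saw a dog."))  # 1
```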
You can use str.split() to convert the sentence to a list of words:
a = 'the dogs barked'.split()
This will create the list:
['the', 'dogs', 'barked']
You can then count the number of exact occurrences using list.count():
a.count('dog') # 0
a.count('dogs') # 1
If it needs to work with punctuation, you can use regular expressions. For example:
import re
a = re.split(r'\W', 'the dogs barked.')
a.count('dogs') # 1
Use a generator expression:
>>> word = "dog"
>>> str1 = "the dogs barked"
>>> sum(w == word for w in str1.split())
0
>>> word = 'dog'
>>> str1 = 'the dog barked'
>>> sum(w == word for w in str1.split())
1
split() returns a list of all the words in a sentence. The generator expression then yields True for each word that equals the target, and sum() counts those matches.
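import re
word = "dog"
text = "the dogs barked"
# Note: this counts substring matches, so it prints 1 here; use r'\b' + re.escape(word) + r'\b' for exact matches
print(len(re.findall(word, text)))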
You need to split the sentence into words. For your example you can do that with just
words = str1.split()
But for real-world usage you need something more advanced that also handles punctuation. For most Western languages you can get away with replacing all punctuation with spaces before doing str1.split().
This will work for English as well in simple cases, but note that "I'm" will be split into two words, "I" and "m", when it should in fact be split into "I" and "am". Handling that may be overkill for this application.
For other cases, such as Asian languages or actual real-world usage of English, you might want to use a library that does the word splitting for you.
Then you have a list of words, and you can do
count = words.count(word)
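The suggestion above, replacing punctuation with spaces before splitting, could be sketched roughly as follows. The helper name count_word_ignoring_punctuation is hypothetical, and string.punctuation covers ASCII punctuation only, so this is an illustration for simple English text and not a general tokenizer.

```python
import string


def count_word_ignoring_punctuation(text, word):
    # Map every ASCII punctuation character to a space, then split on whitespace
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    words = text.translate(table).split()
    return words.count(word)


print(count_word_ignoring_punctuation("Mike saw a dog. The dogs barked at the dog!", "dog"))  # 2
```

As the answer warns, contractions are split too: "I'm" becomes "I" and "m".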
# counting the number of times a word occurs in the text
def count_word(text, word):
    """
    Split the text into words and count
    the number of occurrences of the given word.
    input: text and word
    output: number of times the word appears
    """
    answer = text.split(" ")
    count = 0
    for occurrence in answer:
        if word == occurrence:
            count = count + 1
    return count
sentence = "To be a programmer you need to have a sharp thinking brain"
word_count = "a"
print(sentence.split(" "))
print(count_word(sentence,word_count))
#output
>>> %Run test.py
['To', 'be', 'a', 'programmer', 'you', 'need', 'to', 'have', 'a', 'sharp', 'thinking', 'brain']
2
>>>
Create a function that takes two inputs: the text of a sentence and the word to count.
Split the text into a list of words.
Then compare each word in the list with the word to be counted, and return the number of matches.
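If you don't need regular expressions, you can use this neat trick. Note that it misses occurrences at the very start or end of the string, or next to punctuation.
word = " is " # add a space on both the leading and trailing sides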
input_string = "This is some random text and this is str which is mutable"
print("Word count : ",input_string.count(word))
Output -- Word count : 3
Below is a simple example where we replace the desired word with a new string, for a desired number of occurrences:
def censor(text, word):
    # replace every occurrence of word with '+' characters of the same length
    new_string = text.replace(word, "+" * len(word), text.count(word))
    return new_string
print(censor("hey hey hey", "hey"))
The output will be: +++ +++ +++
The first parameter of replace() is the search string.
The second is the new string that will replace the search string.
The third and last is the maximum number of occurrences to replace.
Let us consider the example s = "suvotisuvojitsuvo".
Suppose you want separate counts for "suvo" and "suvojit", where the "suvo" inside "suvojit" should not be counted as a standalone "suvo". You can use the count() method for both and subtract.
suvocount = s.count("suvo")  # output: 3
suvojitcount = s.count("suvojit")  # output: 1
To find the standalone "suvo" count, subtract the "suvojit" count:
lonelysuvo = suvocount - suvojitcount  # output: 3 - 1 -> 2
This would be my solution with help of the comments:
word = str(input("type the french word chiens in english:"))
str1 = "dogs"
times = int(str1.count(word))
if times >= 1:
    print ("dogs is correct")
else:
    print ("you're wrong")
If you want to count the occurrences of a specific word in the string without using any count function, you can use the following method. It matches case-insensitively and also counts occurrences inside longer words.
text = input("Please enter the statement you want to check: ")
word = input("Please enter the word you want to check in the statement: ")
# n is the starting point to find the word, and it's 0 cause you want to start from the very beginning of the string.
n = 0
# position_word is the starting Index of the word in the string
position_word = 0
num_occurrence = 0
if word.upper() in text.upper():
    while position_word != -1:
        position_word = text.upper().find(word.upper(), n, len(text))
        # increase the starting point of the search to find the next occurrence
        n = (position_word + 1)
        # str.find(sub, start, end) returns -1 if sub is not present in the given range
        if position_word != -1:
            num_occurrence += 1
    print (f"{word.title()} is present {num_occurrence} times in the provided statement.")
else:
    print (f"{word.title()} is not present in the provided statement.")
This is a simple Python program using the split function:
text = 'apple mango apple orange orange apple guava orange'
print("\n My string ==> " + text + "\n")
words = text.split()
seen = []
for w in words:
    if w not in seen:
        seen.append(w)
        print(w, words.count(w))
I have just started out to learn coding in general and I do not know any libraries as such.
s = "the dogs barked"
value = 0
x = 0
y=3
for alphabet in s:
    if (s[x:y]) == "dog":
        value = value+1
    x+=1
    y+=1
print ("number of dog in the sentence is : ", value)
Another way to do this is by tokenizing the string (breaking it into words).
Use Counter from the collections module of the Python standard library:
from collections import Counter
str1 = "the dogs barked"
# this Counter maps every word to its count; missing words count as 0
word_counts = Counter(str1.split())
print(word_counts['dogs'])

Python encoding/decoding of str, methods

Given a sentence string. Write the shortest word in a sentence. If there are several such words, then output the last one. A word is a set of characters that does not contain spaces, punctuation marks and is delimited by spaces, punctuation marks, or the beginning/end of a line.
Input: sentence = "I LOVE python version three and point 10"
Output: "I"
My attempt:
sentence = input("sentence: ")
words = sentence.split()
min_word = None
for word in words:
    if len(word) < len(words):
        min_word = word
print(min_word)
But output is : 10
Can you help me?
The bug is in if len(word) < len(words):, which compares each word's length with the number of words. It should compare with the current shortest word instead. Use <= so that the last of several equally short words wins, and start from words[0] to avoid calling len(None):
sentence = input("sentence: ")
words = sentence.split()
min_word = words[0]
for word in words:
    if len(word) <= len(min_word):
        min_word = word
print(min_word)
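The task statement also says that words are delimited by punctuation, which split() alone does not handle. A hedged sketch of one way to cover that is shown below; it assumes a word is a run of letters, digits, or underscores (the regex \w+), and the function name last_shortest_word is made up for this example.

```python
import re


def last_shortest_word(sentence):
    # Assumption: a "word" is a maximal run of \w characters
    words = re.findall(r"\w+", sentence)
    if not words:
        return None
    shortest = words[0]
    for word in words[1:]:
        # <= keeps the LAST word among several equally short ones
        if len(word) <= len(shortest):
            shortest = word
    return shortest


print(last_shortest_word("I LOVE python version three and point 10"))  # I
```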

The output of sentences extracted by key words

I am new to Python. I have trouble figuring out the format of the extracted sentences by several key words. There are several sentences extracted. How to convert the output of several sentences to one string?
For example:
search_keywords=['my family','love my']
text = "my family is good. I love my family. I am happy."
sentences = text.split(".")
for sentence in sentences:
    if (any(map(lambda word: word in sentence, search_keywords))):
        print (sentence)
        count = len(sentence.split())
        print(count)
The output is:
my family is good
4
I love my family
4
How to combine the two extracted sentences as one string, so the total count equals 8 like the following:
my family is good. I love my family.
8
Any help is appreciated.
Let me correct your Python code:
# your data
search_keywords=['my family','love my']
text = "my family is good. I love my family. I am happy."
sentences = text.split(".")
# initialise
total_count = 0
final_sentence = ""
# for every sentence
for sentence in sentences:
    if (any(map(lambda word: word in sentence, search_keywords))):
        # add the count to total_count
        total_count += len(sentence.split())
        # add the sentence to final_sentence
        final_sentence += sentence+'.'
# print the final_sentence and total_count
print(final_sentence)
print(total_count)
How about this:
result = []
result_count = 0
for sentence in sentences:
    if (any(map(lambda word: word in sentence, search_keywords))):
        # strip() drops the leading space left over from splitting on "."
        result.append(sentence.strip())
        result_count += len(sentence.split())
print('. '.join(result) + '.')
print(result_count)
#my family is good. I love my family.
#8
Use the join method for strings:
outp = []
count = 0
for sentence in sentences:
    if (any(map(lambda word: word in sentence, search_keywords))):
        # strip() drops the leading space left over from splitting on "."
        outp.append(sentence.strip())
        count += len(sentence.split())
print('. '.join(outp) + '.')
print(count)
You choose the separator string and call its join method, passing the list of strings to be joined.
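For comparison, the same idea can be written more compactly with a list comprehension. This is only a sketch on the sample data from the question; the variable names matches and combined are chosen here for illustration.

```python
search_keywords = ['my family', 'love my']
text = "my family is good. I love my family. I am happy."

# Keep the stripped sentences that contain at least one keyword
matches = [
    sentence.strip()
    for sentence in text.split(".")
    if any(keyword in sentence for keyword in search_keywords)
]

combined = '. '.join(matches) + '.'
total_count = sum(len(sentence.split()) for sentence in matches)

print(combined)      # my family is good. I love my family.
print(total_count)   # 8
```

Note that if no sentence matches, combined is just "."; you may want to handle that case separately.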

How to not count punctuation between words

What is the best way to count a character such as an apostrophe only when it occurs inside words such as "shouldn't"?
For example "I shouldn't do that" counts once
But " 'I will not do that' " counts zero
Basically, how can I use counts to count apostrophes inside words but not quotes around them?
I haven't been able to get anything working. I can only use a basic for loop to count every apostrophe, but I can't narrow it down further.
for sentence in split_sentences:
    for w in sentence:
        for p in punctuation:
            if p == w:
                if word in counts:
                    counts[p] += 1
                else:
                    counts[p] = 1
            else:
                pass
Given a list of words, it should count punctuation only inside words, not around them.
So "Shouldn't" will count but "'should'" will not.
You can check if it is inside the word:
for sentence in split_sentences:
    for w in sentence:
        for p in punctuation:
            if p in w and w[0] != p and w[-1] != p:
                if p in counts:
                    counts[p] += 1
                else:
                    counts[p] = 1
            else:
                pass
The important line is this: if p in w and w[0] != p and w[-1] != p:
We have 3 rules for it to count:
The punctuation p is in the word w
The word w does not start (w[0]) with the punctuation p
The word w does not end (w[-1]) with the punctuation p
A more Pythonic way of doing this would be to use the str methods startswith and endswith:
...
if p in w and not w.startswith(p) and not w.endswith(p):
...
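A self-contained version of this check might look like the sketch below. It assumes the input is a list of sentence strings and that only the apostrophe and hyphen are of interest; the function name count_inner_punctuation and the default punctuation argument are hypothetical, and collections.Counter stands in for the manual if/else on counts.

```python
from collections import Counter


def count_inner_punctuation(sentences, punctuation="'-"):
    counts = Counter()
    for sentence in sentences:
        for w in sentence.split():
            for p in punctuation:
                # count p only when it is inside the word, not at either edge
                if p in w and not w.startswith(p) and not w.endswith(p):
                    counts[p] += 1
    return counts


counts = count_inner_punctuation(["I shouldn't do that", "'I will not do that'"])
print(counts["'"])  # 1
```

Like the loop above, this counts each qualifying word once per punctuation mark, and it skips a word such as 'shouldn't' when it is itself wrapped in quotes.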
You can use the regular expression [a-zA-Z]'[a-zA-Z] to find all single quotes that are surrounded by letters.
The requirement for the hyphen isn't completely clear to me. If it has the same requirement (i.e. it only counts when surrounded by letters), then using the regular expression [a-zA-Z]['-][a-zA-Z] will do the trick: it will count quotes as well as hyphens.
If you should count all hyphens, then you could just use the str.count method (e.g. "test-string".count("-") returns 1).
Here is some example code, assuming the hyphens must also be counted only if they are surrounded by letters:
import re
TEST_SENTENCES = (
    "I shouldn't do that",
    "'I will not do that'",
    "Test-hyphen"
)
PATTERN = re.compile("[a-zA-Z]['-][a-zA-Z]")
for sentence in TEST_SENTENCES:
    print(len(PATTERN.findall(sentence)))
Output:
1
0
1

Trying to make the program give me the longest word in a sentence

The question is, when I have a sentence that contains two words with the same number of letters, how do I make the program give me the first longest word instead of both?
import sys
inputsentence = input("Enter a sentence and I will find the longest word: ").split()
longestwords = []
for word in inputsentence:
    if len(word) == len(max(inputsentence, key=len)):
        longestwords.append(word)
print ("the longest word in the sentence is:",longestwords)
Example: the quick brown fox... Right now the program gives me "quick" and "brown". How do I tweak my code to just give me "quick", since it's the first longest word?
I would get rid of the for-loop altogether and just do this:
>>> mystr = input("Enter a sentence and I will find the longest word: ")
Enter a sentence and I will find the longest word: The quick brown fox
>>> longest = max(mystr.split(), key=len)
>>> print("the longest word in the sentence is:", longest)
the longest word in the sentence is: quick
>>>
Just print the first one in the list:
print ("the longest word in the sentence is:",longestwords[0])
There are likely better ways to do this, but this requires the least modification to your code.
Why not just:
longest_word = None
for word in inputsentence:
    if len(word) == len(max(inputsentence, key=len)):
        longest_word = word
        # stop at the first longest word so later ties do not overwrite it
        break
print ("the longest word in the sentence is:",longest_word)
A more Pythonic way:
inputsentence = input("Enter a sentence and I will find the longest word: ").split()
# sort by length, longest first; the sort is stable, so ties keep their original order
inputsentence.sort(key=len, reverse=True)
# index 0 is now the first longest word
print ("the longest word in the sentence is:", inputsentence[0])
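To see the tie-breaking behaviour without typing input, here is a small sketch that wraps the max() approach from the earlier answer in a function. The name first_longest_word and the None result for empty input are assumptions for illustration.

```python
def first_longest_word(sentence):
    words = sentence.split()
    # max() returns the first maximal item it encounters, so ties go to the earliest word
    return max(words, key=len) if words else None


print(first_longest_word("the quick brown fox"))  # quick
```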
