How to not count punctuation between words

How to not count punctuation between words - python

What is the best way to count a punctuation mark, say an apostrophe, only when it appears inside a word such as "shouldn't"?
For example, "I shouldn't do that" counts once,
but " 'I will not do that' " counts zero.
Basically, how can I count apostrophes inside words but not those used as quotes?
I haven't been able to get much working. I can only use a basic for loop to count every apostrophe, but I can't narrow it down further.
for sentence in split_sentences:
    for w in sentence:
        for p in punctuation:
            if p == w:
                if p in counts:
                    counts[p] += 1
                else:
                    counts[p] = 1
            else:
                pass
Given a list of words, it should count punctuation only inside a word, not around it.
So "Shouldn't" will count but "'should'" will not.

You can check if it is inside the word:
for sentence in split_sentences:
    for w in sentence:
        for p in punctuation:
            if p in w and w[0] != p and w[-1] != p:
                if p in counts:
                    counts[p] += 1
                else:
                    counts[p] = 1
            else:
                pass
The important line is this: if p in w and w[0] != p and w[-1] != p:
There are 3 rules for a mark to count:
The punctuation p is in the word w
The word w does not start (w[0]) with the punctuation p
The word w does not end (w[-1]) with the punctuation p
A more Pythonic way of doing this is to use the built-in str methods startswith and endswith:
...
if p in w and not w.startswith(p) and not w.endswith(p):
...
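
As a rough sketch of how such a check could fit into a complete counter, the block below counts a mark only when it has a letter on both sides. This is slightly stricter than the start/end test above and counts every occurrence rather than once per word. The function name count_inner_punctuation and the default marks are assumptions made for illustration, not part of the original answer.

```python
from collections import Counter

def count_inner_punctuation(sentence, marks="'-"):
    """Count marks that sit between two letters, e.g. the apostrophe in "shouldn't"."""
    counts = Counter()
    for word in sentence.split():
        # Skip the first and last positions: a mark there cannot be inside the word.
        for i in range(1, len(word) - 1):
            ch = word[i]
            if ch in marks and word[i - 1].isalpha() and word[i + 1].isalpha():
                counts[ch] += 1
    return counts

print(count_inner_punctuation("I shouldn't do that"))
print(count_inner_punctuation("'I will not do that'"))
```

Using a Counter avoids the explicit "is the key already present" branch, since missing keys start at zero.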

You can use the regular expression [a-zA-Z]'[a-zA-Z] to find all single quotes that are surrounded by letters.
The requirement for the hyphen isn't completely clear to me. If it has the same requirement (i.e. it only counts when surrounded by letters), then the regular expression [a-zA-Z]['-][a-zA-Z] will do the trick: it will count quotes as well as hyphens.
If you should count all hyphens, then you could just use the str.count method (e.g.
"test-string".count("-") returns 1).
Here is some example code, assuming the hyphens must also be counted only if they are surrounded by letters:
import re
TEST_SENTENCES = (
    "I shouldn't do that",
    "'I will not do that'",
    "Test-hyphen"
)
PATTERN = re.compile("[a-zA-Z]['-][a-zA-Z]")
for sentence in TEST_SENTENCES:
    print(len(PATTERN.findall(sentence)))
Output:
1
0
1
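
One caveat with a pattern that consumes the neighbouring letters: findall returns non-overlapping matches, so in a string like "a-b-c" only the first hyphen is found, because the "b" is used up by the first match. A possible variant, sketched here with lookbehind and lookahead assertions (the names INNER_MARK and count_inner_marks are made up for this example), checks the neighbours without consuming them:

```python
import re

# Lookbehind/lookahead assert that a letter is present on each side
# without making it part of the match, so marks that share a neighbour
# (as in "a-b-c") are each counted.
INNER_MARK = re.compile(r"(?<=[a-zA-Z])['-](?=[a-zA-Z])")

def count_inner_marks(text):
    return len(INNER_MARK.findall(text))

for sample in ("I shouldn't do that", "'I will not do that'", "Test-hyphen", "a-b-c"):
    print(sample, count_inner_marks(sample))
```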

Related

Find the occurrence of a particular word from a file in python [duplicate]

I'm trying to find the number of occurrences of a word in a string.
word = "dog"
str1 = "the dogs barked"
I used the following to count the occurrences:
count = str1.count(word)
The issue is I want an exact match. So the count for this sentence would be 0.
Is that possible?

If you're going for efficiency:
import re
count = sum(1 for _ in re.finditer(r'\b%s\b' % re.escape(word), input_string))
This doesn't need to create any intermediate lists (unlike split()) and thus will work efficiently for large input_string values.
It also has the benefit of working correctly with punctuation: it will properly return 1 as the count for the phrase "Mike saw a dog." (whereas an argumentless split() would not). It uses the \b regex assertion (not a flag), which matches at word boundaries, i.e. transitions between \w characters and anything else. For ASCII text, \w is [a-zA-Z0-9_].
If you need to worry about languages beyond the ASCII character set, you may need to adjust the regex to properly match non-word characters in those languages, but for many applications this would be an overcomplication, and in many other cases setting the unicode and/or locale flags for the regex would suffice.
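
For reference, the one-liner above could be wrapped in a small helper like the following sketch. The name count_exact and the ignore_case parameter are assumptions added for illustration. Note that \b only behaves as expected when the search word itself starts and ends with a word character.

```python
import re

def count_exact(word, text, ignore_case=False):
    """Count whole-word occurrences of word in text using \\b boundaries."""
    pattern = r"\b%s\b" % re.escape(word)
    flags = re.IGNORECASE if ignore_case else 0
    # finditer yields matches lazily, so no intermediate list is built.
    return sum(1 for _ in re.finditer(pattern, text, flags))

print(count_exact("dog", "the dogs barked"))
print(count_exact("dog", "Mike saw a dog."))
```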

You can use str.split() to convert the sentence to a list of words:
a = 'the dogs barked'.split()
This will create the list:
['the', 'dogs', 'barked']
You can then count the number of exact occurrences using list.count():
a.count('dog') # 0
a.count('dogs') # 1
If it needs to work with punctuation, you can use regular expressions. For example:
import re
a = re.split(r'\W', 'the dogs barked.')
a.count('dogs') # 1

Use a generator expression:
>>> word = "dog"
>>> str1 = "the dogs barked"
>>> sum(w == word for w in str1.split())
0
>>> word = 'dog'
>>> str1 = 'the dog barked'
>>> sum(w == word for w in str1.split())
1
split() returns a list of all the words in a sentence. The generator expression then yields True for each word equal to the target, and sum() adds them up (True counts as 1).

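import re
word = "dog"
text = "the dogs barked"
# \b restricts matches to whole words, so "dogs" does not count as "dog"
print(len(re.findall(r"\b" + re.escape(word) + r"\b", text)))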

You need to split the sentence into words. For your example you can do that with just
words = str1.split()
But for real-world usage you need something more advanced that also handles punctuation. For most Western languages you can get away with replacing all punctuation with spaces before doing str1.split().
This will work for English as well in simple cases, but note that "I'm" will be split into two words, "I" and "m", when it should in fact be split into "I" and "am". Handling that may be overkill for this application.
For other cases, such as Asian languages or real-world English text, you might want to use a library that does the word splitting for you.
Then you have a list of words, and you can do
count = words.count(word)
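
A minimal sketch of the "replace punctuation with spaces, then split" idea, assuming ASCII punctuation only (string.punctuation). The helper name count_word and the translation table name are invented for this example.

```python
import string

# Translation table mapping every ASCII punctuation character to a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def count_word(text, word):
    """Count exact occurrences of word after turning punctuation into spaces."""
    words = text.translate(PUNCT_TO_SPACE).split()
    return words.count(word)

print(count_word("The dog. A dog, a dogs' dog!", "dog"))
print(count_word("the dogs barked", "dog"))
```

As noted above, this also splits contractions: "I'm" becomes "I" and "m".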

#counting the number of words in the text
def count_word(text,word):
    """
    Function that takes the text and split it into word
    and counts the number of occurence of that word
    input: text and word
    output: number of times the word appears
    """
    answer = text.split(" ")
    count = 0
    for occurence in answer:
        if word == occurence:
            count = count + 1
    return count
sentence = "To be a programmer you need to have a sharp thinking brain"
word_count = "a"
print(sentence.split(" "))
print(count_word(sentence,word_count))
#output
>>> %Run test.py
['To', 'be', 'a', 'programmer', 'you', 'need', 'to', 'have', 'a', 'sharp', 'thinking', 'brain']
2
>>>
Create a function that takes two inputs: a sentence of text and a word.
Split the sentence into a list of words,
then compare each entry with the word to be counted and return the number of matches.

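If you don't need regular expressions, you can use this neat trick. Note that it misses the word at the very start or end of the string, or next to punctuation, because it requires a space on both sides.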
word = " is " #Add space at trailing and leading sides.
input_string = "This is some random text and this is str which is mutable"
print("Word count : ",input_string.count(word))
Output -- Word count : 3

Below is a simple example where we replace the desired word with a new string, limited to a given number of occurrences:
def censor(text, word):
    new_string = text.replace(word, "+" * len(word), text.count(word))
    return new_string
print(censor("hey hey hey", "hey"))
The output will be: +++ +++ +++
The first parameter of str.replace() is the search string.
The second is the new string that replaces the search string.
The third and last is the maximum number of occurrences to replace.

Let us consider the example s = "suvotisuvojitsuvo".
If you want separate counts for "suvo" and "suvojit", use the count() method. To count only the standalone "suvo", do not count the "suvo" inside "suvojit".
suvocount = s.count("suvo")  # output: 3
suvojitcount = s.count("suvojit")  # output: 1
To find the standalone "suvo" count, subtract the suvojit count:
lonelysuvo = suvocount - suvojitcount  # output: 3 - 1 -> 2

This would be my solution, with help from the comments:
word = str(input("type the french word chiens in english:"))
str1 = "dogs"
times = int(str1.count(word))
if times >= 1:
    print("dogs is correct")
else:
    print("you're wrong")

If you want to find the number of occurrences of a specific word in the string and you don't want to use any count function, you can use the following method.
text = input("Please enter the statement you want to check: ")
word = input("Please enter the word you want to check in the statement: ")
# n is the starting point to find the word, and it's 0 cause you want to start from the very beginning of the string.
n = 0
# position_word is the starting Index of the word in the string
position_word = 0
num_occurrence = 0
if word.upper() in text.upper():
    while position_word != -1:
        position_word = text.upper().find(word.upper(), n, len(text))
        # increasing the value of the starting point for search to find the next word
        n = (position_word + 1)
        # statement.find("word", start, end) returns -1 if the word is not present in the given statement.
        if position_word != -1:
            num_occurrence += 1
    print (f"{word.title()} is present {num_occurrence} times in the provided statement.")
else:
    print (f"{word.title()} is not present in the provided statement.")

This is a simple Python program using the split function:
str = 'apple mango apple orange orange apple guava orange'
print("\n My string ==> "+ str +"\n")
str = str.split()
str2=[]
for i in str:
    if i not in str2:
        str2.append(i)
        print( i,str.count(i))

I have just started out to learn coding in general and I do not know any libraries as such.
s = "the dogs barked"
value = 0
x = 0
y=3
for alphabet in s:
    if (s[x:y]) == "dog":
        value = value+1
    x+=1
    y+=1
print ("number of dog in the sentence is : ", value)

Another way to do this is by tokenizing the string (breaking it into words)
and using Counter from the collections module of the Python standard library:
from collections import Counter
str1 = "the dogs barked"
stringTokenDict = { key : value for key, value in Counter(str1.split()).items() }
print(stringTokenDict['dogs'])
#This dictionary contains all words & their respective count

Find the average length of all words in a sentence

Given a string consisting of words separated by spaces (one or more).
Find the average length of all words.
Average word length = total number of characters in words (excluding spaces) divided by the number of words.
My attempt:
But the output is incorrect. Can you help me?
sentence = input("sentence: ")
words = sentence.split()
total_number_of_characters = 0
number_of_words = 0
for word in words:
    total_number_of_characters += len(sentence)
    number_of_words += len(words)
average_word_length = total_number_of_characters / number_of_words
print(average_word_length)

When you're stuck, one nice trick is to use very verbose variable names that match the task description as closely as possible, for example:
words = sentence.split()
total_number_of_characters = 0
number_of_words = 0
for word in words:
    total_number_of_characters += WHAT?
    number_of_words += WHAT?
average_word_length = total_number_of_characters / number_of_words
Can you do the rest?

I think maybe it should be
for char in word:
Rather than
for char in words:

You may use mean() function to calculate the average.
>>> from statistics import mean
>>> sentence = 'The quick brown fox jumps over the lazy dog'
>>> mean(len(word) for word in sentence.split())
3.888888888888889
The statistics library was introduced with Python 3.4.
https://docs.python.org/3/library/statistics.html#statistics.mean

There is a simpler way to solve this problem. You can get the number of words with len(words) and the number of letters by taking the original sentence and removing all spaces from it (check the replace() method).
Now it's your turn to piece this information together!
Edit: Here's an example:
sentence = input("Sentence: ")
words = len(sentence.split())
chars = len(sentence.replace(" ", ""))
print(chars / words)
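
For completeness, here is one way the pieces could be combined into a function. This is only a sketch: the name average_word_length is invented here, and returning 0.0 for input without any words is an assumption made to avoid a ZeroDivisionError, not something the task statement specifies.

```python
def average_word_length(sentence):
    """Average number of characters per whitespace-separated word."""
    words = sentence.split()  # split() with no argument collapses runs of spaces
    if not words:
        return 0.0  # assumed convention for empty or all-space input
    return sum(len(word) for word in words) / len(words)

print(average_word_length("The quick brown fox jumps over the lazy dog"))
```

Summing len(word) over the split words also ignores tabs and newlines, which replace(" ", "") would leave in the character count.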

Coding a function to find the validity of certain words in a sentence

A university assignment has us tasked with writing a program in Python that analyzes tweets. Part of the assignment is coding a function that identifies whether words within a string sentence are valid, and can be counted. Here's the question:
Task 8 Valid Words
We also might want to look at only valid words in our data set. A word will be a valid word if all three of the following conditions are true:
• The word contains only letters, hyphens, and/or punctuation* (no digits).
• There is at most one hyphen '-'. If present, it must be surrounded by characters ("a-b" is valid, but "-ab" and "ab-" are not valid).
• There is at most one punctuation mark. If present, it must be at the end of the word ("ab,", "cd!", and "." are valid, but "a!b" and "c.," are not valid).
NB: for this question, the 3rd condition will also apply to apostrophes despite real "valid" words
containing them.
Write a function valid_words_mask(sentence) that takes an input parameter sentence (type string)
and returns the tuple: (int, list[]), where:
• int is the number of valid words found.
• list[] contains the booleans True or False for each word in sequence depending on whether that
word is valid.
*Assume that a punctuation mark is any character that is not an alphanumeric (except for apostrophes,
and for hyphens, which are handled separately as per the instructions).
Here's the code I have written so far, after many days of struggling. It seems to only return one iteration of the loop. Keep in mind that I am a beginner programmer, and have only applied the few concepts we have learned. :)
Thanks for the feedback.
def valid_words_mask(sentence):
    """Takes a string sentence input and determines whether words are valid"""
    import string
    punctuation = list(string.punctuation)
    punctuation.remove("-")
    word_list = " ".split(sentence)
    valid_count = 0
    valid_list = []
    for word in word_list:
        hyphen_count = 0
        digit_count = 0
        punctuation_count = 0
        for i in range (0, len(word)):
            #Checks whether given character is a punctuation mark
            if word[i] == "-":
                hyphen_count += 1
        for i in range (0, len(word)):
            #Checks whether given character is a digit
            if word[i].isdigit() == True:
                digit_count += 1
        for i in range (0, (len(word) - 1)):
            if word[i] in punctuation:
                punctuation_count += 1
        if digit_count < 1 and hyphen_count < 2 and punctuation_count < 1:
            if word[0] != "-" and word[-1] != "-":
                validity = True
        else: validity = False
        if validity == True:
            valid_count += 1
        valid_list.append(validity)
    final_tuple = (valid_count, valid_list)
    return final_tuple
sentence = "these are valid words"
print(valid_words_mask(sentence))

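The problem is with the line:
word_list = " ".split(sentence)
That call splits the one-character string " " using sentence as the separator, so word_list is [' '] (a single entry containing just a space) and the loop runs only once.
Use
word_list = sentence.split() instead.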
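
The difference between the two calls can be checked directly. In this small demonstration the variable names wrong and right are just labels for the example:

```python
sentence = "these are valid words"

# str.split is called on the string to be split; the argument is the separator.
wrong = " ".split(sentence)   # splits " " on the separator "these are valid words"
right = sentence.split()      # splits the sentence on runs of whitespace

print(wrong)
print(right)
```

Because the separator never occurs in " ", the first call returns the original one-character string as the only list entry.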

The instructions for this task are confusing when it comes to defining what constitutes punctuation, which means that the following code may not work for you.
However, you should think about breaking down the functionality into its component parts. In particular, you have 3 "rules" so write 3 complementary functions: each one succinct. Then it becomes easier to combine those rules into another "driver" function. Here's an example:
from string import ascii_lowercase, punctuation
HYPHEN = '-'
PUNCTUATION = punctuation.replace(HYPHEN, '')
VCHARS = ascii_lowercase + punctuation
def valid_chars(word):
    return all(c in VCHARS for c in word)
def valid_hyphens(word):
    # at most one hyphen, and it may not be the first or last character
    hcount = word.count(HYPHEN)
    return hcount == 0 or (hcount == 1 and word[0] != HYPHEN and word[-1] != HYPHEN)
def valid_punctuation(word):
    pcount = sum(1 for c in word if c in PUNCTUATION)
    return pcount == 0 or (pcount == 1 and word[-1] in PUNCTUATION)
def valid_words_mask(sentence):
    valid_count = 0
    valid_list = list()
    for word in sentence.lower().split():
        if v := valid_chars(word) and valid_punctuation(word) and valid_hyphens(word):
            valid_count += 1
        valid_list.append(v)
    return valid_count, valid_list
print(valid_words_mask('Hello world??'))
Output:
(1, [True, False])

How to reverse the words of a string considering the punctuation?

Here is what I have so far:
def reversestring(thestring):
    words = thestring.split(' ')
    rev = ' '.join(reversed(words))
    return rev
stringing = input('enter string: ')
print(reversestring(stringing))
I know I'm missing something because I need the punctuation to also follow the logic.
So let's say the user enters "Do or do not, there is no try." The result should come out as ".try no is there , not do or Do", but I only get "try. no is there not, do or Do". I use a straightforward implementation that reverses all the characters in the string, then goes through the words and reverses the characters again, but only for those with ASCII values of letters.

Try this (explanation in comments of code):
s = "Do or do not, there is no try."
o = []
for w in s.split(" "):
    puncts = [".", ",", "!"] # change according to needs
    for c in puncts:
        # if a punctuation mark is in the word, take the punctuation and add it to the rest of the word, in the beginning
        if c in w:
            w = c + w[:-1] # w[:-1] gets everything before the last char
    o.append(w)
o = reversed(o) # reversing list to reverse sentence
print(" ".join(o)) # printing it as sentence
#output: .try no is there ,not do or Do

Your code does exactly what it should: splitting on spaces doesn't separate a dot or comma from a word.
I'd suggest you use re.findall to get all the words and all the punctuation that interests you:
import re
def reversestring(thestring):
    words = re.findall(r"\w+|[.,]", thestring)
    rev = ' '.join(reversed(words))
    return rev
reversestring("Do or do not, there is no try.") # ". try no is there , not do or Do"

You can use regular expressions to parse the sentence into a list of words and a list of separators, then reverse the word list and combine them together to form the desired string. A solution to your problem would look something like this:
import re
def reverse_it(s):
    t = "" # result, empty string
    words = re.findall(r'(\w+)', s) # just the words
    not_s = re.findall(r'(\W+)', s) # everything else
    j = len(words)
    k = len(not_s)
    words.reverse() # reverse the order of word list
    if re.match(r'(\w+)', s): # begins with a word
        for i in range(k):
            t += words[i] + not_s[i]
        if j > k: # and ends with a word
            t += words[k]
    else: # begins with punctuation
        for i in range(j):
            t += not_s[i] + words[i]
        if k > j: # ends with punctuation
            t += not_s[j]
    return t #result
def check_reverse(p):
    q = reverse_it(p)
    print("\"%s\", \"%s\"" % (p, q) )
check_reverse('Do or do not, there is no try.')
Output
"Do or do not, there is no try.", "try no is there, not do or Do."
It is not a very elegant solution but sure does work!

Comparing variables to all items in list in Python

I need to write a function for an edx Python course. The idea is figure out the number of letters and words in series of strings and return the average letter count while only using for/while loops and conditionals. My code comes close to being able to do this, but I cannot. For the life of me. Figure out why it doesn't work. I've been bashing my head against it for two days, now, and I know it's probably something really simple that I'm too idiotic to see (sense my frustration?), but I do not know what it is.
If I'm looking at line 14, the logic makes sense: if i in the string is punctuation (not a letter) and the previous character (char, in this case) is not punctuation (therefore a letter), it should be a word. But it's still counting double punctuation as words. But not all of them.
def averageWordLength(myString):
    char = ""
    punctuation = [" ", "!", "?", ".", ","]
    letters = 0
    words = 0
    if not myString == str(myString):
        return "Not a string"
    try:
        for i in myString:
            if i not in punctuation:
                letters += 1
            elif i in punctuation:
                if char not in punctuation:
                    words += 1
                elif char in punctuation:
                    pass
            char = i
        if letters == 0:
            return "No words"
        else:
            average = letters / (words + 1)
            return letters, words + 1, average
    except TypeError:
        return "No words"
print(averageWordLength("Hi"))
print(averageWordLength("Hi, Lucy"))
print(averageWordLength(" What big spaces you have!"))
print(averageWordLength(True))
print(averageWordLength("?!?!?! ... !"))
print(averageWordLength("One space. Two spaces. Three spaces. Nine spaces. "))
Desired output:
2, 1, 2.0
6, 2, 3.0
20, 6, 4.0
Not a string
No words
38, 8, 4.75
What in blazes am I doing wrong?!
٩๏̯͡๏۶
Final correction:
        for i in myString:
            if i not in punctuation:
                letters += 1
                if char in punctuation:
                    words += 1
            char = i

        else:
            average = letters / (words + 1)
            return letters, words + 1, average
You're adding 1 to words by default... this is not valid in all cases, "Hi!" being a good example. This is actually what is throwing off all of your results: any time a string does not end in a word, your function will be off.
Hint: You only want to add one if there is no punctuation after the last word.

A problem occurs when the string begins with a punctuation character: the previous character is still "" and is not considered punctuation, so a nonexistent word is counted.
You could add "" to the list of symbols, or do:
punctuation = " !?.,"
because testing c in s returns True if c is a substring of s, that is, if c is a character of s. And the empty string is contained in every string.
A second problem occurs at the end: if the string ends with a word, that word is not counted (was your words + 1 a way to fix that?), but if the string ends with punctuation, the last word is counted.
Add this just after the for loop:
if char not in punctuation:
    words += 1
And now there will be no need to add 1, just use
average = letters / words
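
Putting the two fixes described in this answer together, a complete version might look like the sketch below. The name average_word_length, the isinstance check, and the prev variable are choices made for this illustration rather than part of the original answer.

```python
def average_word_length(my_string):
    # A string of separators: "" in punctuation is True, so the initial
    # empty "previous character" is treated as a separator.
    punctuation = " !?.,"
    if not isinstance(my_string, str):
        return "Not a string"
    letters = 0
    words = 0
    prev = ""
    for ch in my_string:
        if ch not in punctuation:
            letters += 1
        elif prev not in punctuation:
            words += 1  # a separator right after a letter ends a word
        prev = ch
    if prev not in punctuation:
        words += 1  # the string ended in the middle of a word
    if words == 0:
        return "No words"
    return letters, words, letters / words

print(average_word_length("Hi, Lucy"))
print(average_word_length("?!?!?! ... !"))
```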

This is made more difficult since I assume you're not allowed to use inbuilt string functions like split().
The way I would approach this is to:
Split the sentence into a list of words.
Count the letters in each word.
Take the average amount of letters.
def averageWordLength(myString):
    punctuation = [" ", "!", "?", ".", ","]
    if not isinstance(myString, str):
        return "Not a string"
    split_values = []
    word = ''
    for char in myString:
        if char in punctuation:
            if word:
                split_values.append(word)
                word = ''
        else:
            word += char
    if word:
        split_values.append(word)
    letter_count = []
    for word in split_values:
        letter_count.append(len(word))
    if len(letter_count):
        return sum(letter_count)/len(letter_count)
    else:
        return "No words."
