pythonic way to count total number of letters in string in python - python

I just started learning python, and I can think of two ways to count letters in a string (ignoring numbers, punctuation, and white spaces)
Using for loop:
for c in s:
if c.isalpha():
counter += 1
print(counter)
Create a list of alphabets and count the length of the list: (It will create an unwanted list)
import re
s = "Nice. To. Meet. You."
letters = re.findall("([a-z]|[A-Z])", s)
counter = len(letters)
print(counter)
Can anyone tell me is there a "pythonic" way to achieve the same result?
like single line code or a function called that will return an int which is the answer?
Thank you very much.

Your first approach is perfectly pythonic, and probably the way to go. You could slightly simplify, using filter or a list comprehension as:
s = "Nice. To. Meet. You."
len(list(filter(str.isalpha, s)))
# 13
Or:
len([i for i in s if i.isalpha()])
# 13
Your second approach isn't really advisable, since you don't really need to use a regex for this. Note that you could simplify that pattern to ([a-zA-Z]), by the way.

You can use regex to remove anything that is not a letter and then count the length of the string:
import re
s = "Nice. To. Meet. You."
counter = len(re.sub(r'[^a-zA-Z]','',s))

Using regex (re), you can find all letters and count their occurrence:
len(re.findall(r"[a-zA-Z]", string))

Related

Can't convert 'list'object to str implicitly Python

I am trying to import the alphabet but split it so that each character is in one array but not one string. splitting it works but when I try to use it to find how many characters are in an inputted word I get the error 'TypeError: Can't convert 'list' object to str implicitly'. Does anyone know how I would go around solving this? Any help appreciated. The code is below.
import string
alphabet = string.ascii_letters
print (alphabet)
splitalphabet = list(alphabet)
print (splitalphabet)
x = 1
j = year3wordlist[x].find(splitalphabet)
k = year3studentwordlist[x].find(splitalphabet)
print (j)
EDIT: Sorry, my explanation is kinda bad, I was in a rush. What I am wanting to do is count each individual letter of a word because I am coding a spelling bee program. For example, if the correct word is 'because', and the user who is taking part in the spelling bee has entered 'becuase', I want the program to count the characters and location of the characters of the correct word AND the user's inputted word and compare them to give the student a mark - possibly by using some kind of point system. The problem I have is that I can't simply say if it is right or wrong, I have to award 1 mark if the word is close to being right, which is what I am trying to do. What I have tried to do in the code above is split the alphabet and then use this to try and find which characters have been used in the inputted word (the one in year3studentwordlist) versus the correct word (year3wordlist).
There is a much simpler solution if you use the in keyword. You don't even need to split the alphabet in order to check if a given character is in it:
year3wordlist = ['asdf123', 'dsfgsdfg435']
total_sum = 0
for word in year3wordlist:
word_sum = 0
for char in word:
if char in string.ascii_letters:
word_sum += 1
total_sum += word_sum
# Length of characters in the ascii letters alphabet:
# total_sum == 12
# Length of all characters in all words:
# sum([len(w) for w in year3wordlist]) == 18
EDIT:
Since the OP comments he is trying to create a spelling bee contest, let me try to answer more specifically. The distance between a correctly spelled word and a similar string can be measured in many different ways. One of the most common ways is called 'edit distance' or 'Levenshtein distance'. This represents the number of insertions, deletions or substitutions that would be needed to rewrite the input string into the 'correct' one.
You can find that distance implemented in the Python-Levenshtein package. You can install it via pip:
$ sudo pip install python-Levenshtein
And then use it like this:
from __future__ import division
import Levenshtein
correct = 'because'
student = 'becuase'
distance = Levenshtein.distance(correct, student) # distance == 2
mark = ( 1 - distance / len(correct)) * 10 # mark == 7.14
The last line is just a suggestion on how you could derive a grade from the distance between the student's input and the correct answer.
I think what you need is join:
>>> "".join(splitalphabet)
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
join is a class method of str, you can do
''.join(splitalphabet)
or
str.join('', splitalphabet)
To convert the list splitalphabet to a string, so you can use it with the find() function you can use separator.join(iterable):
"".join(splitalphabet)
Using it in your code:
j = year3wordlist[x].find("".join(splitalphabet))
I don't know why half the answers are telling you how to put the split alphabet back together...
To count the number of characters in a word that appear in the splitalphabet, do it the functional way:
count = len([c for c in word if c in splitalphabet])
import string
# making letters a set makes "ch in letters" very fast
letters = set(string.ascii_letters)
def letters_in_word(word):
return sum(ch in letters for ch in word)
Edit: it sounds like you should look at Levenshtein edit distance:
from Levenshtein import distance
distance("because", "becuase") # => 2
While join creates the string from the split, you would not have to do that as you can issue the find on the original string (alphabet). However, I do not think is what you are trying to do. Note that the find that you are trying attempts to find the splitalphabet (actually alphabet) within year3wordlist[x] which will always fail (-1 result)
If what you are trying to do is to get the indices of all the letters of the word list within the alphabet, then you would need to handle it as
for each letter in the word of the word list, determine the index within alphabet.
j = []
for c in word:
j.append(alphabet.find(c))
print j
On the other hand if you are attempting to find the index of each character within the alphabet within the word, then you need to loop over splitalphabet to get an individual character to find within the word. That is
l = []
for c within splitalphabet:
j = word.find(c)
if j != -1:
l.append((c, j))
print l
This gives the list of tuples showing those characters found and the index.
I just saw that you talk about counting the number of letters. I am not sure what you mean by this as len(word) gives the number of characters in each word while len(set(word)) gives the number of unique characters. On the other hand, are you saying that your word might have non-ascii characters in it and you want to count the number of ascii characters in that word? I think that you need to be more specific in what you want to determine.
If what you are doing is attempting to determine if the characters are all alphabetic, then all you need to do is use the isalpha() method on the word. You can either say word.isalpha() and get True or False or check each character of word to be isalpha()

Find multiple list items within a string

Working on a problem in which I am trying to get a count of the number of vowels in a string. I wrote the following code:
def vowel_count(s):
count = 0
for i in s:
if i == 'a' or i == 'e' or i == 'i' or i == 'o' or i == 'u':
count += 1
print count
vowel_count(s)
While the above works, I would like to know how to do this more simply by creating a list of all vowels, then looping my If statement through that, instead of multiple boolean checks. I'm sure there's an even more elegant way to do this with import modules, but interested in this type of solution.
Relative noob...appreciate the help.
No need to create a list, you can use a string like 'aeiou' to do this:
>>> vowels = 'aeiou'
>>> s = 'fooBArSpaM'
>>> sum(c.lower() in vowels for c in s)
4
You can actually treat a string similarly to how you would a list in python (as they are both iterables), for example
vowels = 'aeiou'
sum(1 for i in s if i.lower() in vowels)
For completeness sake, others suggest vowels = set('aeiou') to allow not matching checks such as 'eio' in vowels. However note if you are iterating over your string in a for loop one character at a time, you won't run into this problem.
A weird way around this is the following:
vowels = len(s) - len(s.translate(None, 'aeiou'))
What you are doing with s.translate(None, 'aeiou') is creating a copy of the string removing all vowels. And then checking how the length differed.
Special note: the way I'm using it is even part of the official documentation
What is a vowel?
Note, though, that method presented here only replaces exactly the characters present in the second parameter of the translate string method. In particular, this means that it will not replace uppercase versions characters, let alone accented ones (like áèïôǔ).
Uppercase vowels
Solving the uppercase ones is kind of easy, just do the replacemente on a copy of the string that has been converted to lowercase:
vowels = len(s) - len(s.lower().translate(None, 'aeiou'))
Accented vowels
This one is a little bit more convoluted, but thanks to this other SO question we know the best way to do it. The resulting code would be:
from unicodedate import normalize
# translate special characters to unaccented versions
normalized_str = normalize('NFD', s).encode('ascii', 'ignore')
vowels = len(s) - len(normalized_str.lower().translate(None, 'aeiou'))
You can filter using a list comprehension, like so:
len([letter for letter in s if letter in 'aeiou'])

Counting punctuation in text using Python and regex

I am trying to count the number of times punctuation characters appear in a novel. For example, I want to find the occurrences of question marks and periods along with all the other non alphanumeric characters. Then I want to insert them into a csv file. I am not sure how to do the regex because I don't have that much experience with python. Can someone help me out?
texts=string.punctuation
counts=dict(Counter(w.lower() for w in re.findall(r"\w+", open(cwd+"/"+book).read())))
writer = csv.writer(open("author.csv", 'a'))
writer.writerow([counts.get(fieldname,0) for fieldname in texts])
In [1]: from string import punctuation
In [2]: from collections import Counter
In [3]: counts = Counter(open('novel.txt').read())
In [4]: punctuation_counts = {k:v for k, v in counts.iteritems() if k in punctuation}
from string import punctuation
from collections import Counter
with open('novel.txt') as f: # closes the file for you which is important!
c = Counter(c for line in f for c in line if c in punctuation)
This also avoids loading the whole novel into memory at once.
Btw this is what string.punctuation looks like:
>>> punctuation
'!"#$%&\'()*+,-./:;<=>?#[\\]^_`{|}~'
You may want to add or detract symbols from here depending on your needs.
Also Counter defines a __missing__ with simply does return 0. So instead of down-initialising it into a dictionary and then calling .get(x, 0). Just leave it as a counter and access it like c[x], if it doesn't exist, its count is 0. I'm not sure why everybody has the sudden urge to downgrade all their Counters into dicts just because of the scary looking Counter([...]) you see when you print one, when in fact Counters are dictionaries too and deserve respect.
writer.writerow([counts.get(c, 0) for c in punctuation])
If you leave your counter you can just do this:
writer.writerow([counts[c] for c in punctuation])
and that was much easier.
import re
def count_puncts(x):
# sub. punct. with '' and returns the new string with the no. of replacements.
new_str, count = re.subn(r'\W', '', x)
return count
The code you have is very close to what you'd need if you were counting words. If you were trying to count words, the only modification you'd have to make would probably be to change the last line to this:
writer.writerows(counts.items())
Unfortunately, you're not trying to count words here. If you're looking for counts of single characters, I'd avoid using regular expressions and go straight to count. Your code might look like this:
book_text = open(cwd+"/"+book).read()
counts = {}
for character in texts:
counts[character] = book_text.count(character)
writer.writerows(counts.items())
As you might be able to tell, this makes a dictionary with the characters as keys and the number of times that character appears in the text as the value. Then we write it as we would have done for counting words.
Using curses:
import curses.ascii
str1 = "real, and? or, and? what."
t = (c for c in str1 if curses.ascii.ispunct(c))
d = dict()
for p in t:
d[p] = 1 if not p in d else d[p] + 1 for p in t

How do I calculate the number of times a word occurs in a sentence?

So I've been learning Python for some months now and was wondering how I would go about writing a function that will count the number of times a word occurs in a sentence. I would appreciate if someone could please give me a step-by-step method for doing this.
Quick answer:
def count_occurrences(word, sentence):
return sentence.lower().split().count(word)
'some string.split() will split the string on whitespace (spaces, tabs and linefeeds) into a list of word-ish things. Then ['some', 'string'].count(item) returns the number of times item occurs in the list.
That doesn't handle removing punctuation. You could do that using string.maketrans and str.translate.
# Make collection of chars to keep (don't translate them)
import string
keep = string.lowercase + string.digits + string.whitespace
table = string.maketrans(keep, keep)
delete = ''.join(set(string.printable) - set(keep))
def count_occurrences(word, sentence):
return sentence.lower().translate(table, delete).split().count(word)
The key here is that we've constructed the string delete so that it contains all the ascii characters except letters, numbers and spaces. Then str.translate in this case takes a translation table that doesn't change the string, but also a string of chars to strip out.
wilberforce has the quick, correct answer, and I'll give the long winded 'how to get to that conclusion' answer.
First, here are some tools to get you started, and some questions you need to ask yourself.
You need to read the section on Sequence Types, in the python docs, because it is your best friend for solving this problem. Seriously, read it. Once you have read that, you should have some ideas. For example you can take a long string and break it up using the split() function. To be explicit:
mystring = "This sentence is a simple sentence."
result = mystring.split()
print result
print "The total number of words is: " + str(len(result))
print "The word 'sentence' occurs: " + str(result.count("sentence"))
Takes the input string and splits it on any whitespace, and will give you:
["This", "sentence", "is", "a", "simple", "sentence."]
The total number of words is 6
The word 'sentence' occurs: 1
Now note here that you do have the period still at the end of the second 'sentence'. This is a problem because 'sentence' is not the same as 'sentence.'. If you are going to go over your list and count words, you need to make sure that the strings are identical. You may need to find and remove some punctuation.
A naieve approach to this might be:
no_period_string = mystring.replace(".", " ")
print no_period_string
To get me a period-less sentence:
"This sentence is a simple sentence"
You also need to decide if your input going to be just a single sentence, or maybe a paragraph of text. If you have many sentences in your input, you might want to find a way to break them up into individual sentences, and find the periods (or question marks, or exclamation marks, or other punctuation that ends a sentence). Once you find out where in the string the 'sentence terminator' is you could maybe split up the string at that point, or something like that.
You should give this a try yourself - hopefully I've peppered in enough hints to get you to look at some specific functions in the documentation.
Simplest way:
def count_occurrences(word, sentence):
return sentence.count(word)
text=input("Enter your sentence:")
print("'the' appears", text.count("the"),"times")
simplest way to do it
Problem with using count() method is that it not always gives the correct number of occurrence when there is overlapping, for example
print('banana'.count('ana'))
output
1
but 'ana' occurs twice in 'banana'
To solve this issue, i used
def total_occurrence(string,word):
count = 0
tempsting = string
while(word in tempsting):
count +=1
tempsting = tempsting[tempsting.index(word)+1:]
return count
You can do it like this:
def countWord(word):
numWord = 0
for i in range(1, len(word)-1):
if word[i-1:i+3] == 'word':
numWord += 1
print 'Number of times "word" occurs is:', numWord
then calling the string:
countWord('wordetcetcetcetcetcetcetcword')
will return: Number of times "word" occurs is: 2
def check_Search_WordCount(mySearchStr, mySentence):
len_mySentence = len(mySentence)
len_Sentence_without_Find_Word = len(mySentence.replace(mySearchStr,""))
len_Remaining_Sentence = len_mySentence - len_Sentence_without_Find_Word
count = len_Remaining_Sentence/len(mySearchStr)
return (int(count))
I assume that you just know about python string and for loop.
def count_occurences(s,word):
count = 0
for i in range(len(s)):
if s[i:i+len(word)] == word:
count += 1
return count
mystring = "This sentence is a simple sentence."
myword = "sentence"
print(count_occurences(mystring,myword))
explanation:
s[i:i+len(word)]: slicing the string s to extract a word having the same length with the word (argument)
count += 1 : increase the counter whenever matched.

Sequence of vowels count

This is not a homework question, it is an exam preparation question.
I should define a function syllables(word) that counts the number of syllables in
A word in the following way:
• a maximal sequence of vowels is a syllable;
• a final e in a word is not a syllable (or the vowel sequence it is a part
Of).
I do not have to deal with any special cases, such as a final e in a
One-syllable word (e.g., ’be’ or ’bee’).
>>> syllables(’honour’)
2
>>> syllables(’decode’)
2
>>> syllables(’oiseau’)
2
Should I use regular expression here or just list comprehension ?
I find regular expressions natural for this question. (I think a non-regex answer would take more coding. I use two string methods, 'lower' and 'endswith' to make the answer more clear.)
import re
def syllables(word):
word = word.lower()
if word.endswith('e'):
word = word[:-1]
count = len(re.findall('[aeiou]+', word))
return count
for word in ('honour', 'decode', 'decodes', 'oiseau', 'pie'):
print word, syllables(word)
Which prints:
honour 2
decode 2
decodes 3
oiseau 2
pie 1
Note that 'decodes' has one more syllable than 'decode' (which is strange, but fits your definition).
Question. How does this help you? Isn't the point of the study question that you work through it yourself? You may get more benefit in the future by posting a failed attempt in your question, so you can learn exactly where you are lacking.
Use regexps - most languages will let you count the number of matches of a regexp in a string.
Then special-case the terminal-e by checking the right-most match group.
I don't think regex is the right solution here.
It seems pretty straightforward to write this treating each string as a list.
Some pointers:
[abc] matches a, b or c.
A + after a regex token allows the token to match once or more
$ matches the end of the string.
(?<=x) matches the current position only if the previous character is an x.
(?!x) matches the current position only if the next character is not an x.
EDIT:
I just saw your comment that since this is not homework, actual code is requested.
Well, then:
[aeiou]+(?!(?<=e)$)
If you don't want to count final vowel sequences that end in e at all (like the u in tongue or the o in toe), then use
[aeiou]+(?=[^aeiou])|[aeiou]*[aiou]$
I'm sure you'll be able to figure out how it works if you read the explanation above.
Here's an answer without regular expressions. My real answer (also posted) uses regular expressions. Untested code:
def syllables(word):
word = word.lower()
if word.endswith('e'):
word = word[:-1]
vowels = 'aeiou'
in_vowel_group = False
vowel_groups = 0
for letter in word:
if letter in vowels:
if not in_vowel_group:
in_vowel_group = True
vowel_groups += 1
else:
in_vowel_group = False
return vowel_groups
Both ways work. You said yourself that it was for exam preparation. Use whichever is going to be on the exam. If they're both on the exam, use which you need more practice for. Just remember:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. ~Jamie Zawinski
So in my opinion, don't use regex unless you need the practice.
Regular expressions would be way too complex, and a list comprehension probably wouldn't be robust enough. You will probably be able to solve this easily using a grammar lexer like PyParsing. Give it a shot!
Use a regex that matches a,e,i,o, or u, convert the string to a list, then iterate through the list... 1 for first true, 1 for next false, 2 for next true, 2 for next false, etc.
To handle the case where the last letter is 'e' following a consonant (as in ate), just check the last two letters of the word before you start. If they match that pattern truncate the final e and process as normal.
This pattern works for your definition:
(?!e$)([aeiouy]+)
Just count how many times it occurs.

Categories