How do I count the number of occurrences of a character in a string?
e.g. 'a' appears in 'Mary had a little lamb' 4 times.
str.count(sub[, start[, end]])
Return the number of non-overlapping occurrences of substring sub in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.
>>> sentence = 'Mary had a little lamb'
>>> sentence.count('a')
4
You can use .count() :
>>> 'Mary had a little lamb'.count('a')
4
To get the counts of all letters, use collections.Counter:
>>> from collections import Counter
>>> counter = Counter("Mary had a little lamb")
>>> counter['a']
4
Regular expressions maybe?
import re
my_string = "Mary had a little lamb"
len(re.findall("a", my_string))
Python-3.x:
"aabc".count("a")
str.count(sub[, start[, end]])
Return the number of non-overlapping occurrences of substring sub in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.
myString.count('a');
more info here
str.count(a) is the best solution to count a single character in a string. But if you need to count more characters you would have to read the whole string as many times as characters you want to count.
A better approach for this job would be:
from collections import defaultdict
text = 'Mary had a little lamb'
chars = defaultdict(int)
for char in text:
chars[char] += 1
So you'll have a dict that returns the number of occurrences of every letter in the string and 0 if it isn't present.
>>>chars['a']
4
>>>chars['x']
0
For a case insensitive counter you could override the mutator and accessor methods by subclassing defaultdict (base class' ones are read-only):
class CICounter(defaultdict):
def __getitem__(self, k):
return super().__getitem__(k.lower())
def __setitem__(self, k, v):
super().__setitem__(k.lower(), v)
chars = CICounter(int)
for char in text:
chars[char] += 1
>>>chars['a']
4
>>>chars['M']
2
>>>chars['x']
0
This easy and straight forward function might help:
def check_freq(x):
freq = {}
for c in set(x):
freq[c] = x.count(c)
return freq
check_freq("abbabcbdbabdbdbabababcbcbab")
{'a': 7, 'b': 14, 'c': 3, 'd': 3}
If a comprehension is desired:
def check_freq(x):
return {c: x.count(c) for c in set(x)}
Regular expressions are very useful if you want case-insensitivity (and of course all the power of regex).
my_string = "Mary had a little lamb"
# simplest solution, using count, is case-sensitive
my_string.count("m") # yields 1
import re
# case-sensitive with regex
len(re.findall("m", my_string))
# three ways to get case insensitivity - all yield 2
len(re.findall("(?i)m", my_string))
len(re.findall("m|M", my_string))
len(re.findall(re.compile("m",re.IGNORECASE), my_string))
Be aware that the regex version takes on the order of ten times as long to run, which will likely be an issue only if my_string is tremendously long, or the code is inside a deep loop.
I don't know about 'simplest' but simple comprehension could do:
>>> my_string = "Mary had a little lamb"
>>> sum(char == 'a' for char in my_string)
4
Taking advantage of built-in sum, generator comprehension and fact that bool is subclass of integer: how may times character is equal to 'a'.
a = 'have a nice day'
symbol = 'abcdefghijklmnopqrstuvwxyz'
for key in symbol:
print(key, a.count(key))
An alternative way to get all the character counts without using Counter(), count and regex
counts_dict = {}
for c in list(sentence):
if c not in counts_dict:
counts_dict[c] = 0
counts_dict[c] += 1
for key, value in counts_dict.items():
print(key, value)
I am a fan of the pandas library, in particular the value_counts() method. You could use it to count the occurrence of each character in your string:
>>> import pandas as pd
>>> phrase = "I love the pandas library and its `value_counts()` method"
>>> pd.Series(list(phrase)).value_counts()
8
a 5
e 4
t 4
o 3
n 3
s 3
d 3
l 3
u 2
i 2
r 2
v 2
` 2
h 2
p 1
b 1
I 1
m 1
( 1
y 1
_ 1
) 1
c 1
dtype: int64
count is definitely the most concise and efficient way of counting the occurrence of a character in a string but I tried to come up with a solution using lambda, something like this :
sentence = 'Mary had a little lamb'
sum(map(lambda x : 1 if 'a' in x else 0, sentence))
This will result in :
4
Also, there is one more advantage to this is if the sentence is a list of sub-strings containing same characters as above, then also this gives the correct result because of the use of in. Have a look :
sentence = ['M', 'ar', 'y', 'had', 'a', 'little', 'l', 'am', 'b']
sum(map(lambda x : 1 if 'a' in x else 0, sentence))
This also results in :
4
But Of-course this will work only when checking occurrence of single character such as 'a' in this particular case.
a = "I walked today,"
c=['d','e','f']
count=0
for i in a:
if str(i) in c:
count+=1
print(count)
I know the ask is to count a particular letter. I am writing here generic code without using any method.
sentence1 =" Mary had a little lamb"
count = {}
for i in sentence1:
if i in count:
count[i.lower()] = count[i.lower()] + 1
else:
count[i.lower()] = 1
print(count)
output
{' ': 5, 'm': 2, 'a': 4, 'r': 1, 'y': 1, 'h': 1, 'd': 1, 'l': 3, 'i': 1, 't': 2, 'e': 1, 'b': 1}
Now if you want any particular letter frequency, you can print like below.
print(count['m'])
2
the easiest way is to code in one line:
'Mary had a little lamb'.count("a")
but if you want can use this too:
sentence ='Mary had a little lamb'
count=0;
for letter in sentence :
if letter=="a":
count+=1
print (count)
To find the occurrence of characters in a sentence you may use the below code
Firstly, I have taken out the unique characters from the sentence and then I counted the occurrence of each character in the sentence these includes the occurrence of blank space too.
ab = set("Mary had a little lamb")
test_str = "Mary had a little lamb"
for i in ab:
counter = test_str.count(i)
if i == ' ':
i = 'Space'
print(counter, i)
Output of the above code is below.
1 : r ,
1 : h ,
1 : e ,
1 : M ,
4 : a ,
1 : b ,
1 : d ,
2 : t ,
3 : l ,
1 : i ,
4 : Space ,
1 : y ,
1 : m ,
"Without using count to find you want character in string" method.
import re
def count(s, ch):
pass
def main():
s = raw_input ("Enter strings what you like, for example, 'welcome': ")
ch = raw_input ("Enter you want count characters, but best result to find one character: " )
print ( len (re.findall ( ch, s ) ) )
main()
Python 3
Ther are two ways to achieve this:
1) With built-in function count()
sentence = 'Mary had a little lamb'
print(sentence.count('a'))`
2) Without using a function
sentence = 'Mary had a little lamb'
count = 0
for i in sentence:
if i == "a":
count = count + 1
print(count)
Use count:
sentence = 'A man walked up to a door'
print(sentence.count('a'))
# 4
Taking up a comment of this user:
import numpy as np
sample = 'samplestring'
np.unique(list(sample), return_counts=True)
Out:
(array(['a', 'e', 'g', 'i', 'l', 'm', 'n', 'p', 'r', 's', 't'], dtype='<U1'),
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1]))
Check 's'. You can filter this tuple of two arrays as follows:
a[1][a[0]=='s']
Side-note: It works like Counter() of the collections package, just in numpy, which you often import anyway. You could as well count the unique words in a list of words instead.
This is an extension of the accepted answer, should you look for the count of all the characters in the text.
# Objective: we will only count for non-empty characters
text = "count a character occurrence"
unique_letters = set(text)
result = dict((x, text.count(x)) for x in unique_letters if x.strip())
print(result)
# {'a': 3, 'c': 6, 'e': 3, 'u': 2, 'n': 2, 't': 2, 'r': 3, 'h': 1, 'o': 2}
No more than this IMHO - you can add the upper or lower methods
def count_letter_in_str(string,letter):
return string.count(letter)
You can use loop and dictionary.
def count_letter(text):
result = {}
for letter in text:
if letter not in result:
result[letter] = 0
result[letter] += 1
return result
spam = 'have a nice day'
var = 'd'
def count(spam, var):
found = 0
for key in spam:
if key == var:
found += 1
return found
count(spam, var)
print 'count %s is: %s ' %(var, count(spam, var))
Related
Implement the function most_popular_character(my_string), which gets the string argument my_string and returns its most frequent letter. In case of a tie, break it by returning the letter of smaller ASCII value.
Note that lowercase and uppercase letters are considered different (e.g., ‘A’ < ‘a’). You may assume my_string consists of English letters only, and is not empty.
Example 1: >>> most_popular_character("HelloWorld") >>> 'l'
Example 2: >>> most_popular_character("gggcccbb") >>> 'c'
Explanation: cee and gee appear three times each (and bee twice), but cee precedes gee lexicographically.
Hints (you may ignore these):
Build a dictionary mapping letters to their frequency;
Find the largest frequency;
Find the smallest letter having that frequency.
def most_popular_character(my_string):
char_count = {} # define dictionary
for c in my_string:
if c in char_count: #if c is in the dictionary:
char_count[c] = 1
else: # if c isn't in the dictionary - create it and put 1
char_count[c] = 1
sorted_chars = sorted(char_count) # sort the dictionary
char_count = char_count.keys() # place the dictionary in a list
max_per = 0
for i in range(len(sorted_chars) - 1):
if sorted_chars[i] >= sorted_chars[i+1]:
max_per = sorted_chars[i]
break
return max_per
my function returns 0 right now, and I think the problem is in the last for loop and if statement - but I can't figure out what the problem is..
If you have any suggestions on how to adjust the code it would be very appreciated!
Your dictionary didn't get off to a good start by you forgetting to add 1 to the character count, instead you are resetting to 1 each time.
Have a look here to get the gist of getting the maximum value from a dict: https://datagy.io/python-get-dictionary-key-with-max-value/
def most_popular_character(my_string):
# NOTE: you might want to convert the entire sting to upper or lower case, first, depending on the use
# e.g. my_string = my_string.lower()
char_count = {} # define dictionary
for c in my_string:
if c in char_count: #if c is in the dictionary:
char_count[c] += 1 # add 1 to it
else: # if c isn't in the dictionary - create it and put 1
char_count[c] = 1
# Never under estimate the power of print in debugging
print(char_count)
# max(char_count.values()) will give the highest value
# But there may be more than 1 item with the highest count, so get them all
max_keys = [key for key, value in char_count.items() if value == max(char_count.values())]
# Choose the lowest by sorting them and pick the first item
low_item = sorted(max_keys)[0]
return low_item, max(char_count.values())
print(most_popular_character("HelloWorld"))
print(most_popular_character("gggcccbb"))
print(most_popular_character("gggHHHAAAAaaaccccbb 12 3"))
Result:
{'H': 1, 'e': 1, 'l': 3, 'o': 2, 'W': 1, 'r': 1, 'd': 1}
('l', 3)
{'g': 3, 'c': 3, 'b': 2}
('c', 3)
{'g': 3, 'H': 3, 'A': 4, 'a': 3, 'c': 4, 'b': 2, ' ': 2, '1': 1, '2': 1, '3': 1}
('A', 4)
So: l and 3, c and 3, A and 4
def most_popular_character(my_string):
history_l = [l for l in my_string] #each letter in string
char_dict = {} #creating dict
for item in history_l: #for each letter in string
char_dict[item] = history_l.count(item)
return [max(char_dict.values()),min(char_dict.values())]
I didn't understand the last part of minimum frequency, so I make this function return a maximum frequency and a minimum frequency as a list!
Use a Counter to count the characters, and use the max function to select the "biggest" character according to your two criteria.
>>> from collections import Counter
>>> def most_popular_character(my_string):
... chars = Counter(my_string)
... return max(chars, key=lambda c: (chars[c], -ord(c)))
...
>>> most_popular_character("HelloWorld")
'l'
>>> most_popular_character("gggcccbb")
'c'
Note that using max is more efficient than sorting the entire dictionary, because it only needs to iterate over the dictionary once and find the single largest item, as opposed to sorting every item relative to every other item.
I have a string "Hello I am going to I with hello am". I want to find how many times a word occur in the string. Example hello occurs 2 time. I tried this approach that only prints characters -
def countWord(input_string):
d = {}
for word in input_string:
try:
d[word] += 1
except:
d[word] = 1
for k in d.keys():
print "%s: %d" % (k, d[k])
print countWord("Hello I am going to I with Hello am")
I want to learn how to find the word count.
If you want to find the count of an individual word, just use count:
input_string.count("Hello")
Use collections.Counter and split() to tally up all the words:
from collections import Counter
words = input_string.split()
wordCount = Counter(words)
from collections import *
import re
Counter(re.findall(r"[\w']+", text.lower()))
Using re.findall is more versatile than split, because otherwise you cannot take into account contractions such as "don't" and "I'll", etc.
Demo (using your example):
>>> countWords("Hello I am going to I with hello am")
Counter({'i': 2, 'am': 2, 'hello': 2, 'to': 1, 'going': 1, 'with': 1})
If you expect to be making many of these queries, this will only do O(N) work once, rather than O(N*#queries) work.
Counter from collections is your friend:
>>> from collections import Counter
>>> counts = Counter(sentence.lower().split())
The vector of occurrence counts of words is called bag-of-words.
Scikit-learn provides a nice module to compute it, sklearn.feature_extraction.text.CountVectorizer. Example:
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(analyzer = "word", \
tokenizer = None, \
preprocessor = None, \
stop_words = None, \
min_df = 0, \
max_features = 50)
text = ["Hello I am going to I with hello am"]
# Count
train_data_features = vectorizer.fit_transform(text)
vocab = vectorizer.get_feature_names()
# Sum up the counts of each vocabulary word
dist = np.sum(train_data_features.toarray(), axis=0)
# For each, print the vocabulary word and the number of times it
# appears in the training set
for tag, count in zip(vocab, dist):
print count, tag
Output:
2 am
1 going
2 hello
1 to
1 with
Part of the code was taken from this Kaggle tutorial on bag-of-words.
FYI: How to use sklearn's CountVectorizerand() to get ngrams that include any punctuation as separate tokens?
Considering Hello and hello as same words, irrespective of their cases:
>>> from collections import Counter
>>> strs="Hello I am going to I with hello am"
>>> Counter(map(str.lower,strs.split()))
Counter({'i': 2, 'am': 2, 'hello': 2, 'to': 1, 'going': 1, 'with': 1})
Here is an alternative, case-insensitive, approach
sum(1 for w in s.lower().split() if w == 'Hello'.lower())
2
It matches by converting the string and target into lower-case.
ps: Takes care of the "am ham".count("am") == 2 problem with str.count() pointed out by #DSM below too :)
You can divide the string into elements and calculate their number
count = len(my_string.split())
You can use the Python regex library re to find all matches in the substring and return the array.
import re
input_string = "Hello I am going to I with Hello am"
print(len(re.findall('hello', input_string.lower())))
Prints:
2
def countSub(pat,string):
result = 0
for i in range(len(string)-len(pat)+1):
for j in range(len(pat)):
if string[i+j] != pat[j]:
break
else:
result+=1
return result
Hello I asked this question previously and I wanted to adjust the code that I have now. I want to adjust this code so that if a letter is not present in a text string it still returns the value 0 to it assigned.
count = {}
for l in text.lower():
if l in let:
if l in count.keys():
count[l] += 1
else:
count[l] = 1
return count
It currently returns this:
example = "Sample String"
print(func(example, "sao")
{'s': 2, 'a' : 1}
This would be my desired output
example = "Sample String"
print(func(example, "sao"))
{'s': 2, 'a' : 1, 'o' :0}
If you don't mind using tools designed especially for your purpose, then the following will do:
from collections import Counter
def myfunc(inp, vals):
c = Counter(inp)
return {e: c[e] for e in vals}
s = 'Sample String'
print(myfunc(s, 'sao')
Otherwise you can explicitly set all missing values in your functions.
def func(inp, vals):
count = {e:0 for e in vals}
for s in inp:
if s in count:
count[s] += 1
return count
# create a function
def stringFunc(string, letters):
# convert string of letters to a list of letters
letter_list = list(letters)
# dictionary comprehension to count the number of times a letter is in the string
d = {letter: string.lower().count(letter) for letter in letter_list}
return d
stringFunc('Hello World', 'lohdx')
# {'l': 3, 'o': 2, 'h': 1, 'd': 1, 'x': 0}
You can use a Dict Comprehensions and str.count:
def count_letters(text, letters):
lower_text = text.lower()
return {c: lower_text.count(c) for c in letters}
print(count_letters("Sample String", "sao"))
result: {'s': 2, 'a': 1, 'o': 0}
You can use collections.Counter and obtain character counts via the get method:
from collections import Counter
def func(string, chars):
counts = Counter(string.lower())
return {c: counts.get(c, 0) for c in chars}
So I am trying to implement code that will count the next letter in a sentence, using python.
so for instance,
"""So I am trying to implement code that will count the next letter in a sentence, using
python"""
most common letters one after the other
for 's'
'o' :1
'e' :1
for 'o'
' ' :1
'd' :1
'u' :1
'n' :1
I think you get the idea
I already have written code for counting letters prior
def count_letters(word, char):
count = 0
for c in word:
if char == c:
count += 1
return count
As you can see this just counts for letters, but not the next letter. can someone give me a hand on this one?
from collections import Counter, defaultdict
counts = defaultdict(Counter)
s = """So I am trying to implement code that will count the next letter in a sentence, using
python""".lower()
for c1, c2 in zip(s, s[1:]):
counts[c1][c2] += 1
(apart from being simpler, this should be significantly faster than pault's answer by not iterating over the string for every letter)
Concepts to google that aren't named in the code:
for c1, c2 in ... (namely the fact that there are two variables): tuple unpacking
s[1:]: slicing. Basically this is a copy of the string after the first character.
Here is a relatively terse way to do it:
from itertools import groupby
from collections import Counter
def countTransitionFrequencies(text):
prevNext = list(zip(text[:-1], text[1:]))
prevNext.sort(key = lambda pn: pn[0])
transitions = groupby(prevNext, lambda pn: pn[0])
freqs = map(
lambda kts: (kts[0], Counter(map(lambda kv: kv[1], kts[1]))),
transitions
)
return freqs
Explanation:
zip creates list of pairs with (previous, next) characters
The pairs are sorted and grouped by the previous character
The frequencies of the next characters (extracted from pairs by kv[1]) are then counted using Counter.
Sorting is not really necessary, but unfortunately, this is how the provided groupby works.
An example:
for k, v in countTransitionFrequencies("hello world"):
print("%r -> %r" % (k, v))
This prints:
' ' -> Counter({'w': 1})
'e' -> Counter({'l': 1})
'h' -> Counter({'e': 1})
'l' -> Counter({'l': 1, 'o': 1, 'd': 1})
'o' -> Counter({' ': 1, 'r': 1})
'r' -> Counter({'l': 1})
'w' -> Counter({'o': 1})
Here's a way using collections.Counter:
Suppose the string you provided was stored in a variable s.
First we iterate over the set of all lower case letters in s. We do this by making another string s_lower which will convert the string s to lowercase. We then wrap this with the set constructor to get unique values.
For each char, we iterate through the string and check to see if the previous letter is equal to char. If so, we store this in a list. Finally, we pass this list into the collections.Counter constructor which will count the occurrences.
Each counter is stored in a dictionary, counts, where the keys are the unique characters in the string.
from collections import Counter
counts = {}
s_lower = s.lower()
for char in set(s_lower):
counts[char] = Counter(
[c for i, c in enumerate(s_lower) if i > 0 and s_lower[i-1] == char]
)
For your string, this has the following outputs:
>>> print(counts['s'])
#Counter({'i': 1, 'e': 1, 'o': 1})
>>> print(counts['o'])
#Counter({' ': 2, 'd': 1, 'n': 1, 'u': 1})
One caveat is that this method will iterate through the whole string for each unique character, which could potentially make it slow for large lists.
Here is an alternative approach using collections.Counter and collections.defaultdict that only loops through the string once:
from collections import defaultdict, Counter
def count_letters(s):
s_lower = s.lower()
counts = defaultdict(Counter)
for i in range(len(s_lower) - 1):
curr_char = s_lower[i]
next_char = s_lower[i+1]
counts[curr_char].update(next_char)
return counts
counts = count_letters(s)
We loop over each character in the string (except the last) and on each iteration we update a counter using the next character.
This should work, the only thing is it doesn't sort the values, but that can be solved by creating a new dictionary with list of tuples (char, occurrences) and using sorted function on tuple[1].
def countNext(word):
d = {}
word = word.lower()
for i in range(len(word) - 1):
c = word[i]
cc = word[i+1]
if(not c.isalpha() or not cc.isalpha()):
continue
if c in d:
if cc in d[c]:
d[c][cc] += 1
else:
d[c][cc] = 1
else:
d[c] = {}
d[c][cc] = 1
return d
I am trying to make my program count everything but numbers in a string, and store it in a dictionary.
So far I have this:
string = str(input("Enter a string: "))
stringUpper = string.upper()
dict = {}
for n in stringUpper:
keys = dict.keys()
if n in keys:
dict[n] += 1
else:
dict[n] = 1
print(dict)
I just want the alphabetical numbers quantified, but I cannot figure out how to exclude the non-alphabetical characters.
Basically there are multiple steps involved:
Getting rid of the chars that you don't want to count
Count the remaining
You have several options available to do these. I'll just present one option, but keep in mind that there might be other (and better) alternatives.
from collections import Counter
the_input = input('Enter something')
Counter(char for char in the_input.upper() if char.isalpha())
For example:
Enter something: aashkfze3f8237rhbjasdkvjuhb
Counter({'A': 3,
'B': 2,
'D': 1,
'E': 1,
'F': 2,
'H': 3,
'J': 2,
'K': 2,
'R': 1,
'S': 2,
'U': 1,
'V': 1,
'Z': 1})
So it obviously worked. Here I used collections.Counter to count and a generator expression using str.isalpha as condition to get rid of the unwanted characters.
Note that there are several bad habits in your code that will make your life more complicated than it needs to be:
dict = {} will shadow the built-in dict. So it's better to choose a different name.
string is the name of a built-in module, so here a different name might be better (but not str which is a built-in name as well).
stringUpper = string.upper(). In Python you generally don't use camelCase but use _ to seperate word (i.e. string_upper) but since you only use it to loop over you might as well use for n in string.upper(): directly.
Variable names like n aren't very helpful. Usually you can name them char or character when iterating over a string or item when iterating over a "general" iterable.
You can use re to replace all non-alphabetical characters before doing any manipulation:
regex = re.compile('[^a-zA-Z]')
#First parameter is the replacement, second parameter is your input string
regex.sub('', stringUpper )
string = str(input("Enter a string: "))
stringUpper = string.upper()
dict = {}
for n in stringUpper:
if n not in '0123456789':
keys = dict.keys()
if n in keys:
dict[n] += 1
else:
dict[n] = 1
print(dict)
for n in stringUpper:
if n.isalpha()
dict[n] += 1
else:
dict[n] = 1
print(dict)
You can check string for alphanumeric
n.isalnum()
for aphabetic:
n.isalpha()
So your code will be like:
dict = {}
for n in stringUpper:
if n.isalpha():
keys = dict.keys()
if n in keys:
dict[n] += 1
else:
dict[n] = 1
print(dict)
else:
#do something....
While iterating, check if the lower() and upper() is the same for a character. If they are different from each other, then it is an alphabetical letter.
if n.upper() == n.lower():
continue
This should do it.