Smallest alphabet with maximum occurences in python string - python

There is a python string for example "programming"
I have to find the smallest letter with highest occurrences.
That is for given input string = "programming"
the output should be g 2
I tried this , but not able to achieve the solution
str1 = 'programming'
max_freq = {}
for i in str1:
if i in max_freq:
max_freq[i] += 1
else:
max_freq[i] = 1
res = max(max_freq, key = max_freq.get)
Any help would be really appreciated

You can use Counter and achieve this.
Count the frequency of each letter using Counter. This will give you a dict
Sort the dict first by the values in descending order and then by the keys value in ascending order
The First item is your answer
from collections import Counter
str1 = 'programming'
d = Counter(str1)
d = sorted(d.items(), key= lambda x: (-x[1], x[0]))
print(d[0])
('g', 2)
For your code to work, replace the last line with this
res = sorted(max_freq.items(), key=lambda x: (-x[1], x[0]))[0]
res will have the smallest letter with maximum occurrences. i.e, ('g', 2)

You are close, you just aren't getting the max correctly. If all you care about is the number, then you could modify your example slightly:
str1 = 'programming'
max_freq = {}
for i in str1:
if i in max_freq:
max_freq[i] += 1
else:
max_freq[i] = 1
res = max(max_freq.values())

Related

Python custom comparator to sort a specific list

I have an input list like [1,2,2,1,6] the task in hand is to sort by the frequency. I have solved this question and am getting the output as [1,2,6].
But the caveat is that if two of the numbers have the same count like count(1) == count(2). So the desired output is [2,1,6]
then in the output array, 2 must come before 1 as 2 > 1.
So for the input [1,1,2,2,3,3] the output should be [3,2,1]. The counts are the same so they got sorted by their actual values.
This is what I did
input format:
number of Test cases
The list input.
def fun(l):
d = {}
for i in l:
if i in d:
d[i] += 1
else:
d[i] = 1
d1 = sorted(d,key = lambda k: d[k], reverse=True)
return d1
try:
test = int(input())
ans = []
while test:
l = [int(x) for x in input().split()]
ans.append(fun(l))
test -= 1
for i in ans:
for j in i:
print(j, end = " ")
print()
except:
pass
I think that this can help you. I added reverse parameter that is setting by default to True, because that gives the solution, but I wrote in the code where you can edit this as you may.
Here is the code:
from collections import defaultdict # To use a dictionary, but initialized with a default value
def fun(l, reverse = True):
d = defaultdict(int)
# Add count
for i in l:
d[i] += 1
# Create a dictionary where keys are values
new_d = defaultdict(list)
for key,value in d.items():
new_d[value].append(key)
# Get frequencies
list_freq = list(new_d.keys())
list_freq.sort(reverse = reverse) #YOU CAN CHANGE THIS
list_freq
# Add numbers in decreasing order by frequency
# If two integers have the same frequency, the greater number goes first
ordered_list = []
for number in list_freq:
values_number = new_d[number]
values_number.sort(reverse = reverse) # YOU CAN CHANGE THIS
ordered_list.extend(values_number)
return ordered_list
Examples:
l = [1,2,2,1,6]
fun(l)
#Output [2,1,6]
I hope this can help you!

Remove N consecutive repeated characters in a string

I am trying to solve a problem where the user inputs a string say str = "aaabbcc" and an integer n = 2.
So the function is supposed to remove characters that appearing 'n' times from the str and output only "aaa".
I tried couple of approaches and I'm not able to obtain the right output.
Are there any Regular expression functions that I could use or any recursive functions or just plain old iterations.
Thanks in advance.
Using itertools.groupby
Ex:
from itertools import groupby
s = "aaabbcc"
n = 2
result = ""
for k, v in groupby(s):
value = list(v)
if not len(value) == n:
result += "".join(value)
print(result)
Output:
aaa
You can use itertools.groupby:
>>> s = "aaabbccddddddddddeeeee"
>>> from itertools import groupby
>>> n = 3
>>> groups = (list(values) for _, values in groupby(s))
>>> "".join("".join(v) for v in groups if len(v) < n)
'bbcc'
from collections import Counter
counts = Counter(string)
string = "".join(c for c in string if counts[c] != 2)
Edit: Wait, sorry, I missed "consecutive". This will remove characters that occur exactly two times in the whole string (fitting your example, but not the general case).
Consecutive filter is a bit more complex, but doable - just find the consecutive runs first, then filter out the ones which have length two.
runs = [[string[0], 0]]
for c in string:
if c == runs[-1][0]:
runs[-1][1] += 1
else:
runs.append([c, 1])
string = "".join(c*length for c,length in runs if length != 2)
Edit2: As the other answers correctly point out, the first part of this is done natively by groupby
from itertools import groupby
string = "".join(c*length for c,length in groupby(string) if length != 2)
In [15]: some_string = 'aaabbcc'
In [16]: n = 2
In [17]: final_string = ''
In [18]: for k, v in Counter(some_string).items():
...: if v != n:
...: final_string += k * v
...:
In [19]: final_string
Out[19]: 'aaa'
You'll need: from collections import Counter
from collections import defaultdict
def fun(string,n):
dic = defaultdict(int)
for i in string:
dic[i]+=1
check = []
for i in dic:
if dic[i]==n:
check.append(i)
for i in check:
del dic[i]
return dic
string = "aaabbcc"
n = 2
result = fun(string, n)
sol =''
for i in result:
sol+=i*result[i]
print(sol)
output
aaa

How to return most repeated letter in python?

I've done some digging and most use arrays, but our class is not that far and we're to use mostly for loops to return the most repeated letter in a function.
Here was my code so far, but all I could get was to return the count of the first letter.
def most_repeated_letters(word_1):
x = 0
z = 0
for letter in word_1:
y = word_1.count(letter[0:])
if y > z:
z = y
x += 1
return z
print most_repeated_letters('jackaby')
Make use collections.Counter
from collections import Counter
c = Counter('jackaby').most_common(1)
print(c)
# [('a', 2)]
There are a few problems with your code:
you calculate the count of the most common letter, but not the letter itself
you return inside the loop and thus after the very first letter
also, you never use x, and the slicing of letter is unneccesary
Some suggestions to better spot those errors yourself:
use more meaningful variable names
use more than two spaces for indentation
Fixing those, your code might look something like this:
def most_repeated_letters(word_1):
most_common_count = 0
most_common_letter = None
for letter in word_1:
count = word_1.count(letter)
if count > most_common_count:
most_common_count = count
most_common_letter = letter
return most_common_letter
Once you are comfortable with Python's basic language features, you should have a closer look at the builtin functions. In fact, your entire function can be reduced to a single line using max, using the word_1.count as the key function for comparison.
def most_repeated_letters(word_1):
return max(word_1, key=word_1.count)
But while this is very short, it is not very efficient, as the count function is called for each letter in the word, giving the function quadratic complexity O(n²). Instead, you can use a dict to store counts of individual letters and increase those counts in a single pass over the word in O(n).
def most_repeated_letters(word_1):
counts = {}
for letter in word_1:
if letter not in counts:
counts[letter] = 1
else:
counts[letter] += 1
return max(counts, key=counts.get)
And this is basically the same as what collections.Counter would do, as already described in another answer.
If you don't want to use Counter:
def most_repeated_letters(word_1):
lettersCount = {}
for ch in word_1:
if ch not in lettersCount:
lettersCount[ch] = 1
else:
lettersCount[ch] += 1
return max(lettersCount, key=lettersCount.get)
print(most_repeated_letters('jackabybb'))
Here is a code that works for multiple:
def most_repeated_letters(word_1):
d = {}
for letter in word_1:
if not d.get(letter):
d[letter] = 0
d[letter] = d.get(letter) + 1
ret = {}
for k,v in d.iteritems():
if d[k] == max(d.values()):
ret[k] = v
return ret
most_repeated_letters('jackaby')
If you don’t want to use the collections modue :
def mostRepeatedLetter(text):
counter = {}
for letter in text:
if letter in counter:
counter[letter]+=1
else:
counter[letter]=1
max = { letter: 0, quantity: 0 }
for key, value in counter.items():
if value > max.quantity:
max.letter, max.quantity = key, value
return max

Creating a Letter a Histogram

So I want to create a histogram.
Here is my code:
def histogram(s):
d = dict()
for c in s:
if c not in d:
d[c] = 1
else:
d[c] += 1
return d
def print_hist(h):
for c in h:
print c, h[c]
It give me this:
>>> h = histogram('parrot')
>>> print_hist(h)
a 1
p 1
r 2
t 1
o 1
But I want this:
a: 1
o: 1
p: 1
r: 2
t: 1
So how can I get my histogram in alphabetical order, be case sensitive (so "a" and "A" are the same), and list the whole alphabet (so letters that are not in the string just get a zero)?
Use an ordered dictionary which store keys in the order they were put in.
from collections import OrderedDict
import string
def count(s):
histogram = OrderedDict((c,0) for c in string.lowercase)
for c in s:
if c in string.letters:
histogram[c.lower()] += 1
return histogram
for letter, c in count('parrot').iteritems():
print '{}:{}'.format(letter, c)
Result:
a:1
b:0
c:0
d:0
e:0
f:0
g:0
h:0
i:0
j:0
k:0
l:0
m:0
n:0
o:1
p:1
q:0
r:2
s:0
t:1
u:0
v:0
w:0
x:0
y:0
z:0
Just use collections.Counter for this, unless you really want your own:
>>> import collections
>>> c = collections.Counter('parrot')
>>> sorted(c.items(), key=lambda c: c[0])
[('a', 1), ('o', 1), ('p', 1), ('r', 2), ('t', 1)]
EDIT: As commenters pointed out, your last sentence indicates you want data on all the letters of the alphabet that do not occur in your word. Counter is good for this also since, as the docs indicate:
Counter objects have a dictionary interface except that they return a zero count for missing items instead of raising a KeyError.
So you can just iterate through something like string.ascii_lowercase:
>>> import string
>>> for letter in string.ascii_lowercase:
... print('{}: {}'.format(letter, c[letter]))
...
a: 1
b: 0
c: 0
d: 0
e: 0
f: 0
g: 0
h: 0
i: 0
j: 0
k: 0
l: 0
m: 0
n: 0
o: 1
p: 1
q: 0
r: 2
s: 0
t: 1
u: 0
v: 0
w: 0
x: 0
y: 0
z: 0
Finally, rather than implementing something complicated to merge the results of upper- and lowercase letters, just normalize your input first:
c = collections.Counter('PaRrOt'.lower())
A trivial answer would be:
import string
for letter in string.ascii_lowercase:
print letter, ': ', h.lower().count(letter)
(highly inefficient as you go through the string 26 times)
Can also use a Counter
from collections import Counter
import string
cnt = Counter(h.lower())
for letter in string.ascii_lowercase:
print letter, ': ', cnt[letter]
Quite neater.
If you want it ordered then you are going to have to use an ordereddictionary
You also are going to need to order the letters before you add them to the dictionary
It is not clear to me I think you want a case insensitive result so we need to get all letters in one case
from collections import OrderedDict as od
import string
def histogram(s):
first we need to create the dictionary that has all of the lower case letters
we imported string which will provide us a list but I think it is all lowercase including unicode so we need to only use the first 26 in string.lowercase
d = od()
for each_letter in string.lowercase[0:26]:
d[each_letter] = 0
Once the dictionary is created then we just need to iterate through the word after it has been lowercased. Please note that this will blow up with any word that has a number or a space. You may or may not want to test or add numbers and spaces to your dictionary. One way to keep it from blowing up is to try to add a value. If the value is not in the dictionary just ignore it.
for c in s.lower():
try:
d[c] += 1
except ValueError:
pass
return d
If you want to list the whole (latin only) alphabet anyway, you could use a list of length 26:
hist = [0] * 26
for c in s.lower():
hist[orc(c) - ord('a')] += 1
To get the desired output:
for x in range(26):
print chr(x), ":", hist[x]
Check this function for your output
def print_hist(h):
for c in sorted(h):
print c, h[c]

Finding the most frequent character in a string

I found this programming problem while looking at a job posting on SO. I thought it was pretty interesting and as a beginner Python programmer I attempted to tackle it. However I feel my solution is quite...messy...can anyone make any suggestions to optimize it or make it cleaner? I know it's pretty trivial, but I had fun writing it. Note: Python 2.6
The problem:
Write pseudo-code (or actual code) for a function that takes in a string and returns the letter that appears the most in that string.
My attempt:
import string
def find_max_letter_count(word):
alphabet = string.ascii_lowercase
dictionary = {}
for letters in alphabet:
dictionary[letters] = 0
for letters in word:
dictionary[letters] += 1
dictionary = sorted(dictionary.items(),
reverse=True,
key=lambda x: x[1])
for position in range(0, 26):
print dictionary[position]
if position != len(dictionary) - 1:
if dictionary[position + 1][1] < dictionary[position][1]:
break
find_max_letter_count("helloworld")
Output:
>>>
('l', 3)
Updated example:
find_max_letter_count("balloon")
>>>
('l', 2)
('o', 2)
There are many ways to do this shorter. For example, you can use the Counter class (in Python 2.7 or later):
import collections
s = "helloworld"
print(collections.Counter(s).most_common(1)[0])
If you don't have that, you can do the tally manually (2.5 or later has defaultdict):
d = collections.defaultdict(int)
for c in s:
d[c] += 1
print(sorted(d.items(), key=lambda x: x[1], reverse=True)[0])
Having said that, there's nothing too terribly wrong with your implementation.
If you are using Python 2.7, you can quickly do this by using collections module.
collections is a hight performance data structures module. Read more at
http://docs.python.org/library/collections.html#counter-objects
>>> from collections import Counter
>>> x = Counter("balloon")
>>> x
Counter({'o': 2, 'a': 1, 'b': 1, 'l': 2, 'n': 1})
>>> x['o']
2
Here is way to find the most common character using a dictionary
message = "hello world"
d = {}
letters = set(message)
for l in letters:
d[message.count(l)] = l
print d[d.keys()[-1]], d.keys()[-1]
Here's a way using FOR LOOP AND COUNT()
w = input()
r = 1
for i in w:
p = w.count(i)
if p > r:
r = p
s = i
print(s)
The way I did uses no built-in functions from Python itself, only for-loops and if-statements.
def most_common_letter():
string = str(input())
letters = set(string)
if " " in letters: # If you want to count spaces too, ignore this if-statement
letters.remove(" ")
max_count = 0
freq_letter = []
for letter in letters:
count = 0
for char in string:
if char == letter:
count += 1
if count == max_count:
max_count = count
freq_letter.append(letter)
if count > max_count:
max_count = count
freq_letter.clear()
freq_letter.append(letter)
return freq_letter, max_count
This ensures you get every letter/character that gets used the most, and not just one. It also returns how often it occurs. Hope this helps :)
If you want to have all the characters with the maximum number of counts, then you can do a variation on one of the two ideas proposed so far:
import heapq # Helps finding the n largest counts
import collections
def find_max_counts(sequence):
"""
Returns an iterator that produces the (element, count)s with the
highest number of occurrences in the given sequence.
In addition, the elements are sorted.
"""
if len(sequence) == 0:
raise StopIteration
counter = collections.defaultdict(int)
for elmt in sequence:
counter[elmt] += 1
counts_heap = [
(-count, elmt) # The largest elmt counts are the smallest elmts
for (elmt, count) in counter.iteritems()]
heapq.heapify(counts_heap)
highest_count = counts_heap[0][0]
while True:
try:
(opp_count, elmt) = heapq.heappop(counts_heap)
except IndexError:
raise StopIteration
if opp_count != highest_count:
raise StopIteration
yield (elmt, -opp_count)
for (letter, count) in find_max_counts('balloon'):
print (letter, count)
for (word, count) in find_max_counts(['he', 'lkj', 'he', 'll', 'll']):
print (word, count)
This yields, for instance:
lebigot#weinberg /tmp % python count.py
('l', 2)
('o', 2)
('he', 2)
('ll', 2)
This works with any sequence: words, but also ['hello', 'hello', 'bonjour'], for instance.
The heapq structure is very efficient at finding the smallest elements of a sequence without sorting it completely. On the other hand, since there are not so many letter in the alphabet, you can probably also run through the sorted list of counts until the maximum count is not found anymore, without this incurring any serious speed loss.
def most_frequent(text):
frequencies = [(c, text.count(c)) for c in set(text)]
return max(frequencies, key=lambda x: x[1])[0]
s = 'ABBCCCDDDD'
print(most_frequent(s))
frequencies is a list of tuples that count the characters as (character, count). We apply max to the tuples using count's and return that tuple's character. In the event of a tie, this solution will pick only one.
I noticed that most of the answers only come back with one item even if there is an equal amount of characters most commonly used. For example "iii 444 yyy 999". There are an equal amount of spaces, i's, 4's, y's, and 9's. The solution should come back with everything, not just the letter i:
sentence = "iii 444 yyy 999"
# Returns the first items value in the list of tuples (i.e) the largest number
# from Counter().most_common()
largest_count: int = Counter(sentence).most_common()[0][1]
# If the tuples value is equal to the largest value, append it to the list
most_common_list: list = [(x, y)
for x, y in Counter(sentence).items() if y == largest_count]
print(most_common_count)
# RETURNS
[('i', 3), (' ', 3), ('4', 3), ('y', 3), ('9', 3)]
Question :
Most frequent character in a string
The maximum occurring character in an input string
Method 1 :
a = "GiniGinaProtijayi"
d ={}
chh = ''
max = 0
for ch in a : d[ch] = d.get(ch,0) +1
for val in sorted(d.items(),reverse=True , key = lambda ch : ch[1]):
chh = ch
max = d.get(ch)
print(chh)
print(max)
Method 2 :
a = "GiniGinaProtijayi"
max = 0
chh = ''
count = [0] * 256
for ch in a : count[ord(ch)] += 1
for ch in a :
if(count[ord(ch)] > max):
max = count[ord(ch)]
chh = ch
print(chh)
Method 3 :
import collections
line ='North Calcutta Shyambazaar Soudipta Tabu Roopa Roopi Gina Gini Protijayi Sovabazaar Paikpara Baghbazaar Roopa'
bb = collections.Counter(line).most_common(1)[0][0]
print(bb)
Method 4 :
line =' North Calcutta Shyambazaar Soudipta Tabu Roopa Roopi Gina Gini Protijayi Sovabazaar Paikpara Baghbazaar Roopa'
def mostcommonletter(sentence):
letters = list(sentence)
return (max(set(letters),key = letters.count))
print(mostcommonletter(line))
Here are a few things I'd do:
Use collections.defaultdict instead of the dict you initialise manually.
Use inbuilt sorting and max functions like max instead of working it out yourself - it's easier.
Here's my final result:
from collections import defaultdict
def find_max_letter_count(word):
matches = defaultdict(int) # makes the default value 0
for char in word:
matches[char] += 1
return max(matches.iteritems(), key=lambda x: x[1])
find_max_letter_count('helloworld') == ('l', 3)
If you could not use collections for any reason, I would suggest the following implementation:
s = input()
d = {}
# We iterate through a string and if we find the element, that
# is already in the dict, than we are just incrementing its counter.
for ch in s:
if ch in d:
d[ch] += 1
else:
d[ch] = 1
# If there is a case, that we are given empty string, then we just
# print a message, which says about it.
print(max(d, key=d.get, default='Empty string was given.'))
sentence = "This is a great question made me wanna watch matrix again!"
char_frequency = {}
for char in sentence:
if char == " ": #to skip spaces
continue
elif char in char_frequency:
char_frequency[char] += 1
else:
char_frequency[char] = 1
char_frequency_sorted = sorted(
char_frequency.items(), key=lambda ky: ky[1], reverse=True
)
print(char_frequency_sorted[0]) #output -->('a', 9)
# return the letter with the max frequency.
def maxletter(word:str) -> tuple:
''' return the letter with the max occurance '''
v = 1
dic = {}
for letter in word:
if letter in dic:
dic[letter] += 1
else:
dic[letter] = v
for k in dic:
if dic[k] == max(dic.values()):
return k, dic[k]
l, n = maxletter("Hello World")
print(l, n)
output: l 3
you may also try something below.
from pprint import pprint
sentence = "this is a common interview question"
char_frequency = {}
for char in sentence:
if char in char_frequency:
char_frequency[char] += 1
else:
char_frequency[char] = 1
pprint(char_frequency, width = 1)
out = sorted(char_frequency.items(),
key = lambda kv : kv[1], reverse = True)
print(out)
print(out[0])
statistics.mode(data)
Return the single most common data point from discrete or nominal data. The mode (when it exists) is the most typical value and serves as a measure of central location.
If there are multiple modes with the same frequency, returns the first one encountered in the data. If the smallest or largest of those is desired instead, use min(multimode(data)) or max(multimode(data)). If the input data is empty, StatisticsError is raised.
import statistics as stat
test = 'This is a test of the fantastic mode super special function ssssssssssssss'
test2 = ['block', 'cheese', 'block']
val = stat.mode(test)
val2 = stat.mode(test2)
print(val, val2)
mode assumes discrete data and returns a single value. This is the standard treatment of the mode as commonly taught in schools:
mode([1, 1, 2, 3, 3, 3, 3, 4])
3
The mode is unique in that it is the only statistic in this package that also applies to nominal (non-numeric) data:
mode(["red", "blue", "blue", "red", "green", "red", "red"])
'red'
Here is how I solved it, considering the possibility of multiple most frequent chars:
sentence = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, \
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
enim."
joint_sentence = sentence.replace(" ", "")
frequencies = {}
for letter in joint_sentence:
frequencies[letter] = frequencies.get(letter, 0) +1
biggest_frequency = frequencies[max(frequencies, key=frequencies.get)]
most_frequent_letters = {key: value for key, value in frequencies.items() if value == biggest_frequency}
print(most_frequent_letters)
Output:
{'e': 12, 'i': 12}
#file:filename
#quant:no of frequent words you want
def frequent_letters(file,quant):
file = open(file)
file = file.read()
cnt = Counter
op = cnt(file).most_common(quant)
return op
# This code is to print all characters in a string which have highest frequency
def find(str):
y = sorted([[a.count(i),i] for i in set(str)])
# here,the count of unique character and the character are taken as a list
# inside y(which is a list). And they are sorted according to the
# count of each character in the list y. (ascending)
# Eg : for "pradeep", y = [[1,'r'],[1,'a'],[1,'d'],[2,'p'],[2,'e']]
most_freq= y[len(y)-1][0]
# the count of the most freq character is assigned to the variable 'r'
# ie, most_freq= 2
x= []
for j in range(len(y)):
if y[j][0] == most_freq:
x.append(y[j])
# if the 1st element in the list of list == most frequent
# character's count, then all the characters which have the
# highest frequency will be appended to list x.
# eg :"pradeep"
# x = [['p',2],['e',2]] O/P as expected
return x
find("pradeep")

Categories