This question already has answers here:
Finding the most frequent character in a string
(18 answers)
Closed 9 years ago.
I am looking to write a program that lets a user enter a string and then shows the character that appears the most often in their inputted string. I originally wrote a program that counted the number of times a specific letter appeared (the letter T), but was looking to convert that. It may take a whole new code, though.
My former program is here:
# This program counts the number of times
# the letter T (uppercase or lowercase)
# appears in a string.
def main():
# Create a variable to use to hold the count.
# The variable must start with 0.
count = 0
# Get a string from the user.
my_string = input('Enter a sentence: ')
# Count the Ts.
for ch in my_string:
if ch == 'T' or ch == 't':
count += 1
# Print the result.
print('The letter T appears', count, 'times.')
# Call the main function.
main()
Similar to this code, but I am not familiar with a lot of what is used there. I am thinking the version of Python I use is older, but I may not be correct.
You could use collections.Counter from Py's std lib.
Take a look here.
Also, a small example:
>>> from collections import Counter
>>> Counter('abracadabra').most_common(3)
[('a', 5), ('r', 2), ('b', 2)]
EDIT
One liner!
count = [(i, string.count(i)) for i in set(string)]
Or if you want to write your own function (that's interesting!) you could start with something like this:
string = "abracadabra"
def Counter(string):
count = {}
for each in string:
if each in count:
count[each] += 1
else:
count[each] = 1
return count # Or 'return [(k, v) for k, v in count]'
Counter(string)
out: {'a': 5, 'r': 2, 'b': 2, 'c': 1, 'd': 1}
>>> s = 'abracadabra'
>>> max([(i, s.count(i)) for i in set(s.lower)], key=lambda x: x[1])
('a', 5)
This will not show ties though. This will:
>>> s = 'abracadabrarrr'
>>> lcounts = sorted([(i, s.count(i)) for i in set(s)], key=lambda x: x[1])
>>> count = max(lcounts, key=lambda x: x[1])[1]
>>> filter(lambda x: x[1] == count, lcounts)
[('a', 5), ('r', 5)]
Related
Implement the function most_popular_character(my_string), which gets the string argument my_string and returns its most frequent letter. In case of a tie, break it by returning the letter of smaller ASCII value.
Note that lowercase and uppercase letters are considered different (e.g., ‘A’ < ‘a’). You may assume my_string consists of English letters only, and is not empty.
Example 1: >>> most_popular_character("HelloWorld") >>> 'l'
Example 2: >>> most_popular_character("gggcccbb") >>> 'c'
Explanation: cee and gee appear three times each (and bee twice), but cee precedes gee lexicographically.
Hints (you may ignore these):
Build a dictionary mapping letters to their frequency;
Find the largest frequency;
Find the smallest letter having that frequency.
def most_popular_character(my_string):
char_count = {} # define dictionary
for c in my_string:
if c in char_count: #if c is in the dictionary:
char_count[c] = 1
else: # if c isn't in the dictionary - create it and put 1
char_count[c] = 1
sorted_chars = sorted(char_count) # sort the dictionary
char_count = char_count.keys() # place the dictionary in a list
max_per = 0
for i in range(len(sorted_chars) - 1):
if sorted_chars[i] >= sorted_chars[i+1]:
max_per = sorted_chars[i]
break
return max_per
my function returns 0 right now, and I think the problem is in the last for loop and if statement - but I can't figure out what the problem is..
If you have any suggestions on how to adjust the code it would be very appreciated!
Your dictionary didn't get off to a good start by you forgetting to add 1 to the character count, instead you are resetting to 1 each time.
Have a look here to get the gist of getting the maximum value from a dict: https://datagy.io/python-get-dictionary-key-with-max-value/
def most_popular_character(my_string):
# NOTE: you might want to convert the entire sting to upper or lower case, first, depending on the use
# e.g. my_string = my_string.lower()
char_count = {} # define dictionary
for c in my_string:
if c in char_count: #if c is in the dictionary:
char_count[c] += 1 # add 1 to it
else: # if c isn't in the dictionary - create it and put 1
char_count[c] = 1
# Never under estimate the power of print in debugging
print(char_count)
# max(char_count.values()) will give the highest value
# But there may be more than 1 item with the highest count, so get them all
max_keys = [key for key, value in char_count.items() if value == max(char_count.values())]
# Choose the lowest by sorting them and pick the first item
low_item = sorted(max_keys)[0]
return low_item, max(char_count.values())
print(most_popular_character("HelloWorld"))
print(most_popular_character("gggcccbb"))
print(most_popular_character("gggHHHAAAAaaaccccbb 12 3"))
Result:
{'H': 1, 'e': 1, 'l': 3, 'o': 2, 'W': 1, 'r': 1, 'd': 1}
('l', 3)
{'g': 3, 'c': 3, 'b': 2}
('c', 3)
{'g': 3, 'H': 3, 'A': 4, 'a': 3, 'c': 4, 'b': 2, ' ': 2, '1': 1, '2': 1, '3': 1}
('A', 4)
So: l and 3, c and 3, A and 4
def most_popular_character(my_string):
history_l = [l for l in my_string] #each letter in string
char_dict = {} #creating dict
for item in history_l: #for each letter in string
char_dict[item] = history_l.count(item)
return [max(char_dict.values()),min(char_dict.values())]
I didn't understand the last part of minimum frequency, so I make this function return a maximum frequency and a minimum frequency as a list!
Use a Counter to count the characters, and use the max function to select the "biggest" character according to your two criteria.
>>> from collections import Counter
>>> def most_popular_character(my_string):
... chars = Counter(my_string)
... return max(chars, key=lambda c: (chars[c], -ord(c)))
...
>>> most_popular_character("HelloWorld")
'l'
>>> most_popular_character("gggcccbb")
'c'
Note that using max is more efficient than sorting the entire dictionary, because it only needs to iterate over the dictionary once and find the single largest item, as opposed to sorting every item relative to every other item.
I am using Python 3.3
I need to create two lists, one for the unique words and the other for the frequencies of the word.
I have to sort the unique word list based on the frequencies list so that the word with the highest frequency is first in the list.
I have the design in text but am uncertain how to implement it in Python.
The methods I have found so far use either Counter or dictionaries which we have not learned. I have already created the list from the file containing all the words but do not know how to find the frequency of each word in the list. I know I will need a loop to do this but cannot figure it out.
Here's the basic design:
original list = ["the", "car",....]
newlst = []
frequency = []
for word in the original list
if word not in newlst:
newlst.append(word)
set frequency = 1
else
increase the frequency
sort newlst based on frequency list
use this
from collections import Counter
list1=['apple','egg','apple','banana','egg','apple']
counts = Counter(list1)
print(counts)
# Counter({'apple': 3, 'egg': 2, 'banana': 1})
You can use
from collections import Counter
It supports Python 2.7,read more information here
1.
>>>c = Counter('abracadabra')
>>>c.most_common(3)
[('a', 5), ('r', 2), ('b', 2)]
use dict
>>>d={1:'one', 2:'one', 3:'two'}
>>>c = Counter(d.values())
[('one', 2), ('two', 1)]
But, You have to read the file first, and converted to dict.
2.
it's the python docs example,use re and Counter
# Find the ten most common words in Hamlet
>>> import re
>>> words = re.findall(r'\w+', open('hamlet.txt').read().lower())
>>> Counter(words).most_common(10)
[('the', 1143), ('and', 966), ('to', 762), ('of', 669), ('i', 631),
('you', 554), ('a', 546), ('my', 514), ('hamlet', 471), ('in', 451)]
words = file("test.txt", "r").read().split() #read the words into a list.
uniqWords = sorted(set(words)) #remove duplicate words and sort
for word in uniqWords:
print words.count(word), word
Pandas answer:
import pandas as pd
original_list = ["the", "car", "is", "red", "red", "red", "yes", "it", "is", "is", "is"]
pd.Series(original_list).value_counts()
If you wanted it in ascending order instead, it is as simple as:
pd.Series(original_list).value_counts().sort_values(ascending=True)
Yet another solution with another algorithm without using collections:
def countWords(A):
dic={}
for x in A:
if not x in dic: #Python 2.7: if not dic.has_key(x):
dic[x] = A.count(x)
return dic
dic = countWords(['apple','egg','apple','banana','egg','apple'])
sorted_items=sorted(dic.items()) # if you want it sorted
One way would be to make a list of lists, with each sub-list in the new list containing a word and a count:
list1 = [] #this is your original list of words
list2 = [] #this is a new list
for word in list1:
if word in list2:
list2.index(word)[1] += 1
else:
list2.append([word,0])
Or, more efficiently:
for word in list1:
try:
list2.index(word)[1] += 1
except:
list2.append([word,0])
This would be less efficient than using a dictionary, but it uses more basic concepts.
You can use reduce() - A functional way.
words = "apple banana apple strawberry banana lemon"
reduce( lambda d, c: d.update([(c, d.get(c,0)+1)]) or d, words.split(), {})
returns:
{'strawberry': 1, 'lemon': 1, 'apple': 2, 'banana': 2}
Using Counter would be the best way, but if you don't want to do that, you can implement it yourself this way.
# The list you already have
word_list = ['words', ..., 'other', 'words']
# Get a set of unique words from the list
word_set = set(word_list)
# create your frequency dictionary
freq = {}
# iterate through them, once per unique word.
for word in word_set:
freq[word] = word_list.count(word) / float(len(word_list))
freq will end up with the frequency of each word in the list you already have.
You need float in there to convert one of the integers to a float, so the resulting value will be a float.
Edit:
If you can't use a dict or set, here is another less efficient way:
# The list you already have
word_list = ['words', ..., 'other', 'words']
unique_words = []
for word in word_list:
if word not in unique_words:
unique_words += [word]
word_frequencies = []
for word in unique_words:
word_frequencies += [float(word_list.count(word)) / len(word_list)]
for i in range(len(unique_words)):
print(unique_words[i] + ": " + word_frequencies[i])
The indicies of unique_words and word_frequencies will match.
The ideal way is to use a dictionary that maps a word to it's count. But if you can't use that, you might want to use 2 lists - 1 storing the words, and the other one storing counts of words. Note that order of words and counts matters here. Implementing this would be hard and not very efficient.
Try this:
words = []
freqs = []
for line in sorted(original list): #takes all the lines in a text and sorts them
line = line.rstrip() #strips them of their spaces
if line not in words: #checks to see if line is in words
words.append(line) #if not it adds it to the end words
freqs.append(1) #and adds 1 to the end of freqs
else:
index = words.index(line) #if it is it will find where in words
freqs[index] += 1 #and use the to change add 1 to the matching index in freqs
Here is code support your question
is_char() check for validate string count those strings alone, Hashmap is dictionary in python
def is_word(word):
cnt =0
for c in word:
if 'a' <= c <='z' or 'A' <= c <= 'Z' or '0' <= c <= '9' or c == '$':
cnt +=1
if cnt==len(word):
return True
return False
def words_freq(s):
d={}
for i in s.split():
if is_word(i):
if i in d:
d[i] +=1
else:
d[i] = 1
return d
print(words_freq('the the sky$ is blue not green'))
for word in original_list:
words_dict[word] = words_dict.get(word,0) + 1
sorted_dt = {key: value for key, value in sorted(words_dict.items(), key=lambda item: item[1], reverse=True)}
keys = list(sorted_dt.keys())
values = list(sorted_dt.values())
print(keys)
print(values)
Simple way
d = {}
l = ['Hi','Hello','Hey','Hello']
for a in l:
d[a] = l.count(a)
print(d)
Output : {'Hi': 1, 'Hello': 2, 'Hey': 1}
word and frequency if you need
def counter_(input_list_):
lu = []
for v in input_list_:
ele = (v, lc.count(v)/len(lc)) #if you don't % remove <</len(lc)>>
if ele not in lu:
lu.append(ele)
return lu
counter_(['a', 'n', 'f', 'a'])
output:
[('a', 0.5), ('n', 0.25), ('f', 0.25)]
the best thing to do is :
def wordListToFreqDict(wordlist):
wordfreq = [wordlist.count(p) for p in wordlist]
return dict(zip(wordlist, wordfreq))
then try to :
wordListToFreqDict(originallist)
I have a lot of lists like:
SI821lzc1n4
MCap1kr01lv
All of them have the same length. I need to count how many times each symbol appears on each position. Example:
abcd
a5c1
b51d
Here it'll be a5cd
One way is to use zip to associate characters in the same position. We can then send all of the characters from each position to a Counter, then use Counter.most_common to get the most common character
from collections import Counter
l = ['abcd', 'a5c1', 'b51d']
print(''.join([Counter(z).most_common(1)[0][0] for z in zip(*l)]))
# a5cd
from statistics import mode
[mode([x[i] for x in y]) for i in xrange(len(y[0]))]
where y is your list.
Python 3.4 and up
You could use combination of zip and Counter
a = ("abcd")
b = ("a5c1")
c = ("b51d")
from collections import Counter
zippedList = list(zip(a,b,c))
print("zipped: {}".format(zippedList))
final = ""
for x in zippedList:
countLetters = Counter(x)
print(countLetters)
final += countLetters.most_common(3)[0][0]
print("output: {}".format(final))
output:
zipped: [('a', 'a', 'b'), ('b', '5', '5'), ('c', 'c', '1'), ('d', '1', 'd')]
Counter({'a': 2, 'b': 1})
Counter({'5': 2, 'b': 1})
Counter({'c': 2, '1': 1})
Counter({'d': 2, '1': 1})
output: a5cd
This all depends on where your list is. Is your list coming from another file or is it an actual array? At the end of the day, the best way to do this simply is going to be to use a dictionary and a for loop.
new_dict = {}
for i in range(len(line)):
if i in new_dict:
new_dict[i].append(line[i])
else:
new_dict[i] = [line[i]]
Then after that I'm assuming that you'd like to output the four most common element appearances. For that I'd recommend importing statistics and using the mode method...
from statistics import mode
new_line = ""
for key in new_dict:
x = mode(new_dict[key])
new_line = new_line + x
However, your question is quite vague, please elaborate more next time.
P.s. I'm a newbie so all you experienced programmers plz don't hate :)
I would use a combination of defaultdict, enumerate, and Counter:
>>> from collections import Counter, defaultdict
>>> data = '''abcd
a5c1
b51d
'''
>>> poscount = defaultdict(Counter)
>>> for line in data.split():
for i, character in enumerate(line):
poscount[i][character] += 1
>>> ''.join([poscount[i].most_common(1)[0][0] for i in sorted(poscount)])
'a5cd'
Here's how it works:
The defaultdict() creates new entries when it sees a new key.
The enumerate() function returns both the character and its position in the line.
The Counter counts the occurences of individual characters
Combining the three makes a defaultdict whose keys are the column positions and whose values are character counters. That gives you one character counter per column.
The most_common() method returns the highest frequency (character, count) pair for that counter.
The [0][0] extracts the character from the list of (character, count) tuples.
The str.join() method combines the results back together.
For example, if the given string is this:
"aaabbbbccdaeeee"
I want to say something like:
3 a, 4 b, 2 c, 1 d, 1 a, 4 e
It is easy enough to do in Python with a brute force loop, but I am wondering if there is a more Pythonic / cleaner one-liner type of approach.
My brute force:
while source!="":
leading = source[0]
c=0
while source!="" and source[0]==leading:
c+=1
source=source[1:]
print(c, leading)
Use a Counter for a count of each distinct letter in the string regardless of position:
>>> s="aaabbbbccdaeeee"
>>> from collections import Counter
>>> Counter(s)
Counter({'a': 4, 'b': 4, 'e': 4, 'c': 2, 'd': 1})
You can use groupby if the position in the string has meaning:
from itertools import groupby
li=[]
for k, l in groupby(s):
li.append((k, len(list(l))))
print li
Prints:
[('a', 3), ('b', 4), ('c', 2), ('d', 1), ('a', 1), ('e', 4)]
Which can be reduce to a list comprehension:
[(k,len(list(l))) for k, l in groupby(s)]
You can even use a regex:
>>> [(m.group(0)[0], len(m.group(0))) for m in re.finditer(r'((\w)\2*)', s)]
[('a', 3), ('b', 4), ('c', 2), ('d', 1), ('a', 1), ('e', 4)]
There are a number of different ways to solve the problem. #dawg has already posted the best solution, but if for some reason you aren't allowed to use Counter() (maybe a job interview or school assignment) then you can actually solve the problem in a few ways.
from collections import Counter, defaultdict
def counter_counts(s):
""" Preferred method using Counter()
Arguments:
s {string} -- [string to have each character counted]
Returns:
[dict] -- [dictionary of counts of each char]
"""
return Counter(s)
def default_counts(s):
""" Alternative solution using defaultdict
Arguments:
s {string} -- [string to have each character counted]
Returns:
[dict] -- [dictionary of counts of each char]
"""
counts = defaultdict(int) # each key is initalized to 0
for char in s:
counts[char] += 1 # increment the count of each character by 1
return counts
def vanilla_counts_1(s):
""" Alternative solution using a vanilla dicitonary
Arguments:
s {string} -- [string to have each character counted]
Returns:
[dict] -- [dictionary of counts of each char]
"""
counts = {}
for char in s:
# we have to manually check that each value is in the dictionary before attempting to increment it
if char in counts:
counts[char] += 1
else:
counts[char] = 1
return counts
def vanilla_counts_2(s):
""" Alternative solution using a vanilla dicitonary
This version uses the .get() method to increment instead of checking if a key already exists
Arguments:
s {string} -- [string to have each character counted]
Returns:
[dict] -- [dictionary of counts of each char]
"""
counts = {}
for char in s:
# the second argument in .get() is the default value if we dont find the key
counts[char] = counts.get(char, 0) + 1
return counts
And just for fun lets take a look at how each method performs.
For s = "aaabbbbccdaeeee" and 10,000 runs:
Counter: 0.0330204963684082s
defaultdict: 0.01565241813659668s
vanilla 1: 0.01562952995300293s
vanilla 2: 0.015581130981445312s
(actually rather surprising results)
Now let's test what happens if we set our string to the entire plaintext version of the book of Genesis and 1,000 runs:
Counter: 8.500739336013794s
defaultdict: 14.721554040908813s
vanilla 1: 18.089043855667114s
vanilla 2: 27.01840090751648s
Looks like the overhead of creating the Counter() object becomes much less important!
(These weren't very scientific tests, but it was a bit of fun).
This question already has an answer here:
Python Anagram Finder from given File
(1 answer)
Closed 9 years ago.
Given the string...
able\nacre\nbale\nbeyond\nbinary\nboat\nbrainy\ncare\ncat\ncater\ncrate\nlawn\nlist\nrace\nreact\nsheet\nsilt\nslit\ntrace\n
I am trying to figure out how to assign each word in the string to a variable, and then sort each word alphabetically which will allow me to compare them to see which ones are anagrams and which ones are not. I have around a month of Python experience so dumb everything WAY down if you could.
Instead of saving each word to a variable, you should save them all to a list. Here is how I would approach the complete problem:
from itertools import groupby
from operator import itemgetter
s = 'able\nacre\nbale\nbeyond\nbinary\nboat\nbrainy\ncare\ncat\ncater\ncrate\nlawn\nlist\nrace\nreact\nsheet\nsilt\nslit\ntrace\n'
words = s.strip().split()
sorted_words = (''.join(sorted(line)) for line in words)
grouped = sorted((v, i) for i, v in enumerate(sorted_words))
anagrams = [[words[i] for v, i in g] for k, g in groupby(grouped, itemgetter(0))]
Result:
>>> import pprint
>>> pprint.pprint(anagrams)
[['able', 'bale'],
['binary', 'brainy'],
['boat'],
['acre', 'care', 'race'],
['cater', 'crate', 'react', 'trace'],
['cat'],
['lawn'],
['beyond'],
['sheet'],
['list', 'silt', 'slit']]
In [27]: s = 'able\nacre\nbale\nbeyond\nbinary\nboat\nbrainy\ncare\ncat\ncater\ncrate\nlawn\nlist\nrace\nreact\nsheet\nsilt\nslit\ntrace\n'
In [28]: words = s.split()
In [29]: [''.join(sorted(w)) for w in words]
Out[29]:
['abel',
'acer',
'abel',
'bdenoy',
'abinry',
'abot',
'abinry',
...
You can do yourstring.split('whattosplitat'). In this case, that would be
l='able\nacre\nbale\nbeyond\nbinary\nboat\nbrainy\ncare\ncat\ncater\ncrate\nlawn\nlist\nrace\nreact\nsheet\nsilt\nslit\ntrace\n'.split('\n')
Then you can do l.sort() which will sort your list alphabetically.
s = 'able\nacre\nbale\nbeyond\nbinary\nboat\nbrainy\ncare\ncat\ncater\ncrate\nlawn\nlist\nrace\nreact\nsheet\nsilt\nslit\ntrace\n'
words = sorted(s.split('\n')[:-1]) # the last one will be '', so you want to get rid of that
To test whether or not a string is an anagram of another string:
def isAnagram(a, b):
aLtrs = sorted(list(a)) # if a='test', aLtrs=['e', 's', 't', 't']
bLtrs = sorted(list(a)) # same as above
return True if aLtrs==bLtrs else False