Character count in Python - python

The task: read a word from the user, count the occurrences of each character, and display the counts sorted by count in descending order, with ties broken by character in ascending order.
For example, if the user enters "management", the output should be:
a 2
e 2
m 2
n 2
g 1
t 1
This is the code I wrote for the task:
string=input().strip()
set1=set(string)
lis=[]
for i in set1:
    lis.append(i)
lis.sort()
while len(lis)>0:
    maxi=0
    for i in lis:
        if string.count(i)>maxi:
            maxi=string.count(i)
    for j in lis:
        if string.count(j)==maxi:
            print(j,maxi)
            lis.remove(j)
this code gives me following output for string "management"
a 2
m 2
e 2
n 2
g 1
t 1
m & e are not sorted.
What is wrong with my code?

The problem is that you remove elements from the list while you are still iterating over it. When you remove "a", "e" shifts into its slot at index 0, and the loop moves on to index 1, which now holds "g", and then to "m". So "e" is skipped until the next pass of the while loop.
Separate the printing from the removal, and don't remove elements from a list you're currently iterating over. Instead, copy the elements you want to keep into a new list:
string=input().strip()
set1=set(string)
lis=[]
for i in set1:
    lis.append(i)
lis.sort()
while len(lis)>0:
    maxi=0
    for i in lis:
        if string.count(i)>maxi:
            maxi=string.count(i)
    for j in lis:
        if string.count(j)==maxi:
            print(j,maxi)
    dupelis = lis
    lis = []
    for k in dupelis:
        if string.count(k)!=maxi:
            lis.append(k)
For the input management, this prints:
a 2
e 2
m 2
n 2
g 1
t 1
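The skipping described above can be reproduced in isolation. The following is a minimal sketch with a made-up four-letter list (not the asker's full program): removing the current element inside the loop shifts the remaining elements left, so the loop's internal index jumps over the next one.

```python
# Hypothetical example list, chosen only to show the skipping effect.
letters = ["a", "e", "g", "m"]
visited = []
for ch in letters:
    visited.append(ch)
    letters.remove(ch)   # shifts the remaining items one slot to the left

print(visited)   # ['a', 'g']  ("e" and "m" were never visited)
print(letters)   # ['e', 'm']

# Iterating over a copy avoids the problem.
letters = ["a", "e", "g", "m"]
visited_all = []
for ch in list(letters):
    visited_all.append(ch)
    letters.remove(ch)

print(visited_all)   # ['a', 'e', 'g', 'm']
```

Building a new list of the elements to keep, as the answer above does, has the same effect as iterating over a copy.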

The problem with your code is the way maxi is assigned, combined with the two for loops. "e" won't come second because maxi has already been set to 2, and string.count(i) never exceeds maxi.
for i in lis:
    if string.count(i)>maxi:
        maxi=string.count(i)
for j in lis:
    if string.count(j)==maxi:
        print(j,maxi)
There are several ways of achieving what you are looking for. You can try the solutions as others have explained.

You can use a simple Counter for that:
>>> from collections import Counter
>>> Counter("management")
Counter({'a': 2, 'e': 2, 'm': 2, 'n': 2, 'g': 1, 't': 1})

I'm not really sure what you are trying to achieve by adding a while loop and then two nested for loops inside it. But the same thing can be achieved by a single for loop.
for i in lis:
    print(i, string.count(i))
With this the output will be:
a 2
e 2
g 1
m 2
n 2
t 1

As answered before, you can use a Counter to get the counts of characters, no need to make a set or list.
For sorting, use the built-in sorted function, which accepts a function through its key parameter. Read more about sorting and lambda functions.
>>> from collections import Counter
>>> c = Counter('management')
>>> sorted(c.items())
[('a', 2), ('e', 2), ('g', 1), ('m', 2), ('n', 2), ('t', 1)]
>>> alpha_sorted = sorted(c.items())
>>> sorted(alpha_sorted, key=lambda x: x[1])
[('g', 1), ('t', 1), ('a', 2), ('e', 2), ('m', 2), ('n', 2)]
>>> sorted(alpha_sorted, key=lambda x: x[1], reverse=True) # Reverse ensures you get descending sort
[('a', 2), ('e', 2), ('m', 2), ('n', 2), ('g', 1), ('t', 1)]

The easiest way to count the characters is to use Counter, as suggested by some previous answers. After that, the trick is to come up with a measure that takes both the count and the character into account to achieve the sorting. I have the following:
from collections import Counter
c = Counter('management')
sc = sorted(c.items(),
            key=lambda x: -1000 * x[1] + ord(x[0]))
for char, count in sc:
    print(char, count)
c.items() gives the (character, count) pairs, which sorted() can sort.
The key parameter is what makes this work. sorted() puts items with lower keys (i.e. keys with smaller values) first, so a big count has to map to a small value.
I give a large negative weight (-1000) to the count (x[1]), then add the code point of the character (ord(x[0])). The result is a sort order that takes the count into account first and the character second.
An underlying assumption is that ord(x[0]) never exceeds 1000, which holds for English (ASCII) characters but not for every Unicode character.
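If the assumption about ord() is a concern, a tuple key avoids the weighting entirely. This is a sketch of that alternative, not the method used in the answer above: Python compares tuples element by element, so (-count, char) sorts by count descending and then by character ascending, for any characters.

```python
from collections import Counter

counts = Counter("management")
# Tuples compare element by element: negated count first, then character.
ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
for char, count in ordered:
    print(char, count)
```

For "management" this prints a, e, m, n with count 2, followed by g and t with count 1.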

Related

How can I make a dictionary / collections.Counter that takes into account the index in Python?

I am aware of dictionaries and collections.Counter in Python.
My question is: how can I make one that takes the position within the string into account?
For example, for this string: aaabaaa
I would like to build a list of tuples, one per run of identical characters, counting from left to right and resetting the count whenever a different character is found.
For example, I like to see this output:
[('a', 3), ('b', 1), ('a', 3)]
Any idea how to use the dictionary / Counter/ or is there some other data structure built into Python I can use?
Regards
You could use groupby:
from itertools import groupby
m = [(k, sum(1 for _ in v)) for k, v in groupby('aaabaaa')]
print(m)
Output
[('a', 3), ('b', 1), ('a', 3)]
Explanation
The groupby function makes an iterator that returns consecutive keys and groups from the iterable, in this case 'aaabaaa'. The key k is the value identifying each group; here the keys are 'a', 'b', 'a' in turn. The expression sum(1 for _ in v) counts the number of elements in the group.
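To make the contrast with Counter concrete, here is a small sketch. The helper name run_lengths is made up for illustration; it wraps the groupby expression above and is compared with a plain Counter, which ignores position.

```python
from collections import Counter
from itertools import groupby


def run_lengths(text):
    """Return (character, run length) pairs for consecutive runs in text."""
    return [(key, sum(1 for _ in group)) for key, group in groupby(text)]


print(run_lengths("aaabaaa"))      # [('a', 3), ('b', 1), ('a', 3)]
print(Counter("aaabaaa")["a"])     # 6, because Counter merges both runs of 'a'
```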

how to sort tuple element first on the basis of key and then on the basis of value [duplicate]

How to sort a tuple of elements in python, first on the basis of value and then on the basis of key. Consider the program in which I am taking input from the user as a string. I want to find out the count of each character and print the 3 most common characters in a string.
# input string
strr=list(input())
count=dict()
# store the count of each character in a dictionary
for i in range(len(strr)):
    count[strr[i]]=count.get(strr[i],0)+1
# a dict can't be sorted directly, so convert it into a list of tuples
temp=list()
t=count.items()
for (k,v) in t:
    temp.append((v,k))
temp.sort(reverse=True)
# print the 3 most common elements
for (v,k) in temp[:3]:
    print(k,v)
For the input aabbbccde, the output of the above code is:
3 b
2 c
2 a
But I want the output as:
3 b
2 a
2 c
Sort a list of tuples, the first value in descending order (reverse=True) and the second value in ascending order (reverse=False, by default). Here is a MWE.
lists = [(2, 'c'), (2, 'a'), (3, 'b')]
result = sorted(lists, key=lambda x: (-x[0], x[1])) # -x[0] represents descending order
print(result)
# Output
[(3, 'b'), (2, 'a'), (2, 'c')]
It is straightforward to use collections.Counter to count each letter's frequency in a string.
import collections
s = 'bcabcab'
# If you don't care about the order of ties, just use `most_common`
#most_common = collections.Counter(s).most_common(3)
char_and_frequency = collections.Counter(s)
result = sorted(char_and_frequency.items(), key=lambda x:(-x[1], x[0]))[:3] # sorted by x[1] in descending order, x[0] in ascending order
print(result)
# Output
[('b', 3), ('a', 2), ('c', 2)]
As a general solution for "sort by value descending then by key ascending" you can use itertools.groupby:
import itertools
from operator import itemgetter
def sort_by_value_then_key(count_dict):
    first_level_sorted = sorted(count_dict.items(), key=itemgetter(1), reverse=True)
    for _, group in itertools.groupby(first_level_sorted,itemgetter(1)):
        for pair in sorted(group):
            yield pair
strr=list("aabbccdef")
count=dict()
#store the count of each character in dictionary
for i in range(len(strr)):
    count[strr[i]]=count.get(strr[i],0)+1
temp = list(sort_by_value_then_key(count))
for (letter, freq) in temp[:3]:
    print(letter, freq)
Output:
a 2
b 2
c 2
Since you are using integers, @sparkandshine's solution of sorting by the negated number will probably be easier to work with, but this approach is more general.
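One case where negation is not available is when the values are not numbers. Here is a sketch of another general approach, relying on the fact that Python's sort is stable (also with reverse=True): sort by the secondary key first, then by the primary key. The sample dictionary is made up for illustration.

```python
# Hypothetical data: the values are strings, so "-value" is not possible.
grades = {"x": "low", "a": "high", "b": "low", "c": "high"}

by_key = sorted(grades.items())                                  # key ascending
result = sorted(by_key, key=lambda item: item[1], reverse=True)  # value descending

print(result)
# [('b', 'low'), ('x', 'low'), ('a', 'high'), ('c', 'high')]
```

Within each value, the keys keep the ascending order established by the first sort.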

What is the inbuilt .count in python?

I've been solving problems in checkio.com and one of the questions was: "Write a function to find the letter which occurs the maximum number of times in a given string"
The top solution was:
import string
def checkio(text):
    """
    We iterate through latin alphabet and count each letter in the text.
    Then 'max' selects the most frequent letter.
    For the case when we have several equal letter,
    'max' selects the first from they.
    """
    text = text.lower()
    return max(string.ascii_lowercase, key=text.count)
I didn't understand what text.count is when it is used as the key in the max function.
Edit: Sorry for not being more specific. I know what the program does as well as the function of str.count(). I want to know what text.count is. If .count is a method then shouldn't it be followed by braces?
key=text.count passes the bound method text.count itself, without calling it. max() then calls it once for each letter to count how often that letter appears in the string, and returns the letter with the highest count.
When the following code is run, the result is e, which is, if you count, the most frequent letter.
import string
def checkio(text):
    """
    We iterate through latin alphabet and count each letter in the text.
    Then 'max' selects the most frequent letter.
    For the case when we have several equal letter,
    'max' selects the first from they.
    """
    text = text.lower()
    return max(string.ascii_lowercase, key=text.count)
print(checkio('hello my name is heinst'))
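On the question of the missing parentheses: text.count without parentheses does not call the method; it evaluates to a bound method object that can be stored and called later. A short sketch of that idea (the variable name count_in_text is made up for illustration):

```python
import string

text = "hello my name is heinst"

count_in_text = text.count        # no parentheses: nothing is counted yet
print(callable(count_in_text))    # True
print(count_in_text("e"))         # 3, same as text.count("e")

# max() calls the key function once per letter, so these two are equivalent.
with_method = max(string.ascii_lowercase, key=text.count)
with_lambda = max(string.ascii_lowercase, key=lambda letter: text.count(letter))
print(with_method, with_lambda)   # e e
```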
A key function in max() is called for each element to provide an alternative to determine the maximum by, which in this case isn't all that efficient.
Essentially, the line max(string.ascii_lowercase, key=text.count) can be translated to:
max_character, max_count = None, -1
for character in string.ascii_lowercase:
    count = text.count(character)
    if count > max_count:
        # remember the best letter and its count seen so far
        max_character, max_count = character, count
return max_character
where str.count() loops through the whole of text counting how often character occurs.
You should really use a multiset / bag here instead; in Python that's provided by the collections.Counter() type:
from collections import Counter
max_character = Counter(text.lower()).most_common(1)[0][0]
The Counter() takes O(N) time to count the characters in a string of length N, then to find the maximum, another O(K) to determine the highest count, where K is the number of unique characters. Asymptotically speaking, that makes the whole process take O(N) time.
The max() approach takes O(MN) time, where M is the length of string.ascii_lowercase.
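Two details are worth checking before swapping max() for Counter. Counter(text.lower()) also counts spaces and punctuation, and ties are not broken alphabetically as they are with max() over string.ascii_lowercase. The following sketch, with a hypothetical function name most_frequent_letter, keeps the O(N) counting but filters to letters and breaks ties alphabetically.

```python
import string
from collections import Counter


def most_frequent_letter(text):
    """Most frequent ASCII letter in text; ties go to the alphabetically first."""
    letters = set(string.ascii_lowercase)
    counts = Counter(ch for ch in text.lower() if ch in letters)
    # Assumes text contains at least one ASCII letter; min() raises otherwise.
    return min(counts, key=lambda ch: (-counts[ch], ch))


print(most_frequent_letter("hello my name is heinst"))   # e
print(most_frequent_letter("bbaa"))                       # a
```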
Use the Counter class from the collections module.
>>> import collections
>>> word = "supercalafragalistic"
>>> c = collections.Counter(word)
>>> c.most_common()
[('a', 4), ('c', 2), ('i', 2), ('l', 2), ('s', 2), ('r', 2), ('e', 1), ('g', 1), ('f', 1), ('p', 1), ('u', 1), ('t', 1)]
>>> c.most_common()[0]
('a', 4)

An algorithm to find transitions in Python

I want to implement an algorithm that finds the indices at which the letter changes.
Given the list below, I want the index where each new run of letters begins, collected into a result list. The first run is the exception: for it, I want the index of its last occurrence. For example:
letters=['A','A','A','A','A','A','A','A','A','A','A','A','B','C','C','X','D','X','B','B','A','A','A','A']
Transitions:
'A','A','A','A','A','A','A','A','A','A','A','A'-->'B'-->'C','C'-->'X'-->'D'-->'X'-->'B','B'-->'A','A','A','A'
Here, once the run of A's ends, B starts; we should record the index of the last A, then the index of the first B, and so on. X letters must not appear in the result list.
Desired result:
[(11, 'A'), (12, 'B'), (13, 'C'), (16, 'D'), (18, 'B'), (20, 'A')]
So far, I have written this code, which finds every item except (11, 'A'). How can I modify it to get the desired result?
result = []
for i in range(len(letters)):
    if letters[i]!='X' and letters[i]!=letters[i-1]:
        result.append((i,(letters[i])))
My result:
[(12, 'B'), (13, 'C'), (16, 'D'), (18, 'B'), (20, 'A')] ---> missing (11, 'A').
Now that you've explained you want the first index of every letter after the first, here's a one-liner:
letters=['A','A','A','A','A','A','A','A','A','A','A','A','B','C','C','X','D','X','B','B','A','A','A','A']
[(n+1, b) for (n, (a,b)) in enumerate(zip(letters,letters[1:])) if a!=b and b!='X']
#=> [(12, 'B'), (13, 'C'), (16, 'D'), (18, 'B'), (20, 'A')]
Now, your first entry is different. For this, you need to use a recipe which finds the last index of each item:
import itertools
grouped = [(len(list(g))-1,k) for k,g in (itertools.groupby(letters))]
weird_transitions = [grouped[0]] + [(n+1, b) for (n, (a,b)) in enumerate(zip(letters,letters[1:])) if a!=b and b!='X']
#=> [(11, 'A'), (12, 'B'), (13, 'C'), (16, 'D'), (18, 'B'), (20, 'A')]
Of course, you could avoid creating the whole list of grouped, because you only ever use the first item from groupby. I leave that as an exercise for the reader.
This will also give you an X as the first item, if X is the first (set of) items. Because you say nothing about what you're doing, or why the Xs are there, but omitted, I can't figure out if that's the right behaviour or not. If it's not, then probably use my entire other recipe (in my other answer), and then take the first item from that.
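As for the exercise mentioned above: one way to avoid building the whole grouped list is to take only the first group from groupby. This is a sketch of one possible solution, not necessarily what the answer's author had in mind.

```python
from itertools import groupby

letters = ['A'] * 12 + ['B', 'C', 'C', 'X', 'D', 'X', 'B', 'B', 'A', 'A', 'A', 'A']

# Only the first run is consumed; the remaining groups are never produced.
first_key, first_run = next(groupby(letters))
first_item = (sum(1 for _ in first_run) - 1, first_key)

starts = [(n + 1, b)
          for n, (a, b) in enumerate(zip(letters, letters[1:]))
          if a != b and b != 'X']
weird_transitions = [first_item] + starts
print(weird_transitions)
# [(11, 'A'), (12, 'B'), (13, 'C'), (16, 'D'), (18, 'B'), (20, 'A')]
```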
Your question is a bit confusing, but this code should do what you want.
firstChangeFound = False
for i in range(len(letters)):
    if letters[i]!='X' and letters[i]!=letters[i-1]:
        if not firstChangeFound:
            result.append((i-1, letters[i-1])) #Grab the last occurrence of the first character
            result.append((i, letters[i]))
            firstChangeFound = True
        else:
            result.append((i, letters[i]))
You want (Or, you don't, as you finally explained - see my other answer):
import itertools
import functional # get it from pypi
letters=['A','A','A','A','A','A','A','A','A','A','A','A','B','C','C','X','D','X','B','B','A','A','A','A']
grouped = [(len(list(g)),k) for k,g in (itertools.groupby(letters))]
#=> [(12, 'A'), (1, 'B'), (2, 'C'), (1, 'X'), (1, 'D'), (1, 'X'), (2, 'B'), (4, 'A')]
#-1 to take this from counts to indices
filter(lambda (a,b): b!='X',functional.scanl(lambda (a,b),(c,d): (a+c,d), (-1,'X'), grouped))
#=> [(11, 'A'), (12, 'B'), (14, 'C'), (16, 'D'), (19, 'B'), (23, 'A')]
This gives you the last index of each letter run, other than Xs. If you want the first index after the relevant letter, then switch the -1 to 0.
scanl is a reduce which returns intermediate results.
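In Python 3, the standard library offers itertools.accumulate, which likewise yields intermediate results. The sketch below is an assumed Python 3 equivalent of the snippet above that avoids the third-party functional package; it is not the answer's original code.

```python
from itertools import accumulate, groupby

letters = ['A'] * 12 + ['B', 'C', 'C', 'X', 'D', 'X', 'B', 'B', 'A', 'A', 'A', 'A']

runs = [(key, len(list(group))) for key, group in groupby(letters)]
# Running totals of the run lengths; subtracting 1 turns a count into a last index.
last_indices = [total - 1 for total in accumulate(length for _, length in runs)]
result = [(index, key) for index, (key, _) in zip(last_indices, runs) if key != 'X']
print(result)
# [(11, 'A'), (12, 'B'), (14, 'C'), (16, 'D'), (19, 'B'), (23, 'A')]
```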
As a general rule, it makes sense to either filter first or last, unless that is for some reason expensive, or the filtering can easily be accomplished without increasing complexity.
Also, your code is relatively hard to read and understand, because you iterate by index. That's unusual in python, unless manipulating the index numerically. If you're visiting every item, it's usual to iterate directly.
Also, why do you want this particular format? It's usual to have the format as (unique item,data) because that can easily be placed in a dict.
With minimal change to your code, and following Josh Caswell's suggestion:
for i, letter in enumerate(letters[1:], 1):
    if letter != 'X' and letters[i] != letters[i-1]:
        result.append((i, letter))
first_change = result[0][0]
first_stretch = ''.join(letters[:first_change]).rstrip('X')
if first_stretch:
    result.insert(0, (len(first_stretch) - 1, first_stretch[-1]))
Here's a solution which uses groupby to generate a single sequence from which both first and last indices can be extracted.
import itertools
import functools
import operator
letters = ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'C', 'C', 'X', 'D', 'X', 'B', 'B', 'A', 'A', 'A', 'A']
groupbysecond = functools.partial(itertools.groupby,key=operator.itemgetter(1))
def transitions(letters):
    #segregate transition and non-transition indices
    grouped = groupbysecond(enumerate(zip(letters,letters[1:])))
    # extract first such entry from each group
    firsts = (next(l) for k,l in grouped)
    # group those entries together - where multiple, there are first and last
    # indices of the run of letters
    regrouped = groupbysecond((n,a) for n,(a,b) in firsts)
    # special case for first entry, which wants last index of first letter
    kfirst,lfirst = next(regrouped)
    firstitem = (tuple(lfirst)[-1],) if kfirst != 'X' else ()
    #return first item, and first index for all other letters
    return itertools.chain(firstitem,(next(l) for k,l in regrouped if k != 'X'))
letters=['A','A','A','A','A','A','A','A','A','A','A','A','B','C','C','X','D','X','B','B','A','A','A','A']
# 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
prev = letters[0]
result = []
for i in range(len(letters)):
    if prev!=letters[i]:
        result.append((i-1,prev))
    if letters[i]!='X':
        prev = letters[i]
    else:
        prev = letters[i+1]
result.append((len(letters)-1,letters[-1]))
print(result)
This results in the following (not the OP's desired result; sorry, I must have misunderstood. See JSutton's answer):
[(11,'A'), (12,'B'), (14,'C'), (16,'D'), (19,'B'), (23,'A')]
which is actually the index of the last instance of a letter before they change or the list ends.
Here is a solution that uses a dictionary to keep the running time linear in the size of the input:
letters=['A','A','A','A','A','A','A','A','A','A','A','A','B','C','C','X','D','X','B','B','A','A','A','A']
def f(letters):
    result = []
    added = {}
    for i in range(len(letters)):
        if (i+1 == len(letters)):
            break
        if letters[i+1]!='X' and letters[i+1]!=letters[i]:
            if(i not in added and letters[i]!='X'):
                result.append((i, letters[i]))
                added[i] = letters[i]
            if(i+1 not in added):
                result.append((i+1, letters[i+1]))
                added[i+1] = letters[i+1]
    return result
Basically, my solution always tries to add both indices where a change occurred. The dictionary, which has constant-time lookup, tells us whether an element was already added, so duplicates are excluded. This also takes care of adding the first element. Alternatively, you could use an if statement that flags the first round and only runs once; I would argue that has the same running time. Just do not check whether an element was already added by searching the result list itself: that lookup is linear time at worst and makes the whole thing O(n^2), which is bad!
Here's my suggestion. It has three steps.
1. First, find all the starting indexes for each run of letters.
2. Replace the index in the first non-X run with the index of the end of its run, which will be one less than the start of the following run.
3. Filter out all X runs.
The code:
def letter_runs(letters):
    prev = None
    results = []
    for index, letter in enumerate(letters):
        if letter != prev:
            prev = letter
            results.append((index, letter))
    if results[0][1] != "X":
        results[0] = (results[1][0]-1, results[0][1])
    else: # if first run is "X" second must be something else!
        results[1] = (results[2][0]-1, results[1][1])
    return [(index, letter) for index, letter in results if letter != "X"]

PySchool- List (Topic 6-22)

I am a beginner in Python and I am trying to solve some questions about lists. I got stuck on one problem that I am not able to solve:
Write a function countLetters(word) that takes in a word as argument
and returns a list that counts the number of times each letter
appears. The letters must be sorted in alphabetical order.
Ex:
>>> countLetters('google')
[('e', 1), ('g', 2), ('l', 1), ('o', 2)]
I am not able to count the occurrences of every character. For sorting I am using sorted(list), and I am also using a dictionary (its items function) to get this output format (a list of tuples). But I am not able to put all these pieces together.
Use sets!
m = "google"
u = set(m)
sorted([(l, m.count(l)) for l in u])
>>> [('e', 1), ('g', 2), ('l', 1), ('o', 2)]
A hint: Note that you can loop through a string in the same way as a list or other iterable object in python:
def countLetters(word):
    for letter in word:
        print(letter)
countLetters("ABC")
The output will be:
A
B
C
So instead of printing, use the loop to look at what letter you've got (in your letter variable) and count it somehow.
Finally, made it!
import collections
def countch(strng):
    d=collections.defaultdict(int)
    for letter in strng:
        d[letter]+=1
    print(sorted(d.items()))
This is my solution. Now I can ask for your solutions to this problem. I would love to see your code.
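For comparison, here is one possible Python 3 version that returns the list, as the exercise asks, instead of printing it. It is a sketch using a plain dict; the exercise's own reference solution is not known.

```python
def countLetters(word):
    """Return (letter, count) pairs for word, sorted alphabetically."""
    counts = {}
    for letter in word:
        counts[letter] = counts.get(letter, 0) + 1
    return sorted(counts.items())


print(countLetters('google'))   # [('e', 1), ('g', 2), ('l', 1), ('o', 2)]
```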
