Counting common words in Python

def words(word,number):
    if number<len(word):
        result={}
        for key,value in word.items():
            common_num=sorted(set(word.values()), reverse=True)[:number]
            if value in common_num:
                result.update({key:value})
        word.clear()
        word.update(result)
        new_word_count={}
        common_word=[]
        common=[]
        for key, value in word.items():
            if value in common_word:
                common.append(value)
            common_word.append(value)
        new_word_count=dict(word)
        for key,value in new_word_count.items():
            if value in common:
                del word[key]
Example:
>>> word={'a': 2, 'b': 2, 'c' : 3, 'd': 3, 'e': 4, 'f' : 4, 'g' : 5}
>>> words(word,3)
My output: {'g': 5}
Expected output: {'g': 5, 'e': 4, 'f': 4}
Any idea why I'm getting this output?

Well, without any special imports, there are easier ways to accomplish what you're trying to do. You've got a whole lot of rigmarole involved in tracking and storing the values being kept, then deleting, then re-adding, when you could simplify a lot; even with explanatory comments, this is substantially shorter:
def common_words(word_count, number):
    # Early out when no filtering needed
    if number >= len(word_count):
        return
    # Get the top number+1 keys based on their values
    top = sorted(word_count, key=word_count.get, reverse=True)[:number+1]
    # We kept one more than we needed to figure out what the tie would be
    tievalue = word_count[top.pop()]
    # If there is a tie, we keep popping until the tied values are gone
    while top and tievalue == word_count[top[-1]]:
        top.pop()
    # top is now the keys we should retain, easy to compute keys to delete
    todelete = word_count.keys() - top
    for key in todelete:
        del word_count[key]
There are slightly better ways to do this that avoid repeated lookups in word_count (sorting items, not keys, etc.), but this is easier to understand IMO, and the extra lookups in word_count are bounded and linear, so it's not a big deal.
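For the curious, here is a rough sketch of the "sort items, not keys" variant mentioned above. The name common_words_items is hypothetical; it is meant to behave like common_words (keep the top `number` keys, and drop a whole group of tied values if it straddles the cutoff), but it sorts (key, value) pairs once so the later steps never look anything up in word_count.

```python
from operator import itemgetter


def common_words_items(word_count, number):
    """Hypothetical items-based variant of common_words: keep only the top
    `number` keys by value, dropping any tie group that straddles the cutoff."""
    # Early out when no filtering needed
    if number >= len(word_count):
        return
    # Sort (key, value) pairs once, so later steps need no lookups in word_count
    top = sorted(word_count.items(), key=itemgetter(1), reverse=True)[:number + 1]
    # The extra pair tells us which value would be tied at the cutoff
    _, tievalue = top.pop()
    while top and top[-1][1] == tievalue:
        top.pop()
    keep = {key for key, _ in top}
    # keys() - keep builds a new set, so deleting from word_count here is safe
    for key in word_count.keys() - keep:
        del word_count[key]


word = {'a': 2, 'b': 2, 'c': 3, 'd': 3, 'e': 4, 'f': 4, 'g': 5}
common_words_items(word, 3)
print(word)  # {'e': 4, 'f': 4, 'g': 5}
```

With number=2 the same input would shrink to {'g': 5}, because the two 4s tie across the cutoff and both are dropped.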

Although the asker mentions in the comments wanting to avoid Counter(), for those interested in seeing how to apply it, here is a short solution as suggested by @ShadowRanger:
import collections as ct
word={'a': 2, 'b': 2, 'c' : 3, 'd': 3, 'e': 4, 'f' : 4, 'g' : 5}
words = ct.Counter(word)
words.most_common(3)
# [('g', 5), ('e', 4), ('f', 4)]
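Since the question expected a dict rather than a list of pairs, the result of most_common can be passed straight to dict(). A small sketch, assuming Python 3.7+ (where Counter ties are ordered by first appearance). Note that most_common(n) simply truncates at n, so unlike common_words above it does not keep or drop tied values as a group:

```python
import collections as ct

word = {'a': 2, 'b': 2, 'c': 3, 'd': 3, 'e': 4, 'f': 4, 'g': 5}
# most_common(n) returns a list of (key, count) pairs; dict() turns it
# back into the mapping shape the question expected
top3 = dict(ct.Counter(word).most_common(3))
print(top3)  # {'g': 5, 'e': 4, 'f': 4}

# most_common just cuts at n: the tied 'f' is dropped here, 'e' is kept
top2 = dict(ct.Counter(word).most_common(2))
print(top2)  # {'g': 5, 'e': 4}
```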

Related

What's a more Pythonic way of grabbing N items from a dictionary?

In Python, suppose I want to grab N arbitrary items from a dictionary—say, to print them, to inspect a few items. I don't care which items I get. I don't want to turn the dictionary into a list (as does some code I have seen); that seems like a waste. I can do it with the following code (where N = 5), but it seems like there has to be a more Pythonic way:
count = 0
for item in my_dict.items():
    if count >= 5:
        break
    print(item)
    count += 1
Thanks in advance!
You can use itertools.islice to slice any iterable (not only lists):
>>> import itertools
>>> my_dict = {i: i for i in range(10)}
>>> list(itertools.islice(my_dict.items(), 5))
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
I might use zip and range:
>>> my_dict = {i: i for i in range(10)}
>>> for _, item in zip(range(5), my_dict.items()):
...     print(item)
...
(0, 0)
(1, 1)
(2, 2)
(3, 3)
(4, 4)
The only purpose of the range here is to give an iterable that will cause zip to stop after 5 iterations.
You can modify what you have slightly:
for count, item in enumerate(my_dict.items()):
    if count >= 5:
        break
    print(item)
Note: in this case when you're looping through .items(), you're getting a key/value pair, which can be unpacked as you iterate:
for count, (key, value) in enumerate(my_dict.items()):
    if count >= 5:
        break
    print(f"{key=} {value=}")
If you want just the keys, you can iterate over the dict directly:
for count, key in enumerate(my_dict):
    if count >= 5:
        break
    print(f"{key=}")
If you want just the values:
for count, value in enumerate(my_dict.values()):
    if count >= 5:
        break
    print(f"{value=}")
One last note: don't use dict as a variable name; doing so shadows the built-in dict type and makes it unavailable in your code. (The {key=} form in these f-strings requires Python 3.8+.)
Typically, I would like to use slice notation to do this, but dict.items() returns a view object, which is iterable but not sliceable.
You have two main options:
Make it something that slice notation works on (this copies all the items into a list first):
x = {'a':1, 'b':2, 'c': 3, 'd': 4, 'e': 5, 'f': 6}
for item in list(x.items())[:5]:
    print(item)
Use something that works on any iterable. In this case, the standard library's (and exceedingly popular) itertools module:
import itertools
x = {'a':1, 'b':2, 'c': 3, 'd': 4, 'e': 5, 'f': 6}
for item in itertools.islice(x.items(), 5):
    print(item)
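If this comes up often, the itertools documentation includes a small "take" recipe that wraps islice. It is a recipe you copy into your own code, not a function exported by itertools; a sketch along those lines:

```python
from itertools import islice


def take(n, iterable):
    """Return the first n items of the iterable as a list."""
    return list(islice(iterable, n))


my_dict = {i: i for i in range(10)}
for key, value in take(5, my_dict.items()):
    print(key, value)

# Asking for more items than exist just returns everything available
print(take(20, {'a': 1}.items()))  # [('a', 1)]
```

Only the n requested items are materialized, so the dictionary as a whole is never copied into a list.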

How to join all keys and values of dictionary and return in form of string?

Step 1. i/p= “wwwwaaadexxxxxx”
Step 2. converted= {'w': 4, 'a': 3, 'd': 1, 'e': 1, 'x': 6}
Step Final. o/p= 'w4a3d1e1x6'
I'm on step 2; how do I get to the final step?
I'd also appreciate a direct conversion from step 1 to the final step.
Lower time complexity is preferred, but I'd appreciate any solution.
I want the result returned as a string stored in a variable,
without importing anything.
You can get the key and value pairs (using dict.items()), format each pair as a string, then use join to combine them into one string!
converted= {'w': 4, 'a': 3, 'd': 1, 'e': 1, 'x': 6}
print(''.join([f"{k}{v}" for k,v in converted.items()]))
w4a3d1e1x6
OR use Counter
Counter, from the collections module, gives you a dict-like structure with the count of each character:
from collections import Counter
my_str = 'wwwwaaadexxxxxx'
print(''.join([f"{k}{v}" for k,v in Counter(my_str).items()]))
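For the direct step 1 to final step conversion without any imports, a single pass with dict.get can build the counts and join can build the string. A sketch; count_string is a hypothetical name. Like the dict in step 2, it counts total occurrences per character, not consecutive runs, so 'aabaa' becomes 'a4b1' rather than 'a2b1a2'.

```python
def count_string(s):
    """Hypothetical helper: go straight from the input string to 'w4a3...'
    using only built-ins (no imports)."""
    counts = {}
    for ch in s:
        # dicts keep insertion order (Python 3.7+), so characters appear
        # in the order they are first seen
        counts[ch] = counts.get(ch, 0) + 1
    return ''.join(f"{ch}{n}" for ch, n in counts.items())


result = count_string("wwwwaaadexxxxxx")
print(result)  # w4a3d1e1x6
```

Each character is visited once, so this runs in linear time in the length of the input.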

Strange behaviour at key substitution in a dictionary

I have a dictionary with keys as single characters. I want to substitute the upper-cased characters with doubled versions of them.
For example, I have this structure:
x = 'AbCDEfGH'
a = dict(zip(list(x), range(len(x))))
print(a)
which creates this dictionary:
{'A': 0, 'b': 1, 'C': 2, 'D': 3, 'E': 4, 'f': 5, 'G': 6, 'H': 7}
The values don't matter, so I just use some integers. What I want is to substitute the upper-cased keys with double characters, so that I get this:
{'AA': 0, 'b': 1, 'CC': 2, 'DD': 3, 'EE': 4, 'f': 5, 'GG': 6, 'HH': 7}
So, I tried the following in-place substitution:
for k, v in a.items():
    if k.isupper():
        a[k+k] = a.pop(k)
print(a)
But this, strangely, results in:
{'b': 1, 'E': 4, 'f': 5, 'G': 6, 'CCCCCCCCCCCCCCCC': 2, 'DDDDDDDDDDDDDDDD': 3, 'HHHHHHHHHHHHHHHH': 7, 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA': 0}
Even stranger, if I set all keys to upper-case:
y = 'ABCDEFGH'
a = dict(zip(list(y), range(len(y))))
for k, v in a.items():
    if k.isupper():
        a[k+k] = a.pop(k)
print(a)
it yields:
{'D': 3, 'E': 4, 'F': 5, 'CCCCCCCC': 2, 'GGGGGGGG': 6, 'HHHHHHHH': 7, 'AAAAAAAAAAAAAAAA': 0, 'BBBBBBBBBBBBBBBB': 1}
What is happening? I can see the keys being repeated in powers of 2, but why?
I don't really care about the order of the items, but I see some aren't even being changed.
Is there any other way to substitute the keys the way I intend to?
.items() returns a live view of the underlying dict contents. Mutating the dict while iterating it causes unpredictable effects, usually leading to some keys being processed more than once (thus some keys doubling multiple times), while others aren't processed at all. Python tries to defend you from this by raising a RuntimeError if the dict changes size during iteration, but your code is keeping a constant size at the time of the check (when the next item is requested from the iterator), so Python's cheap length check doesn't save you.
The minimal fix is to make your loop run over a snapshot of the items:
for k, v in tuple(a.items()):
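Spelled out in full with the question's first example, the snapshot version looks like this. Each upper-case key is now doubled exactly once. Because pop followed by re-insertion moves each doubled key to the end, the key order changes (which the asker said doesn't matter):

```python
x = 'AbCDEfGH'
a = dict(zip(x, range(len(x))))

# tuple(a.items()) takes a snapshot first, so popping and adding keys
# inside the loop no longer affects what the loop iterates over
for k, v in tuple(a.items()):
    if k.isupper():
        a[k + k] = a.pop(k)

print(a)
# {'b': 1, 'f': 5, 'AA': 0, 'CC': 2, 'DD': 3, 'EE': 4, 'GG': 6, 'HH': 7}
```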
A simpler solution is a dict comprehension though:
a = {k*2 if k.isupper() else k: v for k, v in a.items()}
That builds a whole new dict with the doubled keys before reassigning a, so no mutation issues occur. You could build a in one fell swoop for that matter, just by doing:
a = {let*2 if let.isupper() else let: i for i, let in enumerate(x)}
No need to listify x (strings already iterate by character) and enumerate can take care of numbering the values for you without needing zip, range or len at all.

Using other dictionary values to define a dictionary value during initialization

Say I have three variables which I want to store in a dictionary such that the third is the sum of the first two. Is there a way to do this in one call when the dictionary is initialized? For example:
myDict = {'a': 1, 'b': 2, 'c': myDict['a'] + myDict['b']}
Python 3.8+'s assignment expressions (the := "walrus" operator) allow something like the following, which I guess you could interpret as one call:
>>> md = {**(md := {'a': 2, 'b': 3}), **{'c': md['a'] + md['b']}}
>>> md
{'a': 2, 'b': 3, 'c': 5}
But this is really just a fanciful way of forcing a two-liner onto a single line, making it less readable and less memory-efficient (because of the intermediate dicts). Also note that the md bound by := on the right-hand side of the = really could be any name.
You could actually be a little more efficient and get rid of one spurious auxiliary dict:
(md := {'a': 2, 'b': 3}).update({'c': md['a'] + md['b']})
You can do:
>>> myDict = {'a': 1, 'b': 2}
>>> myDict["c"] = myDict["a"] + myDict["b"]
>>> myDict
{'a': 1, 'b': 2, 'c': 3}
You cannot do this inside the dict literal itself, because myDict does not exist yet while the value for 'c' is being evaluated.
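If the values are available as ordinary variables before the dict is built, one more option is to compute 'c' from those variables rather than from the dict, which keeps everything in a single literal. A minimal sketch, assuming the inputs can be bound to names first (a and b here are illustrative):

```python
# Bind the inputs to ordinary names first; the literal can then use them
a, b = 1, 2
myDict = {'a': a, 'b': b, 'c': a + b}
print(myDict)  # {'a': 1, 'b': 2, 'c': 3}
```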

Python count/dictionary count

dct = {}
with open("grades_single.txt","r") as g:
    content = g.readlines()[1].strip('\n')
    for item in content:
        dct[item] = content.count(item)
LetterA = max(dct.values())
print(dct)
I'm very new to Python, so please excuse me. This is my code so far; it works, but not as intended. I'm trying to count the frequency of certain letters on new lines so I can apply a mathematical function to each letter. The program counts all the letters and prints them, but I'd like to be able to count each letter one by one, i.e. 7 As, then a new function for 4 Bs, etc.
At the moment the program prints them all in one go, but I'd like to split them up so I can work with each letter one by one: {'A': 9, 'C': 12, 'B': 19, 'E': 4, 'D': 5, 'F': 1}
Does anyone know how to count the frequency of each letter by letter?
ADCBCBBBADEBCCBADBBBCDCCBEDCBACCFEABBCBBBCCEAABCBB
Example of what I'd like to count.
>>> from collections import Counter
>>> s = "ADCBCBBBADEBCCBADBBBCDCCBEDCBACCFEABBCBBBCCEAABCBB"
>>> Counter(s)
Counter({'B': 19, 'C': 14, 'A': 7, 'D': 5, 'E': 4, 'F': 1})
collections.Counter is clean, but you could also iterate over all of the elements and place them into a dictionary yourself:
s = 'ADCBCBBBADEBCCBADBBBCDCCBEDCBACCFEABBCBBBCCEAABCBB'
grades = {}
for letter in s:
    grades[letter] = grades.get(letter, 0) + 1
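To work with each letter separately, as the question asks, each count can be read out of the dictionary on its own. A sketch under the assumption that the per-letter "mathematical function" is yours to fill in; num_a and num_b are hypothetical names, and the print calls stand in for whatever you do with each count:

```python
s = 'ADCBCBBBADEBCCBADBBBCDCCBEDCBACCFEABBCBBBCCEAABCBB'
grades = {}
for letter in s:
    grades[letter] = grades.get(letter, 0) + 1

# Each count can be pulled out on its own; .get(..., 0) covers letters
# that never appear
num_a = grades.get('A', 0)
num_b = grades.get('B', 0)
print(num_a, num_b)  # 7 19

# Or handle every letter one at a time, in alphabetical order
for letter in sorted(grades):
    print(f"{letter}: {grades[letter]}")
```

This also avoids calling content.count(item) once per character, which rescans the whole line each time.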
