Strange behaviour at key substitution in a dictionary - python

I have a dictionary with keys as single characters. I want to substitute the upper-cased characters with doubled versions of them.
For example, I have this structure:
x = 'AbCDEfGH'
a = dict(zip(list(x), range(len(x))))
print(a)
which creates this dictionary:
{'A': 0, 'b': 1, 'C': 2, 'D': 3, 'E': 4, 'f': 5, 'G': 6, 'H': 7}
The values don't matter, so I just use some integers. What I want is to substitute the upper-cased keys with double characters, so that I get this:
{'AA': 0, 'b': 1, 'CC': 2, 'DD': 3, 'EE': 4, 'f': 5, 'GG': 6, 'HH': 7}
So, I tried the following in-place substitution:
for k, v in a.items():
    if k.isupper():
        a[k+k] = a.pop(k)
print(a)
But this, strangely, results in:
{'b': 1, 'E': 4, 'f': 5, 'G': 6, 'CCCCCCCCCCCCCCCC': 2, 'DDDDDDDDDDDDDDDD': 3, 'HHHHHHHHHHHHHHHH': 7, 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA': 0}
Even stranger, if I set all keys to upper-case:
y = 'ABCDEFGH'
a = dict(zip(list(y), range(len(y))))
for k, v in a.items():
    if k.isupper():
        a[k+k] = a.pop(k)
print(a)
it yields:
{'D': 3, 'E': 4, 'F': 5, 'CCCCCCCC': 2, 'GGGGGGGG': 6, 'HHHHHHHH': 7, 'AAAAAAAAAAAAAAAA': 0, 'BBBBBBBBBBBBBBBB': 1}
What is happening? I see the keys are being repeated in powers of 2. But why?
I don't really care about the order of the items, but I see some aren't even being changed.
Is there any other way to substitute the keys the way I intend to?

.items() returns a live view of the underlying dict contents. Mutating the dict while iterating it causes unpredictable effects, usually leading to some keys being processed more than once (thus some keys doubling multiple times), while others aren't processed at all. Python tries to defend you from this by raising a RuntimeError if the dict changes size during iteration, but your code is keeping a constant size at the time of the check (when the next item is requested from the iterator), so Python's cheap length check doesn't save you.
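As a small illustration of the size check mentioned above, the following sketch (with a made-up dict `d`) adds a key without removing one, so the size changes and Python notices on the next step of the loop:

```python
# Illustrative only: `d` is a hypothetical dict, not the one from the question.
d = {'a': 1, 'b': 2}
message = None
try:
    for k in d:
        # Adds a key but removes none, so len(d) changes mid-iteration.
        d[k + k] = d[k]
except RuntimeError as exc:
    message = str(exc)

print(message)
```

The pop-then-insert pattern in the question restores the original size before the iterator asks for the next item, which is why this check never fires there.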
The minimal fix is to make your loop run over a snapshot of the items:
for k, v in tuple(a.items()):
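Put together with the original data, the snapshot version might look like this (a sketch; only the keys are needed in the loop, so it snapshots the keys rather than the items):

```python
x = 'AbCDEfGH'
a = dict(zip(x, range(len(x))))

# tuple(a) copies the keys up front, so the loop never sees the keys added below.
for k in tuple(a):
    if k.isupper():
        a[k + k] = a.pop(k)

print(a)
```

The re-inserted keys move to the end of the dict, so the order differs from the original, but every upper-case key is doubled exactly once.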
A simpler solution is a dict comprehension though:
a = {k*2 if k.isupper() else k: v for k, v in a.items()}
That builds a whole new dict with the doubled keys before reassigning a, so no mutation issues occur. You could build a in one fell swoop for that matter, just by doing:
a = {let*2 if let.isupper() else let: i for i, let in enumerate(x)}
No need to listify x (strings already iterate by character) and enumerate can take care of numbering the values for you without needing zip, range or len at all.

Related

Histogram of list entries

I have a number of lists as follows:
list1 = ['a_1','a_2','b_17','c_19']
list2 = ['aa_1','a_12','b_15','d_39']
list3 = ['a_1','a_200','ba_1','u_0']
I wish to generate a histogram based on the labels, ignoring the numbering: a has 4 entries over all the lists, ba has 1 entry, u has 1 entry, and so on. The labels are file names from a specific folder, before the numbers are added, so it is a finite, known list.
How can I perform such a count without a bunch of ugly loops? Can I use unique here, somehow?
You cannot achieve it without a loop, but you can use a list comprehension to make it a single line. Something like this:
list1 = ['a_1','a_2','b_17','c_19']
list2 = ['aa_1','a_12','b_15','d_39']
list3 = ['a_1','a_200','ba_1','u_0']
lst = [x.split('_')[0] for x in (list1 + list2 + list3)]
print({x: lst.count(x) for x in lst})
You can use a defaultdict initialized to 0 to count the occurrence and get a nice container with the required information.
So, define the container:
from collections import defaultdict
histo = defaultdict(int)
I'd like to split the operation into functions.
First, get the prefix from the string, to be used as the key in the dictionary:
def get_prefix(string):
    return string.split('_')[0]
This works like
get_prefix('az_1')
#=> 'az'
Then a function to update the dictionary:
def count_elements(lst):
    for e in lst:
        histo[get_prefix(e)] += 1
Finally you can call this way:
count_elements(list1)
count_elements(list2)
count_elements(list3)
dict(histo)
#=> {'a': 5, 'b': 2, 'c': 1, 'aa': 1, 'd': 1, 'ba': 1, 'u': 1}
Or directly
count_elements(list1 + list2 + list3)
To count each distinct entry only once, reset the counter first (histo keeps accumulating across calls) and pass a set:
histo.clear()
count_elements(set(list1 + list2 + list3))
dict(histo)
#=> {'ba': 1, 'a': 4, 'aa': 1, 'b': 2, 'u': 1, 'd': 1, 'c': 1}
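For comparison, the same histogram can be sketched with `collections.Counter`, which does the counting in a single pass. The variable names `histogram` and `unique_histogram` are illustrative, not from the answers above:

```python
from collections import Counter
from itertools import chain

list1 = ['a_1', 'a_2', 'b_17', 'c_19']
list2 = ['aa_1', 'a_12', 'b_15', 'd_39']
list3 = ['a_1', 'a_200', 'ba_1', 'u_0']

# Count the label part (everything before the first underscore) of every entry.
histogram = Counter(name.split('_')[0] for name in chain(list1, list2, list3))

# Same count, but each distinct entry contributes once ('a_1' appears twice).
unique_histogram = Counter(
    name.split('_')[0] for name in set(chain(list1, list2, list3))
)

print(dict(histogram))
print(dict(unique_histogram))
```

Unlike calling `lst.count(x)` for every element, this does not rescan the list for each label.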

How to join all keys and values of dictionary and return in form of string?

Step 1. Input: 'wwwwaaadexxxxxx'
Step 2. Converted: {'w': 4, 'a': 3, 'd': 1, 'e': 1, 'x': 6}
Final step. Output: 'w4a3d1e1x6'
I'm at step 2. How do I get to the final step?
I would also appreciate a direct conversion from step 1 to the final step.
Lower time complexity is preferred, but I would appreciate any solution.
I want the result returned as a string stored in a variable,
without importing anything.
You can get the key and value pairs (using dict.items()), format them into a list, then use join to create a string out of it:
converted= {'w': 4, 'a': 3, 'd': 1, 'e': 1, 'x': 6}
print(''.join([f"{k}{v}" for k,v in converted.items()]))
w4a3d1e1x6
Or use Counter.
Counter comes from the collections module and gives you a dict-like structure with the count of each character:
from collections import Counter
my_str = 'wwwwaaadexxxxxx'
print(''.join([f"{k}{v}" for k,v in Counter(my_str).items()]))
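Since the question asks for a direct conversion without imports, here is one possible sketch. The function name `compress` is hypothetical, and it relies on dicts preserving insertion order (Python 3.7+). Note that it counts total occurrences per character, not consecutive runs:

```python
def compress(text):
    """Return each distinct character followed by its total count."""
    counts = {}
    for ch in text:
        # dict.get supplies 0 the first time a character is seen.
        counts[ch] = counts.get(ch, 0) + 1
    return ''.join(f"{ch}{n}" for ch, n in counts.items())


result = compress('wwwwaaadexxxxxx')
print(result)  # w4a3d1e1x6
```

This makes one pass over the input to count and one pass over the distinct characters to build the string.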

Using other dictionary values to define a dictionary value during initialization

Say I have three variables which I want to store in a dictionary such that the third is the sum of the first two. Is there a way to do this in one call when the dictionary is initialized? For example:
myDict = {'a': 1, 'b': 2, 'c': myDict['a'] + myDict['b']}
Python 3.8 and later support assignment expressions (the := "walrus" operator), which allow something like the following, which I guess you could interpret as one call:
>>> md = {**(md := {'a': 2, 'b': 3}), **{'c': md['a'] + md['b']}}
>>> md
{'a': 2, 'b': 3, 'c': 5}
But this is really just a fanciful way of forcing a two-liner into a single line and making it less readable and less memory-efficient (because of the intermediate dicts). Also note that the md used on the right hand side of the = really could be any name.
You could actually be a little more efficient and get rid of one spurious auxiliary dict:
(md := {'a': 2, 'b': 3}).update({'c': md['a'] + md['b']})
You can do:
>>> myDict = {'a': 1, 'b': 2}
>>> myDict["c"] = myDict["a"] + myDict["b"]
>>> myDict
{'a': 1, 'b': 2, 'c': 3}
You cannot do this in one line with a plain dict literal, because myDict does not exist yet when the value for 'c' is evaluated.
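A short sketch of why the literal fails: the right-hand side is evaluated in full before the name is bound, so the lookup raises a `NameError`. If the inputs are available as ordinary variables, a single literal works. The names `my_dict`, `a` and `b` here are illustrative:

```python
# The dict display is evaluated before my_dict is assigned, so this fails.
try:
    my_dict = {'a': 1, 'b': 2, 'c': my_dict['a'] + my_dict['b']}
except NameError:
    failed = True
else:
    failed = False

# One alternative: bind the inputs first, then build the dict in one literal.
a, b = 1, 2
my_dict = {'a': a, 'b': b, 'c': a + b}
print(failed, my_dict)
```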

counting common words in Python

def words(word,number):
    if number<len(word):
        result={}
        for key,value in word.items():
            common_num=sorted(set(word.values()), reverse=True)[:number]
            if value in common_num:
                result.update({key:value})
        word.clear()
        word.update(result)
        new_word_count={}
        common_word=[]
        common=[]
        for key, value in word.items():
            if value in common_word:
                common.append(value)
            common_word.append(value)
        new_word_count=dict(word)
        for key,value in new_word_count.items():
            if value in common:
                del word[key]
Example:
>>> word={'a': 2, 'b': 2, 'c' : 3, 'd': 3, 'e': 4, 'f' : 4, 'g' : 5}
>>> words(word,3)
My output: {'g': 5}
Expected output:{'g': 5, 'e': 4, 'f': 4}
Any idea why I'm getting this output?
Well, without any special imports, there are easier ways to accomplish what you're trying to do. You've got a whole lot of rigmarole involved in tracking and storing the values being kept, then deleting, then re-adding, when you could simplify a lot; even with explanatory comments, this is substantially shorter:
def common_words(word_count, number):
    # Early out when no filtering needed
    if number >= len(word_count):
        return
    # Get the top number+1 keys based on their values
    top = sorted(word_count, key=word_count.get, reverse=True)[:number+1]
    # We kept one more than we needed to figure out what the tie would be
    tievalue = word_count[top.pop()]
    # If there is a tie, we keep popping until the tied values are gone
    while top and tievalue == word_count[top[-1]]:
        top.pop()
    # top is now the keys we should retain, easy to compute keys to delete
    todelete = word_count.keys() - top
    for key in todelete:
        del word_count[key]
There are slightly better ways to do this that avoid repeated lookups in word_count (sorting items, not keys, etc.), but this is easier to understand IMO, and the extra lookups in word_count are bounded and linear, so it's not a big deal.
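As a rough sketch of the "sorting items, not keys" idea mentioned above, here is a non-mutating variant that returns a new dict instead of deleting in place. The function name `top_without_split_ties` is hypothetical, and returning a copy rather than mutating is a design choice of this sketch, not of the answer:

```python
def top_without_split_ties(word_count, number):
    """Keep up to `number` highest-valued entries, dropping any group of
    tied values that would not fit entirely within the limit."""
    if number >= len(word_count):
        return dict(word_count)
    # Sort (key, value) pairs by value, highest first; no lookups needed later.
    ranked = sorted(word_count.items(), key=lambda kv: kv[1], reverse=True)
    # Value of the first entry that does not fit within the limit.
    cutoff = ranked[number][1]
    # Entries tied with the cutoff are split by the limit, so drop them all.
    return {k: v for k, v in ranked[:number] if v > cutoff}


word = {'a': 2, 'b': 2, 'c': 3, 'd': 3, 'e': 4, 'f': 4, 'g': 5}
print(top_without_split_ties(word, 3))
print(top_without_split_ties(word, 2))
```

With a limit of 3 the entries for g, e and f are kept; with a limit of 2 only g survives, because keeping one of the two entries with value 4 would split the tie.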
Although the author mentions in the comments that they want to avoid Counter(), for those interested in seeing how to apply it, here is a short solution as suggested by @ShadowRanger:
import collections as ct
word={'a': 2, 'b': 2, 'c' : 3, 'd': 3, 'e': 4, 'f' : 4, 'g' : 5}
words = ct.Counter(word)
words.most_common(3)
# [('g', 5), ('f', 4), ('e', 4)]

Summing values in a dictionary based on multiple conditions

I'm trying to sum values between multiple dictionaries, for example:
oneDic = {'A': 3, 'B': 0, 'C':1, 'D': 1, 'E': 2}
otherDic = {'A': 9, 'D': 1, 'E': 15}
I want to sum up the values in oneDic whose keys are found in otherDic and whose corresponding value in otherDic is less than a specific value:
oneDic = {'A': 3, 'B': 0, 'C':1, 'D': 1, 'E': 2}
otherDic = {'A': 9, 'D': 1, 'E': 15}
value = 12
test = sum(oneDic[value] for key, value in oneDic.items() if count in otherTest[count] < value
return (test)
I would expect a value of 4, because C is not found in otherDic and the value of E in otherDic is not less than value
But when I run this code I get a lovely key error, can anybody point me in the right direction?
How about this
sum(v for k, v in oneDic.items() if otherDic.get(k, value) < value)
Here we iterate over the k, v pairs of oneDic and only include a value if the result of otherDic.get(k, value) is < value. dict.get takes two arguments, the second being optional. If the key is not found, the default is returned. Here we set the default to value itself, so that keys missing from otherDic fail the comparison and are not included.
By the way, the reason you get the KeyError is that at some point in the iteration you try to access otherDic['B'] and otherDic['C'], and those keys do not exist. In contrast, otherDic.get('B') returns the default of None when you do not supply a default, and does not raise a KeyError.
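A minimal runnable sketch of the above, using snake_case names (`one_dic`, `other_dic`, `threshold`) that are illustrative stand-ins for the question's variables. It also shows why an explicit default matters in Python 3: the bare `None` returned by `.get` cannot be compared with an integer.

```python
one_dic = {'A': 3, 'B': 0, 'C': 1, 'D': 1, 'E': 2}
other_dic = {'A': 9, 'D': 1, 'E': 15}
threshold = 12

# Missing keys fall back to `threshold`, which is never < threshold.
total = sum(v for k, v in one_dic.items()
            if other_dic.get(k, threshold) < threshold)
print(total)  # 4

# Without a default, .get returns None, and None < int raises TypeError.
missing = other_dic.get('B')
try:
    missing < threshold
except TypeError:
    comparable = False
else:
    comparable = True
print(missing, comparable)
```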
The following code snippet works. I do not know what the count variable in your code is:
oneDic = {'A': 3, 'B': 0, 'C':1, 'D': 1, 'E': 2}
otherDic = {'A': 9, 'D': 1, 'E': 15}
value = 12
test = sum(j for i,j in oneDic.items() if (i in otherDic) and (otherDic[i] < value))
print(test)