Merge two dictionaries and, if duplicate keys are found, add their values - python

I have 2 python dictionaries:
x = {'bookA': 1, 'bookB': 2, 'bookC': 3, 'bookD': 4, 'bookE': 5}
y = {'bookB': 1, 'bookD': 2, 'bookF': 3, 'bookL': 4, 'bookX': 5}
I want to merge the above two dictionaries into a new dictionary.
I tried this code:
z = {**x, **y}
But with this code, the values of duplicate keys are overwritten. I want a dictionary in which the values of duplicate keys are added, or combined with some other operation such as subtraction or multiplication. So my goal is not to overwrite duplicate values but to perform some operation on them.
Any help would be highly appreciated.

Option 1
Convert x and y to collections.Counter objects and just sum them (Counter supports addition via __add__):
from collections import Counter
z = dict(Counter(x) + Counter(y))
z
{'bookA': 1,
'bookB': 3,
'bookC': 3,
'bookD': 6,
'bookE': 5,
'bookF': 3,
'bookL': 4,
'bookX': 5}
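One caveat with Option 1: Counter addition keeps only keys whose summed value is positive, so zero or negative totals disappear from the result. The sketch below uses hypothetical data with a negative value to show this, and Counter.update as an alternative that keeps every key.

```python
from collections import Counter

x = {'bookA': 1, 'bookB': 2}
y = {'bookB': -2, 'bookC': 3}  # hypothetical data with a negative value

# Counter.__add__ keeps only keys whose summed count is positive,
# so 'bookB' (2 + -2 == 0) is dropped from the result.
added = dict(Counter(x) + Counter(y))
print(added)  # {'bookA': 1, 'bookC': 3}

# Counter.update adds counts in place and keeps zero/negative totals.
z = Counter(x)
z.update(y)
print(dict(z))  # {'bookA': 1, 'bookB': 0, 'bookC': 3}
```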
Option 2
You can write a neat little dict comprehension using dict.pop -
z = {k : x[k] + y.pop(k, 0) for k in x}
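Note that y.pop removes the shared keys from y, so y is modified in place (copy it first with y = dict(y) if you still need the original). Now update z with what's left in y: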
z.update(y)
Or,
z = {**z, **y}  # Python 3.5+
z
{'bookA': 1,
'bookB': 3,
'bookC': 3,
'bookD': 6,
'bookE': 5,
'bookF': 3,
'bookL': 4,
'bookX': 5}
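Since the question also asks about subtraction or multiplication, here is a minimal sketch of a generic merge that takes the combining operation as a parameter. The helper name merge_with is hypothetical (it is not a standard-library function); it relies on operator.add, operator.sub and operator.mul from the standard library. Keys that appear in only one dictionary keep their value unchanged.

```python
import operator

def merge_with(a, b, op):
    """Hypothetical helper: merge two dicts, combining shared keys with op(a_val, b_val)."""
    merged = dict(a)  # copy so neither input dict is mutated
    for key, value in b.items():
        # Shared key: combine the values; key only in b: copy it as-is.
        merged[key] = op(merged[key], value) if key in merged else value
    return merged

x = {'bookA': 1, 'bookB': 2, 'bookC': 3, 'bookD': 4, 'bookE': 5}
y = {'bookB': 1, 'bookD': 2, 'bookF': 3, 'bookL': 4, 'bookX': 5}

added = merge_with(x, y, operator.add)       # bookB: 3, bookD: 6
subtracted = merge_with(x, y, operator.sub)  # bookB: 1, bookD: 2
multiplied = merge_with(x, y, operator.mul)  # bookB: 2, bookD: 8
print(added)
```

Passing a function rather than hard-coding + keeps a single helper for every operation. For subtraction, keys found only in y are copied as-is rather than negated; adjust the else branch if you need different behavior.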

Related

Adding the values of same keys in one dictionary

I have a Dictionary :
Dict1 = {"AAT": 2, "CCG": 1, "ATA": 5, "GCG": 7, "CGC": 2, "TAG": 1, "GAT": 0, "AAT": 3, "CCG": 2, "ATG": 5, "GCG": 3, "CGC": 7, "TAG": 0, "GAT": 0}
I need to sum the values of identical triplet codes into a new dictionary.
Output should be like this:
Dict2 = {"AAT": 5, "CCG": 3, "ATA": 5, "GCG": 10, "CGC": 9, "TAG": 1, "GAT": 0}
How do I proceed with the code?
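Dict1 cannot hold what you expect, because dictionary keys have to be unique: when a literal repeats a key, the later value silently replaces the earlier one. In general, if you have some (non-unique) strings with values assigned to them, you can write:
Dict2 = {}
for key, val in pairs:  # pairs: hypothetical list of (key, value) tuples
    if key in Dict2:
        Dict2[key] += val
    else:
        Dict2[key] = val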
You are trying to sum the values of identical keys, which is not possible because Python doesn't allow duplicate keys in a dictionary. You can check this for reference:
https://www.w3schools.com/python/python_dictionaries.asp
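A minimal sketch of both points, assuming the triplet data is kept as a list of (key, value) pairs (the name pairs is hypothetical): a dict literal with repeated keys keeps only the last value, while summing over the pairs with collections.defaultdict(int) loses nothing. Note that the result also contains "ATG": 5, which the expected output in the question leaves out.

```python
from collections import defaultdict

# Repeated keys in a dict literal collapse: the key keeps its first
# position, but the last value wins.
collapsed = {"AAT": 2, "CCG": 1, "AAT": 3}
print(collapsed)  # {'AAT': 3, 'CCG': 1}

# Keep the data as (key, value) pairs instead, so nothing is lost.
pairs = [("AAT", 2), ("CCG", 1), ("ATA", 5), ("GCG", 7), ("CGC", 2), ("TAG", 1), ("GAT", 0),
         ("AAT", 3), ("CCG", 2), ("ATG", 5), ("GCG", 3), ("CGC", 7), ("TAG", 0), ("GAT", 0)]

totals = defaultdict(int)  # missing keys start at 0
for key, val in pairs:
    totals[key] += val

Dict2 = dict(totals)
print(Dict2)  # {'AAT': 5, 'CCG': 3, 'ATA': 5, 'GCG': 10, 'CGC': 9, 'TAG': 1, 'GAT': 0, 'ATG': 5}
```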

Is there a way to self generate similar key names and values in a dictionary with a loop?

I want to create a dictionary by using a loop or similar technique. Something like the below variable assignment is possible.
from random import random
my_dict = {v: int(v*random()) for v in range(10)}
The part I am stuck on: how can I generate similarly patterned names for the keys? For example:
{'Item-1': 1, 'Item-2':3, 'Item-3':3 ....}
Thanks in advance!
from random import random
my_dict = {f'item-{v+1}': int(v*random()) for v in range(10)}
print(my_dict)
Output:
{'item-1': 0, 'item-2': 0, 'item-3': 1, 'item-4': 1, 'item-5': 0, 'item-6': 3, 'item-7': 2, 'item-8': 4, 'item-9': 6, 'item-10': 2}
This uses an f-string to create the key, the corresponding value is randomly generated like in your question.
You can also write this as a dictionary comprehension with randint:
from random import randint
dic = {f"item-{i}": randint(0, 10) for i in range(1, 11)}
print(dic)
Create keys and values and add to my_dict in a loop
my_dict = {}
for v in range(10): my_dict[f'Item-{v}'] = v
print(my_dict)
{'Item-0': 0, 'Item-1': 1, 'Item-2': 2, 'Item-3': 3, 'Item-4': 4, 'Item-5': 5, 'Item-6': 6, 'Item-7': 7, 'Item-8': 8, 'Item-9': 9}
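If the values already exist in a list, a sketch like the following pairs them with generated names; values here is hypothetical data chosen to reproduce the example in the question, and enumerate(..., start=1) makes the numbering begin at 1.

```python
values = [1, 3, 3]  # hypothetical existing values

# enumerate(..., start=1) numbers the keys from 1 instead of 0
my_dict = {f'Item-{i}': v for i, v in enumerate(values, start=1)}
print(my_dict)  # {'Item-1': 1, 'Item-2': 3, 'Item-3': 3}
```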

How to use .apply() to combine a column of dictionaries into one dictionary?

I have a column of dictionaries within a pandas data frame.
import pandas as pd
srs_tf = pd.Series([{'dried': 1, 'oak': 2},{'fruity': 2, 'earthy': 2},{'tones': 2, 'oak': 4}])
srs_b = pd.Series([2,4,6])
df = pd.DataFrame({'tf': srs_tf, 'b': srs_b})
df
tf b
0 {'dried': 1, 'oak': 2} 2
1 {'fruity': 2, 'earthy': 2} 4
2 {'tones': 2, 'oak': 4} 6
These dictionaries represent word frequencies in descriptions of wines (e.g. input dictionary: {'savory': 1, 'dried': 3, 'thyme': 1, 'notes':..}). I need to create an output dictionary from this column that contains every key from the input dictionaries, mapped to the number of input dictionaries in which that key appears. For example, the word 'dried' is a key in 850 of the input dictionaries, so the output dictionary should contain {.. 'dried': 850...}.
I want to try using the data frame .apply() method but I believe that I am using it incorrectly.
def worddict(row, description_counter):
    for key in row['tf'].keys():
        if key in description_counter.keys():
            description_counter[key] += 1
        else:
            description_counter[key] = 1
    return description_counter
description_counter = {}
output_dict = df_wine_list.apply(lambda x: worddict(x, description_counter), axis = 1)
A couple of things. I think my axis should be 0 rather than 1, but when I try that I get this error: KeyError: ('tf', 'occurred at index Unnamed: 0')
When I do use axis = 1, my function returns a column of identical dictionaries rather than a single dictionary.
You can use chain and Counter:
from collections import Counter
from itertools import chain
Counter(chain.from_iterable(df['tf']))
# Counter({'dried': 1, 'earthy': 1, 'fruity': 1, 'oak': 2, 'tones': 1})
Or,
Counter(y for x in df['tf'] for y in x)
# Counter({'dried': 1, 'earthy': 1, 'fruity': 1, 'oak': 2, 'tones': 1})
You can also use Index.value_counts:
pd.concat(map(pd.Series, df['tf'])).index.value_counts().to_dict()
# {'dried': 1, 'earthy': 1, 'fruity': 1, 'oak': 2, 'tones': 1}
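As for the .apply() attempt: the function returns the same description_counter object for every row, so the resulting Series holds several references to one dictionary, which is why the output looks like a column of identical dictionaries. A plain loop avoids relying on side effects inside .apply(). In this sketch a list of dictionaries stands in for df['tf'] (iterating a pandas Series yields its values, so the same loop should work on the real column), and Counter.update counts each key once per dictionary.

```python
from collections import Counter

# Stand-in for df['tf']; the loop below only iterates over it.
tf_column = [{'dried': 1, 'oak': 2}, {'fruity': 2, 'earthy': 2}, {'tones': 2, 'oak': 4}]

doc_freq = Counter()
for word_counts in tf_column:
    # Count each key once per dictionary, ignoring its frequency value.
    doc_freq.update(word_counts.keys())

print(dict(doc_freq))  # {'dried': 1, 'oak': 2, 'fruity': 1, 'earthy': 1, 'tones': 1}
```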

Create a multiset from a Set X

In a multiset, an element may occur more than once.
For example, if X (an ordinary set) = {0,2,4,7,10}, then ∆X (a multiset) = {2,2,3,3,4,5,6,7,8,10}.
∆X denotes the multiset of all C(N, 2) = N(N-1)/2 pairwise distances between the N points in X.
How can I write this in Python?
I have created a list X, but I don't know how to put all the differences into another list and sort them.
I hope you can help me.
It is basically just one line.
import itertools
s = {0,2,4,7,10}
sorted([abs(a-b) for (a,b) in itertools.combinations(s,2)])
You can use itertools:
import itertools
s = {0,2,4,7,10}
k = itertools.combinations(s,2)
distance = []
l = list(k)
for p in l:
    distance.append(abs(p[1]-p[0]))
print(sorted(distance))
A simple way is to convert your set to a list, sort it, and then use a double for loop to compute the differences:
X = {0,2,4,7,10} # original set
sorted_X = sorted(list(X))
diffs = []
for i, a in enumerate(sorted_X):
    for j, b in enumerate(sorted_X):
        if j > i:
            diffs.append(b-a)
print(diffs)
#[2, 4, 7, 10, 2, 5, 8, 3, 6, 3]
And if you want the diffs sorted as well:
print(sorted(diffs))
#[2, 2, 3, 3, 4, 5, 6, 7, 8, 10]
Another option that would work in this case is to use itertools.product:
from itertools import product
print(sorted([(y-x) for x,y in product(sorted_X, sorted_X) if y>x]))
#[2, 2, 3, 3, 4, 5, 6, 7, 8, 10]
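Since the question is really about building a multiset, collections.Counter, which maps each element to its multiplicity, is a natural container for ∆X. A minimal sketch:

```python
from collections import Counter
from itertools import combinations

X = {0, 2, 4, 7, 10}

# Counter maps each pairwise distance to how often it occurs, i.e. a multiset.
delta_X = Counter(abs(a - b) for a, b in combinations(X, 2))

print(sorted(delta_X.elements()))  # [2, 2, 3, 3, 4, 5, 6, 7, 8, 10]
print(delta_X[2], delta_X[9])      # 2 0
```

Counter.elements() repeats each distance as many times as it occurs, and looking up a distance that never occurs returns 0 instead of raising KeyError.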

How is sorted(key=lambda x:) implemented behind the scene?

An example:
names = ["George Washington", "John Adams", "Thomas Jefferson", "James Madison"]
sorted(names, key=lambda name: name.split()[-1].lower())
I know key is used to compare the names, but there are two ways it could be implemented:
1. First compute the key for every name, bind each key to its name in some way, and sort those pairs.
2. Compute the key each time a comparison happens.
The problem with the first approach is that it needs another data structure to bind each key to its data. The problem with the second approach is that the key might be computed multiple times, that is, name.split()[-1].lower() would be executed many times, which can be very time-consuming.
I am just wondering which of these approaches Python uses to implement sorted().
The key function is executed just once per value, to produce a (keyvalue, value) pair; this is then used to sort and later on just the values are returned in the sorted order. This is sometimes called a Schwartzian transform.
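To illustrate the idea in Python (this is not CPython's actual code, which works on C arrays), here is a minimal decorate-sort-undecorate sketch. Each key is computed exactly once, and the original index is stored as a tie-breaker so that the names themselves are never compared and equal keys keep their input order, mirroring the stability of sorted().

```python
names = ["George Washington", "John Adams", "Thomas Jefferson", "James Madison"]

# Decorate: compute each key exactly once, paired with its index and value.
decorated = [(name.split()[-1].lower(), i, name) for i, name in enumerate(names)]

# Sort: tuples compare by key first, then by original index.
decorated.sort()

# Undecorate: drop the keys and indices, keeping the values in sorted order.
result = [name for _key, _i, name in decorated]

print(result)  # ['John Adams', 'Thomas Jefferson', 'James Madison', 'George Washington']
```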
You can test this yourself; you could count how often the function is called, for example:
>>> def keyfunc(value):
...     keyfunc.count += 1
...     return value
...
>>> keyfunc.count = 0
>>> sorted([0, 8, 1, 6, 4, 5, 3, 7, 9, 2], key=keyfunc)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> keyfunc.count
10
or you could collect all the values that are being passed in; you'll see that they follow the original input order:
>>> def keyfunc(value):
...     keyfunc.arguments.append(value)
...     return value
...
>>> keyfunc.arguments = []
>>> sorted([0, 8, 1, 6, 4, 5, 3, 7, 9, 2], key=keyfunc)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> keyfunc.arguments
[0, 8, 1, 6, 4, 5, 3, 7, 9, 2]
If you want to read the CPython source code, the relevant function is called listsort(), and the keyfunc is used in the following loop (saved_ob_item is the input array), which is executed before sorting takes place:
for (i = 0; i < saved_ob_size ; i++) {
    keys[i] = PyObject_CallFunctionObjArgs(keyfunc, saved_ob_item[i],
                                           NULL);
    if (keys[i] == NULL) {
        for (i=i-1 ; i>=0 ; i--)
            Py_DECREF(keys[i]);
        if (saved_ob_size >= MERGESTATE_TEMP_SIZE/2)
            PyMem_FREE(keys);
        goto keyfunc_fail;
    }
}
lo.keys = keys;
lo.values = saved_ob_item;
So in the end, you have two arrays, one with the keys and one with the original values. All sort operations act on the two arrays in parallel, sorting the values in lo.keys and moving the elements in lo.values in tandem.
