I have a source dictionary; see the sample below. It has deeply nested sub-dictionaries and lists, and I need to remove all keys where the value is "nan".
data = {"col1":"val1","col2":"val2","col3":"val3","col4":"val3","list1":[{"l1":"v1","l2":"nan"},{"K1":"Kv1","K2":"nan"},{"M1":"Mv1","M2":"nan","sublist1":[{"SL1":"SV1","SL2":"nan"}]}],"list2":[{"l1":"v1","l2":"nan"},{"K1":"Kv1","K2":"nan"},{"M1":"Mv1","M2":"nan","sublist2":[{"SL1":"SV1","SL2":"nan"}]}]}
I tried the following code, creating a function, but it is not working as expected:
def cleanNullTerms(d):
    clean = {}
    for k, v in d.items():
        if isinstance(v, list):
            for values in v:
                nested = cleanNullTerms(values)
                if values == 'nan':
                    clean[k] = nested
        elif v is not 'nan':
            clean[k] = v
    return clean
You didn't describe exactly what's wrong, but from inspection your function is clearly incorrect because it doesn't check for nested dicts. You also haven't indicated what your expected output would be, but the following looks correct to me.
from pprint import pprint

def cleanNullTerms(d):
    clean = {}
    for k, v in d.items():
        if isinstance(v, dict):
            # recurse into nested dicts
            clean[k] = cleanNullTerms(v)
        elif isinstance(v, list):
            # clean each dict inside the list
            clean[k] = [cleanNullTerms(values) for values in v]
        elif v != 'nan':
            clean[k] = v
    return clean
data = {'col1': 'val1',
        'col2': 'val2',
        'col3': 'val3',
        'col4': 'val3',
        'list1': [{'l1': 'v1', 'l2': 'nan'},
                  {'K1': 'Kv1', 'K2': 'nan'},
                  {'M1': 'Mv1',
                   'M2': 'nan',
                   'sublist1': [{'SL1': 'SV1', 'SL2': 'nan'}]}],
        'list2': [{'l1': 'v1', 'l2': 'nan'},
                  {'K1': 'Kv1', 'K2': 'nan'},
                  {'M1': 'Mv1',
                   'M2': 'nan',
                   'sublist2': [{'SL1': 'SV1', 'SL2': 'nan'}]}]}
data = cleanNullTerms(data)
pprint(data, sort_dicts=False)
Results printed:
{'col1': 'val1',
 'col2': 'val2',
 'col3': 'val3',
 'col4': 'val3',
 'list1': [{'l1': 'v1'},
           {'K1': 'Kv1'},
           {'M1': 'Mv1', 'sublist1': [{'SL1': 'SV1'}]}],
 'list2': [{'l1': 'v1'},
           {'K1': 'Kv1'},
           {'M1': 'Mv1', 'sublist2': [{'SL1': 'SV1'}]}]}
Your code assigned each list element's cleaned dict directly to the key where the list used to be, so when a list had several elements each one overwrote the previous and only the last survived. It also did not recurse into nested dictionaries. Note that the version below still only removes "nan"s nested inside dicts and lists; for example, it does not descend into sets. If you want it to go into sets, write a comment and I can add this.
def cleanNullTerms(d):
    clean = {}
    for k, v in d.items():
        if isinstance(v, list):
            v2 = []
            for value0 in v:
                nested = cleanNullTerms(value0)
                if value0 != 'nan':
                    v2.append(nested)
            clean[k] = v2
        elif isinstance(v, dict):
            clean[k] = cleanNullTerms(v)
        elif v != 'nan':
            clean[k] = v
    return clean
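If you do need it to descend into sets as well, a minimal sketch might look like this (untested against your real data; since dicts are unhashable, a set can only contain leaf values, so filtering is all that's needed, and this version also tolerates non-dict list elements):
def cleanNullTerms(d):
    clean = {}
    for k, v in d.items():
        if isinstance(v, dict):
            clean[k] = cleanNullTerms(v)
        elif isinstance(v, list):
            # recurse into dict elements, keep other non-"nan" elements as-is
            clean[k] = [cleanNullTerms(x) if isinstance(x, dict) else x
                        for x in v if x != 'nan']
        elif isinstance(v, set):
            # dicts are unhashable, so set members are always leaf values
            clean[k] = {x for x in v if x != 'nan'}
        elif v != 'nan':
            clean[k] = v
    return clean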
I want to get a new dictionary by removing all the keys with the same value (the keys are different, the values are the same).
For example: Input:
dct = {'key1' : [1,2,3], 'key2': [1,2,6], 'key3': [1,2,3]}
expected output:
{'key2': [1, 2, 6]}
key1 and key3 were deleted because they shared the same value.
I have no idea how to approach this.
You can do this by building a dictionary keyed on the values. In this case the values are lists, which are not hashable, so convert them to tuples. The values in the new dictionary are lists to which we append every matching key from the original dictionary. Finally, work through the new dictionary looking for any entry whose list length is greater than 1 (i.e., a duplicated value), and remove those keys from the original dictionary.
d = {'key1': [1,2,3], 'key2': [1,2,6], 'key3': [1,2,3]}
control = {}
for k, v in d.items():
    control.setdefault(tuple(v), []).append(k)
for v in control.values():
    if len(v) > 1:
        for k in v:
            del d[k]
print(d)
Output:
{'key2': [1, 2, 6]}
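As an aside, the same grouping can be written with collections.defaultdict instead of setdefault; a minimal sketch of that variant:
from collections import defaultdict

d = {'key1': [1,2,3], 'key2': [1,2,6], 'key3': [1,2,3]}
control = defaultdict(list)
for k, v in d.items():
    control[tuple(v)].append(k)  # group keys by their (hashable) value

# keep only the keys whose value occurs exactly once
print({k: v for k, v in d.items() if len(control[tuple(v)]) == 1})
# {'key2': [1, 2, 6]}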
I created a list of counts holding how many times each value appears in the dictionary, then copied only the items whose value appears exactly once.
a = {"key1": [1,2,3], "key2": [1,2,6], "key3": [1,2,3]}
# find how many of each item are there
counts = list(map(lambda x: list(a.values()).count(x), a.values()))
result = {}
# copy only items which are in the list once
for i, item in enumerate(a):
    if counts[i] == 1:
        result[item] = a[item]
print(result)
Given the data as provided by the OP, the simplest solution to the problem:
dct = {'key1' : [1,2,3], 'key2': [1,2,6], 'key3': [1,2,3]}
print({k:v for k, v in dct.items() if list(dct.values()).count(v) == 1})
Output:
{'key2': [1, 2, 6]}
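One caveat: list.count() rescans all the values for every key, so this is quadratic in the number of keys. For large dicts, counting once up front with collections.Counter (hashing the lists as tuples) keeps it linear; a sketch:
from collections import Counter

dct = {'key1': [1,2,3], 'key2': [1,2,6], 'key3': [1,2,3]}
counts = Counter(tuple(v) for v in dct.values())  # count each value once
print({k: v for k, v in dct.items() if counts[tuple(v)] == 1})
# {'key2': [1, 2, 6]}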
One loop solution:
dict_ = {'key1': [1,2,3], 'key2': [1,2,6], 'key3': [1,2,3]}
key_lookup = {}
result = {}
for key, value in dict_.items():
    v = tuple(value)
    if v not in key_lookup:
        key_lookup[v] = key
        result[key] = value
    else:
        if key_lookup[v] is not None:
            del result[key_lookup[v]]
            key_lookup[v] = None
print(result)
Output:
{'key2': [1, 2, 6]}
dct = {'key1': [1,2,3], 'key2': [1,2,6], 'key3': [1,2,3]}
temp = list(dct.values())
result = {}
for key, value in dct.items():
    for t in temp:
        if temp.count(t) > 1:
            while temp.count(t) > 0:
                temp.remove(t)
        else:
            if t == value:
                result[key] = value
print(result)
Output:
{'key2': [1, 2, 6]}
Consider a fixed list of keys (which may or may not exist in the dictionary) and a dictionary of arbitrary key/values. I have to iterate through the dictionary and, if a key is not in the list, add it to a new dictionary.
d = {'key1': '1', 'keyA': 'A', 'key2': '2', 'keyB': 'B', ...}
new_d = {}
for k, v in d.items():
    if k not in ['key1', 'keyB', 'key_doesnotexist']:
        new_d[k] = v
To optimize this, I thought about iterating through the list of keys first and popping any matching key, getting rid of the inner membership check, so something like this:
d = {'key1': '1', 'keyA': 'A', 'key2': '2', 'keyB': 'B', ...}
new_d = {}
for x in ['key1', 'keyB', 'key_doesnotexist']:
    d.pop(x, None)  # dict.pop takes the default positionally, not as default=
for k, v in d.items():
    new_d[k] = v
Just wondering if there are any faster methods I should be aware of.
This might work faster:
d = {'key1': '1', 'keyA': 'A', 'key2': '2', 'keyB': 'B'}
new_d = {k: d[k] for k in d.keys() - ['key1', 'keyB', 'key_doesnotexist']}
Prints:
>>> new_d
{'keyA': 'A', 'key2': '2'}
Use a dict comprehension to select the items you want:
new_d = {k: v for k, v in d.items()
if k not in ['key1', 'keyB', 'key_doesnotexist']}
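As a side note, if the exclusion list grows, converting it to a set first makes each membership test constant-time instead of a linear scan; a minimal sketch:
d = {'key1': '1', 'keyA': 'A', 'key2': '2', 'keyB': 'B'}
drop = {'key1', 'keyB', 'key_doesnotexist'}  # set lookup is O(1)
new_d = {k: v for k, v in d.items() if k not in drop}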
Or simply copy the dict and delete the unwanted entries:
import copy
new_d = copy.deepcopy(d)
for k in ['key1', 'keyB', 'key_doesnotexist']:
    new_d.pop(k, None)  # del new_d[k] would raise KeyError for missing keys
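If the values themselves don't need cloning, a plain shallow copy is cheaper than deepcopy; a sketch:
d = {'key1': '1', 'keyA': 'A', 'key2': '2', 'keyB': 'B'}
new_d = dict(d)  # shallow copy: new dict, same value objects
for k in ['key1', 'keyB', 'key_doesnotexist']:
    new_d.pop(k, None)
print(new_d)
# {'keyA': 'A', 'key2': '2'}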
I have a dictionary dict:
dict = {'drop_key1': '10001', 'drop_key2':'10002'}
The keys in dict start with drop_; I would like to update dict by dropping the drop_ prefix from the key(s):
dict = {'key1': '10001', 'key2':'10002'}
What is the best approach to do it?
something like
d1 = {'drop_key1': '10001', 'drop_key2':'10002'}
d2 = {k[5:]:v for k,v in d1.items()}
print(d2)
output
{'key1': '10001', 'key2': '10002'}
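On Python 3.9+, str.removeprefix avoids hard-coding the slice length and only strips the prefix when it is actually present:
d1 = {'drop_key1': '10001', 'drop_key2': '10002'}
d2 = {k.removeprefix('drop_'): v for k, v in d1.items()}
print(d2)
# {'key1': '10001', 'key2': '10002'}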
One approach: for each key in the dictionary, replace the matched part of the key string with the new string value. For instance:
d = {k.replace('drop_', ''): v for k, v in d.items() if k.strip().startswith('drop_')}
Or you can define a function that finds the index of the search string ("drop_") and removes it only when that index is 0. For instance:
def change_key(key, search):
    start_idx = key.find(search)
    if start_idx == 0:
        key = key.replace(search, "")
    return key

d = {change_key(k, search="drop_"): v for k, v in d.items()}
Result:
{'key1': '10001', 'key2': '10002'}
Note that with this function the search string is guaranteed to be removed only when it occurs at the very beginning of the key. For instance, a key with a leading space is left unchanged:
d = {' drop_key1': '10001', 'drop_key2': '10002'}
d = {change_key(k, search="drop_"): v for k, v in d.items()}
Result:
{' drop_key1': '10001', 'key2': '10002'}
I would like to know if there exists any Python function to merge two dictionaries and combine all values that have a common key.
I have found functions to append two dicts and to merge two dicts, but not to combine their values.
Example:
D1 = [{k1: v01}, {k3: v03}, {k4: v04}]
D2 = [{k1: v11}, {k2: v12}, {k4: v14}]
this should be the expected result:
D3 = [
    {k1: [v01, v11]},
    {k2: [v12]},
    {k3: [v03]},
    {k4: [v04, v14]},
]
There is no built-in function for this, but you can use a defaultdict:
from collections import defaultdict

d = defaultdict(list)
for other in [d1, d2]:
    for k, v in other.items():
        d[k].append(v)
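For example, applied to a dict form of your data (the values here are placeholder strings standing in for yours):
from collections import defaultdict

d1 = {'k1': 'v01', 'k3': 'v03', 'k4': 'v04'}
d2 = {'k1': 'v11', 'k2': 'v12', 'k4': 'v14'}

d = defaultdict(list)
for other in [d1, d2]:
    for k, v in other.items():
        d[k].append(v)

print(dict(d))
# {'k1': ['v01', 'v11'], 'k3': ['v03'], 'k4': ['v04', 'v14'], 'k2': ['v12']}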
A solution, without importing anything:
# First initialize data, done correctly here.
D1 = [{'k1': 'v01'}, {'k3': 'v03'}, {'k4': 'v04'}]
D2 = [{'k1': 'v11'}, {'k2': 'v12'}, {'k4': 'v14'}]
# Get all unique keys
keys = {k for d in [*D1, *D2] for k in d}
# Initialize an empty dict
D3 = {x:[] for x in keys}
# sort to maintain order
D3 = dict(sorted(D3.items()))
# Iterate and extend
for x in [*D1, *D2]:
    for k, v in x.items():
        D3[k].append(v)
# NOTE: I do not recommend you convert a dictionary into a list of records.
# Nonetheless, here is how it would be done.
# To convert to a list
D3_list = [{k:v} for k,v in D3.items()]
print(D3_list)
# [{'k1': ['v01', 'v11']},
# {'k2': ['v12']},
# {'k3': ['v03']},
# {'k4': ['v04', 'v14']}]
If you meant to use actual dicts, instead of lists of dicts, this is easier.
D1 = dict(k1=1, k3=3, k4=4)
D2 = dict(k1=11, k2=12, k4=14)
There isn't a simple built-in function to do this, but the setdefault method is close.
It tries to get the given key, but creates it if it doesn't exist.
D3 = {}
for k, v in D1.items() | D2.items():
    D3.setdefault(k, set()).add(v)
And the result.
{'k4': {4, 14}, 'k1': {1, 11}, 'k3': {3}, 'k2': {12}}
This all assumes the order doesn't matter, just combining sets.
A more generic way to merge dicts together may look like this.
(To answer a similar SO question)
def merge(combiner, dicts):
    new_dict = {}
    for d in dicts:
        for k, v in d.items():
            if k in new_dict:
                new_dict[k] = combiner(new_dict[k], v)
            else:
                new_dict[k] = v
    return new_dict
x = {'a': 'A', 'b': 'B'}
y = {'b': 'B', 'c': 'C'}
z = {'a': 'A', 'd': 'D'}
merge(combiner=lambda x, y: f'{x} AND {y}', dicts=(x, y, z))
# {'a': 'A AND A', 'b': 'B AND B', 'c': 'C', 'd': 'D'}
I have a dictionary of 259,136 keys, and each of those keys has one or more values.
My objective is to find keys that have at least one value in common with another key in the list of keys.
I have tried different ways to deal with this problem, but I was looking for a faster solution. I tried:
for each key, comparing it with the other 259,135 keys to check the above condition;
reversing the dictionary from key: value to value: key, so the value becomes the key; this way I will have two dictionaries, and I can go to the first one and, based on its values, pull out all the matching keys from the second one.
Use a dict of sets:
d = {'k1': [1,2,3],
     'k2': [2],
     'k3': [10],
     'k4': [3,2]
     }
com_keys = {}
for k, v in d.items():
    for e in v:
        com_keys.setdefault(e, set()).add(k)
print(com_keys)
# {1: {'k1'}, 2: {'k1', 'k2', 'k4'}, 3: {'k1', 'k4'}, 10: {'k3'}}
Then, if you only want the values shared by more than one key, just filter with a dict comprehension (or the equivalent loop on older Pythons):
>>> {k: v for k, v in com_keys.items() if len(v) > 1}
{2: {'k1', 'k2', 'k4'}, 3: {'k1', 'k4'}}
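And if what you ultimately want is the flat set of keys that share at least one value with some other key, you can union the filtered sets; a minimal sketch:
shared = set().union(*(ks for ks in com_keys.values() if len(ks) > 1))
print(shared)
# {'k2', 'k1', 'k4'} (set display order may vary)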
It gets a little more challenging if your dict is a non-homogeneous mix of containers that support iteration (lists, tuples, etc.) and 'single items' that either do not support iteration (ints, floats) or that you do not want to iterate with a for loop (strings, other dicts, etc.).
For example, assume you have a combination of lists and 'single items' that are ints and strings:
from collections.abc import Iterable

d = {'k1': [1,2,3],
     'k2': 2,
     'k3': [10],
     'k4': [3,2],
     'k5': 'string',
     'k6': ['string',2]
     }
com_keys = {}
for k, v in d.items():
    # iterate containers, but treat strings and non-iterables as single values
    if not isinstance(v, str) and isinstance(v, Iterable):
        for e in v:
            com_keys.setdefault(e, set()).add(k)
    else:
        com_keys.setdefault(v, set()).add(k)
print(com_keys)
# {1: {'k1'}, 2: {'k1', 'k2', 'k4', 'k6'}, 3: {'k1', 'k4'}, 10: {'k3'}, 'string': {'k5', 'k6'}}
print({k: v for k, v in com_keys.items() if len(v) > 1})
# {2: {'k1', 'k2', 'k4', 'k6'}, 3: {'k1', 'k4'}, 'string': {'k5', 'k6'}}