Create dictionary from dict and list - python

I have a dictionary :
dicocategory = {}
dicocategory["a"] = ["crapow", "Killian", "pauk", "victor"]
dicocategory["b"] = ["graton", "fred"]
dicocategory["c"] = ["babar", "poca", "german", "Georges", "nowak"]
dicocategory["d"] = ["crado", "cradi", "hibou", "distopia", "fiboul"]
dicocategory["e"] = ["makenkosapo"]
and a list :
my_list = ['makenkosapo', 'Killian', 'Georges', 'poca', 'nowak']
I want to create a new dictionary with my dicocategory's keys as new keys and items of my list as values.
To get the keys of my new dict (removing duplicate content and adapted to my list) I made :
def tablemain(my_list ):
tableheaders = list()
for value in my_list:
tableheaders.append([k for k, v in dicocategory.items() if value in v])
convertlist = [j for i in tableheaders for j in i]
headerstablefinal = list(set(convertlist))
return headerstablefinal
giving me:
['e', 'a', 'c']
My problem is: I don't know how to put the items of my list in the corresponding keys.
EDIT :
Bellow an output of what I want
{"a" : ['Killian'], 'c' : ['Georges', 'poca', 'nowak'], 'e' : ['makenkosapo']}
The list my_list can change, so I want something that can create a new dictionary doesn't matter the list.
If my new list is :
my_list = ['crapow', 'german', 'pauk']
My output will be :
{'a':['crapow', 'pauk'], 'c':['german']}
Do you have any idea?
Thank you

You can use a couple of dictionary comprehensions. Calculate the intersection in the first, and in the second remove instances where the intersection is empty:
my_set = set(my_list)
# calculate intersection
res = {k: set(v) & my_set for k, v in dicocategory.items()}
# remove zero intersection values
res = {k: v for k, v in res.items() if v}
print(res)
{'a': {'Killian'},
'c': {'Georges', 'nowak', 'poca'},
'e': {'makenkosapo'}}
More efficiently, you can use a generator expression to avoid an intermediary dictionary:
# generate intersection
gen = ((k, set(v) & my_set) for k, v in dicocategory.items())
# remove zero intersection values
res = {k: v for k, v in gen if v}

You can get a dictionary containing only keys with values that match your list like this:
{k:v for k,v in dicocategory.items() if set(v).intersection(set(my_list))}
You won't be able to put that directly into a DataFrame though as the lists differ in length.

Related

Extract differences of values of 2 given dictionaries - values are tuples of strings

I have two dictionaries as follows, I need to extract which strings in the tuple values are in one dictionary but not in other:
dict_a = {"s": ("mmmm", "iiiii", "p11"), "yyzz": ("oo", "i9")}
dict_b = {"s": ("mmmm",), "h": ("pp",), "g": ("rr",)}
The desired output:
{"s": ("iiiii", "p11"), "yyzz": ("oo", "i9")}
The order of the strings in the output doesn't matter.
One way that I tried to solve, but it doesn't produce the expected result:
>>> [item for item in dict_a.values() if item not in dict_b.values()]
[('mmmm', 'iiiii', 'p11'), ('oo', 'i9')]
If order doesn't matter, convert your dictionary values to sets, and subtract these:
{k: set(v) - set(dict_b.get(k, ())) for k, v in dict_a.items()}
The above takes all key-value pairs from dict_a, and for each such pair, outputs a new dictionary with those keys and a new value that's the set difference between the original value and the corresponding value from dict_b, if there is one:
>>> dict_a = {"s": ("mmmm", "iiiii", "p11"), "yyzz": ("oo", "i9")}
>>> dict_b = {"s": ("mmmm",), "h": ("pp",), "g": ("rr",)}
>>> {k: set(v) - set(dict_b.get(k, ())) for k, v in dict_a.items()}
{'s': {'p11', 'iiiii'}, 'yyzz': {'oo', 'i9'}}
The output will have sets, but these can be converted back to tuples if necessary:
{k: tuple(set(v) - set(dict_b.get(k, ()))) for k, v in dict_a.items()}
The dict_b.get(k, ()) call ensures there is always a tuple to give to set().
If you use the set.difference() method you don't even need to turn the dict_b value to a set:
{k: tuple(set(v).difference(dict_b.get(k, ()))) for k, v in dict_a.items()}
Demo of the latter two options:
>>> {k: tuple(set(v) - set(dict_b.get(k, ()))) for k, v in dict_a.items()}
{'s': ('p11', 'iiiii'), 'yyzz': ('oo', 'i9')}
>>> {k: tuple(set(v).difference(dict_b.get(k, ()))) for k, v in dict_a.items()}
{'s': ('p11', 'iiiii'), 'yyzz': ('oo', 'i9')}
{k: [v for v in vs if v not in dict_b.get(k, [])] for k,vs in dict_a.items()}
if you want to use tuples (or sets - just replace the cast)
{k: tuple(v for v in vs if v not in dict_b.get(k, [])) for k,vs in dict_a.items()}
Try this (see comments for explanations):
>>> out = {} # Initialise output dictionary
>>> for k, v in dict_a.items(): # Iterate through items of dict_a
... if k not in dict_b: # Check if the key is not in dict_b
... out[k] = v # If it isn't, add to out
... else: # Otherwise
... out[k] = tuple(set(v) - set(dict_b[k])) # Subtract sets to find the difference
...
>>> out
{'s': ('iiiii', 'p11'), 'yyzz': ('oo', 'i9')}
This can then be simplified using a dictionary comprehension:
>>> out = {k: tuple(set(v) - set(dict_b.get(k, ()))) for k, v in dict_a.items()}
>>> out
{'s': ('iiiii', 'p11'), 'yyzz': ('oo', 'i9')}
See this solution :
First iterate through all keys in dict_a
Check if the key is present or not in dict_b
Now, if present: take the tuples and iterate through dict_a's tuple(as per your question). now check weather that element is present or not in dict_b's tuple. If it is present just leave it. If it is not just add that element in tup_res.
Now after the for loop, add that key and tup value in the dict_res.
if the key is not present in dict_b simply add it in dict_res.
dict_a = {"s":("mmmm","iiiii","p11"), "yyzz":("oo","i9")}
dict_b = {"s":("mmmm"),"h":("pp",),"g":("rr",)}
dict_res = {}
for key in dict_a:
if key in dict_b:
tup_a = dict_a[key]
tup_b = dict_b[key]
tup_res = ()
for tup_ele in tup_a:
if tup_ele in tup_b:
pass
else:
tup_res = tup_res + (tup_ele,)
dict_res[key] = tup_res;
else:
dict_res[key] = dict_a[key];
print(dict_res)
It is giving correct output :)

Combine python dictionaries that share values and keys

I am doing some entity matching based on string edit distance and my results are a dictionary with keys (query string) and values [list of similar strings] based on some scoring criteria.
for example:
results = {
'ben' : ['benj', 'benjamin', 'benyamin'],
'benj': ['ben', 'beny', 'benjamin'],
'benjamin': ['benyamin'],
'benyamin': ['benjamin'],
'carl': ['karl'],
'karl': ['carl'],
}
Each value also has a corresponding dictionary item, for which it is the key (e.g. 'carl' and 'karl').
I need to combine the elements that have shared values. Choosing one value as the new key (lets say the longest string). In the above example I would hope to get:
results = {
'benjamin': ['ben', 'benj', 'benyamin', 'beny', 'benjamin', 'benyamin'],
'carl': ['carl','karl']
}
I have tried iterating through the dictionary using the keys, but I can't wrap my head around how to iterate and compare through each dictionary item and its list of values (or single value).
This is one solution using collections.defaultdict and sets.
The desired output is very similar to what you have, and can be easily manipulated to align.
from collections import defaultdict
results = {
'ben' : ['benj', 'benjamin', 'benyamin'],
'benj': ['ben', 'beny', 'benjamin'],
'benjamin': 'benyamin',
'benyamin': 'benjamin',
'carl': 'karl',
'karl': 'carl',
}
d = defaultdict(set)
for i, (k, v) in enumerate(results.items()):
w = {k} | (set(v) if isinstance(v, list) else {v})
for m, n in d.items():
if not n.isdisjoint(w):
d[m].update(w)
break
else:
d[i] = w
result = {max(v, key=len): v for k, v in d.items()}
# {'benjamin': {'ben', 'benj', 'benjamin', 'beny', 'benyamin'},
# 'carl': {'carl', 'karl'}}
Credit to #IMCoins for the idea of manipulating v to w in second loop.
Explanation
There are 3 main steps:
Convert values into a consistent set format, including keys and values from original dictionary.
Cycle through this dictionary and add values to a new dictionary. If there is an intersection with some key [i.e. sets are not disjoint], then use that key. Otherwise, add to new key determined via enumeration.
Create result dictionary in a final transformation by mapping max length key to values.
EDIT : Even though performance was not the question here, I took the liberty to perform some tests between jpp's answer, and mine... here is the full script. My script performs the tests in 17.79 seconds, and his in 23.5 seconds.
import timeit
results = {
'ben' : ['benj', 'benjamin', 'benyamin'],
'benj': ['ben', 'beny', 'benjamin'],
'benjamin': ['benyamin'],
'benyamin': ['benjamin'],
'carl': ['karl'],
'karl': ['carl'],
}
def imcoins(result):
new_dict = {}
# .items() for python3x
for k, v in results.iteritems():
flag = False
# Checking if key exists...
if k not in new_dict.keys():
# But then, we also need to check its values.
for item in v:
if item in new_dict.keys():
# If we update, set the flag to True, so we don't create a new value.
new_dict[item].update(v)
flag = True
if flag == False:
new_dict[k] = set(v)
# Now, to sort our newly created dict...
sorted_dict = {}
for k, v in new_dict.iteritems():
max_string = max(v)
if len(max_string) > len(k):
sorted_dict[max(v, key=len)] = set(v)
else:
sorted_dict[k] = v
return sorted_dict
def jpp(result):
from collections import defaultdict
res = {i: {k} | (set(v) if isinstance(v, list) else {v}) \
for i, (k, v) in enumerate(results.items())}
d = defaultdict(set)
for i, (k, v) in enumerate(res.items()):
for m, n in d.items():
if n & v:
d[m].update(v)
break
else:
d[i] = v
result = {max(v, key=len): v for k, v in d.items()}
return result
iterations = 1000000
time1 = timeit.timeit(stmt='imcoins(results)', setup='from __main__ import imcoins, results', number=iterations)
time2 = timeit.timeit(stmt='jpp(results)', setup='from __main__ import jpp, results', number=iterations)
print time1 # Outputs : 17.7903265883
print time2 # Outputs : 23.5605850732
If I move the import from his function to global scope, it gives...
imcoins : 13.4129249463 seconds
jpp : 21.8191823393 seconds

Remove duplicates and combine multiple lists into one?

How do I remove duplicates and combine multiple lists into one like so:
function([["hello","me.txt"],["good","me.txt"],["good","money.txt"], ["rep", "money.txt"]]) should return exactly:
[["good", ["me.txt", "money.txt"]], ["hello", ["me.txt"]], ["rep", ["money.txt"]]]
The easiest one would be using defaultdict .
>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> for i,j in l:
d[i].append(j) #append value to the key
>>> d
=> defaultdict(<class 'list'>, {'hello': ['me.txt'], 'good': ['me.txt', 'money.txt'],
'rep': ['money.txt']})
#to get it in a list
>>> out = [ [key,d[key]] for key in d]
>>> out
=> [['hello', ['me.txt']], ['good', ['me.txt', 'money.txt']], ['rep', ['money.txt']]]
#driver values :
IN : l = [["hello","me.txt"],["good","me.txt"],["good","money.txt"], ["rep", "money.txt"]]
Try This ( no library needed ):
your_input_data = [ ["hello","me.txt"], ["good","me.txt"], ["good","me.txt"], ["good","money.txt"], ["rep", "money.txt"] ]
my_dict = {}
for box in your_input_data:
if box[0] in my_dict:
buffer_items = []
for items in box[1:]:
if items not in my_dict[box[0]]:
buffer_items.append(items)
remove_dup = list(set(buffer_items + my_dict[box[0]]))
my_dict[box[0]] = remove_dup
else:
buffer_items = []
for items in box[1:]:
buffer_items.append(items)
remove_dup = list(set(buffer_items))
my_dict[box[0]] = remove_dup
last_point = [[keys, values] for keys, values in my_dict.items()]
print(last_point)
Good Luck ...
You can do it with traditional dictionaries too.
In [30]: l1 = [["hello","me.txt"],["good","me.txt"],["good","money.txt"], ["rep", "money.txt"]]
In [31]: for i, j in l1:
...: if i not in d2:
...: d2[i] = j
...: else:
...: val = d2[i]
...: d2[i] = [val, j]
...:
In [32]: d2
Out[32]: {'good': ['me.txt', 'money.txt'], 'hello': 'me.txt', 'rep': 'money.txt'}
In [33]: out = [ [key,d1[key]] for key in d1]
In [34]: out
Out[34]:
[['rep', ['money.txt']],
['hello', ['me.txt']],
['good', ['me.txt', 'money.txt']]]
Let's first understand the actual problem :
Example Hint :
For these types of list problems there is a pattern :
So suppose you have a list :
a=[(2006,1),(2007,4),(2008,9),(2006,5)]
And you want to convert this to a dict as the first element of the tuple as key and second element of the tuple. something like :
{2008: [9], 2006: [5], 2007: [4]}
But there is a catch you also want that those keys which have different values but keys are same like (2006,1) and (2006,5) keys are same but values are different. you want that those values append with only one key so expected output :
{2008: [9], 2006: [1, 5], 2007: [4]}
for this type of problem we do something like this:
first create a new dict then we follow this pattern:
if item[0] not in new_dict:
new_dict[item[0]]=[item[1]]
else:
new_dict[item[0]].append(item[1])
So we first check if key is in new dict and if it already then add the value of duplicate key to its value:
full code:
a=[(2006,1),(2007,4),(2008,9),(2006,5)]
new_dict={}
for item in a:
if item[0] not in new_dict:
new_dict[item[0]]=[item[1]]
else:
new_dict[item[0]].append(item[1])
print(new_dict)
Your actual problem solution :
list_1=[["hello","me.txt"],["good","me.txt"],["good","money.txt"], ["rep", "money.txt"]]
no_dublicates={}
for item in list_1:
if item[0] not in no_dublicates:
no_dublicates[item[0]]=["".join(item[1:])]
else:
no_dublicates[item[0]].extend(item[1:])
list_result=[]
for key,value in no_dublicates.items():
list_result.append([key,value])
print(list_result)
output:
[['hello', ['me.txt']], ['rep', ['money.txt']], ['good', ['me.txt', 'money.txt']]]
yourList=[["hello","me.txt"],["good","me.txt"],["good","money.txt"], ["rep", "money.txt"]]
expectedList=[["good", ["me.txt", "money.txt"]], ["hello", ["me.txt"]], ["rep", ["money.txt"]]]
def getall(allsec, listKey, uniqlist):
if listKey not in uniqlist:
uniqlist.append(listKey)
return [listKey, [x[1] for x in allsec if x[0] == listKey]]
uniqlist=[]
result=sorted(list(filter(lambda x:x!=None, [getall(yourList,elem[0],uniqlist) for elem in yourList])))
print(result)
hope this helps
This can easily be solved using dict and sets.
def combine_duplicates(given_list):
data = {}
for element_1, element_2 in given_list:
data[element_1] = data.get(element_1, set()).add(element_2)
return [[k, list(v)] for k, v in data.items()]
Using Python to create a function that gives you the exact required output can be done as follows:
from collections import defaultdict
def function(data):
entries = defaultdict(list)
for k, v in data:
entries[k].append(v)
return sorted([k, v] for k, v in entries.items())
print(function([["hello","me.txt"],["good","me.txt"],["good","money.txt"], ["rep", "money.txt"]]))
The output is sorted before being returned as per your requirement. This would display the return from the function as:
[['good', ['me.txt', 'money.txt']], ['hello', ['me.txt']], ['rep', ['money.txt']]]
It also ensures that the keys are sorted. A dictionary is used to deal with the removal of duplicates (as keys need to be unique).
A defaultdict() is used to simplify the building of lists within the dictionary. The alternative would be to try and append a new value to an existing key, and if there is a KeyError exception, then add the new key instead as follows:
def function(data):
entries = {}
for k, v in data:
try:
entries[k].append(v)
except KeyError as e:
entries[k] = [v]
return sorted([k, v] for k, v in entries.items())
Create a empty array push the index 0 from childs arrays and join to convert all values to a string separate by space .
var your_input_data = [ ["hello","hi", "jel"], ["good"], ["good2","lo"], ["good3","lt","ahhahah"], ["rep", "nice","gr8", "job"] ];
var myprint = []
for(var i in your_input_data){
myprint.push(your_input_data[i][0]);
}
console.log(myprint.join(' '))

How to split list inside a dictionnary to create a new one?

I've been struggling on something for the day,
I have a dictionnary under the format
dict = {a:[element1, element2, element3], b:[element4, element5, element6]...}
I want a new dictionnary under the form
newdict = {a:element1, b:element4...}
Meaning only keeping the first element of the lists contained for each value.
You can use a dictionary comprehension:
{k: v[0] for k, v in d.items()}
# {'a': 'element1', 'b': 'element4'}
Hopefully this helps.
I like to check if the dictionary has a key before overwriting a keys value.
dict = {a:[element1, element2, element3], b:[element4, element5, element6]}
Python 2
newDict = {}
for k, v in dict.iteritems():
if k not in newDict:
# add the first list value to the newDict's key
newDick[k] = v[0]
Python 3
newDict = {}
for k, v in dict.items():
if k not in newDict:
# add the first list value to the newDict's key
newDick[k] = v[0]

Deleting from dict if found in new list in Python

Say I have a dictionary with whatever number of values.
And then I create a list.
If any of the values of the list are found in the dictionary, regardless of whether or not it is a key or an index how do I delete the full value?
E.g:
dictionary = {1:3,4:5}
list = [1]
...
dictionary = {4:5}
How do I do this without creating a new dictionary?
for key, value in list(dic.items()):
if key in lst or value in lst:
del dic[key]
No need to create a separate list or dictionary.
I interpreted "whether or not it is a key or an index" to mean "whether or not it is a key or a value [in the dictionary]"
it's a bit complicated because of your "values" requirement:
>>> dic = {1: 3, 4: 5}
>>> ls = set([1])
>>> dels = []
>>> for k, v in dic.items():
if k in ls or v in ls:
dels.append(k)
>>> for i in dels:
del dic[i]
>>> dic
{4: 5}
A one liner to do this would be :
[dictionary.pop(x) for x in list if x in dictionary.keys()]
dictionary = {1:3,4:5}
list = [1]
for key in list:
if key in dictionary:
del dictionary[key]
>>> dictionary = {1:3,4:5}
>>> list = [1]
>>> for x in list:
... if x in dictionary:
... del(dictionary[x])
...
>>> dictionary
{4: 5}
def remKeys(dictionary, list):
for i in list:
if i in dictionary.keys():
dictionary.pop(i)
return dictionary
I would do something like:
for i in list:
if dictionary.has_key(i):
del dictionary[i]
But I am sure there are better ways.
A few more testcases to define how I interpret your question:
#!/usr/bin/env python
def test(beforedic,afterdic,removelist):
d = beforedic
l = removelist
for i in l:
for (k,v) in list(d.items()):
if k == i or v == i:
del d[k]
assert d == afterdic,"d is "+str(d)
test({1:3,4:5},{4:5},[1])
test({1:3,4:5},{4:5},[3])
test({1:3,4:5},{1:3,4:5},[9])
test({1:3,4:5},{4:5},[1,3])
If the dictionary is small enough, it's easier to just make a new one. Removing all items whose key is in the set s from the dictionary d:
d = dict((k, v) for (k, v) in d.items() if not k in s)
Removing all items whose key or value is in the set s from the dictionary d:
d = dict((k, v) for (k, v) in d.items() if not k in s and not v in s)

Categories