Removing duplicates in values of dictionary in python - python

Sorry the topic's title is vague; I find this hard to explain.
I have a dictionary in which each value is a list of items. I want to remove the duplicated items so that each item appears as few times as possible (preferably once) across the lists.
Consider the dictionary:
example_dictionary = {"weapon1":[1,2,3],"weapon2":[2,3],"weapon3":[2,3]}
'weapon2' and 'weapon3' have the same values, so it should result in:
result_dictionary = {"weapon1":[1],"weapon2":[3],"weapon3":[2]}
since I don't mind the order, it can also result in:
result_dictionary = {"weapon1":[1],"weapon2":[2],"weapon3":[3]}
But when "there's no choice" it should leave the value. Consider this new dictionary:
example_dictionary = {"weapon1":[1,2,3],"weapon2":[2,3],"weapon3":[2,3],"weapon4":[3]}
now, since it cannot assign either '2' or '3' only once without leaving a key empty, a possible output would be:
result_dictionary = {"weapon1":[1],"weapon2":[3],"weapon3":[2],"weapon4":[3]}
I can relax the problem to only the first part and manage, though I prefer a solution to the two parts together

#!/usr/bin/env python3
example_dictionary = {"weapon1":[1,2,3],"weapon2":[2,3],"weapon3":[2,3]}
result = {}
used_values = []
def extract_semi_unique_value(my_list):
    # return the first value not used yet; fall back to the first value
    for val in my_list:
        if val not in used_values:
            used_values.append(val)
            return val
    return my_list[0]
for key, value in example_dictionary.items():
    semi_unique_value = extract_semi_unique_value(value)
    result[key] = [semi_unique_value]
print(result)

This is probably not the most efficient solution possible. Because it iterates over all possible combinations, it will run quite slowly for large inputs.
It uses itertools.product() to generate all possible combinations, then picks the combination with the most unique numbers (by testing the length of a set).
from itertools import product
def dedup(weapons):
    # get the keys and values ordered so we can join them back
    # up again at the end
    keys, vals = zip(*weapons.items())
    # because sets remove all duplicates, whichever combo has
    # the longest set is the most unique
    best = max(product(*vals), key=lambda combo: len(set(combo)))
    # combine the keys and whatever we found was the best combo
    return {k: [v] for k, v in zip(keys, best)}
From the examples:
dedup({"weapon1":[1,2,3],"weapon2":[2,3],"weapon3":[2,3]})
#: {'weapon1': [1], 'weapon2': [2], 'weapon3': [3]}
dedup({"weapon1":[1,2,3],"weapon2":[2,3],"weapon3":[2,3],"weapon4":[3]})
#: {'weapon1': [1], 'weapon2': [2], 'weapon3': [2], 'weapon4': [3]}
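For larger inputs, the same goal can be framed as a maximum bipartite matching between keys and values, which avoids enumerating every combination. What follows is a minimal sketch under that framing, not a tuned implementation. The name assign_distinct and the fallback of reusing a key's first value are assumptions made for illustration, and every list is assumed to be non-empty.

```python
def assign_distinct(weapons):
    """Give each key one value, reusing a value only when unavoidable."""
    owner = {}  # value -> key currently holding it

    def try_assign(key, seen):
        # Augmenting-path step: try to claim a value for `key`, evicting
        # the current owner only if that owner can move to another value.
        for val in weapons[key]:
            if val in seen:
                continue
            seen.add(val)
            if val not in owner or try_assign(owner[val], seen):
                owner[val] = key
                return True
        return False

    for key in weapons:
        try_assign(key, set())

    chosen = {key: val for val, key in owner.items()}
    # Keys left unmatched fall back to their first value (a duplicate).
    return {key: [chosen.get(key, vals[0])] for key, vals in weapons.items()}


print(assign_distinct({"weapon1": [1, 2, 3], "weapon2": [2, 3], "weapon3": [2, 3]}))
```

The recursion depth grows with the length of an eviction chain, so very large dictionaries may need an iterative variant.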

this could help
import itertools
res = {'weapon1': [1, 2, 3], 'weapon2': [2, 3], 'weapon3': [2, 3]}
r = [[x] for x in list(set(list(itertools.chain.from_iterable(res.values()))))]
r2 = [x for x in res.keys()]
r3 = list(itertools.product(r2,r))
r4 = dict([r3[x] for x in range(0,len(r3)) if not x%4])

Related

How do I pull a random key from a dictionary with a certain value?

I have a dictionary of words, each with a certain point value. I would like to search through this dictionary for a random word with a specific point value, i.e. find a random word with a point value of 3. My dictionary is structured like this:
wordList = {"to":1,"as":1,"be":1,"see":2,"bed":2,"owl":2,"era":2,"alive":3,"debt":3,"price":4,"stain":4} #shortened list obviously
Looked around online and I couldn't find a great answer, that or I did and I just didn't quite get it.
I would use random.choice() with a list comprehension:
from random import choice
choice([word for word, count in wordList.items() if count == 3])
If you don't care about performance, that will work, but it will rebuild the list every time you call it:
import random
random.choice([k for k,v in wordList.items() if v == 3])
Otherwise it could be better to build a reversed dictionary once, to save time over multiple lookups:
from random import choice
from collections import defaultdict
rev = defaultdict(list)
for k, v in wordList.items():
    rev[v].append(k)
...
choice(rev[3])
I think an if statement together with random.choice solves your problem concisely:
from random import choice
wordList = {"to": 1, "as": 1, "be": 1, "see": 2, "bed": 2, "owl": 2, "era": 2,
            "alive": 3, "debt": 3, "price": 4, "stain": 4} # shortened list obviously
value = int(input())
lst = []
for key,val in wordList.items():
    if val == value:
        lst.append(key)
print(choice(lst))
one-liner:
choice([key for key, val in wordList.items() if val == value])
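One detail the snippets above leave open is what happens when no word has the requested point value: random.choice raises IndexError on an empty sequence. A small sketch that combines the reversed-dictionary idea with that check could look like this; the helper names build_index and random_word, and the choice to return None, are assumptions for illustration.

```python
from collections import defaultdict
from random import choice

word_points = {"to": 1, "as": 1, "be": 1, "see": 2, "bed": 2, "owl": 2,
               "era": 2, "alive": 3, "debt": 3, "price": 4, "stain": 4}


def build_index(mapping):
    """Group words by point value once, so later lookups are cheap."""
    index = defaultdict(list)
    for word, points in mapping.items():
        index[points].append(word)
    return index


def random_word(index, points):
    """Return a random word with `points`, or None if there is none."""
    words = index.get(points)  # .get avoids creating an empty entry
    return choice(words) if words else None


index = build_index(word_points)
print(random_word(index, 3))   # either 'alive' or 'debt'
print(random_word(index, 99))  # None
```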

python list of lists to dict when key appear many times

I know how to write something simple and slow with a loop, but I need it to run very fast at large scale.
Input:
lst = [[1, 1, 2], ["txt1", "txt2", "txt3"]]
Desired output:
d = {1 : ["txt1", "txt2"], 2 : "txt3"}
Is there something built into Python that makes dict() extend a key's value instead of replacing it?
dict(list(zip(lst[0], lst[1])))
One option is to use dict.setdefault:
out = {}
for k, v in zip(*lst):
    out.setdefault(k, []).append(v)
Output:
{1: ['txt1', 'txt2'], 2: ['txt3']}
If you want the element itself for singleton lists, one way is adding a condition that checks for it while you build an output dictionary:
out = {}
for k,v in zip(*lst):
    if k in out:
        if isinstance(out[k], list):
            out[k].append(v)
        else:
            out[k] = [out[k], v]
    else:
        out[k] = v
or if lst[0] is sorted (like it is in your sample), you could use itertools.groupby:
from itertools import groupby
out = {}
pos = 0
for k, v in groupby(lst[0]):
    length = len([*v])
    if length > 1:
        out[k] = lst[1][pos:pos+length]
    else:
        out[k] = lst[1][pos]
    pos += length
Output:
{1: ['txt1', 'txt2'], 2: 'txt3'}
But as @timgeb notes, it's probably not something you want, because afterwards you'll have to check the data type each time you access this dictionary (whether the value is a list or not), which is an unnecessary problem that you could avoid by having all values as lists.
If you're dealing with large datasets it may be useful to add a pandas solution.
>>> import pandas as pd
>>> lst = [[1, 1, 2], ["txt1", "txt2", "txt3"]]
>>> s = pd.Series(lst[1], index=lst[0])
>>> s
1 txt1
1 txt2
2 txt3
>>> s.groupby(level=0).apply(list).to_dict()
{1: ['txt1', 'txt2'], 2: ['txt3']}
Note that this also produces lists for single elements (e.g. ['txt3']) which I highly recommend. Having both lists and strings as possible values will result in bugs because both of those types are iterable. You'd need to remember to check the type each time you process a dict-value.
You can use a defaultdict to group the strings by their corresponding key, then make a second pass through the list to extract the strings from singleton lists. Regardless of what you do, you'll need to access every element in both lists at least once, so some iteration structure is necessary (and even if you don't explicitly use iteration, whatever you use will almost definitely use iteration under the hood):
from collections import defaultdict
lst = [[1, 1, 2], ["txt1", "txt2", "txt3"]]
result = defaultdict(list)
for key, value in zip(lst[0], lst[1]):
    result[key].append(value)
for key in result:
    if len(result[key]) == 1:
        result[key] = result[key][0]
print(dict(result)) # Prints {1: ['txt1', 'txt2'], 2: 'txt3'}

Find all keys with same unknown value

I've looked all over the internet asking the question how can I find all the keys in a dictionary that have the same value. But this value is not known. The closest thing that came up was this, but the values are known.
Say I had a dictionary like this and these values are totally random, not hardcoded by me.
{'AGAA': 2, 'ATAA': 5,'AJAA':2}
How can I identify all the keys with the same value? What would be the most efficient way of doing this.
['AGAA','AJAA']
The way I would do it is "invert" the dictionary. By this I mean to group the keys for each common value. So if you start with:
{'AGAA': 2, 'ATAA': 5, 'AJAA': 2}
You would want to group it such that the keys are now values and values are now keys:
{2: ['AGAA', 'AJAA'], 5: ['ATAA']}
After grouping the values, you can use max to determine the largest grouping.
Example:
from collections import defaultdict
data = {'AGAA': 2, 'ATAA': 5, 'AJAA': 2}
grouped = defaultdict(list)
for key in data:
    grouped[data[key]].append(key)
max_group = max(grouped.values(), key=len)
print(max_group)
Outputs:
['AGAA', 'AJAA']
You could also find the max key and print it that way:
max_key = max(grouped, key=lambda k: len(grouped[k]))
print(grouped[max_key])
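Note that max returns only one group, so if several different values are shared by more than one key, the others are dropped. A hedged variation on the same inversion idea keeps every group with more than one key; the function name duplicate_groups is made up for this sketch.

```python
from collections import defaultdict


def duplicate_groups(data):
    """Map each value that occurs more than once to the keys that share it."""
    grouped = defaultdict(list)
    for key, value in data.items():
        grouped[value].append(key)
    # keep only the values shared by at least two keys
    return {value: keys for value, keys in grouped.items() if len(keys) > 1}


print(duplicate_groups({'AGAA': 2, 'ATAA': 5, 'AJAA': 2}))
# {2: ['AGAA', 'AJAA']}
```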
You can try this:
from collections import Counter
d = {'AGAA': 2, 'ATAA': 5,'AJAA':2}
l = Counter(d.values())
l = [x for x,y in l.items() if y > 1]
out = [x for x,y in d.items() if y in l]
# Out[21]: ['AGAA', 'AJAA']

Access defaultdict(dict) based on list containing the keys

Ok so I have a project that I am working on and I cannot figure this out.
I apologize if this has been asked before, I've searched and found nothing.
This is my first post.
I have some pandas dataframes that I want to access based on a hash which I've setup with:
df = defaultdict(lambda: defaultdict(dict))
or
df = defaultdict(dict)
I did this so I could index like df['a']['1'][1] or df['a'][1] depending on the use case.
Note that the shape of the "matrix" will not necessarily be equal. So
df['a']['2'][1] may exist but not df['b']['2'][1].
TLDR
I'd like to access the df using a list like ['a', '2', 1] or ['a', 1]
What I've done:
The old way:
I used to create master lists that I would then iterate through and check. This works but I feel like it is very ugly. It is also different for the two use cases above. I am now trying to make a wrapper around the two use cases above. I would love for the wrapper to not be a big switch for the two use cases.
x_master_list = []
y_master_list = []
for x in df:
    if x not in x_master_list:
        x_master_list.append(x)
    for y in df[x]:
        if y not in y_master_list:
            y_master_list.append(y)
for y in y_master_list:
    for x in x_master_list:
        if x in df:
            if y in df[x]:
                ...
The newer way:
I found a link discussing using recursion to get all of the keys. It was nice because it preserved the order of the hierarchy.
def iter_leafs(d, keys=[]):
    for key, val in d.items():
        if isinstance(val, defaultdict) | isinstance(val, dict):
            yield from iter_leafs(val, keys + [key])
        else:
            yield keys + [key]
I modified the creation of my master lists to:
def create_master_lists(type, df):
    check_type(type)
    lists = master_lists[type]
    key_list = list(iter_leafs(df))
    for key in key_list:
        # loop variable renamed so it no longer shadows the built-in list
        for idx, master_list in enumerate(lists):
            if key[idx] not in master_list:
                master_list.append(key[idx])
    return lists
Now I want to do something like the following:
key_list = list(iter_leafs(df))
for y in y_master_list:
    valid_idx_keys = [key for key in key_list if key[-1] == y]
Here key_list looks like [['a','1',0],['a','1',1], etc]
and valid_idx_keys is basically a filtered version.
I want to take each list from the valid_idx_keys and access df. I cannot figure out how to achieve this.
If I do the following it works, but again the point is to make a wrapper around the two use cases which do not have the same number of indexing arguments.
for x,y,z in valid_idx_keys:
    df[x][y][z]
Maybe something with recursion that slowly steps one layer down for each element in the sublists? I am still trying things, but I wanted to post here in case someone has a way to achieve this or a better solution to my problem.
So I ended up with the following. It works but I am open to suggestions.
from collections import defaultdict
def search_dict(d, list):
    key = list[0]
    val = d.get(key)
    if isinstance(val, defaultdict) | isinstance(val, dict):
        yield from search_dict(val, list[1:])
    else:
        yield val
df = defaultdict(lambda: defaultdict(dict))
df['a']['1'][1] = 0
df['b']['1'][1] = 1
test_key_list = [['a', '1', 1], ['b','1',1]]
print(list(search_dict(df, test_key_list[0])))
print(list(search_dict(df, test_key_list[1])))
vals = []
for lis in test_key_list:
    print(lis)
    vals = vals + list(search_dict(df, lis))
print(vals)
df2 = defaultdict(dict)
df2['a'][1] = 0
df2['b'][1] = 1
test_key_list2 = [['a', 1], ['b',1]]
vals = []
for lis in test_key_list2:
    print(lis)
    vals = vals + list(search_dict(df2, lis))
print(vals)
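As one possible alternative to the recursive generator, the lookup can be written as a fold over the key list with functools.reduce and operator.getitem, which works the same way for two or three levels of nesting. This is only a sketch; the name get_by_path is an assumption. One caveat: indexing a defaultdict with a missing key creates that key, so this version is best used with key lists that are known to exist (for example, those produced by iter_leafs).

```python
from collections import defaultdict
from functools import reduce
from operator import getitem


def get_by_path(mapping, path):
    """Follow `path` one key at a time, e.g. ['a', '1', 1] -> mapping['a']['1'][1]."""
    return reduce(getitem, path, mapping)


df = defaultdict(lambda: defaultdict(dict))
df['a']['1'][1] = 0
df['b']['1'][1] = 1

df2 = defaultdict(dict)
df2['a'][1] = 0
df2['b'][1] = 1

print([get_by_path(df, keys) for keys in [['a', '1', 1], ['b', '1', 1]]])  # [0, 1]
print([get_by_path(df2, keys) for keys in [['a', 1], ['b', 1]]])           # [0, 1]
```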

Python Iterate Dictionary by Index

I want to iterate through a dictionary in python by index number.
Example :
dict = {'apple':'red','mango':'green','orange':'orange'}
I want to iterate through the dictionary from first to last, so that I can access the dictionary items by their indexes. For example, the 1st item will be apple, and the 2nd item will be mango and value will be green.
Something like this:
for i in range(0,len(dict)):
    dict.i
You can iterate over keys and get values by keys (Python 2 syntax; in Python 3, use dict.keys() and print(...)):
for key in dict.iterkeys():
    print key, dict[key]
You can iterate over keys and corresponding values (Python 2 syntax; in Python 3, use dict.items()):
for key, value in dict.iteritems():
    print key, value
You can use enumerate if you want indexes (remember that dictionaries did not guarantee any order before Python 3.7):
>>> for index, key in enumerate(dict):
...     print index, key
...
0 orange
1 mango
2 apple
>>>
There are some very good answers here. I'd like to add the following here as well:
some_dict = {
    "foo": "bar",
    "lorem": "ipsum"
}
for index, (key, value) in enumerate(some_dict.items()):
    print(index, key, value)
results in
0 foo bar
1 lorem ipsum
Appears to work with Python 2.7 and 3.5
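If what is wanted is access by position rather than just a running counter, one option is to materialize the items into a list, or to use itertools.islice to fetch a single position without building the whole list. The sketch below assumes Python 3.7 or later, where dictionaries keep insertion order; the helper name nth_item is an assumption for illustration.

```python
from itertools import islice

colors = {'apple': 'red', 'mango': 'green', 'orange': 'orange'}

# Option 1: build a list of (key, value) pairs once, then index it freely.
items = list(colors.items())
print(items[1])  # ('mango', 'green')


# Option 2: fetch only the n-th pair without copying every item.
def nth_item(mapping, n):
    """Return the n-th (key, value) pair; raises StopIteration if out of range."""
    return next(islice(mapping.items(), n, None))


print(nth_item(colors, 2))  # ('orange', 'orange')
```

The list is simpler when many positions are needed; islice avoids the copy when only one is.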
I wanted to know (idx, key, value) for a python OrderedDict today (mapping of SKUs to quantities in order of the way they should appear on a receipt). The answers here were all bummers.
In Python 3, at least, this way works and makes sense.
In [1]: from collections import OrderedDict
   ...: od = OrderedDict()
   ...: od['a']='spam'
   ...: od['b']='ham'
   ...: od['c']='eggs'
   ...:
   ...: for i,(k,v) in enumerate(od.items()):
   ...:     print('%d,%s,%s'%(i,k,v))
   ...:
0,a,spam
1,b,ham
2,c,eggs
Some of the comments are right in saying that these answers do not correspond to the question.
One reason to loop through a dictionary using "indexes" is, for example, to compute a distance matrix for a set of objects stored in a dictionary. To illustrate (covering some basics in the note below):
Assuming one has 1000 objects in a dictionary, the square distance
matrix considers all combinations from one object to any other, so
it has 1000x1000 elements. But if the distance from object 1 to
object 2 is the same as from object 2 to object 1, one needs to
compute fewer than half of the entries, since the diagonal has
distance 0 and the values are mirrored above and below the diagonal.
This is why most packages use a condensed distance matrix ( How does condensed distance matrix work? (pdist) )
But consider the case where one is implementing the computation of a distance matrix, or any similar pairwise computation. In that case you need to skip more than half of the cases. A for loop that runs through the whole dictionary then mostly hits an if and jumps to the next iteration without doing any real work. For large datasets these extra checks and iterations add up to a significant share of the processing time, and they could be avoided if each inner loop started one "index" further along the dictionary.
Returning to the question, my conclusion for now is that the answer is no: there is no way to directly access dictionary values by any index other than the key or an iterator.
I understand that most of the answers so far apply different approaches to this task, but none of them allows the kind of index manipulation that would be useful in a case like the one above.
The only alternative I see is to use a list or other variable as a sequential index into the dictionary. Here is an implementation to illustrate:
#!/usr/bin/python3
dishes = {'spam': 4.25, 'eggs': 1.50, 'sausage': 1.75, 'bacon': 2.00}
print("Dictionary: {}\n".format(dishes))
key_list = list(dishes.keys())
number_of_items = len(key_list)
condensed_matrix = [0]*int(round(((number_of_items**2)-number_of_items)/2,0))
c_m_index = 0
for first_index in range(0,number_of_items):
    for second_index in range(first_index+1,number_of_items):
        condensed_matrix[c_m_index] = dishes[key_list[first_index]] - dishes[key_list[second_index]]
        print("{}. {}-{} = {}".format(c_m_index,key_list[first_index],key_list[second_index],condensed_matrix[c_m_index]))
        c_m_index+=1
The output is:
Dictionary: {'spam': 4.25, 'eggs': 1.5, 'sausage': 1.75, 'bacon': 2.0}
0. spam-eggs = 2.75
1. spam-sausage = 2.5
2. spam-bacon = 2.25
3. eggs-sausage = -0.25
4. eggs-bacon = -0.5
5. sausage-bacon = -0.25
It's also worth mentioning that there are packages such as itertools that allow one to perform similar tasks more concisely.
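As a rough illustration of the itertools route, itertools.combinations yields each unordered pair of keys exactly once, in the same order as the nested index loops above, so the condensed matrix can be built without any manual index bookkeeping. This is a sketch using the same dishes data, assuming a Python version where dictionaries keep insertion order.

```python
from itertools import combinations

dishes = {'spam': 4.25, 'eggs': 1.50, 'sausage': 1.75, 'bacon': 2.00}

# combinations(dishes, 2) iterates over pairs of keys: (spam, eggs),
# (spam, sausage), (spam, bacon), (eggs, sausage), ...
condensed_matrix = [dishes[a] - dishes[b] for a, b in combinations(dishes, 2)]

for index, (a, b) in enumerate(combinations(dishes, 2)):
    print("{}. {}-{} = {}".format(index, a, b, condensed_matrix[index]))
```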
Do this:
for i in dict.keys():
    dict[i]
Since you want to iterate in order, you can use sorted:
for k, v in sorted(dict.items()):
    print(k, v)
There are several ways to write a for loop in Python; here is what I have found so far:
A = [1,2,3,4]
B = {"col1": [1,2,3],"col2":[4,5,6]}
# Forms of for loop in python:
# Forms with a list-form,
for item in A:
    print(item)
print("-----------")
for item in B.keys():
    print(item)
print("-----------")
for item in B.values():
    print(item)
print("-----------")
for item in B.items():
    print(item)
    print("The value of keys is {} and the value of list of a key is {}".format(item[0],item[1]))
print("-----------")
Results are:
1
2
3
4
-----------
col1
col2
-----------
[1, 2, 3]
[4, 5, 6]
-----------
('col1', [1, 2, 3])
The value of keys is col1 and the value of list of a key is [1, 2, 3]
('col2', [4, 5, 6])
The value of keys is col2 and the value of list of a key is [4, 5, 6]
-----------
When I need to keep the order, I use a list and a companion dict:
color = ['red','green','orange']
fruit = {'apple':0,'mango':1,'orange':2}
color[fruit['apple']]
for i in range(0,len(fruit)): # or len(color)
    color[i]
The inconvenience is that I can't easily get the fruit from the index. When I need that, I use a list of tuples:
fruitcolor = [('apple','red'),('mango','green'),('orange','orange')]
index = {'apple':0,'mango':1,'orange':2}
fruitcolor[index['apple']][1]
for i in range(0,len(fruitcolor)):
    fruitcolor[i][1]
for f, c in fruitcolor:
    c
Your data structures should be designed to fit your algorithm needs, so that it remains clean, readable and elegant.
I can't think of any reason why you would want to do that. If you just need to iterate over the dictionary, you can simply do:
for key, elem in testDict.items():
    print(key, elem)
OR
for i in testDict:
    print(i, testDict[i])
