Python removing 'duplicate' tuples from dictionary

I have a dictionary that has tuples as keys. Each tuple has two values; I'll use letters as values to keep things simple, e.g. the tuple (a, b).
The order of the two values doesn't matter, which means that (a, b) is essentially the same as (b, a) (a sort of duplicate). So I tried to write something that would remove all of these redundant key-value pairs, but it doesn't work and I'm seriously stuck. I'm sure it's something simple I'm overlooking, but I can't figure it out.
I thought this would work:
def undupe_overlaps(overlaps):
    dupes = []
    for key, item in overlaps.items():
        if (key[1], key[0]) in overlaps:
            dupes.append((key[1], key[0]))
    for item in dupes:
        overlaps.pop(item)
    return overlaps
overlaps is the dictionary, and I use the list dupes because you can't delete things from a dict while looping over it. Any help or tips would be appreciated :)

Your if statement is wrong. It should be:
if (key[1], key[0]) not in dupes:
    dupes.append(key)
Basically, you are checking that the current key, with its elements swapped, is not already present in the dupes list.
Your code does not work because it looks at overlaps and records the swapped key whenever that swapped key is present. That way, the "single" keys, i.e. those that do not have a counterpart, are never inserted in dupes. Note that with the corrected check the list ends up holding one key per pair, i.e. the keys to keep, so those keys should not be popped afterwards.
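The deduplication discussed above can be sketched as follows. This is only an illustration, assuming that the first key seen for each unordered pair is the one to keep and that returning a new dictionary is acceptable; the seen set and the sample data are not from the question.

```python
def undupe_overlaps(overlaps):
    """Return a new dict that keeps one key per unordered pair."""
    seen = set()    # order-insensitive form of every key kept so far
    result = {}
    for key, value in overlaps.items():
        normalized = frozenset(key)  # (a, b) and (b, a) give the same frozenset
        if normalized not in seen:
            seen.add(normalized)
            result[key] = value
    return result


# Illustrative data: ("b", "a") is the mirrored duplicate of ("a", "b").
overlaps = {("a", "b"): 1, ("b", "a"): 2, ("c", "d"): 3}
print(undupe_overlaps(overlaps))
```

With this sample data the result keeps ("a", "b") and ("c", "d"). Building a new dictionary sidesteps the problem of deleting from a dict while iterating over it.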

You can convert the keys into a list separately from the dict and then modify the dict as you iterate through the keys.
def undupe_overlaps(overlaps):
    dupes = set()
    for key_tuple in list(overlaps.keys()):
        if key_tuple in dupes or (key_tuple[1], key_tuple[0]) in dupes:
            overlaps.pop(key_tuple)
        dupes.add(key_tuple)
    return overlaps

Related

Python pick a random value from hashmap that has a list as value?

I have a defaultdict(list) hashmap, potential_terms:
potential_terms = {9: ['leather'], 10: ['type', 'polyester'], 13: ['hello', 'bye']}
What I want to output is the 2 values (words) with the lowest keys. 'leather' is definitely the first output, but 'type' and 'polyester' both have k=10; when the key is the same, I want a random choice of either 'type' or 'polyester'.
What I did is:
out=[v for k,v in sorted(potential_terms.items(), key=lambda x:(x[0],random.choice(x[1])))][:2]
but when I print out I get :
[['leather'], ['type', 'polyester']]
My guess is that the problem is, of course, the 2nd part of the lambda function: random.choice(x[1]). Any ideas on how to make it work as expected by outputting either 'type' or 'polyester'?
Thanks
EDIT: See Karl's answer and comment as to why this solution isn't correct for OP's problem.
I leave it here because it does demonstrate what OP originally got wrong.
key= doesn't transform the data itself; it only tells sorted how to sort.
You want to apply choice to v when selecting it in the comprehension, like so:
out=[random.choice(v) for k,v in sorted(potential_terms.items())[:2]]
(I also moved the [:2] inside, to shorten the list before the comprehension)
Output:
['leather', 'type']
OR
['leather', 'polyester']
You have (with some extra formatting to highlight the structure):
out = [
    v
    for k, v in sorted(
        potential_terms.items(),
        key=lambda x: (x[0], random.choice(x[1]))
    )
][:2]
This means (reading from the inside out): sort the items according to the key, breaking ties using a random choice from the value list. Extract the values (which are lists) from those sorted items into a list (of lists). Finally, get the first two items of that list of lists.
This doesn't match the problem description, and is also somewhat nonsensical: since the keys are, well, keys, there cannot be duplicates, and thus there cannot be ties to break.
What we wanted: sort the items according to the key, then put all the contents of those individual lists next to each other to make a flattened list of strings, but randomizing the order within each sublist (i.e., shuffling those sublists). Then, get the first two items of that list of strings.
Thus, applying the technique from the link, and shuffling the sublists "inline" as they are discovered by the comprehension:
out = [
    term
    for k, v in sorted(
        potential_terms.items(),
        key=lambda x: x[0]  # this is not actually necessary now,
        # since the natural sort order of the items will work.
    )
    for term in random.sample(v, len(v))
][:2]
Please also see https://treyhunner.com/2015/12/python-list-comprehensions-now-in-color/ to understand how the list flattening and result ordering works in a two-level comprehension like this.
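As a rough sketch of the same flatten-and-shuffle idea, wrapped in a function so it can be run as is: the function name lowest_terms and its count parameter are made up for illustration, and itertools.islice is used here instead of slicing so that the flattened list is never built in full.

```python
import random
from itertools import islice

potential_terms = {9: ['leather'], 10: ['type', 'polyester'], 13: ['hello', 'bye']}


def lowest_terms(terms_by_key, count):
    """Return `count` terms, walking keys in ascending order and
    shuffling the terms that share a key."""
    flattened = (
        term
        for _, terms in sorted(terms_by_key.items())
        for term in random.sample(terms, len(terms))  # shuffled copy of each sublist
    )
    return list(islice(flattened, count))


out = lowest_terms(potential_terms, 2)
print(out)
```

The first element is always 'leather'; the second is 'type' or 'polyester', depending on the shuffle.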
Instead of building out in one expression, a simpler approach is:
d = list(potential_terms.values())
which stores all the values as:
[['leather'], ['type', 'polyester'], ['hello', 'bye']]
You can access ['leather'] as d[0] and the list ['type', 'polyester'] as d[1]. Now we just call random.shuffle(d[1]) and take d[1][0],
which gets us a random word, either 'type' or 'polyester'. (This relies on the dictionary having been filled in ascending key order, and it only handles data shaped like this example.)
The final code should look like this:
import random
potential_terms = {9: ['leather'], 10: ['type', 'polyester'], 13: ['hello', 'bye']}
d = list(potential_terms.values())
random.shuffle(d[1])  # shuffles the list stored under key 10 in place
c = []
c.append(d[0][0])
c.append(d[1][0])
This leaves the desired output in c,
either ['leather', 'polyester'] or ['leather', 'type'].

Dictionary unique values in comprehension

I have a little task which I solved.
Task: find all PAIRS in a sequence which sum up to a certain number.
For example (1,2,3,4) and target 3 yields one pair (1,2).
I came up with a solution:
def pair(lst, find):
    res = []
    for i in lst:
        if (find - i) in lst:
            res.append([(find - i), i])
    return {x: y for x, y in res}
I'm a bit surprised to see the dictionary comprehension filter all duplicate solutions.
This leads to my actual question: how and why does a dictionary comprehension remove duplicates?
Because a dict hashes its keys and stores them in a set-like data structure, each key can appear only once. As a result, a newly created {key: value} entry overrides any older entry with the same key, which in your case removes the duplicates. I think this may be a duplicate question.
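A small sketch of that overwrite behaviour, using made-up pairs rather than the output of pair(). Note that only entries with the same key collapse: the mirrored pairs [2, 1] and [1, 2] have different keys, so both survive.

```python
# Illustrative data: the key 2 appears three times, the key 1 once.
pairs = [[2, 1], [1, 2], [2, 1], [2, 5]]

as_dict = {x: y for x, y in pairs}

# Each later assignment to key 2 replaces the earlier one,
# so only the last value (5) remains for that key.
print(as_dict)
```

If the goal is to drop mirrored pairs as well, the keys would have to be normalized first, for example by sorting each pair before using it.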

Adding two asynchronous lists, into a dictionary

I've always found dictionaries to be an odd thing in Python. I'm sure it is just me, but I can't work out how to take two lists and add them to a dict. If both lists could be mapped one-to-one, it wouldn't be a problem; something like dictionary = dict(zip(list1, list2)) would suffice. However, during each run list1 will always have one item, and list2 could have a single item or multiple items that I'd like as values.
How could I approach adding the key and potentially multiple values to it?
After some deliberation, Kasramvd's second option seems to work well for this scenario:
dictionary.setdefault(list1[0], []).append(list2)
Based on your comment, all you need is to assign the second list as the value for the only item of the first list.
d = {}
d[list1[0]] = list2
And if you want to preserve the values for duplicate keys, you can use dict.setdefault() to build a list of lists for each key.
d = {}
d.setdefault(list1[0], []).append(list2)
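To make the difference concrete, here is a minimal sketch over several runs. The runs data and the names nested and flat are assumptions for illustration; extend is shown only as an alternative in case one flat list per key is preferred over a list of lists.

```python
# Each run supplies a one-item list1 and a list2 with one or more values.
runs = [
    (["fruit"], ["apple", "pear"]),
    (["fruit"], ["plum"]),
    (["veg"], ["kale"]),
]

nested = {}  # append: one sub-list per run
flat = {}    # extend: all values for a key merged into one list

for list1, list2 in runs:
    nested.setdefault(list1[0], []).append(list2)
    flat.setdefault(list1[0], []).extend(list2)

print(nested)
print(flat)
```

With append, the key "fruit" maps to [['apple', 'pear'], ['plum']]; with extend, it maps to ['apple', 'pear', 'plum'].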

How to work around needing to update a dictionary

I need to delete a k/v pair from a dictionary in a loop. After getting RuntimeError: dictionary changed size during iteration, I pickled the dictionary after deleting the k/v pair, and in one of the outer loops I try to reopen the newly pickled, updated dictionary. However, as many of you will probably know, I get the same error, I think when it reaches the top of the loop. I do not use my dictionary in the outermost loop.
So my question is: does anyone know how to get around this problem? I want to delete a k/v pair from a dictionary and use that resized dictionary on the next iteration of the loop.
To focus the problem, here is a version that uses the solution from Cygil:
items = [27, 29, 23, 30, 3, 5, 40]  # renamed from list to avoid shadowing the builtin
testDict = {}
for x in range(25):
    tempDict = {}
    tempDict['xsquared'] = x * x
    tempDict['xinverse'] = 1.0 / (x + 1.0)
    testDict[(x, x + 1)] = tempDict
for item in items:
    print('the Dictionary now has', len(testDict), 'keys')
    # iterate over a copy of the keys so that deleting is safe
    for key in list(testDict.keys()):
        if key[0] == item:
            del testDict[key]
I am doing this because I have some research assistants who need to compare observations from two data sets that could not be matched because of name variants. The idea is to show a name from one data set (say set A) and then, based on a key match, find all the names attached to that key in the other data set (set B). Once a match has been identified, I don't want to show the value from B again, to speed things up for them. Because there are 6,000 observations, I also don't want them to have to start at the beginning of A each time they get back to work. However, I can fix that by letting them choose to enter the last key from A they worked with. But I really need to reduce B once the match has been identified.
Without code, I'm assuming you're writing something like:
for key in dict:
    if check_condition(dict[key]):
        del dict[key]
If so, you can write
for key in list(dict.keys()):
    if key in dict and check_condition(dict[key]):
        del dict[key]
list(dict.keys()) returns a copy of the keys, not a view, which makes it possible to delete from the dictionary (you are iterating through a copy of the keys, not the keys in the dictionary itself, in this case.)
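A runnable sketch of the snapshot approach described above, next to an alternative that builds a filtered dictionary instead of deleting. The check_condition body, the threshold and the sample data are hypothetical stand-ins.

```python
def check_condition(value):
    """Hypothetical predicate: flag values greater than 15 for removal."""
    return value > 15


scores = {"a": 3, "b": 20, "c": 16, "d": 7}

# Option 1: iterate over a snapshot of the keys and delete in place.
for key in list(scores):
    if check_condition(scores[key]):
        del scores[key]

# Option 2: build a new dict that keeps only the wanted pairs.
original = {"a": 3, "b": 20, "c": 16, "d": 7}
filtered = {k: v for k, v in original.items() if not check_condition(v)}

print(scores)
print(filtered)
```

Both options end with the same two entries, "a" and "d". The second avoids mutation entirely, which can be easier to reason about when the dictionary is reused across loop iterations.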
Delete all keys whose value is > 15:
for k in list(mydict.keys()):  # makes a list of the keys and iterates
                               # over the list, not over the dict.
    if mydict[k] > 15:
        del mydict[k]
Change:
for ansSeries in notmatched:
To:
for ansSeries in notmatched.copy():

Sorting a dict on __iter__

I am trying to sort a dict based on its keys and return an iterator over the values from within an overridden __iter__ method in a class. Is there a nicer and more efficient way of doing this than creating a new list and inserting into it as I sort through the keys?
How about something like this:
def itersorted(d):
    for key in sorted(d):
        yield d[key]
By far the easiest approach, and almost certainly the fastest, is something along the lines of:
def sorted_dict(d):
    keys = list(d.keys())  # copy the keys into a list so they can be sorted
    keys.sort()
    for key in keys:
        yield d[key]
You can't sort without fetching all keys. Fetching all keys into a list and then sorting that list is the most efficient way to do that; list sorting is very fast, and fetching the keys list like that is as fast as it can be. You can then either create a new list of values or yield the values as the example does. Keep in mind that you can't modify the dict if you are iterating over it (the next iteration would fail) so if you want to modify the dict before you're done with the result of sorted_dict(), make it return a list.
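Since the question asks about doing this inside a class, here is a minimal sketch of an __iter__ method written as a generator. The class name SortedValues and its internal _data attribute are hypothetical.

```python
class SortedValues:
    """Hypothetical container whose iteration yields values in key order."""

    def __init__(self, data):
        self._data = dict(data)

    def __iter__(self):
        # sorted() builds one list of keys; the values are then yielded lazily.
        for key in sorted(self._data):
            yield self._data[key]


container = SortedValues({"b": 2, "c": 3, "a": 1})
print(list(container))
```

Because __iter__ is a generator function, each call to iter(container) starts a fresh pass over the keys as they are at that moment.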
def sortedDict(dictobj):
    return (value for key, value in sorted(dictobj.items()))
This creates a single intermediate list, because the sorted() function returns a real list. But at least it's only one.
Assuming you want the default sort order, you can use sorted(list) or list.sort(). If you want your own sort logic, Python lists can be sorted with a comparison function that you pass in (in Python 3 it has to be wrapped with functools.cmp_to_key). For example, the following sorts numbers from least to greatest (the default behavior) using a function.
from functools import cmp_to_key

def compareTwo(a, b):
    if a > b:
        return 1
    if a == b:
        return 0
    if a < b:
        return -1

numbers = [3, 1, 2]  # example data, not from the original post
numbers.sort(key=cmp_to_key(compareTwo))
print(numbers)
This approach is conceptually a bit cleaner than manually creating a new list and appending the new values, and it allows you to control the sort logic.
