Is there something simple like a set for un-hashable objects?

Is there something simple like a set for un-hashable objects? - python

For hashable objects inside a dict I could easily pair down duplicate values store in a dict using a set. For example:
a = {'test': 1, 'key': 1, 'other': 2}
b = set(a.values())
print(b)
Would display [1,2]
Problem I have is I am using a dict to store mapping between variable keys in __dict__ and the corresponding processing functions that will be passed to an engine to order and process those functions, some of these functions may be fast some may be slower due to accessing an API. The problem is each function may use multiple variable, therefor need multiple mappings in the dict. I'm wondering if there is a way to do this or if I am stuck writing my own solution?
Ended up building a callable class, since caching could speed things up for me:
from collections.abc import Callable
class RemoveDuplicates(Callable):
input_cache = []
output_cache = []
def __call__(self, in_list):
if list in self.input_cache:
idx = self.input_cache.index(in_list)
return self.output_cache[idx]
else:
self.input_cache.append(in_list)
out_list = self._remove_duplicates(in_list)
self.output_cache.append(out_list)
return out_list
def _remove_duplicates(self, src_list):
result = []
for item in src_list:
if item not in result:
result.append(item)
return result

If the objects can be ordered, you can use itertools.groupby to eliminate the duplicates:
>>> a = {'test': 1, 'key': 1, 'other': 2}
>>> b = [k for k, it in itertools.groupby(sorted(a.values()))]
>>> print(b)
[1, 2]

Is there something simple like a set for un-hashable objects
Not in the standard library but you need to look beyond and search for BTree implementation of dictionary. I googled and found few hits where the first one (BTree)seems promising and interesting
Quoting from the wiki
The BTree-based data structures differ from Python dicts in several
fundamental ways. One of the most important is that while dicts
require that keys support hash codes and equality comparison, the
BTree-based structures don’t use hash codes and require a total
ordering on keys.
Off-course its trivial fact that a set can be implemented as a dictionary where the value is unused.

You could (indirectly) use the bisect module to create sorted collection of your values which would greatly speed-up the insertion of new values and value membership testing in general — which together can be utilized to unsure that only unique values get put into it.
In the code below, I've used un-hashable set values for the sake of illustration.
# see http://code.activestate.com/recipes/577197-sortedcollection
from sortedcollection import SortedCollection
a = {'test': {1}, 'key': {1}, 'other': {2}}
sc = SortedCollection()
for value in a.values():
if value not in sc:
sc.insert(value)
print(list(sc)) # --> [{1}, {2}]

Related

How to reset value of multiple dictionaries elegantly in python

I am working on a code which pulls data from database and based on the different type of tables , store the data in dictionary for further usage.
This code handles around 20-30 different table so there are 20-30 dictionaries and few lists which I have defined as class variables for further usage in code.
for example.
class ImplVars(object):
#dictionary capturing data from Asset-Feed table
general_feed_dict = {}
ports_feed_dict = {}
vulns_feed_dict = {}
app_list = []
...
I want to clear these dictionaries before I add data in it.
Easiest or common way is to use clear() function but this code is repeatable as I will have to write for each dict.
Another option I am exploring is with using dir() function but its returning variable names as string.
Is there any elegant method which will allow me to fetch all these class variables and clear them ?

You can use introspection as you suggest:
for d in filter(dict.__instancecheck__, ImplVars.__dict__.values()):
d.clear()
Or less cryptic, covering lists and dicts:
for obj in ImplVars.__dict__.values():
if isinstance(obj, (list, dict)):
obj.clear()
But I would recommend you choose a bit of a different data structure so you can be more explicit:
class ImplVars(object):
data_dicts = {
"general_feed_dict": {},
"ports_feed_dict": {},
"vulns_feed_dict": {},
}
Now you can explicitly loop over ImplVars.data_dicts.values and still have other class variables that you may not want to clear.

code:
a_dict = {1:2}
b_dict = {2:4}
c_list = [3,6]
vars_copy = vars().copy()
for variable, value in vars_copy.items():
if variable.endswith("_dict"):
vars()[variable] = {}
elif variable.endswith("_list"):
vars()[variable] = []
print(a_dict)
print(b_dict)
print(c_list)
result:
{}
{}
[]

Maybe one of the easier kinds of implementation would be to create a list of dictionaries and lists you want to clear and later make the loop clear them all.
d = [general_feed_dict, ports_feed_dict, vulns_feed_dict, app_list]
for element in d:
element.clear()
You could also use list comprehension for that.

How add() on set can work in such dictionary? [duplicate]

The addition of collections.defaultdict in Python 2.5 greatly reduced the need for dict's setdefault method. This question is for our collective education:
What is setdefault still useful for, today in Python 2.6/2.7?
What popular use cases of setdefault were superseded with collections.defaultdict?

You could say defaultdict is useful for settings defaults before filling the dict and setdefault is useful for setting defaults while or after filling the dict.
Probably the most common use case: Grouping items (in unsorted data, else use itertools.groupby)
# really verbose
new = {}
for (key, value) in data:
if key in new:
new[key].append( value )
else:
new[key] = [value]
# easy with setdefault
new = {}
for (key, value) in data:
group = new.setdefault(key, []) # key might exist already
group.append( value )
# even simpler with defaultdict
from collections import defaultdict
new = defaultdict(list)
for (key, value) in data:
new[key].append( value ) # all keys have a default already
Sometimes you want to make sure that specific keys exist after creating a dict. defaultdict doesn't work in this case, because it only creates keys on explicit access. Think you use something HTTP-ish with many headers -- some are optional, but you want defaults for them:
headers = parse_headers( msg ) # parse the message, get a dict
# now add all the optional headers
for headername, defaultvalue in optional_headers:
headers.setdefault( headername, defaultvalue )

I commonly use setdefault for keyword argument dicts, such as in this function:
def notify(self, level, *pargs, **kwargs):
kwargs.setdefault("persist", level >= DANGER)
self.__defcon.set(level, **kwargs)
try:
kwargs.setdefault("name", self.client.player_entity().name)
except pytibia.PlayerEntityNotFound:
pass
return _notify(level, *pargs, **kwargs)
It's great for tweaking arguments in wrappers around functions that take keyword arguments.

defaultdict is great when the default value is static, like a new list, but not so much if it's dynamic.
For example, I need a dictionary to map strings to unique ints. defaultdict(int) will always use 0 for the default value. Likewise, defaultdict(intGen()) always produces 1.
Instead, I used a regular dict:
nextID = intGen()
myDict = {}
for lots of complicated stuff:
#stuff that generates unpredictable, possibly already seen str
strID = myDict.setdefault(myStr, nextID())
Note that dict.get(key, nextID()) is insufficient because I need to be able to refer to these values later as well.
intGen is a tiny class I build that automatically increments an int and returns its value:
class intGen:
def __init__(self):
self.i = 0
def __call__(self):
self.i += 1
return self.i
If someone has a way to do this with defaultdict I'd love to see it.

As most answers state setdefault or defaultdict would let you set a default value when a key doesn't exist. However, I would like to point out a small caveat with regard to the use cases of setdefault. When the Python interpreter executes setdefaultit will always evaluate the second argument to the function even if the key exists in the dictionary. For example:
In: d = {1:5, 2:6}
In: d
Out: {1: 5, 2: 6}
In: d.setdefault(2, 0)
Out: 6
In: d.setdefault(2, print('test'))
test
Out: 6
As you can see, print was also executed even though 2 already existed in the dictionary. This becomes particularly important if you are planning to use setdefault for example for an optimization like memoization. If you add a recursive function call as the second argument to setdefault, you wouldn't get any performance out of it as Python would always be calling the function recursively.
Since memoization was mentioned, a better alternative is to use functools.lru_cache decorator if you consider enhancing a function with memoization. lru_cache handles the caching requirements for a recursive function better.

I use setdefault() when I want a default value in an OrderedDict. There isn't a standard Python collection that does both, but there are ways to implement such a collection.

As Muhammad said, there are situations in which you only sometimes wish to set a default value. A great example of this is a data structure which is first populated, then queried.
Consider a trie. When adding a word, if a subnode is needed but not present, it must be created to extend the trie. When querying for the presence of a word, a missing subnode indicates that the word is not present and it should not be created.
A defaultdict cannot do this. Instead, a regular dict with the get and setdefault methods must be used.

Theoretically speaking, setdefault would still be handy if you sometimes want to set a default and sometimes not. In real life, I haven't come across such a use case.
However, an interesting use case comes up from the standard library (Python 2.6, _threadinglocal.py):
>>> mydata = local()
>>> mydata.__dict__
{'number': 42}
>>> mydata.__dict__.setdefault('widgets', [])
[]
>>> mydata.widgets
[]
I would say that using __dict__.setdefault is a pretty useful case.
Edit: As it happens, this is the only example in the standard library and it is in a comment. So may be it is not enough of a case to justify the existence of setdefault. Still, here is an explanation:
Objects store their attributes in the __dict__ attribute. As it happens, the __dict__ attribute is writeable at any time after the object creation. It is also a dictionary not a defaultdict. It is not sensible for objects in the general case to have __dict__ as a defaultdict because that would make each object having all legal identifiers as attributes. So I can't foresee any change to Python objects getting rid of __dict__.setdefault, apart from deleting it altogether if it was deemed not useful.

I rewrote the accepted answer and facile it for the newbies.
#break it down and understand it intuitively.
new = {}
for (key, value) in data:
if key not in new:
new[key] = [] # this is core of setdefault equals to new.setdefault(key, [])
new[key].append(value)
else:
new[key].append(value)
# easy with setdefault
new = {}
for (key, value) in data:
group = new.setdefault(key, []) # it is new[key] = []
group.append(value)
# even simpler with defaultdict
new = defaultdict(list)
for (key, value) in data:
new[key].append(value) # all keys have a default value of empty list []
Additionally,I categorized the methods as reference:
dict_methods_11 = {
'views':['keys', 'values', 'items'],
'add':['update','setdefault'],
'remove':['pop', 'popitem','clear'],
'retrieve':['get',],
'copy':['copy','fromkeys'],}

One drawback of defaultdict over dict (dict.setdefault) is that a defaultdict object creates a new item EVERYTIME non existing key is given (eg with ==, print). Also the defaultdict class is generally way less common then the dict class, its more difficult to serialize it IME.
P.S. IMO functions|methods not meant to mutate an object, should not mutate an object.

Here are some examples of setdefault to show its usefulness:
"""
d = {}
# To add a key->value pair, do the following:
d.setdefault(key, []).append(value)
# To retrieve a list of the values for a key
list_of_values = d[key]
# To remove a key->value pair is still easy, if
# you don't mind leaving empty lists behind when
# the last value for a given key is removed:
d[key].remove(value)
# Despite the empty lists, it's still possible to
# test for the existance of values easily:
if d.has_key(key) and d[key]:
pass # d has some values for key
# Note: Each value can exist multiple times!
"""
e = {}
print e
e.setdefault('Cars', []).append('Toyota')
print e
e.setdefault('Motorcycles', []).append('Yamaha')
print e
e.setdefault('Airplanes', []).append('Boeing')
print e
e.setdefault('Cars', []).append('Honda')
print e
e.setdefault('Cars', []).append('BMW')
print e
e.setdefault('Cars', []).append('Toyota')
print e
# NOTE: now e['Cars'] == ['Toyota', 'Honda', 'BMW', 'Toyota']
e['Cars'].remove('Toyota')
print e
# NOTE: it's still true that ('Toyota' in e['Cars'])

I use setdefault frequently when, get this, setting a default (!!!) in a dictionary; somewhat commonly the os.environ dictionary:
# Set the venv dir if it isn't already overridden:
os.environ.setdefault('VENV_DIR', '/my/default/path')
Less succinctly, this looks like this:
# Set the venv dir if it isn't already overridden:
if 'VENV_DIR' not in os.environ:
os.environ['VENV_DIR'] = '/my/default/path')
It's worth noting that you can also use the resulting variable:
venv_dir = os.environ.setdefault('VENV_DIR', '/my/default/path')
But that's less necessary than it was before defaultdicts existed.

Another use case that I don't think was mentioned above.
Sometimes you keep a cache dict of objects by their id where primary instance is in the cache and you want to set cache when missing.
return self.objects_by_id.setdefault(obj.id, obj)
That's useful when you always want to keep a single instance per distinct id no matter how you obtain an obj each time. For example when object attributes get updated in memory and saving to storage is deferred.

One very important use-case I just stumbled across: dict.setdefault() is great for multi-threaded code when you only want a single canonical object (as opposed to multiple objects that happen to be equal).
For example, the (Int)Flag Enum in Python 3.6.0 has a bug: if multiple threads are competing for a composite (Int)Flag member, there may end up being more than one:
from enum import IntFlag, auto
import threading
class TestFlag(IntFlag):
one = auto()
two = auto()
three = auto()
four = auto()
five = auto()
six = auto()
seven = auto()
eight = auto()
def __eq__(self, other):
return self is other
def __hash__(self):
return hash(self.value)
seen = set()
class cycle_enum(threading.Thread):
def run(self):
for i in range(256):
seen.add(TestFlag(i))
threads = []
for i in range(8):
threads.append(cycle_enum())
for t in threads:
t.start()
for t in threads:
t.join()
len(seen)
# 272 (should be 256)
The solution is to use setdefault() as the last step of saving the computed composite member -- if another has already been saved then it is used instead of the new one, guaranteeing unique Enum members.

In addition to what have been suggested, setdefault might be useful in situations where you don't want to modify a value that has been already set. For example, when you have duplicate numbers and you want to treat them as one group. In this case, if you encounter a repeated duplicate key which has been already set, you won't update the value of that key. You will keep the first encountered value. As if you are iterating/updating the repeated keys once only.
Here's a code example of recording the index for the keys/elements of a sorted list:
nums = [2,2,2,2,2]
d = {}
for idx, num in enumerate(sorted(nums)):
# This will be updated with the value/index of the of the last repeated key
# d[num] = idx # Result (sorted_indices): [4, 4, 4, 4, 4]
# In the case of setdefault, all encountered repeated keys won't update the key.
# However, only the first encountered key's index will be set
d.setdefault(num,idx) # Result (sorted_indices): [0, 0, 0, 0, 0]
sorted_indices = [d[i] for i in nums]

[Edit] Very wrong! The setdefault would always trigger long_computation, Python being eager.
Expanding on Tuttle's answer. For me the best use case is cache mechanism. Instead of:
if x not in memo:
memo[x]=long_computation(x)
return memo[x]
which consumes 3 lines and 2 or 3 lookups, I would happily write :
return memo.setdefault(x, long_computation(x))

I like the answer given here:
http://stupidpythonideas.blogspot.com/2013/08/defaultdict-vs-setdefault.html
In short, the decision (in non-performance-critical apps) should be made on the basis of how you want to handle lookup of empty keys downstream (viz. KeyError versus default value).

The different use case for setdefault() is when you don't want to overwrite the value of an already set key. defaultdict overwrites, while setdefault() does not. For nested dictionaries it is more often the case that you want to set a default only if the key is not set yet, because you don't want to remove the present sub dictionary. This is when you use setdefault().
Example with defaultdict:
>>> from collection import defaultdict()
>>> foo = defaultdict()
>>> foo['a'] = 4
>>> foo['a'] = 2
>>> print(foo)
defaultdict(None, {'a': 2})
setdefault doesn't overwrite:
>>> bar = dict()
>>> bar.setdefault('a', 4)
>>> bar.setdefault('a', 2)
>>> print(bar)
{'a': 4}

Another usecase for setdefault in CPython is that it is atomic in all cases, whereas defaultdict will not be atomic if you use a default value created from a lambda.
cache = {}
def get_user_roles(user_id):
if user_id in cache:
return cache[user_id]['roles']
cache.setdefault(user_id, {'lock': threading.Lock()})
with cache[user_id]['lock']:
roles = query_roles_from_database(user_id)
cache[user_id]['roles'] = roles
If two threads execute cache.setdefault at the same time, only one of them will be able to create the default value.
If instead you used a defaultdict:
cache = defaultdict(lambda: {'lock': threading.Lock()}
This would result in a race condition. In my example above, the first thread could create a default lock, and the second thread could create another default lock, and then each thread could lock its own default lock, instead of the desired outcome of each thread attempting to lock a single lock.
Conceptually, setdefault basically behaves like this (defaultdict also behaves like this if you use an empty list, empty dict, int, or other default value that is not user python code like a lambda):
gil = threading.Lock()
def setdefault(dict, key, value_func):
with gil:
if key not in dict:
return
value = value_func()
dict[key] = value
Conceptually, defaultdict basically behaves like this (only when using python code like a lambda - this is not true if you use an empty list):
gil = threading.Lock()
def __setitem__(dict, key, value_func):
with gil:
if key not in dict:
return
value = value_func()
with gil:
dict[key] = value

Convert a list of dictionaries into a set of dictionaries

How can i make a set of dictionaries from one list of dictionaries?
Example:
import copy
v1 = {'k01': 'v01', 'k02': {'k03': 'v03', 'k04': {'k05': 'v05'}}}
v2 = {'k11': 'v11', 'k12': {'k13': 'v13', 'k14': {'k15': 'v15'}}}
data = []
N = 5
for i in range(N):
data.append(copy.deepcopy(v1))
data.append(copy.deepcopy(v2))
print data
How would you create a set of dictionaries from the list data?
NS: One dictionary is equal to another when they are structurally the same. That means, they got exactly the same keys and same values (recursively)

A cheap workaround would be to serialize your dicts, for example:
import json
dset = set()
d1 = {'a':1, 'b':{'c':2}}
d2 = {'b':{'c':2}, 'a':1} # the same according to your definition
d3 = {'x': 42}
dset.add(json.dumps(d1, sort_keys=True))
dset.add(json.dumps(d2, sort_keys=True))
dset.add(json.dumps(d3, sort_keys=True))
for p in dset:
print json.loads(p)
In the long run it would make sense to wrap the whole thing in a class like SetOfDicts.

Dictionaries are mutable and therefore not hashable in python.
You could either create a dict-subclass with a __hash__ method. Make sure that the hash of a dictionary does not change while it is in the set (that probably means that you cannot allow modifying the members).
See http://code.activestate.com/recipes/414283-frozen-dictionaries/ for an example implementation of frozendicts.
If you can define a sort order on your (frozen) dictionaries, you could alternatively use a data structure based on a binary tree instead of a set. This boils down to the bisect solution provided in the link below.
See also https://stackoverflow.com/a/18824158/5069869 for an explanation why sets without hash do not make sense.

not exactly what you're looking for as this accounts for lists too but:
def hashable_structure(structure):
if isinstance(structure, dict):
return {k: hashable_structure(v) for k, v in structure.items()}
elif isinstance(structure, list):
return {hashable_structure(elem) for elem in structure)}
else:
return structure

Shortest way to get first item of `OrderedDict` in Python 3

What's the shortest way to get first item of OrderedDict in Python 3?
My best:
list(ordered_dict.items())[0]
Quite long and ugly.
I can think of:
next(iter(ordered_dict.items())) # Fixed, thanks Ashwini
But it's not very self-describing.
Any better suggestions?

Programming Practices for Readabililty
In general, if you feel like code is not self-describing, the usual solution is to factor it out into a well-named function:
def first(s):
'''Return the first element from an ordered collection
or an arbitrary element from an unordered collection.
Raise StopIteration if the collection is empty.
'''
return next(iter(s))
With that helper function, the subsequent code becomes very readable:
>>> extension = {'xml', 'html', 'css', 'php', 'xhmtl'}
>>> one_extension = first(extension)
Patterns for Extracting a Single Value from Collection
The usual ways to get an element from a set, dict, OrderedDict, generator, or other non-indexable collection are:
for value in some_collection:
break
and:
value = next(iter(some_collection))
The latter is nice because the next() function lets you specify a default value if collection is empty or you can choose to let it raise an exception. The next() function is also explicit that it is asking for the next item.
Alternative Approach
If you actually need indexing and slicing and other sequence behaviors (such as indexing multiple elements), it is a simple matter to convert to a list with list(some_collection) or to use [itertools.islice()][2]:
s = list(some_collection)
print(s[0], s[1])
s = list(islice(n, some_collection))
print(s)

Use popitem(last=False), but keep in mind that it removes the entry from the dictionary, i.e. is destructive.
from collections import OrderedDict
o = OrderedDict()
o['first'] = 123
o['second'] = 234
o['third'] = 345
first_item = o.popitem(last=False)
>>> ('first', 123)
For more details, have a look at the manual on collections. It also works with Python 2.x.

Subclassing and adding a method to OrderedDict would be the answer to clarity issues:
>>> o = ExtOrderedDict(('a',1), ('b', 2))
>>> o.first_item()
('a', 1)
The implementation of ExtOrderedDict:
class ExtOrderedDict(OrderedDict):
def first_item(self):
return next(iter(self.items()))

Code that's readable, leaves the OrderedDict unchanged and doesn't needlessly generate a potentially large list just to get the first item:
for item in ordered_dict.items():
return item
If ordered_dict is empty, None would be returned implicitly.
An alternate version for use inside a stretch of code:
for first in ordered_dict.items():
break # Leave the name 'first' bound to the first item
else:
raise IndexError("Empty ordered dict")
The Python 2.x code corresponding to the first example above would need to use iteritems() instead:
for item in ordered_dict.iteritems():
return item

You might want to consider using SortedDict instead of OrderedDict.
It provides SortedDict.peekitem to peek an item.
Runtime complexity: O(log(n))
>>> sd = SortedDict({'a': 1, 'b': 2, 'c': 3})
>>> sd.peekitem(0)
('a', 1)

If you need a one-liner:
ordered_dict[[*ordered_dict.keys()][0]]
It creates a list of dict keys, picks the first and use it as key to access the dictionary value.

First record:
[key for key, value in ordered_dict][0]
Last record:
[key for key, value in ordered_dict][-1]

Python dictionary : TypeError: unhashable type: 'list'

I'm having troubles in populating a python dictionary starting from another dictionary.
Let's assume that the "source" dictionary has string as keys and has a list of custom objects per value.
I'm creating my target dictionary exactly as I have been creating my "source" dictionary how is it possible this is not working ?
I get
TypeError: unhashable type: 'list'
Code :
aTargetDictionary = {}
for aKey in aSourceDictionary:
aTargetDictionary[aKey] = []
aTargetDictionary[aKey].extend(aSourceDictionary[aKey])
The error is on this line : aTargetDictionary[aKey] = []

The error you gave is due to the fact that in python, dictionary keys must be immutable types (if key can change, there will be problems), and list is a mutable type.
Your error says that you try to use a list as dictionary key, you'll have to change your list into tuples if you want to put them as keys in your dictionary.
According to the python doc :
The only types of values not acceptable as keys are values containing
lists or dictionaries or other mutable types that are compared by
value rather than by object identity, the reason being that the
efficient implementation of dictionaries requires a key’s hash value
to remain constant

This is indeed rather odd.
If aSourceDictionary were a dictionary, I don't believe it is possible for your code to fail in the manner you describe.
This leads to two hypotheses:
The code you're actually running is not identical to the code in your question (perhaps an earlier or later version?)
aSourceDictionary is in fact not a dictionary, but is some other structure (for example, a list).

As per your description, things don't add up. If aSourceDictionary is a dictionary, then your for loop has to work properly.
>>> source = {'a': [1, 2], 'b': [2, 3]}
>>> target = {}
>>> for key in source:
... target[key] = []
... target[key].extend(source[key])
...
>>> target
{'a': [1, 2], 'b': [2, 3]}
>>>

It works fine : http://codepad.org/5KgO0b1G,
your aSourceDictionary variable may have other datatype than dict
aSourceDictionary = { 'abc' : [1,2,3] , 'ccd' : [4,5] }
aTargetDictionary = {}
for aKey in aSourceDictionary:
aTargetDictionary[aKey] = []
aTargetDictionary[aKey].extend(aSourceDictionary[aKey])
print aTargetDictionary

You can also use defaultdict to address this situation. It goes something like this:
from collections import defaultdict
#initialises the dictionary with values as list
aTargetDictionary = defaultdict(list)
for aKey in aSourceDictionary:
aTargetDictionary[aKey].append(aSourceDictionary[aKey])

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Is there something simple like a set for un-hashable objects? - python

If the objects can be ordered, you can use itertools.groupby to eliminate the duplicates: >>> a = {'test': 1, 'key': 1, 'other': 2} >>> b = [k for k, it in itertools.groupby(sorted(a.values()))] >>> print(b) [1, 2]

Related

How to reset value of multiple dictionaries elegantly in python

How add() on set can work in such dictionary? [duplicate]

Convert a list of dictionaries into a set of dictionaries

Shortest way to get first item of `OrderedDict` in Python 3

Python dictionary : TypeError: unhashable type: 'list'

Categories

Resources