This question is similar to the one asked here:
However in my case I have a large dictionary filled with many keys and values, and I will not be able to enumerate each key to implement this solution:
from operator import itemgetter
params = {'a': 1, 'b': 2}
a, b = itemgetter('a', 'b')(params)
In my case:
from operator import itemgetter
params = {'a': 1, 'b': 2 ........... 'y': 25, 'z': 26 }
a, b ..... y, z = itemgetter('a', 'b'.... 'y', 'z')(params)
Is there any way I can unpack the dictionary and assign each value to a variable named after its key, on a large scale?
Please advise.
# Function
a = 1
b = 2
def foo(z):
    return a + b + z
foo(3)
# should return 6
Either declare the variables as input arguments and unpack the dictionary in the call:
def foo(z, a, b):
    return a + b + z
param = {'a': 1, 'b': 2}
foo(3, **param)
>> 6
Then you could also provide default values (for example 0) in case any input is missing.
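A minimal sketch of the default-value idea, assuming a missing a or b should count as 0. The **_ignored parameter is my own addition (not part of the answer above) so that a large dictionary with extra keys can be passed whole:

```python
def foo(z, a=0, b=0, **_ignored):
    # a and b fall back to 0 when the dictionary lacks them;
    # any other keys in the dictionary are collected in _ignored.
    return a + b + z

param = {'a': 1, 'b': 2, 'y': 25}
print(foo(3, **param))      # 6
print(foo(3, **{'a': 1}))   # 4
```

Without **_ignored, passing a dictionary that contains a key the function does not declare raises a TypeError.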
Another possibility is to provide the entire dictionary as input and let the function handle the unpacking.
def bar(z, p):
    return p['a'] + p['b'] + z
bar(3, param)
>> 6
I have the following situation:
>>> a = 1
>>> d = {"a": a}
>>> d
{'a': 1}
>>> d["a"] = 2
>>> d
{'a': 2}
>>> a
1
Of course this is the desired behaviour. However, when I assign 2 to the key "a" of the dictionary d, I would like to know if I can access the variable a instead of its value to modify the value of the variable a directly, accessing it through the dictionary. I.e. my expected last output would be
>>> a
2
Is there any way of doing this?
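I suppose you know the idea of mutable and immutable objects. Immutable objects (str, int, float, etc.) cannot be changed in place, so d["a"] = 2 can only rebind the dictionary entry to a different object; it never touches the variable a. That's why your desired behaviour can't work. When you do: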
a = 1
d["a"] = a
you just store the object that a currently refers to under the dict key 'a'. d['a'] knows nothing about the variable a. They both just point to the same object in memory.
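To make the difference concrete, here is a small sketch (the name box is just for illustration) contrasting rebinding a dictionary entry with mutating a shared object in place:

```python
a = 1
d = {"a": a}
# Both names refer to the same int object at this point.
assert d["a"] is a

d["a"] = 2  # rebinds the dictionary entry; the name a is untouched
print(a, d["a"])  # 1 2

# With a mutable object, both references see an in-place change.
box = [1]
d = {"a": box}
d["a"][0] = 2
print(box)  # [2]
```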
I don't know your case and why you need this behaviour, but you can try to use mutable object.
For example:
class A:
    def __init__(self, a: int):
        self._value = a

    @property
    def value(self) -> int:
        return self._value

    @value.setter
    def value(self, a: int):
        # you can put some additional logic for setting new value
        self._value = a

    def __int__(self) -> int:
        return self._value
And then you can use it in this way:
>>> a = A(1)
>>> d = {"a": a}
>>> d['a'].value
1
>>> d["a"].value = 2
>>> d['a'].value
2
>>> a.value
2
>>> int(a)
2
But it still seems like an overhead and you should rethink whether you really need this behaviour.
When you evaluate
>>> a
you are reading the value of the variable a that you set on the first line. You have not changed that variable, hence the output of 1. If you evaluated
>>> d["a"]
your output would be
2
If you want the variable a to have this value too, assign d["a"] to a.
Example-
>>> a = 1
>>> d = {"a": a}
>>> d
{'a': 1}
>>> d["a"] = 2
>>> d
{'a': 2}
>>> a = d["a"]
>>> a
2
I'm doing this switchboard thing in python where I need to keep track of who's talking to whom, so if Alice --> Bob, then that implies that Bob --> Alice.
Yes, I could populate two hash maps, but I'm wondering if anyone has an idea to do it with one.
Or suggest another data structure.
There are no multiple conversations. Let's say this is for a customer service call center, so when Alice dials into the switchboard, she's only going to talk to Bob. His replies also go only to her.
You can create your own dictionary type by subclassing dict and adding the logic that you want. Here's a basic example:
class TwoWayDict(dict):
    def __setitem__(self, key, value):
        # Remove any previous connections with these values
        if key in self:
            del self[key]
        if value in self:
            del self[value]
        dict.__setitem__(self, key, value)
        dict.__setitem__(self, value, key)

    def __delitem__(self, key):
        dict.__delitem__(self, self[key])
        dict.__delitem__(self, key)

    def __len__(self):
        """Returns the number of connections"""
        return dict.__len__(self) // 2
And it works like so:
>>> d = TwoWayDict()
>>> d['foo'] = 'bar'
>>> d['foo']
'bar'
>>> d['bar']
'foo'
>>> len(d)
1
>>> del d['foo']
>>> d['bar']
Traceback (most recent call last):
File "<stdin>", line 7, in <module>
KeyError: 'bar'
I'm sure I didn't cover all the cases, but that should get you started.
In your special case you can store both in one dictionary:
relation = {}
relation['Alice'] = 'Bob'
relation['Bob'] = 'Alice'
This works because what you are describing is a symmetric relationship: A -> B implies B -> A.
I know it's an older question, but I wanted to mention another great solution to this problem, namely the Python package bidict. It's extremely straightforward to use:
from bidict import bidict
map = bidict(Bob = "Alice")
print(map["Bob"])
print(map.inv["Alice"])
I would just populate a second hash, with
reverse_map = dict((reversed(item) for item in forward_map.items()))
Two hash maps is actually probably the fastest-performing solution, assuming you can spare the memory. I would wrap those in a single class; the burden on the programmer is in ensuring that the two hash maps stay in sync.
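A rough sketch of such a wrapper class, just to illustrate the idea. The name BiMap and its methods (put, get, get_key) are made up for this example and do not come from any library:

```python
class BiMap:
    """Hypothetical wrapper that keeps a forward and a reverse dict in sync."""

    def __init__(self):
        self._fwd = {}
        self._rev = {}

    def put(self, key, value):
        # Remove any existing pair that already uses this key or this value,
        # so the two dicts can never disagree.
        if key in self._fwd:
            del self._rev[self._fwd.pop(key)]
        if value in self._rev:
            del self._fwd[self._rev.pop(value)]
        self._fwd[key] = value
        self._rev[value] = key

    def get(self, key):
        return self._fwd[key]

    def get_key(self, value):
        return self._rev[value]


m = BiMap()
m.put("Alice", "Bob")
print(m.get("Alice"))    # Bob
print(m.get_key("Bob"))  # Alice
```

All updates go through put, which is the only place where the two dictionaries are modified; that is what keeps them consistent.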
A less verbose way, still using reversed:
dict(map(reversed, my_dict.items()))
You have two separate issues.
You have a "Conversation" object. It refers to two Persons. Since a Person can have multiple conversations, you have a many-to-many relationship.
You have a Map from Person to a list of Conversations. A Conversation will have a pair of Persons.
Do something like this
from collections import defaultdict
switchboard= defaultdict( list )
x = Conversation( "Alice", "Bob" )
y = Conversation( "Alice", "Charlie" )
for c in ( x, y ):
    switchboard[c.p1].append( c )
    switchboard[c.p2].append( c )
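The snippet above does not define Conversation. A minimal runnable version, assuming (my assumption, not the answer's) that a Conversation is simply a namedtuple with fields p1 and p2:

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the Conversation class used above.
Conversation = namedtuple("Conversation", ["p1", "p2"])

switchboard = defaultdict(list)
x = Conversation("Alice", "Bob")
y = Conversation("Alice", "Charlie")
for c in (x, y):
    # Register the conversation under both participants.
    switchboard[c.p1].append(c)
    switchboard[c.p2].append(c)

print(len(switchboard["Alice"]))  # 2
print(switchboard["Bob"])         # [Conversation(p1='Alice', p2='Bob')]
```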
No, there is really no way to do this without creating two dictionaries. How would it be possible to implement this with just one dictionary while continuing to offer comparable performance?
You are better off creating a custom type that encapsulates two dictionaries and exposes the functionality you want.
You may be able to use a DoubleDict as shown in recipe 578224 on the Python Cookbook.
Another possible solution is to implement a subclass of dict that holds the original dictionary and keeps track of a reversed version of it. Keeping two separate dicts can be useful if keys and values overlap.
class TwoWayDict(dict):
    def __init__(self, my_dict):
        dict.__init__(self, my_dict)
        # .items() replaces the Python 2-only .iteritems()
        self.rev_dict = {v: k for k, v in my_dict.items()}

    def __setitem__(self, key, value):
        dict.__setitem__(self, key, value)
        self.rev_dict.__setitem__(value, key)

    def pop(self, key):
        self.rev_dict.pop(self[key])
        dict.pop(self, key)

    # The above is just an idea; other methods
    # should also be overridden.
Example:
>>> d = {'a' : 1, 'b' : 2} # suppose we need to use d and its reversed version
>>> twd = TwoWayDict(d) # create a two-way dict
>>> twd
{'a': 1, 'b': 2}
>>> twd.rev_dict
{1: 'a', 2: 'b'}
>>> twd['a']
1
>>> twd.rev_dict[2]
'b'
>>> twd['c'] = 3 # we add to twd and reversed version also changes
>>> twd
{'a': 1, 'c': 3, 'b': 2}
>>> twd.rev_dict
{1: 'a', 2: 'b', 3: 'c'}
>>> twd.pop('a') # we pop elements from twd and reversed version changes
>>> twd
{'c': 3, 'b': 2}
>>> twd.rev_dict
{2: 'b', 3: 'c'}
There's the collections-extended library on pypi: https://pypi.python.org/pypi/collections-extended/0.6.0
Using the bijection class is as easy as:
RESPONSE_TYPES = bijection({
    0x03 : 'module_info',
    0x09 : 'network_status_response',
    0x10 : 'trust_center_device_update'
})
>>> RESPONSE_TYPES[0x03]
'module_info'
>>> RESPONSE_TYPES.inverse['network_status_response']
0x09
I like the suggestion of bidict in one of the comments.
pip install bidict
Usage:
from bidict import bidict

# This normalization method should save hugely as aDaD ~ yXyX have the same form of smallest grammar.
# To get back to your grammar's alphabet use trans
def normalize_string(s, nv=None):
    if nv is None:
        nv = ord('a')
    trans = bidict()
    r = ''
    for c in s:
        if c not in trans.inverse:
            a = chr(nv)
            nv += 1
            trans[a] = c
        else:
            a = trans.inverse[c]
        r += a
    return r, trans

def translate_string(s, trans):
    res = ''
    for c in s:
        res += trans[c]
    return res

if __name__ == "__main__":
    s = "bnhnbiodfjos"
    n, tr = normalize_string(s)
    print(n)
    print(tr)
    print(translate_string(n, tr))
There aren't many docs about it, but all the features I need from it work correctly.
Prints:
abcbadefghei
bidict({'a': 'b', 'b': 'n', 'c': 'h', 'd': 'i', 'e': 'o', 'f': 'd', 'g': 'f', 'h': 'j', 'i': 's'})
bnhnbiodfjos
The kjbuckets C extension module provides a "graph" data structure which I believe gives you what you want.
Here's one more two-way dictionary implementation, extending Python's dict class, in case you didn't like any of the other ones:
class DoubleD(dict):
    """ Access and delete dictionary elements by key or value. """

    def __getitem__(self, key):
        if key not in self:
            inv_dict = {v:k for k,v in self.items()}
            return inv_dict[key]
        return dict.__getitem__(self, key)

    def __delitem__(self, key):
        if key not in self:
            inv_dict = {v:k for k,v in self.items()}
            dict.__delitem__(self, inv_dict[key])
        else:
            dict.__delitem__(self, key)
Use it as a normal python dictionary except in construction:
dd = DoubleD()
dd['foo'] = 'bar'
A way I like to do this kind of thing is something like:
{my_dict[key]: key for key in my_dict.keys()}
I have a dictionary like:
d = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
which I would like to convert to a namedtuple.
My current approach is with the following code
namedTupleConstructor = namedtuple('myNamedTuple', ' '.join(sorted(d.keys())))
nt= namedTupleConstructor(**d)
which produces
myNamedTuple(a=1, b=2, c=3, d=4)
This works fine for me (I think), but am I missing a built-in such as...
nt = namedtuple.from_dict() ?
UPDATE: as discussed in the comments, my reason for wanting to convert my dictionary to a namedtuple is so that it becomes hashable, but still generally useable like a dict.
UPDATE2: 4 years after I've posted this question, TLK posts a new answer recommending using the dataclass decorator that I think is really great. I think that's now what I would use going forward.
To create the subclass, you may just pass the keys of a dict directly:
MyTuple = namedtuple('MyTuple', d)
Now to create tuple instances from this dict, or any other dict with matching keys:
my_tuple = MyTuple(**d)
Beware: namedtuples compare on values only (ordered). They are designed to be a drop-in replacement for regular tuples, with named attribute access as an added feature. The field names will not be considered when making equality comparisons. It may not be what you wanted nor expected from the namedtuple type! This differs from dict equality comparisons, which do take into account the keys and also compare order agnostic.
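A short illustration of that caveat. The types Point and Size are hypothetical names made up for this example:

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
Size = namedtuple("Size", ["width", "height"])

# Field names (and even the type) are ignored; only the ordered values count.
print(Point(1, 2) == Size(1, 2))            # True
print(Point(1, 2) == (1, 2))                # True
print(Point(x=1, y=2) == Point(x=2, y=1))   # False

# Dicts, by contrast, compare keys and values regardless of order.
print({"x": 1, "y": 2} == {"y": 2, "x": 1})  # True
```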
For readers who don't really need a type which is a subclass of tuple, there probably isn't much point to use a namedtuple in the first place. If you just want to use attribute access syntax on fields, it would be simpler and easier to create namespace objects instead:
>>> from types import SimpleNamespace
>>> SimpleNamespace(**d)
namespace(a=1, b=2, c=3, d=4)
my reason for wanting to convert my dictionary to a namedtuple is so that it becomes hashable, but still generally useable like a dict
For a hashable "attrdict" like recipe, check out a frozen box:
>>> from box import Box
>>> b = Box(d, frozen_box=True)
>>> hash(b)
7686694140185755210
>>> b.a
1
>>> b["a"]
1
>>> b["a"] = 2
BoxError: Box is frozen
There may also be a frozen mapping type coming in a later version of Python, watch this draft PEP for acceptance or rejection:
PEP 603 -- Adding a frozenmap type to collections
from collections import namedtuple
nt = namedtuple('x', d.keys())(*d.values())
If you want an easier approach, and you have the flexibility to use another approach other than namedtuple I would like to suggest using SimpleNamespace (docs).
from types import SimpleNamespace as sn
d = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
dd= sn(**d)
# dd.a>>1
# add new property
dd.s = 5
#dd.s>>5
PS: SimpleNamespace is a type, not a class
I'd like to recommend the dataclass for this type of situation. Similar to a namedtuple, but with more flexibility.
https://docs.python.org/3/library/dataclasses.html
from dataclasses import dataclass
@dataclass
class InventoryItem:
    """Class for keeping track of an item in inventory."""
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand
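For the dictionary from the question, one possible way to apply this is dataclasses.make_dataclass, which builds the class from the keys at runtime. This is only a sketch; the class name Params is made up, and frozen=True is used on the assumption that hashability is the goal:

```python
from dataclasses import make_dataclass

d = {'a': 1, 'b': 2, 'c': 3, 'd': 4}

# Hypothetical class name "Params"; the fields come from the dict keys.
# frozen=True (with the default eq=True) makes instances hashable.
Params = make_dataclass("Params", d.keys(), frozen=True)
p = Params(**d)

print(p)    # Params(a=1, b=2, c=3, d=4)
print(p.a)  # 1
lookup = {p: "usable as a dict key"}
```

Unlike a namedtuple, two dataclass instances of different classes never compare equal, even when their field values match.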
You can use this function to handle nested dictionaries:
from collections import OrderedDict, namedtuple

def create_namedtuple_from_dict(obj):
    if isinstance(obj, dict):
        fields = sorted(obj.keys())
        namedtuple_type = namedtuple(
            typename='GenericObject',
            field_names=fields,
            rename=True,
        )
        field_value_pairs = OrderedDict(
            (str(field), create_namedtuple_from_dict(obj[field]))
            for field in fields
        )
        try:
            return namedtuple_type(**field_value_pairs)
        except TypeError:
            # Cannot create namedtuple instance so fallback to dict (invalid attribute names)
            return dict(**field_value_pairs)
    elif isinstance(obj, (list, set, tuple, frozenset)):
        return [create_namedtuple_from_dict(item) for item in obj]
    else:
        return obj
Use the dictionary keys as the field names of the namedtuple:
d = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
def dict_to_namedtuple(d):
    return namedtuple('GenericDict', d.keys())(**d)
result=dict_to_namedtuple(d)
print(result)
output
GenericDict(a=1, b=2, c=3, d=4)
def toNametuple(dict_data):
    return namedtuple(
        "X", dict_data.keys()
    )(*tuple(map(lambda x: x if not isinstance(x, dict) else toNametuple(x), dict_data.values())))
d = {
'id': 1,
'name': {'firstName': 'Ritesh', 'lastName':'Dubey'},
'list_data': [1, 2],
}
obj = toNametuple(d)
Access fields as obj.name.firstName or obj.id.
This works for nested dictionaries whose values are of any data type.
I find the following 4-liner the most beautiful. It supports nested dictionaries as well.
def dict_to_namedtuple(typename, data):
    return namedtuple(typename, data.keys())(
        *(dict_to_namedtuple(typename + '_' + k, v) if isinstance(v, dict) else v for k, v in data.items())
    )
The output will look good also:
>>> nt = dict_to_namedtuple('config', {
... 'path': '/app',
... 'debug': {'level': 'error', 'stream': 'stdout'}
... })
>>> print(nt)
config(path='/app', debug=config_debug(level='error', stream='stdout'))
Check this out:
def fill_tuple(NamedTupleType, container):
    if container is None:
        args = [None] * len(NamedTupleType._fields)
        return NamedTupleType(*args)
    if isinstance(container, (list, tuple)):
        return NamedTupleType(*container)
    elif isinstance(container, dict):
        return NamedTupleType(**container)
    else:
        raise TypeError("Cannot create '{}' tuple out of {} ({}).".format(NamedTupleType.__name__, type(container).__name__, container))
Exceptions for incorrect names or an invalid argument count are raised by the namedtuple constructor itself.
Test with py.test:
import pytest
from collections import namedtuple

def test_fill_tuple():
    A = namedtuple("A", "aa, bb, cc")
    assert fill_tuple(A, None) == A(aa=None, bb=None, cc=None)
    assert fill_tuple(A, [None, None, None]) == A(aa=None, bb=None, cc=None)
    assert fill_tuple(A, [1, 2, 3]) == A(aa=1, bb=2, cc=3)
    assert fill_tuple(A, dict(aa=1, bb=2, cc=3)) == A(aa=1, bb=2, cc=3)
    with pytest.raises(TypeError) as e:
        fill_tuple(A, 2)
    # Checked outside the with block so it actually runs;
    # str(e.value) replaces the Python 2-only e.value.message.
    assert str(e.value) == "Cannot create 'A' tuple out of int (2)."
Although I like @fuggy_yama's answer, I had written my own function before reading it, so I leave it here just to show a different approach. It also handles nested namedtuples.
def dict2namedtuple(thedict, name):
    thenametuple = namedtuple(name, [])
    for key, val in thedict.items():
        if not isinstance(key, str):
            msg = 'dict keys must be strings not {}'
            raise ValueError(msg.format(key.__class__))
        if not isinstance(val, dict):
            setattr(thenametuple, key, val)
        else:
            newname = dict2namedtuple(val, key)
            setattr(thenametuple, key, newname)
    return thenametuple
TL;DR: How can you compare two python dictionaries if some of them have values which are unhashable/mutable (e.g. lists or pandas Dataframes)?
I have to compare dictionary pairs for equality. In that sense, this question is similar to these two, but their solutions only seem to work for immutable objects...
Is there a better way to compare dictionary values
Comparing two dictionaries in Python
My problem is that I'm dealing with pairs of highly nested dictionaries where the unhashable objects could be found in different places depending on which pair of dictionaries I'm comparing. My thinking is that I'll need to iterate across the deepest values contained in the dictionary and can't just rely on dict.iteritems(), which only unrolls the top-level key-value pairs. I'm not sure how to iterate across all the possible key-value pairs contained in the dictionary and compare them, using sets/== for the hashable objects and, in the case of pandas dataframes, running df1.equals(df2). (Note for pandas dataframes: just running df1==df2 does an element-wise comparison and NAs are poorly handled. df1.equals(df2) gets around that.)
So for example:
a = {'x': 1, 'y': {'z': "George", 'w': df1}}
b = {'x': 1, 'y': {'z': "George", 'w': df1}}
c = {'x': 1, 'y': {'z': "George", 'w': df2}}
At a minimum, and this would be pretty awesome already, the solution would yield TRUE/FALSE as to whether their values are the same and would work for pandas dataframes.
def dict_compare(d1, d2):
if ...
return True
elif ...
return False
dict_compare(a,b)
>>> True
dict_compare(a,c)
>>> False
Moderately better: the solution would point out what key/values would be different across the dictionaries.
In the ideal case: the solution could separate the values into 4 groupings:
added,
removed,
modified,
same
Well, there's a way to make any type comparable: Simply wrap it in a class that compares like you need it:
class DataFrameWrapper():
    def __init__(self, df):
        self.df = df

    def __eq__(self, other):
        return self.df.equals(other.df)
So when you wrap your "uncomparable" values you can now simply use ==:
>>> import pandas as pd
>>> df1 = pd.DataFrame({'a': [1,2,3]})
>>> df2 = pd.DataFrame({'a': [3,2,1]})
>>> a = {'x': 1, 'y': {'z': "George", 'w': DataFrameWrapper(df1)}}
>>> b = {'x': 1, 'y': {'z': "George", 'w': DataFrameWrapper(df1)}}
>>> c = {'x': 1, 'y': {'z': "George", 'w': DataFrameWrapper(df2)}}
>>> a == b
True
>>> a == c
False
Of course wrapping your values has its disadvantages, but if you only need to compare them that would be a very easy approach. All that may be needed is a recursive wrapping before doing the comparison and a recursive unwrapping afterwards:
def recursivewrap(dict_):
    for key, value in dict_.items():
        wrapper = wrappers.get(type(value), lambda x: x)  # for other types don't wrap
        dict_[key] = wrapper(value)
    return dict_  # return dict_ so this function can be used for recursion

def recursiveunwrap(dict_):
    for key, value in dict_.items():
        unwrapper = unwrappers.get(type(value), lambda x: x)
        dict_[key] = unwrapper(value)
    return dict_

wrappers = {pd.DataFrame: DataFrameWrapper,
            dict: recursivewrap}
unwrappers = {DataFrameWrapper: lambda x: x.df,
              dict: recursiveunwrap}
Sample case:
>>> recursivewrap(a)
{'x': 1,
'y': {'w': <__main__.DataFrameWrapper at 0x2affddcc048>, 'z': 'George'}}
>>> recursiveunwrap(recursivewrap(a))
{'x': 1, 'y': {'w': a
0 1
1 2
2 3, 'z': 'George'}}
If you feel really adventurous you could use wrapper classes that depending on the comparison result modify some variable that holds the information what wasn't equal.
This part of the answer was based on the original question that didn't include nestings:
You can separate the unhashable values from the hashable values and do a set comparison for the hashable values and an "order-independent" list comparison for the unhashables:
import pandas as pd  # needed for the default cmp argument below

def split_hashable_unhashable(vals):
    """Separate hashable values from unhashable ones and return a set (hashables)
    and a list (unhashable ones)"""
    set_ = set()
    list_ = []
    for val in vals:
        try:
            set_.add(val)
        except TypeError:  # unhashable
            list_.append(val)
    return set_, list_

def compare_lists_arbitary_order(l1, l2, cmp=pd.DataFrame.equals):
    """Compare two lists using a custom comparison function, the order of the
    elements is ignored."""
    # need to have equal lengths otherwise they can't be equal
    if len(l1) != len(l2):
        return False
    remaining_indices = set(range(len(l2)))
    for item in l1:
        for cmpidx in remaining_indices:
            if cmp(item, l2[cmpidx]):
                remaining_indices.remove(cmpidx)
                break
        else:
            # Run through the loop without finding a match
            return False
    return True

def dict_compare(d1, d2):
    if set(d1) != set(d2):  # compare the dictionary keys
        return False
    set1, list1 = split_hashable_unhashable(d1.values())
    set2, list2 = split_hashable_unhashable(d2.values())
    if set1 != set2:  # set comparison is easy
        return False
    return compare_lists_arbitary_order(list1, list2)
It got a bit longer than expected. For your test cases it definitely works:
>>> import pandas as pd
>>> df1 = pd.DataFrame({'a': [1,2,3]})
>>> df2 = pd.DataFrame({'a': [3,2,1]})
>>> a = {'x': 1, 'y': df1}
>>> b = {'y': 1, 'x': df1}
>>> c = {'y': 1, 'x': df2}
>>> dict_compare(a, b)
True
>>> dict_compare(a, c)
False
>>> dict_compare(b, c)
False
The set operations can also be used to find differences (see set.difference). It's a bit more complicated with the lists, but not really impossible. One could add the items where no match was found to a separate list instead of instantly returning False.
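For the nested case and the four groupings asked for in the question, a different technique is a plain recursive walk over the keys. The following is only a sketch: the names values_equal and dict_diff are made up, it assumes that any object with an equals method (such as a pandas DataFrame) should be compared through it, and it does not look inside lists.

```python
def values_equal(v1, v2):
    # Objects exposing .equals (pandas DataFrames, for instance) are compared
    # with it; everything else falls back to ==.
    if hasattr(v1, "equals"):
        return bool(v1.equals(v2))
    if hasattr(v2, "equals"):
        return bool(v2.equals(v1))
    return v1 == v2


def dict_diff(d1, d2, path=()):
    """Classify the leaf keys of two nested dicts; paths are tuples of keys."""
    result = {"added": [], "removed": [], "modified": [], "same": []}
    for key in d1.keys() | d2.keys():
        here = path + (key,)
        if key not in d1:
            result["added"].append(here)
        elif key not in d2:
            result["removed"].append(here)
        elif isinstance(d1[key], dict) and isinstance(d2[key], dict):
            # Recurse into nested dicts and merge the four groups.
            nested = dict_diff(d1[key], d2[key], here)
            for group, paths in nested.items():
                result[group].extend(paths)
        elif values_equal(d1[key], d2[key]):
            result["same"].append(here)
        else:
            result["modified"].append(here)
    return result


a = {'x': 1, 'y': {'z': "George", 'w': [1, 2]}}
c = {'x': 1, 'y': {'z': "George", 'w': [3]}}
print(dict_diff(a, c)["modified"])  # [('y', 'w')]
```

Two dictionaries are equal in this sense when the added, removed and modified lists are all empty. The order of paths within each list is not guaranteed, because the keys are iterated as a set.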
The DeepDiff library provides extensive support for diffing two Python dictionaries:
https://github.com/seperman/deepdiff
DeepDiff: Deep Difference of dictionaries, iterables, strings and other objects. It will recursively look for all the changes.
pip install deepdiff