Multiple Values for a single key in a dictionary - python

I am currently writing a script that extracts data from an xml and writes it into an html file for easy viewing on a webpage.
Each piece of data has 2 pieces of "sub data": Owner and Type.
In order for the html to work properly I need the "owner" string and the "type" string to be written in the correct place. If it was just a single piece of data then I would use dictionaries and just use the data name as the key and then write the value to html, however there are 2 pieces of data.
My question is, can a dictionary have 2 values (in my case owner and type) assigned to a single key?

Any object can be a value in a dictionary, so you can use any collection to hold more than one value against the same key. To expand my comments into some code samples, in order of increasing complexity (and, in my opinion, readability):
Tuple
The simplest option is a two-tuple of strings, which you can access by index:
>>> d1 = {'key': ('owner', 'type')}
>>> d1['key'][0]
'owner'
>>> d1['key'][1]
'type'
Dictionary
Next up is a sub-dictionary, which allows you to access the values by key name:
>>> d2 = {'key': {'owner': 'owner', 'type': 'type'}}
>>> d2['key']['owner']
'owner'
>>> d2['key']['type']
'type'
Named tuple
Finally the collections module provides namedtuple, which requires a little setup but then allows you to access the values by attribute name:
>>> from collections import namedtuple
>>> MyTuple = namedtuple('MyTuple', ('owner', 'type'))
>>> d3 = {'key': MyTuple('owner', 'type')}
>>> d3['key'].owner
'owner'
>>> d3['key'].type
'type'
Using named keys/attributes makes your subsequent access to the values clearer (d3['key'].owner and d2['key']['owner'] are less ambiguous than d1['key'][0]).
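Applied to the case in the question, a minimal sketch might look like the following. The Entry type, the sample records and the table markup are all made up for illustration; the real values would come from whatever is parsed out of the XML.

```python
from collections import namedtuple
from html import escape

# Hypothetical record type; the field names mirror the question's "owner" and "type".
Entry = namedtuple('Entry', ('owner', 'type'))

# Made-up sample data standing in for the values extracted from the XML.
data = {
    'report.doc': Entry('alice', 'document'),
    'build.log': Entry('bob', 'log'),
}

# One table row per key, with each of the two values written into its own cell.
rows = [
    '<tr><td>{}</td><td>{}</td><td>{}</td></tr>'.format(
        escape(name), escape(entry.owner), escape(entry.type))
    for name, entry in sorted(data.items())
]
html_table = '<table>\n' + '\n'.join(rows) + '\n</table>'
print(html_table)
```

Because each value carries named fields, the code that writes the HTML never has to remember which index holds the owner and which holds the type.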

As long as keys are hashable, you can have keys of any format. Note that tuples are hashable (provided their elements are), so that would be a possible solution to your problem:
make a tuple of owner and type and use it as a key to your dictionary.
Note that, generally, all hashable objects should also be immutable, but not vice versa.
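A rough sketch of what tuple keys look like in practice. Note that this inverts the lookup compared with the question (the owner/type pair becomes the key, not the value), and the names used here are invented for illustration.

```python
# Hypothetical data: each (owner, type) pair is the key, the data name is the value.
by_owner_and_type = {
    ('alice', 'document'): 'report.doc',
    ('bob', 'log'): 'build.log',
}
print(by_owner_and_type[('alice', 'document')])

# A list is mutable and therefore unhashable, so it is rejected as a key.
try:
    by_owner_and_type[['carol', 'image']] = 'photo.png'
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```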

Related

Is there something simple like a set for un-hashable objects?

For hashable objects inside a dict I could easily pare down duplicate values stored in a dict using a set. For example:
a = {'test': 1, 'key': 1, 'other': 2}
b = set(a.values())
print(b)
Would display {1, 2}
The problem I have is that I am using a dict to store the mapping between variable keys in __dict__ and the corresponding processing functions, which are passed to an engine that orders and processes them. Some of these functions may be fast and some slower because they access an API. Each function may use multiple variables and therefore needs multiple mappings in the dict. I'm wondering if there is a way to do this or if I am stuck writing my own solution?
Ended up building a callable class, since caching could speed things up for me:
from collections.abc import Callable
class RemoveDuplicates(Callable):
    # Class-level caches, shared by every instance.
    input_cache = []
    output_cache = []
    def __call__(self, in_list):
        # Return the cached result if this input has been seen before.
        if in_list in self.input_cache:
            idx = self.input_cache.index(in_list)
            return self.output_cache[idx]
        else:
            self.input_cache.append(in_list)
            out_list = self._remove_duplicates(in_list)
            self.output_cache.append(out_list)
            return out_list
    def _remove_duplicates(self, src_list):
        # Order-preserving de-duplication that relies only on equality, not hashing.
        result = []
        for item in src_list:
            if item not in result:
                result.append(item)
        return result
If the objects can be ordered, you can use itertools.groupby to eliminate the duplicates:
>>> import itertools
>>> a = {'test': 1, 'key': 1, 'other': 2}
>>> b = [k for k, it in itertools.groupby(sorted(a.values()))]
>>> print(b)
[1, 2]
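The same idea with values that really are unhashable, as a small sketch. Lists are used here because they cannot go into a set but can still be sorted; the sample dictionary is invented.

```python
import itertools

# Lists are unhashable, so set(a.values()) would raise TypeError,
# but they can be sorted, which is all groupby needs here.
a = {'test': [1, 2], 'key': [1, 2], 'other': [3]}

# Sorting puts equal values next to each other; groupby then yields one key per run.
unique = [k for k, _ in itertools.groupby(sorted(a.values()))]
print(unique)
```

This costs a sort (O(n log n)) and requires that every pair of values can be compared with <.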
Is there something simple like a set for un-hashable objects?
Not in the standard library, but you can look beyond it for a BTree-based implementation of a dictionary. I googled and found a few hits, of which the first one (BTree) seems promising and interesting.
Quoting from the wiki
The BTree-based data structures differ from Python dicts in several
fundamental ways. One of the most important is that while dicts
require that keys support hash codes and equality comparison, the
BTree-based structures don’t use hash codes and require a total
ordering on keys.
Of course, it is a trivial fact that a set can be implemented as a dictionary whose values are unused.
You could (indirectly) use the bisect module to create a sorted collection of your values, which would greatly speed up the insertion of new values and value membership testing in general; together these can be used to ensure that only unique values get put into it.
In the code below, I've used un-hashable set values for the sake of illustration.
# see http://code.activestate.com/recipes/577197-sortedcollection
from sortedcollection import SortedCollection
a = {'test': {1}, 'key': {1}, 'other': {2}}
sc = SortedCollection()
for value in a.values():
    if value not in sc:
        sc.insert(value)
print(list(sc)) # --> [{1}, {2}]
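If you would rather not depend on the recipe, a similar effect can be sketched with the standard-library bisect module directly. The helper name insert_unique is made up, and the example uses lists rather than sets, because < on sets means "proper subset" and is not a total order.

```python
import bisect

def insert_unique(sorted_values, value):
    """Insert value into the sorted list unless an equal item is already present."""
    idx = bisect.bisect_left(sorted_values, value)
    if idx < len(sorted_values) and sorted_values[idx] == value:
        return False  # duplicate, nothing inserted
    sorted_values.insert(idx, value)
    return True

a = {'test': [1], 'key': [1], 'other': [2]}
unique = []
for value in a.values():
    insert_unique(unique, value)
print(unique)
```

The membership test is a binary search, so it takes O(log n) comparisons, although list.insert itself still has to shift elements.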

JSON encoding for dict of dicts

Let us say I have a custom data structure composed of primitive dicts. I need to serialize this using JSON. My structure is as follows:
path_list_dict = {(node1, node2 .. nodeN): (float1, float2, float3)}
So this is keyed with a tuple and the value is a tuple of three values. Each node element in the key is a custom class object with a __str__ method written for it. The wrapper dict, which identifies each entry of path_list_dict with a key, is as follows:
path_options_dict = {'Path1': {(node1, node2 .. nodeN): (float1, float2, float3)}, 'Path2': {(nodeA1, nodeA2 .. nodeAN): (floatA1, floatA2, floatA3)}}
and so on.
When I try to serialize this using JSON, of course I run into a TypeError because the inner dict has tuples as keys and values and a dict needs to have keys as strings to be serialized. This can be easily taken care of for me by inserting into the dict as the str(tuple) representation instead of just the native tuple.
What I am concerned about is that when I receive it and unpack the values, I am going to have all strings at the receiving end. The key tuple of the inner dict that consists of custom class elements is now represented as a str. Will I be able to recover the embedded data? Or is there some other, better way to do this?
For more clarity, I am using this JSON tutorial as reference.
You have several options:
Serialize with a custom key prefix that you can pick out and unserialize again:
tuple_key = '__tuple__({})'.format(','.join(key))
would produce '__tuple__(node1,node2,nodeN)' as a key, which you could parse back into a tuple on the other side:
if key.startswith('__tuple__('):
    key = tuple(key[10:-1].split(','))
Demo:
>>> key = ('node1', 'node2', 'node3')
>>> '__tuple__({})'.format(','.join(key))
'__tuple__(node1,node2,node3)'
>>> mapped_key = '__tuple__({})'.format(','.join(key))
>>> tuple(mapped_key[10:-1].split(','))
('node1', 'node2', 'node3')
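Put together as a round trip, the prefix approach could be sketched like this. The helper names encode_key and decode_key are invented, and the sketch assumes that no element's string form contains a comma; the elements also come back as plain strings, not as the original objects.

```python
import json

PREFIX = '__tuple__('  # marker used in the keys above

def encode_key(key):
    # Assumes str() of each element contains no comma.
    return '{}{})'.format(PREFIX, ','.join(str(part) for part in key))

def decode_key(key):
    # Only keys carrying the marker are turned back into tuples.
    if key.startswith(PREFIX) and key.endswith(')'):
        return tuple(key[len(PREFIX):-1].split(','))
    return key

original = {('node1', 'node2'): (1.0, 2.5, 3.0)}
payload = json.dumps({encode_key(k): v for k, v in original.items()})
restored = {decode_key(k): tuple(v) for k, v in json.loads(payload).items()}
print(restored)
```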
Don't use dictionaries, use a list of lists:
{'Path': [[[node1, node2 .. nodeN], [float1, float2, float3]], [...]]}
You can build such a list simply from the dict.items() result:
>>> import json
>>> json.dumps(list({(1, 2, 3): ('foo', 'bar')}.items()))
'[[[1, 2, 3], ["foo", "bar"]]]'
and when decoding, feed the whole thing back into dict() while mapping each key-value list to tuples:
>>> dict(map(tuple, kv) for kv in json.loads('[[[1, 2, 3], ["foo", "bar"]]]'))
{(1, 2, 3): (u'foo', u'bar')}
The latter approach is also more suitable for custom classes: the JSONEncoder.default() method will still be handed these custom objects so that you can serialize them to a suitable dictionary object, which in turn gives a suitable object_hook passed to JSONDecoder() a chance to return fully deserialized custom objects again.
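A sketch of how the list-of-lists approach might combine with JSONEncoder.default() and object_hook. The Node class, the '__node__' marker and the helper names are assumptions made for illustration; the question's real node class will look different.

```python
import json

class Node:
    """Hypothetical stand-in for the question's custom node class."""
    def __init__(self, name):
        self.name = name
    def __eq__(self, other):
        return isinstance(other, Node) and self.name == other.name
    def __hash__(self):
        return hash(self.name)

class NodeEncoder(json.JSONEncoder):
    def default(self, obj):
        # Called only for objects json cannot serialize natively.
        if isinstance(obj, Node):
            return {'__node__': obj.name}
        return super().default(obj)

def node_hook(dct):
    # Called for every decoded JSON object; rebuild Node instances from the marker.
    if set(dct) == {'__node__'}:
        return Node(dct['__node__'])
    return dct

paths = {(Node('a'), Node('b')): (1.0, 2.0, 3.0)}
payload = json.dumps(list(paths.items()), cls=NodeEncoder)
decoded = json.loads(payload, object_hook=node_hook)
restored = {tuple(key): tuple(value) for key, value in decoded}
print(payload)
```

Since the keys travel as JSON arrays rather than as dictionary keys, the encoder gets to see each Node, which is what makes the full round trip possible.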

Check if key exists in dictionary. If not, append it

I have a large python dict created from json data and am creating a smaller dict from the large one. Some elements of the large dictionary have a key called 'details' and some elements don't. What I want to do is check if the key exists in each entry in the large dictionary and if not, append the key 'details' with the value 'No details available' to the new dictionary. I am putting some sample code below just as a demonstration. The LargeDict is much larger with many keys in my code, but I'm keeping it simple for clarity.
LargeDict = {'results':
[{'name':'john','age':'23','datestart':'12/07/08','department':'Finance','details':'Good Employee'},
{'name':'barry','age':'26','datestart':'25/08/10','department':'HR','details':'Also does payroll'},
{'name':'sarah','age':'32','datestart':'13/05/05','department':'Sales','details':'Due for promotion'},
{'name':'lisa','age':'21','datestart':'02/05/12','department':'Finance'}]}
This is how I am getting the data for the SmallDict:
SmallDict = {d['name']:{'department':d['department'],'details':d['details']} for d in LargeDict['results']}
I get a key error however when one of the large dict entries has no details. Am I right in saying I need to use the DefaultDict module or is there an easier way?
You don't need a collections.defaultdict. You can use the setdefault method of dictionary objects.
d = {}
bar = d.setdefault('foo', 'bar')  # returns 'bar'
print(bar)  # bar
print(d)  # {'foo': 'bar'}
As others have noted, if you don't want to add the key to the dictionary, you can use the get method.
Here's an old reference that I often find myself looking at.
You could use collections.defaultdict if you want to create an entry in your dict automatically. However, if you don't, and just want "Not available" (or whatever), then you can just assign to the dict as d[key] = v and use d.get(k, 'Not available') for a default value.
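The difference between the three options is mostly about side effects, which a short sketch can show. The record below is a cut-down version of the 'lisa' entry from the question.

```python
from collections import defaultdict

record = {'name': 'lisa', 'department': 'Finance'}

# get() returns the fallback without modifying the dictionary.
details = record.get('details', 'No details available')
print(details, 'details' in record)

# setdefault() returns the fallback and also stores it under the key.
record.setdefault('details', 'No details available')
print(record)

# A defaultdict creates the entry as a side effect of a plain lookup.
auto = defaultdict(lambda: 'No details available')
auto['details']
print(dict(auto))
```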
Use the get(key, defaultVar) method to supply a default value when the 'details' key is missing:
SmallDict = {d['name']:{'department':d['department'],'details':d.get('details','No details available')} for d in LargeDict['results']}

Python dictionary : TypeError: unhashable type: 'list'

I'm having troubles in populating a python dictionary starting from another dictionary.
Let's assume that the "source" dictionary has string as keys and has a list of custom objects per value.
I'm creating my target dictionary exactly as I have been creating my "source" dictionary how is it possible this is not working ?
I get
TypeError: unhashable type: 'list'
Code :
aTargetDictionary = {}
for aKey in aSourceDictionary:
    aTargetDictionary[aKey] = []
    aTargetDictionary[aKey].extend(aSourceDictionary[aKey])
The error is on this line : aTargetDictionary[aKey] = []
The error you gave arises because, in Python, dictionary keys must be hashable, which in practice means immutable built-in types (if a key could change, its hash would change and lookups would break), and list is a mutable type.
The error says that you are trying to use a list as a dictionary key; you will have to convert such lists to tuples if you want to use them as keys in your dictionary.
According to the python doc :
The only types of values not acceptable as keys are values containing
lists or dictionaries or other mutable types that are compared by
value rather than by object identity, the reason being that the
efficient implementation of dictionaries requires a key’s hash value
to remain constant
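A small demonstration of the rule quoted above; the variable names are arbitrary. Note that a tuple only helps if everything inside it is hashable too.

```python
d = {}
key = ['a', 'b']

# A list cannot be used as a key.
try:
    d[key] = 1
except TypeError as exc:
    list_error = str(exc)

# Converting it to a tuple makes it hashable.
d[tuple(key)] = 1

# A tuple that contains a list is still unhashable.
try:
    hash((1, [2, 3]))
except TypeError as exc:
    nested_error = str(exc)

print(d)
```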
This is indeed rather odd.
If aSourceDictionary were a dictionary, I don't believe it is possible for your code to fail in the manner you describe.
This leads to two hypotheses:
The code you're actually running is not identical to the code in your question (perhaps an earlier or later version?)
aSourceDictionary is in fact not a dictionary, but is some other structure (for example, a list).
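The second hypothesis is easy to reproduce. In this sketch the "dictionary" is assumed, purely for illustration, to be a list of lists; iterating over it yields lists, and using one of those as a key fails on exactly the line the question points at.

```python
# Assumption for illustration: the "dictionary" is really a list of lists.
aSourceDictionary = [['abc', 1, 2, 3], ['ccd', 4, 5]]
aTargetDictionary = {}
error = None
try:
    for aKey in aSourceDictionary:
        aTargetDictionary[aKey] = []  # aKey is a list here, so this raises
except TypeError as exc:
    error = str(exc)

print(type(aSourceDictionary).__name__, error)
```

Printing type(aSourceDictionary) in the real script is a quick way to check which hypothesis applies.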
As per your description, things don't add up. If aSourceDictionary is a dictionary, then your for loop has to work properly.
>>> source = {'a': [1, 2], 'b': [2, 3]}
>>> target = {}
>>> for key in source:
...     target[key] = []
...     target[key].extend(source[key])
...
>>> target
{'a': [1, 2], 'b': [2, 3]}
>>>
It works fine: http://codepad.org/5KgO0b1G
Your aSourceDictionary variable may have a datatype other than dict.
aSourceDictionary = {'abc': [1, 2, 3], 'ccd': [4, 5]}
aTargetDictionary = {}
for aKey in aSourceDictionary:
    aTargetDictionary[aKey] = []
    aTargetDictionary[aKey].extend(aSourceDictionary[aKey])
print(aTargetDictionary)
You can also use defaultdict to address this situation. It goes something like this:
from collections import defaultdict
# initialises the dictionary so that missing keys start as an empty list
aTargetDictionary = defaultdict(list)
for aKey in aSourceDictionary:
    # extend (rather than append) copies the items instead of nesting the list
    aTargetDictionary[aKey].extend(aSourceDictionary[aKey])

Python: How to access data from this type of list?

Quick Python question: How do I access data from a nested list like this:
{'album': [u'Rumours'], 'comment': [u'Track 3'], 'artist': [u'Fleetwood Mac'], 'title': [u'Never Going Back Again'], 'date': [u'1977'], 'genre': [u'Rock'], 'tracknumber': [u'03']}
I tried listname[0][0] but it returns the error:
AttributeError: 'int' object has no attribute 'lower'
So how would I go about doing this?
This is not a list, it is a dictionary. It takes a hashable (typically immutable) object as the key and any type as the value of each key-value pair. In your case this is a dictionary with str keys and lists as values. You must first extract the list from the dictionary, and then the first element from the list, assuming that is what you meant:
somedict = {"test": [u"spam"], "foo": [u"bar"]}
print(somedict["test"][0])
Please note that a dictionary is not type-bound and can mix types:
somedict = {1: "test", "foo": ["bar", "spam"]}
And some more information about dictionaries can be found here: http://docs.python.org/tutorial/datastructures.html#dictionaries
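For tag data shaped like the example in the question, a short sketch of the usual access patterns. The variable names are made up, and the dictionary is a shortened copy of the one in the question.

```python
tags = {
    'album': ['Rumours'],
    'artist': ['Fleetwood Mac'],
    'title': ['Never Going Back Again'],
    'tracknumber': ['03'],
}

# Key first, then position within the list stored under that key.
title = tags['title'][0]

# Collapse every one-element list to its first item.
flat = {key: values[0] for key, values in tags.items() if values}

# get() avoids a KeyError for tags that are absent.
composer = tags.get('composer', [None])[0]

print(title, flat['artist'], composer)
```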
This is not a list. This is a dictionary.
A dictionary is accessed by key rather than by position, and thus it cannot be accessed through a numeric index*.
You must refer to it like this: listname['album']
The above will return a list with one element (which happens to be a string): [u'Rumours']. To access the list, you index it as usual.
So altogether:
listname['album'][0]
# Will output the string inside the list.
Notice that the list could have more elements, so you would refer them like so [0],[1] etc.
Take a look at the docs for more information.
*You can do:
d = {2: "a", 1: "b"}
print(d[1])  # prints string b
What I meant is that you don't use zero-based indexes; you use keys that can be "whatever you want", and these keys refer to values.
