Let's say I have a custom data structure built from primitive dicts that I need to serialize using JSON. My structure is as follows:
path_list_dict = {(node1, node2 .. nodeN): (float1, float2, float3)}
So this is keyed by a tuple, and the value is a tuple of three floats. Each node element in the key is a custom class object with a __str__ method written for it. The wrapper dict that identifies each dict entry in path_list_dict with a key is as follows:
path_options_dict = {'Path1': {(node1, node2 .. nodeN): (float1, float2, float3)}, 'Path2': {(nodeA1, nodeA2 .. nodeAN): (floatA1, floatA2, floatA3)}}
and so on.
When I try to serialize this using JSON, I of course run into a TypeError, because the inner dict has tuples as keys and JSON requires dict keys to be strings. That part is easy to work around: I can insert the str(tuple) representation as the key instead of the native tuple.
What I am concerned about is that when I receive it and unpack the values, I am going to have all strings at the receiving end. The key tuple of the inner dict, which consists of custom class elements, is now represented as a str. Will I be able to recover the embedded data? Or is there some other, better way to do this?
For more clarity, I am using this JSON tutorial as reference.
You have several options:
Serialize with a custom key prefix that you can pick out and unserialize again:
tuple_key = '__tuple__({})'.format(','.join(key))
would produce '__tuple__(node1,node2,nodeN)' as a key, which you could parse back into a tuple on the other side:
if key.startswith('__tuple__('):
    key = tuple(key[10:-1].split(','))
Demo:
>>> key = ('node1', 'node2', 'node3')
>>> '__tuple__({})'.format(','.join(key))
'__tuple__(node1,node2,node3)'
>>> mapped_key = '__tuple__({})'.format(','.join(key))
>>> tuple(mapped_key[10:-1].split(','))
('node1', 'node2', 'node3')
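Note that ','.join(key) assumes the key elements are already strings. With custom node objects you would first map them through str() and need some way to rebuild the nodes from those names on the other side. A rough sketch, assuming a hypothetical Node class that can be reconstructed from its name alone:

class Node:
    def __init__(self, name):
        self.name = name
    def __str__(self):
        return self.name

def encode_key(nodes):
    # build the '__tuple__(...)' marker from the nodes' string forms
    return '__tuple__({})'.format(','.join(str(n) for n in nodes))

def decode_key(key):
    # strip the marker and rebuild Node objects from the names
    return tuple(Node(name) for name in key[10:-1].split(','))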
Don't use dictionaries, use a list of lists:
{'Path': [[[node1, node2 .. nodeN], [float1, float2, float3]], [...]]}
You can build such a list simply from the dict.items() result:
>>> import json
>>> json.dumps(list({(1, 2, 3): ('foo', 'bar')}.items()))
'[[[1, 2, 3], ["foo", "bar"]]]'
and when decoding, feed the whole thing back into dict() while mapping each key-value list to tuples:
>>> dict(map(tuple, kv) for kv in json.loads('[[[1, 2, 3], ["foo", "bar"]]]'))
{(1, 2, 3): ('foo', 'bar')}
The latter approach is more suitable for custom classes as well, since the JSONEncoder.default() method will still be handed those custom objects for you to serialize into a suitable dictionary representation, which in turn gives an object_hook passed to JSONDecoder() a chance to return fully deserialized custom objects for those.
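For illustration, a minimal sketch of that encoder/decoder pairing, assuming a hypothetical Node class that can be rebuilt from its name alone (a real class may carry more state):

import json

class Node:
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return 'Node({!r})'.format(self.name)

def encode_node(obj):
    # json.dumps() calls this for objects it cannot serialize natively
    if isinstance(obj, Node):
        return {'__node__': obj.name}
    raise TypeError('Cannot serialize {!r}'.format(obj))

def decode_node(d):
    # json.loads() calls this for every decoded JSON object
    if '__node__' in d:
        return Node(d['__node__'])
    return d

path_list_dict = {(Node('n1'), Node('n2')): (0.1, 0.2, 0.3)}
dumped = json.dumps(list(path_list_dict.items()), default=encode_node)
restored = dict((tuple(k), tuple(v))
                for k, v in json.loads(dumped, object_hook=decode_node))
# restored maps a tuple of (new) Node objects to the tuple of floats again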
I am trying to create a multi-level dict with variable depth, where the leaf values can be of list or int type.
The data structure looks like this:
A
--B1
-----C1=1
-----C2=[1]
--B2=[3]
D
--E
----F
------G=4
In the above data structure, the leaf value can be an int or a list.
If the data structure had only ints as leaf values, this could easily be achieved with the code below:
from collections import defaultdict
f = lambda: defaultdict(f)
d = f()
d['A']['B1']['C1'] = 1
But since the leaf value can be either a list or an int, it becomes a bit problematic for me.
Now, data can be inserted into a list in two ways:
d['A']['B1']['C2'] = [1]
d['A']['B1']['C2'].append([2])
But when I use only the append method, it raises an error.
The error is:
AttributeError: 'collections.defaultdict' object has no attribute 'append'
So is there any way to use only the append method for a list?
There's no way you can use your current defaultdict-based structure to make d['A']['B1']['C2'].append(1) work properly if the 'C2' key doesn't already exist, since the data structure can't tell that the unknown key should correspond to a list rather than another layer of dictionary. It doesn't know what method you're going to call on the value it returns, so it can't know it shouldn't return a dictionary (as it did when it first looked up 'A' and 'B1').
This isn't an issue for bare integers, since for those you're assigning directly to a new key (and all the earlier levels are dictionaries). When you're assigning, the data structure isn't creating the value, you are, so you can use any type you want.
Now, if your keys are distinctive in some way, so that given a key like 'C2' you can know for sure that it should correspond to a list, you may have a chance. You can write your own dict subclass, defining a __missing__ method to handle lookups of keys that don't exist yet in your own special way:
class Tree(dict):
    def __missing__(self, key):
        if key_corresponds_to_list(key):  # magic from somewhere
            result = self[key] = []
        else:
            result = self[key] = Tree()
        return result
    # you might also want a custom __repr__
Here's an example run with a magic key function that makes any even-length key default to a list, while an odd-length key defaults to a dict:
>>> def key_corresponds_to_list(key):
...     return len(key) % 2 == 0
>>> t = Tree()
>>> t["A"]["B"]["C2"].append(1)  # C2 defaults to a list because its length is even
>>> t
{'A': {'B': {'C2': [1]}}}
>>> t["A"]["B"]["C10"]["D"] = 2  # C10 is another layer of dict, since its length is odd
>>> t
{'A': {'B': {'C2': [1], 'C10': {'D': 2}}}}  # it didn't matter what length D was, though
You probably won't actually want to use a global function to control the class like this; I just did that as an example. If you go with this approach, I'd suggest putting the logic directly into the __missing__ method (or passing a function in as a parameter, like defaultdict does with its factory function); a sketch of that parameterized variation follows.
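A minimal sketch of that variation, assuming you want the same even/odd-length rule as the example above (the leaf_predicate name is purely illustrative):

class Tree(dict):
    def __init__(self, leaf_predicate):
        super().__init__()
        self.leaf_predicate = leaf_predicate  # decides which missing keys become lists

    def __missing__(self, key):
        if self.leaf_predicate(key):
            result = self[key] = []
        else:
            result = self[key] = Tree(self.leaf_predicate)
        return result

t = Tree(lambda key: len(key) % 2 == 0)
t["A"]["B"]["C2"].append(1)
print(t)  # {'A': {'B': {'C2': [1]}}}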
For hashable values stored in a dict, I can easily pare down duplicates using a set. For example:
a = {'test': 1, 'key': 1, 'other': 2}
b = set(a.values())
print(b)
This would display {1, 2}.
The problem I have is that I am using a dict to store the mapping between variable keys in __dict__ and the corresponding processing functions that will be passed to an engine to order and process those functions; some of these functions may be fast, some may be slower due to accessing an API. Since each function may use multiple variables, it therefore needs multiple mappings in the dict. I'm wondering if there is a way to do this, or if I am stuck writing my own solution?
Ended up building a callable class, since caching could speed things up for me:
from collections.abc import Callable

class RemoveDuplicates(Callable):
    # note: these class-level caches are shared between all instances
    input_cache = []
    output_cache = []

    def __call__(self, in_list):
        if in_list in self.input_cache:
            # we've seen this exact list before; return the cached result
            idx = self.input_cache.index(in_list)
            return self.output_cache[idx]
        else:
            self.input_cache.append(in_list)
            out_list = self._remove_duplicates(in_list)
            self.output_cache.append(out_list)
            return out_list

    def _remove_duplicates(self, src_list):
        result = []
        for item in src_list:
            if item not in result:
                result.append(item)
        return result
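For example, a quick usage sketch with unhashable values (the data here is purely illustrative):

dedupe = RemoveDuplicates()
values = [{'a': 1}, {'b': 2}, {'a': 1}]  # dicts are unhashable, so set() would not work here
print(dedupe(values))  # [{'a': 1}, {'b': 2}]
print(dedupe(values))  # same input again, answered from the cache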
If the objects can be ordered, you can use itertools.groupby to eliminate the duplicates:
>>> import itertools
>>> a = {'test': 1, 'key': 1, 'other': 2}
>>> b = [k for k, it in itertools.groupby(sorted(a.values()))]
>>> print(b)
[1, 2]
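Since the question is really about unhashable values, the same approach also works with, for example, list values, as long as they can be ordered:

>>> a = {'test': [1], 'key': [1], 'other': [2]}
>>> [k for k, it in itertools.groupby(sorted(a.values()))]
[[1], [2]]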
Is there something simple like a set for un-hashable objects?
Not in the standard library, but you can look beyond it for a BTree implementation of a dictionary. I googled and found a few hits, of which the first one (BTree) seems promising and interesting.
Quoting from the wiki:
The BTree-based data structures differ from Python dicts in several
fundamental ways. One of the most important is that while dicts
require that keys support hash codes and equality comparison, the
BTree-based structures don’t use hash codes and require a total
ordering on keys.
Of course, it's a trivial fact that a set can be implemented as a dictionary where the value is unused.
You could (indirectly) use the bisect module to create a sorted collection of your values, which would greatly speed up the insertion of new values and value-membership testing in general; together these can be used to ensure that only unique values get put into it.
In the code below, I've used un-hashable set values for the sake of illustration.
# see http://code.activestate.com/recipes/577197-sortedcollection
from sortedcollection import SortedCollection

a = {'test': {1}, 'key': {1}, 'other': {2}}
sc = SortedCollection()
for value in a.values():
    if value not in sc:
        sc.insert(value)
print(list(sc))  # --> [{1}, {2}]
I am currently writing a script that extracts data from an xml and writes it into an html file for easy viewing on a webpage.
Each piece of data has 2 pieces of "sub data": Owner and Type.
In order for the html to work properly, I need the "owner" string and the "type" string to be written in the correct place. If it were just a single piece of data, I would use a dictionary, use the data name as the key, and write the value to the html; however, there are 2 pieces of data.
My question is, can a dictionary have 2 values (in my case owner and type) assigned to a single key?
Any object can be a value in a dictionary, so you can use any collection to hold more than one value against the same key. To expand my comments into some code samples, in order of increasing complexity (and, in my opinion, readability):
Tuple
The simplest option is a two-tuple of strings, which you can access by index:
>>> d1 = {'key': ('owner', 'type')}
>>> d1['key'][0]
'owner'
>>> d1['key'][1]
'type'
Dictionary
Next up is a sub-dictionary, which allows you to access the values by key name:
>>> d2 = {'key': {'owner': 'owner', 'type': 'type'}}
>>> d2['key']['owner']
'owner'
>>> d2['key']['type']
'type'
Named tuple
Finally the collections module provides namedtuple, which requires a little setup but then allows you to access the values by attribute name:
>>> from collections import namedtuple
>>> MyTuple = namedtuple('MyTuple', ('owner', 'type'))
>>> d3 = {'key': MyTuple('owner', 'type')}
>>> d3['key'].owner
'owner'
>>> d3['key'].type
'type'
Using named keys/attributes makes your subsequent access to the values clearer (d3['key'].owner and d2['key']['owner'] are less ambiguous than d1['key'][0]).
As long as keys are hashable, you can have keys of any format. Note that tuples are hashable, so that would be a possible solution to your problem.
Make a tuple of case-owner and type and use it as a key to your dictionary.
Note that, in general, all objects that are hashable should also be immutable, but not vice versa.
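For example, a minimal sketch of the tuple-as-key approach (the names here are purely illustrative):

data = {}
data[('alice', 'typeA')] = 'some value'  # key is an (owner, type) tuple
print(data[('alice', 'typeA')])          # 'some value'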
I'm trying to serialize my Python objects into JSON using json.dumps. If you serialize a dict using json.dumps it will obviously be serialized as a JSON dictionary {..}; if you serialize a list or a tuple, it will be a JSON array.
I want to know if there's any way to easily serialize a Python dict as a JSON list, if possible. By "if possible", I mean if the keys start at 0 and are sequential, for example:
{0: 'data', 1: 'data', 2: 'data'}
The above would be serialized into JSON as '{"0": "data", "1": "data", "2": "data"}', but I would like it to be serialized as ['data', 'data', 'data'] since the keys start at 0 and are sequential.
My reasoning for this is that I have lots of JSON data serialized from PHP. In PHP, arrays have keys, and if the keys are sequential as described above, PHP's json_encode produces arrays; if they are keyed in any other manner, they are serialized as JSON dictionaries. I want my JSON serializations to match for both my PHP and Python code. Unfortunately, changing the PHP code isn't an option in my case.
Any suggestions? The only solution I have found is to write my own function to go through and verify each python dictionary and see if it can first be converted to a list before json.dumps.
EDIT: This object that I'm serializing could be a list or a dict, as well, it could have additional dicts inside of it, and lists, and so on (nesting). I'm wondering if there's any 'simple' way to do this, otherwise I believe I can write a recursive solution myself. But it's always better to use existing code to avoid more bugs.
I don't know of a solution without recursion... Although you can call your converter from inside the encode method of your custom Encoder, it would just add unnecessary complexity.
In [1]: import json
In [2]: d = {"0": "data0", "1": "data1", "2": {"0": "data0", "1": "data1", "2": "data2"}}
In [3]: def convert(obj):
...: if isinstance(obj, (list, tuple)):
...: return [convert(i) for i in obj]
...: elif isinstance(obj, dict):
...: _, values = zip(*sorted(obj.items()))
...: return convert(values)
...: return obj
In [4]: json.dumps(convert(d))
Out[4]: '["data0", "data1", ["data0", "data1", "data2"]]'
You could convert the dictionary into a list of tuples and then sort it, as dictionary items won't necessarily come out in the order that you want them to:
items = sorted(d.items(), key=lambda item: item[0])
values = [item[1] for item in items]
json_dict = json.dumps(values)
Normally you could subclass json.JSONEncoder to create your own custom JSON serializer, but that won't allow you to override built-in object types.
If you create your own custom dictlist object (or whatever you want to call it) that doesn't extend dict or list you should be able to override the JSONEncoder.default method to create your own custom JSON serializer.
Regardless of whether you create a custom JSON serializer or recursively replace your special dict instances with lists you will need a function that accepts a dict and returns either a list or a dict as appropriate.
Here's one implementation:
def convert_to_list(obj):
    obj_list = []
    for i in range(len(obj)):
        if i not in obj:
            return obj  # Return original dict if not an ordered list
        obj_list.append(obj[i])
    return obj_list
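A rough sketch of how that helper could then be applied recursively before dumping, building on the convert_to_list defined above (this is just one way to wire it up):

import json

def convert_tree(obj):
    # walk nested dicts/lists, replacing sequentially keyed dicts with lists
    if isinstance(obj, dict):
        converted = convert_to_list(obj)  # helper defined above
        if isinstance(converted, list):
            return [convert_tree(v) for v in converted]
        return {k: convert_tree(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [convert_tree(v) for v in obj]
    return obj

data = {0: 'data', 1: {0: 'x', 'name': 'y'}, 2: 'data'}
print(json.dumps(convert_tree(data)))  # ["data", {"0": "x", "name": "y"}, "data"]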
I'm having trouble populating a Python dictionary starting from another dictionary.
Let's assume that the "source" dictionary has strings as keys and a list of custom objects as each value.
I'm creating my target dictionary exactly the way I created my "source" dictionary; how is it possible this is not working?
I get
TypeError: unhashable type: 'list'
Code:
aTargetDictionary = {}
for aKey in aSourceDictionary:
    aTargetDictionary[aKey] = []
    aTargetDictionary[aKey].extend(aSourceDictionary[aKey])
The error is on this line: aTargetDictionary[aKey] = []
The error you get is due to the fact that in Python, dictionary keys must be immutable types (if a key could change, there would be problems), and list is a mutable type.
Your error says that you are trying to use a list as a dictionary key; you'll have to change your lists into tuples if you want to use them as keys in your dictionary.
According to the Python docs:
The only types of values not acceptable as keys are values containing
lists or dictionaries or other mutable types that are compared by
value rather than by object identity, the reason being that the
efficient implementation of dictionaries requires a key’s hash value
to remain constant
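For instance, a quick illustration of the difference:

d = {}
d[('a', 'b')] = [1, 2]     # tuple key: fine, tuples are hashable
# d[['a', 'b']] = [1, 2]   # a list key would raise TypeError: unhashable type: 'list'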
This is indeed rather odd.
If aSourceDictionary were a dictionary, I don't believe it is possible for your code to fail in the manner you describe.
This leads to two hypotheses:
The code you're actually running is not identical to the code in your question (perhaps an earlier or later version?)
aSourceDictionary is in fact not a dictionary, but is some other structure (for example, a list).
As per your description, things don't add up. If aSourceDictionary is a dictionary, then your for loop has to work properly.
>>> source = {'a': [1, 2], 'b': [2, 3]}
>>> target = {}
>>> for key in source:
... target[key] = []
... target[key].extend(source[key])
...
>>> target
{'a': [1, 2], 'b': [2, 3]}
>>>
It works fine: http://codepad.org/5KgO0b1G. Your aSourceDictionary variable may have some datatype other than dict.
aSourceDictionary = {'abc': [1, 2, 3], 'ccd': [4, 5]}
aTargetDictionary = {}
for aKey in aSourceDictionary:
    aTargetDictionary[aKey] = []
    aTargetDictionary[aKey].extend(aSourceDictionary[aKey])
print(aTargetDictionary)
You can also use defaultdict to address this situation. It goes something like this:
from collections import defaultdict

# initialise the dictionary so that missing keys default to a list
aTargetDictionary = defaultdict(list)
for aKey in aSourceDictionary:
    aTargetDictionary[aKey].extend(aSourceDictionary[aKey])
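For example, with the sample aSourceDictionary used earlier, the loop above produces the same mapping; note that a defaultdict prints with its own repr, so convert it with dict() if you want a plain dict:

print(aTargetDictionary)        # defaultdict(<class 'list'>, {'abc': [1, 2, 3], 'ccd': [4, 5]})
print(dict(aTargetDictionary))  # {'abc': [1, 2, 3], 'ccd': [4, 5]}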