How can i make a set of dictionaries from one list of dictionaries?
Example:
import copy
v1 = {'k01': 'v01', 'k02': {'k03': 'v03', 'k04': {'k05': 'v05'}}}
v2 = {'k11': 'v11', 'k12': {'k13': 'v13', 'k14': {'k15': 'v15'}}}
data = []
N = 5
for i in range(N):
data.append(copy.deepcopy(v1))
data.append(copy.deepcopy(v2))
print data
How would you create a set of dictionaries from the list data?
NS: One dictionary is equal to another when they are structurally the same. That means, they got exactly the same keys and same values (recursively)
A cheap workaround would be to serialize your dicts, for example:
import json
dset = set()
d1 = {'a':1, 'b':{'c':2}}
d2 = {'b':{'c':2}, 'a':1} # the same according to your definition
d3 = {'x': 42}
dset.add(json.dumps(d1, sort_keys=True))
dset.add(json.dumps(d2, sort_keys=True))
dset.add(json.dumps(d3, sort_keys=True))
for p in dset:
print json.loads(p)
In the long run it would make sense to wrap the whole thing in a class like SetOfDicts.
Dictionaries are mutable and therefore not hashable in python.
You could either create a dict-subclass with a __hash__ method. Make sure that the hash of a dictionary does not change while it is in the set (that probably means that you cannot allow modifying the members).
See http://code.activestate.com/recipes/414283-frozen-dictionaries/ for an example implementation of frozendicts.
If you can define a sort order on your (frozen) dictionaries, you could alternatively use a data structure based on a binary tree instead of a set. This boils down to the bisect solution provided in the link below.
See also https://stackoverflow.com/a/18824158/5069869 for an explanation why sets without hash do not make sense.
not exactly what you're looking for as this accounts for lists too but:
def hashable_structure(structure):
if isinstance(structure, dict):
return {k: hashable_structure(v) for k, v in structure.items()}
elif isinstance(structure, list):
return {hashable_structure(elem) for elem in structure)}
else:
return structure
Related
I am working on a code which pulls data from database and based on the different type of tables , store the data in dictionary for further usage.
This code handles around 20-30 different table so there are 20-30 dictionaries and few lists which I have defined as class variables for further usage in code.
for example.
class ImplVars(object):
#dictionary capturing data from Asset-Feed table
general_feed_dict = {}
ports_feed_dict = {}
vulns_feed_dict = {}
app_list = []
...
I want to clear these dictionaries before I add data in it.
Easiest or common way is to use clear() function but this code is repeatable as I will have to write for each dict.
Another option I am exploring is with using dir() function but its returning variable names as string.
Is there any elegant method which will allow me to fetch all these class variables and clear them ?
You can use introspection as you suggest:
for d in filter(dict.__instancecheck__, ImplVars.__dict__.values()):
d.clear()
Or less cryptic, covering lists and dicts:
for obj in ImplVars.__dict__.values():
if isinstance(obj, (list, dict)):
obj.clear()
But I would recommend you choose a bit of a different data structure so you can be more explicit:
class ImplVars(object):
data_dicts = {
"general_feed_dict": {},
"ports_feed_dict": {},
"vulns_feed_dict": {},
}
Now you can explicitly loop over ImplVars.data_dicts.values and still have other class variables that you may not want to clear.
code:
a_dict = {1:2}
b_dict = {2:4}
c_list = [3,6]
vars_copy = vars().copy()
for variable, value in vars_copy.items():
if variable.endswith("_dict"):
vars()[variable] = {}
elif variable.endswith("_list"):
vars()[variable] = []
print(a_dict)
print(b_dict)
print(c_list)
result:
{}
{}
[]
Maybe one of the easier kinds of implementation would be to create a list of dictionaries and lists you want to clear and later make the loop clear them all.
d = [general_feed_dict, ports_feed_dict, vulns_feed_dict, app_list]
for element in d:
element.clear()
You could also use list comprehension for that.
I am trying to create a multi-level dict with variable depth and with list and int type.
Data structure is like below
A
--B1
-----C1=1
-----C2=[1]
--B2=[3]
D
--E
----F
------G=4
In the case of above data structure, the last value can be an int or list.
If the above data structure has the only int then I can be easily achieved by using the below code:
from collections import defaultdict
f = lambda: defaultdict(f)
d = f()
d['A']['B1']['C1'] = 1
But as the last value has both list and int, it becomes a bit problematic for me.
Now we can insert data in a list using two ways.
d['A']['B1']['C2']= [1]
d['A']['B1']['C2'].append([2])
But when I am using only the append method it is causing the error.
Error is:
AttributeError: 'collections.defaultdict' object has no attribute 'append'
so Is there any way to use only the append method for a list?
There's no way you can use your current defaultdict-based structure to make d['A']['B1']['C2'].append(1) work properly if the 'C2' key doesn't already exist, since the data structure can't tell that the unknown key should correspond to a list rather than another layer of dictionary. It doesn't know what method you're going to call on the value it returns, so it can't know it shouldn't return a dictionary (like it did when it first looked up 'A' and 'B').
This isn't an issue for bare integers, since for those you're as assigning directly to a new key (and all the earlier levels are dictionaries). When you're assigning, the data structure isn't creating the value, you are, so you can use any type you want.
Now, if your keys are distinctive in some way, so that given a key like 'C2' you can know for sure that it should correspond to a list, you may have a chance. You can write your own dict subclass, defining a __missing__ method to handle lookups of keys that don't exist yet in your own special way:
def Tree(dict):
def __missing__(self, key):
if key_corresponds_to_list(key): # magic from somewhere
result = self[key] = []
else:
result = self[key] = Tree()
return result
# you might also want a custom __repr__
Here's an example run with a magic key function that makes any even-length key default to a list, while an odd-length key defaults to a dict:
> def key_corresponds_to_list(key):
return len(key) % 2 == 0
> t = Tree()
> t["A"]["B"]["C2"].append(1) # the default value for C2 is a list because it's even length
> t
{'A': {'B': {'C2': [1]}}}
> t["A"]["B"]["C10"]["D"] = 2 # C10's another layer of dict, since it's length is odd
> t
{'A': {'B': {'C10': {'D': 2}, 'C2': [1]}}} # it didn't matter what length D was though
You probably won't actually want to use a global function to control the class like this, I just did that as an example. If you go with this approach, I'd suggest putting the logic directly into the __missing__ method (or maybe passing a function as a parameter, like defaultdict does with its factory function).
I am very new to python, and I have the following problem. I came up with the following solution. I am wondering whether it is "pythonic" or not. If not, what would be the best solution ?
The problem is :
I have a list of dict
each dict has at least three items
I want to find the position in the list of the dict with specific three values
This is my python example
import collections
import random
# lets build the list, for the example
dicts = []
dicts.append({'idName':'NA','idGroup':'GA','idFamily':'FA'})
dicts.append({'idName':'NA','idGroup':'GA','idFamily':'FB'})
dicts.append({'idName':'NA','idGroup':'GB','idFamily':'FA'})
dicts.append({'idName':'NA','idGroup':'GB','idFamily':'FB'})
dicts.append({'idName':'NB','idGroup':'GA','idFamily':'FA'})
dicts.append({'idName':'NB','idGroup':'GA','idFamily':'FB'})
dicts.append({'idName':'NB','idGroup':'GB','idFamily':'FA'})
dicts.append({'idName':'NB','idGroup':'GB','idFamily':'FB'})
# let's shuffle it, again for example
random.shuffle(dicts)
# now I want to have for each combination the index
# I use a recursive defaultdict definition
# because it permits creating a dict of dict
# even if it is not initialized
def tree(): return collections.defaultdict(tree)
# initiate mapping
mapping = tree()
# fill the mapping
for i,d in enumerate(dicts):
idFamily = d['idFamily']
idGroup = d['idGroup']
idName = d['idName']
mapping[idName][idGroup][idFamily] = i
# I end up with the mapping providing me with the index within
# list of dicts
Looks reasonable to me, but perhaps a little too much. You could instead do:
mapping = {
(d['idName'], d['idGroup'], d['idFamily']) : i
for i, d in enumerate(dicts)
}
Then access it with mapping['NA', 'GA', 'FA'] instead of mapping['NA']['GA']['FA']. But it really depends how you're planning to use the mapping. If you need to be able to take mapping['NA'] and use it as a dictionary then what you have is fine.
For hashable objects inside a dict I could easily pair down duplicate values store in a dict using a set. For example:
a = {'test': 1, 'key': 1, 'other': 2}
b = set(a.values())
print(b)
Would display [1,2]
Problem I have is I am using a dict to store mapping between variable keys in __dict__ and the corresponding processing functions that will be passed to an engine to order and process those functions, some of these functions may be fast some may be slower due to accessing an API. The problem is each function may use multiple variable, therefor need multiple mappings in the dict. I'm wondering if there is a way to do this or if I am stuck writing my own solution?
Ended up building a callable class, since caching could speed things up for me:
from collections.abc import Callable
class RemoveDuplicates(Callable):
input_cache = []
output_cache = []
def __call__(self, in_list):
if list in self.input_cache:
idx = self.input_cache.index(in_list)
return self.output_cache[idx]
else:
self.input_cache.append(in_list)
out_list = self._remove_duplicates(in_list)
self.output_cache.append(out_list)
return out_list
def _remove_duplicates(self, src_list):
result = []
for item in src_list:
if item not in result:
result.append(item)
return result
If the objects can be ordered, you can use itertools.groupby to eliminate the duplicates:
>>> a = {'test': 1, 'key': 1, 'other': 2}
>>> b = [k for k, it in itertools.groupby(sorted(a.values()))]
>>> print(b)
[1, 2]
Is there something simple like a set for un-hashable objects
Not in the standard library but you need to look beyond and search for BTree implementation of dictionary. I googled and found few hits where the first one (BTree)seems promising and interesting
Quoting from the wiki
The BTree-based data structures differ from Python dicts in several
fundamental ways. One of the most important is that while dicts
require that keys support hash codes and equality comparison, the
BTree-based structures don’t use hash codes and require a total
ordering on keys.
Off-course its trivial fact that a set can be implemented as a dictionary where the value is unused.
You could (indirectly) use the bisect module to create sorted collection of your values which would greatly speed-up the insertion of new values and value membership testing in general — which together can be utilized to unsure that only unique values get put into it.
In the code below, I've used un-hashable set values for the sake of illustration.
# see http://code.activestate.com/recipes/577197-sortedcollection
from sortedcollection import SortedCollection
a = {'test': {1}, 'key': {1}, 'other': {2}}
sc = SortedCollection()
for value in a.values():
if value not in sc:
sc.insert(value)
print(list(sc)) # --> [{1}, {2}]
I was just wondering if there is a simple way to do this. I have a particular structure that is parsed from a file and the output is a list of a dict of a list of a dict. Currently, I just have a bit of code that looks something like this:
for i in xrange(len(data)):
for j, k in data[i].iteritems():
for l in xrange(len(data[i]['data'])):
for m, n in data[i]['data'][l].iteritems():
dostuff()
I just wanted to know if there was a function that would traverse a structure and internally figure out whether each entry was a list or a dict and if it is a dict, traverse into that dict and so on. I've only been using Python for about a month or so, so I am by no means an expert or even an intermediate user of the language. Thanks in advance for the answers.
EDIT: Even if it's possible to simplify my code at all, it would help.
You never need to iterate through xrange(len(data)). You iterate either through data (for a list) or data.items() (or values()) (for a dict).
Your code should look like this:
for elem in data:
for val in elem.itervalues():
for item in val['data']:
which is quite a bit shorter.
Will, if you're looking to decend an arbitrary structure of array/hash thingies then you can create a function to do that based on the type() function.
def traverse_it(it):
if (isinstance(it, list)):
for item in it:
traverse_it(item)
elif (isinstance(it, dict)):
for key in it.keys():
traverse_it(it[key])
else:
do_something_with_real_value(it)
Note that the average object oriented guru will tell you not to do this, and instead create a class tree where one is based on an array, another on a dict and then have a single function to process each with the same function name (ie, a virtual function) and to call that within each class function. IE, if/else trees based on types are "bad". Functions that can be called on an object to deal with its contents in its own way "good".
I think this is what you're trying to do. There is no need to use xrange() to pull out the index from the list since for iterates over each value of the list. In my example below d1 is therefore a reference to the current data[i].
for d1 in data: # iterate over outer list, d1 is a dictionary
for x in d1: # iterate over keys in d1 (the x var is unused)
for d2 in d1['data']: # iterate over the list
# iterate over (key,value) pairs in inner most dict
for k,v in d2.iteritems():
dostuff()
You're also using the name l twice (intentionally or not), but beware of how the scoping works.
well, question is quite old. however, out of my curiosity, I would like to respond to your question for much better answer which I just tried.
Suppose, dictionary looks like: dict1 = { 'a':5,'b': [1,2,{'a':100,'b':100}], 'dict 2' : {'a':3,'b':5}}
Solution:
dict1 = { 'a':5,'b': [1,2,{'a':100,'b':100}], 'dict 2' : {'a':3,'b':5}}
def recurse(dict):
if type(dict) == type({}):
for key in dict:
recurse(dict[key])
elif type(dict) == type([]):
for element in dict:
if type(element) == type({}):
recurse(element)
else:
print element
else:
print dict
recurse(dict1)