Creating a "dictionary of sets" - python

I need to efficiently store data in something that resembles a "dictionary of sets", i.e. a dictionary with multiple (unique) values for each unique key. The source of my data is a (not very well) structured XML file.
My idea is:
I will look through a number of elements and find keys. If a key does not exist, I add it to the dictionary; if it already exists, I just add the new value to the corresponding set.
And the result would be something like:
{
    'key1': {'1484', '1487', '1488', ...},
    'key2': {'1485', '1486', '1489', ...},
    'key3': {'1490', '1491', '1492', ...},
    ...
}
I need to add new keys on the go.
I need to push unique values into each set.
I need to be able to iterate through the whole dictionary.
I am not sure if this is even feasible, but if anybody could push me in the right direction, I would be more than thankful.

I'm not going to benchmark this, but in my experience native dicts are faster:
store = {}
for key, value in yoursource:
    try:
        store[key].add(value)
    except KeyError:
        store[key] = {value}

from collections import defaultdict
mydict = defaultdict(set)
mydict["key1"] |= {'1484', '1487', '1488'}
Iteration works just like with a normal dict.
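As a minimal sketch of the defaultdict approach applied to a stream of (key, value) pairs: the `pairs` list below is made-up sample data standing in for whatever would be parsed from the XML.

```python
from collections import defaultdict

# Hypothetical sample data; in practice these pairs would come from the XML.
pairs = [("key1", "1484"), ("key2", "1485"), ("key1", "1487"), ("key1", "1484")]

store = defaultdict(set)
for key, value in pairs:
    # A missing key gets an empty set automatically; the set drops duplicates.
    store[key].add(value)

# Iterate exactly as with a plain dict.
for key, values in store.items():
    print(key, sorted(values))
```

The repeated ("key1", "1484") pair is stored only once, which is the "unique values per key" behaviour asked for.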

Using dict.setdefault() to create the key if it doesn't exist, and initialising it with an empty set:
store = {}
for key, value in yoursource:
    store.setdefault(key, set()).add(value)
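Since the question mentions XML as the data source, here is a rough sketch of how the setdefault() loop might be fed from xml.etree.ElementTree. The XML layout, the `item` tag, and the `group` and `id` attribute names are all assumptions made for illustration, not the asker's actual format.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout: tag and attribute names are assumptions.
xml_text = """
<root>
  <item group="key1" id="1484"/>
  <item group="key2" id="1485"/>
  <item group="key1" id="1487"/>
  <item id="9999"/>
</root>
""".strip()

store = {}
for elem in ET.fromstring(xml_text).iter("item"):
    key, value = elem.get("group"), elem.get("id")
    if key is None or value is None:
        continue  # skip badly structured elements
    store.setdefault(key, set()).add(value)

print(store)
```

Elements missing either attribute are skipped here; whether that is the right policy depends on the real data.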

Related

How can I search for a specific key in a dictionary in Python?

So I got this dictionary from a CSV file and I would like to look for a specific key inside it (actually, the original idea was to search for that key in the CSV file and then build a dictionary from that key down), but I don't really know how to do it.
So far I got:
import pandas as pd
df = pd.read_csv('data.csv')
dict = df.to_dict(orient='dict')
for index, line in enumerate(dict):
    if "Wavelength [nm]" in line:
        print(index)
The idea is to know the index of "Wavelength".
If you want the value of a key without knowing whether it's in the dict, often the most natural way is
value = dict.get(key, defaultvalue)
defaultvalue is what you would set value to in your code once you had established that the key is not present, often None or an empty list or tuple.
If you just want to check whether the key is present without accessing the value, use
if key in dict:
    # do stuff
You can use:
if key in dict:
    print(key, dict[key])
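The asker also wanted the position of the key. A small sketch, using a plain dict in place of the one derived from the DataFrame; the column names and values are made up, and the name `columns` is used to avoid shadowing the built-in `dict`.

```python
# Hypothetical stand-in for the dict built from the CSV.
columns = {
    "Time [s]": [0, 1],
    "Wavelength [nm]": [450, 460],
    "Power [mW]": [1.2, 1.3],
}
target = "Wavelength [nm]"

# Membership test, then direct access.
if target in columns:
    print(target, columns[target])

# Safe lookup with a default for a key that is absent.
missing = columns.get("Intensity", None)

# Position of the key in insertion order (dicts preserve it since Python 3.7).
index = next((i for i, key in enumerate(columns) if key == target), None)
print(index)
```

The `next(..., None)` form yields None rather than raising when the key is not present.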

how to delete and add multiple items without iterating a dictionary in python 3.7x?

I wonder if an existing dictionary instance can add and/or delete multiple items without using iteration.
I mean something like this.
Supposition (it actually doesn't work):
D = {"key1":"value1", "key2":"value2", "key3":"value3"}
tags = ["key1","key2"]
D.pop(tags)
print(D)
{"key3":"value3"}
Thank you in advance.
You could iterate over just the list of keys instead of the full dict:
D = {"key1":"value1", "key2":"value2", "key3":"value3"}
for i in ["key1", "key2"]:
    D.pop(i)
print(D)
If you don't actually need to avoid iteration, but rather just want to do the transformation of the dictionary in an expression, rather than a statement, you could use a dictionary comprehension to create a new dictionary containing only the keys (and the associated values) that don't match your list of things to remove:
D = {key: value for key, value in D.items() if key not in tags}
Unfortunately, this doesn't modify D in place, so if you need to change the value referenced through some other variable this won't help you (and you'd need to do an explicit loop). Note that if you don't care about the values being removed, you probably should use del D[key] instead of D.pop(key).
If all you want to do is show the dictionary without the keys from the list, why not just create a new dict:
D = {"key1":"value1", "key2":"value2", "key3":"value3"}
tags = ["key1", "key2"]
filtered = {key: value for key, value in D.items() if key not in tags}
print(filtered)
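The question also asks about adding several items at once. As a sketch: dict.update() adds or overwrites multiple items in one call, and pop() with a default tolerates keys that are not present. The extra keys below are made up for illustration.

```python
D = {"key1": "value1", "key2": "value2", "key3": "value3"}
tags = ["key1", "key2", "not_there"]

# Remove several keys in place; the None default avoids a KeyError
# for keys that are absent.
for tag in tags:
    D.pop(tag, None)

# Add (or overwrite) several items in a single call.
D.update({"key4": "value4", "key5": "value5"})

print(D)  # {'key3': 'value3', 'key4': 'value4', 'key5': 'value5'}
```

Deletion still needs a loop over the keys to remove, but only over that short list, not the whole dictionary.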

Why does the default dictionary in my code keep expanding?

I have a default dictionary and I run it through a couple of loops to look for certain strings in it. The loops don't append anything to the dictionary, yet it turns out that new items keep getting added during the loops, and the final dictionary ends up bigger than the original one.
I've been trying to pinpoint the error forever but now it's late and I have no idea what's causing this!
from collections import defaultdict
dummydict = defaultdict(list)
dummydict['Alex'].append('Naomi and I love hotcakes')
dummydict['Benjamin'].append('Hayley and I hate hotcakes')
part = ['Alex', 'Benjamin', 'Hayley', 'Naomi']
emp = []
for var in dummydict:
    if 'I' in dummydict[var]:
        emp.append(var)
for car in part:
    for key in range(len(dummydict)):
        print('new len', len(dummydict))
        print(key, dummydict)
        if car in dummydict[key]:
            emp.append(car)
print(emp)
print('why are there new values in the dictionary?!', len(dummydict), dummydict)
I expect the dictionary to remain unchanged.
if car in dummydict[key]:
Since key is an integer and your dict initially has only strings as keys, this lookup creates a new entry in dummydict for each key.
Accessing missing keys as in dummydict[key] will add those keys to the defaultdict. Note that key is an int, not the value at that position, as for key in range(len(dummydict)) iterates indexes, not the dict or its keys.
See the docs:
When each key is encountered for the first time, it is not already in the mapping; so an entry is automatically created using the default_factory function which returns an empty list.
For example, this code will show a dummydict with a value in it, because simply accessing dummydict[key] will add the key to the dict if that key is not already there.
from collections import defaultdict
dummydict = defaultdict(list)
dummydict[1]
print(dummydict)
outputs:
defaultdict(<class 'list'>, {1: []})
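Your issue is that your loop evaluates dummydict[key] with integer keys that are not in the dict, and each such lookup adds the key. (dummydict[var] in the first loop is harmless, because var always comes from the dict itself.)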
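To illustrate the difference, a short sketch: membership tests and get() do not trigger default_factory, so they leave the defaultdict unchanged, while indexing a missing key inserts it.

```python
from collections import defaultdict

dummydict = defaultdict(list)
dummydict['Alex'].append('Naomi and I love hotcakes')

# Neither of these calls default_factory, so no key is added.
has_zero = 0 in dummydict        # False
got = dummydict.get(0)           # None
len_before = len(dummydict)      # 1

# Indexing a missing key does add it, with an empty list as its value.
dummydict[0]
len_after = len(dummydict)       # 2

print(has_zero, got, len_before, len_after)
```

Iterating with `for key in dummydict` (or over `dummydict.items()`) instead of `range(len(dummydict))` avoids the stray lookups altogether.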

Check if key exists in dictionary. If not, append it

I have a large python dict created from json data and am creating a smaller dict from the large one. Some elements of the large dictionary have a key called 'details' and some elements don't. What I want to do is check if the key exists in each entry in the large dictionary and if not, append the key 'details' with the value 'No details available' to the new dictionary. I am putting some sample code below just as a demonstration. The LargeDict is much larger with many keys in my code, but I'm keeping it simple for clarity.
LargeDict = {'results':
    [{'name':'john','age':'23','datestart':'12/07/08','department':'Finance','details':'Good Employee'},
     {'name':'barry','age':'26','datestart':'25/08/10','department':'HR','details':'Also does payroll'},
     {'name':'sarah','age':'32','datestart':'13/05/05','department':'Sales','details':'Due for promotion'},
     {'name':'lisa','age':'21','datestart':'02/05/12','department':'Finance'}]}
This is how I am getting the data for the SmallDict:
SmallDict = {d['name']:{'department':d['department'],'details':d['details']} for d in LargeDict['results']}
I get a key error however when one of the large dict entries has no details. Am I right in saying I need to use the DefaultDict module or is there an easier way?
You don't need a collections.defaultdict. You can use the setdefault method of dictionary objects.
d = {}
bar = d.setdefault('foo', 'bar')  # returns 'bar'
print(bar)  # bar
print(d)  # {'foo': 'bar'}
As others have noted, if you don't want to add the key to the dictionary, you can use the get method.
Here's an old reference that I often find myself looking at.
You could use collections.defaultdict if you want to create an entry in your dict automatically. However, if you don't, and just want "Not available" (or whatever), then you can just assign to the dict as d[key] = v and use d.get(k, 'Not available') for a default value.
Use the get(key, defaultVar) method to supply a default value when the 'details' key is missing:
SmallDict = {d['name']:{'department':d['department'],'details':d.get('details','No details available')} for d in LargeDict['results']}
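For illustration, here is the same comprehension run against a trimmed-down copy of the sample data (only two entries and fewer fields than in the question):

```python
# Trimmed-down version of the sample data from the question.
LargeDict = {'results': [
    {'name': 'john', 'department': 'Finance', 'details': 'Good Employee'},
    {'name': 'lisa', 'department': 'Finance'},  # no 'details' key
]}

SmallDict = {
    d['name']: {
        'department': d['department'],
        # get() falls back to the default instead of raising KeyError.
        'details': d.get('details', 'No details available'),
    }
    for d in LargeDict['results']
}

print(SmallDict['john']['details'])  # Good Employee
print(SmallDict['lisa']['details'])  # No details available
```

Note that get() leaves the entries of LargeDict untouched; only SmallDict receives the default text.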

Check for a key pattern in a dictionary in python

dict1=({"EMP$$1":1,"EMP$$2":2,"EMP$$3":3})
How to check if EMP exists in the dictionary using python
dict1.get("EMP##") ??
It's not entirely clear what you want to do.
You can loop through the keys in the dict selecting keys using the startswith() method:
>>> for key in dict1:
...     if key.startswith("EMP$$"):
...         print("Found", key)
...
Found EMP$$1
Found EMP$$2
Found EMP$$3
You can use a list comprehension to get all the values that match:
>>> [value for key,value in dict1.items() if key.startswith("EMP$$")]
[1, 2, 3]
If you just want to know if a key matches you could use the any() function:
>>> any(key.startswith("EMP$$") for key in dict1)
True
This approach strikes me as contrary to the intent of a dictionary.
A dictionary is built on hashed keys with values associated with them. The benefit of this structure is very fast lookups (O(1) on average). By searching through the keys, you negate that benefit.
I would suggest reorganizing your dictionary.
dict1 = {"EMP$$": {"1": 1, "2": 2, "3": 3} }
Then, finding "EMP$$" is as simple as
if "EMP$$" in dict1:
    # etc...
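One way the flat dictionary might be regrouped into that nested shape is sketched below. It assumes every key contains the "$$" separator; a key without it would end up as its own prefix with an empty-string suffix.

```python
dict1 = {"EMP$$1": 1, "EMP$$2": 2, "EMP$$3": 3}

nested = {}
for key, value in dict1.items():
    # partition() splits at the first "$$"; re-attach the separator
    # to the prefix so the outer key is "EMP$$".
    prefix, sep, suffix = key.partition("$$")
    nested.setdefault(prefix + sep, {})[suffix] = value

print(nested)             # {'EMP$$': {'1': 1, '2': 2, '3': 3}}
print("EMP$$" in nested)  # True
```

After this one-time pass, checking for the prefix is a single hash lookup instead of a scan over all keys.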
You need to be a lot more specific with what you want to do. However, assuming the dictionary you gave:
dict1={"EMP$$1":1, "EMP$$2":2, "EMP$$3":3}
If you wanted to know whether a specific key is present before trying to request it, you could use the in operator (dict.has_key() was removed in Python 3):
'EMP$$1' in dict1
True
Returns True as dict1 has a key EMP$$1.
You could also forget about checking for keys and rely on the default return value of dict1.get():
dict1.get('EMP$$5',0)
0
Returns 0 as default given dict1 doesn't have a key EMP$$5.
In a similar way you could also use a try/except structure to catch and handle missing keys:
try:
    dict1['EMP$$5']
except KeyError as e:
    # Code to deal with the key error
    print('Trapped key error in dict1 looking for %s' % e)
The other answers to this question are also great, but we need more info to be more precise.
There's no way to match dictionary keys like this. I suggest you rethink your data structure for this problem. If this has to be extra quick you could use something like a suffix tree.
You can use the in operator on strings, which checks whether one string occurs inside another. Iterating over dict1 yields its keys, so you can check "EMP$$" against each key:
dict1 = {"EMP$$1": 1, "EMP$$2": 2, "EMP$$3": 3}
print(any("EMP$$" in i for i in dict1))
# True
# testing for item that doesn't exist
print(any("AMP$$" in i for i in dict1))
# False
