Multiple keys per value - python

Is it possible to assign multiple keys per value in a Python dictionary? One possible solution is to assign the value to each key:
dict = {'k1':'v1', 'k2':'v1', 'k3':'v1', 'k4':'v2'}
but this is not memory-efficient since my data file is > 2 GB. Alternatively, you could make a dictionary of dictionary keys:
key_dict = {'k1':'k1', 'k2':'k1', 'k3':'k1', 'k4':'k4'}
dict = {'k1':'v1', 'k4':'v2'}
main_key = key_dict['k2']
value = dict[main_key]
This is also very time- and effort-consuming because I have to go through the whole dictionary/file twice. Is there any other easy, built-in Python solution?
Note: my dictionary values are not simple strings (as in the example 'v1', 'v2') but rather complex objects (containing other dictionaries/lists etc., which are not possible to pickle).
Note: this question seems similar to How can I use both a key and an index for the same dictionary value?
But I am not looking for an ordered/indexed dictionary; I am looking for other efficient solutions (if any) besides the two mentioned above.

What type are the values?
dict = {'k1':MyClass(1), 'k2':MyClass(1)}
will give duplicate value objects, but
v1 = MyClass(1)
dict = {'k1':v1, 'k2':v1}
results in both keys referring to the same actual object.
In the original question, your values are strings: even though you're declaring the same string twice, I think they'll be interned to the same object in that case.
NB. if you're not sure whether you've ended up with duplicates, you can find out like so:
if dict['k1'] is dict['k2']:
    print("good: k1 and k2 refer to the same instance")
else:
    print("bad: k1 and k2 refer to different instances")
(is check thanks to J.F.Sebastian, replacing id())

Check out this - it's an implementation of exactly what you're asking: multi_key_dict(ionary)
https://pypi.python.org/pypi/multi_key_dict
(sources at https://github.com/formiaczek/python_data_structures/tree/master/multi_key_dict)
(on Unix platforms it possibly comes as a package and you can try to install it with something like:
sudo apt-get install python-multi-key-dict
for Debian, or an equivalent for your distribution)
You can use keys of different types, as well as multiple keys of the same type. You can also iterate over items using key types of your choice, e.g.:
m = multi_key_dict()
m['aa', 12] = 12
m['bb', 1] = 'cc and 1'
m['cc', 13] = 'something else'
print m['aa'] # will print '12'
print m[12] # will also print '12'
# but also:
for key, value in m.iteritems(int):
    print key, ':', value
# will print:
# 1 : cc and 1
# 12 : 12
# 13 : something else
# and iterating by string keys:
for key, value in m.iteritems(str):
    print key, ':', value
# will print:
# aa : 12
# cc : something else
# bb : cc and 1
m[12] = 20 # now update the value
print m[12] # will print '20' (updated value)
print m['aa'] # will also print '20' (it maps to the same element)
There is no limit to the number of keys, so code like:
m['a', 3, 5, 'bb', 33] = 'something'
is valid, and any of those keys can be used to refer to the value so created (whether to read, write, or delete it).
Edit: From version 2.0 it should also work with python3.

Using Python 2.7/3 you can combine (keys-tuple, value) pairs with a dictionary comprehension.
keys_values = ( (('k1','k2'), 0), (('k3','k4','k5'), 1) )
d = { key : value for keys, value in keys_values for key in keys }
You can also update the dictionary similarly.
keys_values = ( (('k1',), int), (('k3','k4','k6'), int) )
d.update({ key : value for keys, value in keys_values for key in keys })
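For reference, after both of the steps above d would end up looking roughly like this (assuming the keys_values shown, Python 3.7+ insertion order):
print(d)
# {'k1': <class 'int'>, 'k2': 0, 'k3': <class 'int'>, 'k4': <class 'int'>, 'k5': 1, 'k6': <class 'int'>}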
I don't think this really gets to the heart of your question but in light of the title, I think this belongs here.

The most straightforward way to do this is to construct your dictionary using the dict.fromkeys() method. It takes a sequence of keys and a value as inputs and then assigns the value to each key.
Your code would be (using d rather than dict, so the built-in name isn't shadowed):
d = dict.fromkeys(['k1', 'k2', 'k3'], 'v1')
d.update(dict.fromkeys(['k4'], 'v2'))
And the output is:
print(d)
{'k1': 'v1', 'k2': 'v1', 'k3': 'v1', 'k4': 'v2'}
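A side note (mine, not the answer's): dict.fromkeys() assigns the same object to every key, so for a mutable or complex value all of those keys end up sharing one instance. For the original question that sharing is exactly the memory saving you want, but it can surprise you if you expected independent copies:
shared = dict.fromkeys(['k1', 'k2'], [])   # both keys refer to the same list object
shared['k1'].append(1)
print(shared)                              # {'k1': [1], 'k2': [1]}
print(shared['k1'] is shared['k2'])        # True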

You can build an auxiliary dictionary of objects that were already created from the parsed data. The key would be the parsed data, the value would be your constructed object -- say the string value should be converted to some specific object. This way you can control when to construct the new object:
existing = {} # auxiliary dictionary for making the duplicates shared
result = {}
for k, v in parsed_data_generator():
    obj = existing.setdefault(v, MyClass(v))  # could be made more efficient
    result[k] = obj
Then all the duplicate values in the result dictionary will be represented by a single object of the MyClass class. After building result, the existing auxiliary dictionary can be deleted.
Here dict.setdefault() is elegant and brief, but you should test whether the more verbose solution below is not actually more efficient. The reason is that MyClass(v) is always constructed (in the example above) and then thrown away if a duplicate already exists:
existing = {} # auxiliary dictionary for making the duplicates shared
result = {}
for k, v in parsed_data_generator():
    if v in existing:
        obj = existing[v]
    else:
        obj = MyClass(v)
        existing[v] = obj
    result[k] = obj
This technique can also be used when v is not converted to anything special. For example, if v is a string, both the key and the value in the auxiliary dictionary will be the same string. However, the existence of the dictionary ensures that the object will be shared (which is not always guaranteed by Python).
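If the values stay plain strings, another option (my addition, not part of the answer above) is sys.intern() in Python 3 (the built-in intern() in Python 2), which guarantees that equal strings collapse to a single shared object; Python only interns some strings automatically:
import sys

result = {}
for k, v in parsed_data_generator():   # same generator as in the sketches above
    result[k] = sys.intern(v)          # equal string values now share one object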

I was able to achieve similar functionality using pandas MultiIndex, although in my case the values are scalars:
>>> import numpy
>>> import pandas
>>> keys = [numpy.array(['a', 'b', 'c']), numpy.array([1, 2, 3])]
>>> df = pandas.DataFrame(['val1', 'val2', 'val3'], index=keys)
>>> df.index.names = ['str', 'int']
>>> df.xs('b', axis=0, level='str')
0
int
2 val2
>>> df.xs(3, axis=0, level='int')
0
str
c val3

I'm surprised no one has mentioned using Tuples with dictionaries. This works just fine:
my_dictionary = {}
my_dictionary[('k1', 'k2', 'k3')] = 'v1'
my_dictionary[('k4',)] = 'v2'  # note the trailing comma -- ('k4') without it is just the string 'k4'
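Note, though, that with tuple keys a single name like 'k2' is not directly an index into the dictionary; you have to scan the tuple keys to find it. A minimal lookup helper (my sketch, not part of the answer):
def lookup(d, key):
    # linear scan over the tuple keys -- O(n) per lookup
    for keys, value in d.items():
        if key in keys:
            return value
    raise KeyError(key)

print(lookup(my_dictionary, 'k2'))  # 'v1'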

Related

Is it okay to nest a dict.get() inside another or is this bad design?

So, I am working on a code base where a dictionary contains some key information. At some point in the development process the name of one of the keys was changed, but the older key still exists in a lot of places. Let's call the keys new and old for reference.
In order to make it compatible with the older version, I am doing something like:
dict_name.get(new_key,dict_name.get(old_key,None))
Is this bad design or is it okay? Why/Why not?
Example for clarification (based on input by @Alexander):
There are two dictionaries, d1 and d2:
d1 = {k1: v1, old_key: some_value}
d2 = {k1: v1, new_key: some_value}
The function I am designing right now could receive either a d1-like or a d2-like dictionary as an argument. My function should be able to pick up some_value, regardless of whether old_key or new_key is present.
That is a reasonable approach. The only downside is that it will perform the get for both keys, which will not affect performance in most situations.
My only notes are nitpicks:
dict is a built-in name, so don't shadow it by using it as a variable
None is the default, so it can be dropped for old_key, e.g.:
info.get('a', info.get('b'))
In response to "Is there a way to prevent the double call?": Yup, several reasonable ways exist =).
The one-liner would probably look like:
info['a'] if 'a' in info else info.get('b')
which starts to get difficult to read if your keys are longer.
A more verbose way would be to expand it out into full statements:
val = None
if 'a' in info:
    val = info['a']
elif 'b' in info:
    val = info['b']
And finally a generic option (default after *keys only works with Python 3):
def multiget(info, *keys, default=None):
    ''' Try multiple keys in order, or default if not present '''
    for k in keys:
        if k in info:
            return info[k]
    return default
which would let you resolve multiple invocations cleanly, e.g.:
option_1 = multiget(info, 'a', 'b')
option_2 = multiget(info, 'x', 'y', 'z', default=10)
If this is somehow a pandemic of multiple api versions or something (?) you could even go so far as wrapping dict, though it is likely to be overkill:
>>> class MultiGetDict(dict):
... def multiget(self, *keys, default=None):
... for k in keys:
... if k in self:
... return self[k]
... return default
...
>>> d = MultiGetDict({1: 2})
>>> d.multiget(1)
2
>>> d.multiget(0, 1)
2
>>> d.multiget(0, 2)
>>> d.multiget(0, 2, default=3)
3
dict.get is there for exactly this reason, so you can fall back on default values if the keys are not in there.
Having a double fallback is very much OK. For example:
d = {}
result = d.get('new_key', d.get('old_key', None))
This would mean that result is None in the worst case, but there is no error (which is the goal of get in the first place).
In other words, it will get the value of new_key as a first priority, old_key as the second priority, and None as a third.
Also worth noting that get(key, None) is the same as get(key) so you might want to shorten that line:
result = d.get('new_key', d.get('old_key'))
If you want to avoid calling get multiple times (for example, if you have to do more than 2 of those, it will be unreadable) you can do something like this:
priority = ('new_key', 'old_key', 'older_key', 'oldest_key')
for key in priority:
    result = d.get(key)
    if result is not None:
        break
And result becomes whatever is encountered first in that loop, or None otherwise.
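One small caveat with that loop (my note): a key whose stored value is literally None is treated as missing. If that distinction matters, testing membership avoids the ambiguity:
result = None
for key in priority:
    if key in d:
        result = d[key]
        break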
Based on the sample dictionary provided, I would argue that this is bad design...
Lets say your original dictionary is:
d1 = {'k1': 1, 'k2': 2}
If I understand you correctly, you then 'update' one of the keys, e.g.:
d1 = {'k3': 1, 'k2': 2}
If you try to access via:
d1.get('k3', d1.get('k1')) # 'k3' is new key, 'k1' is old key.
then the first lookup will always be present and the second lookup will never be used.
If you meant that the new dictionary would looks like:
d2 = {'k1': 1, 'k2': 2, 'k3': 1}
then you are storing the 'same' data in two different locations in your dictionary, which will surely lead to trouble (similar to denormalized data in a database). For example, if the value of 'k3' is updated to 3, then the value of 'k1' would need to be updated as well.
Given the dictionaries provided in your example:
d1 = {k1: v1, old_key: some_value}
d2 = {k1: v1, new_key: some_value}
I assume that some_value is intended to be equal in both, i.e. d1[old_key] == d2[new_key]. If so, then you could use d2.get(new_key, d1.get(old_key)). However, it just seems like a mess:
If some_value needs to be updated, for example, it must be updated in both dictionaries.
You are wasting memory by storing some_value twice.
Your new_key in d2 may accidentally clobber an existing key in d1.
I would recommend not changing the key names in the first place.

Return key if variable in associated values array?

I have a technical dictionary that I am using to correct various spellings of technical terms.
How can I use this structure (or restructure the below code to work) in order to return the key for any alternate spelling?
For example, if someone has written "craniem" I wish to return "cranium". I've tried a number of different constructions, including the one below, and cannot quite get it to work.
def techDict():
    myDict = {
        'cranium' : ['cranum','crenium','creniam','craniem'],
        'coccyx' : ['coscyx','cossyx','koccyx','kosicks'],
        '1814A' : ['Aero1814','A1814','1814'],
        'SodaAsh' : ['sodaash','na2co3', 'soda', 'washingsoda','sodacrystals']
    }
    return myDict
techDict = techDict()
correctedSpelling = next(val for key, val in techDict.iteritems() if val=='1814')
print(correctedSpelling)
Using in instead of == will do the trick:
next(k for k, v in techDict.items() if 'craniem' in v)
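A minor caveat worth adding: next() raises StopIteration when no entry matches, so if the misspelling might be unknown you can supply a default:
corrected_spelling = next((k for k, v in techDict.items() if 'craniem' in v), None)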
Just reverse and flatten your dictionary:
tech_dict = {
    'cranium': ['cranum', 'crenium', 'creniam', 'craniem'],
    'coccyx': ['coscyx', 'cossyx', 'koccyx', 'kosicks'],
    '1814A': ['Aero1814', 'A1814', '1814'],
    'SodaAsh': ['sodaash', 'na2co3', 'soda', 'washingsoda', 'sodacrystals'],
}
lookup = {val: key for key, vals in tech_dict.items() for val in vals}
# ^ note dict.iteritems doesn't exist in 3.x
Then you can trivially get:
corrected_spelling = lookup['1814']
This is far more efficient than potentially scanning through every list for every key in the dictionary to find your search term.
Also note: 1. compliance with the official style guide; and 2. that I've removed the techDict function entirely - it was pointless to write a function just to create a dictionary, especially as you immediately shadowed the function with the dictionary it returned so you couldn't even call it again.

Initializing a dictionary in python with a key value and no corresponding values

I was wondering if there was a way to initialize a dictionary in python with keys but no corresponding values until I set them. Such as:
Definition = {'apple': , 'ball': }
and then later i can set them:
Definition[key] = something
I only want to initialize the keys; I don't know the corresponding values until I have to set them later. Basically, I know what keys I want and will add the values as they are found. Thanks.
Use the fromkeys function to initialize a dictionary with any default value. In your case, you will initialize with None since you don't have a default value in mind.
empty_dict = dict.fromkeys(['apple','ball'])
this will initialize empty_dict as:
empty_dict = {'apple': None, 'ball': None}
As an alternative, if you wanted to initialize the dictionary with some default value other than None, you can do:
default_value = 'xyz'
nonempty_dict = dict.fromkeys(['apple','ball'],default_value)
You could initialize them to None.
You could use a defaultdict. It will let you set dictionary values without worrying about whether the key already exists. If you access a key that has not been initialized yet, it will return a value you specify (in the example below, None):
from collections import defaultdict
your_dict = defaultdict(lambda : None)
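A quick usage sketch (my addition): note that merely reading a missing key from a defaultdict inserts it, which may or may not be what you want:
your_dict['apple'] = 1      # set a value once it is known
print(your_dict['ball'])    # None -- and 'ball' is now stored in the dict
print(dict(your_dict))      # {'apple': 1, 'ball': None}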
It would be good to know what your purpose is, why you want to initialize the keys in the first place. I am not sure you need to do that at all.
1) If you want to count the number of occurrences of keys, you can just do:
Definition = {}
# ...
Definition[key] = Definition.get(key, 0) + 1
2) If you want to get None (or some other value) later for keys that you did not encounter, again you can just use the get() method:
Definition.get(key) # returns None if key not stored
Definition.get(key, default_other_than_none)
3) For all other purposes, you can just use a list of the expected keys, and check if the keys found later match those.
For example, if you only want to store values for those keys:
expected_keys = ['apple', 'banana']
# ...
if key_found in expected_keys:
    Definition[key_found] = value
Or if you want to make sure all expected keys were found:
assert(all(key in Definition for key in expected_keys))
You can initialize the values as empty strings and fill them in later as they are found.
dictionary = {'one':'','two':''}
dictionary['one']=1
dictionary['two']=2
A comprehension can also be convenient in this case:
# from a list
keys = ["k1", "k2"]
d = {k:None for k in keys}
# or from another dict
d1 = {"k1" : 1, "k2" : 2}
d2 = {k:None for k in d1.keys()}
d2
# {'k1': None, 'k2': None}
q = input("Apple")
w = input("Ball")
Definition = {'apple': q, 'ball': w}
Based on the clarifying comment by #user2989027, I think a good solution is the following:
definition = ['apple', 'ball']
data = {'orange':1, 'pear':2, 'apple':3, 'ball':4}
my_data = {}
for k in definition:
    try:
        my_data[k] = data[k]
    except KeyError:
        pass
print my_data
I tried not to do anything fancy here. I set up my data and an empty dictionary. I then loop through a list of strings that represent potential keys in the data dictionary, copying each value from data to my_data while allowing for the case where data may not have the key that I want.
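For what it's worth, a dictionary comprehension (same data and definition as above) does the same thing in one line, skipping missing keys instead of catching KeyError:
my_data = {k: data[k] for k in definition if k in data}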

check dictionary for doubled keys [duplicate]

Possible Duplicate:
How to raise error if duplicates keys in dictionary
I was recently generating huge dictionaries with hundreds of thousands of keys (such that noticing a bug by looking at them wasn't feasible). They were syntactically correct, yet there was a bug somewhere. It boiled down to "duplicate keys":
{'a':1, ..., 'a':2}
This code compiles fine, and I could not figure out why a key had a value of 2 when I expected 1. The problem is obvious now.
The question is how I can prevent that in the future. I think this is impossible within python. I used
grep "'.*'[ ]*:" myfile.py | sort | uniq -c | grep -v 1
which is not bulletproof. Any other ideas (within python, this grep is just to illustrate what I'd tried)?
EDIT: I don't want duplicate keys, just need to spot that this occurs and edit data manually
A dict cannot contain duplicate keys. So all you need to do is execute the code and then dump the repr() of the dict.
Another option is creating the dict items as (key, value) tuples. By storing them in a list you can easily create a dict from them and then check if the len()s of the dict/list differ.
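A minimal sketch of that length check, assuming pairs is the list of (key, value) tuples you generate:
pairs = [('a', 1), ('b', 2), ('a', 2)]   # example data with a duplicate key
d = dict(pairs)
if len(d) != len(pairs):
    raise ValueError("duplicate keys detected")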
If you need to have multiple values per key you can store the values in a list using defaultdict.
>>> from collections import defaultdict
>>> data_dict = defaultdict(list)
>>> data_dict['key'].append('value')
>>> data_dict
defaultdict(<type 'list'>, {'key': ['value']})
>>> data_dict['key'].append('second_value')
>>> data_dict
defaultdict(<type 'list'>, {'key': ['value', 'second_value']})
Are you generating a Python file containing a giant dictionary? Something like:
print "{"
for lines in file:
key, _, value = lines.partition(" ")
print " '%s': '%s',"
print "}"
If so, there's not much you can do to prevent this, as you cannot easily override the construction of the builtin dict.
Instead I'd suggest you validate the data while constructing the dictionary string. You could also generate different syntax:
dict(a = '1', a = '2')
..which will generate a SyntaxError if the key is duplicated. However, these are not exactly equivalent, as dictionary keys are a lot more flexible than keyword args (e.g. {123: '...'} is valid, but dict(123 = '...') is an error).
You could generate a function call like:
uniq_dict([('a', '...'), ('a', '...')])
Then include the function definition:
def uniq_dict(values):
    thedict = {}
    for k, v in values:
        if k in thedict:
            raise ValueError("Duplicate key %s" % k)
        thedict[k] = v
    return thedict
You don't say or show exactly how you're generating the dictionary display you have where the duplicate keys are appearing. But that is where the problem lies.
Instead of using something like {'a':1, ..., 'a':2} to construct the dictionary, I suggest that you use this form: dict([['a', 1], ..., ['a', 2]]) which will create one from a supplied list of [key, value] pairs. This approach will allow you to check the list of pairs for duplicates before passing it to dict() to do the actual construction of the dictionary.
Here's an example of one way to check the list of pairs for duplicates:
sample = [['a', 1], ['b', 2], ['c', 3], ['a', 2]]
def validate(pairs):
    # check for duplicate key names and raise an exception if any are found
    dups = []
    seen = set()
    for key_name, val in pairs:
        if key_name in seen:
            dups.append(key_name)
        else:
            seen.add(key_name)
    if dups:
        raise ValueError('Duplicate key names encountered: %r' % sorted(dups))
    else:
        return pairs
my_dict = dict(validate(sample))

Map list of tuples into a dictionary

I've got a list of tuples extracted from a table in a DB which looks like (key , foreignkey , value). There is a many to one relationship between the key and foreignkeys and I'd like to convert it into a dict indexed by the foreignkey containing the sum of all values with that foreignkey, i.e. { foreignkey , sumof( value ) }. I wrote something that's rather verbose:
myDict = {}
for item in myTupleList:
    if item[1] in myDict:
        myDict[item[1]] += item[2]
    else:
        myDict[item[1]] = item[2]
but after seeing this question's answer or these two there's got to be a more concise way of expressing what I'd like to do. And if this is a repeat, I missed it and will remove the question if you can provide the link.
Assuming all your values are ints, you could use a defaultdict to make this easier:
from collections import defaultdict
myDict = defaultdict(int)
for item in myTupleList:
    myDict[item[1]] += item[2]
defaultdict is like a dictionary, except if you try to get a key that isn't there it fills in the value returned by the callable - in this case, int, which returns 0 when called with no arguments.
UPDATE: Thanks to #gnibbler for reminding me, but tuples can be unpacked in a for loop:
from collections import defaultdict
myDict = defaultdict(int)
for _, key, val in myTupleList:
    myDict[key] += val
Here, the 3-item tuple gets unpacked into the variables _, key, and val. _ is a common placeholder name in Python, used to indicate that the value isn't really important. Using this, we can avoid the hairy item[1] and item[2] indexing. We can't rely on this if the tuples in myTupleList aren't all the same size, but I bet they are.
(We also avoid the situation of someone looking at the code and thinking it's broken because the writer thought arrays were 1-indexed, which is what I thought when I first read the code; I wasn't disabused of that until I read the question. In the above loop, however, it's obvious that each item of myTupleList is a tuple of three elements, and we just don't need the first one.)
from collections import defaultdict
myDict = defaultdict(int)
for _, key, value in myTupleList:
    myDict[key] += value
Here's my (tongue in cheek) answer:
myDict = reduce(lambda d, t: (d.__setitem__(t[1], d.get(t[1], 0) + t[2]), d)[1], myTupleList, {})
It is ugly and bad, but here is how it works.
The first argument to reduce (because it isn't clear there) is lambda d, t: (d.__setitem__(t[1], d.get(t[1], 0) + t[2]), d)[1]. I will talk about this later, but for now, I'll just call it joe (no offense to any people named Joe intended). The reduce function basically works like this:
joe(joe(joe({}, myTupleList[0]), myTupleList[1]), myTupleList[2])
And that's for a three element list. As you can see, it basically uses its first argument to sort of accumulate each result into the final answer. In this case, the final answer is the dictionary you wanted.
Now for joe itself. Here is joe as a def:
def joe(myDict, tupleItem):
    myDict[tupleItem[1]] = myDict.get(tupleItem[1], 0) + tupleItem[2]
    return myDict
Unfortunately, no form of = or return is allowed in a Python lambda, so that has to be worked around. I get around the lack of = by calling the dict's __setitem__ method directly. I get around the lack of return by creating a tuple containing the return value of __setitem__ and the dictionary, and then returning the tuple element containing the dictionary. I will slowly alter joe so you can see how I accomplished this.
First, remove the =:
def joe(myDict, tupleItem):
    # Using __setitem__ to avoid using '='
    myDict.__setitem__(tupleItem[1], myDict.get(tupleItem[1], 0) + tupleItem[2])
    return myDict
Next, make the entire expression evaluate to the value we want to return:
def joe(myDict, tupleItem):
    return (myDict.__setitem__(tupleItem[1], myDict.get(tupleItem[1], 0) + tupleItem[2]),
            myDict)[1]
I have run across this use-case for reduce and dict many times in my Python programming. In my opinion, dict could use a member function reduceto(keyfunc, reduce_func, iterable, default_val=None). keyfunc would take the current value from the iterable and return the key. reduce_func would take the existing value in the dictionary and the value from the iterable and return the new value for the dictionary. default_val would be what was passed into reduce_func if the dictionary was missing a key. The return value should be the dictionary itself so you could do things like:
myDict = dict().reduceto(lambda t: t[1], lambda o, t: o + t, myTupleList, 0)
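That method does not exist on dict; a plain-function sketch of one possible reading of the proposal (the names and exact semantics are my interpretation) could look like this:
def reduceto(d, keyfunc, reduce_func, iterable, default_val=None):
    # reduce_func receives the stored value (or default_val) and the whole item
    for item in iterable:
        key = keyfunc(item)
        d[key] = reduce_func(d.get(key, default_val), item)
    return d

myDict = reduceto({}, lambda t: t[1], lambda acc, t: acc + t[2], myTupleList, 0)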
Maybe not exactly readable but it should work:
fks = dict([ (v[1], True) for v in myTupleList ]).keys()
myDict = dict([ (fk, sum([ v[2] for v in myTupleList if v[1] == fk ])) for fk in fks ])
The first line finds all unique foreign keys. The second line builds your dictionary by first constructing a list of (fk, sum(all values for this fk))-pairs and turning that into a dictionary.
Look at SQLAlchemy and see if that does all the mapping you need and perhaps more
