Increment items in nested python dict - python

I have a nested Python Dict and I am trying to take values from a list and then iterate them into a Dict's values as such:
for row in rows:
    Dict[A][AA][AAA] += 1
However, when I print my dict, it appears to be adding all of the increments to all of the Dict entries. By which I mean that instead of this:
{KeyA:{KeyAA:{KeyAAA:5}}}
{KeyB:{KeyBB:{KeyBBB:10}}}
I am getting this:
{KeyA:{KeyAA:{KeyAAA:15}}}
{KeyB:{KeyBB:{KeyBBB:15}}}
I'm a bit stumped.
EDIT:
This is how the Dicts were created:
I first skim through a long table that contains a type classification. While I'm doing that, I create a new entry into the main Dict. At the same time, I'm collecting all of the unique classifications into a subDict so that I can add this to the main Dict later on:
Dict = {}
subDict = {}
for row in skimRows:
    Dict[row[0]] = {"Type":row[1],"Assoc":{}} # Save each ID and origin Type to Dict
    if item not in subDict: # Check to see if unique item already exists in subDict
        subDict[item] = 0
Here is evidently where I was going wrong. I was then taking the subDict and plunking this into the main Dict, not realising the inserted subDict was retaining its relationship to the original subDict object:
for key in Dict: # After initial iteration and Type collection, add new subDict to each Dict key
    Dict[key]["Assoc"] = subDict
SOLUTION:
Per the correct answer below, I fixed it by adding .copy()
for key in Dict: # After initial iteration and Type collection, add new subDict to each Dict key
    Dict[key]["Assoc"] = subDict.copy()

Your innermost dictionaries are shared, not unique objects:
>>> somedict = {}
>>> somedict['foo'] = {'bar': 0}
>>> somedict['spam'] = somedict['foo']
>>> somedict['foo']['bar'] += 1
>>> somedict['spam']
{'bar': 1}
>>> somedict['foo'] is somedict['spam']
True
The two keys foo and spam both are referring to the same object here, one dictionary object holding a key bar.
You should not reuse your dictionaries like this. Either create a new dictionary:
somedict['spam'] = {'bar': 0}
or create a (shallow) copy:
somedict['spam'] = somedict['foo'].copy()
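To make the difference concrete, here is a minimal sketch modelled on the setup in the question. The names `template`, `shared`, `main`, and `deep`, and the keys `id1`/`id2`, are made up for illustration. The last part uses `copy.deepcopy`, which goes beyond what the question needs; it only matters if the inner dictionary itself holds mutable values.

```python
import copy

template = {"typeA": 0, "typeB": 0}

# Buggy: every entry points at the same inner dict.
shared = {key: {"Assoc": template} for key in ("id1", "id2")}
shared["id1"]["Assoc"]["typeA"] += 1
print(shared["id2"]["Assoc"]["typeA"])  # 1, because id1 and id2 share one dict

# Fixed: each entry gets its own shallow copy.
template = {"typeA": 0, "typeB": 0}
main = {key: {"Assoc": template.copy()} for key in ("id1", "id2")}
main["id1"]["Assoc"]["typeA"] += 1
print(main["id2"]["Assoc"]["typeA"])  # 0

# If the inner dict held mutable values (lists, dicts), a shallow copy
# would still share those; copy.deepcopy copies them as well.
nested_template = {"counts": [0, 0]}
deep = {key: copy.deepcopy(nested_template) for key in ("id1", "id2")}
deep["id1"]["counts"][0] += 1
print(deep["id2"]["counts"])  # [0, 0]
```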

Related

How to index a list of dictionaries in python?

If I have a list of dictionaries in a python script, that I intend to later on dump in a JSON file as an array of objects, how can I index the keys of a specific dictionary within the list?
Example :
dict_list = [{"first_dict": "some_value"}, {"second_dict":"some_value"}, {"third_dict": "[element1,element2,element3]"}]
My intuitive solution was dict_list[-1][0] (to access the first key of the last dictionary in the list for example). This however gave me the following error:
IndexError: list index out of range
An integer index only works on a dictionary if that integer is itself a key, as in dict = {0: some_value}.
To find a specific value:
list_dictionary = [{"dict1": "value1"}, {"dict2": "value2"}]
value1 = list_dictionary[0]["dict1"]
The key is what you use to look up a value in a dictionary.
Example:
dictionary = {0: "value"}
dictionary[0]
In this case an integer index works, because 0 is a key.
To collect all the values, loop over each dictionary:
values = []
for dictionary in dict_list:
    for element in dictionary:
        values.append(dictionary[element])
Output:
['some_value', 'some_value', ['element1', 'element2', 'element3']]
dict_list = [{"first_dict": "some_value"}, {"second_dict":"some_value"}, {"third_dict": ['element1','element2','element3']}]
If your dict look like this you can do as well
dict_list[-1]["third_dict"]
You can't access "the first key" with an int, since you have a dict.
You can get the first key by converting .keys() to a list first (in Python 3, .keys() returns a view that cannot be indexed directly):
list(dict_list[-1].keys())[0]
By using dict_list[-1][0], you are indexing as if you had a list inside a list, which you do not have. You have a dict inside a list.
Taking your example dict_list[-1][0]:
When you mention dict_list you are already "in the list".
The first index [-1] is referring to the last item of the list.
The second index would only be "usable" if the item mentioned in the previous index were a list. Hence the error.
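A short sketch of the two indexing steps, using the list-valued version of `dict_list` from this thread. The variable names `last`, `error_name`, and `first_key` are mine. Note that on this data the failing lookup raises `KeyError` (the integer is treated as a dictionary key), not the `IndexError` quoted in the question, so the original data may have differed slightly.

```python
dict_list = [
    {"first_dict": "some_value"},
    {"second_dict": "some_value"},
    {"third_dict": ["element1", "element2", "element3"]},
]

last = dict_list[-1]          # list indexing: an int position works here

error_name = None
try:
    last[0]                   # dict lookup: 0 is treated as a key, not a position
except KeyError as exc:
    error_name = type(exc).__name__
print(error_name)             # KeyError

first_key = list(last)[0]     # turn the keys into a list, then index by position
print(first_key)              # third_dict
print(last[first_key][0])     # element1
```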
Using:
dict_list=[{"first_dict": "some_value"}, {"second_dict":"some_value"},{"third_dict": [0,1,2]}]
to access the value of third_dict you need:
for value in list(dict_list[-1].values())[0]:
    print(value)
Output:
0
1
2
If you know the order of the dictionary keys and you are using Python 3.7 or later (where dictionaries preserve insertion order), you can do this:
dict_list = [
{"first_dict": "some_value"}
, {"second_dict":"some_value"}
, {"third_dict": ["element1", "element2", "element3"]}
]
first_key = next(iter(dict_list[-1].keys()))
### OR: value
first_value = next(iter(dict_list[-1].values()))
### OR: both key and value
first_key, first_value = next(iter(dict_list[-1].items()))
print(first_key)
print(first_key, first_value)
print(first_value)
If you have the following list of dictionaries:
dict_list = [{"key1":"val1", "key2":"val2"}, {"key10":"val10"}]
Then to access the last dictionary you'd indeed use dict_list[-1], but this returns a dictionary, which is indexed by its keys and not by numbers: dict_list[0]["key1"]
To only use numbers, you'd need to get a list of the keys first: list(dict_list[-1]). The first element of this list list(dict_list[-1])[0] would then be the first key "key10"
You can then use indices to access the first key of the last dictionary:
dict_index = -1
key_index = 0
d = dict_list[dict_index]
keys = list(d)
val = d[keys[key_index]]
However you'd be using the dictionary as a list, so maybe a list of lists would be better suited than a list of dictionaries.

Immutability of dictionary keys in python

dic={}
dic[1]=100
dic[2]=200
dic[1]+=500
Here I have initialized a dictionary and I am able to update the value stored under a key. But keys in a dictionary are immutable, so what is actually happening? Can someone please explain?
Just think of it this way. We have an empty dictionary:
d = {}
If we do this:
d[1] = 100
we are simply adding a key and assigning a value to that key, right then and there.
Just like sets, dicts cannot have duplicate keys, so assigning to a key that already exists overwrites its value.
For example, calling d[1] = 200 will overwrite the original d[1].
d[1] += 500 is the same as:
d[1] = d[1]+500
Here we are simply telling Python to look up the value stored under the key 1, add 500 to it, and store the result under that same key. The key itself never changes; only the value it maps to is replaced.
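A small sketch of both halves of this, assuming nothing beyond the built-in `dict`. The names `keys_before` and `rejected` are only for illustration. The first part shows that `+=` leaves the key alone; the second shows what the "immutable keys" rule actually restricts, namely which objects may be used as keys.

```python
d = {1: 100}
keys_before = list(d)

d[1] += 500                      # same as d[1] = d[1] + 500
print(d)                         # {1: 600}
print(list(d) == keys_before)    # True: the key 1 is untouched

# The restriction is on the key object itself: it must be hashable,
# so a mutable object such as a list is rejected as a key.
rejected = False
try:
    d[[1, 2]] = "nope"
except TypeError:
    rejected = True
print(rejected)                  # True
```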

clearing a dictionary but keeping the keys

Is it possible to clear all the entries within a dictionary but keep all the keys?
For example if I had:
my_dic={
"colour":[],
"number":[]
}
I put some stuff in them:
my_dic["colour"]='Red'
my_dic["number"]='2'
I can clear these by:
my_dic["colour"] = []
my_dic["number"] = []
But this is long winded if I want to clear a large dictionary quickly, is there a quicker way perhaps using for? I want to keep the keys ["colour"], ["number"], without having to recreate them, just clear all the entries within them.
You can simply clear all lists in a loop:
for value in my_dic.values():
    del value[:]
Note the value[:] slice deletion; we are removing all indices in the list, not the value reference itself.
Note that if you are using Python 2 you probably want to use my_dic.itervalues() instead of my_dic.values() to avoid creating a new list object for the loop.
Demo:
>>> my_dic = {'colour': ['foo', 'bar'], 'number': [42, 81]}
>>> for value in my_dic.values():
...     del value[:]
...
>>> my_dic
{'colour': [], 'number': []}
You could also replace all values with new empty lists:
my_dic.update((key, []) for key in my_dic)
or replace the whole dictionary entirely:
my_dic = {key: [] for key in my_dic}
Take into account these two approaches will not update other references to either the lists (first approach) or the whole dictionary (second approach).
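The difference between clearing in place and rebinding only shows up when something else holds a reference to one of the lists. A sketch, where `alias_to_list` is a hypothetical second reference introduced for the demonstration:

```python
my_dic = {"colour": ["Red"], "number": ["2"]}
alias_to_list = my_dic["colour"]          # a second reference to the same list

# In-place clearing: the alias sees the change.
for value in my_dic.values():
    del value[:]
print(alias_to_list)                      # []
print(alias_to_list is my_dic["colour"])  # True

# Rebinding to new lists: the alias keeps pointing at the old list.
my_dic["colour"].append("Blue")
for key in my_dic:
    my_dic[key] = []
print(alias_to_list)                      # ['Blue']
print(alias_to_list is my_dic["colour"])  # False
```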
You don't need to delete keys from the dictionary:
for key in my_dict:
    my_dict[key] = []
One liner:
my_dict = dict.fromkeys(my_dict, None)
You can also replace the None type with other values that are immutable. A mutable type such as a list will cause all of the values in your new dictionary to be the same list.
For mutable types you would have to populate the dictionary with distinct instances of that type as others have shown.
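The shared-list pitfall with `dict.fromkeys` is easy to reproduce. This sketch contrasts it with a dict comprehension; the names `keys`, `shared`, and `distinct` are illustrative.

```python
keys = ["colour", "number"]

shared = dict.fromkeys(keys, [])        # one list object, used for every key
shared["colour"].append("Red")
print(shared)                           # {'colour': ['Red'], 'number': ['Red']}

distinct = {key: [] for key in keys}    # a fresh list per key
distinct["colour"].append("Red")
print(distinct)                         # {'colour': ['Red'], 'number': []}
```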

Checking items in a list of dictionaries in python

I have a list of dictionaries:
a = [{"ID":1, "VALUE":2},{"ID":2, "VALUE":2},{"ID":3, "VALUE":4},...]
"ID" is a unique identifier for each dictionary. Considering the list is huge, what is the fastest way of checking if a dictionary with a certain "ID" is in the list, and if not append to it? And then update its "VALUE" ("VALUE" will be updated if the dict is already in list, otherwise a certain value will be written)
You'd not use a list. Use a dictionary instead, mapping ids to nested dictionaries:
a = {
1: {'VALUE': 2, 'foo': 'bar'},
42: {'VALUE': 45, 'spam': 'eggs'},
}
Note that you don't need to include the ID key in the nested dictionary; doing so would be redundant.
Now you can simply look up if a key exists:
if someid in a:
    a[someid]['VALUE'] = newvalue
I did make the assumption that your ID keys are not necessarily sequential numbers. I also made the assumption you need to store other information besides VALUE; otherwise just a flat dictionary mapping ID to VALUE values would suffice.
A dictionary lets you look up values by key in O(1) time (constant time independent of the size of the dictionary). Lists let you look up elements in constant time too, but only if you know the index.
If you don't and have to scan through the list, you have a O(N) operation, where N is the number of elements. You need to look at each and every dictionary in your list to see if it matches ID, and if ID is not present, that means you have to search from start to finish. A dictionary will still tell you in O(1) time that the key is not there.
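As a rough sketch of that conversion, starting from the list in the question: `by_id` and `upsert` are names I made up, and unlike the dictionary shown earlier this version keeps the `ID` field inside each nested dictionary so the original list entries can be reused as they are.

```python
a = [{"ID": 1, "VALUE": 2}, {"ID": 2, "VALUE": 2}, {"ID": 3, "VALUE": 4}]

# One O(N) pass to build the index; afterwards each lookup is O(1) on average.
by_id = {item["ID"]: item for item in a}

def upsert(someid, newvalue):
    """Update VALUE if the ID exists, otherwise add a new entry."""
    if someid in by_id:
        by_id[someid]["VALUE"] = newvalue
    else:
        by_id[someid] = {"ID": someid, "VALUE": newvalue}

upsert(2, 20)    # existing ID: updated in place
upsert(7, 70)    # new ID: inserted

print(by_id[2])  # {'ID': 2, 'VALUE': 20}
print(by_id[7])  # {'ID': 7, 'VALUE': 70}
print(a[1])      # {'ID': 2, 'VALUE': 20}
```

The last line prints the updated value because `by_id` holds the same dictionary objects as the list, not copies. New entries, however, are only added to `by_id`, not to `a`.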
If you can, convert to a dictionary as the other answers suggest, but in case you have reason* to not change the data structure storing your items, here's what you can do:
items = [{"ID":1, "VALUE":2}, {"ID":2, "VALUE":2}, {"ID":3, "VALUE":4}]
def set_value_by_id(id, value):
    # Try to find the item, if it exists
    for item in items:
        if item["ID"] == id:
            break
    # Make and append the item if it doesn't exist
    else:  # Here, `else` means "if the loop terminated not via break"
        item = {"ID": id}
        items.append(item)  # append the new dict itself, not just its id
    # In either case, set the value
    item["VALUE"] = value
* Some valid reasons I can think of include preserving the order of items and allowing duplicate items with the same id. For ways to make dictionaries work with those requirements, you might want to take a look at OrderedDict and this answer about duplicate keys.
Converting your list into a dict and then checking for keys is much more efficient.
d = dict((item['ID'], item['VALUE']) for item in a)
for new_key, new_value in new_items:
    if new_key not in d:
        d[new_key] = new_value
If you also need to update the value when the key is found:
d = dict((item['ID'], item['VALUE']) for item in a)
for new_key, new_value in new_items:
    # plain assignment inserts new keys and overwrites existing ones
    d[new_key] = new_value
Answering the question you asked, without changing the datastructure around, there's no real faster way of looking without a loop and checking every element and doing a dictionary lookup for each one - but you can push the loop down to the Python runtime instead of using Python's for loop.
I haven't tried if it ends up faster though.
a = [{"ID":1, "VALUE":2},{"ID":2, "VALUE":2},{"ID":3, "VALUE":4}]
id = 2
# In Python 3, filter() returns a lazy iterator, so wrap it in list()
tmp = list(filter(lambda d: d['ID'] == id, a))
# tmp is now either an empty list, or a list of one item.
if not tmp:
    tmp = {"ID": id, "VALUE": "default"}
    a.append(tmp)
else:
    tmp = tmp[0]
# tmp is bound to the found/new dictionary
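A variation on the same idea, not part of the answer above: `next()` over a generator expression stops at the first match instead of scanning the whole list. The function name `find_or_append` and its `default` parameter are hypothetical. This is still O(N) in the worst case; I haven't measured whether it is faster in practice.

```python
a = [{"ID": 1, "VALUE": 2}, {"ID": 2, "VALUE": 2}, {"ID": 3, "VALUE": 4}]

def find_or_append(items, wanted_id, default="default"):
    # next() returns the first matching dict, or None if there is no match.
    found = next((d for d in items if d["ID"] == wanted_id), None)
    if found is None:
        found = {"ID": wanted_id, "VALUE": default}
        items.append(found)
    return found

existing = find_or_append(a, 2)
created = find_or_append(a, 9)
print(existing)  # {'ID': 2, 'VALUE': 2}
print(created)   # {'ID': 9, 'VALUE': 'default'}
print(len(a))    # 4
```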

Multiple keys per value

Is it possible to assign multiple keys per value in a Python dictionary? One possible solution is to assign the value to each key:
dict = {'k1':'v1', 'k2':'v1', 'k3':'v1', 'k4':'v2'}
but this is not memory efficient since my data file is > 2 GB. Otherwise you could make a dictionary of dictionary keys:
key_dict = {'k1':'k1', 'k2':'k1', 'k3':'k1', 'k4':'k4'}
dict = {'k1':'v1', 'k4':'v2'}
main_key = key_dict['k2']
value = dict[main_key]
This is also very time and effort consuming because I have to go through whole dictionary/file twice. Is there any other easy and inbuilt Python solution?
Note: my dictionary values are not simple string (as in the question 'v1', 'v2') rather complex objects (contains different other dictionary/list etc. and not possible to pickle them)
Note: the question seems similar as How can I use both a key and an index for the same dictionary value?
But I am not looking for an ordered/indexed dictionary, and I am looking for other efficient solutions (if any) other than the two mentioned in this question.
What type are the values?
dict = {'k1':MyClass(1), 'k2':MyClass(1)}
will give duplicate value objects, but
v1 = MyClass(1)
dict = {'k1':v1, 'k2':v1}
results in both keys referring to the same actual object.
In the original question, your values are strings: even though you're declaring the same string twice, I think they'll be interned to the same object in that case.
NB. if you're not sure whether you've ended up with duplicates, you can find out like so:
if dict['k1'] is dict['k2']:
    print("good: k1 and k2 refer to the same instance")
else:
    print("bad: k1 and k2 refer to different instances")
(is check thanks to J.F.Sebastian, replacing id())
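Putting the two cases side by side as a runnable sketch. `MyClass` here is a hypothetical stand-in for whatever large value type you have, and `payload` is an invented attribute.

```python
class MyClass:
    """Hypothetical stand-in for a large value object."""
    def __init__(self, payload):
        self.payload = payload

separate = {"k1": MyClass(1), "k2": MyClass(1)}
print(separate["k1"] is separate["k2"])  # False: two objects were constructed

v1 = MyClass(1)
shared = {"k1": v1, "k2": v1}
print(shared["k1"] is shared["k2"])      # True: one object, two keys

# A change made through one key is visible through the other.
shared["k1"].payload = 99
print(shared["k2"].payload)              # 99
```

So the dictionary only stores references: giving several keys the same value costs one extra reference per key, not one extra copy of the object.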
Check out this - it's an implementation of exactly what you're asking: multi_key_dict(ionary)
https://pypi.python.org/pypi/multi_key_dict
(sources at https://github.com/formiaczek/python_data_structures/tree/master/multi_key_dict)
(on Unix platforms it possibly comes as a package and you can try to install it with something like:
sudo apt-get install python-multi-key-dict
for Debian, or an equivalent for your distribution)
You can use different types for keys but also keys of the same type. Also you can iterate over items using key types of your choice, e.g.:
m = multi_key_dict()
m['aa', 12] = 12
m['bb', 1] = 'cc and 1'
m['cc', 13] = 'something else'
print m['aa'] # will print '12'
print m[12] # will also print '12'
# but also:
for key, value in m.iteritems(int):
    print key, ':', value
# will print:
# 1 : cc and 1
# 12 : 12
# 13 : something else
# and iterating by string keys:
for key, value in m.iteritems(str):
    print key, ':', value
# will print:
# aa : 12
# cc : something else
# bb : cc and 1
m[12] = 20 # now update the value
print m[12] # will print '20' (updated value)
print m['aa'] # will also print '20' (it maps to the same element)
There is no limit to number of keys, so code like:
m['a', 3, 5, 'bb', 33] = 'something'
is valid, and either of keys can be used to refer to so-created value (either to read / write or delete it).
Edit: From version 2.0 it should also work with python3.
Using python 2.7/3 you can combine a tuple, value pair with dictionary comprehension.
keys_values = ( (('k1','k2'), 0), (('k3','k4','k5'), 1) )
d = { key : value for keys, value in keys_values for key in keys }
You can also update the dictionary similarly.
keys_values = ( (('k1',), int), (('k3','k4','k6'), int) )
d.update({ key : value for keys, value in keys_values for key in keys })
I don't think this really gets to the heart of your question but in light of the title, I think this belongs here.
The most straightforward way to do this is to construct your dictionary using the dict.fromkeys() method. It takes a sequence of keys and a value as inputs and then assigns the value to each key.
Your code would be:
dict = dict.fromkeys(['k1', 'k2', 'k3'], 'v1')
dict.update(dict.fromkeys(['k4'], 'v2'))
And the output is:
print(dict)
{'k1': 'v1', 'k2': 'v1', 'k3': 'v1', 'k4': 'v2'}
You can build an auxiliary dictionary of objects that were already created from the parsed data. The key would be the parsed data, the value would be your constructed object -- say the string value should be converted to some specific object. This way you can control when to construct the new object:
existing = {} # auxiliary dictionary for making the duplicates shared
result = {}
for k, v in parsed_data_generator():
    obj = existing.setdefault(v, MyClass(v)) # could be made more efficient
    result[k] = obj
Then all the result dictionary duplicate value objects will be represented by a single object of the MyClass class. After building the result, the existing auxiliary dictionary can be deleted.
Here dict.setdefault() is elegant and brief, but you should test whether the more verbose solution below is more efficient. The reason is that, in the example above, MyClass(v) is always constructed and then thrown away if a duplicate already exists:
existing = {} # auxiliary dictionary for making the duplicates shared
result = {}
for k, v in parsed_data_generator():
    if v in existing:
        obj = existing[v]
    else:
        obj = MyClass(v)
        existing[v] = obj
    result[k] = obj
This technique can be used also when v is not converted to anything special. For example, if v is a string, both key and value in the auxiliary dictionary will be of the same value. However, the existence of the dictionary ensures that the object will be shared (which is not always ensured by Python).
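To see the extra constructions directly, here is a sketch with a hypothetical `MyClass` that counts how many instances are created. The `data` list stands in for `parsed_data_generator()`, and `eager_count`/`lazy_count` are names introduced for the comparison.

```python
class MyClass:
    """Hypothetical value type that counts how often it is constructed."""
    created = 0

    def __init__(self, v):
        MyClass.created += 1
        self.v = v

data = [("k1", "x"), ("k2", "x"), ("k3", "y")]

# setdefault: the argument MyClass(v) is evaluated on every iteration.
existing, result = {}, {}
for k, v in data:
    result[k] = existing.setdefault(v, MyClass(v))
eager_count = MyClass.created
print(eager_count)                    # 3
print(result["k1"] is result["k2"])   # True

# Explicit membership test: constructs only when the value is new.
MyClass.created = 0
existing, result = {}, {}
for k, v in data:
    if v not in existing:
        existing[v] = MyClass(v)
    result[k] = existing[v]
lazy_count = MyClass.created
print(lazy_count)                     # 2
print(result["k1"] is result["k2"])   # True
```

Both loops end up sharing one object per distinct value; they differ only in how many throwaway objects get built along the way.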
I was able to achieve similar functionality using pandas MultiIndex, although in my case the values are scalars:
>>> import numpy
>>> import pandas
>>> keys = [numpy.array(['a', 'b', 'c']), numpy.array([1, 2, 3])]
>>> df = pandas.DataFrame(['val1', 'val2', 'val3'], index=keys)
>>> df.index.names = ['str', 'int']
>>> df.xs('b', axis=0, level='str')
        0
int
2    val2
>>> df.xs(3, axis=0, level='int')
        0
str
c    val3
I'm surprised no one has mentioned using Tuples with dictionaries. This works just fine:
my_dictionary = {}
my_dictionary[('k1', 'k2', 'k3')] = 'v1'
my_dictionary[('k4')] = 'v2'
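Two details are worth checking before relying on tuple keys, sketched below with illustrative names (`grouped`, `flat`). Parentheses alone do not create a tuple, and a tuple key only matches the whole tuple, so looking up a single member fails unless the tuples are expanded into one entry per key.

```python
print(type(('k4')).__name__)     # str: parentheses alone do not make a tuple
print(type(('k4',)).__name__)    # tuple: the trailing comma does

grouped = {('k1', 'k2', 'k3'): 'v1', ('k4',): 'v2'}
print(('k1', 'k2', 'k3') in grouped)  # True
print('k2' in grouped)                # False: only the whole tuple is a key

# To look up by any single key, expand each tuple into one entry per key.
# Every entry refers to the same value object, so the value is not duplicated.
flat = {key: value for keys, value in grouped.items() for key in keys}
print(flat['k2'])                     # v1
print(flat['k1'] is flat['k3'])       # True
```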
