Python dictionary - python

I am having some trouble understanding this. I have tried to reduce the problem to this piece of code:
for k in y.keys():
    if k in dateDict.keys():
        if yearDict[k] in dict1:
            dict1[yearDict[k]].extend(y[k])
        else:
            dict1[yearDict[k]] = y[k]
        if yearDict[k] in dict2:
            dict2[yearDict[k]].extend(y[k])
        else:
            dict2[yearDict[k]] = y[k]
    else:
        continue
I have two dictionaries, y and dateDict, to begin with. For each key of y that also appears in dateDict, I am populating two other dictionaries, dict1 and dict2, keyed by values from another dictionary, yearDict. Unfortunately the results are duplicated in dict1 and dict2: I have values repeating themselves. Any idea what could be happening?
Also I notice that this code works as expected:
for k in y.keys():
    if k in dateDict.keys():
        if yearDict[k] in dict1:
            dict1[yearDict[k]].extend(y[k])
        else:
            dict1[yearDict[k]] = y[k]
    else:
        continue

If y[k] is a list (which it looks like), the same list object is shared everywhere it is assigned. Dictionaries do not make copies of the elements when they are assigned; they just keep references to the objects. In your example, the keys in dict1 and dict2 both point to the same object.
Later, when it is modified, that one list is extended with the new values once for each dictionary, so every value shows up twice. To prevent this, you can create a new list when initially assigning (in both dict1 and dict2):
dict1[yearDict[k]] = list(y[k])
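The aliasing described above can be reproduced in a few lines. This is only a sketch under assumptions: the sample data, the year_of mapping (standing in for yearDict) and the populate helper are made up for illustration and are not the asker's real code.

```python
def populate(y, year_of, copy_first):
    """Fill two dicts the way the question does, optionally copying the list."""
    dict1, dict2 = {}, {}
    for k, values in y.items():
        key = year_of[k]
        for target in (dict1, dict2):
            if key in target:
                target[key].extend(values)
            else:
                # Without list(...), both dicts store the very same list object.
                target[key] = list(values) if copy_first else values
    return dict1, dict2


def sample():
    # Fresh data for each run, because the shared variant mutates y's lists.
    return {"a": [1, 2], "b": [3]}


year_of = {"a": 2020, "b": 2020}  # hypothetical stand-in for yearDict

shared1, shared2 = populate(sample(), year_of, copy_first=False)
copied1, copied2 = populate(sample(), year_of, copy_first=True)

print(shared1[2020])  # the value 3 is appended twice to the one shared list
print(copied1[2020])  # each dict has its own list, so 3 appears once
```

In the shared variant, shared1[2020] and shared2[2020] are the same object, which is why extending through either dictionary changes what both of them show.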
However, it is always good to know the Python standard library. This code could be made much more readable, and without the error, by using collections.defaultdict:
from collections import defaultdict
# This goes wherever the dictionaries
# were initially defined.
dict1 = defaultdict(list)
dict2 = defaultdict(list)
# You can get the value here, no need to look it up later.
for k, value in y.items():
    if k in dateDict.keys():
        # No need to call this everywhere.
        new_key = yearDict[k]
        # Note the defaultdict magic.
        dict1[new_key].extend(value)
        dict2[new_key].extend(value)
    # No need for the 'continue' at the end either.
When asked for a key that does not exist yet, the defaultdict creates a new empty list on the fly, so you don't have to care about initialization or about creating copies of your values.

Related

Python Comparing Values of Numpy Array Between 2 Dictionaries Value

I have two dictionaries and an input:
letter = 'd'
dict_1 = {"label_1": array(['a','b']), "label_2": array(['c','d']), ...}
dict_2 = {"label_1": array(['x','y']), "label_2": array(['z','o']), ...}
letter_translated = some_function(letter)
output desired: 'o'
What I have in mind right now is to get the index of the letter within the array stored under the key "label_2" in dict_1, then look up the same index in dict_2. I am open to other ways of doing it. If you are unclear about the question, feel free to drop a comment.
Note: the arrays are numpy arrays
I propose to iterate through the first dictionary, while keeping a trace of how to get to the current element (key and i) so that we can look in the second dict in the same place:
from numpy import array
dict_1 = {"label_1": array(['a','b']), "label_2": array(['c','d'])}
dict_2 = {"label_1": array(['x','y']), "label_2": array(['z','o'])}
def look_for_corresponding(letter, d1, d2):
    for key, array_of_letters in d1.items():
        for position, d1_letter in enumerate(array_of_letters):
            if d1_letter == letter:
                return d2[key][position]
    return None # Line not necessary, but added for clarity
output = look_for_corresponding('d', dict_1, dict_2)
print(output)
# o
Of course, this code will fail if dict_1 and dict_2 do not have exactly the same structure, or if the arrays are more than 1D. If those cases apply to you, please edit your question to indicate it.
Also, I am not sure what should be done if the letter is not to be found within dict_1. This code will return None, but it could also raise an exception.
What do you mean by 'index'? The position number?
Dictionaries don't have the concept of counted indices for their entries. You can only access data through the key (here "label_2"), or by iterating (for key in dict_1 ...).
Before Python 3.7 the iteration order was not guaranteed, and the order of your declaration was not necessarily kept. Since Python 3.7 dictionaries preserve insertion order, but there is still no access by position.
If you wish to look up "label_2" in both, then you need to access
key = "label_2"
item_from_1 = dict_1[key]
item_from_2 = dict_2[key]
If you need to iterate over dict_1 and, for each item, find the corresponding item in the second dictionary, then this also needs to go through the key:
for key, value1 in dict_1.items():
    value2 = dict_2[key]
    .....
Note that on Python versions before 3.7 the order in which the items come up in the loop may vary, even from one run of the program to the next.

Python: can view objects show keys of zombie dictionaries?

Here is my example code:
#!/usr/bin/env python3
myDict = {'a':1, 'b':2, 'c':3}
myKeys = myDict.keys()
# myDict = {} # toggle this line
myDict['y'] = 4
for myKey in myDict:
    print('D ' + myKey)
for myKey in myKeys:
    print('K ' + myKey)
If you run the program as shown here (with the line in the middle commented out), you get this output, which is exactly what I expected. The lines with the prefix D (loop over dictionary) have the same values as the lines with the prefix K (loop over keys of dictionary):
D a
D b
D c
D y
K a
K b
K c
K y
Now remove the hash and activate the line that was commented out. When you run the modified program, you get this:
D y
K a
K b
K c
But I expected one of these behaviors:
either:
After myDict = {} was executed, myKeys has become empty too (since it is a view object that always views the keys of its parent dictionary). Adding the item with the key 'y' should result in this output:
D y
K y
or:
After myDict = {} was executed, a new version of myDict was created and the previous version of myDict was destroyed, so myKeys no longer has any parent dictionary and is therefore pointing to null. So in this case looping over myKeys should throw an error.
But to me it looks as if the old version of myDict has become some kind of zombie, and myKeys is displaying the keys of this zombie.
So, here are my questions:
Is it true, that in my program myKeys shows keys of a zombie dictionary if the line in the middle is activated?
Is it reliable behavior, that myKeys will always show the keys of the previous version of the dictionary? (That would be very helpful for a program that I'm writing just now.)
Is there a way to revive the zombie dictionary? (I have all keys. Can I get the values too?)
The fundamental issue in your understanding here is that:
myDict = {}
does nothing to the original dict. Assignment never mutates; it only rebinds the name, so the object that myDict was referring to is unmodified. Python objects stay alive as long as something refers to them. CPython uses reference counting and, as an implementation detail, reclaims objects immediately when their reference count reaches zero.
However, the dict_keys view object you created internally references the dictionary it is a view over, so the original dictionary still has at least one reference to it as long as the view object is alive. Note, though, that before Python 3.10 the dict_keys API did not expose this reference, so if it was the only reference, you could not really access your dict anymore (in any sane way). Since Python 3.10, dictionary views have a read-only mapping attribute that returns a proxy of the original dictionary.
Is it reliable behavior, that myKeys will always show the keys of the previous version of the dictionary?
You are misunderstanding the behavior. It is showing the keys of the same dictionary it has always been showing. There is no previous dictionary. The view is over the object, it doesn't care or know which variables happen to be referring to that object at any given time.
It's the same reason that if you do:
x = 'foo'
container = []
container.append(x)
x = 'bar'
print(container[0])
will still print "foo". Objects have no idea what names are referencing them at any given time, and should not care. And when one reference changes, the other references don't magically get updated.
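To make the point concrete, here is a small sketch using snake_case stand-ins for the question's names. The extra original reference exists only so the demo can keep modifying the first dictionary after the name has been rebound; the mapping attribute is only present on Python 3.10 and later, so it is looked up defensively.

```python
my_dict = {"a": 1, "b": 2, "c": 3}
my_keys = my_dict.keys()   # view bound to the dict object, not to the name
original = my_dict         # second reference, kept only for this demo

my_dict = {}               # rebinds the name; the first dict is untouched
my_dict["y"] = 4

keys_of_view = list(my_keys)   # still the keys of the first dict
keys_of_new = list(my_dict)    # keys of the new, separate dict

original["z"] = 26             # the view follows changes to its own dict
keys_after_update = list(my_keys)

# On Python 3.10+ the view exposes a read-only proxy of its dictionary.
proxy = getattr(my_keys, "mapping", None)
values_via_view = dict(proxy) if proxy is not None else None

print(keys_of_view, keys_of_new, keys_after_update)
```

The view never switches to the new dictionary, and it does not become invalid either; it simply keeps showing the object it was created from.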

Dealing with lists in a dictionary

I am iterating through some folders to read all the files they contain, so that I can later move the ones that are not rejected. As the number of folders and files may vary, I created a dictionary where each folder is a key and the files in it are the value. In a dummy situation I have:
Iterating through the source folders (their number is known but may vary):
sourcefolder = (r"C:\User\Desktop\Test")
subfolders = 3
for i in range(subfolders):
    Lst_All["allfiles" + str(i)] = os.listdir(sourcefolder[i])
This results in the dictionary below:
Lst_All = {
allfiles0: ('A.1.txt', 'A.txt', 'rejected.txt')
allfiles1: ('B.txt')
allfiles2: ('C.txt')}
My issue is to remove the rejected files so I can do a shutil.move() with only valid files.
So far I got:
for k, v in lst_All.items():
    for i in v:
        if i == "rejected.txt":
            del lst_All[i]
but it returns an error KeyError: 'rejected.txt'. Any thoughts? Perhaps another way to create the list of items to be moved?
Thanks!
For a start, the members of your dictionary are tuples, not lists. Tuples are immutable, so we can't remove items as easily as we can with lists. To replicate the functionality I think you're after, we can do the following:
Lst_All = {'allfiles0': ('A.1.txt', 'A.txt', 'rejected.txt'),
'allfiles1': ('B.txt',),
'allfiles2': ('C.txt',)}
Lst_All = {k: tuple(x for x in v if x!="rejected.txt") for k, v in Lst_All.items()}
Which gives us:
>>> Lst_All
{'allfiles0': ('A.1.txt', 'A.txt'),
'allfiles1': ('B.txt',),
'allfiles2': ('C.txt',)}
You should not iterate over a dictionary while removing elements from that dictionary inside the loop. It is better to make a list of the keys and then iterate over that. Also, you do not need a separate loop to check whether rejected.txt is present in a directory.
keys = list(lst_All.keys())
for k in keys:
    if "rejected.txt" in lst_All[k]:
        del lst_All[k]
If you want to remove only rejected.txt, then the only option is to create another tuple without that element and store it in the dictionary under the same key. You can do that like this:
keys = list(lst_All.keys())
for k in keys:
    lst_All[k] = tuple(e for e in lst_All[k] if e != 'rejected.txt')
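As a side note on the KeyError from the question: del lst_All[i] uses the file name as a dictionary key, but the keys are the folder labels, so the lookup fails. The sketch below assumes the values are stored as lists (as os.listdir returns) rather than tuples, and the sample data is made up; it builds a filtered dictionary instead of deleting in place.

```python
REJECTED = "rejected.txt"

# Hypothetical sample data shaped like the dictionary in the question.
lst_all = {
    "allfiles0": ["A.1.txt", "A.txt", "rejected.txt"],
    "allfiles1": ["B.txt"],
    "allfiles2": ["C.txt"],
}

# Reproduce the error: the file name is not a key of the dictionary.
try:
    del lst_all[REJECTED]
    error_key = None
except KeyError as exc:
    error_key = exc.args[0]

# Build a new dictionary that keeps only the files to be moved.
valid_files = {
    folder: [name for name in names if name != REJECTED]
    for folder, names in lst_all.items()
}

print(error_key)
print(valid_files)
```

Each list in valid_files could then be passed on to shutil.move() one entry at a time.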

Modifying a Python dict while iterating over it

Let's say we have a Python dictionary d, and we're iterating over it like so:
for k, v in d.iteritems():
    del d[f(k)] # remove some item
    d[g(k)] = v # add a new item
(f and g are just some black-box transformations.)
In other words, we try to add/remove items to d while iterating over it using iteritems.
Is this well defined? Could you provide some references to support your answer?
See also How to avoid "RuntimeError: dictionary changed size during iteration" error? for the separate question of how to avoid the problem.
Alex Martelli weighs in on this here.
It may not be safe to change the container (e.g. dict) while looping over the container.
So del d[f(k)] may not be safe. As you know, the workaround is to use d.copy().items() (to loop over an independent copy of the container) instead of d.iteritems() or d.items() (which use the same underlying container).
It is okay to modify the value at an existing index of the dict, but inserting values at new indices (e.g. d[g(k)] = v) may not work.
It is explicitly mentioned on the Python doc page (for Python 2.7) that
Using iteritems() while adding or deleting entries in the dictionary may raise a RuntimeError or fail to iterate over all entries.
Similarly for Python 3.
The same holds for iter(d), d.iterkeys() and d.itervalues(), and I'll go as far as saying that it does for for k, v in d.items(): (I can't remember exactly what for does, but I would not be surprised if the implementation called iter(d)).
You cannot do that, at least with d.iteritems(). I tried it, and Python fails with
RuntimeError: dictionary changed size during iteration
If you instead use d.items() in Python 2, then it works, because there items() returns a list that is a separate copy of the key/value pairs.
In Python 3, d.items() is a view into the dictionary, like d.iteritems() in Python 2. To do this in Python 3, instead use d.copy().items(). This similarly lets us iterate over a copy of the dictionary and avoid modifying the data structure we are iterating over.
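A minimal Python 3 sketch of the difference, with made-up sample data. It uses list(d.items()) as the snapshot, which serves the same purpose as d.copy().items() here; the k + 10 key transformation is just an illustrative stand-in for the black-box g.

```python
# Adding keys while iterating over the live dict raises in CPython 3.
d = {1: "a", 2: "b", 3: "c"}
try:
    for k in d:
        d[k + 10] = d[k]
    error = None
except RuntimeError as exc:
    error = str(exc)

# Iterating over a snapshot lets us delete and add freely.
d = {1: "a", 2: "b", 3: "c"}
for k, v in list(d.items()):
    del d[k]
    d[k + 10] = v + "x"

print(error)
print(d)
```

The snapshot costs one extra list of key/value references; the values themselves are not copied.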
I have a large dictionary containing Numpy arrays, so the dict.copy().keys() thing suggested by @murgatroid99 was not feasible (though it worked). Instead, I just converted the keys view to a list and it worked fine (in Python 3.4):
for item in list(dict_d.keys()):
    temp = dict_d.pop(item)
    dict_d['some_key'] = 1 # Some value
I realize this doesn't dive into the philosophical realm of Python's inner workings like the answers above, but it does provide a practical solution to the stated problem.
The following code shows that this is not well defined:
def f(x):
    return x
def g(x):
    return x+1
def h(x):
    return x+10
try:
    d = {1:"a", 2:"b", 3:"c"}
    for k, v in d.iteritems():
        del d[f(k)]
        d[g(k)] = v+"x"
    print d
except Exception as e:
    print "Exception:", e
try:
    d = {1:"a", 2:"b", 3:"c"}
    for k, v in d.iteritems():
        del d[f(k)]
        d[h(k)] = v+"x"
    print d
except Exception as e:
    print "Exception:", e
The first example calls g(k), and throws an exception (dictionary changed size during iteration).
The second example calls h(k) and throws no exception, but outputs:
{21: 'axx', 22: 'bxx', 23: 'cxx'}
Which, looking at the code, seems wrong - I would have expected something like:
{11: 'ax', 12: 'bx', 13: 'cx'}
In Python 3 you should just:
prefix = 'item_'
t = {'f1': 'ffw', 'f2': 'fca'}
t2 = dict()
for k, v in t.items():
    t2[k] = prefix + v
or use:
t2 = t.copy()
You should never modify the original dictionary; it leads to confusion as well as potential bugs or a RuntimeError, unless you are only adding entries under new key names.
This question asks about using an iterator to delete or add items (funnily enough, the Python 2 .iteritems iterator is no longer supported in Python 3), and the only right answer to that is No, as you can see in the accepted answer. Yet most searchers are looking for a solution and will not care how it is done technically, be it an iterator or recursion, and there is a solution for the problem:
You cannot loop-change a dict without using an additional (recursive) function.
This question should therefore be linked to a question that has a working solution:
How can I remove a key:value pair wherever the chosen key occurs in a deeply nested dictionary? (= "delete")
Also helpful, as it shows how to change the items of a dict on the fly: How can I replace a key:value pair by its value wherever the chosen key occurs in a deeply nested dictionary? (= "replace").
With the same recursive methods, you will also be able to add items, as the question asks.
Since my request to link this question was declined, here is a copy of the solution that can delete items from a dict. See How can I remove a key:value pair wherever the chosen key occurs in a deeply nested dictionary? (= "delete") for examples / credits / notes.
import copy
def find_remove(this_dict, target_key, bln_overwrite_dict=False):
    if not bln_overwrite_dict:
        this_dict = copy.deepcopy(this_dict)
    for key in this_dict:
        # if the current value is a dict, dive into it
        if isinstance(this_dict[key], dict):
            if target_key in this_dict[key]:
                this_dict[key].pop(target_key)
            this_dict[key] = find_remove(this_dict[key], target_key)
    return this_dict
dict_nested_new = find_remove(nested_dict, "sub_key2a")
The trick
The trick is to find out in advance whether a target_key is among the next children (= this_dict[key] = the values of the current dict iteration) before you reach the child level recursively. Only then you can still delete a key:value pair of the child level while iterating over a dictionary. Once you have reached the same level as the key to be deleted and then try to delete it from there, you would get the error:
RuntimeError: dictionary changed size during iteration
The recursive solution makes any change only on the next values' sub-level and therefore avoids the error.
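An alternative sketch, not the linked answer's method: instead of deleting from child dictionaries during iteration, build a new nested dictionary that simply leaves the key out. The function name without_key and the sample data are hypothetical. Unlike find_remove above, this variant also drops the key at the top level.

```python
def without_key(node, target_key):
    """Return a rebuilt copy of a nested dict with target_key dropped everywhere.

    Non-dict values are reused as-is (they are not deep-copied).
    """
    if not isinstance(node, dict):
        return node
    return {
        key: without_key(value, target_key)
        for key, value in node.items()
        if key != target_key
    }


nested = {
    "key1": {"sub_key2a": 1, "sub_key2b": {"sub_key2a": 2, "x": 3}},
    "sub_key2a": 0,
}
cleaned = without_key(nested, "sub_key2a")

print(cleaned)
print(nested)  # the input is left unchanged
```

Because nothing is ever removed from a dictionary that is being iterated, the "dictionary changed size during iteration" error cannot occur here.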
I had the same problem and used the following procedure to solve it.
A Python list can be iterated over even while it is being modified, so the following code prints 1 forever (provided the list is non-empty to begin with):
for i in items:
    items.append(1)
    print(1)
So by using a list and a dict together you can solve this problem.
d_list = []  # keys still to be processed
d_dict = {}
for k in d_list:
    if d_dict[k] != -1:
        v = d_dict[k]
        d_dict[f(k)] = -1 # rather than deleting it, mark it with -1 or another value to show that it should not be considered further (deleted)
        d_dict[g(k)] = v # add a new item
        d_list.append(g(k))
Today I had a similar use-case, but instead of simply materializing the keys on the dictionary at the beginning of the loop, I wanted changes to the dict to affect the iteration of the dict, which was an ordered dict.
I ended up building the following routine, which can also be found in jaraco.itertools:
def _mutable_iter(dict):
    """
    Iterate over items in the dict, yielding the first one, but allowing
    it to be mutated during the process.
    >>> d = dict(a=1)
    >>> it = _mutable_iter(d)
    >>> next(it)
    ('a', 1)
    >>> d
    {}
    >>> d.update(b=2)
    >>> list(it)
    [('b', 2)]
    """
    while dict:
        prev_key = next(iter(dict))
        yield prev_key, dict.pop(prev_key)
The docstring illustrates the usage. This function could be used in place of d.iteritems() above to have the desired effect.
