Dealing with lists in a dictionary - python

I am iterating through some folders to read all the files in each one, so that I can later move the ones that were not rejected. As the number of folders and files may vary, I created a dictionary where each folder is a key and the values are the files in that folder. In a dummy situation I have:
Iterating through the number of source of folders (known but may vary)
sourcefolder = (r"C:\User\Desktop\Test")
subfolders = 3
for i in range(subfolders):
    Lst_All["allfiles" + str(i)] = os.listdir(sourcefolder[i])
This results in the dictionary below:
Lst_All = {
allfiles0: ('A.1.txt', 'A.txt', 'rejected.txt')
allfiles1: ('B.txt')
allfiles2: ('C.txt')}
My issue is to remove the rejected files so I can do a shutil.move() with only valid files.
So far I got:
for k, v in lst_All.items():
    for i in v:
        if i == "rejected.txt":
            del lst_All[i]
but it returns an error KeyError: 'rejected.txt'. Any thoughts? Perhaps another way to create the list of items to be moved?
Thanks!

For a start, the members of your dictionary are tuples, not lists. Tuples are immutable, so we can't remove items as easily as we can with lists. To replicate the functionality I think you're after, we can do the following:
Lst_All = {'allfiles0': ('A.1.txt', 'A.txt', 'rejected.txt'),
'allfiles1': ('B.txt',),
'allfiles2': ('C.txt',)}
Lst_All = {k: tuple(x for x in v if x!="rejected.txt") for k, v in Lst_All.items()}
Which gives us:
>>> Lst_All
{'allfiles0': ('A.1.txt', 'A.txt'),
'allfiles1': ('B.txt',),
'allfiles2': ('C.txt',)}

You should not iterate over a dictionary while removing elements from that dictionary inside the loop. It is better to make a list of the keys and iterate over that. Also, you do not need a separate loop to check whether rejected.txt is present in a folder's listing.
keys = list(lst_All.keys())
for k in keys:
    if "rejected.txt" in lst_All[k]:
        del lst_All[k]
If you only want to remove rejected.txt itself, you have to create another tuple without that element and store it in the dictionary under the same key. You can do that like this:
keys = list(lst_All.keys())
for k in keys:
    lst_All[k] = tuple(e for e in lst_All[k] if e != 'rejected.txt')
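Since the question also asks for another way to build the list of items to move, one option is to skip the rejected file while the dictionary is being built. This is only a minimal sketch: it assumes the folders are passed in as a list of paths, and `collect_valid_files`, `move_valid_files` and the temporary directories are hypothetical names used for illustration.

```python
import os
import shutil
import tempfile

REJECTED = "rejected.txt"

def collect_valid_files(folders, rejected=REJECTED):
    """Map each folder to a list of its file names, skipping the rejected one."""
    valid = {}
    for folder in folders:
        valid[folder] = [name for name in sorted(os.listdir(folder)) if name != rejected]
    return valid

def move_valid_files(valid, destination):
    """Move every collected file into the destination folder."""
    for folder, names in valid.items():
        for name in names:
            shutil.move(os.path.join(folder, name), os.path.join(destination, name))

# Demo on throwaway directories so the sketch can run anywhere.
root = tempfile.mkdtemp()
source = os.path.join(root, "source")
target = os.path.join(root, "target")
os.makedirs(source)
os.makedirs(target)
for name in ("A.txt", "A.1.txt", "rejected.txt"):
    open(os.path.join(source, name), "w").close()

valid = collect_valid_files([source])
move_valid_files(valid, target)
moved = sorted(os.listdir(target))   # the two valid files
left_behind = os.listdir(source)     # only rejected.txt stays in the source
shutil.rmtree(root)                  # clean up the demo directories
```

Storing lists rather than tuples also means a file name can later be dropped with `list.remove` if needed.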


Function that makes dict from string but swaps keys and values?

I'm trying to make a function that takes in list of strings as an input like the one listed below:
def swap_values_dict(['Summons: Bahamut, Shiva, Chocomog',
'Enemies: Bahamut, Shiva, Cactaur'])
and creates a dictionary from them using the words after the colons as keys and the words before the colons as values. I need to clarify that, at this point, there are only two strings in the list. I plan to split the strings into sublists and, from there, try and assign them to a dictionary.
The output should look like
{'Bahamut': ['Summons','Enemies'],'Shiva':['Summons','Enemies'],'Chocomog':['Summons'],'Cactaur':['Enemies']}
As you can see, the words after the colon in the original list have become keys while the words before the colon (categories) have become the values. If one of the values appears in both lists, it is assigned two values in the final dictionary. I would like to be able to make similar dictionaries out of many lists of different sizes, not just ones that contain two strings. Could this be done without list comprehension and only for loops and if statements?
What I've Tried So Far
title_list = []
for i in range(len(mobs)):  # counts amount of strings in list
    titles = (mobs[i].split(":"))[0]  # gets titles from list using split
    title_list.append(titles)
title_list
this code returns ['Summons', 'Enemies'] which aren't the results I wanted to receive but I think they could help me write the function. I had planned on separating the keys and values into separate lists and then zipping them together afterwards as a dictionary.
Try:
def swap_values_dict(lst):
    tmp = {}
    for s in lst:
        k, v = map(str.strip, s.split(":"))
        tmp[k] = list(map(str.strip, v.split(",")))
    out = {}
    for k, v in tmp.items():
        for i in v:
            out.setdefault(i, []).append(k)
    return out
print(
    swap_values_dict(
        [
            "Summons: Bahamut, Shiva, Chocomog",
            "Enemies: Bahamut, Shiva, Cactaur",
        ]
    )
)
Prints:
{
"Bahamut": ["Summons", "Enemies"],
"Shiva": ["Summons", "Enemies"],
"Chocomog": ["Summons"],
"Cactaur": ["Enemies"],
}
I'd use a defaultdict. It saves you the trouble of manually checking if a key exists in your dictionary and constructing a new empty list, making for a rather concise function:
from collections import defaultdict
def swap_values_dict(mobs):
    result = defaultdict(list)
    for elem in mobs:
        role, members = elem.split(': ')
        for m in members.split(', '):
            result[m].append(role)
    return result
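The question also asks whether this can be done with only for loops and if statements. A minimal sketch of that variant follows, assuming each string contains exactly one colon; the name `swap_values_loops` is made up for this example.

```python
def swap_values_loops(mobs):
    """Invert 'Category: a, b, c' strings into {member: [categories]}."""
    result = {}
    for entry in mobs:
        category, members = entry.split(":")
        category = category.strip()
        for member in members.split(","):
            member = member.strip()
            # Create the list the first time a member is seen.
            if member not in result:
                result[member] = []
            result[member].append(category)
    return result

result = swap_values_loops([
    "Summons: Bahamut, Shiva, Chocomog",
    "Enemies: Bahamut, Shiva, Cactaur",
])
print(result)
```

Because it never indexes into the input list by position, the same function works for any number of strings.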

Python dictionary

I am having some trouble understanding this. I have tried to reduce the problem to this piece of code:
for k in y.keys():
    if k in dateDict.keys():
        if yearDict[k] in dict1:
            dict1[yearDict[k]].extend(y[k])
        else:
            dict1[yearDict[k]] = y[k]
        if yearDict[k] in dict2:
            dict2[yearDict[k]].extend(y[k])
        else:
            dict2[yearDict[k]] = y[k]
    else:
        continue
I have two dictionaries, y and dateDict, to begin with. For each key of y that is also in dateDict, I am populating two other dictionaries, dict1 and dict2, keyed by values from another dictionary, yearDict. Unfortunately the results are duplicated in dict1 and dict2: values repeat themselves. Any idea what could be happening?
Also I notice that this code works as expected,
for k in y.keys():
    if k in dateDict.keys():
        if yearDict[k] in dict1:
            dict1[yearDict[k]].extend(y[k])
        else:
            dict1[yearDict[k]] = y[k]
    else:
        continue
If y[k] is a list (which it looks like), the same list object is assigned everywhere it is used. Dictionaries do not make copies of values when they are assigned; they just keep references to the objects. In your example, the entries in dict1 and dict2 both point to the same list.
Later, when it is modified, that one list is extended with the new values once for each dictionary. To prevent this, you can create a new list when initially assigning (and likewise for dict2):
dict1[yearDict[k]] = list(y[k])
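The effect can be reproduced in a few lines. The sketch below uses made-up names (`values`, `shared1`, `copied1`, and so on) and a made-up key purely to show the difference between storing a reference and storing a copy.

```python
values = [1, 2]
more = [3]

# Without copying: both dictionaries hold a reference to the same list.
shared1 = {"2020": values}
shared2 = {"2020": values}
shared1["2020"].extend(more)
shared2["2020"].extend(more)
print(shared1["2020"])  # [1, 2, 3, 3], the value 3 was added twice

# With list(...): each dictionary gets its own independent list.
values = [1, 2]
copied1 = {"2020": list(values)}
copied2 = {"2020": list(values)}
copied1["2020"].extend(more)
copied2["2020"].extend(more)
print(copied1["2020"])  # [1, 2, 3]
```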
However, it is always good to know the Python standard library. This code could be made much more readable, and without the error, by using collections.defaultdict:
from collections import defaultdict
# This goes wherever the dictionaries
# were initially defined.
dict1 = defaultdict(list)
dict2 = defaultdict(list)
# You can get the value here, no need to search it later.
for k, value in y.items():
    if k in dateDict.keys():
        # No need to call this everywhere.
        new_key = yearDict[k]
        # Note the defaultdict magic.
        dict1[new_key].extend(value)
        dict2[new_key].extend(value)
# No need for the 'continue' at the end either.
When asked for a key that does not exist yet, the defaultdict will create a new value on the fly, so you don't have to care about initialization or about creating copies of your values.

Checking items in a list of dictionaries in python

I have a list of dictionaries:
a = [{"ID":1, "VALUE":2},{"ID":2, "VALUE":2},{"ID":3, "VALUE":4},...]
"ID" is a unique identifier for each dictionary. Considering the list is huge, what is the fastest way of checking if a dictionary with a certain "ID" is in the list, and if not append to it? And then update its "VALUE" ("VALUE" will be updated if the dict is already in list, otherwise a certain value will be written)
You'd not use a list. Use a dictionary instead, mapping ids to nested dictionaries:
a = {
1: {'VALUE': 2, 'foo': 'bar'},
42: {'VALUE': 45, 'spam': 'eggs'},
}
Note that you don't need to include the ID key in the nested dictionary; doing so would be redundant.
Now you can simply look up if a key exists:
if someid in a:
    a[someid]['VALUE'] = newvalue
I did make the assumption that your ID keys are not necessarily sequential numbers. I also made the assumption you need to store other information besides VALUE; otherwise just a flat dictionary mapping ID to VALUE values would suffice.
A dictionary lets you look up values by key in O(1) time (constant time independent of the size of the dictionary). Lists let you look up elements in constant time too, but only if you know the index.
If you don't and have to scan through the list, you have a O(N) operation, where N is the number of elements. You need to look at each and every dictionary in your list to see if it matches ID, and if ID is not present, that means you have to search from start to finish. A dictionary will still tell you in O(1) time that the key is not there.
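To cover the "append if missing, then update" part of the question with this layout, something along the following lines could work. It is a sketch only: `set_value` is a hypothetical helper name, and it assumes the nested dictionaries may carry other fields besides VALUE.

```python
a = {
    1: {"VALUE": 2},
    2: {"VALUE": 2},
    3: {"VALUE": 4},
}

def set_value(records, record_id, value):
    """Update VALUE for an existing ID, or create the record first."""
    # setdefault returns the existing nested dict, or inserts and returns a new one.
    records.setdefault(record_id, {})["VALUE"] = value

set_value(a, 2, 10)  # existing ID: updated in place
set_value(a, 7, 99)  # new ID: record created, then VALUE set
print(a)
```

Both calls are constant-time dictionary operations, so nothing scans the existing records.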
If you can, convert to a dictionary as the other answers suggest, but in case you have reason* not to change the data structure storing your items, here's what you can do:
items = [{"ID":1, "VALUE":2}, {"ID":2, "VALUE":2}, {"ID":3, "VALUE":4}]
def set_value_by_id(id, value):
    # Try to find the item, if it exists
    for item in items:
        if item["ID"] == id:
            break
    # Make and append the item if it doesn't exist
    else:  # Here, `else` means "if the loop terminated not via break"
        item = {"ID": id}
        items.append(item)
    # In either case, set the value
    item["VALUE"] = value
* Some valid reasons I can think of include preserving the order of items and allowing duplicate items with the same id. For ways to make dictionaries work with those requirements, you might want to take a look at OrderedDict and this answer about duplicate keys.
Converting your list into a dict makes checking for values much more efficient.
d = dict((item['ID'], item['VALUE']) for item in a)
for new_key, new_value in new_items:
    if new_key not in d:
        d[new_key] = new_value
If you also need to update the value when the key is found:
d = dict((item['ID'], item['VALUE']) for item in a)
for new_key, new_value in new_items:
    d.setdefault(new_key, 0)
    d[new_key] = new_value
Answering the question you asked, without changing the data structure: there is no really faster way of searching than looping over the list and checking every element, with a dictionary lookup for each one. But you can push the loop down into the Python runtime instead of using Python's for loop.
I haven't tried if it ends up faster though.
a = [{"ID":1, "VALUE":2},{"ID":2, "VALUE":2},{"ID":3, "VALUE":4}]
id = 2
# In Python 3, filter returns a lazy iterator, so wrap it in list()
tmp = list(filter(lambda d: d['ID']==id, a))
# tmp is now either an empty list, or a list of one item.
if not tmp:
    tmp = {"ID":id, "VALUE":"default"}
    a.append(tmp)
else:
    tmp = tmp[0]
# tmp is bound to the found/new dictionary

Python - Updating value in one dictionary is updating value in all dictionaries

I have a list of dictionaries called lod. All dictionaries have the same keys but different values. I am trying to update one specific value in the list of values for the same key in all the dictionaries.
I am attempting to do it with the following for loop:
for i in range(len(lod)):
    a=lod[i][key][:]
    a[p]=a[p]+lov[i]
    lod[i][key]=a
What's happening is that each dictionary is getting updated len(lod) times, so lod[0][key][p], which is supposed to have lov[0] added to it, is instead getting lov[0]+lov[1]+... added to it.
What am I doing wrong?
Here is how I declared the list of dicts:
lod = [{} for _ in range(len(dataul))]
for j in range(len(dataul)):
    for i in datakl:
        rrdict[str.split(i,',')[0]]=list(str.split(i,',')[1:len(str.split(i,','))])
    lod[j]=rrdict
The problem is in how you created the list of dictionaries. You probably did something like this:
list_of_dicts = [{}] * 20
That's actually the same dict 20 times. Try doing something like this:
list_of_dicts = [{} for _ in range(20)]
Without seeing how you actually created it, this is only an example solution to an example problem.
To know for sure, print this:
[id(x) for x in list_of_dicts]
If you defined it in the * 20 method, the id is the same for each dict. In the list comprehension method, the id is unique.
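A short sketch makes the difference visible. The names `shared` and `separate` are chosen for this illustration, and a length of 3 is used instead of 20 to keep the output short.

```python
shared = [{}] * 3                   # three references to one dict
separate = [{} for _ in range(3)]   # three distinct dicts

shared[0]["x"] = 1
separate[0]["x"] = 1

print(shared)    # [{'x': 1}, {'x': 1}, {'x': 1}]
print(separate)  # [{'x': 1}, {}, {}]

# Counting distinct identities confirms it.
shared_ids = len({id(d) for d in shared})      # 1
separate_ids = len({id(d) for d in separate})  # 3
```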
This is where the trouble starts: lod[j] = rrdict. lod itself is created properly with different dictionaries. Unfortunately, afterwards every reference to the original dictionaries in the list gets overwritten with a reference to rrdict. So in the end, the list contains only references to one single dictionary. Here is a more Pythonic and readable way to solve your problem:
lod = [{} for _ in range(len(dataul))]
for rrdict in lod:
    for line in datakl:
        splt = line.split(',')
        rrdict[splt[0]] = splt[1:]
You created the list of dictionaries correctly, as per the other answer.
However, when you update the individual dictionaries, you overwrite every entry of the list.
Removing noise from your code snippet:
lod = [{} for _ in range(whatever)]
for j in range(whatever):
    # rrdict = lod[j] # Uncomment this as a possible fix.
    for i in range(whatever):
        rrdict[somekey] = somevalue
    lod[j] = rrdict
The assignment on the last line throws away the empty dict that was in lod[j] and inserts a reference to the object represented by rrdict.
I'm not sure what your code does, but see the commented-out line: it might be the fix you are looking for.

Modifying a Python dict while iterating over it

Let's say we have a Python dictionary d, and we're iterating over it like so:
for k, v in d.iteritems():
    del d[f(k)] # remove some item
    d[g(k)] = v # add a new item
(f and g are just some black-box transformations.)
In other words, we try to add/remove items to d while iterating over it using iteritems.
Is this well defined? Could you provide some references to support your answer?
See also How to avoid "RuntimeError: dictionary changed size during iteration" error? for the separate question of how to avoid the problem.
Alex Martelli weighs in on this here.
It may not be safe to change the container (e.g. dict) while looping over the container.
So del d[f(k)] may not be safe. As you know, the workaround is to use d.copy().items() (to loop over an independent copy of the container) instead of d.iteritems() or d.items() (which use the same underlying container).
It is okay to modify the value at an existing key of the dict, but inserting values at new keys (e.g. d[g(k)] = v) may not work.
It is explicitly mentioned on the Python doc page (for Python 2.7) that
Using iteritems() while adding or deleting entries in the dictionary may raise a RuntimeError or fail to iterate over all entries.
Similarly for Python 3.
The same holds for iter(d), d.iterkeys() and d.itervalues(), and I'll go as far as saying that it does for for k, v in d.items(): (I can't remember exactly what for does, but I would not be surprised if the implementation called iter(d)).
You cannot do that, at least with d.iteritems(). I tried it, and Python fails with
RuntimeError: dictionary changed size during iteration
If you instead use d.items(), then it works.
In Python 3, d.items() is a view into the dictionary, like d.iteritems() in Python 2. To do this in Python 3, instead use d.copy().items(). This will similarly allow us to iterate over a copy of the dictionary in order to avoid modifying the data structure we are iterating over.
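For Python 3, the behaviour described above can be checked with a small sketch. The sample dictionary and the `+ 10` key transformation are made up for illustration, and `list(d.items())` is used here as a snapshot in the same spirit as `d.copy().items()`.

```python
d = {1: "a", 2: "b", 3: "c"}

# Iterating over the live view while deleting raises RuntimeError.
try:
    for k, v in d.items():
        del d[k]
    error = None
except RuntimeError as exc:
    error = str(exc)
print(error)  # dictionary changed size during iteration

# Iterating over a snapshot lets us delete and insert freely.
d = {1: "a", 2: "b", 3: "c"}
for k, v in list(d.items()):
    del d[k]
    d[k + 10] = v + "x"
print(d)  # {11: 'ax', 12: 'bx', 13: 'cx'}
```

The snapshot is taken once, before the loop starts, so keys added inside the loop are not visited.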
I have a large dictionary containing NumPy arrays, so the dict.copy().keys() approach suggested by @murgatroid99 was not feasible (though it worked). Instead, I just converted the keys view to a list and it worked fine (in Python 3.4):
for item in list(dict_d.keys()):
    temp = dict_d.pop(item)
    dict_d['some_key'] = 1 # Some value
I realize this doesn't dive into the philosophical realm of Python's inner workings like the answers above, but it does provide a practical solution to the stated problem.
The following code shows that this is not well defined:
def f(x):
    return x
def g(x):
    return x+1
def h(x):
    return x+10
try:
    d = {1:"a", 2:"b", 3:"c"}
    for k, v in d.iteritems():
        del d[f(k)]
        d[g(k)] = v+"x"
    print d
except Exception as e:
    print "Exception:", e
try:
    d = {1:"a", 2:"b", 3:"c"}
    for k, v in d.iteritems():
        del d[f(k)]
        d[h(k)] = v+"x"
    print d
except Exception as e:
    print "Exception:", e
The first example calls g(k), and throws an exception (dictionary changed size during iteration).
The second example calls h(k) and throws no exception, but outputs:
{21: 'axx', 22: 'bxx', 23: 'cxx'}
Which, looking at the code, seems wrong - I would have expected something like:
{11: 'ax', 12: 'bx', 13: 'cx'}
In Python 3 you should just build a new dictionary:
prefix = 'item_'
t = {'f1': 'ffw', 'f2': 'fca'}
t2 = dict()
for k, v in t.items():
    t2[k] = prefix + v
or use:
t2 = t.copy()
You should never modify the original dictionary; it leads to confusion as well as potential bugs or RuntimeErrors, unless you only add new key names to it.
This question asks about using an iterator (and funny enough, that Python 2 .iteritems iterator is no longer supported in Python 3) to delete or add items, and it must have a No as its only right answer as you can find it in the accepted answer. Yet: most of the searchers try to find a solution, they will not care how this is done technically, be it an iterator or a recursion, and there is a solution for the problem:
You cannot loop-change a dict without using an additional (recursive) function.
This question should therefore be linked to a question that has a working solution:
How can I remove a key:value pair wherever the chosen key occurs in a deeply nested dictionary? (= "delete")
Also helpful as it shows how to change the items of a dict on the run: How can I replace a key:value pair by its value wherever the chosen key occurs in a deeply nested dictionary? (= "replace").
With the same recursive approach, you will also be able to add items, as the question asks.
Since my request to link this question was declined, here is a copy of the solution that can delete items from a dict. See How can I remove a key:value pair wherever the chosen key occurs in a deeply nested dictionary? (= "delete") for examples / credits / notes.
import copy
def find_remove(this_dict, target_key, bln_overwrite_dict=False):
    if not bln_overwrite_dict:
        this_dict = copy.deepcopy(this_dict)
    for key in this_dict:
        # if the current value is a dict, dive into it
        if isinstance(this_dict[key], dict):
            if target_key in this_dict[key]:
                this_dict[key].pop(target_key)
            this_dict[key] = find_remove(this_dict[key], target_key)
    return this_dict
dict_nested_new = find_remove(nested_dict, "sub_key2a")
The trick
The trick is to find out in advance whether a target_key is among the next children (= this_dict[key] = the values of the current dict iteration) before you reach the child level recursively. Only then you can still delete a key:value pair of the child level while iterating over a dictionary. Once you have reached the same level as the key to be deleted and then try to delete it from there, you would get the error:
RuntimeError: dictionary changed size during iteration
The recursive solution makes any change only on the next values' sub-level and therefore avoids the error.
I got the same problem and I used following procedure to solve this issue.
A Python list can be iterated over even if you modify it during the iteration.
So the following code will print 1 forever, as long as the list is not empty to begin with:
for i in list:
    list.append(1)
    print(1)
So by using a list and a dict together you can solve this problem.
d_dict = {}  # the dictionary being processed
d_list = list(d_dict)  # its keys; new keys are appended during the loop
for k in d_list:
    v = d_dict[k]
    if v != -1:
        d_dict[f(k)] = -1  # rather than deleting it, mark it with -1 or another value to specify that it will not be considered further (deleted)
        d_dict[g(k)] = v  # add a new item
        d_list.append(g(k))
Today I had a similar use-case, but instead of simply materializing the keys on the dictionary at the beginning of the loop, I wanted changes to the dict to affect the iteration of the dict, which was an ordered dict.
I ended up building the following routine, which can also be found in jaraco.itertools:
def _mutable_iter(dict):
    """
    Iterate over items in the dict, yielding the first one, but allowing
    it to be mutated during the process.
    >>> d = dict(a=1)
    >>> it = _mutable_iter(d)
    >>> next(it)
    ('a', 1)
    >>> d
    {}
    >>> d.update(b=2)
    >>> list(it)
    [('b', 2)]
    """
    while dict:
        prev_key = next(iter(dict))
        yield prev_key, dict.pop(prev_key)
The docstring illustrates the usage. This function could be used in place of d.iteritems() above to have the desired effect.
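As a rough usage sketch in Python 3, the same idea can drive a loop whose body keeps adding work to the dictionary. The function is restated here as `mutable_iter` with a renamed parameter so the block is self-contained, and the `pending` data is invented for the example; note that, like the original, it empties the dictionary as it goes.

```python
def mutable_iter(mapping):
    """Yield (key, value) pairs, popping each one, until the mapping is empty."""
    while mapping:
        key = next(iter(mapping))  # a fresh iterator each time, so mutation is safe
        yield key, mapping.pop(key)

pending = {"a": 1}
seen = []
for key, value in mutable_iter(pending):
    seen.append(key)
    if value < 3:
        # Items added inside the loop are picked up by later iterations.
        pending[key + "a"] = value + 1

print(seen)     # ['a', 'aa', 'aaa']
print(pending)  # {}
```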
