Getting rid of unwanted keys in a dictionary - python

I have a dictionary dict1; each value is a list of strings. If all elements in this list of strings contain 'my_string', I don't need this particular key. I've come up with this:
from collections import defaultdict

dict2 = defaultdict(list)
for key, value in dict1.iteritems():
    for list_element in value:
        if 'my_string' not in list_element:
            dict2[key] = dict1[key]
It does work but I'm sure there is a better way of doing it. (And I'd prefer not to create another dictionary, which happens in the code above, but it's not really important.)

You can't add or delete keys in a dict while iterating over it. You either need to create a new dict by filtering the old one, or create a temporary object of some kind to iterate over:
(1) Create a new dict with the filtered results:
dict1 = {k: v for (k, v) in dict1.iteritems() if not all('my_string' in e for e in v)}
(2.1) Create a temporary dict:
for k, v in dict1.copy().items():
    if all('my_string' in e for e in v):
        del dict1[k]
(2.2) Create a temporary list of key-value tuples:
for k, v in dict1.items():
    if all('my_string' in e for e in v):
        del dict1[k]
(2.3) Create a temporary list of keys:
for k in dict1.keys():
    if all('my_string' in e for e in dict1[k]):
        del dict1[k]
So, how do you decide between them?
Well, 1 is easiest to reason about, because it has all of the benefits of mutation-free code. But 2.1-2.3 are probably more straightforward for a novice programmer. Usually, that distinction is the most important one.
If you're worried about memory usage, obviously 2.3 is better than 2.1-2.2, because it generates a much smaller temporary object. But what about 2.3 vs. 1? That depends on two things: First, how big is a list of all of your keys compared to a dict of just your remaining items? Second, how much space is gained by building a smaller hash table from scratch instead of shrinking a larger one? Usually, you don't get any benefit from the latter, because Python doesn't shrink the hash table at all… but if it matters, you need to test with your use cases on your platform, and see what happens.
If you're worried about performance, it's pretty similar to memory usage. 2.3 vs. 1 are the obvious contenders, and 1 will be better unless you're keeping most of the dict around—but again, if it matters, you need to measure for yourself.
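As a rough sketch of how you might measure it (the sample data here is made up, so substitute a dict that actually looks like yours):
import timeit

setup = "d = {i: ['my_string_%d' % i, 'other'] for i in range(10000)}"

# (1) rebuild a filtered dict
rebuild = "{k: v for k, v in d.items() if not all('my_string' in e for e in v)}"

# (2.3) delete in place, iterating over a list of keys (on a copy, so each run starts fresh)
delete = """
d2 = dict(d)
for k in list(d2.keys()):
    if all('my_string' in e for e in d2[k]):
        del d2[k]
"""

print(timeit.timeit(rebuild, setup=setup, number=100))
print(timeit.timeit(delete, setup=setup, number=100))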
Finally, note that the above is for Python 2.7, which (at a guess) is what you seem to be using. In 3.x, items and keys both return views of the existing dict rather than copies, so you need to write list(dict1.items()) and list(dict1.keys()) to make the copying explicit.
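For example, 2.3 in Python 3 becomes (a minimal sketch):
# Python 3: snapshot the keys first, then it is safe to delete while looping.
for k in list(dict1.keys()):
    if all('my_string' in e for e in dict1[k]):
        del dict1[k]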

for key, value in dict1.items():
    if all('my_string' in e for e in value):
        del dict1[key]
Note: be careful not to use iteritems while deleting from the same dict. In Python 2, items is fine, since it makes a copy.

I think you can just use a dictionary comprehension, if that's available in your version:
filtered = {k: v for k, v in d1.items() if not all('my_string' in e for e in v)}
This assumes you don't mind making a second dictionary that is a filtered copy of the first.

Related

Python: Create sorted list of keys moving one key to the head

Is there a more pythonic way of obtaining a sorted list of dictionary keys with one key moved to the head? So far I have this:
# create a unique list of keys headed by 'event' and followed by a sorted list.
# dfs is a dict of dataframes.
for k in dict.fromkeys(['event'] + sorted(dfs)):
    display(k, dfs[k])  # ideally this should be (k, v)
I suppose you would be able to do
for k, v in list(dfs.items()) + [('event', None)]:
.items() turns the dictionary into key-value tuples (technically a dict_items view, which is why I have to cast it to list explicitly before appending), to which you can append a second list. Iterating through a list of tuples allows automatic unpacking (so you can write for k, v in ... instead of for tup in ...).
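A sketch of that idea with 'event' actually placed at the head and the remaining keys sorted (this assumes dfs really has an 'event' key and that display is your notebook's display function):
pairs = [('event', dfs['event'])] + sorted(
    (k, v) for k, v in dfs.items() if k != 'event'
)
for k, v in pairs:
    display(k, v)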
What we really want is a lazy iterable, but that's not possible with sorted, because it must see all the keys before it knows what the first item should be.
Using dict.fromkeys to create a blank dictionary by insertion order was pretty clever, but it relies on dicts preserving insertion order, which was an implementation detail in CPython 3.6 and is only guaranteed by the language from Python 3.7 on. I admit, it took me a while to figure out that line.
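To unpack that line: in Python 3.7+, dict.fromkeys keeps the first occurrence of each key in insertion order, so prepending 'event' both moves it to the front and removes the duplicate. A small illustration with made-up keys:
>>> keys = ['apple', 'banana', 'event']
>>> list(dict.fromkeys(['event'] + sorted(keys)))
['event', 'apple', 'banana']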
Since the code you posted is just working with the keys, I suggest you focus on that. Taking up a few more lines for readability is a good thing, especially if we can hide it in a testable function:
def display_by_keys(dfs, priority_items=None):
    if not priority_items:
        priority_items = ['event']
    featured = {k for k in priority_items if k in dfs}
    others = {k for k in dfs.keys() if k not in featured}
    for key in list(featured) + sorted(others):
        display(key, dfs[key])
The potential downside is you must sort the keys every time. If you do this much more often than the data store changes, on a large data set, that's a potential concern.
Of course you wouldn't be displaying a really large result, but if it becomes a problem, then you'll want to store them in a collections.OrderedDict (https://stackoverflow.com/a/13062357/1766544) or find a sorteddict module.
from collections import OrderedDict

# sort once
ordered_dfs = OrderedDict.fromkeys(sorted(dfs.keys()))
ordered_dfs.move_to_end('event', last=False)
ordered_dfs.update(dfs)

# display as often as you need
for k, v in ordered_dfs.items():
    print(k, v)
If you display different fields first in different views, that's not a problem. Just sort all the fields normally, and use a function like the one above, without the sort.

Deleting items from a dictionary with a for loop [duplicate]

This question already has an answer here: deleting entries in a dictionary based on a condition (1 answer). Closed 8 years ago.
I'm trying to drop items from a dictionary if the value of the key is below a certain threshold. For a simple example to what I mean:
my_dict = {'blue': 1, 'red': 2, 'yellow': 3, 'green': 4}
for color in my_dict:
    threshold_value = 3
    if my_dict[color] < threshold_value:
        del my_dict[color]
print(my_dict)
Now, I get a RuntimeError: dictionary changed size during iteration error. No big surprises there. The reason I'm posting this question is:
Find out if there's an elegant solution that doesn't require creating a new dictionary (that holds only the keys with values >= threshold).
Try to understand Python's rationale here. The way I read it to myself is: "go to the first key. Is the value of that key < x? If yes, del this key:value item and continue on to the next key in the dictionary; if no, continue to the next key without doing anything." In other words, what happened historically to previous keys shouldn't affect where I go next. I'm looking forward to the next items, regardless of the past.
I know it's a funny question (some might say stupid, I'll give you that), but what's Python's "way of thinking" about this loop? Why doesn't it work? How would Python read it out loud to itself? Just trying to get a better understanding of the language...
Because Python dictionaries are implemented as hash tables, you shouldn't rely on them having any particular order (insertion order is only guaranteed from Python 3.7 on, and even then you can't change the dict's size mid-iteration). Key order may change unpredictably (but only after insertion or removal of a key). Thus, it's impossible to predict the next key. Python throws the RuntimeError to be safe, and to prevent people from running into unexpected results.
Python 2's dict.items method returns a copy of key-value pairs, so you can safely iterate over it and delete values you don't need by keys, as #wim suggested in comments. Example:
for k, v in my_dict.items():
    if v < threshold_value:
        del my_dict[k]
However, Python 3's dict.items returns a view object that reflects all changes made to the dictionary. This is the reason the solution above only works in Python 2. You may convert my_dict.items() to list (tuple etc.) to make it Python 3-compatible.
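For instance, a Python 3-safe version of the same loop (a minimal sketch) just snapshots the items first:
for k, v in list(my_dict.items()):   # the list() copy makes deletion safe in Python 3
    if v < threshold_value:
        del my_dict[k]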
Another way to approach the problem is to select the keys you want to delete and then delete them:
keys = [k for k, v in my_dict.items() if v < threshold_value]
for x in keys:
    del my_dict[x]
This works in both Python 2 and Python 3.
Dictionaries are unordered. After deleting one key, nobody can say what the next key will be, so in general Python disallows adding or removing keys from a dictionary that is being iterated over.
Just create a new one:
my_dict = {"blue":1,"red":2,"yellow":3,"green":4}
new_dict = {k:v for k,v in my_dict.iteritems() if v >= threshold_value}
I guess that modifying a collection while iterating over it is a hard thing to implement properly. Consider the following example:
>>> list = [1, 2, 3, 4, 5, 6]
>>> for ii in range(len(list)):
...     print list[ii]
...     if list[ii] == 3:
...         del list[ii]
...
1
2
3
5
6
Notice that in this example 4 was skipped altogether (and, in fact, the loop ends with an IndexError, because the range was computed from the list's original length). It is very similar with dictionaries: deleting or adding entries might invalidate the internal structures that define the iteration order (for example, deleting enough entries can change the hash table's bucket count).
To solve your case, just create a new dictionary and copy the items you want to keep into it.

Python: Tool to compare pairs of dicts of varying deepness?

I have a couple of pairs of rather big dicts. Within each pair, the structure of the two dicts is exactly the same, but the values differ. The pairs themselves differ in how nested they are.
To clarify:
dict_a has same structure as dict_b
dict_c has same structure as dict_d (but is different from dict_a and dict_b)
etc.
Is there a tool out there that makes it easy to implement a function to compare the values only, and/or do some basic arithmetic on them? My dicts can be quite nested, so a simple [for k,v in dict_x.iteritems()...] won't do.
Sounds like a problem for...recursive functions!
Basically, if I understand your question, you have a deep dictionary with varying levels of depths at unspecified keys. You'd like to compare the values of dict_a to dict_b but don't care much about the keys: just the differences in values. Here's an idea using a recursive function to print out each set of values that doesn't match.
def dict_compare(da, db):
    for k, v in da.iteritems():
        if isinstance(v, dict):        # if the value is another dict:
            dict_compare(v, db[k])     # enter into the comparison function again!
        else:
            if v != db[k]:
                print 'values not equal at', k
Then you just can call
dict_compare(dict_a, dict_b)
The magic being that if the value of a given key is in fact another dictionary, just call your comparison function again.
Obviously, if you wanted to do something more complicated than just print the simple key of the values that don't match, just modify what happens after the if v != db[k] line.
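If you also want to act on the differences (the "basic arithmetic" part of the question), the same recursion can collect them instead of printing. Here's a sketch in Python 3 syntax that assumes both dicts really have identical structure and numeric leaf values:
def dict_diff(da, db, path=()):
    """Return {path_tuple: (value_a, value_b, value_a - value_b)} for unequal leaves."""
    diffs = {}
    for k, v in da.items():
        if isinstance(v, dict):
            diffs.update(dict_diff(v, db[k], path + (k,)))
        elif v != db[k]:
            diffs[path + (k,)] = (v, db[k], v - db[k])
    return diffs

print(dict_diff({'x': {'y': 1, 'z': 2}}, {'x': {'y': 1, 'z': 5}}))
# {('x', 'z'): (2, 5, -3)}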

Remove a dictionary key that has a certain value [duplicate]

This question already has answers here: Removing entries from a dictionary based on values (4 answers). Closed 9 years ago.
I know dictionaries are not meant to be used this way, so there is no built-in function to help do this, but I need to delete every entry in my dictionary that has a specific value.
So if my dictionary looks like:
'NameofEntry1': '0'
'NameofEntry2': 'DNC'
...
I need to delete (probably pop) all the entries that have the value DNC; there are multiple in the dictionary.
Modifying the original dict:
for k, v in your_dict.items():
    if v == 'DNC':
        del your_dict[k]
or create a new dict using dict comprehension:
your_dict = {k:v for k,v in your_dict.items() if v != 'DNC'}
From the docs on iteritems(), iterkeys() and itervalues():
Using iteritems(), iterkeys() or itervalues() while adding or deleting entries in the dictionary may raise a RuntimeError or fail to iterate over all entries.
Same applies to the normal for key in dict: loop.
In Python 3 this is applicable to dict.keys(), dict.values() and dict.items().
You just need to make sure that you aren't modifying the dictionary while you are iterating over it else you would get RuntimeError: dictionary changed size during iteration.
So you need to iterate over a copy of the keys and values (for a dict d, that's d.items() in 2.x or list(d.items()) in 3.x):
>>> d = {'NameofEntry1': '0', 'NameofEntry2': 'DNC'}
>>> for k, v in d.items():
...     if v == 'DNC':
...         del d[k]
...
>>> d
{'NameofEntry1': '0'}
This should work:
for key, value in dic.items():
    if value == 'DNC':
        dic.pop(key)
If the restriction on modifying a dictionary while iterating over it is a problem, you could create a new dict-compatible class that keeps a reverse index of all keys holding a given value (updated on every create/update/delete of an item), so those keys can be deleted without iterating over the dict's items.
Subclass dict, and override __setitem__, __delitem__, pop, popitem and clear.
If this is an operation you're doing a lot of, that might be convenient and fast.
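A rough sketch of that idea (the class name and the delete_by_value method are made up here; values must be hashable, and a complete version would also need to handle update, setdefault and popitem):
from collections import defaultdict

class ReverseIndexDict(dict):
    """dict that also keeps a value -> set-of-keys index, so every key with a
    given value can be deleted without iterating over the items."""

    def __init__(self, *args, **kwargs):
        super(ReverseIndexDict, self).__init__(*args, **kwargs)
        self._index = defaultdict(set)
        for k, v in self.items():
            self._index[v].add(k)

    def __setitem__(self, key, value):
        if key in self:
            self._index[self[key]].discard(key)
        super(ReverseIndexDict, self).__setitem__(key, value)
        self._index[value].add(key)

    def __delitem__(self, key):
        self._index[self[key]].discard(key)
        super(ReverseIndexDict, self).__delitem__(key)

    def pop(self, key, *default):
        if key in self:
            self._index[self[key]].discard(key)
        return super(ReverseIndexDict, self).pop(key, *default)

    def clear(self):
        self._index.clear()
        super(ReverseIndexDict, self).clear()

    def delete_by_value(self, value):
        # copy the key set, because __delitem__ mutates the index as we go
        for key in list(self._index.get(value, ())):
            del self[key]

d = ReverseIndexDict({'NameofEntry1': '0', 'NameofEntry2': 'DNC', 'NameofEntry3': 'DNC'})
d.delete_by_value('DNC')
print(d)   # {'NameofEntry1': '0'}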

Python dict key delete if pattern match with other dict key

I want to delete a key from a Python dict if that key's pattern matches a key in another dict.
e.g.
a = {'a.b.c.test': 1, 'b.x.d.pqr': 2, 'c.e.f.dummy': 3, 'd.x.y.temp': 4}
b = {'a.b.c': 1, 'b.p.q': 20}
Result:
a = {'b.x.d.pqr': 2, 'c.e.f.dummy': 3, 'd.x.y.temp': 4}
If "pattern match with other dict key" means "starts with any key in the other dict", the most direct way to write that would be like this:
a = {k: v for (k, v) in a.items() if not any(k.startswith(k2) for k2 in b)}
If that's hard to follow at first glance, it's basically the equivalent of this:
def matches(key1, d2):
    for key2 in d2:
        if key1.startswith(key2):
            return True
    return False

c = {}
for key in a:
    if not matches(key, b):
        c[key] = a[key]
a = c
This is going to be slower than necessary. If a has N keys, and b has M keys, the time taken is O(NM). While you can check "does key k exist in dict b" in constant time, there's no way to check "does any key starting with k exist in dict b" without iterating over the whole dict. So, if b is potentially large, you probably want to search sorted(b.keys()) with a binary search, which will get the time down to O(N log M). But if this isn't a bottleneck, you may be better off sticking with the simple version, just because it's simple.
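A different way to beat the O(NM) scan, short of writing the binary search: since the keys in the example are dot-separated paths, you can test each dotted prefix of a key against b directly, which is a handful of constant-time lookups per key. This sketch assumes the "pattern match" really is a dotted-prefix match:
def has_prefix_in(key, prefix_dict):
    # check every dotted prefix of key against the other dict's keys
    parts = key.split('.')
    return any('.'.join(parts[:i]) in prefix_dict for i in range(1, len(parts) + 1))

a = {'a.b.c.test': 1, 'b.x.d.pqr': 2, 'c.e.f.dummy': 3, 'd.x.y.temp': 4}
b = {'a.b.c': 1, 'b.p.q': 20}
a = {k: v for k, v in a.items() if not has_prefix_in(k, b)}
print(a)   # {'b.x.d.pqr': 2, 'c.e.f.dummy': 3, 'd.x.y.temp': 4}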
Note that I'm generating a new a with the matches filtered out, rather than deleting the matches. This is almost always a better solution than deleting in-place, for multiple reasons:
* It's much easier to reason about. Treating objects as immutable and doing pure operations on them means you don't need to think about how states change over time. For example, the naive way to delete in place would run into the problem that you're changing the dictionary while iterating over it, which will raise an exception. Issues like that never come up without mutable operations.
* It's easier to read, and (once you get the hang of it) even to write.
* It's almost always faster. (One reason is that it takes a lot more memory allocations and deallocations to repeatedly modify a dictionary than to build one with a comprehension.)
The one tradeoff is memory usage. The delete-in-place implementation has to make a copy of all of the keys; the built-a-new-dict implementation has to have both the filtered dict and the original dict in memory. If you're keeping 99% of the values, and the values are much larger than the keys, this could hurt you. (On the other hand, if you're keeping 10% of the values, and the values are about the same size as the keys, you'll actually save space.) That's why it's "almost always" a better solution, rather than "always".
for key in list(a.keys()):
    if any(key.startswith(k) for k in b):
        del a[key]
Replace key.startswith(k) with an appropriate condition for "matching".
c = {}  # result in dict c
for key in b.keys():
    # the key from b should not be a substring of any of the keys in a
    if all([z.count(key) == 0 for z in a.keys()]):
        c[key] = b[key]
