The task I want to solve, if it is possible at all, is swapping the key/value pairs of a dictionary (in Python) in place, without additional data structures (only a constant number of extra variables). It seems rather impossible (in a finite world), but I'm open to suggestions.
I've seen a few posts about in-place dictionary inversion in Python, and I've found one thing that all of the solutions have in common.
The following dictionary won't be properly inverted:
dict = {'b':'a','a':'c','c':123}
The reason is that when we swap the first pair ('b': 'a'), writing the new entry 'a': 'b' overwrites the actual value of 'a'. (The values are unique and the keys are unique, but that doesn't mean a value can't be the same as an already existing key.)
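To make the failure concrete, here is a minimal sketch of the naive swap. The name naive_invert is hypothetical, and the key snapshot it takes already breaks the constant-extra-space rule; it is only meant to show the overwrite.

```python
def naive_invert(d):
    """Hypothetical naive attempt: move each pair to its swapped position."""
    for key in list(d):          # snapshot of the keys, for illustration only
        if key in d:
            value = d.pop(key)
            d[value] = key       # may clobber a pair that is not swapped yet
    return d


expected = {'a': 'b', 'c': 'a', 123: 'c'}
broken = naive_invert({'b': 'a', 'a': 'c', 'c': 123})
# The pair 'a': 'c' was overwritten before it could be swapped,
# so `broken` has lost an entry and differs from `expected`.
```

Three pairs go in and only two come out, which is exactly the collision described above.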
NOTES:
1) The dictionary given as an example has hashable values.
2) The key/values can be of any data-type. Not necessarily strings.
I'd love to hear ways to solve it. I've thought of one, but it only works if we have infinite memory, which obviously we don't.
EDIT:
1) My idea was to change the dictionary by adding a constant number of underscores ("_") to the beginning of each key. The number of underscores is determined by the existing keys: if the longest underscore prefix of any key is X underscores, I'll add X+1 underscores (max_underscores_of_key_in_prefix+1).
To work around objects in the keys, I'll make a wrapper class for that.
I have tried my best explaining my intuition, but I am not sure this is practical.
2) @Mark Ransom's solution works perfectly, but if anyone has another algorithmic solution to the problem, I'd still love to hear it!
I mark this question as solved because it is solved, but again, other solutions are more than welcome :-)
Obviously for this to be possible, both keys and values must be hashable. This means that none of your keys or values can be a list. We can take advantage of this to know which dictionary elements have been already processed.
Since you can't iterate and modify a dictionary at the same time, we must start over every time we swap a key/value. That makes this very slow, an O(n^2) operation.
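def invert_dict(d):
    # A list value marks an entry that already holds an inverted pair:
    #   [new_value]            -> fully processed
    #   [old_value, new_value] -> old_value is still waiting to be swapped
    done = False
    while not done:
        done = True
        for key, val in d.items():
            if isinstance(val, list):
                if len(val) > 1:
                    # Keep the inverted value, then process the pending one.
                    d[key] = [val[1]]
                    val = val[0]
            else:
                # Unprocessed pair: remove it before inserting its inverse.
                del d[key]
            if not isinstance(val, list):
                if val in d:
                    # val is also an unprocessed key; park its old value.
                    d[val] = [d[val], key]
                else:
                    d[val] = [key]
                # The dict changed, so restart the scan from the beginning.
                done = False
                break
    # Unwrap the one-element marker lists.
    for key, val in d.items():
        d[key] = val[0]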
Is there a more pythonic way of obtaining a sorted list of dictionary keys with one key moved to the head? So far I have this:
# create a unique list of keys headed by 'event' and followed by a sorted list.
# dfs is a dict of dataframes.
for k in dict.fromkeys(['event'] + sorted(dfs)):
    display(k, dfs[k])  # ideally this should be (k, v)
I suppose you would be able to do
for k, v in list(dfs.items()) + [('event', None)]:
.items() converts a dictionary to an iterable of (key, value) tuples (technically a dict_items view, which is why I have to cast it to a list explicitly before concatenating), to which you can append a second list. Iterating over a list of tuples allows automatic unpacking, so you can write "for k, v in pairs" instead of "for tup in pairs".
What we would really like is a lazy iterator, but sorted can't provide one, because it must see all the keys before it knows what the first item should be.
Using dict.fromkeys to create a blank dictionary in insertion order was pretty clever, but it relies on dicts preserving insertion order. That was an implementation detail of CPython 3.6 and is only guaranteed by the language from Python 3.7 onward (before that, dict was fundamentally unordered). I admit, it took me a while to figure out that line.
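As a small sketch of what that line does, assuming Python 3.7+ and a made-up dfs whose values stand in for the dataframes:

```python
# Hypothetical stand-in for the dict of dataframes.
dfs = {'zeta': 1, 'event': 2, 'alpha': 3}

# 'event' appears twice in the list; dict.fromkeys keeps only the first
# occurrence, so it stays at the head and the rest remain sorted.
candidates = ['event'] + sorted(dfs)
ordered_keys = list(dict.fromkeys(candidates))
print(ordered_keys)  # ['event', 'alpha', 'zeta']
```

The dictionary built by dict.fromkeys is only used as an ordered, duplicate-free container of keys; its values are all None.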
Since the code you posted is just working with the keys, I suggest you focus on that. Taking up a few more lines for readability is a good thing, especially if we can hide it in a testable function:
def display_by_keys(dfs, priority_items=None):
    if not priority_items:
        priority_items = ['event']
    # A list (not a set) keeps the priority keys in the order they were given.
    featured = [k for k in priority_items if k in dfs]
    others = [k for k in dfs if k not in featured]
    for key in featured + sorted(others):
        display(key, dfs[key])
The potential downside is you must sort the keys every time. If you do this much more often than the data store changes, on a large data set, that's a potential concern.
Of course you wouldn't be displaying a really large result, but if it becomes a problem, then you'll want to store them in a collections.OrderedDict (https://stackoverflow.com/a/13062357/1766544) or find a sorteddict module.
from collections import OrderedDict

# sort once
ordered_dfs = OrderedDict.fromkeys(sorted(dfs.keys()))
ordered_dfs.move_to_end('event', last=False)
ordered_dfs.update(dfs)

# display as often as you need
for k, v in ordered_dfs.items():
    print(k, v)
If you display different fields first in different views, that's not a problem. Just sort all the fields normally, and use a function like the one above, without the sort.
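One way to read "the function above, without the sort" is sketched below. The helper name keys_with_priority and the sample keys are assumptions for illustration; the keys are sorted once and each view only reorders them.

```python
def keys_with_priority(sorted_keys, priority_items):
    """Return the priority keys first, then the remaining keys in their existing order."""
    featured = [k for k in priority_items if k in sorted_keys]
    featured_set = set(featured)
    rest = [k for k in sorted_keys if k not in featured_set]
    return featured + rest


# Sort once (hypothetical keys standing in for the real data).
sorted_keys = sorted({'zeta': 1, 'event': 2, 'alpha': 3})

# Each view picks its own leading field without sorting again.
view_a = keys_with_priority(sorted_keys, ['event'])
view_b = keys_with_priority(sorted_keys, ['zeta', 'missing'])
```

Priority items that are not present (like 'missing' here) are simply skipped.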
Suppose I have a dictionary:
dictionary1 = {'test':'output','test2':'output2'}
How would I be able to print the key, test, on the screen?
I only want one of the keys at a time, not all of them.
By the way, I don't mean literally doing print('test'); I mean, how do you print a key of any dictionary?
Like is there something like this:
#pseudocode
x = dictionary1.keys()[0]
>>> print(x)
'test'
I do not want to sort the dictionary that I'm actually using in my program.
A dictionary may have any number of keys, 0+. To print them all (if any) in sorted order, you could
for key in sorted(dictionary1):
    print(key)
If you don't care about the order (i.e. you're fine with whatever order the dict iterates in: insertion order on Python 3.7+, arbitrary before that), remove the sorted call.
If you want to be selective (e.g. only print keys that are sequences of length 4, so that in your example test would be printed but test2 wouldn't):
for key in sorted(dictionary1):
    try:
        if len(key) == 4:
            print(key)
    except TypeError:
        # key has no length (e.g. an int), so it cannot match
        pass
If you want to print only one such key, add a break after the print.
There are at least 347 other things you could mean by your extremely vague question, but I tried to field the ones that seem most likely... if that's not enough for you, edit your question to make it much more precise!-)
Added: the edit only said the OP wants to print just one key and do no sorting, so it remains a mystery how that one key is to be picked. I suspect there's a misunderstanding, as if dict keys had a specific order (which plain dicts did not guarantee before Python 3.7) and the OP wants "the first one", which just can't be pinned down. OrderedDict is a very different (and, alas, inevitably slower) beast, but a plain dict is what the OP is showing.
So if the goal is to print "any one key, no matter which one" (and it is known that thedict is not empty):
print(next(iter(thedict)))
is the simplest, most direct way -- a dict is iterable, yielding its keys in arbitrary order; iter returns an iterator on its iterable argument; and next returns the first (first in arbitrary order means an arbitrary one of course) item of its iterator argument.
If it's possible for thedict to be empty, and nothing must be printed in that case, just add a guard:
if thedict: print(next(iter(thedict)))
This will print nothing for an empty dictionary, the only key for a dictionary with length 1, an arbitrary key for a dictionary with length greater than 1.
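The guard can also be folded into next itself, since next accepts a default that is returned when the iterator is empty. A minimal sketch follows; the helper name first_key is hypothetical, and "first" means insertion order only on Python 3.7+.

```python
def first_key(d, default=None):
    """Return one key of d (the first in iteration order), or default if d is empty."""
    return next(iter(d), default)


dictionary1 = {'test': 'output', 'test2': 'output2'}
some_key = first_key(dictionary1)   # 'test' on Python 3.7+
no_key = first_key({})              # None, because the dict is empty
```

Note that a default of None is ambiguous if None can itself be a key; pass a dedicated sentinel in that case.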
If you're trying to find a key associated with a specific value you can do
keys = [key for key in d if d[key] == value]
print(keys)
I'm trying to drop items from a dictionary if the value of the key is below a certain threshold. A simple example of what I mean:
my_dict = {'blue': 1, 'red': 2, 'yellow': 3, 'green': 4}
threshold_value = 3
for color in my_dict:
    if my_dict[color] < threshold_value:
        del my_dict[color]
print(my_dict)
Now, I get a RuntimeError: dictionary changed size during iteration error. No big surprises there. The reason I'm posting this question is:
1) Find out if there's an elegant solution that doesn't require creating a new dictionary (that holds only the keys with values >= threshold).
2) Try to understand Python's rationale here. The way I read it to myself is: "go to the first key. Is the value of that key < x? If yes, del this key:value item and continue on to the next key in the dictionary; if no, continue to the next key without doing anything". In other words, what happened historically to previous keys shouldn't affect where I go next. I'm looking forward to the next items, regardless of the past.
I know it's a funny question (some might say stupid, I'll give you that), but what's Python's "way of thinking" about this loop? Why doesn't it work? How would Python read it out loud to itself? Just trying to get a better understanding of the language...
Due to the fact that Python dictionaries are implemented as hash tables, you shouldn't rely on them having any sort of an order. Key order may change unpredictably (but only after insertion or removal of a key). Thus, it's impossible to predict the next key. Python throws the RuntimeError to be safe, and to prevent people from running into unexpected results.
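A minimal sketch of what happens in the question's loop, with the exception caught so the state can be inspected afterwards (the variable error_message is only for illustration):

```python
my_dict = {'blue': 1, 'red': 2, 'yellow': 3, 'green': 4}
threshold_value = 3
error_message = None

try:
    for color in my_dict:
        if my_dict[color] < threshold_value:
            # The deletion itself succeeds; the error is raised when the
            # loop asks the iterator for the next key.
            del my_dict[color]
except RuntimeError as exc:
    error_message = str(exc)

print(error_message)
print(my_dict)  # 'blue' is already gone, 'red' was never reached
```

So the loop does not fail at the del statement; it fails on the following step, when the iterator notices that the size of the dictionary has changed.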
Python 2's dict.items method returns a copy of the key-value pairs, so you can safely iterate over it and delete the values you don't need by key, as @wim suggested in the comments. Example:
for k, v in my_dict.items():
    if v < threshold_value:
        del my_dict[k]
However, Python 3's dict.items returns a view object that reflects all changes made to the dictionary. This is the reason the solution above only works in Python 2. You may convert my_dict.items() to a list (or tuple, etc.) to make it Python 3-compatible.
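A minimal sketch of the Python 3-compatible variant, using the dictionary and threshold from the question:

```python
my_dict = {'blue': 1, 'red': 2, 'yellow': 3, 'green': 4}
threshold_value = 3

# list(...) takes a snapshot of the pairs, so the loop iterates over the
# snapshot while the deletions happen on the real dictionary.
for k, v in list(my_dict.items()):
    if v < threshold_value:
        del my_dict[k]

print(my_dict)  # {'yellow': 3, 'green': 4}
```

The snapshot costs memory proportional to the number of entries, but no second dictionary is built.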
Another way to approach the problem is to select keys you want to delete and then delete them
keys = [k for k, v in my_dict.items() if v < threshold_value]
for x in keys:
    del my_dict[x]
This works in both Python 2 and Python 3.
Dictionaries are unordered, so after one key is deleted nobody can say what the next key is. That is why Python, in general, does not allow adding or removing keys while a dictionary is being iterated over.
Just create a new one:
my_dict = {"blue": 1, "red": 2, "yellow": 3, "green": 4}
threshold_value = 3
new_dict = {k: v for k, v in my_dict.items() if v >= threshold_value}
I guess that modifying a collection while iterating over it is hard to implement properly. Consider the following example:
>>> items = [1, 2, 3, 4, 5, 6]
>>> for item in items:
...     print(item)
...     if item == 3:
...         items.remove(item)
...
1
2
3
5
6
Notice that in this example 4 was skipped altogether. It is very similar with dictionaries: deleting or adding entries might invalidate the internal structures that define the order of iteration (for example, you deleted enough entries that the hash table was resized).
To solve your case, just create a new dictionary and copy the items you want to keep into it. As to
I have a couple of pairs of rather big dicts. The structure of the pair dicts is exactly the same but the values will differ. All pairs differ in how nested they are.
To clarify:
dict_a has same structure as dict_b
dict_c has same structure as dict_d (but is different from dict_a and dict_b)
etc.
Is there a tool out there that makes it easy to implement a function to compare the values only, and/or do some basic arithmetic on them? My dicts can be quite nested, so a simple [for k,v in dict_x.iteritems()...] won't do.
Sounds like a problem for...recursive functions!
Basically, if I understand your question, you have a deep dictionary with varying levels of depths at unspecified keys. You'd like to compare the values of dict_a to dict_b but don't care much about the keys: just the differences in values. Here's an idea using a recursive function to print out each set of values that doesn't match.
def dict_compare(da, db):
    for k, v in da.items():
        if isinstance(v, dict):  # if the value is another dict:
            dict_compare(v, db[k])  # enter into the comparison function again!
        else:
            if v != db[k]:
                print('values not equal at', k)
Then you can just call
dict_compare(dict_a, dict_b)
The magic being that if the value of a given key is in fact another dictionary, just call your comparison function again.
Obviously, if you wanted to do something more complicated than just print the simple key of the values that don't match, just modify what happens after the if v != db[k] line.
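As one possible variation, the sketch below collects the differences instead of printing them and records the full path of keys leading to each one. The name dict_diff and the sample dictionaries are hypothetical, and it assumes, as in the question, that both dicts have exactly the same structure.

```python
def dict_diff(da, db, path=()):
    """Return a list of (key_path, value_in_da, value_in_db) for every differing leaf."""
    diffs = []
    for k, v in da.items():
        if isinstance(v, dict):
            # Recurse into the nested dict, extending the path with this key.
            diffs.extend(dict_diff(v, db[k], path + (k,)))
        elif v != db[k]:
            diffs.append((path + (k,), v, db[k]))
    return diffs


dict_a = {'x': 1, 'y': {'z': 2, 'w': 3}}
dict_b = {'x': 1, 'y': {'z': 5, 'w': 3}}
differences = dict_diff(dict_a, dict_b)
print(differences)  # [(('y', 'z'), 2, 5)]
```

Returning the pairs of values makes it easy to do arithmetic on them afterwards, for example subtracting one from the other.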
Delete keys from a Python dict if they match a key pattern from another dict.
e.g.
a = {'a.b.c.test': 1, 'b.x.d.pqr': 2, 'c.e.f.dummy': 3, 'd.x.y.temp': 4}
b = {'a.b.c': 1, 'b.p.q': 20}
result:
a = {'b.x.d.pqr': 2, 'c.e.f.dummy': 3, 'd.x.y.temp': 4}
If "pattern match with other dict key" means "starts with any key in the other dict", the most direct way to write that would be like this:
a = {k: v for (k, v) in a.items() if not any(k.startswith(k2) for k2 in b)}
If that's hard to follow at first glance, it's basically the equivalent of this:
def matches(key1, d2):
    for key2 in d2:
        if key1.startswith(key2):
            return True
    return False

c = {}
for key in a:
    if not matches(key, b):
        c[key] = a[key]
a = c
This is going to be slower than necessary. If a has N keys and b has M keys, the time taken is O(NM). While you can check "does key k exist in dict b" in constant time, there's no built-in way to check "is any key in dict b a prefix of k" without iterating over the whole dict. So, if b is potentially large, you probably want to search sorted(b.keys()) and write a binary search, which will get the time down to O(N log M). But if this isn't a bottleneck, you may be better off sticking with the simple version, just because it's simple.
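This is not the binary search mentioned above but a different technique that also avoids scanning all of b: look up every prefix of each key of a directly in b. A rough sketch, where the helper name has_prefix_in is hypothetical; the cost per key depends on the key's length rather than on M.

```python
def has_prefix_in(key, prefixes):
    """Return True if any prefix of key (including key itself) is in prefixes."""
    # Each membership test is a hash lookup, so b is never scanned.
    return any(key[:i] in prefixes for i in range(len(key) + 1))


a = {'a.b.c.test': 1, 'b.x.d.pqr': 2, 'c.e.f.dummy': 3, 'd.x.y.temp': 4}
b = {'a.b.c': 1, 'b.p.q': 20}

filtered = {k: v for k, v in a.items() if not has_prefix_in(k, b)}
print(filtered)  # {'b.x.d.pqr': 2, 'c.e.f.dummy': 3, 'd.x.y.temp': 4}
```

This only pays off when b is large and the keys are short; for small dictionaries the startswith version is just as good and easier to read.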
Note that I'm generating a new a with the matches filtered out, rather than deleting the matches. This is almost always a better solution than deleting in-place, for multiple reasons:
* It's much easier to reason about. Treating objects as immutable and doing pure operations on them means you don't need to think about how states change over time. For example, the naive way to delete in place would run into the problem that you're changing the dictionary while iterating over it, which will raise an exception. Issues like that never come up without mutable operations.
* It's easier to read, and (once you get the hang of it) even to write.
* It's almost always faster. (One reason is that it takes a lot more memory allocations and deallocations to repeatedly modify a dictionary than to build one with a comprehension.)
The one tradeoff is memory usage. The delete-in-place implementation has to make a copy of all of the keys; the build-a-new-dict implementation has to have both the filtered dict and the original dict in memory. If you're keeping 99% of the values, and the values are much larger than the keys, this could hurt you. (On the other hand, if you're keeping 10% of the values, and the values are about the same size as the keys, you'll actually save space.) That's why it's "almost always" a better solution, rather than "always".
for key in list(a.keys()):
    if any(key.startswith(k) for k in b):
        del a[key]
Replace key.startswith(k) with an appropriate condition for "matching".
c = {}  # result in dict c
for key in b.keys():
    # the key in b should not be a substring of any of the keys in a
    if all([z.count(key) == 0 for z in a.keys()]):
        c[key] = b[key]