Search element in dictionary by multiple conditions without looping - python

I have a dictionary like one below (but with 10k key-value pairs):
test_dict={'2*foo*+':['5','10'],'3*bar*-':['15','20']}
Is there a way in Python to find an element for which key.split("*")[0] == "2", key.split("*")[2] == "+" and int(val[1]) < 15 without looping through the dictionary? It's easy to do with a for loop, but in my case this is part of a bigger piece of code that is nested inside another for loop, so it will take very long to finish.
Thanks,

As asked, the answer is no. There is no way to test the keys and values of a dictionary without looking at each one in turn until you find a match.
However, if you build a more complex datastructure (possibly consisting of a series of dicts) so that entries are also indexed by key.split("*")[0], then you would only have to loop over those elements.
(It does sound like you are trying to build an in-memory database though - you might well be better off just using a proper database, and relying on the caching to keep most of it in memory.)
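As a rough illustration of the database suggestion, here is a minimal sketch using the standard library's sqlite3 module with an in-memory database. The table name entries, the column names (num, word, sign, v0, v1) and the index name are assumptions made up for this example, not anything from the question.

```python
import sqlite3

test_dict = {'2*foo*+': ['5', '10'], '3*bar*-': ['15', '20']}

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE entries (num INTEGER, word TEXT, sign TEXT, v0 INTEGER, v1 INTEGER)')
# An index on the columns used for filtering lets SQLite avoid a full scan.
conn.execute('CREATE INDEX idx_num_sign ON entries (num, sign)')

# Split each key into its parts once, at load time.
rows = []
for key, (v0, v1) in test_dict.items():
    num, word, sign = key.split('*')
    rows.append((int(num), word, sign, int(v0), int(v1)))
conn.executemany('INSERT INTO entries VALUES (?, ?, ?, ?, ?)', rows)

# The three conditions from the question, expressed as one query.
matches = conn.execute(
    'SELECT word, v0, v1 FROM entries WHERE num = ? AND sign = ? AND v1 < ?',
    (2, '+', 15),
).fetchall()
print(matches)
```

Whether this beats plain dictionaries depends on the data size and on how often the query runs, so it is worth measuring before committing to it.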

You can use filter with a lambda (this still visits every item, but the loop runs inside filter):
test_dict = {'2*foo*+': ['5', '10'], '3*bar*-': ['15', '20']}
possibilities = list(filter(lambda x: int(x[0].split("*")[0]) == 2 and x[0].split("*")[2] == "+" and int(x[1][1]) < 15, test_dict.items()))

You note that "this is a part of a bigger code which is nested into another for loop" so I suggest that you build an index of key parts before your outer loop. Your indexes will contain sets of keys matching individual conditions. Because they contain sets you can find fast intersections to find keys that satisfy your key condition.
from collections import defaultdict
key_index_num = defaultdict(set)
key_index_word = defaultdict(set)
key_index_sign = defaultdict(set)
for key in test_dict:
    num, word, sign = key.split('*')
    key_index_num[num].add(key)
    key_index_word[word].add(key)
    key_index_sign[sign].add(key)
Then it will be easy to find keys in your inner loop. Let's say you want to find all keys that have num == '2' and sign == '+'. Find the keys by doing:
keys = key_index_num['2'].intersection(key_index_sign['+'])
Note: I have built three indexes, but if a value can never appear in more than one position of the key (for example, a word is never also used as a num or a sign), you can build a single key index. The code would then look like this:
from collections import defaultdict
key_index = defaultdict(set)
for key in test_dict:
    for key_part in key.split('*'):
        key_index[key_part].add(key)
And keys search would look like:
keys = key_index['2'].intersection(key_index['+'])
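The index only narrows down the keys; the condition on the value still has to be checked, but only for the few candidates that survive the intersection. A minimal sketch of the whole lookup follows, where find_matches is a hypothetical helper name and the third dictionary entry is added purely for illustration:

```python
from collections import defaultdict

test_dict = {'2*foo*+': ['5', '10'], '3*bar*-': ['15', '20'], '2*baz*+': ['7', '30']}

# Build the indexes once, before the outer loop.
key_index_num = defaultdict(set)
key_index_sign = defaultdict(set)
for key in test_dict:
    num, _word, sign = key.split('*')
    key_index_num[num].add(key)
    key_index_sign[sign].add(key)

def find_matches(num, sign, limit):
    """Hypothetical helper: keys with this num and sign whose second value is below limit."""
    # .get() avoids inserting empty sets into the defaultdicts for unknown parts.
    candidates = key_index_num.get(num, set()) & key_index_sign.get(sign, set())
    # Only the candidates are checked against the value condition.
    return sorted(k for k in candidates if int(test_dict[k][1]) < limit)

matches = find_matches('2', '+', 15)
print(matches)
```

The indexes must be rebuilt or updated whenever test_dict changes, so this pays off mainly when lookups are much more frequent than modifications.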

Related

How to remove random excess keys and values from dictionary in Python

I have a dictionary variable with several thousands of items. For the purpose of writing code and debugging, I want to temporarily reduce its size to more easily work with it (i.e. check contents by printing). I don't really care which items get removed for this purpose. I tried to keep only 10 first keys with this code:
i = 0
for x in dict1:
    if i >= 10:
        dict1.pop(x)
    i += 1
but I get the error:
RuntimeError: dictionary changed size during iteration
What is the best way to do it?
You could just rewrite the dictionary selecting a slice from its items.
dict(list(dict1.items())[:10])
Select some random keys to delete first, then iterate over that list and remove them. (This example removes 10 keys; to keep only 10, sample len(dict1) - 10 keys instead.)
import random
keys = random.sample(list(dict1.keys()), k=10)
for k in keys:
    dict1.pop(k)
You can convert the dictionary into a list of items, split, and convert back to a dictionary like this:
splitPosition = 10
subDict = dict(list(dict1.items())[:splitPosition])
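If the dictionary is large, building a full list of items just to slice it can be avoided with itertools.islice, which stops after the first few items. A small sketch, where dict1 is filled with stand-in data and small is a name chosen for this example:

```python
from itertools import islice

# Stand-in data; the real dict1 would come from your program.
dict1 = {f'key{i}': i for i in range(1000)}

# Take the first 10 items without materialising the whole items list.
small = dict(islice(dict1.items(), 10))
print(len(small))
```

This builds a new dictionary and leaves dict1 untouched, which also sidesteps the "changed size during iteration" error.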

Python: Create sorted list of keys moving one key to the head

Is there a more pythonic way of obtaining a sorted list of dictionary keys with one key moved to the head? So far I have this:
# create a unique list of keys headed by 'event' and followed by a sorted list.
# dfs is a dict of dataframes.
for k in (dict.fromkeys(['event']+sorted(dfs))):
    display(k,dfs[k]) # ideally this should be (k,v)
I suppose you would be able to do
for k, v in list(dfs.items()) + [('event', None)]:
.items() casts a dictionary to a list of tuples (or technically a dict_items, which is why I have to cast it to list explicitly to append), to which you can append a second list. Iterating through a list of tuples allows for automatic unpacking (so you can do k,v in list instead of tup in list)
What we really want is an iterable, but that's not possible with sorted, because it must see all the keys before it knows what the first item should be.
Using dict.fromkeys to create a blank dictionary by insertion order was pretty clever. Note that it relied on an implementation detail in CPython 3.6; insertion order has been a guaranteed part of the language only since Python 3.7, and on older versions dict is unordered. I admit, it took me a while to figure out that line.
Since the code you posted is just working with the keys, I suggest you focus on that. Taking up a few more lines for readability is a good thing, especially if we can hide it in a testable function:
def display_by_keys(dfs, priority_items=None):
    if not priority_items:
        priority_items = ['event']
    # Use a list so the priority keys keep the order in which they were given
    featured = [k for k in priority_items if k in dfs]
    others = {k for k in dfs if k not in featured}
    for key in featured + sorted(others):
        display(key, dfs[key])
The potential downside is you must sort the keys every time. If you do this much more often than the data store changes, on a large data set, that's a potential concern.
Of course you wouldn't be displaying a really large result, but if it becomes a problem, then you'll want to store them in a collections.OrderedDict (https://stackoverflow.com/a/13062357/1766544) or find a sorteddict module.
from collections import OrderedDict
# sort once
ordered_dfs = OrderedDict.fromkeys(sorted(dfs.keys()))
ordered_dfs.move_to_end('event', last=False)
ordered_dfs.update(dfs)
# display as often as you need
for k, v in ordered_dfs.items():
    print(k, v)
If you display different fields first in different views, that's not a problem. Just sort all the fields normally, and use a function like the one above, without the sort.
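One way to read that last suggestion is sketched below: the keys are sorted once and stored, and each view only decides which keys go first. The helper name ordered_keys and the sample dfs contents are assumptions for illustration.

```python
def ordered_keys(sorted_keys, priority_items=('event',)):
    """Hypothetical helper: yield priority keys first, then the rest, without re-sorting."""
    present = set(sorted_keys)
    featured = [k for k in priority_items if k in present]
    yield from featured
    skip = set(featured)
    # sorted_keys is already in order, so a single pass is enough.
    yield from (k for k in sorted_keys if k not in skip)

dfs = {'delta': 4, 'event': 1, 'alpha': 2, 'charlie': 3}  # stand-in for the dict of dataframes
sorted_keys = sorted(dfs)  # sort once, whenever the data changes

order = list(ordered_keys(sorted_keys))
print(order)
```

A different view can pass its own priority_items, for example ordered_keys(sorted_keys, ('delta', 'event')), and reuse the same sorted list.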

Find which dictionaries from a list contain word

I have a dictionary with each keys having multiple values in a list.
The tasks are:
To detect whether a given word is in the dictionary values
If it is true, then return the respective key from the dictionary
Task 1 is achieved by using an if condition:
if (word in dictionary[topics] for topics in dictionary.keys())
I want to get the topics when the if condition evaluates to be True. Something like
if (word in dictionary[topics] for topics in dictionary.keys()):
    print topics
You can use a list comprehension (which is like a compressed for loop). They are simpler to write and can in some circumstances be faster to compute:
topiclist = [topic for topic in dictionary if word in dictionary[topic]]
You don't need dictionary.keys() because a dict is already an iterable object; iterating over it will yield the keys anyway, and (in Python 2) in a more efficient way than dictionary.keys().
EDIT:
Here is another way to approach this (it avoids an extra dictionary look up):
topiclist = [topic for (topic, tlist) in dictionary.items() if word in tlist]
Avoiding the extra dictionary lookup may make it faster, although I haven't tested it.
In Python 2, for efficiency's sake, you may want to do:
topiclist = [topic for (topic, tlist) in dictionary.iteritems() if word in tlist]
if (word in dictionary[topics] for topics in dictionary.keys())
The problem with the above line is that you are creating a generator object that would yield a bool for each value of dictionary, indicating whether word is in it. Since a generator object is always truthy, this if statement will ALWAYS be true, regardless of whether the word is in the values or not. You can do two things:
Using any() will make your if statement work:
if any(word in dictionary[topics] for topics in dictionary.keys()):
However, this does not solve your initial problem of capturing the key. So instead:
Use an actual list comprehension that uses the predefined (I assume) variable word as a filter of sorts:
keys = [topics for topics in dictionary if word in dictionary[topics]]
or
Use filter():
keys = filter(lambda key: word in dictionary[key], dictionary)
These both do the same thing. Reminder that iterating through dictionary and dictionary.keys() are equivalent.
Just a note that both of these methods return a list of all the keys that have values containing word (in Python 3, filter() returns an iterator, so wrap it in list() if you need a list). Access each key with regular list indexing.
It sounds like the word you are searching for will be found in only one key. Correct?
If so, you can just iterate over the dictionary's key-value pairs until you find the key that contains the search word.
For Python 2:
found = False
for (topic, value) in dictionary.iteritems():
    if word in value:
        found = True
        print topic
        break
For Python 3, replace iteritems() with items() and print topic with print(topic).
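If many different words need to be looked up against the same dictionary, it may be worth inverting it once so that each lookup becomes a single dictionary access. A sketch under that assumption, with made-up sample data and the hypothetical name word_to_topics:

```python
from collections import defaultdict

# Made-up sample data: topic -> list of words.
dictionary = {
    'sports': ['ball', 'goal', 'team'],
    'music': ['band', 'song'],
    'work': ['team', 'office'],
}

# Invert once: word -> list of topics that contain it.
word_to_topics = defaultdict(list)
for topic, words in dictionary.items():
    for w in words:
        word_to_topics[w].append(topic)

# .get() returns an empty list for unknown words without adding them to the index.
topics = word_to_topics.get('team', [])
print(topics)
```

The inverted index has to be rebuilt if dictionary changes, so for a single lookup the list comprehension above is simpler.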

Is there a quick/optimal way to get a list of unique values for particular key?

I'd like to get all unique values in a collection for a particular key in a MongoDB. I can loop through the entire collection to get them:
values = []
for item in collection.find():
    if item['key'] in values:
        pass
    else:
        values.append(item['key'])
But this seems incredibly inefficient, since I have to check every entry, and loop through the list each time (which gets slow as the number of values gets high). Alternatively, I can put all the values in a list and then make a set (which I think is faster, though I haven't tried to figure out how to test speed yet):
values = []
for item in collection.find():
    values.append(item['key'])
unique_values = set(values)
Or with a list comprehension:
unique_values = set([item['key'] for item in collection.find()])
But I'm wondering if there's a built-in function that wouldn't require looping through the entire collection (like if these values are stored in hash tables or something), or if there's some better way to get this.
The distinct() method does this. It returns a list of the distinct values for the given key:
unique_values = collection.distinct("key")
MongoDB has a built-in method for this problem:
db.collection.distinct(FIELD)

Going through the last x elements in an ordered dictionary?

I want to go through an x number of the most recently added entries of an ordered dictionary. So far, the only way I can think of is this:
listLastKeys = orderedDict.keys()[-x:]
for key in listLastKeys:
    #do stuff with orderedDict[key]
But it feels redundant and somewhat wasteful to make another list and go through the ordered dictionary with that list when the ordered dictionary should already know what order it is in. Is there an alternative way? Thanks!
Iterate over the dict in reverse and apply an itertools.islice:
from itertools import islice
for key in islice(reversed(your_ordered_dict), 5):
    pass  # do something with last 5 keys
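The same idea also works on key-value pairs, which avoids a second lookup per key. A small sketch with stand-in data (the names ordered, x and last_items are chosen for this example); reversed() on OrderedDict.items() needs Python 3.5 or later:

```python
from collections import OrderedDict
from itertools import islice

ordered = OrderedDict((f'k{i}', i) for i in range(10))  # stand-in data
x = 3

# Walk backwards from the most recently added entry and stop after x items.
last_items = list(islice(reversed(ordered.items()), x))
print(last_items)
```

The items come out newest first; reverse last_items if the original insertion order matters.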
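Instead of slicing the list of keys, you can loop through the dictionary in reverse order using reversed() (for a plain dict this requires Python 3.8 or later; OrderedDict supports it in earlier versions). Example:
D = {0: 'h', 1: 'i', 2: 'j'}
x = 1
for key in reversed(D.keys()):
    if x == key:
        break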
You could keep a list of the keys present in the dictionary last run.
I don't know the exact semantics of your program, but this is a function that will check for new keys.
keys = []  # this list should be global
def checkNewKeys(myDict):
    for key, item in myDict.items():
        if key not in keys:
            # do what you want with new keys
            keys.append(key)
This basically keeps track of what was in the dictionary over the whole run of your program, without needing to create a new list every time.
