Find which dictionaries from a list contain word - python

I have a dictionary in which each key maps to a list of values.
The tasks are:
1. Detect whether a given word is in the dictionary's values.
2. If it is, return the respective key from the dictionary.
Task 1 is achieved by using an if condition:
if (word in dictionary[topics] for topics in dictionary.keys())
I want to get the topics for which the condition evaluates to True. Something like:
if (word in dictionary[topics] for topics in dictionary.keys()):
    print topics

You can use a list comprehension (which is like a compressed for loop). They are simpler to write and can in some circumstances be faster to compute:
topiclist = [topic for topic in dictionary if word in dictionary[topic]]
You don't need dictionary.keys() because a dict is already an iterable object; iterating over it will yield the keys anyway, and (in Python 2) in a more efficient way than dictionary.keys().
EDIT:
Here is another way to approach this (it avoids an extra dictionary look up):
topiclist = [topic for (topic, tlist) in dictionary.items() if word in tlist]
Avoiding the extra dictionary lookup may make it faster, although I haven't tested it.
In Python 2, for efficiency's sake, you may want to do:
topiclist = [topic for (topic, tlist) in dictionary.iteritems() if word in tlist]
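As a quick check, here is a minimal Python 3 sketch of the items()-based comprehension above. The sample dictionary and word are made up for illustration; they are not from the question.

```python
# Hypothetical sample data: topic -> list of words
dictionary = {
    "fruit": ["apple", "banana"],
    "color": ["orange", "red"],
    "citrus": ["orange", "lemon"],
}
word = "orange"

# Walk the (key, value) pairs and keep each key whose list contains the word
topiclist = [topic for topic, tlist in dictionary.items() if word in tlist]
print(topiclist)
```

Every matching topic is collected, so a word that appears under several keys yields several results.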

if (word in dictionary[topics] for topics in dictionary.keys())
The problem with the above line is that you are creating a generator object that, when iterated, checks whether word is in each value of dictionary and yields a bool for each. A generator object is always truthy, so this if statement will ALWAYS be true, regardless of whether the word is in the values or not. You can do two things:
Using any() will make your if statement work:
if any(word in dictionary[topics] for topics in dictionary.keys()):
However, this does not solve your initial problem of capturing the key. So instead:
Use an actual list comprehension that uses the predefined (I assume) variable word as a filter of sorts:
keys = [topics for topics in dictionary if word in dictionary[topics]]
or
use filter()
keys = filter(lambda key: word in dictionary[key], dictionary)
These both do the same thing. A reminder that iterating through dictionary and through dictionary.keys() is equivalent.
Just a note that both of these methods give you all the keys whose values contain word. In Python 2 both return a list; in Python 3, filter() returns a lazy iterator, so wrap it in list() if you need a list. Access each key with regular list indexing.
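The difference between testing the generator itself and testing it through any() can be seen in a small Python 3 sketch. The sample data and the names always_true and found are assumptions made for illustration.

```python
# Hypothetical sample data; the word is deliberately absent from every list
dictionary = {"fruit": ["apple", "banana"], "color": ["red", "blue"]}
word = "zebra"

# A generator object is truthy no matter what it would yield
gen = (word in dictionary[topic] for topic in dictionary)
always_true = bool(gen)

# any() actually consumes the generator and checks the yielded booleans
found = any(word in dictionary[topic] for topic in dictionary)

# In Python 3, filter() is lazy, so list() is needed to get a list of keys
keys = list(filter(lambda key: word in dictionary[key], dictionary))

print(always_true, found, keys)
```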

It sounds like the word you are searching for will be found in only one key. Correct?
If so, you can just iterate over the dictionary's key-value pairs until you find the key whose value contains the search word.
For Python 2:
found = False
for (topic, value) in dictionary.iteritems():
    if word in value:  # test the list of values, not the key itself
        found = True
        print topic
        break
For Python 3, replace iteritems() with items() and print topic with print(topic).
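If only the first match is wanted, the same early-exit loop can be written in Python 3 with next() over a generator expression. This is a sketch under the assumption that a missing word should give None; the sample data is hypothetical.

```python
# Hypothetical sample data
dictionary = {"fruit": ["apple", "banana"], "color": ["red", "blue"]}

def first_topic(word):
    # next() stops at the first key whose list contains the word;
    # the second argument is returned when nothing matches
    return next((t for t, words in dictionary.items() if word in words), None)

print(first_topic("red"))
print(first_topic("zebra"))
```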

Related

Using a for loop to print keys and/or values in a dictionary for python. Looking for logical thinking explanation thanks :D

My problem is understanding why these certain lines of code do what they do. Basically why it works logically. I am using PyCharm python 3 I think.
house_Number = {
    "Luca": 1, "David": 2, "Alex": 3, "Kaden": 4, "Kian": 5
}
for item in house_Number:
    print(house_Number[item])  # Why does this print the values tied with the key?
    print(item)  # Why does this print the key?
This is my first question, so sorry that I don't know how to format the code to make it look nice. My question is: when you use the for loop to print the dictionary's keys or values, why is the syntax for printing the key just print(item)? And what does print(house_Number[item]) even mean?
They both work to print the key or the value, but I really want to know a logical answer as to why it works this way. Thanks :D
I'm not working on any projects, just starting to learn from Codecademy.
In Python, iteration over a dictionary (for item in dict) is defined as iteration over that dictionary's keys. This is simply how the language was designed -- other languages and collection classes do it differently, iterating, for example, over key-value tuples, templated Pair<X,Y> objects, or what have you.
house_Number[item] accesses the value in house_Number referenced by the key item. [...] is the syntax for indexing in Python (and most other languages); an_array[2] gives the third element of an_array and house_Number[item] gives the value corresponding to the key item in the dictionary house_Number.
Just a side note: Python naming conventions would dictate house_number, not house_Number. Capital letters are generally only used in CamelCasedClassNames and CONSTANTS.
In Python, values inside a dictionary object are accessed using dictionary_name['KEY'].
In your case you are iterating over the keys of the dictionary.
Hope this helps:
for item in dic:
    print(item)  # key
    print(dic[item])  # value
Dictionaries are basically containers of items (keys) that are stored using hashing. These keys map to the values (dic[key]).
As with a set, if you traverse a dictionary using a for loop, you get its keys. (Set iteration order is arbitrary because the elements are hashed; since Python 3.7, dictionaries yield their keys in insertion order.) A dictionary is in effect a set of keys with a value associated with each one, so it makes sense that iterating over it yields the keys, as with a set.
Read more about dictionaries here https://docs.python.org/3/tutorial/datastructures.html#dictionaries and hopefully that will answer your question. Specifically, look at the .items() method of the dictionary object.
When you type for item in house_Number, you don’t specify whether item is the key or the value of house_Number. Python defines iteration over a dictionary as iteration over its keys, so item is a key.
So when you call print(house_Number[item]), you’re printing the value, because you’re taking the key and looking up its value. In other words, you’re taking each key once and finding its value; the values are 1, 2, 3, 4, 5.
The print(item) just prints the item, which is each key in turn: "Luca", "David", "Alex", "Kaden", "Kian".
Because print(house_Number[item]) and print(item) alternate, you get the values and keys alternating, each on a new line.
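The three ways of walking the dictionary discussed above (keys, values looked up by key, and the .items() method) can be compared side by side. A minimal Python 3 sketch, using the snake_case name house_number suggested in one of the answers:

```python
house_number = {"Luca": 1, "David": 2, "Alex": 3, "Kaden": 4, "Kian": 5}

# Iterating over a dict yields its keys
keys = [name for name in house_number]

# Indexing with a key returns the value stored under that key
values = [house_number[name] for name in house_number]

# .items() yields (key, value) tuples, so both can be unpacked at once
pairs = list(house_number.items())

for name, number in house_number.items():
    print(name, number)
```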

Fastest way to extract matching strings

I want to search for words that match a given word in a list (example below). However, say the list contains millions of words. What is the most efficient way to perform this search? I was thinking of tokenizing each list and putting the words in a hash table, then performing the word search / match and retrieving the list of words that contain this word. From what I can see, this will take O(n) operations. Is there any other way, maybe without using hash tables?
words_list = ['yek', 'lion', 'opt']
# e.g. if we were to search or match the word "key" with the words in the list, we should get the word "yek", or a list of words if there are many that match
Also, is there a python library or third party package that can perform efficient searches?
It's not entirely clear what you mean by "match" here, but if you can reduce that to an identity comparison, the problem reduces to a set lookup, which is O(1) time.
For example, if "match" means "has exactly the same set of characters":
words_set = {frozenset(word) for word in words_list}
Then, to look up a word:
frozenset(word) in words_set
Or, if it means "has exactly the same multiset of characters" (i.e., counting duplicates but ignoring order):
# sorted() returns a list, which is unhashable, so convert it to a tuple
words_set = {tuple(sorted(word)) for word in words_list}
tuple(sorted(word)) in words_set
… or, if you prefer:
# a Counter is unhashable too, so freeze its (character, count) pairs
words_set = {frozenset(collections.Counter(word).items()) for word in words_list}
frozenset(collections.Counter(word).items()) in words_set
Either way, the key (no pun intended… but maybe it should have been) idea here is to come up with a transformation that turns your values (strings) into hashable values that are equal iff they match (a frozenset of characters, a multiset of characters, a tuple of sorted characters, etc.). Then, the whole point of a set is that it can look for a value that's equal to your value in constant time.
Of course transforming the list takes O(N) time (unless you just build the transformed set in the first place, instead of building the list and then converting it), but you can use it over and over, and it takes O(1) time each time instead of O(N), which is what it sounds like you care about.
If you need to get back the matching word rather than just know that there is one, you can still do this with a set, but it's easier (if you can afford to waste a bit of space) with a dict:
words_dict = {frozenset(word): word for word in words_list}
words_dict[frozenset(word)] # KeyError if no match
If there could be multiple matches, just change the dict to a multidict:
import collections
words_dict = collections.defaultdict(set)
for word in words_list:
    words_dict[frozenset(word)].add(word)
words_dict[frozenset(word)] # empty set if no match
Or, if you explicitly want it to be a list rather than a set:
words_dict = collections.defaultdict(list)
for word in words_list:
    words_dict[frozenset(word)].append(word)
words_dict[frozenset(word)] # empty list if no match
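Putting the pieces together, here is a minimal runnable Python 3 sketch of the multidict approach, using the sorted-letters (multiset) interpretation of "match". The extra words in words_list and the helper names signature and find_matches are assumptions added for illustration.

```python
from collections import defaultdict

# The question's list, extended with two hypothetical anagrams
words_list = ["yek", "lion", "opt", "loin", "top"]

def signature(word):
    """Sorted letters joined into a string; anagrams share the same signature."""
    return "".join(sorted(word))

# Build the index once: O(N) over the word list
index = defaultdict(list)
for w in words_list:
    index[signature(w)].append(w)

def find_matches(word):
    # .get() avoids inserting an empty list into the index on a miss
    return index.get(signature(word), [])

print(find_matches("key"))
print(find_matches("pot"))
```

Each lookup costs one signature computation plus one hash lookup, independent of how many words are indexed.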
If you want to do it without using hash tables (why?), you can use a search tree or other logarithmic data structure:
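import blist # pip install blist to get it
words_dict = blist.sorteddict()
for word in words_list:
    # a sorted dict needs orderable keys, so use the sorted-letters string
    # (multiset matching) rather than a frozenset
    words_dict.setdefault(''.join(sorted(word)), set()).add(word)
words_dict[''.join(sorted(word))] # KeyError if no match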
This looks almost identical, except for the fact that it's not quite trivial to wrap defaultdict around a blist.sorteddict—but that just takes a few lines of code. (And maybe you actually want a KeyError rather than an empty set, so I figured it was worth showing both defaultdict and normal dict with setdefault somewhere, so you can choose.)
But under the covers, it's using a hybrid B-tree variant instead of a hash table. Although this is O(log N) time instead of O(1), in some cases it's actually faster than a dict.
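blist is a third-party package. If you would rather stay within the standard library, one alternative (a swapped-in technique, not what the answer above uses) is binary search over a sorted list with the bisect module, which also gives O(log N) lookups without a hash table. A rough sketch, with a hypothetical helper find_matches_sorted and a few made-up extra words:

```python
import bisect

words_list = ["yek", "lion", "opt", "loin", "top"]

# Sort (signature, word) pairs once: O(N log N)
pairs = sorted(("".join(sorted(w)), w) for w in words_list)

def find_matches_sorted(word):
    sig = "".join(sorted(word))
    # (sig,) sorts before every (sig, something), so bisect_left lands
    # on the first pair carrying this signature, if there is one
    i = bisect.bisect_left(pairs, (sig,))
    matches = []
    while i < len(pairs) and pairs[i][0] == sig:
        matches.append(pairs[i][1])
        i += 1
    return matches

print(find_matches_sorted("key"))
```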

Search element in dictionary by multiple conditions without looping

I have a dictionary like one below (but with 10k key-value pairs):
test_dict={'2*foo*+':['5','10'],'3*bar*-':['15','20']}
Is there a way in Python to find an element for which key.split("*")[0]==2, key.split("*")[2]=="+" and val[1]<15 without looping through the dictionary? It's easy to do with a for loop, but in my case this is part of a bigger piece of code that is nested inside another for loop, so it will take very long to finish.
Thanks,
As asked, the answer is no. There is no way to test the keys and values of a dictionary without looking at each one in turn until you find a match.
However, if you build a more complex datastructure (possibly consisting of a series of dicts) so that entries are also indexed by key.split("*")[0], then you would only have to loop over those elements.
(It does sound like you are trying to build an in-memory database though - you might well be better off just using a proper database, and relying on the caching to keep most of it in memory.)
You can use filter over the dictionary's items (note that filter still visits every item internally):
test_dict={'2*foo*+':['5','10'],'3*bar*-':['15','20']}
possibilities = list(filter(lambda x: int(x[0].split("*")[0]) == 2 and x[0].split("*")[2] == "+" and int(x[1][1]) < 15, test_dict.items()))
You note that "this is a part of a bigger code which is nested into another for loop" so I suggest that you build an index of key parts before your outer loop. Your indexes will contain sets of keys matching individual conditions. Because they contain sets you can find fast intersections to find keys that satisfy your key condition.
from collections import defaultdict
key_index_num = defaultdict(set)
key_index_word = defaultdict(set)
key_index_sign = defaultdict(set)
for key in test_dict:
    num, word, sign = key.split('*')
    key_index_num[num].add(key)
    key_index_word[word].add(key)
    key_index_sign[sign].add(key)
Then it will be easy to find keys in your inner loop. Let's say you want to find all keys that have num == '2' and sign == '+'. Find the keys by doing:
keys = key_index_num['2'].intersection(key_index_sign['+'])
Note: I have built three indexes, but if a value can never appear in more than one position of the key (for example, a num can never equal a word or a sign), you can build a single key index. The code would then look like this:
from collections import defaultdict
key_index = defaultdict(set)
for key in test_dict:
    for key_part in key.split('*'):
        key_index[key_part].add(key)
And keys search would look like:
keys = key_index['2'].intersection(key_index['+'])
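The index handles the key conditions; the value condition from the question (val[1] < 15) can then be checked only on the few candidate keys. A minimal Python 3 sketch, assuming the values are numeric strings as in the question; the third entry in test_dict and the names by_num and by_sign are hypothetical.

```python
from collections import defaultdict

# The question's data plus one hypothetical entry that fails the value test
test_dict = {"2*foo*+": ["5", "10"], "3*bar*-": ["15", "20"], "2*baz*+": ["1", "30"]}

# Build the indexes once, before the outer loop
by_num = defaultdict(set)
by_sign = defaultdict(set)
for key in test_dict:
    num, _word, sign = key.split("*")
    by_num[num].add(key)
    by_sign[sign].add(key)

# Inside the loop: intersect the candidate sets, then test values for those keys only
candidates = by_num["2"] & by_sign["+"]
matches = sorted(k for k in candidates if int(test_dict[k][1]) < 15)
print(matches)
```

The remaining loop runs over the intersection only, not over all 10k entries.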

What is the difference between the solution that uses defaultdict and the one that uses setdefault?

In Think Python the author introduces defaultdict. The following is an excerpt from the book regarding defaultdict:
If you are making a dictionary of lists, you can often write simpler
code using defaultdict. In my solution to Exercise 12-2, which you can
get from http://thinkpython2.com/code/anagram_sets.py, I make a
dictionary that maps from a sorted string of letters to the list of
words that can be spelled with those letters. For example, 'opst' maps
to the list ['opts', 'post', 'pots', 'spot', 'stop', 'tops']. Here’s
the original code:
def all_anagrams(filename):
    d = {}
    for line in open(filename):
        word = line.strip().lower()
        t = signature(word)
        if t not in d:
            d[t] = [word]
        else:
            d[t].append(word)
    return d
This can be simplified using setdefault, which you might have used in Exercise 11-2:
def all_anagrams(filename):
    d = {}
    for line in open(filename):
        word = line.strip().lower()
        t = signature(word)
        d.setdefault(t, []).append(word)
    return d
This solution has the drawback that it makes a new list every time, regardless of whether it is needed. For lists, that’s no big deal, but if the factory function is complicated, it might be. We can avoid this problem and simplify
the code using a defaultdict:
def all_anagrams(filename):
    d = defaultdict(list)
    for line in open(filename):
        word = line.strip().lower()
        t = signature(word)
        d[t].append(word)
    return d
Here's the definition of signature function:
def signature(s):
    """Returns the signature of this string.
    Signature is a string that contains all of the letters in order.
    s: string
    """
    # TODO: rewrite using sorted()
    t = list(s)
    t.sort()
    t = ''.join(t)
    return t
What I understand regarding the second solution is that setdefault checks whether t (the signature of the word) exists as a key, if not, it sets it as a key and sets an empty list as its value, then append appends the word to it. If t exists, setdefault returns its value (a list with at least one item, which is a string representing a word), and append appends the word to this list.
What I understand regarding the third solution is that d, which represents a defaultdict, makes t a key and sets an empty list as its value (if t doesn't already exist as a key), then the word is appended to the list. If t does already exist, its value (the list) is returned, and to which the word is appended.
What is the difference between the second and third solutions? What does it mean that the code in the second solution makes a new list every time, regardless of whether it's needed? How is setdefault responsible for that? How does using defaultdict avoid this problem?
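The "makes a new list every time" means that every time setdefault(t, []) is called, a new empty list (the [] argument) is created to be the default value just in case it's needed, because Python evaluates the arguments before making the call. With a defaultdict, the list factory is called only when the key is missing, so no throwaway list is created.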
Although both solutions return a dictionary, the one using defaultdict is actually returning a defaultdict(list) which is a subclass of the built-in dict class. This normally is not a problem. The most notable effect will likely be if you print() the returned object, as the output from the two looks quite different.
If you don't want that for whatever reason, you can change the last statement of the function to:
return dict(d)
to convert the defaultdict(list) created into a regular dict.
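To make the difference observable, the sketch below replaces the plain [] and list with a hypothetical counting factory, make_list, and records how often each approach builds a new list. The word list is made up for illustration.

```python
from collections import defaultdict

calls = {"setdefault": 0, "defaultdict": 0}

def make_list(label):
    # Hypothetical factory that counts how often a new list is created
    calls[label] += 1
    return []

words = ["stop", "pots", "tops", "lion"]  # three anagrams plus one other word

d1 = {}
for w in words:
    # make_list(...) is evaluated before setdefault runs, on every iteration
    d1.setdefault("".join(sorted(w)), make_list("setdefault")).append(w)

d2 = defaultdict(lambda: make_list("defaultdict"))
for w in words:
    # the factory is called only when the key is missing
    d2["".join(sorted(w))].append(w)

print(calls)
print(d1 == dict(d2))
```

Both dictionaries end up with the same contents; only the number of lists created along the way differs.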

How do you print a key of a dictionary in Python?

Suppose I have a dictionary:
dictionary1 = {'test':'output','test2':'output2'}
How would I be able to print the key, test, on the screen?
I only want one of the keys at a time, not all of them.
By the way, not literally, by doing print('test'), I mean how do you print the key of any dictionary?
Like is there something like this:
#pseudocode
x = dictionary1.keys()[0]
>>> print(x)
'test'
I do not want to sort the dictionary that I'm actually using in my program.
A dictionary may have any number of keys, 0+. To print them all (if any) in sorted order, you could do:
for key in sorted(dictionary1):
    print(key)
If you don't care about the order (i.e., you're fine with whatever order the dict iterates in), remove the sorted call.
If you want to be selective (e.g., only print keys that are sequences of length 4, so that in your example test would be printed but test2 wouldn't):
for key in sorted(dictionary1):
    try:
        if len(key) == 4:
            print(key)
    except TypeError:
        pass
If you want to print only one such key, add a break after the print.
There are at least 347 other things you could mean by your extremely vague question, but I tried to field the ones that seem most likely... if that's not enough for you, edit your question to make it much more precise!-)
Added: the edit only said the OP wants to print only one key and do no sorting, so it remains a mystery how that one key is to be picked. I suspect there's a misunderstanding, as if dict keys had a specific order and the OP wants "the first one". In Python versions before 3.7 that just can't be pinned down, because plain dicts made no ordering guarantee (OrderedDict is a very different [and alas inevitably slower] beast, but a plain dict is what the OP's showing). Since Python 3.7, dicts preserve insertion order, so "the first one" is the first key inserted.
So if the goal is to print "any one key, no matter which one" (and it is known that thedict is not empty):
print(next(iter(thedict)))
is the simplest, most direct way: a dict is iterable, yielding its keys (in insertion order since Python 3.7, in arbitrary order before that); iter returns an iterator on its iterable argument; and next returns the first item of its iterator argument.
If it's possible for thedict to be empty, and nothing must be printed in that case, just add a guard:
if thedict: print(next(iter(thedict)))
This will print nothing for an empty dictionary, the only key for a dictionary with length 1, an arbitrary key for a dictionary with length greater than 1.
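An alternative to the guard is to give next() a default value, which it returns instead of raising StopIteration when the iterator is empty. A minimal Python 3 sketch; the variable names are assumptions.

```python
dictionary1 = {"test": "output", "test2": "output2"}

# On Python 3.7+ dicts keep insertion order, so this is the first key inserted
first_key = next(iter(dictionary1), None)

# With an empty dict, the default is returned and nothing is raised
empty_key = next(iter({}), None)

if first_key is not None:
    print(first_key)
```

Using None as the default assumes None is never a real key in the dictionary.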
If you're trying to find a key associated with a specific value you can do
keys = [key for key in d if d[key] == value]
print(keys)
