Python: searching through set of objects

I have a set of objects W that have the attributes name and a score. The __hash__() function is based upon the name only, and the __eq__() function is not defined, so it is based upon the __hash__() function.
Now, I want to use the score of the object. Is there a quicker way to reference to an instance than the following script? Given the way a set works, there must be...
tmp_obj = W(name="myname", score=0)
for obj in w_set:
    if obj == tmp_obj:
        break
else:
    pass  # do nothing with obj
# do something with obj.score

You can use the in operator to check for set membership. This is a constant-time operation (on average) in sets and dictionaries, since they are implemented as hash tables. For lists and tuples, in takes linear time.
obj = W("myname", 0)
if obj in w_set:
    pass  # do something with obj
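Note that a membership test only tells you whether an equal object is in the set; it does not hand back the stored instance, so its score stays out of reach. One possible workaround, sketched below, is to keep a dict keyed by name alongside (or instead of) the set. The W class here is a hypothetical stand-in for the asker's class, and it assumes __eq__ is defined on the name as well, since hash-based lookup needs both __hash__ and __eq__.

```python
class W:
    """Hypothetical stand-in for the asker's class: hashed and compared by name."""
    def __init__(self, name, score):
        self.name = name
        self.score = score

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, other):
        return isinstance(other, W) and self.name == other.name


w_set = {W("myname", 7), W("other", 3)}

# Build the lookup table once: name -> stored instance.
by_name = {obj.name: obj for obj in w_set}

found = by_name.get("myname")  # None if there is no such name
if found is not None:
    print(found.score)  # 7
```

Building by_name costs one linear pass; every later lookup by name is then a single hash lookup.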

You don't say how you set up your object, but why not just use if obj.score == 0?
for obj in w_set:
    if obj.score == 0:
        break
Or perhaps your question is about avoiding the linear search?
If you have a lot of objects and you'll be doing a lot of searches by score, you need to build an index mapping scores to objects. Presumably several objects could have the same score, so we'll build a list for each score (a set would also work):
from collections import defaultdict
score_index = defaultdict(list)
for obj in w_set:
    score_index[obj.score].append(obj)
You can now loop over the list of all objects with score zero without searching:
for obj in score_index[0]:
    pass  # Do something

Related

How to check if a tuple key is in a dict with O(1) time?

I'm trying to implement a hash table/hash map in Python.
Say I'm using tuples as keys like this:
hashTable = {}
node = [1, 2, 3]
print(hashTable[tuple(node)]) # raises KeyError
hashTable[tuple(node)] = True
print(hashTable[tuple(node)]) # prints True
I want to check if elements exist in the hashTable before adding them. I have tried initializing the dictionary with all False values.
hashTable = {}
for i in range(1000):
    hashTable[i] = False
So this creates a hash table of size 1000 with every slot set to False. But if I try to check if a non-existent element is in the hashTable:
print(hashTable[tuple(node)])
I get the same error as before.
How does one go about doing this? I think this would work iterating through the dict with in but doesn't that defeat the whole purpose of using a hash table in the first place?

Accessing a key is similar to, but not necessarily the same as checking if it exists. To check if a key is in a dictionary, use dict.__contains__ via the in operator. To check if it is missing, use the not in operator:
key = tuple(node)
if key not in hashTable:
    hashTable[key] = value
That being said, a totally valid way to check for containment can be by attempting access:
key = tuple(node)
try:
    hashTable[key]  # attempt to use hashTable[key]
except KeyError:
    pass  # Do something with missing key
The advantage of doing it this way when both paths are needed is that you only need to access the dictionary once rather than twice.
Try to avoid calling tuple(node) over and over: it's not free. If you can generate node as a tuple, do so. If not, perform the conversion once and reuse the converted value.

You can use the in operator to determine membership in a dictionary:
e.g.
if tuple(node) in hashTable:
    x = hashTable[tuple(node)]
    ...

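You can try to get the key and, in case it is not in the dictionary yet, return a default value such as None. Note that the key must be the tuple (a list is unhashable) and that dict.get takes its default as a positional argument:
x = hashTable.get(tuple(node), None)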

How add() on set can work in such dictionary? [duplicate]

The addition of collections.defaultdict in Python 2.5 greatly reduced the need for dict's setdefault method. This question is for our collective education:
What is setdefault still useful for, today in Python 2.6/2.7?
What popular use cases of setdefault were superseded with collections.defaultdict?

You could say defaultdict is useful for setting defaults before filling the dict and setdefault is useful for setting defaults while or after filling the dict.
Probably the most common use case: Grouping items (in unsorted data, else use itertools.groupby)
# really verbose
new = {}
for (key, value) in data:
    if key in new:
        new[key].append( value )
    else:
        new[key] = [value]
# easy with setdefault
new = {}
for (key, value) in data:
    group = new.setdefault(key, []) # key might exist already
    group.append( value )
# even simpler with defaultdict
from collections import defaultdict
new = defaultdict(list)
for (key, value) in data:
    new[key].append( value ) # all keys have a default already
Sometimes you want to make sure that specific keys exist after creating a dict. defaultdict doesn't work in this case, because it only creates keys on explicit access. Suppose you are handling something HTTP-ish with many headers, some of which are optional, and you want defaults for them:
headers = parse_headers( msg ) # parse the message, get a dict
# now add all the optional headers
for headername, defaultvalue in optional_headers:
    headers.setdefault( headername, defaultvalue )

I commonly use setdefault for keyword argument dicts, such as in this function:
def notify(self, level, *pargs, **kwargs):
    kwargs.setdefault("persist", level >= DANGER)
    self.__defcon.set(level, **kwargs)
    try:
        kwargs.setdefault("name", self.client.player_entity().name)
    except pytibia.PlayerEntityNotFound:
        pass
    return _notify(level, *pargs, **kwargs)
It's great for tweaking arguments in wrappers around functions that take keyword arguments.

defaultdict is great when the default value is static, like a new list, but not so much if it's dynamic.
For example, I need a dictionary to map strings to unique ints. defaultdict(int) will always use 0 for the default value. Likewise, defaultdict(intGen()) always produces 1.
Instead, I used a regular dict:
nextID = intGen()
myDict = {}
for lots of complicated stuff:
    # stuff that generates unpredictable, possibly already seen str
    strID = myDict.setdefault(myStr, nextID())
Note that dict.get(key, nextID()) is insufficient because I need to be able to refer to these values later as well.
intGen is a tiny class I built that automatically increments an int and returns its value:
class intGen:
    def __init__(self):
        self.i = 0
    def __call__(self):
        self.i += 1
        return self.i
If someone has a way to do this with defaultdict I'd love to see it.
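One possible way, offered as a sketch rather than a definitive answer: defaultdict calls its factory once for each missing key, so a stateful zero-argument callable returns a different value each time. Below, itertools.count(1).__next__ plays that role (an instance of the intGen class above would serve equally well as the factory). The input words are illustrative, and starting the IDs at 1 is an assumption made to match intGen.

```python
from collections import defaultdict
from itertools import count

# Assumption: IDs start at 1, matching the intGen class above.
next_id = count(1).__next__   # zero-argument callable, new int on every call
ids = defaultdict(next_id)

for word in ["spam", "eggs", "spam", "ham"]:   # illustrative input
    ids[word]   # the first lookup of a word assigns it the next ID

print(dict(ids))  # {'spam': 1, 'eggs': 2, 'ham': 3}
```

The trade-off is that any stray lookup of a missing key also consumes an ID and inserts an entry, so reads that should not allocate need ids.get(word) or a membership test instead.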

As most answers state, setdefault or defaultdict would let you set a default value when a key doesn't exist. However, I would like to point out a small caveat with regard to the use cases of setdefault. When the Python interpreter executes setdefault, it will always evaluate the second argument to the function even if the key exists in the dictionary. For example:
In: d = {1:5, 2:6}
In: d
Out: {1: 5, 2: 6}
In: d.setdefault(2, 0)
Out: 6
In: d.setdefault(2, print('test'))
test
Out: 6
As you can see, print was also executed even though 2 already existed in the dictionary. This becomes particularly important if you are planning to use setdefault for example for an optimization like memoization. If you add a recursive function call as the second argument to setdefault, you wouldn't get any performance out of it as Python would always be calling the function recursively.
Since memoization was mentioned, a better alternative is the functools.lru_cache decorator if you are considering enhancing a function with memoization. lru_cache handles the caching requirements for a recursive function better.
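As a minimal sketch of the lru_cache approach, here is a recursive Fibonacci function with a call counter. The function and the counter are illustrative names only; the counter is there just to show that each distinct argument is computed once.

```python
from functools import lru_cache

calls = 0  # counts how many times the function body actually runs


@lru_cache(maxsize=None)
def fib(n):
    global calls
    calls += 1
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(fib(30), calls)  # 832040 31
```

Unlike memo.setdefault(n, fib(n)), the recursive call here only happens on a cache miss, because the decorator checks the cache before the function body runs.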

I use setdefault() when I want a default value in an OrderedDict. There isn't a standard Python collection that does both, but there are ways to implement such a collection.

As Muhammad said, there are situations in which you only sometimes wish to set a default value. A great example of this is a data structure which is first populated, then queried.
Consider a trie. When adding a word, if a subnode is needed but not present, it must be created to extend the trie. When querying for the presence of a word, a missing subnode indicates that the word is not present and it should not be created.
A defaultdict cannot do this. Instead, a regular dict with the get and setdefault methods must be used.
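A rough sketch of such a trie built from plain dicts follows. The END marker and the function names are choices made for this illustration, not part of any library.

```python
END = "$"  # hypothetical end-of-word marker, chosen for this sketch


def add_word(trie, word):
    node = trie
    for ch in word:
        node = node.setdefault(ch, {})  # create the subnode only if missing
    node[END] = True


def has_word(trie, word):
    node = trie
    for ch in word:
        node = node.get(ch)  # a query never creates anything
        if node is None:
            return False
    return END in node


trie = {}
add_word(trie, "cat")
add_word(trie, "car")
print(has_word(trie, "cat"), has_word(trie, "ca"), has_word(trie, "dog"))
```

Querying for "dog" leaves the trie untouched, which is exactly what a defaultdict-based node would not guarantee.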

Theoretically speaking, setdefault would still be handy if you sometimes want to set a default and sometimes not. In real life, I haven't come across such a use case.
However, an interesting use case comes up from the standard library (Python 2.6, _threadinglocal.py):
>>> mydata = local()
>>> mydata.__dict__
{'number': 42}
>>> mydata.__dict__.setdefault('widgets', [])
[]
>>> mydata.widgets
[]
I would say that using __dict__.setdefault is a pretty useful case.
Edit: As it happens, this is the only example in the standard library and it is in a comment. So maybe it is not enough of a case to justify the existence of setdefault. Still, here is an explanation:
Objects store their attributes in the __dict__ attribute. As it happens, the __dict__ attribute is writeable at any time after the object's creation. It is also a dictionary, not a defaultdict. It is not sensible for objects in the general case to have __dict__ as a defaultdict, because that would give each object every legal identifier as an attribute. So I can't foresee any change to Python objects getting rid of __dict__.setdefault, apart from deleting it altogether if it was deemed not useful.

I rewrote the accepted answer to make it easier for newcomers.
# break it down and understand it intuitively.
new = {}
for (key, value) in data:
    if key not in new:
        new[key] = [] # this is the core of setdefault; equivalent to new.setdefault(key, [])
        new[key].append(value)
    else:
        new[key].append(value)
# easy with setdefault
new = {}
for (key, value) in data:
    group = new.setdefault(key, []) # does new[key] = [] only if key is missing
    group.append(value)
# even simpler with defaultdict
from collections import defaultdict
new = defaultdict(list)
for (key, value) in data:
    new[key].append(value) # all keys have a default value of empty list []
Additionally, I categorized the dict methods for reference:
dict_methods_11 = {
    'views':['keys', 'values', 'items'],
    'add':['update','setdefault'],
    'remove':['pop', 'popitem','clear'],
    'retrieve':['get',],
    'copy':['copy','fromkeys'],}

One drawback of defaultdict compared with dict (dict.setdefault) is that a defaultdict creates a new item every time a non-existing key is looked up with d[key], even if you only meant to read, compare, or print the value. Also, the defaultdict class is generally far less common than the dict class, and in my experience it is more difficult to serialize.
P.S. IMO functions|methods not meant to mutate an object should not mutate an object.
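A small demonstration of that side effect, with made-up keys: subscripting a defaultdict with a missing key inserts it, while a membership test or .get() does not.

```python
from collections import defaultdict

dd = defaultdict(list)
dd["seen"].append(1)

_ = dd["typo"]        # a mere read inserts 'typo': []
print(sorted(dd))     # ['seen', 'typo']

"other" in dd         # membership test: no insertion
dd.get("other")       # .get(): no insertion
print(sorted(dd))     # ['seen', 'typo']
```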

Here are some examples of setdefault to show its usefulness:
"""
d = {}
# To add a key->value pair, do the following:
d.setdefault(key, []).append(value)
# To retrieve a list of the values for a key
list_of_values = d[key]
# To remove a key->value pair is still easy, if
# you don't mind leaving empty lists behind when
# the last value for a given key is removed:
d[key].remove(value)
# Despite the empty lists, it's still possible to
# test for the existence of values easily:
if key in d and d[key]:
    pass # d has some values for key
# Note: Each value can exist multiple times!
"""
e = {}
print(e)
e.setdefault('Cars', []).append('Toyota')
print(e)
e.setdefault('Motorcycles', []).append('Yamaha')
print(e)
e.setdefault('Airplanes', []).append('Boeing')
print(e)
e.setdefault('Cars', []).append('Honda')
print(e)
e.setdefault('Cars', []).append('BMW')
print(e)
e.setdefault('Cars', []).append('Toyota')
print(e)
# NOTE: now e['Cars'] == ['Toyota', 'Honda', 'BMW', 'Toyota']
e['Cars'].remove('Toyota')
print(e)
# NOTE: it's still true that ('Toyota' in e['Cars'])

I use setdefault frequently when, get this, setting a default (!!!) in a dictionary; somewhat commonly the os.environ dictionary:
# Set the venv dir if it isn't already overridden:
os.environ.setdefault('VENV_DIR', '/my/default/path')
Less succinctly, this looks like this:
# Set the venv dir if it isn't already overridden:
if 'VENV_DIR' not in os.environ:
    os.environ['VENV_DIR'] = '/my/default/path'
It's worth noting that you can also use the resulting variable:
venv_dir = os.environ.setdefault('VENV_DIR', '/my/default/path')
But that's less necessary than it was before defaultdicts existed.

Another use case that I don't think was mentioned above.
Sometimes you keep a cache dict of objects by their id, where the primary instance lives in the cache, and you want to populate the cache on a miss.
return self.objects_by_id.setdefault(obj.id, obj)
That's useful when you always want to keep a single instance per distinct id no matter how you obtain an obj each time. For example when object attributes get updated in memory and saving to storage is deferred.
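To make the one-liner concrete, here is a small sketch. Registry, Item, and canonical are hypothetical names invented for this example; only objects_by_id comes from the snippet above.

```python
class Registry:
    """Hypothetical holder for the cache; names are illustrative only."""
    def __init__(self):
        self.objects_by_id = {}

    def canonical(self, obj):
        # Return the cached instance for obj.id, caching obj on first sight.
        return self.objects_by_id.setdefault(obj.id, obj)


class Item:
    def __init__(self, id, label):
        self.id = id
        self.label = label


reg = Registry()
first = reg.canonical(Item(1, "loaded from storage"))
again = reg.canonical(Item(1, "re-fetched copy"))
print(again is first)  # True
```

The second Item is discarded in favor of the instance already in the cache, so in-memory updates made to the first one are not lost.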

One very important use-case I just stumbled across: dict.setdefault() is great for multi-threaded code when you only want a single canonical object (as opposed to multiple objects that happen to be equal).
For example, the (Int)Flag Enum in Python 3.6.0 has a bug: if multiple threads are competing for a composite (Int)Flag member, there may end up being more than one:
from enum import IntFlag, auto
import threading
class TestFlag(IntFlag):
    one = auto()
    two = auto()
    three = auto()
    four = auto()
    five = auto()
    six = auto()
    seven = auto()
    eight = auto()
    def __eq__(self, other):
        return self is other
    def __hash__(self):
        return hash(self.value)
seen = set()
class cycle_enum(threading.Thread):
    def run(self):
        for i in range(256):
            seen.add(TestFlag(i))
threads = []
for i in range(8):
    threads.append(cycle_enum())
for t in threads:
    t.start()
for t in threads:
    t.join()
len(seen)
# 272 (should be 256)
The solution is to use setdefault() as the last step of saving the computed composite member -- if another has already been saved then it is used instead of the new one, guaranteeing unique Enum members.

In addition to what has been suggested, setdefault might be useful in situations where you don't want to modify a value that has already been set. For example, when you have duplicate numbers and you want to treat them as one group. In this case, if you encounter a repeated key which has already been set, you won't update the value of that key. You will keep the first encountered value, as if you were visiting each repeated key only once.
Here's a code example of recording the index for the keys/elements of a sorted list:
nums = [2,2,2,2,2]
d = {}
for idx, num in enumerate(sorted(nums)):
    # Plain assignment would keep overwriting, leaving the index of the last repeated key:
    # d[num] = idx # Result (sorted_indices): [4, 4, 4, 4, 4]
    # With setdefault, repeated keys do not update the entry;
    # only the first encountered key's index is stored.
    d.setdefault(num,idx) # Result (sorted_indices): [0, 0, 0, 0, 0]
sorted_indices = [d[i] for i in nums]

[Edit] Very wrong! The setdefault would always trigger long_computation, Python being eager.
Expanding on Tuttle's answer. For me the best use case is cache mechanism. Instead of:
if x not in memo:
    memo[x]=long_computation(x)
return memo[x]
which consumes 3 lines and 2 or 3 lookups, I would happily write:
return memo.setdefault(x, long_computation(x))

I like the answer given here:
http://stupidpythonideas.blogspot.com/2013/08/defaultdict-vs-setdefault.html
In short, the decision (in non-performance-critical apps) should be made on the basis of how you want to handle lookup of empty keys downstream (viz. KeyError versus default value).

A different use case for setdefault() is when you don't want to overwrite the value of an already set key. Plain item assignment overwrites (on a defaultdict just as on a dict), while setdefault() does not. For nested dictionaries it is more often the case that you want to set a default only if the key is not set yet, because you don't want to remove the present sub dictionary. This is when you use setdefault().
Example with plain assignment on a defaultdict:
>>> from collections import defaultdict
>>> foo = defaultdict()
>>> foo['a'] = 4
>>> foo['a'] = 2
>>> print(foo)
defaultdict(None, {'a': 2})
setdefault doesn't overwrite:
>>> bar = dict()
>>> bar.setdefault('a', 4)
4
>>> bar.setdefault('a', 2)
4
>>> print(bar)
{'a': 4}

Another usecase for setdefault in CPython is that it is atomic in all cases, whereas defaultdict will not be atomic if you use a default value created from a lambda.
cache = {}
def get_user_roles(user_id):
    if user_id in cache:
        return cache[user_id]['roles']
    cache.setdefault(user_id, {'lock': threading.Lock()})
    with cache[user_id]['lock']:
        roles = query_roles_from_database(user_id)
        cache[user_id]['roles'] = roles
    return roles
If two threads execute cache.setdefault at the same time, only one of them will be able to create the default value.
If instead you used a defaultdict:
cache = defaultdict(lambda: {'lock': threading.Lock()})
This would result in a race condition. In my example above, the first thread could create a default lock, and the second thread could create another default lock, and then each thread could lock its own default lock, instead of the desired outcome of each thread attempting to lock a single lock.
Conceptually, setdefault basically behaves like this (defaultdict also behaves like this if you use an empty list, empty dict, int, or other default value that is not user python code like a lambda):
gil = threading.Lock()
def setdefault(dict, key, value_func):
    with gil:
        if key in dict:
            return dict[key]
        value = value_func()
        dict[key] = value
        return value
Conceptually, defaultdict basically behaves like this (only when using python code like a lambda - this is not true if you use an empty list):
gil = threading.Lock()
def __getitem__(dict, key, value_func):
    with gil:
        if key in dict:
            return dict[key]
    value = value_func()  # runs outside the lock, so another thread can get here too
    with gil:
        dict[key] = value
    return value

Multi-level defaultdict with variable depth and with list and int type

I am trying to create a multi-level dict with variable depth and with list and int type.
Data structure is like below
A
--B1
-----C1=1
-----C2=[1]
--B2=[3]
D
--E
----F
------G=4
In the data structure above, the last value can be an int or a list.
If the data structure held only ints, this could be achieved easily with the code below:
from collections import defaultdict
f = lambda: defaultdict(f)
d = f()
d['A']['B1']['C1'] = 1
But as the last value has both list and int, it becomes a bit problematic for me.
Now we can insert data in a list using two ways.
d['A']['B1']['C2']= [1]
d['A']['B1']['C2'].append([2])
But when I use only the append method, it raises an error.
The error is:
AttributeError: 'collections.defaultdict' object has no attribute 'append'
So is there any way to use only the append method for a list?

There's no way you can use your current defaultdict-based structure to make d['A']['B1']['C2'].append(1) work properly if the 'C2' key doesn't already exist, since the data structure can't tell that the unknown key should correspond to a list rather than another layer of dictionary. It doesn't know what method you're going to call on the value it returns, so it can't know it shouldn't return a dictionary (like it did when it first looked up 'A' and 'B1').
This isn't an issue for bare integers, since for those you're assigning directly to a new key (and all the earlier levels are dictionaries). When you're assigning, the data structure isn't creating the value, you are, so you can use any type you want.
Now, if your keys are distinctive in some way, so that given a key like 'C2' you can know for sure that it should correspond to a list, you may have a chance. You can write your own dict subclass, defining a __missing__ method to handle lookups of keys that don't exist yet in your own special way:
class Tree(dict):
    def __missing__(self, key):
        if key_corresponds_to_list(key): # magic from somewhere
            result = self[key] = []
        else:
            result = self[key] = Tree()
        return result
    # you might also want a custom __repr__
Here's an example run with a magic key function that makes any even-length key default to a list, while an odd-length key defaults to a dict:
> def key_corresponds_to_list(key):
      return len(key) % 2 == 0
> t = Tree()
> t["A"]["B"]["C2"].append(1) # the default value for C2 is a list because its length is even
> t
{'A': {'B': {'C2': [1]}}}
> t["A"]["B"]["C10"]["D"] = 2 # C10 is another layer of dict, since its length is odd
> t
{'A': {'B': {'C10': {'D': 2}, 'C2': [1]}}} # it didn't matter what length D was though
You probably won't actually want to use a global function to control the class like this, I just did that as an example. If you go with this approach, I'd suggest putting the logic directly into the __missing__ method (or maybe passing a function as a parameter, like defaultdict does with its factory function).
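As a sketch of the "pass a function as a parameter" variant, the predicate can be stored on each node and handed down to the sub-trees it creates. The is_list_key parameter name and the rule used here (keys ending in "2" hold lists) are assumptions made purely for illustration.

```python
class Tree(dict):
    """Nested dict whose missing keys default to a list or a sub-Tree."""
    def __init__(self, is_list_key):
        super().__init__()
        # is_list_key: hypothetical predicate, key -> bool
        self.is_list_key = is_list_key

    def __missing__(self, key):
        if self.is_list_key(key):
            result = self[key] = []
        else:
            result = self[key] = Tree(self.is_list_key)
        return result


# Illustrative rule only: keys ending in "2" hold lists.
t = Tree(lambda key: key.endswith("2"))
t["A"]["B1"]["C2"].append(1)
t["A"]["B1"]["C1"] = 1
t["A"]["B2"].append(3)
print(t)
```

Each instance carries its own rule, so two trees with different conventions can coexist without sharing a global function.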

Python creating dictionary key from a list of items

I wish to use a Python dictionary to keep track of some running tasks. Each of these tasks has a number of attributes which makes it unique, so I'd like to use a function of these attributes to generate the dictionary keys, so that I can find them in the dictionary again by using the same attributes; something like the following:
class Task(object):
    def __init__(self, a, b):
        pass
# Init task dictionary
d = {}
# Define some attributes
attrib_a = 1
attrib_b = 10
# Create a task with these attributes
t = Task(attrib_a, attrib_b)
# Store the task in the dictionary, using a function of the attributes as a key
d[[attrib_a, attrib_b]] = t
Obviously this doesn't work (the list is mutable, and so can't be used as a key ("unhashable type: list")) - so what's the canonical way of generating a unique key from several known attributes?

Use a tuple in place of the list. Tuples are immutable and can be used as dictionary keys:
d[(attrib_a, attrib_b)] = t
The parentheses can be omitted:
d[attrib_a, attrib_b] = t
However, some people seem to dislike this syntax.

Use a tuple
d[(attrib_a, attrib_b)] = t
That should work fine

Updating values in dictionary object

My program has a class Words where a defaultdict(int) named t_e_f is created as an object and a function main() that contains a pointer to a function that uses the values of the dictionary 't_e_f' to compute other calculations. 't_e_f' is a dictionary having as key a tuple of words and as value a float number.
My program looks like this:
class Words:
    def __init__(self):
        self.t_e_f=Words.set_t_e_f(self)
    def set_t_e_f(self):
        raw_text_e=open_file('toyen')
        raw_text_f=open_file('toyde')
        tokens_e=raw_text_e.split()
        tokens_f=raw_text_f.split()+['NULL']
        tef_dict=collections.defaultdict(int)
        for word_e in tokens_e_set:
            for word_f in tokens_f_set:
                tef_dict[(word_e,word_f)]=1/len(tokens_e_set)
        return tef_dict
    def get_t_e_f(self):
        return self.t_e_f
def main():
    words=Words()
    t_e_f=words.get_t_e_f()
    s_total_e=normalization(t_e_f)
I then have a normalization function that takes t_e_f and uses it to compute calculations over the values of another dictionary created in the normalization function, s_total_e.
def normalization(t_e_f):
    s_total_e=collections.defaultdict(int)
    words_sent_e=['the','big','book']
    words_sent_de=['das','grosse','buch']
    for item in words_sent_e:
        s_total_e[item]
    for item in words_sent_e:
        for item_2 in words_sent_de:
            s_total_e[item]+=t_e_f[(item,item_2)]
The problem is that when t_e_f is passed to normalization all the values are set to 0, therefore losing the initial values set when the words object was created. I was wondering what was happening and how to solve this problem.
Thank you.

The tef_dict variable isn't being saved to the instance and is not being returned. Add a line to set_t_e_f():
return tef_dict
Also note that defaultdict will automatically add a zero entry even if you only lookup or inspect a missing key.
You may be better off using collections.Counter() instead. Unlike defaultdict, it will return zeros for missing keys but won't add them to the underlying dictionary.
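A short comparison of the two, using made-up word pairs and a made-up value, shows the difference on a lookup of a missing key:

```python
from collections import Counter, defaultdict

pairs = Counter({("the", "das"): 0.5})
dd = defaultdict(int, {("the", "das"): 0.5})

missing = ("book", "haus")  # illustrative key absent from both

print(pairs[missing], len(pairs))  # 0 1  (nothing was added)
print(dd[missing], len(dd))        # 0 2  (the lookup inserted the key)
```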
