Python heapq module, heapify method on an object

Since I'm trying to be efficient in this program I'm making, I thought I'd use the built-in heapq module in Python, but some of my objects have multiple attributes, like name and number. Is there a way to use the heapify method to heapify my objects based on a certain attribute? I don't see anything in the documentation.

Right after I posted, I figured you could build a list keyed by the needed attribute before calling heapify, which takes O(n) linear time and wouldn't affect the runtime of heapify or the other heapq methods.
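For example, a minimal sketch of that idea (the objects name and its number attribute are assumptions for illustration):

import heapq

# Pair each object with its sort key up front -- O(n)
pairs = [(obj.number, obj) for obj in objects]
heapq.heapify(pairs)   # also O(n)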

@vsekhar and all the others wondering about the accepted answer.
Assumption:
class SomeObject():
    def __init__(self, name, number):
        self.name = name
        self.number = number

a_list = []
obj_1 = SomeObject("tim", 12)
obj_2 = SomeObject("tom", 13)
Now, instead of creating a heap with the objects only as elements:
heapq.heappush(a_list, obj_1)
heapq.heappush(a_list, obj_2)
you actually want to create the heap with a two-value tuple as each heap element. The idea is to have the attribute you want to sort by as the first value of the tuple and the object (as before) as the second value:
# Sort by 'number'.
heapq.heappush(a_list, (obj_1.number, obj_1))
heapq.heappush(a_list, (obj_2.number, obj_2))
The heap considers this first value of the tuple as the value to sort by.
In case the element pushed to the heap is not a simple data type like int or str, the underlying implementation needs to know how to compare elements.
If the element is a tuple, comparison starts with its first item, so the first item effectively acts as the sort key (later items are only compared to break ties).
Have a look at the examples here: https://docs.python.org/3/library/heapq.html#basic-examples (search for tuple)
Heap elements can be tuples. This is useful for assigning comparison values (such as task priorities) alongside the main record being tracked:
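Putting it together, a small runnable sketch using obj_1 and obj_2 from above (the extra counter is a common tie-breaking trick so the heap never has to compare two SomeObject instances when their numbers are equal):

import heapq
from itertools import count

a_list = []
tie_breaker = count()
for obj in (obj_1, obj_2):
    heapq.heappush(a_list, (obj.number, next(tie_breaker), obj))

number, _, smallest = heapq.heappop(a_list)
print(smallest.name, number)   # tim 12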
Another option might be to make comparison work with your custom class - This can be implemented so the object itself can be used as the heap element (as in the first example).
Have a look here for reference and an example: "Enabling" comparison for classes
Have a look at the enhanced class SomeObject:
class SomeObject():
    def __init__(self, name, number):
        self.name = name
        self.number = number

    def __eq__(self, obj):
        return self.number == obj.number

    def __lt__(self, obj):
        return self.number < obj.number

    def __hash__(self):
        return hash(self.number)
This way you can create the heap with the objects only as elements:
heapq.heappush(a_list, obj_1)
heapq.heappush(a_list, obj_2)
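A short usage sketch with the enhanced class (the heap now orders SomeObject instances directly, by number):

import heapq

a_list = [SomeObject("tim", 12), SomeObject("tom", 13), SomeObject("ann", 7)]
heapq.heapify(a_list)            # orders via __lt__, i.e. by number

smallest = heapq.heappop(a_list)
print(smallest.name, smallest.number)   # ann 7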

Related

how to sort a list that is contained in an object

The task is to code a sorting algorithm using the below code as a starting point. The issue is I cannot seem to figure out how to start: I'm not looking for the full solution, just techniques for sorting lists of tuples that are actually part of an object. I get errors when I try to iterate through the list, saying I cannot iterate over an object.
class LinkedList:
    def __init__(self, data):
        self.label = data[0][0]
        self.value = data[0][1]
        self.tail = None if (len(data) == 1) else LinkedList(data[1:])

countries = LinkedList([("Ukraine", 41879904), ("Brunei", 442400), ("Christmas Island (Australia)", 1928)])
You can use a pointer to iterate through the linked list:
curr = countries
while curr:
    print("Label {}, Value {}".format(curr.label, curr.value))
    curr = curr.tail
In order to sort the linked list, first implement helper functions to remove/insert a node at a given position. Once you have such methods, you can implement any of the well-known sorting algorithms (e.g. quicksort) using the helpers you just created; a simpler alternative that just swaps node payloads is sketched below.
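A minimal sketch of that simpler alternative, a bubble sort that swaps the label/value payloads in place (sort_by_value is a hypothetical helper name; it assumes the LinkedList class and countries list from the question):

def sort_by_value(head):
    # Repeatedly walk the list, swapping the payloads of adjacent
    # out-of-order nodes, until a full pass makes no swaps.
    swapped = True
    while swapped:
        swapped = False
        curr = head
        while curr is not None and curr.tail is not None:
            if curr.value > curr.tail.value:
                curr.label, curr.tail.label = curr.tail.label, curr.label
                curr.value, curr.tail.value = curr.tail.value, curr.value
                swapped = True
            curr = curr.tail

sort_by_value(countries)   # countries is now ordered by ascending value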
To iterate over this list you need to keep getting the tail reference of the object:
class LinkedList:
    def __init__(self, data):
        self.label = data[0][0]
        self.value = data[0][1]
        self.tail = None if (len(data) == 1) else LinkedList(data[1:])

countries = LinkedList([("Ukraine", 41879904), ("Brunei", 442400), ("Christmas Island (Australia)", 1928)])

nextObj = countries
while nextObj is not None:
    print(nextObj.label, nextObj.value)
    nextObj = nextObj.tail
print("done!")
output:
Ukraine 41879904
Brunei 442400
Christmas Island (Australia) 1928
done!
To get an element at a certain index, you start iterating from the first element and just keep a counter.
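A minimal helper sketch for that (get_at is a hypothetical name; it assumes the LinkedList class and the countries list from above):

def get_at(head, index):
    # Walk the list, counting nodes until the requested index is reached
    counter = 0
    curr = head
    while curr is not None:
        if counter == index:
            return curr
        counter += 1
        curr = curr.tail
    raise IndexError("linked list index out of range")

node = get_at(countries, 1)
print(node.label, node.value)   # Brunei 442400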
functools, @total_ordering
One of the powerful features in Python: you can sort objects in a classic, straightforward way.
The functools module helps in implementing higher-order functions, i.e. functions that operate on or return other functions. total_ordering provides the rich class comparison methods without you having to define each one explicitly, which reduces code redundancy.
There are 2 essential conditions to implement these comparison methods:
At least one of the comparison methods must be defined from __lt__ (less than), __le__ (less than or equal to), __gt__ (greater than) or __ge__ (greater than or equal to).
The __eq__ method must also be defined.
from functools import total_ordering

@total_ordering
class LinkedList(object):
    def __init__(self, data):
        self.label = data[0][0]
        self.value = data[0][1]

    def __lt__(self, other):
        # Let's assume the sort is based on the 'label' attribute
        return self.label < other.label

    def __eq__(self, other):
        return self.label == other.label

# I don't know what 'data' is; just build a list of LinkedList objects
# and sort it with sorted() (treat them like int objects)!
my_list = [LinkedList(data0), LinkedList(data1), LinkedList(data2)]
new_list = sorted(my_list)
for obj in new_list:
    print(...)

Efficiently mapping unhashable objects to their index in a list

A Python list
f = [x0, x1, x2]
may be seen as an efficient representation of a mapping from [0, 1, ..., len(f) - 1] to the set of its elements. By "efficient" I mean that f[i] returns the element associated with i in O(1) time.
The inverse mapping may be defined as follows:
class Inverse:
    def __init__(self, f):
        self.f = f

    def __getitem__(self, x):
        return self.f.index(x)
This works, but Inverse(f)[x] takes O(n) time on average.
Alternatively, one may use a dict:
f_inv = {x: i for i, x in enumerate(f)}
This has O(1) average time complexity, but it requires the objects in the list to be hashable.
Is there a way to define an inverse mapping that provides equality-based lookups, in O(1) average time, with unhashable objects?
Edit: sample input and expected output:
>>> f = [x0, x1, x2]
>>> f_inv = Inverse(f) # this is to be defined
>>> f_inv[x0] # in O(1) time
0
>>> f_inv[x2] # in O(1) time
2
You can create an associated dictionary mapping the object IDs back to the list index.
The obvious disadvantage is that you have to look up the index with the identical object, not with an object that is merely equal.
On the upside, by creating a custom MutableSequence class using collections.abc, you can, with minimal code, write a class that keeps your data both as a sequence and as the reverse dictionary.
from collections.abc import MutableSequence
from threading import RLock

class MD(dict):
    # No need for a full MutableMapping subclass, as the use is limited.
    # Keys are object identities (id), so unhashable objects can be used.
    def __getitem__(self, key):
        return super().__getitem__(id(key))
    def __setitem__(self, key, value):
        super().__setitem__(id(key), value)
    def __delitem__(self, key):
        super().__delitem__(id(key))

class Reversible(MutableSequence):
    def __init__(self, args):
        self.seq = list()
        self.reverse = MD()
        self.lock = RLock()
        for element in args:
            self.append(element)
    def __getitem__(self, index):
        return self.seq[index]
    def __setitem__(self, index, value):
        with self.lock:
            del self.reverse[self.seq[index]]
            self.seq[index] = value
            self.reverse[value] = index
    def __delitem__(self, index):
        if index < 0:
            index += len(self)
        with self.lock:
            # Decrease all mapped indexes from this position on
            for obj in self.seq[index:]:
                self.reverse[obj] -= 1
            del self.reverse[self.seq[index]]
            del self.seq[index]
    def __len__(self):
        return len(self.seq)
    def insert(self, index, value):
        if index < 0:
            index += len(self)
        with self.lock:
            # Increase all mapped indexes from this position on
            for obj in self.seq[index:]:
                self.reverse[obj] += 1
            self.seq.insert(index, value)
            self.reverse[value] = index
And voilà: just use this object in place of your list, and the public attribute "reverse" to get the index of identity objects.
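A short usage sketch (lookups are by identity, so you have to pass the very same object you stored):

a, b = [1, 2], [3, 4]      # plain lists are unhashable
r = Reversible([a, b])
print(r.reverse[b])        # 1 -- O(1) lookup by identity
r.insert(0, [5, 6])
print(r.reverse[b])        # 2 -- the mapped index was kept in sync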
Note that you could make the MD class smarter by trying different strategies: for example, use the objects themselves as keys when they are hashable, and only resort to id (or another custom key based on other object attributes) when needed. That way you could mitigate the need for the lookup to be for the very same object.
So, for ordinary operations on the list, this class keeps the reverse dictionary synchronized. There is no support for slice indexing, though.
For more information, check the docs at https://docs.python.org/3/library/collections.abc.html
Unfortunately you're stuck with an algorithm limitation here. Fast lookup structures, like hash tables or binary trees, are efficient because they put objects in particular buckets or order them based on their values. This requires them to be hashable or comparable consistently for the entire time you are storing them in this structure, otherwise a lookup is very likely to fail.
If the objects you need are mutable (usually the reason they are not hashable) then any time an object you are tracking changes you need to update the data structure. The safest way to do this is to create immutable objects. If you need to change an object, then create a new one, remove the old one from the dictionary, and insert the new object as a key with the same value.
The operations here are still O(1) with respect to the size of the dictionary, you just need to consider whether the cost of copying objects on every change is worth it.
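A minimal sketch of that pattern, using a hypothetical immutable Point record (any frozen/immutable type works):

from typing import NamedTuple

class Point(NamedTuple):   # immutable, therefore hashable
    x: int
    y: int

f = [Point(0, 0), Point(1, 2)]
f_inv = {p: i for i, p in enumerate(f)}

# "Changing" an element means replacing it and updating both structures.
old, new = f[1], Point(1, 3)
f[1] = new
f_inv[new] = f_inv.pop(old)   # still O(1) per update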

python set() membership and hashable objects

I wanted to store instances of a class in a set, so I could use the set methods to find intersections, etc. My class has a __hash__() function, along with an __eq__ and a __lt__, and is decorated with functools.total_ordering
When I create two sets, each containing the same two objects, and do a set_a.difference(set_b), I get a result with a single object, and I have no idea why. I was expecting none, or at the least, 2, indicating a complete failure in my understanding of how sets work. But one?
for a in set_a:
    print(a, a.__hash__())

for b in set_b:
    print(b, b.__hash__(), b in set_a)
(<foo>, -5267863171333807568)
(<bar>, -8020339072063373731)
(<foo>, -5267863171333807568, False)
(<bar>, -8020339072063373731, True)
Why is the <foo> object in set_b not considered to be in set_a? What other properties does an object require in order to be considered a member of a set? And why is bar considered to be a part of set_a, but not foo?
edit: updating with some more info. I figured that simply showing that the two objects' hash() results were the same meant that they were indeed the same, so I guess that's where my mistake probably comes from.
@total_ordering
class Thing(object):
    def __init__(self, i):
        self.i = i

    def __eq__(self, other):
        return self.i == other.i

    def __lt__(self, other):
        return self.i < other.i

    def __repr__(self):
        return "<Thing {}>".format(self.i)

    def __hash__(self):
        return hash(self.i)
I figured it out thanks to some of the questions in the comments: the problem was that I had believed the hash function ultimately decides whether two objects are the same. The __eq__ also needs to match, which it always did in my tests and in my attempts to create a minimal example here.
However, when pulling data from a DB in prod, a certain float was being rounded down, and thus, the x == y was failing in prod. Argh.
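A minimal self-contained sketch of this failure mode: a hypothetical class whose hash is based on a rounded value, so two instances can share a hash while __eq__ still fails (e.g. because the DB returned a slightly-off float):

class RoundedThing:
    def __init__(self, i):
        self.i = i
    def __hash__(self):
        return hash(round(self.i, 3))   # hash on the rounded value
    def __eq__(self, other):
        return self.i == other.i        # but compare exactly
    def __repr__(self):
        return "<Thing {}>".format(self.i)

a = RoundedThing(1.0)
b = RoundedThing(1.0000004)    # e.g. the slightly-off value from the DB
print(hash(a) == hash(b))      # True  -- same hash
print(a == b)                  # False -- equality fails
print({a} - {b})               # {<Thing 1.0>} -- a is not considered a member of {b}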

Adding an element to a collection type without changing the collection type in Python

Lets say I have a class that would work having either a tuple, a list, a dictionary, or a set of another type of object.
Something like this:
class AbstractClass:
    """An example class"""

    def __init__(self, items=None):
        self.items = items

    def items(self):
        """Returns the items that this instance has"""
        return self.items
Now I want to add a method like this:
def add_item(self, item):
    """Adds item to items"""
    # code goes here
Now I'm stuck. I don't want to have to check if items is a list, tuple, and etc. and then do it on a case by case basis (as it simply seems unpythonic), but there doesn't seem to be one method that works universally. I would also want to try and preserve the type, so I don't want to convert items to a list (for example) and then use list's method of adding an item (either with items.append(item) or items + [item,]). Any suggestions?
The following is a limited list of examples for the expected behavior:
List
a = AbstractClass([1, 2])
a.add_item(4) # a.items now contains [1, 2, 4] in any order
Tuple
a = AbstractClass((1, 2))
a.add_item(4) # a.items now contains (1, 2, 4) in any order
Dictionary (note: this one is really quite optional, as I don't expect to be using this)
a = AbstractClass({0:1, 2:2})
a.add_item({3:4}) # a.items now should be {0:1, 2:2, 3:4}
Note: This is not meant to be used in practice, I just wanted to test the limits of python's dynamic nature
You could do this, and it will work for list, tuple and set (a dict would need the new item merged in as a key/value pair instead):
def add_item(self, item):
    # You could use append here, and it'd be faster
    temporary_list = list(self.items) + [item]
    self.items = type(self.items)(temporary_list)
This works because type(x) returns the class of that object, and calling that class with an iterable builds a new collection of the same type. The last line is therefore equivalent to:
# For lists
self.items = list(temporary_list)
# For sets
self.items = set(temporary_list)
# And so on...
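A quick usage sketch (assuming add_item above has been added to the AbstractClass from the question):

a = AbstractClass([1, 2])
a.add_item(4)
print(a.items)    # [1, 2, 4]

b = AbstractClass((1, 2))
b.add_item(4)
print(b.items)    # (1, 2, 4)

c = AbstractClass({1, 2})
c.add_item(4)
print(c.items)    # {1, 2, 4}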
You can check for method names you think will work, if you don't want to use isinstance():
def __init__(self, items=None):
    self.items = items
    if hasattr(self.items, 'add'):
        self.add_item = self.items.add
    elif hasattr(self.items, 'append'):
        self.add_item = self.items.append
    else:
        raise Exception("no add method found")

Pythonic slicing of nested attributes

I am dealing with classes whose attributes are sometimes lists whose elements can be dictionaries or further nested objects with attributes, etc. I would like to perform some slicing that, with my grasp of Python, is only doable in ways that feel profoundly un-Pythonic.
My minimal code looks like this:
class X(object):
    def __init__(self):
        self.a = []

x = X()
x.a.append({'key1': 'v1'})
x.a.append({'key1': 'v2'})
x.a.append({'key1': 'v3'})

# this works as desired
x.a[0]['key1']  # 'v1'
I would like to perform an access to a key in the nested dictionary but make that call for all elements of the list containing that dictionary. The standard python way of doing this would be a list comprehension a la:
[v['key1'] for v in x.a]
However, my minimal example doesn't quite convey the full extent of nesting in my real-world scenario: The attribute list a in class X might contain objects, whose attributes are objects, whose attributes are dictionaries whose keys I want to select on while iterating over the outer list.
# I would like something like
useful_list = x.a[:]['key1'] # TypeError: list indices must be integers, not str
# or even better
cool_list = where(x.a[:]['key1'] == 'v2') # same TypeError
If I start list comprehending for every interesting key it quickly doesn't look all that Pythonic. Is there a nice way of doing this or do I have to code 'getter' methods for all conceivable pairings of lists and dictionary keys?
UPDATE:
I have been reading about overloading lists. Apparently one can override the __getitem__ method, which is used for indices on lists and keys on dicts. Maybe a custom class that iterates over list members. This is starting to sound contrived...
So, you want to create a hierarchical structure, with an operation which means different things for different types and is defined recursively.
Polymorphism to the rescue.
You could override __getitem__ instead of my get_items below, but in your case it might be better to define a non-builtin operation to avoid risking ambiguity. It's up to you really.
class ItemsInterface(object):
    def get_items(self, key):
        raise NotImplementedError

class DictItems(ItemsInterface, dict):
    def __init__(self, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)

    def get_items(self, key):
        res = self[key]
        # apply recursively
        try:
            res = res.get_items(key)
        except AttributeError:
            pass
        return res

class ListItems(ItemsInterface, list):
    def __init__(self, *args, **kwargs):
        list.__init__(self, *args, **kwargs)

    def get_items(self, key):
        return [x.get_items(key) for x in self]

x = ListItems()
x.append(DictItems({'key1': 'v1'}))
x.append(DictItems({'key1': 'v2'}))
x.append(DictItems({'key1': 'v3'}))
y = DictItems({'key1': 'v999'})
x.append(ListItems([y]))

x.get_items('key1')
=> ['v1', 'v2', 'v3', ['v999']]
Of course, this solution might not be exactly what you need (you didn't explain what it should do if the key is missing, etc.), but you can easily modify it to suit your needs.
This solution also supports ListItems as values of the DictItems; the get_items operation is applied recursively. One possible tweak for missing keys is sketched below.
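For example, a minimal sketch of one way to handle missing keys (the behaviour, returning None instead of raising KeyError, and the name DictItemsWithDefault are assumptions, not something the question specified):

class DictItemsWithDefault(DictItems):
    # Hypothetical variant: a missing key yields None instead of raising KeyError
    def get_items(self, key):
        if key not in self:
            return None
        return super(DictItemsWithDefault, self).get_items(key)

x = ListItems([DictItemsWithDefault({'key1': 'v1'}), DictItemsWithDefault({'other': 'v2'})])
x.get_items('key1')
=> ['v1', None]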
