I was working on a sorting problem, but I can't figure out how to define the function so that it can be called in a specific way.
Basically, what I want to do is create a function that takes a list of Node objects, each with a Value attribute, and returns a list with the items from the original list grouped into sublists. Items of the same value should be in the same sublist, and the sublists should be sorted in descending order.
To continue with the code, I want to know what the parameter of this function should be:
def advanced_sort(<What will come here according to the call>):
Function call:
advanced_sort([Node(1), Node(2), Node(1), Node(2)])
Can anyone please help me out with the code? Thanks in advance.
advanced_sort takes a single argument: a list (or possibly an arbitrary iterable). As such, the signature only has one argument:
def advanced_sort(nodes):
Ignoring type hints, the signature does not and cannot reflect the internal structure of the single argument; it's just a name to refer to the passed value inside the body of the function.
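For reference, a type-hinted version of the signature might look like the sketch below. The hints are purely informational and are not enforced at runtime; the return type reflects the grouping the question describes:
from typing import List

def advanced_sort(nodes: List['Node']) -> List[List['Node']]:
    ...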
Inside the body, you can write code that assumes that nodes is a list, and further that each element of the list is a Node instance, so that you can, for example, assume each element has a Value attribute.
def advanced_sort(nodes):
# If nodes is iterable, then x refers to a different
# element of the iterable each time through the loop.
for x in nodes:
# If nodes is a list of Node instances, then
# x is a Node instance, and thus you can access
# its Value attribute in the normal fashion.
print("Found value {}".format(x.Value))
Assuming a definition of Node like
class Node:
def __init__(self, v):
self.Value = v
the above definition of advanced_sort will produce the following output:
>>> advanced_sort([Node(3), Node(2), Node(1), Node(2)])
Found value 3
Found value 2
Found value 1
Found value 2
The argument is a single iterable object such as a list, a tuple, a set, ...
Then you iterate on the items as in chepner's response.
For example, you can use a dictionary to group the Nodes by value:
def advanced_sort(node_list):
    ret = dict()
    for node in node_list:
        if node.Value not in ret:
            ret[node.Value] = list()
        ret[node.Value].append(node)
    return [ret[value] for value in sorted(ret.keys(), reverse=True)]  # descending order
>>> advanced_sort([Node(3), Node(2), Node(1), Node(1)])
[[Node(3)], [Node(2)], [Node(1), Node(1)]]
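Note that for the nested lists to display as shown, Node needs a __repr__ method; a minimal sketch, assuming the question's Value attribute:
class Node:
    def __init__(self, v):
        self.Value = v
    def __repr__(self):
        # Render instances as Node(<value>) so nested lists print readably.
        return "Node({})".format(self.Value)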
Are you able to make changes to the Node class? In that case, you could do something like this:
from functools import total_ordering
@total_ordering
class Node:
def __init__(self, value):
self.value = value
def __eq__(self, other):
if not isinstance(other, Node):
return NotImplemented
return self.value == other.value
def __lt__(self, other):
if not isinstance(other, Node):
return NotImplemented
return self.value < other.value
def __str__(self):
return f"({self.value})"
def main():
from itertools import groupby
nodes = [Node(1), Node(2), Node(1), Node(2)]
nodes_sorted = sorted(nodes, reverse=True)
nodes_sublists = [list(group) for key, group in groupby(nodes_sorted)]
for sublist in nodes_sublists:
print(*map(str, sublist))
return 0
if __name__ == "__main__":
import sys
sys.exit(main())
Output:
(2) (2)
(1) (1)
I'd like to return either one or two variables from a function in Python (3.x). Ideally, that would depend on how many return values the user requests at the call site. For example, imagine a max() that returns the max value by default and can also return the argmax. In Python, that would look something like:
maximum = max(array)
maximum, index = max(array)
I'm currently solving this with an extra argument return_arg:
import numpy as np
def my_max(array, return_arg=False):
maximum = np.max(array)
if return_arg:
index = np.argmax(array)
return maximum, index
else:
return maximum
This way, the first block of code would look like this:
maximum = my_max(array)
maximum, index = my_max(array, return_arg=True)
Is there a way to avoid the extra argument? Maybe by testing for how many values the user is expecting? I know you can return a tuple and unpack it when calling (that's what I'm doing).
Assume the actual function I'm doing this in is one where this behaviour makes sense.
You can instead return an instance of a subclass of int (or float, depending on the data type you want to process) that has an additional index attribute and produces an iterator of two items when used in a sequence context:
class int_with_index(int):
def __new__(cls, value, index):
return super(int_with_index, cls).__new__(cls, value)
def __init__(self, value, index):
super().__init__()
self.index = index
def __iter__(self):
return iter((self, self.index))
def my_max(array):
maximum = np.max(array)
index = np.argmax(array)
return int_with_index(maximum, index)
so that:
maximum = my_max(array)
maximum, index = my_max(array)
would both work as intended.
The answer is no: in Python, a function has no knowledge of its caller and can't know how many values the caller expects in return.
Instead, in Python you would have different functions, a flag in the function signature (like you did), or you would return an object with multiple fields from which the consumer can take whatever it needs.
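For example, a minimal sketch of the multi-field approach using a NamedTuple (the names MaxResult, value and index are illustrative):
from typing import NamedTuple
import numpy as np

class MaxResult(NamedTuple):
    value: float
    index: int

def my_max(array) -> MaxResult:
    return MaxResult(np.max(array), int(np.argmax(array)))

result = my_max(np.array([3, 1, 4, 1]))
maximum = result.value                            # take only what you need
maximum, index = my_max(np.array([3, 1, 4, 1]))   # a NamedTuple still unpacks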
No, there is no way of doing this. my_max(array) is called and returns before anything is assigned to maximum. If the function returns more than one value, the assignment will then try to unpack and assign the values accordingly.
Most people tackle this problem by doing this:
maximum, _ = my_max(array)
maximum, index = my_max(array)
or
maximum = my_max(array)[0]
maximum, index = my_max(array)
If you need a solution that works for any data type, such as np.ndarray, you can use a decorator that rewrites the calling function with ast.NodeTransformer. The transformer modifies any assignment statement that assigns a call of a given target function name (e.g. my_max) to a single variable name, replacing that target with a tuple of the same variable name plus a _ variable (which by convention stores a discarded value). A statement such as maximum = my_max(array) is thus automatically transformed into maximum, _ = my_max(array):
import ast
import inspect
from textwrap import dedent
class ForceUnpack(ast.NodeTransformer):
def __init__(self, target_name):
self.target = ast.dump(ast.parse(target_name, mode='eval').body)
def visit_Assign(self, node):
if isinstance(node.value, ast.Call) and ast.dump(node.value.func) == self.target and isinstance(node.targets[0], ast.Name):
node.targets[0] = ast.Tuple(elts=[node.targets[0], ast.Name(id='_', ctx=ast.Store())], ctx=ast.Store())
return node
# remove force_unpack from the decorator list to avoid re-decorating during exec
def visit_FunctionDef(self, node):
node.decorator_list = [
decorator for decorator in node.decorator_list
if not isinstance(decorator, ast.Call) or decorator.func.id != "force_unpack"
]
self.generic_visit(node)
return node
def force_unpack(target_name):
def decorator(func):
tree = ForceUnpack(target_name).visit(ast.parse(dedent(inspect.getsource(func))))
ast.fix_missing_locations(tree)
scope = {}
exec(compile(tree, inspect.getfile(func), "exec"), func.__globals__, scope)
return scope[func.__name__]
return decorator
so that you can define your my_max function to always return a tuple:
def my_max(array):
maximum = np.max(array)
index = np.argmax(array)
return maximum, index
while applying the force_unpack decorator to any function that calls my_max so that the assignment statements within can unpack the returning values of my_max even if they're assigned to a single variable:
@force_unpack('my_max')
def foo():
maximum = my_max(array)
maximum, index = my_max(array)
I've got the following wrapper for a dictionary:
class MyDict:
def __init__(self):
self.container = {}
def __setitem__(self, key, value):
self.container[key] = value
def __getitem__(self, key):
return self.container[key]
def __iter__(self):
return self
def next(self):
pass
dic = MyDict()
dic['a'] = 1
dic['b'] = 2
for key in dic:
print key
My problem is that I don't know how to implement the next method to make MyDict iterable. Any advice would be appreciated.
Dictionaries are not themselves iterators (which can only be iterated over once); you usually make your class an iterable instead, an object for which you can produce any number of independent iterators.
Drop the next method altogether, and have __iter__ return a new iterator each time it is called. That can be as simple as just returning an iterator for self.container:
def __iter__(self):
return iter(self.container)
If you must make your class an iterator, you'll have to somehow track a current iteration position and raise StopIteration once you reach the 'end'. A naive implementation could be to store the iter(self.container) object on self the first time __iter__ is called:
def __iter__(self):
return self
def next(self):
if not hasattr(self, '_iter'):
self._iter = iter(self.container)
return next(self._iter)
at which point the iter(self.container) object takes care of tracking iteration position for you, and will raise StopIteration when the end is reached. It'll also raise an exception if the underlying dictionary was altered (had keys added or deleted) and iteration order has been broken.
Another way to do this would be to store an integer position and index into list(self.container) each time, simply ignoring the fact that insertion or deletion can alter the iteration order of a dictionary:
_iter_index = 0
def __iter__(self):
return self
def next(self):
idx = self._iter_index
if idx is None or idx >= len(self.container):
# once we reach the end, all iteration is done, end of.
self._iter_index = None
raise StopIteration()
value = list(self.container)[idx]
self._iter_index = idx + 1
return value
In both cases your object is then an iterator that can only be iterated over once. Once you reach the end, you can't restart it again.
If you want to be able to use your dict-like object inside nested loops, for example, or any other application that requires multiple iterations over the same object, then you need to implement an __iter__ method that returns a newly-created iterator object.
Python's iterable objects all do this:
>>> [1, 2, 3].__iter__()
<listiterator object at 0x7f67146e53d0>
>>> iter([1, 2, 3]) # A simpler equivalent
<listiterator object at 0x7f67146e5390>
The simplest thing for your objects' __iter__ method to do would be to return an iterator on the underlying dict, like this:
def __iter__(self):
return iter(self.container)
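With that in place, every for loop asks the object for a fresh iterator, so nested loops over the same instance work; a quick demonstration (using Python 3 print syntax):
dic = MyDict()
dic['a'] = 1
dic['b'] = 2
# Each loop calls iter(dic), which returns a new, independent iterator.
for outer in dic:
    for inner in dic:
        print(outer, inner)   # prints all four key combinations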
For more detail than you probably will ever require, see this Github repository.
First off this is a homework assignment I'm working on, but I really just need help on an error.
So the project is to implement a vector (a list in all but name for this project), using the Array class. The array class I'm using can be found here.
My error is that every time I try to call my code to test it, specifically the __getitem__ and __setitem__ methods, I wind up with an error stating:
builtins.TypeError: 'type' object does not support item assignment
Below is the class I'm currently building (so far it seems that only __len__ and __contains__ are working).
class Vector:
"""Vector ADT
Creates a mutable sequence type that is similar to Python's list type."""
def __init__(self):
"""Constructs a new empty vector with initial capacity of two elements"""
self._vector = Array(2)
self._capacity = 2
self._len = 0
def __len__(self):
"""Returns the number of items in the vector"""
return self._len
def __contains__(self, item):
"""Determines if the given item is stored in the vector"""
if item in self._vector:
return True
else:
return False
def __getitem__(self, ndx):
"""Returns the item in the index element of the list, must be within the
valid range"""
assert ndx >= 0 and ndx <= self._capacity - 1, "Array subscript out of range"
return self._vector[ndx]
def __setitem__(self, ndx, item):
"""Sets the elements at position index to contain the given item. The
value of index must be within a valid range"""
assert ndx >= 0 and ndx <= self._capacity - 1, "Array subscript out of range"
self._vector[ndx] = item
def append(self, item):
"""Adds the given item to the list"""
if self._len < self._capacity:
self._vector[self._len] = item
self._len += 1
I'm trying to call the code by either typing:
Vector()[i] = item
or
Vector[i] = item
However, trying:
Vector[i] = item
Gives me the error, and:
Vector()[i] = item
Doesn't really seem to do anything other than not cause an error.
You need to create an instance of your Vector class. Try:
vector = Vector()
vector[0] = 42
The error means that you are trying erroneously to assign to the Vector class itself, which does not make much sense.
Try using the replace method instead of assigning a value.
Vector is a class; Vector() creates an instance of that class.
So
Vector[i] = item
gives an error: Vector.__setitem__ is an instance method (it runs against an instance of a class, i.e. an object), not a classmethod (which runs against the class itself). (You could in theory make it a classmethod, but I have trouble picturing a use case where that would make sense.)
On the other hand,
Vector()[i] = item
# 1. creates a Vector() object
# 2. calls {new_object}.__setitem__(i, item)
# 3. doesn't keep any reference to {new_object}, so
# (a) you have no way to interact with it any more and
# (b) it will be garbage-collected shortly.
Try
v = Vector()
v[i] = item
print(item in v) # => True
I am writing a function of the form:
def fn(adict, b):
"""`adict` contains key(str): value(list). if `b` is a dict we have to
call `do_something` for pairs of lists from `adict` and `b` for
matching keys. if `b` is a list, we have to call `do_something`
for all lists in `adict` with `b` as the common second
argument.
"""
if isinstance(b, dict):
for key, value in adict.items():
do_something(value, b[key])
else:
for key, value in adict.items():
do_something(value, b)
def do_something(x, y):
"""x and y are lists"""
pass
I am aware that this may not be a good design (Is it bad design to base control flow/conditionals around an object's class?). But writing two functions, one taking b as a dict and another as a list, seems too redundant. What are some better alternatives?
There's indeed a pattern for such problems, it's named "multiple dispatch" or "multimethods". You can find a (quite simple) example Python implementation here http://www.artima.com/weblogs/viewpost.jsp?thread=101605
Using this solution, your code might look like:
from mm import multimethod

@multimethod(dict, dict)
def itersources(sources, samples):
    for key, value in sources.items():
        yield value, samples[key]

@multimethod(dict, list)
def itersources(sources, samples):
    for key, value in sources.items():
        yield value, samples
def fn(sources, samples):
for value1, value2 in itersources(sources, samples):
do_something(value1, value2)
I use a "switch" method for this:
class Demo(object):
def f(self, a):
name = 'f_%s' % type(a).__name__
m = getattr(self, name)
m(a)
def f_dict(self, a):
...
The code creates a method name from the type of the argument, then looks up the method in self and then calls it.
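Usage might look like this; the handler names f_dict and f_list are illustrative, derived from the type names of the arguments you expect:
class Demo(object):
    def f(self, a):
        # Build the handler name from the argument's type and dispatch to it.
        name = 'f_%s' % type(a).__name__
        m = getattr(self, name)
        return m(a)
    def f_dict(self, a):
        return 'dict with %d keys' % len(a)
    def f_list(self, a):
        return 'list with %d items' % len(a)

demo = Demo()
print(demo.f({'x': 1}))   # -> dict with 1 keys
print(demo.f([1, 2, 3]))  # -> list with 3 items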
I am trying to build a heap with a custom sort predicate. Since the values going into it are of a "user-defined" type, I cannot modify their built-in comparison behavior.
Is there a way to do something like:
h = heapq.heapify([...], key=my_lt_pred)
h = heapq.heappush(h, key=my_lt_pred)
Or even better, I could wrap the heapq functions in my own container so I don't need to keep passing the predicate.
According to the heapq documentation, the way to customize the heap order is to have each element on the heap be a tuple, with the first tuple element being one that accepts normal Python comparisons.
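For instance, pushing (key, item) tuples orders the heap by the computed key; a minimal sketch (ordering strings by length here is just an example):
import heapq

heap = []
for word in ["pear", "banana", "kiwi"]:
    # The first tuple element drives the heap order.
    heapq.heappush(heap, (len(word), word))

print(heapq.heappop(heap)[1])  # -> 'kiwi' (smallest key)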
The functions in the heapq module are a bit cumbersome (since they are not object-oriented), and always require our heap object (a heapified list) to be explicitly passed as the first parameter. We can kill two birds with one stone by creating a very simple wrapper class that will allow us to specify a key function, and present the heap as an object.
The class below keeps an internal list, where each element is a tuple, the first member of which is a key, calculated at element insertion time using the key parameter, passed at Heap instantiation:
# -*- coding: utf-8 -*-
import heapq
class MyHeap(object):
def __init__(self, initial=None, key=lambda x:x):
self.key = key
self.index = 0
if initial:
self._data = [(key(item), i, item) for i, item in enumerate(initial)]
self.index = len(self._data)
heapq.heapify(self._data)
else:
self._data = []
def push(self, item):
heapq.heappush(self._data, (self.key(item), self.index, item))
self.index += 1
def pop(self):
return heapq.heappop(self._data)[2]
(The extra self.index part avoids clashes when the evaluated key values are tied and the stored values are not directly comparable; otherwise heapq could fail with a TypeError.)
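Usage might look like this (illustrative; any key function works):
# A max-heap of strings ordered by length, without modifying str.
heap = MyHeap(initial=["pear", "banana", "fig"], key=lambda s: -len(s))
heap.push("watermelon")
print(heap.pop())  # -> 'watermelon' (longest string first)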
Define a class in which you override the __lt__() method. See the example below (works in Python 3.7):
import heapq
class Node(object):
def __init__(self, val: int):
self.val = val
def __repr__(self):
return f'Node value: {self.val}'
def __lt__(self, other):
return self.val < other.val
heap = [Node(2), Node(0), Node(1), Node(4), Node(2)]
heapq.heapify(heap)
print(heap) # output: [Node value: 0, Node value: 2, Node value: 1, Node value: 4, Node value: 2]
heapq.heappop(heap)
print(heap) # output: [Node value: 1, Node value: 2, Node value: 2, Node value: 4]
The heapq documentation suggests that heap elements could be tuples in which the first element is the priority and defines the sort order.
More pertinent to your question, however, is that the documentation includes a discussion with sample code of how one could implement their own heapq wrapper functions to deal with the problems of sort stability and elements with equal priority (among other issues).
In a nutshell, their solution is to have each element in the heap be a triple with the priority, an entry count, and the element to be inserted. The entry count ensures that elements with the same priority are sorted in the order they were added.
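A condensed sketch of that approach (adapted from the pattern described in the docs; the push/pop helper names are illustrative):
import heapq
import itertools

heap = []
counter = itertools.count()  # monotonically increasing entry count

def push(heap, priority, item):
    # The count breaks priority ties in insertion order and prevents
    # heapq from ever comparing the items themselves.
    heapq.heappush(heap, (priority, next(counter), item))

def pop(heap):
    return heapq.heappop(heap)[2]

push(heap, 2, {"task": "b"})
push(heap, 1, {"task": "a"})
push(heap, 1, {"task": "c"})  # ties with "a" but was inserted later
print(pop(heap))  # -> {'task': 'a'}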
setattr(ListNode, "__lt__", lambda self, other: self.val <= other.val)
Use this for comparing values of objects in heapq
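For example, assuming a ListNode class with a val attribute (the class here is a stand-in for your own type):
import heapq

class ListNode:
    def __init__(self, val):
        self.val = val

# Patch an ordering onto the class after the fact so heapq can compare nodes.
setattr(ListNode, "__lt__", lambda self, other: self.val <= other.val)

heap = [ListNode(2), ListNode(1), ListNode(3)]
heapq.heapify(heap)
print(heapq.heappop(heap).val)  # -> 1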
The limitation with both answers is that they don't allow ties to be treated as ties. In the first, ties are broken by comparing items, in the second by comparing input order. It is faster to just let ties be ties, and if there are a lot of them it could make a big difference. Based on the above and on the docs, it is not clear if this can be achieved in heapq. It does seem strange that heapq does not accept a key, while functions derived from it in the same module do.
P.S.:
If you follow the link in the first comment ("possible duplicate...") there is another suggestion of defining __le__, which seems like a solution.
In Python 3, you can use cmp_to_key from the functools module (see the CPython source code).
Suppose you need a priority queue of triplets, with the priority given by the last element.
from heapq import *
from functools import cmp_to_key
def mycmp(triplet_left, triplet_right):
key_l, key_r = triplet_left[2], triplet_right[2]
if key_l > key_r:
return -1 # larger first
elif key_l == key_r:
return 0 # equal
else:
return 1
WrapperCls = cmp_to_key(mycmp)
pq = []
myobj = (1, 2, "anystring")
# to push an object myobj into pq
heappush(pq, WrapperCls(myobj))
# to get the heap top use the `obj` attribute
inner = pq[0].obj
Performance Test:
Environment
python 3.10.2
Code
from functools import cmp_to_key
from timeit import default_timer as time
from random import randint
from heapq import *
class WrapperCls1:
__slots__ = 'obj'
def __init__(self, obj):
self.obj = obj
def __lt__(self, other):
kl, kr = self.obj[2], other.obj[2]
        return kl > kr
def cmp_class2(obj1, obj2):
kl, kr = obj1[2], obj2[2]
return -1 if kl > kr else 0 if kl == kr else 1
WrapperCls2 = cmp_to_key(cmp_class2)
triplets = [[randint(-1000000, 1000000) for _ in range(3)] for _ in range(100000)]
# tuple_triplets = [tuple(randint(-1000000, 1000000) for _ in range(3)) for _ in range(100000)]
def test_cls1():
pq = []
for triplet in triplets:
heappush(pq, WrapperCls1(triplet))
def test_cls2():
pq = []
for triplet in triplets:
heappush(pq, WrapperCls2(triplet))
def test_cls3():
pq = []
for triplet in triplets:
heappush(pq, (-triplet[2], triplet))
start = time()
for _ in range(10):
test_cls1()
# test_cls2()
# test_cls3()
print("total running time (seconds): ", -start+(start:=time()))
Results
Using list (rather than tuple) triplets, time per test function:
WrapperCls1 (without __slots__): 16.2 ms
WrapperCls1 with __slots__: 9.8 ms
WrapperCls2: 8.6 ms
priority moved into the first tuple position (test_cls3; no custom predicate supported): 6.0 ms
Therefore, the cmp_to_key approach (WrapperCls2) is slightly faster than using a custom class with an overridden __lt__() method, even with the __slots__ attribute.
Simple and Recent
A simple solution is to store entries as a list of tuples, where the tuple elements define the priority in your desired order. If you need descending order for one of the items within the tuple, just store its negative.
See the Priority Queue Implementation Notes in the official heapq Python documentation on this topic.
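A minimal illustration of that note (negating the priority so the largest value is popped first):
import heapq

pq = []
for priority, task in [(3, "low"), (10, "high"), (5, "mid")]:
    # Store the negated priority to turn the min-heap into a max-heap.
    heapq.heappush(pq, (-priority, task))

while pq:
    neg_priority, task = heapq.heappop(pq)
    print(-neg_priority, task)  # -> 10 high, then 5 mid, then 3 low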