def find_closest(data, target, key=lambda x: f(x)):
This is my function definition, where data is a set of values and I want to find the value that evaluates closest to target in as few evaluations as possible, i.e. abs(target - f(x)) is minimal. f(x) is monotonic.
I've heard that binary search can do this in O(log(n)) time; is there a library implementation in Python? Are there more efficient search algorithms?
EDIT: I'm looking to minimize complexity in terms of evaluating f(x), because that's the expensive part. I want to find the x in data that, when evaluated with f(x), comes closest to the target. data is in the domain of f, target is in the range of f. Yes, data can be sorted quickly.
You can use the utilities in the bisect module. You will have to evaluate f on data first, though, i.e. [f(x) for x in data], to get a monotonic (sorted) list to bisect.
I am not aware of a binary search in the standard library that works directly on f and data.
If the data presented is already sorted and the function f is strictly monotonic,
apply the function f on the data and then perform a binary search using bisect.bisect_left:
import bisect

def find_closest(data, target, key=f):   # f is the expensive function, defined elsewhere
    values = [key(x) for x in data]      # one evaluation of f per element
    if len(values) > 1 and values[0] > values[-1]:
        # f is decreasing: negate values (and target) to get an increasing list
        values = [-v for v in values]
        target = -target
    i = bisect.bisect_left(values, target)
    if i == len(values):                 # target is beyond the last value
        return data[-1]
    return data[i]
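For illustration, a hypothetical call (the squaring key is my own example, not from the original); note this variant returns the element whose image first reaches the target, not necessarily the closer of the two neighbours:
print(find_closest([1, 2, 5, 9, 12], 50, key=lambda x: x ** 2))  # 9, since 9**2 = 81 is the first value >= 50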
Use bisect_left() to find the lower bound.
bisect_left() accepts any random-access sequence; to avoid evaluating all the elements, you can use a lazy collection of computed function values with __len__ and __getitem__ methods defined.
Carefully check the return value for boundary conditions.
Your heavy calculation will be called O(log(N) + 1) = O(log(N)) times.
from bisect import bisect_left
from collections import defaultdict

class Cache(defaultdict):
    def __init__(self, method):
        super().__init__()
        self.method = method
    def __missing__(self, key):
        value = self[key] = self.method(key)  # compute once, then cache
        return value

class MappedList(object):
    def __init__(self, method, input):
        self.method = method
        self.input = input
        self.cache = Cache(method)
    def __len__(self):
        return len(self.input)
    def __getitem__(self, i):
        return self.cache[self.input[i]]  # was input[i], which referenced the builtin
def find_closest(data, target, key=lambda x: x):
    s = sorted(data)
    evaluated = MappedList(key, s)
    index = bisect_left(evaluated, target)
    if index == 0:
        return s[0]
    if index == len(s):
        return s[index - 1]
    # pick whichever neighbour evaluates closer to the target
    if target - evaluated[index - 1] <= evaluated[index] - target:
        return s[index - 1]
    else:
        return s[index]
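For illustration, a hypothetical call (again with a squaring key, my own example); thanks to the lazy MappedList, only O(log n) of the elements are actually passed through key:
print(find_closest([12, 1, 9, 2, 5], 50, key=lambda x: x ** 2))  # 5, since 25 is closer to 50 than 81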
I have two sets of numbers
xl=linspace(0.,1,1000)
xu=linspace(0.,1,1000)+0.5
which should form pairwise intervals over which I want to run a polynomial function.
I want to store the resulting values for each interval, as lists within a list.
The only way I can think of is the following:
Variables:
xl=linspace(0.,1,1000)
xu=linspace(0.,1,1000)+0.5
M=[] # The list where the intervals are to be stored
Values=[] # The list where the output of the polynomial function will be stored.
class Interval:
    def __init__(self,left,right):
        self.left=left
        self.right=right
    def __repr__(self):
        return '[{},{}]'.format(self.left,self.right)

def BuildIntervalFromLists(x,y): # builds the list of intervals
    for i, j in zip(x, y):
        M.append(Interval(i,j))
    return M

def Polynomial(t): # the function
    3*t**3-2*t**2-5*t-1

def PolynomialFunction(x): # function to run over all intervals
    for k in x: # The intervals in sequence
        for l in k: # The numbers in each interval in sequence
            Values.append([Polynomial(l)])
    return(Values)

BuildIntervalFromLists(xl,xu)
PolynomialFunction(M)
This, however, gives an error message saying that I can't iterate over an Interval.
Is there any way of getting around this problem?
If not, is there a better approach?
There is a special method in Python, called __getitem__, for accessing an item inside an object by index. You can simply define it as follows:
def __getitem__(self, position):
    return self.my_array[position]
As you can see, I also define a my_array variable inside the class. The new class implementation will look like this:
class Interval:
    def __init__(self,left,right):
        self.left=left
        self.right=right
        self.my_array = [left,right]
    def __repr__(self):
        return '[{},{}]'.format(self.left,self.right)
    def __getitem__(self, position):
        return self.my_array[position]
Now you can see the contents of your objects thanks to __repr__, and you can iterate over them thanks to __getitem__.
The overall code will be like this:
import numpy as np

xl=np.linspace(0.,1,1000)
xu=np.linspace(0.,1,1000)+0.5

M=[]
Values=[] # The list where the output of the polynomial function will be stored.

class Interval:
    def __init__(self,left,right):
        self.left=left
        self.right=right
        self.my_array = [left,right]
    def __repr__(self):
        return '[{},{}]'.format(self.left,self.right)
    def __getitem__(self, position):
        return self.my_array[position]

def BuildIntervalFromLists(x,y):
    for i, j in zip(x, y):
        M.append(Interval(i,j))
    return M

def Polynomial(t):
    return 3*t**3-2*t**2-5*t-1

def PolynomialFunction(x):
    for k in x: # The intervals in sequence
        for l in k: # The numbers in each interval in sequence
            Values.append([Polynomial(l)])
    return(Values)

BuildIntervalFromLists(xl,xu)
PolynomialFunction(M)
I hope it will solve your problem. Best
I'd like to return either one or two variables from a function in Python (3.x). Ideally, that would depend on the number of return values requested by the user at the call site. For example, suppose the max() function returned the max value by default and could also return the argmax. In Python, that would look something like:
maximum = max(array)
maximum, index = max(array)
I'm currently solving this with an extra argument return_arg:
import numpy as np

def my_max(array, return_arg=False):
    maximum = np.max(array)
    if return_arg:
        index = np.argmax(array)
        return maximum, index
    else:
        return maximum
This way, the first block of code would look like this:
maximum = my_max(array)
maximum, index = my_max(array, return_arg=True)
Is there a way to avoid the extra argument? Maybe by testing how many values the user is expecting? I know you can return a tuple and unpack it when calling (that's what I'm doing).
Assume the actual function I'm doing this in is one where this behaviour makes sense.
You can instead return an instance of a subclass of int (or float, depending on the data type you want to process) that has an additional index attribute and would return an iterator of two items when used in a sequence context:
import numpy as np

class int_with_index(int):
    def __new__(cls, value, index):
        return super(int_with_index, cls).__new__(cls, value)
    def __init__(self, value, index):
        super().__init__()
        self.index = index
    def __iter__(self):
        # unpacking as a pair yields the value itself and its index
        return iter((self, self.index))

def my_max(array):
    maximum = np.max(array)
    index = np.argmax(array)
    return int_with_index(maximum, index)
so that:
maximum = my_max(array)
maximum, index = my_max(array)
would both work as intended.
The answer is no: in Python, a function has no context of the caller and can't know how many values the caller expects in return.
Instead, in Python you would rather have different functions, a flag in the function signature (like you did), or you would return an object with multiple fields of which the consumer can take whatever it needs, as sketched below.
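A minimal sketch of that last option, assuming numpy and an illustrative MaxResult name (not from the original):
from typing import NamedTuple
import numpy as np

class MaxResult(NamedTuple):
    value: float
    index: int

def my_max(array):
    # return both fields; callers take what they need
    return MaxResult(float(np.max(array)), int(np.argmax(array)))

maximum = my_max([3, 1, 4, 1, 5]).value   # take only the value
maximum, index = my_max([3, 1, 4, 1, 5])  # or unpack both: it is still a tuple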
No, there is no way of doing this. my_max(array) will be called and will return before any value is assigned to maximum. If the function returns more than one value, the caller will try to unpack and assign the values accordingly.
Most people tackle this problem by doing this:
maximum, _ = my_max(array)
maximum, index = my_max(array)
or
maximum = my_max(array)[0]
maximum, index = my_max(array)
If you need a solution that works for any data type, such as np.ndarray, you can use a decorator built on ast.NodeTransformer. It modifies any assignment statement that assigns a call of a given target function name (e.g. my_max) to a single variable name, rewriting it to assign to a tuple of that variable name plus a _ variable (which by convention stores a discarded value). A statement such as maximum = my_max(array) is thereby automatically transformed into maximum, _ = my_max(array):
import ast
import inspect
from textwrap import dedent

class ForceUnpack(ast.NodeTransformer):
    def __init__(self, target_name):
        self.target = ast.dump(ast.parse(target_name, mode='eval').body)
    def visit_Assign(self, node):
        if isinstance(node.value, ast.Call) and ast.dump(node.value.func) == self.target and isinstance(node.targets[0], ast.Name):
            node.targets[0] = ast.Tuple(elts=[node.targets[0], ast.Name(id='_', ctx=ast.Store())], ctx=ast.Store())
        return node
    # remove force_unpack from the decorator list to avoid re-decorating during exec
    def visit_FunctionDef(self, node):
        node.decorator_list = [
            decorator for decorator in node.decorator_list
            if not isinstance(decorator, ast.Call) or decorator.func.id != "force_unpack"
        ]
        self.generic_visit(node)
        return node

def force_unpack(target_name):
    def decorator(func):
        tree = ForceUnpack(target_name).visit(ast.parse(dedent(inspect.getsource(func))))
        ast.fix_missing_locations(tree)
        scope = {}
        exec(compile(tree, inspect.getfile(func), "exec"), func.__globals__, scope)
        return scope[func.__name__]
    return decorator
so that you can define your my_max function to always return a tuple:
def my_max(array):
    maximum = np.max(array)
    index = np.argmax(array)
    return maximum, index
while applying the force_unpack decorator to any function that calls my_max so that the assignment statements within can unpack the returning values of my_max even if they're assigned to a single variable:
@force_unpack('my_max')
def foo():
    maximum = my_max(array)
    maximum, index = my_max(array)
I have found some nice examples (here, here) of implementing SICP-like streams in Python. But I am still not sure how to handle an example like the integral found in SICP 3.5.3 "Streams as signals."
The Scheme code found there is
(define (integral integrand initial-value dt)
  (define int
    (cons-stream initial-value
                 (add-streams (scale-stream integrand dt)
                              int)))
  int)
What is tricky about this one is that the returned stream int is defined in terms of itself (i.e., the stream int is used in the definition of the stream int).
I believe Python could have something similarly expressive and succinct... but not sure how. So my question is, what is an analogous stream-y construct in Python? (What I mean by a stream is the subject of 3.5 in SICP, but briefly, a construct (like a Python generator) that returns successive elements of a sequence of indefinite length, and can be combined and processed with operations such as add-streams and scale-stream that respect streams' lazy character.)
There are two ways to read your question. The first is simply: How do you use Stream constructs, perhaps the ones from your second link, but with a recursive definition? That can be done, though it is a little clumsy in Python.
In Python you can represent looped data structures but not directly. You can't write:
l = [l]
but you can write:
l = [None]
l[0] = l
Similarly you can't write:
def integral(integrand,initial_value,dt):
    int_rec = cons_stream(initial_value,
                          add_streams(scale_stream(integrand,dt),
                                      int_rec))
    return int_rec
but you can write:
def integral(integrand,initial_value,dt):
    placeholder = Stream(initial_value,lambda : None)
    int_rec = cons_stream(initial_value,
                          add_streams(scale_stream(integrand,dt),
                                      placeholder))
    placeholder._compute_rest = lambda:int_rec
    return int_rec
Note that we need to clumsily pre-compute the first element of placeholder and then only fix up the recursion for the rest of the stream. But this does all work (alongside appropriate definitions of all the rest of the code - I'll stick it all at the bottom of this answer).
However, the second part of your question seems to be asking how to do this naturally in Python. You ask for an "analogous stream-y construct in Python". Clearly the answer to that is exactly the generator. The generator naturally provides the lazy evaluation of the stream concept. It differs by not being naturally expressed recursively but then Python does not support that as well as Scheme, as we will see.
In other words, the strict stream concept can be expressed in Python (as in the link and above) but the idiomatic way to do it is to use generators.
It is more or less possible to replicate the Scheme example by a kind of direct mechanical transformation of stream to generator (but avoiding the built-in int):
def integral_rec(integrand,initial_value,dt):
    def int_rec():
        for x in cons_stream(initial_value,
                             add_streams(scale_stream(integrand,dt),int_rec())):
            yield x
    for x in int_rec():
        yield x

def cons_stream(a,b):
    yield a
    for x in b:
        yield x

def add_streams(a,b):
    while True:
        yield next(a) + next(b)

def scale_stream(a,b):
    for x in a:
        yield x * b
The only tricky thing here is to realise that you need to eagerly call the recursive use of int_rec as an argument to add_streams. Calling it doesn't start it yielding values - it just creates the generator ready to yield them lazily when needed.
This works nicely for small integrands, though it's not very pythonic. The Scheme version works by optimising the tail recursion - the Python version will exceed the max stack depth if your integrand is too long. So this is not really appropriate in Python.
A direct and natural pythonic version would look something like this, I think:
def integral(integrand,initial_value,dt):
    value = initial_value
    yield value
    for x in integrand:
        value += dt * x
        yield value
This works efficiently and correctly treats the integrand lazily as a "stream". However, it uses iteration rather than recursion to unpack the integrand iterable, which is more the Python way.
In moving to natural Python I have also removed the stream combination functions - for example, replaced add_streams with +=. But we could still use them if we wanted a sort of halfway house version:
def accum(initial_value,a):
    value = initial_value
    yield value
    for x in a:
        value += x
        yield value

def integral_hybrid(integrand,initial_value,dt):
    for x in accum(initial_value,scale_stream(integrand,dt)):
        yield x
This hybrid version uses the stream combinations from the Scheme and avoids only the tail recursion. This is still pythonic, and Python includes various other nice ways to work with iterables in the itertools module. They all "respect streams' lazy character" as you ask.
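As a quick sanity check of the generator versions above (my addition; it integrates the constant stream 1, 1, 1, ... with dt = 0.5 starting from 0):
from itertools import islice, repeat
print(list(islice(integral(repeat(1), 0, 0.5), 5)))         # [0, 0.5, 1.0, 1.5, 2.0]
print(list(islice(integral_hybrid(repeat(1), 0, 0.5), 5)))  # same values, via accum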
Finally here is all the code for the first recursive stream example, much of it taken from the Berkeley reference:
class Stream(object):
    """A lazily computed recursive list."""
    def __init__(self, first, compute_rest, empty=False):
        self.first = first
        self._compute_rest = compute_rest
        self.empty = empty
        self._rest = None
        self._computed = False

    @property
    def rest(self):
        """Return the rest of the stream, computing it if necessary."""
        assert not self.empty, 'Empty streams have no rest.'
        if not self._computed:
            self._rest = self._compute_rest()
            self._computed = True
        return self._rest

    def __repr__(self):
        if self.empty:
            return '<empty stream>'
        return 'Stream({0}, <compute_rest>)'.format(repr(self.first))

Stream.empty = Stream(None, None, True)

def cons_stream(a,b):
    return Stream(a,lambda : b)

def add_streams(a,b):
    if a.empty or b.empty:
        return Stream.empty
    def compute_rest():
        return add_streams(a.rest,b.rest)
    return Stream(a.first+b.first,compute_rest)

def scale_stream(a,scale):
    if a.empty:
        return Stream.empty
    def compute_rest():
        return scale_stream(a.rest,scale)
    return Stream(a.first*scale,compute_rest)

def make_integer_stream(first=1):
    def compute_rest():
        return make_integer_stream(first+1)
    return Stream(first, compute_rest)

def truncate_stream(s, k):
    if s.empty or k == 0:
        return Stream.empty
    def compute_rest():
        return truncate_stream(s.rest, k-1)
    return Stream(s.first, compute_rest)

def stream_to_list(s):
    r = []
    while not s.empty:
        r.append(s.first)
        s = s.rest
    return r

def integral(integrand,initial_value,dt):
    placeholder = Stream(initial_value,lambda : None)
    int_rec = cons_stream(initial_value,
                          add_streams(scale_stream(integrand,dt),
                                      placeholder))
    placeholder._compute_rest = lambda:int_rec
    return int_rec

a = truncate_stream(make_integer_stream(),5)
print(stream_to_list(integral(a,8,.5)))  # [8, 8.5, 9.5, 11.0, 13.0, 15.5]
I would like to perform a calculation using Python where the current value (i) of the equation is based on the previous value of the equation (i-1). This is really easy to do in a spreadsheet, but I would rather learn to code it.
I have noticed that there is loads of information on finding the previous value from a list, but I don't have a list, I need to create it! My equation is shown below.
h[i] = (2*b) - h[i-1]
Can anyone tell me a method to do this?
I tried this sort of thing, but it will not work, because the equation calls a value I haven't created yet; if I set h=0 then I get an error that I am out of index range:
i = 1
for i in range(1, len(b)):
    h=[]
    h=(2*b)-h[i-1]
    x+=1
h = [b[0]]
for val in b[1:]:
    h.append(2 * val - h[-1])  # as you add to h, you keep up with its tail
For a large b list (brr, one-letter identifiers), itertools.islice avoids creating a large slice copy:
from itertools import islice  # for a big list it keeps the code less wasteful
h = [b[0]]
for val in islice(b, 1, None):
    h.append(2 * val - h[-1])  # same loop body as above
As pointed out by @pad, you simply need to handle the base case of receiving the first sample.
However, your equation makes no use of i other than to retrieve the previous result. It's looking more like a running filter than something which needs to maintain a list of past values (with an array which might never stop growing).
If that is the case, and you only ever want the most recent value, then you might want to go with a generator instead.
def gen():
    def eqn(b):
        eqn.h = 2*b - eqn.h
        return eqn.h
    eqn.h = 0
    return eqn
And then use it thus:
>>> f = gen()
>>> f(2)
4
>>> f(3)
2
>>> f(2)
2
>>>
The same effect could be achieved with a true generator using yield and send, as in the sketch below.
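A minimal sketch of that yield/send variant (my addition; it reproduces the transcript above):
def running_filter(h=0):
    while True:
        b = yield h      # hand out the current value, wait for the next b
        h = 2 * b - h

g = running_filter()
next(g)           # prime the generator up to its first yield
print(g.send(2))  # 4
print(g.send(3))  # 2
print(g.send(2))  # 2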
First off, do you need all the intermediate values? That is, do you want a list h from 0 to i, or do you just want h[i]?
If you just need the i-th value, you could use recursion:
def get_h(i):
    if i > 0:
        return (2*b) - get_h(i-1)
    else:
        return h_0
But be aware that this will not work for large i, as it will exceed the maximum recursion depth (thanks for pointing this out, kdopen). In that case a simple for-loop or a generator is better.
Even better is to use a (mathematically) closed form of the equation (for your example that is possible, it might not be in other cases):
def get_h(i):
    if i % 2 == 0:
        return h_0
    else:
        return (2*b) - h_0
In both cases h_0 is the initial value that you start out with.
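A quick check that the closed form matches the recurrence (arbitrary numbers, my addition):
b, h_0 = 10, 3
h = h_0
for i in range(1, 7):
    h = 2 * b - h                                      # the recurrence
    assert h == (h_0 if i % 2 == 0 else 2 * b - h_0)   # the closed form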
h = []
for i in range(len(b)):
    if i > 0:
        h.append(2*b - h[i-1])
    else:
        h.append(h_0)  # i == 0: seed the list with the initial value h_0
You are successively applying a function (equation) to the result of a previous application of that function; the process needs a seed to start it. Your result looks like this: [seed, f(seed), f(f(seed)), f(f(f(seed))), ...]. This concept is function composition. You can create a generalized function that will do this for any sequence of functions; in Python, functions are first-class objects and can be passed around just like any other object. If you need to preserve the intermediate results, use a generator:
def composition(functions, x):
    """ yields f(x), f(f(x)), f(f(f(x))) ....
        for each f in functions
        functions is an iterable of callables taking one argument
    """
    for f in functions:
        x = f(x)
        yield x
Your specs require a seed and a constant,
seed = 0
b = 10
The equation/function,
def f(x, b=b):
    return 2*b - x
f is applied b times.
functions = [f]*b
Usage
print(list(composition(functions, seed)))  # [20, 0, 20, 0, 20, 0, 20, 0, 20, 0]
If the intermediate results are not needed composition can be redefined as
def composition(functions, x):
    """ Returns f(x), g(f(x)), h(g(f(x))) ....
        for each function in functions
        functions is an iterable of callables taking one argument
    """
    for f in functions:
        x = f(x)
    return x

print(composition(functions, seed))  # 0
Or more generally, with no limitations on call signature:
from functools import reduce  # reduce moved to functools in Python 3

def compose(funcs):
    '''Return a callable composed of successive application of functions
       funcs is an iterable producing callables
       for [f, g, h] returns f(g(h(*args, **kwargs)))
    '''
    def outer(f, g):
        def inner(*args, **kwargs):
            return f(g(*args, **kwargs))
        return inner
    return reduce(outer, funcs)

def plus2(x):
    return x + 2

def times2(x):
    return x * 2

def mod16(x):
    return x % 16

funcs = (mod16, plus2, times2)
eq = compose(funcs)  # mod16(plus2(times2(x)))
print(eq(15))        # times2 -> 30, plus2 -> 32, mod16 -> 0
While the process definition appears to be recursive, I resisted the temptation so I could stay out of maximum recursion depth hades.
I got curious, searched SO for function composition and, of course, there are numerous relevant Q&As.
I am trying to build a heap with a custom sort predicate. Since the values going into it are of "user-defined" type, I cannot modify their built-in comparison predicate.
Is there a way to do something like:
h = heapq.heapify([...], key=my_lt_pred)
h = heapq.heappush(h, key=my_lt_pred)
Or even better, I could wrap the heapq functions in my own container so I don't need to keep passing the predicate.
According to the heapq documentation, the way to customize the heap order is to have each element on the heap be a tuple, with the first tuple element being one that accepts normal Python comparisons.
The functions in the heapq module are a bit cumbersome (since they are not object-oriented), and always require our heap object (a heapified list) to be explicitly passed as the first parameter. We can kill two birds with one stone by creating a very simple wrapper class that will allow us to specify a key function, and present the heap as an object.
The class below keeps an internal list, where each element is a tuple, the first member of which is a key, calculated at element insertion time using the key parameter, passed at Heap instantiation:
# -*- coding: utf-8 -*-
import heapq

class MyHeap(object):
    def __init__(self, initial=None, key=lambda x: x):
        self.key = key
        self.index = 0
        if initial:
            self._data = [(key(item), i, item) for i, item in enumerate(initial)]
            self.index = len(self._data)
            heapq.heapify(self._data)
        else:
            self._data = []

    def push(self, item):
        heapq.heappush(self._data, (self.key(item), self.index, item))
        self.index += 1

    def pop(self):
        return heapq.heappop(self._data)[2]
(The extra self.index part is to avoid clashes when the evaluated key value is a draw and the stored value is not directly comparable - otherwise heapq could fail with TypeError)
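For illustration, a hypothetical use of MyHeap (the words and key are my own example): negating the key turns the min-heap into a max-heap by length.
heap = MyHeap(['pear', 'fig', 'banana'], key=lambda s: -len(s))
heap.push('kiwi')
print(heap.pop())  # 'banana', the longest word comes out first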
Define a class in which you override the __lt__() function. See the example below (works in Python 3.7):
import heapq

class Node(object):
    def __init__(self, val: int):
        self.val = val

    def __repr__(self):
        return f'Node value: {self.val}'

    def __lt__(self, other):
        return self.val < other.val

heap = [Node(2), Node(0), Node(1), Node(4), Node(2)]
heapq.heapify(heap)
print(heap)  # output: [Node value: 0, Node value: 2, Node value: 1, Node value: 4, Node value: 2]

heapq.heappop(heap)
print(heap)  # output: [Node value: 1, Node value: 2, Node value: 2, Node value: 4]
The heapq documentation suggests that heap elements could be tuples in which the first element is the priority and defines the sort order.
More pertinent to your question, however, is that the documentation includes a discussion with sample code of how one could implement their own heapq wrapper functions to deal with the problems of sort stability and elements with equal priority (among other issues).
In a nutshell, their solution is to have each element in the heapq be a triple with the priority, an entry count and the element to be inserted. The entry count ensures that elements with the same priority are sorted in the order they were added to the heapq.
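A minimal sketch of that triple pattern (the task names are illustrative):
import heapq
import itertools

counter = itertools.count()  # the entry count breaks ties in insertion order
pq = []
heapq.heappush(pq, (2, next(counter), 'low'))
heapq.heappush(pq, (1, next(counter), 'high'))
heapq.heappush(pq, (1, next(counter), 'also high'))
print([heapq.heappop(pq)[2] for _ in range(3)])  # ['high', 'also high', 'low']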
Use this to compare the values of objects in heapq:
setattr(ListNode, "__lt__", lambda self, other: self.val <= other.val)
(Note that heapq only requires __lt__; a strict < would be the cleaner choice for the lambda.)
The limitation with both answers is that they don't allow ties to be treated as ties. In the first, ties are broken by comparing items; in the second, by comparing input order. It is faster to just let ties be ties, and if there are a lot of them it could make a big difference. Based on the above and on the docs, it is not clear if this can be achieved in heapq. It does seem strange that heapq does not accept a key, while functions derived from it in the same module do.
P.S.: If you follow the link in the first comment ("possible duplicate..."), there is another suggestion of defining __le__, which seems like a solution.
In Python 3, you can use cmp_to_key from the functools module (see the CPython source code).
Suppose you need a priority queue of triplets, where the priority is given by the last element:
from heapq import *
from functools import cmp_to_key

def mycmp(triplet_left, triplet_right):
    key_l, key_r = triplet_left[2], triplet_right[2]
    if key_l > key_r:
        return -1  # larger first
    elif key_l == key_r:
        return 0   # equal
    else:
        return 1

WrapperCls = cmp_to_key(mycmp)
pq = []
myobj = (1, 2, "anystring")  # was tuple(1, 2, "anystring"), which raises TypeError

# to push an object myobj into pq
heappush(pq, WrapperCls(myobj))
# to get the heap top, use the `obj` attribute
inner = pq[0].obj
Performance Test:
Environment
python 3.10.2
Code
from functools import cmp_to_key
from timeit import default_timer as time
from random import randint
from heapq import *

class WrapperCls1:
    __slots__ = 'obj'
    def __init__(self, obj):
        self.obj = obj
    def __lt__(self, other):
        kl, kr = self.obj[2], other.obj[2]
        return kl > kr

def cmp_class2(obj1, obj2):
    kl, kr = obj1[2], obj2[2]
    return -1 if kl > kr else 0 if kl == kr else 1

WrapperCls2 = cmp_to_key(cmp_class2)

triplets = [[randint(-1000000, 1000000) for _ in range(3)] for _ in range(100000)]
# tuple_triplets = [tuple(randint(-1000000, 1000000) for _ in range(3)) for _ in range(100000)]

def test_cls1():
    pq = []
    for triplet in triplets:
        heappush(pq, WrapperCls1(triplet))

def test_cls2():
    pq = []
    for triplet in triplets:
        heappush(pq, WrapperCls2(triplet))

def test_cls3():
    pq = []
    for triplet in triplets:
        heappush(pq, (-triplet[2], triplet))

start = time()
for _ in range(10):
    test_cls1()
    # test_cls2()
    # test_cls3()
print("total running time (seconds): ", -start+(start:=time()))
Results (elements are lists rather than tuples; times are per test function call):
WrapperCls1: 16.2ms
WrapperCls1 with __slots__: 9.8ms
WrapperCls2: 8.6ms
priority moved into the first tuple position (no custom predicate support): 6.0ms
Therefore, this method is slightly faster than using a custom class with an overridden __lt__() function and the __slots__ attribute.
Simple and Recent
A simple solution is to store entries as a list of tuples and define the priority by the order of the tuple's elements; if you need descending order for a numeric element within a tuple, just store its negative.
See the official heapq Python documentation on this topic: Priority Queue Implementation Notes.
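A brief sketch of that tuple approach (the entries are illustrative): negating the priority makes the min-heap pop the largest first.
import heapq

entries = [(-priority, task) for priority, task in [(1, 'a'), (3, 'b'), (2, 'c')]]
heapq.heapify(entries)
print(heapq.heappop(entries)[1])  # 'b', the highest-priority task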