Use map to evaluate a given polynomial at a specific x-value.
Input:
p: A list of coefficients for increasing powers of x
x: The value of x to evaluate
Output: Number representing the value of the evaluated polynomial
Example: poly_eval([1, 2, 3], 2) = 1(2)^0 + 2(2)^1 + 3(2)^2 = 17
def poly_eval(coeff_list, x):
    total = 0
    for i, coeff in enumerate(coeff_list):
        total += coeff * x**i
    return total
or, if you really want to use map:
def poly_eval(coeff_list, x):
    n = len(coeff_list)
    return sum(map(lambda coeff, x, y: coeff*x**y, coeff_list, [x]*n, range(n)))
This is actually an interesting question. Because the answer is relatively simple and everybody knows the pen-and-paper solution, the better approach tends to be overlooked.
Most people would evaluate the polynomial the way it is done on paper. However, there is a method better suited to code, known as the Ruffini-Horner method, and it is a perfect case for a reduce.
Write your polynomial's coefficients in an array, highest power first. So y = x^3-7x+7 would be var y = [1,0,-7,7].
Then a simple function:
var calcP = (y,x) => y.reduce((p,c) => p*x+c);
That's it.
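Since the rest of this thread is in Python, here is a minimal sketch of the same Horner reduction using functools.reduce. The name horner is made up for illustration, and the coefficients are assumed to be listed from the highest power down, as in the JavaScript array above.

```python
from functools import reduce


def horner(coeffs, x):
    """Evaluate a polynomial given its coefficients, highest power first.

    Each step multiplies the running value by x and adds the next
    coefficient, so no explicit powers of x are ever computed.
    """
    return reduce(lambda acc, c: acc * x + c, coeffs, 0)


# y = x^3 - 7x + 7 evaluated at x = 2
value = horner([1, 0, -7, 7], 2)
```

Note that poly_eval at the top of the thread takes coefficients in increasing order of power, so its input would have to be reversed before being passed to this function.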
E.g., given
from collections import namedtuple
Point = namedtuple('Point', ['x', 'y'])
points = [Point(x=1.0, y=1.0), Point(x=2.0, y=2.0), Point(x=5.0, y=5.0)]
target = Point(x=4.5, y=5.0)
closest_point = find_closest(target, points)
I want to return Point(x=5.0, y=5.0). Ideally I'd like to use a built-in function that takes (list, target, comp), where comp takes (a, b) -> float, and the goal is to find the element a of list that minimizes comp(a, target), e.g.:
closest_point = find_closest(points, target, dist) # where dist is (a.x-b.x)**2 + (a.y-b.y)**2
The reason I'm interested in this is that I found myself writing 3 duplicated functions where the only difference is the dist function (each one uses different fields to compute it).
The min function can take a key argument, which is a function that computes the value each element is compared by. In this case you can write a lambda to compute the distance from each point to target.
>>> from math import sqrt
>>> min(points, key=lambda pt : sqrt((target.x - pt.x)**2 + (target.y - pt.y)**2))
Point(x=5.0, y=5.0)
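To get the (list, target, comp) signature asked for in the question, the min/key idea can be wrapped in a small helper. This is only a sketch: squared_distance and x_gap are hypothetical distance functions added here for illustration, not part of the original post.

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])


def find_closest(items, target, dist):
    """Return the item for which dist(item, target) is smallest."""
    return min(items, key=lambda item: dist(item, target))


def squared_distance(a, b):
    # Squared Euclidean distance; the square root is not needed for ranking.
    return (a.x - b.x) ** 2 + (a.y - b.y) ** 2


def x_gap(a, b):
    # A second, hypothetical metric that only looks at the x field.
    return abs(a.x - b.x)


points = [Point(x=1.0, y=1.0), Point(x=2.0, y=2.0), Point(x=5.0, y=5.0)]
target = Point(x=4.5, y=5.0)
closest_point = find_closest(points, target, squared_distance)
```

Swapping the dist argument is then the only thing that changes between the three duplicated functions.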
You can try the following example code:
def find_closest(target, points):
    dist = []
    for i in points:
        dist.append((i.x - target.x)**2 + (i.y - target.y)**2)
    min_index = dist.index(min(dist))
    return points[min_index]

closest_point = find_closest(target, points)
It gives the output: Point(x=5.0, y=5.0)
I want to solve the following problem with python, if possible with sympy.
Let n be a fixed positive integer. Let p=(p_1,...,p_n) be a fixed, known vector of positive integers. Let d be a fixed, known positive integer. Let q=(q_1,...,q_n) be a vector of unknown nonnegative integers.
How can I get all the solutions of p.q=d?
Where . means dot product.
Actually I can solve this for each individual n. But I want to create a function
def F(n,p,d):
    ...
    return result
such that result is, e.g., a list of all solutions. Note that from the restrictions made above, there is a finite number of solutions for each triplet of data (n,p,d).
I can't figure out a way to do this, so any suggestion will be appreciated.
Added.
Example: suppose n=3 (the case n=2 is trivial), p=(2,1,3), d=3. Then I would do something like
res=[]
for i in range (d):
    for j in range (d):
        k=d-p[0]*i-p[2]*j
        if k>=0:
            res.append([i,k,j])
Then res=[[0, 3, 0], [0, 0, 1], [1, 1, 0]] which is correct.
As you can imagine, the bigger n is, the more for loops I need if I want to follow the same idea. So I do not think this is a good way to do it for arbitrary n, say n=57 or whatever big enough...
Following the algorithm you provided:
from itertools import product

dot = lambda X, Y: sum(x * y for x, y in zip(X, Y))

p = [1, 2, 3, ...] # Whatever fixed value you have for `p`
d = 100 # Fixed d
results = []
for q in product(range(0, d+1), repeat=len(p)):
    if dot(p, q) == d:
        results.append(q)
However, this is slightly inefficient, since it is possible to tell before the entire dot product has been computed that the running total already exceeds d. So let's define the dot product like this:
def dot(X, Y, d):
    total = 0
    for x, y in zip(X, Y):
        total += x * y
        if total > d:
            return -1
    return total
Now, as soon as the total exceeds d, the calculation exits. You can also express this as a list comprehension:
results = [q for q in product(range(0, d+1), repeat=len(p)) if dot(p, q, d) == d]
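The product-based search above still visits (d+1)**len(p) candidates, which becomes impractical for larger n. As an alternative sketch (a different technique from the answer above, with the hypothetical name solve_dot), a recursion can fix one component of q at a time and only explore values that keep the remaining target nonnegative:

```python
def solve_dot(p, d):
    """Return every tuple q of nonnegative integers with sum(p[i]*q[i]) == d.

    p is a sequence of positive integers and d a nonnegative integer.
    """
    if not p:
        # No components left: there is a solution only if nothing remains.
        return [()] if d == 0 else []
    head, rest = p[0], p[1:]
    solutions = []
    # q[0] can be at most d // head, otherwise the total would exceed d.
    for q0 in range(d // head + 1):
        for tail in solve_dot(rest, d - head * q0):
            solutions.append((q0,) + tail)
    return solutions


result = solve_dot((2, 1, 3), 3)
```

For the example from the question, p=(2,1,3) and d=3, this yields the same three solutions, here as tuples in lexicographic order.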
I am trying to compute in Python the length of the path from a point A to a point B going through a list of intermediary points. I know how to do it, but I want to use the built-in reduce function.
What I tried so far (please note that it is completely wrong) is this:
reduce(lambda x,y: math.sqrt((y[1]-y[0])**2+(x[1]-x[0])**2) , ((1,2),(3,4),(1,8)))
Any idea?
Thanks.
You should map before you reduce.
import math

points = [(1, 2), (3, 4), (1, 8)]
distances = (math.hypot(b[0]-a[0], b[1]-a[1])
             for a, b in zip(points, points[1:]))
total_distance = sum(distances)
or, if you must use reduce(), although sum() is better for this purpose:
import operator
total_distance = reduce(operator.add, distances)
If you have a lot of points, you might find NumPy helpful in doing this all at once, quickly:
import numpy
# diff() returns one row per segment, so transpose to get the dx and dy columns
total_distance = numpy.hypot(*numpy.diff(numpy.array(points), axis=0).T).sum()
Edit: use math.hypot() and add NumPy method.
It isn't pretty but it can be done :-)
>>> tot = ((1,2),(3,4),(1,8))
>>> reduce(lambda d,((x0,y0),(x1,y1)): d + ((x1-x0)**2+(y1-y0)**2)**0.5, zip(tot[1:], tot[0:]), 0.0)
7.3005630797457695
reduce() is simply the wrong tool for this purpose. It is possible to do it with reduce(), but it is a bit weird:
def distance((x, d), y):
    return y, d + math.hypot(y[0] - x[0], y[1] - x[1])

print reduce(distance, [(3,4),(1,8)], ((1, 2), 0.0))[1]
prints
7.30056307975
The last parameter passed to the reduce() call is the starting point and the initial value for the distance.
reduce does not work that way. You start with an initial value a, which you either specify or which is taken as the first element of your iterable. Afterwards, you pass (a, next_element) to the function (lambda) provided and store the result in a, repeating until all elements have been consumed.
You can do what you want with sum and map by first calculating all distances from one point to the next and then summing them:
path = [(1,2),(3,4),(1,8)]
sum(map(lambda x,y: math.sqrt((x[0]-y[0])**2+(x[1]-y[1])**2), path[:-1],path[1:]))
Edit: or with the hypot function (thanks @ralu):
sum(map(lambda x,y: math.hypot(x[0]-y[0],x[1]-y[1]), path[:-1],path[1:]))
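To make the accumulation described above concrete, here is a sketch of how reduce could carry both the running total and the previous point in its accumulator. It is written to run on Python 3, where reduce lives in functools; path_length and step are illustrative names, not taken from the answers.

```python
import math
from functools import reduce


def path_length(points):
    """Sum the distances between consecutive points using reduce."""
    if not points:
        return 0.0

    def step(acc, point):
        # acc is (distance so far, previous point)
        total, previous = acc
        segment = math.hypot(point[0] - previous[0], point[1] - previous[1])
        return total + segment, point

    # Start at the first point with a distance of zero.
    total, _last = reduce(step, points[1:], (0.0, points[0]))
    return total


length = path_length([(1, 2), (3, 4), (1, 8)])
```

The sum/map version above remains the simpler choice; this only shows what the accumulator has to look like if reduce is a requirement.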
This is just not the sort of code you want to write. Reduce won't be a good solution.
I suggest an iterative one. It will be the most readable, pythonic and maintainable solution.
import math

path = [(1,2),(3,4),(1,8)]

def calc_dist(waypoints):
    dist = 0.0
    for i in range(len(waypoints) - 1):
        a = waypoints[i]
        b = waypoints[i+1]
        dist += math.hypot(a[0]-b[0], b[1]-a[1])
    return dist

print(calc_dist(path))
Here is a redux meta-iterator that can be combined with the built-in reduce to get the result you want. This implementation avoids all buffering of the input sequence.
def redux(f):
    def execute(iterable):
        iterable = iter(iterable)
        try:
            state = iterable.next()
        except StopIteration:
            raise ValueError, 'empty sequences not supported'
        while True:
            newstate = iterable.next()
            yield f(state, newstate)
            state = newstate
    return execute

f = redux(lambda x, y: math.sqrt((y[0] - x[0])**2 + (y[1] - x[1])**2))
print reduce(operator.add, f(((1,2),(3,4),(1,8))))
The above prints 7.30056307975.
The redux function can be generalized to support more than two arguments at a time in a sliding window, by using inspect.getargspec to count the number of arguments required by its function argument.
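One way such a sliding-window generalization might look is sketched below. It is not the redux implementation itself: the window size is passed explicitly as a hypothetical size parameter instead of being inferred from the function's argument count, and it is written for Python 3. Only the current window is held in memory.

```python
import math
from collections import deque


def windows(iterable, size=2):
    """Yield consecutive overlapping tuples of `size` items from iterable."""
    window = deque(maxlen=size)  # old items fall off the left automatically
    for item in iterable:
        window.append(item)
        if len(window) == size:
            yield tuple(window)


path = ((1, 2), (3, 4), (1, 8))
total = sum(math.hypot(b[0] - a[0], b[1] - a[1]) for a, b in windows(path))
```

With size=2 this produces the same consecutive pairs that redux feeds to its function.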
I'm aware that what I'm about to suggest is not ideal, but I think this is as close as I can get for my contribution. This is a fun problem to solve, even if it isn't the most traditional application of reduce.
The key issue seems to be keeping track of the distance from point to point without overwriting the points themselves. Adding another 'dimension' to each point gives you a field with which you can track the running distance.
iterable = ((1,2,0), (3,4,0), (1,8,0))
# originally ((1,2), (3,4), (1,8))

from math import sqrt

def func(tup1, tup2):
    '''function to pass to reduce'''
    # extract coordinates
    x0 = tup1[0]
    x1 = tup2[0]
    y0 = tup1[1]
    y1 = tup2[1]
    dist = tup1[2] # retrieve running total for distance
    dx = x1 - x0 # find change in x
    dy = y1 - y0 # find change in y
    # add new distance to running total
    dist += sqrt(dx**2 + dy**2)
    # return 2nd point with the updated distance
    return tup2[:-1] + (dist,) # e.g. (3, 4, 2.828)
Now reduce:
reduce(func, iterable)[-1]
# returns 7.3005630797457695
This way, the intermediate tuple of tuples (i.e., after one 'reduction') becomes:
((3, 4, 2.8284271247461903), (1,8,0))
Just for fun, here is an alternate solution with a slightly different approach than the reduce(sum, map(hypot, zip(...))) approach.
tot = ((1,2),(3,4),(1,8))
reduce(lambda (d,(x,y)),b: (d+math.hypot(x-b[0],y-b[1]), b), tot, (0, tot[0]))[0]
Note that the reduce actually returns the tuple (distance, last point), hence the [0] at the end. I think this would be more efficient than zip solutions but haven't actually checked.
What is the fastest way to check if a value exists in a very large list?
7 in a
Clearest and fastest way to do it.
You can also consider using a set, but constructing that set from your list may take more time than faster membership testing will save. The only way to be certain is to benchmark well. (this also depends on what operations you require)
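A rough sketch of such a benchmark with the standard timeit module is shown below. The function name time_membership and the chosen sizes are arbitrary assumptions; the absolute numbers will differ from machine to machine, so no expected timings are given.

```python
import timeit


def time_membership(n=100000, lookups=1000):
    """Time list membership, set construction, and set membership."""
    data = list(range(n))
    target = n - 1  # last element: the worst case for a linear list scan

    list_time = timeit.timeit(lambda: target in data, number=lookups)
    build_time = timeit.timeit(lambda: set(data), number=1)

    data_set = set(data)
    set_time = timeit.timeit(lambda: target in data_set, number=lookups)
    return list_time, build_time, set_time


list_time, build_time, set_time = time_membership()
print("list in: %.6fs  build set: %.6fs  set in: %.6fs"
      % (list_time, build_time, set_time))
```

Comparing build_time plus set_time against list_time for a realistic number of lookups shows whether the conversion pays off for a given workload.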
As stated by others, in can be very slow for large lists. Here is a performance comparison of in, set, and bisect. Note that the time axis (in seconds) is on a log scale.
Code for testing:
import random
import bisect
import matplotlib.pyplot as plt
import math
import time

def method_in(a, b, c):
    start_time = time.time()
    for i, x in enumerate(a):
        if x in b:
            c[i] = 1
    return time.time() - start_time

def method_set_in(a, b, c):
    start_time = time.time()
    s = set(b)
    for i, x in enumerate(a):
        if x in s:
            c[i] = 1
    return time.time() - start_time

def method_bisect(a, b, c):
    start_time = time.time()
    b.sort()
    for i, x in enumerate(a):
        index = bisect.bisect_left(b, x)
        # the insertion point can be len(b), so check against b, not a
        if index < len(b):
            if x == b[index]:
                c[i] = 1
    return time.time() - start_time

def profile():
    time_method_in = []
    time_method_set_in = []
    time_method_bisect = []

    # adjust range down if runtime is too long or up if there are too many zero entries in any of the time_method lists
    Nls = [x for x in range(10000, 30000, 1000)]
    for N in Nls:
        a = [x for x in range(0, N)]
        random.shuffle(a)
        b = [x for x in range(0, N)]
        random.shuffle(b)
        c = [0 for x in range(0, N)]

        time_method_in.append(method_in(a, b, c))
        time_method_set_in.append(method_set_in(a, b, c))
        time_method_bisect.append(method_bisect(a, b, c))

    plt.plot(Nls, time_method_in, marker='o', color='r', linestyle='-', label='in')
    plt.plot(Nls, time_method_set_in, marker='o', color='b', linestyle='-', label='set')
    plt.plot(Nls, time_method_bisect, marker='o', color='g', linestyle='-', label='bisect')
    plt.xlabel('list size', fontsize=18)
    plt.ylabel('log(time)', fontsize=18)
    plt.legend(loc='upper left')
    plt.yscale('log')
    plt.show()

profile()
You could put your items into a set. Set lookups are very efficient.
Try:
s = set(a)
if 7 in s:
    pass  # do stuff
Edit: in a comment you say that you'd like to get the index of the element. Unfortunately, sets have no notion of element position. An alternative is to pre-sort your list and then use binary search every time you need to find an element.
The original question was:
"What is the fastest way to know if a value exists in a list (a list with millions of values in it) and what its index is?"
Thus there are two things to find:
1. whether an item is in the list, and
2. what its index is (if it is in the list).
Towards this, I modified @xslittlegrass's code to compute indexes in all cases, and added an additional method.
Results
Methods are:
1. in--basically if x in b: return b.index(x)
2. try--try/except on b.index(x) (skips having to check if x in b)
3. set--basically if x in set(b): return b.index(x)
4. bisect--sort b with its index, binary search for x in sorted(b). Note this is a modification of @xslittlegrass's version, which returns the index in the sorted b rather than in the original b.
5. reverse--form a reverse lookup dictionary d for b; then d[x] provides the index of x.
Results show that method 5 is the fastest.
Interestingly, the try and the set methods are equivalent in time.
Test Code
import random
import bisect
import matplotlib.pyplot as plt
import math
import timeit
import itertools

def wrapper(func, *args, **kwargs):
    " Used to produce a zero-argument function for timeit to call "
    # Reference https://www.pythoncentral.io/time-a-python-function/
    def wrapped():
        return func(*args, **kwargs)
    return wrapped

def method_in(a,b,c):
    for i,x in enumerate(a):
        if x in b:
            c[i] = b.index(x)
        else:
            c[i] = -1
    return c

def method_try(a,b,c):
    for i, x in enumerate(a):
        try:
            c[i] = b.index(x)
        except ValueError:
            c[i] = -1
    return c

def method_set_in(a,b,c):
    s = set(b)
    for i,x in enumerate(a):
        if x in s:
            c[i] = b.index(x)
        else:
            c[i] = -1
    return c

def method_bisect(a,b,c):
    " Finds indexes using bisection "
    # Create a sorted b with its index
    bsorted = sorted([(x, i) for i, x in enumerate(b)], key = lambda t: t[0])
    for i,x in enumerate(a):
        index = bisect.bisect_left(bsorted,(x, ))
        c[i] = -1
        # the insertion point can be len(bsorted), so check against bsorted, not a
        if index < len(bsorted):
            if x == bsorted[index][0]:
                c[i] = bsorted[index][1] # index in the b array
    return c

def method_reverse_lookup(a, b, c):
    reverse_lookup = {x:i for i, x in enumerate(b)}
    for i, x in enumerate(a):
        c[i] = reverse_lookup.get(x, -1)
    return c

def profile():
    Nls = [x for x in range(1000,20000,1000)]
    number_iterations = 10
    methods = [method_in, method_try, method_set_in, method_bisect, method_reverse_lookup]
    time_methods = [[] for _ in range(len(methods))]

    for N in Nls:
        a = [x for x in range(0,N)]
        random.shuffle(a)
        b = [x for x in range(0,N)]
        random.shuffle(b)
        c = [0 for x in range(0,N)]

        for i, func in enumerate(methods):
            wrapped = wrapper(func, a, b, c)
            time_methods[i].append(math.log(timeit.timeit(wrapped, number=number_iterations)))

    markers = itertools.cycle(('o', '+', '.', '>', '2'))
    colors = itertools.cycle(('r', 'b', 'g', 'y', 'c'))
    labels = itertools.cycle(('in', 'try', 'set', 'bisect', 'reverse'))

    for i in range(len(time_methods)):
        plt.plot(Nls,time_methods[i],marker = next(markers),color=next(colors),linestyle='-',label=next(labels))

    plt.xlabel('list size', fontsize=18)
    plt.ylabel('log(time)', fontsize=18)
    plt.legend(loc = 'upper left')
    plt.show()

profile()
def check_availability(element, collection: iter):
    return element in collection
Usage
check_availability('a', [1,2,3,4,'a','b','c'])
I believe this is the fastest way to know if a chosen value is in an array.
a = [4,2,3,1,5,6]

index = dict((y,x) for x,y in enumerate(a))
try:
    a_index = index[7]
except KeyError:
    print("Not found")
else:
    print("found")
This will only be a good idea if a doesn't change and thus we can do the dict() part once and then use it repeatedly. If a does change, please provide more detail on what you are doing.
Be aware that the in operator tests not only equality (==) but also identity (is). The in logic for lists is roughly equivalent to the following (it is actually written in C rather than Python, at least in CPython):
for element in s:
    if element is target:
        # fast check for identity implies equality
        return True
    if element == target:
        # slower check for actual equality
        return True
return False
In most circumstances this detail is irrelevant, but in some circumstances it might surprise a Python novice. For example, numpy.nan has the unusual property of not being equal to itself:
>>> import numpy
>>> numpy.nan == numpy.nan
False
>>> numpy.nan is numpy.nan
True
>>> numpy.nan in [numpy.nan]
True
To distinguish between these unusual cases you could use any() like:
>>> lst = [numpy.nan, 1 , 2]
>>> any(element == numpy.nan for element in lst)
False
>>> any(element is numpy.nan for element in lst)
True
Note the in logic for lists with any() would be:
any(element is target or element == target for element in lst)
However, I should emphasize that this is an edge case, and for the vast majority of cases the in operator is highly optimised and exactly what you want of course (either with a list or with a set).
If you only want to check the existence of one element in a list,
7 in list_data
is the fastest solution. Note though that
7 in set_data
is a near-free operation, independent of the size of the set! Creating a set from a large list is 300 to 400 times slower than a single in check on the list, so creating a set first only pays off if you need to check for many elements.
Plot created with perfplot:
import perfplot
import numpy as np

def setup(n):
    data = np.arange(n)
    np.random.shuffle(data)
    return data, set(data)

def list_in(data):
    return 7 in data[0]

def create_set_from_list(data):
    return set(data[0])

def set_in(data):
    return 7 in data[1]

b = perfplot.bench(
    setup=setup,
    kernels=[list_in, set_in, create_set_from_list],
    n_range=[2 ** k for k in range(24)],
    xlabel="len(data)",
    equality_check=None,
)
b.save("out.png")
b.show()
It sounds like your application might benefit from a Bloom filter data structure.
In short, a Bloom filter look-up can tell you very quickly if a value is DEFINITELY NOT present in a set. Otherwise, you can do a slower look-up to get the index of a value that POSSIBLY MIGHT BE in the list. So if your application tends to get the "not found" result much more often than the "found" result, you might see a speed-up by adding a Bloom filter.
For details, Wikipedia provides a good overview of how Bloom filters work, and a web search for "python bloom filter library" will turn up at least a couple of useful implementations.
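To illustrate the idea, here is a deliberately simple sketch of a Bloom filter built on the standard hashlib module. It is a toy under stated assumptions, not a tuned implementation: the class name, the default sizes, and the helper find_index are all made up for this example, and it uses one byte per bit for readability.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, item):
        # Derive several bit positions by salting the hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(("%d:%r" % (seed, item)).encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos] for pos in self._positions(item))


def find_index(values, bloom, item):
    """Return the index of item in values, or -1, skipping the scan when possible."""
    if not bloom.might_contain(item):
        return -1
    try:
        return values.index(item)  # slower exact look-up
    except ValueError:
        return -1  # the filter gave a false positive


values = [4, 2, 3, 1, 5, 6]
bloom = BloomFilter()
for v in values:
    bloom.add(v)
```

A filter never reports a stored value as absent, so find_index stays correct; false positives only cost an unnecessary scan.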
This is not code, but an algorithm for very fast searching.
If your list and the value you are looking for are all numbers, this is pretty straightforward. For strings, see the note at the bottom.
-Let "n" be the length of your list
-Optional step: if you need the index of the element, add a second column to the list with the current index of each element (0 to n-1); see later
-Sort your list or a copy of it (.sort())
-Loop through:
  -Compare your number to the n/2th element of the list
  -If larger, loop again between indexes n/2 and n
  -If smaller, loop again between indexes 0 and n/2
  -If the same: you found it
-Keep narrowing the list until you have found it or only have 2 numbers (below and above the one you are looking for)
This will find any element in about 20 steps for a list of 1,000,000 (log2(n), to be precise).
If you also need the original position of your number, look for it in the second (index) column.
If your list is not made of numbers, the method still works and will be fastest, but you may need to define a function which can compare/order strings.
Of course, this requires the up-front cost of sorting, but if you keep reusing the same list for checking, it may be worth it.
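The steps above can be sketched in Python roughly as follows. The function names are hypothetical, and the "second column" is represented as (value, original_position) tuples; for repeated values this returns the position of whichever duplicate the search lands on.

```python
def build_index(values):
    """Pair each value with its original position and sort by value."""
    return sorted((value, position) for position, value in enumerate(values))


def find_original_index(sorted_pairs, target):
    """Binary search for target; return its original position or -1."""
    low, high = 0, len(sorted_pairs) - 1
    while low <= high:
        middle = (low + high) // 2
        value, position = sorted_pairs[middle]
        if value == target:
            return position
        if value < target:
            low = middle + 1   # target is larger: search the upper half
        else:
            high = middle - 1  # target is smaller: search the lower half
    return -1


values = [4, 2, 3, 1, 5, 6]
pairs = build_index(values)  # done once, then reused for every look-up
```

The standard bisect module implements the same halving search and can replace the hand-written loop.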
Edge case for spatial data
There are probably faster algorithms for handling spatial data (e.g. refactoring to use a k-d tree), but the special case of checking if a vector is in an array is useful:
If you have spatial data (i.e. cartesian coordinates)
If you have integer masks (i.e. array filtering)
In this case, I was interested in knowing if an (undirected) edge defined by two points was in a collection of (undirected) edges, such that
(pair in unique_pairs) | (pair[::-1] in unique_pairs) for pair in pairs
where pair constitutes two vectors of arbitrary length (i.e. shape (2,N)).
If the distance between these vectors is meaningful, then the test can be expressed by a floating point inequality like
test_result = Norm(v1 - v2) < Tol
and "Value exists in List" is simply any(test_result).
Example code and dummy test set generators for integer pairs and R3 vector pairs are below.
# 3rd party
import numpy as np
import numpy.linalg as LA
import matplotlib.pyplot as plt

# optional
try:
    from tqdm import tqdm
except ModuleNotFoundError:
    def tqdm(X, *args, **kwargs):
        return X
    print('tqdm not found. tqdm is a handy progress bar module.')


def get_float_r3_pairs(size):
    """ generate dummy vector pairs in R3 (i.e. case of spatial data) """
    coordinates = np.random.random(size=(size, 3))
    pairs = []
    for b in coordinates:
        for a in coordinates:
            pairs.append((a,b))
    pairs = np.asarray(pairs)
    return pairs


def get_int_pairs(size):
    """ generate dummy integer pairs (i.e. case of array masking) """
    coordinates = np.random.randint(0, size, size)
    pairs = []
    for b in coordinates:
        for a in coordinates:
            pairs.append((a,b))
    pairs = np.asarray(pairs)
    return pairs


def float_tol_pair_in_pairs(pair:np.ndarray, pairs:np.ndarray) -> np.ndarray:
    """
    True if abs(a0 - b0) <= tol & abs(a1 - b1) <= tol for (ai1, aj2), (bi1, bj2)
    in [(a01, a02), ... (aik, ajl)]

    NB this is expected to be called in iteration so no sanitization is performed.

    Parameters
    ----------
    pair : np.ndarray
        pair of vectors with shape (2, M)
    pairs : np.ndarray
        collection of vector pairs with shape (N, 2, M)

    Returns
    -------
    np.ndarray
        (pair in pairs) | (pair[::-1] in pairs).
    """
    m1 = np.sum( abs(LA.norm(pairs - pair, axis=2)) <= (1e-03, 1e-03), axis=1 ) == 2
    m2 = np.sum( abs(LA.norm(pairs - pair[::-1], axis=2)) <= (1e-03, 1e-03), axis=1 ) == 2
    return m1 | m2


def get_unique_pairs(pairs:np.ndarray) -> np.ndarray:
    """
    apply float_tol_pair_in_pairs for pair in pairs

    Parameters
    ----------
    pairs : np.ndarray
        collection of vector pairs with shape (N, 2, M)

    Returns
    -------
    np.ndarray
        pair if not ((pair in rv) | (pair[::-1] in rv)) for pair in pairs
    """
    pairs = np.asarray(pairs).reshape((len(pairs), 2, -1))
    rv = [pairs[0]]
    for pair in tqdm(pairs[1:], desc='finding unique pairs...'):
        if not any(float_tol_pair_in_pairs(pair, rv)):
            rv.append(pair)
    return np.array(rv)