Find nearest neighbour in a more pythonic way - python

A is a point, and P is a list of points.
I want to find which point P[i] is the closest to A, i.e. I want to find P[i_0] with:
i_0 = argmin_i ||A - P[i]||^2
I do it this way:
import numpy as np

# P is a list of 4 points
P = [np.array([-1, 0, 7, 3]), np.array([5, -2, 8, 1]), np.array([0, 2, -3, 4]), np.array([-9, 11, 3, 4])]
A = np.array([1, 2, 3, 4])

distance = float('inf')  # +infinity, so the first point always wins the first comparison
closest = None
for p in P:
    delta = sum((p - A)**2)  # squared Euclidean distance
    if delta < distance:
        distance = delta
        closest = p
print(closest)  # the closest point to A among all the points in P
It works, but how to do this in a shorter/more Pythonic way?
More generally in Python (and even without using Numpy), how to find k_0 such that D[k_0] = min D[k]? i.e. k_0 = argmin_k D[k]

A more Pythonic way of implementing the same algorithm you're using is to replace your loop with a call to min with a key function:
closest = min(P, key=lambda p: sum((p - A)**2))
Note that I'm using ** for exponentiation (^ is the binary-xor operator in Python).
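The same min-with-key idiom answers the general argmin question as well; a minimal sketch without NumPy, minimizing over the indices instead of the values:
D = [4.2, 1.5, 3.0]                            # any sequence of comparable values
k_0 = min(range(len(D)), key=lambda k: D[k])   # index of the smallest element
# k_0 == 1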

A fully vectorized approach in NumPy, similar to @MikeMüller's, but using NumPy's broadcasting to avoid the lambda function.
With the example data:
>>> P = [np.array([-1, 0, 7, 3]), np.array([5, -2, 8, 1]), np.array([0, 2, -3, 4]), np.array([-9, 11, 3, 4])]
>>> A = np.array([1, 2, 3, 4])
And making P a 2D numpy array:
>>> P = np.asarray(P)
>>> P
array([[-1,  0,  7,  3],
       [ 5, -2,  8,  1],
       [ 0,  2, -3,  4],
       [-9, 11,  3,  4]])
It can be computed in one line using numpy:
>>> P[np.argmin(np.sum((P - A)**2, axis=1))]
Note that P - A, with P.shape == (N, 4) and A.shape == (4,), will broadcast the subtraction to all the rows of P (each row P[i] becomes P[i] - A).
For small N (the number of rows in P), the Pythonic approach is probably faster; for large N, the vectorized version should be significantly faster.
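As a quick check of the broadcasting step with the example data above:
>>> (P - A).shape                  # A is stretched across all 4 rows of P
(4, 4)
>>> np.sum((P - A)**2, axis=1)     # one squared distance per row
array([ 25,  66,  37, 181])
>>> P[np.argmin(np.sum((P - A)**2, axis=1))]
array([-1,  0,  7,  3])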

A NumPy version as a one-liner:
closest = P[np.argmin(np.apply_along_axis(lambda p: np.sum((p - A)**2), 1, P))]
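An equivalent sketch using np.linalg.norm; the square root does not change which index is the argmin, so both versions pick the same point:
closest = P[np.argmin(np.linalg.norm(np.asarray(P) - A, axis=1))]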

Using the built-in min is the way to go here:
import math
p1 = [1,2]
plst = [[1,3], [10,10], [5,5]]
res = min(plst, key=lambda x: math.sqrt(pow(p1[0]-x[0], 2) + pow(p1[1]-x[1], 2)))
print(res)
[1, 3]
Note that I just used plain python lists.
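On Python 3.8+ the key function can lean on math.dist, which computes the Euclidean distance between two points directly (a sketch; since the square root is monotonic, it could be dropped entirely when only the ordering matters):
from math import dist
res = min(plst, key=lambda x: dist(p1, x))  # math.dist requires Python 3.8+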

Related

Any efficient analogue of argsort for array of indices with NumPy?

I have an array of indices like a = [2, 4, 1, 0, 3] and I want to transform it into np.argsort(a) = [3, 2, 0, 4, 1].
The problem is that argsort has O(n*log(n)) timing, but for my case it may be O(n) and I even have code for this:
b = np.zeros(a.size, dtype=int)  # int dtype, since b holds indices
for i in range(a.size):
    b[a[i]] = i
The second problem is that cycles are slow in Python and I hope that it's possible to use some NumPy tricks to achieve the goal.
Do you have all numbers from 0 to len(a)-1?
Then use smart indexing:
a = [2, 4, 1, 0, 3]
b = np.empty(len(a), dtype=int) # or b = np.empty_like(a)
b[a] = np.arange(len(a))
b
output: array([3, 2, 0, 4, 1])
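As a quick sanity check against the example above, b matches np.argsort and is the inverse permutation of a:
>>> np.array_equal(b, np.argsort(a))
True
>>> b[a]    # composing with a gives the identity permutation
array([0, 1, 2, 3, 4])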

Joint accumulation of addition and multiplication

I have an array:
a = np.array([1, 2, 3, 1, 3, 4, 2, 4])
and I want to do the following calculation:
out = 0
for e in a:
    out *= 3
    out += e
With out as the output (4582 for the given example), is there a nice way to vectorize this? I think einsum could be used, but I couldn't figure out how to write it.
One approach:
import numpy as np
a = np.array([1, 2, 3, 1, 3, 4, 2, 4])
powers = np.multiply.accumulate(np.repeat(3, len(a) - 1))
res = np.sum(powers[::-1] * a[:-1]) + a[-1]
print(res)
Output
4582
If you expand the loop, you'll notice that you are multiplying each value of a by a power of 3 and then summing the result.
Personally I would use reduce (on Python 3, import it from functools):
from functools import reduce
reduce(lambda x, y: x * 3 + y, a)
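The powers of 3 can also be built directly with 3**np.arange, collapsing the accumulate/repeat step into one weighted sum (a sketch; watch for integer overflow with long inputs):
res = np.sum(a * 3**np.arange(len(a) - 1, -1, -1))  # a[i] weighted by 3**(n-1-i)
# 4582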

Average difference between ints in two lists in one line - Python

There are two non-empty lists, containing only ints, both have the same length.
Our function needs to return the average absolute difference between the ints at the same index.
For example, for the lists [1, 2, 3, 4] and [1, 1, 1, 1], the answer will be 1.5.
The function needs to be completed in one line.
I had a little something that does that, but as you can probably guess, it's not a one-liner:
def avg_diff(a, b):
    sd = 0.0
    for x, y in zip(a, b):
        sd += abs(x - y)
    return sd / len(a)
Thanks.
In Python 3.4 we got the statistics module in the standard library, including statistics.mean.
Using this function and a generator-expression:
from statistics import mean
a = [1, 2, 3, 4]
b = [1, 1, 1, 1]
mean(abs(x - y) for x, y in zip(a, b))
# 1.5
a = [1, 2, 3, 4]
b = [1, 1, 1, 1]
sum([abs(i - j) for i, j in zip(a,b)]) / float(len(a))
If you are happy to use a 3rd party library, numpy provides one way:
import numpy as np
A = np.array([1, 2, 3, 4])
B = np.array([1, 1, 1, 1])
res = np.mean(np.abs(A - B))
# 1.5
Using the built-in sum and len functions on lists:
lst1 = [1, 2, 3, 4]
lst2 = [1, 1, 1, 1]
diff = [abs(x-y) for x, y in zip(lst1, lst2)] # find index-wise differences
print(sum(diff)/len(diff)) # divide sum of differences by total
# 1.5
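Wrapped up as the one-line function the question asks for, using only built-ins (Python 3 division):
def avg_diff(a, b): return sum(abs(x - y) for x, y in zip(a, b)) / len(a)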

How to get lists of indices to unique values efficiently?

Is there a built-in method that would help me achieve the following efficiently: given an array, I need a list of arrays, each with indices to a different unique value of the array?
If f is the desired function, with
b = f(a)
and
u, idxs = np.unique(a, return_inverse=True)
then
b[i] == np.where(idxs == i)[0]
I am aware that pandas.Series.groupby() can do this, but it may not be efficient to create a dict when there are over 10^5 unique integers.
If you have numpy >= 1.9 you can do:
>>> a = np.random.randint(5, size=10)
>>> a
array([0, 2, 4, 4, 2, 4, 4, 3, 2, 1])
>>> unq, unq_inv, unq_cnt = np.unique(a, return_inverse=True, return_counts=True)
>>> np.split(np.argsort(unq_inv), np.cumsum(unq_cnt[:-1]))
[array([0]), array([9]), array([1, 4, 8]), array([7]), array([2, 3, 5, 6])]
>>> unq
array([0, 1, 2, 3, 4])
In earlier versions, you can get the counts doing an extra:
>>> unq_cnt = np.bincount(unq_inv)
Also, if you want to make sure that the indices for each value are sorted, I think you will need to use a stable sort, e.g. np.argsort(unq_inv, kind='mergesort')
Thinking about what you seem to be after, which I think is minimizing calls to an expensive function, I don't think you need to do what you are asking. Say that your function was squaring, you could simply do:
>>> unq, unq_inv = np.unique(a, return_inverse=True)
>>> f_unq = unq**2
>>> f_a = f_unq[unq_inv]
>>> a
array([0, 2, 4, 4, 2, 4, 4, 3, 2, 1])
>>> f_a
array([ 0, 4, 16, 16, 4, 16, 16, 9, 4, 1])
def foo(a):
    I = np.arange(a.shape[0])
    d = {}
    while a.shape[0]:
        x = a[0]
        ii = a == x    # boolean mask of the entries equal to x
        d[x] = I[ii]   # record their original indices
        a = a[~ii]     # shrink the search arrays
        I = I[~ii]
    return d
In [767]: a
Out[767]: array([4, 4, 3, 0, 0, 2, 1, 1, 0, 3])
In [768]: foo(a)
Out[768]:
{0: array([3, 4, 8]),
 1: array([6, 7]),
 2: array([5]),
 3: array([2, 9]),
 4: array([0, 1])}
Is this the sort of dictionary that you want?
For small a this works fine.
An equivalent dictionary building function is:
def foo1(a):
    unq = np.unique(a)
    return {i: np.where(a == i)[0] for i in unq}
Offhand I don't see how unq_inv helps with building the dictionary.
foo is about 30% slower than foo1. I was hoping that by shrinking the searched array each time a value was processed I might gain some speed, but it looks like the extra bookkeeping chews up the time, and the where time may not be that sensitive to the length of a.
For a2=np.random.randint(5000,size=100000) run times are on the order of 2-3 sec.
But np.random.randint(50000,size=1000000) takes too long to time (for either version).
On further experimentation, a 'dumb' approach using collections.defaultdict is much faster (about 20x):
from collections import defaultdict

def food(a):
    d = defaultdict(list)
    for i, j in enumerate(a):
        d[j].append(i)
    return d
The 'too big' (1000000,) array takes only 1.1 seconds.
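A quick usage example on the same small array as before (passed as a plain list so the keys print as plain ints):
>>> food([4, 4, 3, 0, 0, 2, 1, 1, 0, 3])
defaultdict(<class 'list'>, {4: [0, 1], 3: [2, 9], 0: [3, 4, 8], 2: [5], 1: [6, 7]})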
Maybe do something like:
s = np.argsort(a, kind='mergesort')          # stable, so ties keep their original order
boundaries = np.where(np.diff(a[s]))[0] + 1  # positions where the sorted value changes
groups = np.split(s, boundaries)             # one index array per unique value
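Checked against the same example array, this produces the same groups as foo; with the stable sort, indices within each group keep their original order:
>>> a = np.array([4, 4, 3, 0, 0, 2, 1, 1, 0, 3])
>>> s = np.argsort(a, kind='mergesort')
>>> [g.tolist() for g in np.split(s, np.where(np.diff(a[s]))[0] + 1)]
[[3, 4, 8], [6, 7], [5], [2, 9], [0, 1]]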

How to update a Numpy array sequentially without loop

I have a Numpy array v and I want to update each element using a function on the current element of the array :
v[i] = f(v, i)
A basic way to do this is with a loop:
for i in range(2, len(v)):
    v[i] = f(v, i)
Hence the values used to update v[i] come from the already-updated array v. Is there a way to do these updates without a loop?
For example,
v = [f(v, i) for i in range(len(v))]
does not work, since v[i-1] has not yet been updated when it is used inside the list comprehension.
The function f can depend on several elements of the list: those with an index lower than i have already been updated, while those with an index greater than i have not, as in the following example:
v = [1, 2, 3, 4, 5]
f = lambda v, i: (v[i-1] + v[i]) / v[i+1]  # for i in {1, 2, 3}
f = lambda v, i: v[i]                      # for i in {0, 4}
it should return
v = [1, (1+2)/3, (1+4)/4, ((5/4)+4)/5, 5]
There is a function for this:
import numpy
v = numpy.array([1, 2, 3, 4, 5])
numpy.add.accumulate(v)
#>>> array([ 1, 3, 6, 10, 15])
This works on many different types of ufunc:
numpy.multiply.accumulate(v)
#>>> array([ 1, 2, 6, 24, 120])
For an arbitrary function doing this kind of accumulation, you can make your own ufunc, although this will be much slower:
myfunc = numpy.frompyfunc(lambda x, y: x + y, 2, 1)
myfunc.accumulate([1, 2, 3], dtype=object)
#>>> array([1, 3, 6], dtype=object)
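For instance, the multiply-and-add recurrence from the joint-accumulation question above fits this pattern (a sketch, again using dtype=object):
step = numpy.frompyfunc(lambda acc, x: acc * 3 + x, 2, 1)
step.accumulate([1, 2, 3, 1, 3, 4, 2, 4], dtype=object)[-1]
#>>> 4582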
You can use the built-in sum to add up the numbers before v[i]:
>>> v = [v[i] + sum(v[:i]) for i in range(len(v))]
>>> v
[1, 3, 6, 10, 15]
or, better, use np.cumsum():
>>> np.cumsum(v)
array([ 1, 3, 6, 10, 15])
