Nearest neighbor 1 dimensional data with a specified range

Nearest neighbor 1 dimensional data with a specified range - python

I have two nested lists A and B:
A = [[50,140],[51,180],[54,500],......]
B = [[50.1, 170], [51,200],[55,510].....]
The 1st element in each inner list runs from 0 to around 1e5, the 0th element runs from around 50 up to around 700, these elements are unsorted. What i want to do, is run through each element in A[n][1] and find the closest element in B[n][1], but when searching for the nearest neighbor i want to only search within an interval defined by A[n][0] plus or minus 0.5.
I have been using the function:
def find_nearest_vector(array, value):
idx = np.array([np.linalg.norm(x+y) for (x,y) in array-value]).argmin()
return array[idx]
Which finds the nearest neighbor between the coordinates A[0][:]and B[0][:], for example. However, this I need to confine the search range to a rectangle around some small shift in the value A[0][0]. Also, I do not want to reuse elements - I want a unique bijection between each value A[n][1] to B[n][1] within the interval A[n][0] +/- 0.5.
I have been trying to use Scipy's KDTree, but this reuses elements and I don't know how to confine the search range. Effectively, I want to do a one dimensional NNN search on a two dimensional nested list along a specific axis where the neighborhood in which the NNN search is within a hyper-rectangle defined by the 0th element in each inner list plus or minus some small shift.

I use numpy.argsort(), numpy.searchsorted(), numpy.argmin() to do the search.
%pylab inline
import numpy as np
np.random.seed(0)
A = np.random.rand(5, 2)
B = np.random.rand(100, 2)
xaxis_range = 0.02
order = np.argsort(B[:, 0])
bx = B[order, 0]
sidx = np.searchsorted(bx, A[:, 0] - xaxis_range, side="right")
eidx = np.searchsorted(bx, A[:, 0] + xaxis_range, side="left")
result = []
for s, e, ay in zip(sidx, eidx, A[:, 1]):
section = order[s:e]
by = B[section, 1]
idx = np.argmin(np.abs(ay-by))
result.append(B[section[idx]])
result = np.array(result)
I plot the result as following:
plot(A[:, 0], A[:, 1], "o")
plot(B[:, 0], B[:, 1], ".")
plot(result[:, 0], result[:, 1], "x")
the output:

My understanding of your problem is that you are trying to find the closest elements for each A[n][1] in another set of points (B[i][1] restricted to points where if A[n][0] is within +/- 0.5 of B[i][0]).
(I'm not familiar with numpy or scipy, and I'm sure that there's a better way to do this with their algorithms.)
That being said, here's my naive implementation in O(a*b*log(a*b)) time.
def main(a,b):
for a_bound,a_val in a:
dist_to_valid_b_points = {abs(a_val-b_val):(b_bound,b_val) for b_bound,b_val in b if are_within_bounds(a_bound,b_bound)}
print get_closest_point((a_bound, a_val),dist_to_valid_b_points)
def are_within_bounds(a_bound, b_bound):
return abs(b_bound-a_bound) < 0.5
def get_closest_point(a_point, point_dict):
return (a_point, None if not point_dict else point_dict[min(point_dict, key=point_dict.get)])
main([[50,140],[51,180],[54,500]],[[50.1, 170], [51,200],[55,510]]) yields the following output:
((50, 140), (50.1, 170))
((51, 180), (51, 200))
((54, 500), None)

Related

Getting the right sign when calculating repeated sign switches in numpy array

I am trying to simulate a grid of spins in python that can change their orientation (represented by the sign):
>>> import numpy as np
>>> spin_values = np.random.choice([-1, 1], (2, 2))
>>> spin_values
array([[-1, 1],
[ 1, 1]])
I then throw two sets of random indices of that grid for spins that have a certain probability to switch their orientation, let's say:
>>> i = np.array([1, 1])
>>> j = np.array([0, 0])
>>> switches = np.array([-1, -1])
i and j here contain the indices that might change and switches states whether they do switch (-1) or keep their orientation (1). My idea for calculating the new orientations was:
>>> spin_values[i, j] *= switches
When a spin orientation only changes once this works fine. However, when it is supposed to change twice (as with the example values) it only changes once, therefore giving me a wrong result.
>>> spin_values
array([[-1, 1],
[-1, 1]])
How could I get the right results while having a short run time (this has to be done many times on a bigger grid)?

I would use numpy.unique to get the count of unique pairs of indices and compute -1 ** n:
idx, cnt = np.unique(np.vstack([i, j]), axis=1, return_counts=True)
spin_values[tuple(idx)] = (-1) ** cnt
Updated spin_values:
array([[-1, 1],
[ 1, 1]])

How to plot pairwise distances of two-dimensional vectors?

I have a set of data in python likes:
x y angle
If I want to calculate the distance between two points with all possible value and plot the distances with the difference between two angles.
x, y, a = np.loadtxt('w51e2-pa-2pk.log', unpack=True)
n = 0
f=(((x[n])-x[n+1:])**2+((y[n])-y[n+1:])**2)**0.5
d = a[n]-a[n+1:]
plt.scatter(f,d)
There are 255 points in my data.
f is the distance and d is the difference between two angles.
My question is can I set n = [1,2,3,.....255] and do the calculation again to get the f and d of all possible pairs?

You can obtain the pairwise distances through broadcasting by considering it as an outer operation on the array of 2-dimensional vectors as follows:
vecs = np.stack((x, y)).T
np.linalg.norm(vecs[np.newaxis, :] - vecs[:, np.newaxis], axis=2)
For example,
In [1]: import numpy as np
...: x = np.array([1, 2, 3])
...: y = np.array([3, 4, 6])
...: vecs = np.stack((x, y)).T
...: np.linalg.norm(vecs[np.newaxis, :] - vecs[:, np.newaxis], axis=2)
...:
Out[1]:
array([[ 0. , 1.41421356, 3.60555128],
[ 1.41421356, 0. , 2.23606798],
[ 3.60555128, 2.23606798, 0. ]])
Here, the (i, j)'th entry is the distance between the i'th and j'th vectors.
The case of the pairwise differences between angles is similar, but simpler, as you only have one dimension to deal with:
In [2]: a = np.array([10, 12, 15])
...: a[np.newaxis, :] - a[: , np.newaxis]
...:
Out[2]:
array([[ 0, 2, 5],
[-2, 0, 3],
[-5, -3, 0]])
Moreover, plt.scatter does not care that the results are given as matrices, and putting everything together using the notation of the question, you can obtain the plot of angles by distances by doing something like
vecs = np.stack((x, y)).T
f = np.linalg.norm(vecs[np.newaxis, :] - vecs[:, np.newaxis], axis=2)
d = angle[np.newaxis, :] - angle[: , np.newaxis]
plt.scatter(f, d)

You have to use a for loop and range() to iterate over n, e.g. like like this:
n = len(x)
for i in range(n):
# do something with the current index
# e.g. print the points
print x[i]
print y[i]
But note that if you use i+1 inside the last iteration, this will already be outside of your list.
Also in your calculation there are errors. (x[n])-x[n+1:] does not work because x[n] is a single value in your list while x[n+1:] is a list starting from n+1'th element. You can not subtract a list from an int or whatever it is.
Maybe you will have to even use two nested loops to do what you want. I guess that you want to calculate the distance between each point so a two dimensional array may be the data structure you want.

If you are interested in all combinations of the points in x and y I suggest to use itertools, which will give you all possible combinations. Then you can do it like follows:
import itertools
f = [((x[i]-x[j])**2 + (y[i]-y[j])**2)**0.5 for i,j in itertools.product(255,255) if i!=j]
# and similar for the angles
But maybe there is even an easier way...

How to derive with respect to a Matrix element with Sympy

Given the product of a matrix and a vector
A.v
with A of shape (m,n) and v of dim n, where m and n are symbols, I need to calculate the Derivative with respect to the matrix elements.
I haven't found the way to use a proper vector, so I started with 2 MatrixSymbol:
n, m = symbols('n m')
j = tensor.Idx('j')
i = tensor.Idx('i')
l = tensor.Idx('l')
h = tensor.Idx('h')
A = MatrixSymbol('A', n,m)
B = MatrixSymbol('B', m,1)
C=A*B
Now, if I try to derive with respect to one of A's elements with the indices I get back the unevaluated expression:
diff(C, A[i,j])
>>>> Derivative(A*B, A[i, j])
If I introduce the indices in C also (it won't let me use only one index in the resulting vector) I get back the product expressed as a Sum:
C[l,h]
>>>> Sum(A[l, _k]*B[_k, h], (_k, 0, m - 1))
If I derive this with respect to the matrix element I end up getting 0 instead of an expression with the KroneckerDelta, which is the result that I would like to get:
diff(C[l,h], A[i,j])
>>>> 0
I wonder if maybe I shouldn't be using MatrixSymbols to start with. How should I go about implementing the behaviour that I want to get?

SymPy does not yet know matrix calculus; in particular, one cannot differentiate MatrixSymbol objects. You can do this sort of computation with Matrix objects filled with arrays of symbols; the drawback is that the matrix sizes must be explicit for this to work.
Example:
from sympy import *
A = Matrix(symarray('A', (4, 5)))
B = Matrix(symarray('B', (5, 3)))
C = A*B
print(C.diff(A[1, 2]))
outputs:
Matrix([[0, 0, 0], [B_2_0, B_2_1, B_2_2], [0, 0, 0], [0, 0, 0]])

The git version of SymPy (and the next version) handles this better:
In [55]: print(diff(C[l,h], A[i,j]))
Sum(KroneckerDelta(_k, j)*KroneckerDelta(i, l)*B[_k, h], (_k, 0, m - 1))

Wrapped (circular) 2D interpolation in Python

I have angular data on a domain that is wrapped at pi radians (i.e. 0 = pi). The data are 2D, where one dimension represents the angle. I need to interpolate this data onto another grid in a wrapped way.
In one dimension, the np.interp function takes a period kwarg (for NumPy 1.10 and later):
http://docs.scipy.org/doc/numpy/reference/generated/numpy.interp.html
This is exactly what I need, but I need it in two dimensions. I'm currently just stepping through columns in my array and using np.interp, but this is of course slow.
Anything out there that could achieve this same outcome but faster?

An explanation of how np.interp works
Use the source, Luke!
The numpy doc for np.interp makes the source particularly easy to find, since it has the link right there, along with the documentation. Let's go through this, line by line.
First, recall the parameters:
"""
x : array_like
The x-coordinates of the interpolated values.
xp : 1-D sequence of floats
The x-coordinates of the data points, must be increasing if argument
`period` is not specified. Otherwise, `xp` is internally sorted after
normalizing the periodic boundaries with ``xp = xp % period``.
fp : 1-D sequence of floats
The y-coordinates of the data points, same length as `xp`.
period : None or float, optional
A period for the x-coordinates. This parameter allows the proper
interpolation of angular x-coordinates. Parameters `left` and `right`
are ignored if `period` is specified.
"""
Let's take a simple example of a triangular wave while going through this:
xp = np.array([-np.pi/2, -np.pi/4, 0, np.pi/4])
fp = np.array([0, -1, 0, 1])
x = np.array([-np.pi/8, -5*np.pi/8]) # Peskiest points possible }:)
period = np.pi
Now, I start off with the period != None branch in the source code, after all the type-checking happens:
# normalizing periodic boundaries
x = x % period
xp = xp % period
This just ensures that all values of x and xp supplied are between 0 and period. So, since the period is pi, but we specified x and xp to be between -pi/2 and pi/2, this will adjust for that by adding pi to all values in the range [-pi/2, 0), so that they effectively appear after pi/2. So our xp now reads [pi/2, 3*pi/4, 0, pi/4].
asort_xp = np.argsort(xp)
xp = xp[asort_xp]
fp = fp[asort_xp]
This is just ordering xp in increasing order. This is especially required after performing that modulo operation in the previous step. So, now xp is [0, pi/4, pi/2, 3*pi/4]. fp has also been shuffled accordingly, [0, 1, 0, -1].
xp = np.concatenate((xp[-1:]-period, xp, xp[0:1]+period))
fp = np.concatenate((fp[-1:], fp, fp[0:1]))
return compiled_interp(x, xp, fp, left, right) # Paraphrasing a little
np.interp does linear interpolation. When trying to interpolate between two points a and b present in xp, it only uses the values of f(a) and f(b) (i.e., the values of fp at the corresponding indices). So what np.interp is doing in this last step is to take the point xp[-1] and put it in front of the array, and take the point xp[0] and put it after the array, but after subtracting and adding one period respectively. So you now have a new xp that looks like [-pi/4, 0, pi/4, pi/2, 3*pi/4, pi]. Likewise, fp[0] and fp[-1] have been concatenated around, so fp is now [-1, 0, 1, 0, -1, 0].
Note that after the modulo operations, x had been brought into the [0, pi] range too, so x is now [7*pi/8, 3*pi/8]. Which lets you easily see that you'll get back [-0.5, 0.5].
Now, coming to your 2D case:
Say you have a grid and some values. Let's just take all values to be between [0, pi] off the bat so we don't need to worry about modulos and shufflings.
xp = np.array([0, np.pi/4, np.pi/2, 3*np.pi/4])
yp = np.array([0, 1, 2, 3])
period = np.pi
# Put x on the 1st dim and y on the 2nd dim; f is linear in y
fp = np.array([0, 1, 0, -1])[:, np.newaxis] + yp[np.newaxis, :]
# >>> fp
# array([[ 0, 1, 2, 3],
# [ 1, 2, 3, 4],
# [ 0, 1, 2, 3],
# [-1, 0, 1, 2]])
We now know that all you need to do is to add xp[[-1]] in front of the array and xp[[0]] at the end, adjusting for the period. Note how I've indexed using the singleton lists [-1] and [0]. This is a trick to ensure that dimensions are preserved.
xp = np.concatenate((xp[[-1]]-period, xp, xp[[0]]+period))
fp = np.concatenate((fp[[-1], :], fp, fp[[0], :]))
Finally, you are free to use scipy.interpolate.interpn to achieve your result. Let's get the value at x = pi/8 for all y:
from scipy.interpolate import interpn
interp_points = np.hstack(( (np.pi/8 * np.ones(4))[:, np.newaxis], yp[:, np.newaxis] ))
result = interpn((xp, yp), fp, interp_points)
# >>> result
# array([ 0.5, 1.5, 2.5, 3.5])
interp_points has to be specified as an Nx2 matrix of points, where the first dimension is for each point you want interpolation at the second dimension gives the x- and y-coordinate of that point. See this answer for a detailed explanation.
If you want to get the value outside of the range [0, period], you'll need to modulo it yourself:
x = 21 * np.pi / 8
x_equiv = x % period # Now within [0, period]
interp_points = np.hstack(( (x_equiv * np.ones(4))[:, np.newaxis], yp[:, np.newaxis] ))
result = interpn((xp, yp), fp, interp_points)
# >>> result
# array([-0.5, 0.5, 1.5, 2.5])
Again, if you want to generate interp_points for a bunch of x- and y- values, look at this answer.

How to index a Cartesian product

Suppose that the variables x and theta can take the possible values [0, 1, 2] and [0, 1, 2, 3], respectively.
Let's say that in one realization, x = 1 and theta = 3. The natural way to represent this is by a tuple (1,3). However, I'd like to instead label the state (1,3) by a single index. A 'brute-force' method of doing this is to form the Cartesian product of all the possible ordered pairs (x,theta) and look it up:
import numpy as np
import itertools
N_x = 3
N_theta = 4
np.random.seed(seed = 1)
x = np.random.choice(range(N_x))
theta = np.random.choice(range(N_theta))
def get_box(x, N_x, theta, N_theta):
states = list(itertools.product(range(N_x),range(N_theta)))
inds = [i for i in range(len(states)) if states[i]==(x,theta)]
return inds[0]
print (x, theta)
box = get_box(x, N_x, theta, N_theta)
print box
This gives (x, theta) = (1,3) and box = 7, which makes sense if we look it up in the states list:
[(0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (1, 1), (1, 2), (1, 3), (2, 0), (2, 1), (2, 2), (2, 3)]
However, this 'brute-force' approach seems inefficient, as it should be possible to determine the index beforehand without looking it up. Is there any general way to do this? (The number of states N_x and N_theta may vary in the actual application, and there might be more variables in the Cartesian product).

If you always store your states lexicographically and the possible values for x and theta are always the complete range from 0 to some maximum as your examples suggests, you can use the formula
index = x * N_theta + theta
where (x, theta) is one of your tuples.
This generalizes in the following way to higher dimensional tuples: If N is a list or tuple representing the ranges of the variables (so N[0] is the number of possible values for the first variable, etc.) and p is a tuple, you get the index into a lexicographically sorted list of all possible tuples using the following snippet:
index = 0
skip = 1
for dimension in reversed(range(len(N))):
index += skip * p[dimension]
skip *= N[dimension]
This might not be the most Pythonic way to do it but it shows what is going on: You think of your tuples as a hypercube where you can only go along one dimension, but if you reach the edge, your coordinate in the "next" dimension increases and your traveling coordinate resets. The reader is advised to draw some pictures. ;)

I think it depends on the data you have. If they are sparse, the best solution is a dictionary. And works for any tuple's dimension.
import itertools
import random
n = 100
m = 100
l1 = [i for i in range(n)]
l2 = [i for i in range(m)]
a = {}
prod = [element for element in itertools.product(l1, l2)]
for i in prod:
a[i] = random.randint(1, 100)
A very good source about the performance is in this discution.

For the sake of completeness I'll include my implementation of Julian Kniephoff's solution, get_box3, with a slightly adapted version of the original implementation, get_box2:
# 'Brute-force' method
def get_box2(p, N):
states = list(itertools.product(*[range(n) for n in N]))
return states.index(p)
# 'Analytic' method
def get_box3(p, N):
index = 0
skip = 1
for dimension in reversed(range(len(N))):
index += skip * p[dimension]
skip *= N[dimension]
return index
p = (1,3,2) # Tuple characterizing the total state of the system
N = [3,4,3] # List of the number of possible values for each state variable
print "Brute-force method yields %s" % get_box2(p, N)
print "Analytical method yields %s" % get_box3(p, N)
Both the 'brute-force' and 'analytic' method yield the same result:
Brute-force method yields 23
Analytical method yields 23
but I expect the 'analytic' method to be faster. I've changed the representation to p and N as suggested by Julian.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Nearest neighbor 1 dimensional data with a specified range - python

Related

Getting the right sign when calculating repeated sign switches in numpy array

How to plot pairwise distances of two-dimensional vectors?

How to derive with respect to a Matrix element with Sympy

Wrapped (circular) 2D interpolation in Python

How to index a Cartesian product

Categories

Resources