I have two arrays of points, and I am looking for the index of the closest point in array2 to each point in array1. For example:
import numpy as np
from scipy.spatial import distance
array1 = np.array([[1,2,1], [4,2,6]])
array2 = np.array([[0,0,1], [4,5,0], [1,2,0], [6,5,0]])
def f(x):
    return distance.cdist([x], array2).argmin()
def array_map(x):
    return np.array(list(map(f, x)))
array_map(array1)
This code returns the correct results but is slow when both arrays are very big. Is it possible to make this any quicker?
Thanks to #Max7CD, here is a working solution that is quite efficient (at least for my purposes):
from scipy import spatial
tree = spatial.KDTree(array2)
# Split the queries into chunks so that the KDTree query doesn't take forever
# and so that I can monitor progress (probably unnecessary)
splitArray = np.split(array1, 2)
listFinal = []
for elem in splitArray:
    a = tree.query(elem)  # returns (distances, indices)
    listFinal.append(a[1])
    print("Finished")
b = np.array(listFinal).ravel()
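For what it's worth, the chunking may not be needed, since tree.query accepts a whole array of query points in one call. A minimal sketch, assuming SciPy's cKDTree is available and using made-up random points (the sizes and the seed are arbitrary), with a brute-force cdist check on a small sample:

```python
import numpy as np
from scipy.spatial import cKDTree, distance

rng = np.random.default_rng(0)          # arbitrary seed, illustration only
array1 = rng.random((1000, 3))          # query points (made-up data)
array2 = rng.random((5000, 3))          # reference points (made-up data)

tree = cKDTree(array2)                  # build the tree once
dist, idx = tree.query(array1)          # nearest neighbour of every row of array1

# Brute-force check on the first 50 query points only
brute = distance.cdist(array1[:50], array2)
assert np.allclose(dist[:50], brute.min(axis=1))
```

Here idx[i] is the row of array2 closest to array1[i], which is what the original array_map returned. Exact ties may be resolved differently by the tree and by argmin.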
A is a k-dimensional numpy array of floats (k could be fairly large, e.g. up to 10).
I need to implement an update to A by incrementing each of the values (as described below). I'm wondering if there is a numpy-style way that would be fast.
Let L_i be the length of axis i
An update to this array is generated in two steps, as follows:
For each axis of A a corresponding vector G is generated.
For example, corresponding to axis i a vector G_i of length L_i is generated (from data).
Update A at all positions by calculating an increment from the G vectors for each position in A
To do this at any particular position, let p be an array of k indices, corresponding to a position in A. Then A at p is incremented by a value calculated as the product:
Product(G_i[p[i]], for i from 0 to k-1)
A full update to A involves doing this operation for all locations in A (i.e. all possible values of p)
Doing this one position at a time with loops would be very slow.
Is there a numpy style way to do this that would be fast?
edit
## this is for three dimensions; the final matrix at pos i,j,k has the
## product c[i]*b[j]*a[k]
## but for an arbitrary number of dimensions it needs a loop in a loop
## and will be slow
import numpy as np
a = np.array([1,2])
b = np.array([3,4,5])
c = np.array([6,7,8,9])
ab = []
for bval in b:
    ab.append(bval*a)
ab = np.stack(ab)
abc = []
for cval in c:
    abc.append(cval*ab)
abc = np.stack(abc)
As a function:
def loopfunc(arraylist):
    ndim = len(arraylist)
    m = arraylist[0]
    for i in range(1,ndim):
        ml = []
        for val in arraylist[i]:
            ml.append(val*m)
        m = np.stack(ml)
    return m
This is a wacky problem, but I like it.
If I understand what you need from your example, you can accomplish this with some reshaping trickery and NumPy's usual broadcasting rules. The idea is to reshape each array so it has the right number of dimensions, then just directly multiply.
Here's a function that implements this.
from functools import reduce
import operator
import numpy as np
import scipy.linalg
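def wacky_outer_product(*arrays):
    assert len(arrays) >= 2
    assert all(arr.ndim == 1 for arr in arrays)
    ndim = len(arrays)
    # Row i of the Toeplitz matrix is a shape with -1 at position i and 1 elsewhere,
    # so array i keeps its length along axis i and has length 1 along every other axis
    shapes = scipy.linalg.toeplitz((-1,) + (1,) * (ndim - 1))
    reshaped = (arr.reshape(new_shape) for arr, new_shape in zip(arrays, shapes))
    return reduce(operator.mul, reshaped).T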
Testing this on your example arrays, we have:
>>> foo = wacky_outer_product(a, b, c)
>>> np.all(foo == abc)
True
Edit
OK, the above function is fun, but the one below is probably much better: no transposing, clearer, and much shorter.
from functools import reduce
import operator
import numpy as np
def wacky_outer_product(*arrays):
    return reduce(operator.mul, np.ix_(*reversed(arrays)))
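To tie this back to the update described in the question, the product can be added to A in place. A small sketch, assuming the G vectors are ordered to match the axes of A (so no reversal is needed). The names apply_update, A and G are illustrative only, and the explicit reshape is the same broadcasting idea that np.ix_ uses above:

```python
import numpy as np

def apply_update(A, G):
    """Add prod_i G[i][p[i]] to A[p] for every position p, in place."""
    k = A.ndim
    increment = np.ones((1,) * k)
    for axis, g in enumerate(G):
        new_shape = [1] * k
        new_shape[axis] = -1            # G[axis] varies only along its own axis
        increment = increment * np.asarray(g).reshape(new_shape)
    A += increment                      # broadcasting has produced A's full shape
    return A

rng = np.random.default_rng(0)          # arbitrary seed, made-up data
dims = (2, 3, 4)
A = np.zeros(dims)
G = [rng.random(n) for n in dims]       # one vector per axis of A
apply_update(A, G)
```

The Python loop runs only k times (once per axis), so the cost is dominated by the vectorized multiplications rather than by per-position work.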
I need to speed up an interpolation over a large (NxMxT) matrix MTR, where:
N is about 8000;
M is about 10000;
T represents the number of times at which each NxM matrix is calculated (in my case it's 23).
I have to compute the interpolation element-wise over all T times and return the interpolated values at a different array of times (T_interp, in my case with length 47), so the output should be an NxMxT_interp matrix.
The code snippet below defines the function I built for the interpolation, using scipy.interpolate.Rbf (y is the array MTR[i,j,:], x is the times array with length T, x_interp is the new array of times with length T_interp):
#==============================================================================
# Interpolate without nans
#==============================================================================
def interp(x,y,x_interp,**kwargs):
    import numpy as np
    from scipy.interpolate import Rbf
    mask = np.isnan(y)
    y_mask = np.ma.array(y,mask = mask)
    x_new = [x[i] for i in np.where(~mask)[0]]
    # No valid samples: nothing to interpolate
    if len(y_mask.compressed()) == 0:
        return [np.nan for i,n in enumerate(x_interp)]
    # A single valid sample: repeat that scalar value for every new time
    elif len(y_mask.compressed()) == 1:
        return [y_mask.compressed()[0] for i,n in enumerate(x_interp)]
    rbf = Rbf(x_new,y_mask.compressed(),**kwargs)
    y_interp = rbf(x_interp)
    return y_interp
I tried to achieve my goal either by looping over the NxM elements of the MTR matrix:
new_MTR = np.empty((N,M,T_interp))
for i in range(N):
    for j in range(M):
        new_MTR[i,j,:]=interp(times,MTR[i,j,:],New_times,function = 'linear')
or by using the np.apply_along_axis function:
new_MTR = np.apply_along_axis(lambda x: interp(times,x,New_times,function = 'linear'),2,MTR)
In both cases I estimated the time needed for the whole operation. np.apply_along_axis appears to be slightly faster, but it would still take about 15 hours!
Is there a way to reduce this time? Maybe by vectorizing the entire operation? I don't know much about vectorizing and how it can be done in a situation like mine so any help would be much appreciated. Thank you!
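One possible direction, sketched under some loud assumptions: the series contain no NaNs, the times are strictly increasing, and plain piecewise-linear interpolation (as np.interp does) is an acceptable substitute for Rbf with function='linear'. The helper name interp_along_last_axis and the small test array are made up for this illustration. Because the bracketing time indices and weights are the same for every (i, j) cell, they can be computed once and applied to the whole array by broadcasting:

```python
import numpy as np

def interp_along_last_axis(times, data, new_times):
    """Piecewise-linear interpolation of data[..., t] onto new_times (no NaN handling)."""
    times = np.asarray(times, dtype=float)
    new_times = np.asarray(new_times, dtype=float)
    # Index of the right-hand neighbour of each new time, kept inside the array
    hi = np.clip(np.searchsorted(times, new_times), 1, len(times) - 1)
    lo = hi - 1
    # Interpolation weight; clipping to [0, 1] holds the end values outside the range
    w = np.clip((new_times - times[lo]) / (times[hi] - times[lo]), 0.0, 1.0)
    return data[..., lo] * (1.0 - w) + data[..., hi] * w

rng = np.random.default_rng(0)                 # arbitrary seed, made-up data
times = np.arange(23.0)                        # T = 23
new_times = np.linspace(0.0, 22.0, 47)         # T_interp = 47
MTR_small = rng.random((4, 5, 23))             # small stand-in for the real MTR
new_MTR_small = interp_along_last_axis(times, MTR_small, new_times)
```

At the sizes in the question the float64 output alone (8000 x 10000 x 47) is roughly 30 GB, so in practice this would probably have to be applied to blocks of rows. Cells that do contain NaNs would still need separate treatment, for example with the original per-cell function.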
My code runs fine for the first iteration, but after that it raises the following error:
ValueError: matrix must be 2-dimensional
To the best of my knowledge (which is limited in Python), my code is correct, but I don't know why it fails after the first iteration. Could anyone help me with this problem?
from __future__ import division
import numpy as np
import math
import matplotlib.pylab as plt
import sympy as sp
from numpy.linalg import inv
#initial guesses
x = -2
y = -2.5
i1 = 0
while i1<5:
    F= np.matrix([[(x**2)+(x*y**3)-9],[(3*y*x**2)-(y**3)-4]])
    theta = np.sum(F)
    J = np.matrix([[(2*x)+y**3, 3*x*y**2],[6*x*y, (3*x**2)-(3*y**2)]])
    Jinv = inv(J)
    xn = np.array([[x],[y]])
    xn_1 = xn - (Jinv*F)
    x = xn_1[0]
    y = xn_1[1]
    #~ print theta
    print(xn)
    i1 = i1+1
I believe xn_1 is a 2D matrix. Try printing it and you will see [[something], [something]].
Therefore, to get x and y, you need to use multidimensional indexing. Here is what I did:
x = xn_1[0,0]
y = xn_1[1,0]
This works because the 2D matrix xn_1 has two rows, each holding a single element, so we need a second index of 0 to get at that element.
Edit: To clarify, xn_1[1,0] selects row 1 and column 0 in a single indexing operation. For the general np.array type this is functionally equivalent to xn_1[1][0], as the SciPy documentation suggests, but that does not hold for the np.matrix type, where xn_1[1] is itself still a 2D matrix. Here is an excellent thread on SO that explains this.
So you should use the xn_1[1,0] way to get the element you want.
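A short illustration of the difference, using a made-up 2x1 matrix in place of the real xn_1:

```python
import numpy as np

xn_1 = np.matrix([[-1.3], [-2.2]])   # made-up values, same 2x1 layout as in the question

row = xn_1[1]          # still a matrix, shape (1, 1)
again = xn_1[1][0]     # indexing that row again returns a (1, 1) matrix once more
value = xn_1[1, 0]     # a plain scalar

print(row.shape, again.shape, np.ndim(value))
```

Feeding a (1, 1) matrix such as row back into np.matrix([[...], [...]]) on the next iteration is what produces the "matrix must be 2-dimensional" error.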
xn_1 is a numpy matrix, so indexing it with a single [] (as you would an array) returns another matrix rather than a number. Its elements can be extracted as scalars with the item() method.
So just change
x = xn_1[0]
y = xn_1[1]
to
x = xn_1.item(0)
y = xn_1.item(1)
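As an alternative to either fix, the whole iteration can be written with plain ndarrays, which sidesteps np.matrix indexing altogether. A sketch only: the function names F and J mirror the question's variables, the iteration count of 20 is arbitrary, and np.linalg.solve is swapped in for the explicit inverse:

```python
import numpy as np

def F(v):
    """Residuals of the two equations from the question."""
    x, y = v
    return np.array([x**2 + x*y**3 - 9, 3*y*x**2 - y**3 - 4])

def J(v):
    """Jacobian of F, same entries as in the question."""
    x, y = v
    return np.array([[2*x + y**3, 3*x*y**2],
                     [6*x*y, 3*x**2 - 3*y**2]])

v = np.array([-2.0, -2.5])       # initial guesses from the question
for _ in range(20):              # arbitrary iteration count for this sketch
    # Newton step: solve J * delta = F instead of forming the inverse
    v = v - np.linalg.solve(J(v), F(v))

residual = F(v)
print(v, residual)
```

Because v stays a 1D array throughout, x and y unpack as ordinary floats on every pass.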
I have two 3D arrays and want to identify 2D elements in one array, which have one or more similar counterparts in the other array.
This works in Python 3:
import numpy as np
import random
np.random.seed(123)
A = np.round(np.random.rand(25000,2,2),2)
B = np.round(np.random.rand(25000,2,2),2)
a_index = np.zeros(A.shape[0])
for a in range(A.shape[0]):
    for b in range(B.shape[0]):
        if np.allclose(A[a,:,:].reshape(-1, A.shape[1]), B[b,:,:].reshape(-1, B.shape[1]),
                       rtol=1e-04, atol=1e-06):
            a_index[a] = 1
            break
np.nonzero(a_index)[0]
But of course this approach is awfully slow. Is there a more efficient way, and if so, what is it? Thanks.
You are trying to do an all-nearest-neighbor type of query. There are special O(n log n) algorithms for this, but I'm not aware of a Python implementation. However, you can use a regular nearest-neighbor search, which is also O(n log n), just a bit slower: for example, scipy.spatial.KDTree or cKDTree.
import numpy as np
import random
np.random.seed(123)
A = np.round(np.random.rand(25000,2,2),2)
B = np.round(np.random.rand(25000,2,2),2)
import scipy.spatial
tree = scipy.spatial.cKDTree(A.reshape(25000, 4))
results = tree.query_ball_point(B.reshape(25000, 4), r=1e-04, p=1)
print([r for r in results if r != []])
# [[14252], [1972], [7108], [13369], [23171]]
query_ball_point() is not an exact equivalent to allclose() but it is close enough, especially if you don't care about the rtol parameter to allclose(). You also get a choice of metric (p=1 for city block, or p=2 for Euclidean).
P.S. Consider using query_ball_tree() for very large data sets. Both A and B have to be indexed in that case.
P.P.S. I'm not sure what effect the 2D-ness of the elements should have; the sample code I gave treats them as 1D, which gives identical results at least when using the city block metric.
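The snippet above indexes A and reports, for each element of B, the matching rows of A. To get the indices of A that have a counterpart in B, as the question asks, one option is to index B instead and query with A. A sketch under stated assumptions: rtol is ignored, the Chebyshev metric (p=np.inf) is used so that the radius plays the role of atol, and smaller made-up arrays rounded to one decimal are used so that matches actually occur:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(123)               # arbitrary seed, made-up data
A = np.round(rng.random((300, 2, 2)), 1)
B = np.round(rng.random((300, 2, 2)), 1)

# Flatten each 2x2 element to a 4-vector and index B
tree_B = cKDTree(B.reshape(len(B), -1))

# With p=inf the distance is the largest absolute element-wise difference,
# so r acts like allclose's atol (with rtol=0)
hits = tree_B.query_ball_point(A.reshape(len(A), -1), r=1e-06, p=np.inf)

# Indices of A that have at least one close counterpart in B
a_idx = np.array([i for i, h in enumerate(hits) if len(h) > 0], dtype=int)
print(a_idx)
```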
From the docs of np.allclose, we have:
If the following equation is element-wise True, then allclose returns
True.
absolute(a - b) <= (atol + rtol * absolute(b))
Using that criterion, we can write a vectorized implementation using broadcasting, customized for the stated problem, like so:
# Setup parameters
rtol,atol = 1e-04, 1e-06
# Use np.allclose criteria to detect true/false across all pairwise elements
mask = np.abs(A[:,None,] - B) <= (atol + rtol * np.abs(B))
# Use the problem context to get final output
out = np.nonzero(mask.all(axis=(2,3)).any(1))[0]
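Note that with 25000 elements in each array the broadcast intermediate has 25000 x 25000 x 2 x 2 = 2.5 billion entries, which may not fit in memory. A possible workaround, sketched here with a hypothetical helper matching_rows and an arbitrary chunk size, is to apply the same criterion to blocks of A:

```python
import numpy as np

def matching_rows(A, B, rtol=1e-04, atol=1e-06, chunk=256):
    """Indices of A with at least one allclose-style match in B, computed in blocks."""
    found = np.zeros(len(A), dtype=bool)
    tol = atol + rtol * np.abs(B)                  # same shape as B
    for start in range(0, len(A), chunk):
        block = A[start:start + chunk, None]       # (block, 1, 2, 2)
        mask = np.abs(block - B) <= tol            # (block, len(B), 2, 2)
        found[start:start + chunk] = mask.all(axis=(2, 3)).any(axis=1)
    return np.nonzero(found)[0]

rng = np.random.default_rng(123)                   # arbitrary seed, made-up data
A = np.round(rng.random((300, 2, 2)), 1)
B = np.round(rng.random((300, 2, 2)), 1)
out = matching_rows(A, B, chunk=64)
```

Peak memory then scales with chunk x len(B) instead of len(A) x len(B), at the cost of a short Python loop.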
I'd like to sample indices of a 2D NumPy array, with each index weighted by the number stored at that position. The only way I know is numpy.random.choice, but that returns the number itself rather than its index. Is there an efficient way of doing this?
Here is my code:
import numpy as np
A=np.arange(1,10).reshape(3,3)
A_flat=A.flatten()
d=np.random.choice(A_flat,size=10,p=A_flat/float(np.sum(A_flat)))
print(d)
You could do something like:
import numpy as np
def wc(weights):
    cs = np.cumsum(weights)
    idx = cs.searchsorted(np.random.random() * cs[-1], 'right')
    return np.unravel_index(idx, weights.shape)
Notice that the cumsum is the slowest part of this, so if you need to do this repeatedly for the same array, I'd suggest computing the cumsum ahead of time and reusing it.
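A minimal sketch of that reuse, assuming the weights do not change between draws. The helper name draw_indices and the seed are made up for illustration. It also draws many samples per call by passing an array to searchsorted:

```python
import numpy as np

weights = np.arange(1, 10).reshape(3, 3)
cs = np.cumsum(weights)                    # flattened cumulative sum, computed once
rng = np.random.default_rng(0)             # arbitrary seed

def draw_indices(n):
    """Draw n weighted (row, col) index pairs using the precomputed cumsum."""
    u = rng.random(n) * cs[-1]
    idx = cs.searchsorted(u, 'right')
    idx = np.minimum(idx, cs.size - 1)     # guard against rounding at the top end
    return np.unravel_index(idx, weights.shape)

rows, cols = draw_indices(10)
```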
To expand on my comment: Adapting the weighted choice method presented here https://stackoverflow.com/a/10803136/553404
def weighted_choice_indices(weights):
    cs = np.cumsum(weights.flatten())/np.sum(weights)
    idx = np.sum(cs < np.random.rand())
    return np.unravel_index(idx, weights.shape)
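Another option that stays close to the code in the question is to let numpy.random.choice pick flat positions rather than values, and then convert them with np.unravel_index. A sketch, reusing the question's A (the variable names flat_idx, rows, cols and sampled_values are illustrative):

```python
import numpy as np

A = np.arange(1, 10).reshape(3, 3)
p = A.ravel() / A.sum()                            # weights as probabilities

# Passing an int as the first argument samples from range(A.size),
# i.e. flat positions rather than the stored values
flat_idx = np.random.choice(A.size, size=10, p=p)
rows, cols = np.unravel_index(flat_idx, A.shape)   # back to 2D indices

sampled_values = A[rows, cols]                     # the weights at the sampled positions
print(list(zip(rows, cols)))
```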