I am trying to figure out how scipy.ndimage.filters.convolve works using only numpy and scipy.signal.convolve, so that I can then reimplement it from scratch in C++. I have figured out how to do it when the mode is 'constant'; my approach is shown below.
import numpy as np
from scipy import signal
a = np.array([[1, 2, 0, 0], [5, 3, 0, 4], [0, 0, 0, 7], [9, 3, 0, 0]])
k = np.array([[1,0,0],[0,1,0],[0,0,0]])
a = np.pad(a, 1)
k = np.flip(k)
output = signal.convolve(a, k, 'valid')
This gives the same output as scipy.ndimage.filters.convolve(a, k, mode='constant'). So I assumed that mode='reflect' would work the same way, with the line a = np.pad(a, 1) simply changed to a = np.pad(a, 1, mode='reflect'). However, that does not seem to be the case. Could someone explain how it would work from scratch using numpy and scipy.signal.convolve? Thank you.
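A likely explanation, offered as a sketch rather than a definitive answer: ndimage and np.pad use different names for the same boundary rules. ndimage's 'reflect' (d c b a | a b c d | d c b a) repeats the edge sample, which is what np.pad calls 'symmetric'. np.pad's 'reflect' drops the edge sample and corresponds to ndimage's 'mirror'. Separately, scipy.signal.convolve already flips the kernel, so k should be passed unflipped to reproduce ndimage.convolve; flipping it first gives ndimage.correlate instead. The helper name convolve_from_scratch and the mode table are illustrative, and the sketch assumes odd-sized kernels, the default origin and cval=0.

```python
import numpy as np
from scipy import ndimage, signal

# ndimage boundary mode -> np.pad mode that extends the array the same way
NDIMAGE_TO_NP_PAD = {
    "constant": "constant",  # 0 0 | a b c d | 0 0   (cval = 0)
    "reflect": "symmetric",  # b a | a b c d | d c   (edge sample repeated)
    "mirror": "reflect",     # c b | a b c d | c b   (edge sample not repeated)
    "nearest": "edge",       # a a | a b c d | d d
}


def convolve_from_scratch(a, k, mode="reflect"):
    """Sketch of ndimage.convolve(a, k, mode=mode) for odd-sized kernels."""
    a = np.asarray(a)
    k = np.asarray(k)
    if any(s % 2 == 0 for s in k.shape):
        raise ValueError("this sketch only handles odd kernel sizes")
    # Pad by half the kernel size on every side, using the matching boundary rule.
    pad = [(s // 2, s // 2) for s in k.shape]
    padded = np.pad(a, pad, mode=NDIMAGE_TO_NP_PAD[mode])
    # signal.convolve flips k itself, so k is NOT flipped beforehand
    # (pre-flipping would reproduce ndimage.correlate instead).
    return signal.convolve(padded, k, mode="valid", method="direct")
```

In C++ the same structure carries over directly: build the padded array with the chosen boundary rule, then run a plain "valid" convolution loop over it.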
Update: I made the solution into a library called close-numerical-matches.
I am looking for a way to find all close matches (within some tolerance) between two 2D arrays and get an array of the indices of the found matches. Multiple answers on SO show how to solve this problem for exact matches (typically with a dictionary), but that is not what I am looking for. Let me give an example:
>>> arr1 = [
[19.21, 19.19],
[13.18, 11.55],
[21.45, 5.83]
]
>>> arr2 = [
[13.11, 11.54],
[19.20, 19.19],
[51.21, 21.55],
[19.22, 19.18],
[11.21, 11.55]
]
>>> find_close_match_indices(arr1, arr2, tol=0.1)
[[0, 1], [0, 3], [1, 0]]
Above, [[0, 1], [0, 3], [1, 0]] is returned because element 0 in arr1, [19.21, 19.19], is within tolerance of elements 1 and 3 in arr2, and element 1 in arr1, [13.18, 11.55], is within tolerance of element 0 in arr2. Order is not important to me, i.e. [[0, 3], [1, 0], [0, 1]] would be just as acceptable.
The shape of arr1 is (n, 2) and that of arr2 is (m, 2). You can expect n and m to be huge. I can easily implement this with a nested for loop, but I am sure there must be a smarter way than comparing every element against all other elements.
I thought about using k-means clustering to divide the problem into k buckets and thus make the nested for-loop approach more tractable, but I think there may be a small risk two close elements are just at the "border" of each of their clusters and therefore wouldn't get compared.
Any external dependencies such as Numpy, Scipy, etc. are fine, and it is also fine to use O(n + m) space.
You can't do it with NO loops, but you can do it with ONE loop by taking advantage of boolean indexing:
import numpy as np
xarr1 = np.array([
[19.21, 19.19],
[13.18, 11.55],
[21.45, 5.83]
])
xarr2 = np.array([
[13.11, 11.54],
[19.20, 19.19],
[51.21, 21.55],
[19.22, 19.18],
[11.21, 11.55]
])
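def find_close_match_indices(arr1, arr2, tol=0.1):
    results = []
    # One loop over arr1; note that only the first column is compared
    for i, r1 in enumerate(arr1[:, 0]):
        x1 = np.abs(arr2[:, 0] - r1) < tol  # boolean mask over arr2's rows
        results.extend([i, int(k)] for k in np.where(x1)[0])
    return results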
print(find_close_match_indices(xarr1,xarr2,0.1))
Output:
[[0, 1], [0, 3], [1, 0]]
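The loop above compares only the first column, so a pair such as [1.0, 5.0] and [1.05, 9.0] would also be reported. A hedged variant that keeps the same one-loop structure but requires every coordinate to be within tol (find_close_match_indices_both is a hypothetical name) might look like this:

```python
import numpy as np


def find_close_match_indices_both(arr1, arr2, tol=0.1):
    """Hypothetical variant: match on every column, still one Python loop."""
    arr1 = np.asarray(arr1, dtype=float)
    arr2 = np.asarray(arr2, dtype=float)
    results = []
    for i, row in enumerate(arr1):
        # True for each row of arr2 whose coordinates are all within tol of `row`
        close = np.all(np.abs(arr2 - row) < tol, axis=1)
        results.extend([i, int(k)] for k in np.flatnonzero(close))
    return results
```

Each iteration is still O(m), so the total work remains O(n·m); only the per-row comparison is vectorised.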
Perhaps you might find the following useful. It might be faster than @Tim Roberts's solution because there are no explicit for loops, but it uses more memory: the broadcast difference array has shape (n, m, 2).
import numpy as np
xarr1 = np.array([
[19.21, 19.19],
[13.18, 11.55],
[21.45, 5.83]
])
xarr2 = np.array([
[13.11, 11.54],
[19.20, 19.19],
[51.21, 21.55],
[19.22, 19.18],
[11.21, 11.55]
])
tol = 0.1
xarr1 = xarr1[:, None, :]  # shape (n, 1, 2)
xarr2 = xarr2[None, :, :]  # shape (1, m, 2)
# broadcasting: pairwise differences with shape (n, m, 2)
cc = xarr2 - xarr1
cc = np.linalg.norm(cc, axis=-1)  # Euclidean distance, shape (n, m)
# or you can use other metrics of closeness, e.g. the largest per-coordinate difference:
#cc = np.abs(cc).max(axis=-1)
id1, id2 = np.where(cc < tol)
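If the (n, m, 2) intermediate is too large, the same broadcasting idea can be applied to blocks of arr1 so that only a (chunk, m, 2) array exists at any one time. This is a sketch assuming Euclidean distance and a strict < tol as in the code above; find_close_pairs_chunked and its chunk parameter are hypothetical names. The running time is still O(n·m); only peak memory is bounded.

```python
import numpy as np


def find_close_pairs_chunked(arr1, arr2, tol, chunk=1024):
    """Hypothetical helper: broadcasting matches, `chunk` rows of arr1 at a time."""
    arr1 = np.asarray(arr1, dtype=float)
    arr2 = np.asarray(arr2, dtype=float)
    pairs = []
    for start in range(0, len(arr1), chunk):
        block = arr1[start:start + chunk]
        # (len(block), m) distance matrix instead of the full (n, m) one
        dist = np.linalg.norm(block[:, None, :] - arr2[None, :, :], axis=-1)
        i, j = np.nonzero(dist < tol)
        pairs.append(np.column_stack((i + start, j)))
    if not pairs:
        return np.empty((0, 2), dtype=np.intp)
    return np.concatenate(pairs)
```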
I came up with an idea for using buckets to solve this problem. A key is formed from the values of each element and the tolerance level. To make sure that potential matches near the "edge" of a bucket are compared against elements at the edges of other buckets, all neighbouring buckets are compared as well. Finally, I slightly modified @Tim Roberts's approach for the actual matching so that it matches on both columns.
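The bucket idea can be sketched roughly as follows. This is not the close-numerical-matches implementation: the function name grid_find_matches is hypothetical, it uses Euclidean distance, and it assumes 2D points. With a cell size equal to tol, two points within tol of each other have cell coordinates that differ by at most 1 along each axis, so checking the 3x3 block of neighbouring cells does not miss any match (up to floating-point rounding for pairs at almost exactly tol).

```python
from collections import defaultdict

import numpy as np


def grid_find_matches(arr0, arr1, tol):
    """Hypothetical sketch: index pairs (i, j) with ||arr0[i] - arr1[j]|| <= tol."""
    arr0 = np.asarray(arr0, dtype=float)
    arr1 = np.asarray(arr1, dtype=float)
    # Bucket every point of arr1 by the integer grid cell it falls in.
    buckets = defaultdict(list)
    for j, key in enumerate(map(tuple, np.floor(arr1 / tol).astype(np.int64))):
        buckets[key].append(j)
    offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
    matches = []
    for i, (cx, cy) in enumerate(np.floor(arr0 / tol).astype(np.int64)):
        # Candidates: points in this cell or any of its 8 neighbours.
        candidates = [j for dx, dy in offsets for j in buckets.get((cx + dx, cy + dy), ())]
        if not candidates:
            continue
        cand = np.array(candidates)
        # Exact distance check on the (usually few) candidates only.
        dist = np.linalg.norm(arr1[cand] - arr0[i], axis=1)
        matches.extend((i, int(j)) for j in cand[dist <= tol])
    return np.array(matches, dtype=np.int64).reshape(-1, 2)
```

Building the buckets is O(m) and, for reasonably spread-out data, each query only touches a handful of candidates, which is where the speed-up over the nested loop comes from.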
I made this into a library called close-numerical-matches. Sample usage:
>>> import numpy as np
>>> from close_numerical_matches import find_matches
>>> arr0 = np.array([[25, 24], [50, 50], [25, 26]])
>>> arr1 = np.array([[25, 23], [25, 25], [50.6, 50.6], [60, 60]])
>>> find_matches(arr0, arr1, tol=1.0001)
array([[0, 0], [0, 1], [1, 2], [2, 1]])
>>> find_matches(arr0, arr1, tol=0.9999)
array([[1, 2]])
>>> find_matches(arr0, arr1, tol=0.60001)
array([], dtype=int64)
>>> find_matches(arr0, arr1, tol=0.60001, dist='max')
array([[1, 2]])
>>> manhattan_dist = lambda arr: np.sum(np.abs(arr), axis=1)
>>> matches = find_matches(arr0, arr1, tol=0.11, dist=manhattan_dist)
>>> matches
array([[0, 1], [0, 1], [2, 1]])
>>> indices0, indices1 = matches.T
>>> arr0[indices0]
array([[25, 24], [25, 24], [25, 26]])
Some profiling:
from timeit import default_timer as timer
import numpy as np
from close_numerical_matches import naive_find_matches, find_matches
arr0 = np.random.rand(320_000, 2)
arr1 = np.random.rand(44_000, 2)
start = timer()
naive_find_matches(arr0, arr1, tol=0.001)
end = timer()
print(end - start) # 255.335 s
start = timer()
find_matches(arr0, arr1, tol=0.001)
end = timer()
print(end - start) # 5.821 s
I am using the following example from :
import numpy as np
from scipy import spatial
x, y = np.mgrid[0:5, 2:8]
tree = spatial.KDTree(list(zip(x.ravel(), y.ravel())))
pts = np.array([[0, 0], [2.1, 2.9]])
idx = tree.query(pts)[1]
data = tree.data[??????????]
If I input two arbitrary points (see the variable pts), I would like to return all pairs of coordinates that lie within the rectangle defined by the two points (KDTree finds the closest neighbour of each point). So in this case:
array([[0, 0],
[0, 1],
[0, 2],
[1, 0],
[1, 1],
[1, 2],
[2, 0],
[2, 1],
[2, 2]])
How can I achieve that from the tree data?
Seems that I found a solution:
from scipy import spatial
import numpy as np
x, y = np.mgrid[0:5, 0:5]
tree = spatial.KDTree(list(zip(x.ravel(), y.ravel())))
pts = np.array([[0, 0], [2.1, 2.2]])
idx = tree.query(pts)[1]
data = tree.data[idx]  # the grid points nearest to the two input points
lo, hi = data.min(axis=0), data.max(axis=0)  # rectangle corners
rectangle = tree.data[np.all((tree.data >= lo) & (tree.data <= hi), axis=1)]
However, I would love to see a solution using the query option!
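One way to involve the tree's own queries is query_ball_point with p=np.inf: under the Chebyshev (max) norm a "ball" is an axis-aligned square, so a square that encloses the rectangle returns a superset of the wanted points, which a mask then trims to the exact rectangle. This is a sketch; points_in_rectangle is a hypothetical helper, and the small slack added to the radius only guards against floating-point ties on the boundary.

```python
import numpy as np
from scipy import spatial


def points_in_rectangle(tree, corner_a, corner_b):
    """Hypothetical helper: tree.data rows inside the box spanned by the
    grid points nearest to corner_a and corner_b (inclusive)."""
    _, idx = tree.query([corner_a, corner_b])
    lo = tree.data[idx].min(axis=0)
    hi = tree.data[idx].max(axis=0)
    center = (lo + hi) / 2
    # With p=inf the query region is a square of half-side r around center,
    # which contains the whole rectangle.
    r = (hi - lo).max() / 2 + 1e-9
    candidates = tree.data[tree.query_ball_point(center, r, p=np.inf)]
    # Exact filter for the (possibly non-square) rectangle.
    inside = np.all((candidates >= lo) & (candidates <= hi), axis=1)
    return candidates[inside]


x, y = np.mgrid[0:5, 0:5]
tree = spatial.KDTree(list(zip(x.ravel(), y.ravel())))
print(points_in_rectangle(tree, [0, 0], [2.1, 2.2]))
```

For long, thin rectangles the enclosing square returns many extra candidates, so the plain boolean mask may be just as fast there.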
I tried the example for the LMS algorithm:
import numpy as np
from neupy import algorithms
input_data = np.array([[1, 0], [2, 2], [3, 3], [0, 0]])
target_data = np.array([[1], [0], [0], [1]])
lmsnet = algorithms.LMS((2, 1), step=0.5)
lmsnet.train(input_data, target_data, epochs=200)
lmsnet.predict(np.array([[4, 4], [0, 0]]))
But I get an "OverflowError: cannot convert float infinity to integer" error on this line (in summary_info.py):
scale = math.ceil(self.delay_limit / average_delay)
I can't relate the example's input parameters to the error. I know that a division by zero happens there, but I can't figure out how to fix it, and I don't want to modify the library files to fix the problem.
Your example works perfectly fine for me.
You can work around this issue by training your network in a loop, like this:
import numpy as np
from neupy import algorithms
input_data = np.array([[1, 0], [2, 2], [3, 3], [0, 0]])
target_data = np.array([[1], [0], [0], [1]])
# Used smaller step since 0.5 is too big
lmsnet = algorithms.LMS((2, 1), step=0.1)
for _ in range(200):
    lmsnet.train(input_data, target_data, epochs=1)
lmsnet.predict(np.array([[4, 4], [0, 0]]))
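To see why a step of 0.5 can be "too big", here is a generic per-sample Widrow-Hoff LMS update on a linear unit. This is an illustration, not neupy's actual code (its LMS class may, for example, normalise the update differently); the function names are hypothetical. After one update on sample x, the error on that same sample is multiplied by (1 - step * ||x||^2), so for the input [3, 3] plus a bias input of 1 (||x||^2 = 19) a step of 0.5 multiplies the error by -8.5 instead of shrinking it.

```python
import numpy as np


def lms_update(w, x, target, step):
    """One generic Widrow-Hoff LMS step for a linear unit y = w @ x."""
    error = target - w @ x
    return w + step * error * x


def error_scale(x, step):
    """Factor applied to the error on sample x by one update: 1 - step * ||x||^2."""
    return 1.0 - step * (x @ x)


def train_lms(inputs, targets, step, epochs):
    # Append a constant 1 so the last weight acts as a bias term.
    X = np.hstack([np.asarray(inputs, dtype=float), np.ones((len(inputs), 1))])
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, t in zip(X, np.ravel(targets)):
            w = lms_update(w, x, t, step)
    return w


x = np.array([3.0, 3.0, 1.0])  # sample [3, 3] plus the bias input
print(error_scale(x, 0.5))  # -8.5: the error grows and flips sign
print(error_scale(x, 0.1))  # about -0.9: the error shrinks
```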
I need a sparse matrix to solve a problem, and according to the description of csr_matrix() in scipy here http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html#scipy.sparse.csr_matrix it fits my issue perfectly.
However, I cannot even initialize it.
When I use the empty matrix example from this doc http://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.sparse.csr_matrix.html it works fine, exactly as in the doc:
>>> import numpy as np
>>> from scipy.sparse import csr_matrix
>>> csr_matrix((3, 4), dtype=np.int8).toarray()
array([[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]], dtype=int8)
But when I use the example of a non-empty matrix, or try to fill it with my own data:
>>> row = np.array([0, 0, 1, 2, 2, 2])
>>> col = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> csr_matrix((data, (row, col)), shape=(3, 3)).toarray()
I always get this message:
/Library/Python/2.7/site-packages/numpy-1.9.2-py2.7-macosx-10.10-intel.egg/numpy/core/fromnumeric.py:2507: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see `numpy.linalg.matrix_rank`.
  VisibleDeprecationWarning)
What does it mean? I am completely stuck. Sorry for the basic question; I'm new to scipy and need help.
It is only a warning; I expect your matrix is still created.
Scipy is calling an old numpy function (rank). This was fixed in scipy in April 2014.
The scipy change is here:
https://github.com/scipy/scipy/commit/fa1782e04fdab91f672ccf7a4ebfb887de50f01c
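To confirm that the warning is harmless, you can simply inspect the matrix that was built. A minimal sketch using the same data as the question: in the (data, (row, col)) form each data[i] is stored at position (row[i], col[i]), and duplicate positions would be summed.

```python
import numpy as np
from scipy.sparse import csr_matrix

row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])

# m[row[i], col[i]] = data[i]
m = csr_matrix((data, (row, col)), shape=(3, 3))
print(m.nnz)  # 6 stored entries
print(m.toarray())
```

If the printed dense array matches what you expect, the matrix was created correctly despite the warning.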