I have a 2D numpy array "signals" of shape (100000, 1024). Each row contains the traces of amplitude of a signal, which I want to normalise to be within 0-1.
The signals each have different amplitudes, so I can't just divide by one common factor. Is there a way to normalise each of the signals so that every value within it is between 0 and 1?
Let's say that the signals look something like [[0,1,2,3,5,8,2,1],[0,2,5,10,7,4,2,1]] and I want them to become [[0,0.125,0.25,0.375,0.625,1,0.25,0.125],[0,0.2,0.5,1,0.7,0.4,0.2,0.1]].
Is there a way to do it without looping over all 100,000 signals, as this will surely be slow?
Thanks!
An easy thing to do would be to generate a new numpy array of per-row maxima and divide by it:
import numpy as np
a = np.array([[0,1,2,3,5,8,2,1],[0,2,5,10,7,4,2,1]])
b = np.max(a, axis = 1)
print(a / b[:,np.newaxis])
output:
[[0. 0.125 0.25 0.375 0.625 1. 0.25 0.125]
[0. 0.2 0.5 1. 0.7 0.4 0.2 0.1 ]]
Adding a little benchmark to show just how significant the performance difference between the two solutions is:
import numpy as np
import timeit
arr = np.arange(1024).reshape(128,8)
def using_list_comp():
    return np.array([s/np.max(s) for s in arr])

def using_vectorized_max_div():
    return arr/arr.max(axis=1)[:, np.newaxis]
result1 = using_list_comp()
result2 = using_vectorized_max_div()
print("Results equal:", (result1==result2).all())
time1 = timeit.timeit('using_list_comp()', globals=globals(), number=1000)
time2 = timeit.timeit('using_vectorized_max_div()', globals=globals(), number=1000)
print(time1)
print(time2)
print(time1/time2)
On my machine the output is:
Results equal: True
0.9873569
0.010177099999999939
97.01750989967731
Almost a 100x difference!
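One caveat (my own addition, not part of the benchmark above): if a trace could be all zeros, dividing by its maximum would produce NaNs. A guarded variant might look like:
import numpy as np

a = np.array([[0, 1, 2, 3, 5, 8, 2, 1],
              [0, 0, 0, 0, 0, 0, 0, 0]], dtype=float)  # second row is a hypothetical all-zero trace
row_max = a.max(axis=1, keepdims=True)
safe_max = np.where(row_max == 0, 1, row_max)  # avoid dividing by zero
print(a / safe_max)                            # all-zero rows simply stay all zeros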
Another solution is to use normalize:
from sklearn.preprocessing import normalize
data = [[0,1,2,3,5,8,2,1],[0,2,5,10,7,4,2,1]]
normalize(data, axis=1, norm='max')
result:
array([[0. , 0.125, 0.25 , 0.375, 0.625, 1. , 0.25 , 0.125],
[0. , 0.2 , 0.5 , 1. , 0.7 , 0.4 , 0.2 , 0.1 ]])
Please note the norm='max' argument; the default value is 'l2'.
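For comparison, here is a quick sketch of what the default 'l2' norm would have done with the same data (only norm='max' gives the per-row 0-1 scaling asked for here):
from sklearn.preprocessing import normalize

data = [[0,1,2,3,5,8,2,1],[0,2,5,10,7,4,2,1]]
# 'l2' divides each row by its Euclidean length, so the largest entry ends up below 1
print(normalize(data, axis=1, norm='l2'))
# 'max' divides each row by its largest absolute value, giving the desired 0-1 range
print(normalize(data, axis=1, norm='max'))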
I have two arrays, say:
A = np.array([[ 1. , 1. , 0.5 ],
[ 2. , 2. , 0.7 ],
[ 3. , 4. , 1.2 ],
[ 4. , 3. , 2.33],
[ 1. , 2. , 0.5 ],
[ 6. , 5. , 0.3 ],
[ 4. , 5. , 1.2 ],
[ 5. , 5. , 1.5 ]])
B = np.array([2,1])
I want to find all rows of A whose first two columns (the point coordinates) are not within a radius of 2 of B.
My answer should be:
C = [[3,4,1.2],[4,3,2.33],[6,5,0.3],[4,5,1.2],[5,5,1.5]]
Is there a pythonic way to do this?
What I have tried is:
radius = 2
C.append(np.extract((cdist(A[:, :2], B[np.newaxis]) > radius), A))
But I realized that np.extract flattens A, so I don't get the result I expected.
Let R be the radius here. We have a few methods to solve it, as discussed next.
Approach #1 : Using cdist -
from scipy.spatial.distance import cdist
A[(cdist(A[:,:2],B[None]) > R).ravel()]
Approach #2 : Using np.einsum -
d = A[:,:2] - B
out = A[np.einsum('ij,ij->i', d,d) > R**2]
Approach #3 : Using np.linalg.norm -
A[np.linalg.norm(A[:,:2] - B, axis=1) > R]
Approach #4 : Using matrix-multiplication with np.dot -
A[(A[:,:2]**2).sum(1) + (B**2).sum() - 2*A[:,:2].dot(B) > R**2]
Approach #5 : Using a combination of einsum and matrix-multiplication -
A[np.einsum('ij,ij->i',A[:,:2],A[:,:2]) + B.dot(B) - 2*A[:,:2].dot(B) > R**2]
Approach #6 : Using broadcasting -
A[((A[:,:2] - B)**2).sum(1) > R**2]
Conversely, to get the points within radius R, simply replace > with < in the above-mentioned solutions.
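As a self-contained check, here is a small sketch that runs one of the approaches (the broadcasting one) against the example data from the question:
import numpy as np

A = np.array([[1., 1., 0.5], [2., 2., 0.7], [3., 4., 1.2], [4., 3., 2.33],
              [1., 2., 0.5], [6., 5., 0.3], [4., 5., 1.2], [5., 5., 1.5]])
B = np.array([2, 1])
R = 2

# Approach #6: compare squared distances against the squared radius
C = A[((A[:, :2] - B)**2).sum(1) > R**2]
print(C)  # rows (3,4,1.2), (4,3,2.33), (6,5,0.3), (4,5,1.2), (5,5,1.5) -- the expected result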
Another useful approach not mentioned by @Divakar is to use a cKDTree:
from scipy.spatial import cKDTree
# Find indices of points within radius
radius = 2
indices = cKDTree(A[:, :2]).query_ball_point(B, radius)
# Construct a mask over these points
mask = np.zeros(len(A), dtype=bool)
mask[indices] = True
# Extract values not among the nearest neighbors
A[~mask]
The primary benefit is that it will be much faster than any direct approach as the size of the array increases, because the data structure avoids computing a distance for every point in A.
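Where this really pays off is when there are many query points: query_ball_point accepts an array of centres, so the tree is built once and reused. A small sketch with made-up query points:
import numpy as np
from scipy.spatial import cKDTree

A = np.array([[1., 1., 0.5], [2., 2., 0.7], [3., 4., 1.2], [4., 3., 2.33],
              [1., 2., 0.5], [6., 5., 0.3], [4., 5., 1.2], [5., 5., 1.5]])
radius = 2
tree = cKDTree(A[:, :2])                   # built once, queried many times
centres = np.array([[2., 1.], [5., 5.]])   # hypothetical query points

# query_ball_point returns one list of neighbour indices per centre
for centre, idx in zip(centres, tree.query_ball_point(centres, radius)):
    mask = np.zeros(len(A), dtype=bool)
    mask[idx] = True
    print(centre, A[~mask])                # rows of A outside the radius of this centre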
I'm trying to evaluate the probabilities of end locations of random walks, but I'm having some trouble with the speed of my program. Basically, what I'm trying to do is take as input a dictionary that contains the probabilities for a random walk (e.g. p = {0:0.5, 1:0.2, -1:0.3}, meaning there's a 50% probability X stays at 0, a 20% probability X increases by 1, and a 30% probability X decreases by 1) and then calculate the probabilities for all the possible future states after n iterations.
So, for example, if p = {0:0.5, 1:0.2, -1:0.3} and n = 2, it will return {0:0.37, 1:0.2, -1:0.3, 2:0.04, -2:0.09};
if p = {0:0.5, 1:0.2, -1:0.3} and n = 1, it will return {0:0.5, 1:0.2, -1:0.3}.
I have working code, and it runs relatively quickly if n is low and the p dictionary is small, but when n > 500 and the dictionary has around 50 values it takes upwards of 5 minutes to calculate. I'm guessing this is because it runs on only one processor, so I went ahead and modified it to use Python's multiprocessing module (as I read that multithreading doesn't improve parallel computing performance because of the GIL).
My problem is that there is not much improvement with multiprocessing, and I'm not sure whether that's because I'm implementing it wrong or because of the overhead of multiprocessing in Python. I'm just wondering if there's a library somewhere that evaluates all the probabilities of all the possible outcomes of a random walk in parallel when n > 500? My next step, if I can't find anything, is to write my own function as an extension in C, but it will be my first time doing that, and although I've coded in C before, it has been a while.
Original non-multiprocessed code
def random_walk_predictor(probabilities_tree, period):
    ret = probabilities_tree
    probabilities_leaves = ret.copy()
    for x in range(period):
        tmp = {}
        for leaf in ret.keys():
            for tree_leaf in probabilities_leaves.keys():
                try:
                    tmp[leaf + tree_leaf] = (ret[leaf] * probabilities_leaves[tree_leaf]) + tmp[leaf + tree_leaf]
                except KeyError:
                    tmp[leaf + tree_leaf] = ret[leaf] * probabilities_leaves[tree_leaf]
        ret = tmp
    return ret
Multiprocessed code
from multiprocessing import Manager, Pool
from functools import partial

def probability_calculator(origin, probability, outp, reference):
    for leaf in probability.keys():
        try:
            outp[origin + leaf] = outp[origin + leaf] + (reference[origin] * probability[leaf])
        except KeyError:
            outp[origin + leaf] = reference[origin] * probability[leaf]

def random_walk_predictor(probabilities_leaves, period):
    probabilities_leaves = tree_developer(probabilities_leaves)
    manager = Manager()
    prob_leaves = manager.dict(probabilities_leaves)
    ret = manager.dict({0: 1})
    p = Pool()
    for x in range(period):
        out = manager.dict()
        partial_probability_calculator = partial(probability_calculator, probability=prob_leaves, outp=out, reference=ret.copy())
        p.map(partial_probability_calculator, ret.keys())
        ret = out
    return ret.copy()
There tend to be analytic solutions to exactly solve this kind of problem that look similar to binomial distributions, but I'll assume you're really asking for a computational solution for a more general class of problem.
Rather than using python dictionaries, it's easier to think about this in terms of the underlying mathematical problem. Build a matrix A that describes the probability of going from one state to another. Build a state x that describes the probability of being at a given location at some time.
Because after n transitions you can step at most n steps from the origin (in either direction), your state needs to have 2n+1 rows, and A needs to be square, of size 2n+1 by 2n+1.
For a two timestep problem your transition matrix will be 5x5 and look like:
[[ 0.5 0.2 0. 0. 0. ]
[ 0.3 0.5 0.2 0. 0. ]
[ 0. 0.3 0.5 0.2 0. ]
[ 0. 0. 0.3 0.5 0.2]
[ 0. 0. 0. 0.3 0.5]]
And your state at time 0 will be:
[[ 0.]
[ 0.]
[ 1.]
[ 0.]
[ 0.]]
The one step evolution of the system can be predicted by multiplying A and x.
So at t = 1,
x.T = [[ 0. 0.2 0.5 0.3 0. ]]
and at t = 2,
x.T = [[ 0.04 0.2 0.37 0.3 0.09]]
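Here is a dense numpy sketch of those two multiplications, just to make the matrix picture concrete before switching to the sparse version below:
import numpy as np

A = np.array([[0.5, 0.2, 0.0, 0.0, 0.0],
              [0.3, 0.5, 0.2, 0.0, 0.0],
              [0.0, 0.3, 0.5, 0.2, 0.0],
              [0.0, 0.0, 0.3, 0.5, 0.2],
              [0.0, 0.0, 0.0, 0.3, 0.5]])
x = np.array([0.0, 0.0, 1.0, 0.0, 0.0])  # all probability at the origin

x1 = A @ x
x2 = A @ x1
print(x1)  # [0.   0.2  0.5  0.3  0.  ]
print(x2)  # [0.04 0.2  0.37 0.3  0.09]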
Because for even modest numbers of timesteps this potentially takes a fair bit of storage (A has on the order of n^2 entries) but A is very sparse, we can use sparse matrices to reduce our storage (and speed up our calculations). Doing this means A stores only about three elements per row.
import scipy.sparse as sp
import numpy as np

def random_walk_transition_probability(n, left=0.3, centre=0.5, right=0.2):
    m = 2*n + 1
    A = sp.csr_matrix((m, m))
    A += sp.diags(centre*np.ones(m), 0)
    A += sp.diags(left*np.ones(m-1), -1)
    A += sp.diags(right*np.ones(m-1), 1)
    x = np.zeros((m, 1))
    x[n] = 1.0
    for i in range(n):
        x = A.dot(x)
    return x

print(random_walk_transition_probability(4))
Timings
%timeit random_walk_transition_probability(500)
100 loops, best of 3: 7.12 ms per loop
%timeit random_walk_transition_probability(10000)
1 loops, best of 3: 1.06 s per loop
I want to compute the epipolar lines of a stereo camera.
I know both camera intrinsics matrix as well as R and T.
I tried to compute the essential matrix as described in the Learning OpenCV book and on Wikipedia:
E = R*S
where S = [t]x is the matrix representation of the cross product with t, and then the fundamental matrix as
F = (M2^-1).T * E * M1^-1
I tried to implement this with python and then use the opencv function cv2.computeCorrespondEpilines to compute the epilines.
The problem is that the lines I get don't converge at a point as they should...
I guess I must have a problem computing F.
This is the relevant piece of code:
T #Contains translation vector
R #Rotation matrix
S=np.mat([[0,-T[2],T[1]],[T[2],0,-T[1]],[-T[1],T[0],0]])
E=np.mat(R)*S
M1=np.mat(self.getCameraMatrix(cam1))
M1_inv=np.linalg.inv(M1)
M2=np.mat(self.getCameraMatrix(cam2))
M2_inv=np.linalg.inv(M2)
F=(M2_inv.T)*E*M1_inv
The matrices are:
M1=[[ 776.21275864 0. 773.70733324]
[ 0. 776.21275864 627.82872456]
[ 0. 0. 1. ]]
M2=[[ 764.35675708 0. 831.26052677]
[ 0. 764.35675708 611.85363745]
[ 0. 0. 1. ]]
R=[[ 0.9999902 0.00322032 0.00303674]
[-0.00387935 0.30727176 0.9516139 ]
[ 0.0021314 -0.95161636 0.30728124]]
T=[ 0.0001648 0.04149158 -0.02854541]
The output F I get is something like:
F=[[ 4.75910592e-07 6.28777619e-08 -2.78886982e-04]
[ -4.66942275e-08 -7.62837993e-08 -7.34825205e-04]
[ -8.86965149e-04 -6.86717269e-04 1.40633035e+00]]
EDITED:
The cross-product matrix was wrong; it has to be:
S=np.mat([[0,-T[2],T[1]],[T[2],0,-T[0]],[-T[1],T[0],0]])
The epilines converge now at the epipole.
Hmm, your F matrix seems wrong - to begin with, its rank is closer to 3 than to the required 2.
From your data I get:
octave:9> tx = [ 0 -T(3) T(2)
> T(3) 0 -T(1)
> -T(2) T(1) 0]
tx =
0.000000 0.028545 0.041492
-0.028545 0.000000 -0.000165
-0.041492 0.000165 0.000000
octave:11> E= R* tx
E =
-2.1792e-04 2.8546e-02 4.1491e-02
-4.8255e-02 4.6088e-05 -2.1160e-04
1.4415e-02 1.1148e-04 2.4526e-04
octave:12> F=inv(M1')*E*inv(M2)
F =
-3.6731e-10 4.8113e-08 2.4320e-05
-8.1333e-08 7.7681e-11 6.7289e-05
7.0206e-05 -3.7128e-05 -7.6583e-02
octave:14> rank(F)
ans = 2
Which seems to make more sense. Can you try that F matrix in your plotting code?
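For reference, here is the same computation as a numpy sketch, using the corrected cross-product matrix and the M1, M2, R, T values from the question:
import numpy as np

M1 = np.array([[776.21275864, 0., 773.70733324],
               [0., 776.21275864, 627.82872456],
               [0., 0., 1.]])
M2 = np.array([[764.35675708, 0., 831.26052677],
               [0., 764.35675708, 611.85363745],
               [0., 0., 1.]])
R = np.array([[0.9999902, 0.00322032, 0.00303674],
              [-0.00387935, 0.30727176, 0.9516139],
              [0.0021314, -0.95161636, 0.30728124]])
T = np.array([0.0001648, 0.04149158, -0.02854541])

def skew(t):
    # matrix representation of the cross product with t
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

E = R @ skew(T)                                   # essential matrix
F = np.linalg.inv(M1).T @ E @ np.linalg.inv(M2)   # fundamental matrix
print(np.linalg.matrix_rank(F))                   # 2, as it should be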
EDIT: Paul has solved this one below. Thanks!
I'm trying to resample (upscale) a 3x3 matrix to 5x5, filling in the intermediate points with either interpolate.interp2d or interpolate.RectBivariateSpline (or whatever works).
If there's a simple, existing function to do this, I'd like to use it, but I haven't found it yet. For example, a function that would work like:
# upscale 2x2 to 4x4
matrixSmall = ([[-1,8],[3,5]])
matrixBig = matrixSmall.resample(4,4,cubic)
So, if I start with a 3x3 matrix / array:
0,-2,0
-2,11,-2
0,-2,0
I want to compute a new 5x5 matrix ("I" meaning interpolated value):
0, I[1,0], -2, I[3,0], 0
I[0,1], I[1,1], I[2,1], I[3,1], I[4,1]
-2, I[1,2], 11, I[3,2], -2
I[0,3], I[1,3], I[2,3], I[3,3], I[4,3]
0, I[1,4], -2, I[3,4], 0
I've been searching and reading up and trying various different test code, but I haven't quite figured out the correct syntax for what I'm trying to do. I'm also not sure if I need to be using meshgrid, mgrid or linspace in certain lines.
EDIT: Fixed and working, thanks to Paul:
import numpy, scipy
from scipy import interpolate
kernelIn = numpy.array([[0,-2,0],
[-2,11,-2],
[0,-2,0]])
inKSize = len(kernelIn)
outKSize = 5
kernelOut = numpy.zeros((outKSize,outKSize),numpy.uint8)
x = numpy.array([0,1,2])
y = numpy.array([0,1,2])
z = kernelIn
xx = numpy.linspace(x.min(),x.max(),outKSize)
yy = numpy.linspace(y.min(),y.max(),outKSize)
newKernel = interpolate.RectBivariateSpline(x,y,z, kx=2,ky=2)
kernelOut = newKernel(xx,yy)
print(kernelOut)
Only two small problems:
1) Your xx,yy is outside the bounds of x,y (you can extrapolate, but I'm guessing you don't want to.)
2) Your sample size is too small for a kx and ky of 3 (default). Lower it to 2 and get a quadratic fit instead of cubic.
import numpy, scipy
from scipy import interpolate
kernelIn = numpy.array([
[0,-2,0],
[-2,11,-2],
[0,-2,0]])
inKSize = len(kernelIn)
outKSize = 5
kernelOut = numpy.zeros((outKSize,outKSize),numpy.uint8)
x = numpy.array([0,1,2])
y = numpy.array([0,1,2])
z = kernelIn
xx = numpy.linspace(x.min(),x.max(),outKSize)
yy = numpy.linspace(y.min(),y.max(),outKSize)
newKernel = interpolate.RectBivariateSpline(x,y,z, kx=2,ky=2)
kernelOut = newKernel(xx,yy)
print(kernelOut)
##[[ 0. -1.5 -2. -1.5 0. ]
## [ -1.5 5.4375 7.75 5.4375 -1.5 ]
## [ -2. 7.75 11. 7.75 -2. ]
## [ -1.5 5.4375 7.75 5.4375 -1.5 ]
## [ 0. -1.5 -2. -1.5 0. ]]
If you are using scipy already, I think scipy.ndimage.zoom can do what you need:
import numpy
import scipy.ndimage
a = numpy.array([[0.,-2.,0.], [-2.,11.,-2.], [0.,-2.,0.]])
out = numpy.round(scipy.ndimage.zoom(input=a, zoom=(5./3), order=2), 1)
print(out)
#[[ 0. -1. -2. -1. 0. ]
# [ -1. 1.8 4.5 1.8 -1. ]
# [ -2. 4.5 11. 4.5 -2. ]
# [ -1. 1.8 4.5 1.8 -1. ]
# [ 0. -1. -2. -1. 0. ]]
Here the "zoom factor" is 5./3 because we are going from a 3x3 array to a 5x5 array. If you read the docs, it says that you can also specify the zoom factor independently for the two axes, which means you can upscale non-square matrices as well. By default, it uses third order spline interpolation, which I am not sure is best.
I tried it on some images and it works nicely.
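For instance, upscaling to a non-square output just means passing one factor per axis; a small sketch (not from the original answer, showing only the resulting shape):
import numpy
import scipy.ndimage

a = numpy.array([[0., -2., 0.], [-2., 11., -2.], [0., -2., 0.]])
# zoom 3x3 -> 5x7: factor 5/3 along the rows, 7/3 along the columns
out = scipy.ndimage.zoom(a, zoom=(5./3, 7./3), order=2)
print(out.shape)  # (5, 7)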