Dynamically normalise 2D numpy array - python

I have a 2D numpy array "signals" of shape (100000, 1024). Each row contains the traces of amplitude of a signal, which I want to normalise to be within 0-1.
The signals each have different amplitudes, so I can't just divide by one common factor. Is there a way to normalise each signal individually so that every value in it lies between 0 and 1?
Let's say that the signals look something like [[0,1,2,3,5,8,2,1],[0,2,5,10,7,4,2,1]] and I want them to become [[0,0.125,0.25,0.375,0.625,1,0.25,0.125],[0,0.2,0.5,1,0.7,0.4,0.2,0.1]].
Is there a way to do it without looping over all 100,000 signals, as this will surely be slow?
Thanks!

An easy approach is to build an array of the row-wise maxima (the max along axis 1) and divide by it:
import numpy as np
a = np.array([[0,1,2,3,5,8,2,1],[0,2,5,10,7,4,2,1]])
b = np.max(a, axis = 1)
print(a / b[:,np.newaxis])
output:
[[0. 0.125 0.25 0.375 0.625 1. 0.25 0.125]
[0. 0.2 0.5 1. 0.7 0.4 0.2 0.1 ]]
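
One caveat with dividing by the row maximum: a row that is all zeros would produce a division by zero. A minimal sketch of a guarded version follows, assuming the signals are non-negative (as in the example) and that an all-zero row should simply stay all zeros. The third row and the names row_max and normalised are made up for illustration.

```python
import numpy as np

# Two rows from the question plus a hypothetical all-zero row.
a = np.array([[0, 1, 2, 3, 5, 8, 2, 1],
              [0, 2, 5, 10, 7, 4, 2, 1],
              [0, 0, 0, 0, 0, 0, 0, 0]], dtype=float)

# keepdims=True keeps the shape (3, 1), so it broadcasts against (3, 8).
row_max = a.max(axis=1, keepdims=True)

# Divide only where the row maximum is non-zero; other entries keep the
# value already stored in `out` (zero).
normalised = np.divide(a, row_max, out=np.zeros_like(a), where=row_max != 0)
print(normalised)
```

This is still fully vectorized, so it should scale to the (100000, 1024) array in the same way as the plain division.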

Adding a small benchmark to show just how significant the performance difference is between a per-row list comprehension and the vectorized division:
import numpy as np
import timeit
arr = np.arange(1024).reshape(128,8)
def using_list_comp():
    return np.array([s/np.max(s) for s in arr])
def using_vectorized_max_div():
    return arr/arr.max(axis=1)[:, np.newaxis]
result1 = using_list_comp()
result2 = using_vectorized_max_div()
print("Results equal:", (result1==result2).all())
time1 = timeit.timeit('using_list_comp()', globals=globals(), number=1000)
time2 = timeit.timeit('using_vectorized_max_div()', globals=globals(), number=1000)
print(time1)
print(time2)
print(time1/time2)
On my machine the output is:
Results equal: True
0.9873569
0.010177099999999939
97.01750989967731
Almost a 100x difference!

Another solution is to use normalize from scikit-learn:
from sklearn.preprocessing import normalize
data = [[0,1,2,3,5,8,2,1],[0,2,5,10,7,4,2,1]]
normalize(data, axis=1, norm='max')
result:
array([[0. , 0.125, 0.25 , 0.375, 0.625, 1. , 0.25 , 0.125],
[0. , 0.2 , 0.5 , 1. , 0.7 , 0.4 , 0.2 , 0.1 ]])
Please note the norm='max' argument; the default value is 'l2'.

Related

How to append to a list but skip a line?

I am trying to store values of length 150 into a list array, but I want to skip a line on each iteration. This is what I have, which doesn't work. freq_data_1 has size (150,), and appending it to freq_data works, but when I try to skip to the next line it won't work. Any suggestions?
import numpy as np
import matplotlib.pyplot as plt
from scipy import pi
from scipy.fftpack import fft
freq_data = []
freq_data_2 = []
for i in range(len(video_samples)):
    freq_data_1 = fft(video_samples[i,:])
    freq_data.append(freq_data_1[i])
freq_data_2 = '\n'.join(freq_data)
My video_samples is an array of (4000,150) meaning I have 4000 signals of length in time of 150 steps. I want my output to be the same size as this but storing the frequency output.
Video_samples is a collection of signals with slightly varying frequency for each signal/row. e.g.
Input:
[0.775 0.3223 0.4613 0.2619 0.4012 0.567
0.908 0.4223 0.5128 0.489 0.318 0.187]
The first row is one of my signals of length 6. The second row is another signal of length 6. Each of these signals represents a frequency with added noise.
I wish to take each row separately, use the FFT on it to obtain the frequency of that signal and then store it in a matrix where each row would represent the FFT of that signal.
Another guess...
import numpy as np # it's not necessary for this snippet actually
def fft(lst): return [x*2 for x in lst] # just for example
# 2d array, just a guess
video_samples = [
    [0.775, 0.3223, 0.4613, 0.2619, 0.4012, 0.567],
    [0.908, 0.4223, 0.5128, 0.489, 0.318, 0.187],
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9, 0.1, 0.2, 0.3]
]
video_samples = np.array(video_samples, dtype = 'float') # 2d list to ndarray just for example
print('video samples (input?): \n', video_samples)
matrix1 = []
matrix2 = []
for s1, s2 in zip(video_samples[::2], video_samples[1::2]):
    matrix1.append(fft(s1))
    matrix2.append(fft(s2))
matrix1 = np.array(matrix1, dtype = 'float') # just for example
matrix2 = np.array(matrix2, dtype = 'float') # just for example
print('\nmatrix1:\n', matrix1)
print('\nmatrix2:\n', matrix2)
Output:
video samples (input?):
[[0.775 0.3223 0.4613 0.2619 0.4012 0.567 ]
[0.908 0.4223 0.5128 0.489 0.318 0.187 ]
[0.1 0.2 0.3 0.4 0.5 0.6 ]
[0.7 0.8 0.9 0.1 0.2 0.3 ]]
matrix1:
[[1.55 0.6446 0.9226 0.5238 0.8024 1.134 ]
[0.2 0.4 0.6 0.8 1. 1.2 ]]
matrix2:
[[1.816 0.8446 1.0256 0.978 0.636 0.374 ]
[1.4 1.6 1.8 0.2 0.4 0.6 ]]
Five guys for two (or more?) days can't get what you mean. Amazing.
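
If the goal is what the question's last paragraph describes (one FFT per row, stored in a matrix with the same shape as the input), no loop or list is needed. Here is a minimal sketch, assuming video_samples is a 2D NumPy array with one signal per row. The small array is made-up stand-in data, and np.fft.fft is used here instead of scipy.fftpack.fft.

```python
import numpy as np

# Hypothetical stand-in for the real (4000, 150) array: 4 signals, 6 samples each.
video_samples = np.array([
    [0.775, 0.3223, 0.4613, 0.2619, 0.4012, 0.567],
    [0.908, 0.4223, 0.5128, 0.489, 0.318, 0.187],
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9, 0.1, 0.2, 0.3],
])

# axis=1 transforms along the time axis, i.e. each row independently.
freq_data = np.fft.fft(video_samples, axis=1)

# The result is complex; take the magnitude if only the spectrum is needed.
magnitudes = np.abs(freq_data)

print(freq_data.shape)  # (4, 6)
```

Row i of freq_data is the FFT of row i of video_samples, so the output has the same shape as the input.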

Pythonic way to remove elements from Numpy array closer than threshold

What is the best way to remove the minimal number of elements from a sorted Numpy array so that the minimal distance among the remaining is always bigger than a certain threshold?
For example, if the threshold is 1, the following sequence [0.1, 0.5, 1.1, 2.5, 3.] will become [0.1, 1.1, 2.5]. The 0.5 is removed because it is too close to 0.1 but then 1.1 is preserved because it is far enough from 0.1.
My current code:
import numpy as np
MIN_DISTANCE = 1
a = np.array([0.1, 0.5, 1.1, 2.5, 3.])
for i in range(len(a)-1):
    if(a[i+1] - a[i] < MIN_DISTANCE):
        a[i+1] = a[i]
a = np.unique(a)
a
array([0.1, 1.1, 2.5])
Is there a more efficient way to do so?
Note that my question is similar to Remove values from numpy array closer to each other but not exactly the same.
You could use numpy.ufunc.accumulate to iterate through adjacent pairs of the array instead of the for loop.
The numpy.add.accumulate example or itertools.accumulate probably shows best what it is doing.
Together with numpy.frompyfunc, your condition can be applied as a ufunc (universal function).
Code: (with an extended array to cross check some additional cases, but works with your array as well)
import numpy as np
MIN_DISTANCE = 1
a = np.array([0.1, 0.5, 0.6, 0.7, 1.1, 2.5, 3., 4., 6., 6.1])
print("original: \n" + str(a))
def my_py_function(arr1, arr2):
    if(arr2 - arr1 < MIN_DISTANCE):
        arr2 = arr1
    return arr2
my_np_function = np.frompyfunc(my_py_function, 2, 1)
# np.object was removed in NumPy 1.24; the builtin object is the equivalent dtype
a = my_np_function.accumulate(a, dtype=object).astype(float)
print("complete: \n" + str(a))
a = np.unique(a)
print("unique: \n" + str(a))
Result:
original:
[0.1 0.5 0.6 0.7 1.1 2.5 3. 4. 6. 6.1]
complete:
[0.1 0.1 0.1 0.1 1.1 2.5 2.5 4. 6. 6. ]
unique:
[0.1 1.1 2.5 4. 6. ]
Concerning execution time, timeit shows a crossover at an array length of about 20.
Your code is much faster (in relative terms) for your array length of 5,
whereas for array lengths much larger than 20 the accumulate option speeds up considerably (~35% in time for array length 300).
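
For comparison, the same greedy rule (keep an element only if it is at least the threshold away from the last kept one) can also be written as a single pass that collects indices, without overwriting values or calling np.unique. This is only a sketch: thin_sorted is a hypothetical helper name, the input is assumed to be sorted in ascending order, and no timing claim is made for it.

```python
import numpy as np

def thin_sorted(values, min_distance):
    """Greedy left-to-right pass over ascending values.

    Keeps the first element, then keeps each later element only if it is
    at least min_distance away from the most recently kept one.
    """
    values = np.asarray(values, dtype=float)
    if values.size == 0:
        return values
    keep = [0]
    last = values[0]
    for i in range(1, values.size):
        if values[i] - last >= min_distance:
            keep.append(i)
            last = values[i]
    return values[keep]

a = np.array([0.1, 0.5, 1.1, 2.5, 3.])
thinned = thin_sorted(a, 1)
print(thinned)
```

Unlike the loop in the question, this leaves the input array untouched and returns a new one.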

Why is my gaussian np.array not symmetric?

I am trying to write a function that returns an np.array of size nx x ny that contains a centered gaussian distribution with mean mu and sd sig. It works in principle like below but the problem is that the result is not completely symmetric. This is not a problem for larger nx x ny but for smaller ones it is obvious that something is not quite right in my implementation ...
For:
create2dGaussian (1, 1, 5, 5)
It outputs:
[[ 0. 0.2 0.3 0.1 0. ]
[ 0.2 0.9 1. 0.5 0. ]
[ 0.3 1. 1. 0.6 0. ]
[ 0.1 0.5 0.6 0.2 0. ]
[ 0. 0. 0. 0. 0. ]]
... which is not symmetric. For larger nx and ny a 3d plot looks perfectly fine/smooth but why are the detailed numerics not correct and how can I fix it?
import numpy as np
def create2dGaussian (mu, sigma, nx, ny):
    x, y = np.meshgrid(np.linspace(-nx/2, +nx/2+1,nx), np.linspace(-ny/2, +ny/2+1,ny))
    d = np.sqrt(x*x+y*y)
    g = np.exp(-((d-mu)**2 / ( 2.0 * sigma**2 )))
    np.set_printoptions(precision=1, suppress=True)
    print(g.shape)
    print(g)
    return g
----- EDIT -----
While the below described solution works for the problem mentioned in the headline (non-symmetric distribution) this code has also some other issues that are discussed here.
NumPy's linspace includes both endpoints by default, unlike range, so you don't need to add one to the right-hand side. I'd also recommend dividing only by floats, just to be safe:
x, y = np.meshgrid(np.linspace(-nx/2.0, +nx/2.0,nx), np.linspace(-ny/2.0, +ny/2.0,ny))
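
A quick way to confirm the fix is to check the result against its own mirror images. Below is a minimal sketch with the corrected grid. create_2d_gaussian is just the question's function renamed for illustration and with the printing removed.

```python
import numpy as np

def create_2d_gaussian(mu, sigma, nx, ny):
    # Symmetric grid: linspace already includes both endpoints.
    x, y = np.meshgrid(np.linspace(-nx / 2.0, nx / 2.0, nx),
                       np.linspace(-ny / 2.0, ny / 2.0, ny))
    d = np.sqrt(x * x + y * y)
    return np.exp(-((d - mu) ** 2 / (2.0 * sigma ** 2)))

g = create_2d_gaussian(1, 1, 5, 5)

# The array should equal itself when flipped up/down or left/right.
print(np.allclose(g, g[::-1, :]))
print(np.allclose(g, g[:, ::-1]))
```

With the original +nx/2+1 upper bound the grid is shifted off centre, so the same two checks would fail.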

How to make a ufunc output a matrix given two array_like operands (instead of trying to broadcast them)?

I would like to get a matrix of values given two ndarray's from a ufunc, for example:
degs = numpy.array(range(5))
pnts = numpy.array([0.0, 0.1, 0.2])
values = scipy.special.eval_chebyt(degs, pnts)
The above code doesn't work (it gives a ValueError because it tries to broadcast two arrays and fails since they have different shapes: (5,) and (3,)); I would like to get a matrix of values with rows corresponding to degrees and columns to points at which polynomials are evaluated (or vice versa, it doesn't matter).
Currently my workaround is simply to use for-loop:
values = numpy.zeros((5,3))
for j in range(5):
    values[j] = scipy.special.eval_chebyt(j, pnts)
Is there a way to do that? In general, how would you let a ufunc know you want an n-dimensional array if you have n array_like arguments?
I know about numpy.vectorize, but that seems neither faster nor more elegant than just a simple for-loop (and I'm not even sure you can apply it to an existent ufunc).
UPDATE: What about ufuncs that take 3 or more parameters? Trying the outer method gives a ValueError: outer product only supported for binary functions. One example is scipy.special.eval_jacobi.
What you need is exactly the outer method of ufuncs:
ufunc.outer(A, B, **kwargs)
Apply the ufunc op to all pairs (a, b) with a in A and b in B.
values = scipy.special.eval_chebyt.outer(degs, pnts)
#array([[ 1. , 1. , 1. ],
# [ 0. , 0.1 , 0.2 ],
# [-1. , -0.98 , -0.92 ],
# [-0. , -0.296 , -0.568 ],
# [ 1. , 0.9208, 0.6928]])
UPDATE
For more parameters, you must broadcast by hand. meshgrid often helps with that, spanning each parameter along its own dimension. For example:
n=3
alpha = numpy.array(range(5))
beta = numpy.array(range(3))
x = numpy.array(range(2))
data = numpy.meshgrid(n,alpha,beta,x)
values = scipy.special.eval_jacobi(*data)
Reshape the input arguments for broadcasting. In this case, change the shape of degs to be (5, 1) instead of just (5,). The shape (5, 1) broadcast with the shape (3,) results in the shape (5, 3):
In [185]: import numpy as np
In [186]: import scipy.special
In [187]: degs = np.arange(5).reshape(-1, 1) # degs has shape (5, 1)
In [188]: pnts = np.array([0.0, 0.1, 0.2])
In [189]: values = scipy.special.eval_chebyt(degs, pnts)
In [190]: values
Out[190]:
array([[ 1. , 1. , 1. ],
[ 0. , 0.1 , 0.2 ],
[-1. , -0.98 , -0.92 ],
[-0. , -0.296 , -0.568 ],
[ 1. , 0.9208, 0.6928]])
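
The same reshaping idea extends to ufuncs with more than two arguments, where outer is not available: give each array argument its own axis and let broadcasting build the grid. Here is a minimal sketch with scipy.special.eval_jacobi(n, alpha, beta, x). The input values are arbitrary illustration data, and beta is kept as a scalar for brevity.

```python
import numpy as np
import scipy.special

n = np.arange(3)                     # degrees, shape (3,)
alpha = np.array([0.0, 0.5])         # shape (2,)
beta = 1.0                           # scalar, broadcasts everywhere
x = np.array([0.0, 0.1, 0.2, 0.3])   # evaluation points, shape (4,)

# Shapes (3, 1, 1), (1, 2, 1), () and (4,) broadcast to (3, 2, 4).
values = scipy.special.eval_jacobi(
    n[:, np.newaxis, np.newaxis],
    alpha[np.newaxis, :, np.newaxis],
    beta,
    x,
)
print(values.shape)  # (3, 2, 4)
```

Here values[i, j, k] corresponds to degree n[i], parameter alpha[j] and point x[k].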

How do I use scipy.interpolate.splrep to interpolate a curve?

Using some experimental data, I cannot for the life of me work out how to use splrep to create a B-spline. The data are here: http://ubuntuone.com/4ZFyFCEgyGsAjWNkxMBKWD
Here is an excerpt:
#Depth Temperature
1 14.7036
-0.02 14.6842
-1.01 14.7317
-2.01 14.3844
-3 14.847
-4.05 14.9585
-5.03 15.9707
-5.99 16.0166
-7.05 16.0147
and here's a plot of it with depth on y and temperature on x:
Here is my code:
import numpy as np
from scipy.interpolate import splrep, splev
tdata = np.genfromtxt('t-data.txt',
                      skip_header=1, delimiter='\t')
depth = tdata[:, 0]
temp = tdata[:, 1]
# Find the B-spline representation of 1-D curve:
tck = splrep(depth, temp)
### fails here with "Error on input data" returned. ###
I know I am doing something bleedingly stupid, but I just can't see it.
You just need to have your values sorted from smallest to largest :). It shouldn't be a problem for you, @a different ben, but readers from the future beware: depth[indices] will throw a TypeError if depth is a list instead of a numpy array!
>>> indices = np.argsort(depth)
>>> depth = depth[indices]
>>> temp = temp[indices]
>>> splrep(depth, temp)
(array([-7.05, -7.05, -7.05, -7.05, -5.03, -4.05, -3. , -2.01, -1.01,
1. , 1. , 1. , 1. ]), array([ 16.0147 , 15.54473241, 16.90606794, 14.55343229,
15.12525673, 14.0717599 , 15.19657895, 14.40437622,
14.7036 , 0. , 0. , 0. , 0. ]), 3)
Hat tip to @FerdinandBeyer for the suggestion of argsort instead of my ugly "zip the values, sort the zip, re-assign the values" method.
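
To round this off, here is a minimal end-to-end sketch using only the nine rows from the excerpt in the question (not the full data file): sort by depth, fit with splrep, then evaluate with splev. The s=0 argument is passed explicitly to request an interpolating spline, and depth_fine is a made-up evaluation grid.

```python
import numpy as np
from scipy.interpolate import splrep, splev

# The excerpt from the question: depth and temperature columns.
depth = np.array([1, -0.02, -1.01, -2.01, -3, -4.05, -5.03, -5.99, -7.05])
temp = np.array([14.7036, 14.6842, 14.7317, 14.3844, 14.847,
                 14.9585, 15.9707, 16.0166, 16.0147])

# splrep needs x in ascending order, so sort both arrays by depth.
order = np.argsort(depth)
depth_sorted = depth[order]
temp_sorted = temp[order]

# Cubic B-spline through the points (s=0 means no smoothing).
tck = splrep(depth_sorted, temp_sorted, s=0)

# Evaluate the spline on a finer grid between the shallowest and deepest points.
depth_fine = np.linspace(depth_sorted[0], depth_sorted[-1], 50)
temp_fine = splev(depth_fine, tck)
```

Plotting temp_fine against depth_fine should then give a smooth curve through the measured points.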
