How to append to a list but skip a line? - python

I am trying to store values of length 150 into a list array, but I want to skip a line each iteration. This is what I have, which doesn't work: freq_data_1 has shape (150,), which I append to freq_data, but when I try to skip to the next line, it won't work. Any suggestions?
import numpy as np
import matplotlib.pyplot as plt
from scipy import pi
from scipy.fftpack import fft
freq_data = []
freq_data_2 = []
for i in range(len(video_samples)):
    freq_data_1 = fft(video_samples[i,:])
    freq_data.append(freq_data_1[i])
freq_data_2 = '\n'.join(freq_data)
My video_samples is an array of shape (4000, 150), meaning I have 4000 signals, each 150 time steps long. I want my output to be the same size as this, but storing the frequency output.
video_samples is a collection of signals with slightly varying frequency for each signal/row, e.g.
Input:
[0.775 0.3223 0.4613 0.2619 0.4012 0.567
0.908 0.4223 0.5128 0.489 0.318 0.187]
The first row is one of my signals, of length 6. The second row is another signal of length 6. Each of these signals represents a frequency with added noise.
I wish to take each row separately, use the FFT on it to obtain the frequency of that signal and then store it in a matrix where each row would represent the FFT of that signal.
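In other words, the desired operation is a row-wise FFT. For reference, scipy.fftpack.fft accepts an axis argument, so a minimal sketch of that operation (on stand-in random data, since the real video_samples isn't shown) needs no loop at all:
import numpy as np
from scipy.fftpack import fft

# stand-in for the real (4000, 150) data
video_samples = np.random.rand(4000, 150)

# FFT of every row at once; axis=-1 transforms along each 150-step signal
freq_data = fft(video_samples, axis=-1)
print(freq_data.shape)  # (4000, 150), complex-valued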

Another guess...
import numpy as np # it's not necessary for this snippet actually
def fft(lst): return [x*2 for x in lst] # just for example
# 2d array, just a guess
video_samples = [
[0.775, 0.3223, 0.4613, 0.2619, 0.4012, 0.567],
[0.908, 0.4223, 0.5128, 0.489, 0.318, 0.187],
[0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
[0.7, 0.8, 0.9, 0.1, 0.2, 0.3]
]
video_samples = np.array(video_samples, dtype = 'float') # 2d list to ndarray just for example
print('video samples (input?): \n', video_samples)
matrix1 = []
matrix2 = []
for s1, s2 in zip(video_samples[::2], video_samples[1::2]):
    matrix1.append(fft(s1))
    matrix2.append(fft(s2))
matrix1 = np.array(matrix1, dtype = 'float') # just for example
matrix2 = np.array(matrix2, dtype = 'float') # just for example
print('\nmatrix1:\n', matrix1)
print('\nmatrix2:\n', matrix2)
Output:
video samples (input?):
[[0.775 0.3223 0.4613 0.2619 0.4012 0.567 ]
[0.908 0.4223 0.5128 0.489 0.318 0.187 ]
[0.1 0.2 0.3 0.4 0.5 0.6 ]
[0.7 0.8 0.9 0.1 0.2 0.3 ]]
matrix1:
[[1.55 0.6446 0.9226 0.5238 0.8024 1.134 ]
[0.2 0.4 0.6 0.8 1. 1.2 ]]
matrix2:
[[1.816 0.8446 1.0256 0.978 0.636 0.374 ]
[1.4 1.6 1.8 0.2 0.4 0.6 ]]
Five guys over two (or more?) days couldn't work out what you mean. Amazing.

Related

Dynamically normalise 2D numpy array

I have a 2D numpy array "signals" of shape (100000, 1024). Each row contains the traces of amplitude of a signal, which I want to normalise to be within 0-1.
The signals each have different amplitudes, so I can't just divide by one common factor. Is there a way to normalise each of the signals so that every value within them is between 0 and 1?
Let's say that the signals look something like [[0,1,2,3,5,8,2,1],[0,2,5,10,7,4,2,1]] and I want them to become [[0,0.125,0.25,0.375,0.625,1,0.25,0.125],[0,0.2,0.5,1,0.7,0.4,0.2,0.1]].
Is there a way to do it without looping over all 100,000 signals, as this will surely be slow?
Thanks!
The easy thing to do would be to generate a new numpy array with the max value of each row and divide by it:
import numpy as np
a = np.array([[0,1,2,3,5,8,2,1],[0,2,5,10,7,4,2,1]])
b = np.max(a, axis = 1)
print(a / b[:,np.newaxis])
output:
[[0. 0.125 0.25 0.375 0.625 1. 0.25 0.125]
[0. 0.2 0.5 1. 0.7 0.4 0.2 0.1 ]]
Adding a little benchmark to show just how significant the performance difference between the two solutions is:
import numpy as np
import timeit
arr = np.arange(1024).reshape(128,8)
def using_list_comp():
    return np.array([s/np.max(s) for s in arr])
def using_vectorized_max_div():
    return arr/arr.max(axis=1)[:, np.newaxis]
result1 = using_list_comp()
result2 = using_vectorized_max_div()
print("Results equal:", (result1==result2).all())
time1 = timeit.timeit('using_list_comp()', globals=globals(), number=1000)
time2 = timeit.timeit('using_vectorized_max_div()', globals=globals(), number=1000)
print(time1)
print(time2)
print(time1/time2)
On my machine the output is:
Results equal: True
0.9873569
0.010177099999999939
97.01750989967731
Almost a 100x difference!
Another solution is to use normalize:
from sklearn.preprocessing import normalize
data = [[0,1,2,3,5,8,2,1],[0,2,5,10,7,4,2,1]]
normalize(data, axis=1, norm='max')
result:
array([[0. , 0.125, 0.25 , 0.375, 0.625, 1. , 0.25 , 0.125],
[0. , 0.2 , 0.5 , 1. , 0.7 , 0.4 , 0.2 , 0.1 ]])
Please note the norm='max' argument; the default value is 'l2'.
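A quick sketch of why that argument matters (toy data; normalize works row-wise by default):
from sklearn.preprocessing import normalize
import numpy as np
data = np.array([[0, 1, 2, 3, 5, 8, 2, 1]], dtype=float)
print(normalize(data, norm='max'))  # divides the row by its max, so the peak becomes 1.0
print(normalize(data))              # default norm='l2': the row is scaled to unit Euclidean length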

Pythonic way to remove elements from Numpy array closer than threshold

What is the best way to remove the minimal number of elements from a sorted Numpy array so that the minimal distance among the remaining is always bigger than a certain threshold?
For example, if the threshold is 1, the following sequence [0.1, 0.5, 1.1, 2.5, 3.] will become [0.1, 1.1, 2.5]. The 0.5 is removed because it is too close to 0.1 but then 1.1 is preserved because it is far enough from 0.1.
My current code:
import numpy as np
MIN_DISTANCE = 1
a = np.array([0.1, 0.5, 1.1, 2.5, 3.])
for i in range(len(a)-1):
    if(a[i+1] - a[i] < MIN_DISTANCE):
        a[i+1] = a[i]
a = np.unique(a)
a
array([0.1, 1.1, 2.5])
Is there a more efficient way to do so?
Note that my question is similar to Remove values from numpy array closer to each other but not exactly the same.
You could use numpy.ufunc.accumulate to iterate through adjacent pairs of the array instead of using the for loop.
The numpy.add.accumulate example or itertools.accumulate probably shows best what it's doing.
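For instance, the same "carry the last kept value forward" idea can be sketched in plain Python with itertools.accumulate (a toy illustration of what the ufunc version below does):
from itertools import accumulate
MIN_DISTANCE = 1
a = [0.1, 0.5, 1.1, 2.5, 3.0]
# each step keeps the previous value if the current one is too close to it
acc = list(accumulate(a, lambda prev, cur: cur if cur - prev >= MIN_DISTANCE else prev))
print(acc)  # [0.1, 0.1, 1.1, 2.5, 2.5] -> np.unique would give [0.1, 1.1, 2.5]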
Along with numpy.frompyfunc, your condition can be applied as a ufunc (universal function).
Code: (with an extended array to cross check some additional cases, but works with your array as well)
import numpy as np
MIN_DISTANCE = 1
a = np.array([0.1, 0.5, 0.6, 0.7, 1.1, 2.5, 3., 4., 6., 6.1])
print("original: \n" + str(a))
def my_py_function(arr1, arr2):
    if (arr2 - arr1 < MIN_DISTANCE):
        arr2 = arr1
    return arr2
my_np_function = np.frompyfunc(my_py_function, 2, 1)
# dtype=object replaces the deprecated np.object alias
my_np_function.accumulate(a, dtype=object, out=a).astype(float)
print("complete: \n" + str(a))
a = np.unique(a)
print("unique: \n" + str(a))
Result:
original:
[0.1 0.5 0.6 0.7 1.1 2.5 3. 4. 6. 6.1]
complete:
[0.1 0.1 0.1 0.1 1.1 2.5 2.5 4. 6. 6. ]
unique:
[0.1 1.1 2.5 4. 6. ]
Concerning execution time, timeit shows a crossover at an array length of about 20:
your code is much faster (relatively) for your array length of 5,
whereas for array lengths well above 20 the accumulate option speeds up considerably (~35% less time at array length 300).

Why is my gaussian np.array not symmetric?

I am trying to write a function that returns an np.array of size nx x ny containing a centered gaussian distribution with mean mu and sd sig. It works in principle, as shown below, but the result is not completely symmetric. This is not a problem for larger nx x ny, but for smaller ones it is obvious that something is not quite right in my implementation...
For:
create2dGaussian (1, 1, 5, 5)
It outputs:
[[ 0. 0.2 0.3 0.1 0. ]
[ 0.2 0.9 1. 0.5 0. ]
[ 0.3 1. 1. 0.6 0. ]
[ 0.1 0.5 0.6 0.2 0. ]
[ 0. 0. 0. 0. 0. ]]
... which is not symmetric. For larger nx and ny, a 3D plot looks perfectly fine/smooth, but why are the detailed numerics not correct, and how can I fix it?
import numpy as np
def create2dGaussian(mu, sigma, nx, ny):
    x, y = np.meshgrid(np.linspace(-nx/2, +nx/2+1, nx), np.linspace(-ny/2, +ny/2+1, ny))
    d = np.sqrt(x*x+y*y)
    g = np.exp(-((d-mu)**2 / (2.0 * sigma**2)))
    np.set_printoptions(precision=1, suppress=True)
    print(g.shape)
    print(g)
    return g
----- EDIT -----
While the solution described below fixes the problem mentioned in the headline (a non-symmetric distribution), this code also has some other issues that are discussed here.
Numpy's linspace is inclusive of both endpoints by default, unlike range, so you don't need to add one to the right edge. I'd also recommend only dividing by floats, just to be safe:
x, y = np.meshgrid(np.linspace(-nx/2.0, +nx/2.0,nx), np.linspace(-ny/2.0, +ny/2.0,ny))
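Putting it together, a minimal corrected version of the whole function (keeping the original formulation, in which mu shifts the peak radially outward from the center) could look like:
import numpy as np
def create2dGaussian(mu, sigma, nx, ny):
    # linspace includes both endpoints, so the grid is symmetric about 0
    x, y = np.meshgrid(np.linspace(-nx/2.0, nx/2.0, nx),
                       np.linspace(-ny/2.0, ny/2.0, ny))
    d = np.sqrt(x*x + y*y)
    return np.exp(-((d - mu)**2 / (2.0 * sigma**2)))
g = create2dGaussian(1, 1, 5, 5)
print(np.allclose(g, g.T))  # True: the output is now symmetric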

To find the N maximum indices of a numpy array whose corresponding values are greater than M in another array

I have 3 numpy arrays, each of length 107952899.
Let's say:
1. Time = [2.14579526e+08 2.14579626e+08 2.14579726e+08 ...1.10098692e+10 1.10098693e+10]
2. Speed = [0.66 0.66 0.66 .............0.06024864 0.06014756]
3. Brak_press = [0.3, 0.3, 0.3 .............. 0.3, 0.3]
What it means
Each index in Time corresponds to the same index in the Speed & Brake arrays.
Time Speed Brake
2.14579526e+08 0.66 0.3
.
.
Requirement
No 1: Find the indices in the Speed array whose values are greater than 20.
No 2: For those indices, get the corresponding values in the Brake array.
No 3: Find the indices of the top N maximum values among those Brake values and store them in another list/array.
So finally, if I take one index from the top N maximum indices and use it in the Brake & Speed arrays, it must hold that
Brake[idx] is a valid value and, more importantly, Speed[idx] > 20.
General Summary
Simply put, what I need is to find the indices of the N maximum brake values whose corresponding speed values are greater than 20.
What I tried
speed_20 = np.where(Speed > 20) # I got the indices as a tuple
brake_values = Brake[speed_20] # found the Brake values corresponding to the speed_20 indices
After that I tried argsort/argpartition, but none of the results matches my requirement.
Request
I believe there will be a better method to do this. Kindly shed some light.
(I converted the above numpy arrays to a pandas DataFrame and it works fine, but due to memory concerns I prefer to do this using numpy operations.)
You are almost there. This should do what you want:
speed_20 = np.where(Speed > 20)[0]
sort = np.argsort(-Brake[speed_20])
result = speed_20[sort[:N]]
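To see it work on a small example (toy numbers rather than the real 107952899-element arrays):
import numpy as np
Speed = np.array([10., 25., 30., 5., 40.])
Brake = np.array([0.2, 0.7, 0.1, 0.9, 0.5])
N = 2
speed_20 = np.where(Speed > 20)[0]   # -> [1 2 4]
sort = np.argsort(-Brake[speed_20])  # order those indices by descending brake value
result = speed_20[sort[:N]]
print(result)         # [1 4]
print(Brake[result])  # [0.7 0.5], and Speed at both indices is > 20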
Maybe this is an option you can consider, using NumPy.
First create a multidimensional matrix (I changed the values so it's easier to follow):
Time = [ 2, 1, 5, 4, 3]
Speed = [ 10, 20, 40, 30, 50]
Brak_press = [0.1, 0.3, 0.5, 0.4, 0.2]
data = np.array([Time, Speed, Brak_press]).transpose()
So data are stored as:
print(data)
# [[ 2. 10. 0.1]
# [ 1. 20. 0.3]
# [ 5. 40. 0.5]
# [ 4. 30. 0.4]
# [ 3. 50. 0.2]]
To extract the rows with speed greater than 20:
data[data[:,1] > 20]
# [[ 5. 40. 0.5]
# [ 4. 30. 0.4]
# [ 3. 50. 0.2]]
To get the n greatest Brak_press:
n = 2
data[data[:,2].argsort()[::-1][:n]]
# [[ 5. 40. 0.5]
# [ 4. 30. 0.4]]

Efficiently select subsection of numpy array

I want to split a numpy array into three different arrays based on a logical comparison. The numpy array I want to split is called x. Its shape looks as follows, but its entries vary: (In response to Saullo Castro's comment I included a slightly different array x.)
array([[ 0.46006547, 0.5580928 , 0.70164242, 0.84519205, 1.4 ],
[ 0.00912908, 0.00912908, 0.05 , 0.05 , 0.05 ]])
The values of this array are monotonically increasing along each row. I also have two other arrays called lowest_gridpoints and highest_gridpoints. The entries of these arrays also vary, but their shape is always identical to the following:
array([ 0.633, 0.01 ]), array([ 1.325, 0.99 ])
The selection procedure I want to apply is as follows:
All columns containing values lower than any value in lowest_gridpoints should be removed from x and constitute the array temp1.
All columns containing values higher than any value in highest_gridpoints should be removed from x and constitute the array temp2.
All columns of x that are included in neither temp1 nor temp2 constitute the array x_new.
The following code I wrote achieves the task.
if np.any( x[:,-1] > highest_gridpoints ) or np.any( x[:,0] < lowest_gridpoints ):
    for idx, sample in enumerate(x.T):
        if np.any( sample > highest_gridpoints ):
            max_idx = idx
            break
        elif np.any( sample < lowest_gridpoints ):
            min_idx = idx
temp1, temp2 = np.array([[],[]]), np.array([[],[]])
if 'min_idx' in locals():
    temp1 = x[:,0:min_idx+1]
if 'max_idx' in locals():
    temp2 = x[:,max_idx:]
if 'min_idx' in locals() or 'max_idx' in locals():
    if 'min_idx' not in locals():
        min_idx = -1
    if 'max_idx' not in locals():
        max_idx = x.shape[1]
    x_new = x[:,min_idx+1:max_idx]
However, I suspect that this code is very inefficient because of the heavy use of loops. Additionally, I think the syntax is bloated. Does someone have an idea for code that achieves the task outlined above more efficiently or more concisely?
This addresses only the first part of your question:
from numpy import *
x = array([[ 0.46006547, 0.5580928 , 0.70164242, 0.84519205, 1.4 ],
[ 0.00912908, 0.00912908, 0.05 , 0.05 , 0.05 ]])
low, high = array([ 0.633, 0.01 ]), array([ 1.325, 0.99 ])
# construct an array of two rows of bools expressing your conditions
indices1 = array((x[0,:]<low[0], x[1,:]<low[1]))
print(indices1)
# do an or of the values along the first axis
indices1 = any(indices1, axis=0)
# now it's a single row array
print(indices1)
# use the indices1 to extract what you want,
# the double transposition because the elements
# of a 2d array are the rows
tmp1 = x.T[indices1].T
print(tmp1)
# [[ True True False False False]
# [ True True False False False]]
# [ True True False False False]
# [[ 0.46006547 0.5580928 ]
# [ 0.00912908 0.00912908]]
Next, construct indices2 and tmp2 similarly; the indices of the remainder are the negation of the OR of the first two index arrays (i.e., numpy.logical_not(numpy.logical_or(indices1, indices2))), as sketched below.
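For completeness, a self-contained sketch of that second half (same x, low and high as above):
import numpy as np
x = np.array([[ 0.46006547, 0.5580928 , 0.70164242, 0.84519205, 1.4 ],
              [ 0.00912908, 0.00912908, 0.05 , 0.05 , 0.05 ]])
low, high = np.array([ 0.633, 0.01 ]), np.array([ 1.325, 0.99 ])
indices1 = np.any((x[0,:] < low[0], x[1,:] < low[1]), axis=0)
indices2 = np.any((x[0,:] > high[0], x[1,:] > high[1]), axis=0)
tmp1 = x.T[indices1].T
tmp2 = x.T[indices2].T
# columns belonging to neither tmp1 nor tmp2
x_new = x.T[np.logical_not(np.logical_or(indices1, indices2))].T
print(x_new)
# [[0.70164242 0.84519205]
#  [0.05       0.05      ]]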
Addendum
Another approach, possibly faster if you have thousands of entries, implies numpy.searchsorted
from numpy import *
x = array([[ 0.46006547, 0.5580928 , 0.70164242, 0.84519205, 1.4 ],
[ 0.00912908, 0.00912908, 0.05 , 0.05 , 0.05 ]])
low, high = array([ 0.633, 0.01 ]), array([ 1.325, 0.99 ])
l0r = searchsorted(x[0,:], low[0], side='right')
l1r = searchsorted(x[1,:], low[1], side='right')
h0l = searchsorted(x[0,:], high[0], side='left')
h1l = searchsorted(x[1,:], high[1], side='left')
lr = max(l0r, l1r)
hl = min(h0l, h1l)
print(lr, hl)
print(x[:,:lr])
print(x[:,lr:hl])
print(x[:,hl])
# 2 4
# [[ 0.46006547 0.5580928 ]
# [ 0.00912908 0.00912908]]
# [[ 0.70164242 0.84519205]
# [ 0.05 0.05 ]]
# [ 1.4 0.05]
Overlaps can be excluded with hl = max(lr, hl). NB: in the previous approach the array slices are copied to new objects; here you get views on x, and you have to be explicit if you want new objects.
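For example, to turn the middle slice into an independent array rather than a view (using the lr and hl from the snippet above):
x_new = x[:, lr:hl].copy()  # an owned copy; later writes to x won't affect x_new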
Edit An unnecessary optimization
If we use only the upper part of x in the second pair of searchsorted calls (if you look at the code you'll see what I mean...), we get two benefits: 1) a very small speedup of the searches (searchsorted is always fast enough) and 2) the case of overlap is automatically managed.
As a bonus, code for copying the segments of x into new arrays. NB: x was changed to force an overlap.
from numpy import *
# I changed x to force overlap
x = array([[ 0.46006547, 1.4 , 1.4, 1.4, 1.4 ],
[ 0.00912908, 0.00912908, 0.05, 0.05, 0.05 ]])
low, high = array([ 0.633, 0.01 ]), array([ 1.325, 0.99 ])
l0r = searchsorted(x[0,:], low[0], side='right')
l1r = searchsorted(x[1,:], low[1], side='right')
lr = max(l0r, l1r)
h0l = searchsorted(x[0,lr:], high[0], side='left')
h1l = searchsorted(x[1,lr:], high[1], side='left')
hl = min(h0l, h1l) + lr
t1 = x[:,range(lr)]
xn = x[:,range(lr,hl)]
ncol = shape(x)[1]
t2 = x[:,range(hl,ncol)]
print(x)
del(x)
print()
print(t1)
print()
# note that xn is an empty array
print(xn)
print()
print(t2)
# [[ 0.46006547 1.4 1.4 1.4 1.4 ]
# [ 0.00912908 0.00912908 0.05 0.05 0.05 ]]
#
# [[ 0.46006547 1.4 ]
# [ 0.00912908 0.00912908]]
#
# []
#
# [[ 1.4 1.4 1.4 ]
# [ 0.05 0.05 0.05]]
