My script calculates the location error by evaluating a set of equations for different values of x and y and stores the output in an initially empty list t_error. However, there are two issues that need to be resolved:
1: How to store the output in a 20-by-20 matrix instead of a 400-by-1 array.
2: How to make a contour plot (error surface) from x, y, and the output parameter, which is t_error in our case.
The sample script is below:
import pandas as pd
import numpy as np

ev_loc = pd.read_csv("test_grid.txt", sep='\t', header=None)
x = np.array(ev_loc[1])
y = np.array(ev_loc[0])
v = 3.5
t_error = []
for s in x:
    for t in y:
        for i, j, k in [[73.9, 33.1, 1.268571], [73.5, 33.1, 1.268571], [73.4, 33.1, 2.854286], [73.7, 33.2, 0.317143], [73.7, 33.0, 0.317143]]:
            u = ((np.sqrt((t - j)**2 + (s - i)**2) / v) * 111 - k)
            sq = u * u  # squared residual; a separate name so the velocity v is not overwritten
            t_error.append(float(sq))
df_hr = pd.DataFrame(t_error)
numbers = np.array(df_hr)
window_size = 5
i = 0
moving_averages = []
while i < len(numbers) - window_size + 1:
    this_window = numbers[i : i + window_size]
    window_sum = sum(this_window)  # sum of the 5 station residuals for one (x, y) point
    moving_averages.append(window_sum)
    i += window_size
Error = pd.DataFrame(moving_averages)
Error.to_csv('test_total_error.csv')
print(Error)
The data of test_grid.txt is generated as below:
import matplotlib.pyplot as plt

x1 = np.linspace(73, 75, num=41)
y1 = np.linspace(33, 35, num=41)
v = 3.5
t_error = []
for i, j, k in [[71.91500, 33.82850, 57.2], [72.32200, 33.16267, 38.28], [72.57900, 33.61317, 37.48], [73.44883, 33.83300, 27.8], [71.52967, 33.15267, 58.8],
                [73.27017, 33.65167, 18.44], [73.14017, 33.75200, 29.97], [72.46550, 32.63183, 39.98], [73.22900, 32.99867, 14.77], [72.67167, 31.92100, 48.71],
                [71.91817, 32.53983, 54.73], [71.92333, 33.04400, 49.67], [71.74417, 32.79617, 57.39]]:
    u = ((np.sqrt((y1 - j)**2 + (x1 - i)**2) / v) * 111 - k)
    c = np.sum(u)
    t_error.append(c)
plt.plot(t_error)
plt.show()
What is the error supposed to show?
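A minimal sketch of both fixes, assuming x and y each contain 20 distinct values so that the summed errors form a 400-element list (the reshape order follows the loop nesting, for s in x outside and for t in y inside):

import numpy as np
import matplotlib.pyplot as plt

t_err = np.array(moving_averages).reshape(20, 20)  # rows follow s (x), columns follow t (y)
xs = np.unique(x)  # the 20 distinct x values
ys = np.unique(y)  # the 20 distinct y values
plt.contourf(xs, ys, t_err.T)  # transpose so the first plot axis is y
plt.colorbar(label='total squared error')
plt.show()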
I am trying to make a program that tells me when a note has been played.
I have the following notes exported as a .wav file (the C major scale played 4 times with different rhythms and dynamics, and in different octaves):
I can get the volumes of my sound file using the following code:
from scipy.io import wavfile

def get_volume(file):
    sr, data = wavfile.read(file)
    if data.ndim > 1:
        data = data[:, 0]
    return data

volumes = get_volume("FILE")
Here is some information about the output:
Max: 27851
Min: -25664
Mean: -0.7569383391943734
A Sample from the array: [ -7987 -8615 -8983 -9107 -9019 -8750 -8324 -7752 -7033 -6156
-5115 -3920 -2610 -1245 106 1377 2520 3515 4364 5077
5659 6113 6441 6639 6708 6662 6518 6288 5962 5525
4963 4265 3420 2418 1264 -27 -1429 -2901 -4388 -5814
-7101 -8186 -9028 -9614 -9955 -10077 -10012 -9785 -9401 -8846]
And here is what I get when I plot the volumes array (x is the index, y is the volume):
I want to get the indices of the start and end of the notes, like the ones in the image (done by hand, not accurate):
When I looked at the data I realized that it is a 1D array, and I also noticed that when a note gets louder or quieter the change is not smooth; it zigzags, but there is still a trend. So I can't just take the gradient (slope) at each point. Instead I thought about grouping the samples into batches, computing the average gradient of each batch, and doing the calculations with that, like so:
def get_average_gradient(arr):
    # Calculates average gradient
    return sum([i - (sum(arr) / len(arr)) for i in arr]) / len(arr)

def get_note_start_end(arr_size, batch_size, arr):
    # Finds start and end indices
    ranges = []
    curr_range = [0]
    prev_slope = curr_slope = "NO SLOPE"
    has_ended = False
    for i, j in enumerate(arr):
        if j > 0:
            curr_slope = "INCREASING"
        elif j < 0:
            curr_slope = "DECREASING"
        else:
            curr_slope = "NO SLOPE"
        if prev_slope == "DECREASING" and not has_ended:
            if i == len(arr) - 1 or arr[i + 1] < 0:
                if curr_slope != "DECREASING":
                    curr_range.append((i + 1) * batch_size + batch_size)
                    ranges.append(curr_range)
                    curr_range = [(i + 1) * batch_size + batch_size + 1]
                    has_ended = True
        if has_ended and curr_slope == "INCREASING":
            has_ended = False
        prev_slope = curr_slope
    ranges[-1][-1] = arr_size - 1
    return ranges

def get_notes(batch_size, arr):
    # Gets the gradients of the batches
    out = []
    for i in range(0, len(arr), batch_size):
        if i + batch_size > len(arr):
            gradient = get_average_gradient(arr[i:])
        else:
            gradient = get_average_gradient(arr[i: i + batch_size])
        # print(gradient, i)
        out.append(gradient)
    return get_note_start_end(len(arr), batch_size, out)

notes = get_notes(128, volumes)
The problem with this is that if the batch size is too small, it returns the indices of small peaks which aren't notes on their own. If the batch size is too big, the program misses the start and end indices.
I also tried to get the notes by using silence detection.
Here is the code I used:
from pydub import AudioSegment, silence

audio = AudioSegment.from_wav("C - Major - Test.wav")
dBFS = audio.dBFS
notes = silence.detect_nonsilent(audio, min_silence_len=50, silence_thresh=dBFS - 10)
This worked the best, but it still wasn't good enough. Here is what I got:
It found some notes pretty well, but it wasn't able to identify notes accurately if they didn't become very quiet before a different one was played (like in the second scale and in the fourth scale).
I have been thinking about this problem for days and have tried most, if not all, of the good(?) ideas I had. I am new to analysing audio files. Maybe I am using the wrong data for what I want to do. Maybe I need to use the frequency data (I tried getting it, but couldn't make sense of it).
Frequency code:
import numpy as np
from scipy.fft import rfft, rfftfreq
from scipy.io import wavfile
import matplotlib.pyplot as plt

def get_freq(file, start_time, end_time):
    sr, data = wavfile.read(file)
    if data.ndim > 1:
        data = data[:, 0]
    # Keep only the requested time window (start_time/end_time were unused before)
    data = data[int(start_time * sr): int(end_time * sr)]
    # Fourier transform
    N = len(data)
    yf = rfft(data)
    xf = rfftfreq(N, 1 / sr)
    return xf, np.abs(yf)  # magnitude spectrum; the rfft output is complex

FILE = "C - Major - Test.wav"
plt.plot(*get_freq(FILE, 0, 10))
plt.show()
And the frequency graph:
And here is the .wav file:
https://drive.google.com/file/d/1CERH-eovu20uhGoV1_O3B2Ph-4-uXpiP/view?usp=sharing
Any help is appreciated :)
I think this is what you need:
First, convert the negative samples to positive ones and smooth the curve to eliminate noise; to find the lower peaks, work with the negated signal.
from scipy.io import wavfile
import matplotlib.pyplot as plt
from scipy.signal import find_peaks, savgol_filter
import numpy as np

def get_volume(file):
    sr, data = wavfile.read(file)
    if data.ndim > 1:
        data = data[:, 0]
    return data

v1 = abs(get_volume("test.wav"))

# Smooth the curve (the window length must be odd)
volumes = savgol_filter(v1, 10001, 3)
lv = volumes * -1

# Find peaks
peaks, _ = find_peaks(volumes, distance=8000, prominence=300)
lpeaks, _ = find_peaks(lv, distance=8000, prominence=300)

# Plot them
plt.plot(volumes)
plt.plot(peaks, volumes[peaks], "x")
plt.plot(lpeaks, volumes[lpeaks], "o")
plt.plot(np.zeros_like(volumes), "--", color="gray")
plt.show()
Plot with your test file; x marks the high peaks and o marks the low peaks.
This article presents two python libraries (Aubio, librosa) to achieve what you need and includes examples of how to use them: How to Use Python to Detect Music Onsets by Lynn Zheng
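For reference, a minimal onset-detection sketch with librosa along the lines of that article (the file name is taken from the question; the parameters are the library defaults, not tuned):

import librosa

y, sr = librosa.load("C - Major - Test.wav")
onset_frames = librosa.onset.onset_detect(y=y, sr=sr)
onset_times = librosa.frames_to_time(onset_frames, sr=sr)
print(onset_times)  # estimated note-start times in seconds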
I have an array of magnetometer data with artifacts every two hours due to power cycling.
I'd like to replace those indices with NaN so that the length of the array is preserved.
Here's a code example, adapted from https://www.kdnuggets.com/2017/02/removing-outliers-standard-deviation-python.html.
import numpy as np
import plotly.express as px
# For pulling data from CDAweb:
from ai import cdas
import datetime

# Import data:
start = datetime.datetime(2016, 1, 24, 0, 0, 0)
end = datetime.datetime(2016, 1, 25, 0, 0, 0)
data = cdas.get_data(
    'sp_phys',
    'THG_L2_MAG_' + 'PG2',
    start,
    end,
    ['thg_mag_' + 'pg2']
)
x = data['UT']
y = data['VERTICAL_DOWN_-_Z']

def reject_outliers(y):  # y is the data in a 1D numpy array
    n = 5  # 5 std deviations
    mean = np.mean(y)
    sd = np.std(y)
    final_list = [x for x in y if (x > mean - 2 * sd)]
    final_list = [x for x in final_list if (x < mean + 2 * sd)]
    return final_list

px.scatter(reject_outliers(y))
print('Length of y: ')
print(len(y))
print('Length of y with outliers removed (should be the same): ')
print(len(reject_outliers(y)))
px.line(y=y, x=x)
# px.scatter(y) # It looks like the outliers are successfully dropped.
# px.line(y=reject_outliers(y), x=x) # This is the line I'd like to see work.
When I run px.scatter(reject_outliers(y)), it looks like the outliers are successfully getting dropped:
...but that plot shows the culled y vector against the index, rather than against the datetime vector x as in the plot above. As the debugging text indicates, the vector is shortened because the outlier values are dropped rather than replaced.
How can I edit my reject_outliers() function to assign those values to NaN, or to adjacent values, so that the length of the array stays the same and I can plot my data?
Use else in the list comprehension along the lines of:
[x if x_condition else other_value for x in y]
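For this question that could look like (a sketch reusing mean, sd, and n from the question's reject_outliers):

final_list = [x if abs(x - mean) <= n * sd else np.nan for x in y]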
Got a less compact version to work. Full code:
import numpy as np
import plotly.express as px
# For pulling data from CDAweb:
from ai import cdas
import datetime

# Import data:
start = datetime.datetime(2016, 1, 24, 0, 0, 0)
end = datetime.datetime(2016, 1, 25, 0, 0, 0)
data = cdas.get_data(
    'sp_phys',
    'THG_L2_MAG_' + 'PG2',
    start,
    end,
    ['thg_mag_' + 'pg2']
)
x = data['UT']
y = data['VERTICAL_DOWN_-_Z']

def reject_outliers(y):  # y is the data in a 1D numpy array
    mean = np.mean(y)
    sd = np.std(y)
    final_list = np.copy(y)
    for n in range(len(y)):
        final_list[n] = y[n] if y[n] > mean - 5 * sd else np.nan
        final_list[n] = final_list[n] if final_list[n] < mean + 5 * sd else np.nan
    return final_list

px.scatter(reject_outliers(y))
print('Length of y: ')
print(len(y))
print('Length of y with outliers removed (should be the same): ')
print(len(reject_outliers(y)))
# px.line(y=y, x=x)
px.line(y=reject_outliers(y), x=x)  # This is the line I wanted to get working - check!
More compact answer, sent via email by a friend:
In numpy you can select/index based on a Boolean array, and then make assignment with it:
def reject_outliers(y):  # y is the data in a 1D numpy array
    n = 5  # 5 std deviations
    mean = np.mean(y)
    sd = np.std(y)
    final_list = y.copy()
    final_list[np.abs(y - mean) > n * sd] = np.nan
    return final_list
I also noticed that you didn’t use the value of n in your example code.
Alternatively, you can use the where method (https://numpy.org/doc/stable/reference/generated/numpy.where.html)
np.where(np.abs(y - mean) > n * sd, np.nan, y)
You don’t need the .copy() if you don’t mind modifying the input array.
Replace np.mean and np.std with np.nanmean and np.nanstd if you want the function to work on arrays that already contain nans, i.e. if you want to use this function recursively.
The answer about using if else in a list comprehension would work, but avoiding the list comprehension makes the function much faster if the arrays are large.
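For example, a sketch of that NaN-tolerant variant, combining np.where with np.nanmean/np.nanstd:

def reject_outliers(y, n=5):
    mean = np.nanmean(y)  # ignores NaNs already present in y
    sd = np.nanstd(y)
    return np.where(np.abs(y - mean) > n * sd, np.nan, y)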
I have two loops that run over different x and y coordinates, and for each (x, y) coordinate a linear system is solved for force 1 and force 2 using the matrix method, i.e. finding the inverse of A in Ax = C. Each iteration gives the answer as a matrix whose first element is force 1 and whose second element is force 2 at those specific coordinates. Here's my code:
import numpy as np
from scipy import linalg

def Force():
    Force1 = np.zeros((160, 90))
    Force2 = np.zeros((160, 90))
    for x in np.arange(0, 16.1, 0.1):
        for y in np.arange(1, 9.1, 0.1):
            l1 = np.hypot(x, y)
            l2 = np.hypot(15 - x, y)
            A = np.array([[(x / l1), ((x - 15) / l2)], [(y / l1), (y / l2)]])
            c = np.array([[0], [70 * 9.81]])
            F = linalg.solve(A, c)
            Force1[x, y] = F[0]
            Force2[x, y] = F[1]
            print("Force 1 = {} \nForce 2 = {}\n".format(F[0], F[1]))
so at each point (x, y) a matrix [[Force 1], [Force 2]] is solved. Now I would like to collect all the Force1 values into Force1[x, y], and similarly all the Force2 values, so that I can do
plt.imshow(Force1)
plt.imshow(Force2)
to plot two heatmaps. How would I go about doing that?
This solves your issue - you were trying to index Force1 and Force2 with values of type float. I've changed the for loops to use enumerate instead, and tweaked the assignments so they assign F[0][0] and F[1][0].
import numpy as np
from scipy import linalg
import matplotlib.pyplot as plt

def Force():
    Force1 = np.zeros((160, 90))
    Force2 = np.zeros((160, 90))
    for i, x in enumerate(np.arange(0, 16, 0.1)):
        for j, y in enumerate(np.arange(1, 9, 0.1)):
            l1 = np.hypot(x, y)
            l2 = np.hypot(15 - x, y)
            A = np.array([[(x / l1), ((x - 15) / l2)], [(y / l1), (y / l2)]])
            c = np.array([[0], [70 * 9.81]])
            F = linalg.solve(A, c)
            Force1[i, j] = F[0][0]
            Force2[i, j] = F[1][0]
            # print("Force 1 = {} \nForce 2 = {}\n".format(F[0], F[1]))
    plt.imshow(Force1)
    plt.show()
    plt.imshow(Force2)
    plt.show()

Force()
The generated plots are the two heatmaps, one for Force1 and one for Force2.
I am converting mesh-grid points (a 2D maze) into an adjacency matrix, which I will use later to find the shortest path between given coordinates. Please have a look at my code:
import numpy as np
import numpy.matlib  # for matrices

def cutblocks(xv, yv, allBlocks):
    x_new, y_new = np.copy(xv), np.copy(yv)
    ori_x, ori_y = xv[0][0], yv[0][0]
    for block in allBlocks:
        block = sorted(block)
        block_xmin = np.min((block[0][0], block[2][0]))
        block_xmax = np.max((block[0][0], block[2][0]))
        block_ymin = np.min((block[0][1], block[1][1]))
        block_ymax = np.max((block[0][1], block[1][1]))
        rx_min, rx_max = int((block_xmin - ori_x) / stepSize) + 1, int((block_xmax - ori_x) / stepSize) + 1
        ry_min, ry_max = int((block_ymin - ori_y) / stepSize) + 1, int((block_ymax - ori_y) / stepSize) + 1
        for i in range(rx_min, rx_max):
            for j in range(ry_min, ry_max):
                x_new[j][i] = np.nan
        for i in range(ry_min, ry_max):
            for j in range(rx_min, rx_max):
                y_new[i][j] = np.nan
    return x_new, y_new

stepSize = 0.2
## Blocks that should be disabled
allBlocks = [[(139.6, 93.6), (143.6, 93.6), (143.6, 97.6), (139.6, 97.6)],
             [(154.2, 93.4), (158.2, 93.4), (158.2, 97.4), (154.2, 97.4)],
             [(139.2, 77.8), (143.2, 77.8), (143.2, 81.8), (139.2, 81.8)],
             [(154.2, 77.8), (158.2, 77.8), (158.2, 81.8), (154.2, 81.8)],
             [(139.79999999999998, 86.4), (142.6, 86.4), (142.6, 88.0), (139.79999999999998, 88.0)],
             [(154.79999999999998, 87.2), (157.6, 87.2), (157.6, 88.8), (154.79999999999998, 88.8)]]

x = np.arange(136.0, 161.0, stepSize)
y = np.arange(75.0, 101.0, stepSize)
xv, yv = np.meshgrid(x, y)
xv, yv = cutblocks(xv, yv, allBlocks)
MazeSize = xv.shape[0] * xv.shape[1]
adj = np.matlib.zeros((MazeSize, MazeSize))  # initialize the adjacency matrix

# Make 1 whenever there is a connection between neighboring coordinates
mazeR, mazeC = 0, 0
for row in range(xv.shape[0]):
    for col in range(xv.shape[1]):
        if xv[row][col] > 0 and col + 1 < xv.shape[1] and round(np.abs(xv[row][col] - xv[row][col + 1]), 2) == stepSize:
            adj[mazeR, mazeC + 1] = 1
            break
        mazeC = mazeC + 1
    mazeR = mazeR + 1
This code generates a mesh-grid in which some of the points are disabled because they are walls in the maze. The cost of every step (between connected vertices) is 1. My questions are:
1) The adjacency matrix would be N x N with N = x * y (* is multiplication). Is that correct?
2) What would be an efficient way of finding the neighbors and setting their entries to 1 in the adjacency matrix? I tried it, but it doesn't work correctly (a sparse-matrix sketch is given after this question).
3) Should I use graphs for this kind of problem? My final goal is to find the shortest path between two coordinates (vertices).
Thanks
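A minimal sketch of one approach, assuming xv from the code above: build a sparse adjacency matrix over the walkable cells and query it with scipy.sparse.csgraph (the start vertex vid(0, 0) is an arbitrary placeholder):

from scipy.sparse import lil_matrix
from scipy.sparse.csgraph import dijkstra

free = ~np.isnan(xv)  # True where the maze cell is walkable
nrows, ncols = free.shape
N = nrows * ncols  # one vertex per grid point, so N is the product of the grid dimensions

def vid(r, c):
    # Flatten a (row, col) grid position into a vertex id
    return r * ncols + c

adj = lil_matrix((N, N))
for r in range(nrows):
    for c in range(ncols):
        if not free[r, c]:
            continue
        if c + 1 < ncols and free[r, c + 1]:  # right neighbor
            adj[vid(r, c), vid(r, c + 1)] = 1
            adj[vid(r, c + 1), vid(r, c)] = 1
        if r + 1 < nrows and free[r + 1, c]:  # down neighbor
            adj[vid(r, c), vid(r + 1, c)] = 1
            adj[vid(r + 1, c), vid(r, c)] = 1

# Shortest-path lengths from one vertex to all others (cost 1 per step)
dist, pred = dijkstra(adj.tocsr(), indices=vid(0, 0), return_predecessors=True)

This also suggests answers to 1) (yes, N is the product of the two grid dimensions) and 3) (a graph library, or scipy's csgraph module, is the natural tool for the shortest-path part).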
Main Problem
What is a better/more pythonic way of retrieving the elements of one array that are not found in a different array? This is what I have:
idata = [np.column_stack(data[k]) for k in range(len(data)) if data[k] not in final]
idata = np.vstack(idata)
My interest is in performance. My data is an (X, Y, Z) array of size (7000 x 3) and my gdata is an (X, Y) array of (11000 x 2).
Preamble
I am working on an octant search to find the n points (e.g. 8) closest to my circular point (o) in each octant. This means my points (+) are reduced to at most 64 (8 per octant). Then for each gdata point I save the elements of data that were not selected.
import tkinter as tk
from tkinter import filedialog
import pandas as pd
import numpy as np
from scipy.spatial.distance import cdist
from collections import defaultdict

root = tk.Tk()
root.withdraw()
file_path = filedialog.askopenfilename()
data = pd.read_excel(file_path)
data = np.array(data, dtype=float)
nrow, cols = data.shape

file_path1 = filedialog.askopenfilename()
gdata = pd.read_excel(file_path1)
gdata = np.array(gdata, dtype=float)
gnrow, gcols = gdata.shape

npoint_per_octant = 8
bins = np.linspace(-np.pi, np.pi, 9)
bins[-1] = np.inf  # handle edge case

for j in range(gnrow):
    delta = gdata[j, ::] - data[:, :2]
    angles = np.arctan2(delta[:, 1], delta[:, 0])
    octantsort = []
    for i in range(8):
        data_i = data[(bins[i] <= angles) & (angles < bins[i + 1])]
        if data_i.size > 0:
            dist_order = np.argsort(cdist(data_i[:, :2], gdata[j, ::][np.newaxis]), axis=0)
            if dist_order.size < npoint_per_octant + 1:
                [octantsort.append(data_i[dist_order[:npoint_per_octant][j]]) for j in range(dist_order.size)]
            else:
                [octantsort.append(data_i[dist_order[:npoint_per_octant][j]]) for j in range(npoint_per_octant)]
            final = np.vstack(octantsort)
            idata = [np.column_stack(data[k]) for k in range(len(data)) if data[k] not in final]
            idata = np.vstack(idata)
Is there an efficient and pythonic way of doing this, to increase the performance of the last two lines of the code?
If I understand your code correctly, then I see the following potential savings:

- dedent the final = ... line
- don't use arctan; it's expensive; since you only want octants, compare the coordinates to zero and to each other
- don't do a full argsort; use argpartition instead
- make your octantsort an "octantargsort", i.e. store the indices into data, not the data points themselves; this saves you the search in the last-but-one line and lets you use np.delete for removing
- don't use append inside a list comprehension; it produces a list of Nones that is immediately discarded; you can use list.extend outside the comprehension instead
- besides, these list comprehensions look like a convoluted way of converting data_i[dist_order[:npoint_per_octant]] into a list; why not simply cast, or even keep it as an array, since you want to vstack in the end?

Here is some sample code illustrating these ideas:
import numpy as np

def discard_nearest_in_each_octant(eater, eaten, n_eaten_p_eater):
    # build octants
    # start with quadrants ...
    top, left = (eaten < eater).T
    quadrants = [np.where(v & h)[0] for v in (top, ~top) for h in (left, ~left)]
    dcoord2 = (eaten - eater)**2
    dc2quadrant = [dcoord2[q] for q in quadrants]
    # ... and split them
    oct4158 = [q[:, 0] < q[:, 1] for q in dc2quadrant]
    # main loop
    dc2octants = [[q[o], q[~o]] for q, o in zip(dc2quadrant, oct4158)]
    reloap = [[
        np.argpartition(o.sum(-1), n_eaten_p_eater)[:n_eaten_p_eater]
        if o.shape[0] > n_eaten_p_eater else None
        for o in opair] for opair in dc2octants]
    # translate indices
    octantargpartition = [q[so] if oap is None else q[np.where(so)[0][oap]]
                          for q, o, oaps in zip(quadrants, oct4158, reloap)
                          for so, oap in zip([o, ~o], oaps)]
    octantargpartition = np.concatenate(octantargpartition)
    return np.delete(eaten, octantargpartition, axis=0)
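A quick usage check (my own example with random data, not from the question):

rng = np.random.default_rng(0)
eater = rng.random(2)  # one query point (x, y)
eaten = rng.random((11000, 2))  # candidate points
kept = discard_nearest_in_each_octant(eater, eaten, 8)
print(eaten.shape[0] - kept.shape[0])  # at most 64 points are removed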