I am trying to preprocess audio clips for a keyword spotting task that uses machine learning models.
The first step is to compute the spectrogram from the waveform, and I have found two ways to do this within the TensorFlow framework.
The first is to use the tf.signal library, with these functions:
stft = tf.signal.stft(signals, frame_length, frame_step)
spectrogram = tf.abs(stft)
# linear_to_mel_weight_matrix is computed beforehand
mel_spectrogram = tf.tensordot(spectrogram, linear_to_mel_weight_matrix, 1)
log_mel_spectrogram = tf.math.log(mel_spectrogram + 1.e-6)
mfccs = tf.signal.mfccs_from_log_mel_spectrograms(log_mel_spectrogram)
The second is to use the tf.raw_ops library, which results in the following code:
# spectrogram computation
spectrogram = tf.raw_ops.AudioSpectrogram(
input=sample,
window_size=window_size_samples,
stride=window_stride_samples
)
# mfcc computation
mfcc_features = tf.raw_ops.Mfcc(
spectrogram=spectrogram,
sample_rate=sample_rate,
dct_coefficient_count=dct_coefficient_count
)
The problem is that the second one is much faster (about 10x), as the following table shows.
Operation    tf.signal    tf.raw_ops
STFT         5.09 ms      0.47 ms
Mel+MFCC     3.05 ms      0.25 ms
In both cases the same parameters were used (window size, hop size, number of coefficients...).
I have done some tests and the output is the same up to the 3rd decimal digit.
My question is: does anyone have experience with these functions, or can anyone explain this behavior?
I'm trying to copy the second exercise ("Forecasts constrained to an interval") in the link below:
https://otexts.com/fpp2/limits.html
The link fits an ARIMA model whose forecasts are constrained to an interval by applying a logarithmic transformation first and a back-transformation at the end. But the example in the link is written in R, and I can't find a similar example for Python no matter how much I search.
Can anyone tell me how I can do the exact same thing described in the link with Python? I'm certain it is possible using the statsmodels library, but I'm not sure how to exactly replicate the transformation constraints.
The standard ARIMA in Python:
from statsmodels.tsa.arima_model import ARIMA
import numpy as np
model = ARIMA(series, order=(0,1,1))
model_fit = model.fit(trend='nc',full_output=True, disp=1)
print(model_fit.summary())
I have a feeling that I need to add something like this somewhere (transformation formula):
series = np.log((series-a)/(b-series))
as well as the back-transformation formula. But since they don't produce explicit errors I can't be sure whether I'm coding it right.
Also, I'm stuck at where I should be adding the transformation and back-transformation. I would appreciate it if someone could explain how the exercise in the link could be replicated in Python.
P.S. 'Transformation' here has nothing to do with making the time series stationary. I didn't mention stationarity because it's unrelated to my current question. The link above uses the word 'transformation' for the logarithmic formula that constrains the time series to lie between 'a' and 'b'.
What I tried so far:
series = np.log((series-a)/(b-series))
model = ARIMA(series, order=(0,1,1))
model_fit = model.fit(trend='c',full_output=True, disp=1)
print(model_fit.summary())
fore = model_fit.forecast(steps=1)
fore = (b-a)*np.exp(fore)/(1+np.exp(fore)) + a
As the link in your question shows, the transformation is applied before forecasting. So:
Apply the transformation to your data.
Forecast with the ARIMA model on the transformed data.
Reverse the transformation on the predicted data.
a = 50
b = 400
# Transformation on the data
train = np.log((series-a)/(b-series))
# Choose suitable order
model = ARIMA(train,order=(2,2,2))
results = model.fit()
start=len(train)
# One-step-ahead forecast: end must not be smaller than start; increase end for a longer horizon
predictions = results.predict(start=start, end=start, dynamic=False, typ='levels')
# reverse transformation
predictions = ((b-a)*np.exp(predictions)/(1+np.exp(predictions))) + a
Passing dynamic=False means that forecasts at each point are generated using the full history up to that point (all lagged values).
Passing typ='levels' predicts the levels of the original endogenous variables. If we had used the default typ='linear', we would have seen linear predictions in terms of the differenced endogenous variables.
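The transformation and its inverse can be checked on their own, without fitting any model. The following is a minimal sketch using only the standard library; the helper names `to_unbounded` and `to_bounded` and the sample values are made up for illustration, while the bounds a = 50 and b = 400 are the ones used above.

```python
import math

def to_unbounded(y, a, b):
    """Map a value in the open interval (a, b) onto the whole real line."""
    return math.log((y - a) / (b - y))

def to_bounded(x, a, b):
    """Inverse of to_unbounded: map any real number back into (a, b).

    Algebraically equal to (b - a) * exp(x) / (1 + exp(x)) + a.
    """
    return a + (b - a) / (1.0 + math.exp(-x))

a, b = 50.0, 400.0                      # bounds taken from the answer above
observed = [60.0, 120.0, 250.0, 390.0]  # made-up values inside (a, b)

transformed = [to_unbounded(y, a, b) for y in observed]
recovered = [to_bounded(x, a, b) for x in transformed]

# Even extreme values on the transformed scale map back inside (a, b).
extreme_forecasts = [to_bounded(x, a, b) for x in (-20.0, 0.0, 20.0)]

print(recovered)
print(extreme_forecasts)
```

The model is fitted and forecast on the transformed scale, and only the final forecasts (and any interval limits) are passed through the inverse, which is why they cannot leave the interval.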
I am currently using FiPy but am still relatively new to the nuances of the package. I have been able to regenerate the desired heatmap for the mesh20x20 diffusion example by running it from the "examples" folder on the command line, but I have struggled to replicate it within the Spyder IDE. I am using Python 3.8. When I attempt to "re-program" the example myself, I end up with variations of the same result: a discrete, two-color plot instead of the smooth color transition produced by the examples folder. I believe the issue lies with the viewer in some capacity. Related issues may have cropped up for others in the past, potentially involving colorbar reformatting, but I have not been able to implement those workarounds effectively. Setting datamin and datamax in Viewer() did not work.
I would be greatly indebted for any assistance the community could provide.
from fipy.terms.transientTerm import TransientTerm
from fipy.terms.implicitDiffusionTerm import ImplicitDiffusionTerm
from fipy.terms.explicitDiffusionTerm import ExplicitDiffusionTerm
from fipy.meshes.nonUniformGrid2D import NonUniformGrid2D
from fipy.variables.cellVariable import CellVariable
from fipy.viewers.matplotlibViewer.matplotlib2DViewer import Matplotlib2DViewer
####
#Global Inputs
D=1
steps=10
#Dimensional Inputs
nx=20
dx=1
ny=20
dy=1
L=dx*nx
#Temporal Inputs
#nt=20
#dt=1
#cell variable initial values
value=0
#construct mesh from dimensional pts
mesh=NonUniformGrid2D(nx=nx, dx=dx, ny=ny, dy=dy)
#construct term variables phi with name, mesh design
phi=CellVariable(name="solutionvariable", mesh=mesh, value=0)
#construct boundary conditions
#dirichlet ---> Neumann (no-flux) is applied automatically to top right and bottom left
valueTopLeft=0
valueBottomRight=1
#assign boundary conditions to a face or cell
X, Y=mesh.faceCenters
facesTopLeft=((mesh.facesLeft & (Y > L/2 )) | (mesh.facesTop &( X < L/2)))
facesBottomRight=((mesh.facesRight & (Y < L/2)) | (mesh.facesBottom & (X > L/2)))
#constrain variables
phi.constrain(valueTopLeft, facesTopLeft)
phi.constrain(valueBottomRight, facesBottomRight)
#equation construction
eq=TransientTerm()==ExplicitDiffusionTerm(coeff=D)
#equation solving and either viewing and/or extraction
timestepduration=0.9 *(dx**2)/(2*D)
for step in range(steps):
    eq.solve(var=phi, dt=timestepduration)
    print(phi[step])
    viewer=Matplotlib2DViewer(vars=phi, datamin=0, datamax=1)
    viewer.axes.set_title("Solutionvbl(Step %d)" % (step+1,))
Figured it out, I think. I was using ExplicitDiffusionTerm, whereas the example uses ImplicitDiffusionTerm. When I first tried the implicit term, all I got back was a blank monochromatic image (and zeros for my phi[step] at the end). I am happy to report that once a "kickstart" value is provided in the value argument of CellVariable (I used 0.001), used in conjunction with ImplicitDiffusionTerm, and timestepduration is increased from its limit of 0.9*dx**2/(2*D) to the 9*dx**2/(2*D) used in the example documentation, the result more or less matches the image generated when run from the command line. Grateful to have this sorted. I hope this helps anyone else who runs into a similar problem.
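The time-step limit mentioned above comes from the stability condition of the explicit scheme. Here is a minimal, dependency-free sketch in one dimension (this is not FiPy; the function `explicit_diffusion_1d` and the grid values are made up for illustration). With r = D*dt/dx**2 at or below 1/2, the forward-Euler update stays between the boundary values; above 1/2 it grows without bound. An implicit scheme has no such limit, which is why the example can use a time step ten times larger.

```python
def explicit_diffusion_1d(r, n_cells=20, n_steps=500, left=0.0, right=1.0):
    """Forward-Euler diffusion on a 1D grid with fixed end values.

    r is the dimensionless step D * dt / dx**2.
    """
    u = [0.0] * n_cells
    for _ in range(n_steps):
        padded = [left] + u + [right]   # boundary values act as fixed neighbours
        u = [
            padded[i] + r * (padded[i - 1] - 2.0 * padded[i] + padded[i + 1])
            for i in range(1, n_cells + 1)
        ]
    return u

stable = explicit_diffusion_1d(r=0.45)    # 0.9 * (1/2), inside the limit
unstable = explicit_diffusion_1d(r=0.55)  # just above the limit

print("stable range:  ", min(stable), max(stable))
print("unstable range:", min(unstable), max(unstable))
```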
I need to feed an image and a vector sampled from a normal distribution to the network simultaneously. Because the image dataset I'm using is too large to hold in memory, I created an ImageDeserializer for that part. But I also need to add a random vector (sampled with NumPy from a normal distribution) to the input map before feeding it to the network. Is there any way to achieve this?
I also tried:
mb_data = reader_train.next_minibatch(mb_size, input_map=input_map)
mb_data[random_input_node] = np.random.normal((mb_size, 100))
but get the following error:
TypeError: cannot convert value of dictionary to N4CNTK13MinibatchDataE
The problem was solved with the following snippet, which feeds the data to the trainer directly:
mb_data = reader_train.next_minibatch(mb_size, input_map=input_map)
# one 100-dimensional standard-normal vector per sample; CNTK inputs default to float32
z = np.random.normal(size=(mb_size, 100)).astype(np.float32)
my_trainer.train_minibatch({feature_image: mb_data[image].data, feature_z: z})
Also thanks to #mewahl. Defining a new reader is another suitable way to solve the problem, and I think it would be faster than what I have done.
I have a problem with two signals, stored as arrays, that I want to pass to the function fftconvolve.
They represent two measurements of the same duration, and the start and end of the signals are matched.
The trouble is that each measurement was taken at a different sampling rate, so the array lengths are different:
LS1= len(SIG1) # - > LS1=819
LS2= len(SIG2) # - > LS2=3441
Therefore the convolution is not calculated properly.
What I need is basically a way to correctly down-sample the longer signal so that LS1 = LS2.
I have tried calling the function with mode='same', as described in its documentation:
KOR=signal.fftconvolve(SIG1, SIG2, mode='same')
but the output still seems strange and I really don't know if the calculation is correct.
Here is an example plot of the signal convolution.
Thank you for any help.
SOLUTION: It was quick and simple! Thank you, J. Piquard! The 'resample' function from scipy.signal does the trick:
SIG2 = signal.resample(SIG2, LS1)
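To show what bringing the two arrays to a common length means, here is a dependency-free sketch. Note that it uses plain linear interpolation, whereas scipy.signal.resample uses a Fourier method and assumes a periodic signal, so the two will not give identical values. The function `resample_linear` and the sample data are hypothetical; only the lengths 819 and 3441 come from the question.

```python
def resample_linear(values, new_length):
    """Resample a sequence to new_length points by linear interpolation.

    The first and last samples are kept, so both signals still span the
    same time interval.
    """
    if new_length < 2 or len(values) < 2:
        raise ValueError("need at least two input and two output samples")
    scale = (len(values) - 1) / (new_length - 1)
    out = []
    for i in range(new_length):
        pos = i * scale                          # fractional index into the original
        lo = min(int(pos), len(values) - 2)      # left neighbour, clamped at the end
        frac = pos - lo
        out.append(values[lo] * (1.0 - frac) + values[lo + 1] * frac)
    return out

# Hypothetical stand-ins with the lengths from the question.
sig1 = [0.0] * 819
sig2 = [i / 3440 for i in range(3441)]   # a ramp from 0 to 1

sig2_short = resample_linear(sig2, len(sig1))
print(len(sig1), len(sig2_short))
```

When down-sampling real measurements by a large factor, some low-pass filtering before interpolation is usually needed to avoid aliasing; this sketch leaves that out.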
I wrote some code that works perfectly on small data, but when I run it on a dataset with 52,000 features, it seems to get stuck in the function below:
import time

def extract_neighboring_OSM_nodes(ref_nodes, cor_nodes):
    time_start = time.time()
    print("here we start finding neighbors at %s" % time_start)
    for ref_node in ref_nodes:
        # 10-unit buffer around the geometry of the reference node
        buffered_node = ref_node[2].buffer(10)
        for cor_node in cor_nodes:
            if cor_node[2].within(buffered_node):
                ref_node[4].append(cor_node[0])
                cor_node[4].append(ref_node[0])
        # node[4][:] = [cor_nodes.index(x) for x in cor_nodes if x[2].within(buffered_node)]
    time_end = time.time()
    # report the elapsed time rather than the raw end timestamp
    print("neighbor extraction took %s seconds" % (time_end - time_start))
    return ref_nodes
ref_nodes and cor_nodes are lists of tuples of the following form:
[(FID, point, geometry, links, neighbors)]
neighbors is an empty list that is populated by the function above.
As I said, the last message printed is the first print statement in this function. The function is evidently slow, but for 52,000 features it should not take 24 hours, should it?
Any idea where the problem might be, or how to make the function faster?
You can try multiprocessing, here is an example - http://pythongisandstuff.wordpress.com/2013/07/31/using-arcpy-with-multiprocessing-%E2%80%93-part-3/.
If you want the k nearest neighbors of every sample (or only some samples) in a dataset, or the eps-neighborhood of each sample, there is no need to implement it yourself. There are libraries built specifically for this purpose.
Once the data structure (usually some kind of tree) has been built, you can query it for the neighborhood of a given sample. These data structures usually work less well for high-dimensional data than for low-dimensional data, but there are solutions for high-dimensional data as well.
One I can recommend here is KDTree, which has a SciPy implementation.
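To illustrate why a spatial index helps, here is a dependency-free sketch. It uses a uniform grid rather than a k-d tree, so it shows the same idea and not SciPy's implementation. Each reference point is compared only with candidates in nearby grid cells instead of with every candidate, which avoids the roughly 2.7 billion checks the nested loop would need if both lists held about 52,000 nodes. The function `neighbors_within`, the (id, x, y) tuple layout, and the coordinates are assumptions for illustration. The plain distance test only approximates `within(buffer(10))`, because a buffer is a polygon approximating a circle.

```python
from collections import defaultdict
import math

def neighbors_within(ref_points, cor_points, radius):
    """Return {ref_id: [cor_id, ...]} for all pairs at most `radius` apart.

    Points are (id, x, y) tuples. Candidate points are bucketed into square
    cells of side `radius`, so each reference point is compared only with
    the points in its own cell and the eight surrounding cells.
    """
    grid = defaultdict(list)
    for cid, x, y in cor_points:
        grid[(math.floor(x / radius), math.floor(y / radius))].append((cid, x, y))

    result = {}
    for rid, x, y in ref_points:
        cx, cy = math.floor(x / radius), math.floor(y / radius)
        found = []
        for gx in (cx - 1, cx, cx + 1):
            for gy in (cy - 1, cy, cy + 1):
                for cid, px, py in grid.get((gx, gy), ()):
                    if math.hypot(px - x, py - y) <= radius:
                        found.append(cid)
        result[rid] = sorted(found)
    return result

# Made-up coordinates; the radius of 10 matches buffer(10) in the question.
ref = [("r1", 0.0, 0.0), ("r2", 100.0, 100.0)]
cor = [("c1", 3.0, 4.0), ("c2", 9.0, 9.0), ("c3", 95.0, 100.0), ("c4", -8.0, 0.0)]
print(neighbors_within(ref, cor, 10.0))
```

With SciPy available, a k-d tree built over the candidate coordinates and queried with a fixed radius plays the same role as the grid here.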
I hope you find it useful as I did.