I have a problem with two signals that I want to use with the function fftconvolve.
They represent two measurements of the same time duration, and the start and end of each signal are matched.
The trouble is that the measurements were taken at different sampling rates, so the array lengths differ:
LS1 = len(SIG1)  # -> LS1 = 819
LS2 = len(SIG2)  # -> LS2 = 3441
and therefore the convolution is not calculated properly.
What I need is basically a way to correctly down-sample the longer signal so that the lengths match (LS1 = LS2).
I have tried calling it with mode='same', as suggested in the function description:
KOR = signal.fftconvolve(SIG1, SIG2, mode='same')
but the output still seems strange, and I really don't know whether the calculation is correct.
Here is an example plot of the signal convolution.
Thank you for any help.
SOLUTION: It was quick & simple! Thank you J. Piquard!! The 'resample' function does the trick
SIG2 = signal.resample(SIG2, LS1)
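For context, a minimal end-to-end sketch of the fix (the sample data here is made up; variable names follow the question):
import numpy as np
from scipy import signal

SIG1 = np.random.randn(819)    # measurement taken at the lower sampling rate
SIG2 = np.random.randn(3441)   # same duration, higher sampling rate

LS1 = len(SIG1)
SIG2 = signal.resample(SIG2, LS1)   # down-sample the longer signal to match

KOR = signal.fftconvolve(SIG1, SIG2, mode='same')   # lengths now match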
I am trying to preprocess audio clips for a keyword spotting task that uses machine learning models.
The first step is to calculate the spectrogram from the waveform, and I have found that there are two ways to do this within the TensorFlow framework.
The first one is to use the tf.signal library.
This means the functions:
stft = tf.signal.stft(signals, frame_length, frame_step)
spectrogram = tf.abs(stft)
# linear_to_mel_weight_matrix is computed beforehand
mel_spectrogram = tf.tensordot(spectrogram, linear_to_mel_weight_matrix, 1)
log_mel_spectrogram = tf.math.log(mel_spectrogram + 1.e-6)
mfccs = tf.signal.mfccs_from_log_mel_spectrograms(log_mel_spectrogram)
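For completeness, the weight matrix mentioned above is presumably built with tf.signal.linear_to_mel_weight_matrix; a sketch with illustrative parameter values:
num_spectrogram_bins = stft.shape[-1]   # frame_length // 2 + 1
linear_to_mel_weight_matrix = tf.signal.linear_to_mel_weight_matrix(
    num_mel_bins=40,                     # illustrative value
    num_spectrogram_bins=num_spectrogram_bins,
    sample_rate=16000,                   # illustrative value
    lower_edge_hertz=20.0,
    upper_edge_hertz=4000.0)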
The second is to use the tf.raw_ops library.
This results in the following code:
# spectrogram computation
spectrogram = tf.raw_ops.AudioSpectrogram(
    input=sample,
    window_size=window_size_samples,
    stride=window_stride_samples
)
# mfcc computation
mfcc_features = tf.raw_ops.Mfcc(
    spectrogram=spectrogram,
    sample_rate=sample_rate,
    dct_coefficient_count=dct_coefficient_count
)
The problem is that the second one is much faster (~10x), as you can see from this table:

Operation | tf.signal | tf.raw_ops
STFT      | 5.09 ms   | 0.47 ms
Mel+MFCC  | 3.05 ms   | 0.25 ms
In both cases the same parameters were used (window size, hop size, number of coefficients...).
I have done some tests and the output is the same up to the 3rd decimal digit.
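For illustration, a minimal sketch of such an output comparison (it assumes mfccs and mfcc_features from the snippets above; depending on batch dimensions the shapes may need squeezing to line up):
import numpy as np

diff = np.max(np.abs(np.squeeze(mfccs.numpy()) - np.squeeze(mfcc_features.numpy())))
print("max abs difference:", diff)   # agreement up to the 3rd decimal digit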
My question is: does anyone have experience with these functions, or can someone explain this behavior?
I have a big dataset in array form, arranged like this:
(image: rainfall amounts arranged in array form)
The average (mean) for each latitude and longitude along axis=0 is computed using these declarations:
Lat=data[:,0]
Lon=data[:,1]
rain1=data[:,2]
rain2=data[:,3]
# ...
rain44=data[:,45]
rainT=[rain1, rain2, ..., rain44]
mean=np.mean(rainT)
The result was awesome, but the computation takes time, so I wanted to use a for loop to ease the calculation. At the moment the script I use is like this:
mean=[]
lat=data[:,0]
lon=data[:,1]
for x in range(2,46):
    rainT=data[:,x]
    mean=np.mean(rainT,axis=0)
print mean
But a weird result appeared. Anyone?
First, you probably meant the for loop to accumulate the subarrays rather than replace rainT with a different slice on every iteration. Only the last assignment matters, so the code averages just that one subarray, rainT=data[:,45], and it doesn't divide by the correct number of original elements to compute an overall average. Both of these mistakes contribute to the weird result.
Second, numpy should be able to average elements faster than a Python for loop can do it since that's just the kind of thing that numpy is designed to do in optimized native code.
Third, your original code copies a bunch of subarrays into a Python list, then asks numpy to average that. You should get much faster results by asking numpy to average the relevant subarray without making a copy, something like this:
rainT = data[:,2:] # this gets a view onto data[], not a copy
mean = np.mean(rainT)
That computes an average over all the rainfall values, like your original code.
If you want an average for each latitude or some such, you'll need to do it differently. You can average over an array axis, but latitude and longitude aren't axes in your data[].
Thanks friends, you are giving me such inspiration. Here is the working script based on ideas from @Jerry101; in the end I decided NOT to use a Python loop. The new declarations look like this:
lat1=data[:,0]
lon1=data[:,1]
rainT=data[:,2:46]  # <-- this is the step I was missing earlier
mean=np.mean(rainT,axis=1)*24  # make the average daily rainfall for each lat and lon
mean2=np.array([lat1,lon1,mean])
mean2=mean2.T
np.savetxt('average-daily-rainfall.dat2',mean2,fmt='%9.3f')
And finally the result is exactly the same as that of a program written in Fortran.
I'm processing some experimental data in Python 3. The data (raw_data in my code) is pretty noisy:
One of my goals is to find the peaks, and for this I'd like to filter out the noise. Based on what I found in the documentation of SciPy's signal module, the theory of filtering seems really complicated, and unfortunately I have zero background in it. Of course I'll have to learn it sooner or later - and I intend to - but right now the payoff isn't worth the time (and learning filter theory isn't the purpose of my work), so I shamefully copied the code from Lyken Syu's answer without really understanding the background:
import numpy as np
from scipy import signal as sg
from matplotlib import pyplot as plt
# [...] code, resulting in this:
raw_data = [arr_of_xvalues, arr_of_yvalues] # xvalues are in decreasing order
# <magic beyond my understanding>
n = 20  # the larger n is, the smoother the curve will be
b = [1.0 / n] * n  # numerator: n equal taps, i.e. a moving average
a = 1              # denominator of 1 makes this a plain moving-average filter
filt = sg.lfilter(b, a, raw_data)   # first smoothing pass
filtered = sg.lfilter(b, a, filt)   # second pass for extra smoothing
# <\magic>
plt.plot(filtered[0], filtered[1], ".")
plt.show()
It kind of works:
What concerns me is the curve the filter adds from 0 up to the beginning of my dataset. I guess it's a startup property of the filter I used, but I don't know how to prevent it. Also, I couldn't get other filters to work so far. I need to use this code on other, similar experimental results, so I need a somewhat more general solution than e.g. cutting out all points with y < 10.
Is there a better (possibly simpler) way, or a choice of filter that is easy to implement without a serious theoretical background?
How, if at all, could I prevent my filter from adding that curve to my data?
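For what it's worth, one simpler option (not from the original post) is a Savitzky-Golay filter: it fits a low-order polynomial in a sliding window, so it adds no zero-state startup curve. A minimal sketch, with illustrative window_length and polyorder values:
from scipy import signal as sg

x, y = raw_data   # as above
# window_length must be odd and larger than polyorder; both values are illustrative
smoothed = sg.savgol_filter(y, window_length=21, polyorder=3)
plt.plot(x, smoothed, ".")
plt.show()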
I'm building indicator series based on market prices using ta-lib. I made a couple of implementations of the same concept, and I found the same issue in every one of them: to obtain a correct series of values, I must reverse the input series and then reverse the resulting series. The Python code that calls the ta-lib library through a convenience wrapper is:
rsi1 = np.asarray(run_example(function_name,
                              arguments,
                              30,
                              weeklyNoFlatOpen[0],
                              weeklyNoFlatHigh[0],
                              weeklyNoFlatLow[0],
                              weeklyNoFlatClose[0],
                              weeklyNoFlatVolume[0][::-1]))
rsi2 = np.asarray(run_example(function_name,
                              arguments,
                              30,
                              weeklyNoFlatOpen[0][::-1],
                              weeklyNoFlatHigh[0][::-1],
                              weeklyNoFlatLow[0][::-1],
                              weeklyNoFlatClose[0][::-1],
                              weeklyNoFlatVolume[0][::-1]))[::-1]
The graphs of both series can be observed here (the indicator is actually an SMA):
The green line is clearly computed in reverse order (from sample n to 0) and the red one in the expected order. To obtain the red line I had to reverse the input series and the output series.
The code for this test is available at: python code
Has anybody observed the same behavior?
I found what was wrong with my approach. The simple answer is that the MA indicator puts the first valid value at position zero of the results array, so the result series starts from zero and has N fewer samples than the input series (where N is the period value, 30 in this case). The reversed-computation idea was completely wrong.
Here's the proof:
Adding 30 zeros at the beginning and removing the last ones, the indicator fits nicely over the input series.
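A minimal sketch of that alignment (sma_raw and close are hypothetical names for the wrapper's output and the input series):
import numpy as np

period = 30                     # the MA period used above
pad = np.zeros(period)          # zeros at the front, as described (NaNs would also work)
aligned = np.concatenate([pad, sma_raw])[:len(close)]   # pad the front, trim the tail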
I wrote some code that works perfectly on small datasets, but when I run it over a dataset with 52,000 features it seems to get stuck in the function below:
def extract_neighboring_OSM_nodes(ref_nodes, cor_nodes):
    time_start = time.time()
    print "here we start finding neighbors at ", time_start
    for ref_node in ref_nodes:
        buffered_node = ref_node[2].buffer(10)
        for cor_node in cor_nodes:
            if cor_node[2].within(buffered_node):
                ref_node[4].append(cor_node[0])
                cor_node[4].append(ref_node[0])
        # node[4][:] = [cor_nodes.index(x) for x in cor_nodes if x[2].within(buffered_node)]
    time_end = time.time()
    print "neighbor extraction took ", time_end - time_start
    return ref_nodes
ref_nodes and cor_nodes are lists of tuples of the following form:
[(FID, point, geometry, links, neighbors)]
neighbors is an empty list that gets populated in the function above.
As I said, the last message printed is the first print statement in this function. The function seems to be very slow, but for 52,000 features it shouldn't take 24 hours, should it?
Any idea where the problem might be, or how to make the function faster?
You can try multiprocessing, here is an example - http://pythongisandstuff.wordpress.com/2013/07/31/using-arcpy-with-multiprocessing-%E2%80%93-part-3/.
If you want to get the K nearest neighbors of every (or some, it doesn't matter) sample of a dataset, or the eps-neighborhood of samples, there is no need to implement it yourself; there are libraries built specifically for this purpose.
Once they have built the data structure (usually some kind of tree), you can query it for the neighborhood of a given sample. For high-dimensional data these structures are usually not as effective as for low dimensions, but there are solutions for high-dimensional data as well.
One I can recommend here is the KDTree, which has a SciPy implementation.
I hope you find it as useful as I did.
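For illustration, a minimal sketch of the same radius query with SciPy's cKDTree, mirroring the buffer(10)/within() test above (it assumes the geometries in slot 2 of each tuple are shapely Points):
import numpy as np
from scipy.spatial import cKDTree

# Extract plain (x, y) coordinates; assumes slot 2 holds shapely Points
ref_xy = np.array([(n[2].x, n[2].y) for n in ref_nodes])
cor_xy = np.array([(n[2].x, n[2].y) for n in cor_nodes])

tree = cKDTree(cor_xy)
matches = tree.query_ball_point(ref_xy, r=10)   # one vectorized query replaces the double loop
for i, neighbor_idxs in enumerate(matches):
    for j in neighbor_idxs:
        ref_nodes[i][4].append(cor_nodes[j][0])
        cor_nodes[j][4].append(ref_nodes[i][0])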