Slow performance using scipy.interpolate inside loop - python

I am using Python to generate input data for engineering simulations. I need to define how a certain physical quantity (let's call it force) varies with time.
The force depends on a known time-varying quantity (let's call it angle). The dependency between force and angle is different for each time step. The force-angle dependency for each time step is defined as data points, not as a function. For each time step I need to interpolate the value for force from the force-angle dependency of that time step. I am using scipy.interpolate.interp1d inside a list comprehension.
However, I am unhappy with the performance. The interpolation loop takes almost 20 seconds, which is unacceptably slow. The number of time steps is approximately 250k. The number of data points in a force-angle dependency is approximately 2k. I tried using a for loop instead, but this was even slower.
How can I improve performance? The execution time needs to be less than a second, if possible. The code here does not contain the actual data I'm using, but is similar enough.
import numpy as np
import random
from scipy.interpolate import interp1d
import time
nTimeSteps = 250000
nPoints_forceAngleDependency = 2000
# Generating an example (bogus data)
angle = np.linspace(0., 2*np.pi, nPoints_forceAngleDependency)
forceAngleDependency_forEachTimeStep = [random.random() * np.sin(angle) for i in range(nTimeSteps)]
angleHistory = [random.random() * 2 * np.pi for i in range(nTimeSteps)]
# Interpolation
start = time.time()
forceHistory = [interp1d(angle, forceAngleDependency_forEachTimeStep[i])(angleHistory[i])
                for i in range(nTimeSteps)]
end = time.time()
print('interpolation duration: %s s' % (end - start))

Use np.random.random(size=nTimeSteps) to generate the samples in a single vectorized call instead of a Python list comprehension.
For linear interpolation, use numpy.interp instead of interp1d. For higher order spline interpolation use either CubicSpline or make_interp_spline, and use vectorized evaluation.
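For instance, here is a minimal sketch of a fully vectorized linear interpolation, assuming (as in the example code) that every time step shares the same ascending angle grid; np.searchsorted locates all query angles at once, so no Python-level loop or per-step interp1d construction remains:
import numpy as np

nTimeSteps = 250000
nPoints = 2000

angle = np.linspace(0., 2*np.pi, nPoints)
# one force curve per time step, stacked into a single 2-D array
# (~4 GB, the same footprint as the original list of arrays)
force = np.random.random(size=(nTimeSteps, 1)) * np.sin(angle)
angleHistory = np.random.random(size=nTimeSteps) * 2 * np.pi

# locate each query angle on the shared grid once
idx = np.searchsorted(angle, angleHistory).clip(1, nPoints - 1)
x0, x1 = angle[idx - 1], angle[idx]
rows = np.arange(nTimeSteps)
y0, y1 = force[rows, idx - 1], force[rows, idx]

# linear interpolation, one vectorized expression instead of 250k interp1d calls
forceHistory = y0 + (y1 - y0) * (angleHistory - x0) / (x1 - x0)
With no per-step interp1d construction, this should run well under a second on typical hardware, though timings of course depend on your machine.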

Related

Is there a way to speed up or vectorize a nested for loop?

I am fairly new to coding in general and to Python in particular. I am trying to apply a weighted average scheme to a big dataset; at the moment it takes hours to complete, and I would love to speed it up, also because the calculation has to be repeated several times.
The weighted average represents a method used in marine biogeochemistry that includes the history of gas transfer velocities (k) between sampling dates, where k is weighted according to the fraction of the water column (f) ventilated by the atmosphere as a function of the history of k, assigning more importance to values closer to sampling time (the weight at the sampling time step is 1 and decreases moving back in time):
Weighted average equation from https://doi.org/10.1029/2017GB005874, p. 1168 (equation image not reproduced; in the code below it amounts to weighted_kw[t] = sum(k[t:t+n] * omega) / sum(omega), with omega decaying back in time).
In my attempt I used a nested for loop where at each time step t I calculated the weighted average:
def kw_omega(k, depth, window, samples_day):
    """
    Calculate the scheme weights for gas transfer velocity of oxygen
    over the previous window of time, where the most recent gas transfer velocity
    has a weight of 1, and the weighting decreases going back in time. The rate of
    decrease depends on the wind history and MLD.

    Parameters
    ----------
    k: ndarray
        instantaneous O2 gas transfer velocity
    depth: ndarray
        water depth
    window: integer
        weighting period in days, which equals the residence time of oxygen at sampling day
    samples_day: integer
        number of samples in each day composing window

    Returns
    -------
    weighted_kw: ndarray

    Notes
    -----
    n = the weighting period / the time resolution of the wind data
    samples_day = the time resolution of the wind data
    omega = the weighting coefficient at each time step within the weighting window
    f = the fraction of the water column (mixed layer, photic zone or full water
        column) ventilated at each time
    """
    Dt = 1. / samples_day
    f = (k * Dt) / depth
    f = np.flip(f)
    k = np.flip(k)
    n = window * samples_day
    weighted_kw = np.zeros(len(k))
    for t in np.arange(len(k) - n):
        omega = np.zeros(n)
        omega[0] = 1.
        for i in np.arange(1, len(omega)):
            omega[i] = omega[i-1] * (1 - f[t+(i-1)])
        weighted_kw[t] = sum(k[t:t+n] * omega) / sum(omega)
        print(f"t = {t}")
    return np.flip(weighted_kw)
This is to be used on model simulation data from a run of almost 2 years with a model time step of 60 seconds, sampled at intervals of 7 days. Therefore k has shape (927360,) and n, the number of minutes in 7 days, is 10080. At the moment it is taking several hours to run. Is there a way to make this calculation faster?
I would recommend using the package numba to speed up your calculation.
import numpy as np
from numba import njit
from numpy.lib.stride_tricks import sliding_window_view

@njit
def k_omega(k_win, f_win):
    delta_t = len(k_win)
    omega_sum = omega = 1.0
    k_omega_sum = k_win[0]
    for t in range(1, delta_t):
        omega *= (1 - f_win[t])
        omega_sum += omega
        k_omega_sum += k_win[t] * omega
    return k_omega_sum / omega_sum

@njit
def windows_k_omega(k_wins, f_wins):
    size = len(k_wins)
    result = np.empty(size)
    for i in range(size):
        result[i] = k_omega(k_wins[i], f_wins[i])
    return result

def kw_omega(k, depth, window, samples_day):
    n = window * samples_day  # delta_t
    f = k / depth / samples_day
    k_wins = sliding_window_view(k, n)
    f_wins = sliding_window_view(f, n)
    k_omegas = windows_k_omega(k_wins, f_wins)
    # pad with the initial zeros that the original function produced
    weighted_kw = np.pad(k_omegas, (len(k) - len(k_omegas), 0))
    return weighted_kw
Here, I have split the function into three parts to make it more comprehensible. The function k_omega applies your weighted-mean formula to a single k and f window. The function windows_k_omega just loops to apply that function element-wise over the windows. Finally, the outer function kw_omega implements your original interface. It uses the numpy function sliding_window_view to create the moving windows (note that this is fancy numpy striding under the hood, so it does not copy the original array), performs the calculation with the helper functions, and takes care of padding the result array with the initial zeros.
A short test against your original function showed somewhat different results, which is likely because your np.flip calls reverse the arrays before indexing. I just implemented the formula you provided without checking your indexing in depth, so I leave that task to you. You should call both versions with some dummy inputs that you can check manually.
As an additional note on your code: if you want to loop over indices, use the built-in range instead of np.arange; range is evaluated lazily rather than materializing the whole index array up front. Furthermore, try to reduce the number of arrays you create and re-use them instead: e.g. omega = np.zeros(n) could become omega = np.empty(n) created once outside the outer for loop and re-initialized on each iteration with omega[:] = 0.0. Memory management, along with element-by-element array access, is the typical speed penalty, and with plain numpy there is no compiler to help you. That is why I recommend numba: it compiles your Python code and helps in many ways to make number crunching faster.
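For instance, a sanity check along those lines might look like the following (a sketch; the sizes are shrunk from the real case and purely illustrative):
import numpy as np

# dummy inputs, much smaller than the real 927360-sample series
samples_day = 1440                        # one sample per minute
window = 7                                # days
k = np.random.random(30 * samples_day)    # 30 days of data
depth = np.full_like(k, 50.0)

# run the numba version; compare against the original implementation by hand
fast = kw_omega(k, depth, window, samples_day)
print(fast.shape, fast[:5])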

Fastest algorithm for computing 3-D curl

I'm trying to write a section of code that computes the curl of a vector field numerically to second order with periodic boundary conditions. However, the algorithm I made is very slow and I'm wondering if anyone knows of any alternative algorithms.
To give more specific context: I'm using a 3xAxBxC numpy array as my vector field, where the first axis refers to the Cartesian direction (x, y, z) and A, B, C refer to the number of bins in that Cartesian direction (i.e. the resolution). So for example, I might have a vector field F = np.zeros((3,64,64,64)) where Fx = F[0] is a 64x64x64 Cartesian lattice in its own right. So far, my solution has been to use the 3-point centered difference stencil to calculate the derivatives, with a nested loop to iterate over all the dimensions and modular arithmetic to enforce the periodic boundary conditions (see below for an example). However, as my resolution (the size of A, B, C) increases, this begins to take a long time (upwards of 2 minutes, which adds up if I do this several hundred times for my simulation; this is just one small part of a larger algorithm). I was wondering if anyone knows of an alternative method for doing this?
import numpy as np

F = np.array([np.ones([128,128,128]), 2*np.ones([128,128,128]),
              3*np.ones([128,128,128])])
VxF = np.array([np.zeros([128,128,128]), np.zeros([128,128,128]),
                np.zeros([128,128,128])])
for i in range(0, 128):
    for j in range(0, 128):
        for k in range(0, 128):
            VxF[0][i,j,k] = 0.5*((F[2][i,(j+1)%128,k]-F[2][i,j-1,k])
                                 -(F[1][i,j,(k+1)%128]-F[1][i,j,k-1]))
            VxF[1][i,j,k] = 0.5*((F[0][i,j,(k+1)%128]-F[0][i,j,k-1])
                                 -(F[2][(i+1)%128,j,k]-F[2][i-1,j,k]))
            VxF[2][i,j,k] = 0.5*((F[1][(i+1)%128,j,k]-F[1][i-1,j,k])
                                 -(F[0][i,(j+1)%128,k]-F[0][i,j-1,k]))
Just to re-iterate, I'm looking for an algorithm that computes the curl of a vector field array to second order, given periodic boundary conditions, faster than the one I have. Maybe there's nothing that will do this, but I just want to check before I keep spending time running this algorithm. Thank you everyone in advance!
There may be better tools for this, but here is a trivial 200x speedup with numba:
import numpy as np
from numba import jit

def pure_python():
    F = np.array([np.ones([128,128,128]), 2*np.ones([128,128,128]),
                  3*np.ones([128,128,128])])
    VxF = np.array([np.zeros([128,128,128]), np.zeros([128,128,128]),
                    np.zeros([128,128,128])])
    for i in range(0, 128):
        for j in range(0, 128):
            for k in range(0, 128):
                VxF[0][i,j,k] = 0.5*((F[2][i,(j+1)%128,k]-F[2][i,j-1,k])
                                     -(F[1][i,j,(k+1)%128]-F[1][i,j,k-1]))
                VxF[1][i,j,k] = 0.5*((F[0][i,j,(k+1)%128]-F[0][i,j,k-1])
                                     -(F[2][(i+1)%128,j,k]-F[2][i-1,j,k]))
                VxF[2][i,j,k] = 0.5*((F[1][(i+1)%128,j,k]-F[1][i-1,j,k])
                                     -(F[0][i,(j+1)%128,k]-F[0][i,j-1,k]))
    return VxF

@jit(fastmath=True)
def with_numba():
    F = np.array([np.ones([128,128,128]), 2*np.ones([128,128,128]),
                  3*np.ones([128,128,128])])
    VxF = np.array([np.zeros([128,128,128]), np.zeros([128,128,128]),
                    np.zeros([128,128,128])])
    for i in range(0, 128):
        for j in range(0, 128):
            for k in range(0, 128):
                VxF[0][i,j,k] = 0.5*((F[2][i,(j+1)%128,k]-F[2][i,j-1,k])
                                     -(F[1][i,j,(k+1)%128]-F[1][i,j,k-1]))
                VxF[1][i,j,k] = 0.5*((F[0][i,j,(k+1)%128]-F[0][i,j,k-1])
                                     -(F[2][(i+1)%128,j,k]-F[2][i-1,j,k]))
                VxF[2][i,j,k] = 0.5*((F[1][(i+1)%128,j,k]-F[1][i-1,j,k])
                                     -(F[0][i,(j+1)%128,k]-F[0][i,j-1,k]))
    return VxF
The pure Python version takes 13 seconds on my machine, while the numba version takes 65 ms.
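For reference (this goes beyond the original answer, so treat it as a sketch), the same second-order stencil with periodic boundaries can also be written with no explicit loops at all using np.roll, which wraps indices around exactly like the modular arithmetic above:
import numpy as np

def curl_periodic(F):
    """Second-order centered differences with periodic BCs via np.roll."""
    def d(A, axis):
        # (A[i+1] - A[i-1]) / 2 with wrap-around, assuming unit grid spacing
        return 0.5 * (np.roll(A, -1, axis=axis) - np.roll(A, 1, axis=axis))
    Fx, Fy, Fz = F
    return np.array([
        d(Fz, 1) - d(Fy, 2),   # dFz/dy - dFy/dz
        d(Fx, 2) - d(Fz, 0),   # dFx/dz - dFz/dx
        d(Fy, 0) - d(Fx, 1),   # dFy/dx - dFx/dy
    ])

F = np.array([np.ones((128, 128, 128)), 2*np.ones((128, 128, 128)),
              3*np.ones((128, 128, 128))])
VxF = curl_periodic(F)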

Why is it scipy.stats.gaussian_kde() slower than seaborn.kde_plot() for the same data?

In Python 3.7, I have a numpy array with shape=(2, 34900). This array is a list of coordinates, where index 0 represents the X axis and index 1 represents the Y axis.
When I use seaborn.kde_plot() to make a visualization of the distribution of this data, I'm able to get the result in about 5-15 seconds when running on a i5 7th generation.
But when I try to run the following piece of code:
import numpy as np
import scipy.stats

# Find the kernel for the data
k = scipy.stats.kde.gaussian_kde(data, bw_method=.3)
# Define the grid
xi, yi = np.mgrid[0:1:2000*1j, 0:1:2000*1j]
# Apply the function
zi = k(np.vstack([xi.flatten(), yi.flatten()]))
which finds the Gaussian kernel for this data and applies it to the grid I defined, it takes much longer. I wasn't able to run it on the full array, but on a slice of size 140 it takes about 40 seconds to complete.
The 140 sized slice does make an interesting result which I was able to visualize using plt.pcolormesh().
My question is what am I missing here. If I understand correctly, I'm using scipy.stats.kde.gaussian_kde() to create an estimate of the function defined by the data. Then I'm applying that function to a 2D space, getting its Z component as the result, and plotting that Z component. How can this process differ so much from seaborn.kde_plot() that it makes the code take so much longer?
Scipy's implementation just goes through each point doing this:
for i in range(self.n):
    diff = self.dataset[:, i, newaxis] - points
    tdiff = dot(self.inv_cov, diff)
    energy = sum(diff*tdiff, axis=0) / 2.0
    result = result + exp(-energy)
Seaborn has in general two ways to calculate the bivariate kde. If available, it uses statsmodels, if not, it falls back to scipy.
The scipy code is similar to what is shown in the question. It uses scipy.stats.gaussian_kde. The statsmodels code uses statsmodels.nonparametric.api.KDEMultivariate.
However, for a fair comparison we would need to use the same grid size for both methods. The standard gridsize for seaborn is 100 points.
import numpy as np; np.random.seed(42)
import seaborn.distributions as sd
N = 34900
x = np.random.randn(N)
y = np.random.randn(N)
bw="scott"
gridsize=100
cut=3
clip = [(-np.inf, np.inf), (-np.inf, np.inf)]
f = lambda x,y : sd._statsmodels_bivariate_kde(x, y, bw, gridsize, cut, clip)
g = lambda x,y : sd._scipy_bivariate_kde(x, y, bw, gridsize, cut, clip)
If we time those two functions,
# statsmodels
%timeit f(x,y) # 1 loop, best of 3: 16.4 s per loop
# scipy
%timeit g(x,y) # 1 loop, best of 3: 8.67 s per loop
Scipy is hence twice as fast as statsmodels (the seaborn default). The reason why the code in the question takes so long is that instead of a grid of size 100, a grid of size 2000 is used.
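To see the effect directly, here is the question's code with only the grid size changed to seaborn's default of 100 (a sketch on random data, since the original data isn't available); a 100x100 grid has 400 times fewer evaluation points than 2000x2000:
import numpy as np
import scipy.stats

np.random.seed(42)
data = np.random.randn(2, 34900)   # same shape as the data in the question

k = scipy.stats.gaussian_kde(data, bw_method=.3)
xi, yi = np.mgrid[0:1:100*1j, 0:1:100*1j]   # 100x100 instead of 2000x2000
zi = k(np.vstack([xi.flatten(), yi.flatten()]))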
Seeing those results, one would actually be tempted to use scipy instead of statsmodels. Unfortunately seaborn does not let you choose which implementation to use, so you need to set the respective internal flag manually.
import seaborn as sns
import seaborn.distributions as sd
sd._has_statsmodels = False
# plot the kdeplot with scipy.stats.kde.gaussian_kde
sns.kdeplot(x, y)
It seems that seaborn just takes a sample of my data. Since the size is smaller, it is able to finish in a short amount of time. SciPy, on the other hand, uses every single point in its processing, so it takes way longer with a dataset of the size I'm using.

How to get time/freq from FFT in Python

I've got a little problem managing FFT data. I was looking for many examples of how to do FFT, but I couldn't get what I want from any of them. I have a random wave file with 44kHz sample rate and I want to get magnitude of N harmonics each X ms, let's say 100ms should be enough. I tried this code:
import scipy.io.wavfile as wavfile
import numpy as np
import pylab as pl
rate, data = wavfile.read("sound.wav")
t = np.arange(len(data[:,0]))*1.0/rate
p = 20*np.log10(np.abs(np.fft.rfft(data[:2048, 0])))
f = np.linspace(0, rate/2.0, len(p))
pl.plot(f, p)
pl.xlabel("Frequency(Hz)")
pl.ylabel("Power(dB)")
pl.show()
This was last example I used, I found it somewhere on stackoverflow. The problem is, this gets magnitude which I want, gets frequency, but no time at all. FFT analysis is 3D as far as I know and this is "merged" result of all harmonics. I get this:
X-axis = Frequency, Y-axis = Magnitude, Z-axis = Time (invisible)
From my understanding of the code, t is time; it looks right, but it is not actually used in the code (we may need it later, though). p is an array of powers (or magnitudes), but it seems to be some kind of average of all the magnitudes at each frequency in f, which is the array of frequencies. I don't want an averaged/merged value; I want the magnitude of N harmonics every X milliseconds.
Long story short, we currently get: one magnitude for each frequency.
We want: all magnitudes of N frequencies, including the time at which each magnitude is present.
The result should look like this array: [time, frequency, amplitude]
So in the end if we want 3 harmonics, it would look like:
[0,100,2.85489] #100Hz harmonic has 2.85489 amplitude on 0ms
[0,200,1.15695] #200Hz ...
[0,300,3.12215]
[100,100,1.22248] #100Hz harmonic has 1.22248 amplitude on 100ms
[100,200,1.58758]
[100,300,2.57578]
[200,100,5.16574]
[200,200,3.15267]
[200,300,0.89987]
Visualization is not needed, result should be just arrays (or hashes/dictionaries) as listed above.
Further to @Paul R's answer, scipy.signal.spectrogram is a spectrogram function in scipy's signal processing module.
The example at the above link is as follows:
from scipy import signal
import matplotlib.pyplot as plt
import numpy as np

# Generate a test signal: a 2 Vrms sine wave whose frequency linearly
# changes with time from 1 kHz to 2 kHz, corrupted by 0.001 V**2/Hz of
# white noise sampled at 10 kHz.
fs = 10e3
N = 1e5
amp = 2 * np.sqrt(2)
noise_power = 0.001 * fs / 2
time = np.arange(N) / fs
freq = np.linspace(1e3, 2e3, N)
x = amp * np.sin(2*np.pi*freq*time)
x += np.random.normal(scale=np.sqrt(noise_power), size=time.shape)

# Compute and plot the spectrogram.
f, t, Sxx = signal.spectrogram(x, fs)
plt.pcolormesh(t, f, Sxx)
plt.ylabel('Frequency [Hz]')
plt.xlabel('Time [sec]')
plt.show()
It looks like you're trying to implement a spectrogram, which is a sequence of power spectrum estimates, typically implemented with a succession of (usually overlapping) FFTs. Since you only have one FFT (spectrum) then you have no time dimension yet. Put your FFT code in a loop, and process one block of samples (e.g. 1024) per iteration, with a 50% overlap between successive blocks. The sequence of generated spectra will then be a 3D array of time v frequency v magnitude.
I'm not a Python person, but I can give you some pseudo code which should be enough to get you coding:
N = length of data input
N_FFT = no of samples per block (== FFT size, e.g. 1024)
i = 0                            ;; i = index of spectrum within 3D output array
for block_start = 0 to N - N_FFT
    block_end = block_start + N_FFT
    get samples from block_start .. block_end
    apply window function to block (e.g. Hamming)
    apply FFT to windowed block
    calculate magnitude spectrum (20 * log10(sqrt(re*re + im*im)))
    store spectrum in output array at index i
    block_start += N_FFT / 2     ;; NB: 50% overlap
    i++
end
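A rough Python translation of that pseudocode might look like this (a sketch, not from the original answer; names and details are illustrative):
import numpy as np

def spectrogram_blocks(data, n_fft=1024):
    """Magnitude spectra of successive Hamming-windowed blocks, 50% overlap."""
    window = np.hamming(n_fft)
    hop = n_fft // 2
    spectra = []
    for block_start in range(0, len(data) - n_fft + 1, hop):
        block = data[block_start:block_start + n_fft] * window
        # magnitude spectrum in dB; a small epsilon avoids log10(0)
        spectra.append(20 * np.log10(np.abs(np.fft.rfft(block)) + 1e-12))
    return np.array(spectra)   # shape: (n_blocks, n_fft//2 + 1)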
Edit: So it seems this returns values, but they don't fit the audio file at all. Even though they can be used as magnitudes for a spectrogram, they won't work, for example, in the classic audio visualizers you see in many music players. I also tried matplotlib's pylab for the spectrogram, but the result is the same.
import os
import wave
import pylab
import math
from numpy import amax
from numpy import amin

def get_wav_info(wav_file, mi, mx):
    wav = wave.open(wav_file, 'r')
    frames = wav.readframes(-1)
    sound_info = pylab.fromstring(frames, 'Int16')
    frame_rate = wav.getframerate()
    wav.close()
    spectrum, freqs, t, im = pylab.specgram(sound_info, NFFT=1024, Fs=frame_rate)
    n = 0
    while n < 20:
        for index, power in enumerate(spectrum[n]):
            print("%s,%s,%s" % (n, int(round(t[index]*1000)), math.ceil(power*100)/100))
        n += 1

get_wav_info("wave.wav", 1, 20)
Any tips on how to obtain dB values that are usable for visualization?
Basically, we apparently have all we need in the code above; the question is how to make it return sensible values. Ignore mi and mx, which would just rescale the array values into the mi..mx interval for visualization. If I understand correctly, spectrum in this code is an array of arrays containing the amplitude of each frequency in freqs, present at the times in t. But how do these values work? Are they really amplitudes if they come out this strange, and if so, how do I convert them to dB, for example?
tl;dr I need output like the visualizers music players have, but it doesn't need to work in realtime; I just want the data. The problem is that the values don't fit the wav file.
Edit2: I noticed there's one more issue. For a 90-second wav, the t array contains times up to 175.x, which seems very weird considering frame_rate matches the wav file. So now we have two problems: spectrum doesn't seem to return correct values (maybe it will fit once the time is correct), and t seems to cover exactly double the duration of the wav.
Fixed: Case completely solved.
import pylab
from scipy.io import wavfile

wav_file = "wave.wav"   # same file as above
frame_rate, snd = wavfile.read(wav_file)
sound_info = snd[:, 0]  # load only one channel
spectrum, freqs, t, im = pylab.specgram(sound_info, NFFT=1024, Fs=frame_rate,
                                        noverlap=5, mode='magnitude')
Specgram needed a little adjustment, and I loaded only one channel, using the scipy.io library instead of the wave library. Also, without mode set to 'magnitude', specgram returns power (a 10*log10 quantity when expressed in dB) instead of magnitude (20*log10), which is why it didn't return the correct values.
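For completeness, converting the returned magnitudes to dB is then a one-liner (a sketch; spectrum is the array returned by pylab.specgram above, and the epsilon just guards against log of zero):
import numpy as np

# spectrum: magnitude array from pylab.specgram(..., mode='magnitude')
spectrum_db = 20 * np.log10(spectrum + 1e-12)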

Calculate histogram of distances between points in big data set

I have a big data set representing 1.2M points in a 220-dimensional periodic space (each coordinate varies from -pi to pi)... (matrix: 1.2M x 220).
I would like to calculate the histogram of distances between these points, taking the periodicity into account. I have written some code in Python, but it still runs quite slowly on my test case (I am not even trying to run it on the whole set...).
Can you maybe take a look and help me with some tweaking?
Any suggestions and comments much appreciated.
import numpy as np
import time

# 1000x220 test set (-pi, pi)
d = np.random.random((1000, 220)) * 2 * np.pi - np.pi

# theoretical limit on the histogram range: the max distance between
# two points is pi in each dimension
m = np.zeros(np.shape(d)[1]) + np.pi
m_ = np.sqrt(np.sum(m**2))
# hist range is from 0 to mm
mm = np.floor(m_)
bins = int(mm / 0.01)
m = np.zeros(bins)

# proper calculations
start_time = time.time()
for i in range(np.shape(d)[0]):
    diff = d[:-(i+1), :] - d[i+1:, :]
    diff = np.absolute(diff)
    adiff = diff - np.pi
    diff = np.pi - np.absolute(adiff)
    s = np.sqrt(np.einsum('ij,ij->i', diff, diff))
    m += np.histogram(s, range=(0, mm), bins=bins)[0]
print(time.time() - start_time)
I think you will see the most improvement from breaking the main loop into smaller parts: divide range(...) into a few smaller ranges and use the threading module to run those chunks concurrently, as sketched below.
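A minimal sketch of that idea using concurrent.futures (assuming NumPy releases the GIL during the heavy array operations, which is only partly true for np.histogram, so measure before committing to this approach):
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def partial_hist(rows, d, mm, bins):
    # accumulate the distance histogram for one chunk of row indices
    h = np.zeros(bins)
    for i in rows:
        diff = np.absolute(d[:-(i+1), :] - d[i+1:, :])
        diff = np.pi - np.absolute(diff - np.pi)   # periodic distance per dimension
        s = np.sqrt(np.einsum('ij,ij->i', diff, diff))
        h += np.histogram(s, range=(0, mm), bins=bins)[0]
    return h

def threaded_hist(d, mm, bins, n_threads=4):
    # interleave row indices: early rows do far more work than late ones,
    # so this balances the load across threads
    chunks = [np.arange(d.shape[0] - 1)[i::n_threads] for i in range(n_threads)]
    with ThreadPoolExecutor(n_threads) as pool:
        parts = pool.map(lambda rows: partial_hist(rows, d, mm, bins), chunks)
    return sum(parts)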
