As the title suggests, I would love some suggestions on how to make the following code faster. I have tried several approaches, including Numba, a combination of the np.fromiter() and map() functions, and wrapping the hash function with np.vectorize().
Take a look at the code below! After the code are the results from running it on my machine.
Some notes for the future:
the CODES variable holds the dimensions of the arrays that need to be hashed (generated random strings for StackOverflow purposes)
the first compilation time for Numba is quite high, but I am willing to take on this cost if it means speeding up the code in the long run when running hundreds of thousands of times
import numpy as np
import numba as nb
import time
# GENERATE RANDOM "<U64" CHARACTER ARRAYS
A, Z = np.array(["A","Z"]).view("int32")
CODES = (25058, 64272, 61425)
LENGTH = 64
tmp1 = np.random.randint(low = A, high = Z, size = CODES[0] * LENGTH, dtype = "int32").view(f"U{LENGTH}")
tmp2 = np.random.randint(low = A, high = Z, size = CODES[1] * LENGTH, dtype = "int32").view(f"U{LENGTH}")
tmp3 = np.random.randint(low = A, high = Z, size = CODES[2] * LENGTH, dtype = "int32").view(f"U{LENGTH}")
# NUMBA ITERATION AND HASHING ONE BY ONE
@nb.jit(nopython=True, fastmath=True, parallel=True, cache=True)
def form_hashed_array(to_hash: np.ndarray) -> np.ndarray:
    hashed = np.empty_like(to_hash, dtype=np.int64)
    for i in nb.prange(len(to_hash)):
        hashed[i] = hash(to_hash[i])
    return hashed
print("--------")
t2 = time.monotonic()
t = time.monotonic()
tmp4 = form_hashed_array(tmp1)
print(time.monotonic() - t) # this first one will be larger due to compilation time
t = time.monotonic()
tmp5 = form_hashed_array(tmp2)
print(time.monotonic() - t)
t = time.monotonic()
tmp6 = form_hashed_array(tmp3)
print(time.monotonic() - t)
print("NUMBA ITERATION TOOK: " + str(time.monotonic() - t2))
# MAP + FROMITER COMBINATION
print("--------")
t2 = time.monotonic()
t = time.monotonic()
tmp7 = np.fromiter((map(hash, tmp1)), dtype=np.int64)
print(time.monotonic() - t)
t = time.monotonic()
tmp8 = np.fromiter((map(hash, tmp2)), dtype=np.int64)
print(time.monotonic() - t)
t = time.monotonic()
tmp9 = np.fromiter((map(hash, tmp3)), dtype=np.int64)
print(time.monotonic() - t)
print("MAP + FROMITER COMBINATION TOOK : " + str(time.monotonic()-t2))
# NUMPY FUNCTION VECTORIZATION
vfunc = np.vectorize(hash)
print("--------")
t2 = time.monotonic()
t = time.monotonic()
tmp10 = vfunc(tmp1)
print(time.monotonic() - t)
t = time.monotonic()
tmp11 = vfunc(tmp2)
print(time.monotonic() - t)
t = time.monotonic()
tmp12 = vfunc(tmp3)
print(time.monotonic() - t)
print("NUMPY FUNCTION VECTORIZATION TOOK: " + str(time.monotonic()-t2))
# SANITY CHECKS
print("--------")
print(not (tmp4 - tmp7).any() and not (tmp7 - tmp10).any(), end=" ")
print(not (tmp5 - tmp8).any() and not (tmp8 - tmp11).any(), end=" ")
print(not (tmp6 - tmp9).any() and not (tmp9 - tmp12).any())
print("--------")
breakpoint()
This code will be run a significant number of times, so it is important that it is as fast as possible.
--------
6.9208437809720635
0.08914285799255595
0.09502507897559553
NUMBA ITERATION TOOK: 7.1051117710303515
--------
0.009926816972438246
0.02683716599131003
0.026946138008497655
MAP + FROMITER COMBINATION TOOK : 0.06381386297289282
--------
0.011753249040339142
0.02864329604199156
0.029279633017722517
NUMPY FUNCTION VECTORIZATION TOOK: 0.06976548000238836
--------
True True True
--------
Thanks for the help in advance!
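One direction that might be worth trying, if an exact match with Python's built-in hash() is not actually required, is to fold the raw int32 code units of each string with vectorized NumPy arithmetic, so the only Python-level loop runs over the 64 character positions rather than over the rows. This is just a rough sketch: numpy_fingerprint is an illustrative name and the constants are the standard 64-bit FNV-1a ones.
def numpy_fingerprint(to_hash: np.ndarray, length: int = LENGTH) -> np.ndarray:
    # View each <U64 string as 64 int32 code units, then fold them column by column.
    codes = to_hash.view(np.int32).reshape(len(to_hash), length).astype(np.uint64)
    out = np.full(len(to_hash), 14695981039346656037, dtype=np.uint64)  # FNV-1a offset basis
    for col in range(length):  # 64 vectorized passes instead of one hash() call per string
        out = (out ^ codes[:, col]) * np.uint64(1099511628211)  # FNV-1a prime
    return out.view(np.int64)
Note that the resulting values will not agree with hash(), so the sanity checks above would only be able to compare this function against itself.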
I am relatively new to parallel computing and the Numba package. I am looking for optimization methods for my stupendously parallel N-body simulation. I've applied everything I know so far with NumPy arrays, JIT compilers, and multiprocessing. However, I'm still not getting the speed I desire (I've seen videos where their code is MUCH faster).
What I have currently is a rather simple Python integrator using Runge-Kutta integration and two equations of motion. I work with numerical integrators a lot in my field, so I would definitely like to pick up a few more tricks from you guys.
I have posted my code below, but essentially, I have one main function called "Motion" which takes two initial conditions and integrates their motion for a set amount of time. I have JITted this function and all the functions it calls iteratively: "RK4", "ODE", "Electric Field". Lastly, I call the pool function from multiprocessing to parallelize "Motion" and feed it different initial conditions for each simulation it runs.
Again, I've implemented every type of optimization I'm aware of, but I'm still not very happy with its speed. If anyone can spot a piece of the algorithm that could be further optimized, that would be extremely helpful and educational (for me at least)! Thank you for your time.
import numpy as np
import matplotlib.pyplot as plt
from numba import njit, prange
from time import time
from tqdm import tqdm
import multiprocessing as mp
from IPython.display import clear_output
from scipy import interpolate
"Electric Field Information"
A = np.float32(1.00E-04)
N_waves = int(19)
frequency = np.linspace(37.5,46.5,N_waves)*1e-3 #Set of frequencies used for Electric Field
m = int(20) #Azimuthal Wave Number
sigma = np.float32(0.5) #Gaussian Width of E wave in L
zeta = np.float32(1)
"Particle Information"
N_Particles = int(10000)
q = np.float32(-1) #Charge of electron
mass = np.float32(0.511e6) #Mass of electron in eV/c^2
FirstAdiabatic = np.float32(2000e10) #MeV/Gauss Adiabatic Invariant
"Runge-Kutta Paramters"
Total_Time = np.float32(10) #hours
Step_Size = np.float32(0.2) #second
Plot_Time = np.float32(60) #seconds
time_array = np.arange(0, Total_Time*3600+Step_Size, Step_Size) #Convert to seconds and Add End Point
N_points = len(time_array)
Skip_How_Many = int(Plot_Time/Step_Size) #Used to shorten our data set and save RAM
"Constants"
Beq = np.float64(31221.60592e-9) #nT
Re = np.float32(6371e3) #Meters
c = np.float32(2.998e8) #m/s
"Start Electric Field Code"
def wave_peak(omega): #Called once so no need to JIT or Optimize this
    L_sample = np.linspace(1,10,100)
    phidot = -3*FirstAdiabatic / (q* (L_sample*Re)**2 * np.sqrt(1+ (2*FirstAdiabatic*Beq/ (mass*L_sample**3)) ) )
    phidot_to_L = interpolate.interp1d(phidot,L_sample, kind = 'cubic')
    L0i = phidot_to_L(omega/m)
    return L0i
omega = 2*np.pi*frequency
L0i_wave = wave_peak(omega)
Phi0i_wave = np.linspace(0,2*np.pi,N_waves)
np.random.shuffle(Phi0i_wave)
@njit(nogil=True)
def Electric_Field(t,r):
    E0 = A*np.exp(-(r[0]-L0i_wave)**2 / (2*sigma**2))
    Delta = np.arctan2( (r[0] * (r[0]-L0i_wave)/sigma**2 - 1), (2*np.pi*r[0]/zeta) )
    Er = E0/m * np.sqrt( (2*np.pi*r[0]/zeta)**2 + (r[0]*(r[0]-L0i_wave)/sigma**2 -1)**2 ) * np.cos(m*r[1] - omega*t + Phi0i_wave + 2*np.pi*r[0]/zeta + Delta)
    Ephi = E0*np.cos(m*r[1] - omega*t + Phi0i_wave + 2*np.pi*r[0]/zeta)
    return np.sum(Er),np.sum(Ephi)
"End of Electric Field Code"
"Particle's ODE - Equation of Motion"
@njit(nogil=True)
def ODE(t,r):
    Er, Ephi = Electric_Field(t,r) #Pull out the electric field so we only call it once.
    Ldot = Ephi * r[0]**3 / (Re*Beq)
    Phidot = -Er * r[0]**2 / (Re*Beq) - 3* FirstAdiabatic / (q*r[0]**2*Re**2) * 1/np.sqrt(2*FirstAdiabatic*Beq/ (r[0]**3*mass) + 1)
    return np.array([Ldot,Phidot])
@njit(nogil=True)
def RK4(t,r): #Standard Runge-Kutta integration algorithm
    k1 = Step_Size*ODE(t,r)
    k2 = Step_Size*ODE(t+Step_Size/2, r+k1/2)
    k3 = Step_Size*ODE(t+Step_Size/2, r+k2/2)
    k4 = Step_Size*ODE(t+Step_Size, r+k3)
    return r + k1/6 + k2/3 + k3/3 + k4/6
@njit(nogil=True)
def Motion(L0,Phi0): #Insert initial conditions and it will loop through the RK4 integrator and output all its positions.
    L_Array = np.zeros_like(time_array)
    Phi_Array = np.zeros_like(time_array)
    L_Array[0] = L0
    Phi_Array[0] = Phi0
    for i in range(1,N_points):
        L_Array[i], Phi_Array[i] = RK4(time_array[i-1], np.array([ L_Array[i-1],Phi_Array[i-1] ]) )
    return L_Array[::Skip_How_Many], Phi_Array[::Skip_How_Many]
#Skip_How_Many is used to take up less RAM space since we don't need that kind of precision in our data
# Location = Motion(5,0)
# x = Location[0]*np.cos(Location[1])
# y = Location[0]*np.sin(Location[1])
# plt.plot(x,y,"o", markersize = 0.5)
# ts = time()
# Motion(5,0)
# print('Solo Time:', time() - ts)
"Getting my Arrays ready so I can index it"
Split = int(np.sqrt(N_Particles))
L0i = np.linspace(4.4,5.5,Split)
Phi0i = np.linspace(0,360,Split) / 180 * np.pi
L0_Grid = np.repeat(L0i,Split)
# ^Here I want to run a meshgrid of L0i and Phi0, so I repeat L0i using this function and mod (%) the index on the Phi Function
#Create Appending Array
results = []
def get_results(result): #Call back to this array from Multiprocessing to append the results it gives per run.
    results.append(result)
    clear_output()
    print("Getting Results %0.2f" %(len(results)/N_Particles * 100), end='\r')
if __name__ == '__main__':
    #Call In Multiprocessing
    pool = mp.Pool(mp.cpu_count()) #Counting number of threads to start
    ts = time() #Timing this process. Begins here
    for ii in range(N_Particles): #Not too sure what this does, but it works - I assume it parallelizes this loop
        pool.apply_async(Motion, args = (L0_Grid[ii],Phi0i[int(ii%Split)]), callback=get_results)
    pool.close() #I'm not too sure what this does but everyone uses it, and it won't work without it
    pool.join()
    print('Time in MP parallel:', time() - ts) #Output Time
I think the main reason why your code is slow is because your Runge-Kutta method has fixed time steps. Fancy ODE solvers will select the biggest time step that allows a tolerable amount of error. One example is the LSODA ODE solver from ODEPACK.
Below I've re-written your code using NumbaLSODA. On my computer, it speeds up your code by about 200x.
import numpy as np
import matplotlib.pyplot as plt
from numba import njit, prange
from time import time
from tqdm import tqdm
import multiprocessing as mp
from scipy import interpolate
from NumbaLSODA import lsoda_sig, lsoda
from numba import cfunc
import numba as nb
"Electric Field Information"
A = np.float32(1.00E-04)
N_waves = int(19)
frequency = np.linspace(37.5,46.5,N_waves)*1e-3 #Set of frequencies used for Electric Field
m = int(20) #Azimuthal Wave Number
sigma = np.float32(0.5) #Gaussian Width of E wave in L
zeta = np.float32(1)
"Particle Information"
N_Particles = int(10000)
q = np.float32(-1) #Charge of electron
mass = np.float32(0.511e6) #Mass of electron in eV/c^2
FirstAdiabatic = np.float32(2000e10) #MeV/Gauss Adiabatic Invariant
"Runge-Kutta Paramters"
Total_Time = np.float32(10) #hours
Step_Size = np.float32(0.2) #second
Plot_Time = np.float32(60) #seconds
time_array = np.arange(0, Total_Time*3600+Step_Size, Step_Size) #Convert to seconds and Add End Point
N_points = len(time_array)
Skip_How_Many = int(Plot_Time/Step_Size) #Used to shorten our data set and save RAM
"Constants"
Beq = np.float64(31221.60592e-9) #nT
Re = np.float32(6371e3) #Meters
c = np.float32(2.998e8) #m/s
"Start Electric Field Code"
def wave_peak(omega): #Called once so no need to JIT or Optimize this
    L_sample = np.linspace(1,10,100)
    phidot = -3*FirstAdiabatic / (q* (L_sample*Re)**2 * np.sqrt(1+ (2*FirstAdiabatic*Beq/ (mass*L_sample**3)) ) )
    phidot_to_L = interpolate.interp1d(phidot,L_sample, kind = 'cubic')
    L0i = phidot_to_L(omega/m)
    return L0i
omega = 2*np.pi*frequency
L0i_wave = wave_peak(omega)
Phi0i_wave = np.linspace(0,2*np.pi,N_waves)
np.random.shuffle(Phi0i_wave)
@njit
def Electric_Field(t,r):
    E0 = A*np.exp(-(r[0]-L0i_wave)**2 / (2*sigma**2))
    Delta = np.arctan2( (r[0] * (r[0]-L0i_wave)/sigma**2 - 1), (2*np.pi*r[0]/zeta) )
    Er = E0/m * np.sqrt( (2*np.pi*r[0]/zeta)**2 + (r[0]*(r[0]-L0i_wave)/sigma**2 -1)**2 ) * np.cos(m*r[1] - omega*t + Phi0i_wave + 2*np.pi*r[0]/zeta + Delta)
    Ephi = E0*np.cos(m*r[1] - omega*t + Phi0i_wave + 2*np.pi*r[0]/zeta)
    return np.sum(Er),np.sum(Ephi)
"End of Electric Field Code"
"Particle's ODE - Equation of Motion"
@cfunc(lsoda_sig)
def ODE(t, r_, dr, p):
    r = nb.carray(r_, (2,))
    Er, Ephi = Electric_Field(t,r)
    Ldot = Ephi * r[0]**3 / (Re*Beq)
    Phidot = -Er * r[0]**2 / (Re*Beq) - 3* FirstAdiabatic / (q*r[0]**2*Re**2) * 1/np.sqrt(2*FirstAdiabatic*Beq/ (r[0]**3*mass) + 1)
    dr[0] = Ldot
    dr[1] = Phidot
funcptr = ODE.address
@njit
def Motion(L0,Phi0):
    u0 = np.array([L0,Phi0],np.float64)
    data = np.array([5.0])
    usol, success = lsoda(funcptr, u0, time_array, data)
    L_Array = usol[:,0]
    Phi_Array = usol[:,1]
    return L_Array[::Skip_How_Many], Phi_Array[::Skip_How_Many]
#Skip_How_Many is used to take up less RAM space since we don't need that kind of precision in our data
Location = Motion(5,0)
x = Location[0]*np.cos(Location[1])
y = Location[0]*np.sin(Location[1])
plt.plot(x,y,"o", markersize = 0.5)
ts = time()
Motion(5,0)
print('Solo Time:', time() - ts)
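Since Motion is now an njit function, one possible next step is to replace the multiprocessing pool from the question with a single parallel Numba loop over all particles. This is only a minimal sketch, not part of the rewrite above: run_all and its grid arguments are illustrative names, and it assumes the module-level time_array and Skip_How_Many defined earlier.
from numba import njit, prange

@njit(parallel=True)
def run_all(L0_grid, Phi0_grid):
    n_out = time_array[::Skip_How_Many].shape[0]  # number of saved points per particle
    n = L0_grid.shape[0]
    L_out = np.empty((n, n_out))
    Phi_out = np.empty((n, n_out))
    for i in prange(n):  # each particle is integrated independently
        Li, Phii = Motion(L0_grid[i], Phi0_grid[i])
        L_out[i, :] = Li
        Phi_out[i, :] = Phii
    return L_out, Phi_out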
I have a simple optimization problem in python which I need to re-run quite often (more than 10,000 times).
Most of the calculation can be done quite efficiently with numpy and n-dimensional arrays; however, when it comes to the optimization I am lost, since I have to switch to scipy.optimize.minimize.
Is there a way to run all the optimizations at once?
Currently I am looping through each line - see code below:
#%% imports
import numpy as np
from scipy import optimize
import time
from multiprocessing import Pool
#%% functions
def square(x, a, cov):
    return (x-a).dot(cov).dot(x-a)

def minimization(args):
    f, x, a, cov, beta, bnds = args
    con_beta = {'type': 'eq', 'fun': lambda x, beta: np.sum(x*beta) - 1, 'jac': lambda x, beta: beta, 'args': (beta, ) }
    res = optimize.minimize(f, x, bounds=bnds, method='SLSQP', args=(a, cov), constraints = con_beta)
    return res.x
#%% initialize data
numberOfRuns = 260 * 100
numberOfAssets = 4
corr = np.ones((numberOfAssets,numberOfAssets))*0.8 + np.eye(numberOfAssets)*0.2
cov = (np.eye(numberOfAssets)*0.15).dot(corr).dot(np.eye(numberOfAssets)*0.15)
mu = np.zeros(numberOfAssets)
cov_n = np.zeros((numberOfRuns, numberOfAssets, numberOfAssets))
bm_n = np.zeros((numberOfRuns, numberOfAssets))
guess_n = np.ones((numberOfRuns, numberOfAssets))/(numberOfAssets-1)
bm_n[:,0] = 1
guess_n[:,0] = 0
bnds = [(0, None) for _ in range(numberOfAssets)]
bnds[0] = (0,0)
for i in range(numberOfRuns):
    cov_n[i,:,:] = np.cov(np.random.multivariate_normal(mu, cov/260, 260).T)*260
beta_n = cov_n[:,0,:]/cov_n[:,0,0][:,np.newaxis]
#%% Run 1
tic = time.time()
for i in range(numberOfRuns):
    res = minimization((square, guess_n[i,:], bm_n[i,:], cov_n[i,:,:], beta_n[i,:], bnds))
toc= time.time()
tictoc1 = toc - tic
print(tictoc1) #21.678 seconds
#%% Run 2
tic = time.time()
args = [(square, guess_n[i,:], bm_n[i,:], cov_n[i,:,:], beta_n[i,:], bnds) for i in range(numberOfRuns)]
p = Pool(4)
res = p.map(minimization, args)
toc = time.time()
tictoc2 = toc - tic
print(toc - tic) #~11 seconds
#%% The End
Is there a more elegant and efficient way than looping through each run?
PS: I have tried to use Pool, but this "just" halves the time - I still have the same problem with the loop.
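One cheap tweak on top of the code above (just a sketch; whether it helps depends on how similar consecutive problems are) is to warm-start each SLSQP call with the previous solution instead of the fixed guess, which may reduce the number of iterations per solve:
#%% Run 3 (warm start, illustrative)
tic = time.time()
x_prev = guess_n[0, :]
for i in range(numberOfRuns):
    x_prev = minimization((square, x_prev, bm_n[i, :], cov_n[i, :, :], beta_n[i, :], bnds))
toc = time.time()
print(toc - tic)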
I need to write code to do a one-sample t-test given the sample mean (E(X)) and sample second raw moment (E(X^2)) for each entry in a 2-dimensional array.
I am doing this in two ways, but neither is working well.
With numpy vectorized operations - this runs out of memory for certain array sizes.
import numpy as np
from scipy.stats import t

def calc_normal_pvals(vt_sum_counter, vt_ssum_counter):
    global nsubs
    vt_sum_counter = vt_sum_counter/nsubs
    vt_ssum_counter = vt_ssum_counter/nsubs
    sample_var = nsubs * (vt_ssum_counter - np.square(vt_sum_counter))/(nsubs - 1)
    t_array = np.divide(vt_sum_counter, (np.sqrt(sample_var/nsubs)))
    pvals = t.sf(t_array, nsubs-1)
    pvals[np.isnan(pvals)] = 0
    return pvals
Normal for loop method - takes a lot of time in comparison
def calc_normal_pvals(vt_sum_counter, vt_ssum_counter, tail=1):
    global nsubs
    V, T = vt_sum_counter.shape
    pvals = np.zeros((V, T))
    for i in range(V):
        for j in range(T):
            sigma = ((vt_ssum_counter[i, j]/nsubs -(vt_sum_counter[i,j]/nsubs)**2)/(nsubs - 1))**0.5
            if (sigma != 0):
                pvals[i, j] = t.sf(vt_sum_counter[i, j]/(nsubs*sigma), nsubs-1)
    return pvals
The input arrays are huge - typically of size ~ 900000 X 400.
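Given those sizes, one possible compromise is to keep the vectorized math but run it over blocks of rows, so peak memory stays bounded while avoiding the per-element Python loop. This is only a minimal sketch reusing the vectorized function from the first snippet; calc_normal_pvals_chunked and block_size are illustrative names.
def calc_normal_pvals_chunked(vt_sum_counter, vt_ssum_counter, block_size=50000):
    pvals = np.empty(vt_sum_counter.shape)
    for start in range(0, vt_sum_counter.shape[0], block_size):
        stop = start + block_size
        # Each block is processed with the vectorized routine above.
        pvals[start:stop] = calc_normal_pvals(vt_sum_counter[start:stop],
                                              vt_ssum_counter[start:stop])
    return pvals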
I am wondering what the fastest (or otherwise "best") method is to apply a function to each element of a numpy array. I tried the approach below with a larger data set and it takes quite a while. Please post your answer with the timings (in milliseconds) you got on both my implementation and yours, since different hardware will give different results for the same code.
Please put your implementation between the two commented lines.
import numpy as np
import time
# Some random data
x = np.random.rand(5,32,32,3)*255
x = x.astype(int)
# Defining some function
def normalize(x, a=0, b=1, x_min=0, x_max=255):
    return a + (x - x_min)*(b - a)/(x_max-x_min)
## Start timer
start_time = time.time()
# ---------------------IMPLEMENTATION---------------------
# Apply Normalize function to each element in the array
n = np.vectorize(normalize)
x = n(x)
#_________________________________________________________
# Stop timer and show time in milliseconds
elapsed_time = time.time() - start_time
print("Time [ms] = " + str(elapsed_time*1000))
As pointed out by @sascha, I just need to apply the function to the whole array:
import numpy as np
import time
# Some random data
x = np.random.rand(5,32,32,3)*255
x = x.astype(int)
# Defining some function
def normalize(x, a=0, b=1, x_min=0, x_max=255):
    return a + (x - x_min)*(b - a)/(x_max-x_min)
## Start timer
start_time = time.time()
# ---------------------IMPLEMENTATION---------------------
# Apply Normalize function to each element in the array
x = normalize(x)
#_________________________________________________________
# Stop timer and show time in milliseconds
elapsed_time = time.time() - start_time
print("Time [ms] = " + str(elapsed_time*1000))
I run Python 2.7 and MATLAB R2010a on the same machine, with loops that do nothing, and it gives me a 10x difference in speed.
I looked online and heard they should be of the same order.
Python slows down further once there is an if statement or a math operation inside the for loop.
My question: is this really the case, or is there some way to bring them into the same speed order?
Here is python code
import time
start_time = time.time()
for r in xrange(1000):
    for c in xrange(1000):
        continue
elapsed_time = time.time() - start_time
print 'time cost = ',elapsed_time
Output: time cost = 0.0377440452576
Here is matlab code
tic
for i = 1:1000
    for j = 1:1000
    end
end
toc
Output: Escaped time is 0.004200 seconds
The reason this is happening is related to the JIT compiler, which is optimizing the MATLAB for loop. You can disable/enable the JIT accelerator using feature accel off and feature accel on. When you disable the accelerator, the times change dramatically.
MATLAB with accel on: Elapsed time is 0.009407 seconds.
MATLAB with accel off: Elapsed time is 0.287955 seconds.
python: time cost = 0.0511920452118
Thus the JIT accelerator is directly causing the speedup that you are noticing. There is another thing that you should consider, which is related to the way that you defined the iteration indices. In both cases, MATLAB and python, you used Iterators to define your loops. In MATLAB you create the actual values by adding the square brackets ([]), and in python you use range instead of xrange. When you make these changes
% MATLAB
for i = [1:1000]
    for j = [1:1000]
# python
for r in range(1000):
    for c in range(1000):
The times become
MATLAB with accel on: Elapsed time is 0.338701 seconds.
MATLAB with accel off: Elapsed time is 0.289220 seconds.
python: time cost = 0.0606048107147
One final consideration is what happens if you add a quick computation inside the loop, i.e. t = t + 1. Then the times become
MATLAB with accel on: Elapsed time is 1.340830 seconds.
MATLAB with accel off: Elapsed time is 0.905956 seconds. (Yes off was faster)
python: time cost = 0.147221088409
I think that the moral here is that the computation speeds of for loops, out of the box, are comparable for extremely simple loops, depending on the situation. However, there are other numerical tools in Python which can speed things up significantly; numpy and PyPy have been brought up so far.
The basic Python implementation, CPython, is not meant to be super-speedy. If you need efficient matlab-style numerical manipulation, use the numpy package or an implementation of Python that is designed for fast work, such as PyPy or even Cython. (Writing a Python extension in C, which will of course be pretty fast, is also a possible solution, but in that case you may as well just use numpy and save yourself the effort.)
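To make the numpy suggestion concrete, here is a loose analogue of the "t = t + 1 a million times" experiment from the answer above. It is purely illustrative: the million additions are pushed into one C-level call instead of a Python-level double loop.
import numpy as np
import time

start_time = time.time()
# One vectorized call instead of a 1000 x 1000 Python loop of increments.
t_total = np.ones(1000 * 1000, dtype=np.int64).sum()
elapsed_time = time.time() - start_time
print("numpy time cost = " + str(elapsed_time))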
If Python execution performance is really crucial for you, you might take a look at PyPy
I did your test:
import time
for a in range(10):
    start_time = time.time()
    for r in xrange(1000):
        for c in xrange(1000):
            continue
    elapsed_time = time.time()-start_time
    print elapsed_time
with standard Python 2.7.3, I get:
0.0311839580536
0.0310959815979
0.0309510231018
0.0306520462036
0.0302460193634
0.0324130058289
0.0308878421783
0.0307397842407
0.0304911136627
0.0307500362396
whereas, using PyPy 1.9.0 (which corresponds to Python 2.7.2), I get:
0.00921821594238
0.0115230083466
0.00851202011108
0.00808095932007
0.00496387481689
0.00499391555786
0.00508499145508
0.00618195533752
0.005126953125
0.00482988357544
The acceleration from PyPy is really stunning, and it becomes visible once its JIT compiler optimizations outweigh their cost; that's also why I introduced the extra outer for loop. For this example, absolutely no modification of the code was needed.
This is just my opinion, but I think the process is a bit more complex. Basically, MATLAB is an optimized layer over C, so with appropriate initialization of matrices and minimization of function calls (avoiding "." object-like operators in MATLAB) you obtain very different results. Consider the following simple example of a wave generator built from cosine functions. MATLAB time = 0.15 s in a practical debug session, Python time = 25 s in a practical debug session (Spyder), so Python is 166x slower there. Run directly with Python 3.7.4 the time is about 5 s, still a non-negligible 33x.
MATLAB:
AW(1,:) = [800 , 0 ]; % [amp frec]
AW(2,:) = [300 , 4E-07];
AW(3,:) = [200 , 1E-06];
AW(4,:) = [ 50 , 4E-06];
AW(5,:) = [ 30 , 9E-06];
AW(6,:) = [ 20 , 3E-05];
AW(7,:) = [ 10 , 4E-05];
AW(8,:) = [ 9 , 5E-04];
AW(9,:) = [ 7 , 7E-04];
AW(10,:)= [ 5 , 8E-03];
phas = 0
tini = -2*365 *86400; % 2 years backwards in seconds
dt = 200; % step, 200 seconds
tfin = 0; % present
vec_t = ( tini: dt: tfin)'; % vector_time
nt = length(vec_t);
vec_t = vec_t - phas;
wave = zeros(nt,1);
for it = 1:nt
    suma = 0;
    t = vec_t(it,1);
    for iW = 1:size(AW,1)
        suma = suma + AW(iW,1)*cos(AW(iW,2)*t);
    end
    wave(it,1) = suma;
end
PYTHON:
import numpy as np
AW = np.zeros((10,2))
AW[0,:] = [800 , 0.0]
AW[1,:] = [300 , 4E-07]; # [amp frec]
AW[2,:] = [200 , 1E-06];
AW[3,:] = [ 50 , 4E-06];
AW[4,:] = [ 30 , 9E-06];
AW[5,:] = [ 20 , 3E-05];
AW[6,:] = [ 10 , 4E-05];
AW[7,:] = [ 9 , 5E-04];
AW[8,:] = [ 7 , 7E-04];
AW[9,:] = [ 5 , 8E-03];
phas = 0
tini = -2*365 *86400 # 2 years backwards
dt = 200
tfin = 0 # present
nt = round((tfin-tini)/dt) + 1
vec_t = np.linspace(tini,tfin,nt) - phas
wave = np.zeros((nt))
for it in range(nt):
    suma = 0
    t = vec_t[it]
    for iW in range(np.size(AW,0)):
        suma = suma + AW[iW,0]*np.cos(AW[iW,1]*t)
    #endfor iW
    wave[it] = suma
#endfor it
To deal with such cases in Python I would suggest compiling the numerical parts that may compromise the project directly to a binary (for example, compiling C or Fortran into an executable and calling it from Python afterwards). Of course, other suggestions are appreciated.
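For completeness, the double loop in the Python listing above can also be collapsed into a single vectorized NumPy expression. This is only a sketch using the same AW and vec_t variables; it should produce the same wave array while broadcasting over all times and all wave components at once.
# All (time, wave) cosine terms computed in one shot, then summed over the waves.
wave = (AW[:, 0] * np.cos(np.outer(vec_t, AW[:, 1]))).sum(axis=1)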
I tested a FIR filter in MATLAB and the same (adapted) code in Python, including a frequency sweep. The FIR filter is fairly large, order N = 100. I post both codes below, but here are the timing results first:
MATLAB: Elapsed time is 11.149704 seconds.
PYTHON: time cost = 247.8841781616211 seconds.
PYTHON IS 25 TIMES SLOWER !!!
MATLAB CODE (main):
f1 = 4000; % bandpass frequency (response = 1).
f2 = 4200; % bandreject frequency (response = 0).
N = 100; % FIR filter order.
k = 0:2*N;
fs = 44100; Ts = 1/fs; % Sampling freq. and time.
% FIR Filter numerator coefficients:
Nz = Ts*(f1+f2)*sinc((f2-f1)*Ts*(k-N)).*sinc((f2+f1)*Ts*(k-N));
f = 0:fs/2;
w = 2*pi*f;
z = exp(-i*w*Ts);
% Calculation of the expected response:
Hz = polyval(Nz,z).*z.^(-2*N);
figure(1)
plot(f,abs(Hz))
title('Gráfica Respuesta Filtro FIR (Filter Expected Response)')
xlabel('frecuencia f (Hz)')
ylabel('|H(f)|')
xlim([0, 5000])
grid on
% Sweep Frequency Test:
tic
% Start and Stop frequencies of sweep, t = tmax = 50 seconds = 5000 Hz frequency:
fmin = 1; fmax = 5000; tmax = 50;
t = 0:Ts:tmax;
phase = 2*pi*fmin*t + 2*pi*((fmax-fmin).*t.^2)/(2*tmax);
x = cos(phase);
y = filtro2(Nz, 1, x); % custom filter function, not using "filter" library here.
figure(2)
plot(t,y)
title('Gráfica Barrido en Frecuencia Filtro FIR (Freq. Sweep)')
xlabel('Tiempo Barrido: t = 10 seg = 1000 Hz')
ylabel('y(t)')
xlim([0, 50])
grid on
toc
MATLAB CUSTOM FILTER FUNCTION
function y = filtro2(Nz, Dz, x)
    Nn = length(Nz);
    Nd = length(Dz);
    N = length(x);
    Nm = max(Nn,Nd);
    x1 = [zeros(Nm-1,1) ; x'];
    y1 = zeros(Nm-1,1);
    for n = Nm:N+Nm-1
        y1(n) = Nz(Nn:-1:1)*x1(n-Nn+1:n)/Dz(1);
        if Nd > 1
            y1(n) = y1(n) - Dz(Nd:-1:2)*y1(n-Nd+1:n-1)/Dz(1);
        end
    end
    y = y1(Nm:Nm+N-1);
end
PYTHON CODE (main):
import numpy as np
from matplotlib import pyplot as plt
import FiltroDigital as fd
import time
j = np.array([1j])
pi = np.pi
f1, f2 = 4000, 4200
N = 100
k = np.array(range(0,2*N+1),dtype='int')
fs = 44100; Ts = 1/fs;
Nz = Ts*(f1+f2)*np.sinc((f2-f1)*Ts*(k-N))*np.sinc((f2+f1)*Ts*(k-N));
f = np.arange(0, fs/2, 1)
w = 2*pi*f
z = np.exp(-j*w*Ts)
Hz = np.polyval(Nz,z)*z**(-2*N)
plt.figure(1)
plt.plot(f,abs(Hz))
plt.title("Gráfica Respuesta Filtro FIR")
plt.xlabel("frecuencia f (Hz)")
plt.ylabel("|H(f)|")
plt.xlim(0, 5000)
plt.grid()
plt.show()
start_time = time.time()
fmin = 1; fmax = 5000; tmax = 50;
t = np.arange(0, tmax, Ts)
fase = 2*pi*fmin*t + 2*pi*((fmax-fmin)*t**2)/(2*tmax)
x = np.cos(fase)
y = fd.filtro(Nz, [1], x)
plt.figure(2)
plt.plot(t,y)
plt.title("Gráfica Barrido en Frecuencia Filtro FIR")
plt.xlabel("Tiempo Barrido: t = 10 seg = 1000 Hz")
plt.ylabel("y(t)")
plt.xlim(0, 50)
plt.grid()
plt.show()
elapsed_time = time.time() - start_time
print('time cost = ', elapsed_time)
PYTHON CUSTOM FILTER FUNCTION
import numpy as np
def filtro(Nz, Dz, x):
    Nn = len(Nz)
    Nd = len(Dz)
    Nz = np.array(Nz,dtype=float)
    Dz = np.array(Dz,dtype=float)
    x = np.array(x,dtype=float)
    N = len(x)
    Nm = max(Nn,Nd)
    x1 = np.insert(x, 0, np.zeros((Nm-1,), dtype=float))
    y1 = np.zeros((N+Nm-1,), dtype=float)
    for n in range(Nm-1,N+Nm-1) :
        y1[n] = sum(Nz*np.flip( x1[n-Nn+1:n+1]))/Dz[0] # = y1FIR[n]
        if Nd > 1:
            y1[n] = y1[n] - sum(Dz[1:]*np.flip( y1[n-Nd+1:n]))/Dz[0]
        print(y1[n])
    y = y1[Nm-1:]
    return y
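As an aside on the Python side (a hedged sketch, not part of the posted code): for the pure-FIR case used here (Dz = [1]), the same output can be obtained from SciPy's C-implemented direct-form filter, which removes the Python-level sample loop entirely.
from scipy.signal import lfilter

# Direct-form FIR filtering; for Dz = [1] this should match filtro(Nz, [1], x).
y_fast = lfilter(Nz, [1.0], x)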