Speeding up this code snippet - python

I am rewriting one code from Matlab to Python.
Matlab code looks like this:
validIntSize = length(prices1)-125; %valid interval size
interval720s = zeros(validIntSize,120+1);
interval360s = zeros(validIntSize,60+1);
interval180s = zeros(validIntSize,30+1);
for i = 1:intJump:validIntSize
interval180s(i,:) = [prices1(i:i+29),priceDiff(i+29)];
interval360s(i,:) = [prices1(i:i+59),priceDiff(i+59)];
interval720s(i,:) = [prices1(i:i+119),priceDiff(i+119)];
end
It populates the rows of this huge matrix in a nutshell. I do this through append in Python and it takes way too long:
intervals = []
for mean in self.means: intervals.append([])
for i in range(stop_point):
for idx, mean in enumerate(self.means):
intervals[idx].append(list(prices1[i:(i+mean)])+[price_diff[i+mean]])
I suspect that this is due to the append method that I am using?
P.S. it takes about 2 seconds in Matlab and about 2 minutes in Python.
P.P.S changing my code to list comprehension as suggested in the comments:
for idx, k_mean in enumerate(self.k_means):
intervals[idx] = [list(prices1[i:(i+k_mean)])+[price_diff[i+k_mean]] for i in range(stop_point)]
does not speed up the code. Takes virtually the same amount of time.

Related

Saving continuously generated simulation data with Python3

So my question is how I should save a large amount of simulation data to a file using Python (or update new data rows to the existing file).
Lets say I have NN=1000 particles, and I want to save the position and velocity data of each particle (x y z, vx vy vz). The data is in format [x1,y1,z1,vx1,vy1,vz1, x2,y2,z2,vx2,vy2,vz2, ...] and so on.
Simulation is working well, but I believe the methods I use for saving and keeping these information saved is not really optimal for me.
Pseudo code similar to my code
T_max = 1000 # for example
dt = 0.1 # time step
T = 0 # current time
iterations = int(T_max/dt) # number of iterations we are doing
NN = 1000 # Number of particles
ZZ = np.zeros( (iterations, 2+NN*6 ) ) # Here I generate whole data matrix at the beginning.
# ^ might not be the best idea as the system needs to keep everything in memory for the whole time
# So I guess saving could be done in chunks?
ZZ[0][0], ZZ[0][1] = T , dt
# ZZ[0][2:] = initialize_system(NN=NN) # so lets initialize the system.
# However, for this post I do this differently due to simplicity. See below
ZZ[0][2:] = np.random.uniform(-100,100,NN*6)
i = 0
while i < iteration:
T += dt
Z[i+1][0], Z[i+1][1] = T, dt
#Z[i+1][2:] = rk4(EOM_function, posvel=Z[i][2:])
# ^ Using this I would calculate new positions based on previous ones.
Z[i+1][2:] = np.random.uniform(-100,100,NN*6) #This is just for example here.
i += 1
# Now the simulation data is basically done, so one would need to save
# This one feels slow, as it takes 181s to save and is size of 1046246KB
np.savetxt('test1.txt', ZZ)
#other method with a bit less accuracy as I don't need to have all decimals saved
np.savetxt('test2.txt', ZZ, fmt='%1.6f') # Takes 125s and size is 426698KB
# Both of the above are kinda slow so I also tried to save to npy format
np.save('test.npy', ZZ) # It took 8.9s and size 164118KB
so this np.save() method seems to be fast, but I read somewhere that I can not append data to it. So this would not work if I keep saving the data in parts while calculating new positions.
So back to my question. How should/could I save the data efficiently (fast and memory friendly). I keep having some memory issues when NN and T_max gets larger because with this method I keep this whole ZZ all the time in memory.
So I guess I should calculate ZZ in parts, i.e. iterations/10 parts but then I should append this data to an existing file, and tests I have made felt slow. Any suggestions?
EDIT: feel free to ask more specifying questions as I feel like I forgot to explain something.
That highly depends on what you intend to use the output for. If it's stored for further calculations, .npy or some other binary format is always the way to go as it is faster, takes less space, and doesn't lose precision between loads and saves, instead of serializing it into a human readable format. If you need it to be readable, you might as well just output row by row to a csv file or something.
If you want to do it with binary, h5py allows you to extend a dataset after saving and append more stuff to it.
import numpy as np
import h5py
T_max = 10**4 # for example
dt = 0.1 # time step
T = 0 # current time
iterations = int(T_max/dt) # number of iterations we are doing
NN = 1000 # Number of particles
chunk_size = 10**3
ZZ = np.zeros( (chunk_size, 2+NN*6 ) )
ZZ[0][0], ZZ[0][1] = T , dt
# ZZ[0][2:] = initialize_system(NN=NN) # so lets initialize the system.
# However, for this post I do this differently due to simplicity. See below
ZZ[0][2:] = np.random.uniform(-100,100,NN*6)
with h5py.File("test.h5", "a") as f:
dset = f.create_dataset('ZZ', (0,2+NN*6), maxshape=(None,2+NN*6), dtype='float64', chunks=(chunk_size,2+NN+6))
for chunk in range(0, iterations, chunk_size):
for i in range(0, chunk_size - 1):
T += dt
ZZ[i + 1][0], ZZ[i + 1][1] = T, dt
#Z[i+1][2:] = rk4(EOM_function, posvel=Z[i][2:])
# ^ Using this I would calculate new positions based on previous ones.
ZZ[i + 1][2:] = np.random.uniform(-100,100,NN*6) #This is just for example here.
# Expand the file here to allow for more data.
dset.resize(dset.shape[0] + chunk_size, axis=0)
dset[chunk: chunk + chunk_size ] = ZZ
# update and initialize next chunk. the next chunk's first row should be the last row of the previous chunk + iteration
T += dt
ZZ[0][0], ZZ[0][1] = T, dt
#Z[0][2:] = rk4(EOM_function, posvel=Z[-1][2:])
# ^ Using this I would calculate new positions based on previous ones.
ZZ[0][2:] = np.random.uniform(-100,100,NN*6) #This is just for example here.
print(dset.shape)
This takes 70 seconds on the save step on my computer, generating a 45GB file, for a dataset that is 100 times your original code.
The above code is more general in case you are streaming your data and don't know your final size. If you know it from the start, you can replace the initial create_dataset with
dset = f.create_dataset('ZZ', (iterations,2+NN*6), dtype='float64')
and remove the dset.resize(dset.shape[0] + chunk_size, axis=0)
You'll probably also want to read it back in chunks afterwards for other processing, in which case you can follow the docs here: https://docs.h5py.org/en/latest/high/dataset.html#reading-writing-data
Okay so I'm continuing my question / providing possible answer to it based on the answer of EricChen1248. EDIT: Answer provided by EricChen1248 works now and is way better than this my code part. See his code
I do not yet still understand completely how this f.create_dataset () truly works (i.e. when does it write data to file in the loop etc).
Using the code provided by Eric, it created and saved the data files fastly, but when I read the file as follows
hf = h5py.File('temp/test.h5', 'r')
ZZ = np.array(hf['ZZ'])
hf.close()
and plotted the first column (time T column, which should increase by timestep dt after each iteration) I get the following figure
plt.plot(ZZ[:,0])
time T column plotted
and as can be seen, it grows to a time of 100, and then goes to zero. This happens after the first 'chunk_size' has been passed. I started to read docs provided by Eric, and using his code as reference I managed to write something like this
import numpy as np
import h5py
T_max = 10**4
dt = 0.1
T = 0
NN = 1000
iterations = int(T_max/dt)
chunk_size = 10**3
with h5py.File('temp/data12.h5', 'a') as hf:
dset = hf.create_dataset("ZZ", (chunk_size, 2+NN*6),maxshape=(None,2+NN*6) ,chunks=(chunk_size, 2+NN*6), dtype='f8' )
# ^ first I create data set equals to one chunk_size
# Here I initialize the system. Columns ; 0=T , 1=dt, 2=arbitrary data point, 3=sin(column2)
# all the rest columns are random numbers just to fill some numbers in
dset[0,0], dset[0,1] = T, dt
#dset[0,2:] = np.random.uniform(0,1,NN*6)
dset[0,2] = 1
dset[0,3] = np.sin(dset[0,2])
dset[0,4:] = np.random.uniform(0,1,NN*6 -2)
print('starts')
# Main difference down there is that I use dataset (dset)
# as a data matrix to be filled instead of matrix ZZ as in my question.
i = 0
#for j, s in enumerate(dset.iter_chunks()):
for j, s in enumerate(range(0, iterations, chunk_size )):
print(j, s)
while i < iterations and i < chunk_size*(j+1) -1:
#for i in range(chunk_size*j, chunk_size*(j+1)-1):
T += dt
dset[i+1,0], dset[i+1,1] = T, dt
#dset[i+1,2:] = np.sin(dset[i,2:]+dt)
dset[i+1,2] = dset[i,2] + dt
dset[i+1,3] = np.sin(dset[i,2]+dt)
dset[i+1,4:] = dset[i,4:] + np.random.uniform(-1,1,NN*6-2)
i+=1
print(dset.shape)
dset.resize(dset.shape[0] + chunk_size, axis=0)
This code runs in 1min 50s , and saves a file of size 4.47GB so I am happy with the speed, and what I'm really happy is that it do not use so much memory while iterating (I used to get into problem with huge RAM usage).
When I read the data file provided by my code (similarly as above) I get following image for time Time T column plotted, my code version and it grows nicely to T=10e4 as should be. It still generated one more chunk_size block to the end of dataset which is full of zeros. That I need to get rid of. One more proof that the code works and saves data without weird problems is this sinusoidal plot plt.plot(ZZ[500:1500,0] , ZZ[500:1500,3]). Sinusoidal image proof Note that the plot is limited for T ~ [50,150] so one could still see something there (if plotted the whole thing, one could not see lines well).
I believe this is not the best way to write this code, but it is the way I got this working. So if someone sees improvements, please let me know. Also, I am curious to know why the code provided by Eric did not work, at least for me.
EDIT : fixed typos

How to plot a "heatmap" of thousands of timeseries in python?

I am looking for a way to visualize, for the lack of a better word, the "density" or "heatmap" of some synthetic time series I have created.
I have a loop that creates a list, which are values of one time series. I don't think it matters but just in case, here is the code of what's going on. This is a Markov Process, so with each i, which represents the hour, i create a new value, depending on the former i and state:
for x in range(10000):
start_h = 0
start_s = 1
generated_values_list = []
for i in range(start_h,120):
if i>=24:
i=i%24
print(str(start_s)+" | " +str(i))
pot_value_list = GMM_vals_container_workingdays_spring["State: "+ str(start_s)+", hour: "+str(i)]
if len(pot_value_list)>50:
actual_value = random.choice(pot_value_list)#
#cdf, gmm_x, gmm = GMM_erstellen(pot_value_list,50)
#actual_value = gmm.sample()[0][0][0]
#print("made by GMM")
else:
actual_value = random.choice(pot_value_list)
#print("made not by GMM")
generated_values_list.append(actual_value)
probabilities_next_state = TPMs_WD[i][start_s-1]
next_state = random.choices(states,weights=probabilities_next_state)
start_s = next_state[0]
plt.plot(generated_values_list)
But - I think - the only part that matters is this:
for x in range(10000):
#some code that creates the generated_values_list
plt.plot(generated_values_list)
This creates, as expected a picture like this:
It is not clear from here which are the most common paths so I would like to make values that are hit frequently are more colorful while not so frequent values are rather grey.
I think seaborn library has something for that but I don't seem to understand the docs.

Python - Big For Loop

I'm computing a very big for cycle and i'll try to explain how does it works. There are 4320 matrices (40x80 each) that have been taken from a matlab file.
This loop takes a matrix per time: it assign to each value the right value of H and T. Once finished, it pass to the next matrix and so on.
The dataframe created is then written on a csv file needed for the creation of a database for the wave energy converters productivity.
The problem is that this code is running since 9 days and it is at half on the total computations..Is there any way to drastically reduce the computational time?
indice_4 = 0
configuration_id=-1
n_configurations=4320
for z in range(0,n_configurations,1): #iteration on all the configurations
print(z)
power_matrix=P_mat[z]
energy_wave_period_converted = pd.DataFrame([],columns=['energy_wave_period'])
H_start=0.25
H_end=10
H_step=0.25
T_start=3
T_end=17
T_step=0.177
y=T_start
relative_direction = int(direc[z])
if relative_direction==0:
configuration_id = configuration_id + 1
print(configuration_id)
r=0 #r=row
c=0 #c=column
while y <= T_end:
energy_wave_period= float('%.2f'%y)
x=H_start #initialize on the right wave haights
r=0
while x <= H_end:
significant_wave_height= float('%.2f'%x)
average_power=float('%.2f'%power_matrix[r,c])
new_line_4 = pd.Series([indice_4 , configuration_id, significant_wave_height , energy_wave_period ,relative_direction ,average_power] , index =['id','configuration_id','significant_wave_height','energy_wave_period','relative_direction','average_output_power'])
seastate_productivity = seastate_productivity.append([new_line_4], ignore_index=True)
indice_4= indice_4 + 1
r=r+1
x=x+H_step
c=c+1
y = y + T_step
seastate_productivity.to_csv('seastate_productivity.csv',index=False,sep=';')
'
One of the main things slowing your code down is that you do pandas operations in an iteration. Specifically using pd.Series and pd.DataFrame.append in the loop (which runs for over 12 million times) really slows you down. When using pandas you should really aim to vectorize your operations (meaning performing operations in batch). When I tried your original code every iteration took about 4 seconds, but the time increased gradually. When removing the pd.append every iteration only took 0.5 seconds, and when removing the pd.Series it dropped even more.
I did some improvements by saving the data in lists and later to a dataframe in one go, which took about 2 minutes to run till completion on my laptop:
import time
import numpy as np
import pandas as pd
# Generate random data for testing
P_mat = np.random.rand(4320,40,80)
direc=np.random.rand(4320)
H_start=0.25
H_end=10
H_step=0.25
T_start=3
T_end=17
T_step=0.177
indice_4 = 0
configuration_id=-1
n_configurations=4320
data = []
# Time it
t0 = time.perf_counter()
for z in range(n_configurations):
power_matrix=P_mat[z]
print(z)
y=T_start
relative_direction = int(direc[z])
if relative_direction==0:
configuration_id = configuration_id + 1
r=0 #r=row
c=0 #c=column
while y <= T_end:
energy_wave_period= float('%.2f'%y)
x=H_start #initialize on the right wave haights
r=0
while x <= H_end:
significant_wave_height= float('%.2f'%x)
average_power=float('%.2f'%power_matrix[r,c])
# Save data to list
new_line_4 = [indice_4 , configuration_id, significant_wave_height , energy_wave_period ,relative_direction ,average_power]
data.append(new_line_4) # Append to create a list of lists
indice_4= indice_4 + 1
r=r+1
x=x+H_step
c=c+1
y = y + T_step
# Make dataframe from list of lists
seastate_productivity = pd.DataFrame.from_records(data,columns =['id','configuration_id','significant_wave_height','energy_wave_period','relative_direction','average_output_power'])
# Save data
seastate_productivity.to_csv('seastate_productivity.csv',index=False,sep=';')
# Print time it took
print("Done in:",time.perf_counter()-t0)
You could probably still optimize this solution, by moving the rounding from the loop to outside, by rounding the pandas columns. Also, since you are only moving data around, there is probably also a completely vectorized solution (without a loop) but this is probably sufficient for you.
A way to find out what the issue is with slow code is by timing portions of code. You can use the timeit module, or the time module like I used. You can then isolate lines of code, and run them and analyse the performance.
You should consider using numpy. Using numpy's matrix operations you should be able to reduce computation time.
I suggest you to dig also into concurrent.futures.
It specifically enables to run parallel tasks and reduce run time.
You need to convert your code into a function and then call it into the async func, each element at a time.
The concurrent.futures module provides a high-level interface for asynchronously executing callables.
The asynchronous execution can be performed with threads, using ThreadPoolExecutor, or separate processes, using ProcessPoolExecutor.
https://docs.python.org/3/library/concurrent.futures.html
this is a scolastic example
import concurrent.futures
nums = range(10)
def f(x):
return x * x
def main():
print([val for val in map(f, nums)])
with concurrent.futures.ProcessPoolExecutor() as executor:
print([val for val in executor.map(f, nums)])
if __name__ == '__main__':
main()

How can I increase the amount of array iterated during the run-time of script?

My script cleans arrays from the unwanted string like "##$!" and other stuff.
The script works as intended but the speed of it is extremely slow when the excel row size is big.
I tried to use numpy if it could speed it up but I'm not too familiar with is so I might be using it incorrectly.
xls = pd.ExcelFile(path)
df = xls.parse("Sheet2")
TeleNum = np.array(df['telephone'].values)
def replace(orignstr): # removes the unwanted string from numbers
for elem in badstr:
if elem in orignstr:
orignstr = orignstr.replace(elem, '')
return orignstr
for UncleanNum in tqdm(TeleNum):
newnum = replace(str(UncleanNum)) # calling replace function
df['telephone'] = df['telephone'].replace(UncleanNum, newnum) # store string back in data frame
I also tried removing the method to if that would help and just place it as one block of code but the speed remained the same.
for UncleanNum in tqdm(TeleNum):
orignstr = str(UncleanNum)
for elem in badstr:
if elem in orignstr:
orignstr = orignstr.replace(elem, '')
print(orignstr)
df['telephone'] = df['telephone'].replace(UncleanNum, orignstr)
TeleNum = np.array(df['telephone'].values)
The current speed of the script running an excel file of 200,000 is around 70it/s and take around an hour to finish. Which is not that good since this is just one function of many.
I'm not too advanced in python. I'm just learning as I script so if you have any pointer it would be appreciated.
Edit:
Most of the array elements Im dealing with are numbers but some have string in them. I trying to remove all string in the array element.
Ex.
FD3459002912
*345*9002912$
If you are trying to clear everything that isn't a digit from the strings you can directly use re.sub like this:
import re
string = "FD3459002912"
regex_result = re.sub("\D", "", string)
print(regex_result) # 3459002912

Numpy/Python performing terribly vs. Matlab

Novice programmer here. I'm writing a program that analyzes the relative spatial locations of points (cells). The program gets boundaries and cell type off an array with the x coordinate in column 1, y coordinate in column 2, and cell type in column 3. It then checks each cell for cell type and appropriate distance from the bounds. If it passes, it then calculates its distance from each other cell in the array and if the distance is within a specified analysis range it adds it to an output array at that distance.
My cell marking program is in wxpython so I was hoping to develop this program in python as well and eventually stick it into the GUI. Unfortunately right now python takes ~20 seconds to run the core loop on my machine while MATLAB can do ~15 loops/second. Since I'm planning on doing 1000 loops (with a randomized comparison condition) on ~30 cases times several exploratory analysis types this is not a trivial difference.
I tried running a profiler and array calls are 1/4 of the time, almost all of the rest is unspecified loop time.
Here is the python code for the main loop:
for basecell in range (0, cellnumber-1):
if firstcelltype == np.array((cellrecord[basecell,2])):
xloc=np.array((cellrecord[basecell,0]))
yloc=np.array((cellrecord[basecell,1]))
xedgedist=(xbound-xloc)
yedgedist=(ybound-yloc)
if xloc>excludedist and xedgedist>excludedist and yloc>excludedist and yedgedist>excludedist:
for comparecell in range (0, cellnumber-1):
if secondcelltype==np.array((cellrecord[comparecell,2])):
xcomploc=np.array((cellrecord[comparecell,0]))
ycomploc=np.array((cellrecord[comparecell,1]))
dist=math.sqrt((xcomploc-xloc)**2+(ycomploc-yloc)**2)
dist=round(dist)
if dist>=1 and dist<=analysisdist:
arraytarget=round(dist*analysisdist/intervalnumber)
addone=np.array((spatialraw[arraytarget-1]))
addone=addone+1
targetcell=arraytarget-1
np.put(spatialraw,[targetcell,targetcell],addone)
Here is the matlab code for the main loop:
for basecell = 1:cellnumber;
if firstcelltype==cellrecord(basecell,3);
xloc=cellrecord(basecell,1);
yloc=cellrecord(basecell,2);
xedgedist=(xbound-xloc);
yedgedist=(ybound-yloc);
if (xloc>excludedist) && (yloc>excludedist) && (xedgedist>excludedist) && (yedgedist>excludedist);
for comparecell = 1:cellnumber;
if secondcelltype==cellrecord(comparecell,3);
xcomploc=cellrecord(comparecell,1);
ycomploc=cellrecord(comparecell,2);
dist=sqrt((xcomploc-xloc)^2+(ycomploc-yloc)^2);
if (dist>=1) && (dist<=100.4999);
arraytarget=round(dist*analysisdist/intervalnumber);
spatialsum(1,arraytarget)=spatialsum(1,arraytarget)+1;
end
end
end
end
end
end
Thanks!
Here are some ways to speed up your python code.
First: Don't make np arrays when you are only storing one value. You do this many times over in your code. For instance,
if firstcelltype == np.array((cellrecord[basecell,2])):
can just be
if firstcelltype == cellrecord[basecell,2]:
I'll show you why with some timeit statements:
>>> timeit.Timer('x = 111.1').timeit()
0.045882196294822819
>>> t=timeit.Timer('x = np.array(111.1)','import numpy as np').timeit()
0.55774970267830071
That's an order of magnitude in difference between those calls.
Second: The following code:
arraytarget=round(dist*analysisdist/intervalnumber)
addone=np.array((spatialraw[arraytarget-1]))
addone=addone+1
targetcell=arraytarget-1
np.put(spatialraw,[targetcell,targetcell],addone)
can be replaced with
arraytarget=round(dist*analysisdist/intervalnumber)-1
spatialraw[arraytarget] += 1
Third: You can get rid of the sqrt as Philip mentioned by squaring analysisdist beforehand. However, since you use analysisdist to get arraytarget, you might want to create a separate variable, analysisdist2 that is the square of analysisdist and use that for your comparison.
Fourth: You are looking for cells that match secondcelltype every time you get to that point rather than finding those one time and using the list over and over again. You could define an array:
comparecells = np.where(cellrecord[:,2]==secondcelltype)[0]
and then replace
for comparecell in range (0, cellnumber-1):
if secondcelltype==np.array((cellrecord[comparecell,2])):
with
for comparecell in comparecells:
Fifth: Use psyco. It is a JIT compiler. Matlab has a built-in JIT compiler if you're using a somewhat recent version. This should speed-up your code a bit.
Sixth: If the code still isn't fast enough after all previous steps, then you should try vectorizing your code. It shouldn't be too difficult. Basically, the more stuff you can have in numpy arrays the better. Here's my try at vectorizing:
basecells = np.where(cellrecord[:,2]==firstcelltype)[0]
xlocs = cellrecord[basecells, 0]
ylocs = cellrecord[basecells, 1]
xedgedists = xbound - xloc
yedgedists = ybound - yloc
whichcells = np.where((xlocs>excludedist) & (xedgedists>excludedist) & (ylocs>excludedist) & (yedgedists>excludedist))[0]
selectedcells = basecells[whichcells]
comparecells = np.where(cellrecord[:,2]==secondcelltype)[0]
xcomplocs = cellrecords[comparecells,0]
ycomplocs = cellrecords[comparecells,1]
analysisdist2 = analysisdist**2
for basecell in selectedcells:
dists = np.round((xcomplocs-xlocs[basecell])**2 + (ycomplocs-ylocs[basecell])**2)
whichcells = np.where((dists >= 1) & (dists <= analysisdist2))[0]
arraytargets = np.round(dists[whichcells]*analysisdist/intervalnumber) - 1
for target in arraytargets:
spatialraw[target] += 1
You can probably take out that inner for loop, but you have to be careful because some of the elements of arraytargets could be the same. Also, I didn't actually try out all of the code, so there could be a bug or typo in there. Hopefully, it gives you a good idea of how to do this. Oh, one more thing. You make analysisdist/intervalnumber a separate variable to avoid doing that division over and over again.
Not too sure about the slowness of python but you Matlab code can be HIGHLY optimized. Nested for-loops tend to have horrible performance issues. You can replace the inner loop with a vectorized function ... as below:
for basecell = 1:cellnumber;
if firstcelltype==cellrecord(basecell,3);
xloc=cellrecord(basecell,1);
yloc=cellrecord(basecell,2);
xedgedist=(xbound-xloc);
yedgedist=(ybound-yloc);
if (xloc>excludedist) && (yloc>excludedist) && (xedgedist>excludedist) && (yedgedist>excludedist);
% for comparecell = 1:cellnumber;
% if secondcelltype==cellrecord(comparecell,3);
% xcomploc=cellrecord(comparecell,1);
% ycomploc=cellrecord(comparecell,2);
% dist=sqrt((xcomploc-xloc)^2+(ycomploc-yloc)^2);
% if (dist>=1) && (dist<=100.4999);
% arraytarget=round(dist*analysisdist/intervalnumber);
% spatialsum(1,arraytarget)=spatialsum(1,arraytarget)+1;
% end
% end
% end
%replace with:
secondcelltype_mask = secondcelltype == cellrecord(:,3);
xcomploc_vec = cellrecord(secondcelltype_mask ,1);
ycomploc_vec = cellrecord(secondcelltype_mask ,2);
dist_vec = sqrt((xcomploc_vec-xloc)^2+(ycomploc_vec-yloc)^2);
dist_mask = dist>=1 & dist<=100.4999
arraytarget_vec = round(dist_vec(dist_mask)*analysisdist/intervalnumber);
count = accumarray(arraytarget_vec,1, [size(spatialsum,1),1]);
spatialsum(:,1) = spatialsum(:,1)+count;
end
end
end
There may be some small errors in there since I don't have any data to test the code with but it should get ~10X speed up on the Matlab code.
From my experience with numpy I've noticed that swapping out for-loops for vectorized/matrix-based arithmetic has noticeable speed-ups as well. However, without the shapes the shapes of all of your variables its hard to vectorize things.
You can avoid some of the math.sqrt calls by replacing the lines
dist=math.sqrt((xcomploc-xloc)**2+(ycomploc-yloc)**2)
dist=round(dist)
if dist>=1 and dist<=analysisdist:
arraytarget=round(dist*analysisdist/intervalnumber)
with
dist=(xcomploc-xloc)**2+(ycomploc-yloc)**2
dist=round(dist)
if dist>=1 and dist<=analysisdist_squared:
arraytarget=round(math.sqrt(dist)*analysisdist/intervalnumber)
where you have the line
analysisdist_squared = analysis_dist * analysis_dist
outside of the main loop of your function.
Since math.sqrt is called in the innermost loop, you should have from math import sqrt at the top of the module and just call the function as sqrt.
I would also try replacing
dist=(xcomploc-xloc)**2+(ycomploc-yloc)**2
with
dist=(xcomploc-xloc)*(xcomploc-xloc)+(ycomploc-yloc)*(ycomploc-yloc)
There's a chance it will produce faster byte code to do multiplication rather than exponentiation.
I doubt these will get you all the way to MATLABs performance, but they should help reduce some overhead.
If you have a multicore, you could maybe give the multiprocessing module a try and use multiple processes to make use of all the cores.
Instead of sqrt you could use x**0.5, which is, if I remember correct, slightly faster.

Categories