numpy.choose 32 choice limitation - python

Lo and behold, I ran into a regression in numpy.choose after upgrading to 1.5.1. Past versions (and Numeric) supported what was, as far as I could tell, an unlimited number of potential choices. The "new" choose is limited to 32. Here is a post where another user laments the regression.
I have a list with 100 choices (0-99) that I was using to modify an array. As a workaround, I am using the following code. Understandably, it is 7 times slower than using choose. I am not a C programmer, and while I would love to get in and fix the numpy issue, I wonder what other, potentially faster, workarounds exist. Thoughts?
d = {...}  # A dictionary with my keys and their new mappings
for key, value in d.iteritems():
    array[array == key] = value

I gather that d has the keys 0 to 99. In that case, the solution is really simple. First, write the values of d into a NumPy array values, such that d[i] == values[i] – this seems to be the natural data structure for those values anyway. Then you can get the new array, with the values replaced, via
values[array]
If you want to modify array in place, simply do
array[:] = values[array]
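For concreteness, here is a minimal sketch of that lookup-table idea (the dictionary is made up for illustration, and it assumes the keys really are the consecutive integers 0 through 99):

import numpy as np

d = {i: (i * 7) % 100 for i in range(100)}        # hypothetical mapping of 0-99 to new values
values = np.array([d[i] for i in range(len(d))])  # lookup table with values[i] == d[i]

array = np.random.randint(0, 100, size=(4, 5))
array[:] = values[array]  # every entry k is replaced by values[k], in place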

In the Numpy documentation, there is an example of what a simplified version of the choose function could look like.
[...] this function is less simple than it might seem from the
following code description (below ndi = numpy.lib.index_tricks):
np.choose(a,c) == np.array([c[a[I]][I] for I in ndi.ndindex(a.shape)]).
See https://docs.scipy.org/doc/numpy/reference/generated/numpy.choose.html
Putting this into a function could look like this:
import numpy

def choose(selector, choices):
    """
    A simplified version of the numpy choose function to work around the 32
    choices limit.
    """
    return numpy.array(
        [choices[selector[idx]][idx]
         for idx in numpy.lib.index_tricks.ndindex(selector.shape)]
    ).reshape(selector.shape)
I am not sure how this compares in terms of efficiency, or exactly where it breaks down relative to numpy.choose, but it worked fine for me. Note that the patched function assumes that the entries in choices are subscriptable.
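As a quick sanity check, the patched function can be compared against numpy.choose on a small, made-up case where both apply:

selector = numpy.random.randint(0, 8, size=(3, 4))
choices = [numpy.full((3, 4), i) for i in range(8)]
assert numpy.array_equal(choose(selector, choices),
                         numpy.choose(selector, choices))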

I'm not sure about efficiency and it's not in-place (nb: I don't use numpy that often - so somewhat rusty):
import numpy as np
d = {0: 5, 1: 3, 2: 20}
data = np.array([[1, 0, 2], [2, 1, 1], [1, 0, 1]])
new_data = np.array([d.get(i, i) for i in data.flat]).reshape(data.shape) # adapt for list/other

When colorizing microscopy images of mouse embryos, I ran into a need for a choose implementation where the number of choices was in the hundreds (hundreds of mouse embryo cells):
https://github.com/flatironinstitute/mouse_embryo_labeller
As I was not sure whether the above suggestions were general or fast, I wrote this alternative:
import numpy as np

def big_choose(indices, choices):
    "Alternative to np.choose that supports more than 30 choices."
    indices = np.array(indices)
    if (indices.max() <= 30) or (len(choices) <= 31):
        # optimized fallback
        choices = choices[:31]
        return np.choose(indices, choices)
    result = 0
    while (len(choices) > 0) and not np.all(indices == -1):
        these_choices = choices[:30]
        remaining_choices = choices[30:]
        shifted_indices = indices + 1
        too_large_indices = (shifted_indices > 30).astype(int)
        clamped_indices = np.choose(too_large_indices, [shifted_indices, 0])
        choices_with_default = [result] + list(these_choices)
        result = np.choose(clamped_indices, choices_with_default)
        choices = remaining_choices
        if len(choices) > 0:
            indices = indices - 30
            too_small = (indices < -1).astype(int)
            indices = np.choose(too_small, [indices, -1])
    return result
Note that the generalized function uses the underlying implementation
when it can.
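As a quick, made-up check: if choice number i is a constant array filled with i, big_choose should simply reproduce the index array even with 100 choices:

indices = np.random.randint(0, 100, size=(5, 7))
choices = [np.full((5, 7), i) for i in range(100)]
assert np.array_equal(big_choose(indices, choices), indices)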

Related

How to count elements in an array within a given increasing interval?

I have an array of time values. I want to know how many values fall in each 0.05-second window.
For example, some values of my array are: -1.9493, -1.9433, -1.911, -1.8977, -1.8671, ...
In the first interval of 0.050 seconds (from -1.9493 to -1.8993) I'm expecting to have 3 elements.
I already create another array with the 0.050 seconds steps.
a=max(array)
b=min(array)
ventanalinea1=np.arange(b,a,0.05)
v1=np.array(ventanalinea1)
In other words, I would like to compare my original array with this one.
I would like to know if there is a way to ask python to evaluate my array within a given dynamic range.
One of the variants:
import numpy as np

# original array
a = [-1.9493, -1.9433, -1.911, -1.8977, -1.8671]

step = 0.05
bounds = np.arange(min(a), max(a) + step, step)
result = [
    list(filter(lambda x: bounds[i] <= x <= bounds[i + 1], a))
    for i in range(len(bounds) - 1)
]
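If only the counts per window are needed (rather than the elements themselves), numpy's own histogram function does the binning directly; a small sketch with the same sample data:

import numpy as np

a = np.array([-1.9493, -1.9433, -1.911, -1.8977, -1.8671])
step = 0.05
edges = np.arange(a.min(), a.max() + step, step)  # window boundaries
counts, _ = np.histogram(a, bins=edges)           # number of elements per 0.05 s window
print(counts)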
I have found a nice python library, python-intervals, that simplifies your problem a lot:
Install it with pip install python-intervals and try the code below.
import intervals as I

# This is a recursive function
def counter(timevalues, w=0.050):
    if not timevalues:
        return ""  # stops recursion when timevalues is empty
    # Make an interval object that provides convenient interval operations like 'contains'
    window = I.closed(timevalues[0], timevalues[0] + w)
    interval = list(filter(window.contains, timevalues))
    count = len(interval)
    timevalues = timevalues[count:]
    print(f"[{interval[0]} : {interval[-1]}] : {count}")
    return counter(timevalues)

if __name__ == "__main__":
    times = [-1.9493, -1.9433, -1.911, -1.8977, -1.8671]
    print(counter(times))
Adapt it as you wish; for example, you might want to return a dictionary rather than a string.
You could still solve this without the python-intervals library, but I have introduced it here because it is quite likely that you will need other, more complex interval operations later on in your code.

Making nested 'for' loops more pythonic

I'm relatively new to python and am wondering how to make the following more efficient by avoiding explicit nested 'for' loops and using python's implicit looping instead. I'm working with image data, and in this case trying to speed up my k-means algorithm. Here's a sample of what I'm trying to do:
# shape of image will be something like 140, 150, 3
num_sets, rows_per_set, num_columns = image_values.shape
for set in range(0, num_sets):
    for row in range(0, rows_per_set):
        pos = np.argmin(calc_euclidean(rgb_[set][row], means_list))
        buckets[pos].append(image_values[set][row])
What I have today works great but I'd like to make it more efficient.
Feedback and recommendations are greatly appreciated.
Here is a vectorised solution. I'm almost certain I got your dimensions muddled up (3 is not really the number of columns, is it?), but the principle should be recognisable anyway:
For demonstration I only collect the (flat) indices into set and row in the buckets.
import numpy as np

k = 6
rgb_ = np.random.randint(0, 9, (140, 150, 3))
means_list = np.random.randint(0, 9, (k, 3))

# compute distance table; use some algebra to leverage highly optimised
# dot product
squared_dists = np.add.outer((rgb_*rgb_).sum(axis=-1),
                             (means_list*means_list).sum(axis=-1)) \
                - 2*np.dot(rgb_, means_list.T)
# find best cluster
best = np.argmin(squared_dists, axis=-1)
# find group sizes
counts = np.bincount(best.ravel())
# translate to block boundaries
bnds = np.cumsum(counts[:-1])
# group indices by best cluster; argpartition should be
# a bit cheaper than argsort
chunks = np.argpartition(best.ravel(), bnds)
# split into buckets
buckets = np.split(chunks, bnds)

# check
num_sets, rows_per_set, num_columns = rgb_.shape

def calc_euclidean(a, b):
    return ((a-b)**2).sum(axis=-1)

for set in range(0, num_sets):
    for row in range(0, rows_per_set):
        pos = np.argmin(calc_euclidean(rgb_[set][row], means_list))
        assert pos == best[set, row]
        assert rows_per_set*set + row in buckets[pos]

Python: Function doesn't receive a value within a for loop

I'm using the bisection method from the scipy.optimize package within a for loop.
The idea is to get a value of "sig" with the bisection method for each element (value) in the "eps_komp" vector. I've coded this much:
import numpy as np
import scipy.optimize as optimize
K=300
n = 0.43
E = 210000
Rm = 700
sig_a = []
RO_K = 300
RO_n = 0.43
eps_komp = [0.00012893048999999997,
0.018839115269999998,
0.01230539995,
0.022996934109999999,
-0.0037319012899999999,
0.023293921169999999,
0.0036927752099999997,
0.020621037629999998,
0.0063656587500000002,
0.020324050569999998,
-0.0025439530500000001,
0.018542128209999998,
0.01230539995,
0.019730076449999998,
0.0045837363899999999,
0.015275270549999997,
-0.0040288883499999999,
0.021215011749999999,
-0.0031379271699999997,
0.023590908229999999]
def eps_f(i):
    return eps_komp[i]

for j in range(len(eps_komp)):
    eps_komp_j = eps_f(j)
    if j <= len(eps_komp):
        def func(sig):
            return eps_komp_j - sig/E - (sig/RO_K)**(1/RO_n)
        sig_a.append(optimize.bisect(func, 0, Rm))
    else:
        break

print(sig_a)
Now if I change the value of "j" in eps_f(j) to 0:
eps_komp_j = eps_f(0)
it works, and so it does for all other values that I insert by hand, but if I keep it as it is in the for loop, the "j" value doesn't change automatically and I get an error:
f(a) and f(b) must have different signs
Has anyone a clue what is the problem and how could this be solved?
Regards,
L
P.S. I did post another topic on this problem yesterday, but I wasn't very specific about the problem and got negative feedback. However, I do need to solve this today, so I was forced to post it again; I did manage to get a bit further with the code than I did in the earlier post, so it isn't a repost...
If you read the docs you'll find that:
Basic bisection routine to find a zero of the function f between the arguments a and b. f(a) and f(b) cannot have the same signs. Slow but sure.
In your code:
def func(sig):
    return eps_komp_j - sig/E - (sig/RO_K)**(1/RO_n)
sig_a.append(optimize.bisect(func, 0, Rm))
You're passing it func(0) and func(700).
By replacing the optimize.bisect line with print(func(0), func(700)) I get the following output:
0.00012893048999999997 -7.177181168628421
0.018839115269999998 -7.158470983848421
0.01230539995 -7.165004699168421
0.02299693411 -7.15431316500842
-0.00373190129 -7.1810420004084206
0.02329392117 -7.154016177948421
0.0036927752099999997 -7.173617323908421
0.02062103763 -7.156689061488421
0.00636565875 -7.17094444036842
0.02032405057 -7.156986048548421
-0.00254395305 -7.17985405216842
0.018542128209999998 -7.15876797090842
0.01230539995 -7.165004699168421
0.019730076449999998 -7.157580022668421
0.00458373639 -7.172726362728421
0.015275270549999997 -7.162034828568421
-0.00402888835 -7.181338987468421
0.02121501175 -7.156095087368421
-0.0031379271699999997 -7.1804480262884205
0.02359090823 -7.153719190888421
Note the multiple pairs that have the same signs. optimize.bisect can't handle those. I don't know what you're trying to accomplish, but this is the wrong approach.
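If you still want to run the bisection only where it is valid, one option (a sketch reusing the question's variables, not a fix for the underlying model) is to check the bracket first and skip the entries where f(0) and f(Rm) have the same sign:

for eps_komp_j in eps_komp:
    def func(sig, eps=eps_komp_j):
        return eps - sig/E - (sig/RO_K)**(1/RO_n)
    if func(0) * func(Rm) < 0:
        sig_a.append(optimize.bisect(func, 0, Rm))
    else:
        sig_a.append(None)  # no sign change on [0, Rm], so bisect cannot bracket a root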

Size-Incremental Numpy Array in Python

I just came across the need for an incremental Numpy array in Python, and since I haven't found anything I implemented it. I'm just wondering if my way is the best way or if you can come up with other ideas.
So, the problem is that I have a 2D array (the program handles nD arrays) for which the size is not known in advance and variable amounts of data need to be concatenated to the array in one direction (let's say that I have to call np.vstack a lot of times). Every time I concatenate data, I need to take the array, sort it along axis 0 and do other stuff, so I cannot construct a long list of arrays and then np.vstack the list all at once.
Since memory allocation is expensive, I turned to incremental arrays, where I increment the size of the array of a quantity bigger than the size I need (I use 50% increments), so that I minimize the number of allocations.
I coded this up and you can see it in the following code:
import numpy as np

class ExpandingArray:

    __DEFAULT_ALLOC_INIT_DIM = 10  # default initial dimension for all the axes if nothing is given by the user
    __DEFAULT_MAX_INCREMENT = 10   # default value in order to limit the increment of memory allocation

    __MAX_INCREMENT = []  # Max increment
    __ALLOC_DIMS = []     # Dimensions of the allocated np.array
    __DIMS = []           # Dimensions of the view with data on the allocated np.array (__DIMS <= __ALLOC_DIMS)

    __ARRAY = []          # Allocated array

    def __init__(self,initData,allocInitDim=None,dtype=np.float64,maxIncrement=None):
        self.__DIMS = np.array(initData.shape)

        self.__MAX_INCREMENT = maxIncrement
        if self.__MAX_INCREMENT == None:
            self.__MAX_INCREMENT = self.__DEFAULT_MAX_INCREMENT

        # Compute the allocation dimensions based on user's input
        if allocInitDim == None:
            allocInitDim = self.__DIMS.copy()

        while np.any( allocInitDim < self.__DIMS ) or np.any(allocInitDim == 0):
            for i in range(len(self.__DIMS)):
                if allocInitDim[i] == 0:
                    allocInitDim[i] = self.__DEFAULT_ALLOC_INIT_DIM
                if allocInitDim[i] < self.__DIMS[i]:
                    allocInitDim[i] += min(allocInitDim[i]/2, self.__MAX_INCREMENT)

        # Allocate memory
        self.__ALLOC_DIMS = allocInitDim
        self.__ARRAY = np.zeros(self.__ALLOC_DIMS,dtype=dtype)

        # Set initData
        sliceIdxs = [slice(self.__DIMS[i]) for i in range(len(self.__DIMS))]
        self.__ARRAY[sliceIdxs] = initData

    def shape(self):
        return tuple(self.__DIMS)

    def getAllocArray(self):
        return self.__ARRAY

    def getDataArray(self):
        """
        Get the view of the array with data
        """
        sliceIdxs = [slice(self.__DIMS[i]) for i in range(len(self.__DIMS))]
        return self.__ARRAY[sliceIdxs]

    def concatenate(self,X,axis=0):
        if axis > len(self.__DIMS):
            print "Error: axis number exceeds the number of dimensions"
            return

        # Check dimensions for remaining axes
        for i in range(len(self.__DIMS)):
            if i != axis:
                if X.shape[i] != self.shape()[i]:
                    print "Error: Dimensions of the input array are not consistent in axis %d" % i
                    return

        # Check whether allocated memory is enough
        needAlloc = False
        while self.__ALLOC_DIMS[axis] < self.__DIMS[axis] + X.shape[axis]:
            needAlloc = True
            # Increase the __ALLOC_DIMS
            self.__ALLOC_DIMS[axis] += min(self.__ALLOC_DIMS[axis]/2,self.__MAX_INCREMENT)

        # Reallocate memory and copy old data
        if needAlloc:
            # Allocate
            newArray = np.zeros(self.__ALLOC_DIMS)
            # Copy
            sliceIdxs = [slice(self.__DIMS[i]) for i in range(len(self.__DIMS))]
            newArray[sliceIdxs] = self.__ARRAY[sliceIdxs]
            self.__ARRAY = newArray

        # Concatenate new data
        sliceIdxs = []
        for i in range(len(self.__DIMS)):
            if i != axis:
                sliceIdxs.append(slice(self.__DIMS[i]))
            else:
                sliceIdxs.append(slice(self.__DIMS[i],self.__DIMS[i]+X.shape[i]))

        self.__ARRAY[sliceIdxs] = X
        self.__DIMS[axis] += X.shape[axis]
The code shows considerably better performance than vstack/hstack over several random-sized concatenations.
What I'm wondering is: is this the best way? Is there anything in numpy that does this already?
Further, it would be nice to be able to overload the slice assignment operator of np.array, so that as soon as the user assigns anything outside the actual dimensions, an ExpandingArray.concatenate() is performed. How would one do such overloading?
Testing code: I also post here some code I used to compare vstack and my method. I add up random chunks of data of maximum length 100.
import time

N = 10000

def performEA(N):
    EA = ExpandingArray(np.zeros((0,2)),maxIncrement=1000)
    for i in range(N):
        nNew = np.random.random_integers(low=1,high=100,size=1)
        X = np.random.rand(nNew,2)
        EA.concatenate(X,axis=0)
        # Perform operations on EA.getDataArray()
    return EA

def performVStack(N):
    A = np.zeros((0,2))
    for i in range(N):
        nNew = np.random.random_integers(low=1,high=100,size=1)
        X = np.random.rand(nNew,2)
        A = np.vstack((A,X))
        # Perform operations on A
    return A

start_EA = time.clock()
EA = performEA(N)
stop_EA = time.clock()

start_VS = time.clock()
VS = performVStack(N)
stop_VS = time.clock()

print "Elapsed Time EA: %.2f" % (stop_EA-start_EA)
print "Elapsed Time VS: %.2f" % (stop_VS-start_VS)
I think the most common design pattern for these things is to just use a list for the small arrays. Sure you could do things like dynamic resizing (if you want to do crazy things, you can try to use the resize array method too). I think a typical method is to always double the size, when you really don't know how large things will be. Of course if you know how large the array will grow to, just allocating the full thing up front is simplest.
def performVStack_fromlist(N):
    l = []
    for i in range(N):
        nNew = np.random.random_integers(low=1,high=100,size=1)
        X = np.random.rand(nNew,2)
        l.append(X)
    return np.vstack(l)
I am sure there are some use cases where an expanding array could be useful (for example when the appended arrays are all very small), but this loop seems better handled with the above pattern. The optimization is mostly about how often you need to copy everything around, and with a list like this (other than growing the list itself) that happens exactly once here. So it is normally much faster.
When I faced a similar problem, I used ndarray.resize() (http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.resize.html#numpy.ndarray.resize). Most of the time, it will avoid reallocation+copying altogether. I can't guarantee it would prove to be faster (it probably would), but it's so much simpler.
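A minimal sketch of that resize-based idea, assuming a 2D buffer that only grows along axis 0 (the class name, growth factor and API are made up):

import numpy as np

class ResizingBuffer(object):
    def __init__(self, ncols, capacity=16, dtype=np.float64):
        self.data = np.zeros((capacity, ncols), dtype=dtype)
        self.n = 0  # number of rows actually in use

    def append(self, X):
        need = self.n + X.shape[0]
        if need > self.data.shape[0]:
            newcap = max(need, int(1.5 * self.data.shape[0]))
            # refcheck=False skips the reference-count check; avoid keeping views around
            self.data.resize((newcap, self.data.shape[1]), refcheck=False)
        self.data[self.n:need] = X
        self.n = need

    def view(self):
        return self.data[:self.n]  # view of the filled part only

Because the array is C-contiguous and only grows along axis 0, resizing keeps the existing rows in place, so most appends avoid a full copy.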
As for your second question, I think overriding slice assignment for extending purposes is not a good idea. That operator is meant for assigning to existing items/slices. If you want to change that, it's not immediately clear how you'd want it to behave in some cases, e.g.:
a = MyExtendableArray(np.arange(100))
a[200] = 6 # resize to 200? pad [100:200] with what?
a[90:110] = 7 # assign to existing items AND automagically-allocated items?
a[::-1][200] = 6 # ...
My suggestion is that slice-assignment and data appending should remain separate.

Numpy/Python performing terribly vs. Matlab

Novice programmer here. I'm writing a program that analyzes the relative spatial locations of points (cells). The program gets boundaries and cell type off an array with the x coordinate in column 1, y coordinate in column 2, and cell type in column 3. It then checks each cell for cell type and appropriate distance from the bounds. If it passes, it then calculates its distance from each other cell in the array and if the distance is within a specified analysis range it adds it to an output array at that distance.
My cell marking program is in wxpython so I was hoping to develop this program in python as well and eventually stick it into the GUI. Unfortunately right now python takes ~20 seconds to run the core loop on my machine while MATLAB can do ~15 loops/second. Since I'm planning on doing 1000 loops (with a randomized comparison condition) on ~30 cases times several exploratory analysis types this is not a trivial difference.
I tried running a profiler and array calls are 1/4 of the time, almost all of the rest is unspecified loop time.
Here is the python code for the main loop:
for basecell in range(0, cellnumber-1):
    if firstcelltype == np.array((cellrecord[basecell,2])):
        xloc = np.array((cellrecord[basecell,0]))
        yloc = np.array((cellrecord[basecell,1]))
        xedgedist = (xbound-xloc)
        yedgedist = (ybound-yloc)
        if xloc>excludedist and xedgedist>excludedist and yloc>excludedist and yedgedist>excludedist:
            for comparecell in range(0, cellnumber-1):
                if secondcelltype == np.array((cellrecord[comparecell,2])):
                    xcomploc = np.array((cellrecord[comparecell,0]))
                    ycomploc = np.array((cellrecord[comparecell,1]))
                    dist = math.sqrt((xcomploc-xloc)**2+(ycomploc-yloc)**2)
                    dist = round(dist)
                    if dist>=1 and dist<=analysisdist:
                        arraytarget = round(dist*analysisdist/intervalnumber)
                        addone = np.array((spatialraw[arraytarget-1]))
                        addone = addone+1
                        targetcell = arraytarget-1
                        np.put(spatialraw,[targetcell,targetcell],addone)
Here is the matlab code for the main loop:
for basecell = 1:cellnumber;
    if firstcelltype==cellrecord(basecell,3);
        xloc=cellrecord(basecell,1);
        yloc=cellrecord(basecell,2);
        xedgedist=(xbound-xloc);
        yedgedist=(ybound-yloc);
        if (xloc>excludedist) && (yloc>excludedist) && (xedgedist>excludedist) && (yedgedist>excludedist);
            for comparecell = 1:cellnumber;
                if secondcelltype==cellrecord(comparecell,3);
                    xcomploc=cellrecord(comparecell,1);
                    ycomploc=cellrecord(comparecell,2);
                    dist=sqrt((xcomploc-xloc)^2+(ycomploc-yloc)^2);
                    if (dist>=1) && (dist<=100.4999);
                        arraytarget=round(dist*analysisdist/intervalnumber);
                        spatialsum(1,arraytarget)=spatialsum(1,arraytarget)+1;
                    end
                end
            end
        end
    end
end
Thanks!
Here are some ways to speed up your python code.
First: Don't make np arrays when you are only storing one value. You do this many times over in your code. For instance,
if firstcelltype == np.array((cellrecord[basecell,2])):
can just be
if firstcelltype == cellrecord[basecell,2]:
I'll show you why with some timeit statements:
>>> timeit.Timer('x = 111.1').timeit()
0.045882196294822819
>>> t=timeit.Timer('x = np.array(111.1)','import numpy as np').timeit()
0.55774970267830071
That's an order of magnitude in difference between those calls.
Second: The following code:
arraytarget=round(dist*analysisdist/intervalnumber)
addone=np.array((spatialraw[arraytarget-1]))
addone=addone+1
targetcell=arraytarget-1
np.put(spatialraw,[targetcell,targetcell],addone)
can be replaced with
arraytarget=round(dist*analysisdist/intervalnumber)-1
spatialraw[arraytarget] += 1
Third: You can get rid of the sqrt as Philip mentioned by squaring analysisdist beforehand. However, since you use analysisdist to get arraytarget, you might want to create a separate variable, analysisdist2 that is the square of analysisdist and use that for your comparison.
Fourth: You are looking for cells that match secondcelltype every time you get to that point rather than finding those one time and using the list over and over again. You could define an array:
comparecells = np.where(cellrecord[:,2]==secondcelltype)[0]
and then replace
for comparecell in range(0, cellnumber-1):
    if secondcelltype==np.array((cellrecord[comparecell,2])):
with
for comparecell in comparecells:
Fifth: Use psyco. It is a JIT compiler. Matlab has a built-in JIT compiler if you're using a somewhat recent version. This should speed-up your code a bit.
Sixth: If the code still isn't fast enough after all previous steps, then you should try vectorizing your code. It shouldn't be too difficult. Basically, the more stuff you can have in numpy arrays the better. Here's my try at vectorizing:
basecells = np.where(cellrecord[:,2]==firstcelltype)[0]
xlocs = cellrecord[basecells, 0]
ylocs = cellrecord[basecells, 1]
xedgedists = xbound - xlocs
yedgedists = ybound - ylocs
whichcells = np.where((xlocs>excludedist) & (xedgedists>excludedist) &
                      (ylocs>excludedist) & (yedgedists>excludedist))[0]
selectedcells = basecells[whichcells]
comparecells = np.where(cellrecord[:,2]==secondcelltype)[0]
xcomplocs = cellrecord[comparecells,0]
ycomplocs = cellrecord[comparecells,1]
analysisdist2 = analysisdist**2
for basecell in selectedcells:
    # squared distances from this base cell to all compare cells
    dists = np.round((xcomplocs-cellrecord[basecell,0])**2 + (ycomplocs-cellrecord[basecell,1])**2)
    whichcells = np.where((dists >= 1) & (dists <= analysisdist2))[0]
    arraytargets = (np.round(np.sqrt(dists[whichcells])*analysisdist/intervalnumber) - 1).astype(int)
    for target in arraytargets:
        spatialraw[target] += 1
You can probably take out that inner for loop, but you have to be careful because some of the elements of arraytargets could be the same. Also, I didn't actually try out all of the code, so there could be a bug or typo in there. Hopefully, it gives you a good idea of how to do this. Oh, one more thing: you could make analysisdist/intervalnumber a separate variable to avoid doing that division over and over again.
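If you do want to drop that inner loop, an unbuffered add handles repeated targets correctly (a sketch built on the variable names above; it assumes every target falls inside spatialraw):

np.add.at(spatialraw, arraytargets, 1)  # counts duplicate targets, unlike spatialraw[arraytargets] += 1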
Not too sure about the slowness of the Python side, but your Matlab code can be HIGHLY optimized. Nested for-loops tend to have horrible performance issues. You can replace the inner loop with vectorized code, as below:
for basecell = 1:cellnumber;
    if firstcelltype==cellrecord(basecell,3);
        xloc=cellrecord(basecell,1);
        yloc=cellrecord(basecell,2);
        xedgedist=(xbound-xloc);
        yedgedist=(ybound-yloc);
        if (xloc>excludedist) && (yloc>excludedist) && (xedgedist>excludedist) && (yedgedist>excludedist);
            % for comparecell = 1:cellnumber;
            %     if secondcelltype==cellrecord(comparecell,3);
            %         xcomploc=cellrecord(comparecell,1);
            %         ycomploc=cellrecord(comparecell,2);
            %         dist=sqrt((xcomploc-xloc)^2+(ycomploc-yloc)^2);
            %         if (dist>=1) && (dist<=100.4999);
            %             arraytarget=round(dist*analysisdist/intervalnumber);
            %             spatialsum(1,arraytarget)=spatialsum(1,arraytarget)+1;
            %         end
            %     end
            % end
            % replace with:
            secondcelltype_mask = secondcelltype == cellrecord(:,3);
            xcomploc_vec = cellrecord(secondcelltype_mask,1);
            ycomploc_vec = cellrecord(secondcelltype_mask,2);
            dist_vec = sqrt((xcomploc_vec-xloc).^2 + (ycomploc_vec-yloc).^2);
            dist_mask = dist_vec>=1 & dist_vec<=100.4999;
            arraytarget_vec = round(dist_vec(dist_mask)*analysisdist/intervalnumber);
            count = accumarray(arraytarget_vec, 1, [size(spatialsum,2),1]);
            spatialsum(1,:) = spatialsum(1,:) + count';
        end
    end
end
There may be some small errors in there since I don't have any data to test the code with, but it should give roughly a 10X speed-up of the Matlab code.
From my experience with numpy, I've noticed that swapping out for-loops for vectorized/matrix-based arithmetic gives noticeable speed-ups as well. However, without the shapes of all of your variables it's hard to vectorize things.
You can avoid some of the math.sqrt calls by replacing the lines
dist=math.sqrt((xcomploc-xloc)**2+(ycomploc-yloc)**2)
dist=round(dist)
if dist>=1 and dist<=analysisdist:
    arraytarget=round(dist*analysisdist/intervalnumber)
with
dist=(xcomploc-xloc)**2+(ycomploc-yloc)**2
dist=round(dist)
if dist>=1 and dist<=analysisdist_squared:
    arraytarget=round(math.sqrt(dist)*analysisdist/intervalnumber)
where you have the line
analysisdist_squared = analysisdist * analysisdist
outside of the main loop of your function.
Since math.sqrt is called in the innermost loop, you should have from math import sqrt at the top of the module and just call the function as sqrt.
I would also try replacing
dist=(xcomploc-xloc)**2+(ycomploc-yloc)**2
with
dist=(xcomploc-xloc)*(xcomploc-xloc)+(ycomploc-yloc)*(ycomploc-yloc)
There's a chance it will produce faster byte code to do multiplication rather than exponentiation.
I doubt these will get you all the way to MATLABs performance, but they should help reduce some overhead.
If you have a multicore, you could maybe give the multiprocessing module a try and use multiple processes to make use of all the cores.
Instead of sqrt you could use x**0.5, which is, if I remember correctly, slightly faster.
