Making nested 'for' loops more pythonic - python

I'm relatively new to Python and am wondering how to make the following more efficient by avoiding explicit nested 'for' loops and using NumPy's implicit looping instead. I'm working with image data, and in this case trying to speed up my k-means algorithm. Here's a sample of what I'm trying to do:
# shape of image will be something like 140, 150, 3
num_sets, rows_per_set, num_columns = image_values.shape
for set in range(0, num_sets):
    for row in range(0, rows_per_set):
        pos = np.argmin(calc_euclidean(rgb_[set][row], means_list))
What I have today works great but I'd like to make it more efficient.
Feedback and recommendations are greatly appreciated.

Here is a vectorised solution. I'm almost certain I got your dimensions muddled up (3 is not really the number of columns, is it?), but the principle should be recognisable anyway:
For demonstration I only collect the (flat) indices into set and row in the buckets.
import numpy as np
k = 6
rgb_=np.random.randint(0, 9, (140, 150, 3))
means_list = np.random.randint(0, 9, (k, 3))
# compute distance table; use some algebra to leverage highly optimised
# dot product
squared_dists = np.add.outer((rgb_*rgb_).sum(axis=-1),
                             (means_list*means_list).sum(axis=-1)) \
    - 2*np.dot(rgb_, means_list.T)
# find best cluster
best = np.argmin(squared_dists, axis=-1)
# find group sizes
counts = np.bincount(best.ravel())
# translate to block boundaries
bnds = np.cumsum(counts[:-1])
# group indices by best cluster; argpartition should be
# a bit cheaper than argsort
chunks = np.argpartition(best.ravel(), bnds)
# split into buckets
buckets = np.split(chunks, bnds)
# check
num_sets, rows_per_set, num_columns = rgb_.shape
def calc_euclidean(a, b):
    return ((a-b)**2).sum(axis=-1)
for set in range(0, num_sets):
    for row in range(0, rows_per_set):
        pos = np.argmin(calc_euclidean(rgb_[set][row], means_list))
        assert pos == best[set, row]
        assert rows_per_set*set+row in buckets[pos]
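To see why the algebra in the distance table is safe, here is a minimal sketch on toy data. The names `pixels`, `means`, `direct` and `expanded` are mine, not from the answer above. It compares the expanded form |a|^2 + |b|^2 - 2 a.b against plain broadcasting; both should give the same table, so either can feed `argmin`.

```python
import numpy as np

rng = np.random.default_rng(0)
pixels = rng.integers(0, 9, size=(4, 5, 3)).astype(float)  # toy "image"
means = rng.integers(0, 9, size=(6, 3)).astype(float)      # toy cluster centres

# Direct form: broadcast (4, 5, 1, 3) against (6, 3), then sum over the colour axis.
direct = ((pixels[:, :, None, :] - means) ** 2).sum(axis=-1)

# Expanded form: |a|^2 + |b|^2 - 2 a.b, with the cross term as a matrix product.
expanded = (
    (pixels ** 2).sum(axis=-1)[:, :, None]
    + (means ** 2).sum(axis=-1)
    - 2 * pixels @ means.T
)

# Index of the nearest centre for every pixel.
labels = direct.argmin(axis=-1)
```

The direct form is easier to read but builds a temporary with one extra axis, which is probably why the expanded form is preferred for large images.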


How to generate complex Hypothesis data frames with internal row and column dependencies?

Is there an elegant way of using hypothesis to directly generate complex pandas data frames with internal row and column dependencies? Let's say I want columns such as:
Geographic coordinates can be individually picked at random, but sets must usually come from a general area (e.g. standard reprojections don't work if you have two points on opposite sides of the globe). It's easy to handle that by choosing an area with one strategy and columns of coordinates from inside that area with another. All good so far…
@st.composite
def plaus_spamspam_arrs(draw, bounds):
    """Returns plausible spamspamspam arrays"""
    size = draw(st.integers(*bounds))
    coords = draw(st_lonlat(size=size))
    values = draw(st_values(size=size))
    areas = draw(st_areas(size=size))
    meta = draw(st_meta(size=size))
    return PlausibleData(coords, values, areas, meta)
The snippet above makes clean numpy arrays of coordinated single-value data. But the numeric data in the columns example (n-columns interspersed with junk) can also have row-wise dependencies such as needing to be normalised to some factor involving a row-wise sum and/or something else chosen dynamically at runtime.
I can generate all these bits separately, but I can't see how to stitch them into a single data frame without using a clumsy concat-based technique that, I presume, would disrupt draw-based shrinking. Moreover, I need a solution that adapts beyond what's above, so a hack likely won't get me too far…
Maybe there's something with builds? I just can't quite see how to do it. Thanks for sharing if you know! A short example as inspiration would likely be enough.
I can generate columns roughly as follows:
@st.composite
def plaus_df_inputs(
    draw, *, nrows=None, ncols=None, nrow_bounds=ARR_LEN, ncol_bounds=COL_LEN
):
    """Returns …"""
    box_lon, box_lat = draw(plaus_box_geo())
    ncols_jnk = draw(st.integers(*ncol_bounds)) if ncols is None else ncols
    ncols_val = draw(st.integers(*ncol_bounds)) if ncols is None else ncols
    keys_val = draw(plaus_smp_key_elm(size=ncols_val))
    nrows = draw(st.integers(*nrow_bounds)) if nrows is None else nrows
    cols = (
        plaus_df_cols_lonlat(lons=plaus_lon(box_lon), lats=plaus_lat(box_lat))
        + plaus_df_cols_meta()
        + plaus_df_cols_value(keys=keys_val)
        + draw(plaus_df_cols_junk(size=ncols_jnk))
    )
    return draw(st_pd.data_frames(cols, index=plaus_df_idx(size=nrows)))
where the sub-strategies are things like
@st.composite
def plaus_df_cols_junk(
    draw, *, size=1, names=plaus_meta(), dtypes=plaus_dtype(), unique=False
):
    """Returns strategy for list of columns of plausible junk data."""
    result = set()
    for _ in range(size):
        result.add(draw(names.filter(lambda name: name not in result)))
    return [
        st_pd.column(name=result.pop(), dtype=draw(dtypes), unique=unique)
        for _ in range(size)
    ]
What I need is something more elegant that incorporates the row-based dependencies.
from hypothesis import strategies as st
@st.composite
def interval_sets(draw):
    # To create our interval sets, we'll draw from a strategy that shrinks well,
    # and then transform it into the format we want. More specifically, we'll use
    # a single lists() strategy so that the shrinker can delete chunks atomically,
    # and then rearrange the floats that we draw as part of this.
    base_elems = st.tuples(
        # Name element; st.text() is an assumption, any hashable strategy would do.
        st.text(),
        # Different floats bounds to ensure we get at least one valid start and end.
        st.floats(0, 1, exclude_max=True),
        st.floats(0, 1, exclude_min=True),
    )
    base = draw(st.lists(base_elems, min_size=1, unique_by=lambda t: t[0]))
    nums = sorted(sum((t[1:] for t in base), start=()))  # arrange our endpoints
    return [
        {"name": name, "start": start, "end": end, "size": end - start}
        for (name, _, _), start, end in zip(base, nums[::2], nums[1::2])
    ]
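For the row-wise dependency specifically, one option is to draw the whole frame in a single `data_frames()` call and only then apply the normalisation inside the composite. The following is a rough sketch under my own assumptions: the column names in `VALUE_COLS`, the `junk` column, and the strategy name `normalised_frames` are all made up for illustration, and I have not checked how well it shrinks on realistic data.

```python
import numpy as np
from hypothesis import given, settings, strategies as st
from hypothesis.extra import pandas as st_pd

VALUE_COLS = ["v0", "v1", "v2"]  # hypothetical value-column names


@st.composite
def normalised_frames(draw):
    """Draw a frame, then make the value columns sum to 1 in every row."""
    columns = [
        st_pd.column(name, dtype=float, elements=st.floats(0.1, 10.0))
        for name in VALUE_COLS
    ]
    columns.append(st_pd.column("junk", dtype=int))  # unrelated filler column
    index = st_pd.range_indexes(min_size=1, max_size=5)
    df = draw(st_pd.data_frames(columns, index=index))
    # The row-wise dependency is applied after the draw, as a plain
    # transformation of the generated values.
    row_sums = df[VALUE_COLS].sum(axis=1)
    df[VALUE_COLS] = df[VALUE_COLS].div(row_sums, axis=0)
    return df


@settings(max_examples=25, deadline=None, database=None)
@given(normalised_frames())
def check_rows_sum_to_one(df):
    assert np.allclose(df[VALUE_COLS].sum(axis=1), 1.0)
```

Calling `check_rows_sum_to_one()` runs the property. The lower bound of 0.1 on the elements is there only to keep the row sums away from zero.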

Smart indexing using numpy

So, this is more of a structural problem, but I think it's looking fairly ugly at the moment. I have code that looks like:
for i in range(length_of_tree):
    potential_ways = np.zeros((M, 2))
    for m in range(omega):
        for s in range(Z):
            potential_ways[m][s] = sum([quad[r][m][s] for r in range(reps)])
The code is currently working, but I've noticed that there are several ways of using numpy to avoid for-loops. Is there a way for me to make this code a bit more minimalistic?
A sum over values in an array can always be changed into an inner product which is optimised in numpy. As has been suggested here, I don't really understand the context of your question without examples but you should be able to do something like the following:
import numpy as np
# your examples
M = 2
length_of_tree,reps = 100,100
omega,Z = 2,2
# a random matrix of values of shape 100,2,2
quad = np.random.normal(0,1,size=(100,2,2))
# useful initializations
quadT = quad.T
dummy = np.ones(shape=(100,))
for i in range(length_of_tree):
    # option 1
    potential_ways = np.zeros((M, 2))
    for m in range(omega):
        for s in range(Z):
            potential_ways[m][s] = sum([quad[r][m][s] for r in range(reps)])
    # option 2: inner product of the transposed array with a vector of ones
    potential_ways = np.dot(quadT, dummy).T
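If the only goal is to sum over the first axis, `sum(axis=0)` may be the most direct spelling. Here is a small sketch comparing it with the explicit loops and with the inner-product form; the names `looped`, `summed` and `dotted` are mine, and the shapes are the toy ones used above.

```python
import numpy as np

rng = np.random.default_rng(0)
reps, omega, Z = 100, 2, 2
quad = rng.normal(0, 1, size=(reps, omega, Z))

# Explicit loops, as in the question.
looped = np.zeros((omega, Z))
for m in range(omega):
    for s in range(Z):
        looped[m][s] = sum(quad[r][m][s] for r in range(reps))

# Same reduction in one call: collapse the first (reps) axis.
summed = quad.sum(axis=0)

# Inner-product form, for comparison.
dotted = np.dot(quad.T, np.ones(reps)).T
```

All three should agree up to floating-point rounding.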

Use multi-processing/threading to break numpy array operation into chunks

I have a function that renders an MxN array.
The array is very large, so I want to use the function to produce smaller arrays (M1xN, M2xN, M3xN, ..., MixN, with M1+M2+M3+...+Mi = M) simultaneously using multiprocessing/threading and eventually join these arrays to form the MxN array. As Mr. Boardrider rightly suggested that I provide a workable example, the following example broadly conveys what I intend to do:
import numpy as n
def mult(y,x):
    r = n.empty([len(y),len(x)])
    for i in range(len(r)):
        r[i] = y[i]*x
    return r
x = n.random.rand(10000)
y = n.arange(0,100000,1)
test = mult(y=y,x=x)
As the lengths of x and y increase the system will take more and more time. With respect to this example, I want to run this code such that if I have 4 cores, I can give quarter of the job to each, i.e give job to compute elements r[0] to r[24999] to the 1st core, r[25000] to r[49999] to the 2nd core, r[50000] to r[74999] to the 3rd core and r[75000] to r[99999] to the 4th core. Eventually club the results, append them to get one single array r[0] to r[99999].
I hope this example makes things clear. If my problem is still not clear, please tell.
The first thing to say is: if it's about multiple cores on the same processor, numpy is already capable of parallelizing the operation better than we could ever do by hand (see the discussion at "multiplication of large arrays in python").
In this case the key would be simply to ensure that the multiplication is all done in a wholesale array operation rather than a Python for-loop:
test2 = x[n.newaxis, :] * y[:, n.newaxis]
n.abs( test - test2 ).max() # verify equivalence to mult(): output should be 0.0, or very small reflecting floating-point precision limitations
[If you actually wanted to spread this across multiple separate CPUs, that's a different matter, but the question seems to suggest a single (multi-core) CPU.]
OK, bearing the above in mind: let's suppose you want to parallelize an operation more complicated than just mult(). Let's assume you've tried hard to optimize your operation into wholesale array operations that numpy can parallelize itself, but your operation just isn't susceptible to this. In that case, you can use a shared-memory multiprocessing.Array created with lock=False, and multiprocessing.Pool to assign processes to address non-overlapping chunks of it, divided up over the y dimension (and also simultaneously over x if you want). An example listing is provided below. Note that this approach does not explicitly do exactly what you specify (club the results together and append them into a single array). Rather, it does something more efficient: multiple processes simultaneously assemble their portions of the answer in non-overlapping portions of shared memory. Once done, no collation/appending is necessary: we just read out the result.
import os, numpy, multiprocessing, itertools
SHARED_VARS = {} # the best way to get multiprocessing.Pool to send shared multiprocessing.Array objects between processes is to attach them to something global
def operate( slices ):
    # grok the inputs
    yslice, xslice = slices
    y, x, r = get_shared_arrays('y', 'x', 'r')
    # create views of the appropriate chunks/slices of the arrays:
    y = y[yslice]
    x = x[xslice]
    r = r[yslice, xslice]
    # do the actual business
    for i in range(len(r)):
        r[i] = y[i] * x # If this is truly all operate() does, it can be parallelized far more efficiently by numpy itself.
                        # But let's assume this is a placeholder for something more complicated.
    return 'Process %d operated on y[%s] and x[%s] (%d x %d chunk)' % (os.getpid(), slicestr(yslice), slicestr(xslice), y.size, x.size)
def check(y, x, r):
    r2 = x[numpy.newaxis, :] * y[:, numpy.newaxis] # obviously this check will only be valid if operate() literally does only multiplication (in which case this whole business is unnecessary)
    print( 'max. abs. diff. = %g' % numpy.abs(r - r2).max() )
    return y, x, r
def slicestr(s):
    return ':'.join( '' if x is None else str(x) for x in [s.start, s.stop, s.step] )
def m2n(buf, shape, typecode, ismatrix=False):
    """
    Return a numpy.array VIEW of a multiprocessing.Array given a
    handle to the array, the shape, the data typecode, and a boolean
    flag indicating whether the result should be cast as a matrix.
    """
    a = numpy.frombuffer(buf, dtype=typecode).reshape(shape)
    if ismatrix: a = numpy.asmatrix(a)
    return a
def n2m(a):
    """
    Return a multiprocessing.Array COPY of a numpy.array, together
    with shape, typecode and matrix flag.
    """
    if not isinstance(a, numpy.ndarray): a = numpy.array(a)
    return multiprocessing.Array(a.dtype.char, a.flat, lock=False), tuple(a.shape), a.dtype.char, isinstance(a, numpy.matrix)
def new_shared_array(shape, typecode='d', ismatrix=False):
    """
    Allocate a new shared array and return all the details required
    to reinterpret it as a numpy array or matrix (same order of
    output arguments as n2m)
    """
    typecode = numpy.dtype(typecode).char
    return multiprocessing.Array(typecode, int(numpy.prod(shape)), lock=False), tuple(shape), typecode, ismatrix
def get_shared_arrays(*names):
    return [m2n(*SHARED_VARS[name]) for name in names]
def init(*pargs, **kwargs):
    SHARED_VARS.update(pargs, **kwargs)
if __name__ == '__main__':
    ylen = 1000
    xlen = 2000
    init( y=n2m(range(ylen)) )
    init( x=n2m(numpy.random.rand(xlen)) )
    init( r=new_shared_array([ylen, xlen], float) )
    print('Master process ID is %s' % os.getpid())
    #print( operate([slice(None), slice(None)]) ); check(*get_shared_arrays('y', 'x', 'r')) # local test
    pool = multiprocessing.Pool(initializer=init, initargs=tuple(SHARED_VARS.items()))
    yslices = [slice(0,333), slice(333,666), slice(666,None)]
    xslices = [slice(0,1000), slice(1000,None)]
    #xslices = [slice(None)] # uncomment this if you only want to divide things up in the y dimension
    reports = pool.map(operate, itertools.product(yslices, xslices))
    y, x, r = check(*get_shared_arrays('y', 'x', 'r'))
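The same idea of workers filling non-overlapping chunks of one preallocated result can also be sketched with threads, which share memory without any extra machinery. This is only an illustration under my own assumptions: `fill_chunk` and `chunked_mult` are hypothetical names, and threads only help when the per-chunk work spends most of its time inside numpy calls that release the GIL.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def fill_chunk(r, y, x, rows):
    """Write one block of rows into the shared result array."""
    r[rows] = y[rows, np.newaxis] * x  # placeholder for a heavier operation


def chunked_mult(y, x, n_workers=4):
    r = np.empty((len(y), len(x)))
    # Split the row indices into contiguous, non-overlapping slices.
    edges = np.linspace(0, len(y), n_workers + 1).astype(int)
    slices = [slice(a, b) for a, b in zip(edges[:-1], edges[1:])]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(lambda rows: fill_chunk(r, y, x, rows), slices))
    return r


x = np.random.rand(50)
y = np.arange(0, 200, 1)
result = chunked_mult(y, x)
```

As with the shared-memory version, nothing has to be appended at the end: each worker writes straight into its own rows of `r`.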

Size-Incremental Numpy Array in Python

I just came across the need of an incremental Numpy array in Python, and since I haven't found anything I implemented it. I'm just wondering if my way is the best way or you can come up with other ideas.
So, the problem is that I have a 2D array (the program handles nD arrays) whose size is not known in advance, and a variable amount of data needs to be concatenated to the array in one direction (let's say that I have to call np.vstack a lot of times). Every time I concatenate data, I need to take the array, sort it along axis 0 and do other stuff, so I cannot construct a long list of arrays and then np.vstack the list at once.
Since memory allocation is expensive, I turned to incremental arrays, where I increase the size of the array by more than I need (I use 50% increments), so that I minimize the number of allocations.
I coded this up and you can see it in the following code:
import numpy as np
class ExpandingArray:
    __DEFAULT_ALLOC_INIT_DIM = 10 # default initial dimension for all the axes if nothing is given by the user
    __DEFAULT_MAX_INCREMENT = 10 # default value in order to limit the increment of memory allocation
    __MAX_INCREMENT = [] # Max increment
    __ALLOC_DIMS = [] # Dimensions of the allocated np.array
    __DIMS = [] # Dimensions of the view with data on the allocated np.array (__DIMS <= __ALLOC_DIMS)
    __ARRAY = [] # Allocated array
    def __init__(self,initData,allocInitDim=None,dtype=np.float64,maxIncrement=None):
        self.__DIMS = np.array(initData.shape)
        self.__MAX_INCREMENT = maxIncrement
        if self.__MAX_INCREMENT is None:
            self.__MAX_INCREMENT = self.__DEFAULT_MAX_INCREMENT
        # Compute the allocation dimensions based on user's input
        if allocInitDim is None:
            allocInitDim = self.__DIMS.copy()
        while np.any( allocInitDim < self.__DIMS ) or np.any(allocInitDim == 0):
            for i in range(len(self.__DIMS)):
                if allocInitDim[i] == 0:
                    allocInitDim[i] = self.__DEFAULT_ALLOC_INIT_DIM
                if allocInitDim[i] < self.__DIMS[i]:
                    allocInitDim[i] += min(allocInitDim[i]//2, self.__MAX_INCREMENT)
        # Allocate memory
        self.__ALLOC_DIMS = allocInitDim
        self.__ARRAY = np.zeros(self.__ALLOC_DIMS,dtype=dtype)
        # Set initData
        sliceIdxs = tuple(slice(self.__DIMS[i]) for i in range(len(self.__DIMS)))
        self.__ARRAY[sliceIdxs] = initData
    def shape(self):
        return tuple(self.__DIMS)
    def getAllocArray(self):
        return self.__ARRAY
    def getDataArray(self):
        """Get the view of the array with data"""
        sliceIdxs = tuple(slice(self.__DIMS[i]) for i in range(len(self.__DIMS)))
        return self.__ARRAY[sliceIdxs]
    def concatenate(self,X,axis=0):
        if axis >= len(self.__DIMS):
            print("Error: axis number exceed the number of dimensions")
            return
        # Check dimensions for remaining axis
        for i in range(len(self.__DIMS)):
            if i != axis:
                if X.shape[i] != self.shape()[i]:
                    print("Error: Dimensions of the input array are not consistent in the axis %d" % i)
                    return
        # Check whether allocated memory is enough
        needAlloc = False
        while self.__ALLOC_DIMS[axis] < self.__DIMS[axis] + X.shape[axis]:
            needAlloc = True
            # Increase the __ALLOC_DIMS
            self.__ALLOC_DIMS[axis] += min(self.__ALLOC_DIMS[axis]//2,self.__MAX_INCREMENT)
        # Reallocate memory and copy old data
        if needAlloc:
            # Allocate
            newArray = np.zeros(self.__ALLOC_DIMS)
            # Copy
            sliceIdxs = tuple(slice(self.__DIMS[i]) for i in range(len(self.__DIMS)))
            newArray[sliceIdxs] = self.__ARRAY[sliceIdxs]
            self.__ARRAY = newArray
        # Concatenate new data: full extent on the other axes, the new range on `axis`
        sliceIdxs = []
        for i in range(len(self.__DIMS)):
            if i != axis:
                sliceIdxs.append(slice(self.__DIMS[i]))
            else:
                sliceIdxs.append(slice(self.__DIMS[i],self.__DIMS[i]+X.shape[i]))
        self.__ARRAY[tuple(sliceIdxs)] = X
        self.__DIMS[axis] += X.shape[axis]
The code shows considerably better performance than vstack/hstack over several randomly sized concatenations.
What I'm wondering is: is this the best way? Is there anything in numpy that already does this?
Further, it would be nice to be able to overload the slice assignment operator of np.array, so that as soon as the user assigns anything outside the actual dimensions, an ExpandingArray.concatenate() is performed. How can such overloading be done?
Testing code: here is the code I used to compare vstack with my method. I add random chunks of data of maximum length 100.
import time
N = 10000
def performEA(N):
    EA = ExpandingArray(np.zeros((0,2)),maxIncrement=1000)
    for i in range(N):
        nNew = np.random.randint(low=1,high=101)
        X = np.random.rand(nNew,2)
        EA.concatenate(X,axis=0)
        # Perform operations on EA.getDataArray()
    return EA
def performVStack(N):
    A = np.zeros((0,2))
    for i in range(N):
        nNew = np.random.randint(low=1,high=101)
        X = np.random.rand(nNew,2)
        A = np.vstack((A,X))
        # Perform operations on A
    return A
start_EA = time.perf_counter()
EA = performEA(N)
stop_EA = time.perf_counter()
start_VS = time.perf_counter()
VS = performVStack(N)
stop_VS = time.perf_counter()
print("Elapsed Time EA: %.2f" % (stop_EA-start_EA))
print("Elapsed Time VS: %.2f" % (stop_VS-start_VS))
I think the most common design pattern for these things is to just use a list for the small arrays. Sure you could do things like dynamic resizing (if you want to do crazy things, you can try to use the resize array method too). I think a typical method is to always double the size, when you really don't know how large things will be. Of course if you know how large the array will grow to, just allocating the full thing up front is simplest.
def performVStack_fromlist(N):
    l = []
    for i in range(N):
        nNew = np.random.randint(low=1,high=101)
        X = np.random.rand(nNew,2)
        l.append(X)
    return np.vstack(l)
I am sure there are some use cases where an expanding array could be useful (for example when the appended arrays are all very small), but this loop seems better handled with the above pattern. The optimization is mostly about how often you need to copy everything around, and with a list like this (other than the list itself) that happens exactly once. So it is normally much faster.
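For the case where the data has to be usable after every append, so a list plus one final vstack does not fit, the "always double the size" idea mentioned above might look roughly like this. `GrowableRows` is a hypothetical helper written for illustration, not something numpy provides, and it only grows along axis 0.

```python
import numpy as np


class GrowableRows:
    """Hypothetical helper: append rows into a buffer that doubles when full."""

    def __init__(self, ncols, capacity=16, dtype=float):
        # capacity is assumed to be at least 1
        self._buf = np.empty((capacity, ncols), dtype=dtype)
        self._n = 0  # number of rows actually in use

    def append(self, rows):
        rows = np.atleast_2d(rows)
        needed = self._n + len(rows)
        if needed > len(self._buf):
            # Double until the new rows fit, then copy the used part once.
            capacity = len(self._buf)
            while capacity < needed:
                capacity *= 2
            new_buf = np.empty((capacity, self._buf.shape[1]), dtype=self._buf.dtype)
            new_buf[: self._n] = self._buf[: self._n]
            self._buf = new_buf
        self._buf[self._n : needed] = rows
        self._n = needed

    @property
    def data(self):
        """View of the filled part; no copy is made."""
        return self._buf[: self._n]


acc = GrowableRows(ncols=2, capacity=4)
chunks = [np.full((k, 2), float(k)) for k in (3, 5, 1)]
for chunk in chunks:
    acc.append(chunk)
```

Because the capacity doubles, the number of reallocations grows only logarithmically with the final size, at the cost of up to about half the buffer being unused.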
When I faced a similar problem, I used ndarray.resize(). Most of the time, it will avoid reallocation+copying altogether. I can't guarantee it would prove to be faster (it probably would), but it's so much simpler.
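A minimal sketch of what that looks like, assuming a C-ordered array that owns its data and is grown along the first axis (in that layout the existing rows keep their positions):

```python
import numpy as np

# np.array(...) owns its data, which resize() requires.
buf = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
old = buf.copy()

# Grow along the first axis; refcheck=False skips the reference-count check
# (only safe if nothing else holds on to the old memory).
buf.resize((5, 2), refcheck=False)
```

The new rows come back zero-filled. Growing along any other axis would scramble the existing values, since resize() works on the flat memory.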
As for your second question, I think overriding slice assignment for extending purposes is not a good idea. That operator is meant for assigning to existing items/slices. If you want to change that, it's not immediately clear how you'd want it to behave in some cases, e.g.:
a = MyExtendableArray(np.arange(100))
a[200] = 6 # resize to 200? pad [100:200] with what?
a[90:110] = 7 # assign to existing items AND automagically-allocated items?
a[::-1][200] = 6 # ...
My suggestion is that slice-assignment and data appending should remain separate.

numpy.choose 32 choice limitation

Lo and behold, I ran into a regression in numpy.choose after upgrading to 1.5.1. Past versions (and numeric) supported, as far as I could tell, an unlimited number of potential choices. The "new" choose is limited to 32. Here is a post where another user laments the regression.
I have a list with 100 choices (0-99) that I was using to modify an array. As a workaround, I am using the following code. Understandably, it is 7 times slower than using choose. I am not a C programmer, and while I would like to get in and fix the numpy issue, I wonder what other potentially faster workarounds exist. Thoughts?
d={...} #A dictionary with my keys and their new mappings
for key, value in d.items():
    array[array==key]=value
I gather that d has the keys 0 to 99. In that case, the solution is really simple. First, write the values of d in a NumPy array values, in a way that d[i] == values[i] – this seems to be the natural data structure for these values anyway. Then you can access the new array with the values replaced by
values[array]
If you want to modify array in place, simply do
array[:] = values[array]
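As a small worked sketch of this lookup-table approach (the dictionary and array below are toy data of my own, and the keys are assumed to be exactly 0..len(d)-1):

```python
import numpy as np

d = {0: 5, 1: 3, 2: 20}

# Build the lookup table so that values[i] == d[i].
values = np.empty(len(d), dtype=int)
for key, value in d.items():
    values[key] = value

array = np.array([[1, 0, 2], [2, 1, 1], [1, 0, 1]])

# Fancy indexing replaces every entry by its mapped value in one step.
remapped = values[array]
# remapped is [[3, 5, 20], [20, 3, 3], [3, 5, 3]]
```

Unlike a loop of masked assignments, this reads the original array only once, so a value that happens to equal a later key cannot be remapped twice.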
In the NumPy documentation, there is an example of what a simplified version of the choose function could look like.
[...] this function is less simple than it might seem from the
following code description (below ndi = numpy.lib.index_tricks):
np.choose(a,c) == np.array([c[a[I]][I] for I in ndi.ndindex(a.shape)]).
Putting this into a function could look like this:
import numpy
def choose(selector, choices):
    """
    A simplified version of the numpy choose function to workaround the 32
    choices limit.
    """
    return numpy.array([choices[selector[idx]][idx] for idx in numpy.ndindex(selector.shape)]).reshape(selector.shape)
I am not sure how this translates in terms of efficiency and when exactly this breaks down when compared to the numpy.choose function. But it worked fine for me. Note that the patched function assumes that the entries in the choices are subscriptable.
I'm not sure about efficiency and it's not in-place (nb: I don't use numpy that often - so somewhat rusty):
import numpy as np
d = {0: 5, 1: 3, 2: 20}
data = np.array([[1, 0, 2], [2, 1, 1], [1, 0, 1]])
new_data = np.array([d.get(i, i) for i in data.flat]).reshape(data.shape) # adapt for list/other
When colorizing microscopy images of mouse embryos I ran into a need for
a choose implementation where the number of choices was in the hundreds
(hundreds of mouse embryo cells).
As I was not sure whether the above suggestions were general or fast
I wrote this alternative:
import numpy as np
def big_choose(indices, choices):
    "Alternate to np.choose that supports more than 30 choices."
    indices = np.array(indices)
    if (indices.max() <= 30) or (len(choices) <= 31):
        # optimized fallback
        choices = choices[:31]
        return np.choose(indices, choices)
    result = 0
    while (len(choices) > 0) and not np.all(indices == -1):
        these_choices = choices[:30]
        remaining_choices = choices[30:]
        shifted_indices = indices + 1
        too_large_indices = (shifted_indices > 30).astype(int)
        clamped_indices = np.choose(too_large_indices, [shifted_indices, 0])
        choices_with_default = [result] + list(these_choices)
        result = np.choose(clamped_indices, choices_with_default)
        choices = remaining_choices
        if len(choices) > 0:
            indices = indices - 30
            too_small = (indices < -1).astype(int)
            indices = np.choose(too_small, [indices, -1])
    return result
Note that the generalized function uses the underlying implementation
when it can.
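Another possibility, if your NumPy has `np.take_along_axis`, is to stack the choices into one array and index along the new axis. This is a sketch of my own rather than something from the answers above; `choose_stacked` is a made-up name, and it assumes every choice can be broadcast to the shape of `indices`.

```python
import numpy as np


def choose_stacked(indices, choices):
    """Pick choices[indices[pos]][pos] for every position pos."""
    indices = np.asarray(indices)
    # Stack into shape (n_choices, *indices.shape); broadcast scalars/rows up.
    stacked = np.stack([np.broadcast_to(c, indices.shape) for c in choices])
    picked = np.take_along_axis(stacked, indices[np.newaxis, ...], axis=0)
    return picked[0]


rng = np.random.default_rng(0)
indices = rng.integers(0, 100, size=(4, 6))
choices = [np.full((4, 6), 10 * k) for k in range(100)]
result = choose_stacked(indices, choices)
```

The stacked array costs memory proportional to the number of choices times the array size, so for very many large choices one of the chunked approaches may still be preferable.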
