More efficient way to do multi-dimensional binning - python

I am trying to do multi-dimensional binning. This is the code I have so far. Is there some way to do it more efficiently?
import numpy as np

a = np.random.random((3000, 3000))

def flinspace(a, b, n, endpoint=True):
    a, b = np.asanyarray(a), np.asanyarray(b)
    return a[..., None] + (b - a)[..., None] / (n - endpoint) * np.arange(n)

# @jit
def fdigitize(x, q, axis=0):
    x = np.asanyarray(x).copy()
    f, l = np.nanmin(x, axis=axis), np.nanmax(x, axis=axis)
    y = flinspace(f, l, q)
    for i in range(x.shape[1]):
        mask = ~np.isnan(x[:, i])
        x[:, i][mask] = np.digitize(x[:, i][mask], y[i]) - 1
    return x

%timeit xx = fdigitize(a, 100)

One solution is to apply a linear transformation to each column of x, mapping it to [0, q-1], and then simply run np.floor to get the bins.
(This gives slightly different results than your algorithm for the max values, where your algorithm is inconsistent about whether the max element of each column should belong to bin q-1 or q-2.)
def fdigitizeadj(x, q, axis=0):
    x = np.asanyarray(x)
    f = np.nanmin(x, axis=axis)
    l = np.nanmax(x, axis=axis)
    adj = (q - 1) * (x - f) / (l - f)  # linear transformation here
    return np.floor(adj)
(At least compared to the un-jitted code this gives a speedup; it runs your example in 113 ms.)
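For reference, here is a minimal sanity check of the adjusted version on a small array (the array shape and NaN placement are arbitrary, chosen only for illustration; it assumes fdigitizeadj as defined above):

small = np.random.random((10, 4))
small[0, 0] = np.nan  # NaNs propagate through the linear transform and the floor

binned = fdigitizeadj(small, 100)
# non-NaN entries land in integer bins 0 .. 99, NaNs stay NaN
print(np.nanmin(binned), np.nanmax(binned))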

Related

Intersection of two arrays, retaining order in larger array

I have a numpy array a of length n, which has the numbers 0 through n-1 shuffled in some way. I also have a numpy array mask of length <= n, containing some subset of the elements of a, in a different order.
The query I want to compute is "give me the elements of a that are also in mask in the order that they appear in a".
I had a similar question here, but the difference was that mask was a boolean mask instead of a mask on the individual elements.
I've outlined and tested 4 methods below:
import timeit
import numpy as np
import matplotlib.pyplot as plt
n_test = 100
n_coverages = 10
np.random.seed(0)
def method1():
    return np.array([x for x in a if x in mask])

def method2():
    s = set(mask)
    return np.array([x for x in a if x in s])

def method3():
    return a[np.in1d(a, mask, assume_unique=True)]

def method4():
    bmask = np.full((n_samples,), False)
    bmask[mask] = True
    return a[bmask[a]]

methods = [
    ('naive membership', method1),
    ('python set', method2),
    ('in1d', method3),
    ('binary mask', method4),
]
p_space = np.linspace(0, 1, n_coverages)

for n_samples in [1000]:
    a = np.arange(n_samples)
    np.random.shuffle(a)
    for label, method in methods:
        if method == method1 and n_samples == 10000:
            continue
        times = []
        for coverage in p_space:
            mask = np.random.choice(a, size=int(n_samples * coverage), replace=False)
            time = timeit.timeit(method, number=n_test)
            times.append(time * 1e3)
        plt.plot(p_space, times, label=label)

plt.xlabel(r'Coverage ($\frac{|\mathrm{mask}|}{|\mathrm{a}|}$)')
plt.ylabel('Time (ms)')
plt.title('Comparison of 1-D Intersection Methods for $n = {}$ samples'.format(n_samples))
plt.legend()
plt.show()
Which produced the following results (plot omitted):
So binary mask is, without a doubt, the fastest of these 4 methods for any size of the mask.
My question is: is there a faster way?
I totally agree that the binary mask method is the fastest one. I also don't think there can be any better way, in terms of computational complexity, to do what you need.
Let me analyse your methods' timing results:
Method 1 runs in T = O(|a| * |mask|) time: every element of a is checked for membership in mask by iterating over every element of mask, which costs O(|mask|) per element in the worst case (when the element is missing from mask). |a| does not change, so consider it a constant. Since |mask| = coverage * |a|, we get T = O(|a|² * coverage).
Hence the linear dependence on coverage in the plot. Note that the running time depends quadratically on |a|: if |mask| ≤ |a| and |a| = n, then T = O(n²).
The second method uses a set. Treat the set here as a data structure that performs insertion/lookup in O(log(n)), where n is the number of elements in the set (but see the EDIT below). s = set(mask) takes O(|mask|*log(|mask|)) to complete, because there are |mask| insertion operations.
x in s is a lookup operation, so the second line runs in O(|a|*log(|mask|)).
The overall time complexity is O(|mask|*log(|mask|) + |a|*log(|mask|)). If |mask| ≤ |a| and |a| = n, then T = O(n*log(n)), and you probably observe an f(x) = log(x) dependence in the plot.
in1d runs in O(|mask|*log(|mask|) + |a|*log(|mask|)) as well: the same T = O(n*log(n)) complexity and the same f(x) = log(x) dependence in the plot.
The binary mask method's time complexity is O(|a| + |mask|), which is T = O(n), and that is the best possible. You observe a constant dependence on coverage in the plot. The algorithm simply iterates over the a and mask arrays a couple of times (a short pure-Python sketch follows below).
The thing is, if you have to output n items, you already have T = O(n) complexity, so this method 4 algorithm is optimal.
P.S. In order to observe the f(n) dependencies mentioned above, you would be better off varying |a| and letting |mask| = 0.9*|a|.
EDIT: It looks like a Python set actually performs lookup/insert in O(1) on average, using a hash table.
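To make the O(|a| + |mask|) argument concrete, here is a minimal pure-Python sketch of what method 4 does; the numpy version simply runs the same two passes in C:

def binary_mask_intersect(a, mask, n_samples):
    # pass 1 over mask: flag which values are present
    present = [False] * n_samples
    for v in mask:
        present[v] = True
    # pass 2 over a: keep elements whose value is flagged, in a's order
    return [x for x in a if present[x]]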
Assuming a is the bigger one.
def with_searchsorted(a, b):
    sb = b.argsort()
    bs = b[sb]
    sa = a.argsort()
    ia = np.arange(len(a))
    ra = np.empty_like(sa)
    ra[sa] = ia
    ac = bs.searchsorted(ia) % b.size
    return a[(bs[ac] == ia)[ra]]
demo
a = np.arange(10)
np.random.shuffle(a)
b = np.random.choice(a, 5, False)
print(a)
print(b)
[7 2 9 3 0 4 8 5 6 1]
[0 8 5 4 6]
print(with_searchsorted(a, b))
[0 4 8 5 6]
how it works
# sort b for faster searchsorting
sb = b.argsort()
bs = b[sb]
# sort a for faster searchsorting
sa = a.argsort()
# this is the sorted a... we just cheat because we know what it will be
ia = np.arange(len(a))
# construct the reverse sort look up
ra = np.empty_like(sa)
ra[sa] = ia
# perform searchsort
ac = bs.searchsorted(ia) % b.size
return a[(bs[ac] == ia)[ra]]

Fast Cubic Spline output computation given segments and coefficients?

I have a cubic spline activation function with 6 segments. The breaks (length = 7, i.e. 6+1) and the coefficients (shape = (6, 4)) are known. The cubic spline works on the input relative to the breakpoints, not on the absolute values. Here is my output calculation method with some sample data:
import time
import numpy as np

def CubSpline(cs, x):
    breaks = cs['breaks']
    coefs = cs['coefs']
    pieces = cs['pieces']
    if x <= breaks[0]:
        return coefs[0][3]
    elif x >= breaks[pieces]:
        ind = pieces - 1
        diff = breaks[ind+1] - breaks[ind]
    else:
        ind = 0
        while x > breaks[ind+1]:
            ind += 1
        diff = x - breaks[ind]
    y = coefs[ind][3] + coefs[ind][2]*diff + coefs[ind][1]*diff*diff + coefs[ind][0]*diff*diff*diff
    return y

vcubspline = np.vectorize(CubSpline)

breaks = 5*np.sort(np.random.randn(7))
coefs = np.random.randn(6, 4)
pieces = 6
cs = dict()
cs['pieces'] = pieces
cs['breaks'] = breaks
cs['coefs'] = coefs
arr = np.random.randint(10, size=(500, 500))

start = time.clock()
a = vcubspline(cs, arr)
print a.shape
stop = time.clock()
print stop - start
I wish to know whether this is the fastest way of computing the output. How can I improve it?
Using numpy.piecewise, as suggested in a comment, leads to code that is both more efficient and more concise. The function works directly with an array x, creating a list of conditions (inequalities involving x) and a corresponding list of functions, and then passing them all to piecewise:
def cubicSpline(cs, x):
    breaks = cs['breaks']
    coefs = cs['coefs']
    x = np.clip(x, breaks[0], breaks[-1])  # clip to the interval on which the spline is defined
    # one condition per segment; np.piecewise applies the last matching pair,
    # so for x in [breaks[i], breaks[i+1]) the polynomial of segment i is used
    conditions = [x >= b for b in breaks[:-1]]
    functions = [lambda x, c=c, b=b: c[3] + c[2]*(x-b) + c[1]*(x-b)**2 + c[0]*(x-b)**3
                 for c, b in zip(coefs, breaks)]
    y = np.piecewise(x, conditions, functions)
    return y
breaks = 5*np.sort(np.random.randn(7))
coefs = np.random.randn(6,4)
cs = {'breaks': breaks, 'coefs': coefs}
arr = np.random.randint(10, size=(500,500))
a = cubicSpline(cs, arr)
The last line executes in 53 ms (measured with timeit), versus 805 ms for the original version.
The pieces field of the input dictionary is redundant, since the given breakpoints and coefficients already carry that information.
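A quick way to sanity-check the vectorized function against the original scalar implementation (a minimal sketch, assuming CubSpline, cubicSpline, breaks and coefs from above; the test array is hypothetical and deliberately includes out-of-range values):

cs_full = {'breaks': breaks, 'coefs': coefs, 'pieces': 6}
test = np.random.uniform(breaks[0] - 1, breaks[-1] + 1, size=(50, 50))
expected = np.array([CubSpline(cs_full, v) for v in test.ravel()]).reshape(test.shape)
print(np.allclose(cubicSpline(cs_full, test), expected))  # should print True if the two agree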

Any faster way to get the same results?

I have two given arrays, x and y. I want to calculate the correlation coefficient between the two arrays as follows:
import numpy as np
from scipy.stats import pearsonr
x = np.array([[[1, 2, 3, 4],
               [5, 6, 7, 8]],
              [[11, 22, 23, 24],
               [25, 26, 27, 28]]])
i, j, k = x.shape
y = np.array([[[31, 32, 33, 34],
               [35, 36, 37, 38]],
              [[41, 42, 43, 44],
               [45, 46, 47, 48]]])
xx = np.row_stack(np.dstack(x))
yy = np.row_stack(np.dstack(y))
results = []
for a, b in zip(xx, yy):
    r_sq, p_val = pearsonr(a, b)
    results.append(r_sq)
results = np.array(results).reshape(j, k)
print results
[[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]]
The answer is correct. However, I would like to know if there are better and faster ways of doing it using numpy and/or scipy.
An alternate way (not necessarily better) is:
xx = x.reshape(2,-1).T # faster, minor issue though
yy = y.reshape(2,-1).T
results = [pearsonr(a,b)[0] for a,b in zip(xx,yy)]
results = np.array(results).reshape(x.shape[1:])
Another current thread was discussing the use of list comprehensions to iterate over values of arrays: Confusion about numpy's apply along axis and list comprehensions.
As discussed there, an alternative is to preallocate results and fill in the values during the iteration (a short sketch follows below). That's probably faster for really large cases, but for modest ones, this
np.array([... for .. in ...])
is reasonable.
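A minimal sketch of that preallocation pattern, assuming xx and yy as built in the question:

results = np.empty(len(xx))             # allocate once
for i, (a, b) in enumerate(zip(xx, yy)):
    results[i] = pearsonr(a, b)[0]      # fill in place instead of appending
results = results.reshape(x.shape[1:])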
The deeper question is whether pearsonr, or some alternative, can calculate this correlation for many pairs, rather than just one pair. That may require studying the internals of pearsonr, or other functions in stats.
Here's a first cut at vectorizing stats.pearsonr:
def pearsonr2(x, y):
    # stats.pearsonr adapted so that
    # x and y are (N, 2) arrays
    n = x.shape[1]
    mx = x.mean(1)
    my = y.mean(1)
    xm, ym = x - mx[:, None], y - my[:, None]
    r_num = np.add.reduce(xm * ym, 1)
    r_den = np.sqrt(stats.ss(xm, 1) * stats.ss(ym, 1))
    r = r_num / r_den
    r = np.clip(r, -1.0, 1.0)
    return r
print pearsonr2(xx,yy)
It matches your case, though these test values don't really exercise the function. I just took the pearsonr code, added the axis=1 parameter in most of the lines, and made sure everything ran. The prob step could be included with some boolean masking.
(I can add the stats.pearsonr code to my answer if needed).
This version will take any dimension a,b (as long as they are the same), and do your pearsonr calc along the designated axis. No reshaping needed.
def pearsonr_flex(x, y, axis=1):
    # stats.pearsonr adapted so that x and y can have any
    # (matching) shape; the calculation runs along `axis`
    n = x.shape[axis]
    mx = x.mean(axis, keepdims=True)
    my = y.mean(axis, keepdims=True)
    xm, ym = x - mx, y - my
    r_num = np.add.reduce(xm * ym, axis)
    r_den = np.sqrt(stats.ss(xm, axis) * stats.ss(ym, axis))
    r = r_num / r_den
    r = np.clip(r, -1.0, 1.0)
    return r

pearsonr_flex(xx, yy, 1)
pearsonr_flex(x, y, 0)
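Note that scipy.stats.ss was deprecated and later removed from SciPy, so on recent versions the r_den line will fail. A drop-in replacement using plain numpy (my substitution, not part of the original answer) could look like this:

def ss(a, axis=0):
    # squared sum along an axis, standing in for the removed scipy.stats.ss
    return np.add.reduce(a * a, axis)

Replacing stats.ss with this ss inside the two functions above keeps the rest of the code unchanged.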

3D distance vectorization

I need help vectorizing this code. Right now, with N=100, it takes a minute or so to run, and I would like to speed that up. I have done something like this for a double loop, but never with a 3D loop, and I am having difficulties.
import numpy as np

N = 100
n = 12
r = np.sqrt(2)

x = np.arange(-N, N+1)
y = np.arange(-N, N+1)
z = np.arange(-N, N+1)

C = 0
for i in x:
    for j in y:
        for k in z:
            if (i+j+k) % 2 == 0 and (i*i + j*j + k*k != 0):
                p = np.sqrt(i*i + j*j + k*k)
                p = p/r
                q = (1/p)**n
                C += q

print '\n'
print C
The meshgrid/where/indexing solution is already extremely fast. I made it about 65% faster. This is not a huge gain, but I explain it anyway, step by step:
It was easiest for me to approach this problem with all 3D vectors in the grid being columns in one large 2D 3 x M array. meshgrid is the right tool for creating all the combinations (note that numpy version >= 1.7 is required for a 3D meshgrid), and vstack + reshape bring the data into the desired form. Example:
>>> np.vstack(np.meshgrid(*[np.arange(0, 2)]*3)).reshape(3,-1)
array([[0, 0, 1, 1, 0, 0, 1, 1],
[0, 0, 0, 0, 1, 1, 1, 1],
[0, 1, 0, 1, 0, 1, 0, 1]])
Each column is one 3D vector. Each of these eight vectors represents one corner of a 1x1x1 cube (a 3D grid with step size 1 and length 1 in all dimensions).
Let's call this array vectors (it contains all 3D vectors representing all points in the grid). Then, prepare a bool mask for selecting those vectors fulfilling your mod2 criterion:
mod2bool = np.sum(vectors, axis=0) % 2 == 0
np.sum(vectors, axis=0) creates a 1 x M array containing the element sum for each column vector. Hence, mod2bool is a 1 x M array with a bool value for each column vector. Now use this bool mask:
vectorsubset = vectors[:,mod2bool]
This selects all rows (:) and uses boolean indexing for filtering the columns, both are fast operations in numpy. Calculate the lengths of the remaining vectors, using the native numpy approach:
lengths = np.sqrt(np.sum(vectorsubset**2, axis=0))
This is quite fast -- however, scipy.stats.ss and bottleneck.ss can perform the squared sum calculation even faster than this.
Transform the lengths using your instructions:
with np.errstate(divide='ignore'):
    p = (r/lengths)**n
This involves division of finite numbers by zero, resulting in infs in the output array. This is entirely fine. We use numpy's errstate context manager to make sure that these zero divisions do not throw an exception or a runtime warning.
Now sum up the finite elements (ignore the infs) and return the sum:
return np.sum(p[np.isfinite(p)])
I have implemented this method twice below: once exactly as just explained, and once using bottleneck's ss and nansum functions. I have also added your method for comparison, and a modified version of your method that skips the np.where((x*x+y*y+z*z)!=0) indexing, instead creating infs and finally summing up only the finite values (the isfinite way).
import sys
import numpy as np
import bottleneck as bn

N = 100
n = 12
r = np.sqrt(2)

x, y, z = np.meshgrid(*[np.arange(-N, N+1)]*3)
gridvectors = np.vstack((x, y, z)).reshape(3, -1)

def measure_time(func):
    import time
    def modified_func(*args, **kwargs):
        t0 = time.time()
        result = func(*args, **kwargs)
        duration = time.time() - t0
        print("%s duration: %.3f s" % (func.__name__, duration))
        return result
    return modified_func

@measure_time
def method_columnvecs(vectors):
    mod2bool = np.sum(vectors, axis=0) % 2 == 0
    vectorsubset = vectors[:, mod2bool]
    lengths = np.sqrt(np.sum(vectorsubset**2, axis=0))
    with np.errstate(divide='ignore'):
        p = (r/lengths)**n
    return np.sum(p[np.isfinite(p)])

@measure_time
def method_columnvecs_opt(vectors):
    # On my system, bn.nansum is even slightly faster than np.sum.
    mod2bool = bn.nansum(vectors, axis=0) % 2 == 0
    # Use ss from bottleneck or scipy.stats (axis=0 is the default).
    lengths = np.sqrt(bn.ss(vectors[:, mod2bool]))
    with np.errstate(divide='ignore'):
        p = (r/lengths)**n
    return bn.nansum(p[np.isfinite(p)])

@measure_time
def method_original(x, y, z):
    ind = np.where((x+y+z) % 2 == 0)
    x = x[ind]
    y = y[ind]
    z = z[ind]
    ind = np.where((x*x + y*y + z*z) != 0)
    x = x[ind]
    y = y[ind]
    z = z[ind]
    p = np.sqrt(x*x + y*y + z*z)/r
    return np.sum((1/p)**n)

@measure_time
def method_original_finitesum(x, y, z):
    ind = np.where((x+y+z) % 2 == 0)
    x = x[ind]
    y = y[ind]
    z = z[ind]
    lengths = np.sqrt(x*x + y*y + z*z)
    with np.errstate(divide='ignore'):
        p = (r/lengths)**n
    return np.sum(p[np.isfinite(p)])

print method_columnvecs(gridvectors)
print method_columnvecs_opt(gridvectors)
print method_original(x, y, z)
print method_original_finitesum(x, y, z)
This is the output:
$ python test.py
method_columnvecs duration: 1.295 s
12.1318801965
method_columnvecs_opt duration: 1.162 s
12.1318801965
method_original duration: 1.936 s
12.1318801965
method_original_finitesum duration: 1.714 s
12.1318801965
All methods produce the same result. Your method becomes a bit faster when doing the isfinite-style sum. My methods are faster, but I would say that this is more an academic exercise than an important improvement :-)
I have one question left: you were saying that for N=3, the calculation should produce 12. Even your own code doesn't do this; all methods above produce 12.1317530867 for N=3. Is this expected?
Thanks to @Bill, I was able to get this to work. Very fast now. Perhaps it could be done better, especially with the two masks used to get rid of the two conditions that I originally had for loops for (a possible way to combine them is sketched after the code).
from __future__ import division
import numpy as np
N = 100
n = 12
r = np.sqrt(2)
x, y, z = np.meshgrid(*[np.arange(-N, N+1)]*3)
ind = np.where((x+y+z)%2==0)
x = x[ind]
y = y[ind]
z = z[ind]
ind = np.where((x*x+y*y+z*z)!=0)
x = x[ind]
y = y[ind]
z = z[ind]
p=np.sqrt(x*x+y*y+z*z)/r
ans = (1/p)**n
ans = np.sum(ans)
print 'ans'
print ans
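Regarding the two separate masks: a minimal sketch of combining both conditions into a single boolean mask and applying it once (using the same x, y, z from the meshgrid above):

keep = ((x + y + z) % 2 == 0) & (x*x + y*y + z*z != 0)   # both conditions at once
p = np.sqrt(x[keep]**2 + y[keep]**2 + z[keep]**2) / r
ans = np.sum((1/p)**n)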

Scipy Fast 1-D interpolation without any loop

I have two 2D arrays, x(ni, nj) and y(ni, nj), that I need to interpolate over one axis. I want to interpolate along the last axis for every ni.
I wrote
import numpy as np
from scipy.interpolate import interp1d

z = np.asarray([200, 300, 400, 500, 600])
out = []
for i in range(ni):
    f = interp1d(x[i, :], y[i, :], kind='linear')
    out.append(f(z))
out = np.asarray(out)
However, I think this method is inefficient and slow because of the loop when the array size is large. What is the fastest way to interpolate a multi-dimensional array like this? Is there any way to perform linear and cubic interpolation without a loop? Thanks.
The method you propose does have a python loop, so for large values of ni it is going to get slow. That said, unless you are going to have large ni you shouldn't worry much.
I have created sample input data with the following code:
def sample_data(n_i, n_j, z_shape):
    x = np.random.rand(n_i, n_j) * 1000
    x.sort()
    x[:, 0] = 0
    x[:, -1] = 1000
    y = np.random.rand(n_i, n_j)
    z = np.random.rand(*z_shape) * 1000
    return x, y, z
And I have tested it with these two versions of linear interpolation:
def interp_1(x, y, z):
    rows, cols = x.shape
    out = np.empty((rows,) + z.shape, dtype=y.dtype)
    for j in xrange(rows):
        out[j] = interp1d(x[j], y[j], kind='linear', copy=False)(z)
    return out

def interp_2(x, y, z):
    rows, cols = x.shape
    row_idx = np.arange(rows).reshape((rows,) + (1,) * z.ndim)
    col_idx = np.argmax(x.reshape(x.shape + (1,) * z.ndim) > z, axis=1) - 1
    ret = y[row_idx, col_idx + 1] - y[row_idx, col_idx]
    ret /= x[row_idx, col_idx + 1] - x[row_idx, col_idx]
    ret *= z - x[row_idx, col_idx]
    ret += y[row_idx, col_idx]
    return ret
interp_1 is an optimized version of your code, following Dave's answer. interp_2 is a vectorized implementation of linear interpolation that avoids any python loop whatsoever. Coding something like this requires a sound understanding of broadcasting and indexing in numpy, and some things are going to be less optimized than what interp1d does. A prime example is finding the bin in which to interpolate a value: interp1d will surely break out of its loop early once it finds the bin, while the function above compares the value against all bins.
So the result is going to be very dependent on what n_i and n_j are, and even on how long your array z of values to interpolate is. If n_j is small and n_i is large, you should expect an advantage from interp_2, and from interp_1 if it is the other way around. A smaller z should be an advantage to interp_2, a longer one to interp_1.
I have actually timed both approaches with a variety of n_i and n_j, for z of shape (5,) and (50,); here are the graphs:
So it seems that for z of shape (5,) you should go with interp_2 whenever n_j < 1000, and with interp_1 elsewhere. Not surprisingly, the threshold is different for z of shape (50,), now being around n_j < 100. It is tempting to conclude that you should stick with your code if n_j * len(z) > 5000 and change it to something like interp_2 above if not, but there is a great deal of extrapolation in that statement! If you want to experiment further yourself, here's the code I used to produce the graphs.
n_s = np.logspace(1, 3.3, 25)
int_1 = np.empty((len(n_s),) * 2)
int_2 = np.empty((len(n_s),) * 2)
z_shape = (5,)
for i, n_i in enumerate(n_s):
    print int(n_i)
    for j, n_j in enumerate(n_s):
        x, y, z = sample_data(int(n_i), int(n_j), z_shape)
        int_1[i, j] = min(timeit.repeat('interp_1(x, y, z)',
                                        'from __main__ import interp_1, x, y, z',
                                        repeat=10, number=1))
        int_2[i, j] = min(timeit.repeat('interp_2(x, y, z)',
                                        'from __main__ import interp_2, x, y, z',
                                        repeat=10, number=1))
cs = plt.contour(n_s, n_s, np.transpose(int_1 - int_2))
plt.clabel(cs, inline=1, fontsize=10)
plt.xlabel('n_i')
plt.ylabel('n_j')
plt.title('timeit(interp_2) - timeit(interp_1), z.shape=' + str(z_shape))
plt.show()
One optimization is to allocate the result array once like so:
import numpy as np
from scipy.interpolate import interp1d

z = np.asarray([200, 300, 400, 500, 600])
out = np.zeros([ni, len(z)], dtype=np.float32)
for i in range(ni):
    f = interp1d(x[i, :], y[i, :], kind='linear')
    out[i, :] = f(z)
This will save you some of the memory copying that your implementation does in the calls to out.append(...).
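For the linear case specifically, another option (my suggestion, not benchmarked here) is np.interp, which avoids constructing an interp1d object on every iteration; a minimal sketch assuming the same x, y, z and ni as above:

out = np.empty((ni, len(z)))
for i in range(ni):
    # np.interp(x_new, xp, fp) does plain 1-D linear interpolation in C;
    # it expects x[i, :] to be increasing
    out[i, :] = np.interp(z, x[i, :], y[i, :])

One behavioural difference to be aware of: np.interp clamps values of z outside the range of x[i, :] to the endpoint values, whereas interp1d raises an error for out-of-bounds input by default.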
