I am trying to create a symmetric array whose (i, j) element is p ** |i - j|.
I have written the following code to produce this form with the parameter p = 0.5 and dimension 4-by-4.
import numpy as np
a = np.eye(4)
for i in range(4):
    for j in range(4):
        a[i, j] = (0.5) ** (np.abs(i - j))
This does what I need, but for large dimensions (in the thousands) it incurs a lot of overhead. Is there a lower-complexity method to get this matrix? Thanks.
We can leverage broadcasting after creating a ranged array to represent the iterator variable, and then perform an outer subtraction to simulate the i-j part -
n = 4
p = 0.5
I = np.arange(n)
out = p ** (np.abs(I[:,None]-I))
Optimization #1
We can use a lookup-based approach with indexing, so that we save on the expensive power computations, like so -
out = (p**np.arange(n))[(np.abs(I[:,None]-I))]
Optimization #2
We can optimize further by using multiple cores with numexpr -
import numexpr as ne
out = ne.evaluate('p**abs(I2D-I)',{'I2D':I[:,None],'I':I})
Related
Is it possible to efficiently run some calculation on all possible pairs of the elements of a vector? I.e. I want to fill the lower triangular elements of a matrix (possibly flattened).
I.e. I want to:
calculate do_my_calculation(input_vector[i], input_vector[j])
for all i, j in [1, length(input_vector)] and j < i
save all the results
The shape of the results is not terribly important. If I can choose, however, I would prefer a vector corresponding to the unrolled lower-triangular (i, j) matrix.
To illustrate what I would like to do in pseudo-python:
input_vector = np.arange(100)
result_vector = []
for i in range(1, len(input_vector)):
    for j in range(0, i):
        result_vector.append(do_my_calculation(input_vector[i], input_vector[j]))
Note: For this question, the types of input_vector and result_vector in the above code are not pertinent. Equally, I am of course happy to preallocate result_vector if required. I am using a list for the sake of conciseness of the sample code.
Edit 1: concrete example as requested by @ddejohn
Note: The question is not whether I can get this to run in JAX, but whether I can get it to run efficiently, i.e. vectorized.
# Set up the problem
import numpy as np
dim = 15
input_vector_x = np.random.rand(dim)
input_vector_y = np.random.rand(dim)
output_vector = np.empty(np.tril_indices(dim, k=-1)[0].size)
assert input_vector_x.size == input_vector_y.size
# alternative implementation 1
counter = 0
for i in range(1, input_vector_x.size):
    for j in range(0, i):
        output_vector[counter] = (input_vector_y[j] - input_vector_y[i]) / (input_vector_x[j] - input_vector_x[i])
        counter += 1
# alternative implementation 2
indices = np.tril_indices(dim, k=-1)
i = indices[0]
j = indices[1]
output_vector = (input_vector_y[j] - input_vector_y[i]) / (input_vector_x[j] - input_vector_x[i])
There are a few ways to approach this. If you want to compute the full matrix of pairwise results, you could use typical numpy-style broadcasting, assuming your function supports it. Similarly, you could use JAX's Automatic Vectorization (vmap) functionality whether or not your function is compatible with broadcasting.
If you really wish to only compute each value once, you can do this using the lower or upper triangular indices. Note that although this performs fewer operations, you may find that in practice it's faster, particularly on accelerators like GPU and TPU, to compute the full result. The reason for this is that multi-dimensional indexing (the gather operation) is generally relatively expensive on this kind of hardware, so the overhead of doubling the number of function calls may be preferable.
Here's a demonstration of these three approaches:
import jax
import jax.numpy as jnp
key = jax.random.PRNGKey(5748395)
dim = 3
x = jax.random.uniform(key, (dim,))
def f(x1, x2):
    return (x1 * x2) / (x1 + x2)
# Option 1: full result, broadcasted operations
print(f(x[:, None], x[None, :]))
# [[0.34950745 0.00658672 0.28704265]
# [0.00658672 0.00332469 0.00655982]
# [0.28704265 0.00655982 0.24352014]]
# Option 2: full result, via vmap
f_mapped = jax.vmap(jax.vmap(f, (None, 0)), (0, None))
print(f_mapped(x, x))
# [[0.34950745 0.00658672 0.28704265]
# [0.00658672 0.00332469 0.00655982]
# [0.28704265 0.00655982 0.24352014]]
# Option 3: explicitly computing at lower-triangular indices
i, j = jnp.tril_indices(dim)
out_tril = f(x[i], x[j])
print(out_tril)
# [0.34950745 0.00658672 0.00332469 0.28704265 0.00655982 0.24352014]
print(jnp.zeros((dim, dim)).at[i, j].set(out_tril))
# [[0.34950745 0. 0. ]
# [0.00658672 0.00332469 0. ]
# [0.28704265 0.00655982 0.24352014]]
I have an equation of the form:
total = a[0] * b**0 + a[1] * b**1 + ... + a[n-1] * b**(n-1)
where a and b are 1D arrays of sizes n and m. I want to avoid forming anything of size n*m in an intermediate step, because n and m are both very large and the memory needed would be too expensive.
One solution is a simple python loop:
total = 0
for j in range(n):
    total += a[j] * b**j
I'm looking for a native numpy solution without a Python loop. But to solve it without a Python loop, I can't find a numpy function that will work without forming the n-by-m array [b**0, b**1, b**2, b**3, ...] as an input.
Edit:
As hpaulj pointed out, a polynomial solution might work. I found this:
from numpy.polynomial import polynomial
total = polynomial.polyval(b, a)
This gives the same values in a quick test, and according to the notes on polyval it uses Horner's method and appears to be optimal. My actual problem is a sum over two dimensions, but I think this solution will generalize.
Edit 2:
polyval seems to cause a RuntimeWarning (overflow encountered in multiply, in the line c0 = c[-i] + c0*x) at smaller array sizes than those at which even np.outer causes a MemoryError.
at smaller array sizes than even np.outer causes a MemoryError. When it does work it is much faster than the python loop.
Edit 3:
Ignore Edits 1 & 2. The following runs faster (on my machine) with the python loop. polyval does at least seem to be memory efficient.
import time
import numpy as np
from numpy.polynomial import polynomial
a_size = 3
b_size = 6
a = np.random.rand(10**a_size)
b = np.exp(2j * np.pi * np.random.rand(10**b_size))
start = time.time()
c = polynomial.polyval(b, a)
middle = time.time()
print(middle - start)
c2, b_current = 0, b**0
for i in range(0, a.size):
    c2 += a[i] * b_current
    b_current *= b
end = time.time()
print(end-middle)
How can I speed up this code in python?
while norm_corr > corr_len:
    correlation = 0.0
    for i in xrange(6):
        for j in xrange(6):
            correlation += (p[i] * T_n[j][i]) * ((F[j] - Fbar) * (F[i] - Fbar))
    Integral += correlation
    T_n = np.mat(T_n) * np.mat(TT)
    T_n = T_n.tolist()
    norm_corr = correlation / variance
Here, TT is a fixed 6x6 matrix, p is a fixed 1x6 matrix, and F is a fixed 1x6 matrix. T_n is the nth power of TT.
This while loop might be repeated for 10^4 times.
The way to do these things quickly is to use Numpy's built-in functions and operators to perform the operations. Numpy is implemented internally with optimized C code and if you set up your computation properly, it will run much faster.
But leveraging Numpy effectively can sometimes be tricky. It's called "vectorizing" your code - you have to figure out how to express it in a way that acts on whole arrays, rather than with explicit loops.
For example, in your loop you have p[i] * T_n[j][i], which IMHO can be done with a vector-by-matrix multiplication: if v is 1x6 and m is 6x6, then v.dot(m) is a 1x6 vector whose entries are the dot products of v with the columns of m. You can use transposes and reshapes to work in different dimensions, if necessary.
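For instance, here is a minimal sketch (with hypothetical stand-in data, since TT, p, F, etc. are not given) of how the inner double loop collapses into a few array products; the while loop and the T_n update would stay as they are:
import numpy as np
# Hypothetical stand-ins for the question's fixed 1x6 / 6x6 data
rng = np.random.default_rng(0)
p = rng.random(6)
F = rng.random(6)
Fbar = F.mean()
T_n = rng.random((6, 6))
d = F - Fbar  # the (F[i] - Fbar) terms as one vector
# Loop version from the question
correlation_loop = 0.0
for i in range(6):
    for j in range(6):
        correlation_loop += (p[i] * T_n[j][i]) * ((F[j] - Fbar) * (F[i] - Fbar))
# Vectorized: sum over i of p[i]*d[i] * (d @ T_n)[i]
correlation_vec = np.sum((p * d) * (d @ T_n))
assert np.isclose(correlation_loop, correlation_vec)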
I have two 3D arrays and want to identify 2D elements in one array, which have one or more similar counterparts in the other array.
This works in Python 3:
import numpy as np
import random
np.random.seed(123)
A = np.round(np.random.rand(25000,2,2),2)
B = np.round(np.random.rand(25000,2,2),2)
a_index = np.zeros(A.shape[0])
for a in range(A.shape[0]):
    for b in range(B.shape[0]):
        if np.allclose(A[a,:,:].reshape(-1, A.shape[1]), B[b,:,:].reshape(-1, B.shape[1]),
                       rtol=1e-04, atol=1e-06):
            a_index[a] = 1
            break
np.nonzero(a_index)[0]
But of course this approach is awfully slow. Please tell me that there is a more efficient way (and what it is). Thanks.
You are trying to do an all-nearest-neighbors type query. This is something that has special O(n log n) algorithms, but I'm not aware of a Python implementation. However, you can use a regular nearest-neighbor query, which is also O(n log n), just a bit slower. For example, scipy.spatial.KDTree or cKDTree.
import numpy as np
import random
np.random.seed(123)
A = np.round(np.random.rand(25000,2,2),2)
B = np.round(np.random.rand(25000,2,2),2)
import scipy.spatial
tree = scipy.spatial.cKDTree(A.reshape(25000, 4))
results = tree.query_ball_point(B.reshape(25000, 4), r=1e-04, p=1)
print([r for r in results if r != []])
# [[14252], [1972], [7108], [13369], [23171]]
query_ball_point() is not an exact equivalent to allclose() but it is close enough, especially if you don't care about the rtol parameter to allclose(). You also get a choice of metric (p=1 for city block, or p=2 for Euclidean).
P.S. Consider using query_ball_tree() for very large data sets. Both A and B have to be indexed in that case.
P.S. I'm not sure what effect the 2D-ness of the elements should have; the sample code I gave treats them as 1D, which is equivalent at least when using the city block metric.
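A minimal sketch of the query_ball_tree() variant mentioned above, assuming the same A and B as before (both arrays get their own tree):
import numpy as np
import scipy.spatial
np.random.seed(123)
A = np.round(np.random.rand(25000, 2, 2), 2)
B = np.round(np.random.rand(25000, 2, 2), 2)
tree_a = scipy.spatial.cKDTree(A.reshape(25000, 4))
tree_b = scipy.spatial.cKDTree(B.reshape(25000, 4))
# matches[i] lists the rows of B within distance r of A[i] (city block metric)
matches = tree_a.query_ball_tree(tree_b, r=1e-04, p=1)
print([i for i, m in enumerate(matches) if m])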
From the docs of np.allclose, we have :
If the following equation is element-wise True, then allclose returns True.
absolute(a - b) <= (atol + rtol * absolute(b))
Using that criterion, we can have a vectorized implementation using broadcasting, customized for the stated problem, like so -
# Setup parameters
rtol,atol = 1e-04, 1e-06
# Use np.allclose criteria to detect true/false across all pairwise elements
mask = np.abs(A[:,None] - B) <= (atol + rtol * np.abs(B))
# Use the problem context to get final output
out = np.nonzero(mask.all(axis=(2,3)).any(1))[0]
I have two matrices A and B, each with a size of NxM, where N is the number of samples and M is the size of histogram bins. Thus, each row represents a histogram for that particular sample.
What I would like to do is to compute the chi-square distance between the two matrices for every pair of samples. Therefore, each row in matrix A will be compared to all rows in the other matrix B, resulting in a final matrix C of size NxN, where C[i,j] corresponds to the chi-square distance between the A[i] and B[j] histograms.
Here is my python code that does the job:
from numpy import zeros
def chi_square(histA, histB):
    eps = 1.e-10
    d = sum((histA - histB)**2 / (histA + histB + eps))
    return 0.5 * d
def matrix_cost(A, B):
    a, _ = A.shape
    b, _ = B.shape
    C = zeros((a, b))
    for i in xrange(a):
        for j in xrange(b):
            C[i, j] = chi_square(A[i], B[j])
    return C
Currently, for a 100x70 matrix, this entire process takes 0.1 seconds.
Is there any way to improve this performance?
I would appreciate any thoughts or recommendations.
Thank you.
Sure! I'm assuming you're using numpy?
If you have the RAM available, you could broadcast the arrays and use numpy's efficient vectorization of the operations on those arrays.
Here's how:
Abroad = A[:,np.newaxis,:] # prepared for broadcasting
C = np.sum((Abroad - B)**2/(Abroad + B), axis=-1)/2.
Timing considerations on my platform show a factor of 10 speed gain compared to your algorithm.
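If you want to reproduce that comparison, a rough timing sketch (with a range-based rewrite of the question's matrix_cost, and hypothetical 100x70 random inputs) could look like this:
import numpy as np
import timeit
A = np.random.rand(100, 70)
B = np.random.rand(100, 70)
def chi_square_loop(A, B):
    # loop version, equivalent to the question's matrix_cost
    C = np.zeros((A.shape[0], B.shape[0]))
    for i in range(A.shape[0]):
        for j in range(B.shape[0]):
            C[i, j] = 0.5 * np.sum((A[i] - B[j])**2 / (A[i] + B[j] + 1e-10))
    return C
def chi_square_broadcast(A, B):
    Abroad = A[:, np.newaxis, :]  # prepared for broadcasting
    return np.sum((Abroad - B)**2 / (Abroad + B), axis=-1) / 2.
print(timeit.timeit(lambda: chi_square_loop(A, B), number=10))
print(timeit.timeit(lambda: chi_square_broadcast(A, B), number=10))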
A slower option (but still faster than your original algorithm) that uses less RAM than the previous option is simply to broadcast the rows of A into 2D arrays:
def new_way(A,B):
    C = np.empty((A.shape[0], B.shape[0]))
    for rowind, row in enumerate(A):
        C[rowind, :] = np.sum((row - B)**2/(row + B), axis=-1)/2.
    return C
This has the advantage that it can be run for arrays with shape (N,M) much larger than (100,70).
You could also look to Theano to push the expensive for-loops to the C level if you don't have the memory available. I get a factor of 2 speed gain compared to the first option (not taking into account the initial compile time) for both the (100,70) arrays as well as (1000,70):
import theano
import theano.tensor as T
X = T.matrix("X")
Y = T.matrix("Y")
results, updates = theano.scan(lambda x_i: ((x_i - Y)**2/(x_i+Y)).sum(axis=1)/2., sequences=X)
chi_square_norm = theano.function(inputs=[X, Y], outputs=[results])
chi_square_norm(A,B) # same result