Imagine I have some function, for example
def f(x, y):
    return x * y
And I want to fill a matrix with its values. The easiest way to do this is, for example:
N = 10
X = np.arange(N)
Y = np.arange(N)
matrix = np.zeros((N, N))
for i, x in enumerate(X):
    for j, y in enumerate(Y):
        matrix[i, j] = f(x, y)
How can I do this in a Pythonic way? For example, using np.vectorize?
Using something like np.fromfunction or np.vectorize will give you little, if any, advantage over a normal for loop. In numpy, you can take advantage of the fact that vectorized operations use loops implemented in C. The problem is that there is no general solution to vectorize your function. For the example you give, it's possible though:
x = np.arange(N)
y = np.arange(N)
matrix = x[:, None] * y
For more complex operations that cannot be reduced to NumPy function calls, you may want to consider using Cython or Numba.
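As a minimal sketch of the broadcasting idea above: when f is built only from arithmetic and NumPy ufuncs, it can be called directly on a column vector and a row vector. The helper name fill_matrix is hypothetical and not part of the original answer.

```python
import numpy as np

def f(x, y):
    return x * y

def fill_matrix(func, xs, ys):
    # Hypothetical helper: assumes func works elementwise on arrays.
    # xs[:, None] has shape (N, 1) and ys[None, :] has shape (1, M);
    # broadcasting expands both to (N, M).
    return func(xs[:, None], ys[None, :])

N = 10
X = np.arange(N)
Y = np.arange(N)
matrix = fill_matrix(f, X, Y)
```

This only works if func never branches on scalar values (for example with a plain if statement); such functions would need np.where or a compiled loop instead.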
for x in range(10):
    for y in range(10):
        for z in range(10):
            if (1111*x + 1111*y + 1111*z) == (10000*y + 1110*x + z):
                print(z)
Is there a way to shorten this code, specifically the first 3 lines where I've used three similar looking for loops? I'm quite new to python so please explain any modules used, if possible.
Well, you're essentially evaluating a function in a 3d coordinate system, with coordinates given by x, y, and z. So let's look at NumPy, which implements arrays in Python. If you're familiar with MATLAB or IDL, these arrays have similar functionality.
import numpy
x = numpy.arange(10)  # Like range, but creates an array instead of a range object
y = numpy.arange(10)
z = numpy.arange(10)
# Now build a 3d array with every point
# defined by the coordinate arrays
# (indexing='ij' keeps the axis order x, y, z, matching the nested loops)
xg, yg, zg = numpy.meshgrid(x, y, z, indexing='ij')
# Evaluate your functions
# and store the Boolean result in an array.
mask = (1111*xg + 1111*yg + 1111*zg) == (10000*yg + 1110*xg + zg)
# Print out the z values where the mask is True
print(zg[mask])
Is this more readable? Debatable. Is it shorter? No. But it does leverage array operations which may be faster in certain circumstances.
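If the full (x, y, z) triples are wanted rather than just z, a variant of the same idea is sketched below. It uses broadcasting on reshaped arrays instead of meshgrid, plus np.argwhere; this is one possible approach, not the answerer's code. Because the coordinates are 0..9, the indices returned by argwhere are also the coordinate values.

```python
import numpy as np

digits = np.arange(10)
# Open grids with shapes (10, 1, 1), (1, 10, 1) and (1, 1, 10);
# broadcasting builds the full 10x10x10 comparison without meshgrid.
x = digits[:, None, None]
y = digits[None, :, None]
z = digits[None, None, :]
mask = (1111*x + 1111*y + 1111*z) == (10000*y + 1110*x + z)

# One row per matching (x, y, z) index triple
solutions = np.argwhere(mask)
print(solutions)
```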
Currently I am trying to compute least-squares fits for two NumPy arrays (X, Y), each containing n arrays with the same number of values. The output I want is two NumPy arrays containing the slopes and intercepts, respectively. Right now I have the following inefficient code:
M = []
C = []
for i in range(len(X)):
    x = X[i]
    y = Y[i]
    A = np.vstack([x, np.ones(len(x))]).T
    m, c = np.linalg.lstsq(A, y, rcond=None)[0]
    M.append(m)
    C.append(c)
return np.array(M), np.array(C)
Since this code relies on a couple conversions and a for loop I think there has to be a more efficient way to solve this problem, but I can't think of one that doesn't rely on the for loop and casts. I need to keep the order of M and C maintained as well.
Actually, it should not be that bad; NumPy is highly optimized for this kind of stuff (stacking matrices, casting...).
You cannot avoid the for-loop since np.linalg.lstsq does not support so-called "batch-processing", but it is possible to avoid memory allocation. I don't know if it will really improve anything:
M, C = [], []
for i, (x, y) in enumerate(zip(X, Y)):
    if not i:
        # Allocate the design matrix once; the column of ones never changes
        A = np.stack((x, np.ones(len(x))), axis=-1)
    else:
        # Overwrite only the x column in place
        A[:, 0] = x
    m, c = np.linalg.lstsq(A, y, rcond=None)[0]
    M.append(m)
    C.append(c)
return np.array(M), np.array(C)
After thinking twice about it, there is a way to do it by "emulating" batch processing, by reformulating your problem as a block diagonal sparse problem:
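import numpy as np
import scipy.linalg
# One (len(x), 2) block [x, 1] per fit, placed on the diagonal
A = scipy.linalg.block_diag(*[
    np.stack((x, np.ones(len(x))), axis=-1) for x in X])
y = np.concatenate(Y)
sol = np.linalg.lstsq(A, y, rcond=None)[0]
M, C = sol[0::2], sol[1::2]
The solution vector interleaves the parameters as [m0, c0, m1, c1, ...], so the slopes are the even-indexed entries and the intercepts are the odd-indexed ones, in the same order as X and Y.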
I haven't tried it myself, but I think it could lead to a speedup thanks to efficient matrix algebra. You could probably improve efficiency further by using the SciPy sparse lsqr solver (scipy.sparse.linalg.lsqr) in this case. Good luck with that!
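For the specific case of a straight-line fit, a different technique avoids lstsq entirely: the closed-form simple linear regression formulas (slope = covariance / variance, intercept = mean_y - slope * mean_x), applied row-wise. The sketch below assumes X and Y can be stacked into 2-D arrays of shape (n_fits, n_points); the function name fit_lines and the synthetic data are hypothetical.

```python
import numpy as np

def fit_lines(X, Y):
    # Hypothetical helper: X and Y have shape (n_fits, n_points).
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    x_mean = X.mean(axis=1, keepdims=True)
    y_mean = Y.mean(axis=1, keepdims=True)
    dx = X - x_mean
    # slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)**2), per row
    M = (dx * (Y - y_mean)).sum(axis=1) / (dx ** 2).sum(axis=1)
    # intercept = mean_y - slope * mean_x, per row
    C = y_mean[:, 0] - M * x_mean[:, 0]
    return M, C

# Synthetic data purely for illustration
rng = np.random.default_rng(0)
X = rng.random((5, 20))
Y = 3.0 * X + 1.5 + 0.01 * rng.standard_normal((5, 20))
M, C = fit_lines(X, Y)
```

The order of M and C follows the row order of X and Y, so the ordering requirement from the question is kept.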
I'm trying to implement the n-mode tensor-matrix product (as defined by Kolda and Bader: https://www.sandia.gov/~tgkolda/pubs/pubfiles/SAND2007-6702.pdf) efficiently in Python using Numpy. The operation effectively gets down to (for matrix U, tensor X and axis/mode k):
Extract all vectors along axis k from X by collapsing all other axes.
Multiply these vectors on the left by U using standard matrix multiplication.
Insert the vectors again into the output tensor using the same shape, apart from X.shape[k], which is now equal to U.shape[0] (initially, X.shape[k] must be equal to U.shape[1], as a result of the matrix multiplication).
I've been using an explicit implementation for a while which performs all these steps separately:
Transpose the tensor to bring axis k to the front (in my full code I added an exception in case k == X.ndim - 1, in which case it's faster to leave it there and transpose all future operations, or at least in my application, but that's not relevant here).
Reshape the tensor to collapse all other axes.
Calculate the matrix multiplication.
Reshape the tensor to reconstruct all other axes.
Transpose the tensor back into the original order.
I would think this implementation creates a lot of unnecessary (big) arrays, so once I discovered np.einsum I thought this would speed things up considerably. However using the code below I got worse results:
import numpy as np
from time import time

def mode_k_product(U, X, mode):
    # Swap axis `mode` with axis 0 (the swap is its own inverse)
    transposition_order = list(range(X.ndim))
    transposition_order[mode] = 0
    transposition_order[0] = mode
    Y = np.transpose(X, transposition_order)
    transposed_ranks = list(Y.shape)
    # Collapse all other axes, multiply, then restore them
    Y = np.reshape(Y, (Y.shape[0], -1))
    Y = U @ Y
    transposed_ranks[0] = Y.shape[0]
    Y = np.reshape(Y, transposed_ranks)
    Y = np.transpose(Y, transposition_order)
    return Y

def einsum_product(U, X, mode):
    # Input axes: label `mode` is contracted; output axes: replaced by U's row label
    axes1 = list(range(X.ndim))
    axes1[mode] = X.ndim + 1
    axes2 = list(range(X.ndim))
    axes2[mode] = X.ndim
    return np.einsum(U, [X.ndim, X.ndim + 1], X, axes1, axes2, optimize=True)

def test_correctness():
    A = np.random.rand(3, 4, 5)
    for i in range(3):
        B = np.random.rand(6, A.shape[i])
        X = mode_k_product(B, A, i)
        Y = einsum_product(B, A, i)
        print(np.allclose(X, Y))

def test_time(method, amount):
    U = np.random.rand(256, 512)
    X = np.random.rand(512, 512, 256)
    start = time()
    for i in range(amount):
        method(U, X, 1)
    return (time() - start)/amount

def test_times():
    print("Explicit:", test_time(mode_k_product, 10))
    print("Einsum:", test_time(einsum_product, 10))

test_correctness()
test_times()
Timings for me:
Explicit: 3.9450525522232054
Einsum: 15.873924326896667
Is this normal or am I doing something wrong? I know there are circumstances where storing intermediate results can decrease complexity (e.g. chained matrix multiplication), however in this case I can't think of any calculations that are being repeated. Is matrix multiplication so optimized that it removes the benefits of not transposing (which technically has a lower complexity)?
I'm more familiar with the subscripts style of using einsum, so worked out these equivalences:
In [194]: np.allclose(np.einsum('ij,jkl->ikl',B0,A), einsum_product(B0,A,0))
Out[194]: True
In [195]: np.allclose(np.einsum('ij,kjl->kil',B1,A), einsum_product(B1,A,1))
Out[195]: True
In [196]: np.allclose(np.einsum('ij,klj->kli',B2,A), einsum_product(B2,A,2))
Out[196]: True
With a mode parameter, your approach in einsum_product may be best. But the equivalences help me visualize the calculation better, and may help others.
Timings should basically be the same. There's an extra setup time in einsum_product that should disappear in larger dimensions.
After updating Numpy, Einsum is only slightly slower than the explicit method, with or without multi-threading (see comments to my question).
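For comparison, the same mode-k product can also be sketched with np.tensordot and np.moveaxis, which contract the chosen axis and then put the new axis back in place. This is an alternative formulation, not one benchmarked in this thread, and the name tensordot_product is hypothetical.

```python
import numpy as np

def tensordot_product(U, X, mode):
    # Hypothetical name. Contract axis 1 of U with axis `mode` of X;
    # the new axis (length U.shape[0]) comes out first.
    Y = np.tensordot(U, X, axes=(1, mode))
    # Move that axis back to position `mode`.
    return np.moveaxis(Y, 0, mode)

A = np.random.rand(3, 4, 5)
B = np.random.rand(6, 4)
result = tensordot_product(B, A, 1)
```

Whether this is faster than the explicit transpose/reshape version would have to be measured on the actual array sizes.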
Is there any way to vectorize or otherwise speed this up? I'm already jitting it using numba, but it is still a major bottleneck. Using numba to jit my functions on 1-d numpy arrays leads to code that is orders of magnitude faster, but there is essentially a negligible improvement when using numba on the 2-d arrays below. decomposition is a numpy matrix representing the cholesky decomposition of the correlation matrix, and x and n are constants, and nrand is numpy.random
@jit
def generate_random_correlated_walks(decomposition, x, n):
    uncorrelated_walks = np.empty((2*x, n))
    for i in range(x):
        # Generate the random uncorrelated walks
        wv = nrand.normal(loc=0, scale=1, size=n)
        ws = nrand.normal(loc=0, scale=1, size=n)
        uncorrelated_walks[2*i] = wv
        uncorrelated_walks[(2*i) + 1] = ws
    # Create a matrix out of these walks
    uncorrelated_walks = np.matrix(uncorrelated_walks)
    m, n = uncorrelated_walks.shape
    correlated_walks = np.empty((m, n))
    # Go through column and correlate the walk values
    for i in range(n):
        correlated_timestep = np.transpose(uncorrelated_walks[:, i]) * decomposition
        correlated_walks[:, i] = correlated_timestep
    return correlated_walks
EDIT: I have made the suggested changes, and now my code is as below, but unfortunately still is a major bottleneck. Any ideas?
@jit
def generate_random_correlated_walks(self, decomposition, x, n):
    rows = 2*x
    uncorrelated_walks = np.random.normal(loc=0, scale=1, size=(rows, n))
    correlated_walks = np.empty((rows, n))
    for i in range(n):
        correlated_timestep = np.dot(np.transpose(uncorrelated_walks[:, i]), decomposition)
        correlated_walks[:, i] = correlated_timestep
    return correlated_walks
First thing you can improve is to remove your for loop. There is no advantage to calling np.random.normal a bunch of times with the same input parameters if you believe that it really generates random numbers.
Instead of using np.matrix, use np.array. This will make your life easier when you consider that the previous item can be used to shorten the entire first portion of your function into one step.
You can of course completely remove the final loop with a simple matrix multiplication: uncorrelated_walks.T @ decomposition will give you the transpose of your current correlated_walks. You can avoid one of the transposes by changing the order of the arguments.
You end up with something like:
def generate_random_correlated_walks(decomposition, x, n):
    uncorrelated_walks = nrand.normal(loc=0, scale=1, size=(2*x, n))
    correlated_walks = np.dot(decomposition.T, uncorrelated_walks)
    return correlated_walks
Not sure how much this will help you, but removing the Python-level loops should be some kind of boost since it will reduce the overhead of multiple numpy calls.
You could sacrifice legibility to turn the whole thing into a one-liner:
def generate_random_correlated_walks(decomposition, x, n):
    return np.dot(decomposition.T, nrand.normal(loc=0, scale=1, size=(2*x, n)))
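A quick way to convince yourself that the matrix product reproduces the column-by-column loop is sketched below. The small sizes and the randomly built symmetric positive-definite matrix are stand-ins chosen for illustration, not values from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
rows, n = 4, 6
walks = rng.standard_normal((rows, n))

# Stand-in for the Cholesky factor of a correlation matrix (illustrative only)
base = rng.standard_normal((rows, rows))
decomposition = np.linalg.cholesky(base @ base.T + rows * np.eye(rows))

# Column-by-column version, as in the question
looped = np.empty((rows, n))
for i in range(n):
    looped[:, i] = np.dot(walks[:, i], decomposition)

# Single matrix product, as in the answer
vectorized = decomposition.T @ walks
print(np.allclose(looped, vectorized))
```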
What's the easiest way to get the DFT matrix for 2-d DFT in python? I could not find such function in numpy.fft. Thanks!
The easiest and most likely the fastest method would be using fft from SciPy.
import numpy as np
from scipy.fft import fft  # in current SciPy, scipy.fft is a module, so import the function
def dftmtx(N):
    return fft(np.eye(N))
If you know an even faster way (even if it is more complicated), I'd appreciate your input.
Just to make it more relevant to the main question - you can also do it with numpy:
import numpy as np
dftmtx = np.fft.fft(np.eye(N))
When I benchmarked both, my impression was that the SciPy version was marginally faster, but I did not do it thoroughly and it was some time ago, so don't take my word for it.
Here's a pretty good source on FFT implementations in Python:
http://nbviewer.ipython.org/url/jakevdp.github.io/downloads/notebooks/UnderstandingTheFFT.ipynb
It's written mostly from a speed perspective, but in this case we can see that speed sometimes comes with simplicity too.
I don't think this is built in. However, direct calculation is straightforward:
import numpy as np
def DFT_matrix(N):
    i, j = np.meshgrid(np.arange(N), np.arange(N))
    omega = np.exp(-2 * np.pi * 1J / N)
    # Dividing by sqrt(N) makes the matrix unitary
    W = np.power(omega, i * j) / np.sqrt(N)
    return W
EDIT: For a 2D DFT, you can use the following:
x = np.zeros((N, N)) # x is any input data with those dimensions
W = DFT_matrix(N)
dft_of_x = W.dot(x).dot(W)
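As a sanity check of the W.dot(x).dot(W) form, the sketch below compares it against np.fft.fft2 with norm="ortho", which applies the same 1/sqrt(N) scaling per axis. The helper name unitary_dft_matrix and the test size are assumptions made for illustration; it builds the same matrix as above using np.outer.

```python
import numpy as np

def unitary_dft_matrix(N):
    # Hypothetical helper: W[j, k] = exp(-2*pi*i*j*k/N) / sqrt(N)
    k = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(k, k) / N) / np.sqrt(N)

N = 8
rng = np.random.default_rng(0)
x = rng.random((N, N))
W = unitary_dft_matrix(N)

# W is symmetric, so W @ x @ W equals W @ x @ W.T:
# the left factor transforms the columns, the right factor the rows.
dft_of_x = W @ x @ W
matches = np.allclose(dft_of_x, np.fft.fft2(x, norm="ortho"))
```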
As of scipy 0.14 there is a built-in scipy.linalg.dft:
Example with 16 point DFT matrix:
>>> import scipy.linalg
>>> import numpy as np
>>> m = scipy.linalg.dft(16)
Validate unitary property, note matrix is unscaled thus 16*np.eye(16):
>>> np.allclose(np.abs(np.dot( m.conj().T, m )), 16*np.eye(16))
True
For the 2D DFT matrix, it's just a matter of the tensor product, or more specifically the Kronecker product in this case, as we are dealing with matrix algebra.
>>> m2 = np.kron(m, m) # 256x256 matrix, flattened from (16,16,16,16) tensor
Now we can give it a tiled visualization, it's done by rearranging each row into a square block
>>> import matplotlib.pyplot as plt
>>> m2tiled = m2.reshape((16,)*4).transpose(0,2,1,3).reshape((256,256))
>>> plt.subplot(121)
>>> plt.imshow(np.real(m2tiled), cmap='gray', interpolation='nearest')
>>> plt.subplot(122)
>>> plt.imshow(np.imag(m2tiled), cmap='gray', interpolation='nearest')
>>> plt.show()
Result (real and imag part separately):
As you can see they are 2D DFT basis functions
@Alex| is basically correct; here is the version I used for the 2-d DFT:
def DFT_matrix_2d(N):
    i, j = np.meshgrid(np.arange(N), np.arange(N))
    A = np.multiply.outer(i.flatten(), i.flatten())
    B = np.multiply.outer(j.flatten(), j.flatten())
    omega = np.exp(-2*np.pi*1J/N)
    W = np.power(omega, A+B)/N
    return W
Lambda functions work too:
dftmtx = lambda N: np.fft.fft(np.eye(N))
You can call it by using dftmtx(N). Example:
In [62]: dftmtx(2)
Out[62]:
array([[ 1.+0.j, 1.+0.j],
[ 1.+0.j, -1.+0.j]])
If you wish to compute the 2D DFT as a single matrix operation, it is necessary to unravel the matrix X on which you wish to compute the DFT into a vector, as each output of the DFT has a sum over every index in the input, and a single square matrix multiplication does not have this ability. Taking care to be sure we are handling the indices correctly, I find the following works:
M = 16
N = 16
X = np.random.random((M,N)) + 1j*np.random.random((M,N))
Y = np.fft.fft2(X)
# np.complex has been removed from NumPy; the builtin complex is equivalent here
W = np.zeros((M*N,M*N),dtype=complex)
# hold[p] is the (row, column) pair at position p of the row-major ravel
hold = []
for m in range(M):
    for n in range(N):
        hold.append((m,n))
for j in range(M*N):
    for i in range(M*N):
        k,l = hold[j]
        m,n = hold[i]
        W[j,i] = np.exp(-2*np.pi*1j*(m*k/M + n*l/N))
np.allclose(np.dot(W,X.ravel()),Y.ravel())
True
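If you wish to change the normalization to orthonormal, multiply by 1/sqrt(MN) (that is, divide by sqrt(MN)). If you wish to have the inverse transformation, change the sign in the exponent; with the unscaled forward matrix above, the inverse also needs a factor of 1/(MN).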
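The double loop above can also be replaced by a Kronecker product of two 1-D DFT matrices, since a row-major ravel places element (m, n) at position m*N + n. The sketch below is one way to do that, with deliberately different M and N to exercise the index handling; the sizes are arbitrary.

```python
import numpy as np

M, N = 4, 6
rng = np.random.default_rng(0)
X = rng.random((M, N)) + 1j * rng.random((M, N))

# Unscaled 1-D DFT matrices, obtained by transforming the identity
W_M = np.fft.fft(np.eye(M))
W_N = np.fft.fft(np.eye(N))

# kron(W_M, W_N)[k*N + l, m*N + n] == W_M[k, m] * W_N[l, n],
# which matches the row-major ordering used by ravel().
W = np.kron(W_M, W_N)
agrees = np.allclose(W @ X.ravel(), np.fft.fft2(X).ravel())
```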
This might be a little late, but there is a better alternative for creating the DFT matrix that performs faster, using NumPy's vander.
Also, this implementation does not use loops (explicitly).
def dft_matrix(signal):
    N = signal.shape[0]  # num of samples
    w = np.exp((-2 * np.pi * 1j) / N)  # remove the '-' for inverse fourier
    r = np.arange(N)
    w_matrix = np.vander(w ** r, increasing=True)  # faster than meshgrid
    return w_matrix
If I'm not mistaken, the main improvement is that this method generates each power from the (already calculated) previous elements.
You can read about vander in the documentation:
numpy.vander
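A small check that the Vandermonde construction gives the same (unscaled) matrix as transforming the identity with np.fft.fft; the size N = 8 is arbitrary and chosen only for illustration.

```python
import numpy as np

N = 8
w = np.exp(-2j * np.pi / N)

# Row i of the Vandermonde matrix holds (w**i)**0, (w**i)**1, ..., i.e. w**(i*j)
via_vander = np.vander(w ** np.arange(N), increasing=True)
via_fft = np.fft.fft(np.eye(N))
same = np.allclose(via_vander, via_fft)
```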