What's the easiest way to get the DFT matrix for a 2-D DFT in Python? I could not find such a function in numpy.fft. Thanks!
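The easiest, and most likely the fastest, method is to apply fft from SciPy to an identity matrix:
import numpy as np
from scipy.fft import fft  # scipy.fft module (SciPy >= 1.4); the old top-level sp.fft/sp.eye aliases are gone
def dftmtx(N):
    return fft(np.eye(N))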
If you know an even faster way (possibly more complicated), I'd appreciate your input.
Just to make it more relevant to the main question - you can also do it with numpy:
import numpy as np
dftmtx = np.fft.fft(np.eye(N))
When I benchmarked both, I had the impression that the scipy one was marginally faster, but I
did not do it thoroughly and it was some time ago, so don't take my word for it.
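Why this works: the DFT is linear, so transforming each basis vector in np.eye(N) produces the DFT matrix one row/column at a time. A minimal check, assuming N = 8 and a random test vector (both arbitrary choices for illustration):

```python
import numpy as np

N = 8
# np.fft.fft transforms along the last axis, i.e. each row of the identity.
# The DFT matrix is symmetric, so rows and columns coincide.
F = np.fft.fft(np.eye(N))

x = np.random.default_rng(0).standard_normal(N)
# Multiplying by F gives the same result as calling the FFT directly.
same_as_fft = np.allclose(F @ x, np.fft.fft(x))
symmetric = np.allclose(F, F.T)
print(same_as_fft, symmetric)
```

Keep in mind that a dense matrix-vector product costs O(N^2) versus O(N log N) for the FFT itself, so the explicit matrix is mainly useful when you actually need the matrix.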
Here's a pretty good source on FFT implementations in Python:
http://nbviewer.ipython.org/url/jakevdp.github.io/downloads/notebooks/UnderstandingTheFFT.ipynb
It focuses on speed, but in this case we can see that speed sometimes comes with simplicity too.
I don't think this is built in. However, direct calculation is straightforward:
import numpy as np
def DFT_matrix(N):
    i, j = np.meshgrid(np.arange(N), np.arange(N))
    omega = np.exp( - 2 * np.pi * 1J / N )
    W = np.power( omega, i * j ) / np.sqrt(N)
    return W
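With the 1/sqrt(N) factor the matrix is unitary, which matches NumPy's norm="ortho" FFT convention. A quick sketch to confirm both properties; the function is re-declared here with np.pi and np.sqrt so the snippet runs on its own, and N = 6 is arbitrary:

```python
import numpy as np

def DFT_matrix(N):
    i, j = np.meshgrid(np.arange(N), np.arange(N))
    omega = np.exp(-2 * np.pi * 1j / N)
    return np.power(omega, i * j) / np.sqrt(N)

N = 6
W = DFT_matrix(N)
x = np.random.default_rng(1).standard_normal(N)

# Unitary: the conjugate transpose is the inverse
is_unitary = np.allclose(W.conj().T @ W, np.eye(N))
# Same as the orthonormally scaled FFT
matches_fft = np.allclose(W @ x, np.fft.fft(x, norm="ortho"))
print(is_unitary, matches_fft)
```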
EDIT: For a 2D DFT, you can apply the matrix along both axes:
x = np.zeros((N, N)) # x is any input data with those dimensions
W = DFT_matrix(N)
dft_of_x = W.dot(x).dot(W)
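Why W.dot(x).dot(W) is a 2D DFT: W @ x transforms every column, and right-multiplying by W transforms every row (W is symmetric, so W == W.T). A sketch comparing it with np.fft.fft2 under the same orthonormal scaling, assuming a random 5 x 5 input:

```python
import numpy as np

def DFT_matrix(N):
    i, j = np.meshgrid(np.arange(N), np.arange(N))
    omega = np.exp(-2 * np.pi * 1j / N)
    return np.power(omega, i * j) / np.sqrt(N)

N = 5
W = DFT_matrix(N)
x = np.random.default_rng(2).standard_normal((N, N))

# Columns first (W @ x), then rows (... @ W); overall scale is 1/N = 1/sqrt(N*N)
dft_of_x = W @ x @ W
matches_fft2 = np.allclose(dft_of_x, np.fft.fft2(x, norm="ortho"))
print(matches_fft2)
```

For a non-square M x N input, the same idea would use two matrices of different sizes, DFT_matrix(M) @ x @ DFT_matrix(N).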
As of scipy 0.14 there is a built-in scipy.linalg.dft:
Example with a 16-point DFT matrix:
>>> import scipy.linalg
>>> import numpy as np
>>> m = scipy.linalg.dft(16)
Validate the unitary property; note that the matrix is unscaled, hence 16*np.eye(16):
>>> np.allclose(np.abs(np.dot( m.conj().T, m )), 16*np.eye(16))
True
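scipy.linalg.dft also takes a scale argument ('sqrtn' or 'n'), so the unitary version does not have to be rescaled by hand. A short sketch checking that the unscaled matrix reproduces np.fft.fft and that scale='sqrtn' is unitary:

```python
import numpy as np
from scipy.linalg import dft

n = 16
m = dft(n)                           # unscaled, same convention as np.fft.fft
m_unitary = dft(n, scale='sqrtn')    # every entry divided by sqrt(n)

x = np.random.default_rng(3).standard_normal(n)
matches_fft = np.allclose(m @ x, np.fft.fft(x))
is_unitary = np.allclose(m_unitary.conj().T @ m_unitary, np.eye(n))
print(matches_fft, is_unitary)
```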
For the 2D DFT matrix, it's just a matter of a tensor product, specifically a Kronecker product in this case, since we are dealing with matrix algebra.
>>> m2 = np.kron(m, m) # 256x256 matrix, flattened from (16,16,16,16) tensor
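To see that np.kron(m, m) really is the 2D DFT: with row-major flattening, kron(A, B) @ X.ravel() equals (A @ X @ B.T).ravel(), and since m is symmetric this is exactly fft2(X). A sketch, assuming a random 16 x 16 input:

```python
import numpy as np
from scipy.linalg import dft

n = 16
m = dft(n)
m2 = np.kron(m, m)   # (n*n, n*n)

X = np.random.default_rng(4).standard_normal((n, n))
# Acts on the row-major flattened image, like X.ravel()
matches_fft2 = np.allclose(m2 @ X.ravel(), np.fft.fft2(X).ravel())
print(m2.shape, matches_fft2)
```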
Now we can give it a tiled visualization, which is done by rearranging each row into a square block:
>>> import matplotlib.pyplot as plt
>>> m2tiled = m2.reshape((16,)*4).transpose(0,2,1,3).reshape((256,256))
>>> plt.subplot(121)
>>> plt.imshow(np.real(m2tiled), cmap='gray', interpolation='nearest')
>>> plt.subplot(122)
>>> plt.imshow(np.imag(m2tiled), cmap='gray', interpolation='nearest')
>>> plt.show()
Result (real and imaginary parts plotted separately):
As you can see, these are the 2D DFT basis functions.
@Alex is basically correct; here is the version I used for the 2-D DFT:
import numpy as np
def DFT_matrix_2d(N):
    i, j = np.meshgrid(np.arange(N), np.arange(N))
    A=np.multiply.outer(i.flatten(), i.flatten())
    B=np.multiply.outer(j.flatten(), j.flatten())
    omega = np.exp(-2*np.pi*1J/N)
    W = np.power(omega, A+B)/N
    return W
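This matrix acts on the flattened (row-major) image, and its 1/N factor equals 1/sqrt(N*N), i.e. the orthonormal scaling. A sketch checking it against np.fft.fft2, assuming a random 4 x 4 input:

```python
import numpy as np

def DFT_matrix_2d(N):
    i, j = np.meshgrid(np.arange(N), np.arange(N))
    A = np.multiply.outer(i.flatten(), i.flatten())
    B = np.multiply.outer(j.flatten(), j.flatten())
    omega = np.exp(-2 * np.pi * 1j / N)
    return np.power(omega, A + B) / N

N = 4
W = DFT_matrix_2d(N)
x = np.random.default_rng(5).standard_normal((N, N))

# Multiply the flattened image, then fold the result back to N x N
result = (W @ x.ravel()).reshape(N, N)
matches = np.allclose(result, np.fft.fft2(x, norm="ortho"))
print(W.shape, matches)
```

Since omega**N == 1, the exponent A + B could also be reduced modulo N before the power (np.power(omega, (A + B) % N)), which keeps the exponents small for larger N. That variant is only a suggestion, not part of the answer above.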
Lambda functions work too:
dftmtx = lambda N: np.fft.fft(np.eye(N))
You can call it by using dftmtx(N). Example:
In [62]: dftmtx(2)
Out[62]:
array([[ 1.+0.j, 1.+0.j],
[ 1.+0.j, -1.+0.j]])
If you wish to compute the 2D DFT as a single matrix operation, you need to unravel the matrix X into a vector: each output of the 2D DFT is a sum over every element of the input, whereas multiplying X by a single square matrix only mixes entries along one axis. Being careful to handle the indices correctly, I find the following works:
import numpy as np
M = 16
N = 16
X = np.random.random((M,N)) + 1j*np.random.random((M,N))
Y = np.fft.fft2(X)
W = np.zeros((M*N,M*N),dtype=complex)  # np.complex was removed in NumPy 1.24
hold = []  # flat index -> (row, column), row-major like X.ravel()
for m in range(M):
    for n in range(N):
        hold.append((m,n))
for j in range(M*N):
    for i in range(M*N):
        k,l = hold[j]
        m,n = hold[i]
        W[j,i] = np.exp(-2*np.pi*1j*(m*k/M + n*l/N))
np.allclose(np.dot(W,X.ravel()),Y.ravel())
True
If you want the orthonormal normalization, divide W by sqrt(MN); if you want the inverse transformation, change the sign in the exponent and divide by MN.
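The double loop can also be vectorized: np.divmod recovers the same (m, n) pairs as the hold list, and outer products build all the phases at once. A sketch, assuming a non-square 4 x 6 input to show that M and N may differ, which also checks the orthonormal and inverse variants:

```python
import numpy as np

M, N = 4, 6
# Row-major ordering, same as the hold list: flat index = m*N + n
m, n = np.divmod(np.arange(M * N), N)

phase = np.outer(m, m) / M + np.outer(n, n) / N
W = np.exp(-2j * np.pi * phase)

rng = np.random.default_rng(6)
X = rng.random((M, N)) + 1j * rng.random((M, N))
Y = np.fft.fft2(X)
forward_ok = np.allclose(W @ X.ravel(), Y.ravel())

# Orthonormal version: divide by sqrt(M*N); the result is unitary
W_ortho = W / np.sqrt(M * N)
unitary_ok = np.allclose(W_ortho.conj().T @ W_ortho, np.eye(M * N))

# Inverse: flip the sign in the exponent (complex conjugate) and divide by M*N
W_inv = W.conj() / (M * N)
inverse_ok = np.allclose(W_inv @ Y.ravel(), X.ravel())
print(forward_ok, unitary_ok, inverse_ok)
```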
This might be a little late, but there is a better alternative for creating the DFT matrix, which performs faster, using NumPy's vander.
Also, this implementation does not use (explicit) loops:
def dft_matrix(signal):
    N = signal.shape[0] # num of samples
    w = np.exp((-2 * np.pi * 1j) / N) # remove the '-' for inverse fourier
    r = np.arange(N)
    w_matrix = np.vander(w ** r, increasing=True) # faster than meshgrid
    return w_matrix
If I'm not mistaken, the main improvement is that this method generates each power from the (already calculated) previous elements.
You can read about vander in the documentation:
numpy.vander
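Why vander gives the DFT matrix: with increasing=True, column j of np.vander(v) is v**j, so entry [i, j] is (w**i)**j = w**(i*j). A sketch confirming this against the explicit powers and against np.fft.fft, assuming N = 8 (the speed claim above is not benchmarked here):

```python
import numpy as np

N = 8
w = np.exp(-2j * np.pi / N)
r = np.arange(N)
# Entry [i, j] of the Vandermonde matrix is (w**i)**j == w**(i*j)
V = np.vander(w ** r, increasing=True)

explicit = w ** np.outer(r, r)
matches_explicit = np.allclose(V, explicit)
matches_fft = np.allclose(V, np.fft.fft(np.eye(N)))
print(matches_explicit, matches_fft)
```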
Given data with shape = (t,m,n), I need to find a vector variable of shape (n,) that minimizes a convex function of the data and vector. I've used cvxopt (and cvxpy) to perform convex optimizations using 2D input, but it seems like they don't support 3D arrays. Is there a way to implement this convex optimization using these or other similar packages?
Given data with shape (t,m,n) and (t,m) and var with shape (n,), here's a simplification of the type of function I need to minimize:
import numpy as np
def obj_func(var,data1,data2):
    #data1.shape = (t,m,n)
    #data2.shape = (t,m)
    #var.shape = (n,)
    score = np.sum(data1*var,axis=2) #dot product along axis 2
    time_series = np.sum(score*data2,axis=1) #weighted sum along axis 1
    return np.sum(time_series)-np.sum(time_series**2) #some function
This seems like it should be a simple convex optimization, but unfortunately these functions aren't supported on N-dimensional arrays in cvxopt/cvxpy. Is there a way to implement this?
I think if you simply reshape data1 to be 2d temporarily you'll be fine, e.g.
import numpy as np
import cvxpy as cp
t, m, n = 10, 8, 6
data1 = np.ones((t, m, n))
data2 = np.ones((t, m))
x = cp.Variable(n)
score = cp.reshape(data1.reshape(-1, n) @ x, (t, m), order='C')  # @ for matrix-vector; C order matches NumPy's reshape
time_series = cp.sum(cp.multiply(score, data2), axis=1)
expr = cp.sum(time_series) - cp.sum(time_series ** 2)
print(repr(expr))
Outputs:
Expression(CONCAVE, UNKNOWN, ())
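The reshape trick relies on data1.reshape(-1, n) @ var being exactly np.sum(data1*var, axis=2) laid out in row-major (C) order, so the flat result has to be folded back in the same order (cp.reshape accepts an order argument for this). With all-ones data the order makes no difference, which can hide the issue. A NumPy-only sketch, assuming random data with arbitrary shapes:

```python
import numpy as np

t, m, n = 3, 4, 5
rng = np.random.default_rng(7)
data1 = rng.standard_normal((t, m, n))
var = rng.standard_normal(n)

score_3d = np.sum(data1 * var, axis=2)   # (t, m), as in obj_func
score_2d = data1.reshape(-1, n) @ var    # (t*m,), row-major layout

# Folding back in C order recovers score_3d; Fortran order scrambles it
c_ok = np.allclose(score_2d.reshape((t, m), order='C'), score_3d)
f_ok = np.allclose(score_2d.reshape((t, m), order='F'), score_3d)
print(c_ok, f_ok)
```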
Assume I have a set of vectors $ a_1, ..., a_d $ that are orthonormal to each other. Now, I want to find another vector $ a_{d+1} $ that is orthogonal to all the other vectors.
Is there an efficient algorithm to achieve this? I can only think of appending a random vector and then applying Gram-Schmidt.
Is there a python library which already achieves this?
I can't speak to optimality, but here is a working solution. The good thing is that numpy.linalg does all of the heavy lifting, so this may be faster and more robust than doing Gram-Schmidt by hand. It also suggests that the complexity is no worse than that of Gram-Schmidt.
The idea:
1. Treat your input orthogonal vectors as columns of a matrix O.
2. Add another random column to O. Generically, O will remain a full-rank matrix.
3. Choose b = [0, 0, ..., 0, 1] with len(b) = d + 1.
4. Solve the least-squares problem x O = b (O now including the random column). Then x is guaranteed to be non-zero and orthogonal to all the original columns of O.
import numpy as np
from numpy.linalg import lstsq
from scipy.linalg import orth
# random matrix
M = np.random.rand(10, 5)
# get 5 orthogonal vectors in 10 dimensions in a matrix form
O = orth(M)
def find_orth(O):
    rand_vec = np.random.rand(O.shape[0], 1)
    A = np.hstack((O, rand_vec))
    b = np.zeros(O.shape[1] + 1)
    b[-1] = 1
    # minimum-norm solution of A.T @ x = b: x is orthogonal to O's columns and x . rand_vec = 1
    return lstsq(A.T, b, rcond=None)[0]
res = find_orth(O)
if all(np.abs(np.dot(res, col)) < 10e-9 for col in O.T):
    print("Success")
else:
    print("Failure")
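If you want the new vector to be unit-length as well, a different route (not the method above) is scipy.linalg.null_space, which uses an SVD to return an orthonormal basis of everything orthogonal to the columns of O; any of its columns can serve as a_{d+1}. A sketch reusing the 10 x 5 setup:

```python
import numpy as np
from scipy.linalg import orth, null_space

rng = np.random.default_rng(8)
O = orth(rng.random((10, 5)))    # 5 orthonormal columns in R^10

# Orthonormal basis of the orthogonal complement of span(O): shape (10, 5)
complement = null_space(O.T)
a_next = complement[:, 0]

orthogonal = np.allclose(O.T @ a_next, 0)
unit_norm = np.isclose(np.linalg.norm(a_next), 1.0)
print(complement.shape, orthogonal, unit_norm)
```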
I have a very large scipy sparse csr matrix. It is a 100,000x2,000,000 dimensional matrix. Let's call it X. Each row is a sample vector in a 2,000,000 dimensional space.
I need to calculate the cosine distances between each pair of samples very efficiently. I have been using sklearn pairwise_distances function with a subset of vectors in X which gives me a dense matrix D: the square form of the pairwise distances which contains redundant entries. How can I use sklearn pairwise_distances to get the condensed form directly? Please refer to http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html to see what the condensed form is. It is the output of scipy pdist function.
I have memory limitations, so I can't calculate the square form and then get the condensed form. For the same reason, I also cannot use scipy pdist, as it requires a dense matrix X, which again does not fit in memory. I thought about looping through different chunks of X, calculating the condensed form for each chunk and joining them together to get the complete condensed form, but this is relatively cumbersome. Any better ideas?
Any help is much much appreciated. Thanks in advance.
Below is a reproducible example (of course for demonstration purposes X is much smaller):
from scipy.sparse import rand
from scipy.spatial.distance import pdist
from sklearn.metrics.pairwise import pairwise_distances
X = rand(1000, 10000, density=0.01, format='csr')
dist1 = pairwise_distances(X, metric='cosine')
dist2 = pdist(X.toarray(), 'cosine')
As you see dist2 is in the condensed form and is a 499500 dimensional vector. But dist1 is in the symmetric squareform and is a 1000x1000 matrix.
I dug into the code for both versions, and think I understand what both are doing.
Start with a small simple X (dense):
X = np.arange(9.).reshape(3,3)
pdist cosine does:
norms = _row_norms(X)
_distance_wrap.pdist_cosine_wrap(_convert_to_double(X), dm, norms)
where _row_norms is a row-wise dot product computed with einsum:
norms = np.sqrt(np.einsum('ij,ij->i', X,X))
So this is the first place where X has to be an array.
I haven't dug into the cosine_wrap, but it appears to do (probably in cython)
xy = np.dot(X, X.T)
# or xy = np.einsum('ij,kj',X,X)
d = np.zeros((3,3),float) # square receiver
d2 = [] # condensed receiver
for i in range(3):
    for j in range(i+1,3):
        val=1-xy[i,j]/(norms[i]*norms[j])
        d2.append(val)
        d[j,i]=d[i,j]=val
print('array')
print(d)
print('condensed',np.array(d2))
from scipy.spatial import distance
d1=distance.pdist(X,'cosine')
print(' pdist',d1)
producing:
array
[[ 0. 0.11456226 0.1573452 ]
[ 0.11456226 0. 0.00363075]
[ 0.1573452 0.00363075 0. ]]
condensed [ 0.11456226 0.1573452 0.00363075]
pdist [ 0.11456226 0.1573452 0.00363075]
distance.squareform(d1) produces the same thing as my d array.
I can produce the same square array by dividing the xy dot product with the appropriate norm outer product:
dd=1-xy/(norms[:,None]*norms)
dd[range(dd.shape[0]),range(dd.shape[1])]=0 # clean up 0s
Or by normalizing X before taking the dot product. This appears to be what the scikit-learn version does.
Xnorm = X/norms[:,None]
1-np.einsum('ij,kj',Xnorm,Xnorm)
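Both square-form shortcuts can be checked against squareform(pdist(...)) on the same 3 x 3 example. A self-contained sketch; np.fill_diagonal is used here in place of the index-based cleanup, which is equivalent:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

X = np.arange(9.).reshape(3, 3)
norms = np.sqrt(np.einsum('ij,ij->i', X, X))
xy = X @ X.T

# Square form via the outer product of the row norms
dd = 1 - xy / (norms[:, None] * norms)
np.fill_diagonal(dd, 0)          # clean up rounding on the diagonal

# Square form via pre-normalized rows
Xnorm = X / norms[:, None]
dd2 = 1 - Xnorm @ Xnorm.T
np.fill_diagonal(dd2, 0)

reference = squareform(pdist(X, 'cosine'))
print(np.allclose(dd, reference), np.allclose(dd2, reference))
```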
scikit-learn has added some Cython code to do faster sparse calculations (beyond those provided by scipy.sparse, but using the same csr format):
from scipy import sparse
Xc=sparse.csr_matrix(X)
# csr_row_norm - pyx of following
cnorm = Xc.multiply(Xc).sum(axis=1)
cnorm = np.sqrt(cnorm)
X1 = Xc.multiply(1/cnorm) # dense matrix
dd = 1-X1*X1.T
To get a fast condensed form with sparse matrices, I think you need to implement a fast condensed version of X1*X1.T. That means you need to understand how sparse matrix multiplication is implemented in C code. The scikit-learn Cython 'fast sparse' code might also give ideas.
numpy has some tri... functions, which are straightforward Python code. They do not attempt to save time or space by implementing triangular calculations directly. It's easier to iterate over the rectangular layout of an nd array (with shape and strides) than to do the more complex variable-length steps of a triangular array. The condensed form only cuts the space and calculation steps by half.
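The chunked idea from the question can be sketched without ever forming the full square matrix: normalize the sparse rows once, multiply one block of rows against all rows, and copy each row's upper-triangle part into its slot of the condensed vector (row i's entries start at index n*i - i*(i+1)//2). The helper condensed_cosine below is hypothetical, assumes no all-zero rows, and is checked against pdist on a small example:

```python
import numpy as np
import scipy.sparse as sp
from scipy.spatial.distance import pdist

def condensed_cosine(X, chunk=256):
    """Hypothetical helper: condensed cosine distances of a sparse matrix's rows.

    Only one dense (chunk x n_rows) block of similarities exists at a time,
    on top of the condensed output itself. Assumes no all-zero rows.
    """
    X = sp.csr_matrix(X, dtype=float)
    n = X.shape[0]
    norms = np.sqrt(np.asarray(X.multiply(X).sum(axis=1)).ravel())
    Xn = (sp.diags(1.0 / norms) @ X).tocsr()       # rows scaled to unit length
    out = np.empty(n * (n - 1) // 2)
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        block = (Xn[start:stop] @ Xn.T).toarray()   # cosine similarities
        for i in range(start, stop):
            offset = n * i - i * (i + 1) // 2       # condensed index of pair (i, i+1)
            row = block[i - start, i + 1:]
            out[offset:offset + row.size] = 1.0 - np.clip(row, -1.0, 1.0)
    return out

rng = np.random.default_rng(0)
A = rng.random((50, 40))
A[A < 0.7] = 0.0
A[:, 0] += 1.0                                      # guarantees no empty rows
res = condensed_cosine(sp.csr_matrix(A), chunk=7)
print(np.allclose(res, pdist(A, 'cosine')))
```

Note that for 100,000 rows the condensed vector alone has about 5 x 10^9 entries (roughly 40 GB as float64), so it may itself need to live on disk, for example in an np.memmap, rather than in RAM.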
============
Here's the main part of the c function pdist_cosine, which iterates over i and the upper j, calculating dot(x[i],y[j])/(norm[i]*norm[j]).
for (i = 0; i < m; i++) {
    for (j = i + 1; j < m; j++, dm++) {
        u = X + (n * i);
        v = X + (n * j);
        cosine = dot_product(u, v, n) / (norms[i] * norms[j]);
        if (fabs(cosine) > 1.) {
            /* Clip to correct rounding error. */
            cosine = npy_copysign(1, cosine);
        }
        *dm = 1. - cosine;
    }
}
https://github.com/scipy/scipy/blob/master/scipy/spatial/src/distance_impl.h
I have a matrix of counts,
import numpy as np
x = np.array([[ 1,2,3],[1,4,6],[2,3,7]])
And I need the percentages of the total along axis = 1:
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        x[i,j] = x[i,j] / np.sum(x[i,:])
I need this in numpy broadcast form.
Currently, I have:
x_sums = np.sum(x,axis=1)
for j in range(x.shape[1]):
    x[:,j] = x[:,j] / x_sums[:]
This puts most of the complexity in numpy code, but a numpy one-liner would be best.
Also,
def percentages(a):
    return a / np.sum(a)
x_percentages = np.apply_along_axis(percentages,1,x)
But that still involves Python.
np.linalg.norm
is very close in terms of what is going on, but it only has 8 hard-coded norms, which do not include percentage of total.
Then there is np.percentile, which is again close, but it computes percentiles of the sorted values.
x /= x.sum(axis=1, keepdims=True)
Although x should have a floating-point dtype for this to work correctly.
Better may be:
x = np.true_divide(x, x.sum(axis=1, keepdims=True))
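keepdims=True keeps the row sums as a (3, 1) column, which broadcasts across each row. A sketch on the question's array, showing the out-of-place version on the integer input and the in-place version after converting to float:

```python
import numpy as np

x = np.array([[1, 2, 3], [1, 4, 6], [2, 3, 7]])

# Out of place: true division of ints gives a float result
pct = x / x.sum(axis=1, keepdims=True)

# In place works once the array has a float dtype
x_float = x.astype(float)
x_float /= x_float.sum(axis=1, keepdims=True)

print(pct)
print(pct.sum(axis=1))   # each row sums to 1
```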
Could this be what you are after:
print((x.T/np.sum(x, axis=1)).T)
I am trying to convert code that contains the \ operator from Matlab (Octave) to Python. Sample code:
B = [2;4]
b = [4;4]
B \ b
This works and produces 1.2 as an answer. Using this web page
http://mathesaurus.sourceforge.net/matlab-numpy.html
I translated that as:
import numpy as np
import numpy.linalg as lin
B = np.array([[2],[4]])
b = np.array([[4],[4]])
print(lin.solve(B,b))
This gave me an error:
numpy.linalg.linalg.LinAlgError: Array must be square
Why does Matlab's \ work with a non-square matrix B?
Any solutions for this?
From MathWorks documentation for left matrix division:
If A is an m-by-n matrix with m ~= n and B is a column vector with m
components, or a matrix with several such columns, then X = A\B is the
solution in the least squares sense to the under- or overdetermined
system of equations AX = B. In other words, X minimizes norm(A*X - B),
the length of the vector AX - B.
The equivalent in numpy is np.linalg.lstsq:
In [15]: B = np.array([[2],[4]])
In [16]: b = np.array([[4],[4]])
In [18]: x,resid,rank,s = np.linalg.lstsq(B,b)
In [19]: x
Out[19]: array([[ 1.2]])
Matlab will actually do a number of different operations when the \ operator is used, depending on the shape of the matrices involved (see the MathWorks documentation for mldivide for more details). In your example, Matlab is returning a least-squares solution, rather than solving the linear equation directly, as would happen with a square matrix. To get the same behaviour in numpy, do this:
import numpy as np
import numpy.linalg as lin
B = np.array([[2],[4]])
b = np.array([[4],[4]])
print(np.linalg.lstsq(B,b,rcond=None)[0])
which should give you the same solution as Matlab.
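np.linalg.lstsq also returns the squared residual norm, the rank and the singular values, which can help confirm that the system really was overdetermined. A sketch on the question's data; passing rcond=None explicitly avoids the FutureWarning that recent NumPy versions emit when it is omitted:

```python
import numpy as np

B = np.array([[2.], [4.]])
b = np.array([[4.], [4.]])

x, residuals, rank, sv = np.linalg.lstsq(B, b, rcond=None)
print(x)           # least-squares solution, like Matlab's B \ b
print(residuals)   # squared norm of B @ x - b
print(rank, sv)
```

Here x = (2*4 + 4*4) / (2**2 + 4**2) = 24/20 = 1.2, and the residual is (2.4 - 4)**2 + (4.8 - 4)**2 = 3.2.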
You can form the left inverse:
import numpy as np
import numpy.linalg as lin
B = np.array([[2],[4]])
b = np.array([[4],[4]])
B_linv = lin.solve(B.T.dot(B), B.T)
c = B_linv.dot(b)
print('c\n', c)
Result:
c
[[ 1.2]]
Actually, we can simply run the solver once, without forming an inverse, like this:
c = lin.solve(B.T.dot(B), B.T.dot(b))
print('c\n', c)
Result:
c
[[ 1.2]]
.... as before
Why? Because:
We have:
$ B c = b $
Multiplying through by B.T gives us:
$ B^T B c = B^T b $
Now, B.T.dot(B) is square and, since B has full column rank, invertible. Therefore we can multiply through by the inverse of B.T.dot(B), or use a solver as above, to get c:
$ c = (B^T B)^{-1} B^T b $
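The normal-equations route, np.linalg.lstsq and np.linalg.pinv all agree on this example. One trade-off worth knowing: forming B.T @ B squares the condition number, so for ill-conditioned B the SVD-based lstsq or pinv is usually the safer choice. A sketch; the nearly collinear matrix A is an arbitrary illustration, not from the question:

```python
import numpy as np

B = np.array([[2.], [4.]])
b = np.array([[4.], [4.]])

c_normal = np.linalg.solve(B.T @ B, B.T @ b)   # (B^T B) c = B^T b
c_lstsq = np.linalg.lstsq(B, b, rcond=None)[0]
c_pinv = np.linalg.pinv(B) @ b

# For a full-column-rank matrix, cond(A^T A) == cond(A)**2 (2-norm)
A = np.array([[1., 1.], [1., 1.001], [1., 0.999]])
cond_A = np.linalg.cond(A)
cond_AtA = np.linalg.cond(A.T @ A)
print(c_normal, c_lstsq, c_pinv)
print(cond_A, cond_AtA)
```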