Fast way to calculate kernel matrix, python

Let K(x, z) be (x_transpose*z + p_constant)**2.
I want to compute the n*n matrix K, where K_ij = K(X_i, X_j).
X is an n by d matrix, and X_i is the transpose of the ith row of X.
Does anyone know of a quick way to compute this? I'm using Python.
Wait a second, is K just XX^T?

import numpy as np

def K(x, z, p_constant=1.0):
    return (np.dot(x.T, z) + p_constant)**2

#...
x = np.arange(100).reshape((10, 10))
n = x.shape[0]
# K_ij = K(x[i], x[j]): rows i and j of x, giving an n by n result
np.fromfunction(np.vectorize(lambda i, j: K(x[i], x[j])), (n, n), dtype=int)
I had some trouble with np.fromfunction's misleading documentation.

Yes, the answer to the question "How to compute linear kernel matrix" is
np.dot(X, np.transpose(X)).
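For the polynomial kernel in the question, K is not XX^T itself but an elementwise function of it: compute the Gram matrix XX^T once, then apply (. + p_constant)**2 to every entry. A minimal sketch; the name poly_kernel_matrix is hypothetical, and the double loop is only a reference check built from the definition K_ij = K(X_i, X_j):

```python
import numpy as np

def poly_kernel_matrix(X, p_constant=1.0):
    # Gram matrix of inner products: G[i, j] = X[i] . X[j]
    gram = X @ X.T
    # Kernel applied elementwise: K[i, j] = (X[i] . X[j] + p_constant)**2
    return (gram + p_constant) ** 2

X = np.arange(12, dtype=float).reshape(4, 3)  # n = 4 points in d = 3 dimensions
K = poly_kernel_matrix(X)

# Brute-force reference straight from the definition
K_loop = np.array([[(X[i] @ X[j] + 1.0) ** 2 for j in range(len(X))]
                   for i in range(len(X))])
print(np.allclose(K, K_loop))  # True
```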

Related

Solving A*x=x with numpy

I am new to numpy and I want to solve the equation A * x = x, where A is a matrix and x is a vector.
I am searching for the vector x, if it exists.
I found the np.linalg.solve() function, but didn't get it to work as intended.
The issue here is not so much a problem with numpy as with your understanding of the linear algebra involved. The question you are asking can be rephrased as:
A @ x = x
A @ x = I @ x
(A - I) @ x = 0
This is a specific formulation of the general eigenvector problem, with the stipulation that the eigenvalue is 1.
Numpy solves this problem with the function np.linalg.eig:
w, v = np.linalg.eig(A)
You need to check if any of the values are 1 (there could be more than one):
mask = np.isclose(w, 1)
if mask.any():
    for vec in v[:, mask].T:
        print(vec)
else:
    print('Nope!')
The columns of v are unit eigenvectors. Keep in mind that any scalar multiple of such a vector is also a solution.
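As a concrete check of the eig approach, here is a minimal sketch on a small column-stochastic matrix (my example, chosen because such matrices always have eigenvalue 1; here the eigenvalues are 1 and 0.7). The helper name eigenvalue_one_vectors and the tolerance are hypothetical choices:

```python
import numpy as np

def eigenvalue_one_vectors(A, atol=1e-8):
    # Solutions of A @ x = x are eigenvectors with eigenvalue 1
    w, v = np.linalg.eig(A)
    mask = np.isclose(w, 1, atol=atol)
    # Columns of v are eigenvectors; return one solution per row
    return v[:, mask].T

A = np.array([[0.9, 0.2],
              [0.1, 0.8]])
sols = eigenvalue_one_vectors(A)
x = sols[0]
# Any scalar multiple is also a solution; rescale so the entries sum to 1
x_scaled = x / x.sum()
print(x_scaled)  # approximately [2/3, 1/3]
```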
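For issues with non-invertible matrices, you may want to use scipy.linalg.svd instead, applied to A - I rather than to A:
U, s, Vh = svd(A - np.eye(len(A)))
The solutions are then the (conjugated) rows of Vh whose singular values in s are close to 0, so the masking step is the same, with np.isclose(s, 0) in place of np.isclose(w, 1).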
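A minimal sketch of the SVD route, assuming a relative tolerance of 1e-10 (my choice, not a standard value). The solutions of (A - I) x = 0 are spanned by the right singular vectors with zero singular values; scipy.linalg.null_space does essentially the same computation and is used here only as a cross-check. The example matrix has eigenvalue 1 twice, so the solution space is 2-dimensional:

```python
import numpy as np
from scipy.linalg import svd, null_space

def fixed_point_basis(A, rtol=1e-10):
    # Solutions of A @ x = x form the null space of (A - I)
    n = A.shape[0]
    U, s, Vh = svd(A - np.eye(n))
    # Rows of Vh whose singular values are ~0 span that null space
    mask = s <= rtol * max(1.0, s.max())
    return Vh[mask].conj()  # one basis vector per row

A = np.diag([1.0, 1.0, 2.0])
basis = fixed_point_basis(A)
print(basis.shape)  # (2, 3)
```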

Calculate gradient of norm of a vector with respect to a vector in python

I want to calculate the gradient of f(v(X)),
where f(a): the second (Euclidean) norm of a;
X: a vector of dimension 1*n, for example for n == 2: [x1, x2];
v(X): (((x1)^m)*P + ((x2)^m)*Q)/(x1^m + x2^m),
where P and Q are vectors.
So is there any function in python which can help me with this? If so, please elaborate.
I really need help!
Thanks in advance!
If I understand your function correctly, P and Q should be two vectors of the same dimension.
In this case it's enough to use numpy arrays.
Here is a very simple piece of code that may help you:
import numpy as np
x1=2
x2=5
a=[x1,x2]
m=5
P=np.array([1,2,3,4])
Q=np.array([5,6,7,8])
print(( (a[0]**m)*P +(a[1]**m)*Q )/(a[0]**m + a[1]**m))
Output:
[4.95945518 5.95945518 6.95945518 7.95945518]
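In general, if you want to multiply a vector by a scalar elementwise, you need to use a numpy array: multiplying a Python list by an integer repeats the list, and multiplying it by a float raises a TypeError.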
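The answer above evaluates v(X) but not the gradient the question asks for. One way to get it is the chain rule: with S = x1^m + x2^m and P_1 = P, P_2 = Q, the derivative is dv/dx_j = m * x_j**(m-1) / S * (P_j - v), and the gradient of the 2-norm at v is v/||v||. A minimal sketch under those assumptions (the helper names v_of and grad_norm_v are hypothetical), checked against central finite differences:

```python
import numpy as np

def v_of(X, vectors, m):
    # v(X) = sum_j x_j**m * vectors[j] / sum_j x_j**m
    weights = X ** m
    return weights @ vectors / weights.sum()

def grad_norm_v(X, vectors, m):
    # Chain rule: d||v||/dx_j = (v / ||v||) . (m x_j**(m-1) / S) * (vectors[j] - v)
    S = (X ** m).sum()
    v = v_of(X, vectors, m)
    unit = v / np.linalg.norm(v)
    dv = (m * X ** (m - 1) / S)[:, None] * (vectors - v)  # row j holds dv/dx_j
    return dv @ unit

X = np.array([2.0, 5.0])
vectors = np.array([[1, 2, 3, 4],
                    [5, 6, 7, 8]], dtype=float)  # rows are P and Q
m = 5
g = grad_norm_v(X, vectors, m)

# Central finite-difference check of the analytic gradient
def f(Z):
    return np.linalg.norm(v_of(Z, vectors, m))

eps = 1e-6
g_num = np.array([(f(X + eps * e) - f(X - eps * e)) / (2 * eps)
                  for e in np.eye(len(X))])
print(np.allclose(g, g_num))  # True
```

An automatic-differentiation library would give the same gradient without deriving it by hand; the explicit formula just keeps the sketch dependency-free.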

Vectorise Python code

I have coded a kriging algorithm but I find it quite slow. In particular, do you have an idea how I could vectorise the piece of code in the cons function below:
import time
import numpy as np

B = np.zeros((200, 6))
P = np.zeros((len(B), len(B)))

def cons():
    time1 = time.time()
    for i in range(len(B)):
        for j in range(len(B)):
            P[i, j] = corr(B[i], B[j])
    time2 = time.time()
    return time2 - time1

def corr(x, x_i):
    return np.exp(-np.sum(np.abs(np.array(x) - np.array(x_i))))

n_runs = 30
time_av = 0.
for i in range(n_runs):
    time_av += cons()
print("Average =", time_av / n_runs)
Edit: Bonus questions
1. What happens to the broadcasting solution if I want corr(B[i], C[j]), with C the same dimensions as B?
2. What happens to the scipy solution if my p-norm orders are an array:
p = np.array([1., 2., 1., 2., 1., 2.])
def corr(x, x_i):
    return np.exp(-np.sum(np.abs(np.array(x) - np.array(x_i))**p))
For 2., I tried P = np.exp(-cdist(B, C, 'minkowski', p)) but scipy is expecting a scalar.
Your problem seems very simple to vectorize. For each pair of rows of B you want to compute
P[i,j] = np.exp(-np.sum(np.abs(B[i,:] - B[j,:])))
You can make use of array broadcasting and introduce a third dimension, summing along the last one:
P2 = np.exp(-np.sum(np.abs(B[:,None,:] - B),axis=-1))
The idea is to reshape the first occurrence of B to shape (N,1,M) while the second B is left with shape (N,M). With array broadcasting, the latter is equivalent to (1,N,M), so
B[:,None,:] - B
is of shape (N,N,M). Summing along the last index will then result in the (N,N)-shape correlation array you're looking for.
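The shape argument can be checked directly by comparing the broadcast result with the double loop from the question. A minimal sketch with random data (the 50 by 6 size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.random((50, 6))  # N = 50 rows, M = 6 coordinates

# Double loop from the question
P_loop = np.empty((len(B), len(B)))
for i in range(len(B)):
    for j in range(len(B)):
        P_loop[i, j] = np.exp(-np.sum(np.abs(B[i] - B[j])))

# Broadcasting: (N, 1, M) - (N, M) -> (N, N, M), then sum over the last axis
diff = B[:, None, :] - B
P2 = np.exp(-np.sum(np.abs(diff), axis=-1))

print(diff.shape)               # (50, 50, 6)
print(np.allclose(P2, P_loop))  # True
```

The intermediate array has N*N*M elements, so for large N its memory footprint is the main cost of this approach.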
Note that if you were using scipy, you would be able to do this using scipy.spatial.distance.cdist (or, equivalently, a combination of scipy.spatial.distance.pdist and scipy.spatial.distance.squareform, which avoids unnecessarily computing the lower triangular half of this symmetric matrix). Using @Divakar's suggestion in the comments, the simplest solution this way is:
from scipy.spatial.distance import cdist
P3 = np.exp(-cdist(B, B, 'minkowski', p=1))
cdist will compute the Minkowski distance in 1-norm, which is exactly the sum of the absolute values of coordinate differences.
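Both bonus questions can be handled with the broadcasting approach. For corr(B[i], C[j]), put the new axis on B and broadcast against C; and because p broadcasts along the last axis, an array of per-coordinate exponents works too. With mixed exponents the result is no longer a Minkowski distance, which is presumably why cdist insists on a scalar p. A minimal sketch (corr_matrix is a hypothetical name), checked against cdist for the scalar case:

```python
import numpy as np
from scipy.spatial.distance import cdist

def corr_matrix(B, C, p=1.0):
    # B: (N, M), C: (K, M); p: scalar or length-M array of exponents
    diff = np.abs(B[:, None, :] - C[None, :, :])  # shape (N, K, M)
    return np.exp(-np.sum(diff ** p, axis=-1))     # shape (N, K)

rng = np.random.default_rng(1)
B = rng.random((40, 6))
C = rng.random((30, 6))
p = np.array([1., 2., 1., 2., 1., 2.])

P_l1 = corr_matrix(B, C)        # scalar p = 1
P_mixed = corr_matrix(B, C, p)  # one exponent per coordinate

# Scalar case agrees with the Manhattan ("cityblock") distance from cdist
print(np.allclose(P_l1, np.exp(-cdist(B, C, 'cityblock'))))  # True
```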

Compute numpy array pairwise Euclidean distance except with self

Edit: this question is not specifically about calculating distances, but rather about the most efficient way to loop through a numpy array, specifying that for index i all comparisons should be made with the rest of the array, as long as the second index is not i.
I have a numpy array with columns (X, Y, ID) and want to compare each element to each other element, but not itself. So, for each X, Y coordinate, I want to calculate the distance to each other X, Y coordinate, but not itself (where distance = 0).
Here is what I have - there must be a more "numpy" way to write this.
import math, arcpy

# Point feature class
fc = "MY_FEATURE_CLASS"

# Load points to numpy array: (X, Y, ID)
npArray = arcpy.da.FeatureClassToNumPyArray(fc, ["SHAPE@X", "SHAPE@Y", "OID@"])

for row in npArray:
    for row2 in npArray:
        if row[2] != row2[2]:
            # Pythagoras's theorem
            distance = math.sqrt(math.pow((row[0] - row2[0]), 2) + math.pow((row[1] - row2[1]), 2))
Obviously, I'm a numpy newbie. I will not be surprised to find this a duplicate, but I don't have the numpy vocabulary to search out the answer. Any help appreciated!
Using SciPy's pdist, you could write something like
import numpy as np
from scipy.spatial.distance import pdist, squareform
distances = squareform(pdist(npArray, lambda a, b: np.sqrt((a[0]-b[0])**2 + (a[1]-b[1])**2)))
pdist will compute the pair-wise distances using the custom metric that ignores the 3rd coordinate (which is your ID in this case). squareform turns this into a more readable matrix such that distances[0,1] gives the distance between the 0th and 1st rows.
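If the coordinates are first copied out into a plain (n, 2) float array, pdist's built-in 'euclidean' metric can be used instead of a Python lambda, which avoids one Python call per pair. Since the question wants to exclude each point's comparison with itself, the zero diagonal of the square matrix can then be masked out. A minimal sketch with made-up coordinates; the xy array is a hypothetical stand-in for the X, Y fields (with the arcpy structured array, something like np.column_stack([npArray["SHAPE@X"], npArray["SHAPE@Y"]]) is assumed to give it):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical stand-in for the X, Y columns
xy = np.array([[0.0, 0.0],
               [3.0, 4.0],
               [6.0, 8.0]])

# Condensed pairwise distances (each unordered pair once), then the full matrix
distances = squareform(pdist(xy))  # built-in 'euclidean' metric

# Drop the self-comparisons on the diagonal
off_diag = ~np.eye(len(xy), dtype=bool)
others = distances[off_diag].reshape(len(xy), len(xy) - 1)  # row i: distances to all j != i

print(distances[0, 1], distances[0, 2])  # 5.0 10.0
```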
Each row of X is a 3-dimensional data instance or point.
The output pairwisedist[i, j] is the squared Euclidean distance between X[i, :] and X[j, :]; take np.sqrt of it to get the distance itself.
import numpy as np
X = np.array([[6,1,7],[10,9,4],[13,9,3],[10,8,15],[14,4,1]])
a = np.sum(X*X, 1)
b = np.repeat(a[:, np.newaxis], X.shape[0], axis=1)
pairwisedist = b + b.T - 2*X.dot(X.T)
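The expression b + b.T - 2*X.dot(X.T) expands ||X_i||^2 + ||X_j||^2 - 2 X_i.X_j, which is the squared distance. A minimal check against pdist's 'sqeuclidean' metric; broadcasting a[:, None] + a[None, :] does the job of np.repeat, and the clip before np.sqrt guards against tiny negative values from floating-point cancellation:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

X = np.array([[6, 1, 7], [10, 9, 4], [13, 9, 3], [10, 8, 15], [14, 4, 1]], dtype=float)

a = np.sum(X * X, axis=1)                   # squared norms of the rows
sq = a[:, None] + a[None, :] - 2 * X @ X.T  # squared Euclidean distances
dist = np.sqrt(np.clip(sq, 0, None))        # clip tiny negatives before sqrt

print(np.allclose(sq, squareform(pdist(X, 'sqeuclidean'))))  # True
```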
I wanted to point out that hand-written square roots of sums of squares are prone to overflow and underflow. The built-in math.hypot and np.hypot are much safer, with no compromise on performance (math.hypot accepts more than two arguments since Python 3.8):
import math
from scipy.spatial.distance import pdist, squareform
distances = squareform(pdist(npArray, lambda a, b: math.hypot(*(a[:2] - b[:2]))))
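The overflow point is easy to demonstrate with NumPy floats: squaring 1e200 overflows to inf before the square root is taken, while np.hypot avoids that intermediate overflow and returns the finite answer. A minimal sketch (np.errstate only silences the overflow warning):

```python
import math
import numpy as np

a = np.float64(1e200)
b = np.float64(1e200)

with np.errstate(over='ignore'):
    naive = np.sqrt(a ** 2 + b ** 2)  # a**2 overflows to inf

safe = np.hypot(a, b)                 # no intermediate overflow
also_safe = math.hypot(1e200, 1e200)

print(naive)         # inf
print(safe / 1e200)  # approximately sqrt(2)
```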

DFT matrix in python

What's the easiest way to get the DFT matrix for 2-d DFT in python? I could not find such function in numpy.fft. Thanks!
The easiest and most likely the fastest method would be using fft from SciPy.
import numpy as np
from scipy.fft import fft
def dftmtx(N):
    return fft(np.eye(N))
If you know an even faster way (it might be more complicated), I'd appreciate your input.
Just to make it more relevant to the main question - you can also do it with numpy:
import numpy as np
dftmtx = np.fft.fft(np.eye(N))
When I benchmarked both of them, I had the impression that the scipy one was marginally faster, but I have not done it thoroughly and it was some time ago, so don't take my word for it.
Here's a pretty good source on FFT implementations in Python:
http://nbviewer.ipython.org/url/jakevdp.github.io/downloads/notebooks/UnderstandingTheFFT.ipynb
It's written more from a speed perspective, but in this case we can see that speed sometimes comes with simplicity too.
I don't think this is built in. However, direct calculation is straightforward:
import numpy as np
def DFT_matrix(N):
    i, j = np.meshgrid(np.arange(N), np.arange(N))
    omega = np.exp(-2 * np.pi * 1J / N)
    W = np.power(omega, i * j) / np.sqrt(N)
    return W
EDIT: For a 2D DFT of N by N data, you can use the following:
x = np.zeros((N, N))  # x is any input data with those dimensions
W = DFT_matrix(N)
dft_of_x = W.dot(x).dot(W)
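Because DFT_matrix is scaled by 1/sqrt(N), W.dot(x).dot(W) should match NumPy's 2-D FFT with orthonormal scaling (norm="ortho"). A minimal self-contained check; it repeats the DFT_matrix definition and uses random input:

```python
import numpy as np

def DFT_matrix(N):
    i, j = np.meshgrid(np.arange(N), np.arange(N))
    omega = np.exp(-2 * np.pi * 1j / N)
    return np.power(omega, i * j) / np.sqrt(N)  # unitary scaling

N = 8
rng = np.random.default_rng(0)
x = rng.random((N, N))
W = DFT_matrix(N)

# W is symmetric, so W x W equals W x W^T: a 1-D DFT along each axis
dft_of_x = W.dot(x).dot(W)
print(np.allclose(dft_of_x, np.fft.fft2(x, norm="ortho")))  # True
```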
As of scipy 0.14 there is a built-in scipy.linalg.dft:
Example with a 16-point DFT matrix:
>>> import scipy.linalg
>>> import numpy as np
>>> m = scipy.linalg.dft(16)
Validate the unitary property; note that the matrix is unscaled, hence 16*np.eye(16):
>>> np.allclose(np.abs(np.dot( m.conj().T, m )), 16*np.eye(16))
True
For a 2D DFT matrix, it's just a matter of a tensor product, or specifically a Kronecker product in this case, as we are dealing with matrix algebra.
>>> m2 = np.kron(m, m) # 256x256 matrix, flattened from (16,16,16,16) tensor
Now we can give it a tiled visualization, done by rearranging each row into a square block:
>>> import matplotlib.pyplot as plt
>>> m2tiled = m2.reshape((16,)*4).transpose(0,2,1,3).reshape((256,256))
>>> plt.subplot(121)
>>> plt.imshow(np.real(m2tiled), cmap='gray', interpolation='nearest')
>>> plt.subplot(122)
>>> plt.imshow(np.imag(m2tiled), cmap='gray', interpolation='nearest')
>>> plt.show()
Result (real and imaginary parts separately):
As you can see, they are the 2D DFT basis functions.
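A quick way to confirm the Kronecker-product construction is to apply it to a flattened (row-major) input and compare with np.fft.fft2. A minimal sketch; the non-square 4 by 5 input is my choice, to show that the factors go in the order kron(dft(rows), dft(cols)):

```python
import numpy as np
import scipy.linalg

M, N = 4, 5
rng = np.random.default_rng(0)
X = rng.random((M, N))

# (A kron B) @ X.ravel() equals (A @ X @ B.T).ravel() for row-major flattening
m2 = np.kron(scipy.linalg.dft(M), scipy.linalg.dft(N))  # (M*N, M*N)
Y = (m2 @ X.ravel()).reshape(M, N)

print(np.allclose(Y, np.fft.fft2(X)))  # True
```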
@Alex is basically correct; I add here the version I used for the 2-D DFT:
def DFT_matrix_2d(N):
    i, j = np.meshgrid(np.arange(N), np.arange(N))
    A = np.multiply.outer(i.flatten(), i.flatten())
    B = np.multiply.outer(j.flatten(), j.flatten())
    omega = np.exp(-2*np.pi*1J/N)
    W = np.power(omega, A+B)/N
    return W
Lambda functions work too:
dftmtx = lambda N: np.fft.fft(np.eye(N))
You can call it by using dftmtx(N). Example:
In [62]: dftmtx(2)
Out[62]:
array([[ 1.+0.j, 1.+0.j],
[ 1.+0.j, -1.+0.j]])
If you wish to compute the 2D DFT as a single matrix operation, it is necessary to unravel the matrix X on which you wish to compute the DFT into a vector, as each output of the DFT has a sum over every index in the input, and a single square matrix multiplication does not have this ability. Taking care to be sure we are handling the indices correctly, I find the following works:
import numpy as np

M = 16
N = 16
X = np.random.random((M, N)) + 1j*np.random.random((M, N))
Y = np.fft.fft2(X)
W = np.zeros((M*N, M*N), dtype=complex)

hold = []
for m in range(M):
    for n in range(N):
        hold.append((m, n))

for j in range(M*N):
    for i in range(M*N):
        k, l = hold[j]
        m, n = hold[i]
        W[j, i] = np.exp(-2*np.pi*1j*(m*k/M + n*l/N))

np.allclose(np.dot(W, X.ravel()), Y.ravel())
True
If you wish to change the normalization to orthonormal, you can multiply by 1/sqrt(MN); if you wish to have the inverse transformation, change the sign in the exponent (and scale by 1/(MN)).
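The double loop that fills W can also be replaced by broadcasting: recover the (row, column) pair of each flattened index with np.divmod, then form all exponents with outer products. A minimal sketch (dft2_matrix is a hypothetical name) that also handles M != N and checks both normalizations:

```python
import numpy as np

def dft2_matrix(M, N):
    # Flattened (row-major) index p corresponds to (p // N, p % N)
    rows, cols = np.divmod(np.arange(M * N), N)
    # phase[j, i] = k*m/M + l*n/N for output (k, l) = index j and input (m, n) = index i
    phase = np.outer(rows, rows) / M + np.outer(cols, cols) / N
    return np.exp(-2j * np.pi * phase)

M, N = 4, 6
rng = np.random.default_rng(0)
X = rng.random((M, N)) + 1j * rng.random((M, N))
W = dft2_matrix(M, N)

print(np.allclose(W @ X.ravel(), np.fft.fft2(X).ravel()))  # True
# Orthonormal version: multiply W by 1/sqrt(M*N)
print(np.allclose(W @ X.ravel() / np.sqrt(M * N),
                  np.fft.fft2(X, norm="ortho").ravel()))   # True
```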
This might be a little late, but there is a better alternative for creating the DFT matrix that performs faster, using NumPy's vander.
Also, this implementation does not use (explicit) loops:
def dft_matrix(signal):
    N = signal.shape[0]  # number of samples
    w = np.exp((-2 * np.pi * 1j) / N)  # remove the '-' for the (unscaled) inverse transform
    r = np.arange(N)
    w_matrix = np.vander(w ** r, increasing=True)  # faster than meshgrid
    return w_matrix
If I'm not mistaken, the main improvement is that this method generates each power from the (already calculated) previous elements.
You can read about vander in the documentation:
numpy.vander
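A quick check that the vander construction reproduces np.fft.fft(np.eye(N)), and that multiplying by it matches np.fft.fft. The signal is random test data:

```python
import numpy as np

def dft_matrix(signal):
    N = signal.shape[0]                # number of samples
    w = np.exp((-2 * np.pi * 1j) / N)  # primitive N-th root of unity
    r = np.arange(N)
    # vander(x, increasing=True)[i, k] = x[i]**k = w**(i*k)
    return np.vander(w ** r, increasing=True)

rng = np.random.default_rng(0)
signal = rng.random(16)
W = dft_matrix(signal)

print(np.allclose(W, np.fft.fft(np.eye(16))))       # True
print(np.allclose(W @ signal, np.fft.fft(signal)))  # True
```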
