In the Python code I'm currently developing, there is a particular function that really needs a speed optimization.
For now, I would like to focus on pure Python code (no C or Cython implementations).
The function generates a series of Gaussian curves whose sigma varies with the x-axis position. It takes three arguments:
x0: 1D NumPy array, the central values of the Gaussian curves
h: 1D NumPy array, the heights of the Gaussian curves
x: 1D NumPy array, the points at which the total sum is evaluated
My goal is to obtain the sum of all the curves as fast as possible (it is a sort of convolution with a Gaussian curve that has a position-dependent sigma).
At the moment my code is:
sigs = get_sigmas(x0)  # function that returns the value of sigma at each position
all_gauss_args = -0.5*np.power((x[:, np.newaxis] - x0[np.newaxis, :]) /
                               sigs[np.newaxis, :], 2.0)
total = (1.0/(np.sqrt(2 * np.pi) * sigs[np.newaxis, :])) * np.exp(all_gauss_args) * \
        h[np.newaxis, :]
total = np.sum(total, axis=1)
return total
Is it possible to make it faster?
Thanks in advance for the help
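One possible tweak, sketched under assumptions and not benchmarked here: the normalisation 1/(sqrt(2π)·sigma) and the height h depend only on the curve (the column), so they can be folded into a single weight vector, and the weighted row sum becomes a matrix-vector product. This avoids building a second full (len(x), len(x0)) temporary. `get_sigmas` below is a hypothetical stand-in (a simple ramp), since the real function is not shown.

```python
import numpy as np


def get_sigmas(x0):
    # Hypothetical stand-in for the real get_sigmas: sigma grows with |position|.
    return 0.1 + 0.05 * np.abs(x0)


def gauss_sum_original(x0, h, x):
    # The question's formulation: full (len(x), len(x0)) arrays, then a row sum.
    sigs = get_sigmas(x0)
    args = -0.5 * ((x[:, np.newaxis] - x0[np.newaxis, :]) / sigs[np.newaxis, :]) ** 2
    terms = (1.0 / (np.sqrt(2 * np.pi) * sigs[np.newaxis, :])) * np.exp(args) * h[np.newaxis, :]
    return terms.sum(axis=1)


def gauss_sum_matvec(x0, h, x):
    # Fold the per-curve factors (height and normalisation) into one weight vector.
    sigs = get_sigmas(x0)
    weights = h / (np.sqrt(2 * np.pi) * sigs)
    # Build the scaled distances once, then square, scale and exponentiate in place.
    z = (x[:, np.newaxis] - x0) / sigs
    z *= z
    z *= -0.5
    np.exp(z, out=z)
    # Weighted sum over the curves as a matrix-vector product.
    return z @ weights


rng = np.random.default_rng(0)
x0 = np.sort(rng.uniform(-5, 5, size=300))
h = rng.uniform(0.5, 2.0, size=300)
x = np.linspace(-6, 6, 1000)
print(np.allclose(gauss_sum_original(x0, h, x), gauss_sum_matvec(x0, h, x)))
```

Whether this is noticeably faster depends on the array sizes; the exponential usually dominates, so it is worth timing both versions on realistic data.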
Related
I am using a function to calculate a likelihood density.
I am passing in two inputs, x0 and x1, which are vectors of length 7.
import numpy as np
from scipy.stats import multivariate_normal

def lhd(x0, x1, dt):  # Calculate the likelihood density given two values.
    d = len(x0)  # Save the length of the inputs for the pdf below.
    print(d)
    print(len(x1))
    # Take the pdf of x1 under a multivariate normal built from x0.
    lh = multivariate_normal.pdf(x1, mean=(1-dt)*x0, cov=2*dt*np.identity(d))
    return lh  # Return this pdf value.
The mean here is a vector of length 7, and the covariance is a (7,7) array.
When I run this, I get the error
ValueError: Array 'mean' must be a vector of length 49.
But looking at the formula for the pdf, I do not think this is correct. Any idea what is going wrong here?
If dt is a (7,7) array, then (1-dt) is also (7,7). The * operator in (1-dt)*x0 is element-wise multiplication, so if x0 is a vector of length 7, broadcasting makes the result a (7,7) array, i.e. 49 values, which is why the error asks for a mean of length 49.
I guess you meant to use matrix multiplication; you can do this with x0 - dt @ x0 (where @ is the matrix multiplication operator).
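A small sketch reproducing the shape problem and the fix. The values of x0, x1 and dt are made up for illustration; dt is assumed to be a (7, 7) matrix, as the answer supposes, and the covariance keeps the question's expression unchanged.

```python
import numpy as np
from scipy.stats import multivariate_normal

d = 7
rng = np.random.default_rng(1)
x0 = rng.normal(size=d)
x1 = rng.normal(size=d)
# Assumption: dt is a (7, 7) matrix; here a small diagonal one for illustration.
dt = 0.1 * np.eye(d)
cov = 2 * dt * np.identity(d)  # same expression as in lhd(); element-wise, still (7, 7)

bad_mean = (1 - dt) * x0   # element-wise broadcasting: shape (7, 7), i.e. 49 values
good_mean = x0 - dt @ x0   # matrix-vector product: shape (7,)
print(bad_mean.shape, good_mean.shape)

try:
    multivariate_normal.pdf(x1, mean=bad_mean, cov=cov)
except ValueError as err:
    print("2-D mean rejected:", err)

lh = multivariate_normal.pdf(x1, mean=good_mean, cov=cov)
print(lh)
```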
I'm hoping to find a way around the solution offered here to use 2D arrays in order to do 2D numerical integration.
import numpy as np
ksize = 50
a = 1.0
kdom = np.pi / a
x = np.linspace(- kdom, kdom, ksize)
y = np.linspace(- kdom, kdom, ksize)
dk = x[1]-x[0]
X,Y = np.meshgrid(x,y)
eigval = np.cos(X)+np.cos(Y)
eigvalflat = eigval.flatten()
intval = np.trapz(np.trapz(eigval,x),y)
sumval = np.sum(eigvalflat)*dk/ksize
print(intval,sumval)
Given my dummy example above, I'd like to find a way to properly integrate the 1D array (eigvalflat) while keeping it flattened, even though it is a double integral.
Computationally, if the integrand is not separable, then the answer is that you can't recast the double integral as a single integral, unless you compute the integral one dimension at a time, which is what the assignment to intval is essentially doing.
Analytically, you'll have a better chance by asking yourself the question: given the 2d region of the integral (a rectangle in your example), can one find an integral over the boundary of that region? For that, Green's theorem has you covered with necessary and sufficient conditions.
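To make the computational point concrete: the flattened array can be summed directly only if each sample is multiplied by a 2D trapezoid weight, and those weights are just the outer product of the 1D weights, so the dimensions are still handled one at a time. A sketch reusing the question's grid; the trapezoid weights are built by hand (my own construction) so the example does not depend on np.trapz.

```python
import numpy as np

ksize = 50
kdom = np.pi  # a = 1.0, as in the question
x = np.linspace(-kdom, kdom, ksize)
y = np.linspace(-kdom, kdom, ksize)
dk = x[1] - x[0]
X, Y = np.meshgrid(x, y)          # eigval[i, j] = f(x[j], y[i])
eigval = np.cos(X) + np.cos(Y)
eigvalflat = eigval.flatten()     # row-major: index i * ksize + j

# 1D trapezoid weights: dk everywhere, dk / 2 at both endpoints.
wx = np.full(ksize, dk)
wx[[0, -1]] = dk / 2
wy = wx.copy()

# Integrating one dimension at a time (what the nested np.trapz calls do).
nested = (eigval @ wx) @ wy

# The same integral on the flattened array: the 2D weights are still the
# outer product of the per-dimension weights, flattened in the same order.
w2d = np.outer(wy, wx).ravel()
flat = eigvalflat @ w2d

print(nested, flat)
```

Both values agree, and both are close to zero, the exact value of the integral of cos(x) + cos(y) over this square.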
I generate a correlation matrix by drawing from a uniform distribution:
corr = np.random.uniform(low=0.1, high=0.3, size=[20, 20])
and set the main diagonal elements to one
corr[np.diag_indices_from(corr)] = 1
Then, I make the correlation matrix symmetric
corr[np.triu_indices(n=20, k=1)] = corr.T[np.triu_indices(n=20, k=1)]
which yields an entrywise positive matrix, i.e., all entries of the matrix are strictly positive.
According to NumPy, however, the matrix is not positive semidefinite.
np.all(np.linalg.eigvals(corr) >= 0)
False
An entrywise positive symmetric matrix is still not guaranteed to be PSD.
I will give you two easy ways:
Any square non-singular matrix A can be used to create a PSD (in fact positive definite) matrix with
A = A @ A.T
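A quick check of this first recipe (the random matrix is just an example): A @ A.T is symmetric, and x.T @ (A @ A.T) @ x equals the squared norm of A.T @ x, so no eigenvalue can be negative beyond rounding error.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.uniform(low=0.1, high=0.3, size=(20, 20))  # any square matrix
P = A @ A.T  # symmetric, and x.T @ P @ x = |A.T @ x|**2 >= 0 for every x

eigs = np.linalg.eigvalsh(P)  # eigvalsh: real eigenvalues of a symmetric matrix
print(eigs.min())
```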
Any square matrix can be used to produce a PSD matrix with
A = (A + A.T)/2
A = A - np.eye(len(A)) * (np.min(np.linalg.eigvalsh(A)) - 0.001)
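The second recipe applied to a matrix built the way the question builds corr (the random seed is arbitrary). Subtracting shift times the identity moves every eigenvalue by -shift, so the smallest one lands exactly on the 0.001 margin.

```python
import numpy as np

rng = np.random.default_rng(3)
# Rebuild a matrix the way the question does.
corr = rng.uniform(low=0.1, high=0.3, size=(20, 20))
corr[np.diag_indices_from(corr)] = 1
iu = np.triu_indices(n=20, k=1)
corr[iu] = corr.T[iu]

A = (corr + corr.T) / 2  # symmetrise (a no-op here, but needed for arbitrary input)
shift = np.min(np.linalg.eigvalsh(A)) - 0.001
A = A - np.eye(len(A)) * shift  # every eigenvalue moves by -shift

print(np.linalg.eigvalsh(A).min())  # 0.001 up to rounding
```

Note that the shift also changes the diagonal, so the result is PSD but its diagonal is no longer exactly 1.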
If you want the minimum perturbation to a symmetric matrix (the least-squares projection onto the positive semidefinite cone), clip its eigenvalues:
w, v = np.linalg.eigh(A)
A_ = (v * np.maximum(w, 0.01)) @ v.T
print(np.linalg.eigvalsh(A_))
Notice that I am using a margin of 0.01; with a threshold of exactly zero, your test could fail due to numerical errors.
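A self-contained version of the clipping step, assuming the input is already symmetric (a random symmetric matrix stands in for corr here). Clipping the eigenvalues at zero would give the least-squares projection; clipping at 0.01 keeps the margin mentioned above.

```python
import numpy as np

rng = np.random.default_rng(4)
# An arbitrary symmetric matrix with some negative eigenvalues.
B = rng.normal(size=(20, 20))
A = (B + B.T) / 2

w, v = np.linalg.eigh(A)               # A = v @ diag(w) @ v.T
A_ = (v * np.maximum(w, 0.01)) @ v.T   # scale column k of v by the clipped w[k]

eigs = np.linalg.eigvalsh(A_)
print(eigs.min())
```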
I am doing a benchmarking test in python on different ways to calculate A'*A, with A being a N x M matrix. One of the fastest ways was to use numpy.dot().
I was curious whether I can obtain the same result using numpy.cov() (which gives the covariance matrix) by somehow varying the weights or pre-processing the A matrix, but I had no success. Does anyone know if there is any relation between the product A'*A and the covariance of A, where A is a matrix with N rows/observations and M columns/variables?
Have a look at the cov source. Near the end of the function it does this:
c = dot(X, X_T.conj())
Which is basically the dot product you want to perform. However, there are all kinds of other operations: checking inputs, subtracting the mean, normalization, ...
In short, np.cov will never ever be faster than np.dot(A.T, A) because internally it contains exactly that operation.
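That said, the covariance matrix is computed as

$$C = \frac{1}{N-1}\,(A - \bar{A})^\top (A - \bar{A})$$

where \bar{A} is the N×M matrix whose every row is the vector of column means of A.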
Or in Python:
import numpy as np
a = np.random.rand(10, 3)
m = np.mean(a, axis=0, keepdims=True)
x = np.dot((a - m).T, a - m) / (a.shape[0] - 1)
y = np.cov(a.T)
assert np.allclose(x, y) # check they are equivalent
As you can see, the covariance matrix is equivalent to the raw dot product if you subtract the mean of each variable and divide the result by the number of samples (minus one).
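Going the other way, the raw product can be recovered from np.cov by adding the mean term back. With bias=True (divide by N rather than N - 1), cov = A.T @ A / N - outer(m, m), where m is the vector of column means, so A.T @ A = N * (cov + outer(m, m)). The sketch below only checks this identity; it will not be faster than np.dot.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.random((10, 3))   # N = 10 observations, M = 3 variables
N = A.shape[0]

m = A.mean(axis=0)                        # column means, shape (M,)
c = np.cov(A, rowvar=False, bias=True)    # covariance normalised by N
ata = N * (c + np.outer(m, m))            # add the mean term back

print(np.allclose(ata, A.T @ A))
```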
Here is my problem:
I have two sets of 3D points; let's call them "Gausspoints" and "XYZ". I define a function that is a sum of Gaussians, each centered at one of the Gausspoints. Now I want to evaluate this function on the XYZ points. My approach works fine, but it is rather slow. Any idea how to speed it up by exploiting NumPy a little better?
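import numpy as np

def sumgaus(r):
    t = r - Gausspoints                 # differences to every Gaussian centre, shape (M, 3)
    t = [np.linalg.norm(v) for v in t]  # Euclidean distance to each centre
    t = -np.power(t, 2.0)
    t = np.exp(t)
    res = np.sum(t)
    return res

result = list(map(sumgaus, XYZ))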
Thanks for any help
Edit:
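XYZ has shape N×3 and Gausspoints has shape M×3, where M and N are different integers.
Edit2: I want to apply the following function to each item r in XYZ:

$$f(\mathbf{r}) = \sum_{j=1}^{M} \exp\left(-\lVert \mathbf{r} - \mathbf{g}_j \rVert^2\right)$$

where \mathbf{g}_j is the j-th row of Gausspoints.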
The tricky part is vectorizing the computation of all the differences between your points without any explicit Python looping or mapping. You can roll your own implementation using broadcasting, doing something like:
dist2 = XYZ[:, np.newaxis, :] - Gausspoints
dist2 *= dist2
dist2 = np.sum(dist2, axis=-1)
If XYZ has shape (n, 3) and Gausspoints has shape (m, 3), then dist2 will have shape (n, m), with dist2[i, j] being the squared distance between points XYZ[i] and Gausspoints[j].
It may be easier to understand using scipy.spatial.distance.cdist:
from scipy.spatial.distance import cdist
dist2 = cdist(XYZ, Gausspoints)
dist2 *= dist2
But once you have your array of squared distances, it's child's play:
f = np.sum(np.exp(-dist2), axis=1)
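Putting the pieces together, a self-contained sketch with random stand-in data (the sizes are arbitrary) that checks both vectorized results against a per-point loop. Passing "sqeuclidean" to cdist returns squared distances directly, so the separate squaring step is not needed.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(6)
XYZ = rng.random((200, 3))          # N points where the sum is evaluated
Gausspoints = rng.random((50, 3))   # M Gaussian centres


def sumgaus(r):
    # Per-point reference: distances from r to every centre, then the Gaussian sum.
    d = np.linalg.norm(r - Gausspoints, axis=1)
    return np.sum(np.exp(-d ** 2))


loop = np.array([sumgaus(r) for r in XYZ])

# Broadcasting: (N, 1, 3) - (M, 3) -> (N, M, 3), then sum the squares over the last axis.
diff = XYZ[:, np.newaxis, :] - Gausspoints
dist2 = np.sum(diff * diff, axis=-1)                 # shape (N, M)
f_broadcast = np.sum(np.exp(-dist2), axis=1)         # shape (N,)

# cdist with the squared Euclidean metric.
f_cdist = np.sum(np.exp(-cdist(XYZ, Gausspoints, "sqeuclidean")), axis=1)

print(np.allclose(loop, f_broadcast), np.allclose(loop, f_cdist))
```

The broadcasting route builds an (N, M, 3) temporary, so for very large N and M it may need to be processed in chunks of XYZ rows.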