Using python/numpy, I have the following np.einsum:
np.einsum('abde,abc->bcde', X, Y)
Y is sparse: for each [a,b], only one c == 1; all others := 0.
For an example of relative size of the axes, X.shape is on the order of (1000, 5, 30, 30), and Y.shape is equivalently (1000, 5, 300).
This operation is extremely costly; I want to make this more performant. For one thing, einsum is not parallelized. For another, because Y is sparse, I'm effectively computing 300x the number of multiplication operations I should be doing. In fact, when I wrote the equivalent of this einsum using a loop over n, I got a speed-up of around 3x. But that's clearly not very good.
How should I approach making this more performant? I've tried using np.tensordot, but I could not figure out how to get what I want from it (and I still run into the sparse/dense problem).
If Y only contains 1 and 0 then the einsum basically does this:
result = np.zeros(Y.shape[1:] + X.shape[2:], X.dtype)
I, J, K = np.nonzero(Y)
result[J, K] += X[I, J]
But this doesn't give the correct result due to duplicate j, k indices.
I couldn't get numpy.add.at to work, but a loop over just these indices is still pretty fast, at least for the given shapes and sparsity.
result = np.zeros(Y.shape[1:] + X.shape[2:], X.dtype)
for i, j, k in zip(*np.nonzero(Y)):
result[j, k] += X[i, j]
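For reference, here is one possible way to apply numpy.add.at to the same indices. This is only a sketch on small illustrative shapes: the sizes, the seed, and the names K_hot, I_idx and J_idx are assumptions, not taken from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, c, d, e = 50, 5, 7, 3, 4  # small, made-up sizes

X = rng.integers(0, 10, size=(a, b, d, e))

# One-hot Y: exactly one c per (a, b) pair is set to 1.
K_hot = rng.integers(0, c, size=(a, b))
I_idx, J_idx = np.indices((a, b), sparse=True)
Y = np.zeros((a, b, c), dtype=int)
Y[I_idx, J_idx, K_hot] = 1

result = np.zeros(Y.shape[1:] + X.shape[2:], X.dtype)
I, J, K = np.nonzero(Y)
# Unbuffered in-place add: repeated (j, k) pairs accumulate
# instead of overwriting each other.
np.add.at(result, (J, K), X[I, J])

expected = np.einsum('abde,abc->bcde', X, Y)
print(np.array_equal(result, expected))
```

Whether this beats the explicit loop probably depends on the NumPy version and the array sizes, so it is worth timing both.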
This is the test code that I used:
a, b, c, d, e = 1000, 5, 300, 30, 30
X = np.random.randint(10, size=(a,b,d,e))
R = np.random.rand(a, b, c)
K = np.argmax(R, axis=2)
I, J = np.indices((a, b), sparse=True)
Y = np.zeros((a, b, c), int)
Y[I, J, K] = 1
You can do that pretty easily with Numba:
import numba
import numpy as np

@numba.njit('float64[:,:,:,::1](float64[:,:,:,::1], float64[:,:,::1])', fastmath=True, parallel=True)
def compute(x, y):
    na, nb, nd, ne = x.shape
    nc = y.shape[2]
    assert y.shape == (na, nb, nc)
    out = np.zeros((nb, nc, nd, ne))
    # Parallelize over b so that each thread writes to its own slice of out
    for b in numba.prange(nb):
        for a in range(na):
            for c in range(nc):
                yVal = y[a, b, c]
                # Skip the zero entries of the sparse y
                if np.abs(yVal) != 0:
                    for d in range(nd):
                        for e in range(ne):
                            out[b, c, d, e] += x[a, b, d, e] * yVal
    return out
Note that it is faster to iterate over a and then b in sequential code. That being said, for the code to be parallel, the loops have been swapped and the parallelization is performed over b (which is a small axis). A parallel reduction over the a axis would be more efficient, but this is unfortunately not easy to do with Numba (one needs to split the matrices into multiple chunks since there is no simple way to create thread-local matrices).
Note that you can replace values like nd and ne with the actual value (i.e. 30) so that the compiler generates faster code specialized for this matrix size.
Here is the testing code:
np.random.seed(0)
x = np.random.rand(1000, 5, 30, 30)
y = np.random.rand(1000, 5, 300)
y[np.random.rand(*y.shape) > 0.1] = 0.0 # Make it sparse (90% of 0)
%time res = np.einsum('abde,abc->bcde', x, y) # 2.350 s
%time res2 = compute(x, y) # 0.074 s (0.061 s with hand-written sizes)
print(np.allclose(res, res2))
This is about 32 times faster on a 10-core Intel Skylake Xeon processor. It reaches a 38x speed-up with hand-written sizes. It does not scale very well because the parallelization is over the b axis, but using another axis would cause less efficient memory accesses.
If this is not enough, it may be a good idea to transpose x and y first to improve data locality (thanks to a more contiguous access pattern along the a axis) and to scale better (by parallelizing over both the b and c axes). That being said, transpositions are generally expensive, so they would need to be optimized to get an even better speed-up.
I have a tensor input of dimensions (B,C,H,W) and I would like to find a correlation matrix of the input. The code I am using is :
def corr(x):
    """
    x: [B, C, H, W]
    """
    # [B, C, H, W] -> [B, C, H * W]
    x = x.view((x.size(0), x.size(1), -1))
    # estimated covariance
    x = x - x.mean(dim=-1, keepdim=True)
    factor = 1 / (x.shape[-1] - 1)
    cov = factor * (x @ x.transpose(-1, -2))
    return torch.div(cov, torch.diagonal(cov, dim1=-2, dim2=-1))
I rechecked, and the cov variable inside the function looks correct, but when I try to normalize it to get the correlation, the range of the result is very strange: there are values above 1 and below -1, and overall the result does not seem right.
Any suggestions on how to solve the problem?
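One likely cause: a correlation coefficient divides cov[i, j] by the product of the two standard deviations, sqrt(cov[i, i] * cov[j, j]), whereas the code above divides by the variances on the diagonal, whose [B, C] shape also does not line up with the [B, C, C] covariance in the intended way. Below is a minimal sketch of that normalization, written in NumPy rather than PyTorch to keep it easy to check against np.corrcoef; the input sizes are made up, and the same broadcasting idea should carry over to torch tensors.

```python
import numpy as np

def corr(x):
    """x: array of shape [B, C, H, W]; returns [B, C, C] correlation matrices."""
    B, C = x.shape[:2]
    # [B, C, H, W] -> [B, C, H * W]
    x = x.reshape(B, C, -1)
    x = x - x.mean(axis=-1, keepdims=True)
    cov = (x @ x.transpose(0, 2, 1)) / (x.shape[-1] - 1)
    # Per-channel standard deviations, shape [B, C]
    std = np.sqrt(np.diagonal(cov, axis1=-2, axis2=-1))
    # Divide by the outer product std_i * std_j, shape [B, C, C]
    return cov / (std[:, :, None] * std[:, None, :])

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3, 4, 5))  # made-up input
r = corr(x)
print(np.allclose(r[0], np.corrcoef(x[0].reshape(3, -1))))
```

With this normalization the diagonal is 1 and every entry lies in [-1, 1], up to floating-point error.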
I have two vectors X = [a,b,c,d] and Y = [m,n,o]. I'd like to construct a matrix M where each element is an operation on each pair from X and Y. i.e.
M[j,i] = f(X[i], Y[j])
# e.g. where f(x,y) = x-y:
M :=
a-m b-m c-m d-m
a-n b-n c-n d-n
a-o b-o c-o d-o
I imagine I could do this with two tf.while_loop(), but that seems inefficient, I was wondering if there is a more compact and parallel way of doing this.
P.S. There is a slight complication that X and Y are in fact not vectors, but R2. i.e. each element in X and Y is itself a fixed length vector, and f(X, Y) performs f() element wise. Plus there is a batch component too.
I.e.
X.shape => [BATCH, I, K]
Y.shape => [BATCH, J, K]
M[batch, j, i, k] = f( X[batch, i, k], Y[batch, j, k] )
# e.g.:
= X[batch, i, k] - Y[batch, j, k]
this is using the python API btw
I found a way of doing this by increasing rank and using broadcasting. I still don't know if this is the most efficient way of doing it, but it's a heck of a lot better than using tf.while_loop I guess! I'm still open to suggestions / improvements.
X_expand = tf.expand_dims(X, 1)
Y_expand = tf.expand_dims(Y, 2)
# now I think M = f(X_expand, Y_expand) will broadcast each tensor to the higher dimension on each axis, duplicating the data, e.g.:
M = X_expand - Y_expand
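To check the shapes, here is the same expand-and-broadcast idea sketched in NumPy (used here only because its broadcasting rules match; the sizes are made up). The subtraction has to be applied to the expanded arrays for the result to have shape [BATCH, J, I, K].

```python
import numpy as np

BATCH, I, J, K = 2, 4, 3, 5  # illustrative sizes
rng = np.random.default_rng(0)
X = rng.normal(size=(BATCH, I, K))
Y = rng.normal(size=(BATCH, J, K))

X_expand = np.expand_dims(X, 1)  # [BATCH, 1, I, K]
Y_expand = np.expand_dims(Y, 2)  # [BATCH, J, 1, K]
M = X_expand - Y_expand          # [BATCH, J, I, K]

print(M.shape)
# Spot-check one element against the definition M[batch, j, i, k]
print(np.isclose(M[1, 2, 3, 4], X[1, 3, 4] - Y[1, 2, 4]))
```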
Consider three numpy arrays. Each numpy array is three dimensional. We have array X, array Y, and array Z. All these arrays are the same shape. Combining the three matching elements of X, Y, and Z at the same places gives a coordinate. I have a function (not python function, mathematical) which has to run on one of these position vectors and place an output into another three dimensional array called s. So if the arrays were defined as shown below:
X = [[[1,2],[3,4]],
     [[5,6],[7,8]]]
Y = [[[1,2],[3,4]],
     [[5,6],[7,8]]]
Z = [[[1,2],[3,4]],
     [[5,6],[7,8]]]
Then the points to be tested would be:
(1,1,1),(2,2,2),(3,3,3),(4,4,4),(5,5,5),(6,6,6),(7,7,7),(8,8,8)
If the function s was simply a+b+c then the results matrix would be:
s=[[[ 3, 6],[ 9,12]]
[[15,18],[21,24]]]
But this is not the case instead we have a two dimensional numpy array called sv. In the actual problem, sv is a list of vectors of dimension three, like our position vectors. Each position vector must be subtracted from each support vector and the magnitude found of the resulting vector to give the classification of each vector. What numpy operations can be used to do this?
We start with the 3 arrays of components x, y, and z. I will change the values from your example so that they have unique values:
x = np.array([[[1,2],[3,4]],
[[5,6],[7,8]]])
y = x + 10
z = y + 10
Each of the above have shape (2,2,2), but they could be any (n, m, l). This shape will have little impact on our process.
We next combine the three component arrays into a new array p, the "position vector", creating a new leading dimension i that iterates over the three physical dimensions x, y, z:
p = np.array([x, y, z])
so p[0] is x and so on, and p has shape (d, n, m, l) (where d=3 is the physical dimensionality of the vectors).
Now we look at your list of vectors sv which presumably has shape (N, d). Let us use a small number for N:
N = 4
d = 3
sv = np.arange(d*N).reshape(N,d) # a list of N vectors in 3d
OK, the above was a little repetitive, but I want to be clear (and please correct any misunderstandings I may have had from your question).
You want to make some difference, diff in which you take each of the n*m*l vectors buried in p and subtract from it each of the N vectors in sv. This will give you N*n*m*l vectors, which each have d components. We need to align each of these dimensions before we do subtractions.
Basically we want to take p - sv but we must make sure that their shapes match so that the d axis is aligned, and the n, m, l and N axes basically just add up. The way numpy broadcasts is to take the shapes of the array, and aligns them from the end, so the last axis of each is aligned, and so on. To broadcast, each size must match exactly, or must be empty (on the left) or 1. That is, if your shapes were (a, b, c) and (b, c), you would be fine, and the second array would be repeated ("broadcasted") a times to match the a different subarrays of shape (b, c) in the first array. You can use dimensions length 1 which will force the position, so normally two arrays of shape (a, b, c) and (a, b) will not align because the last axis does not match, but you can add a new placeholder axis at the end of the second to give it shape (a, b, 1) which will match to (a, b, c) no matter what the value of c is.
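The broadcasting rules just described can be checked with a quick sketch; the sizes a, b, c below are arbitrary placeholders.

```python
import numpy as np

a, b, c = 2, 3, 4  # arbitrary sizes
A = np.ones((a, b, c))

# (a, b, c) with (b, c): shapes align from the right, so this broadcasts.
shape_ok = (A + np.ones((b, c))).shape

# (a, b, c) with (a, b): the last axes (c vs b) do not match.
try:
    A + np.ones((a, b))
    failed = False
except ValueError:
    failed = True

# A length-1 placeholder axis gives (a, b, 1), which matches any c.
shape_fixed = (A + np.ones((a, b))[..., None]).shape

print(shape_ok, failed, shape_fixed)
```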
We give shape (N, d, 1, 1, 1) to sv which matches the shape (d, n, m, l) of p. This can be done several ways:
sv = sv.reshape(sv.shape + (1, 1, 1))
#or
sv.shape += (1, 1, 1)
#or
sv = sv[..., None, None, None]
Then, we can do the difference:
diff = p - sv[..., None, None, None]
where we have that diff.shape is (N, d, n, m, l). Now we can square it and sum over the second (d) dimension to get the squared magnitude of each vector (apply np.sqrt to the result if you need the magnitude itself):
m = (diff*diff).sum(1)
which of course will have shape (N, n, m, l), or in the example case (4, 2, 2, 2)
So, all together:
import numpy as np

x = np.array([[[1,2],[3,4]],
              [[5,6],[7,8]]])
y = x + 10
z = y + 10
p = np.array([x, y, z])
print(p.shape)     # (3, 2, 2, 2)
N = 4
d = 3
sv = np.arange(d*N).reshape(N,d) # a list of N vectors in 3d
print(sv.shape)    # (4, 3)
diff = p - sv[..., None, None, None]
print(diff.shape)  # (4, 3, 2, 2, 2)
m = (diff*diff).sum(1)
print(m.shape)     # (4, 2, 2, 2)
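If the actual magnitude is wanted rather than its square, one option is np.linalg.norm along the d axis. The sketch below reuses the example values from above; the name dist is just illustrative.

```python
import numpy as np

x = np.array([[[1, 2], [3, 4]],
              [[5, 6], [7, 8]]])
y = x + 10
z = y + 10
p = np.stack([x, y, z])             # shape (d, n, m, l)

N, d = 4, 3
sv = np.arange(d * N).reshape(N, d)  # N vectors in 3d

diff = p - sv[..., None, None, None]  # shape (N, d, n, m, l)
# Euclidean norm over the d axis: one distance per (support vector, position)
dist = np.linalg.norm(diff, axis=1)   # shape (N, n, m, l)

# Position (1, 11, 21) minus support vector (0, 1, 2) is (1, 10, 19)
print(np.isclose(dist[0, 0, 0, 0], np.sqrt(1 + 100 + 361)))
```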
I have 4 non-linear equations with three unknowns X, Y, and Z that I want to solve for. The equations are of the form:
F(m) = X^2 + a(m)Y^2 + b(m)XYcosZ + c(m)XYsinZ
...where a, b and c are constants which are dependent on each value of F in the four equations.
What is the best way to go about solving this?
There are two ways to do this.
1. Use a non-linear solver
2. Linearize the problem and solve it in the least-squares sense
Setup
So, as I understand your question, you know F, a, b, and c at 4 different points, and you want to invert for the model parameters X, Y, and Z. We have 3 unknowns and 4 observed data points, so the problem is overdetermined. Therefore, we'll be solving in the least-squares sense.
It's more common to use the opposite terminology in this case, so let's flip your equation around. Instead of:
F_i = X^2 + a_i Y^2 + b_i X Y cosZ + c_i X Y sinZ
Let's write:
F_i = a^2 + X_i b^2 + Y_i a b cos(c) + Z_i a b sin(c)
Where we know F, X, Y, and Z at 4 different points (e.g. F_0, F_1, ... F_i).
We're just changing the names of the variables, not the equation itself. (This is more for my ease of thinking than anything else.)
Linear Solution
It's actually possible to linearize this equation. You can easily solve for a^2, b^2, a b cos(c), and a b sin(c). To make this a bit easier, let's relabel things yet again:
d = a^2
e = b^2
f = a b cos(c)
g = a b sin(c)
Now the equation is a lot simpler: F_i = d + e X_i + f Y_i + g Z_i. It's easy to do a least-squares linear inversion for d, e, f, and g. We can then get a, b, and c from:
a = sqrt(d)
b = sqrt(e)
c = arctan(g/f)
Okay, let's write this up in matrix form. We're going to translate 4 observations of (the code we'll write will take any number of observations, but let's keep it concrete at the moment):
F_i = d + e X_i + f Y_i + g Z_i
Into:
|F_0|   |1, X_0, Y_0, Z_0|   |d|
|F_1| = |1, X_1, Y_1, Z_1| * |e|
|F_2|   |1, X_2, Y_2, Z_2|   |f|
|F_3|   |1, X_3, Y_3, Z_3|   |g|
Or: F = G * m (I'm a geophysicist, so we use G for "Green's Functions" and m for "Model Parameters". Usually we'd use d for "data" instead of F, as well.)
In python, this would translate to:
def invert(f, x, y, z):
    # One row [1, x_i, y_i, z_i] per observation
    G = np.vstack([np.ones_like(x), x, y, z]).T
    m, _, _, _ = np.linalg.lstsq(G, f, rcond=None)

    d, e, f, g = m
    a = np.sqrt(d)
    b = np.sqrt(e)
    c = np.arctan2(g, f) # Note that `c` will be in radians, not degrees
    return a, b, c
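As a quick sanity check of the linearized inversion, here is a sketch that applies it to noise-free synthetic data. The true parameters (3, 2, 1), the seed, and the number of observations are assumptions chosen for illustration. Note that the square roots pick the positive solutions for a and b and assume the fitted d and e are non-negative.

```python
import numpy as np

def invert(F, x, y, z):
    # Design matrix: one row [1, x_i, y_i, z_i] per observation
    G = np.vstack([np.ones_like(x), x, y, z]).T
    (d, e, f, g), *_ = np.linalg.lstsq(G, F, rcond=None)
    # Map the linear parameters back to a, b, c (c in radians)
    return np.sqrt(d), np.sqrt(e), np.arctan2(g, f)

rng = np.random.default_rng(0)
a_true, b_true, c_true = 3.0, 2.0, 1.0
x, y, z = rng.random((3, 10))
F = (a_true**2
     + x * b_true**2
     + y * a_true * b_true * np.cos(c_true)
     + z * a_true * b_true * np.sin(c_true))

a, b, c = invert(F, x, y, z)
print(np.allclose([a, b, c], [a_true, b_true, c_true]))
```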
Non-linear Solution
You could also solve this using scipy.optimize, as @Joe suggested. The most accessible function in scipy.optimize is scipy.optimize.curve_fit which uses a Levenberg-Marquardt method by default.
Levenberg-Marquardt is a "hill climbing" algorithm (well, it goes downhill, in this case, but the term is used anyway). In a sense, you make an initial guess of the model parameters (all ones, by default in curve_fit) and follow the slope of observed - predicted in your parameter space downhill to the bottom.
Caveat: Picking the right non-linear inversion method, initial guess, and tuning the parameters of the method is very much a "dark art". You only learn it by doing it, and there are a lot of situations where things won't work properly. Levenberg-Marquardt is a good general method if your parameter space is fairly smooth (this one should be). There are a lot of others (including genetic algorithms, neural nets, etc in addition to more common methods like simulated annealing) that are better in other situations. I'm not going to delve into that part here.
There is one common gotcha that some optimization toolkits try to correct for that scipy.optimize doesn't try to handle. If your model parameters have different magnitudes (e.g. a=1, b=1000, c=1e-8), you'll need to rescale things so that they're similar in magnitude. Otherwise scipy.optimize's "hill climbing" algorithms (like LM) won't accurately estimate the local gradient, and will give wildly inaccurate results. For now, I'm assuming that a, b, and c have relatively similar magnitudes. Also, be aware that essentially all non-linear methods require you to make an initial guess, and are sensitive to that guess. I'm leaving it out below (just pass it in as the p0 kwarg to curve_fit) because the default a, b, c = 1, 1, 1 is a fairly accurate guess for a, b, c = 3, 2, 1.
With the caveats out of the way, curve_fit expects to be passed a function, a set of points where the observations were made (as a single ndim x npoints array), and the observed values.
So, if we write the function like this:
def func(x, y, z, a, b, c):
    f = (a**2
         + x * b**2
         + y * a * b * np.cos(c)
         + z * a * b * np.sin(c))
    return f
We'll need to wrap it to accept slightly different arguments before passing it to curve_fit.
In a nutshell:
def nonlinear_invert(f, x, y, z):
    def wrapped_func(observation_points, a, b, c):
        x, y, z = observation_points
        return func(x, y, z, a, b, c)

    xdata = np.vstack([x, y, z])
    model, cov = opt.curve_fit(wrapped_func, xdata, f)
    return model
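If the default starting point is not good enough, an explicit initial guess can be passed through the p0 keyword of curve_fit. The sketch below adds a p0 argument to the wrapper; that extra argument, the guess (2.5, 2.5, 0.5), the seed, and the noise-free synthetic data are all assumptions made for illustration.

```python
import numpy as np
import scipy.optimize as opt

def func(x, y, z, a, b, c):
    return (a**2
            + x * b**2
            + y * a * b * np.cos(c)
            + z * a * b * np.sin(c))

def nonlinear_invert(f, x, y, z, p0=None):
    # curve_fit wants f(xdata, *params), with xdata as one (3, npoints) array
    def wrapped_func(observation_points, a, b, c):
        x, y, z = observation_points
        return func(x, y, z, a, b, c)

    xdata = np.vstack([x, y, z])
    model, cov = opt.curve_fit(wrapped_func, xdata, f, p0=p0)
    return model

rng = np.random.default_rng(0)
x, y, z = rng.random((3, 20))
f = func(x, y, z, 3.0, 2.0, 1.0)  # noise-free synthetic observations

model = nonlinear_invert(f, x, y, z, p0=(2.5, 2.5, 0.5))
print(model)
```

Because the model only depends on a^2, b^2 and the product a*b, other starting points may converge to an equivalent solution such as (-3, -2, 1), which is one more reason to choose the guess deliberately.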
Stand-alone Example of the two methods:
To give you a full implementation, here's an example that
generates randomly distributed points to evaluate the function on,
evaluates the function on those points (using set model parameters),
adds noise to the results,
and then inverts for the model parameters using both the linear and non-linear methods described above.
import numpy as np
import scipy.optimize as opt

def main():
    nobservations = 4
    a, b, c = 3.0, 2.0, 1.0
    f, x, y, z = generate_data(nobservations, a, b, c)

    print('Linear results (should be {}, {}, {}):'.format(a, b, c))
    print(linear_invert(f, x, y, z))

    print('Non-linear results (should be {}, {}, {}):'.format(a, b, c))
    print(nonlinear_invert(f, x, y, z))

def generate_data(nobservations, a, b, c, noise_level=0.01):
    x, y, z = np.random.random((3, nobservations))
    noise = noise_level * np.random.normal(0, noise_level, nobservations)
    f = func(x, y, z, a, b, c) + noise
    return f, x, y, z

def func(x, y, z, a, b, c):
    f = (a**2
         + x * b**2
         + y * a * b * np.cos(c)
         + z * a * b * np.sin(c))
    return f

def linear_invert(f, x, y, z):
    G = np.vstack([np.ones_like(x), x, y, z]).T
    m, _, _, _ = np.linalg.lstsq(G, f, rcond=None)

    d, e, f, g = m
    a = np.sqrt(d)
    b = np.sqrt(e)
    c = np.arctan2(g, f) # Note that `c` will be in radians, not degrees
    return a, b, c

def nonlinear_invert(f, x, y, z):
    # "curve_fit" expects the function to take a slightly different form...
    def wrapped_func(observation_points, a, b, c):
        x, y, z = observation_points
        return func(x, y, z, a, b, c)

    xdata = np.vstack([x, y, z])
    model, cov = opt.curve_fit(wrapped_func, xdata, f)
    return model

main()
You probably want to use scipy's nonlinear solvers; they're really easy to use: http://docs.scipy.org/doc/scipy/reference/optimize.nonlin.html