Consider three numpy arrays X, Y, and Z, each three dimensional and all of the same shape. Combining the three elements of X, Y, and Z at the same position gives a coordinate. I have a function (mathematical, not a Python function) which has to run on each of these position vectors and place an output into another three dimensional array called s. So if the arrays were defined as shown below:
X = [[[1,2],[3,4]],    Y = [[[1,2],[3,4]],    Z = [[[1,2],[3,4]],
     [[5,6],[7,8]]]         [[5,6],[7,8]]]         [[5,6],[7,8]]]
Then the points to be tested would be:
(1,1,1),(2,2,2),(3,3,3),(4,4,4),(5,5,5),(6,6,6),(7,7,7),(8,8,8)
If the function were simply s = a + b + c, then the result array would be:
s = [[[ 3, 6],[ 9,12]],
     [[15,18],[21,24]]]
But this is not the case. Instead we have a two dimensional numpy array called sv: a list of support vectors, each of dimension three like our position vectors. Each position vector must be subtracted from each support vector, and the magnitude of the resulting vector found, to give the classification of each position. What numpy operations can be used to do this?
We start with the 3 arrays of components x, y, and z. I will change the values from your example so that they have unique values:
x = np.array([[[1,2],[3,4]],
              [[5,6],[7,8]]])
y = x + 10
z = y + 10
Each of the above has shape (2, 2, 2), but they could have any shape (n, m, l); this shape has little impact on the process.
We next combine the three component arrays into a new array p, the "position vector", creating a new dimension that indexes the three physical components x, y, z:
p = np.array([x, y, z])
so p[0] is x and so on, and p has shape (d, n, m, l) (where d=3 is the physical dimensionality of the vectors).
Now we look at your list of vectors sv which presumably has shape (N, d). Let us use a small number for N:
N = 4
d = 3
sv = np.arange(d*N).reshape(N,d) # a list of N vectors in 3d
OK, the above was a little repetitive, but I want to be clear (and please correct any misunderstandings I may have had from your question).
You want to compute some difference diff in which you take each of the n*m*l vectors buried in p and subtract from it each of the N vectors in sv. This gives N*n*m*l vectors, each with d components. We need to align the dimensions before we subtract.
Basically we want to take p - sv, but we must make sure the shapes match so that the d axes are aligned, while the n, m, l and N axes simply accumulate. Numpy broadcasts by taking the shapes of the arrays and aligning them from the end: the last axis of each is aligned, and so on. To broadcast, each pair of sizes must match exactly, or one of them must be missing (on the left) or equal to 1. That is, if your shapes were (a, b, c) and (b, c), you would be fine, and the second array would be repeated ("broadcast") a times to match the a subarrays of shape (b, c) in the first array. You can also use axes of length 1 to force the alignment: two arrays of shapes (a, b, c) and (a, b) will not align, because the last axes do not match, but adding a placeholder axis at the end of the second to give it shape (a, b, 1) lets it match (a, b, c) no matter what the value of c is.
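As a concrete illustration of these broadcasting rules (the shapes here are chosen arbitrarily):

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)   # shape (2, 3, 4)
b = np.ones((3, 4))                  # shape (3, 4): aligns with a's last two axes
print((a + b).shape)                 # (2, 3, 4) -- b is repeated 2 times

c = np.zeros((2, 3))                 # shape (2, 3): last axes (3 vs 4) don't align
d = c[:, :, None]                    # shape (2, 3, 1): placeholder axis added
print((a + d).shape)                 # (2, 3, 4)
```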
We give sv the shape (N, d, 1, 1, 1), which is broadcast-compatible with the shape (d, n, m, l) of p. This can be done several ways:
sv = sv.reshape(sv.shape + (1, 1, 1))
#or
sv.shape += (1, 1, 1)
#or
sv = sv[..., None, None, None]
Then, with sv reshaped as above, we can take the difference:
diff = p - sv
where diff.shape is (N, d, n, m, l). Now we can square it and sum over the second (d) axis to get the squared magnitude of each vector (apply np.sqrt if you need the actual distances):
m = (diff*diff).sum(1)
which of course will have shape (N, n, m, l), or in the example case (4, 2, 2, 2).
So, all together:
import numpy as np

x = np.array([[[1,2],[3,4]],
              [[5,6],[7,8]]])
y = x + 10
z = y + 10
p = np.array([x, y, z])
print(p.shape)

N = 4
d = 3
sv = np.arange(d*N).reshape(N, d)  # a list of N vectors in 3d
print(sv.shape)

diff = p - sv[..., None, None, None]
print(diff.shape)

m = (diff*diff).sum(1)
print(m.shape)
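As a quick sanity check of the broadcast result, we can verify one entry by hand (values follow the example above):

```python
import numpy as np

x = np.array([[[1, 2], [3, 4]],
              [[5, 6], [7, 8]]])
y = x + 10
z = y + 10
p = np.array([x, y, z])                      # shape (3, 2, 2, 2)
N, d = 4, 3
sv = np.arange(d * N).reshape(N, d)          # sv[0] is [0, 1, 2]
diff = p - sv[..., None, None, None]
m = (diff * diff).sum(1)                     # shape (4, 2, 2, 2)

# the position vector at grid point (0,0,0) is (1, 11, 21);
# its squared distance from sv[0] = (0, 1, 2) is 1 + 100 + 361 = 462
print(m[0, 0, 0, 0])                         # 462
```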
Let's consider a function of two variables f(x1, x2), where x1 ranges over a vector v1 and x2 ranges over a vector v2.
If f(x1, x2) = np.exp(x1 + x2), we can represent this function in Python as a matrix by means of the command numpy.meshgrid like this:
xx, yy = numpy.meshgrid(v1, v2)
M = numpy.exp(xx + yy)
This way, M is a representation of the function f over the cartesian product "v1 x v2", since M[j,i] = f(v1[i],v2[j]) (note that meshgrid puts the v1 axis second).
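A small runnable check of this (note that with meshgrid's default indexing the v2 index comes first):

```python
import numpy as np

v1 = np.array([0.0, 1.0, 2.0])
v2 = np.array([10.0, 20.0])
xx, yy = np.meshgrid(v1, v2)     # both of shape (len(v2), len(v1))
M = np.exp(xx + yy)

i, j = 2, 1                      # index into v1 and v2 respectively
print(np.isclose(M[j, i], np.exp(v1[i] + v2[j])))  # True
```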
But this works because both sums and exponential work in parallel componentwise. My question is:
if my variable is x = numpy.array([x1, x2]) and f is a quadratic function f(x) = x.T @ np.dot(Q, x), where Q is a 2x2 matrix, how can I do the same thing with the meshgrid function (i.e. calculate all the values of f on "v1 x v2" at once)?
Please let me know if I should include more details!
def quad(x, y, q):
    """Return an array A of shape (len(x), len(y)) with
    values A[i,j] = [x[i],y[j]] @ q @ [x[i],y[j]]
    x, y: 1d arrays,
    q: an array of shape (2,2)"""
    from numpy import array, meshgrid, einsum
    a = array(meshgrid(x, y)).transpose()
    return einsum('ijk,kn,ijn->ij', a, q, a)
Notes
meshgrid produces 2 arrays of shape (len(y), len(x)), the first of which holds the x values along its second dimension. If we apply np.array to this pair, a 3d array of shape (2, len(y), len(x)) is produced. With transpose we obtain an array where the element indexed by [i,j,k] is x[i] if k==0 else y[j] (k is 0 or 1, i.e. the first or second array from meshgrid).
With 'ijk,kn,ijn->ij' we tell einsum to return the sum written below for each i, j:
sum(a[i,j,k]*q[k,n]*a[i,j,n] for k in range(2) for n in range(2))
Note, that a[i,j] == [x[i], y[j]].
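A quick sanity check of quad against an explicit evaluation (the function is repeated here so the snippet runs on its own):

```python
import numpy as np

def quad(x, y, q):
    a = np.array(np.meshgrid(x, y)).transpose()
    return np.einsum('ijk,kn,ijn->ij', a, q, a)

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0, 5.0])
Q = np.array([[2.0, 1.0],
              [1.0, 3.0]])
A = quad(x, y, Q)                 # shape (len(x), len(y))

v = np.array([x[1], y[0]])        # the point (2, 3)
print(np.isclose(A[1, 0], v @ Q @ v))  # True
```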
I have a couple of orthonormal vectors. I would like to extend this 2-dimensional basis to a larger one. What is the fastest way of doing this in Python with NumPy?
My thoughts were the following: generate a random vector of the required size (new_dimension > 2), perform Gram-Schmidt by subtracting the scaled dot products with the previous two, and repeat. I doubt that this is the quickest way, though...
You didn't specify the dimension of your space. If it is 3, then you can simply use the cross product of your two vectors. If it is not, then see below.
Example in 3-D
import numpy as np

# 1. setup: an orthonormal basis of two vectors a, b
np.random.seed(0)
a, b = np.random.uniform(size=(2, 3))
a /= np.linalg.norm(a)
b -= a.dot(b)*a
b /= np.linalg.norm(b)
# 2. check:
>>> np.allclose([1,1,0,0], [a.dot(a), b.dot(b), a.dot(b), b.dot(a)])
True
Then, making a new vector:
# 3. solve
c = np.cross(a, b)
# 4. checks
>>> np.allclose([1,0,0], [c.dot(c), c.dot(a), c.dot(b)])
True
If the dimension of your vectors is higher, then you can pick any vector that is not in the plane defined by a, b, subtract its projection onto that plane, then normalize.
Example in higher dimensions
# 1. setup
n = 5
np.random.seed(0)
a, b = np.random.uniform(size=(2, n))
a /= np.linalg.norm(a)
b -= a.dot(b)*a
b /= np.linalg.norm(b)
# 2. check
assert np.allclose([1,1,0,0], [a.dot(a), b.dot(b), a.dot(b), b.dot(a)])
Then:
# 3. solve
ab = np.c_[a, b]
c = np.roll(a + b, 1)  # any vector unlikely to be 0 or a
                       # linear combination of a and b
c -= (c @ ab) @ ab.T
c /= np.linalg.norm(c)
# 4. check
abc = np.c_[a, b, c]
>>> np.allclose(np.eye(3), abc.T @ abc)
True
Generalization: complement an m-basis in a n-D space
In an n-dimensional space, given an (n, m) orthonormal basis x with 1 <= m < n (in other words, m orthonormal vectors in an n-dimensional space put together as columns of x): find n - m vectors that are orthonormal and all orthogonal to the columns of x.
We can do this in one shot using SVD.
# 1. setup
# we use SVD for the setup as well, for convenience,
# but it's not necessary at all. It is sufficient that
# x.T @ x == I
n, m = 6, 2 # for example
x, _, _ = np.linalg.svd(np.random.uniform(size=(n, m)))
x = x[:, :m]
# 2. check
>>> np.allclose(x.T @ x, np.eye(m))
True
>>> x.shape
(6, 2)
So, at this point, x is orthonormal and of shape (n, m).
Find y to be one (of possibly many) orthonormal basis that is orthogonal to x:
# 3. solve
u, s, v = np.linalg.svd(x)
y = u[:, m:]
# 4. check
>>> np.allclose(y.T @ y, np.eye(n-m))
True
>>> np.allclose(x.T @ y, 0)
True
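Putting the setup and the SVD complement together into one self-contained run (same shapes, n=6, m=2), and checking that x and y together form a complete orthonormal basis:

```python
import numpy as np

np.random.seed(0)
n, m = 6, 2
x, _, _ = np.linalg.svd(np.random.uniform(size=(n, m)))
x = x[:, :m]                      # an (n, m) orthonormal basis

u, s, v = np.linalg.svd(x)
y = u[:, m:]                      # (n, n-m) orthonormal complement of x

full = np.c_[x, y]                # all n vectors side by side
print(np.allclose(full.T @ full, np.eye(n)))  # True
```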
I have a 4d array (called a) with shape (35, 2000, 60, 180) that I need to correlate with a 1d array (called b) of length 2000, while detrending and smoothing both arrays.
I managed to use a nested for-loop to correlate the 1d array with a 3d array (called c) of shape (x, y, z) by iterating through each point (y, z), detrending c[x, y, :], and storing the correlation coefficient between b and the detrended series at that point.
However, using a 3x-nested for-loop to calculate the correlation with a 4d array takes too much time to compute. Is there a more efficient way to produce an array that contains the correlation coefficient between each timeseries in a 4d array and a 1d array?
Here is my code for calculating the correlation with only 3 dimensions involved. It takes around a minute to execute on an array of shape (2000, 60, 180).
Also, the larger array has nan's, in which case I set the correlation for the entire x, y point to nan.
import numpy as np
import pandas as pd
from scipy import signal, stats

def correlation_detrended(cs, ts, smooth=360):
    cs_det = cs
    ts_det = ts
    signal.detrend(ts_det[~np.isnan(ts_det)], overwrite_data=True)
    ts_det = pd.DataFrame(ts_det).rolling(smooth, center=True).mean().to_numpy()[:, 0]
    correlation = np.empty(cs_det.shape[1:])  # one coefficient per (y, z) point
    for i in range(len(cs_det[0, :, 0])):
        for j in range(len(cs_det[0, i, :])):
            print(str(i) + ":" + str(j))
            if np.any(np.isnan(cs_det[:, i, j])):
                r, p = (np.nan, np.nan)
            else:
                signal.detrend(cs_det[:, i, j], overwrite_data=True)
                cs_det[:, i, j] = pd.DataFrame(cs_det[:, i, j]).rolling(smooth, center=True).mean().to_numpy()[:, 0]
                offset = int(smooth/2 + 120)
                r, p = stats.pearsonr(cs_det[offset:-offset, i, j], ts_det[offset:-offset])
            correlation[i, j] = r
    return correlation
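One way to avoid the inner loops is to vectorize just the correlation step over the whole grid. This is a sketch, not the full detrend/smooth pipeline above, and corr_grid is a name introduced here for illustration:

```python
import numpy as np

def corr_grid(cs, ts):
    """Pearson r between ts (shape (T,)) and every series cs[:, i, j]
    of a (T, ny, nx) array, computed without Python loops."""
    cs0 = cs - cs.mean(axis=0)
    ts0 = ts - ts.mean()
    num = np.tensordot(ts0, cs0, axes=(0, 0))              # shape (ny, nx)
    den = np.sqrt((cs0 ** 2).sum(axis=0) * (ts0 ** 2).sum())
    return num / den

rng = np.random.default_rng(0)
cs = rng.normal(size=(200, 6, 8))
ts = rng.normal(size=200)
r = corr_grid(cs, ts)
print(np.isclose(r[2, 3], np.corrcoef(cs[:, 2, 3], ts)[0, 1]))  # True
```

The detrending and rolling mean can still be applied to the whole array first (scipy.signal.detrend takes an axis argument), after which a single call like this replaces the nested loops.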
I have two vectors X = [a,b,c,d] and Y = [m,n,o]. I'd like to construct a matrix M where each element is an operation on each pair from X and Y. i.e.
M[j,i] = f(X[i], Y[j])
# e.g. where f(x,y) = x-y:
M :=
a-m b-m c-m d-m
a-n b-n c-n d-n
a-o b-o c-o d-o
I imagine I could do this with two tf.while_loop(), but that seems inefficient, I was wondering if there is a more compact and parallel way of doing this.
P.S. There is a slight complication: X and Y are in fact not vectors but rank-2 tensors, i.e. each element in X and Y is itself a fixed-length vector, and f(X, Y) is applied element-wise. Plus there is a batch component too.
I.e.
X.shape => [BATCH, I, K]
Y.shape => [BATCH, J, K]
M[batch, j, i, k] = f( X[batch, i, k], Y[batch, j, k] )
# e.g.:
= X[batch, i, k] - Y[batch, j, k]
This is using the Python API, by the way.
I found a way of doing this by increasing rank and using broadcasting. I still don't know if this is the most efficient way of doing it, but it's a heck of a lot better than using tf.while_loop I guess! I'm still open to suggestions / improvements.
X_expand = tf.expand_dims(X, 1)
Y_expand = tf.expand_dims(Y, 2)
# now M = X_expand - Y_expand will broadcast each tensor to the larger
# size on each axis, duplicating the data, e.g.:
M = X_expand - Y_expand
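The same shape logic can be checked with plain NumPy (the broadcasting rules are identical to TensorFlow's):

```python
import numpy as np

BATCH, I, J, K = 2, 4, 3, 5
X = np.arange(BATCH * I * K, dtype=float).reshape(BATCH, I, K)
Y = np.arange(BATCH * J * K, dtype=float).reshape(BATCH, J, K)

X_expand = X[:, None, :, :]       # (BATCH, 1, I, K), like tf.expand_dims(X, 1)
Y_expand = Y[:, :, None, :]       # (BATCH, J, 1, K), like tf.expand_dims(Y, 2)
M = X_expand - Y_expand           # broadcasts to (BATCH, J, I, K)

b, i, j, k = 1, 2, 1, 3
print(M.shape)                                        # (2, 3, 4, 5)
print(M[b, j, i, k] == X[b, i, k] - Y[b, j, k])       # True
```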
With
>>> a.shape
(207, 155, 3)
What does this numpy code do to the numpy array a?
a = a.T.reshape(self.channels,-1).reshape(-1)
>>> a.shape
(96255,)
I'm going to suppose that a represents an image of size 155×207 pixels, with 3 colour channels per pixel:
>>> height, width, channels = a.shape
(Note that I'm assuming here that the first axis is vertical and the second axis is horizontal: see "Multidimensional Array Indexing Order Issues" for an explanation.)
>>> b = a.T
>>> b.shape
(3, 155, 207)
a.T returns the transposed array. But actually under the hood it doesn't alter the image data in any way. A NumPy array has two parts: a data buffer containing the raw numerical data, and a view which describes how to index the data buffer. When you reshape or transpose an array, NumPy leaves the data buffer alone and creates a new view describing the new way to index the same data. (See here for a longer explanation.)
So a indexes the image using three axes (y, x, c), and b indexes the same image using the same three axes in the opposite order (c, x, y):
>>> y, x, k = 200, 100, 1
>>> a[y, x, k] == b[k, x, y]
True
The first call to numpy.reshape:
>>> c = b.reshape(3, -1)
>>> c.shape
(3, 32085)
flattens the last two indices into one (with the third index changing fastest), so that c indexes the image using two axes (channel, x × height + y):
>>> a[y, x, k] == c[k, x * height + y]
True
The second reshape:
>>> d = c.reshape(-1)
>>> d.shape
(96255,)
flattens the remaining two indices into one, so that d indexes the image using the single axis ((k × width) + x) × height + y:
>>> a[y, x, k] == d[((k * width) + x) * height + y]
True
Note that the whole operation could be done in just one step using ndarray.flatten:
>>> (a.flatten(order='F') == d).all()
True
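All of the index identities above can be verified in one self-contained run (using k for the channel index so it does not clash with the array c):

```python
import numpy as np

height, width, channels = 207, 155, 3
a = np.arange(height * width * channels).reshape(height, width, channels)
d = a.T.reshape(channels, -1).reshape(-1)       # the chained operation

y, x, k = 200, 100, 1
print(d[((k * width) + x) * height + y] == a[y, x, k])  # True
print((a.flatten(order='F') == d).all())                # True
```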