I have a matrix X of dimensions (30x8100) and another one Y of dimensions (1x8100). I want to generate an array containing the difference between them (X[1]-Y, X[2]-Y,..., X[30]-Y)
Can anyone help?
All you need for that is
X - Y
Since several people have offered answers that seem to try to make the shapes match manually, I should explain:
Numpy will automatically expand Y's shape so that it matches with that of X. This is called broadcasting, and it usually does a very good job of guessing what should be done. In ambiguous cases, an axis keyword can be applied to tell it which direction to do things. Here, since Y has a dimension of length 1, that is the axis that is expanded to be length 30 to match with X's shape.
For example,
In [87]: import numpy as np
In [88]: n, m = 3, 5
In [89]: x = np.arange(n*m).reshape(n,m)
In [90]: y = np.arange(m)[None,...]
In [91]: x.shape
Out[91]: (3, 5)
In [92]: y.shape
Out[92]: (1, 5)
In [93]: (x-y).shape
Out[93]: (3, 5)
In [106]: x
Out[106]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
In [107]: y
Out[107]: array([[0, 1, 2, 3, 4]])
In [108]: x-y
Out[108]:
array([[ 0, 0, 0, 0, 0],
[ 5, 5, 5, 5, 5],
[10, 10, 10, 10, 10]])
But this is not really a euclidean distance, as your title seems to suggest you want:
df = np.asarray(x - y) # the difference between the images
dst = np.sqrt(np.sum(df**2, axis=1)) # their euclidean distances
use array and use numpy broadcasting in order to subtract it from Y
init the matrix:
>>> from numpy import *
>>> a = array([[1,2,3],[4,5,6]])
Accessing the second row in a:
>>> a[1]
array([4, 5, 6])
Subtract array from Y
>>> Y = array([3,9,0])
>>> a - Y
array([[-2, -7, 3],
[ 1, -4, 6]])
Just iterate rows from your numpy array and you can actually just subtract them and numpy will make a new array with the differences!
import numpy as np
final_array = []
#X is a numpy array that is 30X8100 and Y is a numpy array that is 1X8100
for row in X:
output = row - Y
final_array.append(output)
output will be your resulting array of X[0] - Y, X[1] - Y etc. Now your final_array will be an array with 30 arrays inside, each that have the values of the X-Y that you need! Simple as that. Just make sure you convert your matrices to a numpy arrays first
Edit: Since numpy broadcasting will do the iteration, all you need is one line once you have your two arrays:
final_array = X - Y
And then that is your array with the differences!
a1 = numpy.array(X) #make sure you have a numpy array like [[1,2,3],[4,5,6],...]
a2 = numpy.array(Y) #make sure you have a 1d numpy array like [1,2,3,...]
a2 = [a2] * len(a1[0]) #make a2 as wide as a1
a2 = numpy.array(zip(*a2)) #transpose it (a2 is now same shape as a1)
print a1-a2 #idiomatic difference between a1 and a2 (or X and Y)
Related
I want to calculate the following:
but I have no idea how to do this in python, I do not want to implement this manually but use a predefined function for this, something from numpy for example.
But numpy seems to ignore that x.T should be transposed.
Code:
import numpy as np
x = np.array([1, 5])
print(np.dot(x, x.T)) # = 26, This is not the matrix it should be!
While your vectors are defined as 1-d arrays, you can use np.outer:
np.outer(x, x.T)
> array([[ 1, 5],
> [ 5, 25]])
Alternatively, you could also define your vectors as matrices and use normal matrix multiplication:
x = np.array([[1], [5]])
x # x.T
> array([[ 1, 5],
> [ 5, 25]])
You can do:
x = np.array([[1], [5]])
print(np.dot(x, x.T))
Your original x is of shape (2,), while you need a shape of (2,1). Another way is reshaping your x:
x = np.array([1, 5]).reshape(-1,1)
print(np.dot(x, x.T))
.reshape(-1,1) reshapes your array to have 1 column and implicitely takes care of number of rows.
output:
[[ 1 5]
[ 5 25]]
np.matmul(x[:, np.newaxis], [x])
I want to "multiply" (for lack of better description) a numpy array X of size M with a smaller numpy array Y of size N, for every N elements in X. Then, I want to sum the resulting array (almost like a dotproduct).
I hope the example makes it more clear:
Example
X = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Y = [1,2,3]
Z = mymul(X, Y)
= [0*1, 1*2, 2*3, 3*1, 4*2, 5*3, 6*1, 7*2, 8*3, 9*1]
= [ 0, 2, 6, 3, 8, 15, 6, 14, 24, 9]
result = sum(Z) = 87
X and Y can be of varying lengths and Y is always smaller than X, but not necessarily divisible (e.g. M % N != 0)
I have some solutions but they are quite slow. I'm hoping there is a faster way to do this.
import numpy as np
X = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int)
Y = np.array([1,2,3], dtype=int)
# these work but are slow for large X, Y
# simple for-loop
t = 0
for i in range(len(X)):
t += X[i] * Y[i % len(Y)]
print(t) #87
# extend Y M/N times so np.dot can be applied
Ytiled = np.tile(Y, int(np.ceil(len(X) / len(Y))))[:len(X)]
t = np.dot(X, Ytiled)
print(t) #87
Resize Y to same length as X and then use matrix-multiplication -
In [52]: np.dot(X, np.resize(Y,len(X)))
Out[52]: 87
Alternative to using np.resize would be with tiling. Hence, np.tile(Y,(m+n-1)//n)[:m] for m,n = len(X), len(Y), could replace np.resize(Y,len(X)) for a faster one.
Another without resizing Y to achieve memory-efficiency -
In [79]: m,n = len(X), len(Y)
In [80]: s = n*(m//n)
In [81]: X2D = X[:s].reshape(-1,n)
In [82]: X2D.dot(Y).sum() + np.dot(X[s:],Y[:m-s])
Out[82]: 87
Alternatively, we can use np.einsum('ij,j->',X2D,Y) to replace X2D.dot(Y).sum().
You can use convolve (documentation):
np.convolve(X, Y[::-1], 'same')[::len(Y)].sum()
Remember to reverse the second array.
I have a 1D array in NumPy that implicitly represents some 2D data in row-major order. Here's a trivial example:
import numpy as np
# My data looks like [[1,2,3,4], [5,6,7,8]]
a = np.array([1,2,3,4,5,6,7,8])
I want to get a 1D array in column-major order (ie. b = [1,5,2,6,3,7,4,8] in the example above).
Normally, I would just do the following:
mat = np.reshape(a, (-1,4))
b = mat.flatten('F')
Unfortunately, the length of my input array is not an exact multiple of the row length I want (ie. a = [1,2,3,4,5,6,7]), so I can't call reshape. I want to keep that extra data, though, which might be quite a lot since my rows are pretty long. Is there any straightforward way to do this in NumPy?
The simplest way I can think of is not to try and use reshape with methods such as ravel('F'), but just to concatenate sliced views of your array.
For example:
>>> cols = 4
>>> a = np.array([1,2,3,4,5,6,7])
>>> np.concatenate([a[i::cols] for i in range(cols)])
array([1, 5, 2, 6, 3, 7, 4])
This works for any length of array and any number of columns:
>>> cols = 5
>>> b = np.arange(17)
>>> np.concatenate([b[i::cols] for i in range(cols)])
array([ 0, 5, 10, 15, 1, 6, 11, 16, 2, 7, 12, 3, 8, 13, 4, 9, 14])
Alternatively, use as_strided to reshape. The fact that the array a is too small to fit the (2, 4) shape doesn't matter: you'll just get junk (i.e. whatever's in memory) in the last place:
>>> np.lib.stride_tricks.as_strided(a, shape=(2, 4))
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 168430121]])
>>> _.flatten('F')[:7]
array([1, 5, 2, 6, 3, 7, 4])
In the general case, given an array b and a desired number of columns cols you can do this:
>>> x = np.lib.stride_tricks.as_strided(b, shape=(len(b)//cols + 1, cols)) # reshape to min 2d array needed to hold array b
>>> np.concatenate((x[:,:len(b)%cols].ravel('F'), x[:-1, len(b)%cols:].ravel('F')))
This unravels the "good" part of the array (those columns not containing junk values) and the bad part (except for the junk values which lie in the bottom row) and concatenates the two unraveled arrays. For example:
>>> cols = 5
>>> b = np.arange(17)
>>> x = np.lib.stride_tricks.as_strided(b, shape=(len(b)//cols + 1, cols))
>>> np.concatenate((x[:,:len(b)%cols].ravel('F'), x[:-1, len(b)%cols:].ravel('F')))
array([ 0, 5, 10, 15, 1, 6, 11, 16, 2, 7, 12, 3, 8, 13, 4, 9, 14])
Use some value to represent null to make the array be a multiple of how you want to split it. If casting to float is acceptable, you could use nan's to represent the added elements that represent nulls. Then reshape to 2D, call transpose, and reshape to 1D. Then eliminate the nulls.
import numpy as np
a = np.array([1,2,3,4,5,6,7]) # input
b = np.concatenate( (a, [np.NaN]) ) # add a NaN to make it 8 = 4x2
c = b.reshape(2,4).transpose().reshape(8,) # reshape to 2x4, transpose, reshape to 8x1
d = c[-np.isnan(c)] # remove NaN
print d
[ 1. 5. 2. 6. 3. 7. 4.]
Using numpy, I want to multiple a matrix x by a column array y, elementwise:
x = numpy.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
y = numpy.array([1, 2, 3])
z = numpy.multiply(x, y)
print z
This gives the output as if y is a row array:
[[ 1 4 9]
[ 4 10 18]
[ 7 16 27]]
However, I want the output as if y is a column array:
[[ 1 2 3]
[ 8 10 12]
[21 24 27]]
So how can I manipulate y to achieve this? If I use:
y = numpy.transpose(y)
then y remains the same shape.
Enclose it in another list to make it 2D:
>>> y2 = numpy.transpose([y])
>>> y2
array([[1],
[2],
[3]])
>>> numpy.multiply(x, y2)
array([[ 1, 2, 3],
[ 8, 10, 12],
[21, 24, 27]])
The reason you can't transpose y is because it's initialized as a 1-D array. Transposing an array only makes sense in two (or more) dimensions.
To get around these mixed-dimension issues, numpy actually provides a set of convenience functions to sanitize your inputs:
y = np.array([1, 2, 3])
y1 = np.atleast_1d(y) # Converts array to 1-D if less than that
y2 = np.atleast_2d(y) # Converts array to 2-D if less than that
y3 = np.atleast_3d(y) # Converts array to 3-D if less than that
I also think np.column_stack falls under this convenience category, as it puts together 1-D and 2-D arrays as columns like you would expect, rather than having to figure out the right series of reshapes and stacks.
y1 = np.array([1, 2, 3])
y2 = np.array([2, 4, 6])
y3 = np.array([[2, 6], [2, 4], [7, 7]])
y = np.column_stack((y1, y2, y3))
I think these functions aren't as well known as they should be, and I find them much easier, more flexible, and safer than manually fiddling with reshape or array dimensions. They also avoid making copies when possible, which can be a small performance speedup.
To answer your question, you should use np.atleast_2d to convert your array to a 2-D array, then transpose it.
y = np.atleast_2d(y).T
The other way to quickly do it without worrying about y is to transpose x then transpose the result back.
z = (x.T * y).T
Though this can obfuscate the intent of the code. It is probably faster though if performance is important.
If performance is important, that can inform which method you want to use. Some timings on my computer:
%timeit x * np.atleast_2d(y).T
100000 loops, best of 3: 7.98 us per loop
%timeit (x.T*y).T
100000 loops, best of 3: 3.27 us per loop
%timeit x * np.transpose([y])
10000 loops, best of 3: 20.2 us per loop
%timeit x * y.reshape(-1, 1)
100000 loops, best of 3: 3.66 us per loop
You can use reshape:
y = y.reshape(-1,1)
The y variable has a shape of (3,). If you construct it this way:
y = numpy.array([1, 2, 3], ndmin=2)
...it will have a shape of (1,3), which you can transpose to get the result you want:
y = numpy.array([1, 2, 3], ndmin=2).T
z = numpy.multiply(x, y)
Given three numpy arrays: one multidimensional array x, one vector y with trailing singleton dimension, and one vector z without trailing singleton dimension,
x = np.zeros((M,N))
y = np.zeros((M,1))
z = np.zeros((M,))
the behaviour of broadcast operations changes depending on vector representation and context:
x[:,0] = y # error cannot broadcast from shape (M,1) into shape (M)
x[:,0] = z # OK
x[:,0] += y # error non-broadcastable output with shape (M) doesn't match
# broadcast shape(M,M)
x[:,0] += z # OK
x - y # OK
x - z # error cannot broadcast from shape (M,N) into shape (M)
I realize I can do the following:
x - z[:,None] # OK
but I don't understand what this explicit notation buys me. It certainly doesn't buy readability. I don't understand why the expression x - y is OK, but x - z is ambiguous.
Why does Numpy treat vectors with or without trailing singleton dimensions differently?
edit: The documentation states that: two dimensions are compatible when they are equal, or one of them is 1, but y and z are both functionally M x 1 vectors, since an M x 0 vector doesn't contain any elements.
The convention is that broadcasting will insert singleton dimensions at the beginning of an array's shape. This makes it convenience to perform operations over the last dimensions of an array, so (x.T - z).T should work.
If it were to automatically decide which axis of x was matched by z, an operation like x - z would result in an error if and only if N == M, making code harder to test. So the convention allows some convenience, while being robust to some error.
If you don't like the z[:, None] shorthand, perhaps you find z[:, np.newaxis] clearer.
For an assignment like x[:,0] = y to work, you can use x[:,0:1] = y instead.
Using the Numpy matrix interface as opposed to the array interface yields the desired broadcasting behaviours:
x = np.asmatrix(np.zeros((M,N)))
y = np.asmatrix(np.zeros((M,1)))
x[:,0] = y # OK
x[:,0] = y[:,0] # OK
x[:,0] = y[:,0:1] # OK
x[:,0] += y # OK
x - y # OK
x - np.mean(x, axis=0) # OK
x - np.mean(x, axis=1) # OK
One benefit of treating (M,1) and (M,) differently is to enable you to specify what dimensions to align and what dimensions to broadcast
Say you have:
a = np.arange(4)
b = np.arange(16).reshape(4,4)
# i.e a = array([0, 1, 2, 3])
# i.e b = array([[ 0, 1, 2, 3],
# [ 4, 5, 6, 7],
# [ 8, 9, 10, 11],
# [12, 13, 14, 15]])
When you do c = a + b, a and b will be aligned in axis=1 and a will be broadcasted along axis=0:
array([[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3]])
But what if you want to align a and b in axis=0 and broadcast in axis=1 ?
array([[0, 0, 0, 0],
[1, 1, 1, 1],
[2, 2, 2, 2],
[3, 3, 3, 3]])
(M,1) vs (M,) difference enables you to specify which dimension to align and broadcast.
(i.e if (M,1) and (M,) are treated the same, how do you tell numpy you want to broadcast on axis=1?)