Column normalization behaves differently in higher dimensions - python

I am trying to subtract the minimum value of an ndarray, for an arbitrary number of dimensions. It seems to work with 3 dimensions, but not with 4.
3 Dimensional Case:
x1 = np.arange(27.0).reshape((3, 3, 3))
# x1 is (3,3,3)
x2 = x1.min(axis=(1,2))
# x2 is (3,)
(x1 - x2).shape
#Output: (3, 3, 3)
(x1 - x2).shape == x1.shape
#As expected: True
4 Dimensional Case:
mat1 = np.random.rand(10,5,2,1)
# mat1 is (10,5,2,1)
mat2 = mat1.min(axis = (1,2,3))
# mat2 is (10,)
(mat1 - mat2).shape == mat1.shape
# Should be True, but
#Output: False

Your first example is misleading because all dimensions are the same size. That hides the kind of error that you see in the 2nd. Examples with different size dimensions are better at catching errors:
In [530]: x1 = np.arange(2*3*4).reshape(2,3,4)
In [531]: x2 = x1.min(axis=(1,2))
In [532]: x2.shape
Out[532]: (2,)
In [533]: x1-x2
...
ValueError: operands could not be broadcast together with shapes (2,3,4) (2,)
Compare that with a case where I tell it to keep dimensions:
In [534]: x2 = x1.min(axis=(1,2),keepdims=True)
In [535]: x2.shape
Out[535]: (2, 1, 1)
In [536]: x1-x2
Out[536]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]]])
The basic rule of broadcasting: a (2,) array can expand to (1,1,2) if needed, but not to (2,1,1).
But why doesn't the 2nd case produce an error?
In [539]: mat1.shape
Out[539]: (10, 5, 2, 1)
In [540]: mat2.shape
Out[540]: (10,)
In [541]: (mat1-mat2).shape
Out[541]: (10, 5, 2, 10)
It's that trailing size 1, which can broadcast with the (10,):
(10,5,2,1) (10,) => (10,5,2,1)(1,1,1,10) => (10,5,2,10)
It's as though you'd added a newaxis to a 3d array:
mat1 = np.random.rand(10,5,2)
mat1[...,None] - mat2
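To tie this back to the question, here is a minimal sketch of a normalization that should work for any number of dimensions, assuming the goal is to subtract from each slice along the first axis that slice's own minimum. The helper name subtract_min is made up for illustration; it relies only on keepdims=True as shown above.

```python
import numpy as np

def subtract_min(arr):
    """Subtract from each slice along axis 0 that slice's minimum (hypothetical helper)."""
    # Reduce over every axis except the first. keepdims=True leaves size-1
    # axes in place, so the result has shape (n, 1, ..., 1) and broadcasts
    # against arr regardless of how many dimensions arr has.
    reduce_axes = tuple(range(1, arr.ndim))
    return arr - arr.min(axis=reduce_axes, keepdims=True)

mat1 = np.random.rand(10, 5, 2, 1)
normalized = subtract_min(mat1)
print(normalized.shape == mat1.shape)  # True
```

After the subtraction, the minimum of every slice normalized[i] is exactly 0, which is a quick way to check that the right axes were reduced.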

Related

Numpy Value error x and y must have same first dimension, but have shapes (10,) and (1,)

I am getting the ValueError below.
x is an array of the ten digits 0-9.
Each element of x is passed through the for loop and put into the equation.
I don't understand why y and x aren't the same size when the equation has run 10 times.
import numpy as np
import matplotlib.pyplot as plt
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
a = np.array([2])
b = np.array([-3])
print(f'Scalar check for 0 dimensions a {a.ndim}, b {b.ndim} x {x.ndim}')
for i in x:
    print(i)
    y = i*a + b
plt.plot(x, y)
raise ValueError(f"x and y must have same first dimension, but "
ValueError: x and y must have same first dimension, but have shapes (10,) and (1,)
I thought it would run once I changed a and b from scalars to 1d arrays, but that was evidently not what was causing the error.
You are overwriting y on each iteration, so in the end y holds only the last value: y = [15].
You can re-write it as follows:
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
a = np.array(2)   # note the removed brackets: []
b = np.array(-3)  # same here
y = []
for i in x:
    y.append(i * a + b)
An even simpler approach is:
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
a = np.array(2)
b = np.array(-3)
y = x * a + b
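As a quick check, the sketch below compares the two approaches and confirms that both give a y with the same first dimension as x, which is what plt.plot needs. Plain Python scalars are used for a and b here as a simplification; the names y_loop and y_vec are only for illustration.

```python
import numpy as np

x = np.arange(10)
a, b = 2, -3

# Loop version: collect one value per element of x instead of overwriting y.
y_loop = np.array([i * a + b for i in x])

# Vectorized version: broadcasting applies the equation to every element at once.
y_vec = x * a + b

print(y_loop.shape, y_vec.shape)      # (10,) (10,)
print(np.array_equal(y_loop, y_vec))  # True
```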

advanced indexing using numpy

I'm trying to use advanced indexing but I cannot get it to work with this simple array
arr = np.array([[[ 1, 10, 100,1000],[ 2, 20, 200,2000]],[[ 3, 30, 300,3000],[ 4,40,400,4000]],[[5, 50, 500,5000],[6, 60,600,6000]]])
d1=np.array([0])
d2=np.array([0,1])
d3=np.array([0,1,2])
arr[d1,d2,d3]
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (1,) (2,) (3,)
and
arr[d1[:,np.newaxis],d2[np.newaxis,:],d3]
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (1,1) (1,2) (3,)
Expected output:
array([[[ 1, 10, 100],
[ 2, 20, 200]]])
You can use np.ix_ to combine several one-dimensional index arrays of different lengths to index a multidimensional array. For example:
arr[np.ix_(d1,d2,d3)]
To add more context, np.ix_ returns a tuple of N-dimensional arrays, one per input, each with its values laid out along its own axis. The same can be achieved "by hand" by adding np.newaxis for the appropriate dimensions:
xs, ys, zs = np.ix_(d1,d2,d3)
# xs.shape == (1, 1, 1) == (len(d1), 1, 1 )
# ys.shape == (1, 2, 1) == (1, len(d2), 1 )
# zs.shape == (1, 1, 3) == (1, 1, len(d3))
result_ix = arr[xs, ys, zs]
# using newaxis:
result_newaxis = arr[
    d1[:, np.newaxis, np.newaxis],
    d2[np.newaxis, :, np.newaxis],
    d3[np.newaxis, np.newaxis, :],
]
assert (result_ix == result_newaxis).all()
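The reason the original attempts fail is that index arrays must broadcast against each other. The sketch below makes that visible with np.broadcast_shapes, which I believe requires NumPy 1.20 or later; the variable names raw_ok and mesh_shape are only for illustration.

```python
import numpy as np

arr = np.array([[[1, 10, 100, 1000], [2, 20, 200, 2000]],
                [[3, 30, 300, 3000], [4, 40, 400, 4000]],
                [[5, 50, 500, 5000], [6, 60, 600, 6000]]])
d1, d2, d3 = np.array([0]), np.array([0, 1]), np.array([0, 1, 2])

# Raw 1-D index arrays: lengths 2 and 3 cannot be broadcast together.
try:
    np.broadcast_shapes(d1.shape, d2.shape, d3.shape)
    raw_ok = True
except ValueError:
    raw_ok = False

# np.ix_ puts each index array on its own axis, so the shapes are compatible.
open_mesh = np.ix_(d1, d2, d3)
mesh_shape = np.broadcast_shapes(*(m.shape for m in open_mesh))

result = arr[open_mesh]
print(raw_ok, mesh_shape, result.shape)  # False (1, 2, 3) (1, 2, 3)
```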
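Because d2 already covers the whole second axis, you need only d1 plus plain slices to get the expected output (arr[d1] alone would also return the 1000 and 2000 column):
>>> arr[d1, :, :3]
array([[[  1,  10, 100],
        [  2,  20, 200]]])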

how to multiply a matrix with every row in another matrix using numpy

import numpy
A = numpy.array([
[0,1,1],
[2,2,0],
[3,0,3]
])
B = numpy.array([
[1,1,1],
[2,2,2],
[3,2,9],
[4,4,4],
[5,9,5]
])
Dimension of A: N * N(3*3)
Dimension of B: K * N(5*3)
Expected result is:
C = [ A * B[0], A * B[1], A * B[2], A * B[3], A * B[4]] (Dimension of C is also 5*3)
I am new to numpy and not sure how to perform this operation without using for loops.
Thanks!
By the math you provide, I think you are evaluating A times B transpose. If you want the resulting matrix to have the size 5*3, you can transpose the result (equivalent to numpy.matmul(B, A.transpose())).
import numpy
A = numpy.array([
[0,1,1],
[2,2,0],
[3,0,3]
])
B = numpy.array([
[1,1,1],
[2,2,2],
[3,2,9],
[4,4,4],
[5,9,5]
])
print(numpy.matmul(A,B.transpose()))
Output:
[[ 2  4 11  8 14]
 [ 4  8 10 16 28]
 [ 6 12 36 24 30]]
for i in range(5):
    print(numpy.matmul(A,B[i]))
Output:
[2 4 6]
[ 4 8 12]
[11 10 36]
[ 8 16 24]
[14 28 30]
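The loop output above is exactly the 5*3 matrix the question asks for, and it can be produced without a loop. The sketch below shows two ways I would expect to work, a single matrix product and an equivalent np.einsum call, and checks both against the loop. The names C, C_einsum and C_loop are only for illustration.

```python
import numpy as np

A = np.array([[0, 1, 1],
              [2, 2, 0],
              [3, 0, 3]])
B = np.array([[1, 1, 1],
              [2, 2, 2],
              [3, 2, 9],
              [4, 4, 4],
              [5, 9, 5]])

# Row k of C is A @ B[k], which is the same as (A @ B.T).T == B @ A.T.
C = B @ A.T

# The same contraction written out: C[k, i] = sum_j A[i, j] * B[k, j].
C_einsum = np.einsum('ij,kj->ki', A, B)

# Reference result from the explicit loop.
C_loop = np.array([A @ row for row in B])

print(C.shape)  # (5, 3)
print(np.array_equal(C, C_loop) and np.array_equal(C_einsum, C_loop))  # True
```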
You can move forward like this:
import numpy as np
matrix_a = np.array([
[0, 1, 1],
[2, 2, 0],
[3, 0, 3]
])
matrix_b = np.array([
[1, 1, 1],
[2, 2, 2],
[3, 2, 9],
[4, 4, 4],
[5, 9, 5]
])
Remember:
For matrix multiplication, the number of columns of the first matrix must equal the number of rows of the second, e.g. (3, 3) times (3, 5). To get these two sizes, you can use:
rows_of_second_matrix = matrix_b.shape[0]
columns_of_first_matrix = matrix_a.shape[1]
Here you can check whether the number of columns of matrix_a equals the number of rows of matrix_b. If they differ, multiply by the transpose of matrix_b; otherwise simply multiply.
if columns_of_first_matrix != rows_of_second_matrix:
    transpose_matrix_b = np.transpose(matrix_b)
    output_1 = np.dot(matrix_a, transpose_matrix_b)
    print('Shape of dot product:', output_1.shape)
    print('Dot product:\n {}\n'.format(output_1))
    output_2 = np.matmul(matrix_a, transpose_matrix_b)
    print('Shape of matmul product:', output_2.shape)
    print('Matmul product:\n {}\n'.format(output_2))
    # In order to obtain -> Output_Matrix of shape (5, 3), Again take transpose
    output_matrix = np.transpose(output_1)
    print("Shape of required matrix: ", output_matrix.shape)
else:
    output_1 = np.dot(matrix_a, matrix_b)
    print('Shape of dot product:', output_1.shape)
    print('Dot product:\n {}\n'.format(output_1))
    output_2 = np.matmul(matrix_a, matrix_b)
    print('Shape of matmul product:', output_2.shape)
    print('Matmul product:\n {}\n'.format(output_2))
    output_matrix = output_2
    print("Shape of required matrix: ", output_matrix.shape)
Output:
- Shape of dot product: (3, 5)
Dot product:
[[ 2 4 11 8 14]
[ 4 8 10 16 28]
[ 6 12 36 24 30]]
- Shape of matmul product: (3, 5)
Matmul product:
[[ 2 4 11 8 14]
[ 4 8 10 16 28]
[ 6 12 36 24 30]]
- Shape of required matrix: (5, 3)

Matrix dot product in python

I am new to Python and confused by this code:
X = np.array([2,3,4,4])
print(np.dot(X,X))
This works
Y = np.array([[100],
[200],
[300],
[400]])
print(np.dot(Y,Y))
This doesn't. I understand that it has to do with the array dimensions, but I can't see how. Please explain.
X is a 1d array (row vector is not the right descriptor):
In [382]: X = np.array([2,3,4,4])
In [383]: X.shape
Out[383]: (4,)
In [384]: np.dot(X,X) # docs for 1d arrays apply
Out[384]: 45
Y is 2d array.
In [385]: Y = X[:,None]
In [386]: Y
Out[386]:
array([[2],
[3],
[4],
[4]])
In [387]: Y.shape
Out[387]: (4, 1)
In [388]: np.dot(Y,Y)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-388-3a0bc5156893> in <module>()
----> 1 np.dot(Y,Y)
ValueError: shapes (4,1) and (4,1) not aligned: 1 (dim 1) != 4 (dim 0)
For 2d arrays, the last dimension of the first array pairs with the second-to-last dimension of the second array, and those two sizes must match.
In [389]: np.dot(Y,Y.T) # (4,1) pair with (1,4) to produce (4,4)
Out[389]:
array([[ 4, 6, 8, 8],
[ 6, 9, 12, 12],
[ 8, 12, 16, 16],
[ 8, 12, 16, 16]])
In [390]: np.dot(Y.T,Y) # (1,4) pair with (4,1) to produce (1,1)
Out[390]: array([[45]])
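Applied to the Y from the question, a minimal sketch of the options looks like this. Which one is wanted depends on whether a scalar, a (1, 1) array or a (4, 4) outer product is the goal; the names inner_2d, inner_scalar and outer are only for illustration.

```python
import numpy as np

Y = np.array([[100],
              [200],
              [300],
              [400]])          # shape (4, 1)

# (1, 4) paired with (4, 1): inner product, returned as a (1, 1) array.
inner_2d = np.dot(Y.T, Y)

# Flatten to 1d first to get a plain scalar, as with X in the question.
inner_scalar = np.dot(Y.ravel(), Y.ravel())

# (4, 1) paired with (1, 4): outer product of shape (4, 4).
outer = np.dot(Y, Y.T)

print(inner_2d.shape, inner_scalar, outer.shape)  # (1, 1) 300000 (4, 4)
```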

Change a 1D NumPy array from (implicit) row major to column major order

I have a 1D array in NumPy that implicitly represents some 2D data in row-major order. Here's a trivial example:
import numpy as np
# My data looks like [[1,2,3,4], [5,6,7,8]]
a = np.array([1,2,3,4,5,6,7,8])
I want to get a 1D array in column-major order (ie. b = [1,5,2,6,3,7,4,8] in the example above).
Normally, I would just do the following:
mat = np.reshape(a, (-1,4))
b = mat.flatten('F')
Unfortunately, the length of my input array is not an exact multiple of the row length I want (ie. a = [1,2,3,4,5,6,7]), so I can't call reshape. I want to keep that extra data, though, which might be quite a lot since my rows are pretty long. Is there any straightforward way to do this in NumPy?
The simplest way I can think of is not to try and use reshape with methods such as ravel('F'), but just to concatenate sliced views of your array.
For example:
>>> cols = 4
>>> a = np.array([1,2,3,4,5,6,7])
>>> np.concatenate([a[i::cols] for i in range(cols)])
array([1, 5, 2, 6, 3, 7, 4])
This works for any length of array and any number of columns:
>>> cols = 5
>>> b = np.arange(17)
>>> np.concatenate([b[i::cols] for i in range(cols)])
array([ 0, 5, 10, 15, 1, 6, 11, 16, 2, 7, 12, 3, 8, 13, 4, 9, 14])
Alternatively, use as_strided to reshape. The fact that the array a is too small to fit the (2, 4) shape doesn't matter: you'll just get junk (i.e. whatever's in memory) in the last place:
>>> np.lib.stride_tricks.as_strided(a, shape=(2, 4))
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 168430121]])
>>> _.flatten('F')[:7]
array([1, 5, 2, 6, 3, 7, 4])
In the general case, given an array b and a desired number of columns cols you can do this:
>>> x = np.lib.stride_tricks.as_strided(b, shape=(len(b)//cols + 1, cols)) # reshape to min 2d array needed to hold array b
>>> np.concatenate((x[:,:len(b)%cols].ravel('F'), x[:-1, len(b)%cols:].ravel('F')))
This unravels the "good" part of the array (those columns not containing junk values) and the bad part (except for the junk values which lie in the bottom row) and concatenates the two unraveled arrays. For example:
>>> cols = 5
>>> b = np.arange(17)
>>> x = np.lib.stride_tricks.as_strided(b, shape=(len(b)//cols + 1, cols))
>>> np.concatenate((x[:,:len(b)%cols].ravel('F'), x[:-1, len(b)%cols:].ravel('F')))
array([ 0, 5, 10, 15, 1, 6, 11, 16, 2, 7, 12, 3, 8, 13, 4, 9, 14])
Use some value to represent null to make the array be a multiple of how you want to split it. If casting to float is acceptable, you could use nan's to represent the added elements that represent nulls. Then reshape to 2D, call transpose, and reshape to 1D. Then eliminate the nulls.
import numpy as np
a = np.array([1,2,3,4,5,6,7]) # input
b = np.concatenate((a, [np.nan]))  # add a NaN to make it 8 = 4x2 (this casts to float)
c = b.reshape(2, 4).transpose().reshape(8)  # reshape to 2x4, transpose, flatten to length 8
d = c[~np.isnan(c)]  # remove the NaN (use ~, not -, to negate a boolean mask)
print(d)
[ 1. 5. 2. 6. 3. 7. 4.]
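The padding idea can be generalized to any length and column count. The sketch below is one possible variant, not the answer's exact method: instead of a NaN sentinel it carries a boolean mask through the same reshape and transpose, which avoids the cast to float and still works if the data already contains NaN. The function name to_column_major is made up for illustration.

```python
import numpy as np

def to_column_major(a, cols):
    """Reorder a row-major 1d array into column-major order (hypothetical helper)."""
    a = np.asarray(a)
    pad = (-len(a)) % cols  # number of filler elements needed to complete the last row
    padded = np.concatenate((a, np.zeros(pad, dtype=a.dtype)))
    valid = np.arange(len(padded)) < len(a)  # True for real data, False for filler

    # Apply the same reshape/transpose to data and mask, then drop the filler.
    reordered = padded.reshape(-1, cols).T.ravel()
    keep = valid.reshape(-1, cols).T.ravel()
    return reordered[keep]

print(to_column_major([1, 2, 3, 4, 5, 6, 7], 4))  # [1 5 2 6 3 7 4]
```

For np.arange(17) with cols = 5 this gives the same ordering as the np.concatenate approach shown earlier.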
