I find it easier to think of vectors as column vectors when doing linear algebra, so I prefer shapes like (n,1).
Is there a significant difference in memory usage between shapes (n,) and (n,1)?
Which is the preferred way?
And how do I reshape an (n,) vector into an (n,1) vector? Somehow b.reshape((n,1)) doesn't do the trick.
import numpy as np
a = np.random.random((10,1))
b = np.ones((10,))
b.reshape((10,1))
print(a)
print(b)
[[ 0.76336295]
[ 0.71643237]
[ 0.37312894]
[ 0.33668241]
[ 0.55551975]
[ 0.20055153]
[ 0.01636735]
[ 0.5724694 ]
[ 0.96887004]
[ 0.58609882]]
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
A simpler way is to use
b.reshape(-1,1)
where NumPy automatically computes the correct length for the dimension given as -1.
ndarray.reshape() returns a new view where possible, and otherwise a copy. It does not modify the array in place. The statement
b.reshape((10, 1))
on its own is therefore effectively a no-op, since the created view/copy is not assigned to anything. The "fix" is simple:
b_new = b.reshape((10, 1))
The amount of memory used for the data should not differ at all between the two shapes. NumPy arrays use the concept of strides, so arrays of shape (10,) and (10, 1) can both use the same buffer; only the step sizes used to jump to the next row and column change.
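The points above can be checked with a short sketch. The variable names `col` and `col_alt` are illustrative only, and the array is assumed to be an ordinary contiguous array, in which case reshape can return a view:

```python
import numpy as np

b = np.ones(10)                 # shape (10,)
col = b.reshape(-1, 1)          # -1 lets NumPy infer the length: shape (10, 1)

print(b.shape, col.shape)       # b itself is unchanged by reshape
print(np.shares_memory(b, col)) # the reshaped array is a view onto b's buffer
print(b.nbytes, col.nbytes)     # the data size is identical
print(b.strides, col.strides)   # only the stride bookkeeping differs

col_alt = b[:, np.newaxis]      # another way to add the trailing axis
```

Because `col` is a view, writing to `col[0, 0]` also changes `b[0]`; call `.copy()` if an independent array is needed.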
Related
I am trying to create a function which exponentiates a 2-D matrix and keeps the results in a 3-D array, where the first dimension indexes the exponent. This is important because the rows of the matrix I am exponentiating represent information about different vertices on a graph. So for example if we have A, A^2, A^3, each of shape (50,50), I want an array D of shape (3,50,50) so that I can use D[:,1,:] to retrieve all the information about node 1 and do matrix multiplication with it. My current code is:
def expo(times,A,n):
    temp = A;
    result = csr_matrix.toarray(temp)
    for i in range(0,times):
        temp = np.dot(temp,A)
        if i == 0:
            result = np.array([result,csr_matrix.toarray(temp)]) # this creates a (2,50,50) array
        if i > 0:
            result = np.append(result,csr_matrix.toarray(temp),axis=0) # this does not work
    return result
However, this is not working because in the "i>0" case the temp array has shape (50,50) and cannot be appended. I am not sure how to make this work and I am rather confused by dimensionality in NumPy, e.g. why things are (50,1) sometimes and just (50,) other times. Would anyone be able to help me make this code work and explain generally how these things should be done in NumPy?
Documentation reference
If you want to stack matrices in numpy, you can use the stack function.
If you also want the index to correspond to the exponent, you might want to add an identity matrix (A^0) to the beginning of your output:
MWE
import numpy as np

def expo(A, n):
    result =[np.eye(len(A)), A,]
    for _ in range(n-1):
        result.append(result[-1].dot(A))
    return np.stack(result, axis=0)

# If you do not really need the 3D array,
# you could also just return the list
result = expo(np.array([[1,-2],[-2,1]]), 3)
print(result)
# [[[ 1. 0.]
# [ 0. 1.]]
#
# [[ 1. -2.]
# [ -2. 1.]]
#
# [[ 5. -4.]
# [ -4. 5.]]
#
# [[ 13. -14.]
# [-14. 13.]]]
print(result[1])
# [[ 1. -2.]
# [-2. 1.]]
Comments
As you can see, we first simply create the list of matrices, and then convert them to an array at the end. I am not sure if you really need the 3D array though, as you could also just index the list that was created, but that depends on your use case, if that is convenient or not.
I guess the axis keyword argument of a lot of numpy functions can be confusing at first, but the documentation usually has good examples that, combined with some trial and error, should get you pretty far. For example, for numpy.stack, the very first example is exactly what you want to do.
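To illustrate why the original np.append call failed, here is a small sketch using a 2x2 matrix in place of the 50x50 one. The names `pair`, `grown` and `stacked` are made up for this example. np.append with an axis argument needs both inputs to have the same number of dimensions, whereas np.stack creates the new axis itself:

```python
import numpy as np

A = np.array([[1, -2], [-2, 1]])
A2 = A.dot(A)
A3 = A2.dot(A)

pair = np.array([A, A2])        # shape (2, 2, 2)

# Appending a 2-D array to a 3-D array along axis 0 is rejected.
try:
    np.append(pair, A3, axis=0)
    append_failed = False
except ValueError:
    append_failed = True

# Giving A3 a leading axis of length 1 makes the dimensions line up.
grown = np.append(pair, A3[np.newaxis], axis=0)   # shape (3, 2, 2)

# np.stack adds the new axis on its own, so a list of 2-D arrays is enough.
stacked = np.stack([A, A2, A3], axis=0)           # shape (3, 2, 2)

print(stacked[:, 1, :].shape)   # row 1 of every power: (3, 2)
```

Collecting the matrices in a list and stacking once at the end also avoids reallocating the whole array on every iteration, which is what repeated np.append calls would do.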
I have a 3d numpy array of points with shape (484,3,1) and a 2d transformation matrix with shape (3,3). I want to compute the transformation for all 484 points.
I have tried to reshape the arrays and compute the dot product, but I am struggling to get it to output a (484,3,1) shaped array where all the points are transformed.
points = np.random.randint(0, 979, (484,3,1))
transformation = np.array([[0.94117647, 0.        , 0.        ],
                           [0.        , 0.94117647, 0.        ],
                           [0.        , 0.        , 1.        ]])
# points.shape == (484,3,1)
# transformation.shape == (3,3)
# transformation.dot(points).shape == (3,484,1)
I would like this to be as optimized as possible. Any advice would be greatly appreciated.
Just reshape the points to (484,3) and use np.matmul (np.dot is also possible, but since you are looking for a matrix multiplication, matmul is preferred according to the documentation). Because each point is then a row, multiply by the transpose of the transformation; for the diagonal matrix in the question the transpose makes no difference, but for a general matrix it does:
np.matmul(points.reshape(484,-1), transformation.T).reshape(484,3,-1)
The resulting shape is determined by the last reshape: (484,3,1).
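As a hedged alternative sketch, np.matmul can also broadcast the (3,3) matrix over the leading axis of the (484,3,1) stack directly, with no reshaping. The random, deliberately non-symmetric `transformation` below is an assumption made so the check is meaningful. When points are laid out as rows and multiplied on the right, the transposed matrix is needed to get the same result, which only matters if the matrix is not symmetric:

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.integers(0, 979, size=(484, 3, 1))
transformation = rng.random((3, 3))     # non-symmetric on purpose

# (3, 3) @ (484, 3, 1): the matrix is applied to each (3, 1) point in turn.
transformed = np.matmul(transformation, points)

# Reference result: transform one point at a time.
expected = np.stack([transformation.dot(p) for p in points])

# Row-vector route: points as rows need the transposed matrix on the right.
via_reshape = np.matmul(points.reshape(484, -1),
                        transformation.T).reshape(484, 3, -1)

print(transformed.shape)                # (484, 3, 1)
```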
I have an array of shape (3,2):
import numpy as np
arr = np.array([[0.,0.],[0.25,-0.125],[0.5,-0.125]])
I was trying to build a matrix (matrix) of dimensions (6,2), with the results of the outer product of the elements i,i of arr and arr.T. At the moment I am using a for loop such as:
size = np.shape(arr)
matrix = np.zeros((size[0]*size[1],size[1]))
for i in range(np.shape(arr)[0]):
    prod = np.outer(arr[i],arr[i].T)
    matrix[size[1]*i:size[1]+size[1]*i,:] = prod
Resulting:
matrix =array([[ 0. , 0. ],
[ 0. , 0. ],
[ 0.0625 , -0.03125 ],
[-0.03125 , 0.015625],
[ 0.25 , -0.0625 ],
[-0.0625 , 0.015625]])
Is there any way to build this matrix without using a for loop (e.g. broadcasting)?
Extend the array to 3D with None/np.newaxis in two ways, so that the first axis stays aligned while the second axis gets multiplied pair-wise. Then multiply, letting broadcasting do the work, and reshape to 2D -
matrix = (arr[:,None,:]*arr[:,:,None]).reshape(-1,arr.shape[1])
We can also use np.einsum -
matrix = np.einsum('ij,ik->ijk',arr,arr).reshape(-1,arr.shape[1])
einsum string representation might be more intuitive as it lets us visualize three things :
Axes that are aligned (axis=0 here).
Axes that are getting summed up (none here).
Axes that are kept i.e. element-wise multiplied (axis=1 here).
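A quick sanity check, written here as a sketch with the question's loop rebuilt under the illustrative name `looped`, confirms that both vectorized versions reproduce the loop's result:

```python
import numpy as np

arr = np.array([[0., 0.], [0.25, -0.125], [0.5, -0.125]])
n, m = arr.shape

# Loop version from the question: one (m, m) outer product per row of arr.
looped = np.zeros((n * m, m))
for i in range(n):
    looped[m * i:m * (i + 1), :] = np.outer(arr[i], arr[i])

# Broadcasting version: shapes (n, 1, m) * (n, m, 1) -> (n, m, m).
broadcasted = (arr[:, None, :] * arr[:, :, None]).reshape(-1, m)

# einsum version: i stays aligned, j and k are kept, nothing is summed.
via_einsum = np.einsum('ij,ik->ijk', arr, arr).reshape(-1, m)

print(np.allclose(looped, broadcasted), np.allclose(looped, via_einsum))
```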
I have the following problem with shape of ndarray:
out.shape = (20,)
reference.shape = (20,0)
norm = [out[i] / np.sum(out[i]) for i in range(len(out))]
# norm is a list now so I convert it to ndarray:
norm_array = np.array((norm))
norm_array.shape = (20,30)
# error: operands could not be broadcast together with shapes (20,30) (20,)
diff = np.fabs(norm_array - reference)
How can I change the shape of norm_array from (20,30) into (20,), or of reference to (20,30), so that I can subtract them?
EDIT: Can someone explain to me why they have different shapes, if I can access single elements of both with norm_array[0][0] and reference[0][0]?
I am not sure what you are trying to do exactly, but here is some information on numpy arrays.
A 1-d numpy array is a row vector with a shape that is a single-valued tuple:
>>> np.array([1,2,3]).shape
(3,)
You can create multidimensional arrays by passing in nested lists. Each sub-list is a 1-d row vector of length 1, and there are 3 of them.
>>> np.array([[1],[2],[3]]).shape
(3,1)
Here is the weird part. You can create the same array, but leave the lists empty. You end up with 3 row vectors of length 0.
>>> np.array([[],[],[]]).shape
(3,0)
This is what you have for your reference array: an array with structure but no values. This brings me back to my original point:
You can't subtract an empty array.
If I make 2 arrays with the shapes you describe, I get an error
In [1856]: norm_array=np.ones((20,30))
In [1857]: reference=np.ones((20,0))
In [1858]: norm_array-reference
...
ValueError: operands could not be broadcast together with shapes (20,30) (20,0)
But that message is different from yours. If I change the shape of reference, the error messages match.
In [1859]: reference=np.ones((20,))
In [1860]: norm_array-reference
...
ValueError: operands could not be broadcast together with shapes (20,30) (20,)
So your (20,0) is wrong. I don't know if you mistyped something or not.
But if I make reference 2d with 1 in the last dimension, broadcasting works, producing a difference that matches (20,30) in shape:
In [1861]: reference=np.ones((20,1))
In [1862]: norm_array-reference
If reference = np.zeros((20,)), then I could use reference[:,None] to add that singleton last dimension.
If reference is (20,), you can't do reference[0][0]. reference[0][0] only works with 2d arrays with at least 1 in the last dim. reference[0,0] is the preferred way of indexing a single element of a 2d array.
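The broadcasting behaviour described above can be sketched as follows, using placeholder arrays of ones and zeros since the real data is not shown. The (20,) array fails against (20,30), while adding a trailing axis of length 1 makes the subtraction work:

```python
import numpy as np

norm_array = np.ones((20, 30))
reference = np.zeros(20)                # shape (20,)

# Shapes are compared from the last axis backwards: 30 vs 20 do not match.
try:
    norm_array - reference
    failed = False
except ValueError:
    failed = True

# (20, 30) against (20, 1): the length-1 axis is stretched to 30.
diff = np.fabs(norm_array - reference[:, None])

print(failed, diff.shape)
```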
So far this is normal array dimensions and broadcasting; something you'll learn with use.
===============
I'm puzzled about the shape of out. If it is (20,), how does norm_array end up as (20,30)? out must consist of 20 arrays or lists, each of which has 30 elements.
If out were a 2d array, we could normalize without iteration. Take
In [1869]: out=np.arange(12).reshape(3,4)
The list comprehension approach gives:
In [1872]: [out[i]/np.sum(out[i]) for i in range(out.shape[0])]
Out[1872]:
[array([ 0. , 0.16666667, 0.33333333, 0.5 ]),
array([ 0.18181818, 0.22727273, 0.27272727, 0.31818182]),
array([ 0.21052632, 0.23684211, 0.26315789, 0.28947368])]
In [1873]: np.array(_) # and to array
Out[1873]:
array([[ 0. , 0.16666667, 0.33333333, 0.5 ],
[ 0.18181818, 0.22727273, 0.27272727, 0.31818182],
[ 0.21052632, 0.23684211, 0.26315789, 0.28947368]])
Instead take row sums, and tell it to keep it 2d for ease of further use
In [1876]: out.sum(axis=1,keepdims=True)
Out[1876]:
array([[ 6],
[22],
[38]])
now divide
In [1877]: out/out.sum(axis=1,keepdims=True)
Out[1877]:
array([[ 0. , 0.16666667, 0.33333333, 0.5 ],
[ 0.18181818, 0.22727273, 0.27272727, 0.31818182],
[ 0.21052632, 0.23684211, 0.26315789, 0.28947368]])
I have a big sparse matrix. I want to take the base-4 logarithm of every element in that sparse matrix.
I tried to use numpy.log(), but it doesn't work with sparse matrices.
I can also take the logarithm row by row, and then overwrite the old row with the new one.
# Assume A is a sparse matrix (LIL format) with float values as data
# It is only for one row
import numpy as np
c = np.log(A.getrow(0)) / np.log(4)
A[0, :] = c
This was not as quick as I'd expected. Is there a faster way to do this?
You can modify the data attribute directly:
>>> import numpy as np
>>> from scipy.sparse import coo_matrix
>>> a = np.array([[5,0,0,0,0,0,0],[0,0,0,0,2,0,0]])
>>> coo = coo_matrix(a)
>>> coo.data
array([5, 2])
>>> coo.data = np.log(coo.data)
>>> coo.data
array([ 1.60943791, 0.69314718])
>>> coo.todense()
matrix([[ 1.60943791, 0. , 0. , 0. , 0. ,
0. , 0. ],
[ 0. , 0. , 0. , 0. , 0.69314718,
0. , 0. ]])
Note that this doesn't work properly if the sparse format has repeated elements (which is valid in the COO format); it'll take the logs individually, and log(a) + log(b) != log(a + b). You probably want to convert to CSR or CSC first (which is fast) to avoid this problem.
You'll also have to add checks if the sparse matrix is in a different format, of course. And if you don't want to modify the matrix in-place, just construct a new sparse matrix as you did in your answer, but without adding 3 because that's completely unnecessary here.
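Putting these caveats together, a minimal sketch might look like the following. The helper name `log4_nonzero` is hypothetical, and it assumes every stored value is positive. It converts to CSR first, which merges repeated COO entries, and returns a new matrix rather than modifying the input:

```python
import numpy as np
from scipy.sparse import coo_matrix

def log4_nonzero(A):
    """Return a CSR copy of A with log base 4 applied to the stored values."""
    csr = A.tocsr(copy=True)     # converting from COO sums repeated entries
    csr.sum_duplicates()         # make sure no duplicates remain
    csr.data = np.log(csr.data) / np.log(4)
    return csr

# Entry (0, 0) is stored twice (4 and 12), so its effective value is 16.
A = coo_matrix(([4.0, 12.0, 16.0], ([0, 0, 1], [0, 0, 2])), shape=(2, 3))
B = log4_nonzero(A)
print(B.toarray())
```

Implicit zeros are left untouched, so the result stays just as sparse as the input; taking the logarithm of those zeros would give -inf and a dense result anyway.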
I think I solved it in a very easy way. It is strange that no one could answer immediately.
# Let A be a COO_matrix
import numpy as np
from scipy.sparse import coo_matrix
new_data = np.log(A.data+3)/np.log(4) #3 is not so important. It can be 1 too.
A = coo_matrix((new_data, (A.row, A.col)), shape=A.shape)