I want to concatenate two csr_matrix objects, each with shape (1, N).
I know I should use scipy.sparse.vstack:
from scipy.sparse import csr_matrix,vstack
c1 = csr_matrix([[1, 2]])
c2 = csr_matrix([[3, 4]])
print c1.shape,c2.shape
print vstack([c1, c2], format='csr')
#prints:
(1, 2) (1, 2)
(0, 0) 1
(0, 1) 2
(1, 0) 3
(1, 1) 4
However, my code fails:
from scipy.sparse import csr_matrix,vstack
import numpy as np
y_train = np.array([1, 0, 1, 0, 1, 0])
X_train = csr_matrix([[1, 1], [-1, 1], [1, 0], [-1, 0], [1, -1], [-1, -1]])
c0 = X_train[y_train == 0].mean(axis=0)
c1 = X_train[y_train == 1].mean(axis=0)
print c0.shape, c1.shape #prints (1L, 2L) (1L, 2L)
print c0,c1 #prints [[-1. 0.]] [[ 1. 0.]]
print vstack([c0,c1], format='csr')
The last line raises an exception:
File "C:\Anaconda\lib\site-packages\scipy\sparse\construct.py", line 484, in vstack
return bmat([[b] for b in blocks], format=format, dtype=dtype)
File "C:\Anaconda\lib\site-packages\scipy\sparse\construct.py", line 533, in bmat
raise ValueError('blocks must be 2-D')
ValueError: blocks must be 2-D
I guess using mean has something to do with it.
Any ideas?
Taking the mean of a sparse matrix returns a NumPy matrix (which is not sparse).
So c0 and c1 are matrices:
In [76]: type(c0)
Out[76]: numpy.matrixlib.defmatrix.matrix
In [89]: sparse.issparse(c0)
Out[89]: False
vstack expects its first argument to be a sequence of sparse matrices.
So make (at least) the first matrix a sparse matrix:
In [31]: vstack([coo_matrix(c0), c1])
Out[31]:
<2x2 sparse matrix of type '<type 'numpy.float64'>'
with 2 stored elements in COOrdinate format>
In [32]: vstack([coo_matrix(c0), c1]).todense()
Out[32]:
matrix([[-1., 0.],
[ 1., 0.]])
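As a minimal sketch of the fix above, reusing the data from the question: the dense results of mean are wrapped in csr_matrix before stacking. Wrapping both blocks (rather than only the first) is a choice made here for symmetry, not something the answer requires, and the name centroids is introduced only for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse, vstack

y_train = np.array([1, 0, 1, 0, 1, 0])
X_train = csr_matrix([[1, 1], [-1, 1], [1, 0], [-1, 0], [1, -1], [-1, -1]])

# mean() on a sparse matrix returns a dense result, not a sparse one.
c0 = X_train[y_train == 0].mean(axis=0)
c1 = X_train[y_train == 1].mean(axis=0)

# Convert each dense row back to sparse before stacking.
centroids = vstack([csr_matrix(c0), csr_matrix(c1)], format="csr")

print(issparse(c0), issparse(centroids))
print(centroids.toarray())
```

If the result is only needed as a dense array, np.vstack on the dense means avoids the round trip through a sparse format.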
Today I encountered the following problem:
Tensor A is a segmentation mask with shape (1, 4, 4) whose values are either 0 or 1.
Tensor B is an identity matrix created by torch.eye(2).
My questions are: why can we index B (2-D) with A (3-D) in the form B[A], and why is the result a tensor of shape (1, 4, 4, 2)?
The above is my test case; the source code comes from a dice loss class:
y_true_dummy = torch.eye(num_classes)[y_true.squeeze(1)]
The shape of y_true is (b, h, w), and num_classes equals c.
By the way, why do we need the .squeeze() function?
I would like an explanation of this indexing behaviour; pointers to videos would be even more appreciated.
You can understand the problem if you work on a smaller example:
A = torch.randint(2, (4,))
B = torch.eye(2)
>>> A
# tensor([1, 0, 1, 1])
>>> B[A].shape
# (4, 2)
>>> B[A]
# tensor([[0., 1.],
# [1., 0.],
# [0., 1.],
# [0., 1.]])
[1, 0] and [0, 1] are the first and second rows of the 2x2 identity matrix B, i.e. B[0] and B[1]. So using the 1-D array A of shape (4,) as an index selects 4 "rows" of B, that is, 4 elements of B along axis 0. B[A] is basically [B[1], B[0], B[1], B[1]].
So when A is a 3-D array of shape (1, 4, 4), B[A] means selecting (1, 4, 4) rows of B. And because each row in B has 2 elements (2 columns), your output is (1, 4, 4, 2).
B is a 2x2 identity matrix, having 2 rows. Think of it like: you are picking 16 rows out of these 2 rows, getting a (16, 2) matrix -> then you reshape it to get (1, 4, 4, 2) tensor. In fact, you can check this easily:
A = torch.randint(2, (1, 4, 4))
A_flat = A.reshape(-1)
B = torch.eye(2)
>>> torch.allclose(B[A], B[A_flat].reshape(1, 4, 4, -1))
# True
This isn't a PyTorch-specific phenomenon either. You can observe the same indexing rules in NumPy, with which torch maintains close compatibility.
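The same rule can be sketched in NumPy, which also gives one plausible reading of the .squeeze(1) call. The sketch assumes the mask arrives with a singleton channel axis, shape (b, 1, h, w); the question does not confirm this, and the variable names and values are made up for illustration.

```python
import numpy as np

num_classes = 3
# Hypothetical mask with a singleton channel axis: shape (b, 1, h, w) = (1, 1, 2, 2).
y_true = np.array([[[[0, 2],
                     [1, 1]]]])

labels = y_true.squeeze(1)             # drop the channel axis -> (1, 2, 2)
one_hot = np.eye(num_classes)[labels]  # one identity row per label -> (1, 2, 2, 3)

print(labels.shape, one_hot.shape)
# Without squeeze, the extra axis would survive in the result.
print(np.eye(num_classes)[y_true].shape)
```

Each label k is replaced by row k of the identity matrix, so the result is a one-hot encoding with the class axis appended last.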
I am trying to multiply [[3],[1],[0]] by the matrix [1,-1,3] using numpy, but it fails.
import numpy as np
a = np.array([[3],[1],[0]])
b = np.array([1,-1,3])
x = np.dot(a,b)
print(x)
It returns the error "ValueError: shapes (3,1) and (3,) not aligned: 1 (dim 1) != 3 (dim 0)".
Actually, the shape of b is not right: it is (3,), but it needs to be (1, 3).
a = np.array([[3],[1],[0]])
b = np.array([[1,-1,3]]) #Make it 2D list to have shape (1, 3)
np.dot(a,b)
array([[ 3, -3, 9],
[ 1, -1, 3],
[ 0, 0, 0]])
You need to write b also as a matrix:
b = np.array([[1,-1,3]])
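If b already exists as a 1-D array, it does not have to be retyped. A small sketch of two alternatives, using the arrays from the question (the names x and y are only illustrative): add a leading axis to b, or use np.outer, which flattens both inputs.

```python
import numpy as np

a = np.array([[3], [1], [0]])   # shape (3, 1)
b = np.array([1, -1, 3])        # shape (3,)

# Give b a leading axis so the inner dimensions match: (3, 1) @ (1, 3).
x = np.dot(a, b[np.newaxis, :])

# np.outer flattens its inputs and computes the same outer product.
y = np.outer(a, b)

print(x)
print(np.array_equal(x, y))
```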
I have a mask, mask_re, of shape (8781288, 1) containing ones and zeros, a label array y_lbl of shape (8781288, 1), and a feature array feat_re of shape (8781288, 64). I need to take only those rows of the feature and label arrays where the mask is 1. How can I do this, and how can I do the reverse, i.e. put the prediction values (ypred) back into a label array at the positions where the mask is 1?
For example, in MATLAB this can be done easily with X=feat_re(mask_re==1), and the values can be restored with new_lbl(mask_re==1)=ypred, where new_lbl=zeros(8781288, 1). I tried to do a similar thing in Python:
X=feat_re[np.where(mask_re==1),:]
X.shape
(2, 437561, 64)
EDITED (SOLVED), following what @hpaulj suggested:
The problem was the shape of my mask. Once I changed it to mask_new=mask_re.reshape((8781288)), the issue was solved:
X=feat_re[mask_new==1,:]
(437561, 64)
In [182]: arr = np.arange(12).reshape(3,4)
In [183]: mask = np.array([1,0,1], bool)
In [184]: arr[mask,:]
Out[184]:
array([[ 0, 1, 2, 3],
[ 8, 9, 10, 11]])
In [185]: new = np.zeros_like(arr)
In [186]: new[mask,:] = np.array([10,12,14,16])
In [187]: new
Out[187]:
array([[10, 12, 14, 16],
[ 0, 0, 0, 0],
[10, 12, 14, 16]])
I suspect your error comes from the shape of mask:
In [188]: mask1 = mask[:,None]
In [189]: mask1.shape
Out[189]: (3, 1)
In [190]: arr[mask1,:]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-190-6317c3ea0302> in <module>
----> 1 arr[mask1,:]
IndexError: too many indices for array
Remember, numpy can have 1d and 0d arrays; it doesn't force everything to be 2d.
With where (aka nonzero):
In [191]: np.nonzero(mask)
Out[191]: (array([0, 2]),) # 1 element tuple
In [192]: np.nonzero(mask1)
Out[192]: (array([0, 2]), array([0, 0])) # 2 element tuple
In [193]: arr[_191] # using the mask index
Out[193]:
array([[ 0, 1, 2, 3],
[ 8, 9, 10, 11]])
You can use boolean indexing for masking, as below. Note that the mask must be 1-D (a mask of shape (N, 1) combined with ":" raises an IndexError), so flatten it first:
X = feat_re[mask_re.ravel() == 1, :]
This selects the rows of feat_re where (mask_re == 1) is True. You can then change the shape of X with reshape if needed; a "-1" in reshape means that numpy calculates the size of that dimension.
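The question also asks about the reverse step, writing predictions back at the masked positions. A minimal sketch of the full round trip follows, on a small made-up array instead of the 8781288-row data; ypred here is a stand-in for real model output, and all values are illustrative assumptions.

```python
import numpy as np

n = 10
rng = np.random.default_rng(0)
feat_re = rng.normal(size=(n, 64))                            # stand-in features
mask_re = (np.arange(n) % 3 == 0).astype(int).reshape(n, 1)   # (n, 1) mask of 0/1

mask = mask_re.ravel() == 1          # 1-D boolean mask
X = feat_re[mask]                    # keep only the masked rows

ypred = np.arange(1, X.shape[0] + 1)  # placeholder "predictions", one per kept row

new_lbl = np.zeros((n, 1), dtype=ypred.dtype)
new_lbl[mask, 0] = ypred             # scatter predictions back to their rows

print(X.shape)
print(new_lbl.ravel())
```

The same boolean mask is used for both selection and assignment, which mirrors the MATLAB pair feat_re(mask_re==1) and new_lbl(mask_re==1)=ypred.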
I have a numpy array (an n x n matrix), and I would like to modify only the columns whose sum is 0, assigning the same value to all of these columns.
To do that, I have first taken the index of the columns that sum to 0:
sum_lines = np.sum(mat_trans, axis = 0)
indices = np.where(sum_lines == 0)[0]
then I did a loop on those indices:
for i in indices:
    mat_trans[:, i] = rank_vect
so that each of these columns now has the value of the rank_vect column vector.
I was wondering if there was a way to do this without loop, something that would look like:
mat_trans[:, np.where(sum_lines == 0)[0]] = rank_vect
Thanks!
In [114]: arr = np.array([[0,1,2,3],[1,0,2,-3],[-1,2,0,0]])
In [115]: sumlines = np.sum(arr, axis=0)
In [116]: sumlines
Out[116]: array([0, 3, 4, 0])
In [117]: idx = np.where(sumlines==0)[0]
In [118]: idx
Out[118]: array([0, 3])
So the columns that we want to modify are:
In [119]: arr[:,idx]
Out[119]:
array([[ 0, 3],
[ 1, -3],
[-1, 0]])
In [120]: rv = np.array([10,11,12])
If rv is 1d, we get a shape error:
In [121]: arr[:,idx] = rv
ValueError: shape mismatch: value array of shape (3,) could not be broadcast to indexing result of shape (2,3)
But if it is a column vector (shape (3,1)) it can be broadcast to the (3,2) target:
In [122]: arr[:,idx] = rv[:,None]
In [123]: arr
Out[123]:
array([[10, 1, 2, 10],
[11, 0, 2, 11],
[12, 2, 0, 12]])
This should do the trick
mat_trans[:,indices] = np.stack((rank_vect,)*indices.size,-1)
Please test and let me know if it does what you want. It just stacks rank_vect repeatedly on the RHS to match the shape of the LHS.
I believe this is equivalent to
for i in indices:
    mat_trans[:, i] = rank_vect
I'd be interested to know the speed difference.
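A small sketch checking that the loop and the broadcast assignment give the same result. It reuses the 3x4 example array from the answer above rather than a square matrix, and assumes rank_vect holds one value per row; reshape(-1, 1) makes it a column whether it starts as shape (n,) or (n, 1).

```python
import numpy as np

mat_trans = np.array([[0., 1., 2., 3.],
                      [1., 0., 2., -3.],
                      [-1., 2., 0., 0.]])
rank_vect = np.array([10., 11., 12.])

# Columns whose entries sum to zero.
indices = np.where(mat_trans.sum(axis=0) == 0)[0]

# Reference result: the explicit loop from the question.
looped = mat_trans.copy()
for i in indices:
    looped[:, i] = rank_vect

# Loop-free version: a column vector broadcasts across the selected columns.
vectorised = mat_trans.copy()
vectorised[:, indices] = rank_vect.reshape(-1, 1)

print(indices)
print(np.array_equal(looped, vectorised))
```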
I have an array of size 2 x 2 and I want to change its size to 3 x 4.
A = [[1 2 ],[2 3]]
A_new = [[1 2 0 0],[2 3 0 0],[0 0 0 0]]
I tried reshape but it didn't work, and append can only append rows, not columns. I don't want to iterate through each row to add the columns.
Is there a vectorized way to do this, like in MATLAB, where A(:,3:4) = 0; and A(3,:) = 0; convert A from 2 x 2 to 3 x 4? Is there a similar way in Python?
In Python, if the input is a numpy array, you can use np.lib.pad (the same function is available in the top-level namespace as np.pad) to pad zeros around it -
import numpy as np
A = np.array([[1, 2 ],[2, 3]]) # Input
A_new = np.lib.pad(A, ((0,1),(0,2)), 'constant', constant_values=(0)) # Output
Sample run -
In [7]: A # Input: A numpy array
Out[7]:
array([[1, 2],
[2, 3]])
In [8]: np.lib.pad(A, ((0,1),(0,2)), 'constant', constant_values=(0))
Out[8]:
array([[1, 2, 0, 0],
[2, 3, 0, 0],
[0, 0, 0, 0]]) # Zero padded numpy array
If you don't want to do the math of how many zeros to pad, you can let the code do it for you given the output array size -
In [29]: A
Out[29]:
array([[1, 2],
[2, 3]])
In [30]: new_shape = (3,4)
In [31]: shape_diff = np.array(new_shape) - np.array(A.shape)
In [32]: np.lib.pad(A, ((0,shape_diff[0]),(0,shape_diff[1])),
'constant', constant_values=(0))
Out[32]:
array([[1, 2, 0, 0],
[2, 3, 0, 0],
[0, 0, 0, 0]])
Or, you can start off with a zero initialized output array and then put back those input elements from A -
In [38]: A
Out[38]:
array([[1, 2],
[2, 3]])
In [39]: A_new = np.zeros(new_shape,dtype = A.dtype)
In [40]: A_new[0:A.shape[0],0:A.shape[1]] = A
In [41]: A_new
Out[41]:
array([[1, 2, 0, 0],
[2, 3, 0, 0],
[0, 0, 0, 0]])
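The shape-difference approach above can be wrapped in a small helper. This is only a sketch: pad_to_shape is a hypothetical name, not a NumPy function, and it assumes padding is wanted only at the end of each axis. It calls np.pad, the top-level name for the padding function.

```python
import numpy as np

def pad_to_shape(a, new_shape, fill=0):
    """Pad `a` at the end of each axis so that it has shape `new_shape`."""
    if len(new_shape) != a.ndim:
        raise ValueError("new_shape must have one entry per axis of a")
    pad_width = [(0, target - current) for current, target in zip(a.shape, new_shape)]
    if any(after < 0 for _, after in pad_width):
        raise ValueError("new_shape must be at least as large as a.shape")
    return np.pad(a, pad_width, mode="constant", constant_values=fill)

A = np.array([[1, 2], [2, 3]])
A_new = pad_to_shape(A, (3, 4))
print(A_new)
```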
In MATLAB, you can use padarray -
A_new = padarray(A,[1 2],'post')
Sample run -
>> A
A =
1 2
2 3
>> A_new = padarray(A,[1 2],'post')
A_new =
1 2 0 0
2 3 0 0
0 0 0 0
A pure Python way to achieve this:
row = 3
column = 4
A = [[1, 2],[2, 3]]
A_new = list(map(lambda x: x + ([0] * (column - len(x))), A + ([[0] * column] * (row - len(A)))))  # list() is needed on Python 3, where map returns an iterator
Then A_new is [[1, 2, 0, 0], [2, 3, 0, 0], [0, 0, 0, 0]].
Good to know:
[x] * n repeats x n times.
Lists can be concatenated using the + operator.
Explanation:
map(function, list) passes each item in list to function and yields the return value in its place.
A + ([[0] * column] * (row - len(A))): A is extended with the remaining "zeroed" rows:
[0] * column repeats the item in [0] by the column count;
[...] * (row - len(A)) repeats that row by the remaining row count.
([0] * (column - len(x))): for each row x, a list of zeros covering the remaining column count is appended.
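The same idea can be written with list comprehensions, which some readers may find easier to follow than the nested map call. This is a sketch only; pad_rows is a hypothetical helper name, and it assumes no input row is longer than the target width.

```python
def pad_rows(rows, n_rows, n_cols, fill=0):
    """Return a new list of lists, padded with `fill` to n_rows x n_cols."""
    # Extend each existing row to the target width (copies the row).
    padded = [list(r) + [fill] * (n_cols - len(r)) for r in rows]
    # Append fresh rows; a comprehension avoids aliasing the same list object.
    padded += [[fill] * n_cols for _ in range(n_rows - len(rows))]
    return padded

A = [[1, 2], [2, 3]]
A_new = pad_rows(A, 3, 4)
print(A_new)
```

Building each new row inside the comprehension means every row is a distinct list, so changing one row later does not affect another.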
Q: Is there a vectorised way to ...
A: Yes, there is
A = np.ones( (2,2) ) # numpy create/assign 1-s
B = np.zeros( (4,5) ) # numpy create/assign 0-s "padding" mat
B[:A.shape[0],:A.shape[1]] += A[:,:] # numpy vectorised .ADD at a cost of ~270 us
B[:A.shape[0],:A.shape[1]] = A[:,:] # numpy vectorised .STO at a cost of ~180 us
B[:A.shape[0],:A.shape[1]] = A # numpy high-level .STO at a cost of ~450 us
B
Out[4]:
array([[ 1., 1., 0., 0., 0.],
[ 1., 1., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]])
Q: Is it resource-efficient to "extend" A's data structure in a smart way "behind the curtain"?
A: No, not really. Try larger or huge sizes to feel the resource-allocation and processing costs...
Numpy has a genuine data structure "behind the curtain" that allows a lot of smart tricks, like strided (re-)mapping, view-based operations, and fast vectorised/broadcast operations; however, changing the memory layout "across the strided smart mapping" is rather expensive.
For this reason, numpy has provided since 1.7.0 a built-in layout modifier, .lib.pad(), that is aware of the "behind-the-curtain" structures and optimised to handle them both smartly and quickly.
B = np.lib.pad( A,
( ( 0, 2 ), ( 0, 3 ) ), # 2 extra rows, 3 extra columns -> shape (4, 5), as above
'constant',
constant_values = ( 0, 0 )
) # .pad() at a cost of ~ 270 us
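The timings quoted above depend on the machine, the NumPy version and the array sizes, so it may be worth measuring locally. The following is a rough sketch using timeit; the function names via_slice and via_pad are made up here, and np.pad is used as the top-level name of the padding function.

```python
import timeit
import numpy as np

A = np.ones((2, 2))
target_shape = (4, 5)

def via_slice():
    # Allocate zeros, then copy A into the top-left corner.
    B = np.zeros(target_shape, dtype=A.dtype)
    B[:A.shape[0], :A.shape[1]] = A
    return B

def via_pad():
    # Let np.pad append the missing rows and columns.
    extra = [(0, t - s) for s, t in zip(A.shape, target_shape)]
    return np.pad(A, extra, mode="constant", constant_values=0)

# Both approaches should produce the same array.
same = np.array_equal(via_slice(), via_pad())
print("identical results:", same)

for name, fn in (("slice assignment", via_slice), ("np.pad", via_pad)):
    seconds = timeit.timeit(fn, number=1000)
    print(f"{name}: {seconds / 1000 * 1e6:.1f} us per call")
```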