Set row of csr_matrix

I have a sparse csr_matrix, and I want to change the values of a single row to different values. I can't find an easy and efficient implementation however. This is what it has to do:
A = csr_matrix([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]])
new_row = np.array([-1, -1, -1])
print(set_row_csr(A, 2, new_row).todense())
>>> [[ 0,  1,  0],
     [ 1,  0,  1],
     [-1, -1, -1]]
This is my current implementation of set_row_csr:
def set_row_csr(A, row_idx, new_row):
    A[row_idx, :] = new_row
    return A
But this gives me a SparseEfficiencyWarning. Is there a way of getting this done without manual index juggling, or is this my only way out?

physicalattraction's answer is indeed significantly quicker. It is much faster than my own solution, which was to add a separate matrix that has only that single row set, although the addition solution was in turn faster than the slicing solution.
The takeaway for me is that the fastest way to set rows in a csr_matrix or columns in a csc_matrix is to modify the underlying data yourself.
import time
import numpy as np
from scipy.sparse import csr_matrix
def time_copy(A, num_tries=10000):
    # Baseline: time spent copying A, subtracted from each measurement
    start = time.time()
    for i in range(num_tries):
        B = A.copy()
    end = time.time()
    return end - start
def test_method(func, A, row_idx, new_row, num_tries=10000):
    start = time.time()
    for i in range(num_tries):
        func(A.copy(), row_idx, new_row)
    end = time.time()
    copy_time = time_copy(A, num_tries)
    print("Duration {}".format((end - start) - copy_time))
def set_row_csr_slice(A, row_idx, new_row):
    A[row_idx, :] = new_row
def set_row_csr_addition(A, row_idx, new_row):
    # indptr needs one entry per row plus one (rows, not columns)
    indptr = np.zeros(A.shape[0] + 1, dtype=int)
    indptr[row_idx + 1:] = A.shape[1]
    indices = np.arange(A.shape[1])
    A += csr_matrix((new_row, indices, indptr), shape=A.shape)
>>> A = csr_matrix((np.ones(1000), (np.random.randint(0,1000,1000), np.random.randint(0, 1000, 1000))))
>>> test_method(set_row_csr_slice, A, 200, np.ones(A.shape[1]), num_tries = 10000)
Duration 4.938395977020264
>>> test_method(set_row_csr_addition, A, 200, np.ones(A.shape[1]), num_tries = 10000)
Duration 2.4161765575408936
>>> test_method(set_row_csr, A, 200, np.ones(A.shape[1]), num_tries = 10000)
Duration 0.8432261943817139
The slice solution also scales much worse with the size and sparsity of the matrix.
# Larger matrix, same fraction sparsity
>>> A = csr_matrix((np.ones(10000), (np.random.randint(0,10000,10000), np.random.randint(0, 10000, 10000))))
>>> test_method(set_row_csr_slice, A, 200, np.ones(A.shape[1]), num_tries = 10000)
Duration 18.335174798965454
>>> test_method(set_row_csr, A, 200, np.ones(A.shape[1]), num_tries = 10000)
Duration 1.1089558601379395
# Super sparse matrix
>>> A = csr_matrix((np.ones(100), (np.random.randint(0,10000,100), np.random.randint(0, 10000, 100))))
>>> test_method(set_row_csr_slice, A, 200, np.ones(A.shape[1]), num_tries = 10000)
Duration 13.371600151062012
>>> test_method(set_row_csr, A, 200, np.ones(A.shape[1]), num_tries = 10000)
Duration 1.0454308986663818

In the end, I managed to get this done with index juggling.
def set_row_csr(A, row_idx, new_row):
    '''
    Replace a row in a CSR sparse matrix A.
    Parameters
    ----------
    A: csr_matrix
        Matrix to change
    row_idx: int
        index of the row to be changed
    new_row: np.array
        list of new values for the row of A
    Returns
    -------
    None (the matrix A is changed in place)
    Prerequisites
    -------------
    The row index shall be smaller than the number of rows in A
    The number of elements in new row must be equal to the number of columns in matrix A
    '''
    assert sparse.isspmatrix_csr(A), 'A shall be a csr_matrix'
    assert row_idx < A.shape[0], \
        'The row index ({0}) shall be smaller than the number of rows in A ({1})' \
        .format(row_idx, A.shape[0])
    try:
        N_elements_new_row = len(new_row)
    except TypeError:
        msg = 'Argument new_row shall be a list or numpy array, is now a {0}'\
            .format(type(new_row))
        raise AssertionError(msg)
    N_cols = A.shape[1]
    assert N_cols == N_elements_new_row, \
        'The number of elements in new row ({0}) must be equal to ' \
        'the number of columns in matrix A ({1})' \
        .format(N_elements_new_row, N_cols)
    idx_start_row = A.indptr[row_idx]
    idx_end_row = A.indptr[row_idx + 1]
    additional_nnz = N_cols - (idx_end_row - idx_start_row)
    A.data = np.r_[A.data[:idx_start_row], new_row, A.data[idx_end_row:]]
    A.indices = np.r_[A.indices[:idx_start_row], np.arange(N_cols), A.indices[idx_end_row:]]
    A.indptr = np.r_[A.indptr[:row_idx + 1], A.indptr[(row_idx + 1):] + additional_nnz]

This is my approach:
A = A.tolil()
A[index, :] = new_row
A = A.tocsr()
Just convert to lil_matrix, change the row and convert back.
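For reference, the round trip through LIL can be wrapped in a small helper. This is only a sketch: set_row_via_lil is a hypothetical name, and it returns a new matrix instead of modifying A in place.

```python
import numpy as np
from scipy.sparse import csr_matrix

def set_row_via_lil(A, row_idx, new_row):
    """Return a new CSR matrix equal to A with one row replaced."""
    A_lil = A.tolil()              # LIL stores each row separately, so row assignment is cheap
    A_lil[row_idx, :] = new_row
    return A_lil.tocsr()           # convert back for fast arithmetic

A = csr_matrix(np.array([[0, 1, 0],
                         [1, 0, 1],
                         [0, 1, 0]]))
B = set_row_via_lil(A, 2, np.array([-1, -1, -1]))
print(B.toarray())
```

The two conversions copy the whole matrix, so this is convenient for occasional edits but not for setting many rows in a loop.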

Something is wrong with this set_row_csr. Yes, it is fast and it seemed to work for some test cases. However, it seems to garble the internal csr structure of the csr sparse matrix in my test cases. Try lil_matrix(A) afterwards and you will see error messages.
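One way to find out whether a hand-edited CSR matrix is still consistent is to run SciPy's own format check right after the edit. The sketch below is not physicalattraction's function: set_row_checked is a hypothetical variant that stores only the nonzero entries of the new row, casts the rebuilt indices back to the dtype of indptr, and then calls check_format(full_check=True), which raises if the three arrays no longer fit together.

```python
import numpy as np
from scipy.sparse import csr_matrix

def set_row_checked(A, row_idx, new_row):
    """Replace one row of CSR matrix A in place, storing only nonzeros."""
    new_row = np.asarray(new_row)
    cols = np.flatnonzero(new_row)                  # sorted column indices of the nonzeros
    start, end = A.indptr[row_idx], A.indptr[row_idx + 1]
    delta = int(len(cols)) - int(end - start)       # change in number of stored entries
    data = np.concatenate([A.data[:start], new_row[cols], A.data[end:]])
    indices = np.concatenate([A.indices[:start], cols, A.indices[end:]])
    indptr = A.indptr.copy()
    indptr[row_idx + 1:] += delta                   # shift the pointers of all later rows
    A.data = data
    A.indices = indices.astype(indptr.dtype, copy=False)  # keep index dtypes consistent
    A.indptr = indptr
    A.check_format(full_check=True)                 # raises ValueError on an inconsistent structure

A = csr_matrix(np.array([[0, 1, 0],
                         [1, 0, 1],
                         [0, 1, 0]]))
set_row_checked(A, 1, [0, 5, 0])
print(A.toarray())
```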

In physicalattraction's answer, len(new_row) must be equal to A.shape[1], which may not be desirable when the new row is itself sparse.
So, based on his answer, I came up with a method to set rows in a CSR matrix while keeping the sparsity property. Additionally, I added a function to convert dense arrays to sparse arrays (in data, indices format).
def to_sparse(dense_arr):
    sparse = [(data, index) for index, data in enumerate(dense_arr) if data != 0]
    # Convert list of tuples to lists
    sparse = list(map(list, zip(*sparse)))
    # Return data and indices
    return sparse[0], sparse[1]
def set_row_csr_unbounded(A, row_idx, new_row_data, new_row_indices):
    '''
    Replace a row in a CSR sparse matrix A.
    Parameters
    ----------
    A: csr_matrix
        Matrix to change
    row_idx: int
        index of the row to be changed
    new_row_data: np.array
        list of new values for the row of A
    new_row_indices: np.array
        list of indices for new row
    Returns
    -------
    None (the matrix A is changed in place)
    Prerequisites
    -------------
    The row index shall be smaller than the number of rows in A
    Row data and row indices must have the same size
    '''
    assert isspmatrix_csr(A), 'A shall be a csr_matrix'
    assert row_idx < A.shape[0], \
        'The row index ({0}) shall be smaller than the number of rows in A ({1})' \
        .format(row_idx, A.shape[0])
    try:
        N_elements_new_row = len(new_row_data)
    except TypeError:
        msg = 'Argument new_row_data shall be a list or numpy array, is now a {0}'\
            .format(type(new_row_data))
        raise AssertionError(msg)
    try:
        assert N_elements_new_row == len(new_row_indices), \
            'new_row_data and new_row_indices must have the same size'
    except TypeError:
        msg = 'Argument new_row_indices shall be a list or numpy array, is now a {0}'\
            .format(type(new_row_indices))
        raise AssertionError(msg)
    idx_start_row = A.indptr[row_idx]
    idx_end_row = A.indptr[row_idx + 1]
    # Later row pointers shift by the change in stored entries, not by the new row's length
    additional_nnz = N_elements_new_row - (idx_end_row - idx_start_row)
    A.data = np.r_[A.data[:idx_start_row], new_row_data, A.data[idx_end_row:]]
    A.indices = np.r_[A.indices[:idx_start_row], new_row_indices, A.indices[idx_end_row:]]
    A.indptr = np.r_[A.indptr[:row_idx + 1], A.indptr[(row_idx + 1):] + additional_nnz]

Related

Python Numpy how to change the data types inside an array

I am trying to calculate information from an array that contains integers; however, when I do a calculation the results are floats. How do I change the ndarray to accept 0.xxx numbers as input? Currently I am only getting 0's. Here is the code I have been trying to get working:
ham_fields = np.array([], dtype=float) # dtype specifies the type of the elements
ham_total = np.array([], dtype=float) # dtype specifies the type of the elements
ham_fields = data[data[:, 0] == 0] # All the first column of the dataset doing a check if they are true or false
ham_sum = np.delete((ham_fields.sum(0)),0) # Boolean indices are treated as a mask of elements to remove none Ham items
ham_total = np.sum(ham_sum)
ham_len = len(ham_sum)
for i in range(ham_len):
    ham_sum[i] = (ham_sum[i] + self.alpha) / (ham_total + (ham_len * self.alpha))

ham_fields = np.array([], dtype=float)
ham_fields = data[data[:, 0] == 0]
ham_sum = np.delete((ham_fields.sum(0)),0)
The second line assigns a new array object to ham_fields. The first assignment did nothing for you. In Python, variables are not declared at the start.
If data has an int dtype, then so does ham_fields. You could change that with another assignment
ham_fields = ham_fields.astype(float)
ham_sum has the same dtype as ham_fields, from which it's derived.
Assigning a float to an element of an int dtype array will not change the dtype.
for i in range(ham_len):
    ham_sum[i] = (ham_sum[i] + self.alpha) / (ham_total + (ham_len * self.alpha))
If self.alpha and ham_total are scalars, then you should be able to do
ham_sum = (ham_sum + self.alpha)/(ham_total + (ham_len * self.alpha))
This makes a new array, which will be float, and assigns it to the ham_sum variable. It's a new assignment (not a modification), so the float dtype is preserved. Or, to make things clear, assign it to a new variable name.
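The difference between the two forms can be seen on a tiny array. This is only an illustration: counts and alpha are made-up stand-ins for ham_sum and self.alpha.

```python
import numpy as np

counts = np.array([3, 1, 0])      # integer dtype, stand-in for ham_sum
alpha = 1.0                       # hypothetical smoothing constant
total = counts.sum()

# Element-wise assignment keeps the integer dtype, so the fractions are truncated to 0.
truncated = counts.copy()
for i in range(len(truncated)):
    truncated[i] = (truncated[i] + alpha) / (total + len(truncated) * alpha)

# A whole-array expression builds a new float array instead.
probs = (counts + alpha) / (total + len(counts) * alpha)

print(truncated, truncated.dtype)
print(probs, probs.dtype)
```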

You can use astype(int) to convert it to an int array after the calculation
import numpy as np
array1 = np.array([1, 2, 3])
print(array1.dtype)
#output: int64
array2 = np.array([2, 3, 4])
print(array2.dtype)
#output: int64
array3 = array1 / array2
print(array3.dtype)
#output: float64
array4 = array3.astype(int)
print(array4.dtype)
#output: int64
You could also do that inside of your calculation by working with brackets:
array3 = (array1 / array2).astype(int)

How to create a matrix with negative index position?

I can create a normal matrix with numpy using
np.zeros([800, 200])
How can I create a matrix with a negative index - as in a 1600x200 matrix with row index from -800 to 800?

Not sure what you need it for but maybe you could use a dictionary instead.
a={i:0 for i in range(-800,801)}
With this you can call a[-800] to a[800].
For 2-D,
a={(i,j):0 for i in range(-800,801) for j in range(-100,101)}
This can be called with a[(-800,-100)] to a[(800,100)]

Not clear what is being asked. NumPy arrays do already support access via negative indexing, which will reach out to positions relative to the end, e.g.:
import numpy as np
m = np.arange(3 * 4).reshape((3, 4))
print(m)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
print(m[-1, :])
# [ 8  9 10 11]
print(m[:, -1])
# [ 3  7 11]
If you need an array that is contiguous near the zero of your indices, one option would be to write a function that maps each index i to i + d // 2 (d being the size along the given axis), e.g.:
def idx2neg(indexing, shape):
    new_indexing = []
    for idx, dim in zip(indexing, shape):
        if isinstance(idx, int):
            new_indexing.append(idx + dim // 2)
        ...
        # not implemented for slices, boolean masks, etc.
    return tuple(new_indexing)
Note that the above function is not as flexible as what NumPy accepts, it is just meant to give some idea on how to proceed.

Probably you refer to the Fortran-like arbitrary indexing of arrays. This is not compatible with Python. Check the comments in this question. Basically it clashes with Python way of treating negative indexes, which is to start counting from the end (or right) of the array.
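If Fortran-style row labels are really needed, a thin wrapper that translates labels to positions avoids the clash with Python's negative indexing. The sketch below is only an illustration: OffsetRows is a hypothetical class, not part of NumPy, and it handles only a (row label, column) pair of plain integers.

```python
import numpy as np

class OffsetRows:
    """Hypothetical wrapper whose row labels start at `first` instead of 0."""

    def __init__(self, array, first):
        self.array = array
        self.first = first

    def _pos(self, label):
        # Translate a row label into a 0-based position and reject anything
        # outside the array, so labels never wrap around from the end.
        pos = label - self.first
        if not 0 <= pos < self.array.shape[0]:
            raise IndexError(f"row label {label} out of range")
        return pos

    def __getitem__(self, key):
        label, col = key
        return self.array[self._pos(label), col]

    def __setitem__(self, key, value):
        label, col = key
        self.array[self._pos(label), col] = value

m = OffsetRows(np.zeros((1600, 200)), first=-800)
m[-800, 0] = 1.0      # first physical row
m[799, 199] = 2.0     # last physical row
```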

I don't know why you would need that but if you just need it for indexing try following function:
def getVal(matrix, i, k):
    return matrix[i + 800][k]
This function only interprets the first index, so you can pass an index from -800 up to 799 for a 1600x200 matrix.
If you want to index relative to the number of rows, try the following function:
def getVal(matrix, i, k):
    return matrix[i + len(matrix) // 2][k]
Hope that helps!

Extending the dependency list, this is kind of straightforward using a pandas DataFrame with custom index.
However you will need to change slightly the syntax for how you access rows (and columns), yet there is the possibility to slice multiple rows and columns.
This is specific to 2d numpy arrays:
import numpy as np
import pandas as pd
a = np.arange(1600*200).reshape([1600, 200])
df = pd.DataFrame(a, index=range(-800, 800))
Once you have such dataframe you can access columns and rows (with few syntax inconsistencies):
Access the 1st column: df[0]
Access the 1st row: df.loc[-800]
Access rows from 1st to 100th: df.loc[-800:-700] and df[-800: -700]
Access columns from 1st to 100th: df.loc[:, 0:100]
Access rows and columns: df.loc[-800:-700, 0:100]
Full documentation on pandas slicing and indexing can be found here.

You can use the np.arange function to generate the row labels from -800 to 799 and keep them alongside a 1600 x 200 matrix. NumPy itself always numbers rows from 0, so a label is translated to a position by subtracting the first label.
Here's an example of how you could do this:
import numpy as np
# Row labels from -800 to 799 (1600 values)
labels = np.arange(-800, 800)
# The 1600 x 200 matrix itself
matrix = np.zeros((1600, 200))
This gives a 1600 x 200 matrix whose rows are labelled -800 to 799. A plain negative index such as matrix[-1, 0] still counts from the end of the array, so use the label offset instead.
For example, to access the element at row label -1 and column 0, you could use the following code:
matrix[-1 - labels[0], 0]

You can create a matrix with a negative index using the following code:
import numpy as np
my_matrix = np.zeros((1600, 200))
my_matrix = np.pad(my_matrix, ((800, 800), (0, 0)), mode='constant', constant_values=0)
my_matrix = my_matrix[-800:800, :]

You can create numpy.ndarray subclass. Take a look at the below example, it can create an array with a specific starting index.
import numpy as np
class CustomArray(np.ndarray):
    def __new__(cls, input_array, startIndex=None):
        obj = np.asarray(input_array)
        if startIndex is not None:
            if isinstance(startIndex, int):
                startIndex = (startIndex, )
            else:
                startIndex = tuple(startIndex)
            assert len(startIndex) > 0, 'startIndex must not be empty'
        obj = obj.view(cls)
        obj.startIndex = startIndex
        return obj
    def __array_finalize__(self, obj):
        if obj is None: return
        self.startIndex = getattr(obj, 'startIndex', None)
    @staticmethod
    def adj_index(idx, adj):
        # Shift an index (int, slice, or tuple of those) by the start offset
        if isinstance(idx, tuple):
            if not isinstance(adj, tuple):
                adj = tuple([adj for i in range(len(idx))])
            idx = tuple([CustomArray.adj_index(idx_i, adj_i) for idx_i, adj_i in zip(idx, adj)])
        elif isinstance(idx, slice):
            if isinstance(adj, tuple):
                adj = adj[0]
            idx = slice(idx.start-adj if idx.start is not None else idx.start,
                        idx.stop-adj if idx.stop is not None else idx.stop,
                        idx.step)
        else:
            if isinstance(adj, tuple):
                adj = adj[0]
            idx = idx - adj
        return idx
    def __iter__(self):
        return np.asarray(self).__iter__()
    def __getitem__(self, idx):
        if self.startIndex is not None:
            idx = self.adj_index(idx, self.startIndex)
        return np.asarray(self).__getitem__(idx)
    def __setitem__(self, idx, val):
        if self.startIndex is not None:
            idx = self.adj_index(idx, self.startIndex)
        return np.asarray(self).__setitem__(idx, val)
    def __repr__(self):
        r = np.asarray(self).__repr__()
        if self.startIndex is not None:
            r += f'\n StartIndex: {self.startIndex}'
        return r
Example

Numpy efficient construction of sparse coo_matrix or faster list extension

I have a list of 100k items and each item has a list of indices. I am trying to put this into a boolean sparse matrix for vector multiplication. My code isn't running as fast as I would like, so I am looking for performance tips or maybe alternative approaches for getting this data into a matrix.
rows = []
cols = []
for i, item in enumerate(items):
    indices = item.getIndices()
    rows += [i]*len(indices)
    cols += indices
data = np.ones(len(rows), dtype='?')
mat = coo_matrix((data, (rows, cols)), shape=(len(items), totalIndices), dtype='?')
mat = mat.tocsr()
There wind up being 800k items in the rows/cols lists and just the extending of those lists seems to be taking up 16% and 13% of the building time. Converting to the coo_matrix then takes up 12%. Enumeration is taking up 13%. I got these stats from line_profiler and I am using python 3.3.

The best I can do is:
def foo3(items,totalIndices):
    N = len(items)
    cols=[]
    cnts=[]
    for item in items:
        indices = getIndices(item)
        cols += indices
        cnts.append(len(indices))
    rows = np.arange(N).repeat(cnts) # main change
    data = np.ones(rows.shape, dtype=bool)
    mat = sparse.coo_matrix((data,(rows,cols)),shape=(N,totalIndices))
    mat = mat.tocsr()
    return mat
For 100000 items it's only a 50% increase in speed.
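To make the "main change" concrete, here is what repeat does with a list of per-item counts. The counts are made up for illustration.

```python
import numpy as np

cnts = [2, 0, 3]                           # hypothetical number of indices per item
rows = np.arange(len(cnts)).repeat(cnts)   # item i appears cnts[i] times
print(rows)  # [0 0 2 2 2]
```

This replaces the repeated `rows += [i]*len(indices)` list extension with a single NumPy call after the loop.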

A lot of sparse matrix algorithms run twice through the data, once to figure out the size of the sparse matrix, the other to fill it in with the right values. So perhaps it is worth trying something like this:
total_len = 0
for item in items:
    total_len += len(item.getIndices())
rows = np.empty((total_len,), dtype=np.int32)
cols = np.empty((total_len,), dtype=np.int32)
total_len = 0
for i, item in enumerate(items):
    indices = item.getIndices()
    len_ = len(indices)
    rows[total_len:total_len + len_] = i
    cols[total_len:total_len + len_] = indices
    total_len += len_
Followed by the same you are currently doing. You can also build the CSR matrix directly, avoiding the COO one, which will save some time as well. After the first run to find out the total size you would do:
indptr = np.empty((len(items) + 1,), dtype=np.int32)
indptr[0] = 0
indices = np.empty((total_len,), dtype=np.int32)
for i, item in enumerate(items):
    item_indices = item.getIndices()
    len_ = len(item_indices)
    indptr[i+1] = indptr[i] + len_
    indices[indptr[i]:indptr[i+1]] = item_indices
data = np.ones((total_len,), dtype=bool)
mat = csr_matrix((data, indices, indptr))
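The direct CSR construction can be tried on a toy input. This is a sketch under assumptions: index_lists is a hypothetical stand-in for the results of item.getIndices(), n_cols stands in for totalIndices, and the pointer array is built with a cumulative sum instead of the explicit loop.

```python
import numpy as np
from scipy.sparse import csr_matrix

index_lists = [[0, 3], [], [1, 2, 4]]   # hypothetical per-item column indices
n_cols = 5                              # stand-in for totalIndices

# First pass: row lengths, turned into CSR row pointers by a cumulative sum.
lengths = np.fromiter((len(ix) for ix in index_lists),
                      dtype=np.int64, count=len(index_lists))
indptr = np.zeros(len(index_lists) + 1, dtype=np.int64)
np.cumsum(lengths, out=indptr[1:])

# Second pass: all column indices, flattened in row order.
indices = np.fromiter((j for ix in index_lists for j in ix),
                      dtype=np.int64, count=int(indptr[-1]))
data = np.ones(len(indices), dtype=bool)

mat = csr_matrix((data, indices, indptr), shape=(len(index_lists), n_cols))
print(mat.toarray().astype(int))
```

Passing shape explicitly matters here: without it the number of columns is inferred from the largest index that happens to occur.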

optimizing indexing and retrieval of elements in numpy arrays in Python?

I'm trying to optimize the following code, potentially by rewriting it in Cython: it simply takes a low-dimensional but relatively long numpy array, looks into one of its columns for 0 values, and marks those as -1 in an array. The code is:
import numpy as np
def get_data():
    data = np.array([[1,5,1]] * 5000 + [[1,0,5]] * 5000 + [[0,0,0]] * 5000)
    return data
def get_cols(K):
    cols = np.array([2] * K)
    return cols
def test_nonzero(data):
    K = len(data)
    result = np.array([1] * K)
    # Index into columns of data
    cols = get_cols(K)
    # Mark zero points with -1
    idx = np.nonzero(data[np.arange(K), cols] == 0)[0]
    result[idx] = -1
import time
t_start = time.time()
data = get_data()
for n in range(5000):
    test_nonzero(data)
t_end = time.time()
print (t_end - t_start)
data is the data. cols is the array of columns of data to look for non-zero values (for simplicity, I made it all the same column). The goal is to compute a numpy array, result, which has a 1 value for each row where the column of interest is non-zero, and -1 for the rows where the corresponding columns of interest have a zero.
Running this function 5000 times on a not-so-large array of 15,000 rows by 3 columns takes about 20 seconds. Is there a way this can be sped up? It appears that most of the work goes into finding the nonzero elements and retrieving them with indices (the call to nonzero and subsequent use of its index.) Can this be optimized or is this the best that can be done?
How could a Cython implementation gain speed on this?

cols = np.array([2] * K)
That's going to be really slow. It creates a very large Python list and then converts it into a numpy array. Instead, do something like:
cols = np.ones(K, int)*2
That'll be way faster.
result = np.array([1] * K)
Here you should do:
result = np.ones(K, int)
That will produce the numpy array directly.
idx = np.nonzero(data[np.arange(K), cols] == 0)[0]
result[idx] = -1
The cols is an array, but you can just pass a 2. Furthermore, using nonzero adds an extra step.
idx = data[np.arange(K), 2] == 0
result[idx] = -1
Should have the same effect.
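When the column really does differ per row, the same result can be built in one step with np.where, without first creating an array of ones. A minimal sketch on made-up data:

```python
import numpy as np

data = np.array([[1, 5, 1],
                 [1, 0, 5],
                 [0, 0, 0]])
cols = np.array([2, 1, 0])                    # one column index per row (illustrative)

picked = data[np.arange(len(data)), cols]     # the element data[i, cols[i]] for each row i
result = np.where(picked == 0, -1, 1)         # -1 where that element is zero, 1 elsewhere
print(result)  # [ 1 -1 -1]
```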

Is there a numpy.delete() equivalent for sparse matrices?

Let's say I have a 2-dimensional matrix as a numpy array. If I want to delete rows with specific indices in this matrix, I use numpy.delete(). Here is an example of what I mean:
In [1]: my_matrix = numpy.array([
...: [10, 20, 30, 40, 50],
...: [15, 25, 35, 45, 55],
...: [95, 96, 97, 98, 99]
...: ])
In [2]: numpy.delete(my_matrix, [0, 2], axis=0)
Out[2]: array([[15, 25, 35, 45, 55]])
I'm looking for a way to do the above with matrices from the scipy.sparse package. I know it's possible to do this by converting the entire matrix into a numpy array but I don't want to do that. Is there any other way of doing that?
Thanks a lot!

For CSR, this is probably the most efficient way to do it in-place:
def delete_row_csr(mat, i):
    if not isinstance(mat, scipy.sparse.csr_matrix):
        raise ValueError("works only for CSR format -- use .tocsr() first")
    n = mat.indptr[i+1] - mat.indptr[i]
    if n > 0:
        mat.data[mat.indptr[i]:-n] = mat.data[mat.indptr[i+1]:]
        mat.data = mat.data[:-n]
        mat.indices[mat.indptr[i]:-n] = mat.indices[mat.indptr[i+1]:]
        mat.indices = mat.indices[:-n]
    mat.indptr[i:-1] = mat.indptr[i+1:]
    mat.indptr[i:] -= n
    mat.indptr = mat.indptr[:-1]
    mat._shape = (mat._shape[0]-1, mat._shape[1])
In LIL format it's even simpler:
def delete_row_lil(mat, i):
    if not isinstance(mat, scipy.sparse.lil_matrix):
        raise ValueError("works only for LIL format -- use .tolil() first")
    mat.rows = np.delete(mat.rows, i)
    mat.data = np.delete(mat.data, i)
    mat._shape = (mat._shape[0] - 1, mat._shape[1])

pv.'s answer is a good and solid in-place solution that takes
a = scipy.sparse.csr_matrix((100,100), dtype=numpy.int8)
%timeit delete_row_csr(a.copy(), 0)
10000 loops, best of 3: 80.3 us per loop
for any array size. Since boolean indexing works for sparse matrices, at least in scipy >= 0.14.0, I would suggest to use it whenever multiple rows are to be removed:
def delete_rows_csr(mat, indices):
    """
    Remove the rows denoted by ``indices`` from the CSR sparse matrix ``mat``.
    """
    if not isinstance(mat, scipy.sparse.csr_matrix):
        raise ValueError("works only for CSR format -- use .tocsr() first")
    indices = list(indices)
    mask = numpy.ones(mat.shape[0], dtype=bool)
    mask[indices] = False
    return mat[mask]
This solution takes significantly longer for a single row removal
%timeit delete_rows_csr(a.copy(), [50])
1000 loops, best of 3: 509 us per loop
But is more efficient for the removal of multiple rows, as the execution time barely increases with the number of rows
%timeit delete_rows_csr(a.copy(), numpy.random.randint(0, 100, 30))
1000 loops, best of 3: 523 us per loop

In addition to @loli's version of @pv's answer, I expanded their function to allow for row and/or column deletion by index on CSR matrices.
import numpy as np
from scipy.sparse import csr_matrix
def delete_from_csr(mat, row_indices=[], col_indices=[]):
    """
    Remove the rows (denoted by ``row_indices``) and columns (denoted by ``col_indices``) from the CSR sparse matrix ``mat``.
    WARNING: Indices of altered axes are reset in the returned matrix
    """
    if not isinstance(mat, csr_matrix):
        raise ValueError("works only for CSR format -- use .tocsr() first")
    rows = []
    cols = []
    if row_indices:
        rows = list(row_indices)
    if col_indices:
        cols = list(col_indices)
    if len(rows) > 0 and len(cols) > 0:
        row_mask = np.ones(mat.shape[0], dtype=bool)
        row_mask[rows] = False
        col_mask = np.ones(mat.shape[1], dtype=bool)
        col_mask[cols] = False
        return mat[row_mask][:,col_mask]
    elif len(rows) > 0:
        mask = np.ones(mat.shape[0], dtype=bool)
        mask[rows] = False
        return mat[mask]
    elif len(cols) > 0:
        mask = np.ones(mat.shape[1], dtype=bool)
        mask[cols] = False
        return mat[:,mask]
    else:
        return mat

You can delete row 0 < i < X.shape[0] - 1 from a CSR matrix X with
scipy.sparse.vstack([X[:i, :], X[i+1:, :]])
You can delete the first or the last row with X[1:, :] or X[:-1, :], respectively. Deleting multiple rows in one go will probably require rolling your own function.
For formats other than CSR, this might not necessarily work, as not all formats support row slicing.
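A quick way to convince yourself that the slicing approach matches numpy.delete is to compare both on a small matrix. A minimal sketch with made-up data:

```python
import numpy as np
from scipy import sparse

dense = np.arange(12).reshape(4, 3)
X = sparse.csr_matrix(dense)
i = 1

# Stack everything above row i on top of everything below it.
Y = sparse.vstack([X[:i, :], X[i + 1:, :]], format="csr")

print(Y.toarray())
print(np.array_equal(Y.toarray(), np.delete(dense, i, axis=0)))
```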

To remove the i'th row from A simply use left matrix multiplication:
B = J*A
where J is a sparse identity matrix with i'th row removed.
Left multiplication by the transpose of J will insert a zero-vector back to the i'th row of B, which makes this solution a bit more general.
A0 = J.T * B
To construct J itself, I used pv.'s solution on a sparse diagonal matrix as follows (maybe there's a simpler solution for this special case?)
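def identity_minus_rows(N, rows):
    if np.isscalar(rows):
        rows = [rows]
    J = sps.diags(np.ones(N), 0).tocsr() # make a diag matrix
    # delete_row_csr works in place and returns None; delete from the
    # highest index down so earlier deletions don't shift later ones
    for r in sorted(rows, reverse=True):
        delete_row_csr(J, r)
    return J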
You may also remove columns by right-multiplying by J.T of the appropriate size.
Finally, multiplication is efficient in this case because J is so sparse.
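As for the question of a simpler way to construct J: one option is to build it directly from the indices of the rows that are kept, without calling delete_row_csr at all. This is a sketch, not the answer's original method; selection_matrix is a hypothetical helper name.

```python
import numpy as np
from scipy import sparse

def selection_matrix(n, rows_to_drop):
    """Sparse identity of size n with the given rows removed."""
    keep = np.setdiff1d(np.arange(n), rows_to_drop)   # sorted indices of the rows to keep
    ones = np.ones(len(keep))
    # Row k of J has a single 1 in column keep[k].
    return sparse.csr_matrix((ones, (np.arange(len(keep)), keep)),
                             shape=(len(keep), n))

A = sparse.csr_matrix(np.arange(1, 13).reshape(4, 3))
J = selection_matrix(4, [1])
B = J @ A        # A with row 1 removed
A0 = J.T @ B     # zero row inserted back at position 1
print(B.toarray())
print(A0.toarray())
```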

Note that sparse matrices support fancy indexing to some degree. So what you can do is this:
mask = np.ones(mat.shape[0], dtype=bool)
mask[rows_to_delete] = False
# unfortunately I think boolean indexing does not work:
w = np.flatnonzero(mask)
result = mat[w, :]
The delete method doesn't really do anything else either.

Using @loli's implementation, here is a function to remove columns:
def delete_cols_csr(mat, indices):
    """
    Remove the cols denoted by ``indices`` from the CSR sparse matrix ``mat``.
    """
    if not isinstance(mat, csr_matrix):
        raise ValueError("works only for CSR format -- use .tocsr() first")
    indices = list(indices)
    mask = np.ones(mat.shape[1], dtype=bool)
    mask[indices] = False
    return mat[:,mask]
