What does csr_matrix.sort_indices do?

What does csr_matrix.sort_indices do? - python

I make a csr_matrix in the following way:
>>> A = sparse.csr_matrix([[0, 1, 0],
[1, 0, 1],
[0, 1, 0]])
>>> A[2,:] = np.array([-1, -2, -3])
>>> A.indptr
Out[12]: array([0, 1, 3, 6], dtype=int32)
>>> A.indices
Out[13]: array([1, 0, 2, 0, 2, 1], dtype=int32)
>>> A.data
Out[14]: array([ 1, 1, 1, -1, -3, -2], dtype=int64)
Now I want to interchange the last two elements in the indices and data arrays, so I try:
>>> A.sort_indices()
This does not do anything to my matrix however. The manual for this function only states that it sorts the indices.
What does this function do? In which condition can you see a difference?
How can I sort the indices and data arrays, such that for each row the indices are sorted?

As stated in the document, A.sort_indices() sorts the indices in-place. But there is a cache: if A.has_sorted_indices is True, it won't do anything (the cache was introduced at 0.7.0).
So, in order to see a difference, you need to manually set A.has_sorted_indices to False.
>>> A.has_sorted_indices, A.indices
(True, array([1, 0, 2, 0, 2, 1], dtype=int32))
>>> A.sort_indices()
>>> A.has_sorted_indices, A.indices
(True, array([1, 0, 2, 0, 2, 1], dtype=int32))
>>> A.has_sorted_indices = False
>>> A.sort_indices()
>>> A.has_sorted_indices, A.indices
(True, array([1, 0, 2, 0, 1, 2], dtype=int32))
Note that, unlike what OP has indicated, as of SciPy 0.19.0 running A[2, :] = [-1, -2, -3] no longer produces an out-of-order index (this should have been fixed in 0.14.0). On the other hand, this operation produces a warning:
SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
Anyway, we could easily produce out-of-order index in other ways, e.g. by matrix multiplication:
>>> B = scipy.sparse.csr_matrix([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
>>> C = B*B
>>> C.has_sorted_indices, C.indices
(0, array([2, 0, 1, 2, 0], dtype=int32))
>>> C.sort_indices()
>>> C.has_sorted_indices, C.indices
(True, array([0, 2, 1, 0, 2], dtype=int32))

Related

Error while merging split part of numpy array

test = [(0,1,2),(9,0,1),(0,1,3),(0,1,8)]
test=np.array(test)
test = np.array_split(test, 4)
t_0 = test[:0]
t_1 = test[0]
new_test= t_0+test[1:]
print(new_test)
It is giving me following answer : [array([[9, 0, 1]]), array([[0, 1, 3]]), array([[0, 1, 8]])]
Whereas I am aiming for [(9,0,1),(0,1,3),(0,1,8)] if I select the first set from test.

You are mixing up lists and arrays. Pay attention to what you get at each stage:
Start with a list of tuples:
In [126]: test = [(0,1,2),(9,0,1),(0,1,3),(0,1,8)]
Make a 2d array:
In [127]: arr = np.array(test)
In [128]: arr
Out[128]:
array([[0, 1, 2],
[9, 0, 1],
[0, 1, 3],
[0, 1, 8]])
Split into a list - one 'row' per element, but each is a 2d array. Question, will the split number always be this size?
In [129]: alist = np.array_split(arr, arr.shape[0])
In [130]: alist
Out[130]:
[array([[0, 1, 2]]),
array([[9, 0, 1]]),
array([[0, 1, 3]]),
array([[0, 1, 8]])]
Sublists:
In [131]: alist[:0]
Out[131]: []
In [132]: alist[1:]
Out[132]: [array([[9, 0, 1]]), array([[0, 1, 3]]), array([[0, 1, 8]])]
List join:
In [133]: alist[:0] + alist[1:]
Out[133]: [array([[9, 0, 1]]), array([[0, 1, 3]]), array([[0, 1, 8]])]
Looks like what you want is a list of tuples, like what you started with:
In [134]: test[:0] + test[1:]
Out[134]: [(9, 0, 1), (0, 1, 3), (0, 1, 8)]
You could recreate a 2d array, by applying concatenate to the joined lists of arrays:
In [135]: np.concatenate(alist[:0] + alist[1:])
Out[135]:
array([[9, 0, 1],
[0, 1, 3],
[0, 1, 8]])
In [136]: np.concatenate(alist[:1] + alist[2:])
Out[136]:
array([[0, 1, 2],
[0, 1, 3],
[0, 1, 8]])
In [137]: np.concatenate(alist[:2] + alist[3:])
Out[137]:
array([[0, 1, 2],
[9, 0, 1],
[0, 1, 8]])
But note that you could just as easily get any of these arrays with indexing:
In [138]: arr[[0,1,3],:]
Out[138]:
array([[0, 1, 2],
[9, 0, 1],
[0, 1, 8]])
And with the r_ helper you could construct the indices from ranges:
In [139]: np.r_[:2, 3:4]
Out[139]: array([0, 1, 3])
In [140]: arr[np.r_[:2, 3:4],:]
Out[140]:
array([[0, 1, 2],
[9, 0, 1],
[0, 1, 8]])
You could also do the join after indexing:
In [141]: np.concatenate([arr[:2,:], arr[3:,:]], axis=0)
Out[141]:
array([[0, 1, 2],
[9, 0, 1],
[0, 1, 8]])
+ is define for lists a a join operator. For arrays it is addition. concatenate (along with various stack variants) is the array join function.

np.choose not giving desired result after broadcasting

I would like to pick the nth elements as specified in maxsuit from suitCounts. I did broadcast the maxsuit array so I do get a result, but not the desired one. Any suggestions what I'm doing conceptually wrong is appreciated. I don't understand the result of np.choose(self.maxsuit[:,:,None]-1, self.suitCounts), which is not what I'm looking for.
>>> self.maxsuit
Out[38]:
array([[3, 3],
[1, 1],
[1, 1]], dtype=int64)
>>> self.maxsuit[:,:,None]-1
Out[33]:
array([[[2],
[2]],
[[0],
[0]],
[[0],
[0]]], dtype=int64)
>>> self.suitCounts
Out[34]:
array([[[2, 1, 3, 0],
[1, 0, 3, 0]],
[[4, 1, 2, 0],
[3, 0, 3, 0]],
[[2, 2, 0, 0],
[1, 1, 1, 0]]])
>>> np.choose(self.maxsuit[:,:,None]-1, self.suitCounts)
Out[35]:
array([[[2, 2, 0, 0],
[1, 1, 1, 0]],
[[2, 1, 3, 0],
[1, 0, 3, 0]],
[[2, 1, 3, 0],
[1, 0, 3, 0]]])
The desired result would be:
[[3,3],[4,3],[2,1]]

You could use advanced-indexing for a broadcasted way to index into the array, like so -
In [415]: val # Data array
Out[415]:
array([[[2, 1, 3, 0],
[1, 0, 3, 0]],
[[4, 1, 2, 0],
[3, 0, 3, 0]],
[[2, 2, 0, 0],
[1, 1, 1, 0]]])
In [416]: idx # Indexing array
Out[416]:
array([[3, 3],
[1, 1],
[1, 1]])
In [417]: m,n = val.shape[:2]
In [418]: val[np.arange(m)[:,None],np.arange(n),idx-1]
Out[418]:
array([[3, 3],
[4, 3],
[2, 1]])
A bit cleaner way with np.ogrid to use open range arrays -
In [424]: d0,d1 = np.ogrid[:m,:n]
In [425]: val[d0,d1,idx-1]
Out[425]:
array([[3, 3],
[4, 3],
[2, 1]])

This is the best I can do with choose
In [23]: np.choose([[1,2,0],[1,2,0]], suitcounts[:,:,:3])
Out[23]:
array([[4, 2, 3],
[3, 1, 3]])
choose prefers that we use a list of arrays, rather than single one. It's supposed to prevent misuse. So the problem could be written as:
In [24]: np.choose([[1,2,0],[1,2,0]], [suitcounts[0,:,:3], suitcounts[1,:,:3], suitcounts[2,:,:3]])
Out[24]:
array([[4, 2, 3],
[3, 1, 3]])
The idea is to select items from the 3 subarrays, based on an index array like:
In [25]: np.array([[1,2,0],[1,2,0]])
Out[25]:
array([[1, 2, 0],
[1, 2, 0]])
The output will match the indexing array in shape. The choise arrays have match in shape as well, hence my use of [...,:3].
Values for the first column are selected from suitcounts[1,:,:3], for the 2nd column from suitcounts[2...] etc.
choose is limited to 32 choices; this is limitation imposed by the broadcasting mechanism.
Speaking of broadcasting I could simplify the expression
In [26]: np.choose([1,2,0], suitcounts[:,:,:3])
Out[26]:
array([[4, 2, 3],
[3, 1, 3]])
This broadcasts [1,2,0] to match the 2x3 shape of the subarrays.
I could get the target order by reordering the columns:
In [27]: np.choose([0,1,2], suitcounts[:,:,[2,0,1]])
Out[27]:
array([[3, 4, 2],
[3, 3, 1]])

How do I set cell values in `np.array()` based on condition?

I have a numpy array and a list of valid values in that array:
import numpy as np
arr = np.array([[1,2,0], [2,2,0], [4,1,0], [4,1,0], [3,2,0], ... ])
valid = [1,4]
Is there a nice pythonic way to set all array values to zero, that are not in the list of valid values and do it in-place? After this operation, the list should look like this:
[[1,0,0], [0,0,0], [4,1,0], [4,1,0], [0,0,0], ... ]
The following creates a copy of the array in memory, which is bad for large arrays:
arr = np.vectorize(lambda x: x if x in valid else 0)(arr)
It bugs me, that for now I loop over each array element and set it to zero if it is in the valid list.
Edit: I found an answer suggesting there is no in-place function to achieve this. Also stop changing my whitespaces. It's easier to see the changes in arr whith them.

You can use np.place for an in-situ update -
np.place(arr,~np.in1d(arr,valid),0)
Sample run -
In [66]: arr
Out[66]:
array([[1, 2, 0],
[2, 2, 0],
[4, 1, 0],
[4, 1, 0],
[3, 2, 0]])
In [67]: np.place(arr,~np.in1d(arr,valid),0)
In [68]: arr
Out[68]:
array([[1, 0, 0],
[0, 0, 0],
[4, 1, 0],
[4, 1, 0],
[0, 0, 0]])
Along the same lines, np.put could also be used -
np.put(arr,np.where(~np.in1d(arr,valid))[0],0)
Sample run -
In [70]: arr
Out[70]:
array([[1, 2, 0],
[2, 2, 0],
[4, 1, 0],
[4, 1, 0],
[3, 2, 0]])
In [71]: np.put(arr,np.where(~np.in1d(arr,valid))[0],0)
In [72]: arr
Out[72]:
array([[1, 0, 0],
[0, 0, 0],
[4, 1, 0],
[4, 1, 0],
[0, 0, 0]])

Indexing with booleans would work too:
>>> arr = np.array([[1, 2, 0], [2, 2, 0], [4, 1, 0], [4, 1, 0], [3, 2, 0]])
>>> arr[~np.in1d(arr, valid).reshape(arr.shape)] = 0
>>> arr
array([[1, 0, 0],
[0, 0, 0],
[4, 1, 0],
[4, 1, 0],
[0, 0, 0]])

making an array of n columns where each successive row increases by one

In numpy, I would like to be able to input n for rows and m for columns and end with the array that looks like:
[(0,0,0,0),
(1,1,1,1),
(2,2,2,2)]
So that would be a 3x4. Each column is just a copy of the previous one and the row increases by one each time. As an example:
input would be 4, then 6 and the output would be and array
[(0,0,0,0,0,0),
(1,1,1,1,1,1),
(2,2,2,2,2,2),
(3,3,3,3,3,3)]
4 rows and 6 columns where the row increases by one each time. Thanks for your time.

So many possibilities...
In [51]: n = 4
In [52]: m = 6
In [53]: np.tile(np.arange(n), (m, 1)).T
Out[53]:
array([[0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3]])
In [54]: np.repeat(np.arange(n).reshape(-1,1), m, axis=1)
Out[54]:
array([[0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3]])
In [55]: np.outer(np.arange(n), np.ones(m, dtype=int))
Out[55]:
array([[0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3]])
Here's one more. The neat trick here is that the values are not duplicated--only memory for the single sequence [0, 1, 2, ..., n-1] is allocated.
In [67]: from numpy.lib.stride_tricks import as_strided
In [68]: seq = np.arange(n)
In [69]: rep = as_strided(seq, shape=(n,m), strides=(seq.strides[0],0))
In [70]: rep
Out[70]:
array([[0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3]])
Be careful with the as_strided function. If you don't get the arguments right, you can crash Python.
To see that seq has not been copied, change seq in place, and then check rep:
In [71]: seq[1] = 99
In [72]: rep
Out[72]:
array([[ 0, 0, 0, 0, 0, 0],
[99, 99, 99, 99, 99, 99],
[ 2, 2, 2, 2, 2, 2],
[ 3, 3, 3, 3, 3, 3]])

import numpy as np
def foo(n, m):
return np.array([np.arange(n)] * m).T

Natively (no Python lists):
rows, columns = 4, 6
numpy.arange(rows).reshape(-1, 1).repeat(columns, axis=1)
#>>> array([[0, 0, 0, 0, 0, 0],
#>>> [1, 1, 1, 1, 1, 1],
#>>> [2, 2, 2, 2, 2, 2],
#>>> [3, 3, 3, 3, 3, 3]])

You can easily do this using built in python functions. The program counts to 3 converting each number to a string and repeats the string 6 times.
print [6*str(n) for n in range(0,4)]
Here is the output.
ks-MacBook-Pro:~ kyle$ pbpaste | python
['000000', '111111', '222222', '333333']

On more for fun
np.zeros((n, m), dtype=np.int) + np.arange(n, dtype=np.int)[:,None]

As has been mentioned, there are many ways to do this.
Here's what I'd do:
import numpy as np
def makearray(m, n):
A = np.empty((m,n))
A.T[:] = np.arange(m)
return A
Here's an amusing alternative that will work if you aren't going to be changing the contents of the array.
It should save some memory.
Be careful though because this doesn't allocate a full array, it will have multiple entries pointing to the same memory address.
import numpy as np
from numpy.lib.stride_tricks import as_strided
def makearray(m, n):
A = np.arange(m)
return as_strided(A, strides=(A.strides[0],0), shape=(m,n))
In either case, as I have written them, a 3x4 array can be created by makearray(3, 4)

Using count from the built-in module itertools:
>>> from itertools import count
>>> rows = 4
>>> columns = 6
>>> cnt = count()
>>> [[cnt.next()]*columns for i in range(rows)]
[[0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1], [2, 2, 2, 2, 2, 2], [3, 3, 3, 3, 3, 3]]

you can simply
>>> nc=5
>>> nr=4
>>> [[k]*nc for k in range(nr)]
[[0, 0, 0, 0, 0], [1, 1, 1, 1, 1], [2, 2, 2, 2, 2], [3, 3, 3, 3, 3]]

Several other possibilities using a (n,1) array
a = np.arange(n)[:,None] (or np.arange(n).reshape(-1,1))
a*np.ones((m),dtype=int)
a[:,np.zeros((m),dtype=int)]
If used with a (m,) array, just leave it (n,1), and let broadcasting expand it for you.

Python: multidimensional array masking

What is equivalent pythonic implementation for the following simple piece of code in Matlab.
Matlab:
B = 2D array of integers as indices [1...100]
A = 2D array of numbers: [10x10]
A[B] = 0
which works well as for example for B[i]=42 it finds the position 2 of column 5 to be set.
In Python it causes an error: out of bound which is logical. However to translate the above Matlab code into Python we are looking for pythonic ways.
Please also consider the problem for higher dimensions such as:
B = 2D array of integers as indices [1...3000]
C = 3D array of numbers: [10x10x30]
C[B] = 0
One way we thought about it is to reform indices array elements as i,j instead of being absolute position. That is, position 42 to divmod(42,m=10)[::-1] >>> (2,4). So we will have a nx2 >>> ii,jj vectors of indices which can be used for indexing A easily.
We thought that it might be a better way, efficient also for higher dimensions in Python.

You can use .ravel() on the array (A) before indexing it, and then .reshape() after.
Alternatively, since you know A.shape, you can use np.unravel_index on the other array (B) before indexing.
Example 1:
>>> import numpy as np
>>> A = np.ones((5,5), dtype=int)
>>> B = [1, 3, 7, 23]
>>> A
array([[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1]])
>>> A_ = A.ravel()
>>> A_[B] = 0
>>> A_.reshape(A.shape)
array([[1, 0, 1, 0, 1],
[1, 1, 0, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 0, 1]])
Example 2:
>>> b_row, b_col = np.vstack([np.unravel_index(b, A.shape) for b in B]).T
>>> A[b_row, b_col] = 0
>>> A
array([[1, 0, 1, 0, 1],
[1, 1, 0, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 0, 1]])
Discovered later: you can use numpy.put
>>> import numpy as np
>>> A = np.ones((5,5), dtype=int)
>>> B = [1, 3, 7, 23]
>>> A.put(B, [0]*len(B))
>>> A
array([[1, 0, 1, 0, 1],
[1, 1, 0, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 0, 1]])

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

What does csr_matrix.sort_indices do? - python

Related

Error while merging split part of numpy array

np.choose not giving desired result after broadcasting

How do I set cell values in `np.array()` based on condition?

making an array of n columns where each successive row increases by one

Python: multidimensional array masking

Categories

Resources