m is an ndarray with shape (12, 21, 21). Now I want to take only a sparse slice of it to form a new 2D array, with
sliceid = 0
indx = np.array([0, 2, 4, 6, 8, 10])
so that sparse_slice is, intuitively,
sparse_slice = m[sliceid, indx, indx]
but apparently the above operation does not work. Currently what I am using is
sparse_slice = m[sliceid,indx,:][:, indx]
Why is the first "intuitive" way not working? And is there a more compact way than my current solution? All my previous ndarray slicing attempts were based on nothing but intuition; maybe I should switch to reading some serious manual now...
The more compact way is to do new = m[0, :12:2, :12:2]. This is what the numpy docs call "basic indexing", meaning that you index with an integer or a slice object (i.e. 0:12:2). When you use basic indexing, numpy returns a view of the original array. For example:
In [3]: a = np.zeros((2, 3, 4))
In [4]: b = a[0, 1, ::2]
In [5]: b
Out[5]: array([ 0., 0.])
In [6]: b[:] = 7
In [7]: a
Out[7]:
array([[[ 0.,  0.,  0.,  0.],
        [ 7.,  0.,  7.,  0.],
        [ 0.,  0.,  0.,  0.]],

       [[ 0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.]]])
In you "intuitive" approach what you're doing is indexing an array with another array. When you index an numpy array with another array the arrays need to be the same size (or they need to broadcast against each other, more on this in a sec). In the docs this is called fancy indexing or advanced indexing. For example:
In [10]: a = np.arange(9).reshape(3,3)
In [11]: a
Out[11]:
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
In [12]: index = np.array([0,1,2])
In [13]: b = a[index, index]
In [14]: b
Out[14]: array([0, 4, 8])
You see that I get a[0,0], a[1,1], and a[2,2], not a[0,0], a[0,1], ... If you want the "outer product" of index with index you can do the following.
In [22]: index1 = np.array([[0,0],[1,1]])
In [23]: index2 = np.array([[0,1],[0,1]])
In [24]: b = a[index1, index2]
In [25]: b
Out[25]:
array([[0, 1],
[3, 4]])
There is a shorthand for doing the above, like this:
In [28]: index = np.array([0,1])
In [29]: index1, index2 = np.ix_(index, index)
In [31]: index1
Out[31]:
array([[0],
[1]])
In [32]: index2
Out[32]: array([[0, 1]])
In [33]: a[index1, index2]
Out[33]:
array([[0, 1],
[3, 4]])
In [34]: a[np.ix_(index, index)]
Out[34]:
array([[0, 1],
[3, 4]])
You'll notice that index1 is (2, 1) and index2 is (1, 2), not (2, 2). That's because the two arrays get broadcast against one another; you can read more about broadcasting in the numpy docs. Keep in mind that when you're using fancy indexing you get a copy of the original data, not a view. Sometimes this is better (if you want to leave the original data unchanged) and sometimes it just takes more memory. There is more about indexing in the numpy docs as well.
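A quick sketch of that copy-vs-view difference (the variable names here are just for illustration):

import numpy as np

a = np.arange(9).reshape(3, 3)
view = a[0, ::2]          # basic indexing: a view into a
fancy = a[[0], [0, 2]]    # fancy indexing: a copy of a[0,0] and a[0,2]
view[:] = -1              # writes through to a
fancy[:] = -99            # leaves a untouched
print(a[0])               # [-1  1 -1]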
If I'm not mistaken, for input m = np.array(range(5292)).reshape(12,21,21) you are expecting sparse_slice = m[sliceid,indx,:][:, indx] to be
array([[ 0, 2, 4, 6, 8, 10],
[ 42, 44, 46, 48, 50, 52],
[ 84, 86, 88, 90, 92, 94],
[126, 128, 130, 132, 134, 136],
[168, 170, 172, 174, 176, 178],
[210, 212, 214, 216, 218, 220]])
In that case, you can get it using the step part of a slice:
m[0, :12:2, :12:2]
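If indx were not evenly spaced, the np.ix_ approach from the other answer would still apply. Here is a quick sketch checking that the two forms agree for this particular indx:

import numpy as np

m = np.arange(5292).reshape(12, 21, 21)
indx = np.array([0, 2, 4, 6, 8, 10])

a = m[0, :12:2, :12:2]          # basic indexing: works because indx is a regular grid
b = m[0][np.ix_(indx, indx)]    # fancy indexing: works for arbitrary indx
print(np.array_equal(a, b))     # True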
Related
Let's say S is a large scipy CSR (sparse) matrix and D is a dictionary with key -> the index (position) of a row vector A in S, and value -> a list of the indices (positions) of other row vectors l in S. For each row vector in l you subtract it from A, and the resulting vector is nothing but a new row vector to be placed in the new sparse matrix.
The dictionary is of the form -> { 1 : [4 , 5 ,... ,63] }
Then I have to create a new sparse matrix with....
new_row_vector_1 -> S_vec1 - S_vec4
new_row_vector_2 -> S_vec1 - S_vec5
.
new_row_vector_n -> S_vec1 - S_vec63
where S_vecX is the Xth row vector of matrix S
Numpy Example:
>>> import numpy as np
>>> s = np.array([[1,5,3,4],[3,0,12,7],[5,6,2,4],[4,6,6,4],[7,12,5,67]])
>>> s
array([[ 1, 5, 3, 4],
[ 3, 0, 12, 7],
[ 5, 6, 2, 4],
[ 4, 6, 6, 4],
[ 7, 12, 5, 67]])
>>> index_dictionary = {0: [2, 4], 1: [3, 4], 2: [1], 3: [1, 2], 4: [1, 3, 2]}
>>> n = np.zeros((10,4)) # the sum of the lengths of all the value lists in index_dictionary gives the number of rows of the new array n; the number of columns stays the same as in s
>>> n
array([[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.]])
>>> idx = 0
>>> for index in index_dictionary:
... for k in index_dictionary[index]:
... n[idx] = s[index]-s[k]
... idx += 1
...
>>> n
array([[ -4., -1., 1., 0.],
[ -6., -7., -2., -63.],
[ -1., -6., 6., 3.],
[ -4., -12., 7., -60.],
[ 2., 6., -10., -3.],
[ 1., 6., -6., -3.],
[ -1., 0., 4., 0.],
[ 4., 12., -7., 60.],
[ 3., 6., -1., 63.],
[ 2., 6., 3., 63.]])
n is what I want.
Here's a simple demonstration of what I think you are trying to do:
First the numpy array version:
In [619]: arr=np.arange(12).reshape(4,3)
In [620]: arr[[1,0,2,3]]-arr[0]
Out[620]:
array([[3, 3, 3],
[0, 0, 0],
[6, 6, 6],
[9, 9, 9]])
Now the sparse equivalent:
In [622]: M=sparse.csr_matrix(arr)
csr implements row indexing:
In [624]: M[[1,0,2,3]]
Out[624]:
<4x3 sparse matrix of type '<class 'numpy.int32'>'
with 11 stored elements in Compressed Sparse Row format>
In [625]: M[[1,0,2,3]].A
Out[625]:
array([[ 3, 4, 5],
[ 0, 1, 2],
[ 6, 7, 8],
[ 9, 10, 11]], dtype=int32)
But not broadcasting:
In [626]: M[[1,0,2,3]]-M[0]
....
ValueError: inconsistent shapes
So we can use an explicit form of broadcasting:
In [627]: M[[1,0,2,3]]-M[[0,0,0,0]] # or M[[0]*4]
Out[627]:
<4x3 sparse matrix of type '<class 'numpy.int32'>'
with 9 stored elements in Compressed Sparse Row format>
In [628]: _.A
Out[628]:
array([[3, 3, 3],
[0, 0, 0],
[6, 6, 6],
[9, 9, 9]], dtype=int32)
This may not be the fastest or most efficient, but it's a start.
I found in a previous SO question that M[[1,0,2,3]] indexing is performed with a matrix multiplication, in this case the equivalent of:
idxM = sparse.csr_matrix(([1,1,1,1],([0,1,2,3],[1,0,2,3])),(4,4))
M1 = idxM * M
(See: Sparse matrix slicing using list of int.)
So my difference expression requires two such multiplications along with the subtraction.
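As a quick sanity check of that equivalence, here is a sketch using a small M built from arange (the shapes are just for illustration):

import numpy as np
from scipy import sparse

M = sparse.csr_matrix(np.arange(12).reshape(4, 3))
idxM = sparse.csr_matrix(([1, 1, 1, 1], ([0, 1, 2, 3], [1, 0, 2, 3])), (4, 4))
print(np.array_equal((idxM * M).toarray(), M[[1, 0, 2, 3]].toarray()))  # True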
We could try a row by row iteration, and building a new matrix from the result, but there's no guarantee that it will be faster. Depending on the arrays, converting to dense and back might even be faster.
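For example, a minimal sketch of the dense round-trip, assuming the matrix fits in memory:

import numpy as np
from scipy import sparse

M = sparse.csr_matrix(np.arange(12).reshape(4, 3))
dense = M.toarray()                                       # to dense
res = sparse.csr_matrix(dense[[1, 0, 2, 3]] - dense[0])   # broadcast, then back
print(res.toarray())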
=================
I can imagine two ways of applying this to the dictionary.
One is to iterate through the dictionary (in what order?), perform this difference for each key, collect the results in a list (a list of sparse matrices), and use sparse.bmat to join them into one matrix; a sketch of this appears at the end of this answer.
Another is to collect two lists of indexes, and apply the above indexed difference just once.
In [8]: index_dictionary = {0: [2, 4], 1: [3, 4], 2: [1], 3: [1, 2], 4: [1, 3, 2]}
In [10]: alist=[]
...: for index in index_dictionary:
...: for k in index_dictionary[index]:
...: alist.append((index, k))
In [11]: idx = np.array(alist)
In [12]: idx
Out[12]:
array([[0, 2],
[0, 4],
[1, 3],
[1, 4],
[2, 1],
[3, 1],
[3, 2],
[4, 1],
[4, 3],
[4, 2]])
applied to your dense s:
In [15]: s = np.array([[1,5,3,4],[3,0,12,7],[5,6,2,4],[4,6,6,4],[7,12,5,67]])
In [16]: s[idx[:,0]]-s[idx[:,1]]
Out[16]:
array([[ -4, -1, 1, 0],
[ -6, -7, -2, -63],
[ -1, -6, 6, 3],
[ -4, -12, 7, -60],
[ 2, 6, -10, -3],
[ 1, 6, -6, -3],
[ -1, 0, 4, 0],
[ 4, 12, -7, 60],
[ 3, 6, -1, 63],
[ 2, 6, 3, 63]])
and to the sparse equivalent
In [19]: arr= sparse.csr_matrix(s)
In [20]: arr
Out[20]:
<5x4 sparse matrix of type '<class 'numpy.int32'>'
with 19 stored elements in Compressed Sparse Row format>
In [21]: res=arr[idx[:,0]]-arr[idx[:,1]]
In [22]: res
Out[22]:
<10x4 sparse matrix of type '<class 'numpy.int32'>'
with 37 stored elements in Compressed Sparse Row format>
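Finally, here is the sketch of the first approach promised above: iterate over the dictionary, build one sparse difference block per key, and stack the blocks. I use sparse.vstack here, which is a convenience wrapper around the kind of sparse.bmat call mentioned earlier:

from scipy import sparse
import numpy as np

s = np.array([[1,5,3,4],[3,0,12,7],[5,6,2,4],[4,6,6,4],[7,12,5,67]])
M = sparse.csr_matrix(s)
index_dictionary = {0: [2, 4], 1: [3, 4], 2: [1], 3: [1, 2], 4: [1, 3, 2]}

blocks = []
for index, others in index_dictionary.items():
    # repeat the key row so the shapes match, then subtract row-wise
    blocks.append(M[[index] * len(others)] - M[others])
res = sparse.vstack(blocks)     # equivalent to sparse.bmat([[b] for b in blocks])
print(res.toarray())            # the same 10x4 result as above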
Is there a way to specify the dtype for numpy.gradient?
I'm using an array of subarrays and it's throwing the following error:
ValueError: setting an array element with a sequence.
Here is an example:
import numpy as np
a = np.empty([3, 3], dtype=object)
it = np.nditer(a, flags=['multi_index', 'refs_ok'])
while not it.finished:
    i = it.multi_index[0]
    j = it.multi_index[1]
    a[it.multi_index] = np.array([i, j])
    it.iternext()
print(a)
which outputs
[[array([0, 0]) array([0, 1]) array([0, 2])]
[array([1, 0]) array([1, 1]) array([1, 2])]
[array([2, 0]) array([2, 1]) array([2, 2])]]
I would like print(np.gradient(a)) to return
array(
[[array([[1, 0],[0, 1]]), array([[1, 0], [0, 1]]), array([[1, 0], [0, 1]])],
[array([[1, 0], [0, 1]]), array([[1, 0], [0, 1]]), array([[1, 0],[0, 1]])],
[array([[1, 0], [0, 1]]), array([[1, 0], [0, 1]]), array([[1, 0],[0, 1]])]],
dtype=object)
Notice that, in this case, the gradient of the vector field is an identity tensor field.
Why are you working with an array of dtype object? That's more work than using a 2d array.
e.g.
In [53]: a1=np.array([[1,2],[3,4],[5,6]])
In [54]: a1
Out[54]:
array([[1, 2],
[3, 4],
[5, 6]])
In [55]: np.gradient(a1)
Out[55]:
[array([[ 2., 2.],
[ 2., 2.],
[ 2., 2.]]),
array([[ 1., 1.],
[ 1., 1.],
[ 1., 1.]])]
Or, working column by column or row by row:
In [61]: [np.gradient(i) for i in a1.T]
Out[61]: [array([ 2., 2., 2.]), array([ 2., 2., 2.])]
In [62]: [np.gradient(i) for i in a1]
Out[62]: [array([ 1., 1.]), array([ 1., 1.]), array([ 1., 1.])]
dtype=object only makes sense if the subarrays/lists differ in type and/or shape. And even then it doesn't add much over a regular Python list.
==============================
I can take your 2d a, and make a 3d array with:
In [126]: a1=np.zeros((3,3,2),int)
In [127]: a1.flat[:]=[i for i in a.flatten()]
In [128]: a1
Out[128]:
array([[[0, 0],
        [0, 1],
        [0, 2]],

       [[1, 0],
        [1, 1],
        [1, 2]],

       [[2, 0],
        [2, 1],
        [2, 2]]])
Or I could produce the same thing with meshgrid:
In [129]: X,Y=np.meshgrid(np.arange(3),np.arange(3),indexing='ij')
In [130]: a2=np.array([Y,X]).T
When I apply np.gradient to that I get 3 arrays, each (3,3,2) in shape.
In [136]: ga1=np.gradient(a1)
In [137]: len(ga1)
Out[137]: 3
In [138]: ga1[0].shape
Out[138]: (3, 3, 2)
It looks like the first two arrays have the values you want, so it's just a matter of rearranging them.
In [141]: np.array(ga1[:2]).shape
Out[141]: (2, 3, 3, 2)
In [143]: gga1=np.array(ga1[:2]).transpose([1,2,0,3])
In [144]: gga1.shape
Out[144]: (3, 3, 2, 2)
In [145]: gga1[0,0]
Out[145]:
array([[ 1., -0.],
[-0., 1.]])
If they must go back into a (3,3) object array, I could do:
In [146]: goa1=np.empty([3,3],dtype=object)
In [147]: for i in range(3):
   .....:     for j in range(3):
   .....:         goa1[i,j] = gga1[i,j]
   .....:
In [148]: goa1
Out[148]:
array([[array([[ 1., -0.],
[-0., 1.]]),
array([[ 1., -0.],
[ 0., 1.]]),
array([[ 1., -0.],
...
[ 0., 1.]]),
array([[ 1., 0.],
[ 0., 1.]])]], dtype=object)
I still wonder what the point is of working with an object array.
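For completeness, a condensed sketch of the whole computation, assuming the field is stored as a plain (3, 3, 2) array rather than an object array:

import numpy as np

X, Y = np.meshgrid(np.arange(3), np.arange(3), indexing='ij')
field = np.stack([X, Y], axis=-1)    # field[i,j] == [i, j], shape (3, 3, 2)
gx, gy, _ = np.gradient(field)       # one gradient array per axis
jac = np.stack([gx, gy], axis=-2)    # shape (3, 3, 2, 2): a Jacobian per point
print(jac[0, 0])                     # the 2x2 identity, as expected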
I have a matrix like this
x = [[a, b],
[c, d]]
But instead of a, b, c, d, each of those values holds a list of numbers, for example [x, xx, xxx].
I would like to create another matrix that has ones only in the positions where x == 0 && xx == 0 && xxx == 0. How can I do that without loops? For example, I could do B = [x == 0], but how can I do that when there's a list instead of a single matrix element?
If the list is of fixed length, you can create a 3d array and then use np.all() on its last axis:
In [1]: import numpy as np
In [2]: a = np.zeros((2, 2, 3)) # 2x2 matrix, 3 variants for each element
In [3]: a[0, 0] = [0, 1, 2] # filling one element of the "matrix"
In [4]: a[0, 1] = 1
In [5]: a[1, 1] = 0 # this
In [6]: a[1, 0] = 0 # and this are "all zeros"
In [7]: a
Out[7]:
array([[[ 0.,  1.,  2.],
        [ 1.,  1.,  1.]],

       [[ 0.,  0.,  0.],
        [ 0.,  0.,  0.]]])
Now let's construct the matrix b:
In [8]: np.all(a == 0, axis=-1).astype(int)
Out[8]:
array([[0, 0],
[1, 1]])
If you want another condition, you can modify the expression in the following way:
In [9]: np.all(a - [0, 1, 2] == 0, axis=-1).astype(int)
Out[9]:
array([[1, 0],
[0, 0]])
Is there an equivalent to the MATLAB size() command in Numpy?
In MATLAB,
>>> a = zeros(2,5)
0 0 0 0 0
0 0 0 0 0
>>> size(a)
2 5
In Python,
>>> a = zeros((2,5))
>>> a
array([[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]])
>>> ?????
This is called the "shape" in NumPy, and can be requested via the .shape attribute:
>>> a = zeros((2, 5))
>>> a.shape
(2, 5)
If you prefer a function, you could also use numpy.shape(a).
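numpy.shape accepts anything array-like, so it also works on plain nested lists:

>>> import numpy as np
>>> np.shape([[1, 2, 3], [4, 5, 6]])
(2, 3)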
Yes, numpy has a size function, and shape and size are not quite the same.
Input
import numpy as np
data = [[1, 2, 3, 4], [5, 6, 7, 8]]
arrData = np.array(data)
print(data)
print(arrData.size)
print(arrData.shape)
Output
[[1, 2, 3, 4], [5, 6, 7, 8]]
8 # size
(2, 4) # shape
w, k = a.shape will give you access to the individual sizes, if you want to use them for loops as in MATLAB.
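A quick sketch of that unpacking in a loop (the variable names are just for illustration):

import numpy as np

a = np.zeros((2, 5))
rows, cols = a.shape            # like [w, k] = size(a) in MATLAB
for i in range(rows):
    for j in range(cols):
        a[i, j] = i * cols + j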
Assume three arrays in numpy:
a = np.zeros(5)
b = np.array([3,3,3,0,0])
c = np.array([1,5,10,50,100])
b can now be used as an index for a and c. For example:
In [142]: c[b]
Out[142]: array([50, 50, 50, 1, 1])
Is there any way to add up the values connected to the duplicate indexes with this kind of slicing? With
a[b] = c
Only the last values are stored:
array([ 100., 0., 0., 10., 0.])
I would like something like this:
a[b] += c
which would give
array([ 150., 0., 0., 16., 0.])
I'm mapping very large vectors onto 2D matrices and would really like to avoid loops...
The += operator for NumPy arrays simply doesn't work the way you are hoping, and I'm not aware of a way of making it work that way. As a work-around I suggest using numpy.bincount():
>>> numpy.bincount(b, c)
array([ 150., 0., 0., 16.])
Just append zeros as needed.
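A sketch of both that padding and an in-place alternative: bincount's minlength argument pads the result for you, and numpy also has np.add.at, which performs exactly the unbuffered a[b] += c accumulation the question asks for:

import numpy as np

a = np.zeros(5)
b = np.array([3, 3, 3, 0, 0])
c = np.array([1, 5, 10, 50, 100])

print(np.bincount(b, weights=c, minlength=len(a)))  # [150.   0.   0.  16.   0.]

np.add.at(a, b, c)    # in place, accumulates duplicate indices
print(a)              # [150.   0.   0.  16.   0.]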
You could do something like:
def sum_unique(label, weight):
    order = np.lexsort(label.T)
    label = label[order]
    weight = weight[order]
    unique = np.ones(len(label), 'bool')
    unique[:-1] = (label[1:] != label[:-1]).any(-1)
    totals = weight.cumsum()
    totals = totals[unique]
    totals[1:] = totals[1:] - totals[:-1]
    return label[unique], totals
And use it like this:
In [110]: coord = np.random.randint(0, 3, (10, 2))
In [111]: coord
Out[111]:
array([[0, 2],
[0, 2],
[2, 1],
[1, 2],
[1, 0],
[0, 2],
[0, 0],
[2, 1],
[1, 2],
[1, 2]])
In [112]: weights = np.ones(10)
In [113]: uniq_coord, sums = sum_unique(coord, weights)
In [114]: uniq_coord
Out[114]:
array([[0, 0],
[1, 0],
[2, 1],
[0, 2],
[1, 2]])
In [115]: sums
Out[115]: array([ 1., 1., 2., 3., 3.])
In [116]: a = np.zeros((3,3))
In [117]: x, y = uniq_coord.T
In [118]: a[x, y] = sums
In [119]: a
Out[119]:
array([[ 1., 0., 3.],
[ 1., 0., 3.],
[ 0., 2., 0.]])
I just thought of this; it might be easier:
In [120]: flat_coord = np.ravel_multi_index(coord.T, (3,3))
In [121]: sums = np.bincount(flat_coord, weights)
In [122]: a = np.zeros((3,3))
In [123]: a.flat[:len(sums)] = sums
In [124]: a
Out[124]:
array([[ 1., 0., 3.],
[ 1., 0., 3.],
[ 0., 2., 0.]])
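Alternatively, np.add.at accumulates duplicates directly on the 2D array, with no flattening step. A sketch using the same coord and weights as above:

import numpy as np

coord = np.array([[0, 2], [0, 2], [2, 1], [1, 2], [1, 0],
                  [0, 2], [0, 0], [2, 1], [1, 2], [1, 2]])
weights = np.ones(10)

a = np.zeros((3, 3))
np.add.at(a, (coord[:, 0], coord[:, 1]), weights)   # tuple of index arrays
print(a)    # same result as the bincount version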