Slicing each column of a numpy array at a different position - python

I have the following numpy array:
import numpy as np
np.random.seed(20)
np.random.rand(20).reshape(5, 4)
array([[ 0.5881308 ,  0.89771373,  0.89153073,  0.81583748],
       [ 0.03588959,  0.69175758,  0.37868094,  0.51851095],
       [ 0.65795147,  0.19385022,  0.2723164 ,  0.71860593],
       [ 0.78300361,  0.85032764,  0.77524489,  0.03666431],
       [ 0.11669374,  0.7512807 ,  0.23921822,  0.25480601]])
For each column I would like to slice it from a given start position:
position_for_slicing = [0, 3, 4, 4]
so that I get the following array:
array([[ 0.5881308 , 0.85032764, 0.23921822, 0.81583748],
       [ 0.03588959, 0.7512807 , 0, 0],
       [ 0.65795147, 0, 0, 0],
       [ 0.78300361, 0, 0, 0],
       [ 0.11669374, 0, 0, 0]])
Is there a fast way to do this? I know I could use a for loop over the columns, but I was wondering if there is a more elegant way.

If "elegant" means "no loop" the following would qualify, but probably not under many other definitions (arr is your input array):
m, n = arr.shape
arrf = np.asanyarray(arr, order='F')
# stack a zero block below the array so out-of-range reads return zeros
padded = np.r_[arrf, np.zeros_like(arrf)]
assert padded.flags['F_CONTIGUOUS']
# view such that expnd[i, k, j] == padded[i + k, j], i.e. column j shifted up by k rows
expnd = np.lib.stride_tricks.as_strided(padded, (m, m+1, n), padded.strides[:1] + padded.strides)
expnd[:, [0, 3, 4, 4], range(4)]
# array([[ 0.5881308 ,  0.85032764,  0.23921822,  0.25480601],
#        [ 0.03588959,  0.7512807 ,  0.        ,  0.        ],
#        [ 0.65795147,  0.        ,  0.        ,  0.        ],
#        [ 0.78300361,  0.        ,  0.        ,  0.        ],
#        [ 0.11669374,  0.        ,  0.        ,  0.        ]])
Please note that order='C' and then 'C_CONTIGUOUS' in the assertion also works. My hunch is that 'F' could be a bit faster because the indexing then operates on contiguous slices.
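For comparison, here is a more readable vectorized sketch of the same shift-and-zero-pad operation using plain fancy indexing; it is not from the original answer, and the helper name slice_columns is made up for illustration:

import numpy as np

def slice_columns(arr, pos):
    # out[i, j] = arr[i + pos[j], j] where that source row exists, else 0
    m, n = arr.shape
    src = np.arange(m)[:, None] + np.asarray(pos)  # source row for each output cell
    valid = src < m                                # cells past the column end become 0
    cols = np.broadcast_to(np.arange(n), (m, n))
    out = np.zeros_like(arr)
    out[valid] = arr[src[valid], cols[valid]]
    return out

# slice_columns(arr, [0, 3, 4, 4]) should match the strided result above.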

Related

numpy array with diagonal equal to zero and [x,y] = -[y,x]

I want to create an N x N array in numpy such that the diagonal is zero and [x,y] = -[y,x].
For example:
np.array([[  0,  12,  2],
          [-12,   0,  3],
          [ -2,  -3,  0]])
The values inside the array can be any float.
One way would be with scipy.spatial.distance.squareform -
import numpy as np
from scipy.spatial.distance import squareform

def diag_inverted(n):
    l = n*(n-1)//2
    out = squareform(np.random.randn(l))
    out[np.tri(len(out), k=-1, dtype=bool)] *= -1
    return out
Another with array-assignment and masking -
def diag_inverted_v2(n):
    l = n*(n-1)//2
    m = np.tri(n, k=-1, dtype=bool)
    out = np.zeros((n, n), dtype=float)
    out[m] = np.random.randn(l)
    out[m.T] = -out.T[m.T]
    return out
Sample runs -
In [148]: diag_inverted(2)
Out[148]:
array([[ 0.        , -0.97873798],
       [ 0.97873798,  0.        ]])

In [149]: diag_inverted(3)
Out[149]:
array([[ 0.        , -2.2408932 , -1.86755799],
       [ 2.2408932 ,  0.        ,  0.97727788],
       [ 1.86755799, -0.97727788,  0.        ]])

In [150]: diag_inverted(4)
Out[150]:
array([[ 0.        , -0.95008842,  0.15135721, -0.4105985 ],
       [ 0.95008842,  0.        ,  0.10321885, -0.14404357],
       [-0.15135721, -0.10321885,  0.        , -1.45427351],
       [ 0.4105985 ,  0.14404357,  1.45427351,  0.        ]])
Here you go:
size = 3
a = np.random.normal(0, 1, (size, size))
ret = (a - a.transpose())/2
Output (random):
array([[ 0.        ,  0.11872306,  0.46792054],
       [-0.11872306,  0.        ,  0.12530741],
       [-0.46792054, -0.12530741,  0.        ]])
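Whichever construction you use, a quick sanity check (a minimal sketch, assuming only NumPy) confirms the required properties:

import numpy as np

out = diag_inverted(4)  # or diag_inverted_v2(4), or the (a - a.transpose())/2 approach
assert np.allclose(np.diag(out), 0)  # zero diagonal
assert np.allclose(out, -out.T)      # out[x, y] == -out[y, x]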

How to extract an array of same dimension as the original array meeting a condition? [duplicate]

This question already has answers here:
Numpy array loss of dimension when masking
(5 answers)
Closed 3 years ago.
The question sounds very basic. But when I try to use where or boolean conditions on numpy arrays, it always returns a flattened array.
I have the NumPy array
P = array([[ 0.49530662,  0.07901   , -0.19012371],
           [ 0.1421513 ,  0.48607405, -0.20315014],
           [ 0.76467375,  0.16479826, -0.56598029],
           [ 0.53530718, -0.21166188, -0.08773241]])
I want to extract the array of only negative values, but when I try
P[P<0]
array([-0.19012371, -0.20315014, -0.56598029, -0.21166188, -0.08773241])
P[np.where(P<0)]
array([-0.19012371, -0.20315014, -0.56598029, -0.21166188, -0.08773241])
I get a flattened array. How can I extract an array of the form
array([[ 0,  0, -0.19012371],
       [ 0,  0, -0.20315014],
       [ 0,  0, -0.56598029],
       [ 0, -0.21166188, -0.08773241]])
I do not wish to create a temp array and then use something like Temp[Temp>=0] = 0
Since your need is:
I want to "extract" the array of only negative values
You can use numpy.where() with your condition (checking for negative values), which preserves the dimensions of the array, as in the example below:
In [61]: np.where(P<0, P, 0)
Out[61]:
array([[ 0.        ,  0.        , -0.19012371],
       [ 0.        ,  0.        , -0.20315014],
       [ 0.        ,  0.        , -0.56598029],
       [ 0.        , -0.21166188, -0.08773241]])
where P is your input array.
Another idea could be to use numpy.zeros_like() to initialize an array of the same shape and numpy.where() to gather the indices at which our condition is satisfied.
# initialize our result array with zeros
In [106]: non_positives = np.zeros_like(P)

# gather the indices where our condition is obeyed
In [107]: idxs = np.where(P < 0)

# copy the negative values to the correct indices
In [108]: non_positives[idxs] = P[idxs]

In [109]: non_positives
Out[109]:
array([[ 0.        ,  0.        , -0.19012371],
       [ 0.        ,  0.        , -0.20315014],
       [ 0.        ,  0.        , -0.56598029],
       [ 0.        , -0.21166188, -0.08773241]])
Yet another idea would be to simply use the barebones numpy.clip() API, which returns a new array if we omit the out= kwarg.
In [22]: np.clip(P, -np.inf, 0)  # P.clip(-np.inf, 0)
Out[22]:
array([[ 0.        ,  0.        , -0.19012371],
       [ 0.        ,  0.        , -0.20315014],
       [ 0.        ,  0.        , -0.56598029],
       [ 0.        , -0.21166188, -0.08773241]])
This should work: essentially, get the indexes of all elements which are greater than or equal to 0, and set them to 0; this preserves the dimensions. I got the idea from here: Replace all elements of Python NumPy Array that are greater than some value
Also note that this modifies the original array; I haven't used a temp array here.
import numpy as np
P = np.array([[ 0.49530662,  0.07901   , -0.19012371],
              [ 0.1421513 ,  0.48607405, -0.20315014],
              [ 0.76467375,  0.16479826, -0.56598029],
              [ 0.53530718, -0.21166188, -0.08773241]])
P[P >= 0] = 0
print(P)
The output will be
[[ 0.          0.         -0.19012371]
 [ 0.          0.         -0.20315014]
 [ 0.          0.         -0.56598029]
 [ 0.         -0.21166188 -0.08773241]]
As noted below, this modifies the array in place; to preserve the original array, use np.where(P<0, P, 0) instead (thanks @kmario123):
import numpy as np
P = np.array([[ 0.49530662,  0.07901   , -0.19012371],
              [ 0.1421513 ,  0.48607405, -0.20315014],
              [ 0.76467375,  0.16479826, -0.56598029],
              [ 0.53530718, -0.21166188, -0.08773241]])
print( np.where(P<0, P, 0))
print(P)
The output will be
[[ 0.          0.         -0.19012371]
 [ 0.          0.         -0.20315014]
 [ 0.          0.         -0.56598029]
 [ 0.         -0.21166188 -0.08773241]]
[[ 0.49530662  0.07901    -0.19012371]
 [ 0.1421513   0.48607405 -0.20315014]
 [ 0.76467375  0.16479826 -0.56598029]
 [ 0.53530718 -0.21166188 -0.08773241]]
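As a side note, one more compact, non-mutating alternative (a sketch, not among the original answers) is to multiply by the boolean mask itself, since False entries act as zeros:

# keeps the negatives, zeroes everything else, and leaves P untouched
negatives_only = P * (P < 0)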

Numpy get values from np.argmin indices [duplicate]

This question already has answers here:
How to take elements along a given axis, given by their indices?
(4 answers)
indexing a numpy array with indices from another array
(1 answer)
Closed 4 years ago.
Let's say I have d1, d2 and d3 as follows. t is a variable where I've combined my arrays, and m contains the indices of the smallest values.
>>> d1
array([[ 0.9850916 ,  0.95004463,  1.35728604,  1.18554035],
       [ 0.47624542,  0.45561795,  0.6231743 ,  0.94746001],
       [ 0.74008166,  0.        ,  1.59774065,  1.00423774],
       [ 0.86173439,  0.70940862,  1.0601817 ,  0.96112015],
       [ 1.03413477,  0.64874991,  1.27488263,  0.80250053]])
>>> d2
array([[ 0.27301946,  0.38387185,  0.93215524,  0.98851404],
       [ 0.17996978,  0.        ,  0.41283798,  0.15204035],
       [ 0.10952115,  0.45561795,  0.5334015 ,  0.75242805],
       [ 0.4600214 ,  0.74100962,  0.16743427,  0.36250385],
       [ 0.60984208,  0.35161234,  0.44580535,  0.6713633 ]])
>>> d3
array([[ 0.        ,  0.19658541,  1.14605925,  1.18431945],
       [ 0.10697428,  0.27301946,  0.45536417,  0.11922118],
       [ 0.42153386,  0.9850916 ,  0.28225364,  0.82765657],
       [ 1.04940684,  1.63082272,  0.49987388,  0.38596938],
       [ 0.21015723,  1.07007177,  0.22599987,  0.89288339]])
>>> t = np.array([d1, d2, d3])
>>> t
array([[[ 0.9850916 ,  0.95004463,  1.35728604,  1.18554035],
        [ 0.47624542,  0.45561795,  0.6231743 ,  0.94746001],
        [ 0.74008166,  0.        ,  1.59774065,  1.00423774],
        [ 0.86173439,  0.70940862,  1.0601817 ,  0.96112015],
        [ 1.03413477,  0.64874991,  1.27488263,  0.80250053]],

       [[ 0.27301946,  0.38387185,  0.93215524,  0.98851404],
        [ 0.17996978,  0.        ,  0.41283798,  0.15204035],
        [ 0.10952115,  0.45561795,  0.5334015 ,  0.75242805],
        [ 0.4600214 ,  0.74100962,  0.16743427,  0.36250385],
        [ 0.60984208,  0.35161234,  0.44580535,  0.6713633 ]],

       [[ 0.        ,  0.19658541,  1.14605925,  1.18431945],
        [ 0.10697428,  0.27301946,  0.45536417,  0.11922118],
        [ 0.42153386,  0.9850916 ,  0.28225364,  0.82765657],
        [ 1.04940684,  1.63082272,  0.49987388,  0.38596938],
        [ 0.21015723,  1.07007177,  0.22599987,  0.89288339]]])
>>> m = np.argmin(t, axis=0)
>>> m
array([[2, 2, 1, 1],
       [2, 1, 1, 2],
       [1, 0, 2, 1],
       [1, 0, 1, 1],
       [2, 1, 2, 1]])
From m and t, I want to look up the actual values as follows. How do I do this, preferably in an efficient way?
array([[ 0.        ,  0.19658541,  0.93215524,  0.98851404],
       [ 0.10697428,  0.        ,  0.41283798,  0.11922118],
       [ 0.10952115,  0.        ,  0.28225364,  0.75242805],
       [ 0.4600214 ,  0.70940862,  0.16743427,  0.36250385],
       [ 0.21015723,  0.35161234,  0.22599987,  0.6713633 ]])
If only the minimum is what you need, you can use np.min(t, axis=0).
If you want to do the lookup with your indices, you can use choose:
m.choose(t) # This will return the same thing.
It can also be written as
np.choose(m, t)
Which returns:
array([[0.        , 0.19658541, 0.93215524, 0.98851404],
       [0.10697428, 0.        , 0.41283798, 0.11922118],
       [0.10952115, 0.        , 0.28225364, 0.75242805],
       [0.4600214 , 0.70940862, 0.16743427, 0.36250385],
       [0.21015723, 0.35161234, 0.22599987, 0.6713633 ]])
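Note that np.choose is limited to at most 32 choice arrays. As a hedged alternative sketch (not part of the original answer), plain integer indexing or np.take_along_axis scales past that limit:

import numpy as np

# assuming t of shape (3, 5, 4) and m = np.argmin(t, axis=0) as above
rows, cols = np.indices(m.shape)
vals = t[m, rows, cols]  # vals[i, j] == t[m[i, j], i, j]

# equivalently, with np.take_along_axis (NumPy >= 1.15):
vals2 = np.take_along_axis(t, m[None, ...], axis=0)[0]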

scipy sparse matrix division

I have been trying to divide a scipy sparse matrix by the vector of its row sums. Here is my code:
sparse_mat = bsr_matrix((l_data, (l_row, l_col)), dtype=float)
sparse_mat = sparse_mat / (sparse_mat.sum(axis = 1)[:,None])
However, it throws an error no matter how I try it:
sparse_mat = sparse_mat / (sparse_mat.sum(axis = 1)[:,None])
  File "/usr/lib/python2.7/dist-packages/scipy/sparse/base.py", line 381, in __div__
    return self.__truediv__(other)
  File "/usr/lib/python2.7/dist-packages/scipy/sparse/compressed.py", line 427, in __truediv__
    raise NotImplementedError
NotImplementedError
Anyone with an idea of where I am going wrong?
You can circumvent the problem by creating a sparse diagonal matrix from the reciprocals of your row sums and then multiplying it with your matrix. In the product the diagonal matrix goes left and your matrix goes right.
Example:
>>> a
array([[0, 9, 0, 0, 1, 0],
       [2, 0, 5, 0, 0, 9],
       [0, 2, 0, 0, 0, 0],
       [2, 0, 0, 0, 0, 0],
       [0, 9, 5, 3, 0, 7],
       [1, 0, 0, 8, 9, 0]])
>>> b = sparse.bsr_matrix(a)
>>>
>>> c = sparse.diags(1/b.sum(axis=1).A.ravel())
>>> # on older scipy versions the offsets parameter (default 0)
... # is a required argument, thus
... # c = sparse.diags(1/b.sum(axis=1).A.ravel(), 0)
...
>>> a/a.sum(axis=1, keepdims=True)
array([[ 0.        ,  0.9       ,  0.        ,  0.        ,  0.1       ,  0.        ],
       [ 0.125     ,  0.        ,  0.3125    ,  0.        ,  0.        ,  0.5625    ],
       [ 0.        ,  1.        ,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 1.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.375     ,  0.20833333,  0.125     ,  0.        ,  0.29166667],
       [ 0.05555556,  0.        ,  0.        ,  0.44444444,  0.5       ,  0.        ]])
>>> (c @ b).todense()  # on Python < 3.5 replace c @ b with c.dot(b)
matrix([[ 0.        ,  0.9       ,  0.        ,  0.        ,  0.1       ,  0.        ],
        [ 0.125     ,  0.        ,  0.3125    ,  0.        ,  0.        ,  0.5625    ],
        [ 0.        ,  1.        ,  0.        ,  0.        ,  0.        ,  0.        ],
        [ 1.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
        [ 0.        ,  0.375     ,  0.20833333,  0.125     ,  0.        ,  0.29166667],
        [ 0.05555556,  0.        ,  0.        ,  0.44444444,  0.5       ,  0.        ]])
Something funny is going on. I have no problem performing the element division. I wonder if it's a Py2 issue. I'm using Py3.
In [1022]: A = sparse.bsr_matrix([[2,4],[1,2]])
In [1023]: A
Out[1023]:
<2x2 sparse matrix of type '<class 'numpy.int32'>'
        with 4 stored elements (blocksize = 2x2) in Block Sparse Row format>
In [1024]: A.A
Out[1024]:
array([[2, 4],
       [1, 2]], dtype=int32)
In [1025]: A.sum(axis=1)
Out[1025]:
matrix([[6],
        [3]], dtype=int32)
In [1026]: A/A.sum(axis=1)
Out[1026]:
matrix([[ 0.33333333,  0.66666667],
        [ 0.33333333,  0.66666667]])
or to try the other example:
In [1027]: b = sparse.bsr_matrix([[0, 9, 0, 0, 1, 0],
      ...:                        [2, 0, 5, 0, 0, 9],
      ...:                        [0, 2, 0, 0, 0, 0],
      ...:                        [2, 0, 0, 0, 0, 0],
      ...:                        [0, 9, 5, 3, 0, 7],
      ...:                        [1, 0, 0, 8, 9, 0]])
In [1028]: b
Out[1028]:
<6x6 sparse matrix of type '<class 'numpy.int32'>'
        with 14 stored elements (blocksize = 1x1) in Block Sparse Row format>
In [1029]: b.sum(axis=1)
Out[1029]:
matrix([[10],
        [16],
        [ 2],
        [ 2],
        [24],
        [18]], dtype=int32)
In [1030]: b/b.sum(axis=1)
Out[1030]:
matrix([[ 0.        ,  0.9       ,  0.        ,  0.        ,  0.1       ,  0.        ],
        [ 0.125     ,  0.        ,  0.3125    ,  0.        ,  0.        ,  0.5625    ],
        ....
        [ 0.05555556,  0.        ,  0.        ,  0.44444444,  0.5       ,  0.        ]])
The result of this sparse/dense division is also dense, whereas c*b (c being the sparse diagonal) is sparse.
In [1039]: c*b
Out[1039]:
<6x6 sparse matrix of type '<class 'numpy.float64'>'
with 14 stored elements in Compressed Sparse Row format>
The sparse sum is a dense matrix. It is 2d, so there's no need to expand its dimensions. In fact, if I try that I get an error:
In [1031]: A/(A.sum(axis=1)[:,None])
....
ValueError: shape too large to be a matrix.
Per this message, to keep the matrix sparse, you access the data values and use the (nonzero) indices:
sums = np.asarray(A.sum(axis=1)).squeeze() # this is dense
A.data /= sums[A.nonzero()[0]]
If dividing by the nonzero row mean instead of the sum, one can use:
nnz = A.getnnz(axis=1) # this is also dense
means = sums / nnz
A.data /= means[A.nonzero()[0]]
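As an additional sketch (my own assumption, not from the original answers), newer scipy versions also accept a broadcasted elementwise .multiply against a dense column of reciprocal row sums, which keeps the result sparse:

from scipy import sparse
import numpy as np

A = sparse.csr_matrix([[2., 4.], [1., 2.]])
row_sums = np.asarray(A.sum(axis=1))           # dense (n, 1) column; assumes no empty rows
A_normed = A.multiply(1.0 / row_sums).tocsr()  # broadcasts across rows, result stays sparse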

Error in scipy sparse diags matrix construction

When using scipy.sparse.spdiags or scipy.sparse.diags I have noticed what I consider to be a bug in the routines, e.g.
scipy.sparse.spdiags([1.1,1.2,1.3],1,4,4).toarray()
returns
array([[ 0. ,  1.2,  0. ,  0. ],
       [ 0. ,  0. ,  1.3,  0. ],
       [ 0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ]])
That is, for positive diagonals it drops the first k data values. One might argue that there is some grand programming reason for this and that I just need to pad with zeros. OK, annoying as that may be, one can use scipy.sparse.diags, which gives the correct result. However this routine has a bug that can't be worked around:
scipy.sparse.diags([1.1,1.2],0,(4,2)).toarray()
gives
array([[ 1.1,  0. ],
       [ 0. ,  1.2],
       [ 0. ,  0. ],
       [ 0. ,  0. ]])
nice, and
scipy.sparse.diags([1.1,1.2],-2,(4,2)).toarray()
gives
array([[ 0. ,  0. ],
       [ 0. ,  0. ],
       [ 1.1,  0. ],
       [ 0. ,  1.2]])
but
scipy.sparse.diags([1.1,1.2],-1,(4,2)).toarray()
gives an error saying ValueError: Diagonal length (index 0: 2 at offset -1) does not agree with matrix size (4, 2). Obviously the answer is
array([[ 0. ,  0. ],
       [ 1.1,  0. ],
       [ 0. ,  1.2],
       [ 0. ,  0. ]])
and for extra random behaviour we have
scipy.sparse.diags([1.1],-1,(4,2)).toarray()
giving
array([[ 0. ,  0. ],
       [ 1.1,  0. ],
       [ 0. ,  1.1],
       [ 0. ,  0. ]])
Anyone know if there is a function for constructing diagonal sparse matrices that actually works?
Executive summary: spdiags works correctly, even if the matrix input isn't the most intuitive. diags has a bug that affects some offsets in rectangular matrices. There is a bug fix on scipy github.
The example for spdiags is:
>>> data = array([[1,2,3,4],[1,2,3,4],[1,2,3,4]])
>>> diags = array([0,-1,2])
>>> spdiags(data, diags, 4, 4).todense()
matrix([[1, 0, 3, 0],
        [1, 2, 0, 4],
        [0, 2, 3, 0],
        [0, 0, 3, 4]])
Note that the 3rd column of data always appears in the 3rd column of the sparse matrix. The other columns also line up, but they are omitted where they 'fall off the edge'.
The input to this function is a matrix, while the input to diags is a ragged list. The diagonals of the sparse matrix all have different numbers of values, so the specification has to accommodate this in one way or another. spdiags does this by ignoring some values, diags by taking a list input.
The sparse.diags([1.1,1.2],-1,(4,2)) error is puzzling.
The spdiags equivalent does work:
In [421]: sparse.spdiags([[1.1,1.2]],-1,4,2).A
Out[421]:
array([[ 0. ,  0. ],
       [ 1.1,  0. ],
       [ 0. ,  1.2],
       [ 0. ,  0. ]])
The error is raised in this block of code:
for j, diagonal in enumerate(diagonals):
    offset = offsets[j]
    k = max(0, offset)
    length = min(m + offset, n - offset)
    if length <= 0:
        raise ValueError("Offset %d (index %d) out of bounds" % (offset, j))
    try:
        data_arr[j, k:k+length] = diagonal
    except ValueError:
        if len(diagonal) != length and len(diagonal) != 1:
            raise ValueError(
                "Diagonal length (index %d: %d at offset %d) does not "
                "agree with matrix size (%d, %d)." % (
                    j, len(diagonal), offset, m, n))
        raise
The actual matrix constructor in diags is:
dia_matrix((data_arr, offsets), shape=(m, n))
This is the same constructor that spdiags uses, but without any manipulation.
In [434]: sparse.dia_matrix(([[1.1,1.2]],-1), shape=(4,2)).A
Out[434]:
array([[ 0. ,  0. ],
       [ 1.1,  0. ],
       [ 0. ,  1.2],
       [ 0. ,  0. ]])
In dia format, the inputs are stored exactly as given by spdiags (complete with the full data matrix, extra values and all):
In [436]: M.data
Out[436]: array([[ 1.1, 1.2]])
In [437]: M.offsets
Out[437]: array([-1], dtype=int32)
As @user2357112 points out, length = min(m + offset, n - offset) is wrong: for the (4, 2) case with offset -1 it produces min(4 - 1, 2 + 1) = 3, while the true diagonal length is 2. Changing it to length = min(m + k, n - k) makes all cases for this (4,2) matrix work, but it fails for the transpose: diags([1.1,1.2], 1, (2, 4)).
The correction, as of Oct 5, for this issue is:
https://github.com/pv/scipy-work/commit/529cbde47121c8ed87f74fa6445c05d71353eb6c
length = min(m + offset, n - offset, min(m,n))
With this fix, diags([1.1,1.2], 1, (2, 4)) works.
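Until a fixed scipy release is available, a hedged workaround (my own sketch, not from the original answer) is to place the diagonal entries directly with coo_matrix, which sidesteps the broken length check entirely; the helper name rect_diag is made up:

import numpy as np
from scipy import sparse

def rect_diag(values, offset, shape):
    # put values[i] at row i + max(0, -offset), column i + max(0, offset)
    values = np.asarray(values, dtype=float)
    rows = np.arange(len(values)) + max(0, -offset)
    cols = np.arange(len(values)) + max(0, offset)
    return sparse.coo_matrix((values, (rows, cols)), shape=shape)

rect_diag([1.1, 1.2], -1, (4, 2)).toarray()
# array([[ 0. ,  0. ],
#        [ 1.1,  0. ],
#        [ 0. ,  1.2],
#        [ 0. ,  0. ]])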
