Understanding r_ from numpy - python

a = np.zeros([4, 4])
b = np.ones([4, 4])
#vertical stacking(ROW WISE)
print(np.r_[a,b])
print(np.r_[[1,2,3],0,0,[4,5,6]])
# output is
[[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]]
[[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]]
[1 2 3 0 0 4 5 6]
But here np.r_ doesn't seem to perform vertical stacking; it does horizontal stacking. How does np.r_ work? I would be grateful for any help.

In [324]: a = np.zeros([4, 4],int)
...: b = np.ones([4, 4],int)
In [325]: np.r_[a,b]
Out[325]:
array([[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]])
This is a row stack; same as vstack. And since the arrays are already 2d, concatenate is enough:
In [326]: np.concatenate((a,b), axis=0)
Out[326]:
array([[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]])
With the mix of 1d and scalars, r_ is the same as hstack:
In [327]: np.r_[[1,2,3],0,0,[4,5,6]]
Out[327]: array([1, 2, 3, 0, 0, 4, 5, 6])
In [328]: np.hstack([[1,2,3],0,0,[4,5,6]])
Out[328]: array([1, 2, 3, 0, 0, 4, 5, 6])
In [329]: np.concatenate([[1,2,3],0,0,[4,5,6]],axis=0)
...
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 1 dimension(s) and the array at index 1 has 0 dimension(s)
concatenate fails because of the scalars. The other methods first convert those to 1d arrays.
In both cases, r_ does what its docs describe:
Translates slice objects to concatenation along the first axis.
r_ is actually an instance of a special class, with its own __getitem__ method, that allows us to use [] instead of (). It also means it can take slices as inputs (which are actually rendered as np.arange or np.linspace).
r_ takes an optional initial string argument. A string of up to 3 comma-separated integers controls the concatenation axis and how inputs are expanded to matching dimensions. See the docs for details, and the np.lib.index_tricks.py file for more.
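To make the slice and string-argument behaviour concrete, here is a small sketch. The variable names are mine, and the comments reflect my reading of the r_ documentation rather than anything authoritative.

```python
import numpy as np

# A slice with an integer step is expanded like np.arange(start, stop, step).
ints = np.r_[0:6:2]

# An imaginary "step" is read as a number of points, endpoint included,
# like np.linspace(start, stop, num).
pts = np.r_[0:1:5j]

a = np.zeros((2, 2), int)
b = np.ones((2, 2), int)

# A leading string with one integer picks the concatenation axis.
side_by_side = np.r_['-1', a, b]

# '0,2' means: join on axis 0 after promoting each input to at least 2d.
rows = np.r_['0,2', [1, 2, 3], [4, 5, 6]]

print(ints)
print(pts)
print(side_by_side.shape)
print(rows)
```

So the default (no string) is just the axis-0 case, which is why two 2d arrays come out row-stacked.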
In order of importance I think the concatenate functions are:
np.concatenate # base
np.vstack # easy join 1d arrays into 2d
np.stack # generalize np.array
np.hstack # saves specifying axis
np.r_
np.c_
r_ and c_ can do neat things when mixing arrays of different shapes, but it all boils down to using concatenate correctly.
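As a rough illustration of that last point, the helpers can be reproduced by reshaping the inputs first and then calling concatenate. This is a sketch of the idea, not a copy of NumPy's internal code, and the `*_manual` names are just for the comparison.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# vstack: make each input at least 2d, then join on axis 0.
v = np.vstack([x, y])
v_manual = np.concatenate([np.atleast_2d(x), np.atleast_2d(y)], axis=0)

# stack: give each input a new axis, then join on that axis.
s = np.stack([x, y], axis=1)
s_manual = np.concatenate([x[:, None], y[:, None]], axis=1)

# hstack: make each input (scalars included) at least 1d, then join.
h = np.hstack([x, 0, y])
h_manual = np.concatenate([np.atleast_1d(item) for item in (x, 0, y)])

print(np.array_equal(v, v_manual),
      np.array_equal(s, s_manual),
      np.array_equal(h, h_manual))
```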

Related

How to create a numpy array of zeros of a list's length?

I have a list, say
a = [3, 4, 5, 6, 7]
And I want to create a numpy array of zeros of that list's length.
If I do
b = np.zeros((len(a), 1))
I get
[[0, 0, 0, 0, 0]]
instead of
[0, 0, 0, 0, 0]
What is the best way to get the latter option?
If you don't want to have to care about shapes, use np.zeros_like:
np.zeros_like(a)
# array([0, 0, 0, 0, 0])
There's also the option of querying np.shape:
np.zeros(np.shape(a))
# array([0., 0., 0., 0., 0.])
Both options should work for ND lists as well.
You passed a two-element tuple to zeros, so it produced a 2D array. You can simply pass an integer to zeros instead:
a = [3, 4, 5, 6, 7]
b = np.zeros(len(a))
print(b) #prints [ 0. 0. 0. 0. 0.]
You can try this
np.zeros(len(a), dtype=int)  # np.int was removed in NumPy 1.24; the builtin int works everywhere
It will return
array([0, 0, 0, 0, 0])
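The answers above differ only in the shape passed to zeros, so a quick shape check may help. As far as I can tell, the `[[0, 0, 0, 0, 0]]` output in the question corresponds to a `(1, len(a))` shape, while `(len(a), 1)` gives a column; the names below are only for illustration.

```python
import numpy as np

a = [3, 4, 5, 6, 7]

flat = np.zeros(len(a))          # integer shape -> 1d array
column = np.zeros((len(a), 1))   # 2-tuple -> 2d array, one column
row = np.zeros((1, len(a)))      # 2-tuple -> 2d array, one row

print(flat.shape, column.shape, row.shape)
```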

How to efficiently order a numpy matrix

I have this numpy array
matrix = np.array([[ 0.8, 0.2, 0.1],
[ 1. , 0. , 0. ],
[ 0. , 0. , 1. ]])
and I would like to get, for each row of matrix, the column indices ordered by decreasing value.
For example, this would be
np.array([[0, 1, 2], [0, 1, 2], [2, 0, 1]])
I know I could use np.argsort, but this doesn't seem to be returning the right output. I tried changing the axis to different values, but that doesn't help either.
Probably the easiest way to get your desired output would be:
(-matrix).argsort(axis=1)
# array([[0, 1, 2],
# [0, 1, 2],
# [2, 0, 1]])
I think np.argsort does seem to do the trick, you just need to make sure to flip the matrix horizontally to make it decreasing order:
>>> matrix = np.array(
[[ 0.8, 0.2, 0.1],
[ 1. , 0. , 0. ],
[ 0. , 0. , 1. ]])
>>> np.fliplr(np.argsort(matrix))
array([[0, 1, 2],
[0, 2, 1],
[2, 1, 0]])
This should be the right output unless you have any requirements for sorting ties. Right now the flipping would make the rightmost tie the first index. If you wanted to match your exact output, where the leftmost index is first you could do a bit of juggling:
# Flip the array first and get the indices
>>> flipped = np.argsort(np.fliplr(matrix))
# Subtract the indices from the last column index to map them back to the original columns
# Flip the array to be in descending order
>>> np.fliplr(flipped.shape[1] - 1 - flipped)
array([[0, 1, 2],
[0, 1, 2],
[2, 0, 1]])
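If the order of ties matters, one option is to ask for a stable sort explicitly instead of relying on the default algorithm. This is a sketch using the matrix from the question; `order` and `sorted_rows` are my own names.

```python
import numpy as np

matrix = np.array([[0.8, 0.2, 0.1],
                   [1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])

# Negating turns ascending order into descending order; a stable sort
# keeps equal values in their original left-to-right order.
order = np.argsort(-matrix, axis=1, kind="stable")

# Reorder each row by its own indices to check the result.
sorted_rows = np.take_along_axis(matrix, order, axis=1)

print(order)
print(sorted_rows)
```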

How to invert only negative elements in numpy matrix?

I have a matrix containing positive and negative numbers like this:
>>> source_matrix
array([[-4, -2, 0],
[-5, 0, 4],
[ 0, 6, 5]])
I'd like to have a copy of this matrix with the negative elements inverted:
>>> result
array([[-0.25, -0.5, 0],
[-0.2, 0, 4],
[ 0, 6, 5]])
Firstly, since your desired array is going to contain floats, you need to set the array's dtype to float at creation time. Otherwise, assigning the float results of the inverted sub-array into an integer array would silently truncate them to integers. Secondly, you need to find the negative numbers in your array, then use simple boolean indexing to grab them and np.true_divide() to perform the inversion.
In [25]: arr = np.array([[-4, -2, 0],
...: [-5, 0, 4],
...: [ 0, 6, 5]], dtype=float)
...:
...:
In [26]: mask = arr < 0
In [27]: arr[mask] = np.true_divide(1, arr[mask])
In [28]: arr
Out[28]:
array([[-0.25, -0.5 , 0. ],
[-0.2 , 0. , 4. ],
[ 0. , 6. , 5. ]])
You can also achieve this without masking, by using the where and out params of true_divide.
a = np.array([[-4, -2, 0],
[-5, 0, 4],
[ 0, 6, 5]], dtype=float)
np.true_divide(1, a, out=a, where=a<0)
Giving the result:
array([[-0.25, -0.5 , 0. ],
[-0.2 , 0. , 4. ],
[ 0. , 6. , 5. ]])
The where= parameter is passed a boolean array that broadcasts against your two inputs. Where it evaluates to True, the divide is performed. Where it evaluates to False, the corresponding element of the array passed via out= is left unchanged.
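Since the question asks for a copy, here is a sketch that leaves the integer source untouched: make a float copy first, then invert only the negatives in that copy. The names `source_matrix` and `result` follow the question; np.divide is used here as the true-division ufunc.

```python
import numpy as np

source_matrix = np.array([[-4, -2, 0],
                          [-5,  0, 4],
                          [ 0,  6, 5]])

# astype(float) returns a new float array, so the source stays as it is.
result = source_matrix.astype(float)

# Divide only where the mask is True; other elements of `result`
# keep the values copied from the source.
np.divide(1.0, result, out=result, where=result < 0)

print(result)
print(source_matrix)
```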

scipy.sparse.coo_matrix how to fast find all zeros column, fill with 1 and normalize

For a matrix, I want to find columns with all zeros and fill with 1s, and then normalize the matrix by column. I know how to do that with np.arrays
[[0 0 0 0 0]
[0 0 1 0 0]
[1 0 0 1 0]
[0 0 0 0 1]
[1 0 0 0 0]]
|
V
[[0 1 0 0 0]
[0 1 1 0 0]
[1 1 0 1 0]
[0 1 0 0 1]
[1 1 0 0 0]]
|
V
[[0 0.2 0 0 0]
[0 0.2 1 0 0]
[0.5 0.2 0 1 0]
[0 0.2 0 0 1]
[0.5 0.2 0 0 0]]
But how can I do the same thing when the matrix is in scipy.sparse.coo.coo_matrix form, without converting it back to np.arrays?
This will be a lot easier with the lil format, and working with rows rather than columns:
In [1]: from scipy import sparse
In [2]: A=np.array([[0,0,0,0,0],[0,0,1,0,0],[1,0,0,1,0],[0,0,0,0,1],[1,0,0,0,0]])
In [3]: A
Out[3]:
array([[0, 0, 0, 0, 0],
[0, 0, 1, 0, 0],
[1, 0, 0, 1, 0],
[0, 0, 0, 0, 1],
[1, 0, 0, 0, 0]])
In [4]: At=A.T # switch to work with rows
In [5]: M=sparse.lil_matrix(At)
Now it is obvious which row is all zeros
In [6]: M.data
Out[6]: array([[1, 1], [], [1], [1], [1]], dtype=object)
In [7]: M.rows
Out[7]: array([[2, 4], [], [1], [2], [3]], dtype=object)
And lil format allows us to fill that row:
In [8]: M.data[1]=[1,1,1,1,1]
In [9]: M.rows[1]=[0,1,2,3,4]
In [10]: M.A
Out[10]:
array([[0, 0, 1, 0, 1],
[1, 1, 1, 1, 1],
[0, 1, 0, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 1, 0]], dtype=int32)
I could have also used M[1,:]=np.ones(5,int)
The coo format is great for creating the array from the data/row/col arrays, but doesn't implement indexing or math. It has to be transformed to csr for that. And csc for column oriented stuff.
The row that I filled isn't so obvious in the csr format:
In [14]: Mc=M.tocsr()
In [15]: Mc.data
Out[15]: array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)
In [16]: Mc.indices
Out[16]: array([2, 4, 0, 1, 2, 3, 4, 1, 2, 3], dtype=int32)
In [17]: Mc.indptr
Out[17]: array([ 0, 2, 7, 8, 9, 10], dtype=int32)
On the other hand normalizing is probably easier in this format.
In [18]: Mc.sum(axis=1)
Out[18]:
matrix([[2],
[5],
[1],
[1],
[1]], dtype=int32)
In [19]: Mc/Mc.sum(axis=1)
Out[19]:
matrix([[ 0. , 0. , 0.5, 0. , 0.5],
[ 0.2, 0.2, 0.2, 0.2, 0.2],
[ 0. , 1. , 0. , 0. , 0. ],
[ 0. , 0. , 1. , 0. , 0. ],
[ 0. , 0. , 0. , 1. , 0. ]])
Notice that it's converted the sparse matrix to a dense one. The sum is dense, and math involving sparse and dense usually produces dense.
I have to use a more roundabout calculation to preserve the sparse status:
In [27]: Mc.multiply(sparse.csr_matrix(1/Mc.sum(axis=1)))
Out[27]:
<5x5 sparse matrix of type '<class 'numpy.float64'>'
with 10 stored elements in Compressed Sparse Row format>
Here's a way of doing this with the csc format (on A)
In [40]: Ms=sparse.csc_matrix(A)
In [41]: Ms.sum(axis=0)
Out[41]: matrix([[2, 0, 1, 1, 1]], dtype=int32)
Use sum to find the all-zeros column. Obviously this could be wrong if the columns have negative values and happen to sum to 0. If that's a concern I can see making a copy of the matrix with all data values replaced by 1.
In [43]: Ms[:,1]=np.ones(5,int)[:,None]
/usr/lib/python3/dist-packages/scipy/sparse/compressed.py:730: SparseEfficiencyWarning: Changing the sparsity structure of a csc_matrix is expensive. lil_matrix is more efficient.
SparseEfficiencyWarning)
In [44]: Ms.A
Out[44]:
array([[0, 1, 0, 0, 0],
[0, 1, 1, 0, 0],
[1, 1, 0, 1, 0],
[0, 1, 0, 0, 1],
[1, 1, 0, 0, 0]])
The warning matters more if you do this sort of change repeatedly. Notice I have to adjust the dimension of the LHS array. Depending on the number of all-zero columns this action can change the sparsity of the matrix substantially.
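For the concern about columns whose values happen to sum to 0, one alternative is to count stored entries per column instead of summing them. In csc format that count is the difference between consecutive indptr values. This is a sketch; it assumes the matrix holds no explicitly stored zeros (calling eliminate_zeros() first would take care of that), and the small matrix B is made up for the demonstration.

```python
import numpy as np
from scipy import sparse

A = np.array([[0, 0, 0, 0, 0],
              [0, 0, 1, 0, 0],
              [1, 0, 0, 1, 0],
              [0, 0, 0, 0, 1],
              [1, 0, 0, 0, 0]])
Ms = sparse.csc_matrix(A)

# indptr[j]:indptr[j+1] spans column j, so the diff is the per-column count.
counts = np.diff(Ms.indptr)
empty_cols = np.flatnonzero(counts == 0)

# A column holding 1 and -1 sums to 0 but is not empty.
B = sparse.csc_matrix(np.array([[1, 0], [-1, 0]]))
B_sums = np.asarray(B.sum(axis=0)).ravel()
B_counts = np.diff(B.indptr)

print(counts, empty_cols)
print(B_sums, B_counts)
```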
==================
I could search the col of coo format for missing values with:
In [69]: Mo=sparse.coo_matrix(A)
In [70]: Mo.col
Out[70]: array([2, 0, 3, 4, 0], dtype=int32)
In [71]: Mo.col==np.arange(Mo.shape[1])[:,None]
Out[71]:
array([[False, True, False, False, True],
[False, False, False, False, False],
[ True, False, False, False, False],
[False, False, True, False, False],
[False, False, False, True, False]], dtype=bool)
In [72]: idx = np.nonzero(~(Mo.col==np.arange(Mo.shape[1])[:,None]).any(axis=1))[0]
In [73]: idx
Out[73]: array([1], dtype=int32)
I could then add a column of 1s at this idx with:
In [75]: N=Mo.shape[0]
In [76]: data = np.concatenate([Mo.data, np.ones(N,int)])
In [77]: row = np.concatenate([Mo.row, np.arange(N)])
In [78]: col = np.concatenate([Mo.col, np.ones(N,int)*idx])
In [79]: Mo1 = sparse.coo_matrix((data,(row, col)), shape=Mo.shape)
In [80]: Mo1.A
Out[80]:
array([[0, 1, 0, 0, 0],
[0, 1, 1, 0, 0],
[1, 1, 0, 1, 0],
[0, 1, 0, 0, 1],
[1, 1, 0, 0, 0]])
As written it works for just one column, but it could be generalized to several. I also created a new matrix rather than update Mo. But this in-place seems to work as well:
Mo.data,Mo.col,Mo.row = data,col,row
The normalization still requires csr conversion, though I think sparse can hide that for you.
In [87]: Mo1/Mo1.sum(axis=0)
Out[87]:
matrix([[ 0. , 0.2, 0. , 0. , 0. ],
[ 0. , 0.2, 1. , 0. , 0. ],
[ 0.5, 0.2, 0. , 1. , 0. ],
[ 0. , 0.2, 0. , 0. , 1. ],
[ 0.5, 0.2, 0. , 0. , 0. ]])
Even when I take the extra work of maintaining the sparse nature, I still get a csr matrix:
In [89]: Mo1.multiply(sparse.coo_matrix(1/Mo1.sum(axis=0)))
Out[89]:
<5x5 sparse matrix of type '<class 'numpy.float64'>'
with 10 stored elements in Compressed Sparse Row format>
See "Find all-zero columns in pandas sparse matrix" for more methods of finding the 0 columns. It turns out Mo.col==np.arange(Mo.shape[1])[:,None] is too slow with large Mo. A test using np.in1d (np.isin in newer NumPy versions) is much better.
1 - np.in1d(np.arange(Mo.shape[1]),Mo.col)
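Putting the coo pieces together, here is a sketch of how the one-column version above might be generalized to several all-zero columns while staying sparse. The function name is hypothetical, np.isin is used in place of np.in1d, and it assumes no explicitly stored zeros and that every non-empty column has a nonzero sum.

```python
import numpy as np
from scipy import sparse

def fill_empty_columns_and_normalize(m):
    """Hypothetical helper: fill all-zero columns with ones, then scale
    each column to sum to 1, without going through a dense array."""
    m = sparse.coo_matrix(m)
    n_rows, n_cols = m.shape
    # Columns that never appear in m.col have no stored entries.
    empty = np.flatnonzero(~np.isin(np.arange(n_cols), m.col))
    # One new entry for every (row, empty column) pair.
    new_rows = np.tile(np.arange(n_rows), empty.size)
    new_cols = np.repeat(empty, n_rows)
    data = np.concatenate([m.data.astype(float), np.ones(new_rows.size)])
    row = np.concatenate([m.row, new_rows])
    col = np.concatenate([m.col, new_cols])
    filled = sparse.coo_matrix((data, (row, col)), shape=m.shape).tocsr()
    # Right-multiplying by a diagonal matrix scales the columns.
    col_sums = np.asarray(filled.sum(axis=0)).ravel()
    return filled @ sparse.diags(1.0 / col_sums)

A = np.array([[0, 0, 0, 0, 0],
              [0, 0, 1, 0, 0],
              [1, 0, 0, 1, 0],
              [0, 0, 0, 0, 1],
              [1, 0, 0, 0, 0]])
normalized = fill_empty_columns_and_normalize(sparse.coo_matrix(A))
print(normalized.toarray())
```

The diagonal multiply is a design choice on my part: it avoids the dense intermediate that Mo1/Mo1.sum(axis=0) produces.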

Transform an array of count data into a matrix of ones and zeroes

I have an array n of count data, and I want to transform it into a matrix x in which each row contains a number of ones equal to the corresponding count number, padded by zeroes, e.g:
n = [0 1 3 0 1]
x = [[ 0. 0. 0.]
[ 1. 0. 0.]
[ 1. 1. 1.]
[ 0. 0. 0.]
[ 1. 0. 0.]]
My solution is the following, and is very slow. Is it possible to do better?
n = np.random.poisson(2,5)
max_n = max(n)
def f(y):
    return np.concatenate((np.ones(y), np.zeros(max_n-y)))
x = np.vstack(list(map(f,n)))  # on Python 3, vstack needs a sequence, not a map iterator
Here's one way to vectorize it:
>>> n = np.array([0,2,1,0,3])
>>> width = 4
>>> (np.arange(width) < n[:,None]).astype(int)
array([[0, 0, 0, 0],
[1, 1, 0, 0],
[1, 0, 0, 0],
[0, 0, 0, 0],
[1, 1, 1, 0]])
where if you liked, width could be max(n) or anything else you chose.
import numpy as np
n = np.array([0, 1, 3, 0, 1])
max_n = max(n)
np.vstack([n > i for i in range(max_n)]).T.astype(int) # xrange(max_n) for python 2.x
Output:
array([[0, 0, 0],
[1, 0, 0],
[1, 1, 1],
[0, 0, 0],
[1, 0, 0]])
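The broadcasting comparison can be wrapped in a small function that defaults the width to the largest count and returns floats, as in the question's example. The function name and the `width` parameter are my own; treat this as a sketch.

```python
import numpy as np

def counts_to_indicator(n, width=None):
    """Row i gets n[i] leading ones, padded with zeros up to `width`."""
    n = np.asarray(n)
    if width is None:
        width = int(n.max()) if n.size else 0
    # Column index j is compared against every count: 1 where j < n[i].
    return (np.arange(width) < n[:, None]).astype(float)

x = counts_to_indicator([0, 1, 3, 0, 1])
print(x)
```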
