I have a list, say
a = [3, 4, 5, 6, 7]
And I want to create a NumPy array of zeros with the same length as that list.
If I do
b = np.zeros((1, len(a)))
I get
[[0. 0. 0. 0. 0.]]
instead of
[0. 0. 0. 0. 0.]
What is the best way to get the latter option?
If you don't want to have to care about shapes, use np.zeros_like:
np.zeros_like(a)
# array([0, 0, 0, 0, 0])
There's also the option of querying np.shape:
np.zeros(np.shape(a))
# array([0., 0., 0., 0., 0.])
Both options should work for ND lists as well.
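For example, a quick sketch with a nested (2-D) list (a2 is just an illustrative name):
a2 = [[3, 4, 5], [6, 7, 8]]
np.zeros_like(a2)
# array([[0, 0, 0],
#        [0, 0, 0]])
np.zeros(np.shape(a2))
# array([[0., 0., 0.],
#        [0., 0., 0.]])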
You passed a two-element tuple to zeros, so it produced a 2D array. You can simply pass an integer to zeros instead:
a = [3, 4, 5, 6, 7]
b = np.zeros(len(a))
print(b)  # prints [0. 0. 0. 0. 0.]
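For comparison, a quick sketch of the shapes involved (b1 and b2 are just illustrative names):
b1 = np.zeros((1, len(a)))   # 2D array, shape (1, 5)
b2 = np.zeros(len(a))        # 1D array, shape (5,)
print(b1.shape, b2.shape)    # (1, 5) (5,)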
You can try this (using the built-in int, since np.int is deprecated):
np.zeros(len(a), dtype=int)
It will return
array([0, 0, 0, 0, 0])
I have this numpy array
matrix = np.array([[ 0.8, 0.2, 0.1],
                   [ 1. , 0. , 0. ],
                   [ 0. , 0. , 1. ]])
and I would like to get, for each row of the matrix, the column indices in decreasing order of value.
For example, this would be
np.array([[0, 1, 2], [0, 1, 2], [2, 0, 1]])
I know I could use np.argsort, but this doesn't seem to be returning the right output. I tried changing the axis to different values, but that doesn't help either.
Probably the easiest way to get your desired output would be:
(-matrix).argsort(axis=1)
# array([[0, 1, 2],
#        [0, 1, 2],
#        [2, 0, 1]])
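If tie order matters, a variation (sketch, assuming NumPy 1.15+ for kind='stable') that keeps the leftmost index first among equal values:
(-matrix).argsort(axis=1, kind='stable')
# array([[0, 1, 2],
#        [0, 1, 2],
#        [2, 0, 1]])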
I think np.argsort does do the trick; you just need to flip the result horizontally to get decreasing order:
>>> matrix = np.array([[ 0.8, 0.2, 0.1],
                       [ 1. , 0. , 0. ],
                       [ 0. , 0. , 1. ]])
>>> np.fliplr(np.argsort(matrix))
array([[0, 1, 2],
       [0, 2, 1],
       [2, 1, 0]])
This should be the right output unless you have any requirements for sorting ties. Right now the flipping puts the rightmost of any tied values first. If you wanted to match your exact output, where the leftmost index comes first, you could do a bit of juggling:
# Flip the matrix first, then argsort to get ascending indices
# in the flipped column order
>>> flipped = np.argsort(np.fliplr(matrix))
# Map the indices back to the original columns (width - 1 - index),
# then flip again so each row is in descending order
>>> np.fliplr(flipped.shape[1] - 1 - flipped)
array([[0, 1, 2],
       [0, 1, 2],
       [2, 0, 1]])
I have a matrix containing positive and negative numbers like this:
>>> source_matrix
array([[-4, -2,  0],
       [-5,  0,  4],
       [ 0,  6,  5]])
I'd like to have a copy of this matrix with the negative entries inverted:
>>> result
array([[-0.25, -0.5 ,  0.  ],
       [-0.2 ,  0.  ,  4.  ],
       [ 0.  ,  6.  ,  5.  ]])
Firstly, since your desired array will contain floats, you need to give the array a float dtype at creation time; otherwise the float results of the inversion would be cast back to integers when you assign them. Secondly, you need to find the negative numbers in your array, grab them with simple boolean indexing, and use np.true_divide() to perform the inversion.
In [25]: arr = np.array([[-4, -2, 0],
    ...:                 [-5,  0, 4],
    ...:                 [ 0,  6, 5]], dtype=float)
In [26]: mask = arr < 0
In [27]: arr[mask] = np.true_divide(1, arr[mask])
In [28]: arr
Out[28]:
array([[-0.25, -0.5 ,  0.  ],
       [-0.2 ,  0.  ,  4.  ],
       [ 0.  ,  6.  ,  5.  ]])
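To see why the float dtype matters, a quick illustration (not part of the original session) of what happens if the array keeps its default integer dtype:
arr_i = np.array([[-4, -2, 0],
                  [-5,  0, 4],
                  [ 0,  6, 5]])          # default integer dtype
arr_i[arr_i < 0] = np.true_divide(1, arr_i[arr_i < 0])
arr_i
# array([[0, 0, 0],   <- -0.25, -0.5 and -0.2 were truncated to 0
#        [0, 0, 4],
#        [0, 6, 5]])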
You can also achieve this without masking, by using the where and out params of true_divide.
a = np.array([[-4, -2, 0],
              [-5,  0, 4],
              [ 0,  6, 5]], dtype=float)
np.true_divide(1, a, out=a, where=a<0)
Giving the result:
array([[-0.25, -0.5 ,  0.  ],
       [-0.2 ,  0.  ,  4.  ],
       [ 0.  ,  6.  ,  5.  ]])
The where= parameter takes a boolean array broadcastable against the inputs. Where it is True, the division is performed; where it is False, the corresponding value already in the out= array (here the original a) is left in the result unchanged.
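If you'd rather keep a untouched, a small variation (sketch) that writes into a copy instead:
result = a.copy()
np.true_divide(1, a, out=result, where=a < 0)
# result holds the inverted negatives; a is unchanged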
For a matrix, I want to find the columns that are all zeros, fill them with 1s, and then normalize the matrix by column. I know how to do that with np.arrays:
[[0 0 0 0 0]
[0 0 1 0 0]
[1 0 0 1 0]
[0 0 0 0 1]
[1 0 0 0 0]]
|
V
[[0 1 0 0 0]
[0 1 1 0 0]
[1 1 0 1 0]
[0 1 0 0 1]
[1 1 0 0 0]]
|
V
[[0 0.2 0 0 0]
[0 0.2 1 0 0]
[0.5 0.2 0 1 0]
[0 0.2 0 0 1]
[0.5 0.2 0 0 0]]
But how can I achieve the same thing when the matrix is a scipy.sparse.coo_matrix, without converting it back to a dense array?
This will be a lot easier with the lil format, and working with rows rather than columns:
In [1]: from scipy import sparse
In [2]: A=np.array([[0,0,0,0,0],[0,0,1,0,0],[1,0,0,1,0],[0,0,0,0,1],[1,0,0,0,0]])
In [3]: A
Out[3]:
array([[0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [1, 0, 0, 1, 0],
       [0, 0, 0, 0, 1],
       [1, 0, 0, 0, 0]])
In [4]: At=A.T # switch to work with rows
In [5]: M=sparse.lil_matrix(At)
Now it is obvious which row is all zeros
In [6]: M.data
Out[6]: array([[1, 1], [], [1], [1], [1]], dtype=object)
In [7]: M.rows
Out[7]: array([[2, 4], [], [1], [2], [3]], dtype=object)
And lil format allows us to fill that row:
In [8]: M.data[1]=[1,1,1,1,1]
In [9]: M.rows[1]=[0,1,2,3,4]
In [10]: M.A
Out[10]:
array([[0, 0, 1, 0, 1],
       [1, 1, 1, 1, 1],
       [0, 1, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 0, 1, 0]], dtype=int32)
I could have also used M[1,:]=np.ones(5,int)
The coo format is great for creating the array from the data/row/col arrays, but it doesn't implement indexing or math. It has to be converted to csr for that, and to csc for column-oriented operations.
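For example (a quick check, not from the original session), indexing a coo matrix directly fails, which is why the conversion is needed:
Mcoo = sparse.coo_matrix(A)
# Mcoo[1, 2]           # TypeError: coo does not support indexing
Mcoo.tocsr()[1, 2]     # works after converting to csr
# 1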
The row that I filled isn't so obvious in the csr format:
In [14]: Mc=M.tocsr()
In [15]: Mc.data
Out[15]: array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)
In [16]: Mc.indices
Out[16]: array([2, 4, 0, 1, 2, 3, 4, 1, 2, 3], dtype=int32)
In [17]: Mc.indptr
Out[17]: array([ 0, 2, 7, 8, 9, 10], dtype=int32)
On the other hand normalizing is probably easier in this format.
In [18]: Mc.sum(axis=1)
Out[18]:
matrix([[2],
        [5],
        [1],
        [1],
        [1]], dtype=int32)
In [19]: Mc/Mc.sum(axis=1)
Out[19]:
matrix([[ 0. ,  0. ,  0.5,  0. ,  0.5],
        [ 0.2,  0.2,  0.2,  0.2,  0.2],
        [ 0. ,  1. ,  0. ,  0. ,  0. ],
        [ 0. ,  0. ,  1. ,  0. ,  0. ],
        [ 0. ,  0. ,  0. ,  1. ,  0. ]])
Notice that it's converted the sparse matrix to a dense one. The sum is dense, and math involving sparse and dense usually produces dense.
I have to use a more roundabout calculation to preserve the sparse format:
In [27]: Mc.multiply(sparse.csr_matrix(1/Mc.sum(axis=1)))
Out[27]:
<5x5 sparse matrix of type '<class 'numpy.float64'>'
with 10 stored elements in Compressed Sparse Row format>
Here's a way of doing this with the csc format (on A)
In [40]: Ms=sparse.csc_matrix(A)
In [41]: Ms.sum(axis=0)
Out[41]: matrix([[2, 0, 1, 1, 1]], dtype=int32)
Use the sum to find the all-zero column. Obviously this could be wrong if a column has negative values that happen to sum to 0. If that's a concern, you could make a copy of the matrix with all data values replaced by 1.
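A sketch of that idea (Ms1 is just an illustrative name): keep the sparsity pattern but set every stored value to 1 before summing, so negative entries can't cancel out:
Ms1 = Ms.copy()
Ms1.data[:] = 1            # same sparsity pattern, all stored values set to 1
Ms1.sum(axis=0)            # a zero sum now reliably marks an all-zero column
# matrix([[2, 0, 1, 1, 1]])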
In [43]: Ms[:,1]=np.ones(5,int)[:,None]
/usr/lib/python3/dist-packages/scipy/sparse/compressed.py:730: SparseEfficiencyWarning: Changing the sparsity structure of a csc_matrix is expensive. lil_matrix is more efficient.
SparseEfficiencyWarning)
In [44]: Ms.A
Out[44]:
array([[0, 1, 0, 0, 0],
       [0, 1, 1, 0, 0],
       [1, 1, 0, 1, 0],
       [0, 1, 0, 0, 1],
       [1, 1, 0, 0, 0]])
The warning matters more if you do this sort of change repeatedly. Notice that I have to reshape the RHS array into a column to match the slice. Depending on the number of all-zero columns, this action can change the sparsity of the matrix substantially.
==================
I could search the col attribute of the coo format for the missing column indices with:
In [69]: Mo=sparse.coo_matrix(A)
In [70]: Mo.col
Out[70]: array([2, 0, 3, 4, 0], dtype=int32)
In [71]: Mo.col==np.arange(Mo.shape[1])[:,None]
Out[71]:
array([[False,  True, False, False,  True],
       [False, False, False, False, False],
       [ True, False, False, False, False],
       [False, False,  True, False, False],
       [False, False, False,  True, False]], dtype=bool)
In [72]: idx = np.nonzero(~(Mo.col==np.arange(Mo.shape[1])[:,None]).any(axis=1))[0]
In [73]: idx
Out[73]: array([1], dtype=int32)
I could then add a column of 1s at this idx with:
In [75]: N=Mo.shape[0]
In [76]: data = np.concatenate([Mo.data, np.ones(N,int)])
In [77]: row = np.concatenate([Mo.row, np.arange(N)])
In [78]: col = np.concatenate([Mo.col, np.ones(N,int)*idx])
In [79]: Mo1 = sparse.coo_matrix((data,(row, col)), shape=Mo.shape)
In [80]: Mo1.A
Out[80]:
array([[0, 1, 0, 0, 0],
       [0, 1, 1, 0, 0],
       [1, 1, 0, 1, 0],
       [0, 1, 0, 0, 1],
       [1, 1, 0, 0, 0]])
As written it works for just one column, but it could be generalized to several. I also created a new matrix rather than update Mo. But this in-place seems to work as well:
Mo.data,Mo.col,Mo.row = data,col,row
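If several columns were all zero, the same concatenation idea generalizes; a sketch (not from the original session), with idx holding all the missing column indices:
k = len(idx)                   # number of all-zero columns
data = np.concatenate([Mo.data, np.ones(N * k, int)])
row = np.concatenate([Mo.row, np.tile(np.arange(N), k)])
col = np.concatenate([Mo.col, np.repeat(idx, N)])
Mo1 = sparse.coo_matrix((data, (row, col)), shape=Mo.shape)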
The normalization still requires csr conversion, though I think sparse can hide that for you.
In [87]: Mo1/Mo1.sum(axis=0)
Out[87]:
matrix([[ 0. ,  0.2,  0. ,  0. ,  0. ],
        [ 0. ,  0.2,  1. ,  0. ,  0. ],
        [ 0.5,  0.2,  0. ,  1. ,  0. ],
        [ 0. ,  0.2,  0. ,  0. ,  1. ],
        [ 0.5,  0.2,  0. ,  0. ,  0. ]])
Even when I do the extra work of maintaining the sparse format, I still get a csr matrix back:
In [89]: Mo1.multiply(sparse.coo_matrix(1/Mo1.sum(axis=0)))
Out[89]:
<5x5 sparse matrix of type '<class 'numpy.float64'>'
with 10 stored elements in Compressed Sparse Row format>
See Find all-zero columns in pandas sparse matrix for more methods of finding the 0 columns. It turns out Mo.col==np.arange(Mo.shape[1])[:,None] is too slow with large Mo. A test using np.in1d is much better:
1 - np.in1d(np.arange(Mo.shape[1]), Mo.col)
I have an array n of count data, and I want to transform it into a matrix x in which each row contains a number of ones equal to the corresponding count, padded with zeros, e.g.:
n = [0 1 3 0 1]
x = [[ 0.  0.  0.]
     [ 1.  0.  0.]
     [ 1.  1.  1.]
     [ 0.  0.  0.]
     [ 1.  0.  0.]]
My solution is the following, and is very slow. Is it possible to do better?
n = np.random.poisson(2,5)
max_n = max(n)
def f(y):
    return np.concatenate((np.ones(y), np.zeros(max_n - y)))

x = np.vstack(list(map(f, n)))  # wrap map in list() so vstack gets a sequence
Here's one way to vectorize it:
>>> n = np.array([0,2,1,0,3])
>>> width = 4
>>> (np.arange(width) < n[:,None]).astype(int)
array([[0, 0, 0, 0],
       [1, 1, 0, 0],
       [1, 0, 0, 0],
       [0, 0, 0, 0],
       [1, 1, 1, 0]])
where width could be max(n) or anything else you choose.
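For instance, to reproduce the exact x from the question (a small usage sketch with the width taken from max(n)):
n = np.array([0, 1, 3, 0, 1])
(np.arange(n.max()) < n[:, None]).astype(float)
# array([[0., 0., 0.],
#        [1., 0., 0.],
#        [1., 1., 1.],
#        [0., 0., 0.],
#        [1., 0., 0.]])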
import numpy as np
n = np.array([0, 1, 3, 0, 1])
max_n = max(n)
np.vstack([n > i for i in range(max_n)]).T.astype(int)  # use a list so vstack gets a sequence; xrange(max_n) for Python 2.x
Output:
array([[0, 0, 0],
       [1, 0, 0],
       [1, 1, 1],
       [0, 0, 0],
       [1, 0, 0]])