I'm trying to create a text string for the following numpy array:
A = array([0, 0, 0, 0, 0.64, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0])
I can do this easily with this line of code:
text = f'{A}'
The problem I'm having is that whenever I use an f-string to build a string from an array, it outputs the same array but with a \n inserted after some characters:
text
'[0. 0. 0. 0. 0.64 0. 0. 0. 0. 0. 0. 0. 0. 0.\n 0. 1. 0. 0. 0. ]'
I'm trying to use this array in the title of a plot, so I don't want the array to be text wrapping onto a new line because it makes it confusing to read/see.
I've tried using rstrip('\n') on text but it doesn't remove the '\n'. Does anyone have any idea what's going on? Why is this \n popping up in the string array?
You don't need to wrap the list in array() to accomplish what you are trying to do:
A = [0, 0, 0, 0, 0.64, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
print(A)
[0, 0, 0, 0, 0.64, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
print(f'{A}')
[0, 0, 0, 0, 0.64, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
A
[0, 0, 0, 0, 0.64, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
It seems like you are creating a NumPy N-dimensional array and then converting it to a string, so print() shows the string representation of that array. Unless you specifically need a NumPy array, you can do it just like above, or, if you do need one:
from numpy import array
A = [0, 0, 0, 0, 0.64, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
B = array(A)
print(B)
[0. 0. 0. 0. 0.64 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 1. 0. 0. 0. ]
print(f'{B}')
[0. 0. 0. 0. 0.64 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 1. 0. 0. 0. ]
B
array([0. , 0. , 0. , 0. , 0.64, 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 1. , 0. , 0. , 0. ])
If you absolutely have to render a NumPy array as a string, then you can do something like this:
text = f'{B}'
text = text.replace("\n", "")
Or, as Ramón Márquez also mentioned, you can simply increase the print options line width:
numpy.set_printoptions(linewidth=96)
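If you would rather not change the global settings, numpy.printoptions (available since NumPy 1.15) works as a context manager, so the wide format applies only inside the with block; a minimal sketch:
import numpy as np

A = np.array([0, 0, 0, 0, 0.64, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0])

# Temporarily widen the print width so str(A) fits on one line;
# the global print options are restored on exit.
with np.printoptions(linewidth=200):
    text = f'{A}'
print(text)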
A gentle introduction to NumPy arrays: https://machinelearningmastery.com/gentle-introduction-n-dimensional-arrays-python-numpy/
Documentation on NumPy print options: https://numpy.org/doc/1.18/reference/generated/numpy.printoptions.html
It has to do with how NumPy is configured to print arrays.
If you set the line width to 96 (the length of str(A) without any \n, plus 1), it won't insert line breaks:
numpy.set_printoptions(linewidth=96)
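If you only need the one-line form for this particular string, numpy.array2string takes the width per call instead of changing the global settings; a small sketch:
import numpy as np

A = np.array([0, 0, 0, 0, 0.64, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0])

# max_line_width applies to this one conversion only
text = np.array2string(A, max_line_width=200)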
Related
An old question on Singular Value Decomposition led me to ask this question:
How can I truncate a two-dimensional array to a number of columns dictated by a certain tolerance?
Specifically, please consider the following code snippet, which defines an accepted tolerance of 1e-4 and applies Singular Value Decomposition to a matrix 'A'.
# Python
import numpy as np

tol = 1e-4
U, Sa, V = np.linalg.svd(A)
S = np.diag(Sa)
The resulting singular value diagonal matrix 'S' holds non-negative singular values in decreasing order of magnitude.
What I want to obtain is a truncated 'S' matrix in which the columns holding singular values lower than 1e-4 are removed, and then to apply the same truncation to the matrix 'U'.
Is there a simple way of doing this? I have been looking around, and found some solutions to the problem for Matlab, but didn't find anything similar for Python.
For Matlab, the code would look something like:
% Matlab
tol = 1e-4;
mask = any(Sigma >= tol, 2);
sigRB = Sigma(:, mask);
mask2 = any(U >= tol, 2);
B = U(:, mask2);
Thanks in advance. I hope my post was not too messy to understand.
I am not sure if I understand you correctly. If my solution is not what you are asking for, please consider adding an example to your question.
The following code drops all columns from array s that consist only of values smaller than tol.
import numpy as np

s = np.array([
    [1, 0, 0, 0, 0, 0],
    [0, .9, 0, 0, 0, 0],
    [0, 0, .5, 0, 0, 0],
    [0, 0, 0, .4, 0, 0],
    [0, 0, 0, 0, .3, 0],
    [0, 0, 0, 0, 0, .2]
])
print(s)

tol = .4
# indices of the rows whose largest entry is below the tolerance;
# since s is diagonal, these are also the columns to drop
ind = np.argwhere(s.max(axis=1) < tol)
s = np.delete(s, ind, 1)
print(s)
Output:
[[1.  0.  0.  0.  0.  0. ]
 [0.  0.9 0.  0.  0.  0. ]
 [0.  0.  0.5 0.  0.  0. ]
 [0.  0.  0.  0.4 0.  0. ]
 [0.  0.  0.  0.  0.3 0. ]
 [0.  0.  0.  0.  0.  0.2]]
[[1.  0.  0.  0. ]
 [0.  0.9 0.  0. ]
 [0.  0.  0.5 0. ]
 [0.  0.  0.  0.4]
 [0.  0.  0.  0. ]
 [0.  0.  0.  0. ]]
I am taking the maximum along axis 1 and then using np.argwhere to get the indices of the rows whose maximum is smaller than tol; because S is diagonal, those row indices coincide with the columns that should be dropped.
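For a general (non-diagonal) matrix, the safer column test is the maximum along axis 0; an equivalent boolean-mask sketch:
# keep only the columns whose largest entry reaches the tolerance
keep = s.max(axis=0) >= tol
s = s[:, keep]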
Edit: In order to truncate the columns of matrix 'U' so that it coincides in size with the reduced matrix 'S', the following code works:
k = len(S[0])              # number of columns in the reduced S
Ured = U[:, 0:k]
Uredsize = np.shape(Ured)  # to check it has worked
print(Uredsize)
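Putting both steps together: np.linalg.svd returns the singular values already sorted in decreasing order, so the kept columns form a leading block and a simple count suffices; a sketch assuming A is already defined:
import numpy as np

tol = 1e-4
U, Sa, V = np.linalg.svd(A)

# count the singular values above the tolerance; they are sorted,
# so the first k columns are exactly the ones to keep
k = int(np.sum(Sa >= tol))
S = np.diag(Sa[:k])   # truncated k x k diagonal matrix
Ured = U[:, :k]       # matching truncation of U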
I want to generate a modified version of the identity matrix, call it C, such that Cii is zero up to some index i and 1 afterwards.
I can set the entries to 0 by brute force, but I don't think that is a good approach.
Are there any efficient functions I can use? This is hard to search for.
Example below:
The original 3x3 identity matrix is
1 0 0
0 1 0
0 0 1
and I want to change it into:
0 0 0
0 1 0
0 0 1
so i is 0 in this case; I want to set Ckk to 0 for every k in [0, i].
np.diag makes a 2d array from a 1d diagonal:
In [97]: np.diag((np.arange(6)>2).astype(int))
Out[97]:
array([[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 1]])
Basically the same as Paul Panzer's, but generating the diagonal a different way. Similar speed.
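The same idea written against the question's cutoff index i, so that Ckk is zero for k in [0, i]:
import numpy as np

N, i = 6, 2
C = np.diag((np.arange(N) > i).astype(int))  # zeros at 0..i, ones after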
Here is one possibility:
N = 5
k = 2
# np.bincount([k], None, N) builds a length-N array with a 1 at index k;
# cumsum turns that into zeros before k and ones from k onward
np.diag(np.bincount([k], None, N).cumsum())
array([[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 1, 0],
[0, 0, 0, 0, 1]])
Update: fast solution:
out = np.zeros((N, N))
# write ones directly into the flattened array: the diagonal sits at
# flat positions 0, N+1, 2*(N+1), ..., so start at (N+1)*k and step by N+1
out.reshape(-1)[(N + 1) * k::N + 1] = 1
You can build an NxN identity matrix and assign zero to the top left KxK corner:
N, K = 10, 3
im = np.identity(N)
im[:K, :K] = 0
print(im)
output:
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]
40% faster than hpaulj's, but not as fast as Paul Panzer's fast solution (which is 3x faster than this).
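If this comes up repeatedly, the corner-zeroing approach wraps naturally into a small helper (the function name is my own):
import numpy as np

def partial_identity(n, k):
    # n x n identity with the first k diagonal entries zeroed;
    # only diagonal entries live in the top-left corner, so zeroing
    # the whole block is equivalent
    im = np.identity(n)
    im[:k, :k] = 0
    return im

print(partial_identity(5, 2))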
I have a DataFrame df with x indices, y indices, and the values to fill a numpy matrix mat.
Example of a smaller df:
y  x  x  x  x  value  value  value  value
1  6  3  6  4    100     10    300     15
1  6  2  8  7     50    200     35     70
5  7  5  4  6      2     50     40    400
7  5  3  2  1    105     80     35     44
I want to fill mat = np.zeros(shape=(10,10)), where each y is a column index and each x is a row index, with the value at the same position as x in the value block. For example, from the first rows:
col=1, row=6, value=100 ###
col=1, row=3, value=10
col=1, row=6, value=300 ###
col=1, row=4, value=15
col=1, row=6, value=50 ###
If more than one value goes into the same position (like ###), take the average. Is there any way to go directly from pandas to the matrix (or another quick way)?
What I can do now is use np.ravel on selected columns of the DataFrame to make 1D arrays and fill from those, but it is slow and highly redundant.
Construct row and column indices and use fancy-index assignment.
val = df.values
j = val[:, 0].repeat(4)  # y values -> column indices, one per x column
i = val[:, 1:5].ravel()  # x values -> row indices
v = val[:, 5:].ravel()   # values, in matching order
mat = np.zeros(shape=(10, 10), dtype=int)
mat[i, j] = v
mat
mat
array([[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 44, 0, 0],
[ 0, 200, 0, 0, 0, 0, 0, 35, 0, 0],
[ 0, 10, 0, 0, 0, 0, 0, 80, 0, 0],
[ 0, 15, 0, 0, 0, 40, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 50, 0, 105, 0, 0],
[ 0, 50, 0, 0, 0, 400, 0, 0, 0, 0],
[ 0, 70, 0, 0, 0, 2, 0, 0, 0, 0],
[ 0, 35, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
With fancy-index assignment, repeated (row, column) pairs keep only the last value written, which is why position (6, 1) ends up as 50 above. For averages:
val = df.values
j = val[:, 0].repeat(4)
i = val[:, 1:5].ravel()
v = val[:, 5:].ravel()
# flatten each (row, col) pair into a single index, accumulate the sums
# and the counts per cell, then divide where the count is nonzero
sums = np.bincount(i * 10 + j, weights=v, minlength=100)
cnts = np.bincount(i * 10 + j, minlength=100)
mask = cnts > 0
sums[mask] /= cnts[mask]
print(sums.reshape(10, 10))
[[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 44. 0. 0.]
[ 0. 200. 0. 0. 0. 0. 0. 35. 0. 0.]
[ 0. 10. 0. 0. 0. 0. 0. 80. 0. 0.]
[ 0. 15. 0. 0. 0. 40. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 50. 0. 105. 0. 0.]
[ 0. 150. 0. 0. 0. 400. 0. 0. 0. 0.]
[ 0. 70. 0. 0. 0. 2. 0. 0. 0. 0.]
[ 0. 35. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
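Since the data already lives in pandas, another route is to let groupby do the averaging before scattering into the matrix; a sketch reusing i, j, v from above (the intermediate frame is my own construction):
import numpy as np
import pandas as pd

# average duplicate (row, col) pairs, then scatter the means
avg = pd.DataFrame({'i': i, 'j': j, 'v': v}).groupby(['i', 'j'])['v'].mean()
mat = np.zeros((10, 10))
rows = avg.index.get_level_values('i')
cols = avg.index.get_level_values('j')
mat[rows, cols] = avg.to_numpy()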
I have been trying to divide a Python scipy sparse matrix by the vector of its row sums. Here is my code:
sparse_mat = bsr_matrix((l_data, (l_row, l_col)), dtype=float)
sparse_mat = sparse_mat / (sparse_mat.sum(axis = 1)[:,None])
However, it throws an error no matter how I try it
sparse_mat = sparse_mat / (sparse_mat.sum(axis = 1)[:,None])
File "/usr/lib/python2.7/dist-packages/scipy/sparse/base.py", line 381, in __div__
return self.__truediv__(other)
File "/usr/lib/python2.7/dist-packages/scipy/sparse/compressed.py", line 427, in __truediv__
raise NotImplementedError
NotImplementedError
Anyone with an idea of where I am going wrong?
You can circumvent the problem by creating a sparse diagonal matrix from the reciprocals of your row sums and then multiplying it with your matrix. In the product, the diagonal matrix goes on the left and your matrix on the right.
Example:
>>> a
array([[0, 9, 0, 0, 1, 0],
[2, 0, 5, 0, 0, 9],
[0, 2, 0, 0, 0, 0],
[2, 0, 0, 0, 0, 0],
[0, 9, 5, 3, 0, 7],
[1, 0, 0, 8, 9, 0]])
>>> b = sparse.bsr_matrix(a)
>>>
>>> c = sparse.diags(1/b.sum(axis=1).A.ravel())
>>> # on older scipy versions the offsets parameter (default 0)
... # is a required argument, thus
... # c = sparse.diags(1/b.sum(axis=1).A.ravel(), 0)
...
>>> a/a.sum(axis=1, keepdims=True)
array([[ 0. , 0.9 , 0. , 0. , 0.1 , 0. ],
[ 0.125 , 0. , 0.3125 , 0. , 0. , 0.5625 ],
[ 0. , 1. , 0. , 0. , 0. , 0. ],
[ 1. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0.375 , 0.20833333, 0.125 , 0. , 0.29166667],
[ 0.05555556, 0. , 0. , 0.44444444, 0.5 , 0. ]])
>>> (c @ b).todense()  # on Python < 3.5 replace c @ b with c.dot(b)
matrix([[ 0. , 0.9 , 0. , 0. , 0.1 , 0. ],
[ 0.125 , 0. , 0.3125 , 0. , 0. , 0.5625 ],
[ 0. , 1. , 0. , 0. , 0. , 0. ],
[ 1. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0.375 , 0.20833333, 0.125 , 0. , 0.29166667],
[ 0.05555556, 0. , 0. , 0.44444444, 0.5 , 0. ]])
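On recent SciPy versions, elementwise .multiply with a dense column of reciprocals is another way to keep the result sparse; a sketch assuming the array a from above:
import numpy as np
from scipy import sparse

b = sparse.bsr_matrix(a)
row_sums = np.asarray(b.sum(axis=1))     # dense (n, 1) column vector
normalized = b.multiply(1.0 / row_sums)  # broadcasts across rows, stays sparse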
Something funny is going on. I have no problem performing the elementwise division. I wonder if it's a Py2 issue; I'm using Py3.
In [1022]: A=sparse.bsr_matrix([[2,4],[1,2]])
In [1023]: A
Out[1023]:
<2x2 sparse matrix of type '<class 'numpy.int32'>'
with 4 stored elements (blocksize = 2x2) in Block Sparse Row format>
In [1024]: A.A
Out[1024]:
array([[2, 4],
[1, 2]], dtype=int32)
In [1025]: A.sum(axis=1)
Out[1025]:
matrix([[6],
[3]], dtype=int32)
In [1026]: A/A.sum(axis=1)
Out[1026]:
matrix([[ 0.33333333, 0.66666667],
[ 0.33333333, 0.66666667]])
or to try the other example:
In [1027]: b=sparse.bsr_matrix([[0, 9, 0, 0, 1, 0],
...: [2, 0, 5, 0, 0, 9],
...: [0, 2, 0, 0, 0, 0],
...: [2, 0, 0, 0, 0, 0],
...: [0, 9, 5, 3, 0, 7],
...: [1, 0, 0, 8, 9, 0]])
In [1028]: b
Out[1028]:
<6x6 sparse matrix of type '<class 'numpy.int32'>'
with 14 stored elements (blocksize = 1x1) in Block Sparse Row format>
In [1029]: b.sum(axis=1)
Out[1029]:
matrix([[10],
[16],
[ 2],
[ 2],
[24],
[18]], dtype=int32)
In [1030]: b/b.sum(axis=1)
Out[1030]:
matrix([[ 0. , 0.9 , 0. , 0. , 0.1 , 0. ],
[ 0.125 , 0. , 0.3125 , 0. , 0. , 0.5625 ],
....
[ 0.05555556, 0. , 0. , 0.44444444, 0.5 , 0. ]])
The result of this sparse/dense division is also dense, whereas c*b (c being the sparse diagonal) is sparse.
In [1039]: c*b
Out[1039]:
<6x6 sparse matrix of type '<class 'numpy.float64'>'
with 14 stored elements in Compressed Sparse Row format>
The sparse sum is a dense matrix. It is 2d, so there's no need to expand its dimensions. In fact, if I try that I get an error:
In [1031]: A/(A.sum(axis=1)[:,None])
....
ValueError: shape too large to be a matrix.
Per this message, to keep the matrix sparse, you access the data values and use the (nonzero) indices:
sums = np.asarray(A.sum(axis=1)).squeeze()  # this is dense
A.data /= sums[A.nonzero()[0]]  # assumes A.data lines up with A.nonzero(), as in canonical CSR
To divide by the nonzero row mean instead of the sum:
nnz = A.getnnz(axis=1)  # this is also dense
means = sums / nnz
A.data /= means[A.nonzero()[0]]
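Put together as a small row-normalization sketch (assuming CSR format with no explicit zeros, so that A.data is ordered like A.nonzero()):
import numpy as np
from scipy import sparse

A = sparse.csr_matrix([[2, 4], [1, 2]], dtype=float)
sums = np.asarray(A.sum(axis=1)).squeeze()
A.data /= sums[A.nonzero()[0]]
print(A.toarray())
# [[0.33333333 0.66666667]
#  [0.33333333 0.66666667]]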
I have an array n of count data, and I want to transform it into a matrix x in which each row contains a number of ones equal to the corresponding count, padded with zeroes, e.g.:
n = [0 1 3 0 1]
x = [[ 0. 0. 0.]
[ 1. 0. 0.]
[ 1. 1. 1.]
[ 0. 0. 0.]
[ 1. 0. 0.]]
My solution is the following, and it is very slow. Is it possible to do better?
import numpy as np

n = np.random.poisson(2, 5)
max_n = max(n)

def f(y):
    return np.concatenate((np.ones(y), np.zeros(max_n - y)))

x = np.vstack(list(map(f, n)))  # newer NumPy versions need a list, not a bare map
Here's one way to vectorize it:
>>> n = np.array([0,2,1,0,3])
>>> width = 4
>>> (np.arange(width) < n[:,None]).astype(int)
array([[0, 0, 0, 0],
[1, 1, 0, 0],
[1, 0, 0, 0],
[0, 0, 0, 0],
[1, 1, 1, 0]])
where width could be max(n) or anything else you choose.
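Applied to the array from the question, with the width taken from the counts themselves:
import numpy as np

n = np.array([0, 1, 3, 0, 1])
# each row i gets ones in its first n[i] positions via broadcasting
x = (np.arange(n.max()) < n[:, None]).astype(float)
# x matches the desired output:
# [[0. 0. 0.]
#  [1. 0. 0.]
#  [1. 1. 1.]
#  [0. 0. 0.]
#  [1. 0. 0.]]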
import numpy as np
n = np.array([0, 1, 3, 0, 1])
max_n = max(n)
np.vstack([n > i for i in range(max_n)]).T.astype(int)  # a list is needed on newer NumPy; xrange(max_n) on Python 2.x
Output:
array([[0, 0, 0],
[1, 0, 0],
[1, 1, 1],
[0, 0, 0],
[1, 0, 0]])