How to change the offset of a matrix in NumPy - python

I would like to get a 100x100 matrix like this:
[-2,1,0,0]
[1,-2,1,0]
[0,1,-2,1]
[0,0,1,-2]
I started by creating the diagonal:
import numpy as np
diagonal = np.full(100, -2)
A100 = np.zeros((100, 100))
np.fill_diagonal(A100, diagonal)
Now for changing the offset I tried:
off1 = np.ones(99)
off1 = np.diagonal(A100, offset=1)
But this doesn't work.
Thanks for your help!
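For completeness (this is an addition, not one of the original answers): the fill_diagonal approach can be extended to the off-diagonals with plain integer index arrays, which sets the values in place:
import numpy as np

A100 = np.zeros((100, 100))
np.fill_diagonal(A100, -2)

# Write the first super- and sub-diagonals directly via integer index arrays.
idx = np.arange(99)
A100[idx, idx + 1] = 1
A100[idx + 1, idx] = 1

print(A100[:4, :4])
# [[-2.  1.  0.  0.]
#  [ 1. -2.  1.  0.]
#  [ 0.  1. -2.  1.]
#  [ 0.  0.  1. -2.]]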

Construct the matrix as a sum of three shifted identity matrices, using the k offset of np.eye:
np.eye(100, k=1) + np.eye(100, k=-1) - 2 * np.eye(100)
P.S. This solution is 7x faster than the scipy.sparse solution.

You can use scipy.sparse.diags
from scipy.sparse import diags
A100 = diags([-2, 1, 1], [0, -1, 1], shape = (100, 100))
A100.A
Out[]:
array([[-2., 1., 0., ..., 0., 0., 0.],
[ 1., -2., 1., ..., 0., 0., 0.],
[ 0., 1., -2., ..., 0., 0., 0.],
...,
[ 0., 0., 0., ..., -2., 1., 0.],
[ 0., 0., 0., ..., 1., -2., 1.],
[ 0., 0., 0., ..., 0., 1., -2.]])
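As a quick sanity check (added here, not part of either answer), the dense np.eye construction and the sparse diags construction produce the same matrix:
import numpy as np
from scipy.sparse import diags

dense = np.eye(100, k=1) + np.eye(100, k=-1) - 2 * np.eye(100)
sparse_version = diags([-2, 1, 1], [0, -1, 1], shape=(100, 100))

# Both give the tridiagonal matrix with -2 on the diagonal and 1 on the offsets.
print(np.allclose(dense, sparse_version.toarray()))  # True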

Related

How to distribute a Numpy array along the diagonal of an array of higher dimension?

I have three two-dimensional NumPy arrays x, w, d and want to create a fourth one called a. w and d only define the shape of a, which is d.shape + w.shape. I want to have x in the entries of a, with zeros elsewhere.
Specifically, I want a loop-free version of this code:
a = np.zeros(d.shape + w.shape)
for j in range(d.shape[1]):
    a[:, j, :, j] = x
For example, given:
x = np.array([
    [2,  3],
    [1,  1],
    [8, 10],
    [0,  1]
])
w = np.array([
    [ 0,  1,  1],
    [-1, -2,  1]
])
d = np.matmul(x, w)
I want a to be
array([[[[ 2., 0., 0.],
[ 3., 0., 0.]],
[[ 0., 2., 0.],
[ 0., 3., 0.]],
[[ 0., 0., 2.],
[ 0., 0., 3.]]],
[[[ 1., 0., 0.],
[ 1., 0., 0.]],
[[ 0., 1., 0.],
[ 0., 1., 0.]],
[[ 0., 0., 1.],
[ 0., 0., 1.]]],
[[[ 8., 0., 0.],
[10., 0., 0.]],
[[ 0., 8., 0.],
[ 0., 10., 0.]],
[[ 0., 0., 8.],
[ 0., 0., 10.]]],
[[[ 0., 0., 0.],
[ 1., 0., 0.]],
[[ 0., 0., 0.],
[ 0., 1., 0.]],
[[ 0., 0., 0.],
[ 0., 0., 1.]]]])
This answer inspired the following solution:
# shape a: (4, 3, 2, 3)
# shape x: (4, 2)
a = np.zeros(d.shape + w.shape)
a[:, np.arange(a.shape[1]), :, np.arange(a.shape[3])] = x
It uses NumPy's broadcasting (see here or here) in combination with advanced indexing to enlarge x to fit the slicing.
I happen to have an even simpler solution: a = np.tensordot(x, np.identity(3), axes = 0).swapaxes(1,2)
The size of the identity matrix is determined by the number of times you wish to repeat the elements of x (here w.shape[1] = 3).
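A quick check (added here, not part of the original answers) that the loop, the advanced-indexing assignment, and the tensordot one-liner all agree on the example data:
import numpy as np

x = np.array([[2, 3], [1, 1], [8, 10], [0, 1]])
w = np.array([[0, 1, 1], [-1, -2, 1]])
d = np.matmul(x, w)

# Reference: the explicit loop from the question.
a_loop = np.zeros(d.shape + w.shape)
for j in range(d.shape[1]):
    a_loop[:, j, :, j] = x

# Advanced-indexing version.
a_idx = np.zeros(d.shape + w.shape)
a_idx[:, np.arange(a_idx.shape[1]), :, np.arange(a_idx.shape[3])] = x

# tensordot version.
a_td = np.tensordot(x, np.identity(3), axes=0).swapaxes(1, 2)

print(np.array_equal(a_loop, a_idx) and np.array_equal(a_loop, a_td))  # True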

Scikit: Convert one-hot encoding to encoding with integers

I need to convert a one-hot encoding to categories represented by unique integers. So the one-hot encoding created with the following code:
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder()
labels = [[1],[2],[3]]
enc.fit(labels)
for x in [1,2,3]:
    print(enc.transform([[x]]).toarray())
Out:
[[ 1. 0. 0.]]
[[ 0. 1. 0.]]
[[ 0. 0. 1.]]
could be converted back to a set of unique integers, for example:
[1,2,3] or [11,37,45] or any other representation where each integer uniquely represents a single class.
Is it possible to do with scikit-learn or any other python lib?
* Update *
I tried:
labels = [[1],[2],[3],[4],[5],[6],[7]]
enc.fit(labels)
lst = []
for x in [1,2,3,4,5,6,7]:
    lst.append(enc.transform([[x]]).toarray())
lst
Out:
[array([[ 1., 0., 0., 0., 0., 0., 0.]]),
array([[ 0., 1., 0., 0., 0., 0., 0.]]),
array([[ 0., 0., 1., 0., 0., 0., 0.]]),
array([[ 0., 0., 0., 1., 0., 0., 0.]]),
array([[ 0., 0., 0., 0., 1., 0., 0.]]),
array([[ 0., 0., 0., 0., 0., 1., 0.]]),
array([[ 0., 0., 0., 0., 0., 0., 1.]])]
a = np.array(lst)
np.where(a==1)[1]
Out:
array([0, 0, 0, 0, 0, 0, 0], dtype=int64)
Not what I need
You can do that using np.where as follows:
import numpy as np
a = np.array([[ 0., 1., 0.],
              [ 1., 0., 0.],
              [ 0., 0., 1.]])
np.where(a==1)[1]
This prints array([1, 0, 2], dtype=int64). This works since np.where(a==1)[1] returns the column indices of the 1's, which are exactly the labels.
In addition, since a is a 0,1-matrix, you can also replace np.where(a==1)[1] with just np.where(a)[1].
Update: The following solution should work with your format:
l=[np.array([[ 1., 0., 0., 0., 0., 0., 0.]]),
np.array([[ 0., 0., 1., 0., 0., 0., 0.]]),
np.array([[ 0., 1., 0., 0., 0., 0., 0.]]),
np.array([[ 0., 0., 0., 0., 1., 0., 0.]]),
np.array([[ 0., 0., 0., 0., 1., 0., 0.]]),
np.array([[ 0., 0., 0., 0., 0., 1., 0.]]),
np.array([[ 0., 0., 0., 0., 0., 0., 1.]])]
a=np.array(l)
np.where(a)[2]
This prints
array([0, 2, 1, 4, 4, 5, 6], dtype=int64)
Alternatively, you could use the original solution together with @ml4294's comment.
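Side note (my addition): the reason np.where(a)[2] is needed above is that stacking (1, n)-shaped arrays with np.array gives shape (n, 1, n); np.vstack drops the singleton axis, after which the column index is axis 1 again:
import numpy as np

lst = [np.array([[1., 0., 0.]]),
       np.array([[0., 1., 0.]]),
       np.array([[0., 0., 1.]])]

a = np.vstack(lst)       # shape (3, 3) instead of (3, 1, 3)
print(np.where(a)[1])    # [0 1 2]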
You can use np.argmax():
from sklearn.preprocessing import OneHotEncoder
import numpy as np
enc = OneHotEncoder()
labels = [[1],[2],[3]]
enc.fit(labels)
x = enc.transform(labels).toarray()
# x = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
xr = (np.argmax(x, axis=1)+1).reshape(-1, 1)
print(xr)
This should return array([[1], [2], [3]]). If you want instead array([[0], [1], [2]]), just remove the +1 in the definition of xr.
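If the original labels are not consecutive integers starting at 1, the +1 trick no longer applies; one alternative (assuming scikit-learn 0.20 or newer, where the fitted encoder exposes categories_) is to index the learned categories with the argmax positions:
import numpy as np
from sklearn.preprocessing import OneHotEncoder

enc = OneHotEncoder()
labels = [[11], [37], [45]]    # arbitrary, non-consecutive class labels
enc.fit(labels)
x = enc.transform(labels).toarray()

# categories_[0] holds the sorted unique values of the single input column,
# so indexing it with the argmax positions recovers the original labels.
recovered = enc.categories_[0][np.argmax(x, axis=1)]
print(recovered)  # [11 37 45]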
Since you are using sklearn.preprocessing.OneHotEncoder to 'encode' the data, you can use its .inverse_transform() method to 'decode' the data (I think this requires scikit-learn 0.20.1 or newer):
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder()
labels = [[1],[2],[3]]
encoder = enc.fit(labels)
encoded_labels = encoder.transform(labels)
decoded_labels = encoder.inverse_transform(encoded_labels)
decoded_labels
# array([[1],
#        [2],
#        [3]])
N.B. decoded_labels is a NumPy array, not a list.
Source: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html#sklearn.preprocessing.OneHotEncoder.inverse_transform

Numpy create an array of matrices

I am trying to store matrices in an array, but when I append a matrix, np.append takes every element and outputs just a one-dimensional array.
Example Code:
matrix_array = np.array([])
for y in y_label:
    matrix_array = np.append(matrix_array, np.identity(3))
Clearly np.append is the wrong tool for the job:
In [144]: np.append(np.array([]), np.identity(3))
Out[144]: array([ 1., 0., 0., 0., 1., 0., 0., 0., 1.])
From its docs:
If axis is not specified, values can be any shape and will be flattened before use.
With list append
In [153]: alist=[]
In [154]: for y in [1,2]:
     ...:     alist.append(np.identity(3))
     ...:
In [155]: alist
Out[155]:
[array([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]]), array([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]])]
In [156]: np.array(alist)
Out[156]:
array([[[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]],
[[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]]])
In [157]: _.shape
Out[157]: (2, 3, 3)
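An equivalent alternative (added as a side note) is np.stack, which makes the new leading axis explicit:
import numpy as np

alist = [np.identity(3) for _ in range(2)]

# np.stack joins the arrays along a new leading axis, same as np.array(alist) here.
matrix_array = np.stack(alist)
print(matrix_array.shape)  # (2, 3, 3)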

allocate memory in python for large scipy.sparse matrix operations

Is there a way I can allocate memory for scipy sparse matrix functions to process large datasets?
Specifically, I'm attempting to use Asymmetric Least Squares Smoothing (translated into python here and the original here) to perform a baseline correction on a large mass spec dataset (length of ~60,000).
The function (see below) uses the scipy.sparse matrix operations.
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def baseline_als(y, lam, p, niter):
    L = len(y)
    D = sparse.csc_matrix(np.diff(np.eye(L), 2))
    w = np.ones(L)
    for i in range(niter):
        W = sparse.spdiags(w, 0, L, L)
        Z = W + lam * D.dot(D.transpose())
        z = spsolve(Z, w*y)
        w = p * (y > z) + (1-p) * (y < z)
    return z
I have no problem when I pass data sets that are 10,000 or less in length:
baseline_als(np.ones(10000),100,0.1,10)
But when passing larger data sets, e.g.
baseline_als(np.ones(50000), 100, 0.1, 10)
I get a MemoryError for the line
D = sparse.csc_matrix(np.diff(np.eye(L), 2))
Try changing
D = sparse.csc_matrix(np.diff(np.eye(L), 2))
to
diag = np.ones(L - 2)
D = sparse.spdiags([diag, -2*diag, diag], [0, -1, -2], L, L-2)
D will be a sparse matrix in DIAgonal format. If it turns out that being in CSC format is important, convert it using the tocsc() method:
D = sparse.spdiags([diag, -2*diag, diag], [0, -1, -2], L, L-2).tocsc()
The following example shows that the old and new versions generate the same matrix:
In [67]: from scipy import sparse
In [68]: L = 8
Original:
In [69]: D = sparse.csc_matrix(np.diff(np.eye(L), 2))
In [70]: D.A
Out[70]:
array([[ 1., 0., 0., 0., 0., 0.],
[-2., 1., 0., 0., 0., 0.],
[ 1., -2., 1., 0., 0., 0.],
[ 0., 1., -2., 1., 0., 0.],
[ 0., 0., 1., -2., 1., 0.],
[ 0., 0., 0., 1., -2., 1.],
[ 0., 0., 0., 0., 1., -2.],
[ 0., 0., 0., 0., 0., 1.]])
New version:
In [71]: diag = np.ones(L - 2)
In [72]: D = sparse.spdiags([diag, -2*diag, diag], [0, -1, -2], L, L-2)
In [73]: D.A
Out[73]:
array([[ 1., 0., 0., 0., 0., 0.],
[-2., 1., 0., 0., 0., 0.],
[ 1., -2., 1., 0., 0., 0.],
[ 0., 1., -2., 1., 0., 0.],
[ 0., 0., 1., -2., 1., 0.],
[ 0., 0., 0., 1., -2., 1.],
[ 0., 0., 0., 0., 1., -2.],
[ 0., 0., 0., 0., 0., 1.]])
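Putting the pieces together, a revised baseline_als with the memory-friendly construction could look like the sketch below (based on the answer above; not benchmarked on the original 60,000-point data):
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def baseline_als(y, lam, p, niter):
    L = len(y)
    # Build the second-difference operator directly in sparse form instead of
    # allocating a dense L x L identity matrix first.
    diag = np.ones(L - 2)
    D = sparse.spdiags([diag, -2*diag, diag], [0, -1, -2], L, L-2).tocsc()
    w = np.ones(L)
    for i in range(niter):
        W = sparse.spdiags(w, 0, L, L)
        Z = W + lam * D.dot(D.transpose())
        z = spsolve(Z, w*y)
        w = p * (y > z) + (1-p) * (y < z)
    return z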

Numpy triu generates nan when called on matrices with infinite values

Just found some unexpected behaviour in Numpy 1.8.1 in the triu function.
import numpy as np
a = np.zeros((4, 4))
a[1:, 0] = np.inf
a
>>>array([[ 0., 0., 0., 0.],
[ inf, 0., 0., 0.],
[ inf, 0., 0., 0.],
[ inf, 0., 0., 0.]])
np.triu(a)
>>>array([[ 0., 0., 0., 0.],
[ nan, 0., 0., 0.],
[ nan, 0., 0., 0.],
[ nan, 0., 0., 0.]])
Would this behaviour ever be desirable? Or shall I file a bug report?
Edit
I raised an issue on the NumPy GitHub page.
1. Explanation
Looks like you ignored the RuntimeWarning:
>>> np.triu(a)
twodim_base.py:450: RuntimeWarning: invalid value encountered in multiply
out = multiply((1 - tri(m.shape[0], m.shape[1], k - 1, dtype=m.dtype)), m)
The source code for numpy.triu is as follows:
def triu(m, k=0):
    m = asanyarray(m)
    out = multiply((1 - tri(m.shape[0], m.shape[1], k - 1, dtype=m.dtype)), m)
    return out
This uses numpy.tri to get an array with ones strictly below the diagonal and zeros on and above it, and subtracts this from 1 to get an array with zeros strictly below the diagonal and ones on and above it:
>>> 1 - np.tri(4, 4, -1)
array([[ 1., 1., 1., 1.],
[ 0., 1., 1., 1.],
[ 0., 0., 1., 1.],
[ 0., 0., 0., 1.]])
Then it multiplies this element-wise with the original array. So where the original array has inf below the diagonal, the result is inf * 0, which is NaN.
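A short illustration of the underlying arithmetic (added for clarity):
import numpy as np

# 0 * inf is an invalid floating-point operation, so the result is NaN
# (NumPy also emits an "invalid value encountered in multiply" RuntimeWarning).
print(np.multiply(np.array([np.inf, 1.0]), 0.0))  # [nan  0.]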
2. Workaround
Use numpy.tril_indices to generate the indices of the lower triangle, and set all those entries to zero:
>>> a = np.ones((4, 4))
>>> a[1:, 0] = np.inf
>>> a
array([[ 1., 1., 1., 1.],
[ inf, 1., 1., 1.],
[ inf, 1., 1., 1.],
[ inf, 1., 1., 1.]])
>>> a[np.tril_indices(4, -1)] = 0
>>> a
array([[ 1., 1., 1., 1.],
[ 0., 1., 1., 1.],
[ 0., 0., 1., 1.],
[ 0., 0., 0., 1.]])
(Depending on what you are going to do with a, you might want to take a copy before zeroing these entries.)
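Another workaround (my addition, not from the original answer) is to build a boolean mask of the kept region and use np.where, which avoids multiplying inf by 0 altogether:
import numpy as np

a = np.ones((4, 4))
a[1:, 0] = np.inf

# True on and above the diagonal, False strictly below it.
mask = ~np.tri(*a.shape, k=-1, dtype=bool)

# Entries below the diagonal are replaced rather than multiplied, so no NaN appears.
print(np.where(mask, a, 0.0))
# [[1. 1. 1. 1.]
#  [0. 1. 1. 1.]
#  [0. 0. 1. 1.]
#  [0. 0. 0. 1.]]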
