Compute indexed tensor multiplication with sympy - python

I would like to compute the following with sympy:
Res_ijkl = I_ij I_kl + I_ik I_jl + I_il I_jk
where I is the 3x3 identity matrix. Ultimately I want to use this with symbolic matrices.
I have the following:
import sympy as sp
I = sp.eye(3)
# ... the remaining operations are what I am missing in sympy
With numpy I can just use the einsum function and have:
import numpy as np
I = np.eye(3)
Res = (np.einsum("ij,kl->ijkl", I, I)
+ np.einsum("ik,jl->ijkl", I, I)
+ np.einsum("il,jk->ijkl", I, I))
However, einsum will not accept sympy's objects for this operation.
How can I compute this with sympy?

While the notation makes the einsum expression convenient, it isn't a matrix-product. It's more like an extended outer product. There's no sum-of-products:
In [22]: I = np.eye(3)
...: Res = (
...: np.einsum("ij,kl->ijkl", I, I)
...: + np.einsum("ik,jl->ijkl", I, I)
...: + np.einsum("il,jk->ijkl", I, I)
...: )
In [23]: Res
Out[23]:
array([[[[3., 0., 0.],
[0., 1., 0.],
[0., 0., 1.]],
[[0., 1., 0.],
[1., 0., 0.],
[0., 0., 0.]],
[[0., 0., 1.],
[0., 0., 0.],
[1., 0., 0.]]],
[[[0., 1., 0.],
[1., 0., 0.],
[0., 0., 0.]],
[[1., 0., 0.],
[0., 3., 0.],
[0., 0., 1.]],
[[0., 0., 0.],
[0., 0., 1.],
[0., 1., 0.]]],
[[[0., 0., 1.],
[0., 0., 0.],
[1., 0., 0.]],
[[0., 0., 0.],
[0., 0., 1.],
[0., 1., 0.]],
[[1., 0., 0.],
[0., 1., 0.],
[0., 0., 3.]]]])
In [24]: Res.shape
Out[24]: (3, 3, 3, 3)
In numpy we can use broadcasting to do the same thing:
In [25]: res1 = I[:,:,None,None]*I + I[:,None,:,None]*I[None,:,None,:] + I[:,None,None,:]*I[None,:,:,None]
In [26]: res1.shape
Out[26]: (3, 3, 3, 3)
In [27]: np.allclose(Res, res1)
Out[27]: True
Your sp.eye produces a MutableDenseMatrix
https://docs.sympy.org/latest/modules/matrices/dense.html#sympy.matrices.dense.MutableDenseMatrix
Feel free to study its docs. My impression is that sympy matrices don't implement multidimensional arrays with anything like the power of numpy.
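That said, sympy does ship an N-dimensional array type (sympy.Array), so one way to get the same rank-4 result is to build it entry by entry. The following is only a minimal sketch under that assumption, not necessarily the most idiomatic route; the helper name symmetric_rank4 and the symbol names a00 ... a22 are made up for illustration, and the matrix is assumed to be square.

```python
import sympy as sp

def symmetric_rank4(M):
    """Return R with R[i,j,k,l] = M[i,j]*M[k,l] + M[i,k]*M[j,l] + M[i,l]*M[j,k].

    Hypothetical helper; M is assumed to be a square sympy Matrix.
    """
    n = M.shape[0]
    return sp.Array([
        [[[M[i, j] * M[k, l] + M[i, k] * M[j, l] + M[i, l] * M[j, k]
           for l in range(n)]
          for k in range(n)]
         for j in range(n)]
        for i in range(n)
    ])

# Identity case from the question
I = sp.eye(3)
Res = symmetric_rank4(I)
print(Res.shape)  # (3, 3, 3, 3)

# Fully symbolic 3x3 matrix (symbol names are illustrative)
A = sp.Matrix(3, 3, lambda i, j: sp.Symbol(f"a{i}{j}"))
ResA = symmetric_rank4(A)
print(ResA[0, 1, 2, 0])
```

Because every entry is an ordinary sympy expression, the same function works for numeric and symbolic matrices alike.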

Related

4x4 matrix with 1's in the diagonals (like a cross) and 0's everywhere else, using python

I am able to get the checkerboard pattern, the + pattern, and the one with 1's on the border, but I am not able to figure this one out. Can somebody help?
If you're sticking with even dimensions, then as @Péter Leéh pointed out:
>>> np.eye(n) + np.fliplr(np.eye(n))
array([[1., 0., 0., 1.],
[0., 1., 1., 0.],
[0., 1., 1., 0.],
[1., 0., 0., 1.]])
will suffice. np.fliplr(x) (a horizontal flip) is identical to np.flip(x, axis=1).
However, if n is odd, the two diagonals overlap in the center, so that element becomes 2 and you have to reset it to 1. For example, with n=5:
>>> x = np.eye(n) + np.fliplr(np.eye(n))
>>> x[n//2, n//2] = 1
>>> x
array([[1., 0., 0., 0., 1.],
[0., 1., 0., 1., 0.],
[0., 0., 1., 0., 0.],
[0., 1., 0., 1., 0.],
[1., 0., 0., 0., 1.]])
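If you would rather not special-case odd n, one option is to combine the two diagonals with a logical OR instead of adding them, so the shared center element can never exceed 1. A small sketch (the function name cross is made up here):

```python
import numpy as np

def cross(n):
    """n x n float array with ones on both diagonals and zeros elsewhere."""
    eye = np.eye(n, dtype=bool)
    # OR-ing the main diagonal with its horizontal flip avoids a 2 in the
    # center when n is odd.
    return (eye | np.fliplr(eye)).astype(float)

print(cross(4))
print(cross(5))
```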

TensorFlow to_categorical problem: how can I map my masks for segmentation?

I have a problem with my segmentation labels. A label can take one of these values: 0, 200, 210, 220, 230, 240. I use this code:
mask = tf.keras.utils.to_categorical(y, 241)
The code works, but I want to map the mask to only 6 classes. Is this possible?
mask = tf.keras.utils.to_categorical(y,6)
One solution is to first replace the label values with consecutive indices and then make them categorical, because to_categorical expects class indices.
Here is example code for a limited number of categories:
import tensorflow as tf
y = [0, 200, 210, 0, 240, 230, 200, 0, 210, 220, 240, 0]
# Map each raw label value to a consecutive class index
replacements = {
    0: 0,
    200: 1,
    210: 2,
    220: 3,
    230: 4,
    240: 5,
}
y = [replacements.get(x, x) for x in y]
y = tf.keras.utils.to_categorical(y)
Or you can use a simpler way like this:
from sklearn.preprocessing import LabelEncoder
y = tf.keras.utils.to_categorical(LabelEncoder().fit_transform(y))
If you print y:
array([[1., 0., 0., 0., 0., 0.],
[0., 1., 0., 0., 0., 0.],
[0., 0., 1., 0., 0., 0.],
[1., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 1.],
[0., 0., 0., 0., 1., 0.],
[0., 1., 0., 0., 0., 0.],
[1., 0., 0., 0., 0., 0.],
[0., 0., 1., 0., 0., 0.],
[0., 0., 0., 1., 0., 0.],
[0., 0., 0., 0., 0., 1.],
[1., 0., 0., 0., 0., 0.]], dtype=float32)
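For an actual 2D segmentation mask, the same remapping can be sketched with plain NumPy, which avoids the Python-level loop. This is only an illustration: the small mask array below is made up, and the code assumes the list of class values is sorted and that every mask value appears in it.

```python
import numpy as np

# Known label values, in sorted order (taken from the question)
classes = np.array([0, 200, 210, 220, 230, 240])

# Made-up 2 x 3 mask for illustration
mask = np.array([[0, 200, 210],
                 [240, 230, 0]])

# Position of each mask value within `classes`, i.e. its class index 0..5
idx = np.searchsorted(classes, mask)

# Indexing an identity matrix with the class indices yields the one-hot encoding
one_hot = np.eye(len(classes), dtype=np.float32)[idx]
print(one_hot.shape)  # (2, 3, 6)
```

The index array idx could equally be passed to tf.keras.utils.to_categorical(idx, 6) if you prefer to stay with the Keras helper.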

Filling torch tensor with zeros after certain index

Given a 3D tensor, say:
batch x sentence length x embedding dim
a = torch.rand((10, 1000, 96))
and an array (or tensor) of actual lengths for each sentence
lengths = torch.randint(1000, (10,))
outputs tensor([ 370., 502., 652., 859., 545., 964., 566., 576.,1000., 803.])
How can I fill tensor ‘a’ with zeros after a certain index along dimension 1 (sentence length), according to tensor ‘lengths’?
I want something like this:
a[ : , lengths : , : ] = 0
One way of doing it (slow if the batch size is large):
for i_batch in range(10):
    a[i_batch, lengths[i_batch]:, :] = 0
You can do it using a binary mask.
Using lengths as column indices into mask, we mark where each sequence ends (note that mask is made one column longer than a.size(1) to allow for full-length sequences).
Using cumsum(), we then set all entries in mask after the sequence length to 1.
mask = torch.zeros(a.shape[0], a.shape[1] + 1, dtype=a.dtype, device=a.device)
mask[(torch.arange(a.shape[0]), lengths)] = 1
mask = mask.cumsum(dim=1)[:, :-1] # remove the superfluous column
a = a * (1. - mask[..., None]) # use mask to zero after each column
For example, take a.shape = (10, 5, 96) and lengths = [1, 2, 1, 1, 3, 0, 4, 4, 1, 3].
After assigning 1 at the respective length in each row, mask looks like:
mask =
tensor([[0., 1., 0., 0., 0., 0.],
[0., 0., 1., 0., 0., 0.],
[0., 1., 0., 0., 0., 0.],
[0., 1., 0., 0., 0., 0.],
[0., 0., 0., 1., 0., 0.],
[1., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 1., 0.],
[0., 0., 0., 0., 1., 0.],
[0., 1., 0., 0., 0., 0.],
[0., 0., 0., 1., 0., 0.]])
After cumsum (and dropping the last column) you get
mask =
tensor([[0., 1., 1., 1., 1.],
[0., 0., 1., 1., 1.],
[0., 1., 1., 1., 1.],
[0., 1., 1., 1., 1.],
[0., 0., 0., 1., 1.],
[1., 1., 1., 1., 1.],
[0., 0., 0., 0., 1.],
[0., 0., 0., 0., 1.],
[0., 1., 1., 1., 1.],
[0., 0., 0., 1., 1.]])
Note that it has zeros exactly where the valid sequence entries are and ones beyond the sequence lengths. Multiplying by 1 - mask gives you exactly what you want.
Enjoy ;)
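An alternative that skips the cumsum step is to compare each position index against the sequence length directly. The sketch below uses deliberately small, made-up shapes and assumes lengths is an integer tensor; the names positions, keep and a_masked are illustrative.

```python
import torch

torch.manual_seed(0)
a = torch.rand(4, 5, 3)                 # batch x sentence length x embedding dim
lengths = torch.tensor([1, 5, 0, 3])    # valid length of each sentence

# positions[None, :] has shape (1, 5); lengths[:, None] has shape (4, 1).
# Broadcasting gives a (4, 5) boolean mask that is True for valid positions.
positions = torch.arange(a.shape[1], device=a.device)
keep = positions[None, :] < lengths[:, None]

# Add a trailing axis so the mask broadcasts over the embedding dimension
a_masked = a * keep[..., None].to(a.dtype)
```

Sequences with full length need no extra column here, because the comparison is simply True at every position.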

Keras one-hot-encoder

I have an array, and use the to_categorical function in keras:
labels = np.array([1,7,7,1,7])
keras.utils.to_categorical(labels)
I get this response:
array([[0., 1., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 1.],
[0., 0., 0., 0., 0., 0., 0., 1.],
[0., 1., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 1.]], dtype=float32)
How can I get only two columns? One for the 1 and one for the 7.
This is a possible way, but not a very good one:
labels = np.delete(labels, np.s_[0:1], axis=1)
np.delete(labels, np.s_[1:6], axis=1)
that gives:
array([[1., 0.],
[0., 1.],
[0., 1.],
[1., 0.],
[0., 1.]], dtype=float32)
Is there a better way to achieve this? Preferably by some "hidden" function in Keras utils or similar?
IIUC, you can just index your array by any column that has a value:
cat = keras.utils.to_categorical(labels)
>>> cat
array([[0., 1., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 1.],
[0., 0., 0., 0., 0., 0., 0., 1.],
[0., 1., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 1.]])
# Select column if it has at least one value:
>>> cat[:,cat.any(0)]
array([[1., 0.],
[0., 1.],
[0., 1.],
[1., 0.],
[0., 1.]])
You could also use pandas:
import pandas as pd
cat = pd.get_dummies(labels).values
>>> cat
array([[1, 0],
[0, 1],
[0, 1],
[1, 0],
[0, 1]], dtype=uint8)
Use np.unique with return_inverse flag -
# Get unique IDs mapped to each group of elements
In [73]: unql, idx = np.unique(labels, return_inverse=True)
# Perform outer comparison for idx against range of unique groups
In [74]: (idx[:,None] == np.arange(len(unql))).astype(float)
Out[74]:
array([[1., 0.],
[0., 1.],
[0., 1.],
[1., 0.],
[0., 1.]])
Alternatively with direct usage of unique labels -
In [96]: (labels[:,None] == np.unique(labels)).astype(float)
Out[96]:
array([[1., 0.],
[0., 1.],
[0., 1.],
[1., 0.],
[0., 1.]])

allocate memory in python for large scipy.sparse matrix operations

Is there a way I can allocate memory for scipy sparse matrix functions to process large datasets?
Specifically, I'm attempting to use Asymmetric Least Squares Smoothing (translated into python here and the original here) to perform a baseline correction on a large mass spec dataset (length of ~60,000).
The function (see below) uses the scipy.sparse matrix operations.
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def baseline_als(y, lam, p, niter):
    L = len(y)
    D = sparse.csc_matrix(np.diff(np.eye(L), 2))  # builds a dense L x L array first
    w = np.ones(L)
    for i in range(niter):
        W = sparse.spdiags(w, 0, L, L)
        Z = W + lam * D.dot(D.transpose())
        z = spsolve(Z, w*y)
        w = p * (y > z) + (1-p) * (y < z)
    return z
I have no problem when I pass data sets that are 10,000 or less in length:
baseline_als(np.ones(10000),100,0.1,10)
But when passing larger data sets, e.g.
baseline_als(np.ones(50000), 100, 0.1, 10)
I get a MemoryError, for the line
D = sparse.csc_matrix(np.diff(np.eye(L), 2))
Try changing
D = sparse.csc_matrix(np.diff(np.eye(L), 2))
to
diag = np.ones(L - 2)
D = sparse.spdiags([diag, -2*diag, diag], [0, -1, -2], L, L-2)
D will be a sparse matrix in DIAgonal format. If it turns out that being in CSC format is important, convert it using the tocsc() method:
D = sparse.spdiags([diag, -2*diag, diag], [0, -1, -2], L, L-2).tocsc()
The following example shows that the old and new versions generate the same matrix:
In [67]: from scipy import sparse
In [68]: L = 8
Original:
In [69]: D = sparse.csc_matrix(np.diff(np.eye(L), 2))
In [70]: D.toarray()
Out[70]:
array([[ 1., 0., 0., 0., 0., 0.],
[-2., 1., 0., 0., 0., 0.],
[ 1., -2., 1., 0., 0., 0.],
[ 0., 1., -2., 1., 0., 0.],
[ 0., 0., 1., -2., 1., 0.],
[ 0., 0., 0., 1., -2., 1.],
[ 0., 0., 0., 0., 1., -2.],
[ 0., 0., 0., 0., 0., 1.]])
New version:
In [71]: diag = np.ones(L - 2)
In [72]: D = sparse.spdiags([diag, -2*diag, diag], [0, -1, -2], L, L-2)
In [73]: D.toarray()
Out[73]:
array([[ 1., 0., 0., 0., 0., 0.],
[-2., 1., 0., 0., 0., 0.],
[ 1., -2., 1., 0., 0., 0.],
[ 0., 1., -2., 1., 0., 0.],
[ 0., 0., 1., -2., 1., 0.],
[ 0., 0., 0., 1., -2., 1.],
[ 0., 0., 0., 0., 1., -2.],
[ 0., 0., 0., 0., 0., 1.]])
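Putting the pieces together, the function from the question could be rewritten along these lines so that no dense L x L array is ever created. Treat it as a sketch rather than a tuned implementation: the names second_difference and baseline_als_sparse are made up here, and the smoothing parameters in the usage lines are arbitrary.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def second_difference(L):
    """Sparse L x (L-2) second-difference matrix, built without a dense array."""
    diag = np.ones(L - 2)
    return sparse.spdiags([diag, -2 * diag, diag], [0, -1, -2], L, L - 2).tocsc()

def baseline_als_sparse(y, lam, p, niter=10):
    """Asymmetric least squares baseline, following the function in the question."""
    L = len(y)
    D = second_difference(L)
    penalty = lam * D.dot(D.transpose())  # constant across iterations
    w = np.ones(L)
    for _ in range(niter):
        W = sparse.spdiags(w, 0, L, L)
        Z = (W + penalty).tocsc()
        z = spsolve(Z, w * y)
        # Points above the current baseline get weight p, points below get 1 - p
        w = p * (y > z) + (1 - p) * (y < z)
    return z

# Usage on a made-up noisy signal
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=x.size)
z = baseline_als_sparse(y, 1e3, 0.05)
```

Computing the penalty term once outside the loop is a small addition of this sketch; the original recomputes it on every iteration, which gives the same result.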
