replace for-loop with one line numpy code - python

Hi, I have the following code. Is there a way to replace the for loop with one line of NumPy code?
import numpy as np

x = 10
y = 5
z = 2
b = np.zeros((x,y))
a = np.random.choice(np.arange(y),size=(x,z))
for i in range(len(a)):
    b[i,a[i]] = 1
With the above code, I get b as
array([[1., 0., 1., 0., 0.],
[0., 1., 0., 1., 0.],
[0., 0., 0., 1., 1.],
[0., 0., 1., 0., 1.],
[1., 0., 0., 1., 0.],
[1., 0., 0., 0., 0.],
[0., 0., 0., 1., 0.],
[0., 1., 1., 0., 0.],
[0., 0., 1., 0., 0.],
[0., 0., 0., 1., 1.]])
I've tried b[a] = 1 instead of
for i in range(len(a)):
    b[i,a[i]] = 1
and it gives all ones in the first 5 rows:
array([[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 0., 0., 0., 0.],
[0., 0., 0., 1., 0.],
[0., 1., 1., 0., 0.],
[0., 0., 1., 0., 0.],
[0., 0., 0., 1., 1.]])

b[range(len(a)), a[:, 0]] = 1
b[range(len(a)), a[:, 1]] = 1
Or, for an arbitrary value of z, you can flatten the row and column indices:
b[np.resize(range(len(a)), a.shape[0] * a.shape[1]), a.T.reshape(-1)] = 1
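The attempt b[a] = 1 fails because the entries of a are then used as row indices, so whole rows are set to 1. A shorter alternative to the flattened indexing above is to pair a column of row indices with a through broadcasting. This is a minimal sketch; the seeded generator and the names b_loop and b_put are only there for the comparison and are not part of the original code.

```python
import numpy as np

x, y, z = 10, 5, 2
rng = np.random.default_rng(0)          # seeded only to make the sketch repeatable
a = rng.integers(0, y, size=(x, z))     # same role as np.random.choice(np.arange(y), size=(x, z))

# Reference result from the original loop.
b_loop = np.zeros((x, y))
for i in range(len(a)):
    b_loop[i, a[i]] = 1

# Row indices of shape (x, 1) broadcast against a of shape (x, z),
# so element (i, j) of the index pair addresses b[i, a[i, j]].
b = np.zeros((x, y))
b[np.arange(x)[:, None], a] = 1

# np.put_along_axis does the same pairing along axis 1.
b_put = np.zeros((x, y))
np.put_along_axis(b_put, a, 1, axis=1)
```

Both forms work for any z, because the pairing of rows and columns comes from the shape of a rather than from a fixed number of columns.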

Related

How to create an "islands" style pytorch matrix

Probably a simple question, hopefully with a simple solution:
I am given a (sparse) 1D boolean tensor of size [1,N].
I would like to produce a 2D tensor out of it of size [N,N], containing islands which are induced by the 1D tensor. This is easiest to see in an image example, where the upper row is the 1D boolean tensor and the matrix below it is the resulting matrix:
Given a mask input:
>>> x = torch.tensor([0,0,0,1,0,0,0,0,1,0,0])
You can compute the block sizes from the positions of the nonzero entries with torch.diff:
>>> index = x.nonzero()[:,0].diff(prepend=torch.zeros(1), append=torch.ones(1)*len(x))
>>> index
tensor([3., 5., 3.])
Then use torch.block_diag to create the diagonal block matrix:
>>> torch.block_diag(*[torch.ones(i,i) for i in index.int()])
tensor([[1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 1., 1., 1., 1., 1., 0., 0., 0.],
[0., 0., 0., 1., 1., 1., 1., 1., 0., 0., 0.],
[0., 0., 0., 1., 1., 1., 1., 1., 0., 0., 0.],
[0., 0., 0., 1., 1., 1., 1., 1., 0., 0., 0.],
[0., 0., 0., 1., 1., 1., 1., 1., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1.],
[0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1.],
[0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1.]])
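An alternative that avoids the Python list of blocks is to label every position with an island id and compare the labels pairwise. This is a sketch under the assumption that each nonzero entry marks the start of a new island, as in the example above; the names island_id and islands are illustrative.

```python
import torch

x = torch.tensor([0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0])

# Each nonzero entry starts a new island, so the running sum gives
# every position the id of the island it belongs to.
island_id = x.cumsum(dim=0)

# Two positions share an island exactly when their ids are equal.
islands = (island_id[:, None] == island_id[None, :]).float()
```

The comparison builds the full [N,N] result in one broadcasted operation, at the cost of materializing an N by N boolean tensor.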

None accesses all the elements in a numpy array. How to bypass this?

Hello, this is the code block that one-hot encodes a DNA sequence. For 'n', it sets 1 in all 4 positions along the 2nd axis. I want to avoid using an if-else in the following code.
import numpy as np

seq = 'nnnactgactgnnnnn'
onehot = np.zeros((len(seq), 4))
mapper = {'a':0,'c':1,'g':2,'t':3,'n':None}
for i in range(len(seq)):
    onehot[i][mapper[seq[i]]] = 1
output:
array([[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 0., 0., 0.],
[0., 1., 0., 0.],
[0., 0., 0., 1.],
[0., 0., 1., 0.],
[1., 0., 0., 0.],
[0., 1., 0., 0.],
[0., 0., 0., 1.],
[0., 0., 1., 0.],
[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]])
How can I assign 0 for n while using the mapper dict?
tl;dr: indexing with None selects every position in a row. How can I avoid that?
You could use:
mapper = {'a':[1,0,0,0],'c':[0,1,0,0],'g':[0,0,1,0],'t':[0,0,0,1],'n':[0,0,0,0]}
Then append the corresponding encoding for each character of the sequence.
Edit: this might be faster:
seq = 'nnnactgactgnnnnn'
onehot = np.zeros((len(seq), 4))
mapper = {'a':0,'c':1,'g':2,'t':3,'n':None}
result = {'a':1,'c':1,'g':1,'t':1,'n':0}
for i in range(len(seq)):
    onehot[i][mapper[seq[i]]] = result[seq[i]]
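The reason None touches the whole row is that None is the same object as np.newaxis: onehot[i][None] is a view of the entire row with an extra axis, so assigning to it fills all four positions. One way to sidestep this without a loop or an if-else is to map 'n' to an extra all-zero row of a lookup table. This is a sketch; the lookup table and the index 4 for 'n' are assumptions introduced here, not part of the original mapper.

```python
import numpy as np

seq = 'nnnactgactgnnnnn'

# None and np.newaxis are the same object, which is why indexing with
# None returns the whole row instead of a single position.
assert np.newaxis is None

# Hypothetical lookup table: rows 0-3 are the one-hot codes for a, c, g, t,
# and row 4 is all zeros and stands for 'n'.
lookup = np.vstack([np.eye(4), np.zeros((1, 4))])
mapper = {'a': 0, 'c': 1, 'g': 2, 't': 3, 'n': 4}

# Translate the sequence to row numbers, then gather all rows at once.
indices = np.array([mapper[base] for base in seq])
onehot = lookup[indices]
```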

create one-hot encoding for values of histogram bins

Given the tensor below of size torch.Size([22])
tensor([-20.1659, -19.7022, -17.4124, -16.7115, -16.4696, -15.6848, -15.5201, -14.5384, -12.5017, -12.4227, -11.0946, -10.7844, -10.5467, -9.3933, -4.2351, -4.0521, -3.8844, -3.8668, -3.7337, -3.7002, -3.6242, -3.5820])
and the histogram below:
hist = torch.histogram(tensor, 5)
hist
torch.return_types.histogram(
hist=tensor([3., 5., 5., 1., 8.]),
bin_edges=tensor([-20.1659, -16.8491, -13.5323, -10.2156, -6.8988, -3.5820]))
For each value of the tensor, how can I create a one-hot encoding that corresponds to its bin number, so that the output is a tensor of size torch.Size([22, 5])?
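You can use torch.repeat_interleave. Note that this relies on the values being sorted in ascending order, as they are here, so that the rows come out in bin order: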
import torch
bins = torch.tensor([3, 5, 5, 1, 8])
one_hots = torch.eye(len(bins))
one_hots = torch.repeat_interleave(one_hots, bins, dim=0)
print(one_hots)
output
tensor([[1., 0., 0., 0., 0.],
[1., 0., 0., 0., 0.],
[1., 0., 0., 0., 0.],
[0., 1., 0., 0., 0.],
[0., 1., 0., 0., 0.],
[0., 1., 0., 0., 0.],
[0., 1., 0., 0., 0.],
[0., 1., 0., 0., 0.],
[0., 0., 1., 0., 0.],
[0., 0., 1., 0., 0.],
[0., 0., 1., 0., 0.],
[0., 0., 1., 0., 0.],
[0., 0., 1., 0., 0.],
[0., 0., 0., 1., 0.],
[0., 0., 0., 0., 1.],
[0., 0., 0., 0., 1.],
[0., 0., 0., 0., 1.],
[0., 0., 0., 0., 1.],
[0., 0., 0., 0., 1.],
[0., 0., 0., 0., 1.],
[0., 0., 0., 0., 1.],
[0., 0., 0., 0., 1.]])
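If the values are not sorted, the bin index of each value can be looked up directly instead. The following is a sketch that swaps in torch.bucketize and torch.nn.functional.one_hot for repeat_interleave; the names values, inner_edges and bin_index are illustrative.

```python
import torch
import torch.nn.functional as F

values = torch.tensor([-20.1659, -19.7022, -17.4124, -16.7115, -16.4696,
                       -15.6848, -15.5201, -14.5384, -12.5017, -12.4227,
                       -11.0946, -10.7844, -10.5467, -9.3933, -4.2351,
                       -4.0521, -3.8844, -3.8668, -3.7337, -3.7002,
                       -3.6242, -3.5820])
hist = torch.histogram(values, bins=5)

# Use only the interior edges. With right=True a value equal to an edge
# goes to the bin that starts at that edge, and the maximum value still
# lands in the last bin because the outer edges are left out.
inner_edges = hist.bin_edges[1:-1]
bin_index = torch.bucketize(values, inner_edges, right=True)

# One row per value, with a single 1 in the column of its bin.
one_hots = F.one_hot(bin_index, num_classes=5).float()
```

Because each row depends only on its own value, the result follows the order of the input tensor, whatever that order is.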

Computing mean and variance of chunks of an array

I have an array that is grouped and looks like this:
import numpy as np
y = np.array(
[[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1.],
[2., 2., 2., 2., 2., 2.],
[2., 2., 2., 2., 2., 2.],
[2., 2., 2., 2., 2., 2.],
[2., 2., 2., 2., 2., 2.]]
)
n_repeats = 4
The array contains three groups, here marked as 0, 1, and 2. Every group appears n_repeats times. Here n_repeats=4. Currently I do the following to compute the mean and variance of chunks of that array:
mean = np.array([np.mean(y[i: i+n_repeats], axis=0) for i in range(0, len(y), n_repeats)])
var = np.array([np.var(y[i: i+n_repeats], axis=0) for i in range(0, len(y), n_repeats)])
Is there a better and faster way to achieve this?
Yes, reshape and then use .mean and .var along the appropriate dimension:
>>> y.reshape(-1, 4, 6)
array([[[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.]],
[[1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1.]],
[[2., 2., 2., 2., 2., 2.],
[2., 2., 2., 2., 2., 2.],
[2., 2., 2., 2., 2., 2.],
[2., 2., 2., 2., 2., 2.]]])
>>> y.reshape(-1, 4, 6).mean(axis=1)
array([[0., 0., 0., 0., 0., 0.],
[1., 1., 1., 1., 1., 1.],
[2., 2., 2., 2., 2., 2.]])
>>> y.reshape(-1, 4, 6).var(axis=1)
array([[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.]])
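The same reshape can be written without hard-coding 4 and 6, using n_repeats and the column count of the array. This is a sketch with random data (an assumption, chosen so that the variances are not trivially zero) that can be checked against the list-comprehension version from the question.

```python
import numpy as np

n_groups, n_repeats, n_cols = 3, 4, 6
rng = np.random.default_rng(0)
y = rng.normal(size=(n_groups * n_repeats, n_cols))

# Shape (n_groups, n_repeats, n_cols): axis 1 runs over the repeats of a group.
chunks = y.reshape(-1, n_repeats, y.shape[1])
mean = chunks.mean(axis=1)
var = chunks.var(axis=1)
```

The reshape requires len(y) to be an exact multiple of n_repeats; otherwise NumPy raises a ValueError.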
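If you do not know the number of groups or the number of repeats, you can try the following, which works only when the group label can be read from the values themselves, as in this example: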
>>> np.vstack([y[y == i].reshape(-1,y.shape[1]).mean(axis=0) for i in np.unique(y)])
array([[0., 0., 0., 0., 0., 0.],
[1., 1., 1., 1., 1., 1.],
[2., 2., 2., 2., 2., 2.]])
>>> np.vstack([y[y == i].reshape(-1,y.shape[1]).var(axis=0) for i in np.unique(y)])
array([[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.]])
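When the rows hold real data rather than their own group label, a separate label array is needed. The following is a sketch under that assumption: labels is a hypothetical 1D array with one group label per row, and the groups may have different sizes.

```python
import numpy as np

rng = np.random.default_rng(0)
labels = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2])   # hypothetical per-row group labels
y = rng.normal(size=(len(labels), 6))

# inverse maps each row to the position of its label in uniq.
uniq, inverse, counts = np.unique(labels, return_inverse=True, return_counts=True)

# Accumulate per-group sums and sums of squares row by row.
sums = np.zeros((len(uniq), y.shape[1]))
sq_sums = np.zeros_like(sums)
np.add.at(sums, inverse, y)
np.add.at(sq_sums, inverse, y ** 2)

mean = sums / counts[:, None]
var = sq_sums / counts[:, None] - mean ** 2      # population variance, like np.var
```

The variance here uses the E[x^2] - E[x]^2 identity, which is convenient but can lose precision when the mean is large relative to the spread.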

Switch triangular matrix

Is there an easy way to flip a triangular matrix?
import numpy as np
shape=(4,8)
x3=np.ones(shape)
for m in range(len(x3)):
    step = m * 2 + 1  # each row keeps 2 more ones than the previous row
    for n in range(step, len(x3[m])):
        x3[m][n] = 0
Gives me this matrix:
array([[1., 0., 0., 0., 0., 0., 0., 0.],
[1., 1., 1., 0., 0., 0., 0., 0.],
[1., 1., 1., 1., 1., 0., 0., 0.],
[1., 1., 1., 1., 1., 1., 1., 0.]])
I want to switch this to something like this:
array([[0., 0., 0., 0., 0., 0., 0., 1.],
[0., 0., 0., 0., 0., 1., 1., 1.],
[0., 0., 0., 1., 1., 1., 1., 1.],
[0., 1., 1., 1., 1., 1., 1., 1.]])
Is there a simple way of doing this?
np.flip from the numpy package does the trick:
A = np.array([[1., 0., 0., 0., 0., 0., 0., 0.],
[1., 1., 1., 0., 0., 0., 0., 0.],
[1., 1., 1., 1., 1., 0., 0., 0.],
[1., 1., 1., 1., 1., 1., 1., 0.]])
np.flip(A, 1)
# returns what you want: axis=1 reverses the column order (a left-right mirror)
array([[0., 0., 0., 0., 0., 0., 0., 1.],
[0., 0., 0., 0., 0., 1., 1., 1.],
[0., 0., 0., 1., 1., 1., 1., 1.],
[0., 1., 1., 1., 1., 1., 1., 1.]])
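The starting matrix can also be built without the nested loops, by comparing column indices against the per-row cutoff 2*m + 1 from the question. This is a sketch; the names rows, cols and flipped are illustrative.

```python
import numpy as np

n_rows, n_cols = 4, 8
rows = np.arange(n_rows)[:, None]   # shape (4, 1)
cols = np.arange(n_cols)            # shape (8,)

# Row m keeps ones in its first 2*m + 1 columns, as in the original loop.
x3 = (cols < 2 * rows + 1).astype(float)

# Reverse the column order; x3[:, ::-1] and np.fliplr(x3) give the same values.
flipped = np.flip(x3, axis=1)
```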
