Smoothing one-hot encoded matrix rows


Assuming that I have the following matrix consisting of one-hot encoded rows:
X = np.array([[0., 0., 0., 1., 0.], [1., 0., 0., 0., 0.], [0., 0., 1., 0., 0.]])
What I aim to do is smooth/expand the one-hot encoding in a way such that I will obtain the following output:
Y = np.array([[0., 0., 1., 1., 1.], [1., 1., 0., 0., 0.], [0., 1., 1., 1., 0.]])
assuming that I want to smooth/expand 1 element to the left or the right of the one-hot element. Thank you for the help!

We can use convolution -
In [22]: from scipy.signal import convolve2d
In [23]: convolve2d(X,np.ones((1,3)),'same')
Out[23]:
array([[0., 0., 1., 1., 1.],
[1., 1., 0., 0., 0.],
[0., 1., 1., 1., 0.]])
With binary-dilation to be more memory-efficient -
In [43]: from scipy.ndimage import binary_dilation
In [46]: binary_dilation(X,np.ones((1,3), dtype=bool)).view('i1')
Out[46]:
array([[0, 0, 1, 1, 1],
[1, 1, 0, 0, 0],
[0, 1, 1, 1, 0]], dtype=int8)
Or, since we only have 0s and 1s, a uniform filter would also work. It can additionally be applied along a generic axis (axis=1 in our case) and should perform better -
In [47]: from scipy.ndimage import uniform_filter1d
In [50]: (uniform_filter1d(X,size=3,axis=1)>0).view('i1')
Out[50]:
array([[0, 0, 1, 1, 1],
[1, 1, 0, 0, 0],
[0, 1, 1, 1, 0]], dtype=int8)

You could convolve X with an array of ones:
from scipy.signal import convolve2d
convolve2d(X, np.ones((1,3)), mode='same')
array([[0., 0., 1., 1., 1.],
[1., 1., 0., 0., 0.],
[0., 1., 1., 1., 0.]])

Solution based on standard np.convolve:
import numpy as np
np.array([np.convolve(x, np.array([1,1,1]), mode='same') for x in X])
Iterate over the rows with a list comprehension, convolve each one, then convert the result back to an np.array.
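The convolution approaches above hard-code a window of 3. A minimal sketch of a generalisation to an arbitrary radius, using only NumPy, might look like the following. The helper name smooth_one_hot and its radius parameter are assumptions made for illustration and do not come from the answers above; the sketch also assumes that 2 * radius + 1 does not exceed the row length.

```python
import numpy as np

def smooth_one_hot(X, radius=1):
    """Expand every 1 in each row by `radius` positions to the left and right.

    Hypothetical helper: assumes 2 * radius + 1 <= X.shape[1].
    """
    kernel = np.ones(2 * radius + 1)
    # Convolve each row with the kernel; mode="same" keeps the row length.
    smoothed = np.apply_along_axis(np.convolve, 1, X, kernel, mode="same")
    # Cap at 1 in case windows of several 1s in a row overlap.
    return np.minimum(smoothed, 1.0)

X = np.array([[0., 0., 0., 1., 0.],
              [1., 0., 0., 0., 0.],
              [0., 0., 1., 0., 0.]])

Y1 = smooth_one_hot(X, radius=1)
Y2 = smooth_one_hot(X, radius=2)
print(Y1)
print(Y2)
```

For strictly one-hot rows the convolution already yields only 0s and 1s, so the np.minimum call matters only if a row contains more than one 1.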

Related

Add width to a numpy 1d "signal array"

I have a numpy int 1D array. Which looks like this:
[0,0,0,0,0,1,0,0,0,0,2,0,0,0,0,5,0,0,0,1,0,0,0,0,0,0,0,0,0,0]
Basically, it's an array of mostly zeros with some signals that are ints [1,2,3,4,5,...] and the signals always have a "width" of 1, meaning they are surrounded by 0s.
I want to add "width" to each signal so instead of taking only 1 space in the array it would take width space in the array.
So, in this example with the width of 3, I would get
[0,0,0,0,1,1,1,0,0,2,2,2,0,0,5,5,5,0,1,1,1,0,0,0,0,0,0,0,0,0]
The length of the array stays the same, the width can be 3,5,7, but nothing too outrageous.
What would be the fastest way to do this? I feel like there probably is an easy way to do this, but not sure how to correctly call this operation.

Convolution might be what you're looking for?
>>> import numpy as np
>>> width = 3
>>> a = np.array([0,0,0,0,0,1,0,0,0,0,2,0,0,0,0,5,0,0,0,1,0,0,0,0,0,0,0,0,0,0])
>>> np.convolve(a, np.ones(width))
array([0., 0., 0., 0., 0., 1., 1., 1., 0., 0., 2., 2., 2., 0., 0., 5., 5.,
5., 0., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
This does not preserve the length of the array though. If you want to preserve the length, you should use the 'same' mode as such:
>>> np.convolve(a, np.ones(width), mode='same')
array([0., 0., 0., 0., 1., 1., 1., 0., 0., 2., 2., 2., 0., 0., 5., 5., 5.,
0., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
If this is not fast enough, I suggest you take a look at scipy.signal.fftconvolve.
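Two details of np.convolve are worth checking before relying on it here: the result dtype follows the inputs (np.ones(width) is a float array, hence the float output above), and signals that sit closer together than width are summed where their windows overlap. A small sketch with made-up data, assuming an integer kernel is acceptable:

```python
import numpy as np

# Made-up example: the signals 1 and 2 are only two positions apart.
a = np.array([0, 0, 1, 0, 2, 0, 0])
width = 3

# An integer kernel keeps the result in the integer dtype of `a`.
kernel = np.ones(width, dtype=a.dtype)
widened = np.convolve(a, kernel, mode="same")

print(widened)        # position 3 receives 1 + 2 because the windows overlap
print(widened.dtype)
```

If the signals are guaranteed to be at least width positions apart, as in the question, the overlap case never occurs.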

I know it's not the perfect solution but here it is:
I made a duplicate of the initial list and created a width range, so when I find a number different from 0 I replace the surrounding zeros with that number.
arr = [0,0,0,0,0,1,0,0,0,0,2,0,0,0,0,5,0,0,0,1,0,0,0,0,0,0,0,0,0,0]
arr1 = [0,0,0,0,0,1,0,0,0,0,2,0,0,0,0,5,0,0,0,1,0,0,0,0,0,0,0,0,0,0]
width = 3
width_range = [i for i in range(width//(-2)+1,width//(2)+1)]
print('width_range: ',width_range)
for idx,elem in enumerate(arr):
    if elem !=0:
        for i in width_range:
            arr1[idx+i]=elem
print(arr1)
Output:
width_range: [-1, 0, 1]
[0, 0, 0, 0, 1, 1, 1, 0, 0, 2, 2, 2, 0, 0, 5, 5, 5, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
PS: This example only works with widths 3 and 5. If you want to test it with 7, you need to add zeros between your signals.
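The loop above can run past the ends of the list and overwrites neighbouring signals when they are too close. One possible vectorised alternative is a sliding-window maximum, sketched below. It assumes NumPy 1.20 or newer (for sliding_window_view), an odd width, non-negative signal values, and that the larger value should win where two windows overlap; the helper name widen_signals is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def widen_signals(a, width):
    """Give each non-zero entry a total width of `width` (odd), keeping the length.

    Hypothetical helper: assumes non-negative values; the larger value wins on overlap.
    """
    pad = width // 2
    padded = np.pad(a, pad)  # zero-pad both ends so the output length matches the input
    # Each output position takes the maximum over its centred window.
    return sliding_window_view(padded, width).max(axis=1)

arr = np.array([0,0,0,0,0,1,0,0,0,0,2,0,0,0,0,5,0,0,0,1,0,0,0,0,0,0,0,0,0,0])
wide3 = widen_signals(arr, 3)
close = widen_signals(np.array([0, 1, 0, 2, 0]), 3)
print(wide3)
print(close)
```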

numpy fill ndarray with values of a 1D array

I want to fill Ndarray x with values from array b along dimension i without using a for loop. This snippet of code is what I'm currently using but it's not that fast. Is there a better way?
for i in range(len(b)):
    x[...,i,:,:] = b[i]
Edit 1: It's almost what I'm looking for, but it doesn't seem to work for higher dimensions. x has 8 dimensions and it's important that the shape of the Ndarray remains the same. Any more ideas?
import numpy as np
x = np.ones((2,3,4))
b = np.arange(3)
for i in range(len(b)):
    x[:,i,:] = b[i]
x
Out[5]:
array([[[0., 0., 0., 0.],
[1., 1., 1., 1.],
[2., 2., 2., 2.]],
[[0., 0., 0., 0.],
[1., 1., 1., 1.],
[2., 2., 2., 2.]]])
y = np.tile(b,(4,1,2)).T
y
Out[7]:
array([[[0, 0, 0, 0]],
[[1, 1, 1, 1]],
[[2, 2, 2, 2]],
[[0, 0, 0, 0]],
[[1, 1, 1, 1]],
[[2, 2, 2, 2]]])
Edit 2: This seems to do the job
z = np.empty_like(x)  # z was not defined in the original post; assumed to have the same shape as x
z[...] = b.reshape(1,-1,1)
z
Out[20]:
array([[[0., 0., 0., 0.],
[1., 1., 1., 1.],
[2., 2., 2., 2.]],
[[0., 0., 0., 0.],
[1., 1., 1., 1.],
[2., 2., 2., 2.]]])

There is a faster way. You can reshape b to add new dimensions and get the advantages of numpy broadcasting rules:
x[...,:,:,:] = b.reshape(-1,1,1)
Here I am assuming that b is a vector.
Another equivalent way to create new dimensions is as the following code indicates:
x[...,:,:,:] = b[:, np.newaxis, np.newaxis]
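The reshape above is tied to an axis that is third from the end. A minimal sketch of the same broadcasting idea for an arbitrary axis of an array with any number of dimensions is shown below; the helper name fill_along_axis is an assumption for illustration, and the sketch assumes len(b) equals x.shape[axis].

```python
import numpy as np

def fill_along_axis(x, b, axis):
    """Fill x in place so that x[..., i, ...] == b[i] along `axis`.

    Hypothetical helper: assumes len(b) == x.shape[axis].
    """
    shape = [1] * x.ndim   # singleton in every dimension ...
    shape[axis] = -1       # ... except the target axis
    x[...] = np.asarray(b).reshape(shape)  # broadcasting fills the other axes
    return x

x = np.ones((2, 3, 4, 5))
b = np.arange(3)
fill_along_axis(x, b, axis=1)
print(x[:, :, 0, 0])
```

Because the assignment goes through x[...], the shape and dtype of x are left unchanged.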

Depending on the shape of your destination array you can do something like this
>>> import numpy as np
>>> x = np.ones((4,8))
>>> x
array([[1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1.]])
>>> b = np.arange(4)
>>> b
array([0, 1, 2, 3])
>>> x[:,1] = b
>>> x
array([[1., 0., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1.],
[1., 2., 1., 1., 1., 1., 1., 1.],
[1., 3., 1., 1., 1., 1., 1., 1.]])
In this example we assigned b to column 1 of the 2D array x.
If instead you are trying to repeat b a certain number of times, you can use np.tile:
>>> x = np.tile(b, (8,1)).T
>>> x
array([[0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3, 3, 3]])

Is there a way to do Pytorch element wise equality treating each dimension as an element?

I have two tensors and I want to check for equality treating an array in one dimension as the element
I have 2 tensors
lee = torch.Tensor(([1., 1., 0.],
[0., 1., 1.],
[0., 0., 0.],
[1., 1., 1.]))
lo = torch.Tensor(([1., 1., 0.],
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]))
I've tried using
torch.eq(lee, lo)
which returns a tensor like
tensor([[1, 1, 1],
[1, 0, 0],
[1, 1, 1],
[0, 0, 0]], dtype=torch.uint8)
Is there a way to have the output become
tensor([1, 0, 1, 0])
since only the first and third rows match completely?
edit:
I've come up with this solution
lee = lee.tolist()
lo = lo.tolist()
out = []
for i, j in enumerate(lee):
    if j == lo[i]:
        out.append(1)
    else:
        out.append(0)
and out will be [1, 0, 1, 0]
But is there an easier way?

You can simply use torch.all(tensor, dim).
code:
l1 = torch.Tensor(([1., 1., 0.],
[0., 1., 1.],
[0., 0., 0.],
[1., 1., 1.]))
l2 = torch.Tensor(([1., 1., 0.],
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]))
print(torch.eq(l1, l2))
print(torch.all(torch.eq(l1, l2), dim=0)) # equivalent to dim = -2
print(torch.all(torch.eq(l1, l2), dim=1)) # equivalent to dim = -1
output:
tensor([[1, 1, 1],
[1, 0, 0],
[1, 1, 1],
[0, 0, 0]], dtype=torch.uint8)
tensor([0, 0, 0], dtype=torch.uint8)
tensor([1, 0, 1, 0], dtype=torch.uint8) # your desired output
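The output above shows dtype=torch.uint8, which is what older PyTorch versions printed; recent releases return torch.bool from comparison operations. A short sketch of the same row-wise check on a recent PyTorch, assuming the question's tensors are named lee and lo and that an integer result is wanted:

```python
import torch

lee = torch.tensor([[1., 1., 0.],
                    [0., 1., 1.],
                    [0., 0., 0.],
                    [1., 1., 1.]])
lo = torch.tensor([[1., 1., 0.],
                   [0., 0., 0.],
                   [0., 0., 0.],
                   [0., 0., 0.]])

# Element-wise comparison, then require every element of a row to match.
row_match = (lee == lo).all(dim=1)

# Convert the boolean result to 0/1 integers if that form is needed.
as_ints = row_match.to(torch.int64)
print(row_match)
print(as_ints)
```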

Alternatively, take torch.eq(lee, lo) and check that each row sums to its length, which means every element in that row matched:
import torch
lo = torch.Tensor(([1., 1., 0.],
[0., 1., 1.],
[0., 0., 0.],
[1., 1., 1.]))
l1 = torch.Tensor(([1., 1., 0.],
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]))
teq = torch.eq(l1, lo)
print(teq)
tsm = teq.sum(-1)
print(tsm == 3)
tsm is tensor([3, 1, 3, 0])
printout returns [1, 0, 1, 0]

Remove all columns matching a value in Numpy

Let's suppose I have a matrix with a number of binary values:
matrix([[1., 1., 1., 0., 0.],
[0., 0., 1., 1., 1.],
[0., 0., 0., 1., 0.],
[0., 0., 0., 0., 1.]])
Using np.sum(M, 0) produces:
matrix([[1., 1., 2., 2., 2.]])
How do I remove all of the columns from the matrix whose sum is exactly 1?

Easier to have an array here:
M = M.A
Now using simple slicing:
M[:, np.sum(M, 0)!=1]
array([[1., 0., 0.],
[1., 1., 1.],
[0., 1., 0.],
[0., 0., 1.]])
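The conversion to an array matters because np.sum on an np.matrix keeps two dimensions (shape (1, n)), whereas a plain array gives a 1D result that can be used directly as a column mask. A self-contained sketch of the steps above, with intermediate names (A, col_sums, keep) chosen for illustration:

```python
import numpy as np

M = np.matrix([[1., 1., 1., 0., 0.],
               [0., 0., 1., 1., 1.],
               [0., 0., 0., 1., 0.],
               [0., 0., 0., 0., 1.]])

A = np.asarray(M)           # plain 2D ndarray, same data as M.A
col_sums = A.sum(axis=0)    # 1D array of column sums
keep = col_sums != 1        # boolean mask: True for columns to keep
result = A[:, keep]
print(result)
```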

You can convert the column sums to an array, find the indices whose sum is 1, and then use those indices to delete the matching columns. For example:
import numpy as np
M = np.matrix([[1, 1, 1, 0, 0], [0, 0, 1, 1, 1], [0, 0, 0, 1, 0], [0, 0, 0, 0, 1]])
# column sums, converted to a flat array
col_sums = np.squeeze(np.asarray(np.sum(M, 0)))
index_of_elements_with_value_1 = [i for i, val in enumerate(col_sums) if val == 1]
# delete those columns from the matrix itself (axis=1), not from the sums
result = np.delete(np.asarray(M), index_of_elements_with_value_1, axis=1)
print(result)

Scikit: Convert one-hot encoding to encoding with integers

I need to convert one-hot encoding to categories represented by unique integers. So one-hot encoding created with the following code:
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder()
labels = [[1],[2],[3]]
enc.fit(labels)
for x in [1,2,3]:
    print(enc.transform([[x]]).toarray())
Out:
[[ 1. 0. 0.]]
[[ 0. 1. 0.]]
[[ 0. 0. 1.]]
Could be converted back to a set of unique integers, for example:
[1,2,3] or [11,37, 45] or any other where each integer uniquely represents a single class.
Is it possible to do with scikit-learn or any other python lib?
* Update *
Tried to:
labels = [[1],[2],[3], [4], [5],[6],[7]]
enc.fit(labels)
lst = []
for x in [1,2,3,4,5,6,7]:
    lst.append(enc.transform([[x]]).toarray())
lst
Out:
[array([[ 1., 0., 0., 0., 0., 0., 0.]]),
array([[ 0., 1., 0., 0., 0., 0., 0.]]),
array([[ 0., 0., 1., 0., 0., 0., 0.]]),
array([[ 0., 0., 0., 1., 0., 0., 0.]]),
array([[ 0., 0., 0., 0., 1., 0., 0.]]),
array([[ 0., 0., 0., 0., 0., 1., 0.]]),
array([[ 0., 0., 0., 0., 0., 0., 1.]])]
a = np.array(lst)
np.where(a==1)[1]
Out:
array([0, 0, 0, 0, 0, 0, 0], dtype=int64)
Not what I need

You can do that using np.where as follows:
import numpy as np
a=np.array([[ 0., 1., 0.],
[ 1., 0., 0.],
[ 0., 0., 1.]])
np.where(a==1)[1]
This prints array([1, 0, 2], dtype=int64). This works since np.where(a==1)[1] returns the column indices of the 1's, which are exactly the labels.
In addition, since a is a 0,1-matrix, you can also replace np.where(a==1)[1] with just np.where(a)[1].
Update: The following solution should work with your format:
l=[np.array([[ 1., 0., 0., 0., 0., 0., 0.]]),
np.array([[ 0., 0., 1., 0., 0., 0., 0.]]),
np.array([[ 0., 1., 0., 0., 0., 0., 0.]]),
np.array([[ 0., 0., 0., 0., 1., 0., 0.]]),
np.array([[ 0., 0., 0., 0., 1., 0., 0.]]),
np.array([[ 0., 0., 0., 0., 0., 1., 0.]]),
np.array([[ 0., 0., 0., 0., 0., 0., 1.]])]
a=np.array(l)
np.where(a)[2]
This prints
array([0, 2, 1, 4, 4, 5, 6], dtype=int64)
Alternatively, you could use the original solution together with @ml4294's comment.

You can use np.argmax():
from sklearn.preprocessing import OneHotEncoder
import numpy as np
enc = OneHotEncoder()
labels = [[1],[2],[3]]
enc.fit(labels)
x = enc.transform(labels).toarray()
# x = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
xr = (np.argmax(x, axis=1)+1).reshape(-1, 1)
print(xr)
This should return array([[1], [2], [3]]). If you want instead array([[0], [1], [2]]), just remove the +1 in the definition of xr.
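If the integers should be arbitrary class labels such as the [11, 37, 45] mentioned in the question, the argmax result can be used as an index into an array of labels. The sketch below uses only NumPy; the categories array is a hypothetical mapping chosen for illustration, and the stacked array mimics the list of (1, n) arrays from the question's update.

```python
import numpy as np

one_hot = np.array([[0., 1., 0.],
                    [1., 0., 0.],
                    [0., 0., 1.]])

# Hypothetical label for each one-hot column, in column order.
categories = np.array([11, 37, 45])

# Column index of the 1 in every row, then look up the label.
idx = one_hot.argmax(axis=1)
labels = categories[idx]

# For a stack of (1, n) arrays, take argmax over the last axis and flatten.
stacked = np.stack([row[np.newaxis, :] for row in one_hot])  # shape (3, 1, 3)
stacked_idx = stacked.argmax(axis=-1).ravel()

print(labels)
print(stacked_idx)
```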

Since you are using sklearn.preprocessing.OneHotEncoder to 'encode' the data, you can use its .inverse_transform() method to 'decode' the data (I think this requires scikit-learn version 0.20.1 or newer):
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder()
labels = [[1],[2],[3]]
encoder = enc.fit(labels)
encoded_labels = encoder.transform(labels)
decoded_labels = encoder.inverse_transform(encoded_labels)
decoded_labels
# array([[1],
#        [2],
#        [3]])
n.b. decoded_labels is a numpy array not a list.
Source: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html#sklearn.preprocessing.OneHotEncoder.inverse_transform
