Map a Numpy array into a list of characters - python

Given a two-dimensional NumPy array:
a = array([[-1, -1],
[-1, 1],
[ 1, 1],
[ 1, 1],
[ 1, 0],
[ 0, -1],
[-1, 0],
[ 0, -1],
[-1, 0],
[ 0, 1],
[ 1, 1],
[ 1, 1]])
and a dictionary of conversions:
d = {-1:'a', 0:'b', 1:'c'}
How can I map the original array into a list of character combinations?
What I need is the following list (or array):
out_put = ['aa', 'ac', 'cc', 'cc', 'cb', 'ba', ....]
(I am doing some machine learning classification. My classes are labeled by combinations of -1, 0 and 1, and I need to convert the array of labels into something readable, such as 'aa', 'bc' and so on.)
If there is a simple function (a binarizer or one-hot encoder) in the sklearn package that can convert the original numpy array into a set of labels, that would be perfect!

Here's another approach with list comprehension:
my_dict = {-1:'a', 0:'b', 1:'c'}
out_put = ["".join([my_dict[val] for val in row]) for row in a]

I think you ought to be able to do this with a list comprehension:
# naming something `dict` is a bad idea
d = {-1:'a', 0:'b', 1:'c'}
out_put = ['%s%s' % (d[x], d[y]) for x, y in a]

I think the following is very readable:
def switch(row):
    dic = {
        -1: 'a',
        0: 'b',
        1: 'c'
    }
    return dic.get(row)
out_put = [switch(x)+switch(y) for x,y in a]
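A vectorized variant is also possible. The sketch below assumes the labels are exactly -1, 0 and 1, so that adding 1 turns each label into an index into a small lookup array. The name `lookup` is made up for this illustration.

```python
import numpy as np

a = np.array([[-1, -1], [-1, 1], [1, 1], [1, 1], [1, 0], [0, -1],
              [-1, 0], [0, -1], [-1, 0], [0, 1], [1, 1], [1, 1]])

# Assumption: labels are only -1, 0, 1, so label + 1 is a valid index (0, 1, 2).
lookup = np.array(['a', 'b', 'c'])

# Integer-array indexing replaces every label with its character in one step.
chars = lookup[a + 1]

# Join the characters of each row into one string.
out_put = ["".join(row) for row in chars]
print(out_put[:6])
```

If the labels were not consecutive integers, the dictionary-based comprehensions above would be the safer choice.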

Related

Comparing two dimensional arrays to one another

I want to write code that outputs the rows that the arrays a, b and c have in common. I will compare b and c against a. For example, [0, 1624580882] appears in both a and c. Both columns must be equal for a row to count as a match.
import numpy as np
a= np.array([[ 0, 1624580882],
[ 1, 1624584458],
[ 0, 1624589467],
[ 1, 1624592213],
[ 0, 1624595336],
[ 1, 1624596349]])
b= np.array([[ 1, 1624580882],
[ 1, 1624584460],
[ 1, 1624595336],
[ 1, 1624596349]])
c = np.array([[ 0, 1624580882],
[ 1, 1624584458],
[ 0, 1624589495],
[ 1, 1624592238],
[ 0, 1624595336],
[ 1, 1624596349]])
Expected Output:
b comparison
Similarities= None
c comparison
Similarities= [ 0, 1624580882],[ 1, 1624584464], [ 0, 1624595350],[ 1, 1624596380]
I'm not giving you the complete solution, but here is a simple function that you can build the rest of your code around.
def compare_arrays(arr_1, arr_2):
    result = []
    for row in arr_1:
        result.append(row in arr_2)
    return result
Edit:
For getting the index of the duplicate values.
from numpy.lib import recfunctions as rfn
ndtype = [('a', int)]
a = np.ma.array([1, 1, 1, 2, 2, 3, 3],mask=[0, 0, 1, 0, 0, 0, 1]).view(ndtype)
rfn.find_duplicates(a, ignoremask=True, return_index=True)
Not the most beautiful solution, but the first thing that comes to mind:
result = []
for row in a:
    for irow in c:
        if np.all(np.equal(row, irow)):
            result.append(row)
            break
Note that the solution proposed by Fatin Ishrak Rafi does not work, because `in` reports a match as soon as any single element is equal. For example, this row is not in c, yet:
>>> [0, 1624589467] in c
True
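The row-wise comparison can also be written without explicit loops. This is a minimal sketch using broadcasting; the helper name `common_rows` is invented for this example and is not part of NumPy.

```python
import numpy as np

a = np.array([[0, 1624580882], [1, 1624584458], [0, 1624589467],
              [1, 1624592213], [0, 1624595336], [1, 1624596349]])
c = np.array([[0, 1624580882], [1, 1624584458], [0, 1624589495],
              [1, 1624592238], [0, 1624595336], [1, 1624596349]])

def common_rows(arr_1, arr_2):
    """Return the rows of arr_1 that also occur, as whole rows, in arr_2."""
    # Shape (len(arr_1), len(arr_2), n_cols): compares every pair of rows.
    pairwise = arr_1[:, None, :] == arr_2[None, :, :]
    # A pair of rows matches only if all of its columns are equal.
    row_matches = pairwise.all(axis=2)
    # Keep the rows of arr_1 that match at least one row of arr_2.
    return arr_1[row_matches.any(axis=1)]

# `in` is misleading here: it is True because the 0 matches somewhere in c.
false_positive = [0, 1624589467] in c
similar = common_rows(a, c)
print(false_positive)
print(similar)
```

The intermediate boolean array grows with the product of the two row counts, so for large inputs the explicit loop or a set of tuples may be preferable.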

concatenate numpy columns in different positions

I have an array x = np.empty([2,3]). Assume I have two sets of logical indices, indx1 and indx2, and each of them is paired with a different block of columns, set1 and set2:
indx1 = [False,False,True]
set1 = np.array([[-1],[-1]])
indx2 = [True,True,False]
set2 = np.array([[1,2],[1,2]])
# need to combine these two write operations into one
x[:,indx1] = set1
x[:,indx2] = set2
>>> x
array([[1., 2., -1.],
[1., 2., -1.]])
How can I use indx1 and indx2 at the same time? For instance, I am looking for something like this (which does not work):
x[:,[indx1,indx2]] = [set1,set2]
In your case the arrays have different shapes: use axis=0 when they have the same shape, and axis=1 when, as here, the number of columns differs.
The simplest concatenation:
import numpy as np
set1 = np.array([[3],[3]])
set2 = np.array([[1,2],[1,2]])
indx1 = [False,False,True]
indx2 = [True,True,False]
sets = np.concatenate((set1, set2), axis=1)
np.concatenate((indx1, indx2), axis=0)
sets.sort()
output sets:
output index:
If you want to correlate the sets with the index, please provide the expected output.
I did not manage to find an exact solution to the problem, but maybe (depending on how you generate the sets and indices), this will lead you in the right direction.
Let's suppose that, instead of the sparse definition of set1 and set2, you have dense arrays, each with the same size as x:
indx1 = [False,False,True]
indx2 = [True,True,False]
fullset1 = np.array([[0, 0, -1],
[0, 0, -1]])
fullset2 = np.array([[1, 2, 0],
[1, 2, 0]])
x = np.select( [indx1, indx2], [fullset1, fullset2] )
print(x)
#[[1 2 -1]
# [1 2 -1]]
It works with one command and can be easily extended if you have indx3, indx4, etc. However, I see several drawbacks. First, it creates a new variable that satisfies the conditions, which may not be your use case. Also, if there is an index that is set to false for all indx variables, the result might be unexpected:
indx1 = [False,False,True,False]
indx2 = [True,True,False,False]
fullset1 = np.array([[0, 0, -1, 0],
[0, 0, -1, 0]])
fullset2 = np.array([[1, 2, 0, 0],
[1, 2, 0, 0]])
x = np.select( [indx1, indx2], [fullset1, fullset2], default=None )
print(x)
#[[1 2 -1 None]
# [1 2 -1 None]]
In that case, my proposal (though I haven't tested the performance) would be to use an intermediate variable and np.where to fill the final variable:
x = np.array([[11, 12, 13, 14],
[15, 16, 17, 18]])
#....
intermediate_x = np.select( [indx1, indx2], [fullset1, fullset2], default=None )
indx_final = np.where(intermediate_x != None)  # positions that np.select actually filled
x[indx_final] = intermediate_x[indx_final]
print(x)
#[[ 1 2 -1 14]
# [ 1 2 -1 18]]
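For the original question (one write instead of two), another option is to convert both boolean masks to integer column positions and stack the blocks in the same order. This is only a sketch, assuming the masks do not overlap and that each block has as many columns as its mask has True entries. The names `cols` and `values` are made up here.

```python
import numpy as np

x = np.empty([2, 3])
indx1 = [False, False, True]
set1 = np.array([[-1], [-1]])
indx2 = [True, True, False]
set2 = np.array([[1, 2], [1, 2]])

# Column positions selected by each mask, in the same order as the blocks below.
cols = np.concatenate([np.flatnonzero(indx1), np.flatnonzero(indx2)])
# Blocks placed side by side: column k of `values` goes to column cols[k] of x.
values = np.hstack([set1, set2])

x[:, cols] = values  # single write
print(x)
```

Unlike the np.select approach, this writes into the existing x in place and leaves untouched any column that neither mask selects.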

Filter dictionary whose values are arrays

I have data which looks like this:
features_dict = {
'feat1': np.array([[0,1],[2,3],[4,5]]),
'feat2': np.array([[6,7],[8,9],[10,11]]),
'feat3': np.array([1, 0, 0]),
'feat4': np.array([[1],[2],[1]])
}
I want to filter the values of above dictionary based on the first dimension index where feat3 values are 0. Hence, the output I'm looking for is:
features_dict = {
'feat1': np.array([[2,3],[4,5]]),
'feat2': np.array([[8,9],[10,11]]),
'feat3': np.array([0, 0]),
'feat4': np.array([[2],[1]])
}
Notice that I want to have only the 2nd and 3rd elements of each dict value because that's where feat3 values are 0.
Initially, I was thinking of converting the dict to a pandas DataFrame and filtering the rows using .loc, but it turned out that pandas can't accept these arrays.
Can anyone please help? Thanks
import numpy as np
features_dict = {
'feat1': np.array([[0,1],[2,3],[4,5]]),
'feat2': np.array([[6,7],[8,9],[10,11]]),
'feat3': np.array([1, 0, 0]),
'feat4': np.array([[1],[2],[1]])
}
ind = features_dict['feat3'] == 0
features_dict = {k: v[ind] for k,v in features_dict.items()}
After filtering:
{
'feat1': array([[2, 3],[4, 5]]),
'feat2': array([[ 8, 9],[10, 11]]),
'feat3': array([0, 0]),
'feat4': array([[2],[1]])
}

How to interpret an array inside the square brackets of another array?

I've written this code and I'm trying to understand the meaning of the output when applying a mask to an array:
matrix = np.random.rand(3,3)
matrix
output:
array([[0.7441097 , 0.02908848, 0.60378581],
[0.53335156, 0.21701412, 0.51545259],
[0.91777356, 0.49123304, 0.15410852]])
mask
output:
matrix([[0, 0, 2],
[1, 1, 0],
[2, 2, 2]])
matrix[mask]
output:
array([[[0.7441097 , 0.02908848, 0.60378581],
[0.7441097 , 0.02908848, 0.60378581],
[0.91777356, 0.49123304, 0.15410852]],
[[0.53335156, 0.21701412, 0.51545259],
[0.53335156, 0.21701412, 0.51545259],
[0.7441097 , 0.02908848, 0.60378581]],
[[0.91777356, 0.49123304, 0.15410852],
[0.91777356, 0.49123304, 0.15410852],
[0.91777356, 0.49123304, 0.15410852]]])
How can this result be interpreted?
This is simply doing this:
In [1108]: matrix[0]
Out[1108]: array([0.02502891, 0.74397363, 0.74176154])
In [1109]: matrix[1]
Out[1109]: array([0.76480152, 0.84331737, 0.29647379])
In [1110]: matrix[2]
Out[1110]: array([0.68258943, 0.43118925, 0.82981894])
When you do:
matrix[mask]
where mask is:
matrix([[0, 0, 2],
[1, 1, 0],
[2, 2, 2]])
it returns an array whose first element is:
[matrix[0], matrix[0], matrix[2]],
the second is:
[matrix[1], matrix[1], matrix[0]]
and so on.
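The same behaviour can be checked on a small deterministic example. The sketch below uses a plain integer ndarray as the index (an assumption, since the question shows an np.matrix) and np.arange values instead of random numbers so the rows are easy to recognise.

```python
import numpy as np

matrix = np.arange(9).reshape(3, 3)  # rows: [0 1 2], [3 4 5], [6 7 8]
# Assumption: the index is a plain integer ndarray with the same values as `mask`.
mask = np.array([[0, 0, 2],
                 [1, 1, 0],
                 [2, 2, 2]])

# Every entry of `mask` is replaced by the row of `matrix` it points to.
result = matrix[mask]
print(result.shape)   # shape of mask followed by the length of one row
print(result[0, 2])   # same as matrix[mask[0, 2]], i.e. matrix[2]
```

So this is integer (fancy) indexing along the first axis, not boolean masking: a boolean mask would have to contain True/False values.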

How can I create a label encoder utilizing only numpy (and not sklearn LabelEncoder)?

I am trying to recreate something similar to the
sklearn.preprocessing.LabelEncoder
However I do not want to use sklearn or pandas. I would like to only use numpy and the Python standard library. Here's what I would like to achieve:
import numpy as np
input = np.array([['hi', 'there'],
['scott', 'james'],
['hi', 'scott'],
['please', 'there']])
# Output would look like
np.array([[0, 0],
[1, 1],
[0, 2],
[2, 0]])
It would also be great to be able to map it back as well, so a result would then look exactly like the input again.
Here's a simple comprehension, using the return_inverse result from np.unique
arr = np.array([['hi', 'there'], ['scott', 'james'],
['hi', 'scott'], ['please', 'there']])
np.column_stack([np.unique(arr[:, i], return_inverse=True)[1] for i in range(arr.shape[1])])
array([[0, 2],
[2, 0],
[0, 1],
[1, 2]], dtype=int64)
Or applying along the axis:
np.column_stack(np.apply_along_axis(np.unique, 0, arr, return_inverse=True)[1])
I was talking to @Scott Stoltzmann and we spitballed a way to reverse the accepted answer.
One can either carry the original arr along throughout the program or record the mappings for each column. If you do the latter, here's some simple, non-performant code to do so:
# arr2 is the encoded array produced by the accepted answer
l = []
for real_column, encoded_column in zip(np.column_stack(arr), np.column_stack(arr2)):
    d = {}
    for real_element, encoded_element in zip(real_column, encoded_column):
        d[encoded_element] = real_element
    l.append(d)
print(l)
Doing this with the above yields:
[{0: 'hi', 2: 'scott', 1: 'please'}, {2: 'there', 0: 'james', 1: 'scott'}]
Try this method, which is both beautiful (almost) and optimal:
labels = np.array([['hi', 'there'], ['scott', 'james'],
['hi', 'scott'], ['please', 'there']])
indexes = {val: idx for idx, val in enumerate(np.unique(labels))}
encoded = np.array([indexes[val] for val in labels.flatten()]).reshape(labels.shape)
print(f'Indexes: {indexes}')
print(f'Encoded labels: {encoded}')
The output:
Indexes: {'hi': 0, 'james': 1, 'please': 2, 'scott': 3, 'there': 4}
Encoded labels: [[0 4]
[3 1]
[0 3]
[2 4]]
Enjoy the labels encoder ;)
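To map the codes back to the original strings, the sorted array returned by np.unique can serve as the decoder. This is a minimal sketch that, like the answer above, uses one shared mapping for all columns rather than one per column. The names `classes`, `encoded` and `decoded` are chosen for this example.

```python
import numpy as np

labels = np.array([['hi', 'there'], ['scott', 'james'],
                   ['hi', 'scott'], ['please', 'there']])

# classes: sorted unique labels; inverse: index into classes for every element.
classes, inverse = np.unique(labels, return_inverse=True)
# reshape makes the result independent of whether `inverse` comes back flat.
encoded = inverse.reshape(labels.shape)

# Decoding is just indexing the class array with the codes.
decoded = classes[encoded]
print(encoded)
print((decoded == labels).all())
```

Keeping `classes` around is all that is needed for the round trip, so the original array does not have to be carried through the program.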
