I am using tf.keras.utils.to_categorical() for data preparation.
I have this very simple list and I want to get the categorical values out of it.
So I do this:
tf.keras.utils.to_categorical([1,2,3], num_classes=6)
and I get:
array([[0., 1., 0., 0., 0., 0.],
[0., 0., 1., 0., 0., 0.],
[0., 0., 0., 1., 0., 0.]], dtype=float32)
Now, for further use, I reduce the values I pass to the function by 1, so that all 6 classes are actually used and class 0 is not left as a placeholder:
tf.keras.utils.to_categorical([x - 1 for x in [1, 2, 3]], num_classes=6)
which results in this:
array([[1., 0., 0., 0., 0., 0.],
[0., 1., 0., 0., 0., 0.],
[0., 0., 1., 0., 0., 0.]], dtype=float32)
Now comes the weird part. I want to set certain features to 0, and that is how I found this behaviour:
tf.keras.utils.to_categorical([x - 1 for x in [-4, 2, 3]], num_classes=6)
results in:
array([[0., 1., 0., 0., 0., 0.],
[0., 1., 0., 0., 0., 0.],
[0., 0., 1., 0., 0., 0.]], dtype=float32)
So to_categorical() maps -4 and 2 to the same class, which I find pretty weird. I would have expected an exception, since the list cannot be mapped onto 6 classes, but certainly not this result. Is this a bug or a feature, and why is it happening?
Thanks!
That's completely normal. It just works consistently with Python's negative indexing. See:
import tensorflow as tf
tf.keras.utils.to_categorical([0, 1, 2, -1, -2, -3])
array([[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[1., 0., 0.]], dtype=float32)
To put it differently:
import tensorflow as tf
a = tf.keras.utils.to_categorical([0, 1, 2], num_classes=3)
b = tf.keras.utils.to_categorical([-3, -2, -1], num_classes=3)
print(a)
print(b)
[[1. 0. 0.]
[0. 1. 0.]
[0. 0. 1.]]
[[1. 0. 0.]
[0. 1. 0.]
[0. 0. 1.]]
As for why this happens: to_categorical in Keras simply isn't designed for negative numbers. If you want to work around it, I suggest shifting all values so that the smallest one becomes 0. This code does that:
import numpy as np

arr = np.array([-5, -4, -2, -1, 0, 1, 2, 3, 4])  # any integer labels
arr += 0 - arr.min()  # shift so the minimum becomes 0
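For completeness (my own addition, building on the shifted arr above), the shifted labels can then be one-hot encoded as usual:
import tensorflow as tf

one_hot = tf.keras.utils.to_categorical(arr, num_classes=int(arr.max()) + 1)
print(one_hot.shape)  # (9, 10): 9 labels, classes 0 through 9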
Keras' to_categorical doesn't work for negative numbers. The documentation clearly states that the labels must be integers from 0 to num_classes - 1.
https://keras.io/api/utils/python_utils/#to_categorical-function
If you still need to make it work, build a dictionary that maps the negative numbers to valid class indices.
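A minimal sketch of that idea (my own example, using the labels from the question): map each distinct label to a 0-based index before calling to_categorical:
import tensorflow as tf

labels = [-4, 2, 3]
mapping = {label: idx for idx, label in enumerate(sorted(set(labels)))}
# mapping == {-4: 0, 2: 1, 3: 2}
encoded = [mapping[label] for label in labels]
one_hot = tf.keras.utils.to_categorical(encoded, num_classes=len(mapping))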
Related
I'm using the following code to generate an array based on coordinates of edges:
verts = np.array(list(itertools.product((0,2), (0,2))))
arr = np.zeros((5, 5))
arr[tuple(verts.T)] = 1
plt.imshow(arr)
which gives me the imshow plot (not reproduced here) or, as a numeric array:
[[1., 0., 1., 0., 0.],
[0., 0., 0., 0., 0.],
[1., 0., 1., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.]]
Now, I would like to fill in the spaces between the corners (i.e. the yellow squares in the imshow plot):
so that I get the following array:
[[1., 1., 1., 0., 0.],
[1., 1., 1., 0., 0.],
[1., 1., 1., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.]]
Replace (0, 2) with range(0, 3) (3 because ranges are inclusive-exclusive), that is:
import itertools
import numpy as np
verts = np.array(list(itertools.product(range(0,3), range(0,3))))
arr = np.zeros((5, 5))
arr[tuple(verts.T)] = 1
print(arr)
output
[[1. 1. 1. 0. 0.]
[1. 1. 1. 0. 0.]
[1. 1. 1. 0. 0.]
[0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0.]]
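As a side note (my own suggestion, not part of the original answer): when the filled region is a single contiguous square, plain slicing gives the same result without building the vertex list at all:
import numpy as np

arr = np.zeros((5, 5))
arr[0:3, 0:3] = 1  # fill the 3x3 block in the top-left corner
print(arr)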
I have a matrix that effectively holds counters. I would like to increment those counters based on a list of column indices, where each position in the list corresponds to the row to increment.
This is straightforward with a for loop, and a little less straightforward with list comprehension. In either case, iteration is involved. But I was wondering if there is any way to vectorise this problem?
The minimal problem is:
counters = np.zeros((4,4))
counters
array([[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]])
update_columns = [1,0,2,2]
for row, col in zip(range(len(update_columns)), update_columns):
    counters[row, col] += 1
counters
array([[0., 1., 0., 0.],
[1., 0., 0., 0.],
[0., 0., 1., 0.],
[0., 0., 1., 0.]])
What you are looking for is called advanced NumPy indexing. You can pass the row indices using np.arange and the column indices using update_columns:
update_columns = np.array(update_columns)
counters[np.arange(update_columns.size), update_columns] += 1
output:
[[0. 1. 0. 0.]
[1. 0. 0. 0.]
[0. 0. 1. 0.]
[0. 0. 1. 0.]]
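One caveat worth adding (my own note, not part of the original answer): with advanced indexing, += only increments each (row, column) pair once even if that pair appears several times in the index arrays. If your real use case can hit the same cell more than once, np.add.at accumulates correctly:
import numpy as np

counters = np.zeros((4, 4))
rows = np.array([0, 0, 2, 3])  # the pair (0, 1) appears twice
cols = np.array([1, 1, 2, 2])
np.add.at(counters, (rows, cols), 1)
print(counters[0, 1])  # 2.0, whereas counters[rows, cols] += 1 would give 1.0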
I am generating a population array pop with the following loops:
pop = np.zeros((population_size, chromosome_length))
for i in range(population_size):
    for j in range(i, chromosome_length):
        pop[i, j] = random.randint(0, 1)
pop
array([[0., 1., 0., 1., 1., 1., 0., 0., 1., 1.],
[0., 0., 1., 0., 1., 0., 1., 1., 0., 0.],
[0., 0., 1., 0., 0., 1., 1., 0., 0., 1.],
[0., 0., 0., 0., 1., 0., 1., 1., 1., 0.],
[0., 0., 0., 0., 1., 0., 1., 1., 0., 1.],
[0., 0., 0., 0., 0., 0., 0., 1., 0., 0.]])
I have another array, expected, generated by code not shown here; an example is below:
array([[1.99214608],
[1.45140389],
[0.07068525],
[0.69507167],
[1.08384057],
[0.70685254]])
I then want to bin the values of expected based on custom intervals:
actual = np.zeros((population_size, 1))
for i in range(len(expected)):
    if expected[i] >= 1.5:
        actual[i] = 2
    elif 1.5 > expected[i] >= 0.9:
        actual[i] = 1
    else:
        actual[i] = 0
actual = actual.astype(int)
total_count = int(np.sum(actual))
print(actual)
print(total_count)
[[2]
[1]
[0]
[0]
[1]
[0]]
4
and I want the final output as:
array([[0., 1., 0., 1., 1., 1., 0., 0., 1., 1.],
[0., 1., 0., 1., 1., 1., 0., 0., 1., 1.],
[0., 0., 1., 0., 1., 0., 1., 1., 0., 0.],
[0., 0., 0., 0., 1., 0., 1., 1., 0., 1.]])
based on the values in actual (total_count is just the number of rows in the result): the first row of pop is copied twice, the second row once and the fifth row once. In short, what I want is to repeat/copy/duplicate the rows of an array based on the integer elements of another array.
I'll try to address this question in sections, as you are using NumPy arrays as though they were lists and therefore losing much of the point of the library. Not only is the vectorised syntax more compact, it also comes with significant speed increases.
Creating the population
This one is simple enough. We can replace the loops that generate pop with a single call to numpy.random.randint. We need to specify values for population_size and chromosome_length and use them as the output shape.
population_size = 6
chromosome_length = 10
pop = np.random.randint(0, 2, (population_size, chromosome_length))
NOTE: This won't give exactly the same values as in your question because no seed is set for the random number generator, and it fills every entry (your inner loop starts at j = i, which leaves a triangle of zeros). Still, the single call replaces the nested loops and is far more performant.
Generating expected
I can't make an exact replacement for this section because the code that generates expected isn't shown and some of its variables are undefined. So I'm just assuming that I get the same 2D array you have shown:
expected = np.array([[1.99214608],
[1.45140389],
[0.07068525],
[0.69507167],
[1.08384057],
[0.70685254]])
Binning the data
This is a bit more complex. We can use numpy.digitize to bin the data at your interval edges (0, 0.9 and 1.5). However, this function does not work on 2D arrays, so I'm going to flatten the array first with numpy.ravel().
This gives back the index of the bin each value of expected belongs to. Bin indices start at 1, and we want to use these values as array indices further on, so I also subtract 1 from the result.
bins = np.array([0, 0.9, 1.5])
dig = np.digitize(expected.ravel(), bins) - 1
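With the expected values shown above, this works out (my own check) to dig == array([2, 1, 0, 0, 1, 0]), which matches the actual array computed in the question.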
Last Steps
I'm going to create an array of values that correspond to the bin categories. We can then use numpy.take to replace the values of dig with the corresponding replacement values.
replacements = np.array([0, 1, 2])
actual = np.take(replacements, dig)
And finally :) we can use numpy.repeat with actual to take each row of pop the right number of times and build the output; see the small illustration below.
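For reference, a toy illustration of that behaviour (my own example, not from the original answer):
import numpy as np

rows = np.array([[10, 11],
                 [20, 21],
                 [30, 31]])
counts = np.array([2, 0, 1])  # repeat row 0 twice, drop row 1, keep row 2 once
print(np.repeat(rows, counts, axis=0))
# [[10 11]
#  [10 11]
#  [30 31]]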
Final Code
import numpy as np
population_size = 6
chromosome_length = 10
pop = np.random.randint(0, 2, (population_size, chromosome_length))
# But I'm going to deliberately overwrite the above to solve your particular case
pop = np.array([[0., 1., 0., 1., 1., 1., 0., 0., 1., 1.],
[0., 0., 1., 0., 1., 0., 1., 1., 0., 0.],
[0., 0., 1., 0., 0., 1., 1., 0., 0., 1.],
[0., 0., 0., 0., 1., 0., 1., 1., 1., 0.],
[0., 0., 0., 0., 1., 0., 1., 1., 0., 1.],
[0., 0., 0., 0., 0., 0., 0., 1., 0., 0.]])
# Hard-coded :/
expected = np.array([[1.99214608],
[1.45140389],
[0.07068525],
[0.69507167],
[1.08384057],
[0.70685254]])
bins = np.array([0, 0.9, 1.5])
dig = np.digitize(expected.ravel(), bins) - 1
replacements = np.array([0, 1, 2])
actual = np.take(replacements, dig)
out = np.repeat(pop, actual, axis=0)
print(out)
Gives:
[[0. 1. 0. 1. 1. 1. 0. 0. 1. 1.]
[0. 1. 0. 1. 1. 1. 0. 0. 1. 1.]
[0. 0. 1. 0. 1. 0. 1. 1. 0. 0.]
[0. 0. 0. 0. 1. 0. 1. 1. 0. 1.]]
There is a function in Keras to generate a binary matrix for an array of labels:
# Consider an array of 5 labels out of a set of 3 classes {0, 1, 2}:
> labels
array([0, 2, 1, 2, 0])
# `to_categorical` converts this into a matrix with as many
# columns as there are classes. The number of rows
# stays the same.
> to_categorical(labels)
array([[ 1., 0., 0.],
[ 0., 0., 1.],
[ 0., 1., 0.],
[ 0., 0., 1.],
[ 1., 0., 0.]], dtype=float32)
I need the above functionality, but with -1 instead of the zeros. I didn't find any option or other function to do it. Is there an easy way to do that?
You could do the following:
import numpy as np
arr = np.array([[1., 0., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[1., 0., 0.]])
arr[np.isclose(arr, 0)] = -1
print(arr)
Output
[[ 1. -1. -1.]
[-1. -1. 1.]
[-1. 1. -1.]
[-1. -1. 1.]
[ 1. -1. -1.]]
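To tie this back to the question (my own glue code, assuming the tensorflow.keras import): build the one-hot matrix first and then flip the zeros in place:
import numpy as np
from tensorflow.keras.utils import to_categorical

labels = np.array([0, 2, 1, 2, 0])
arr = to_categorical(labels)
arr[np.isclose(arr, 0)] = -1
print(arr)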
Just rescale your data:
2*to_categorical(labels)-1
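A quick check (my own, using the labels from the question and assuming the tensorflow.keras import) that this gives the same ±1 matrix:
from tensorflow.keras.utils import to_categorical

labels = [0, 2, 1, 2, 0]
print(2 * to_categorical(labels) - 1)
# [[ 1. -1. -1.]
#  [-1. -1.  1.]
#  [-1.  1. -1.]
#  [-1. -1.  1.]
#  [ 1. -1. -1.]]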
I'm trying to make an array of one-hot vector of integers into an array of one-hot vector that keras will be able to use to fit my model. Here's the relevant part of the code:
Y_train = np.hstack(np.asarray(dataframe.output_vector)).reshape(len(dataframe),len(output_cols))
dummy_y = np_utils.to_categorical(Y_train)
Below is an image (not reproduced here) showing what Y_train and dummy_y actually are.
I couldn't find any documentation for to_categorical that could help me.
Thanks in advance.
np_utils.to_categorical is used to convert an array of labels (integers from 0 to nb_classes - 1) into a one-hot encoded matrix.
The official doc with an example.
In [1]: from keras.utils import np_utils # from keras import utils as np_utils
Using Theano backend.
In [2]: np_utils.to_categorical?
Signature: np_utils.to_categorical(y, num_classes=None)
Docstring:
Convert class vector (integers from 0 to nb_classes) to binary class matrix, for use with categorical_crossentropy.
# Arguments
y: class vector to be converted into a matrix
nb_classes: total number of classes
# Returns
A binary matrix representation of the input.
File: /usr/local/lib/python3.5/dist-packages/keras/utils/np_utils.py
Type: function
In [3]: y_train = [1, 0, 3, 4, 5, 0, 2, 1]
In [4]: """ Assuming the labeled dataset has six classes in total (0 to 5), y_train is the array of true labels """
In [5]: np_utils.to_categorical(y_train, num_classes=6)
Out[5]:
array([[ 0., 1., 0., 0., 0., 0.],
[ 1., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 1., 0., 0.],
[ 0., 0., 0., 0., 1., 0.],
[ 0., 0., 0., 0., 0., 1.],
[ 1., 0., 0., 0., 0., 0.],
[ 0., 0., 1., 0., 0., 0.],
[ 0., 1., 0., 0., 0., 0.]])
from keras.utils.np_utils import to_categorical
UPDATED --- keras.utils.np_utils no longer exists in newer versions; in that case use:
from tensorflow.keras.utils import to_categorical
In both cases the usage is the same:
to_categorical(y, num_classes)  # num_classes defaults to the maximum value in the array + 1
It assumes that the class values (even if they were originally strings) have been label-encoded, so they always run from 0 to n_classes - 1.
For the same kind of example, consider the array {1, 2, 3, 4, 2}. The output columns correspond to the classes [0, 1, 2, 3, 4]:
array([[ 0., 1., 0., 0., 0.],
[ 0., 0., 1., 0., 0.],
[ 0., 0., 0., 1., 0.],
[ 0., 0., 0., 0., 1.],
[ 0., 0., 1., 0., 0.]])
Let's look at another example. Take an array with only 3 distinct values, Y = {4, 8, 9, 4, 9}. Because the largest label is 9, to_categorical(Y) infers 10 classes and outputs:
array([[0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
[0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])
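If you want to control the number of columns explicitly instead of letting it be inferred from the largest label, pass num_classes yourself (a small sketch, assuming the tensorflow.keras import):
from tensorflow.keras.utils import to_categorical

Y = [4, 8, 9, 4, 9]
print(to_categorical(Y, num_classes=10).shape)  # (5, 10)
# num_classes must be at least max(Y) + 1, otherwise the one-hot index falls out of range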