I am trying to use sklearn.preprocessing.LabelBinarizer() to create a one-hot encoding for labels that have only two classes, i.e. I only want to categorize two sets of objects. In this case, when I use fit(range(0, 2)), it just returns a one-dimensional array instead of a 2x1 one. This is fine on its own, but when I want to use the labels in TensorFlow, the shape should really be (2, 1) for dimensional consistency. Please advise how I can resolve this.
Here is the code:
from sklearn import preprocessing
lb = preprocessing.LabelBinarizer()
lb.fit(range(0, 3))
Calling lb.transform([1, 0]), the result is:
[[0 1 0]
[1 0 0]]
whereas when we change 3 to 2, i.e. lb.fit(range(0, 2)), the result would be
[[1]
[0]]
instead of
[[0 1]
[1 0]]
This will create problems in algorithms that expect arrays with consistent dimensions. Is there any way to resolve this issue?
LabelBinarizer()'s purpose, according to the documentation, is:
Binarize labels in a one-vs-all fashion
Several regression and binary classification algorithms are available in scikit-learn.
A simple way to extend these algorithms to the multi-class classification case is to use the so-called one-vs-all scheme.
If your data has only two types of labels, then you can feed it directly to a binary classifier. Hence, one column is enough to capture two classes in a one-vs-rest fashion.
Binary targets transform to a column vector
>>> lb = preprocessing.LabelBinarizer()
>>> lb.fit_transform(['yes', 'no', 'no', 'yes'])
array([[1],
[0],
[0],
[1]])
If your intention is just to create a one-hot encoding, use the following method.
>>> from sklearn.preprocessing import OneHotEncoder
>>> enc = OneHotEncoder()
>>> enc.fit_transform([['yes'], ['no'], ['no'], ['yes']]).toarray()
array([[0., 1.],
[1., 0.],
[1., 0.],
[0., 1.]])
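Applied to the integer labels from the question, the same approach gives the desired two-column shape directly. A minimal sketch (the exact float formatting of the output may differ across scikit-learn versions):
from sklearn.preprocessing import OneHotEncoder

enc = OneHotEncoder()
enc.fit([[0], [1]])
print(enc.transform([[1], [0]]).toarray())
# [[0. 1.]
#  [1. 0.]]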
Hope this clarifies your question of why sklearn's LabelBinarizer() does not convert two-class data into a two-column output.
As already said in a comment, this is not an issue with the method. According to the documentation: "Binary targets transform to a column vector." You can build the array you want from the column-vector result in the case where there are two classes.
A direct and simple way to do this is:
import numpy as np
from sklearn import preprocessing

lb = preprocessing.LabelBinarizer()
lb.fit(range(2))  # range(0, 2) is the same as range(2)
a = lb.transform([1, 0])  # column vector: [[1], [0]]
# put the class-0 indicator in column 0 and the class-1 indicator in column 1
result_2d = np.array([[0 if item[0] else 1, item[0]] for item in a])
# [[0 1]
#  [1 0]]
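A vectorized sketch of the same idea (my addition; it relies only on NumPy broadcasting and the column vector a from above):
import numpy as np
from sklearn import preprocessing

lb = preprocessing.LabelBinarizer()
lb.fit(range(2))
a = lb.transform([1, 0])            # shape (2, 1): [[1], [0]]
result_2d = np.hstack([1 - a, a])   # shape (2, 2): [[0, 1], [1, 0]]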
Related
I am using scikit-learn's MultiOutputClassifier to create a multi-label output for 7 distinct classes using:
multilabel_model.predict_proba(X_test)
which gives me an array with shape (7, 14545, 2) that contains both probabilities for the class being 0 and being 1:
[array([[9.7169727e-01, 2.8302711e-02],
[9.9807453e-01, 1.9254771e-03],
[9.9955606e-01, 4.4392250e-04],
...,
[9.9957782e-01, 4.2216384e-04],
[9.9833119e-01, 1.6688267e-03],
[9.9959826e-01, 4.0173010e-04]], dtype=float32),
array([[9.7968739e-01, 2.0312620e-02],
[9.9961036e-01, 3.8966016e-04],
[9.9990100e-01, 9.8974662e-05],
...,
Now I am looking for a way to slice the array so that the output only contains the probability of each of the 7 possible classes being 1, i.e. an output that would look like this:
[[0.3,0.45,0.2,0.1,0.1,0.45,0.2],
[0.1,0.45,0.2,0.3,0.45,0.2,0.1],
...]
Is there a way of using some slicing magic to achieve this or does this require a sophisticated custom function?
To extract just the probabilities for 1 (the 2nd position), use:
import numpy as np

# predict_proba returns one (n_samples, 2) array per output; stack them first
probas = np.asarray(multilabel_model.predict_proba(X_test))
# probas.shape == (7, 14545, 2)
# keep only the probability of class 1, then transpose to (n_samples, n_outputs)
one_probas = probas[:, :, 1].T
# one_probas.shape == (14545, 7)
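For intuition on why transpose is used rather than a plain reshape: reshape keeps the values in memory order and would mis-pair samples and classes, while transpose swaps the axes. A tiny sketch with made-up numbers:
import numpy as np

a = np.arange(6).reshape(2, 3)  # 2 outputs x 3 samples: [[0, 1, 2], [3, 4, 5]]
print(a.T)                      # [[0, 3], [1, 4], [2, 5]] -- pairs stay aligned
print(a.reshape(3, 2))          # [[0, 1], [2, 3], [4, 5]] -- values reordered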
I think I'm following the same online tutorial as the one mentioned in this post:
How to convert deep learning gradient descent equation into python
I understand we have to calculate the cost and db, but my question is: why do they put axis=0 in both equations? In other words, I do not understand what axis=0 is used for in this calculation. What would the result be if you did the calculation without axis=0?
import numpy as np
cost = -1*((np.sum(np.dot(Y,np.log(A))+np.dot((1-Y),(np.log(1-A))),axis=0))/m)
db = np.sum((A-Y),axis=0)/m
This is the type of question that you could have tried out in the interpreter yourself, in the same or less time than it probably took to compose it.
Another way is to look at the documentation; it is always a good habit to consult it. Documentation on np.sum() can be found here
Some excerpts from the documentation, if you still feel lazy:
...
axis : None or int or tuple of ints, optional
Axis or axes along which a sum is performed. The default, axis=None,
will sum all of the elements of the input array. If axis is negative it
counts from the last to the first axis.
...
Some examples from the documentation:
>>> np.sum([0.5, 1.5])
2.0
>>> np.sum([[0, 1], [0, 5]])
6
>>> np.sum([[0, 1], [0, 5]], axis=0)
array([0, 6])
>>> np.sum([[0, 1], [0, 5]], axis=1)
array([1, 5])
Visualization
      -----> axis = 1
  |  [[0, 1],
  |   [0, 5]]
  v
axis = 0
Just for clarity: in many deep learning frameworks, all quantities are treated as tensors, so even scalars are carried around as arrays rather than plain Python numbers. If you do np.sum() with no axis argument, you sum over all components and get back a plain scalar (not an array). By explicitly passing an axis (axis=1 in your case), you instead get a 1-D array, here with a single element. I don't know if this is required by the code you linked in your question, but I can imagine that this plays a role in some deep learning frameworks.
Here is a quick example that illustrates my point:
import numpy as np
x = np.ones((1, 10))
no_ax = np.sum(x)
ax0 = np.sum(x, axis=0)
ax1 = np.sum(x, axis=1)
print(no_ax, ax0, ax1)
Result:
(10.0, array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]), array([10.]))
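To tie this back to the question's db formula: a minimal sketch, assuming Y and A are (m, 1) column vectors (an assumption about the tutorial's layout), showing the difference between summing with and without axis=0:
import numpy as np

m = 4
Y = np.array([[1], [0], [1], [0]])          # true labels, shape (m, 1)
A = np.array([[0.9], [0.2], [0.7], [0.4]])  # predicted probabilities, shape (m, 1)

db_with_axis = np.sum(A - Y, axis=0) / m    # 1-D array, approx. array([0.05])
db_no_axis = np.sum(A - Y) / m              # plain scalar, approx. 0.05
print(db_with_axis, db_no_axis)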
I have a matrix which I'm trying to normalize by transforming each feature column to zero mean and unit standard deviation.
I have the following code that I'm using, but I want to know whether that method actually does what I'm trying to do, or whether it uses a different method.
from sklearn import preprocessing
mat_normalized = preprocessing.normalize(mat_from_df)
sklearn.preprocessing.normalize scales each sample vector to unit norm. (The default axis is 1, not 0.) Here's proof of that:
import numpy as np
from sklearn.preprocessing import normalize

np.random.seed(444)
data = np.random.normal(loc=5, scale=2, size=(15, 2))
np.linalg.norm(normalize(data), axis=1)
# array([ 1., 1., 1., 1., 1., 1., ...
It sounds like you're looking for sklearn.preprocessing.scale to scale each feature vector to ~N(0, 1).
from sklearn.preprocessing import scale
# Are the scaled column-wise means approx. 0.?
np.allclose(scale(data).mean(axis=0), 0.)
# True
# Are the scaled column-wise stdevs. approx. 1.?
np.allclose(scale(data).std(axis=0), 1.)
# True
Like the documentation states:
sklearn.preprocessing.normalize(X, norm='l2',
axis=1, copy=True,
return_norm=False)
Scale input vectors individually to unit norm (vector length).
So it takes the norm (by default the L2 norm) and then ensures that the vector is unit.
So if we take an n×m matrix as input, the output is an n×m matrix in which every row vector (of length m) is normalized. For norm='l2' (the default), this means the length of each row is calculated (as the square root of the sum of the squares of its components) and every element is divided by that length, so the result is a vector of length 1.
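To make the contrast concrete, here is a minimal sketch (using a small random matrix) that reproduces both operations by hand:
import numpy as np
from sklearn.preprocessing import normalize, scale

np.random.seed(444)
data = np.random.normal(loc=5, scale=2, size=(15, 2))

# normalize: each row is divided by its own L2 norm, so every row gets length 1
manual_norm = data / np.linalg.norm(data, axis=1, keepdims=True)
print(np.allclose(manual_norm, normalize(data)))   # True

# scale: each column is centred to mean 0 and divided by its std (unit variance)
manual_scale = (data - data.mean(axis=0)) / data.std(axis=0)
print(np.allclose(manual_scale, scale(data)))      # True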
I am trying to build my first classifier with a PyBrain neural network along with the specialized ClassificationDataSet, and I am not sure I fully understand how it works.
So I have a pandas dataframe with 6 feature columns and 1 column for the class label (Survived, just 0 or 1).
I build a dataset out of it:
from pybrain.datasets import ClassificationDataSet

ds = ClassificationDataSet(6, 1, nb_classes=2)
for i in df[['Gender', 'Pclass', 'AgeFill', 'FamilySize', 'FarePerPerson', 'Deck', 'Survived']].values:
    ds.addSample(tuple(i[:-1]), i[-1])
ds._convertToOneOfMany()
return ds
OK, I check what the dataset looks like:
for i, m in ds:
    print((i, m))
(array([ 1., 3., 2., 2., 1., 8.]), array([1, 0]))
(array([ 0., 1., 1., 2., 0., 2.]), array([0, 1]))
And I already have a problem. What do [1, 0] and [0, 1] mean? Are they just the '0' or '1' of the original 'Survived' column? How do I get back to the original values?
Later, when I finish training my network:
from pybrain.tools.shortcuts import buildNetwork
from pybrain.structure import TanhLayer, SoftmaxLayer
from pybrain.supervised.trainers import BackpropTrainer
net = buildNetwork(6, 6, 2, hiddenclass=TanhLayer, bias=True, outclass=SoftmaxLayer)
trainer = BackpropTrainer(net, ds)
trainer.trainEpochs(10)
I will try to activate it on another dataset (the one I actually want to classify), and I will get a pair of activation results from the 2 output neurons, but how do I know which output neuron corresponds to which original class? Probably this is something obvious, but I am not able to work it out from the docs, unfortunately.
OK, it looks like PyBrain uses the position to determine which class (0, 1) or (1, 0) refers to.
To get back to the original 0 or 1 label you need to use the argmax() function. For example, if I already have a trained network and want to validate it on the same data I used for training, I could do this:
true = 0
total = 0
for inProp, num in ds:
    out = net.activate(inProp).argmax()
    if out == num.argmax():
        true += 1
    total += 1
res = true / total
inProp will be an array of my input values for activation, num an array of the expected two-neuron output (either (0, 1) or (1, 0)), and num.argmax() translates it back into just 0 or 1, the real output.
I might be wrong since this is a pure heuristic, but it works in my example.
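For classifying the other dataset mentioned in the question, the same argmax trick maps each activation back to a 0/1 label. A one-line sketch, where new_samples is a hypothetical iterable of 6-value feature rows:
# map each activation of the trained net back to the original 0/1 label
predicted_labels = [net.activate(sample).argmax() for sample in new_samples]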
I am trying to run a Fisher's LDA (1, 2) to reduce the number of features of a matrix.
Basically, correct me if I am wrong: given n samples classified into several classes, Fisher's LDA tries to find an axis such that projecting onto it maximizes the value J(w), the ratio of the total sample variance to the sum of the variances within the separate classes.
I think this can be used to find the most useful features for each class.
I have a matrix X of m features and n samples (m rows, n columns).
I have a sample classification y, i.e. an array of n labels, one for each sample.
Based on y, I want to reduce the number of features to, for example, the 3 most representative ones.
Using scikit-learn, I tried it this way (following this documentation):
>>> import numpy as np
>>> from sklearn.lda import LDA
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> y = np.array([1, 1, 1, 2, 2, 2])
>>> clf = LDA(n_components=3)
>>> clf.fit_transform(X, y)
array([[ 4.],
[ 4.],
[ 8.],
[-4.],
[-4.],
[-8.]])
At this point I am a bit confused: how do I obtain the most representative features?
The features you are looking for are in clf.coef_ after you have fitted the classifier.
Note that n_components=3 doesn't make sense here, since X.shape[1] == 2, i.e. your feature space only has two dimensions. Moreover, with only two classes, LDA can produce at most n_classes - 1 = 1 discriminant component, which is why fit_transform returned a single column.
You do not need to invoke fit_transform in order to obtain coef_; calling clf.fit(X, y) will suffice.
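As a sketch of how the coefficients can be inspected and ranked (note: this uses the newer import path sklearn.discriminant_analysis.LinearDiscriminantAnalysis, since sklearn.lda.LDA was removed in later scikit-learn versions, and ranking features by absolute weight is my assumption, not something the answer prescribes):
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])

clf = LinearDiscriminantAnalysis()
clf.fit(X, y)

# one weight per feature; a larger absolute weight means more influence
# on the discriminant direction
weights = np.abs(clf.coef_).ravel()
ranking = np.argsort(weights)[::-1]
print(clf.coef_)
print(ranking)  # feature indices ordered from most to least representative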