Is there sample Python code for 2-class SVM classification using a custom kernel or the sigmoid kernel?
The code below does 3-class classification. How can I modify it into a 2-class SVM?
http://scikit-learn.org/stable/auto_examples/svm/plot_custom_kernel.html
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
iris = datasets.load_iris()
X = iris.data[:, :2] # we only take the first two features. We could
# avoid this ugly slicing by using a two-dim dataset
Y = iris.target
def my_kernel(X, Y):
    """
    We create a custom kernel:

                 (2  0)
    k(X, Y) = X  (    ) Y.T
                 (0  1)
    """
    M = np.array([[2, 0], [0, 1.0]])
    return np.dot(np.dot(X, M), Y.T)
h = .02 # step size in the mesh
# we create an instance of SVM and fit our data.
clf = svm.SVC(kernel=my_kernel)
clf.fit(X, Y)
# Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max]x[y_min, y_max].
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.pcolormesh(xx, yy, Z, cmap=plt.cm.Paired)
# Plot also the training points
plt.scatter(X[:, 0], X[:, 1], c=Y, cmap=plt.cm.Paired, edgecolors='k')
plt.title('3-Class classification using Support Vector Machine with custom'
' kernel')
plt.axis('tight')
plt.show()
You get 3 classes because the iris target vector contains 3 classes. You can manipulate the target vector so that it only contains two:
Y = iris.target
for index, value in enumerate(Y):
    Y[index] = value % 2
But this is a bit of an ugly hack: the modulo maps class 2 back to class 0 (in general it merges even-numbered classes into class 0 and odd-numbered ones into class 1), rather than giving you a clean two-class problem.
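A cleaner approach is to drop one class entirely so you get a genuine two-class problem. Here is a minimal sketch (reusing the iris setup from the question); it also shows scikit-learn's built-in sigmoid kernel, since the question asks about it, with default, untuned parameters:
import numpy as np
from sklearn import svm, datasets

iris = datasets.load_iris()
X = iris.data[:, :2]
Y = iris.target

# Keep only classes 0 and 1: a genuine 2-class problem
mask = Y != 2
X, Y = X[mask], Y[mask]

# Either reuse the custom kernel defined in the question ...
clf = svm.SVC(kernel=my_kernel)
# ... or use the built-in sigmoid kernel, tanh(gamma * <x, x'> + coef0)
clf = svm.SVC(kernel='sigmoid')
clf.fit(X, Y)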
Let's consider the following data:
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
iris = datasets.load_iris()
X = iris.data[:, :2] # we only take the first two features.
y = iris.target
I want to fit a logistic regression to that data set and then create a plot that shows the classification regions. So I used:
model = LogisticRegression(solver='liblinear', random_state=0)
est=model.fit(X, y)
plt.scatter(X[:, 0], X[:, 1], c=est.predict(X))
plt.show()
But how can I make it look like the one below?
Edit
I created the plot below, but I still don't know how to change specific points to squares or x's, or how to create a legend. Do you know how this can be done? I know I have to do something with marker='s' and marker='x', but that changes the look of the whole image, and I only want to change specific classes (see the sketch after the code below).
print(__doc__)
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :2] # we only take the first two features.
Y = iris.target
logreg = LogisticRegression(C=1e5)
# Create an instance of Logistic Regression Classifier and fit the data.
logreg.fit(X, Y)
# Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max]x[y_min, y_max].
x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
h = .02 # step size in the mesh
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = logreg.predict(np.c_[xx.ravel(), yy.ravel()])
# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.figure(1, figsize=(4, 3))
plt.pcolormesh(xx, yy, Z, cmap=plt.cm.Paired)
# Plot also the training points
plt.scatter(X[:, 0], X[:, 1], c=Y, edgecolors='k', cmap=plt.cm.Paired)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.xticks(())
plt.yticks(())
plt.show()
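For the Edit: one way to give each class its own marker and legend entry, sketched here reusing the X, Y, and iris objects from the code above, is to call plt.scatter once per class instead of once over all points:
# One scatter call per class, so each class can have its own marker
markers = ['o', 's', 'x']  # circle, square, x (arbitrary choices)
for cls, marker in zip(np.unique(Y), markers):
    plt.scatter(X[Y == cls, 0], X[Y == cls, 1],
                marker=marker, label=iris.target_names[cls])
plt.legend()
plt.show()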
How do I color the decision boundaries for a k-Nearest Neighbor classifier as seen here:
I've got the data for the 3 classes successfully plotted out using scatter (left picture).
Image source: http://cs231n.github.io/classification/
To plot decision boundaries you need to make a meshgrid. You can use np.meshgrid to do this. np.meshgrid requires the min and max values of X and Y and a mesh step size parameter. It is sometimes prudent to make the minimum values a bit lower than the minimum of x and y, and the max values a bit higher.
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))
You then feed your classifier the meshgrid, like so: Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]). You need to reshape the output of this to the same shape as your original meshgrid: Z = Z.reshape(xx.shape). Finally, when you are making your plot, call plt.pcolormesh(xx, yy, Z, cmap=cmap_light); this will make the decision boundaries visible in your plot.
Below is a complete example to achieve this, found at http://scikit-learn.org/stable/auto_examples/neighbors/plot_classification.html#sphx-glr-auto-examples-neighbors-plot-classification-py.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn import neighbors, datasets
n_neighbors = 15
# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :2] # we only take the first two features. We could
# avoid this ugly slicing by using a two-dim dataset
y = iris.target
h = .02 # step size in the mesh
# Create color maps
cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
cmap_bold = ListedColormap(['#FF0000', '#00FF00', '#0000FF'])
for weights in ['uniform', 'distance']:
    # we create an instance of Neighbours Classifier and fit the data.
    clf = neighbors.KNeighborsClassifier(n_neighbors, weights=weights)
    clf.fit(X, y)

    # Plot the decision boundary. For that, we will assign a color to each
    # point in the mesh [x_min, x_max]x[y_min, y_max].
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.figure()
    plt.pcolormesh(xx, yy, Z, cmap=cmap_light)

    # Plot also the training points
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap_bold)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.title("3-Class classification (k = %i, weights = '%s')"
              % (n_neighbors, weights))

plt.show()
This results in the following two plots:
X = iris.data[:, :2] # we only take the first two features. We could
# avoid this ugly slicing by using a two-dim dataset
If I take this X as a 3-dimensional dataset (three features instead of two), what would need to change in the following code? (One possible approach is sketched after the code.)
for weights in ['uniform', 'distance']:
    # we create an instance of Neighbours Classifier and fit the data.
    clf = neighbors.KNeighborsClassifier(n_neighbors, weights=weights)
    clf.fit(X, y)

    # Plot the decision boundary. For that, we will assign a color to each
    # point in the mesh [x_min, x_max]x[y_min, y_max].
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.figure()
    plt.pcolormesh(xx, yy, Z, cmap=cmap_light)

    # Plot also the training points
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap_bold)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.title("3-Class classification (k = %i, weights = '%s')"
              % (n_neighbors, weights))

plt.show()
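There is no direct 3-D analogue of this plot, but one common workaround, sketched here under the assumption that you still want a 2-D picture, is to fit on all three features and then evaluate the grid with the third feature held at a fixed value (its mean below):
# Assumption: X now has three columns (features)
clf.fit(X, y)

x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))

# Hold the third feature at its mean so the grid stays 2-D
third = np.full(xx.size, X[:, 2].mean())
Z = clf.predict(np.c_[xx.ravel(), yy.ravel(), third])
Z = Z.reshape(xx.shape)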
I'm doing a small plot with some training data before testing it in Python. My plot looks like this right now. The training data comes from images of letters in Fourier space, which I have masked to produce different values for the letters.
These boundaries look less than ideal to me, and I'm not sure how to go about fixing them so that the red and blue points have their own distinct areas. Here is the code I am using:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.neighbors import KNeighborsClassifier

# X holds the training samples computed from the letter images
X = np.asarray(X)  # plain array; np.matrix keeps slices 2-D, which trips up plotting
#knn = KNeighborsClassifier(n_neighbors=3)
y = [0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2]
h = 0.2  # step size in the mesh

# Create color maps
cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
cmap_bold = ListedColormap(['#FF0000', '#00FF00', '#0000FF'])
n_neighbors = 10

for weights in ['uniform', 'distance']:
    # we create an instance of Neighbours Classifier and fit the data.
    clf = KNeighborsClassifier(n_neighbors, weights=weights)
    clf.fit(X, y)

    # Plot the decision boundary. For that, we will assign a color to each
    # point in the mesh [x_min, x_max]x[y_min, y_max].
    x_min, x_max = X[:, 0].min() - 30, X[:, 0].max() + 20
    y_min, y_max = X[:, 1].min() - 5, X[:, 1].max() + 5
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.figure()
    plt.pcolormesh(xx, yy, Z, cmap=cmap_light)

    # Plot also the training points
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap_bold)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.title("3-Class classification (k = %i, weights = '%s')"
              % (n_neighbors, weights))

plt.show()
Is it a case of finding different data points to classify my data with? Or is there a way of changing how these decision boundaries are formed? Any help would be appreciated, please let me know if I'm being too vague on something. Thanks in advance!
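One thing worth checking, offered as an assumption since only the plot is described: the two features seem to live on very different scales (the code pads x by 30 units but y by only 5), and kNN is distance-based, so the wider feature will dominate the neighbor search. A minimal sketch of standardizing the features before fitting:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # zero mean, unit variance per feature

clf = KNeighborsClassifier(n_neighbors=n_neighbors, weights='distance')
clf.fit(X_scaled, y)
# ... then build the meshgrid over X_scaled instead of X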
I am working with Python/Pandas and I'm trying to plot a decision boundary. According to [1] and [2], this is considered best practice:
import numpy as np
import matplotlib.pyplot as plt

# X - some data in a 2-dimensional np.array
h = .02  # step size in the mesh
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))

# here "model" is your model's prediction (classification) function
Z = model(np.c_[xx.ravel(), yy.ravel()])

# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.Paired)
plt.axis('off')

# Plot also the training points
plt.scatter(X[:, 0], X[:, 1], c=Y, cmap=plt.cm.Paired)
My question concerns this line: Z = model(np.c_[xx.ravel(), yy.ravel()])
My model function currently accepts a feature Series as input (representing a single datapoint). I need to modify it so that it can accept a mesh grid and return a numpy array with predictions for all points on the grid. How can I do this efficiently? I am not familiar with NumPy data types and I'm not good at vectorizing operations.
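A minimal sketch, assuming your per-point function is named predict_one (a hypothetical name) and takes a length-2 pandas Series: np.apply_along_axis runs it once per grid row without rewriting the model. Truly vectorizing the function's internals would be faster, but this is the smallest change:
import numpy as np
import pandas as pd

def predict_one(point):
    # hypothetical stand-in for your existing single-point model
    return int(point['a'] + point['b'] > 0)

grid = np.c_[xx.ravel(), yy.ravel()]  # shape (n_points, 2)
Z = np.apply_along_axis(
    lambda row: predict_one(pd.Series(row, index=['a', 'b'])),
    axis=1, arr=grid)
Z = Z.reshape(xx.shape)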
I am trying to learn sklearn and for this, I am trying a simple exercise with a linear SVM. The SVC tries to predict the number of bedrooms in a house, based on the value of the house and its area. I have managed to get something that looks ok, but the template I took from matplotlib's documentation uses a color map and I don't know exactly what corresponds to what.
How could I add a legend that specifies what the color of each scattered point corresponds to, and what the SVM's sections correspond to as well?
Also, in order to make this work, I had to preprocessing.scale my features, and the axis ticks now show the scaled values. How could I unscale them somehow, or retrieve the original values to use for the axis graduation?
Here is the plot:
http://i.imgur.com/ERFuEmJ.png (I don't have enough reputation to post directly)
And here is my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import style
from sklearn import svm, preprocessing

style.use('ggplot')
dataset = pd.read_csv('/Path/Paros.csv')
dataset = dataset[dataset['size']<3000]
X = np.array(dataset[['size', 'value']])
y = np.array(dataset[['bedrooms']])
X = preprocessing.scale(X)
h = 0.01 # step size in the mesh
C = 0.01 # SVM regularization parameter
clf = svm.SVC(kernel='linear', C=C).fit(X, y[:,0])
# create a mesh to plot in
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
print "mesh"
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
np.arange(y_min, y_max, h))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)
plt.xlabel('Size')
plt.ylabel('Price')
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.show()
plt.colorbar() did what I was looking for.
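For the unscaling part of the question, a sketch under the assumption that you switch from preprocessing.scale to a StandardScaler object, which remembers the mean and variance needed to invert the transformation (X_raw below is a hypothetical name for your unscaled feature array):
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X = scaler.fit_transform(X_raw)  # X_raw: the unscaled feature array

# ... fit, mesh, and plot as above, then relabel the x ticks
ticks = plt.xticks()[0]
pairs = np.c_[ticks, np.zeros_like(ticks)]        # dummy second feature
originals = scaler.inverse_transform(pairs)[:, 0]  # invert the x feature only
plt.xticks(ticks, ['%.0f' % v for v in originals])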