Separate SVM classes for matplotlib legend - Python

I am trying to learn sklearn, and as an exercise I set up a simple linear SVM. The SVC tries to predict the number of bedrooms in a house based on the house's value and its area. I have something that looks OK, but the template I took from matplotlib's documentation uses a color map and I can't tell which color corresponds to which class.
How can I add a legend that shows what the color of each scattered point corresponds to, and what the SVM's regions correspond to as well?
Also, to make this work I had to scale my features with preprocessing.scale, so the axis ticks now show the scaled values. How can I unscale them, or retrieve the original values to use for the tick labels?
Here is the plot:
http://i.imgur.com/ERFuEmJ.png (I don't have enough reputation to post directly)
And here is my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import style
from sklearn import preprocessing, svm

style.use('ggplot')

dataset = pd.read_csv('/Path/Paros.csv')
dataset = dataset[dataset['size'] < 3000]
X = np.array(dataset[['size', 'value']])
y = np.array(dataset[['bedrooms']])
X = preprocessing.scale(X)

h = 0.01  # step size in the mesh
C = 0.01  # SVM regularization parameter
clf = svm.SVC(kernel='linear', C=C).fit(X, y[:, 0])

# create a mesh to plot in
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
print("mesh")
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))

Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y[:, 0], cmap=plt.cm.Paired)
plt.xlabel('Size')
plt.ylabel('Price')
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.show()

plt.colorbar() did what I was looking for.
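For completeness, here is a sketch of both fixes, continuing from the code above. The swap to a StandardScaler object is my assumption: it scales the same way as preprocessing.scale but keeps the fitted parameters, so the scaling can be inverted for the tick labels.

from sklearn.preprocessing import StandardScaler

# Scale with a StandardScaler object instead of preprocessing.scale,
# so the scaling can be inverted later.
scaler = StandardScaler()
X = scaler.fit_transform(np.array(dataset[['size', 'value']]))

# ...fit clf and draw contourf exactly as above...

sc = plt.scatter(X[:, 0], X[:, 1], c=y[:, 0], cmap=plt.cm.Paired)
plt.colorbar(sc, label='bedrooms')  # maps point colors to class values

# Relabel the x axis with the original (unscaled) 'size' values.
ax = plt.gca()
ticks = ax.get_xticks()
# inverse_transform needs both feature columns; pad the unused one with zeros
orig_size = scaler.inverse_transform(np.c_[ticks, np.zeros_like(ticks)])[:, 0]
ax.set_xticks(ticks)
ax.set_xticklabels(['%.0f' % v for v in orig_size])
plt.show()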

Related

How to convert clustering simple plot to region coloring plots?

I have this plot for my clustering, and this is the code I use to plot it:
y_pred = KMeans(n_clusters=4,
                random_state=random_state).fit_predict(X_train_pca)
plt.scatter(X_train_pca[:, 0], X_train_pca[:, 1], c=y_pred)
plt.title("Unevenly Sized Blobs")
plt.show()
But I want to change it so that the background is colored too; that is, it should show colored regions, something like this:
I really appreciate any help that you can provide.
Based on scikit-learn's demo for plotting kmeans decision boundaries:
Modify your current code to store the KMeans model so we can use the fitted model later to color the decision surface. Here it's stored as lowercase kmeans.
kmeans = KMeans(n_clusters=4, random_state=random_state)
y_pred = kmeans.fit_predict(X_train_pca)
plt.scatter(X_train_pca[:, 0], X_train_pca[:, 1], c=y_pred, cmap='Dark2')
Construct an underlying meshgrid from X_train_pca and use kmeans.predict to label the entire mesh. Then use imshow to plot this mesh in color.
# step size of mesh (decrease h to increase plot quality)
h = 0.02
# construct mesh
x_min, x_max = X_train_pca[:, 0].min() - 1, X_train_pca[:, 0].max() + 1
y_min, y_max = X_train_pca[:, 1].min() - 1, X_train_pca[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
# obtain labels per mesh point (reuse stored model)
Z = kmeans.predict(np.c_[xx.ravel(), yy.ravel()])
# put result into color plot
Z = Z.reshape(xx.shape)
plt.imshow(
    Z, interpolation='nearest', cmap='Set2', alpha=0.75,
    extent=(xx.min(), xx.max(), yy.min(), yy.max()),
    aspect='auto', origin='lower',
)
I don't have your original data, but this is the output from some random training data:
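If you want a self-contained version to test, here is a hypothetical stand-in for X_train_pca built with make_blobs (an assumption, since the original data isn't shown); the snippets above run on it unchanged:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

random_state = 0
# Hypothetical stand-in for X_train_pca: 300 points in 4 blobs.
X_train_pca, _ = make_blobs(n_samples=300, centers=4,
                            random_state=random_state)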

2-Class Support Vector Machine using Custom Kernel

Is there sample Python code for 2-class SVM classification using a custom kernel or the sigmoid kernel?
The code below (from the scikit-learn example linked underneath) does 3-class classification. How can I modify it for a 2-class SVM?
http://scikit-learn.org/stable/auto_examples/svm/plot_custom_kernel.html
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets

iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features. We could
                      # avoid this ugly slicing by using a two-dim dataset
Y = iris.target


def my_kernel(X, Y):
    """
    We create a custom kernel:

                 (2  0)
    k(X, Y) = X  (    ) Y.T
                 (0  1)
    """
    M = np.array([[2, 0], [0, 1.0]])
    return np.dot(np.dot(X, M), Y.T)


h = .02  # step size in the mesh

# we create an instance of SVM and fit our data.
clf = svm.SVC(kernel=my_kernel)
clf.fit(X, Y)

# Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max]x[y_min, y_max].
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.pcolormesh(xx, yy, Z, cmap=plt.cm.Paired)

# Plot also the training points
plt.scatter(X[:, 0], X[:, 1], c=Y, cmap=plt.cm.Paired, edgecolors='k')
plt.title('3-Class classification using Support Vector Machine with custom'
          ' kernel')
plt.axis('tight')
plt.show()
You have 3 classes because your data is categorized into 3 classes. You can manipulate the target vector so that it only contains two classes:
Y = iris.target
for index, value in enumerate(Y):
    Y[index] = value % 2
But this is a bit of an ugly hack: value % 2 folds the even class labels into class 0 and the odd ones into class 1, so here class 2 ends up merged with class 0.
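A cleaner alternative, sketched under the assumption that you simply want any two of the iris classes, is to filter the dataset down to two classes before fitting:

# Keep only the samples belonging to classes 0 and 1.
mask = iris.target < 2
X = iris.data[mask, :2]
Y = iris.target[mask]

# svm.SVC(kernel=my_kernel).fit(X, Y) is now a true 2-class SVM,
# and the plotting code above works unchanged.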

Graph k-NN decision boundaries in Matplotlib

How do I color the decision boundaries for a k-Nearest Neighbor classifier as seen here:
I've got the data for the 3 classes successfully plotted out using scatter (left picture).
Image source: http://cs231n.github.io/classification/
To plot decision boundaries you need to make a meshgrid. You can use np.meshgrid to do this. np.meshgrid requires the min and max values of X and Y and a mesh step size parameter. It is sometimes prudent to make the minimum values a bit lower than the minimum of x and y, and the max values a bit higher.
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))
You then feed your classifier the meshgrid, like so: Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]). You need to reshape the output of this to the same shape as the original meshgrid: Z = Z.reshape(xx.shape). Finally, when you make your plot, call plt.pcolormesh(xx, yy, Z, cmap=cmap_light); this makes the decision boundaries visible in your plot.
Below is a complete example to achieve this found at http://scikit-learn.org/stable/auto_examples/neighbors/plot_classification.html#sphx-glr-auto-examples-neighbors-plot-classification-py.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn import neighbors, datasets

n_neighbors = 15

# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features. We could
                      # avoid this ugly slicing by using a two-dim dataset
y = iris.target

h = .02  # step size in the mesh

# Create color maps
cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
cmap_bold = ListedColormap(['#FF0000', '#00FF00', '#0000FF'])

for weights in ['uniform', 'distance']:
    # we create an instance of Neighbours Classifier and fit the data.
    clf = neighbors.KNeighborsClassifier(n_neighbors, weights=weights)
    clf.fit(X, y)

    # Plot the decision boundary. For that, we will assign a color to each
    # point in the mesh [x_min, x_max]x[y_min, y_max].
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.figure()
    plt.pcolormesh(xx, yy, Z, cmap=cmap_light)

    # Plot also the training points
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap_bold)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.title("3-Class classification (k = %i, weights = '%s')"
              % (n_neighbors, weights))

plt.show()
This produces the following two graphs:
X = iris.data[:, :2]  # we only take the first two features. We could
                      # avoid this ugly slicing by using a two-dim dataset
If I take this X as a 3-dimensional dataset, what would change in the following code?
for weights in ['uniform', 'distance']:
    # we create an instance of Neighbours Classifier and fit the data.
    clf = neighbors.KNeighborsClassifier(n_neighbors, weights=weights)
    clf.fit(X, y)

    # Plot the decision boundary. For that, we will assign a color to each
    # point in the mesh [x_min, x_max]x[y_min, y_max].
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.figure()
    plt.pcolormesh(xx, yy, Z, cmap=cmap_light)

    # Plot also the training points
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap_bold)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.title("3-Class classification (k = %i, weights = '%s')"
              % (n_neighbors, weights))

plt.show()

Trying to clean up boundaries on nearest neighbour plot

I'm doing a small plot with some training data before testing it in Python. My plot looks like this right now. The training data comes from images of letters in Fourier space, which I have masked to produce different values for the letters.
These boundaries look less than ideal to me, and I'm not sure how to go about fixing them so that the red and blue points have their own distinct areas. Here is the code I am using:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.neighbors import KNeighborsClassifier

X = np.asarray(X)  # np.matrix is discouraged; use a plain 2-D array
# knn = KNeighborsClassifier(n_neighbors=3)
y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
     1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
     2, 2, 2, 2, 2, 2, 2, 2, 2, 2]

h = 0.2  # step size in the mesh

# Create color maps
cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
cmap_bold = ListedColormap(['#FF0000', '#00FF00', '#0000FF'])

n_neighbors = 10
for weights in ['uniform', 'distance']:
    # we create an instance of Neighbours Classifier and fit the data.
    clf = KNeighborsClassifier(n_neighbors, weights=weights)
    clf.fit(X, y)

    # Plot the decision boundary. For that, we will assign a color to each
    # point in the mesh [x_min, x_max]x[y_min, y_max].
    x_min, x_max = X[:, 0].min() - 30, X[:, 0].max() + 20
    y_min, y_max = X[:, 1].min() - 5, X[:, 1].max() + 5
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.figure()
    plt.pcolormesh(xx, yy, Z, cmap=cmap_light)

    # Plot also the training points
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap_bold)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.title("3-Class classification (k = %i, weights = '%s')"
              % (n_neighbors, weights))

plt.show()
Is it a case of needing different data points to classify my data with? Or is there a way of changing how these decision boundaries are formed? Any help would be appreciated; please let me know if I'm being too vague about anything. Thanks in advance!
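One thing worth checking, offered as a guess since your feature values aren't shown: k-NN is distance-based, so if the two features sit on very different scales, the larger-scale feature dominates the distance and the boundaries stretch along one axis. Standardizing the features before fitting often tidies up the regions:

from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Put both features on a comparable scale before the distance-based fit
# (a hypothetical fix; it only helps if the raw scales differ a lot).
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

clf = KNeighborsClassifier(n_neighbors=10, weights='distance')
clf.fit(X_scaled, y)
# ...then build the meshgrid over X_scaled's min/max and plot as before.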

Python/Pandas how to fill a mesh grid when plotting a decision boundary

I am working with Python/Pandas and I'm trying to plot a decision boundary. According to [1] and [2], this is considered best practice:
import numpy as np
import matplotlib.pyplot as plt

# X - some data in a 2-dimensional np.array
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
h = 0.02  # mesh step size (undefined in the original snippet)
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))

# here "model" is your model's prediction (classification) function
Z = model(np.c_[xx.ravel(), yy.ravel()])

# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.Paired)
plt.axis('off')

# Plot also the training points
plt.scatter(X[:, 0], X[:, 1], c=Y, cmap=plt.cm.Paired)
My question concerns this line: Z = model(np.c_[xx.ravel(), yy.ravel()])
My model function currently accepts a feature Series as input (representing a single data point). I need to modify it so that it can accept the mesh grid and return a numpy array with predictions for all points on the grid. How can I do this efficiently? I am not familiar with NumPy data types, and I'm not good at vectorizing operations.
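One straightforward adaptation, sketched under the assumption that your existing single-point function is called predict_one and takes a pandas Series of the two features (both names are hypothetical):

import numpy as np
import pandas as pd

def model(points):
    """Predict labels for an (n, 2) array of mesh points, one row at a time.

    predict_one is the hypothetical existing function that accepts a
    pandas Series with the two feature values and returns a class label.
    """
    return np.array([
        predict_one(pd.Series(row, index=['feature1', 'feature2']))
        for row in points
    ])

Z = model(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

This keeps the per-point function unchanged but makes one Python call per mesh point, which is slow on fine grids; if the function reduces to NumPy arithmetic, rewriting it to operate on whole columns (points[:, 0], points[:, 1]) at once is the efficient route.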
