Why is this Linear Classifier algorithm wrong?

I generate n points and label each one +1 or -1. I store everything in a dictionary that looks like {'point1' : [(0.565,-0.676), +1], ... }. I am trying to find a line that separates them, i.e. points labeled +1 lie above the line and points labeled -1 lie below it. Can anyone help?
I'm trying to apply w = w + y(r) as the "learning algorithm", where w is the weight vector, y is the label (+1 or -1), and r is the point.
The code runs, but the resulting line does not separate the points correctly. Also, as I increase the number of points to separate, the line gets less accurate.
If you run the code, the green line is supposed to be the separating line. The closer its slope gets to the slope of the blue line (the perfect line by definition), the better.
from matplotlib import pyplot as plt
import numpy as np
import random
n = 4
x_values = [round(random.uniform(-1,1),3) for _ in range(n)]
y_values = [round(random.uniform(-1,1),3) for _ in range(n)]
pts10 = zip(x_values, y_values)
label_dict = {}
x1, y1, x2, y2 = (round(random.uniform(-1,1),3) for _ in range(4))
b = [x1, y1]
d = [x2, y2]
slope, intercept = np.polyfit(b, d, 1)
fig, ax = plt.subplots(figsize=(8,8))
ax.scatter(*zip(*pts10), color = 'black')
ax.plot(b,d,'b-')
label_plus = '+'
label_minus = '--'
i = 1
for x,y in pts10:
    if(y > (slope*x + intercept)):
        ax.annotate(label_plus, xy=(x,y), xytext=(0, -10), textcoords='offset points', color = 'blue', ha='center', va='center')
        label_dict['point{}'.format(i)] = [(x,y), "+1"]
    else:
        ax.annotate(label_minus, xy=(x,y), xytext=(0, -10), textcoords='offset points', color = 'red', ha='center', va='center')
        label_dict['point{}'.format(i)] = [(x,y), "-1"]
    i += 1
# this is the algorithm
def check(ww,rr):
    while(np.dot(ww,rr) >= 0):
        print "being refined 1"
        ww = np.subtract(ww,rr)
    return ww
def check_two(ww,rr):
    while(np.dot(ww,rr) < 0):
        print "being refined 2"
        ww = np.add(ww,rr)
    return ww
w = np.array([0,0])
ii = 1
for x,y in pts10:
    r = np.array([x,y])
    print w
    if (np.dot(w,r) >= 0) != int(label_dict['point{}'.format(ii)][1]) < 0:
        print "Point " + str(ii) + " should have been below the line"
        w = np.subtract(w,r)
        w = check(w,r)
    elif (np.dot(w,r) < 0) != int(label_dict['point{}'.format(ii)][1]) >= 0:
        print "Point " + str(ii) + " should have been above the line"
        w = np.add(w,r)
        w = check_two(w,r)
    else:
        print "Point " + str(ii) + " is in the correct position"
    ii += 1
ax.plot(w,'g--')
ax.set_xlabel('X-axis')
ax.set_ylabel('Y-axis')
ax.set_title('Labelling 10 points')
ax.set_xticks(np.arange(-1, 1.1, 0.2))
ax.set_yticks(np.arange(-1, 1.1, 0.2))
ax.set_xlim(-1, 1)
ax.set_ylim(-1, 1)
ax.legend()

You can for example use the SGDClassifier from scikit-learn (sklearn). The linear classifiers compute predictions as follows (see the source code):
def predict(self, X):
    scores = self.decision_function(X)
    if len(scores.shape) == 1:
        indices = (scores > 0).astype(np.int)
    else:
        indices = scores.argmax(axis=1)
    return self.classes_[indices]
where the decision_function is given by:
def decision_function(self, X):
    [...]
    scores = safe_sparse_dot(X, self.coef_.T,
                             dense_output=True) + self.intercept_
    return scores.ravel() if scores.shape[1] == 1 else scores
So for the two-dimensional case of your example this means that a data point is classified +1 if
x*w1 + y*w2 + i > 0
where
x, y = X
w1, w2 = self.coef_
i = self.intercept_
and -1 otherwise. So the decision depends on x*w1 + y*w2 + i being greater than or less than (or equal to) zero. Thus the "border" is found by setting x*w1 + y*w2 + i == 0. We are free to choose one of the components and the other one is determined by this equation.
The following snippet fits an SGDClassifier and plots the resulting "border". It assumes that the data points are scattered around the origin (x, y = 0, 0), i.e. that their mean is (approximately) zero. For general data, one should first subtract the mean from the data points, then perform the fit, and then add the mean back to the result. The snippet below simply generates points that are already scattered around the origin.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import SGDClassifier
n = 100
x = np.random.uniform(-1, 1, size=(n, 2))
# We assume points are scattered around zero.
b = np.zeros(2)
d = np.random.uniform(-1, 1, size=2)
slope, intercept = (d[1] / d[0]), 0.
fig, ax = plt.subplots(figsize=(8,8))
ax.scatter(x[:, 0], x[:, 1], color = 'black')
ax.plot([b[0], d[0]], [b[1], d[1]], 'b-', label='Ideal')
labels = []
for point in x:
    if(point[1] > (slope * point[0] + intercept)):
        ax.annotate('+', xy=point, xytext=(0, -10), textcoords='offset points', color = 'blue', ha='center', va='center')
        labels.append(1)
    else:
        ax.annotate('--', xy=point, xytext=(0, -10), textcoords='offset points', color = 'red', ha='center', va='center')
        labels.append(-1)
labels = np.array(labels)
classifier = SGDClassifier()
classifier.fit(x, labels)
x1 = np.random.uniform(-1, 1)
x2 = (-classifier.intercept_ - x1 * classifier.coef_[0, 0]) / classifier.coef_[0, 1]
ax.plot([0, x1], [0, x2], 'g--', label='Fit')
plt.legend()
plt.show()
This plot shows the result for n = 100 data points:
The following plot shows the results for different n where the points have been chosen randomly from the pool which contains 1000 data points:
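As a small illustration of the "border" equation x*w1 + y*w2 + i == 0 discussed above, here is a minimal sketch that solves it for y at the two ends of the plotting range, so the line can be drawn even when the intercept is not zero. The function name boundary_points and the coefficient values are assumptions made for this example; with a fitted model you would pass classifier.coef_[0, 0], classifier.coef_[0, 1] and classifier.intercept_[0] instead.

```python
def boundary_points(w1, w2, intercept, x_min=-1.0, x_max=1.0):
    """Return two points on the line x*w1 + y*w2 + intercept == 0."""
    if w2 == 0:
        # The border is vertical, so y cannot be expressed as a function of x.
        raise ValueError("w2 is zero: solve for x instead of y")

    def y_at(x):
        return -(x * w1 + intercept) / w2

    return (x_min, y_at(x_min)), (x_max, y_at(x_max))


# Hypothetical coefficients, standing in for a fitted classifier's values.
p_left, p_right = boundary_points(w1=2.0, w2=-1.0, intercept=0.5)
print(p_left, p_right)
```

The two returned points can be handed to ax.plot([p_left[0], p_right[0]], [p_left[1], p_right[1]], 'g--') in place of a segment that starts at the origin.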

This is the answer I've come up with. Some notes on what I realised:
The w = w + y(r) algorithm only works for normalised vectors. Here w is the weight vector, r is the point in question (in the code below it is written as [x, y, 1], so that w also carries an intercept term), and y is the sign of the label.
You can find the slope and intercept from the resulting vector w by putting its coefficients in the form ax+by+c = 0 and solving for y.
w = np.array([0,0,0])
restart = True
while restart:
    ii = 0
    restart = False
    for x,y in pts10:
        if(restart == False):
            ii += 1
        r = np.array([x,y,1])
        if (np.dot(w,r) >= 0) and int(label_dict['point{}'.format(ii)][1]) >= 0:
            print "Point " + str(ii) + " is correctly above the line --> no adjustments"
        elif (np.dot(w,r) < 0) and int(label_dict['point{}'.format(ii)][1]) < 0:
            print "Point " + str(ii) + " is correctly below the line --> no adjustments"
        elif (np.dot(w,r) >= 0) and int(label_dict['point{}'.format(ii)][1]) < 0:
            print "Point " + str(ii) + " should have been below the line"
            w = np.subtract(w,r)
            restart = True
            break
        elif (np.dot(w,r) < 0) and int(label_dict['point{}'.format(ii)][1]) >= 0:
            print "Point " + str(ii) + " should have been above the line"
            w = np.add(w,r)
            restart = True
            break
        else:
            print "THERE IS AN ERROR, A POINT PASSED THROUGH HERE"
print w
slope_w = (-w[0])/w[1]
intercept_w = (-w[2])/w[1]
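For comparison, the same idea can be sketched without the dictionary and the restart flag. This is only an illustration under stated assumptions, not the code from the question: the function name train_perceptron, the max_epochs cap, and the generated data (points labeled by the line y = 0.5*x + 0.1, with points closer than 0.2 to that line discarded so the classes are clearly separable) are all made up for the example. It uses the same update w = w + y*r on points extended to [x, y, 1] and keeps making passes until a full pass needs no correction.

```python
import random


def train_perceptron(points, labels, max_epochs=1000):
    """Apply w <- w + y * r until every point is on the correct side.

    Each point (x, y) is extended to r = (x, y, 1) so w[2] is the intercept term.
    """
    w = [0.0, 0.0, 0.0]
    for _ in range(max_epochs):
        mistakes = 0
        for (px, py), label in zip(points, labels):
            r = (px, py, 1.0)
            activation = sum(wi * ri for wi, ri in zip(w, r))
            if label * activation <= 0:  # wrong side, or exactly on the line
                w = [wi + label * ri for wi, ri in zip(w, r)]
                mistakes += 1
        if mistakes == 0:  # a full pass without corrections: done
            break
    return w


# Hypothetical data: label by the line y = 0.5*x + 0.1 and keep only points
# at least 0.2 away from it, so a separating line with a clear margin exists.
random.seed(0)
points, labels = [], []
while len(points) < 40:
    px, py = random.uniform(-1, 1), random.uniform(-1, 1)
    gap = py - (0.5 * px + 0.1)
    if abs(gap) >= 0.2:
        points.append((px, py))
        labels.append(1 if gap > 0 else -1)

w = train_perceptron(points, labels)
misclassified = sum(
    1 for (px, py), label in zip(points, labels)
    if label * (w[0] * px + w[1] * py + w[2]) <= 0
)
if w[1] != 0:
    # Same conversion as above: a*x + b*y + c = 0 solved for y.
    slope_w, intercept_w = -w[0] / w[1], -w[2] / w[1]
print(w, misclassified)
```

Sweeping over all points repeatedly, rather than restarting from the first point after every correction, does the same kind of update with less bookkeeping.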

Related

Is there a way I can plot a figure at initial and final conditions?

The code I am using works and is written correctly; I only want plots of the initial and final conditions (time = 0 and time = .01).
Whenever I run the code to show the plot at n=0,10 I get the error "show() got an unexpected keyword argument 'n'."
import numpy as np
import matplotlib.pyplot as plt
n = 10 #number of timesteps
dt = .001 #(timestep)
L = 1.0 #domain (total length)
dx = 0.1 #spatial resolution
T0 = float(q + 1)
T1s = float(q + 1 - r)
T2s = float(q + 1 + s)
t_final = n*dt
alpha = float(p + 1)
x = np.linspace(0, L, n)
T = np.ones(n)*T0
dTdt = np.empty(n)
t = np.arange(0,t_final, dt)
for j in range(1,len(t)):
    plt.clf()
    for i in range(1,n-1):
        dTdt[i] = alpha*(-(T[i]-T[i-1])/dx**2+(T[i+1]-T[i])/dx**2)
    dTdt[0] = alpha*(-(T[0]-T1s)/dx**2+(T[1]-T[0])/dx**2)
    dTdt[n-1] = alpha*(-(T[n-1]-T[n-2])/dx**2+(T2s-T[n-1])/dx**2)
    T = T + dTdt*dt
    plt.figure(1)
    plt.plot(x,T)
    plt.axis([0, L, 0, 14])
    plt.xlabel('Distance')
    plt.ylabel('Temperature')
    plt.show(n=0)
    plt.show(n=10)

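That error occurs because plt.show() has no keyword argument named n; matplotlib doesn't know what "n" is. I suspect what you want is to replace the last seven lines with the block below. Note that the loop starts at j = 1, so a test for j == 0 would never be true: j == 1 is the first step that actually runs, and len(t) - 1 is the last one. To show the true initial condition (time = 0) you would have to plot T once before the loop.
    if j == 1 or j == len(t) - 1:
        plt.figure(1)
        plt.plot(x,T)
        plt.axis([0, L, 0, 14])
        plt.xlabel('Distance')
        plt.ylabel('Temperature')
        plt.show()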
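Another way to get exactly the initial and final profiles is to store copies of T at the steps of interest and plot them after the loop. The sketch below is only an illustration: the names simulate, plot_snapshots and keep_steps are made up, and the values of alpha, T0, T1s and T2s are placeholders, because q, r, s and p are not defined in the question. The update itself is the same finite-difference step as in the question.

```python
def simulate(n=10, dt=0.001, dx=0.1, alpha=1.0, T0=5.0, T1s=3.0, T2s=8.0,
             keep_steps=(0, 10)):
    """Explicit finite-difference update; returns {step: temperature list}."""
    T = [T0] * n
    snapshots = {}
    if 0 in keep_steps:
        snapshots[0] = list(T)  # initial condition, before any update
    for step in range(1, max(keep_steps) + 1):
        dTdt = [0.0] * n
        for i in range(n):
            left = T[i - 1] if i > 0 else T1s       # boundary value on the left
            right = T[i + 1] if i < n - 1 else T2s  # boundary value on the right
            dTdt[i] = alpha * (left - 2 * T[i] + right) / dx ** 2
        T = [Ti + d * dt for Ti, d in zip(T, dTdt)]
        if step in keep_steps:
            snapshots[step] = list(T)
    return snapshots


def plot_snapshots(snapshots, L=1.0):
    """Draw every stored profile on one figure and show it once."""
    import matplotlib.pyplot as plt  # only needed for drawing
    for step, T in sorted(snapshots.items()):
        x = [L * i / (len(T) - 1) for i in range(len(T))]
        plt.plot(x, T, label="step {}".format(step))
    plt.xlabel('Distance')
    plt.ylabel('Temperature')
    plt.legend()
    plt.show()


snapshots = simulate()
print(sorted(snapshots))
```

Calling plot_snapshots(snapshots) then draws both curves in a single figure, with one plt.show() call and no keyword arguments.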

Perceptron Algorithm plotting with matplotlib

In an ML course I'm taking, I have 100 entries of data, and I'm using them in a perceptron algorithm.
What I want is to show a plot like this one.
As you can see above, the data is represented by red and blue points, together with the successive calculated lines that reduce the error. This is the output that I want. Here is my data and my code.
data.csv
0.78051,-0.063669,1
0.28774,0.29139,1
0.40714,0.17878,1
0.2923,0.4217,1
0.50922,0.35256,1
0.27785,0.10802,1
0.27527,0.33223,1
0.43999,0.31245,1
0.33557,0.42984,1
0.23448,0.24986,1
0.0084492,0.13658,1
0.12419,0.33595,1
0.25644,0.42624,1
0.4591,0.40426,1
0.44547,0.45117,1
0.42218,0.20118,1
0.49563,0.21445,1
0.30848,0.24306,1
0.39707,0.44438,1
0.32945,0.39217,1
0.40739,0.40271,1
0.3106,0.50702,1
0.49638,0.45384,1
0.10073,0.32053,1
0.69907,0.37307,1
0.29767,0.69648,1
0.15099,0.57341,1
0.16427,0.27759,1
0.33259,0.055964,1
0.53741,0.28637,1
0.19503,0.36879,1
0.40278,0.035148,1
0.21296,0.55169,1
0.48447,0.56991,1
0.25476,0.34596,1
0.21726,0.28641,1
0.67078,0.46538,1
0.3815,0.4622,1
0.53838,0.32774,1
0.4849,0.26071,1
0.37095,0.38809,1
0.54527,0.63911,1
0.32149,0.12007,1
0.42216,0.61666,1
0.10194,0.060408,1
0.15254,0.2168,1
0.45558,0.43769,1
0.28488,0.52142,1
0.27633,0.21264,1
0.39748,0.31902,1
0.5533,1,0
0.44274,0.59205,0
0.85176,0.6612,0
0.60436,0.86605,0
0.68243,0.48301,0
1,0.76815,0
0.72989,0.8107,0
0.67377,0.77975,0
0.78761,0.58177,0
0.71442,0.7668,0
0.49379,0.54226,0
0.78974,0.74233,0
0.67905,0.60921,0
0.6642,0.72519,0
0.79396,0.56789,0
0.70758,0.76022,0
0.59421,0.61857,0
0.49364,0.56224,0
0.77707,0.35025,0
0.79785,0.76921,0
0.70876,0.96764,0
0.69176,0.60865,0
0.66408,0.92075,0
0.65973,0.66666,0
0.64574,0.56845,0
0.89639,0.7085,0
0.85476,0.63167,0
0.62091,0.80424,0
0.79057,0.56108,0
0.58935,0.71582,0
0.56846,0.7406,0
0.65912,0.71548,0
0.70938,0.74041,0
0.59154,0.62927,0
0.45829,0.4641,0
0.79982,0.74847,0
0.60974,0.54757,0
0.68127,0.86985,0
0.76694,0.64736,0
0.69048,0.83058,0
0.68122,0.96541,0
0.73229,0.64245,0
0.76145,0.60138,0
0.58985,0.86955,0
0.73145,0.74516,0
0.77029,0.7014,0
0.73156,0.71782,0
0.44556,0.57991,0
0.85275,0.85987,0
0.51912,0.62359,0
And now this is my code. The first part
import numpy as np
import pandas as pd
# Setting the random seed, feel free to change it and see different solutions.
np.random.seed(42)
import matplotlib.pyplot as plt
def stepFunction(t):
    return 1 if t >= 0 else 0
def prediction(X, W, b):
    return stepFunction((np.matmul(X, W) + b)[0])
# TODO: Fill in the code below to implement the perceptron trick.
# INPUTS
# data X, the labels y,
# the weights W (as an array), and the bias b,
# The function should update the weights and bias W, b, according to the perceptron algorithm,
# and return W and b.
def perceptronStep(X, y, W, b, learn_rate=0.01):
    for i in range(len(X)):
        y_hat = prediction(X[i], W, b)
        if y[i] - y_hat == 1:
            W[0] += X[i][0] * learn_rate
            W[1] += X[i][1] * learn_rate
            b += learn_rate
        elif y[i] - y_hat == -1:
            W[0] -= X[i][0] * learn_rate
            W[1] -= X[i][1] * learn_rate
            b -= learn_rate
    return W, b
# This function runs the perceptron algorithm repeatedly on the dataset,
# and returns a few of the boundary lines obtained in the iterations,
# for plotting purposes.
# Feel free to play with the learning rate and the num_epochs,
# and see your results plotted below.
def trainPerceptronAlgorithm(X, y, learn_rate=0.01, num_epochs=25):
    x_min, x_max = min(X.T[0]), max(X.T[0])
    y_min, y_max = min(X.T[1]), max(X.T[1])
    W = np.array(np.random.rand(2, 1))
    b = np.random.rand(1)[0] + x_max
    # These are the solution lines that get plotted below.
    boundary_lines = []
    for i in range(num_epochs):
        # In each epoch, we apply the perceptron step.
        W, b = perceptronStep(X, y, W, b, learn_rate)
        # Here I have a doubt . Why if y = W0*x1 + W1*x2 + b
        # So we can get x2 =y/W1 -(W0*x1)/W1 -b/W1 + y/W1)
        # If we remove y/W1 we just get intercept and slope
        # But why we are not using the last term y/W1
        boundary_lines.append((-W[0] / W[1], -b / W[1]))
    return boundary_lines
# Get data and plot the points
data = pd.read_csv('data.csv', header = None)
X = data.iloc[:, :2].values
y = data.iloc[:, -1].values
x1 = X[:, 0]
x2 = X[:, 1]
color = ['red' if value == 1 else 'blue' for value in y]
plt.scatter(x1, x2, marker='o', color=color)
plt.xlabel('X1 input feature')
plt.ylabel('X2 input feature')
plt.title('Perceptron regression for X1, X2')
plt.show()
When you run this code you correctly get
So now I want to plot, in the same figure, the lines that represent the best function for each iteration. For that, I commented out the plt.show() line above and did
# So now lets plot the lines that represent the best function for each iteration
boundary_lines = trainPerceptronAlgorithm(X, y)
x_lin = np.linspace(0, 1, 100)
for line in boundary_lines:
    Θo, Θ1 = line
    Θ1 = Θ1[0]
    Θo = Θo[0]
    # TODO: The equation of the error function is
    # y = W0*x1 + W1*x2 + b
    # So we can get x2 =y/W1 -(W0*x1)/W1 -b/W1 + y/W1)
    # If we remove y/W1 we just get intercept and slope
    # boundary_lines.append((-W[0] / W[1], -b / W[1])
    # plt.axes([-0.5, -0.5, 1.5, 1.5])
    plt.plot(x_lin, (Θ1 * x_lin / Θo))
plt.draw()
plt.pause(5)
input("Press enter to continue")
plt.close()
But that does not get me the expected result.
Why doesn't this get the expected result?

The mistake is in plt.plot(x_lin, (Θ1 * x_lin / Θo)) where instead of Θ1 * x_lin / Θo you should have Θo * x_lin + Θ1.
fig, ax = plt.subplots(1, 1, figsize=(8,5))
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.scatter(x1, x2, marker='o', color=color)
for i, line in enumerate(boundary_lines):
    Θo, Θ1 = line
    if i == len(boundary_lines) - 1:
        c, ls, lw = 'k', '-', 2
    else:
        c, ls, lw = 'g', '--', 1.5
    ax.plot(x_lin, Θo * x_lin + Θ1, c=c, ls=ls, lw=lw)
plt.show()
Result:
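On the doubt raised in the question's comments (why the y/W1 term is not used): the boundary is the set of points where the perceptron score y = W0*x1 + W1*x2 + b is exactly zero, so the y/W1 term vanishes and what is left is x2 = -(W0/W1)*x1 - b/W1. That is the (slope, intercept) pair stored in boundary_lines. Here is a minimal check of that, with made-up weights; the helper name boundary_slope_intercept and the values of W and b are assumptions for the example.

```python
def boundary_slope_intercept(W, b):
    """Solve W[0]*x1 + W[1]*x2 + b = 0 for x2 = slope*x1 + intercept."""
    return -W[0] / W[1], -b / W[1]


# Hypothetical weights and bias, for illustration only.
W, b = [0.6, 0.8], -0.7
slope, intercept = boundary_slope_intercept(W, b)

# Every point on the plotted line has a perceptron score of (numerically) zero.
scores = [W[0] * x1 + W[1] * (slope * x1 + intercept) + b
          for x1 in (0.0, 0.5, 1.0)]
print(slope, intercept, scores)
```

So the first element of each tuple is a slope and the second an intercept, which is why the line has to be plotted as Θo * x_lin + Θ1.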

Set numeric interval limits as input to plotting function in Matplotlib

I am plotting this figure, but I would like to experiment with the intervals.
However, I don't want to manually modify the legend, DataFrame column names, and other variables every single time. Ideally I would pass the ranges ("<", "<=", ">=") as input arguments. Is this possible in Python?
The code:
def plotHistDistances(pat_name, lesion_id, rootdir, distanceMap, num_voxels, title, ablation_date):
    # PLOT THE HISTOGRAM FOR THE MAURER EUCLIDEAN DISTANCES
    lesion_id_str = str(lesion_id)
    lesion_id = lesion_id_str.split('.')[0]
    figName_hist = 'Pat_' + str(pat_name) + '_Lesion' + str(lesion_id) + '_AblationDate_' + ablation_date + '_histogram'
    min_val = int(np.floor(min(distanceMap)))
    max_val = int(np.ceil(max(distanceMap)))
    fig, ax = plt.subplots(figsize=(18, 16))
    col_height, bins, patches = ax.hist(distanceMap, ec='darkgrey', bins=range(min_val - 1, max_val + 1))
    voxels_nonablated = []
    voxels_insuffablated = []
    voxels_ablated = []
    for b, p, col_val in zip(bins, patches, col_height):
        if b < 0:
            voxels_nonablated.append(col_val)
        elif 0 <= b <= 5:
            voxels_insuffablated.append(col_val)
        elif b > 5:
            voxels_ablated.append(col_val)
    # %%
    '''calculate the total percentage of surface for ablated, non-ablated, insufficiently ablated'''
    voxels_nonablated = np.asarray(voxels_nonablated)
    voxels_insuffablated = np.asarray(voxels_insuffablated)
    voxels_ablated = np.asarray(voxels_ablated)
    sum_perc_nonablated = ((voxels_nonablated / num_voxels) * 100).sum()
    sum_perc_insuffablated = ((voxels_insuffablated / num_voxels) * 100).sum()
    sum_perc_ablated = ((voxels_ablated / num_voxels) * 100).sum()
    # %%
    '''iterate through the bins to change the colors of the patches based on the range [mm]'''
    for b, p, col_val in zip(bins, patches, col_height):
        if b < 0:
            plt.setp(p, label='Ablation Surface Margin ' + r'$x < 0$' + 'mm :' + " %.2f" % sum_perc_nonablated + '%')
        elif 0 <= b <= 5:
            plt.setp(p, 'facecolor', 'orange',
                     label='Ablation Surface Margin ' + r'$0 \leq x \leq 5$' + 'mm: ' + "%.2f" % sum_perc_insuffablated + '%')
        elif b > 5:
            plt.setp(p, 'facecolor', 'darkgreen',
                     label='Ablation Surface Margin ' + r'$x > 5$' + 'mm: ' + " %.2f" % sum_perc_ablated + '%')
    # %%
    '''edit the axes limits and labels'''
    plt.xlabel('Euclidean Distances [mm]', fontsize=30, color='black')
    plt.tick_params(labelsize=28, color='black')
    ax.tick_params(colors='black', labelsize=28)
    plt.grid(True)
    # TODO: set equal axis limits
    ax.set_xlim([-15, 15])
    # edit the y-ticks: change to percentage of surface
    yticks, locs = plt.yticks()
    percent = (yticks / num_voxels) * 100
    percentage_surface_rounded = np.round(percent)
    yticks_percent = [str(x) + '%' for x in percentage_surface_rounded]
    new_yticks = (percentage_surface_rounded * yticks) / percent
    new_yticks[0] = 0
    plt.yticks(new_yticks, yticks_percent)
    # plt.yticks(yticks,yticks_percent)
    plt.ylabel('Percentage of tumor surface voxels', fontsize=30, color='black')
    handles, labels = plt.gca().get_legend_handles_labels()
    by_label = OrderedDict(zip(labels, handles))
    plt.legend(by_label.values(), by_label.keys(), fontsize=30, loc='best')
    plt.title(title + '. Patient ' + str(pat_name) + '. Lesion ' + str(lesion_id), fontsize=30)
The figure:
So I would like to send the intervals you see in legend as input here:
def plotHistDistances(pat_name, lesion_id, rootdir, distanceMap,
num_voxels, title, ablation_date, interval_limits):

The idea is to parameterize the range boundaries (i.e. 0 and 5 in your sample code) as interval_limits. To do so, I have assumed that the parameter interval_limits will be a list of 2 values of the form [min_value, max_value]. Concretely, for your sample, interval_limits should be the list [0, 5]:
interval_limits = [0, 5]
Based on that assumption, I have modified your code slightly. Note the new block, where I assign the first element of interval_limits to a new variable min_limit and the second element to another new variable max_limit. I have also modified the label strings to use the '%.2f' format (feel free to change it to whatever format you want).
Here's the code:
def plotHistDistances(pat_name, lesion_id, rootdir, distanceMap, num_voxels, title, ablation_date, interval_limits):
    ##########################################
    # NEW CODE SECTION
    ##########################################
    # Check if interval_limits contains all the limits
    if len(interval_limits) != 2:
        raise ValueError("2 limits are expected, got {} instead.".format(len(interval_limits)))
    # Assign the limits
    min_limit = interval_limits[0]
    max_limit = interval_limits[1]
    ##########################################
    # END OF NEW CODE SECTION
    ##########################################
    # PLOT THE HISTOGRAM FOR THE MAURER EUCLIDEAN DISTANCES
    lesion_id_str = str(lesion_id)
    lesion_id = lesion_id_str.split('.')[0]
    figName_hist = 'Pat_' + str(pat_name) + '_Lesion' + str(lesion_id) + '_AblationDate_' + ablation_date + '_histogram'
    min_val = int(np.floor(min(distanceMap)))
    max_val = int(np.ceil(max(distanceMap)))
    fig, ax = plt.subplots(figsize=(18, 16))
    col_height, bins, patches = ax.hist(distanceMap, ec='darkgrey', bins=range(min_val - 1, max_val + 1))
    voxels_nonablated = []
    voxels_insuffablated = []
    voxels_ablated = []
    for b, p, col_val in zip(bins, patches, col_height):
        if b < min_limit:
            voxels_nonablated.append(col_val)
        elif min_limit <= b <= max_limit:
            voxels_insuffablated.append(col_val)
        elif b > max_limit:
            voxels_ablated.append(col_val)
    # %%
    '''calculate the total percentage of surface for ablated, non-ablated, insufficiently ablated'''
    voxels_nonablated = np.asarray(voxels_nonablated)
    voxels_insuffablated = np.asarray(voxels_insuffablated)
    voxels_ablated = np.asarray(voxels_ablated)
    sum_perc_nonablated = ((voxels_nonablated / num_voxels) * 100).sum()
    sum_perc_insuffablated = ((voxels_insuffablated / num_voxels) * 100).sum()
    sum_perc_ablated = ((voxels_ablated / num_voxels) * 100).sum()
    # %%
    '''iterate through the bins to change the colors of the patches based on the range [mm]'''
    for b, p, col_val in zip(bins, patches, col_height):
        if b < min_limit:
            plt.setp(p, label='Ablation Surface Margin ' + r'$x < %.2f$' % min_limit + 'mm :' + " %.2f" % sum_perc_nonablated + '%')
        elif min_limit <= b <= max_limit:
            plt.setp(p, 'facecolor', 'orange',
                     label='Ablation Surface Margin ' + r'$%.2f \leq x \leq %.2f$' % (min_limit, max_limit) + 'mm: ' + "%.2f" % sum_perc_insuffablated + '%')
        elif b > max_limit:
            plt.setp(p, 'facecolor', 'darkgreen',
                     label='Ablation Surface Margin ' + r'$x > %.2f$' % max_limit + 'mm: ' + " %.2f" % sum_perc_ablated + '%')
    # %%
    '''edit the axes limits and labels'''
    plt.xlabel('Euclidean Distances [mm]', fontsize=30, color='black')
    plt.tick_params(labelsize=28, color='black')
    ax.tick_params(colors='black', labelsize=28)
    plt.grid(True)
    # TODO: set equal axis limits
    ax.set_xlim([-15, 15])
    # edit the y-ticks: change to percentage of surface
    yticks, locs = plt.yticks()
    percent = (yticks / num_voxels) * 100
    percentage_surface_rounded = np.round(percent)
    yticks_percent = [str(x) + '%' for x in percentage_surface_rounded]
    new_yticks = (percentage_surface_rounded * yticks) / percent
    new_yticks[0] = 0
    plt.yticks(new_yticks, yticks_percent)
    # plt.yticks(yticks,yticks_percent)
    plt.ylabel('Percentage of tumor surface voxels', fontsize=30, color='black')
    handles, labels = plt.gca().get_legend_handles_labels()
    by_label = OrderedDict(zip(labels, handles))
    plt.legend(by_label.values(), by_label.keys(), fontsize=30, loc='best')
    plt.title(title + '. Patient ' + str(pat_name) + '. Lesion ' + str(lesion_id), fontsize=30)
Disclaimer: I have not tested this code, since I do not have the full set of parameters needed to reproduce the result, but it should work. If it doesn't, feel free to share the parameters you use and I will see how I can fix the issue.
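Because the full function needs data that isn't available here, the interval logic can be tried on its own first. The following is a small sketch of just that part: the helper names classify_bins and legend_labels, the group keys and the sample bin edges are all made up for this illustration, while the comparisons and the '%.2f' label format follow the code above.

```python
def classify_bins(bin_edges, interval_limits):
    """Split bin edges into below / inside / above [min_limit, max_limit]."""
    if len(interval_limits) != 2:
        raise ValueError("2 limits are expected, got {} instead.".format(len(interval_limits)))
    min_limit, max_limit = interval_limits
    groups = {"below": [], "inside": [], "above": []}
    for edge in bin_edges:
        if edge < min_limit:
            groups["below"].append(edge)
        elif edge <= max_limit:
            groups["inside"].append(edge)
        else:
            groups["above"].append(edge)
    return groups


def legend_labels(interval_limits):
    """Build the legend text for each group from the same two limits."""
    min_limit, max_limit = interval_limits
    return {
        "below": r"$x < %.2f$ mm" % min_limit,
        "inside": r"$%.2f \leq x \leq %.2f$ mm" % (min_limit, max_limit),
        "above": r"$x > %.2f$ mm" % max_limit,
    }


# Hypothetical bin edges from -3 to 8 with the limits used in the question.
groups = classify_bins(range(-3, 9), [0, 5])
labels = legend_labels([0, 5])
print(groups)
print(labels)
```

Changing the second argument, for example to [0, 10], moves both the grouping and the legend text together, which is the point of passing interval_limits in.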

Classification perceptron implementation

I have written a perceptron example in Python, based on the one from here.
Here is the complete code
import matplotlib.pyplot as plt
import random as rnd
import matplotlib.animation as animation
NUM_POINTS = 5
LEANING_RATE=0.1
fig = plt.figure() # an empty figure with no axes
ax1 = fig.add_subplot(1,1,1)
plt.xlim(0, 120)
plt.ylim(0, 120)
points = []
weights = [rnd.uniform(-1,1),rnd.uniform(-1,1),rnd.uniform(-1,1)]
circles = []
plt.plot([x for x in range(100)], [x for x in range(100)])
for i in range(NUM_POINTS):
    x = rnd.uniform(1, 100)
    y = rnd.uniform(1, 100)
    circ = plt.Circle((x, y), radius=1, fill=False, color='g')
    ax1.add_patch(circ)
    points.append((x,y,1))
    circles.append(circ)
def activation(val):
    if val >= 0:
        return 1
    else:
        return -1;
def guess(pt):
    vsum = 0
    #x and y and bias weights
    vsum = vsum + pt[0] * weights[0]
    vsum = vsum + pt[1] * weights[1]
    vsum = vsum + pt[2] * weights[2]
    gs = activation(vsum)
    return gs;
def animate(i):
    for i in range(NUM_POINTS):
        pt = points[i]
        if pt[0] > pt[1]:
            target = 1
        else:
            target = -1
        gs = guess(pt)
        error = target - gs
        if target == gs:
            circles[i].set_color('r')
        else:
            circles[i].set_color('b')
            #adjust weights
            weights[0] = weights[0] + (pt[0] * error * LEANING_RATE)
            weights[1] = weights[1] + (pt[1] * error * LEANING_RATE)
            weights[2] = weights[2] + (pt[2] * error * LEANING_RATE)
ani = animation.FuncAnimation(fig, animate, interval=1000)
plt.show()
I expect the points plotted on the graph to classify themselves as red or blue depending on the expected condition (x coordinate > y coordinate), i.e. above or below the reference line (y=x).
This does not seem to work: all points turn red after some iterations.
What am I doing wrong here? The same approach works in the YouTube example.

I looked at your code and the video, and I believe that, the way your code is written, the points start out green; if their guess matches their target they turn red, and if it doesn't they turn blue. This repeats, with the remaining blue points eventually turning red as their guesses come to match their targets. (The changing weights may turn a red point blue again, but eventually it will be corrected.)
Below is my rework of your code, which slows down the process by adding more points and by processing only one point per frame instead of all of them:
import random as rnd
import matplotlib.pyplot as plt
import matplotlib.animation as animation
NUM_POINTS = 100
LEARNING_RATE = 0.1
X, Y = 0, 1
fig = plt.figure() # an empty figure with no axes
ax1 = fig.add_subplot(1, 1, 1)
plt.xlim(0, 120)
plt.ylim(0, 120)
plt.plot([x for x in range(100)], [y for y in range(100)])
weights = [rnd.uniform(-1, 1), rnd.uniform(-1, 1)]
points = []
circles = []
for i in range(NUM_POINTS):
    x = rnd.uniform(1, 100)
    y = rnd.uniform(1, 100)
    points.append((x, y))
    circle = plt.Circle((x, y), radius=1, fill=False, color='g')
    circles.append(circle)
    ax1.add_patch(circle)
def activation(val):
    if val >= 0:
        return 1
    return -1
def guess(point):
    vsum = 0
    # x and y weights (this rework has no bias weight)
    vsum += point[X] * weights[X]
    vsum += point[Y] * weights[Y]
    return activation(vsum)
def train(point, error):
    # adjust weights
    weights[X] += point[X] * error * LEARNING_RATE
    weights[Y] += point[Y] * error * LEARNING_RATE
point_index = 0
def animate(frame):
    global point_index
    point = points[point_index]
    if point[X] > point[Y]:
        answer = 1 # group A (X > Y)
    else:
        answer = -1 # group B (Y > X)
    guessed = guess(point)
    if answer == guessed:
        circles[point_index].set_color('r')
    else:
        circles[point_index].set_color('b')
        train(point, answer - guessed)
    point_index = (point_index + 1) % NUM_POINTS
ani = animation.FuncAnimation(fig, animate, interval=100)
plt.show()
I tossed the special 0,0 input fix as it doesn't apply for this example.
The bottom line is that if everything is working, they should all turn red. If you want the color to reflect classification, then you can change this clause:
    if answer == guessed:
        circles[point_index].set_color('r' if answer == 1 else 'b')
    else:
        circles[point_index].set_color('g')
        train(point, answer - guessed)

Drawing directions fields

Is there a way to draw direction fields in python?
My attempt is to modify http://www.compdigitec.com/labs/files/slopefields.py giving
#!/usr/bin/python
import math
from subprocess import CalledProcessError, call, check_call
def dy_dx(x, y):
    try:
        # declare your dy/dx here:
        return x**2-x-2
    except ZeroDivisionError:
        return 1000.0
# Adjust window parameters
XMIN = -5.0
XMAX = 5.0
YMIN = -10.0
YMAX = 10.0
XSCL = 0.5
YSCL = 0.5
DISTANCE = 0.1
def main():
    fileobj = open("data.txt", "w")
    for x1 in xrange(int(XMIN / XSCL), int(XMAX / XSCL)):
        for y1 in xrange(int(YMIN / YSCL), int(YMAX / YSCL)):
            x= float(x1 * XSCL)
            y= float(y1 * YSCL)
            slope = dy_dx(x,y)
            dx = math.sqrt( DISTANCE/( 1+math.pow(slope,2) ) )
            dy = slope*dx
            fileobj.write(str(x) + " " + str(y) + " " + str(dx) + " " + str(dy) + "\n")
    fileobj.close()
    try:
        check_call(["gnuplot","-e","set terminal png size 800,600 enhanced font \"Arial,12\"; set xrange [" + str(XMIN) + ":" + str(XMAX) + "]; set yrange [" + str(YMIN) + ":" + str(YMAX) + "]; set output 'output.png'; plot 'data.txt' using 1:2:3:4 with vectors"])
    except (CalledProcessError, OSError):
        print "Error: gnuplot not found on system!"
        exit(1)
    print "Saved image to output.png"
    call(["xdg-open","output.png"])
    return 0
if __name__ == '__main__':
    main()
However, the best image I get from this is the following.
How can I get an output that looks more like the first image? Also, how can I add the three solid lines?

You can use this matplotlib code as a base. Modify it for your needs.
I have updated the code to show arrows of equal length. The important step is to set the angles option of the quiver function, so that the arrows are drawn pointing from (x,y) to (x+u,y+v) (instead of the default, which only takes (u,v) into account when computing the angles).
It is also possible to change the axes from "boxes" to "arrows". Let me know if you need that change and I can add it.
import matplotlib.pyplot as plt
from scipy.integrate import odeint
import numpy as np
fig = plt.figure()
def vf(x, t):
    dx = np.zeros(2)
    dx[0] = 1.0
    dx[1] = x[0] ** 2 - x[0] - 2.0
    return dx
# Solution curves
t0 = 0.0
tEnd = 10.0
# Vector field
X, Y = np.meshgrid(np.linspace(-5, 5, 20), np.linspace(-10, 10, 20))
U = 1.0
V = X ** 2 - X - 2
# Normalize arrows
N = np.sqrt(U ** 2 + V ** 2)
U = U / N
V = V / N
plt.quiver(X, Y, U, V, angles="xy")
t = np.linspace(t0, tEnd, 100)
for y0 in np.linspace(-5.0, 0.0, 10):
    y_initial = [y0, -10.0]
    y = odeint(vf, y_initial, t)
    plt.plot(y[:, 0], y[:, 1], "-")
plt.xlim([-5, 5])
plt.ylim([-10, 10])
plt.xlabel(r"$x$")
plt.ylabel(r"$y$")
plt.show()
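The "Normalize arrows" step above can also be looked at in isolation: for a slope m, the arrow (1, m) divided by sqrt(1 + m**2) has length 1 and still points along the slope. Below is a minimal sketch of that computation in plain Python; the helper name unit_direction and the small sample grid are assumptions made for this illustration.

```python
import math


def unit_direction(slope):
    """Return (u, v) of length 1 pointing along a line with the given slope."""
    norm = math.sqrt(1.0 + slope ** 2)
    return 1.0 / norm, slope / norm


def dy_dx(x, y):
    return x ** 2 - x - 2


# A few hypothetical grid points; each entry is (x, y, u, v).
grid = [(x, y) for x in (-2.0, 0.0, 3.0) for y in (-1.0, 1.0)]
arrows = [(x, y) + unit_direction(dy_dx(x, y)) for x, y in grid]
for arrow in arrows:
    print(arrow)
```

Each (x, y, u, v) row corresponds to one arrow that quiver would draw; since every (u, v) has length 1, steep and flat slopes get arrows of the same size.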

I had a lot of fun making one of these as a hobby project using pygame. I plotted the slope at each pixel, using shades of blue for positive and shades of red for negative. Black is for undefined. This is dy/dx = log(sin(x/y)+cos(y/x)):
You can zoom in and out; here it is zoomed in on the upper middle part:
and also click on a point to graph the line going through that point:
It's just 440 lines of code, so here is the .zip of all the files. I guess I'll excerpt relevant bits here.
The equation itself is input as a valid Python expression in a string, e.g. "log(sin(x/y)+cos(y/x))". This is then compiled. This function here graphs the color field, where self.func.eval() gives the dy/dx at the given point. The code is a bit complicated here because I made it render in stages - first 32x32 blocks, then 16x16, etc. - to make it snappier for the user.
def graphcolorfield(self, sqsizes=[32,16,8,4,2,1]):
    su = ScreenUpdater(50)
    lastskip = self.xscreensize
    quitit = False
    for squaresize in sqsizes:
        xsquaresize = squaresize
        ysquaresize = squaresize
        if squaresize == 1:
            self.screen.lock()
        y = 0
        while y <= self.yscreensize:
            x = 0
            skiprow = y%lastskip == 0
            while x <= self.xscreensize:
                if skiprow and x%lastskip==0:
                    x += squaresize
                    continue
                color = (255,255,255)
                try:
                    m = self.func.eval(*self.ct.untranscoord(x, y))
                    if m >= 0:
                        if m < 1:
                            c = 255 * m
                            color = (0, 0, c)
                        else:
                            #c = 255 - 255 * (1.0/m)
                            #color = (c, c, 255)
                            c = 255 - 255 * (1.0/m)
                            color = (c/2.0, c/2.0, 255)
                    else:
                        pm = -m
                        if pm < 1:
                            c = 255 * pm
                            color = (c, 0, 0)
                        else:
                            c = 255 - 255 * (1.0/pm)
                            color = (255, c/2.0, c/2.0)
                except:
                    color = (0, 0, 0)
                if squaresize > 1:
                    self.screen.fill(color, (x, y, squaresize, squaresize))
                else:
                    self.screen.set_at((x, y), color)
                if su.update():
                    quitit = True
                    break
                x += xsquaresize
            if quitit:
                break
            y += ysquaresize
        if squaresize == 1:
            self.screen.unlock()
        lastskip = squaresize
        if quitit:
            break
This is the code which graphs a line through a point:
def _grapheqhelp(self, sx, sy, stepsize, numsteps, color):
    x = sx
    y = sy
    i = 0
    pygame.draw.line(self.screen, color, (x, y), (x, y), 2)
    while i < numsteps:
        lastx = x
        lasty = y
        try:
            m = self.func.eval(x, y)
        except:
            return
        x += stepsize
        y = y + m * stepsize
        screenx1, screeny1 = self.ct.transcoord(lastx, lasty)
        screenx2, screeny2 = self.ct.transcoord(x, y)
        #print "(%f, %f)-(%f, %f)" % (screenx1, screeny1, screenx2, screeny2)
        try:
            pygame.draw.line(self.screen, color,
                             (screenx1, screeny1),
                             (screenx2, screeny2), 2)
        except:
            return
        i += 1
    stx, sty = self.ct.transcoord(sx, sy)
    pygame.draw.circle(self.screen, color, (int(stx), int(sty)), 3, 0)
And it runs backwards & forwards starting from that point:
def graphequation(self, sx, sy, stepsize=.01, color=(255, 255, 127)):
    """Graph the differential equation, given the starting point sx and sy, for length
    length using stepsize stepsize."""
    numstepsf = (self.xrange[1] - sx) / stepsize
    numstepsb = (sx - self.xrange[0]) / stepsize
    self._grapheqhelp(sx, sy, stepsize, numstepsf, color)
    self._grapheqhelp(sx, sy, -stepsize, numstepsb, color)
I never got around to drawing actual lines because the pixel approach looked too cool.
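Stripped of the pygame drawing, the stepping in _grapheqhelp is Euler's method: evaluate the slope at the current point, move x by the step size, and move y by slope times step size. A negative step size traces the curve backwards. Here is a minimal standalone sketch of that idea; the function name euler_curve, the starting point and the step count are assumptions for the example, and the slope function is the x**2 - x - 2 from the question rather than the one in the screenshots.

```python
def euler_curve(f, x0, y0, step, num_steps):
    """Trace dy/dx = f(x, y) from (x0, y0) with fixed-step Euler updates."""
    xs, ys = [x0], [y0]
    for _ in range(num_steps):
        slope = f(xs[-1], ys[-1])
        xs.append(xs[-1] + step)
        ys.append(ys[-1] + slope * step)
    return xs, ys


def f(x, y):
    return x ** 2 - x - 2


# Forward and backward from the same (hypothetical) starting point (0, 0).
forward_x, forward_y = euler_curve(f, 0.0, 0.0, 0.001, 1000)
backward_x, backward_y = euler_curve(f, 0.0, 0.0, -0.001, 1000)
print(forward_x[-1], forward_y[-1])
print(backward_x[-1], backward_y[-1])
```

The two lists of points can be joined into one curve through the starting point, which is what graphequation does by calling the helper once with stepsize and once with -stepsize.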

Try changing your values for the parameters to this:
XSCL = .2
YSCL = .2
These parameters determine how densely points are sampled along the axes.
As per your comment, you'll also need to plot the functions whose derivative is given by dy_dx(x, y).
Currently, you're only calculating and plotting the slope segments given by your function dy_dx(x,y). You'll need to find the functions (in this case 3) to plot in addition to the slopes.
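Start by defining a function. A solution curve is an antiderivative of dy_dx, which here is x**3/3 - x**2/2 - 2*x plus a constant:
def f1_x(x):
    return x**3/3.0 - x**2/2.0 - 2*x
and then, in your loop, you'll also have to write the desired values of these functions into the fileobj file.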
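Since dy/dx = x**2 - x - 2 depends only on x, the solution curves are the family y = x**3/3 - x**2/2 - 2*x + C. A quick way to make sure a candidate function really matches the slope field is to compare its numerical derivative with dy_dx at a few points. The sketch below does that; the helper names solution and numeric_slope, and the three constants used for the sample curves, are assumptions for this example (the constants behind the three solid lines in the target image are not given).

```python
def dy_dx(x, y):
    return x ** 2 - x - 2


def solution(x, c=0.0):
    """One member of the family y = x**3/3 - x**2/2 - 2*x + c."""
    return x ** 3 / 3.0 - x ** 2 / 2.0 - 2 * x + c


def numeric_slope(func, x, h=1e-6):
    """Central-difference estimate of the derivative of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)


check_points = [-4.0, -1.0, 0.0, 2.0, 4.5]
errors = [abs(numeric_slope(solution, x) - dy_dx(x, 0.0)) for x in check_points]

# Three hypothetical curves, sampled every 0.5 between x = -5 and x = 5.
curves = {
    c: [(x / 2.0, solution(x / 2.0, c)) for x in range(-10, 11)]
    for c in (-5.0, 0.0, 5.0)
}
print(max(errors))
```

The (x, y) pairs in curves can be written to a second data file and plotted with lines on top of the vectors.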
