I have two arrays from which I have to find the accuracy of my prediction.
predictions = [1, 0, 0, 1, 1, 1, 0, 1, 1, 0]
y_test = [1, 0, 0, 1, 0, 1, 0, 1, 1, 1]
so in this case, the accuracy is = (8/10)*100 = 80%
I have written a method to do this task. Here is my code, but I don't get an accuracy of 80% in this case.
def getAccuracy(y_test, predictions):
    correct = 0
    for x in range(len(y_test)):
        if y_test[x] is predictions[x]:
            correct += 1
    return (correct/len(y_test)) * 100.0
Thanks for helping me.
Your code may happen to work, but only for numbers in a specific range that the Python interpreter does not recreate. This is because you used is, which is an identity check, not an equality check: it compares object identity (memory addresses), and those only coincide with value equality for the small integers CPython caches. Use == instead and it will always work.
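For illustration, a minimal sketch of the difference (the small-integer cache is a CPython implementation detail, so the exact boundary is not guaranteed):

x = int("1000")   # construct the ints at runtime to avoid constant folding
y = int("1000")
print(x == y)     # True: the values are equal
print(x is y)     # usually False: two distinct int objects
s = int("5")
t = int("5")
print(s is t)     # usually True: small ints are cached by CPython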
For a more Pythonic solution you can also take a look at list comprehensions:
assert len(predictions) == len(y_test), "Unequal arrays"
accuracy = sum(p == y for p, y in zip(predictions, y_test)) / len(predictions) * 100
For your example this yields 80.0, as expected.
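Put together as a small, self-contained sketch using the data from the question:

predictions = [1, 0, 0, 1, 1, 1, 0, 1, 1, 0]
y_test = [1, 0, 0, 1, 0, 1, 0, 1, 1, 1]

assert len(predictions) == len(y_test), "Unequal arrays"
accuracy = sum(p == y for p, y in zip(predictions, y_test)) / len(predictions) * 100
print(accuracy)  # 80.0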
Your code gives 80.0 as you wanted; however, you should use == instead of is, for the reason explained above.
def getAccuracy(y_test, predictions):
    n = len(y_test)
    correct = 0
    for x in range(n):
        if y_test[x] == predictions[x]:
            correct += 1
    return (correct/n) * 100.0
predictions = [1, 0, 0, 1, 1, 1, 0, 1, 1, 0]
y_test = [1, 0, 0, 1, 0, 1, 0, 1, 1, 1]
print(getAccuracy(y_test, predictions))
80.0
Here's an implementation using Numpy:
import numpy as np
n = len(y_test)
100*np.sum(np.isclose(predictions, y_test))/n
or if you convert your lists to numpy arrays, then
100*np.sum(predictions == y_test)/n
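A complete sketch of the NumPy version, using the lists from the question converted to arrays:

import numpy as np

predictions = np.array([1, 0, 0, 1, 1, 1, 0, 1, 1, 0])
y_test = np.array([1, 0, 0, 1, 0, 1, 0, 1, 1, 1])

n = len(y_test)
accuracy = 100 * np.sum(predictions == y_test) / n
print(accuracy)  # 80.0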
I tried to multiply two polynomials g(x), h(x) ∈ GF(2)[x], and I got the result as c(x). I would like to get the vector representation of c(x).
Here is the code I am sharing.
import galois
GF = galois.GF(2)
g = galois.Poly([1, 0, 1, 1], field=GF)
h = galois.Poly([1, 1, 0], field=GF)
c = g * h
print(c)
The output I am getting is
x^5 + x^4 + x^3 + x
but I would like to get output in vector form, i.e.,
[1, 1, 1, 0, 1, 0]
Any suggestion about how I can get the answer in vector form?
I tried calling the function
GF("c").vector()
but it is giving me the wrong answer.
Use c.coeffs. This gives:
GF([1, 1, 1, 0, 1, 0], order=2)
This form may be satisfactory for whatever you are trying to do.
If not, you can (among other things) turn it into a normal Python list[int] with [int(i) for i in c.coeffs], yielding:
[1, 1, 1, 0, 1, 0]
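A minimal end-to-end sketch with the question's polynomials, assuming the galois package is installed:

import galois

GF = galois.GF(2)
g = galois.Poly([1, 0, 1, 1], field=GF)   # x^3 + x + 1
h = galois.Poly([1, 1, 0], field=GF)      # x^2 + x
c = g * h                                 # x^5 + x^4 + x^3 + x

print(c.coeffs)                           # GF([1, 1, 1, 0, 1, 0], order=2)
print([int(i) for i in c.coeffs])         # [1, 1, 1, 0, 1, 0]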
Can anyone fix the undersampling confusion matrix error coming from this line:
undersample_cm = confusion_matrix(original_ytest, undersample_fraud_predictions)
I think the problem is either the import or the original_ytest and undersample_fraud_predictions arrays.
undersample_cm = confusion_matrix(original_ytest, undersample_fraud_predictions)
actual_cm = confusion_matrix(original_ytest, original_ytest)
labels = ['No Fraud', 'Fraud']
A picture of the error is here.
confusion_matrix works with two arrays of the same size and the same type of data.
y_true = [1, 0, 0, 1, 0, 1]
y_pred = [0, 0, 0, 0, 0, 1]
confusion_matrix(y_true, y_pred)
My guess is that one of your arrays contains continuous values. To check, run the actual_cm line first: it compares original_ytest against itself, so if it works you know original_ytest contains integer labels as it should, and the error most likely comes from continuous values in the undersample_fraud_predictions array.
original_ytest = [1, 0, 0, 1, 0, 1]
undersample_fraud_predictions = [1, 0, 0, 0, 0, 1.5]
confusion_matrix(original_ytest, undersample_fraud_predictions)
When you run the example above, the error you get will be the same: Classification metrics can't handle a mix of binary and continuous targets.
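A common fix, assuming undersample_fraud_predictions holds predicted probabilities rather than class labels, is to threshold them into 0/1 before building the confusion matrix. A minimal sketch (the probability values below are hypothetical):

import numpy as np
from sklearn.metrics import confusion_matrix

original_ytest = [1, 0, 0, 1, 0, 1]
undersample_fraud_predictions = [0.9, 0.1, 0.2, 0.4, 0.3, 0.8]  # hypothetical probabilities

# Convert continuous scores to binary labels at a 0.5 threshold
binary_predictions = (np.array(undersample_fraud_predictions) >= 0.5).astype(int)

undersample_cm = confusion_matrix(original_ytest, binary_predictions)
print(undersample_cm)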
As per the section of code below, I am trying to implement an automatic and random mutation process.
import random

data = [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1]
data[random.randint(0, len(data) - 1)] = random.randrange(0, 1)
print(data)
The code is adapted from some other posts I found, but it randomly mutates a value every time, replacing it with either a 0 or a 1. I need this to occur only with a certain probability (such as a 0.05 chance of mutation) rather than always being guaranteed.
Additionally, a 0 is often replaced with a 0, so there is no change to the output; I would like to restrict it so that a 0 only mutates to a 1 and a 1 only mutates to a 0.
I would really appreciate the assistance in resolving these two issues.
Summary:
mutate any value with a chosen probability
randomly choose the position
once the position is chosen, switch between 0 and 1
import random

def mutate(data, proba=0.05):
    if random.random() < proba:
        data[random.randrange(len(data))] ^= 1

if __name__ == '__main__':
    data = [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1]
    for i in range(10):
        mutate(data)
        print(data)
import random
def changeData(data):
    seed = random.randint(0, 1000)
    # probability of roughly 0.05 (50 / 1000)
    if seed <= 50:
        indexToChange = random.randint(0, len(data) - 1)
        # change 0 with 1 and vice versa
        data[indexToChange] = 1 if data[indexToChange] == 0 else 0

if __name__ == '__main__':
    data = [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1]
    for i in range(0, 100):
        changeData(data)
        print(data)
You can do the following:
For each element in data, mutate it (replace val with 1 - val) only if a random value generated by random.random() is less than the defined mutation probability.
For example:
import random
mutation_prob = 0.05
data = [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1]
mutated_data = [1 - x if random.random() < mutation_prob else x for x in data]
If the mutation should be decided once for the data as a whole, you can do:
mutation_prob = 0.05
data = [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1]
do_mutation = random.random() < mutation_prob
mutated_data = [1 - x if do_mutation else x for x in data]
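For example, the per-element version can be wrapped in a small helper (the function name mutate_bits is just for illustration) and run a few times:

import random

def mutate_bits(data, mutation_prob=0.05):
    # Flip each bit independently with probability mutation_prob
    return [1 - x if random.random() < mutation_prob else x for x in data]

data = [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1]
for _ in range(3):
    data = mutate_bits(data)
    print(data)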
I need to use a fitness proportionate selection approach in a GA; however, my population can't lose its structure (order). I believe that, while generating the probabilities, the individuals get the wrong weights. The program is:
population=[[[0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1], [6], [0]],
[[0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1], [4], [1]],
[[0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0], [6], [2]],
[[1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0], [4], [3]]]
population_d = {'0,0,1,0,1,1,0,1,1,1,1,0,0,0,0,1': 6,
'0,0,1,1,1,0,0,1,1,0,1,1,0,0,0,1': 4,
'0,1,1,0,1,1,0,0,1,1,1,0,0,1,0,0': 6,
'1,0,0,1,1,1,0,0,1,1,0,1,1,0,0,0': 4}
import random

def ProbabilityList(population_d):
    fitness = population_d.values()
    total_fit = sum(fitness)
    relative_fitness = [f/total_fit for f in fitness]
    probabilities = [sum(relative_fitness[:i+1]) for i in range(len(relative_fitness))]
    return probabilities

def FitnessProportionateSelection(population, probabilities, number):
    chosen = []
    for n in range(number):
        r = random.random()
        for (i, individual) in enumerate(population):
            if r <= probabilities[i]:
                chosen.append(list(individual))
                break
    return chosen

number = 2
The population element is: [[individual],[fitness],[counter]]
The probabilities function output is: [0.42857142857142855, 0.5714285714285714, 0.8571428571428571, 1.0]
What I notice here is that each weight is added to the previous one (a cumulative sum), not necessarily staying in ascending order, so I think a higher weight may be given to a chromosome with a lower fitness.
I don't want to sort it, because I need to index the lists by position later, so I think I would get wrong matches.
Does anyone know a possible solution, package, or different approach to perform a weighted selection in this case?
P.S.: I know the dictionary may be redundant here, but I had several other problems using the list itself.
Edit: I tried to use random.choices() as you can see below (using relative fitness):
def FitnessChoices(population, probabilities, number):
    return random.choices(population, probabilities, number)
But I get this error: TypeError: choices() takes from 2 to 3 positional arguments but 4 were given
Thank you!
Using random.choices is certainly a good idea; you just need to get the function call right. You have to specify whether you are passing relative weights or cumulative weights. So you could use either
import random
def ProbabilityList(population_d):
    fitness = population_d.values()
    total_fit = sum(fitness)
    relative_fitness = [f/total_fit for f in fitness]
    return relative_fitness

def FitnessChoices(population, relative_fitness, number):
    return random.choices(population, weights=relative_fitness, k=number)
or
import random
def ProbabilityList(population_d):
    fitness = population_d.values()
    total_fit = sum(fitness)
    relative_fitness = [f/total_fit for f in fitness]
    cum_probs = [sum(relative_fitness[:i+1]) for i in range(len(relative_fitness))]
    return cum_probs

def FitnessChoices(population, cum_probs, number):
    return random.choices(population, cum_weights=cum_probs, k=number)
I'd recommend having a look at the differences between keyword and positional arguments in Python.
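For instance, a quick usage sketch of the first variant with the population and dictionary from the question (this assumes Python 3.7+ so the dictionary's insertion order lines up with the population list):

population = [[[0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1], [6], [0]],
              [[0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1], [4], [1]],
              [[0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0], [6], [2]],
              [[1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0], [4], [3]]]

population_d = {'0,0,1,0,1,1,0,1,1,1,1,0,0,0,0,1': 6,
                '0,0,1,1,1,0,0,1,1,0,1,1,0,0,0,1': 4,
                '0,1,1,0,1,1,0,0,1,1,1,0,0,1,0,0': 6,
                '1,0,0,1,1,1,0,0,1,1,0,1,1,0,0,0': 4}

# The dict keys are in the same order as the population list, so the weights line up by index
relative_fitness = ProbabilityList(population_d)
chosen = FitnessChoices(population, relative_fitness, number=2)
print(chosen)  # two individuals drawn in proportion to their fitness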
I am using the DEAP library to maximize a metric, and I noticed that whenever I restart the algorithm (which is supposed to create a random list of binary values - 1s and 0s) it is producing the same initial values.
I became suspicious, copied their basic DEAP example here, and re-ran the algorithm:
import array, random
from deap import creator, base, tools, algorithms

creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", array.array, typecode='b', fitness=creator.FitnessMax)

toolbox = base.Toolbox()
toolbox.register("attr_bool", random.randint, 0, 1)
toolbox.register("individual", tools.initRepeat, creator.Individual, toolbox.attr_bool, 10)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)

def evalOneMax(individual):
    return sum(individual),

toolbox.register("evaluate", evalOneMax)
toolbox.register("mate", tools.cxTwoPoints)
toolbox.register("mutate", tools.mutFlipBit, indpb=0.05)
toolbox.register("select", tools.selTournament, tournsize=3)

population = toolbox.population(n=10)

NGEN = 40
for gen in range(NGEN):
    offspring = algorithms.varAnd(population, toolbox, cxpb=0.5, mutpb=0.1)
    fits = toolbox.map(toolbox.evaluate, offspring)
    for fit, ind in zip(fits, offspring):
        ind.fitness.values = fit
    population = offspring
The code above is exactly their example, but with the population and individual size reduced to 10. I ran the algorithm 5 times and every run produced exactly the same output. I also added a print statement to get the output below:
>python testGA.py
[1, 0, 1, 0, 1, 0, 1, 1, 1, 1]
Starting the Evolution Algorithm...
Evaluating Individual: [0, 1, 0, 1, 0, 1, 1, 1, 1, 0]
Evaluating Individual: [1, 1, 0, 1, 0, 1, 0, 1, 0, 0]
Evaluating Individual: [0, 0, 1, 0, 0, 1, 1, 0, 0, 1]
Evaluating Individual: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Evaluating Individual: [0, 1, 1, 0, 1, 0, 1, 1, 0, 1]
Evaluating Individual: [1, 0, 1, 1, 1, 0, 0, 1, 0, 0]
Evaluating Individual: [0, 1, 0, 0, 0, 1, 0, 0, 0, 1]
Evaluating Individual: [1, 1, 0, 1, 0, 1, 0, 1, 1, 1]
Evaluating Individual: [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]
Evaluating Individual: [0, 0, 1, 1, 1, 1, 0, 1, 1, 1]
This output is generated every time I run the script, in that exact order. The runs are identical.
I have read that I shouldn't have to seed the random.randint function, and I tested it by writing a basic script that just prints out a list of 10 random ints in the range 0 to 1. This worked fine; it only seems to produce the same values when I go through DEAP.
Is this normal? How can I ensure that, when I run the algorithm, I get different 'individuals' every time?
EDIT:
Sorry for the late reply, here is the full source I am using:
import random, sys
from deap import creator, base, tools

class Max():
    def __init__(self):
        creator.create("FitnessMax", base.Fitness, weights=(1.0,))
        creator.create("Individual", list, fitness=creator.FitnessMax)

        INDIVIDUAL_SIZE = 10

        self.toolbox = base.Toolbox()
        self.toolbox.register("attr_bool", random.randint, 0, 1)
        self.toolbox.register("individual", tools.initRepeat, creator.Individual, self.toolbox.attr_bool, n=INDIVIDUAL_SIZE)
        self.toolbox.register("population", tools.initRepeat, list, self.toolbox.individual)

        self.toolbox.register("mate", tools.cxTwoPoints)
        self.toolbox.register("mutate", tools.mutFlipBit, indpb=0.05)
        self.toolbox.register("select", tools.selTournament, tournsize=3)
        self.toolbox.register("evaluate", self.evaluate)

        print self.main()

    def evaluate(self, individual):
        # Some debug code
        print 'Evaluating Individual: ' + str(individual)
        return sum(individual),

    def main(self):
        CXPB, MUTPB, NGEN = 0.5, 0.2, 40
        random.seed(64)

        pop = self.toolbox.population(n=10)

        print "Starting the Evolution Algorithm..."
        fitnesses = list(map(self.toolbox.evaluate, pop))
        for ind, fit in zip(pop, fitnesses):
            ind.fitness.values = fit

        # ----------------------------------------------------------
        # Killing the program here - just want to see the population created
        sys.exit()

        print "Evaluated %i individuals" % (len(pop))

        for g in range(NGEN):
            print "-- Generation %i --" % (g)

            # Select the next generation individuals
            offspring = self.toolbox.select(pop, len(pop))
            # Clone the selected individuals
            offspring = list(map(self.toolbox.clone, offspring))

            # Apply crossover and mutation on the offspring
            for child1, child2 in zip(offspring[::2], offspring[1::2]):
                if random.random() < CXPB:
                    self.toolbox.mate(child1, child2)
                    del child1.fitness.values
                    del child2.fitness.values

            for mutant in offspring:
                if random.random() < MUTPB:
                    self.toolbox.mutate(mutant)
                    del mutant.fitness.values

            # Evaluate the individuals with an invalid fitness
            invalid_ind = [ind for ind in offspring if not ind.fitness.valid]
            fitnesses = map(self.toolbox.evaluate, invalid_ind)
            for ind, fit in zip(invalid_ind, fitnesses):
                ind.fitness.values = fit

            print "\tEvaluated %i individuals" % (len(pop))

            pop[:] = offspring

            fits = [ind.fitness.values[0] for ind in pop]

            length = len(pop)
            mean = sum(fits) / length
            sum2 = sum(x*x for x in fits)
            std = abs(sum2 / length - mean**2)**0.5

            print "\tMin %s" % (min(fits))
            print "\tMax %s" % (max(fits))
            print "\tAvg %s" % (mean)
            print "\tStd %s" % (std)

class R_Test:
    def __init__(self):
        print str([random.randint(0, 1) for i in range(10)])

if __name__ == '__main__':
    #rt = R_Test()
    mx = Max()
The R_Test class is there to test random number generation in Python. I read here that the seed is set automatically if one isn't given explicitly, and I wanted to test this.
I have been executing the above code like this:
> python testGA.py
... the 10 outputs
> python testGA.py
... the exact same outputs
> python testGA.py
... the exact same outputs
> python testGA.py
... the exact same outputs
> python testGA.py
... the exact same outputs
Obviously 5 times isn't exactly a strenuous test, but the fact that all 10 values are the same 5 times in a row raised a red flag.
The problem is that you specify a seed for the random number generator in your main function. Simply comment out the line random.seed(64) and you will get different results every time you execute your program.
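If you still want individual runs to be reproducible on demand, one common pattern (a sketch, not part of the DEAP example) is to seed from a value you record, such as the current time, so each run differs but can be replayed later:

import random, time

seed = int(time.time())  # different on each run, but recorded so a run can be replayed
print("Using seed: " + str(seed))
random.seed(seed)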
In the DEAP example files, a specific seed is set because we also use these examples as integration tests: if the output of an example changes after a modification to the framework's base code, we want to know. It also allows us to benchmark the time required by each example and provide a ballpark estimate to our users. The results of these benchmarks are available online at http://deap.gel.ulaval.ca/speed/.