I need a fitness proportionate selection approach for a GA, but my population can't lose its structure (order). I believe that while generating the probabilities, the individuals get the wrong weights. The program is:
population=[[[0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1], [6], [0]],
[[0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1], [4], [1]],
[[0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0], [6], [2]],
[[1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0], [4], [3]]]
population_d = {'0,0,1,0,1,1,0,1,1,1,1,0,0,0,0,1': 6,
                '0,0,1,1,1,0,0,1,1,0,1,1,0,0,0,1': 4,
                '0,1,1,0,1,1,0,0,1,1,1,0,0,1,0,0': 6,
                '1,0,0,1,1,1,0,0,1,1,0,1,1,0,0,0': 4}
def ProbabilityList(population_d):
    fitness = population_d.values()
    total_fit = sum(fitness)
    relative_fitness = [f/total_fit for f in fitness]
    probabilities = [sum(relative_fitness[:i+1]) for i in range(len(relative_fitness))]
    return probabilities
import random

def FitnessProportionateSelection(population, probabilities, number):
    chosen = []
    for n in range(number):
        r = random.random()
        for (i, individual) in enumerate(population):
            if r <= probabilities[i]:
                chosen.append(list(individual))
                break
    return chosen
number=2
The population element is: [[individual],[fitness],[counter]]
The probabilities function output is: [0.42857142857142855, 0.5714285714285714, 0.8571428571428571, 1.0]
What I notice here is that each weight is summed into the next one, not necessarily keeping ascending order, so I think a higher weight is given to the chromosome with the lowest fitness.
I don't want to sort the population, because I need to index the lists by position later, so I think sorting would produce wrong matches.
Does anyone know a possible solution, package, or different approach to perform the weighted selection in this case?
P.S.: I know the dictionary may be redundant here, but I had several other problems using the list itself.
Edit: I tried to use random.choices() as you can see below (using relative fitness):
def FitnessChoices(population, probabilities, number):
    return random.choices(population, probabilities, number)
But I get this error: TypeError: choices() takes from 2 to 3 positional arguments but 4 were given
Thank you!
Using random.choices is certainly a good idea; you just need to understand the function call. You have to specify whether your probabilities are marginal or cumulative. So you could use either
import random

def ProbabilityList(population_d):
    fitness = population_d.values()
    total_fit = sum(fitness)
    relative_fitness = [f/total_fit for f in fitness]
    return relative_fitness

def FitnessChoices(population, relative_fitness, number):
    return random.choices(population, weights=relative_fitness, k=number)
or
import random

def ProbabilityList(population_d):
    fitness = population_d.values()
    total_fit = sum(fitness)
    relative_fitness = [f/total_fit for f in fitness]
    cum_probs = [sum(relative_fitness[:i+1]) for i in range(len(relative_fitness))]
    return cum_probs

def FitnessChoices(population, cum_probs, number):
    return random.choices(population, cum_weights=cum_probs, k=number)
I'd recommend you have a look at the difference between keyword and positional arguments in Python.
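For example, with the first variant and the population, population_d, and number from the question (a quick sketch; the individuals drawn will vary between runs):

probabilities = ProbabilityList(population_d)               # relative-fitness weights
chosen = FitnessChoices(population, probabilities, number)  # draw 2 individuals
print(chosen)

Note that random.choices samples with replacement, so the same individual can be drawn more than once (the usual behaviour for roulette wheel selection), and it never reorders population, so your positional indexing stays valid afterwards.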
Related
I multiplied two polynomials g(x), h(x) ∈ GF(2)[x] and got the result c(x). I would like to get the vector representation of c(x).
Here is the code I am sharing.
import galois
GF = galois.GF(2)
g = galois.Poly([1, 0, 1, 1], field=GF)
h = galois.Poly([1, 1, 0], field=GF)
c = g * h
print(c)
The output I am getting is
x^5 + x^4 + x^3 + x
but I would like to get output in vector form, i.e.,
[1, 1, 1, 0, 1, 0]
Any suggestion about how I can get the answer in vector form?
I tried calling the function
GF("c").vector()
but it is giving me the wrong answer.
Use c.coeffs. This gives:
GF([1, 1, 1, 0, 1, 0], order=2)
This form may be satisfactory for whatever you are trying to do.
If not, you can (among other things) turn it into a normal Python list[int] with [int(i) for i in c.coeffs], yielding:
[1, 1, 1, 0, 1, 0]
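Putting it together with the setup from the question (a minimal sketch; it uses only the galois calls already shown plus the coeffs attribute from this answer):

import galois

GF = galois.GF(2)
g = galois.Poly([1, 0, 1, 1], field=GF)  # x^3 + x + 1
h = galois.Poly([1, 1, 0], field=GF)     # x^2 + x
c = g * h                                # x^5 + x^4 + x^3 + x

# coeffs holds the coefficients in degree-descending order
vector = [int(i) for i in c.coeffs]
print(vector)  # [1, 1, 1, 0, 1, 0]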
I am trying to implement a simple mapping from the rows of a 2-D numpy array to a set of values.
For each row in the array I need to pick the value corresponding to that row and append it to a result list.
For example:
[0, 1, 0, 0] -> 3
...
[1, 0, 1, 0] -> 2
My first implementation made me wonder whether I was doing something really wrong or very inefficient given the size of my dataset, so I did this workaround that avoids explicit for loops and tries to optimize execution speed using a dictionary lookup.
import numpy as np

# function to perform the search and return the index accordingly
# (it is supposed to be fast because of the data structure)
def get_val(n):
    map_list = {0: [0, 1, 0], 1: [0, 1, 0], 2: [1, 0, 0], 3: [0, 0, 1]}
    map_vals = list(map_list.values())
    index = map_vals.index(list(n))
    return index

# set of arbitrary arrays
li = np.array([[0, 1, 0], [0, 0, 1]])

# here is the performance improvement attempt with the help of the function above
arr = [get_val(n) for n in li]
print(arr)
I'm not completely sure this is the correct way to get the needed value for a set like this. If there is a better way, please let me know.
Otherwise, my main question stands:
what is the best possible way to optimize this code?
Thanks so much for your help.
You can try using matrix multiplication (a dot product):
import numpy as np

a = np.array([[0, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]])  # dict values
c = np.array([0, 1, 2, 3])                                  # dict keys
li = np.array([[0, 1, 0], [0, 0, 1]])

b = np.linalg.pinv(a) @ c  # decoding table
result = li @ b
print(result)
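As a quick check with the arrays above: pinv(a) @ c solves a @ b = c exactly here, giving the decoding vector b = [2., 1., 3.], so result comes out as [1. 3.], the keys for the two rows of li. Rounding guards against floating-point noise in less clean cases:

decoded = np.rint(li @ b).astype(int)  # round the float results back to integer keys
print(decoded)  # [1 3]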
I have two arrays from which I have to find the accuracy of my prediction.
predictions = [1, 0, 0, 1, 1, 1, 0, 1, 1, 0]
y_test = [1, 0, 0, 1, 0, 1, 0, 1, 1, 1]
so in this case the accuracy is (8/10)*100 = 80%.
I have written a method to do this task. Here is my code, but I don't get an accuracy of 80% in this case.
def getAccuracy(y_test, predictions):
    correct = 0
    for x in range(len(y_test)):
        if y_test[x] is predictions[x]:
            correct += 1
    return (correct/len(y_test)) * 100.0
Thanks for helping me.
Your code only appears to work when the numbers in the arrays fall within a specific range of values that the Python interpreter caches rather than recreates. This is because you used is, which is an identity check and not an equality check: you are comparing memory addresses, which only coincide for a small range of cached integers. Use == instead and it will always work.
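A small demonstration of the pitfall (this relies on a CPython implementation detail: integers from roughly -5 to 256 are cached as singletons):

a = 256
b = int("256")
print(a is b)  # True in CPython: 256 comes from the small-integer cache
x = 257
y = int("257")
print(x is y)  # False: two distinct objects with the same value
print(x == y)  # True: == compares values, which is what you want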
For a more Pythonic solution you can also take a look at list comprehensions:
assert len(predictions) == len(y_test), "Unequal arrays"
identity = sum([p == y for p, y in zip(predictions, y_test)]) / len(predictions) * 100
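For the example arrays this evaluates to 80.0, since 8 of the 10 positions match.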
Your code gives 80.0 as you wanted once you use == instead of is; see the explanation above for the reason.
def getAccuracy(y_test, predictions):
    n = len(y_test)
    correct = 0
    for x in range(n):
        if y_test[x] == predictions[x]:
            correct += 1
    return (correct/n) * 100.0
predictions = [1, 0, 0, 1, 1, 1, 0, 1, 1, 0]
y_test = [1, 0, 0, 1, 0, 1, 0, 1, 1, 1]
print(getAccuracy(y_test, predictions))
80.0
Here's an implementation using NumPy:
import numpy as np
n = len(y_test)
100*np.sum(np.isclose(predictions, y_test))/n
or if you convert your lists to numpy arrays, then
100*np.sum(predictions == y_test)/n
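For instance, with the lists from the question converted to arrays (the mean of a boolean array is the fraction of True values):

import numpy as np

predictions = np.array([1, 0, 0, 1, 1, 1, 0, 1, 1, 0])
y_test = np.array([1, 0, 0, 1, 0, 1, 0, 1, 1, 1])

accuracy = 100 * np.mean(predictions == y_test)
print(accuracy)  # 80.0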
Take the probability distribution of an XOR gate in which every configuration is equally probable (the configurations are given by outcomes_sub, the probability mass function by pmf_xor_sub):
import numpy as np
import itertools as it
outcomes_sub = [list(item) for item in list(it.product([0,1], repeat=3))]
pmf_xor_sub = np.array([1/4, 0, 0, 1/4, 0, 1/4, 1/4, 0])
Now take the probability distribution corresponding to two uncorrelated such XORs:
outcomes = [outcome1 + outcome2 for (outcome1, outcome2)
            in it.product(outcomes_sub, outcomes_sub)]
pmf_xor = [pmf1 * pmf2 for (pmf1, pmf2)
           in it.product(pmf_xor_sub, pmf_xor_sub)]
And create some data based on it:
indices = np.random.choice(len(outcomes), 10000, p=pmf_xor)
data_xor = np.array([outcomes[index] for index in indices])
data_xor looks like this:
array([[1, 1, 0, 0, 0, 0],
[1, 0, 1, 0, 0, 0],
[0, 1, 1, 1, 1, 0],
...,
[0, 1, 1, 1, 1, 0],
[1, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0]])
I.e., two independent XORs back to back. What's the right way to perform dimensionality reduction on it? PCA won't work (because the dependence is non-linear, right?):
from sklearn import decomposition
pca_xor = decomposition.PCA()
pca_xor.fit(data_xor)
Now, pca_xor.explained_variance_ratio_ gives:
array([0.17145045, 0.17018817, 0.16758773, 0.16575979, 0.16410862,
       0.16090524], dtype=float32)
No two components stand out. I understand that a non-linear method such as kernel PCA should work here, but I am struggling to find pointers to ways of applying it to my problem.
To give a bit more context: what I am actually after is ways to bring out the structure in data_xor: two big XOR blobs, each of which is composed of some finer-grained stuff. If I am going about it all wrong, feel free to point that out too.
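For what it's worth, here is a minimal sketch of the kernel PCA direction (just an illustration using sklearn's KernelPCA on the data_xor array built above; whether the XOR blobs actually separate depends on the kernel and gamma you choose):

from sklearn.decomposition import KernelPCA

# RBF-kernel PCA; n_components and gamma are arbitrary starting points
kpca = KernelPCA(n_components=2, kernel='rbf', gamma=1.0)
data_kpca = kpca.fit_transform(data_xor)
print(data_kpca.shape)  # (10000, 2)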
I am using the DEAP library to maximize a metric, and I noticed that whenever I restart the algorithm (which is supposed to create a random list of binary values, 1s and 0s) it produces the same initial values.
I became suspicious, copied their basic DEAP example here, and re-ran the algorithm:
import array, random
from deap import creator, base, tools, algorithms

creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", array.array, typecode='b', fitness=creator.FitnessMax)

toolbox = base.Toolbox()
toolbox.register("attr_bool", random.randint, 0, 1)
toolbox.register("individual", tools.initRepeat, creator.Individual, toolbox.attr_bool, 10)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)

def evalOneMax(individual):
    return sum(individual),

toolbox.register("evaluate", evalOneMax)
toolbox.register("mate", tools.cxTwoPoints)
toolbox.register("mutate", tools.mutFlipBit, indpb=0.05)
toolbox.register("select", tools.selTournament, tournsize=3)

population = toolbox.population(n=10)

NGEN = 40
for gen in range(NGEN):
    offspring = algorithms.varAnd(population, toolbox, cxpb=0.5, mutpb=0.1)
    fits = toolbox.map(toolbox.evaluate, offspring)
    for fit, ind in zip(fits, offspring):
        ind.fitness.values = fit
    population = offspring
The code above is exactly their example, but with the population and individual size reduced to 10. I ran the algorithm 5 times and the runs were exact copies of each other. I also added a print statement to get the output below:
>python testGA.py
[1, 0, 1, 0, 1, 0, 1, 1, 1, 1]
Starting the Evolution Algorithm...
Evaluating Individual: [0, 1, 0, 1, 0, 1, 1, 1, 1, 0]
Evaluating Individual: [1, 1, 0, 1, 0, 1, 0, 1, 0, 0]
Evaluating Individual: [0, 0, 1, 0, 0, 1, 1, 0, 0, 1]
Evaluating Individual: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Evaluating Individual: [0, 1, 1, 0, 1, 0, 1, 1, 0, 1]
Evaluating Individual: [1, 0, 1, 1, 1, 0, 0, 1, 0, 0]
Evaluating Individual: [0, 1, 0, 0, 0, 1, 0, 0, 0, 1]
Evaluating Individual: [1, 1, 0, 1, 0, 1, 0, 1, 1, 1]
Evaluating Individual: [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]
Evaluating Individual: [0, 0, 1, 1, 1, 1, 0, 1, 1, 1]
This output is generated every time I run the script, in that order; the runs are exactly identical.
I have read that I shouldn't have to seed the random.randint function, and I tested that by writing a basic script that just prints out a list of 10 random ints in the range 0 to 1. That worked fine; the values only repeat when I go through DEAP.
Is this normal? How can I ensure that, when I run the algorithm, I get different 'individuals' every time?
EDIT:
Sorry for the late reply, here is the full source I am using:
import random, sys
from deap import creator, base, tools

class Max():
    def __init__(self):
        creator.create("FitnessMax", base.Fitness, weights=(1.0,))
        creator.create("Individual", list, fitness=creator.FitnessMax)

        INDIVIDUAL_SIZE = 10

        self.toolbox = base.Toolbox()
        self.toolbox.register("attr_bool", random.randint, 0, 1)
        self.toolbox.register("individual", tools.initRepeat, creator.Individual, self.toolbox.attr_bool, n=INDIVIDUAL_SIZE)
        self.toolbox.register("population", tools.initRepeat, list, self.toolbox.individual)

        self.toolbox.register("mate", tools.cxTwoPoints)
        self.toolbox.register("mutate", tools.mutFlipBit, indpb=0.05)
        self.toolbox.register("select", tools.selTournament, tournsize=3)
        self.toolbox.register("evaluate", self.evaluate)

        print self.main()

    def evaluate(self, individual):
        # Some debug code
        print 'Evaluating Individual: ' + str(individual)
        return sum(individual),

    def main(self):
        CXPB, MUTPB, NGEN = 0.5, 0.2, 40
        random.seed(64)

        pop = self.toolbox.population(n=10)

        print "Starting the Evolution Algorithm..."
        fitnesses = list(map(self.toolbox.evaluate, pop))
        for ind, fit in zip(pop, fitnesses):
            ind.fitness.values = fit

        # ----------------------------------------------------------
        # Killing the program here - just want to see the population created
        sys.exit()

        print "Evaluated %i individuals" % (len(pop))

        for g in range(NGEN):
            print "-- Generation %i --" % (g)

            # Select the next generation individuals
            offspring = self.toolbox.select(pop, len(pop))
            # Clone the selected individuals
            offspring = list(map(self.toolbox.clone, offspring))

            # Apply crossover and mutation on the offspring
            for child1, child2 in zip(offspring[::2], offspring[1::2]):
                if random.random() < CXPB:
                    self.toolbox.mate(child1, child2)
                    del child1.fitness.values
                    del child2.fitness.values

            for mutant in offspring:
                if random.random() < MUTPB:
                    self.toolbox.mutate(mutant)
                    del mutant.fitness.values

            # Evaluate the individuals with an invalid fitness
            invalid_ind = [ind for ind in offspring if not ind.fitness.valid]
            fitnesses = map(self.toolbox.evaluate, invalid_ind)
            for ind, fit in zip(invalid_ind, fitnesses):
                ind.fitness.values = fit

            print "\tEvaluated %i individuals" % (len(pop))

            pop[:] = offspring

            fits = [ind.fitness.values[0] for ind in pop]

            length = len(pop)
            mean = sum(fits) / length
            sum2 = sum(x*x for x in fits)
            std = abs(sum2 / length - mean**2)**0.5

            print "\tMin %s" % (min(fits))
            print "\tMax %s" % (max(fits))
            print "\tAvg %s" % (mean)
            print "\tStd %s" % (std)

class R_Test:
    def __init__(self):
        print str([random.randint(0, 1) for i in range(10)])

if __name__ == '__main__':
    #rt = R_Test()
    mx = Max()
The R_Test class is there to test the random generation in Python. I read here that the generator is seeded automatically if no seed is given, and I wanted to test this.
I have been executing the above code as follows:
> python testGA.py
... the 10 outputs
> python testGA.py
... the exact same outputs
> python testGA.py
... the exact same outputs
> python testGA.py
... the exact same outputs
> python testGA.py
... the exact same outputs
Obviously 5 times isn't exactly a strenuous test, but the fact that all 10 values are the same 5 times in a row raised a red flag.
The problem is that you specify a seed for the random number generator in your main function. Simply comment out the line random.seed(64) and you will get different results every time you execute your program.
In the DEAP example files, a specific seed is set because we also use these examples as integration tests: if the output of an example changes after a modification to the framework's base code, we want to know. It also allows us to benchmark the time required by each example and provide a ballpark estimate to our users. The results of these benchmarks are available online at http://deap.gel.ulaval.ca/speed/.
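As a sketch of how you might keep both behaviours (the command-line handling here is illustrative, not part of the DEAP example):

import random
import sys

# Pass an explicit seed for reproducible runs, e.g. `python testGA.py 64`;
# with no argument, random.seed() seeds from system entropy, so runs differ.
if len(sys.argv) > 1:
    random.seed(int(sys.argv[1]))
else:
    random.seed()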