Fitness proportionate selection (roulette wheel selection) in Python - python

I have a list of objects (Chromosome) which have an attribute fitness (chromosome.fitness is between 0 and 1)
Given a list of such objects, how can I implement a function which returns a single chromosome whose chance of being selected is proportional to its fitness? That is, a chromosome with fitness 0.8 is twice as likely to be selected as one with fitness 0.4.
I've found a few Python and pseudocode implementations, but they are too complex for this requirement: the function needs only a list of chromosomes. Chromosomes store their own fitness as an internal variable.
My existing implementation predates the decision to let chromosomes store their own fitness, so it was a lot more complicated and involved zipping lists and things.
----------------------------EDIT----------------------------
Thanks Lattyware. The following function seems to work.
def selectOne(self, population):
    max = sum([c.fitness for c in population])
    pick = random.uniform(0, max)
    current = 0
    for chromosome in population:
        current += chromosome.fitness
        if current > pick:
            return chromosome

Use numpy.random.choice.
import numpy.random as npr
def selectOne(self, population):
    max = sum([c.fitness for c in population])
    selection_probs = [c.fitness/max for c in population]
    return population[npr.choice(len(population), p=selection_probs)]

There is a very simple way to select a weighted random choice from a dictionary:
def weighted_random_choice(choices):
    max = sum(choices.values())
    pick = random.uniform(0, max)
    current = 0
    for key, value in choices.items():
        current += value
        if current > pick:
            return key
If you don't have a dictionary at hand, you could modify this to suit your class (as you haven't given more details of it), or generate a dictionary:
choices = {chromosome: chromosome.fitness for chromosome in chromosomes}
Presuming that fitness is an attribute.
Here is an example of the function modified to take an iterable of chromosomes, again, making the same presumption.
def weighted_random_choice(chromosomes):
    max = sum(chromosome.fitness for chromosome in chromosomes)
    pick = random.uniform(0, max)
    current = 0
    for chromosome in chromosomes:
        current += chromosome.fitness
        if current > pick:
            return chromosome

I'd prefer fewer lines:
import itertools
import random
def choose(population):
    bounds = list(itertools.accumulate(chromosome.fitness for chromosome in population))
    pick = random.random() * bounds[-1]
    return next(chromosome for chromosome, bound in zip(population, bounds) if pick < bound)
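When many chromosomes are drawn from the same population, the cumulative bounds can be computed once and then searched with bisect instead of a linear scan. The following is only a sketch under assumptions: the Chromosome class and the make_selector helper are hypothetical names that do not appear in the posts above.

```python
import bisect
import itertools
import random


class Chromosome:
    """Hypothetical stand-in for the asker's class; only `fitness` matters."""

    def __init__(self, name, fitness):
        self.name = name
        self.fitness = fitness


def make_selector(population, rng=random):
    """Return a zero-argument function that draws one chromosome per call."""
    # Running totals of fitness: bounds[i] is the right edge of slot i.
    bounds = list(itertools.accumulate(c.fitness for c in population))
    total = bounds[-1]

    def select_one():
        pick = rng.random() * total
        # Index of the first slot whose right edge is strictly greater than pick.
        index = bisect.bisect_right(bounds, pick)
        # Guard against floating-point rounding pushing pick up to total.
        return population[min(index, len(population) - 1)]

    return select_one


population = [Chromosome("a", 0.8), Chromosome("b", 0.4)]
select_one = make_selector(population, random.Random(0))
counts = {"a": 0, "b": 0}
for _ in range(30000):
    counts[select_one().name] += 1
print(counts)  # "a" should come up roughly twice as often as "b"
```

Each draw costs O(log n) after the one-off O(n) setup, which probably only matters for large populations or many draws per generation.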

import numpy as np
def Indvs_wieght(Indvs):  # compute the probability of selecting each individual from its fitness
    s = sum(i.fitness for i in Indvs)
    wieghts = list()
    for i in range(len(Indvs)):
        wieghts.append(Indvs[i].fitness/s)
    return wieghts
def select_parents(indvs, indvs_wieghts, number_of_parents=40):  # roulette wheel selection; number_of_parents is how many parents to draw
    return np.random.choice(indvs, size=number_of_parents, p=indvs_wieghts)

from __future__ import division
import numpy as np
import random
import operator
def roulette_selection(weights):
    '''performs weighted selection or roulette wheel selection on a list
    and returns the index selected from the list'''
    # sort the weights in ascending order
    sorted_indexed_weights = sorted(enumerate(weights), key=operator.itemgetter(1))
    indices, sorted_weights = zip(*sorted_indexed_weights)
    # calculate the cumulative probability
    tot_sum = sum(sorted_weights)
    prob = [x/tot_sum for x in sorted_weights]
    cum_prob = np.cumsum(prob)
    # select a random number in the range [0, 1)
    random_num = random.random()
    for index_value, cum_prob_value in zip(indices, cum_prob):
        if random_num < cum_prob_value:
            return index_value
if __name__ == "__main__":
    weights = [1,2,6,4,3,7,20]
    print(roulette_selection(weights))
    weights = [1,2,2,2,2,2,2]
    print(roulette_selection(weights))

import random
def weighted_choice(items):
    total_weight = sum(item.weight for item in items)
    weight_to_target = random.uniform(0, total_weight)
    for item in items:
        weight_to_target -= item.weight
        if weight_to_target <= 0:
            return item
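On Python 3.6 and later, the standard library's random.choices accepts relative weights directly, so the running-sum loop can be left to the library. A minimal sketch, assuming a hypothetical Chromosome class with a fitness attribute (the class and the select_one name are illustrative, not taken from the posts above):

```python
import random
from collections import Counter


class Chromosome:
    """Hypothetical stand-in for the asker's class; only `fitness` matters."""

    def __init__(self, name, fitness):
        self.name = name
        self.fitness = fitness


def select_one(population, rng=random):
    """Pick one chromosome with probability proportional to its fitness."""
    weights = [c.fitness for c in population]
    # random.choices returns a list of k items, so take the single element.
    return rng.choices(population, weights=weights, k=1)[0]


population = [Chromosome("high", 0.8), Chromosome("low", 0.4)]
rng = random.Random(42)
tally = Counter(select_one(population, rng).name for _ in range(30000))
print(tally)  # "high" should appear roughly twice as often as "low"
```

Passing k greater than 1 draws several parents in one call, with replacement.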

Related

Calculate the probability of reaching target sum across multiple lotteries

If I have separate lotteries with different prize sizes and different odds of winning, how do I calculate the probability of winning at least a given amount if I play each lottery simultaneously and only once?
Lottery A: 50% chance of winning 5 gold.
Lottery B: 40% chance of winning 6 gold.
Lottery C: 30% chance of winning 7 gold.
Is there a method of calculating the probability that I win at least a given value (like 10 gold) if I play all the lotteries simultaneously? I would ideally like it to work for a set of approximately 40 lotteries.
The input would be a list of tuples with probability and prize size, like:
lottery_list = [(0.5, 5), (0.4, 6), (0.3, 7)]
And then a function to calculate the probability of winning at least a target value, like in this case 10 gold:
prob = win_at_least(lottery_list, target_val=10)
I ended up creating a solution myself. This code ignores combinations that are irrelevant, i.e. all combinations of 'downstream' lotteries if a given set of lotteries has already yielded the target value. It takes about 10 ms for 40 lotteries and seems to scale very well.
import numpy as np
import pandas as pd
def multi_lottery(lottery_list: list, target: float) -> float:
    """
    This function calculates the odds of winning at least the target
    value from a list of lotteries, where each lottery has a
    different probability of winning and a different value.
    Args:
        lottery_list (list): A list of tuples with lottery
            probabilities and values.
        target (float): The target value.
    Returns:
        float: The odds of winning at least target value.
    """
    # Create a pandas dataframe with the lottery list.
    df = pd.DataFrame(lottery_list, columns=["probability", "value"])
    # Sort the dataframe by descending value.
    df_sorted = df.sort_values("value", ascending=False, ignore_index=True)
    probs = df_sorted["probability"].values
    values = df_sorted["value"].values
    # Create a mask equal to the length of the lottery list.
    # This will be used to determine which lotteries are included in
    # each combination.
    length = len(df_sorted)
    mask = np.ones(length, dtype=int)
    # Start with the accumulated odds of reaching the target at 0.
    total_odds = 0
    while True:
        # Go through the lottery list using the mask to determine if
        # each lottery is won, and add the values together until the target
        # value is reached.
        odds = 1
        value = 0
        for idx, prob in enumerate(probs):
            # Include the winning chance if the mask entry is 1.
            if mask[idx] == 1:
                odds *= prob
                value += values[idx]
            # Else include the losing chance.
            else:
                odds *= 1 - prob
            # If the target value is reached, all subsequent lotteries
            # are ignored and the odds are added to the total.
            if value >= target:
                # Add the odds of this combination to the total.
                total_odds += odds
                # Update the mask by setting the current lottery to zero.
                mask[idx] = 0
                break
            # Else if the last lottery is reached, update the mask by
            # setting the last active lottery (the active one with the largest
            # index) to zero and all lotteries after it to one.
            elif idx == length - 1:
                largest_active_idx = 0
                for i, v in enumerate(mask):
                    if v == 1:
                        largest_active_idx = i
                mask[largest_active_idx] = 0
                mask[largest_active_idx + 1 :] = 1
        # Check if the mask is all zeros. If so, break the loop.
        if np.sum(mask) == 0:
            break
    return total_odds
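For a small list, the result can be cross-checked by enumerating every win/lose outcome directly. This is a sketch for verification only: it visits all 2^n outcomes, so it is not a substitute for the pruned search above at 40 lotteries. The name win_at_least_bruteforce is an assumption made for this illustration.

```python
from itertools import product


def win_at_least_bruteforce(lottery_list, target):
    """Sum the probability of every win/lose outcome whose prizes reach target."""
    total = 0.0
    # Each outcome is a tuple of booleans: True means that lottery was won.
    for outcome in product((True, False), repeat=len(lottery_list)):
        probability = 1.0
        winnings = 0
        for won, (p, prize) in zip(outcome, lottery_list):
            probability *= p if won else 1 - p
            if won:
                winnings += prize
        if winnings >= target:
            total += probability
    return total


lottery_list = [(0.5, 5), (0.4, 6), (0.3, 7)]
result = win_at_least_bruteforce(lottery_list, 10)
print(round(result, 6))  # 0.35
```

With the three example lotteries, any two wins reach 10 gold, so the answer is the probability of winning at least two of them: 0.14 + 0.09 + 0.06 + 0.06 = 0.35.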

Getting specific values from ASCII table

I'm currently creating a genetic algorithm and want to draw only certain values from the ASCII table so the algorithm runs a bit faster. The code below draws values in the range 9-127, but I only need the values 9-10 and 32-127, and I'm not sure how to restrict it to exactly those values. The code is in Python.
import numpy as np
TARGET_PHRASE = """The smartest and fastest Pixel yet.
Google Tensor: Our first custom-built processor.
The first processor designed by Google and made for Pixel, Tensor makes the new Pixel phones our most powerful yet.
The most advanced Pixel Camera ever.
Capture brilliant color and vivid detail with Pixels best-in-class computational photography and new pro-level lenses.""" # target DNA
POP_SIZE = 4000 # population size
CROSS_RATE = 0.8 # mating probability (DNA crossover)
MUTATION_RATE = 0.00001 # mutation probability
N_GENERATIONS = 100000
DNA_SIZE = len(TARGET_PHRASE)
TARGET_ASCII = np.frombuffer(TARGET_PHRASE.encode('ascii'), dtype=np.uint8) # convert string to number
ASCII_BOUND = [9, 127]
class GA(object):
    def __init__(self, DNA_size, DNA_bound, cross_rate, mutation_rate, pop_size):
        self.DNA_size = DNA_size
        DNA_bound[1] += 1
        self.DNA_bound = DNA_bound
        self.cross_rate = cross_rate
        self.mutate_rate = mutation_rate
        self.pop_size = pop_size
        self.pop = np.random.randint(*DNA_bound, size=(pop_size, DNA_size)).astype(np.int8) # int8 for convert to ASCII
    def translateDNA(self, DNA): # convert to readable string
        return DNA.tobytes().decode('ascii')
    def get_fitness(self): # count how many character matches
        match_count = (self.pop == TARGET_ASCII).sum(axis=1)
        return match_count
    def select(self):
        fitness = self.get_fitness() + 1e-4 # add a small amount to avoid all zero fitness
        idx = np.random.choice(np.arange(self.pop_size), size=self.pop_size, replace=True, p=fitness/fitness.sum())
        return self.pop[idx]
    def crossover(self, parent, pop):
        if np.random.rand() < self.cross_rate:
            i_ = np.random.randint(0, self.pop_size, size=1) # select another individual from pop
            cross_points = np.random.randint(0, 2, self.DNA_size).astype(bool) # choose crossover points
            parent[cross_points] = pop[i_, cross_points] # mating and produce one child
        return parent
    def mutate(self, child):
        for point in range(self.DNA_size):
            if np.random.rand() < self.mutate_rate:
                child[point] = np.random.randint(*self.DNA_bound) # choose a random ASCII index
        return child
    def evolve(self):
        pop = self.select()
        pop_copy = pop.copy()
        for parent in pop: # for every parent
            child = self.crossover(parent, pop_copy)
            child = self.mutate(child)
            parent[:] = child
        self.pop = pop
if __name__ == '__main__':
    ga = GA(DNA_size=DNA_SIZE, DNA_bound=ASCII_BOUND, cross_rate=CROSS_RATE,
            mutation_rate=MUTATION_RATE, pop_size=POP_SIZE)
    for generation in range(N_GENERATIONS):
        fitness = ga.get_fitness()
        best_DNA = ga.pop[np.argmax(fitness)]
        best_phrase = ga.translateDNA(best_DNA)
        print('Gen', generation, ': ', best_phrase)
        if best_phrase == TARGET_PHRASE:
            break
        ga.evolve()
You need a custom function that draws random samples from the ranges 9-10 and 32-127, like
def my_rand(pop_size, DNA_size):
    bound1 = [9, 10]
    bound2 = list(range(32, 128))  # range() excludes its end, so 128 keeps 127
    bound = bound1 + bound2
    pop = np.random.choice(bound, (pop_size, DNA_size)).astype(np.int8)
    return pop
Then call this function in place of the line in __init__ that initialises the population, like
delete -- self.pop = np.random.randint(*DNA_bound, size=(pop_size, DNA_size)).astype(np.int8) # int8 for convert to ASCII
call --- self.pop = my_rand(pop_size, DNA_size)
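Note that the question's mutate method also draws from the full 9-127 range, so mutation could presumably reintroduce the unwanted codes. One way to keep both steps consistent is to share a single array of allowed codes. This is a sketch under assumptions: ALLOWED_CODES, random_population and the standalone mutate function are illustrative names rather than part of the original class, and it uses NumPy's Generator API.

```python
import numpy as np

# Tab, line feed, then codes 32..127 inclusive, as stated in the question.
ALLOWED_CODES = np.array([9, 10] + list(range(32, 128)), dtype=np.int8)


def random_population(pop_size, dna_size, rng):
    """Draw an initial population using only the allowed character codes."""
    return rng.choice(ALLOWED_CODES, size=(pop_size, dna_size))


def mutate(child, mutation_rate, rng):
    """Replace each gene with probability mutation_rate by an allowed code."""
    # Boolean mask of positions to mutate, drawn in one vectorised call.
    mask = rng.random(child.shape) < mutation_rate
    child[mask] = rng.choice(ALLOWED_CODES, size=int(mask.sum()))
    return child


rng = np.random.default_rng(0)
pop = random_population(5, 20, rng)
mutated = mutate(pop[0].copy(), 0.5, rng)
print(pop.shape, mutated.tobytes().decode("ascii") != "")
```

Drawing the mutation mask for the whole chromosome at once also avoids the per-gene Python loop, which may matter more for run time than the smaller alphabet.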

Changing this Python program to have function def()

The following Python program flips a coin several times, then reports the longest series of heads and tails. I am trying to convert this program into a program that uses functions so it uses basically less code. I am very new to programming and my teacher requested this of us, but I have no idea how to do it. I know I'm supposed to have the function accept 2 parameters: a string or list, and a character to search for. The function should return, as the value of the function, an integer which is the longest sequence of that character in that string. The function shouldn't accept input or output from the user.
import random
print("This program flips a coin several times, \nthen reports the longest series of heads and tails")
cointoss = int(input("Number of times to flip the coin: "))
varlist = []
i = 0
varstring = ' '
while i < cointoss:
    r = random.choice('HT')
    varlist.append(r)
    varstring = varstring + r
    i += 1
print(varstring)
print(varlist)
print("There's this many heads: ",varstring.count("H"))
print("There's this many tails: ",varstring.count("T"))
print("Processing input...")
i = 0
longest_h = 0
longest_t = 0
inarow = 0
prevIn = 0
while i < cointoss:
    print(varlist[i])
    if varlist[i] == 'H':
        prevIn += 1
        if prevIn > longest_h:
            longest_h = prevIn
        print("",longest_h,"")
        inarow = 0
    if varlist[i] == 'T':
        inarow += 1
        if inarow > longest_t:
            longest_t = inarow
        print("",longest_t,"")
        prevIn = 0
    i += 1
print ("The longest series of heads is: ",longest_h)
print ("The longest series of tails is: ",longest_t)
If this is asking too much, any explanatory help would be really nice instead. All I've got so far is:
def flip (a, b):
    flipValue = random.randint
but it's barely anything.
import random
def Main():
    numOfFlips=getFlips()
    outcome=flipping(numOfFlips)
    print(outcome)
def getFlips():
    Flips=int(input("Enter number of flips:\n"))
    return Flips
def flipping(numOfFlips):
    longHeads=[]
    longTails=[]
    Tails=0
    Heads=0
    for flips in range(0,numOfFlips):
        flipValue=random.randint(1,2)
        print(flipValue)
        if flipValue==1:
            Tails+=1
            longHeads.append(Heads) #recording value of Heads before resetting it
            Heads=0
        else:
            Heads+=1
            longTails.append(Tails)
            Tails=0
    longHeads.append(Heads) #record the runs still in progress when the loop ends
    longTails.append(Tails)
    longestHeads=max(longHeads) #chooses the greatest length from each list
    longestTails=max(longTails)
    return "Longest heads:\t"+str(longestHeads)+"\nLongest tails:\t"+str(longestTails)
Main()
I did not quite understand how your code worked, so I rewrote it using functions that do the same job. There are probably ways to improve my code, but the logic now lives in functions.
First, you need a function that flips a coin x times. This would be one possible implementation, favoring random.choice over random.randint:
import random
def flip(x):
    result = []
    for _ in range(x):
        result.append(random.choice(("h", "t")))
    return result
Of course, you could also pass the collection of possible outcomes to choose from as a parameter.
Next, you need a function that finds the longest sequence of some value in some list:
def longest_series(some_value, some_list):
    current, longest = 0, 0
    for r in some_list:
        if r == some_value:
            current += 1
            longest = max(current, longest)
        else:
            current = 0
    return longest
And now you can call these in the right order:
# initialize the random number generator, so we get the same result
random.seed(5)
# toss a coin a hundred times
series = flip(100)
# count heads/tails
headflips = longest_series('h', series)
tailflips = longest_series('t', series)
# print the results
print("The longest series of heads is: " + str(headflips))
print("The longest series of tails is: " + str(tailflips))
Output:
>> The longest series of heads is: 8
>> The longest series of tails is: 5
edit: removed the flip implementation with yield, it made the code weird.
Counting the longest run
Let's see what you have asked for
I'm supposed to have the function accept 2 parameters: a string or list,
or, generalizing just a bit, a sequence
and a character
again, we'd speak, generically, of an item
to search for. The function should return, as the value of the
function, an integer which is the longest sequence of that character
in that string.
My implementation of the function you are asking for, complete with a
docstring, is
def longest_run(i, s):
    'Counts the longest run of item "i" in sequence "s".'
    c, m = 0, 0
    for el in s:
        if el==i:
            c += 1
        elif c:
            m = m if m >= c else c
            c = 0
    # a run that reaches the end of the sequence is still held in c
    return m if m >= c else c
We initialize c (current run) and m (maximum run so far) to zero,
then we loop, looking at every element el of the argument sequence s.
The logic is straightforward but for elif c: whose block is executed at the end of a run (because c is greater than zero and logically True) but not when the previous item (not the current one) was not equal to i. The savings are small but are savings...
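The same count can also be obtained with itertools.groupby from the standard library, which splits a sequence into stretches of equal neighbouring items. This is an alternative sketch, not the implementation described above; the name longest_run_groupby is made up for the illustration. It also covers a run that ends at the last element.

```python
from itertools import groupby


def longest_run_groupby(item, seq):
    """Length of the longest run of consecutive `item` values in `seq`."""
    # groupby yields (value, run) pairs, one per stretch of equal neighbours.
    run_lengths = (
        sum(1 for _ in run) for value, run in groupby(seq) if value == item
    )
    # default=0 covers sequences in which `item` never occurs.
    return max(run_lengths, default=0)


print(longest_run_groupby('h', 'hhthhht'))  # 3
print(longest_run_groupby('t', 'hhthhht'))  # 1
print(longest_run_groupby('h', 'tthh'))     # 2
```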
Flipping coins (and more...)
How can we simulate flipping n coins? We abstract the problem and recognize that flipping n coins corresponds to choosing from a collection of possible outcomes (for a coin, either head or tail) for n times.
As it happens, the random module of the standard library has the exact answer to this problem
In [52]: random.choices?
Signature: choices(population, weights=None, *, cum_weights=None, k=1)
Docstring:
Return a k sized list of population elements chosen with replacement.
If the relative weights or cumulative weights are not specified,
the selections are made with equal probability.
File: ~/lib/miniconda3/lib/python3.6/random.py
Type: method
Our implementation, aimed at hiding details, could be
def roll(n, l):
    '''Rolls "n" times a die/coin whose face values are listed in "l".
    E.g., roll(2, range(1,21)) -> [12, 4] simulates rolling 2 icosahedral dice.
    '''
    from random import choices
    return choices(l, k=n)
Putting this together
def longest_run(i, s):
    'Counts the longest run of item "i" in sequence "s".'
    c, m = 0, 0
    for el in s:
        if el==i:
            c += 1
        elif c:
            m = m if m >= c else c
            c = 0
    # a run that reaches the end of the sequence is still held in c
    return m if m >= c else c
def roll(n, l):
    '''Rolls "n" times a die/coin whose face values are listed in "l".
    E.g., roll(2, range(1,21)) -> [12, 4] simulates rolling 2 icosahedral dice.
    '''
    from random import choices
    return choices(l, k=n)
N = 100 # n. of flipped coins
h_or_t = ['h', 't']
random_seq_of_h_or_t = roll(N, h_or_t)
max_h = longest_run('h', random_seq_of_h_or_t)
max_t = longest_run('t', random_seq_of_h_or_t)

How can I know what's the largest number from a while loop in python?

so far what I've done is this:
from random import randint
def num_random():
    counter = 0
    while (counter <= 99):
        counter=counter+1
        base=randint(1,100)
        height=randint(1,100)
        area = base*height/2
        data(area)
def data(area):
    print("area=",area)
num_random()
but I want to determine which triangle has the largest area.
I think I could store the value of each area in a list and then use max(list) to find the largest. The problem is that I'm not sure how to store the values from the while loop in a list.
Thanks in advance!
You could simply do
max(randint(1,100)*randint(1,100)/2 for i in range(100))
or if you want to make it a little clearer,
def area_triangle(base, height):
    return base*height/2
def max_rand_area(num_trials=100):
    return max(area_triangle(randint(1,100), randint(1,100)) for i in range(num_trials))
You are right, simply append area to a list, then call the max function:
from random import randint
def num_random():
    counter = 0
    areas = []
    while (counter <= 99):
        counter=counter+1
        base=randint(1,100)
        height=randint(1,100)
        area = base*height/2
        areas.append(area)
    data(max(areas))
def data(area):
    print("area=",area)
num_random()
>>> num_random()
('area=', 4140)
>>> num_random()
('area=', 4464)
Here is how I would write the original code, with the values stored in lists:
from random import randint
def num_random():
    base = []
    height = []
    area = []
    for i in range(100): #Use a for loop rather than a while loop with a counter
        base.append(randint(1,100)) #Add value to "base" list
        height.append(randint(1,100)) #Add value to "height" list
        area.append(base[i]*height[i]/2) #Calculate Area h*b/2
        print("Area = {}".format(area[i])) #Print Area Value to Terminal
    print("Max Area = {}".format(max(area)))
num_random() #call the function
In addition, now you can use index i to pull indexed values from the created lists.
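Since the question asks which triangle has the largest area, not only what that area is, one option is to keep base, height and area together and let max compare by area. A minimal sketch; the largest_triangle name and the optional rng parameter are assumptions added for illustration and reproducibility.

```python
from random import Random


def largest_triangle(num_trials=100, rng=None):
    """Generate random triangles; return the largest one and the full list."""
    rng = rng or Random()
    triangles = []
    for _ in range(num_trials):
        base = rng.randint(1, 100)
        height = rng.randint(1, 100)
        triangles.append((base, height, base * height / 2))
    # max with a key compares by area but returns the whole tuple.
    best = max(triangles, key=lambda t: t[2])
    return best, triangles


best, triangles = largest_triangle(100, Random(1))
base, height, area = best
print("base={}, height={}, area={}".format(base, height, area))
```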

Quickly counting particles in grid

I've written some Python code to calculate a certain quantity from a cosmological simulation. It does this by checking whether a particle is contained within a box of size 8,000^3, starting at the origin and advancing the box when all particles contained within it are found. As I am counting ~2 million particles altogether, and the total size of the simulation volume is 150,000^3, this is taking a long time.
I'll post my code below, does anybody have any suggestions on how to improve it?
Thanks in advance.
from __future__ import division
import numpy as np
def check_range(pos, i, j, k):
    a = 0
    if i <= pos[2] < i+8000:
        if j <= pos[3] < j+8000:
            if k <= pos[4] < k+8000:
                a = 1
    return a
def sigma8(data):
    N = []
    to_do = data
    print('Counting number of particles per cell...')
    for k in range(0,150001,8000):
        for j in range(0,150001,8000):
            for i in range(0,150001,8000):
                temp = []
                n = []
                for count in range(len(to_do)):
                    n.append(check_range(to_do[count],i,j,k))
                    to_do[count][1] = n[count]
                    if to_do[count][1] == 0:
                        temp.append(to_do[count])
                #Only particles that have not been found are
                # searched for again
                to_do = temp
                N.append(sum(n))
            print('Next row')
        print('Next slice, %i still to find' % len(to_do))
    print('Calculating sigma8...')
    if not sum(N) == len(data):
        return 'Error!\nN measured = {0}, total N = {1}'.format(sum(N), len(data))
    else:
        return 'sigma8 = %.4f, variance = %.4f, mean = %.4f' % (np.sqrt(sum((N-np.mean(N))**2)/len(N))/np.mean(N), np.var(N),np.mean(N))
I'll try to post some code, but my general idea is the following: create a Particle class that knows about the box that it lives in, which is calculated in the __init__. Each box should have a unique name, which might be the coordinate of the bottom left corner (or whatever you use to locate your boxes).
Get a new instance of the Particle class for each particle, then use a Counter (from the collections module).
Particle class looks something like:
# static consts - outside so that every instance of Particle doesn't take them along
# for the ride...
MAX_X = 150000
X_STEP = 8000
# etc. (Y_STEP and Z_STEP are used below in the same way)
class Particle(object):
    def __init__(self, data):
        # xvalue, yvalue and zvalue are placeholders for the coordinate indices
        self.x = data[xvalue]
        self.y = data[yvalue]
        self.z = data[zvalue]
        self.compute_box_label()
    def compute_box_label(self):
        import math
        x_label = math.floor(self.x / X_STEP)
        y_label = math.floor(self.y / Y_STEP)
        z_label = math.floor(self.z / Z_STEP)
        self.box_label = str(x_label) + '-' + str(y_label) + '-' + str(z_label)
Anyway, I imagine your sigma8 function might look like:
def sigma8(data):
    import collections as col
    particles = [Particle(x) for x in data]
    boxes = col.Counter([x.box_label for x in particles])
    counts = boxes.most_common()
    #some other stuff
counts will be a list of tuples which map a box label to the number of particles in that box. (Here we're treating particles as indistinguishable.)
Using list comprehensions is much faster than using loops---I think the reason is that you're basically relying more on the underlying C, but I'm not the person to ask. Counter is (supposedly) highly-optimized as well.
Note: None of this code has been tested, so you shouldn't try the cut-and-paste-and-hope-it-works method here.
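The same idea, computing each particle's cell once instead of testing every particle against every cell, can also be written with NumPy array operations. The sketch below rests on assumptions: positions is taken to be an (n, 3) array holding only the x, y, z columns (the question stores them at indices 2 to 4 of each row), and count_per_cell is a made-up name. It is checked only against random test data, not a real simulation.

```python
import numpy as np


def count_per_cell(positions, box_size=150000.0, cell_size=8000.0):
    """Count particles per cubic cell; positions is an (n, 3) array of x, y, z."""
    n_cells = int(np.ceil(box_size / cell_size))
    # Integer cell index along each axis, for all particles at once.
    idx = np.floor(positions / cell_size).astype(np.int64)
    # Guard against positions that fall slightly outside the box.
    idx = np.clip(idx, 0, n_cells - 1)
    # Flatten the 3-D cell index to one integer per particle, then histogram.
    flat = np.ravel_multi_index((idx[:, 0], idx[:, 1], idx[:, 2]), (n_cells,) * 3)
    return np.bincount(flat, minlength=n_cells ** 3).reshape((n_cells,) * 3)


rng = np.random.default_rng(0)
positions = rng.uniform(0, 150000, size=(10000, 3))
counts = count_per_cell(positions)
# Same statistic as in the question: standard deviation of counts over the mean.
sigma8 = counts.std() / counts.mean()
print(counts.shape, counts.sum(), round(float(sigma8), 4))
```

This visits each particle once, so the cost should grow with the number of particles rather than with particles times cells.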
