I tried using random.randint(0, 100), but some numbers were the same. Is there a method/module to create a list of unique random numbers?
This will return a list of 10 numbers selected from the range 0 to 99, without duplicates.
import random
random.sample(range(100), 10)
You can use the shuffle function from the random module like this:
import random
nums = list(range(1, 100)) # list of integers from 1 to 99
# adjust these boundaries to fit your needs
random.shuffle(nums)
print(nums) # <- List of unique random numbers
Note that shuffle doesn't return a list as one might expect; it shuffles the list you pass to it in place (and returns None).
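To make the in-place behaviour concrete, here is a minimal sketch: random.shuffle returns None, while random.sample(seq, len(seq)) is one way to get a shuffled copy and leave the original list untouched.

```python
import random

nums = list(range(1, 100))  # integers from 1 to 99

# random.shuffle reorders nums in place and returns None
result = random.shuffle(nums)
print(result)  # None

# random.sample returns a new list and leaves its input unchanged
original = list(range(1, 100))
shuffled_copy = random.sample(original, len(original))
print(original == list(range(1, 100)))  # True: the input is unchanged
print(sorted(shuffled_copy) == original)  # True: same values, new order
```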
You can first create a list of the numbers from a to b, where a and b are respectively the smallest and greatest numbers you want in your list, then shuffle it with the Fisher-Yates algorithm or with Python's random.shuffle function.
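A minimal sketch of the Fisher-Yates (Knuth) shuffle described above, in case you want to see the algorithm spelled out instead of calling random.shuffle. The function names fisher_yates_shuffle and unique_random_numbers are hypothetical, chosen for illustration only.

```python
import random

def fisher_yates_shuffle(items):
    """Shuffle `items` in place with the Fisher-Yates algorithm and return it."""
    # Walk from the last position down to 1 and swap each element
    # with a uniformly chosen element at or before it.
    for i in range(len(items) - 1, 0, -1):
        j = random.randint(0, i)  # 0 <= j <= i
        items[i], items[j] = items[j], items[i]
    return items

def unique_random_numbers(a, b):
    """Return every integer from a to b (inclusive) in random order."""
    return fisher_yates_shuffle(list(range(a, b + 1)))

print(unique_random_numbers(1, 10))
```

Choosing j from 0..i (not 0..len-1) at each step is what makes every permutation equally likely.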
Linear Congruential Pseudo-random Number Generator
O(1) Memory
O(k) Operations
This problem can be solved with a simple Linear Congruential Generator. This requires constant memory overhead (8 integers) and at most 2*(sequence length) computations.
All other solutions use more memory and more compute! If you only need a few random sequences, this method will be significantly cheaper. For ranges of size N, if you want to generate on the order of N unique k-sequences or more, I recommend the accepted solution using the built-in method random.sample(range(N), k), as this has been optimized in Python for speed.
Code
import random

# Return a randomized "range" using a Linear Congruential Generator
# to produce the number sequence. Parameters are the same as for
# python builtin "range".
# Memory -- storage for 8 integers, regardless of parameters.
# Compute -- at most 2*"maximum" steps required to generate sequence.
#
def random_range(start, stop=None, step=None):
    # Set default values the same way "range" does.
    if stop is None: start, stop = 0, start
    if step is None: step = 1
    # Use a mapping to convert a standard range into the desired range.
    mapping = lambda i: (i*step) + start
    # Compute the number of numbers in this range (same as len(range(...)),
    # but with integer arithmetic so it also works for huge ranges).
    maximum = max(0, -((start - stop) // step))
    if maximum == 0:
        return
    # Seed range with a random integer.
    value = random.randint(0, maximum)
    #
    # Construct an offset, multiplier, and modulus for a linear
    # congruential generator. These generators are cyclic and
    # non-repeating when they maintain the properties:
    #
    # 1) "modulus" and "offset" are relatively prime.
    # 2) ["multiplier" - 1] is divisible by all prime factors of "modulus".
    # 3) ["multiplier" - 1] is divisible by 4 if "modulus" is divisible by 4.
    #
    offset = random.randint(0, maximum) * 2 + 1  # Pick a random odd-valued offset.
    multiplier = 4*(maximum//4) + 1  # Pick a multiplier 1 greater than a multiple of 4.
    modulus = 1 << (maximum - 1).bit_length()  # Smallest power of 2 >= maximum.
    # Track how many random numbers have been returned.
    found = 0
    while found < maximum:
        # If this is a valid value, yield it in generator fashion.
        if value < maximum:
            found += 1
            yield mapping(value)
        # Calculate the next value in the sequence.
        value = (value*multiplier + offset) % modulus
Usage
The function "random_range" is used the same way as "range": iterate over it or pass it to list(). An example:
# Show off random range.
print()
for v in range(3, 6):
    for n in (2**v, 2**v + 1):
        l = list(random_range(n))
        print("Need", n, "found", len(set(l)), "(min,max)", (min(l), max(l)))
        print("", l)
print()
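Sample Results (the "Required ... cycles" lines come from extra debugging output that is not part of the code above, and long lists are truncated with "...")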
Required 8 cycles to generate a sequence of 8 values.
Need 8 found 8 (min,max) (0, 7)
[1, 0, 7, 6, 5, 4, 3, 2]
Required 16 cycles to generate a sequence of 9 values.
Need 9 found 9 (min,max) (0, 8)
[3, 5, 8, 7, 2, 6, 0, 1, 4]
Required 16 cycles to generate a sequence of 16 values.
Need 16 found 16 (min,max) (0, 15)
[5, 14, 11, 8, 3, 2, 13, 1, 0, 6, 9, 4, 7, 12, 10, 15]
Required 32 cycles to generate a sequence of 17 values.
Need 17 found 17 (min,max) (0, 16)
[12, 6, 16, 15, 10, 3, 14, 5, 11, 13, 0, 1, 4, 8, 7, 2, ...]
Required 32 cycles to generate a sequence of 32 values.
Need 32 found 32 (min,max) (0, 31)
[19, 15, 1, 6, 10, 7, 0, 28, 23, 24, 31, 17, 22, 20, 9, ...]
Required 64 cycles to generate a sequence of 33 values.
Need 33 found 33 (min,max) (0, 32)
[11, 13, 0, 8, 2, 9, 27, 6, 29, 16, 15, 10, 3, 14, 5, 24, ...]
The solution presented in this answer works, but it could become problematic with memory if the sample size is small but the population is huge (e.g. random.sample(insanelyLargeNumber, 10)).
To fix that, I would go with this:
import random

answer = set()
sampleSize = 10
answerSize = 0
while answerSize < sampleSize:
    r = random.randint(0, 100)
    if r not in answer:
        answerSize += 1
        answer.add(r)
# answer now contains 10 unique, random integers from 0 to 100
If you need to sample extremely large numbers, you cannot use random.sample with range:
random.sample(range(10000000000000000000000000000000), 10)
because it throws:
OverflowError: Python int too large to convert to C ssize_t
Also, if random.sample cannot produce the number of items you want due to the range being too small
random.sample(range(2), 1000)
it throws:
ValueError: Sample larger than population
This function resolves both problems:
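import random

def random_sample(count, start, stop, step=1):
    def gen_random():
        # Endless stream of random values from range(start, stop, step)
        while True:
            yield random.randrange(start, stop, step)

    def gen_n_unique(source, n):
        # Yield values from source, skipping duplicates, until n unique values are seen
        seen = set()
        if n <= 0:
            return
        for i in source():
            if i not in seen:
                seen.add(i)
                yield i
                if len(seen) == n:
                    break

    # Number of values in range(start, stop, step), computed with integers
    # so that it stays exact for huge ranges
    population = max(0, -((start - stop) // step))
    return list(gen_n_unique(gen_random, min(count, population)))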
Usage with extremely large numbers:
print('\n'.join(map(str, random_sample(10, 2, 10000000000000000000000000000000))))
Sample result:
7822019936001013053229712669368
6289033704329783896566642145909
2473484300603494430244265004275
5842266362922067540967510912174
6775107889200427514968714189847
9674137095837778645652621150351
9969632214348349234653730196586
1397846105816635294077965449171
3911263633583030536971422042360
9864578596169364050929858013943
Usage where the range is smaller than the number of requested items:
print(', '.join(map(str, random_sample(100000, 0, 3))))
Sample result:
2, 0, 1
It also works with negative ranges and steps:
print(', '.join(map(str, random_sample(10, 10, -10, -2))))
print(', '.join(map(str, random_sample(10, 5, -5, -2))))
Sample results:
2, -8, 6, -2, -4, 0, 4, 10, -6, 8
-3, 1, 5, -1, 3
If the list of N numbers from 1 to N is randomly generated, then yes, there is a possibility that some numbers may be repeated.
If you want a list of numbers from 1 to N in a random order, fill an array with integers from 1 to N, and then use a Fisher-Yates shuffle or Python's random.shuffle().
Here is a very small snippet I made; I hope this helps!
import random
numbers = list(range(0, 100))
random.shuffle(numbers)
A very simple function that also solves your problem
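from random import randint

def unique_rand(inicial, limit, total):
    # Guard against an endless loop when fewer than `total` values exist
    if total > limit - inicial + 1:
        raise ValueError("total is larger than the number of values in the range")
    data = []
    i = 0
    while i < total:
        number = randint(inicial, limit)
        if number not in data:
            data.append(number)
            i += 1
    return data

data = unique_rand(1, 60, 6)
print(data)
"""
prints something like
[34, 45, 2, 36, 25, 32]
"""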
One straightforward alternative is to use np.random.choice() as shown below
import numpy as np
np.random.choice(range(10), size=3, replace=False)
This returns three integers that are all different from each other, e.g. [1, 3, 5], [2, 5, 1], ...
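The same draw can also be written with NumPy's newer Generator API, which the NumPy documentation recommends for new code. A minimal sketch; passing an integer as the first argument to Generator.choice samples from 0 up to (but not including) that integer:

```python
import numpy as np

rng = np.random.default_rng()  # pass seed=... for reproducible results

# Draw 3 distinct integers from 0..9 (like range(10)), without replacement
samples = rng.choice(10, size=3, replace=False)
print(samples)
```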
The answer provided here works very well with respect to time as well as memory, but it is a bit more complicated, as it uses advanced Python constructs such as yield. The simpler answer works well in practice, but the issue with that answer is that it may generate many spurious integers before actually constructing the required set. Try it out with populationSize = 1000, sampleSize = 999. In theory, there is a chance that it doesn't terminate.
The answer below addresses both issues, as it is deterministic and somewhat efficient, though currently not as efficient as the other two.
def randomSample(populationSize, sampleSize):
    populationStr = str(populationSize)
    dTree, samples = {}, []
    for i in range(sampleSize):
        val, dTree = getElem(populationStr, dTree, '')
        samples.append(int(val))
    return samples, dTree
where the functions getElem, percolateUp are as defined below
import random

def getElem(populationStr, dTree, key):
    msd = int(populationStr[0])
    if key not in dTree:
        # list() so digits can be popped/removed (range is immutable in Python 3)
        dTree[key] = list(range(msd + 1))
    idx = random.randint(0, len(dTree[key]) - 1)
    key = key + str(dTree[key][idx])
    if len(populationStr) == 1:
        dTree[key[:-1]].pop(idx)
        return key, (percolateUp(dTree, key[:-1]))
    newPopulation = populationStr[1:]
    if int(key[-1]) != msd:
        newPopulation = str(10**(len(newPopulation)) - 1)
    return getElem(newPopulation, dTree, key)

def percolateUp(dTree, key):
    # Stop at the root key '' so an exhausted tree doesn't raise IndexError
    while key and dTree[key] == []:
        dTree[key[:-1]].remove(int(key[-1]))
        key = key[:-1]
    return dTree
Finally, the average timing was about 15 ms for a large value of n, as shown below (this session ran on Python 2, hence the L suffix on the long integers):
In [3]: n = 10000000000000000000000000000000
In [4]: %time l,t = randomSample(n, 5)
Wall time: 15 ms
In [5]: l
Out[5]:
[10000000000000000000000000000000L,
5731058186417515132221063394952L,
85813091721736310254927217189L,
6349042316505875821781301073204L,
2356846126709988590164624736328L]
To obtain a program that generates a list of random values without duplicates and that is deterministic, efficient and built with basic programming constructs, consider the function extractSamples defined below:
def extractSamples(populationSize, sampleSize, intervalLst):
    import random
    if sampleSize > populationSize:
        raise ValueError("sampleSize = " + str(sampleSize) + " > populationSize (= " + str(populationSize) + ")")
    samples = []
    while len(samples) < sampleSize:
        i = random.randint(0, (len(intervalLst) - 1))
        (a, b) = intervalLst[i]
        sample = random.randint(a, b)
        if a == b:
            intervalLst.pop(i)
        elif a == sample:  # shorten beginning of interval
            intervalLst[i] = (sample + 1, b)
        elif sample == b:  # shorten interval end
            intervalLst[i] = (a, sample - 1)
        else:
            intervalLst[i] = (a, sample - 1)
            intervalLst.append((sample + 1, b))
        samples.append(sample)
    return samples
The basic idea is to keep track of the intervals (intervalLst) of possible values from which to select our required elements. This is deterministic in the sense that we are guaranteed to generate a sample within a fixed number of steps (solely dependent on populationSize and sampleSize).
To use the above function to generate our required list,
In [3]: populationSize, sampleSize = 10**17, 10**5
In [4]: %time lst1 = extractSamples(populationSize, sampleSize, [(0, populationSize-1)])
CPU times: user 289 ms, sys: 9.96 ms, total: 299 ms
Wall time: 293 ms
We may also compare with an earlier solution (for a lower value of populationSize)
In [5]: populationSize, sampleSize = 10**8, 10**5
In [6]: %time lst = random.sample(range(populationSize), sampleSize)
CPU times: user 1.89 s, sys: 299 ms, total: 2.19 s
Wall time: 2.18 s
In [7]: %time lst1 = extractSamples(populationSize, sampleSize, [(0, populationSize-1)])
CPU times: user 449 ms, sys: 8.92 ms, total: 458 ms
Wall time: 442 ms
Note that I reduced the populationSize value, as it produces a MemoryError for higher values when using the random.sample solution (also mentioned in previous answers here and here). For the above values, we can also observe that extractSamples outperforms the random.sample approach.
P.S.: Though the core approach is similar to my earlier answer, there are substantial modifications in implementation as well as approach, along with improvements in clarity.
The problem with the set-based approaches ("if the random value is already in the returned values, try again") is that their runtime is not deterministic because of collisions (each of which requires another "try again" iteration), especially when a large number of random values are requested from the range.
An alternative that isn't prone to this non-deterministic runtime is the following:
import bisect
import random

def fast_sample(low, high, num):
    """ Samples :param num: integer numbers in range of
        [:param low:, :param high:) without replacement
        by maintaining a list of ranges of values that
        are permitted.

        This list of ranges is used to map a random number
        of a contiguous range (`r_n`) to a permissible
        number `r` (from `ranges`).
    """
    ranges = [high]
    high_ = high - 1
    while len(ranges) - 1 < num:
        # generate a random number from an ever decreasing
        # contiguous range (which we'll map to the true
        # random number).
        # consider an example with low=0, high=10,
        # part way through this loop with:
        #
        # ranges = [0, 2, 3, 7, 9, 10]
        #
        # r_n :-> r
        #   0 :-> 1
        #   1 :-> 4
        #   2 :-> 5
        #   3 :-> 6
        #   4 :-> 8
        r_n = random.randint(low, high_)
        range_index = bisect.bisect_left(ranges, r_n)
        r = r_n + range_index
        for i in range(range_index, len(ranges)):
            if ranges[i] <= r:
                # as many "gaps" we iterate over, as much
                # is the true random value (`r`) shifted.
                r = r_n + i + 1
            elif ranges[i] > r_n:
                break
        # mark `r` as another "gap" of the original
        # [low, high) range.
        ranges.insert(i, r)
        # Fewer values possible.
        high_ -= 1
    # `ranges` happens to contain the result (in ascending order).
    return ranges[:-1]
I found a considerably faster way than using the range function (which is very slow), without using Python's random module (I don't like the built-in random library because when you seed it, it repeats the pattern of the random number generator):
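import numpy as np
# generate some extra values to make up for the duplicates
nums = set(np.random.randint(low=0, high=100, size=150))
# caveat: 150 draws from 100 values usually give fewer than 100 unique numbers,
# and the iteration order of a set of small ints may not be random
nums = list(nums)[:100]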
This is quite fast.
You can use the NumPy library for a quick answer, as shown below.
The code snippet below lists 6 unique numbers in the range 0 to 5. You can adjust the parameters as needed.
import numpy as np
import random
a = np.linspace( 0, 5, 6 )
random.shuffle(a)
print(a)
Output
[ 2. 1. 5. 3. 4. 0.]
It doesn't impose constraints like the ones random.sample has, as referred to here.
import random
sourcelist = []
resultlist = []
for x in range(100):
    sourcelist.append(x)
for y in sourcelist:
    # insert each value at a random position of the growing result
    resultlist.insert(random.randint(0, len(resultlist)), y)
print(resultlist)
Try using...
import random
LENGTH = 100
random_with_possible_duplicates = [random.randrange(-3, 3) for _ in range(LENGTH)]
random_without_duplicates = list(set(random_with_possible_duplicates)) # This removes duplicates
Advantages
Fast, efficient and readable.
Possible Issues
This method can change the length of the list if there are duplicates.
If you wish to ensure that the numbers being added are unique, you could use the built-in set type (available since Python 2.4),
or import the sets module in older versions.
As others have mentioned, this means the numbers are not truly random.
If the amount of numbers you want is random, you can do something like this. In this case, length is the highest number you want to choose from.
If it notices the new random number was already chosen, it'll subtract 1 from count (since count was incremented before it knew whether the number was a duplicate). If it's not in the list, do what you want with it and add it to the list so it can't get picked again.
import random

def randomizer(length):
    # length = the highest number you want to choose from
    count = 0
    user_input = int(input("Enter number for how many rows to randomly select: "))
    numlist = []
    while 1 <= user_input <= length:
        count = count + 1
        if count > user_input:
            break
        chosen_number = random.randint(0, length)
        if chosen_number in numlist:
            # duplicate: undo the count increment and try again
            count = count - 1
            continue
        numlist.append(chosen_number)
        # do what you want here
    return numlist
Edit: ignore my answer here. Use Python's random.shuffle or random.sample, as mentioned in other answers.
To sample integers without replacement between `minval` and `maxval`:
import numpy as np
minval, maxval, n_samples = -50, 50, 10
generator = np.random.default_rng(seed=0)
samples = generator.permutation(np.arange(minval, maxval))[:n_samples]
# or, if minval is 0,
samples = generator.permutation(maxval)[:n_samples]
With JAX:
import jax
import jax.numpy as jnp
minval, maxval, n_samples = -50, 50, 10
key = jax.random.PRNGKey(seed=0)
samples = jax.random.permutation(key, jnp.arange(minval, maxval))[:n_samples]
From the command line on Windows XP:
python -c "import random; print(sorted(set([random.randint(6,49) for i in range(7)]))[:6])"
In Canada we have the 6/49 Lotto. I just wrap the above code in lotto.bat and run C:\home\lotto.bat or just C:\home\lotto.
Because random.randint often repeats a number, I use set with range(7) and then shorten it to a length of 6.
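Occasionally, if more than one duplicate is drawn (for example, the same number three times), the resulting list will have fewer than 6 numbers. Also, because the set is sorted before slicing, the result keeps the six smallest numbers drawn, which favors low numbers.
EDIT: However, random.sample(range(1, 50), 6) is the correct way to go (range excludes its stop value, so range(1, 50) covers the lotto numbers 1 to 49).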
For a college project I need to output the number of votes and the percentage of the total votes for each team (entered as input; there are six in total).
I made the program using lists, and got to the part where I made a list with 7 elements: the total number of votes the program registered, followed by the votes each team got, in order.
I then use this list to run a function that changes the values of the list's elements to their percentages, with another function working as a percentage calculator. (It's called 'porcentagem'; I tested it and it works as intended.)
def porcentagem(p, w):
    pc = 100 * float(p)/float(w)
    return str(pc) + "%"

def per(list):
    listF = [0,0,0,0,0,0,0]
    for x in list[1:7]:
        if x != 0:
            listF[x] = porcentagem(x, list[0])
        else:
            listF[x] = 0
    return listF
For some reason when I input the votes, the results come all out of order. For example:
The list input is List = [6, 3, 2, 1, 0, 0, 0,] but the output is [0, '16.666666666666668%', '33.333333333333336%', '50.0%', 0, 0, 0] (Index 0 is the total if it wasn't clear, and
I have no idea what could be causing this; it's apparently changing the order of the elements (it's supposed to come out as 50%, then 33.3...%, etc.).
I'm 'new' at programming, spent two months not coding anything, and English is not my first language (I'm learning Python in Portuguese), so sorry if this looks obvious lol
The x in for x in list[1:7]: returns the actual value, not the index. So x will be: 3, 2, 1, 0, 0, 0. That means the first listF[x] is listF[3] which is assigning to the 4th element.
A word of caution: list is a built-in type (list() is its constructor), so if you use list as a variable name, you shadow the built-in, which might have unintended consequences. Change it to something like per_list.
Do something like the following:
def per(per_list):
    listF = [0,0,0,0,0,0,0]
    for i in range(1, len(per_list)):
        x = per_list[i]
        if x != 0:
            listF[i] = porcentagem(x, per_list[0])
        else:
            listF[i] = 0
    return listF
Output: [0, '50.0%', '33.333333333333336%', '16.666666666666668%', 0, 0, 0]
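An alternative sketch of the same fix uses enumerate, which yields (index, value) pairs so the index can't be confused with the value. porcentagem is copied from the question; the rest is just one way to write it, not the only one.

```python
def porcentagem(p, w):
    # Percentage of p out of w, formatted as a string (from the question)
    pc = 100 * float(p) / float(w)
    return str(pc) + "%"

def per(per_list):
    # Index 0 holds the total; the following indices hold each team's votes
    total = per_list[0]
    listF = [0] * len(per_list)
    # start=1 makes i match the position in per_list, not in the slice
    for i, votes in enumerate(per_list[1:], start=1):
        listF[i] = porcentagem(votes, total) if votes != 0 else 0
    return listF

print(per([6, 3, 2, 1, 0, 0, 0]))
```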
I find reading all values of an assignment, obtained from
assignment = routing.SolveWithParameters(search_params)
of routing problems with time windows quite tricky. First of all, there are nodes and indices. I obtain the indices of a vehicle (route) via
index = routing.Start(vehicle)
indices = [index]
while not routing.IsEnd(index):
    index = assignment.Value(routing.NextVar(index))
    indices.append(index)
and the corresponding nodes are obtained by
nodes = [routing.IndexToNode(x) for x in indices]
For a particular routing problem with 5 stops and depot=0 the solver finds an assignment with the following indices and nodes:
vehicle: 0
indices: [0, 1, 6]
nodes: [0, 1, 0]
vehicle: 1
indices: [5, 4, 3, 2, 7]
nodes: [0, 4, 3, 2, 0]
So there are three more indices than nodes, because each vehicle starts and ends in the depot. I have defined a cost of 1 for every transit and reading the cost values via
cost_dimension = routing.GetDimensionOrDie("cost")
cost_vars = [cost_dimension.CumulVar(x) for x in indices]
costs = [assignment.Value(x) for x in cost_vars]
seems to work:
vehicle: 0
indices: [0, 1, 6]
nodes: [0, 1, 0]
costs: [0, 1, 2]
vehicle: 1
indices: [5, 4, 3, 2, 7]
nodes: [0, 4, 3, 2, 0]
costs: [0, 1, 2, 3, 4]
But when I add time constraints I run into problems. Let's first look at the code that defines the problem. The time unit is minutes.
def time_function(x,y):
    return 30

evaluator = time_function
slack_max = 40
capacity = 24*60
fix_start_cumul_to_zero = False
name = "time"
routing.AddDimension(evaluator, slack_max, capacity, fix_start_cumul_to_zero, name)
time_dimension = routing.GetDimensionOrDie("time")
time_windows = [(7*60, 8*60),(9*60, 10*60),(9*60, 10*60),
                (9*60, 10*60),(9*60, 10*60)]
for node, time_window in enumerate(time_windows):
    time_dimension.CumulVar(node).SetRange(time_window[0], time_window[1])
    routing.AddToAssignment(time_dimension.SlackVar(node))
So each trip takes 30 minutes, vehicles may be idle at a stop for up to 40 minutes (slack_max=40), and each stop should be serviced between 9am and 10am. The range constraints that are enforced via time_windows[0] are intended to define the starting times of each trip in the morning. But since the depot is the first and last stop of each route, they could also be interpreted as arrival times in the evening.
So here is my first difficulty with time windows: the depot appears twice on each route but the range constraint is defined on nodes. I am assuming that the routing model is not designed to take two windows for the depot?
Let me continue to get to the second part of my question. I set fix_start_cumul_to_zero = False so that routes may start at any time. Also note that routing.AddToAssignment(time_dimension.SlackVar(node)) is supposed to give me access to the slack variables later. Now, when I inspect the time values per index, via
time_vars = [time_dimension.CumulVar(x) for x in indices]
times.append([assignment.Value(x) for x in time_vars])
formatted with datetime, I get reasonable results:
vehicle: 0
indices: [0, 1, 6]
nodes: [0, 1, 0]
costs: [0, 1, 2]
times: ['7:50:00', '9:00:00', '9:30:00']
vehicle: 1
indices: [5, 4, 3, 2, 7]
nodes: [0, 4, 3, 2, 0]
costs: [0, 1, 2, 3, 4]
times: ['7:50:00', '9:00:00', '9:30:00', '10:00:00', '10:30:00']
The solver apparently favors early departure times. Given the max slack of 40min, each vehicle might also start a bit later, e.g. at 8am.
The trouble begins when I try to read the slack variables via
slack_vars = [time_dimension.SlackVar(x) for x in indices]
slacks = [assignment.Value(x) for x in slack_vars]
The program crashes with the message:
SystemError: returned NULL without setting an error
which suggests that time_dimension does not have a slack variable for every index. Is that right? Why not?
Thanks for reading this long post. Here are the two questions:
Is it possible to define arrival and departure time windows for the depot?
How do I properly read the slacks for time windows for all nodes, including the depot?
I'll answer question 2 first, since 1 is a guess from this answer...
2) First, a slack variable is only needed if there is a next node, so there is no slack var for the end node.
Basically, if j is next(i), then cumul(j) = cumul(i) + transit(i, j) + slack(i).
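As a worked check of that relation, assuming the times printed in the question are the CumulVar values in minutes since midnight (7:50 = 470, 9:00 = 540, 9:30 = 570) and using the question's transit time of 30, vehicle 0's route (indices 0, 1, 6) gives:

$$
\begin{aligned}
\text{slack}(i) &= \text{cumul}(j) - \text{cumul}(i) - \text{transit}(i, j) \\
\text{slack}(0) &= 540 - 470 - 30 = 40 \\
\text{slack}(1) &= 570 - 540 - 30 = 0
\end{aligned}
$$

So the 7:50 departure uses exactly slack_max = 40 minutes of waiting before the 9:00 window at node 1 opens; index 6 is the route's End, which has a cumul value but no slack.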
You must use "index" not "node_index" for accessing/setting SlackVar.
i.e. there are N node_index values (i.e. N locations including the depot) but
M = (N - 1) + 2 * num_vehicles index values: the N - 1 non-depot locations plus a Start and an End index for each vehicle,
i.e. each vehicle has a specific object instance for its start and end nodes
-> that's why you need to use routing.Start(i) to get the "start" index of the i-th vehicle (e.g. in fact NodeToIndex(0) gives you the start index for vehicle 0).
def add_time_window_constraints(routing, data, time_evaluator):
    """Add Global Span constraint"""
    time = "Time"
    horizon = 120
    routing.AddDimension(
        time_evaluator,
        horizon,  # allow waiting time
        horizon,  # maximum time per vehicle
        True,  # start cumul to zero
        time)
    time_dimension = routing.GetDimensionOrDie(time)
    for location_idx, time_window in enumerate(data.time_windows):
        if location_idx == 0:
            continue
        time_dimension.CumulVar(routing.NodeToIndex(location_idx)).SetRange(time_window[0], time_window[1])
        routing.AddToAssignment(time_dimension.SlackVar(routing.NodeToIndex(location_idx)))
    for vehicle_id in range(data.num_vehicles):
        routing.AddToAssignment(time_dimension.SlackVar(routing.Start(vehicle_id)))
        # slack not defined for vehicle ends
        #routing.AddToAssignment(time_dimension.SlackVar(routing.End(vehicle_id)))
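A sketch of reading cumul and slack values for one route while skipping the slack at the End index. It only uses calls that already appear in this thread (routing.Start, routing.IsEnd, routing.NextVar, assignment.Value, CumulVar, SlackVar); read_route_times is a hypothetical helper name, it has not been run against a real OR-tools model, and the slack values are only readable if they were added with routing.AddToAssignment as above.

```python
def read_route_times(routing, assignment, time_dimension, vehicle):
    """Collect (index, cumul, slack) tuples for one vehicle's route.

    Slack is only defined for indices that have a successor, so the
    route's End index gets None instead of a slack value.
    """
    rows = []
    index = routing.Start(vehicle)
    while not routing.IsEnd(index):
        cumul = assignment.Value(time_dimension.CumulVar(index))
        slack = assignment.Value(time_dimension.SlackVar(index))
        rows.append((index, cumul, slack))
        index = assignment.Value(routing.NextVar(index))
    # End index: a cumul value exists, a slack value does not
    rows.append((index, assignment.Value(time_dimension.CumulVar(index)), None))
    return rows
```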
1) /!\ NOT TESTED /!\
For time windows for departure I would use:
for vehicle_id in range(data.num_vehicles):
    time_dimension.CumulVar(routing.Start(vehicle_id)).SetRange(begin, end)
/!\ You must set fix_start_cumul_to_zero to False when creating the dimension if begin is > 0; otherwise it obviously won't find a solution /!\
i.e. You have to set it for each vehicle...
Your code barely works because node_index == index for the first nodes; then it starts to crash....
ps: I'm working on fixing the doc and examples for the index/node_index usage
Full example and tracked issue in google/or-tools #708.
ps: Thanks for having spotted this issue ;)
I have a scenario here. Let me explain it with a small example.
I have 10 pens that I have to give to 3 people. Their ratio is 6:6:1, which means that if I give 1 pen to Person C, I have to give 6 pens each to Person A and Person B.
I have tried to solve it by using simple calculation which I have described below.
PerPersonPen = (TotalCountofPens * PerPersonRatio)/(SumofAllPersonsRatio)
Person A =(Int) (10*6)/13 = 4
Person B = (Int) (10*6)/13 = 4
Person C = (Int) (10*1)/13 = 0
Here, Person C gets 0 pens, which shouldn't happen. Persons A and B getting 4 pens each is right, but Person C should also get 1 of the remaining pens.
This happens whenever one person has a ratio of 1 compared to much larger ratios for the others.
Can anybody help me sort this out, or tell me how I can achieve it?
A simple method is to maintain a count of how many pens each person should get. Then, as long as there are pens to distribute, you give one to the person who should get the most pens.
Here is a walk-through of your example:
60/13, 60/13, 10/13 -> 1, 0, 0
47/13, 60/13, 10/13 -> 1, 1, 0
47/13, 47/13, 10/13 -> 2, 1, 0
34/13, 47/13, 10/13 -> 2, 2, 0
34/13, 34/13, 10/13 -> 3, 2, 0
21/13, 34/13, 10/13 -> 3, 3, 0
21/13, 21/13, 10/13 -> 4, 3, 0
8/13, 21/13, 10/13 -> 4, 4, 0
8/13, 8/13, 10/13 -> 4, 4, 1
8/13, 8/13, -3/13 -> 5, 4, 1
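A minimal sketch of that greedy walk-through. distribute_greedy is a hypothetical name; Fraction keeps the owed amounts exact (60/13 and so on), and ties are broken in favor of the earlier person, which is an assumption that happens to reproduce the table above.

```python
from fractions import Fraction

def distribute_greedy(total, weights):
    """Hand out `total` items one at a time to whoever is owed the most."""
    weight_sum = sum(weights)
    # Exact entitlement of each person, e.g. 60/13 for weight 6 out of 13
    owed = [Fraction(total * w, weight_sum) for w in weights]
    given = [0] * len(weights)
    for _ in range(total):
        # max() returns the first index on ties, so earlier people win ties
        i = max(range(len(weights)), key=lambda k: owed[k])
        given[i] += 1
        owed[i] -= 1
    return given

print(distribute_greedy(10, [6, 6, 1]))  # [5, 4, 1]
```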
Laurent G has given a very useful link to Math Stack Exchange in the comment section: "How to round numbers fairly".
This post implicitly suggests the following algorithm:
For each person, compute the ratio x_i she should get (x_i = her share/sum of shares)
Set C_min = numberOfPens and C_max large enough such that sum_i (floor(x_i*C_max)) > numberOfPens (C_max = numberOfPens + 1/(min x_i) should work)
Do a binary (dichotomic) search between C_min and C_max until you find a C such that sum_i (floor(x_i*C)) = numberOfPens
This, however, might not work when there are ties (like in your example, where two ratios are 6/13). What you can do is add a very small amount of random noise to your ratios (add or subtract less than half of the smallest non-zero gap between your original ratios) to break ties and get a result in general, which will be "almost fair".
EDIT: I noticed that my initial choice of C_max was wrong, so I fixed it to something that works. However, I think there is room for improvement. It is also possible to find the initial C_max by increasing k in 2^k numberOfPens until the criterion (sum_i(floor(x_i*C_max)) > numberOfPens) is satisfied.
Looking at your example, it is clear that each person should get at least the result you calculated,
and only some will get more.
I suggest the following algorithm:
1. Calculate how much each person should get as a double, the same as you did but without the cast.
2. Compute the sum of the truncated (integer) values of all persons (8 in our example).
3. Sort the persons by the fractional part of their value (the part after the decimal point), largest first.
4. For the first k persons, use Math.Ceiling(assigned value), where k is the desired total minus the sum calculated in step 2 (10 - 8 = 2 in our example).
5. For the other persons, use a regular cast like (int) (assigned value).
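The steps above are what is usually called the largest remainder (Hamilton) method. A minimal Python sketch, using integer arithmetic instead of doubles so the fractional parts are exact; distribute_largest_remainder is a hypothetical name, and ties between equal fractional parts go to the earlier person (Python's sort is stable), which is an assumption.

```python
def distribute_largest_remainder(total, weights):
    """Floor every share, then give leftover items to the largest fractional parts."""
    weight_sum = sum(weights)
    # share_i = total * w_i / weight_sum, split into integer part and remainder
    floors = [total * w // weight_sum for w in weights]
    remainders = [total * w % weight_sum for w in weights]  # fractional part * weight_sum
    leftover = total - sum(floors)  # k in the steps above (10 - 8 = 2 in the example)
    # Stable sort: people with equal fractional parts keep their original order
    order = sorted(range(len(weights)), key=lambda i: remainders[i], reverse=True)
    shares = floors[:]
    for i in order[:leftover]:
        shares[i] += 1  # same as taking the ceiling for these k people
    return shares

print(distribute_largest_remainder(10, [6, 6, 1]))  # [5, 4, 1]
```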
Something like this:
int_weights = [6, 6, 1]
total_weights = sum(int_weights)
to_share = 10
still_to_share = to_share
left_weight_total = total_weights
shares = []
for weight in int_weights:
    # round this person's share of what is still left to share
    s = int(0.5 + (float(still_to_share) * weight / left_weight_total))
    still_to_share -= s
    left_weight_total -= weight
    shares.append(s)
That gives a fairly, but not absolutely, fair result (absolute fairness cannot be achieved with ties; cf. the link I posted in a comment on the question):
>>> shares
[5, 4, 1]
>>> sum(shares)
10
If you want ties to be treated fairly, the best result would be [4, 4, 2] (or [5, 5, 0] ;-) ?), which you can reach by sorting and taking previous allocations into account. Until I have a good reason to be that picky, I would stick with a solution like the one shown above.
I've got some data that represents periodic motion. So, it goes from a high to a low and back again; if you were to plot it, it would look like a sine wave. However, the amplitude varies slightly in each cycle. I would like to make a list of each maximum and minimum in the entire sequence. If there were 10 complete cycles, I would end up with 20 numbers, 10 positive (high) and 10 negative (low).
It seems like this is a job for time series analysis, but I'm not familiar with statistics enough to know for sure.
I'm working in python.
Can anybody give me some guidance as far as possible code libraries and terminology?
This isn't an overly complicated problem if you don't want to use a library; something like this should do what you want. Basically, as you iterate through the data, if you go from ascending to descending you have a high, and from descending to ascending you have a low.
def get_highs_and_lows(data):
    prev = data[0]
    high = []
    low = []
    asc = None
    for value in data[1:]:
        if not asc and value > prev:
            asc = True
            low.append(prev)
        elif (asc is None or asc) and value < prev:
            asc = False
            high.append(prev)
        prev = value
    if asc:
        high.append(data[-1])
    else:
        low.append(data[-1])
    return (high, low)
>>> data = [0, 1, 2, 1, 0, -2, 0, 2, 4, 2, 6, 8, 4, 0, 2, 4]
>>> print(get_highs_and_lows(data))
([2, 4, 8, 4], [0, -2, 2, 0])
You'll probably need to familiarize yourself with some of the popular python science/statistics libraries. numpy comes to mind.
And here's an item from the SciPy mailing list discussing how to do what you want using numpy.
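A minimal NumPy sketch of one common approach (not necessarily the one from that mailing-list thread): take the sign of the first difference and look for a rise followed by a fall (peak) or a fall followed by a rise (trough). local_extrema is a hypothetical name. Unlike get_highs_and_lows above, it reports interior extrema only (not the endpoints), ignores flat tops, and will pick up spurious extrema on noisy data, so smoothing first, or scipy.signal.find_peaks with its distance or prominence options, may be needed in practice.

```python
import numpy as np

def local_extrema(x):
    """Return (peak_indices, trough_indices) of the interior local extrema of x."""
    x = np.asarray(x)
    # +1 where the signal rises, -1 where it falls, 0 on flat steps
    slope = np.sign(np.diff(x))
    # A peak is a rise followed by a fall; a trough is a fall followed by a rise
    peaks = np.where((slope[:-1] > 0) & (slope[1:] < 0))[0] + 1
    troughs = np.where((slope[:-1] < 0) & (slope[1:] > 0))[0] + 1
    return peaks, troughs

data = np.array([0, 1, 2, 1, 0, -2, 0, 2, 4, 2, 6, 8, 4, 0, 2, 4])
peaks, troughs = local_extrema(data)
print(data[peaks].tolist(), data[troughs].tolist())  # [2, 4, 8] [-2, 2, 0]
```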
If x is a list of your data, and you happen to know the cycle length, T, try this:
import numpy as np

# Create 10 1000-sample cycles of a noisy sine wave.
T = 1000
x = np.sin(2*np.pi*np.arange(10*T)/T) + 0.1*np.random.randn(10*T)
# Find the minimum and maximum of each cycle.
[(min(x[i:i+T]), max(x[i:i+T])) for i in range(0, len(x), T)]
# prints the following:
[(-1.2234858463372265, 1.2508648231644286),
(-1.2272859833650591, 1.2339382830978067),
(-1.2348835727451217, 1.2554960382962332),
(-1.2354184224872098, 1.2305636540601534),
(-1.2367724101594981, 1.2384651681019756),
(-1.2239698560399894, 1.2665865375358363),
(-1.2211500568892304, 1.1687268390393153),
(-1.2471220836642811, 1.296787070454136),
(-1.3047322264307399, 1.1917835644190464),
(-1.3015059337968433, 1.1726658435644288)]
Note that this should work regardless of the phase offset of the sinusoid (with high probability).
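If len(x) is an exact multiple of T (an assumption; otherwise the reshape raises an error), the same per-cycle extremes can be computed without a Python-level loop by reshaping the array into one row per cycle. A minimal sketch, with a seeded generator so the run is reproducible:

```python
import numpy as np

T = 1000  # samples per cycle, assumed known as in the answer above
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * np.arange(10 * T) / T) + 0.1 * rng.standard_normal(10 * T)

# One row per cycle; requires len(x) to be an exact multiple of T
cycles = x.reshape(-1, T)
per_cycle = list(zip(cycles.min(axis=1), cycles.max(axis=1)))
print(len(per_cycle))  # 10
```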