Random distribution of integers with fixed total [duplicate]

So here is the deal: I want to (for example) generate 4 pseudo-random numbers that, when added together, equal 40. How could this be done in Python? I could generate a random number from 1 to 40, then generate another number between 1 and the remainder, etc., but then the first number would have a greater chance of "grabbing" more.

Here's the standard solution. It's similar to Laurence Gonsalves' answer, but has two advantages over that answer:
(1) it's uniform: each combination of 4 positive integers adding up to 40 is equally likely to come up with this scheme, and
(2) it's easy to adapt to other totals (7 numbers adding up to 100, etc.).
import random
def constrained_sum_sample_pos(n, total):
    """Return a randomly chosen list of n positive integers summing to total.
    Each such list is equally likely to occur."""
    dividers = sorted(random.sample(range(1, total), n - 1))
    return [a - b for a, b in zip(dividers + [total], [0] + dividers)]
Sample outputs:
>>> constrained_sum_sample_pos(4, 40)
[4, 4, 25, 7]
>>> constrained_sum_sample_pos(4, 40)
[9, 6, 5, 20]
>>> constrained_sum_sample_pos(4, 40)
[11, 2, 15, 12]
>>> constrained_sum_sample_pos(4, 40)
[24, 8, 3, 5]
Explanation: there's a one-to-one correspondence between (1) 4-tuples (a, b, c, d) of positive integers such that a + b + c + d == 40, and (2) triples of integers (e, f, g) with 0 < e < f < g < 40, and it's easy to produce the latter using random.sample. The correspondence is given by (e, f, g) = (a, a + b, a + b + c) in one direction, and (a, b, c, d) = (e, f - e, g - f, 40 - g) in the reverse direction.
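The correspondence can be checked exhaustively for a small case. The sketch below is only an illustration (the helper name compositions_from_dividers is made up here): it walks over every possible set of dividers for 4 parts summing to 10 and counts how often each resulting list appears.

```python
from collections import Counter
from itertools import combinations

def compositions_from_dividers(n, total):
    """Map every sorted set of n - 1 dividers in [1, total - 1] to its list of differences."""
    counts = Counter()
    for dividers in combinations(range(1, total), n - 1):
        # combinations() yields the dividers already sorted
        parts = tuple(a - b for a, b in zip(dividers + (total,), (0,) + dividers))
        counts[parts] += 1
    return counts

counts = compositions_from_dividers(4, 10)
print(len(counts))            # 84 distinct lists of 4 positive integers summing to 10
print(set(counts.values()))   # {1}: each list comes from exactly one set of dividers
```

Since random.sample picks every set of dividers with the same probability, each of the 84 lists is equally likely.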
If you want nonnegative integers (i.e., allowing 0) instead of positive ones, then there's an easy transformation: if (a, b, c, d) are nonnegative integers summing to 40 then (a+1, b+1, c+1, d+1) are positive integers summing to 44, and vice versa. Using this idea, we have:
def constrained_sum_sample_nonneg(n, total):
    """Return a randomly chosen list of n nonnegative integers summing to total.
    Each such list is equally likely to occur."""
    return [x - 1 for x in constrained_sum_sample_pos(n, total + n)]
Graphical illustration of constrained_sum_sample_pos(4, 10), thanks to #FM. (Edited slightly.)
0 1 2 3 4 5 6 7 8 9 10  # The universe.
|                   |   # Place fixed dividers at 0, 10.
|   |     |       | |   # Add 4 - 1 randomly chosen dividers in [1, 9]
  a    b      c    d    # Compute the 4 differences: 2 3 4 1

Use multinomial distribution
from numpy.random import multinomial
multinomial(40, [1/4.] * 4)
Each variable will be distributed as a binomial distribution with mean n * p equal to 40 * 1/4 = 10 in this example.
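Note that a multinomial draw is not uniform over the possible lists: balanced outcomes are far more likely than lopsided ones. A minimal sketch of the exact probabilities, using only the standard library (the function name multinomial_probability is an illustrative choice, not part of NumPy):

```python
from math import factorial

def multinomial_probability(counts):
    """Probability of one specific outcome when every cell has the same probability."""
    n = sum(counts)
    k = len(counts)
    ways = factorial(n)
    for c in counts:
        ways //= factorial(c)   # exact integer division at every step
    return ways / k ** n

print(multinomial_probability([10, 10, 10, 10]))   # the most balanced outcome
print(multinomial_probability([40, 0, 0, 0]))      # the most lopsided outcome
```

Whether that concentration around 10 is a feature or a drawback depends on what "random" is supposed to mean for the application.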

import random
def split_forty():  # illustrative name; the def line was missing from the snippet
    b = random.randint(2, 38)
    a = random.randint(1, b - 1)
    c = random.randint(b + 1, 39)
    return [a, b - a, c - b, 40 - c]
(I assume you wanted integers since you said "1-40", but this could be easily generalized for floats.)
Here's how it works:
cut the total range in two randomly, that's b. The odd range is because there are going to be at least 2 below the midpoint and at least 2 above. (This comes from your 1 minimum on each value).
cut each of those ranges in two randomly. Again, the bounds are to account for the 1 minimum.
return the size of each slice. They'll add up to 40.
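The three cuts above always give four positive integers summing to 40, but not every such list is equally likely. A sketch that computes the exact probability of each outcome with fractions (the function name cut_scheme_probabilities is made up for this illustration):

```python
from fractions import Fraction

def cut_scheme_probabilities():
    """Exact probability of every list produced by the b-then-a-and-c cutting scheme."""
    probs = {}
    for b in range(2, 39):                 # 37 equally likely midpoints
        for a in range(1, b):              # b - 1 equally likely left cuts
            for c in range(b + 1, 40):     # 39 - b equally likely right cuts
                p = Fraction(1, 37) * Fraction(1, b - 1) * Fraction(1, 39 - b)
                probs[(a, b - a, c - b, 40 - c)] = p
    return probs

probs = cut_scheme_probabilities()
print(len(probs))             # 9139 possible lists
print(max(probs.values()))    # 1/1369, reached when b is 2 or 38
print(min(probs.values()))    # 1/13357, reached when b is 20
```

So lists with an extreme midpoint are almost ten times as likely as lists with a central one.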

Generate 4 random numbers, compute their sum, divide each one by the sum and multiply by 40.
If you want Integers, then this will require a little non-randomness.

There are only 37^4 = 1,874,161 arrangements of four integers in the range [1,37] (with repeats allowed). Enumerate them, saving and counting the permutations that add up to 40.
(This will be a much smaller number, N).
Draw uniformly distributed random integers K in the interval [0, N-1] and return the K-th permutation. This can easily be seen to guarantee a uniform distribution over the space of possible outcomes, with each sequence position identically distributed. (Many of the answers I'm seeing will have the final choice biased lower than the first three!)
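A brute-force sketch of this enumeration idea, assuming the table is built once and reused for every draw (the function names are illustrative):

```python
import random
from itertools import product

def enumerate_compositions(n, total):
    """List every n-tuple of positive integers summing to total (brute force)."""
    max_part = total - (n - 1)   # the other n - 1 parts are at least 1 each
    return [t for t in product(range(1, max_part + 1), repeat=n) if sum(t) == total]

def sample_by_enumeration(table):
    """Pick one precomputed tuple uniformly at random."""
    k = random.randrange(len(table))   # uniform index K in [0, N - 1]
    return list(table[k])

table = enumerate_compositions(4, 40)   # scans 37**4 candidates once
print(len(table))                       # 9139
print(sample_by_enumeration(table))
```

The table costs time and memory up front, so this only pays off for small n and total.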

If you want true randomness then use:
import numpy as np
def randofsum_unbalanced(s, n):
    # Where s = sum (e.g. 40 in your case) and n is the output array length (e.g. 4 in your case)
    r = np.random.rand(n)
    a = np.array(np.round((r/np.sum(r))*s,0),dtype=int)
    while np.sum(a) > s:
        a[np.random.choice(n)] -= 1
    while np.sum(a) < s:
        a[np.random.choice(n)] += 1
    return a
If you want a greater level of uniformity then take advantage of the multinomial distribution:
def randofsum_balanced(s, n):
    return np.random.multinomial(s,np.ones(n)/n,size=1)[0]

Building on #markdickonson by providing some control over distribution between the divisors. I introduce a variance/jiggle as a percentage of the uniform distance between each.
import random
def constrained_sum_sample(n, total, variance=50):
    """Return a random-ish list of n positive integers summing to total.
    variance: int; percentage of the gap between the uniform spacing to vary the result.
    """
    divisor = total/n
    jiggle = divisor * variance / 100 / 2
    dividers = [int((x+1)*divisor + random.random()*jiggle) for x in range(n-1)]
    result = [a - b for a, b in zip(dividers + [total], [0] + dividers)]
    return result
Sample output:
[12, 8, 10, 10]
[10, 11, 10, 9]
[11, 9, 11, 9]
[11, 9, 12, 8]
The idea remains to divide the population equally, then randomly move them left or right within the given range. Since each value is still bound to the uniform point we don't have to worry about it drifting.
Good enough for my purposes, but not perfect. eg: the first number will always vary higher, and the last will always vary lower.

Related

maximum difference in the summation of two subset

I have an array with N elements. I have to divide the array into two subsets such that one subset has exactly M elements and the other subset has the rest. The items should be divided so that the difference between the sums of the two subsets is as large as possible.
Example:
array size, N = 6
Number of element in one group, M = 3
elements = {2, 6, 3, 5, 10, 4}
The summation of subset 1 = 2 + 3 + 4 = 9
The summation of subset 2 = 6+ 5 + 10 = 21
The difference in subset = 21 - 9 = 12.
Note, this is the maximum difference possible.
I wrote following logic in python.
items = list(map(int, input().split()))
items.sort()
left = items[:M]
right = items[M:]
print(sum(right)-sum(left))
This does not work when my input array is {100, 100, 150} and M = 2: it gives 50, but the correct answer is 150.
Constraints
1<= N <=1000 // N is size of array
1<= M < N // M is the size of subset
1<= array element <=10000000 // elements
What will be the approach to solve this problem?

You need to sort first, which you already do. Then the M elements can be taken either from the start or from the end of the sorted list. Consider both cases and take the max.
items.sort()
left_sum = sum(items[:M])       # case 1: the M smallest elements
left_sum_2 = sum(items[:-M])    # case 2: everything except the M largest elements
right_sum = sum(items)-left_sum
right_sum_2 = sum(items)-left_sum_2
print(max(abs(right_sum_2-left_sum_2), abs(right_sum-left_sum)))

I suppose you should check two cases: the difference between the M lowest elements and the N-M highest ones, as you already did; and instead the difference between the M highest and the N-M lowest. Just return the biggest of the two. This is still O(n log n) by the way
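The two cases can be wrapped in a small function. This is a sketch under the constraints stated in the question (1 <= M < N); the name max_subset_difference is chosen here for illustration.

```python
def max_subset_difference(items, m):
    """Largest possible |sum(subset of size m) - sum(rest)|."""
    ordered = sorted(items)
    total = sum(ordered)
    lowest_m = sum(ordered[:m])     # the m-element subset made of the smallest values
    highest_m = sum(ordered[-m:])   # the m-element subset made of the largest values
    # difference between a subset and its complement is |total - 2 * subset_sum|
    return max(abs(total - 2 * lowest_m), abs(total - 2 * highest_m))

print(max_subset_difference([2, 6, 3, 5, 10, 4], 3))   # 12
print(max_subset_difference([100, 100, 150], 2))       # 150
```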

How can realize fair numbering for two counters/markers?

Suppose Alice and Bob have to number pages 1 to 200 by hand. With a simple split down the middle, Alice writes 1, 2, 3, ... up to 100 and Bob writes 101, 102, 103, ... up to 200, so Bob has to write many more digits than Alice. Treating Alice and Bob as two counters (or markers) doing the numbering, how can we split this task fairly?
Consider two integers, start and end, the starting and ending page numbers (inclusive) that define the range of pages needing handwritten numbering.
Each page number has to be written by either Alice or Bob; they cannot jointly write one number.
Page numbers are decimal integers counted from 1. The range of pages to be numbered can start at page 1 or anywhere in the middle of the notes.
Input: There are multiple tests in each test case.
Line 1: Integer N, the number of tests to follow.
Following N lines: Each line has two integers, st ed, for the starting and ending page numbers (inclusive) defining the range of pages that needs handwritten numbering.
#Input examples
4 #N=4 it means 4 following lines
1 200
8 10
9 10
8 11
import sys
import math
n = int(input()) #1 ≤ N ≤ 200
for i in range(n): #1 ≤ start < end ≤ 10,000,000
    start, end = [int(j) for j in input().split()]
Output:
N lines: for each test in the input, output the last page number that Alice should be responsible for writing.
#Output examples
118
9
9
9
I tried, unsuccessfully, to get inspiration from this post on fair dice casting. I was also wondering whether the solution is far from "Checking number of elements for Counter".

The first thing to note is that this cannot always be done exactly: consider the sequence 99, 100, which cannot be split fairly. That said, you can get within ±1 digit, assuming you always start counting from 1.
start = 1
end = 200
bobs_numbers = []
alices_numbers = []
count = 0
for i in range(end, start - 1, -1):
    if count > 0:
        bobs_numbers.append(i)
        count -= len(str(i))
    else:
        alices_numbers.append(i)
        count += len(str(i))
print(bobs_numbers, alices_numbers, count)

This is an answer to the initial question. Since the question has been changed, I posted another answer for the new question.
The initial question was: Partition the set [1, 200] into two subsets such that the total number of digits in one subset is as close to possible to the total number of digits in the other subset.
Since user Mitchel Paulin already gave a straightforward iterative solution, let me give an arithmetic solution.
Counting the number of digits
First, let's count the total number of digits we want to split between Alice and Bob:
there are 9 numbers with 1 digit;
there are 90 numbers with 2 digits;
there are 101 numbers with 3 digits.
Total: 492 digits.
We want to give 246 digits to Alice and 246 digits to Bob.
How to most simply get 246 digit by summing up numbers with 1, 2 and 3 digits?
246 = 3 * 82.
Let's give 82 numbers with 3 digits to Bob, and all the other numbers to Alice.
Finally Bob can handle numbers [119, 200] and Alice can handle numbers [1, 118].
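The arithmetic above is easy to confirm by brute force. A small check, with a helper name (digits_in_range) invented for this sketch:

```python
def digits_in_range(first, last):
    """Total number of digits needed to write every page number from first to last."""
    return sum(len(str(page)) for page in range(first, last + 1))

print(digits_in_range(1, 200))     # 492 digits in total
print(digits_in_range(1, 118))     # 246 digits for Alice
print(digits_in_range(119, 200))   # 246 digits for Bob
```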
Generalizing to any range [1, n]
Counting the numbers of numbers with each possible number of digits should be O(log n).
Dividing by 2 to get the number of digits for Bob is O(1).
Decomposing this number using dynamic programming is linear in the maximum number of digits, i.e., O(log n) space and time (this is exactly the coin change problem).
Transforming this decomposition into a union of ranges is straightforward, and linear in the maximum number of digits, so again O(log n). Deducing the ranges for Alice by "subtracting" Bob's ranges from [1, n] is also straightforward.
Conclusion: the algorithm is O(log n) space and time, as opposed to Mitchel Paulin's O(n) algorithm. The output is also logarithmic instead of linear, since it can be written as a union of ranges, instead of a long list.
This algorithm is a bit more complex to write, but the output being in the form of ranges mean that Alice and Bob won't bother each other too much by writing adjacent pages, which they would do a lot with the simpler algorithm (which mostly alternates between giving a number to Bob and giving a number to Alice).

Since the question has changed, this is an answer to the new question.
The new question is: Given a range [a, b], find number m such that the total number of digits in range [a, m] is as close as possible to the number of digits in range [m+1, b].
Algorithm explanation
The algorithm is simple: Start with m = (a + b) / 2, count the digits, then move m to the right or to the left to adjust.
To count the total number of digits in a range [1, n], we first count the number of unit digits (which is n); then add the number of tens digits (which is n - 9, since every number from 10 to n has one); then add the number of hundreds digits (which is n - 99); etc., stopping at the leading position of n.
To count the total number of digits in a range [a, b], we take the difference between the total number of digits in ranges [1, b] and [1, a-1].
Note that the number of digits of a positive integer n is len(str(n)). The tempting shortcut math.ceil(math.log10(n)) is off by one whenever n is a power of ten (it gives 1 for n = 10 and 0 for n = 1), so the code below uses len(str(n)) and needs no import.
Code in python
def count_digits_from_1(n):
    # Total number of digits in 1, 2, ..., n (0 when n == 0).
    number_of_positions = len(str(n))
    total_digits = 0
    for i in range(1, number_of_positions+1):
        # Every number from 10**(i-1) to n has a digit at position i.
        digits_at_pos_i = n - (10**(i-1) - 1)
        total_digits += digits_at_pos_i
    return total_digits
def count_digits(a, b):
    return count_digits_from_1(b) - count_digits_from_1(a-1) # assumes a >= 1
def share_digits(a, b):
    total_digits = count_digits(a, b)
    m = (a + b) // 2
    alices_digits = count_digits(a, m)
    bobs_digits = total_digits - alices_digits
    direction = 1 if alices_digits < bobs_digits else -1
    could_be_more_fair = True
    while (could_be_more_fair):
        new_m = m + direction
        # The page changing hands is new_m when moving right, m when moving left.
        diff = len(str(new_m if direction == 1 else m))
        new_alices_digits = alices_digits + direction * diff
        new_bobs_digits = bobs_digits - direction * diff
        if abs(alices_digits - bobs_digits) > abs(new_alices_digits - new_bobs_digits):
            alices_digits = new_alices_digits
            bobs_digits = new_bobs_digits
            m = new_m
        else:
            could_be_more_fair = False
    return ((a, m), (m+1, b))
if __name__=='__main__':
    for (a, b) in [(1, 200), (8, 10), (9, 10), (8, 11)]:
        print('{},{} ---> '.format(a,b), end='')
        print(share_digits(a, b))
Output:
1,200 ---> ((1, 118), (119, 200))
8,10 ---> ((8, 9), (10, 10))
9,10 ---> ((9, 9), (10, 10))
8,11 ---> ((8, 9), (10, 11))
Remark: This code uses the assumption 1 <= a <= b.
Performance analysis
Function count_digits_from_1 executes in O(log n); its for loop iterates over the position of the digits to count the number of unit digits, then the number of tens digits, then the number of hundreds digits, etc. There are about log10(n) positions.
The question is: how many iterations will the while loop in share_digits have?
If we're lucky, the final value of m will be very close to the initial value (a+b)//2, so the number of iterations of this loop might be O(1). This remains to be proven.
If the number of iterations of this loop is too high, the algorithm could be improved by getting rid of this loop entirely, and calculating the final value of m directly. Indeed, replacing m with m+1 or m-1 changes the difference abs(alices_digits - bobs_digits) by exactly two times the number of digits of m+1 (or m-1). Therefore, the final value of m should be given approximately by:
new_m = m + direction * abs(alices_digits - bobs_digits) / (2 * math.ceil(math.log10(m)))

How to optimize (3*O(n**2)) + O(n) algorithm?

I am trying to solve the arithmetic progression problem from USACO. Here is the problem statement.
An arithmetic progression is a sequence of the form a, a+b, a+2b, ..., a+nb where n=0, 1, 2, 3, ... . For this problem, a is a non-negative integer and b is a positive integer.
Write a program that finds all arithmetic progressions of length n in the set S of bisquares. The set of bisquares is defined as the set of all integers of the form p^2 + q^2 (where p and q are non-negative integers).
The two lines of input are n and m, which are the length of each sequence, and the upper bound to limit the search of the bi squares respectively.
I have implemented an algorithm which correctly solves the problem, yet it takes too long. With the max constraints of n = 25 and m = 250, my program does not solve the problem in the 5 second time limit.
Here is the code:
n = 25
m = 250
bisq = set()
for i in range(m+1):
    for j in range(i,m+1):
        bisq.add(i**2+j**2)
seq = []
for b in range(1, max(bisq)):
    for a in bisq:
        x = a
        for i in range(n):
            if x not in bisq:
                break
            x += b
        else:
            seq.append((a,b))
The program outputs the correct answer, but it takes too long. I tried running the program with the max n/m values, and after 30 seconds, it was still going.

Disclaimer: this is not a full answer. This is more of a general direction where to look for.
For each member of a sequence, you're looking for four parameters: two numbers to be squared and summed (q_i and p_i), and two differences to be used in the next step (x and y) such that
q_i**2 + p_i**2 + b = (q_i + x)**2 + (p_i + y)**2
Subject to:
0 <= q_i <= m
0 <= p_i <= m
0 <= q_i + x <= m
0 <= p_i + y <= m
There are too many unknowns so we can't get a closed form solution.
let's fix b: (still too many unknowns)
let's fix q_i, and also state that this is the first member of the sequence. I.e., let's start searching from q_1 = 0, extend as much as possible and then extract all sequences of length n. Still, there are too many unknowns.
let's fix x: we only have p_i and y to solve for. At this point, note that the range of possible values to satisfy the equation is much smaller than the full range of 0..m. After some algebra, b = x*(2*q_i + x) + y*(2*p_i + y), and there are really not many values to check.
This last step prune is what distinguishes it from the full search. If you write down this condition explicitly, you can get the range of possible p_i values and from that find the length of possible sequence with step b as a function of q_i and x. Rejecting sequences smaller than n should further prune the search.
This should get you from O(m**4) complexity to ~O(m**2). It should be enough to get into the time limit.

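A couple more things that might help prune the search space. A progression of length n ends at a + (n-1)*b, which cannot exceed the largest bisquare 2*m*m, so:
b <= 2*m*m // (n-1)
a <= 2*m*m - b*(n-1)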
An answer on math.stackexchange says that for a number x to be a bisquare, any prime factor of x of the form 3 + 4k (e.g., 3, 7, 11, 19, ...) must have an even power. I think this means that for any n > 3, b has to be even. The first item in the sequence a is a bisquare, so it has an even number of factors of 3. If b is odd, then one of a+1b or a+2b will have an odd number of factors of 3 and therefore isn't a bisquare.
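As a rough sketch of how bound-based pruning could look in code: the last term a + (n-1)*b of a progression cannot exceed 2*m*m, so b only needs to be tried up to (2*m*m - a) // (n-1). The function name find_progressions and the lookup-table layout are choices made for this illustration, it assumes n >= 2, and it has not been timed against the 5 second limit.

```python
def find_progressions(n, m):
    """Return all (a, b) with a, a+b, ..., a+(n-1)*b bisquares, sorted by b then a."""
    limit = 2 * m * m                        # largest possible bisquare
    is_bisq = bytearray(limit + 1)           # is_bisq[v] == 1 iff v = p*p + q*q
    for p in range(m + 1):
        for q in range(p, m + 1):
            is_bisq[p * p + q * q] = 1
    bisquares = [v for v in range(limit + 1) if is_bisq[v]]
    found = []
    for a in bisquares:
        max_b = (limit - a) // (n - 1)       # keeps the last term within the table
        for b in range(1, max_b + 1):
            # test the far end first: sparse large values fail fastest
            if all(is_bisq[a + k * b] for k in range(n - 1, 0, -1)):
                found.append((a, b))
    found.sort(key=lambda ab: (ab[1], ab[0]))
    return found

print(find_progressions(5, 7))
```

The array lookup replaces the set membership test, and the inner loop over b shrinks as a grows instead of always running to max(bisq).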

Code to maximize the sum of squares modulo m

Inputs:
k -> number of lists
m -> modulo
Constraints
1 <= k <= 7
1 <= m <= 1000
1 <= magnitude of elements in list <= 10^9
1 <= number of elements in each list <= 7
This snippet of code is responsible for maximizing (x1^2 + x2^2 + ...) % m where x1, x2, ... are chosen from lists X1, X2, ...
k,m=map(int,input().split())
Sum=0
s=[]
for _ in range(k):
    s.append(max(map(int,input().split())))
    Sum+=int(s[_])**2
print(Sum%m)
So for instance if inputs are :
3 1000
2 5 4
3 7 8 9
5 5 7 8 9 10
The output would be 206, owing to selecting highest element in each list, square that element, take the sum and perform modulus operation using m
So, it would be (5^2+9^2+10^2)%1000=206
If I provide input like,
3 998
6 67828645 425092764 242723908 669696211 501122842 438815206
4 625649397 295060482 262686951 815352670
3 100876777 196900030 523615865
The expected output is 974, but I am getting 624
I would like to know how you would approach this problem or how to correct existing code.

You have to find max((sum of squares) modulo m). That's not the same as max(sum of squares) modulo m.
It may be that you find a sum of squares that's not in absolute terms as large as possible, but is maximum when you take it modulo m.
For example:
m=100
[10, 9],
[10, 5]
Here, the maximum sum of squares is 100 + 100 = 200, which is 0 modulo 100. The maximum (sum of squares modulo 100) comes from 81 + 100 = 181, which is 81 modulo 100.
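For inputs this small the distinction can be checked by trying every combination. A brute-force sketch (the function name max_sum_of_squares_mod is illustrative):

```python
from itertools import product

def max_sum_of_squares_mod(m, lists):
    """Try one element from each list and keep the best (sum of squares) % m."""
    return max(sum(x * x for x in choice) % m for choice in product(*lists))

print(max_sum_of_squares_mod(100, [[10, 9], [10, 5]]))                       # 81
print(max_sum_of_squares_mod(1000, [[5, 4], [7, 8, 9], [5, 7, 8, 9, 10]]))   # 206
```

With at most 7 lists of at most 7 elements there are at most 7^7 = 823,543 combinations, so brute force is also feasible under the stated constraints.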
Given that m is forced to be small, there's a fast dynamic programming solution that runs in O(m * N) time, where N is the total number of items in all the lists.
def solve(m, xxs):
    # r[i] = number of ways to reach a sum of squares congruent to i modulo m
    r = [1] + [0] * (m - 1)
    for xs in xxs:
        s = [0] * m
        for i in range(m):
            for x in xs:
                xx = (x * x) % m
                s[i] += r[(i - xx) % m]
        r = s
    return max(i for i in range(m) if r[i])
m = 998
xxs = [
    [67828645, 425092764, 242723908, 669696211, 501122842, 438815206],
    [625649397, 295060482, 262686951, 815352670],
    [100876777, 196900030, 523615865]]
print(solve(m, xxs))
This outputs 974 as required.

One important logical problem is that you have to skip the item count at the start of each line when finding the max element in your for loop. For example, the input line is
6 67828645 425092764 242723908 669696211 501122842 438815206
but your data is
67828645 425092764 242723908 669696211 501122842 438815206
That is, instead of
input().split()
you have to use
input().split()[1:]
As pointed out by Paul Hankin, you basically need to find max(sum of squares % m): you have to find the combination, one element from each list, whose sum of squares % m is the largest.
So you scan the input, split on spaces, drop the first element (the number of values on the line), and map the rest to integers. Then you square them and append the squares to a list s. With that in hand, take the product (from the itertools module) of the lists. For example, product([1,2],[3,4,5]) gives (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5). Now you can compute the sum of each such tuple % m and take the max value!
That is,
k,m=map(int,input().split())
from itertools import product
s=[]
for _ in range(k):
    s.append(map(lambda x:x**2,map(int,input().split()[1:])))
print(max([sum(i)%m for i in product(*s)]))
This will give you the desired output!
Hope it helps!

Your question is not very clear. However, if I understand it correctly, you have lists of possible values for f(X1), ..., f(Xn) (probably obtained by applying f to all possible values for X1, ..., Xn), and you want to maximize f(X1)^2 + ... + f(Xn)^2 ?
If so, your code seems good, I get the same result:
lists = [[6, 67828645, 425092764, 242723908, 669696211, 501122842, 438815206],
[4, 625649397, 295060482, 262686951, 815352670],
[3, 100876777, 196900030, 523615865]]
sum = 0
for l in lists:
    sum += max(l)**2
print(sum%998)
This prints 624, just like your code. Where are you getting the 974 from?

Not going to win any codegolf with this but here was my solution:
from functools import reduce
def get_input():
    """
    gets input from stdin.
    input format:
    3 1000
    2 5 4
    3 7 8 9
    5 5 7 8 9 10
    """
    k, m = [int(i) for i in input().split()]
    lists = []
    for _ in range(k):
        lists.append([int(i) for i in input().split()[1:]])
    return m, k, lists
def maximise(m, k, lists):
    """
    m is the number by which the sum of squares is modulo'd
    k is the number of lists in the list of lists
    lists is the list of lists containing vals to be sum of squared
    maximise aims to maximise S for:
    S = (f(x1) + f(x2)...+ f(xk)) % m
    where:
    f(x) = x**2
    """
    max_value = reduce(lambda x,y: x+y**2, [max(l) for l in lists], 0)
    # check whether the max sum of squares is smaller than m
    # if it is, the modulo changes nothing and the answer has to be the max
    if max_value < m:
        print(max_value)
        return
    results = []
    for product in cartesian_product(lists):
        S = reduce(lambda x, y: x + y**2, product, 0) % m
        if S == m-1:
            print(S)
            return
        results.append(S)
    print(max(results))
def cartesian_product(ll, accum=None):
    """
    all combinations of lists made by combining one element from
    each list in a list of lists (cartesian product)
    """
    if not accum:
        accum = []
    for i in range(len(ll[0])):
        if len(ll) == 1:
            yield accum + [ll[0][i]]
        else:
            yield from cartesian_product(ll[1:], accum + [ll[0][i]])
if __name__ == "__main__":
    maximise(*get_input())

Creating a N sets of combinations of variables from the given range of each variable

This may be a very vague question -- I apologize in advance.
y is a function of a,b,c,d,e.
a can go from 1 to 130; b from 0.5 to 1; c from 3 to 10; d from 0 to 1; and e is 1-d.
Is there a way in python I can create N (say, 10,000) sets of combinations of a,b,c,d and e from the given range?

itertools is a great library for creating iterators like that. In your case, it is product that we need. However, you cannot set the number of point you want. You are going to have to derive that mathematically and translate it into steps (the 1s at the end of the range() function).
a = [i for i in range(1, 131, 1)]
b = [i/10 for i in range(5, 11,1)]
c = [i for i in range(3, 11, 1)]
d = [i/10 for i in range(0, 11, 1)]
from itertools import product
combs = product(a, b, c, d)
for i, x in enumerate(combs, 1):
    x = list(x)
    x.append(1-x[-1])
    print(i, x) # example print: 52076 [99, 0.8, 9, 0.1, 0.9]
The example above produces: 130 * 6 * 8 * 11 = 68640 combinations
These are more than you requested, but you get the point. You can also decide to have a variable more or less finely discretised.
I am also assuming a and c are integer variables.

I assume you want floating point numbers for all of these. If you want ints use random.randint
from random import uniform
N = 10000  # number of sets to generate
inputs = [[uniform(1,130), uniform(.5, 1), uniform(3,10), uniform(0,1)] for _ in range(N)]
for input_list in inputs:
    input_list.append(1-input_list[-1])  # e = 1 - d
    y(*input_list)  # y is your function of a, b, c, d, e
