Maximum difference in the summation of two subsets

I have an array with N elements. I have to divide the array into two subsets such that one subset has exactly M elements and the other subset has the rest. The items must be divided so that the difference between the sums of the two subsets is as large as possible.
Example:
array size, N = 6
Number of element in one group, M = 3
elements = {2, 6, 3, 5, 10, 4}
The summation of subset 1 = 2 + 3 + 4 = 9
The summation of subset 2 = 6+ 5 + 10 = 21
The difference in subset = 21 - 9 = 12.
Note, this is the maximum difference possible.
I wrote the following logic in Python.
items = list(map(int, input().split()))
items.sort()
left = items[:M]
right = items[M:]
print(sum(right)-sum(left))
This does not work when my input array is {100, 100, 150} and M = 2: it gives me 50, but the correct answer is 150.
Constraints
1<= N <=1000 // N is size of array
1<= M < N // M is the size of subset
1<= array element <=10000000 // elements
What will be the approach to solve this problem?

You need to sort first, which you already do. Now you can take the M elements either from the start or from the end of the sorted array. Consider both cases and take the max.
items.sort()
left_sum = sum(items[:M])     # the M smallest elements form one subset
left_sum_2 = sum(items[:-M])  # the N-M smallest elements; the M largest form the other subset
right_sum = sum(items) - left_sum
right_sum_2 = sum(items) - left_sum_2
print(max(abs(right_sum_2 - left_sum_2), abs(right_sum - left_sum)))

I suppose you should check two cases: the difference between the M lowest elements and the N-M highest ones, as you already did; and instead the difference between the M highest and the N-M lowest. Just return the biggest of the two. This is still O(n log n) by the way
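A minimal sketch of the two-case check described above. The function name `max_subset_difference` and its signature are illustrative and not part of the original posts.

```python
def max_subset_difference(items, m):
    """Largest |sum(A) - sum(B)| where A has exactly m elements and B has the rest."""
    ordered = sorted(items)
    total = sum(ordered)
    low = sum(ordered[:m])    # case 1: A takes the m smallest elements
    high = sum(ordered[-m:])  # case 2: A takes the m largest elements
    # In both cases sum(B) = total - sum(A), so the difference is |total - 2*sum(A)|.
    return max(abs(total - 2 * low), abs(total - 2 * high))


print(max_subset_difference([2, 6, 3, 5, 10, 4], 3))  # 12
print(max_subset_difference([100, 100, 150], 2))      # 150
```

Sorting dominates the cost, so this stays at O(n log n).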

Related

How can I print the numbers which have a sum that equals the cube of a number

n = 5
cube = n**3
def get_sum(n):
    a1 = n * (n - 1) + 1
    for i in range(a1, cube, 2):
        print(i, end='+')
print(f'{get_sum(n)}')
print(cube)
I have output:
21+23+25+27+29+31+33+35+37+39+41+43+45+47+49+51+53+55+57+59+61+63+65+67+69+71+73+75+77+79+81+83+85+87+89+91+93+95+97+99+101+103+105+107+109+111+113+115+117+119+121+123+None
125
How can I get a range till 29 so the sum of these numbers will be equal to cube in Python?
For example, 21+23+25+27+29 = 5^3

First, there is no need to write print(f'{get_sum(n)}'): your function doesn't return anything, so the call evaluates to None, which you can see at the end of your output. Calling get_sum(n) is enough.
Since you are always looping n times, you can simplify your condition. In my solution I used a while loop with a running total to keep track of the current sum of the numbers.
You can apply the same logic with a for loop, of course; this is just my implementation.
def get_sum(n):
    a1 = n * (n - 1) + 1
    total = a1  # running sum of the terms seen so far
    while total < cube:
        print(a1, end='+')
        a1 += 2
        total += a1
    print(a1, end='=')
n = 5
cube = n**3
get_sum(n)
print(cube)
output:
21+23+25+27+29=125

Inefficient approach:
Keep a variable that tracks the current sum to check if we need to break the loop or not (as mentioned in the other answers).
Efficient Approach:
n^3 can be expressed as a sum of n odd integers, which are symmetric about n^2. Examples:
3^3 = 7+9+11 (symmetric about 9)
4^3 = 13+15+17+19 (symmetric about 16)
5^3 = 21+23+25+27+29 (symmetric about 25)
Use this approach to get a simpler algorithm
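The symmetric form can be turned into code without tracking a running sum. This is only a sketch; the helper name `cube_as_odd_sum` is an assumption, and the starting value reuses the `n * (n - 1) + 1` expression from the question.

```python
def cube_as_odd_sum(n):
    """Return the n consecutive odd integers centred on n**2; they sum to n**3."""
    first = n * (n - 1) + 1  # smallest term, same a1 as in the question
    return [first + 2 * k for k in range(n)]


terms = cube_as_odd_sum(5)
print('+'.join(map(str, terms)), '=', sum(terms))  # 21+23+25+27+29 = 125
```

The average of the n terms is n^2, so their sum is n * n^2 = n^3.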

Generate an Asymmetric NxM matrix whose Rows and Columns Independently Sum to 1

Given a target matrix size of N rows and M columns, is it possible to pick values such that all rows and columns sum to 1, on the condition that the matrix is not symmetric across the diagonal? Here's a target matrix I was able to generate when N==M (The problems arise when N!=M - see below):
[[0.08345877 0.12844672 0.90911941 0.41964704 0.57709569]
[0.53949086 0.07965491 0.62582134 0.48922244 0.38357809]
[0.80619328 0.27581426 0.31312973 0.26855717 0.4540732 ]
[0.11803505 0.88201276 0.1990759 0.2818701 0.63677383]
[0.57058968 0.75183898 0.07062126 0.6584709 0.06624682]]
I'm writing this in numpy. Currently I've written the following (brute force) code, which I know works when n==m. However, if n != m, the row-wise and column-wise sums do not both converge to 1, and the ratio of row-wise sums to column-wise sums converges to (n/m):
import numpy as np
n, m = (5, 4)
mat = np.random.random((n, m))
for i in range(100):
    s0 = mat.sum(0)                 # column sums
    s1 = mat.sum(1)[:, np.newaxis]  # row sums, as a column vector
    mat = (mat/s0)
    mat = (mat/s1)
    if i%10 == 0:
        print(s0[0]/s1[0,0])
The final output in this case is 1.25 (I.e. n/m, or 5/4). I'm beginning to think this might not be mathematically possible. Can someone prove me wrong?

I suspect you are correct, the problem cannot be solved if N != M.
Take a 2x3 matrix as an example:
[[a b c]
[d e f]]
Assume that all rows and all columns sum to 1 and show a contradiction. The rows sum to 1 so:
a+b+c = 1
d+e+f = 1
This gives:
(a+b+c)+(d+e+f) = 1 + 1 = 2
Now look at the columns. Each column also sums to 1 so we have:
a+d = 1
b+e = 1
c+f = 1
Combining the three column equations gives:
(a+d)+(b+e)+(c+f) = 1 + 1 + 1 = 3
Since the sum of all six matrix elements cannot be both 2 and 3 simultaneously, 2 != 3, the initial assumption leads to a contradiction and so is disproved. More generally the problem cannot be solved for N != M with N rows and M columns.
The contradiction disappears when N = M for a square matrix.
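The same counting argument can be written for a general matrix with N rows and M columns. This is only a restatement of the reasoning above in symbols; the entry name a_{ij} is an assumption introduced here.

```latex
\[
\sum_{i=1}^{N}\underbrace{\sum_{j=1}^{M} a_{ij}}_{=\,1} \;=\; N,
\qquad
\sum_{j=1}^{M}\underbrace{\sum_{i=1}^{N} a_{ij}}_{=\,1} \;=\; M .
\]
```

Both double sums are the sum of all entries of the matrix, so they must be equal, which forces N = M. This also seems consistent with the n/m ratio observed in the question: if every row sums to 1, the total is N, so the M column sums average N/M.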

How to optimize (3*O(n**2)) + O(n) algorithm?

I am trying to solve the arithmetic progression problem from USACO. Here is the problem statement.
An arithmetic progression is a sequence of the form a, a+b, a+2b, ..., a+nb where n=0, 1, 2, 3, ... . For this problem, a is a non-negative integer and b is a positive integer.
Write a program that finds all arithmetic progressions of length n in the set S of bisquares. The set of bisquares is defined as the set of all integers of the form p^2 + q^2 (where p and q are non-negative integers).
The two lines of input are n and m, which are the length of each sequence and the upper bound on p and q that limits the search of the bisquares, respectively.
I have implemented an algorithm which correctly solves the problem, yet it takes too long. With the max constraints of n = 25 and m = 250, my program does not solve the problem in the 5 second time limit.
Here is the code:
n = 25
m = 250
bisq = set()
for i in range(m+1):
    for j in range(i,m+1):
        bisq.add(i**2+j**2)
seq = []
for b in range(1, max(bisq)):
    for a in bisq:
        x = a
        for i in range(n):
            if x not in bisq:
                break
            x += b
        else:
            seq.append((a,b))
The program outputs the correct answer, but it takes too long. I tried running the program with the max n/m values, and after 30 seconds, it was still going.

Disclaimer: this is not a full answer. This is more of a general direction where to look for.
For each member of a sequence, you're looking for four parameters: two numbers to be squared and summed (q_i and p_i), and two differences to be used in the next step (x and y) such that
q_i**2 + p_i**2 + b = (q_i + x)**2 + (p_i + y)**2
Subject to:
0 <= q_i <= m
0 <= p_i <= m
0 <= q_i + x <= m
0 <= p_i + y <= m
There are too many unknowns so we can't get a closed form solution.
Let's fix b (still too many unknowns).
Let's fix q_i, and also state that this is the first member of the sequence. I.e., let's start searching from q_1 = 0, extend as much as possible and then extract all sequences of length n. Still, there are too many unknowns.
Let's fix x: we only have p_i and y to solve for. At this point, note that the range of possible values that satisfy the equation is much smaller than the full range of 0..m. After some algebra, b = x*(2*q_i + x) + y*(2*p_i + y), and there are really not many values to check.
This last pruning step is what distinguishes it from the full search. If you write down this condition explicitly, you can get the range of possible p_i values and from that find the length of the possible sequence with step b as a function of q_i and x. Rejecting sequences shorter than n should further prune the search.
This should get you from O(m**4) complexity to ~O(m**2). It should be enough to get into the time limit.

A couple more things that might help prune the search space:
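b <= 2*m*m // (n - 1)
a <= 2*m*m - b*(n - 1)
(The largest possible bisquare is 2*m*m, and a sequence of n terms ends at a + (n-1)*b.)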
An answer on math.stackexchange says that for a number x to be a bisquare, any prime factor of x of the form 3 + 4k (e.g., 3, 7, 11, 19, ...) must have an even power. I think this means that for any n > 3, b has to be even. The first item in the sequence a is a bisquare, so it has an even number of factors of 3. If b is odd, then one of a+1b or a+2b will have an odd number of factors of 3 and therefore isn't a bisquare.
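As a rough sketch of how the bounds can shrink the search, the version below stores the bisquares in a flat lookup table and only tries steps b for which the last term a + (n-1)*b can still be a bisquare (at most 2*m*m). The function name `find_progressions` is hypothetical, n >= 2 is assumed, and the parity observation above is deliberately not used. Whether this alone meets the time limit has not been measured here.

```python
def find_progressions(n, m):
    """Return all (a, b) with a, a+b, ..., a+(n-1)*b bisquares, sorted by b then a."""
    limit = 2 * m * m                      # largest possible bisquare
    is_bisq = bytearray(limit + 1)         # is_bisq[v] == 1 iff v = p*p + q*q
    for p in range(m + 1):
        for q in range(p, m + 1):
            is_bisq[p * p + q * q] = 1
    bisquares = [v for v in range(limit + 1) if is_bisq[v]]

    found = []
    for a in bisquares:
        max_b = (limit - a) // (n - 1)     # keeps a + (n-1)*b <= limit
        for b in range(1, max_b + 1):
            # all() stops at the first term that is not a bisquare
            if all(is_bisq[a + k * b] for k in range(1, n)):
                found.append((a, b))
    return sorted(found, key=lambda ab: (ab[1], ab[0]))


print(find_progressions(3, 1))  # [(0, 1)]
```

The byte table replaces the set membership test with an index lookup, and the bound on b removes every pair whose last term would fall outside the table.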

Random distribution of integers with fixed total [duplicate]

So here is the deal: I want to (for example) generate 4 pseudo-random numbers that, when added together, equal 40. How could this be done in Python? I could generate a random number 1-40, then generate another number between 1 and the remainder, etc., but then the first number would have a greater chance of "grabbing" more.

Here's the standard solution. It's similar to Laurence Gonsalves' answer, but has two advantages over that answer.
It's uniform: each combination of 4 positive integers adding up to 40 is equally likely to come up with this scheme.
and
it's easy to adapt to other totals (7 numbers adding up to 100, etc.)
import random
def constrained_sum_sample_pos(n, total):
    """Return a randomly chosen list of n positive integers summing to total.
    Each such list is equally likely to occur."""
    dividers = sorted(random.sample(range(1, total), n - 1))
    return [a - b for a, b in zip(dividers + [total], [0] + dividers)]
Sample outputs:
>>> constrained_sum_sample_pos(4, 40)
[4, 4, 25, 7]
>>> constrained_sum_sample_pos(4, 40)
[9, 6, 5, 20]
>>> constrained_sum_sample_pos(4, 40)
[11, 2, 15, 12]
>>> constrained_sum_sample_pos(4, 40)
[24, 8, 3, 5]
Explanation: there's a one-to-one correspondence between (1) 4-tuples (a, b, c, d) of positive integers such that a + b + c + d == 40, and (2) triples of integers (e, f, g) with 0 < e < f < g < 40, and it's easy to produce the latter using random.sample. The correspondence is given by (e, f, g) = (a, a + b, a + b + c) in one direction, and (a, b, c, d) = (e, f - e, g - f, 40 - g) in the reverse direction.
If you want nonnegative integers (i.e., allowing 0) instead of positive ones, then there's an easy transformation: if (a, b, c, d) are nonnegative integers summing to 40 then (a+1, b+1, c+1, d+1) are positive integers summing to 44, and vice versa. Using this idea, we have:
def constrained_sum_sample_nonneg(n, total):
    """Return a randomly chosen list of n nonnegative integers summing to total.
    Each such list is equally likely to occur."""
    return [x - 1 for x in constrained_sum_sample_pos(n, total + n)]
Graphical illustration of constrained_sum_sample_pos(4, 10), thanks to @FM. (Edited slightly.)
0 1 2 3 4 5 6 7 8 9 10  # The universe.
|                   |   # Place fixed dividers at 0, 10.
|   |     |       | |   # Add 4 - 1 randomly chosen dividers in [1, 9]
  a    b      c    d    # Compute the 4 differences: 2 3 4 1

Use multinomial distribution
from numpy.random import multinomial
multinomial(40, [1/4.] * 4)
Each variable will be distributed as a binomial distribution with mean n * p equal to 40 * 1/4 = 10 in this example.

import random
def four_numbers_summing_to_40():  # wrapper name added here so the return is valid
    b = random.randint(2, 38)
    a = random.randint(1, b - 1)
    c = random.randint(b + 1, 39)
    return [a, b - a, c - b, 40 - c]
(I assume you wanted integers since you said "1-40", but this could be easily generalized for floats.)
Here's how it works:
cut the total range in two randomly, that's b. The odd range is because there are going to be at least 2 below the midpoint and at least 2 above. (This comes from your 1 minimum on each value).
cut each of those ranges in two randomly. Again, the bounds are to account for the 1 minimum.
return the size of each slice. They'll add up to 40.

Generate 4 random numbers, compute their sum, divide each one by the sum and multiply by 40.
If you want Integers, then this will require a little non-randomness.

There are only 37^4 = 1,874,161 arrangements of four integers in the range [1,37] (with repeats allowed). Enumerate them, saving and counting the permutations that add up to 40.
(This will be a much smaller number, N).
Draw uniformly distributed random integers K in the interval [0, N-1] and return the K-th permutation. This can easily be seen to guarantee a uniform distribution over the space of possible outcomes, with each sequence position identically distributed. (Many of the answers I'm seeing will have the final choice biased lower than the first three!)
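A minimal sketch of the enumerate-then-index idea. It deviates slightly from the description above: only the first three values are enumerated (37^3 candidates) because the fourth is forced by the total. The names `enumerate_compositions`, `COMPOSITIONS` and `random_composition` are assumptions made for this example.

```python
import random
from itertools import product


def enumerate_compositions(total=40, parts=4):
    """List every tuple of `parts` positive integers that sums to `total`."""
    largest = total - (parts - 1)  # 37 when total=40 and parts=4
    found = []
    for head in product(range(1, largest + 1), repeat=parts - 1):
        last = total - sum(head)   # the final value is forced by the total
        if last >= 1:
            found.append(head + (last,))
    return found


COMPOSITIONS = enumerate_compositions()  # built once, reused for every draw


def random_composition():
    """Pick one tuple uniformly from the precomputed table."""
    return COMPOSITIONS[random.randrange(len(COMPOSITIONS))]


print(len(COMPOSITIONS))  # 9139
print(random_composition())
```

Each stored tuple is equally likely to be drawn, at the cost of keeping the whole table in memory, so this only suits small totals and counts.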

If you want true randomness then use:
import numpy as np
def randofsum_unbalanced(s, n):
    # Where s = sum (e.g. 40 in your case) and n is the output array length (e.g. 4 in your case)
    r = np.random.rand(n)
    a = np.array(np.round((r/np.sum(r))*s,0),dtype=int)
    while np.sum(a) > s:
        a[np.random.choice(n)] -= 1
    while np.sum(a) < s:
        a[np.random.choice(n)] += 1
    return a
If you want a greater level of uniformity then take advantage of the multinomial distribution:
def randofsum_balanced(s, n):
    return np.random.multinomial(s,np.ones(n)/n,size=1)[0]

Building on @markdickonson by providing some control over the distribution between the dividers. I introduce a variance/jiggle as a percentage of the uniform distance between each.
import random
def constrained_sum_sample(n, total, variance=50):
    """Return a random-ish list of n positive integers summing to total.
    variance: int; percentage of the gap between the uniform spacing to vary the result.
    """
    divisor = total/n
    jiggle = divisor * variance / 100 / 2
    dividers = [int((x+1)*divisor + random.random()*jiggle) for x in range(n-1)]
    result = [a - b for a, b in zip(dividers + [total], [0] + dividers)]
    return result
Sample output:
[12, 8, 10, 10]
[10, 11, 10, 9]
[11, 9, 11, 9]
[11, 9, 12, 8]
The idea remains to divide the population equally, then randomly move them left or right within the given range. Since each value is still bound to the uniform point we don't have to worry about it drifting.
Good enough for my purposes, but not perfect. eg: the first number will always vary higher, and the last will always vary lower.

Code to maximize the sum of squares modulo m

Inputs:
k -> number of lists
m -> modulo
Constraints:
1 <= k <= 7
1 <= m <= 1000
1 <= magnitude of elements in list <= 10^9
1 <= number of elements in each list <= 7
This snippet of code is responsible for maximizing (x1^2 + x2^2 + ...) % m where x1, x2, ... are chosen from lists X1, X2, ...
k,m=map(int,input().split())
Sum=0
s=[]
for _ in range(k):
    s.append(max(map(int,input().split())))
    Sum+=int(s[_])**2
print(Sum%m)
So for instance if inputs are :
3 1000
2 5 4
3 7 8 9
5 5 7 8 9 10
The output would be 206, owing to selecting highest element in each list, square that element, take the sum and perform modulus operation using m
So, it would be (5^2+9^2+10^2)%1000=206
If I provide input like,
3 998
6 67828645 425092764 242723908 669696211 501122842 438815206
4 625649397 295060482 262686951 815352670
3 100876777 196900030 523615865
The expected output is 974, but I am getting 624
I would like to know how you would approach this problem or how to correct existing code.

You have to find max((sum of squares) modulo m). That's not the same as max(sum of squares) modulo m.
It may be that you find a sum of squares that's not in absolute terms as large as possible, but is maximum when you take it modulo m.
For example:
m=100
[10, 9],
[10, 5]
Here, the maximum sum of squares is 100 + 100 = 200, which is 0 modulo 100. The maximum (sum of squares modulo 100) is 81 + 100 = 181, which is 81 modulo 100.
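The small example can be checked by brute force. This sketch tries every way of picking one element per list and compares the result with the "take the largest element of each list" strategy; the variable names are illustrative only.

```python
from itertools import product

m = 100
lists = [[10, 9], [10, 5]]

# Try every way of picking one element per list and keep the best residue.
best = max(sum(x * x for x in pick) % m for pick in product(*lists))
# The strategy from the question: square the largest element of each list.
greedy = sum(max(xs) ** 2 for xs in lists) % m

print(greedy)  # 0
print(best)    # 81
```

The best choice is 9 from the first list and 10 from the second, which the greedy strategy never considers.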
Given that m is forced to be small, there's a fast dynamic programming solution that runs in O(m * N) time, where N is the total number of items in all the lists.
def solve(m, xxs):
    # r[i] counts the ways to reach residue i using the lists processed so far
    r = [1] + [0] * (m - 1)
    for xs in xxs:
        s = [0] * m
        for i in range(m):
            for x in xs:
                xx = (x * x) % m
                s[i] += r[(i - xx) % m]
        r = s
    return max(i for i in range(m) if r[i])
m = 998
xxs = [
    [67828645, 425092764, 242723908, 669696211, 501122842, 438815206],
    [625649397, 295060482, 262686951, 815352670],
    [100876777, 196900030, 523615865]]
print(solve(m, xxs))
This outputs 974 as required.

One important logical problem here is that you have to skip the item count at the start of each line when finding the max element in your for loop. For example, the input line is
6 67828645 425092764 242723908 669696211 501122842 438815206
but your data is
67828645 425092764 242723908 669696211 501122842 438815206
That is, instead of
input().split()
you have to use
input().split()[1:]
As pointed out by Paul Hankin, you basically need to find max(sum of squares % m).
You have to find the combination, one element from each list, whose sum % m is the maximum.
So, this is basically:
You scan the input, split on spaces, drop the first element (which is the number of values in that line), and map the rest to integers. Then you square them and append them to a list s. With that, you take the product (itertools module). For example, product([1,2],[3,4,5]) will give [(1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5)]. Now you can compute the sum of each such tuple % m and find the max value!
That is,
from itertools import product
k,m=map(int,input().split())
s=[]
for _ in range(k):
    s.append([int(x)**2 for x in input().split()[1:]])
print(max([sum(i)%m for i in product(*s)]))
This will give you the desired output!
Hope it helps!

Your question is not very clear. However, if I understand it correctly, you have lists of possible values for f(X1), ..., f(Xn) (probably obtained by applying f to all possible values for X1, ..., Xn), and you want to maximize f(X1)^2 + ... + f(Xn)^2 ?
If so, your code seems good, I get the same result:
lists = [[6, 67828645, 425092764, 242723908, 669696211, 501122842, 438815206],
         [4, 625649397, 295060482, 262686951, 815352670],
         [3, 100876777, 196900030, 523615865]]
sum = 0
for l in lists:
    sum += max(l)**2
print(sum%998)
This prints 624, just like your code. Where are you getting the 974 from?

Not going to win any codegolf with this but here was my solution:
from functools import reduce
def get_input():
    """
    gets input from stdin.
    input format:
    3 1000
    2 5 4
    3 7 8 9
    5 5 7 8 9 10
    """
    k, m = [int(i) for i in input().split()]
    lists = []
    for _ in range(k):
        lists.append([int(i) for i in input().split()[1:]])
    return m, k, lists
def maximise(m, k, lists):
    """
    m is the number by which the sum of squares is modulo'd
    k is the number of lists in the list of lists
    lists is the list of lists containing vals to be sum of squared
    maximise aims to maximise S for:
    S = (f(x1) + f(x2)...+ f(xk)) % m
    where:
    f(x) = x**2
    """
    max_value = reduce(lambda x,y: x+y**2, [max(l) for l in lists], 0)
    # check whether the max sum of squares is smaller than m
    # if it is, the modulo changes nothing and the answer has to be the max
    if max_value < m:
        print(max_value)
        return
    results = []
    for product in cartesian_product(lists):
        S = reduce(lambda x, y: x + y**2, product, 0) % m
        # m-1 is the largest possible residue, so stop early
        if S == m-1:
            print(S)
            return
        results.append(S)
    print(max(results))
def cartesian_product(ll, accum=None):
    """
    all combinations of lists made by combining one element from
    each list in a list of lists (cartesian product)
    """
    if not accum:
        accum = []
    for i in range(len(ll[0])):
        if len(ll) == 1:
            yield accum + [ll[0][i]]
        else:
            yield from cartesian_product(ll[1:], accum + [ll[0][i]])
if __name__ == "__main__":
    maximise(*get_input())
