Minimize sum of product of two uneven consecutive arrays - python

I've got an optimization problem in which I need to minimize the sum product of two uneven but consecutive arrays, say:
A = [1, 2, 3]
B = [4, 9, 5, 3, 2, 10]
Shuffling of values is not allowed i.e. the index of the arrays must remain the same.
In other words, it is a distribution minimization of array A over array B in consecutive order.
Or: given that len(B) >= len(A), minimize the sum product of the n values of array A with n values chosen from array B, without changing the order of array A or B.
In this case, the minimum would be:
min_sum = 1*4 + 2*3 + 3*2 = 16
A brute force approach to this problem would be:
from itertools import combinations
sums = [sum(a*b for a,b in zip(A,b)) for b in combinations(B,len(A))]
min_sum = min(sums)
I need to do this for many sets of arrays however. I see a lot of overlap with the knapsack problem and I have the feeling that it should be solved with dynamic programming. I am stuck however in how to write an efficient algorithm to perform this.
Any help would be greatly appreciated!

Having two lists
A = [1, 2, 3]
B = [4, 9, 5, 3, 2, 10]
the optimal sum product can be found using:
min_sum = sum(a*b for a,b in zip(sorted(A), sorted(B)[:len(A)][::-1]))
In case A is always given sorted, this simplified version can be used:
min_sum = sum(a*b for a,b in zip(A, sorted(B)[:len(A)][::-1]))
The important part(s) to note:
You need factors of A sorted. sorted(A) will do this job, without modifying the original A (in contrast to A.sort()). In case A is already given sorted, this step can be left out.
You need the N lowest values from B, where N is the length of A. This can be done with sorted(B)[:len(A)]
In order to evaluate the minimal sum of products, you need to multiply the highest number of A with the lowest of B, the second highest of A with the second lowest of B, and so on. That is why, after getting the N lowest values of B, the order gets reversed with [::-1].
Output
print(min_sum)
# 16
print(A)
# [1, 2, 3] <- The original list A is not modified
print(B)
# [4, 9, 5, 3, 2, 10] <- The original list B is not modified
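One thing worth checking: sorting reorders both lists, while the question asks for the order of A and B to be preserved. The sketch below compares the sorted formula with the brute force from the question. The helper names sorted_min and ordered_min are made up for this illustration. The two agree on the example above, but not in general.

```python
from itertools import combinations

def sorted_min(A, B):
    """Sorted-based formula from this answer (free to reorder both lists)."""
    return sum(a * b for a, b in zip(sorted(A), sorted(B)[:len(A)][::-1]))

def ordered_min(A, B):
    """Brute force from the question (keeps the order of A and B)."""
    return min(sum(a * b for a, b in zip(A, c))
               for c in combinations(B, len(A)))

B = [4, 9, 5, 3, 2, 10]
print(sorted_min([1, 2, 3], B), ordered_min([1, 2, 3], B))  # 16 16
print(sorted_min([3, 2, 1], B), ordered_min([3, 2, 1], B))  # 16 20
```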

With Python, you can easily sort and reverse lists. The code you are looking for is
A, B = sorted(A), sorted(B)[:len(A)]
min_sum = sum([a*b for a,b in zip(A, B[::-1])])

You may need to get the values one by one from B, and keep the order of the list by having each value assigned to a key.
A = [1, 3, 2]
B = [4, 9, 5, 3, 2, 10]
# create a new dictionary with key value pairs of B array values
new_dict = {}
j = 0
for k in B:
    new_dict[j] = k
    j += 1
# create a new list of the smallest values in B up to length of array A
min_Bmany = []
for lp in range(0, len(A)):
    # get the smallest remaining value from dictionary new_dict
    rmvky = min(zip(new_dict.values(), new_dict.keys()))
    # append this item to minimums list
    min_Bmany.append((rmvky[1], rmvky[0]))
    # delete this key from the dictionary new_dict
    del new_dict[rmvky[1]]
# sort the list by the keys (instead of the values)
min_Bmany.sort(key=lambda r: r[0])
# create list of only the values, but still in the same order as they are in original array
min_B = []
for z in min_Bmany:
    min_B.append(z[1])
print(A)
print(min_B)
ResultStr = ""
Result = 0
# Calculate the result
for s in range(0, len(A)):
    ResultStr = ResultStr + str(A[s]) + "*" + str(min_B[s]) + " + "
    Result = Result + A[s] * min_B[s]
print(ResultStr)
print("Result = ", Result)
The output will be as follows:
A = [1, 3, 2]
B = [4, 9, 5, 3, 2, 10]
1*4 + 3*3 + 2*2 +
Result = 17
Then change the A, and the output becomes:
A = [1, 2, 3]
B = [4, 9, 5, 3, 2, 10]
1*4 + 2*3 + 3*2 +
Result = 16
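The same selection can be written more compactly. This is only a sketch of the idea above (take the len(A) smallest values of B, keep them in their original order, then multiply pairwise); the function name smallest_in_order is made up here.

```python
def smallest_in_order(A, B):
    """Pick the len(A) smallest values of B, keep their original order,
    and return them together with the pairwise sum of products with A."""
    # indices of B ordered by value (ties keep the lower index first)
    by_value = sorted(range(len(B)), key=B.__getitem__)
    # keep the len(A) smallest, then restore positional order
    chosen = sorted(by_value[:len(A)])
    min_B = [B[i] for i in chosen]
    return min_B, sum(a * b for a, b in zip(A, min_B))

B = [4, 9, 5, 3, 2, 10]
print(smallest_in_order([1, 3, 2], B))  # ([4, 3, 2], 17)
print(smallest_in_order([1, 2, 3], B))  # ([4, 3, 2], 16)
```

Note that picking the smallest values first is a heuristic rather than a guaranteed optimum: for A = [1, 10] and B = [3, 1, 2] it returns 21, while 1*3 + 10*1 = 13 also respects the order.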

Not sure if this is helpful, but anyway.
This can be formulated as a mixed-integer programming (MIP) problem. Basically, an assignment problem with some side constraints.
min sum((i,j),x(i,j)*a(i)*b(j))
sum(j, x(i,j)) = 1 ∀i "each a(i) is assigned to exactly one b(j)"
sum(i, x(i,j)) ≤ 1 ∀j "each b(j) can be assigned to at most one a(i)"
v(i) = sum(j, j*x(i,j)) "position of each a(i) in b"
v(i) ≥ v(i-1)+1 ∀i>1 "maintain ordering"
x(i,j) ∈ {0,1} "binary variable"
v(i) ≥ 1 "continuous (or integer) variable"
Example output:
----     40 VARIABLE z.L  =  16.000
----     40 VARIABLE x.L  assign
            j1          j4          j5
i1       1.000
i2                   1.000
i3                               1.000
----     40 VARIABLE v.L  position of a(i) in b
i1 1.000,    i2 4.000,    i3 5.000
Cute little MIP model.
Just as an experiment I generated a random problem with len(a)=50 and len(b)=500. This leads to a MIP with 650 rows and 25k columns. Solved in 50 seconds (to proven global optimality) on my slow laptop.

It turns out using a shortest path algorithm on a directed graph is pretty fast. Erwin did a post showing a MIP model. As you can see in the comments section there, a few of us independently tried shortest path approaches, and on examples with 100 for the length of A and 1000 for the length of B we get optimal solutions in the vicinity of 4 seconds.

The graph can look like:
Nodes are labeled n(i,j) indicating that visiting the node means assigning a(i) to b(j). The costs a(i)*b(j) can be associated with any incoming (or any outgoing) arc. After that calculate the shortest path from src to snk.
BTW can you tell a bit about the background of this problem?
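Because every arc in such a graph goes from n(i,j) to some n(i+1,j') with j' > j, the shortest path can also be computed with a small dynamic program instead of a general graph library. The following is a minimal sketch under that assumption, not the code used for the timings above; the function name min_sum_product is made up.

```python
def min_sum_product(A, B):
    """Minimum of sum(A[i] * B[j_i]) over increasing positions j_0 < j_1 < ...

    prev[j] holds the best cost of placing the elements of A handled so far
    somewhere within the first j elements of B.
    """
    n, m = len(A), len(B)
    if n > m:
        raise ValueError("len(B) must be >= len(A)")
    INF = float("inf")
    prev = [0] * (m + 1)          # zero elements of A placed: cost 0
    for i in range(1, n + 1):
        cur = [INF] * (m + 1)
        for j in range(i, m + 1):
            # either B[j-1] is skipped, or A[i-1] is assigned to B[j-1]
            cur[j] = min(cur[j - 1], prev[j - 1] + A[i - 1] * B[j - 1])
        prev = cur
    return prev[m]

print(min_sum_product([1, 2, 3], [4, 9, 5, 3, 2, 10]))  # 16
```

This runs in O(len(A) * len(B)) time and O(len(B)) memory.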

Related

Vectorized relative complement of sets in numpy

I have A = np.arange(n) and a numpy array B of its non-intersecting subarrays, i.e. a division of the initial array into k arrays of consecutive numbers.
One example would be:
A = [0, 1, 2, 3, 4, 5, 6]
B = [[0, 1], [2, 3, 4], [5, 6]]
For every subarray C of B I have to calculate A\C (where \ is the set difference, so the result is a numpy array of all elements of A which are not in C).
My current solution hits time limit:
import numpy as np
def complements(A, B):  # wrapper name is illustrative; the original snippet had no def
    ans = []
    for C in B:
        ans.append(np.setdiff1d(A, C))
    return ans
I'd like to speed it up by using vectorization, but I have no idea how to. I've tried to remove the loop, leaving only functions like setxor1d and setdiff1d, but failed.
I assume A and the subarrays of B are sorted and have unique elements. My example below uses 10**6 integers divided into 100 subarrays, generated by the following code.
np.random.seed(0)
A = np.sort(np.unique(np.random.randint(0,10**10,10**6)))
B = np.split(A, np.sort(np.random.randint(0,10**6-1,99)))
You can cut the time in half by setting assume_unique=True. And cut that time by a factor of 3 on top of that by only doing the set difference for the numbers in A that lie between the smallest and biggest number in the particular subset of B. I realize that my example is the optimal case for this optimization, so I am not sure how much it will help for your real-world data. You will have to try.
boundaries = [x[i] for x in B for i in [0,-1]]
boundary_idx = np.searchsorted(A, boundaries).reshape(-1,2)
[np.concatenate([A[:x[0]],
np.setdiff1d(A[x[0]:x[1]+1], b, assume_unique=True),
A[x[1]+1:]])
for b,x in zip(B, boundary_idx)]
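If B really is a split of A into consecutive blocks, in order, as in the question, the set difference may not be needed at all: each complement is just everything before the block plus everything after it. A minimal sketch under that assumption (the function name complements_by_slicing is made up):

```python
import numpy as np

def complements_by_slicing(A, B):
    """Return A minus each block of B, assuming the blocks of B are
    consecutive slices of A that appear in the same order as in A."""
    lengths = np.array([len(c) for c in B])
    ends = np.cumsum(lengths)      # one past the last index of each block
    starts = ends - lengths        # first index of each block
    return [np.concatenate((A[:s], A[e:])) for s, e in zip(starts, ends)]

A = np.arange(7)
B = [A[0:2], A[2:5], A[5:7]]
for row in complements_by_slicing(A, B):
    print(row)
# [2 3 4 5 6]
# [0 1 5 6]
# [0 1 2 3 4]
```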

Find a way to implement this summation

I have two arrays A, B, filled with n integers. I want to find a way to implement this summation:
Σ(k=2 to n-2) (B[k] * A[n-k])
The catch is that I have to evaluate this summation inside a for loop that already costs O(n).
The problem is to find a way to reuse the previous result of the summation, saved in a variable, so that I don't have to sum all the values again on every iteration.
Here are the values in the two arrays:
[32, 164, 752, 3348, ...]
[10, 18, 38, 84, ...]
The values in A are filled in using this formula, so I can't evaluate the summation in the x-th iteration without first filling position x-1 of A.
You can try something like this:
A = [1, 2, 3, 4, 5, 6, 7]
B = [1, 2, 3, 4, 5, 6, 7]
sum(a * b for a, b in zip(A[-3: 1: -1], B[2: -2]))
A[-3: 1: -1] flips list A and does not take into account the 2 first and 2 last elements.
B[2: -2] does not take into account the 2 first and 2 last elements.
It will sum 5*3 + 4*4 + 3*5 and gives 46.
The simple solution is to compute that sum before you start your other loop and store the result in a variable, e.g.
def my_sum(a, b):
    n = len(a)  # n was undefined in the original; assumed to be the array length
    summation = 0
    for k in range(2, n-1):
        summation += b[k] * a[n-k]
    return summation
# and in your main code
a = [1, 2, 3, 4, 5, 6]
b = [4, 5, 6, 7, 8, 9]
c = my_sum(a, b)
for n in range(1000):
    do_something_with(c)
Now, none of that code is very pythonic, but I presume you are starting out, so aim for working code first, then fast or beautiful code.
If you want it to be more efficient in practice, you should have a look into numpy, which provides faster operations than just using basic lists like what I show here. The function you are probably looking for is called convolve (and the operation you are looking for is convolution).
Do note that this kind of operation (or at least the direct implementation) is inherently O(n^2), so you would need a better algorithm to get below that kind of behavior. The numpy docs already hint toward a faster algorithm using the Fast Fourier Transform. I'll leave that one up to you to look into :-)
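To make the convolution remark concrete, here is a small sketch. It assumes 0-based lists of equal length n >= 3 and reads the formula literally as the sum of B[k] * A[n-k] for k from 2 to n-2. The names direct_sum and conv_sum are made up. Entry n of the full convolution contains the terms k = 1 .. n-1, so the two edge terms are subtracted.

```python
import numpy as np

def direct_sum(A, B):
    """Literal evaluation of sum_{k=2}^{n-2} B[k] * A[n-k]."""
    n = len(A)
    return sum(B[k] * A[n - k] for k in range(2, n - 1))

def conv_sum(A, B):
    """Same value read off the full convolution of A and B."""
    n = len(A)
    full = np.convolve(A, B)   # full[m] = sum_k B[k] * A[m-k]
    # full[n] covers k = 1 .. n-1; drop the k = 1 and k = n-1 terms
    return int(full[n] - B[1] * A[n - 1] - B[n - 1] * A[1])

A = [1, 2, 3, 4, 5, 6, 7]
B = [1, 2, 3, 4, 5, 6, 7]
print(direct_sum(A, B), conv_sum(A, B))  # 76 76
```

One call to np.convolve gives the sums for every offset at once, which is where the speed-up would come from if many of them are needed.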
You can try this :
def somme(A,B):
    n = len(B)
    somme = 0
    for i in range(2, n-1):
        somme = somme + ( B[i] * A[n-i] )
    return somme
You can use a numpy array. It doesn't require creating the list of products before the sum.
import numpy as np
A=[1,2,3,4,5,6,7,8]
B =[9,10,11,12,13,14,15,16]
n = len(A)
k0 = 2 # start offset
n0 = 2 # end offset
A1 = np.array(A[k0 : n-n0])
B1 = np.array(B[-1-n0:-n+k0-1:-1])
answer = (A1*B1).sum()

Optimize testing all combinations of rows from multiple NumPy arrays

I have three NumPy arrays of ints, same number of columns, arbitrary number of rows each. I am interested in all instances where a row of the first one plus a row of the second one gives a row of the third one ([3, 1, 4] + [1, 5, 9] = [4, 6, 13]).
Here is a pseudo-code:
for i, j in rows(array1), rows(array2):
    if i + j is in rows(array3):
        somehow store the rows this occurred at (eg. (1,2,5) if 1st row of
        array1 + 2nd row of array2 give 5th row of array3)
I will need to run this for very big matrices so I have two questions:
(1) I can write the above using nested loops but is there a quicker way, perhaps list comprehensions or itertools?
(2) What is the fastest/most memory-efficient way to store the triples? Later I will need to create a heatmap using two as coordinates and the first one as the corresponding value eg. point (2,5) has value 1 in the pseudo-code example.
Would be very grateful for any tips - I know this sounds quite simple but it needs to run fast and I have very little experience with optimization.
edit: My ugly code was requested in comments
import numpy as np
#random arrays
A = np.array([[-1,0],[0,-1],[4,1], [-1,2]])
B = np.array([[1,2],[0,3],[3,1]])
C = np.array([[0,2],[2,3]])
#triples stored as numbers with 2 coordinates in a otherwise-zero matrix
output_matrix = np.zeros((B.shape[0], C.shape[0]), dtype = int)
for i in range(A.shape[0]):
    for j in range(B.shape[0]):
        for k in range(C.shape[0]):
            if np.array_equal((A[i,] + B[j,]), C[k,]):
                output_matrix[j, k] = i+1
print(output_matrix)
We can leverage broadcasting to perform all those summations and comparison in a vectorized manner and then use np.where on it to get the indices corresponding to the matching ones and finally index and assign -
output_matrix = np.zeros((B.shape[0], C.shape[0]), dtype = int)
mask = ((A[:,None,None,:] + B[None,:,None,:]) == C).all(-1)
I,J,K = np.where(mask)
output_matrix[J,K] = I+1
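As a quick sanity check, the snippet below runs the broadcasting version next to the nested loops from the question on the question's sample arrays. The variable names vectorized and looped are only for this illustration.

```python
import numpy as np

A = np.array([[-1, 0], [0, -1], [4, 1], [-1, 2]])
B = np.array([[1, 2], [0, 3], [3, 1]])
C = np.array([[0, 2], [2, 3]])

# Broadcasting: the sum has shape (len(A), len(B), 1, cols) and is compared
# with C of shape (len(C), cols), giving a (len(A), len(B), len(C)) mask.
mask = ((A[:, None, None, :] + B[None, :, None, :]) == C).all(-1)
I, J, K = np.where(mask)
vectorized = np.zeros((B.shape[0], C.shape[0]), dtype=int)
vectorized[J, K] = I + 1

# Nested loops from the question, for comparison
looped = np.zeros_like(vectorized)
for i in range(A.shape[0]):
    for j in range(B.shape[0]):
        for k in range(C.shape[0]):
            if np.array_equal(A[i] + B[j], C[k]):
                looped[j, k] = i + 1

print(vectorized)
print(np.array_equal(vectorized, looped))  # True
```

Keep in mind that the intermediate comparison array has len(A) * len(B) * len(C) * cols elements, so for very large inputs memory can become the limit before time does.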
(1) Improvements
You can use sets for the final result in the third matrix, as a + b = c must hold identically. This already replaces one nested loop with a constant-time lookup. I will show you an example of how to do this below, but we first ought to introduce some notation.
For a set-based approach to work, we need a hashable type. Lists will thus not work, but a tuple will: it is an ordered, immutable structure. There is, however, a problem: tuple addition is defined as appending, that is,
(0, 1) + (1, 0) = (0, 1, 1, 0).
This will not do for our use-case: we need element-wise addition. As such, we subclass the built-in tuple as follows,
class AdditionTuple(tuple):
    def __add__(self, other):
        """
        Element-wise addition.
        """
        if len(self) != len(other):
            raise ValueError("Undefined behaviour!")
        return AdditionTuple(self[idx] + other[idx]
                             for idx in range(len(self)))
Where we override the default behaviour of __add__. Now that we have a data-type amenable to our problem, let's prepare the data.
You give us,
A = [[-1, 0], [0, -1], [4, 1], [-1, 2]]
B = [[1, 2], [0, 3], [3, 1]]
C = [[0, 2], [2, 3]]
To work with. I say,
from types import SimpleNamespace
A = [AdditionTuple(item) for item in A]
B = [AdditionTuple(item) for item in B]
C = {tuple(item): SimpleNamespace(idx=idx, values=[])
for idx, item in enumerate(C)}
That is, we modify A and B to use our new data-type, and turn C into a dictionary which supports (amortised) O(1) look-up times.
We can now do the following, eliminating one loop altogether,
from itertools import product
for a, b in product(enumerate(A), enumerate(B)):
    idx_a, a_i = a
    idx_b, b_j = b
    if a_i + b_j in C:  # a_i + b_j == c_k, identically
        C[a_i + b_j].values.append((idx_a, idx_b))
Then,
>>>print(C)
{(2, 3): namespace(idx=1, values=[(3, 2)]), (0, 2): namespace(idx=0, values=[(0, 0), (1, 1)])}
Where for each value in C, you get the index of that value (as idx), and a list of tuples of (idx_a, idx_b) whose elements of A and B together sum to the value at idx in C.
Let us briefly analyse the complexity of this algorithm. Redefining the lists A, B, and C as above is linear in the length of the lists. Iterating over A and B is of course in O(|A| * |B|), and the nested condition computes the element-wise addition of the tuples: this is linear in the length of the tuples themselves, which we shall denote k. The whole algorithm then runs in O(k * |A| * |B|).
This is a substantial improvement over your current O(k * |A| * |B| * |C|) algorithm.
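If the data already lives in NumPy arrays, the same lookup idea can be sketched without subclassing tuple: let NumPy do the element-wise addition and convert each resulting row to a plain tuple for the dictionary lookup. The names lookup and triples are made up for this illustration, and the sketch assumes the rows of C are distinct.

```python
import numpy as np

A = np.array([[-1, 0], [0, -1], [4, 1], [-1, 2]])
B = np.array([[1, 2], [0, 3], [3, 1]])
C = np.array([[0, 2], [2, 3]])

# row of C (as a tuple) -> its index; assumes rows of C are distinct
lookup = {tuple(row): k for k, row in enumerate(C.tolist())}

triples = []
for i, a in enumerate(A):
    # a + B adds row a to every row of B in one vectorized step
    for j, row_sum in enumerate((a + B).tolist()):
        k = lookup.get(tuple(row_sum))
        if k is not None:
            triples.append((i, j, k))

print(triples)  # [(0, 0, 0), (1, 1, 0), (3, 2, 1)]
```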
(2) Matrix plotting
Use a dok_matrix, a sparse SciPy matrix representation. Then you can use any heatmap-plotting library you like on the matrix, e.g. Seaborn's heatmap.

How to get values in list at incremental indexes in Python?

I'm looking at getting values in a list with an increment.
l = [0,1,2,3,4,5,6,7]
and I want something like:
[0,4,6,7]
At the moment I am using l[0::2] but I would like sampling to be sparse at the beginning and increase towards the end of the list.
The reason I want this is because the list represents the points along a line from the center of a circle to a point on its circumference. At the moment I iterate every 10 points along the lines and draw a circle with a small radius on each. Therefore, my circles close to the center tend to overlap and I have gaps as I get close to the circle edge. I hope this provides a bit of context.
Thank you !
This can be more complicated than it sounds... You need a list of indices starting at zero and ending at the final element position in your list, presumably with no duplication (i.e. you don't want to get the same points twice). A generic way to do this would be to define the number of points you want first and then use a generator (scaled_series) that produces the required number of indices based on a function. We need a second generator (unique_ints) to ensure we get integer indices and no duplication.
def scaled_series(length, end, func):
    """ Generate a scaled series based on y = func(i), for an increasing
    function func, starting at 0, of the specified length, and ending at end
    """
    scale = float(end) / (func(float(length)) - func(1.0))
    intercept = -scale * func(1.0)
    print('scale', scale, 'intercept', intercept)
    for i in range(1, length + 1):
        yield scale * func(float(i)) + intercept

def unique_ints(iter):
    # round each value to an integer index and skip consecutive duplicates
    last_n = None
    for n in iter:
        if last_n is None or round(n) != round(last_n):
            yield int(round(n))
        last_n = n

L = [0, 1, 2, 3, 4, 5, 6, 7]
print([L[i] for i in unique_ints(scaled_series(4, 7, lambda x: 1 - 1 / (2 * x)))])
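In this case, the function is 1 - 1/(2x). With these parameters the unrounded positions are 0, 4.67, 6.22 and 7, so the rounded indices come out as [0, 5, 6, 7], which is close to the series you want ([0, 4, 6, 7]). You can play with the length (4) and the function to get the kind of spacing between the circles you are looking for.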
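As one more option to try, here is a sketch with a different spacing function than the one above: square-root spacing, which is my own assumption and not something the question specifies. It happens to reproduce the indices in the question for a list of 8 points and 4 samples. The function name sqrt_spaced_indices is made up.

```python
import math

def sqrt_spaced_indices(count, end):
    """Return up to `count` increasing integer indices from 0 to `end`,
    sparse near 0 and dense near `end` (requires count >= 2)."""
    indices = []
    for i in range(count):
        idx = round(end * math.sqrt(i / (count - 1)))
        # skip an index that rounding made identical to the previous one
        if not indices or idx != indices[-1]:
            indices.append(idx)
    return indices

L = [0, 1, 2, 3, 4, 5, 6, 7]
print([L[i] for i in sqrt_spaced_indices(4, len(L) - 1)])  # [0, 4, 6, 7]
```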
I am not sure what exact algorithm you want to use, but if it is non-constant, as your example appears to be, then you should consider creating a generator function to yield values:
https://wiki.python.org/moin/Generators
Depending on what your desire here is, you may want to consider a built in interpolator like scipy: https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html#scipy.interpolate.interp1d
Basically, given your question, you can't do it with the basic slice operator. Without more information this is the best answer I can give you :-)
Use the slice function to create a range of indices. You can then extend your sliced list with other slices.
k = [0,1,2,3,4,5,6,7]
r = slice(0,len(k)//2,4)
t = slice(r.stop,None,1)
j = k[r]
j.extend(k[t])
print(j) #outputs: [0,4,5,6,7]
What I would do is just use list comprehension to retrieve the values. It is not possible to do it just by indexing. This is what I came up with:
l = [0, 1, 2, 3, 4, 5, 6, 7]
m = [l[0]] + [l[1+sum(range(3, s-1, -1))] for s in [x for x in range(3, 0, -1)]]
and here is a breakdown of the code into loops:
# Start the list with the first value of l (the loop does not include it)
m = [l[0]]
# Descend from 3 to 1 ([3, 2, 1])
for s in range(3, 0, -1):
# append 1 + sum of [3], [3, 2] and [3, 2, 1]
m.append(l[ 1 + sum(range(3, s-1, -1)) ])
Both will give you the same answer:
>>> m
[0, 4, 6, 7]
I made this graphic, which I hope will help you understand the process:

Subset sum with `itertools.combinations`

I have a list of integers in python, let's say:
weight = [7, 5, 3, 2, 9, 1]
How should I use itertools.combinations to find all of the possible subsets of sums that there are with these integers.
So that I get (an example of desired output with 3 integers - weight = [7, 5, 3]:
sums = [ [7], [7+5], [7+3], [7+5+3], [5], [5+3], [3] ]
Associated with these weights I have another array called luggages that is a list of lists with the luggage name and its correspondent weight in this format:
luggages = [["samsonite", 7], ["Berkin", 5], ["Catelli", 3] .....]
I created an array called weight in this manner.
weight = numpy.array([c[1] for c in luggages])
I could do this for the luggage names need be.
I attempted to use itertools.combinations in this manner (upon suggestion):
comb = [combinations(weight, i) for i in range(len(luggages))]
My goal: print all the possible subsets of luggage names that I can bring on a trip, where the weights in each subset sum to EXACTLY max_weight = 23 kg.
In simpler terms I have to print out a list with the names of the luggages that if its weights were summed would equal the max_weight = 23 EXACTLY.
Keep in mind: each luggage can be selected only once in each subset, but it can appear in as many subsets as possible. Also, the number of items in each subset is irrelevant: it can be 1 luggage, 2, 3... as long as their sum equals exactly 23.
Working on subset sum (a knapsack variant), are we? You can do this using everyone's favorite Python feature, list comprehensions:
import itertools
weight = [7, 5, 3, 2, 9, 1]
cmb = []
for x in range(1, len(weight) + 1):
    cmb += itertools.combinations(weight, x)
# cmb now contains all combos, keep only the ones that hit the limit exactly
limit = 23
valid_combos = [i for i in cmb if sum(i) == limit]
print(valid_combos)
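Since the goal is to print luggage names rather than weights, the same idea can be applied to the (name, weight) pairs directly. This is a sketch only: the first three entries come from the question, while the names bag_d, bag_e and bag_f are hypothetical placeholders for the entries the question leaves out.

```python
from itertools import combinations

luggages = [["samsonite", 7], ["Berkin", 5], ["Catelli", 3],
            ["bag_d", 2], ["bag_e", 9], ["bag_f", 1]]  # last three are hypothetical
max_weight = 23

# combine the [name, weight] pairs so names stay attached to their weights
matches = [
    [name for name, _ in combo]
    for size in range(1, len(luggages) + 1)
    for combo in combinations(luggages, size)
    if sum(w for _, w in combo) == max_weight
]
print(matches)  # [['samsonite', 'Berkin', 'bag_d', 'bag_e']]
```

This still enumerates all 2**n - 1 non-empty subsets, so it is only practical for short lists.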
