Find a way to implement this summation - python

I have two arrays A, B, filled with n integers. I want to find a way to implement this summation:
Σ(k=2 to n-2) (B[k] * A[n-k])
but considering that I have to use this summation inside a for loop that must stay O(n) overall.
The problem is to find a way to reuse the previous result of the summation, saving it in a variable, so that I don't have to re-sum all the values on every iteration.
Here are the values in the two arrays:
[32, 164, 752, 3348, ...]
[10, 18, 38, 84, ...]
The values in A are themselves filled in by this formula, so I can't evaluate the summation at the xth iteration without first filling position x-1 of A.

You can try something like this:
A = [1, 2, 3, 4, 5, 6, 7]
B = [1, 2, 3, 4, 5, 6, 7]
sum(a * b for a, b in zip(A[-3: 1: -1], B[2: -2]))
A[-3:1:-1] reverses A while skipping its first two and last two elements.
B[2:-2] skips the first two and last two elements of B.
It will sum 5*3 + 4*4 + 3*5, which gives 46.

The simple solution is to compute that sum before you start your other loop and store the result in a variable, e.g.
def my_sum(a, b):
    n = len(a)
    summation = 0
    for k in range(2, n - 1):
        summation += b[k] * a[n - k]
    return summation
# and in your main code
a = [1, 2, 3, 4, 5, 6]
b = [4, 5, 6, 7, 8, 9]
c = my_sum(a, b)
for n in range(1000):
    do_something_with(c)
Now, none of that code is very pythonic, but I presume you are starting out, so aim for working code first, then fast or beautiful code.
If you want it to be more efficient in practice, you should have a look at numpy, which provides much faster operations than the basic lists I used here. The function you are probably looking for is called convolve (and the operation you are looking for is convolution).
Do note that this kind of operation (or at least the direct implementation) is inherently O(n^2), so you would need a better algorithm to get below that kind of behavior. The numpy docs already hint toward a faster algorithm using the Fast Fourier Transform. I'll leave that one up to you to look into :-)
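As a concrete illustration of that pointer, here is a minimal sketch (assuming the intended range is k = 2 .. n-2 with 0-based indexing, as in the formula above): np.convolve(A, B)[m] collects every product A[i] * B[j] with i + j == m, so the coefficient at n covers k = 1 .. n-1 and you only need to subtract the two boundary terms.
import numpy as np

A = [1, 2, 3, 4, 5, 6, 7]
B = [1, 2, 3, 4, 5, 6, 7]
n = len(A)

# c[m] = sum of A[i] * B[j] over all pairs with i + j == m
c = np.convolve(A, B)

# c[n] covers k = 1 .. n-1, so drop the two boundary terms to get k = 2 .. n-2
s = c[n] - B[1] * A[n - 1] - B[n - 1] * A[1]
print(s)  # same value as the direct loop over k = 2 .. n-2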

You can try this:
def somme(A, B):
    n = len(B)
    somme = 0
    for i in range(2, n - 1):
        somme = somme + (B[i] * A[n - i])
    return somme

You can use numpy arrays. This avoids building an intermediate list of products before summing.
import numpy as np

A = [1, 2, 3, 4, 5, 6, 7, 8]
B = [9, 10, 11, 12, 13, 14, 15, 16]
n = len(A)
k0 = 2  # start offset
n0 = 2  # end offset
A1 = np.array(A[k0 : n-n0])
B1 = np.array(B[-1-n0 : -n+k0-1 : -1])
answer = (A1 * B1).sum()

Related

Vectorized relative complement of sets in numpy

I have A = np.arange(n) and a numpy array B of its non-intersecting subarrays: a division of the initial array into k arrays of consecutive numbers.
One example would be:
A = [0, 1, 2, 3, 4, 5, 6]
B = [[0, 1], [2, 3, 4], [5, 6]]
For every subarray C of B I have to calculate A \ C (where \ is the set-difference operation, so the result is a numpy array of all elements of A which are not in C).
My current solution hits the time limit:
import numpy as np

ans = []
for C in B:
    ans.append(np.setdiff1d(A, C))
I'd like to speed it up using vectorization, but I have no idea how to. I've tried to remove the loop, leaving only functions like setxor1d and setdiff1d, but failed.
I assume A and the subarrays of B are sorted and have unique elements. The timings below are for an example of 10**6 integers divided into 100 subarrays, generated by the following code.
np.random.seed(0)
A = np.sort(np.unique(np.random.randint(0,10**10,10**6)))
B = np.split(A, np.sort(np.random.randint(0,10**6-1,99)))
You can cut the time in half by passing assume_unique=True. You can cut that time by a further factor of 3 by doing the set difference only for the numbers in A that lie between the smallest and biggest number in the particular subarray of B. I realize that my example is the optimal case for this optimization to help, so I am not sure how it will fare on your real-world data; you will have to try.
boundaries = [x[i] for x in B for i in [0, -1]]
boundary_idx = np.searchsorted(A, boundaries).reshape(-1, 2)
[np.concatenate([A[:x[0]],
                 np.setdiff1d(A[x[0]:x[1]+1], b, assume_unique=True),
                 A[x[1]+1:]])
 for b, x in zip(B, boundary_idx)]

Minimize sum of product of two uneven consecutive arrays

I've got an optimization problem in which I need to minimize the sum product of two uneven but consecutive arrays, say:
A = [1, 2, 3]
B = [4, 9, 5, 3, 2, 10]
Shuffling of values is not allowed i.e. the index of the arrays must remain the same.
In other words, it is a distribution minimization of array A over array B in consecutive order.
Or: given that len(B) >= len(A), minimize the sum product of the n values of array A over n values of array B, without changing the order of either array.
In this case, the minimum would be:
min_sum = 1*4 + 2*3 + 3*2 = 16
A brute force approach to this problem would be:
from itertools import combinations
sums = [sum(a*b for a,b in zip(A,b)) for b in combinations(B,len(A))]
min_sum = min(sums)
I need to do this for many sets of arrays, however. I see a lot of overlap with the knapsack problem, and I have the feeling that it should be solved with dynamic programming, but I am stuck on how to write an efficient algorithm to perform this.
Any help would be greatly appreciated!
Having two lists
A = [1, 2, 3]
B = [4, 9, 5, 3, 2, 10]
the optimal sum product can be found using:
min_sum = sum(a*b for a,b in zip(sorted(A), sorted(B)[:len(A)][::-1]))
In case A is always given sorted, this simplified version can be used:
min_sum = sum(a*b for a,b in zip(A, sorted(B)[:len(A)][::-1]))
The important part(s) to note:
You need factors of A sorted. sorted(A) will do this job, without modifying the original A (in contrast to A.sort()). In case A is already given sorted, this step can be left out.
You need the N lowest values from B, where N is the length of A. This can be done with sorted(B)[:len(A)]
In order to evaluate the minimal sum of products, you need to multiply the highest number of A with the lowest of B, the second highest of A with the second lowest of B, and so on. That is why, after getting the N lowest values of B, the order is reversed with [::-1].
Output
print(min_sum)
# 16
print(A)
# [1, 2, 3] <- The original list A is not modified
print(B)
# [4, 9, 5, 3, 2, 10] <- The original list B is not modified
With Python, you can easily sort and flip lists. The code you are looking for is
A, B = sorted(A), sorted(B)[:len(A)]
min_sum = sum([a*b for a,b in zip(A, B[::-1])])
You may need to get the values one by one from B, and keep the order of the list by having each value assigned to a key.
A = [1, 3, 2]
B = [4, 9, 5, 3, 2, 10]

# create a new dictionary with key-value pairs of B array values
new_dict = {}
j = 0
for k in B:
    new_dict[j] = k
    j += 1

# create a new list of the smallest values in B, up to the length of array A
min_Bmany = []
for lp in range(0, len(A)):
    # get the smallest remaining value from dictionary new_dict
    rmvky = min(zip(new_dict.values(), new_dict.keys()))
    # append this item to the minimums list
    min_Bmany.append((rmvky[1], rmvky[0]))
    # delete this key from the dictionary new_dict
    del new_dict[rmvky[1]]

# sort the list by the keys (instead of the values)
min_Bmany.sort(key=lambda r: r[0])

# create a list of only the values, still in the same order as in the original array
min_B = []
for z in min_Bmany:
    min_B.append(z[1])

print(A)
print(min_B)

ResultStr = ""
Result = 0
# calculate the result
for s in range(0, len(A)):
    ResultStr = ResultStr + str(A[s]) + "*" + str(min_B[s]) + " + "
    Result = Result + A[s] * min_B[s]
print(ResultStr)
print("Result = ", Result)
The output will be as follows:
A = [1, 3, 2]
B = [4, 9, 5, 3, 2, 10]
1*4 + 3*3 + 2*2 +
Result = 17
Then change the order of A, and the output becomes:
A = [1, 2, 3]
B = [4, 9, 5, 3, 2, 10]
1*4 + 2*3 + 3*2 +
Result = 16
Not sure if this is helpful, but anyway.
This can be formulated as a mixed-integer programming (MIP) problem. Basically, an assignment problem with some side constraints.
min sum((i,j),x(i,j)*a(i)*b(j))
sum(j, x(i,j)) = 1 ∀i "each a(i) is assigned to exactly one b(j)"
sum(i, x(i,j)) ≤ 1 ∀j "each b(j) can be assigned to at most one a(i)"
v(i) = sum(j, j*x(i,j)) "position of each a(i) in b"
v(i) ≥ v(i-1)+1 ∀i>1 "maintain ordering"
x(i,j) ∈ {0,1} "binary variable"
v(i) ≥ 1 "continuous (or integer) variable"
Example output:
----     40 VARIABLE z.L  =  16.000

----     40 VARIABLE x.L  assign

            j1          j4          j5
i1       1.000
i2                   1.000
i3                               1.000

----     40 VARIABLE v.L  position of a(i) in b

i1 1.000,    i2 4.000,    i3 5.000
Cute little MIP model.
Just as an experiment I generated a random problem with len(a)=50 and len(b)=500. This leads to a MIP with 650 rows and 25k columns. Solved in 50 seconds (to proven global optimality) on my slow laptop.
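If you want to reproduce this in Python, here is a sketch of the same model using the PuLP package (my choice of MIP interface for illustration, not the tool used for the output above; any MIP front end works the same way):
import pulp

a = [1, 2, 3]
b = [4, 9, 5, 3, 2, 10]
I, J = range(len(a)), range(len(b))

prob = pulp.LpProblem("min_sum_product", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (I, J), cat="Binary")  # x[i][j]: a(i) assigned to b(j)
v = pulp.LpVariable.dicts("v", I, lowBound=1)         # position of a(i) in b

prob += pulp.lpSum(a[i] * b[j] * x[i][j] for i in I for j in J)
for i in I:
    prob += pulp.lpSum(x[i][j] for j in J) == 1       # each a(i) assigned exactly once
    prob += v[i] == pulp.lpSum((j + 1) * x[i][j] for j in J)
for j in J:
    prob += pulp.lpSum(x[i][j] for i in I) <= 1       # each b(j) used at most once
for i in range(1, len(a)):
    prob += v[i] >= v[i - 1] + 1                      # maintain ordering

prob.solve()
print(pulp.value(prob.objective))  # 16 for this data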
It turns out that using a shortest-path algorithm on a directed graph is pretty fast. Erwin did a post showing a MIP model. As you can see in the comments section there, a few of us independently tried shortest-path approaches, and on examples with 100 for the length of A and 1000 for the length of B we get optimal solutions in the vicinity of 4 seconds.
The graph can be set up as follows: nodes are labeled n(i,j), indicating that visiting the node means assigning a(i) to b(j). The costs a(i)*b(j) can be associated with any incoming (or any outgoing) arc. After that, calculate the shortest path from src to snk.
BTW can you tell a bit about the background of this problem?
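In case it helps, here is a small dynamic-programming sketch equivalent to that shortest-path formulation (my own minimal version, not the posted code): dp[i][j] is the cheapest way to assign the first i values of A within the first j values of B, keeping both orders.
def min_sum_product(A, B):
    n, m = len(A), len(B)
    INF = float("inf")
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        dp[0][j] = 0  # assigning zero elements costs nothing
    for i in range(1, n + 1):
        for j in range(i, m + 1):
            # either B[j-1] is left unused, or it takes A[i-1]
            dp[i][j] = min(dp[i][j - 1], dp[i - 1][j - 1] + A[i - 1] * B[j - 1])
    return dp[n][m]

print(min_sum_product([1, 2, 3], [4, 9, 5, 3, 2, 10]))  # 16
This runs in O(len(A) * len(B)) time, matching the size of the assignment graph described above.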

Generic algorithm for element wise operation between hundreds of lists

I have been tasked to write an algorithm for a project. Basically, I scan the data to get the unique items and store their positions in an array, so I end up with multiple arrays of variable length. Now I have to do element-wise operations on ALL of these arrays and their elements. Note that these will always be sorted (if that matters):
a = [0, 7, 13, 18]
b = [1, 2, 8, 10]
c = [0, 3, 5, 6, 7]
The current solution I have is a pretty basic loop solution, where I loop through every array and compare its elements with every other array's elements. It works for a small number of arrays and, as you can imagine, doesn't work well when I have a lot of unique items, each with their own array/list.
def add(a, b):
    result = []
    for i in range(len(a)):
        for j in range(len(b)):
            result.append(a[i] + b[j])
    return result

a = [0, 7, 13, 18]
b = [1, 2, 8, 10]
c = [0, 3, 5, 6, 7]

total_unique_items = [a, b, c]
calc = []
for i in range(len(total_unique_items)):
    for j in range(i + 1, len(total_unique_items)):
        calc.append(add(total_unique_items[i], total_unique_items[j]))
print(calc)
I know there are pythonic solutions like zip, but my teacher is asking for a generic, language-independent solution here.
I am not really sure how to tackle this problem. One way would be to use a data structure like a tree or a graph and traverse it. The other would be to find a way to perform the operation on all the arrays' ith elements in the ith iteration of the loop; that way, my main loop would run for the length of the longest array (see the sketch below). I am just really confused about it and would love to get an idea of the direction I should go from here.
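For concreteness, a rough sketch of that second idea (only the traversal being described, not a full solution): one outer loop visits the ith element of every list that still has one, so the loop count is bounded by the longest list.
lists = [[0, 7, 13, 18], [1, 2, 8, 10], [0, 3, 5, 6, 7]]
longest = max(len(x) for x in lists)
for i in range(longest):
    # collect the ith element of every list that is long enough
    row = [x[i] for x in lists if i < len(x)]
    print(i, row)
Note that the pairwise sums above inherently touch every pair of elements, so this traversal alone does not reduce that cost; it only reorganizes the iteration.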

Fast(er) way to determine all non-dominated items in a list

I have a list of n arrays with 4 elements each, e.g. (n=2):
l = [[1, 2, 3, 4], [5, 6, 7, 8]]
and am trying to find all elements of the list that are 'non-dominated' - that is they are not dominated by any other element in the list. An array dominates another array if each item inside it is less than or equal to the corresponding item in the other array. So
dominates([1, 2, 3, 4], [5, 6, 7, 8]) == True
as 1 <= 5 and 2 <= 6 and 3 <= 7 and 4 <= 8. But
dominates([1, 2, 3, 9], [5, 6, 7, 8]) == False
as 9 > 8. This function is relatively easy to write, for example:
def dominates(a, b):
    return all(i <= j for i, j in zip(a, b))
More succinctly, given l = [a1, a2, a3, .., an] where the a are length 4 arrays, I'm looking to find all a that are not dominated by any other a in l.
I have the following solution:
def get_non_dominated(l):
    # note: the items must be hashable (e.g. tuples) to go into a set
    to_remove = set()
    for ind, item_1 in enumerate(l):
        if item_1 in to_remove:
            continue
        for item_2 in l[ind + 1:]:
            if dominates(item_2, item_1):
                to_remove.add(item_1)
                break
            elif dominates(item_1, item_2):
                to_remove.add(item_2)
    return [i for i in l if i not in to_remove]
So get_non_dominated([[1, 2, 3, 4], [5, 6, 7, 8]]) should return [[1, 2, 3, 4]]. Similarly get_non_dominated([[1, 2, 3, 9], [5, 6, 7, 8]]) should return the list unchanged by the logic above (nothing dominates anything else).
But this check happens a lot and l is potentially quite large. I was wondering if anyone had ideas on a way to speed this up? My first thought was to try and vectorize this code with numpy, but I have relatively little experience with it and am struggling a bit. You can assume l has all unique arrays. Any ideas are greatly appreciated!
Another version of @Nyps' answer:
import numpy as np

def dominates(a, b):
    return (np.asarray(a) <= b).all()
It is a vectorized version of your code using numpy.
This might still be slow if you have to loop through all the rows you have. If you have a list with all the rows and you want to compare them pairwise, you could use scipy to create a N x N array (where N is the number of rows).
import numpy as np
a = np.random.randint(0, 10, size=(1000, 10))
a here is a 1000 x 10 array, simulating 1000 rows of 10 elements each:
from scipy.spatial.distance import cdist
X = cdist(a, a, metric=dominates).astype(bool)
X is now a 1000 x 1000 matrix containing the pairwise comparisons between all the entries. That is, X[i, j] is True if sample i dominates sample j, and False otherwise.
You can now extract fancy results from X, such as the sample that dominates them all:
>>> a[50] = 0  # set a row to all 0s to fake a dominant row
>>> X = cdist(a, a, metric=dominates).astype(bool)
>>> non_dominated = np.where(X.all(axis=1))[0]
>>> non_dominated
array([50])
The sample at position 50 is the ruler of your population; you should watch it closely.
Now, if you want to keep only the dominant rows, you can do:
if non_dominated.size > 0:
    return [a[i] for i in non_dominated]
else:  # no one dominates every other
    return a
As a recap:
import numpy as np
from scipy.spatial.distance import cdist

def get_ruler(a):
    X = cdist(a, a, metric=dominates).astype(bool)
    rulers = np.where(X.all(axis=1))[0]
    if rulers.size > 0:
        return [a[i] for i in rulers]
    else:  # no one dominates every other
        return a
How about:
import numpy as np
np.all((np.asarray(l[1]) - np.asarray(l[0])) >= 0)
You can go a similar way in case you are able to create your list as a numpy array straight away, i.e. type(l) == np.ndarray. Then the syntax would be:
np.all((l[1] - l[0]) >= 0)

How to get values in list at incremental indexes in Python?

I'm looking at getting values in a list with an increment.
l = [0,1,2,3,4,5,6,7]
and I want something like:
[0,4,6,7]
At the moment I am using l[0::2], but I would like the sampling to be sparse at the beginning and become denser towards the end of the list.
The reason I want this is because the list represents the points along a line from the center of a circle to a point on its circumference. At the moment I iterate every 10 points along the lines and draw a circle with a small radius on each. Therefore, my circles close to the center tend to overlap and I have gaps as I get close to the circle edge. I hope this provides a bit of context.
Thank you!
This can be more complicated than it sounds... You need a list of indices starting at zero and ending at the final element position in your list, presumably with no duplication (i.e. you don't want to get the same points twice). A generic way to do this would be to define the number of points you want first and then use a generator (scaled_series) that produces the required number of indices based on a function. We need a second generator (unique_ints) to ensure we get integer indices and no duplication.
def scaled_series(length, end, func):
    """Generate a scaled series based on y = func(i), for an increasing
    function func, starting at 0, of the specified length, and ending at end.
    """
    scale = float(end) / (func(float(length)) - func(1.0))
    intercept = -scale * func(1.0)
    print('scale', scale, 'intercept', intercept)
    for i in range(1, length + 1):
        yield scale * func(float(i)) + intercept

def unique_ints(iter):
    last_n = None
    for n in iter:
        if last_n is None or round(n) != round(last_n):
            yield int(round(n))
        last_n = n

L = [0, 1, 2, 3, 4, 5, 6, 7]
print([L[i] for i in unique_ints(scaled_series(4, 7, lambda x: 1 - 1 / (2 * x)))])
In this case, the function is 1 - 1/(2x), which gives the series you want, [0, 4, 6, 7]. You can play with the length (4) and the function to get the kind of spacing between the circles you are looking for.
I am not sure what exact algorithm you want to use, but if it is non-constant, as your example appears to be, then you should consider creating a generator function to yield values:
https://wiki.python.org/moin/Generators
Depending on what your desire here is, you may want to consider a built in interpolator like scipy: https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html#scipy.interpolate.interp1d
Basically, given your question, you can't do it with the basic slice operator. Without more information this is the best answer I can give you :-)
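For example, here is one compact way to build such indices with numpy (a sketch using a quadratic easing curve chosen for illustration; any concave ramp gives the "sparse start, dense end" effect):
import numpy as np

l = [0, 1, 2, 3, 4, 5, 6, 7]
m = 4                                    # number of samples wanted
t = np.linspace(0.0, 1.0, m)             # uniform ramp on [0, 1]
# 1 - (1 - t)**2 grows quickly at first and slowly at the end,
# so the resulting indices are sparse early and dense late
idx = np.unique(np.round((len(l) - 1) * (1 - (1 - t) ** 2)).astype(int))
print([l[i] for i in idx])               # [0, 4, 6, 7]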
Use the slice function to create a range of indices. You can then extend your sliced list with other slices.
k = [0,1,2,3,4,5,6,7]
r = slice(0,len(k)//2,4)
t = slice(r.stop,None,1)
j = k[r]
j.extend(k[t])
print(j) #outputs: [0,4,5,6,7]
What I would do is just use list comprehension to retrieve the values. It is not possible to do it just by indexing. This is what I came up with:
l = [0, 1, 2, 3, 4, 5, 6, 7]
m = [l[0]] + [l[1+sum(range(3, s-1, -1))] for s in [x for x in range(3, 0, -1)]]
and here is a breakdown of the code into loops:
# Start the list with the first value of l (the loop does not include it)
m = [l[0]]
# Descend from 3 to 1 ([3, 2, 1])
for s in range(3, 0, -1):
    # append 1 + sum of [3], [3, 2] and [3, 2, 1]
    m.append(l[1 + sum(range(3, s - 1, -1))])
Both will give you the same answer:
>>> m
[0, 4, 6, 7]
I made a graphic that I hope will help you understand the process.
