Find subarray with fixed end and largest average of any length - python

I am given an array, array1, of N non-negative integers in [0, 10000]. N can be up to 10^8. With the end point fixed, which contiguous subarray has the largest average?
For example, let array1 be [3, 1, 9, 2, 7]. Since the end point is fixed, the subarray with the largest average is [9, 2, 7], with average 6.
I have tried a plain linear search, but a Python loop over 10^8 elements is too slow.
Time restriction: the time limit is 4 seconds.
EDIT: I really have no idea how to start improving this, so any hints would be appreciated. Is it possible to reduce it to O(log n)?
Clarification: the last element must be in the subarray, and the subarray's length must be > 1.

You can use itertools.combinations to do this task:
from itertools import combinations
def sublists(l, minsize, endpoint):
    # create empty list of all candidate sublists
    # O(1)
    candidates = []
    # enumerate every (start, end) index pair; minsize must be 2 for the unpacking to work
    # O(n^2) pairs
    for start, end in combinations(range(len(l)), minsize):
        # check if last element equals the endpoint value
        # O(1)
        if l[end] == endpoint:
            # add sublist to the candidates (the slice copy is O(n))
            candidates.append(l[start:end+1])
    # create tuple pairs (average, sublist)
    # O(n) per candidate
    pairs = [(sum(x) / len(x), x) for x in candidates]
    # get max sublist
    # O(n)
    return max(pairs)[1]
Which works as follows:
>>> print(sublists([3, 1, 9, 2, 7], 2, 7))
[9, 2, 7]
Note: The algorithm above takes at least O(n^2) time, not O(n): combinations enumerates every (start, end) index pair before the endpoint filter is applied, so it will not finish in time for N = 10^8. I don't think you can do this in O(log n) time either, since every element has to be read at least once, which is already O(n). If you use a balanced BST you could perhaps find the maximum average in O(log n) time, but this would just be for the search part, not the algorithm as a whole.
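Since the end point is fixed, every candidate is a suffix of the array, so a single pass from the right with a running sum is enough. Below is a minimal sketch of that idea, not the answer author's code; the name best_suffix and the min_len parameter are made up for illustration. Averages are compared by cross-multiplication to avoid floating-point division inside the loop.

```python
def best_suffix(values, min_len=2):
    """Return (start_index, average) of the suffix with the largest average.

    Only suffixes with at least min_len elements are considered.
    best_suffix and min_len are illustrative names, not from the question.
    """
    n = len(values)
    if n < min_len:
        raise ValueError("array is shorter than min_len")
    best_start, best_sum, best_len = None, 0, 1
    running = 0
    # Walk from the last element towards the first, extending the suffix.
    for start in range(n - 1, -1, -1):
        running += values[start]
        length = n - start
        if length < min_len:
            continue
        # running / length > best_sum / best_len, without dividing
        if best_start is None or running * best_len > best_sum * length:
            best_start, best_sum, best_len = start, running, length
    return best_start, best_sum / best_len


start, average = best_suffix([3, 1, 9, 2, 7])
print(start, average)
```

This is O(n) time and O(1) extra space. A pure-Python loop over 10^8 elements may still be too slow for a 4 second limit, so the same running-sum idea would probably have to be vectorised or moved to a faster language.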

Related

How could I reduce the time complexity of this nested loop?

I'm in the process of learning to optimize code and to use more data structures and algorithms in my programs, but I'm having difficulties with this code block.
My primary goal is to reduce this from O(n**2) time complexity, mainly by not using a nested loop.
import random
import numpy as np
numArray = np.array([[20, 12, 10, 8, 6, 4], [10, 10, 10, 10, 10, 10]], dtype=int)
rolls = [[] for i in range(len(numArray[0]))]
for i in range(len(numArray[0])):
    for j in range(numArray[1, i]):
        rolls[i].append(random.randint(1, numArray[0, i]))
The code is supposed to generate x random integers (where x is element i of the second numArray row, e.g. 10) between 1 and element i of the first numArray row (e.g. 20).
It then repeats this for each index in the first numArray row.
(In the whole dice program the numArray rows are user-supplied integers, but I assigned fixed numbers to them for simplicity's sake while optimizing.)
You could use np.random.randint since you're already importing numpy. It accepts a size argument to produce multiple random values in one go. Note that its upper bound is exclusive (unlike random.randint, where it is inclusive), hence the + 1:
rolls = [list(np.random.randint(1, numArray[0][idx] + 1, val)) for idx, val in enumerate(numArray[1])]
This of course assumes that both lists in numArray are the same length, but should get you somewhere at least.
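If NumPy is not an option, the standard library can also draw several values in one call. The sketch below uses random.choices over an inclusive range; roll_dice, sides and counts are hypothetical names chosen for this example, with the values taken from the question's numArray.

```python
import random


def roll_dice(sides, counts, rng=random):
    """For each die, draw `count` values uniformly from 1..side (inclusive).

    roll_dice is an illustrative helper, not part of the question's program.
    """
    return [rng.choices(range(1, s + 1), k=c) for s, c in zip(sides, counts)]


sides = [20, 12, 10, 8, 6, 4]
counts = [10, 10, 10, 10, 10, 10]
rolls = roll_dice(sides, counts)
print(rolls)
```

The total work is still proportional to the number of rolls; the gain is that the inner loop runs inside the library call instead of in Python bytecode.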

Split sorted list into two lists

I'm trying to split a sorted integer list into two lists. The first list would have all ints under n and the second all ints over n. Note that n does not have to be in the original list.
I can easily do this with:
under = []
over = []
for x in sorted_list:
    if x < n:
        under.append(x)
    else:
        over.append(x)
But it just seems like it should be possible to do this in a more elegant way knowing that the list is sorted. takewhile and dropwhile from itertools sound like the solution but then I would be iterating over the list twice.
Functionally, the best I can do is this:
i = 0
# the length check avoids an IndexError when every element is below n
while i < len(sorted_list) and sorted_list[i] < n:
    i += 1
under = sorted_list[:i]
over = sorted_list[i:]
But I'm not even sure if it is actually better than just iterating over the list twice and it is definitely not more elegant.
I guess I'm looking for a way to get the list returned by takewhile and the remaining list, perhaps, in a pair.
The correct solution here is the bisect module. Use bisect.bisect to find the index to the right of n (or the index where it would be inserted if it's missing), then slice around that point:
import bisect # At top of file
split_idx = bisect.bisect(sorted_list, n)
under = sorted_list[:split_idx]
over = sorted_list[split_idx:]
While any solution is going to be O(n) (you do have to copy the elements after all), the comparisons are typically more expensive than simple pointer copies (and associated reference count updates), and bisect reduces the comparison work on a sorted list to O(log n), so this will typically (on larger inputs) beat simply iterating and copying element by element until you find the split point.
Use bisect.bisect_left (which finds the leftmost index of n) instead of bisect.bisect (equivalent to bisect.bisect_right) if you want n to end up in over instead of under.
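To make the difference between the two functions concrete, here is a small sketch. split_sorted and its n_goes_to parameter are names invented for this example; only bisect.bisect_left and bisect.bisect_right come from the standard library.

```python
import bisect


def split_sorted(sorted_list, n, n_goes_to="over"):
    """Split a sorted list around n using binary search.

    n_goes_to="over"  -> bisect_left,  elements equal to n land in `over`
    n_goes_to="under" -> bisect_right, elements equal to n land in `under`
    """
    find = bisect.bisect_left if n_goes_to == "over" else bisect.bisect_right
    idx = find(sorted_list, n)
    return sorted_list[:idx], sorted_list[idx:]


data = [1, 2, 4, 5, 6, 7, 8]
print(split_sorted(data, 6))           # ([1, 2, 4, 5], [6, 7, 8])
print(split_sorted(data, 6, "under"))  # ([1, 2, 4, 5, 6], [7, 8])
print(split_sorted(data, 3))           # ([1, 2], [4, 5, 6, 7, 8])
```

With n_goes_to="over" the result matches the loop in the question, which sends x < n to under and everything else to over.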
I would use the following approach, where I find the index and use slicing to create under and over (list.index raises ValueError if n is not in the list):
sorted_list = [1,2,4,5,6,7,8]
n=6
idx = sorted_list.index(n)
under = sorted_list[:idx]
over = sorted_list[idx:]
print(under)
print(over)
Output (same as with your code):
[1, 2, 4, 5]
[6, 7, 8]
Edit: I misunderstood the question, so here is an adapted solution that finds the insertion index even when n is not in the list:
import numpy as np
sorted_list = [1,2,4,5,6,7,8]
n=3
idx = np.searchsorted(sorted_list, n)
under = sorted_list[:idx]
over = sorted_list[idx:]
print(under)
print(over)
Output:
[1, 2]
[4, 5, 6, 7, 8]

Splitting list into 2 parts, as equal to sum as possible

I'm trying to wrap my head around this whole thing and I can't seem to figure it out. Basically, I have a list of ints. Adding up those int values equals 19. I want to split the list into 2 parts while making the two parts as close as possible to each other in total sum. Sorry if I'm not explaining this well.
Example:
list = [4,1,8,6]
I want to achieve something like this:
list = [[8, 1], [6, 4]]
Adding the first list up equals 9, and the other equals 10. That's perfect for what I want as they are as close as possible.
What I have now:
my_list = [4,1,8,6]
total_list_sum = 19
def divide_chunks(l, n):
    # looping till length l
    for i in range(0, len(l), n):
        yield l[i:i + n]
n = 2
x = list(divide_chunks(my_list, n))
print(x)
But, that just splits it up into 2 parts.
Any help would be appreciated!
You could use a recursive algorithm and "brute force" partitioning of the list, starting with a target difference of zero and progressively increasing your tolerance to the difference between the two lists:
def sumSplit(left,right=[],difference=0):
    sumLeft,sumRight = sum(left),sum(right)
    # stop recursion if left is smaller than right
    if sumLeft<sumRight or len(left)<len(right): return
    # return a solution if sums match the tolerance target
    if sumLeft-sumRight == difference:
        return left, right, difference
    # recurse, brutally attempting to move each item to the right
    for i,value in enumerate(left):
        solution = sumSplit(left[:i]+left[i+1:],right+[value], difference)
        if solution: return solution
    if right or difference > 0: return
    # allow for imperfect split (i.e. larger difference) ...
    for targetDiff in range(1, sumLeft-min(left)+1):
        solution = sumSplit(left, right, targetDiff)
        if solution: return solution
# sumSplit returns the two lists and the difference between their sums
print(sumSplit([4,1,8,6])) # ([4, 6], [1, 8], 1)
print(sumSplit([5,3,2,2,2,1])) # ([3, 2, 2, 1], [5, 2], 1)
print(sumSplit([1,2,3,4,6])) # ([1, 3, 4], [2, 6], 0)
Use itertools.combinations. First let's define some functions:
def difference(sublist1, sublist2):
    return abs(sum(sublist1) - sum(sublist2))
def complement(sublist, my_list):
    complement = my_list[:]
    for x in sublist:
        complement.remove(x)
    return complement
The function difference calculates the "distance" between lists, i.e, how similar the sums of the two lists are. complement returns the elements of my_list that are not in sublist.
Finally, what you are looking for:
from itertools import combinations
def divide(my_list):
    lower_difference = sum(my_list) + 1
    # only sizes up to half the list are needed; the remainder covers the rest
    for i in range(1, int(len(my_list)/2)+1):
        for partition in combinations(my_list, i):
            partition = list(partition)
            remainder = complement(partition, my_list)
            diff = difference(partition, remainder)
            if diff < lower_difference:
                lower_difference = diff
                solution = [partition, remainder]
    return solution
test1 = [4,1,8,6]
print(divide(test1)) #[[4, 6], [1, 8]]
test2 = [5,3,2,2,2,1]
print(divide(test2)) #[[5, 3], [2, 2, 2, 1]]
Basically, it tries with every possible division of sublists and returns the one with the minimum "distance".
If you want to make it a little bit faster you could return the first combination whose difference is 0.
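The early-exit idea could look roughly like the sketch below. It is a self-contained variant rather than the answer's exact code: divide_early_exit is a made-up name, and it picks elements by index instead of calling complement, which is an implementation choice of this example.

```python
from itertools import combinations


def divide_early_exit(values):
    """Brute-force two-way split that stops at the first perfect split.

    Illustrative variant of the divide() idea; returns [part, rest],
    or None when the list has fewer than two elements.
    """
    total = sum(values)
    best, best_diff = None, total + 1
    for size in range(1, len(values) // 2 + 1):
        for idx in combinations(range(len(values)), size):
            chosen = set(idx)
            part = [values[i] for i in idx]
            rest = [v for i, v in enumerate(values) if i not in chosen]
            # |sum(part) - sum(rest)| written in terms of the total
            diff = abs(total - 2 * sum(part))
            if diff < best_diff:
                best_diff, best = diff, [part, rest]
                if diff == 0:
                    return best  # cannot do better than a zero difference
    return best


print(divide_early_exit([4, 1, 8, 6]))     # [[4, 6], [1, 8]]
print(divide_early_exit([1, 2, 3, 4, 6]))  # [[2, 6], [1, 3, 4]]
```

The early return only helps when a perfect split exists, which requires the total to be even; otherwise every combination is still examined.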
I think what you're looking for is a hill climbing algorithm. I'm not sure this will cover all cases but at least works for your example. I'll update this if I think of a counter example or something.
Let's call your list of numbers vals.
vals.sort(reverse=True)
a,b=[],[]
for v in vals:
    if sum(a)<sum(b):
        a.append(v)
    else:
        b.append(v)
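Here is the same largest-first idea wrapped in a function so it can be tried on other inputs. greedy_split is a name made up for this sketch, and it keeps running totals instead of calling sum() on every step. The second call shows an input where this greedy pass is not optimal.

```python
def greedy_split(values):
    """Largest-first greedy split: each value goes to the lighter list.

    Ties go to the second list, as in the loop above.
    """
    a, b = [], []
    sum_a = sum_b = 0
    for v in sorted(values, reverse=True):
        if sum_a < sum_b:
            a.append(v)
            sum_a += v
        else:
            b.append(v)
            sum_b += v
    return a, b


print(greedy_split([4, 1, 8, 6]))     # ([6, 4], [8, 1]) -> sums 10 and 9
print(greedy_split([3, 3, 2, 2, 2]))  # ([3, 2], [3, 2, 2]) -> sums 5 and 7
```

For [3, 3, 2, 2, 2] the best split is [3, 3] and [2, 2, 2] with equal sums of 6, so this approach is a fast heuristic rather than an exact method.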

How to change the index in a list of lists

I would like to change the way a list of lists is indexed.
Suppose my initial list contains two lists: the first holds one list of three elements and the second holds two lists of three elements. For example:
L = [[[1, 2, 3]], [[4, 5, 6], [7, 8, 9]]]
Then, let's say I want to take '4' in L. I must do
L[1][0][0]
Now I'd like to create a new list such that the last index becomes the first one:
Lnew = [[[1], [4, 7]], [[2], [5, 8]], [[3], [6, 9]]]
And then to take '4' I have to do:
Lnew[0][1][0]
In the more general case, I'd like to create the list Lnew defined by:
Lnew[i][k][l] = L[k][l][i]
Is there a way to do this kind of index permutation without the following loops?
Lnew = []
for i in range(len(Payment_dates)):
    L1 = []
    for k in range(N+1):
        L2 = []
        for l in range(k+1):
            L2.append(L[k][l][i])
        L1.append(L2)
    Lnew.append(L1)
These loops are not optimal in terms of complexity.
Thanks
What you'd like to achieve presupposes that all sublists have the same length.
If they have differing lengths, you may wish to pad all sublists with zeros up to the length of the longest sublist or (which is easier) treat every missing element as a default value.
The same behaviour can be achieved by using a function to access the elements of the list. You can call this function at runtime every time you need an element of the list:
def getElement(myList, i, k, l):
    # (i, k, l) indexes the permuted view: Lnew[i][k][l] == myList[k][l][i]
    if k < len(myList) and l < len(myList[k]) and i < len(myList[k][l]):
        return myList[k][l][i]
    else:
        return None # or zero, or whatever you prefer
Depending on your code structure, you might not need this as a function and you can just put the conditions inside of your code.
You can also nest the if-conditions and raise different errors or return different values depending on the level at which the element does not exist.
If we neglect the cost of indexing into a nested list, this approach avoids the O(n^3) work of building Lnew up front: each lookup is O(1).
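For comparison, if the reindexed list does need to be built, the question's three loops can be written as one nested comprehension. This is only a sketch: move_last_index_first is an invented name, and it assumes the outer list is non-empty and every innermost list has the same length. It still visits every element once, so it is tidier rather than asymptotically faster.

```python
def move_last_index_first(nested):
    """Build new[i][k][l] == nested[k][l][i].

    Assumes nested is non-empty and all innermost lists share one length.
    """
    depth = len(nested[0][0])
    return [[[cell[i] for cell in row] for row in nested] for i in range(depth)]


L = [[[1, 2, 3]], [[4, 5, 6], [7, 8, 9]]]
Lnew = move_last_index_first(L)
print(Lnew)           # [[[1], [4, 7]], [[2], [5, 8]], [[3], [6, 9]]]
print(Lnew[0][1][0])  # 4
```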

Inserting and removing into/from sorted list in Python

I have a sorted list of integers, L, and I have a value X that I wish to insert into the list such that L's order is maintained. Similarly, I wish to quickly find and remove the first instance of X.
Questions:
How do I use the bisect module to do the first part, if possible?
Is L.remove(X) going to be the most efficient way to do the second part? Does Python detect that the list has been sorted and automatically use a logarithmic removal process?
Example code attempts:
i = bisect_left(L, y)
L.pop(i) #works
del L[bisect_left(L, i)] #doesn't work if I use this instead of pop
You use the bisect.insort() function:
bisect.insort(L, X)
L.remove(X) will scan the whole list until it finds X. Use del L[bisect.bisect_left(L, X)] instead (provided that X is indeed in L).
Note that removing from the middle of a list is still going to incur a cost as the elements from that position onwards all have to be shifted left one step. A binary tree might be a better solution if that is going to be a performance bottleneck.
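Putting both parts together, a minimal sketch might look like this. sorted_insert and sorted_remove are hypothetical helper names; the explicit check covers the case where X is not in L, which a bare del L[bisect.bisect_left(L, X)] would not detect.

```python
import bisect


def sorted_insert(sorted_list, x):
    """Insert x while keeping the list sorted (binary search, then O(n) shift)."""
    bisect.insort(sorted_list, x)


def sorted_remove(sorted_list, x):
    """Remove the first occurrence of x, located by binary search."""
    i = bisect.bisect_left(sorted_list, x)
    if i == len(sorted_list) or sorted_list[i] != x:
        raise ValueError(f"{x!r} is not in the list")
    del sorted_list[i]


L = [1, 3, 3, 7]
sorted_insert(L, 5)
print(L)  # [1, 3, 3, 5, 7]
sorted_remove(L, 3)
print(L)  # [1, 3, 5, 7]
```

The search itself is O(log n), but both operations still shift list elements, so each call remains O(n) overall.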
You could use Raymond Hettinger's IndexableSkiplist. It performs 3 operations in O(ln n) time:
insert value
remove value
lookup value by rank
import skiplist
import random
random.seed(2013)
N = 10
skip = skiplist.IndexableSkiplist(N)
# in Python 3, range() must be turned into a list before it can be shuffled
data = list(range(N))
random.shuffle(data)
for num in data:
    skip.insert(num)
print(list(skip))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
for num in data[:N//2]:
    skip.remove(num)
print(list(skip))
# e.g. [0, 3, 4, 6, 9]; the exact values depend on the shuffle
