Divide and conquer strategy python - python

I am trying to write code that compares each element of a list with the others and gives the index of the closest larger number in each direction, left and right, using the divide and conquer method.
For example:
Input: arr=[5,2,6,8,1,4,3,9]
Output:
Left=[None, 0, None, None, 3, 3, 5, None]
Right=[2, 2, 3, 7, 5, 7, 7, None]
Input: arr=[4,2,3,1,8,5,6,9]
Output:
L=[None, 0, 0, 2, None, 4, 4, None]
R=[4, 2, 4, 4, 7, 6, 7, None]
This is what I have now:
arr = [5,2,6,8,1,4,3,9]
def Left(arr):
    L = []
    for i in range(len(arr)):
        flag = True
        for j in range(i,-1,-1):
            if (arr[i] < arr[j]):
                L.append(j)
                flag = False
                break
        if flag:
            L.append(None)
    return L
def Right(arr):
    R = []
    for i in range(len(arr)):
        flag = True
        for j in range(i, len(arr), 1):
            if (arr[i] < arr[j]):
                R.append(j)
                flag = False
                break
        if flag:
            R.append(None)
    return R
print(*Left(arr), sep = ",")
print(*Right(arr), sep =",")
Am I doing it the right way? Thank you.

This is my Python code for the algorithm in its "closest larger right" version.
As you can see, it is recursive. Recursion is elegant but a bit tricky, because a few lines of code condense many concepts about algorithm design and about the language they are written in. In my opinion, four relevant things are happening:
1) Recursive calls. The function calls itself. During this step the list is progressively sliced into halves. Once the atomic unit is reached, the base algorithm is executed on it first (step 3). If no solution is found there, larger slices of the list are involved in the calculation as the recursion unwinds.
2) Termination condition. The previous step does not run forever; this condition stops the recursion and moves on to the next step (the base algorithm). The code currently has len(arr) > 1, which means that the atomic unit will be pairs of numbers (or a group of three for an odd-length list). You can increase that threshold so that the recursive function spends less time slicing the list and combining the results, but the trade-off is that in a parallelized environment the "workers" will have to digest a bigger list.
3) Base algorithm. It does the essential calculation. Whatever the size of the list, it returns, for each element, the index of the closest larger number to its right.
4) "Calculation saving". The base algorithm does not need to recalculate indexes for numbers already resolved in previous recursions. There is also a break that stops the inner loop as soon as a number gets its index in the current slice.
Other algorithms could be designed, and more efficient ones for sure. Ones based on dictionaries or on different slicing strategies come to mind.
def closest_larger_right(arr):
    len_total = len(arr)
    result = [None] * len_total
    def recursive(arr, len_total, position=0):
        # 2) Termination condition
        if len(arr) > 1:
            mid = len(arr) // 2
            left = arr[:mid]
            right = arr[mid:]
            position_left = 0 + position
            position_right = len(left) + position
            # 1) Recursive calls
            recursive(left, len_total, position_left)
            recursive(right, len_total, position_right)
            # 3) Base algorithm
            for i in range(len(arr)-1):
                # 4) Calculation saving
                if result[i + position] is None:
                    for j in range(i+1, len(arr), 1):
                        if (arr[i] < arr[j]):
                            result[i + position] = j + position
                            break
        return result
    return recursive(arr, len_total)
# output: [2, 2, 3, 7, 5, 7, 7, None]
print(closest_larger_right([5, 2, 6, 8, 1, 4, 3, 9]))

I am not sure how a divide-and-conquer algorithm can be applied here, but here's an improvement to your current algorithm that has the optimal running time of O(n) for n elements in the array:
stack = []
left = []
for i in range(len(arr)):
    # pop everything that is not strictly larger than the current element
    while stack and arr[stack[-1]] <= arr[i]:
        stack.pop()
    left.append(stack[-1] if stack else None)
    stack.append(i)
This uses a stack to keep track of the indices of the larger elements to the left, popping indices from the stack as long as their elements are not larger than the current element, and then adding the current index itself. Since each element is added to and popped from the stack at most once, the running time is O(n). The same can be used for the right-side elements simply by iterating the array in reverse order.
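The reverse-order pass mentioned above could be sketched as follows. The function name nearest_larger_right is hypothetical, and the sketch assumes that "larger" means strictly larger, so equal elements are popped as well.

```python
def nearest_larger_right(arr):
    """For each element, index of the closest strictly larger element to its right."""
    stack = []                      # indices whose values decrease towards the top
    right = [None] * len(arr)
    for i in range(len(arr) - 1, -1, -1):
        # discard candidates that are not strictly larger than arr[i]
        while stack and arr[stack[-1]] <= arr[i]:
            stack.pop()
        right[i] = stack[-1] if stack else None
        stack.append(i)
    return right

print(nearest_larger_right([5, 2, 6, 8, 1, 4, 3, 9]))
# [2, 2, 3, 7, 5, 7, 7, None]
```

Both examples from the question give the expected Right lists with this sketch.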

Related

How many steps to the nearest zero value

Looking for some help with solving a seemingly easy algorithm.
Brief overview:
We have an unsorted list of whole numbers. The goal is to find out how far each element of this list is from the nearest '0' value.
So if we have a list similar to this: [0, 1, 2, 0, 4, 5, 6, 7, 0, 5, 6, 9]
The expected result will be: [0, 1, 1, 0, 1, 2, 2, 1, 0, 1, 2, 3]
I've tried to simplify the problem in order to come up with some naive algorithm, but I can't figure out how to keep track of previous and next zero values.
My initial thoughts were to figure out all indexes for zeros in the list and fill the gaps between those zeros with values, but this obviously didn't quite work out for me.
The poorly implemented code (so far I'm just counting down the steps to the next zero):
def get_empty_lot_index(arr: list) -> list:
    ''' Gets all indices of empty lots '''
    lots = []
    for i in range(len(arr)):
        if arr[i] == 0:
            lots.append(i)
    return lots
def space_to_empty_lots(arr: list) -> list:
    empty_lots = get_empty_lot_index(arr)
    new_arr = []
    start = 0
    for i in empty_lots:
        steps = i - start
        while steps >= 0:
            new_arr.append(steps)
            steps -= 1
        start = i + 1
    return new_arr
One possible algorithm is to make two sweeps through the input list: once forward, once backward. Each time retain the index of the last encountered 0 and store the difference. In the second sweep take the minimum of what was stored in the first sweep and the new result:
def space_to_empty_lots(arr: list) -> list:
    result = []
    # first sweep
    lastZero = -len(arr)
    for i, value in enumerate(arr):
        if value == 0:
            lastZero = i
        result.append(i - lastZero)
    # second sweep
    lastZero = len(arr)*2
    for i, value in reversed(list(enumerate(arr))):
        if value == 0:
            lastZero = i
        result[i] = min(result[i], lastZero - i)
    return result
NB: this function assumes that there is at least one 0 in the input list. It is not clear what the function should do when there is no 0. In that case this implementation will return a list with values greater than the length of the input list.
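If the no-zero case matters, one possible variation (an assumption, since the question does not say what should happen then) is to use an infinite sentinel, so that elements with no zero on either side keep the value math.inf. The function name distance_to_nearest_zero is hypothetical.

```python
import math

def distance_to_nearest_zero(arr):
    """Two-sweep distance to the nearest 0; math.inf where the list has no 0."""
    result = [math.inf] * len(arr)
    # forward sweep: distance to the last zero seen on the left
    last_zero = None
    for i, value in enumerate(arr):
        if value == 0:
            last_zero = i
        if last_zero is not None:
            result[i] = i - last_zero
    # backward sweep: keep the smaller of the left and right distances
    last_zero = None
    for i in range(len(arr) - 1, -1, -1):
        if arr[i] == 0:
            last_zero = i
        if last_zero is not None:
            result[i] = min(result[i], last_zero - i)
    return result

print(distance_to_nearest_zero([0, 1, 2, 0, 4, 5, 6, 7, 0, 5, 6, 9]))
# [0, 1, 1, 0, 1, 2, 2, 1, 0, 1, 2, 3]
```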

Trying to optimize this code: iterating over a list to replace its values

I am trying to do a challenge in Python, the challenge consists of :
Given an array X of positive integers, its elements are to be transformed by running the following operation on them as many times as required:
if X[i] > X[j] then X[i] = X[i] - X[j]
When no more transformations are possible, return its sum ("smallest possible sum").
Basically you pick two non-equal numbers from the array, and replace the larger of them with their difference. You repeat this until all numbers in the array are the same.
I tried a basic approach by using min and max but there is another constraint which is time. I always get timeout because my code is not optimized and takes too much time to execute. Can you please suggest some solutions to make it run faster.
def solution(array):
    while len(set(array)) != 1:
        array[array.index(max(array))] = max(array) - min(array)
    return sum(array)
Thank you so much !
EDIT
I won't spoil the challenge... because I didn't find the solution in Python. But here's the general design of an algorithm that works in Kotlin (in 538 ms). In Python I'm stuck in the middle of the performance tests.
Some thoughts:
First, the idea to remove the minimum from the other elements is good: the modulo (we remove the minimum as long as it is possible) will be small.
Second, if this minimum is 1, the array will be soon full of 1s and the result is N (the len of the array).
Third, if all elements are equal, the result is N times the value of one element.
The algorithm
The idea is to keep two indices: i is the current index that cycles on 0..N and k is the index of the current minimum.
At the beginning, k = i = 0 and the minimum is m = arr[0]. We advance i until one of the following happens:
i == k => we made a full cycle without updating k, return N*m;
arr[i] == 1 => return N;
arr[i] < m => update k and m;
arr[i] > m => compute the new value of arr[i] (that is, arr[i] % m, or m if arr[i] is a multiple of m). If that's not m, then it is arr[i] % m < m: update k and m;
arr[i] == m => pass.
Basically, we use a rolling minimum and compute the modulos on the fly until all elements are the same. That spares us from periodically computing the min of the array.
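A minimal Python sketch of the steps above, as I read them. The function name smallest_sum_rolling_min is hypothetical, the input is assumed to be a non-empty list of positive integers, and this has not been timed against the challenge's performance tests.

```python
def smallest_sum_rolling_min(values):
    """Rolling-minimum reduction; returns len(values) times the common final value."""
    arr = list(values)          # work on a copy
    n = len(arr)
    k, m = 0, arr[0]            # index and value of the current minimum
    if m == 1:
        return n
    i = 0
    while True:
        i = (i + 1) % n         # advance cyclically
        if i == k:              # full cycle without a new minimum: all equal m
            return n * m
        v = arr[i]
        if v == 1:
            return n
        if v < m:               # found a smaller element
            k, m = i, v
        elif v > m:
            r = v % m
            arr[i] = r if r else m   # multiples of m collapse to m
            if r:                    # a non-zero remainder is a new minimum
                k, m = i, r
                if m == 1:
                    return n

print(smallest_sum_rolling_min([6, 9, 21]))   # 9
```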
PREVIOUS ANSWER
As @BallpointBen wrote, you'll get n times the GCD of all numbers. But that's cheating ;)! If you want to find a solution by hand, you can optimize your code.
As long as you haven't found N identical numbers, you call the set, max (twice!), min and index functions on array. Those functions are pretty expensive. The number of iterations depends on the array.
Imagine the array is sorted in reverse order: [22, 14, 6, 2]. You can replace 22 by 22-14, 14 by 14-6, ... and get: [8, 8, 4, 2]. Sort again and replace again (only where the neighbours differ): [8, 4, 2, 2], then [4, 2, 2, 2], then [2, 2, 2, 2]. Actually, in the first pass 14 could be replaced by 14-2*6 = 2 (as in the classic GCD computation), giving the following sequence:
[22, 14, 6, 2]
[8, 2, 2, 2]
[2, 2, 2, 2]
The convergence is fast.
def solution2(arr):
    N = len(arr)
    end = False
    while not end:
        arr = sorted(arr, reverse=True)
        end = True
        for i in range(1, N):
            while arr[i-1] > arr[i]:
                arr[i-1] -= arr[i]
                end = False
    return sum(arr)
A benchmark:
import random
import timeit
arr = [4*random.randint(1, 100000) for _ in range(100)] # GCD will be 4 or a multiple of 4
assert solution(list(arr)) == solution2(list(arr))
print(timeit.timeit(lambda: solution(list(arr)), number=100))
print(timeit.timeit(lambda: solution2(list(arr)), number=100))
Output:
2.5396839629975148
0.029025810996245127
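For comparison, the GCD observation mentioned above can be written directly with the standard library. This is only a sketch of that shortcut; the function name smallest_possible_sum is hypothetical and the input is assumed to be a non-empty list of positive integers.

```python
from functools import reduce
from math import gcd

def smallest_possible_sum(numbers):
    """Every element ends up equal to the GCD of the list, so the sum is n * GCD."""
    return len(numbers) * reduce(gcd, numbers)

print(smallest_possible_sum([22, 14, 6, 2]))   # 8
```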
Made it faster with a slight change (modulo instead of repeated subtraction):
def solution(a):
    N = len(a)
    end = False
    while not end:
        a = sorted(a, reverse=True)
        small = min(a)
        end = True
        for i in range(1, N):
            if a[i-1] > small:
                a[i-1] = a[i-1]%small if a[i-1]%small !=0 else small
                end = False
    return sum(a)
This solution worked for me. I iterate over the list only once. Initially I find the minimum, and while iterating over the list I replace each element with the remainder of its division by the minimum. If I find a remainder equal to 1, the result is trivially 1 multiplied by the length of the list; otherwise, if the remainder is less than the minimum, I replace the variable m with the new minimum and continue. Once the iteration is finished, the result is the minimum multiplied by the length of the list.
Here the code:
def solution(a):
    L = len(a)
    if L == 1:
        return a[0]
    m=min(a)
    for i in range(L):
        if a[i] != m:
            if a[i] % m != 0:
                a[i] = a[i]%m
                if a[i]<m:
                    m=a[i]
            elif a[i] % m == 0:
                a[i] -= m * (a[i] // m - 1)
            if a[i]==1:
                return 1*L
    return m*L

Is there a python function that returns the first positive int that does not occur in list?

I'm trying to design a function that, given an array A of N integers, returns the smallest positive integer (greater than 0) that does not occur in A.
This code works fine yet has a high order of complexity. Is there another solution that reduces the order of complexity?
Note: The number 10000000 is the range of integers in array A. I tried the sort function, but does it reduce the complexity?
def solution(A):
    for i in range(10000000):
        if(A.count(i)) <= 0:
            return(i)
The following is O(n log n):
a = [2, 1, 10, 3, 2, 15]
a.sort()
if a[0] > 1:
    print(1)
else:
    for i in range(1, len(a)):
        if a[i] > a[i - 1] + 1:
            print(a[i - 1] + 1)
            break
If you don't like the special handling of 1, you could just append zero to the array and have the same logic handle both cases:
a = sorted(a + [0])
for i in range(1, len(a)):
    if a[i] > a[i - 1] + 1:
        print(a[i - 1] + 1)
        break
Caveats (both trivial to fix and both left as an exercise for the reader):
Neither version handles empty input.
The code assumes there are no negative numbers in the input.
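One way the caveats above could be addressed, still with a sort, is sketched below. The function name first_missing_positive_sorted is hypothetical. Deduplicating with set() first is an extra choice here, not part of the original snippets.

```python
def first_missing_positive_sorted(values):
    """Smallest positive integer not in values; O(n log n) because of the sort."""
    expected = 1
    for v in sorted(set(values)):
        if v < expected:        # non-positive numbers are skipped
            continue
        if v > expected:        # gap found: expected is missing
            break
        expected += 1           # v == expected, look for the next integer
    return expected

print(first_missing_positive_sorted([2, 1, 10, 3, 2, 15]))   # 4
```

An empty input falls straight through the loop and returns 1.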
O(n) time and O(n) space:
def solution(A):
    count = [0] * len(A)
    for x in A:
        if 0 < x <= len(A):
            count[x-1] = 1 # count[0] is to count 1
    for i in range(len(count)):
        if count[i] == 0:
            return i+1
    return len(A)+1 # only if A = [1, 2, ..., len(A)]
This should be O(n). Utilizes a temporary set to speed things along.
a = [2, 1, 10, 3, 2, 15]
# use a set of only the positive numbers for lookup
temp_set = set()
for i in a:
    if i > 0:
        temp_set.add(i)
# iterate from 1 up to length of set + 1 (to ensure edge case is handled)
for i in range(1, len(temp_set) + 2):
    if i not in temp_set:
        print(i)
        break
My proposal is a recursive function inspired by quicksort.
Each step divides the input sequence into two sublists (lt = less than pivot; ge = greater than or equal to pivot) and decides which of the sublists is to be processed in the next step. Note that there is no sorting.
The idea is that a set of integers such that lo <= n < hi contains "gaps" only if it has fewer than (hi - lo) elements.
The input sequence must not contain duplicates. A set can be passed directly.
# all cseq items > 0 assumed, no duplicates!
def find(cseq, cmin=1):
    # cmin = possible minimum not ruled out yet
    size = len(cseq)
    if size <= 1:
        return cmin+1 if cmin in cseq else cmin
    lt = []
    ge = []
    pivot = cmin + size // 2
    for n in cseq:
        (lt if n < pivot else ge).append(n)
    return find(lt, cmin) if cmin + len(lt) < pivot else find(ge, pivot)
test = set(range(1,100))
print(find(test)) # 100
test.remove(42)
print(find(test)) # 42
test.remove(1)
print(find(test)) # 1
Inspired by various solutions and comments above, about 20%-50% faster in my (simplistic) tests than the fastest of them (though I'm sure it could be made faster), and handling all the corner cases mentioned (non-positive numbers, duplicates, and empty list):
import numpy
def firstNotPresent(l):
    positive = numpy.fromiter(set(l), dtype=int) # deduplicate
    positive = positive[positive > 0] # only keep positive numbers
    positive.sort()
    top = positive.size + 1
    if top == 1: # empty list
        return 1
    sequence = numpy.arange(1, top)
    try:
        # where() gives a zero-based position; the missing value is position + 1
        return numpy.where(sequence < positive)[0][0] + 1
    except IndexError: # no numbers are missing, top is next
        return top
The idea is: if you enumerate the positive, deduplicated, sorted list starting from one, the first time the index is less than the list value, the index value is missing from the list, and hence is the lowest positive number missing from the list.
This and the other solutions I tested against (those from adrtam, Paritosh Singh, and VPfB) all appear to be roughly O(n), as expected. (It is, I think, fairly obvious that this is a lower bound, since every element in the list must be examined to find the answer.) Edit: looking at this again, of course the big-O for this approach is at least O(n log(n)), because of the sort. It's just that the sort is so fast comparatively speaking that it looked linear overall.

Best way to replace values in a list based on many indexes

I have lists like this:
l = [1,2,3,4,5,6,7,8,9,10]
idx = [2,5,7]
I want to replace values in l with 0, using indexes from idx. For now I do:
for i in idx:
    l[i] = 0
This gives: l = [1, 2, 0, 4, 5, 0, 7, 0, 9, 10]
Is there a better, faster, more pythonic way? This is only a small example, but what if I have huge lists?
If you're talking about huge lists, you should really try not to create a new list, as the new list will require space in memory in addition to your input lists.
Now, let's consider the indices that you want to set to 0. These indices are contained in a list (idx), which itself could be just as long as the list with numbers (l). So, if you were to do something like this:
for i in range(len(l)):
    if i in idx:
        l[i] = 0
it would take O(mn) time, where m is the number of elements in idx and n is the number of elements in l. This is a really slow algorithm.
Now, you really can't do much faster than O(m), seeing as you have to consider every element in idx. But since m is strictly bounded from above by n, it's definitely a better strategy to loop over idx instead:
for i in idx:
    l[i] = 0
But let's consider that idx might contain elements that are not valid indices of l (i.e. there is at least one element in idx whose value is greater than the largest index in l). Then, you could do this:
for i in idx:
    if i<len(l):
        l[i] = 0
or:
for ind in (i for i in idx if i<len(l)):
    l[ind] = 0
Now, this makes O(m) comparisons, which could potentially be improved upon. For example, if idx were sorted, then a modified binary search could provide the appropriate slice of idx that has valid indices:
# Note that the list is not sliced, unlike some common binary search
# implementations. This saves on additional space.
def binSearch(L, idx, i=0, j=None):
    # returns how many entries of the sorted idx are valid indices of L
    if j is None:
        j = len(idx)-1
    if i > j:
        return i
    mid = (i+j)//2
    if idx[mid] > len(L)-1:
        return binSearch(L, idx, i, mid-1)
    else:
        return binSearch(L, idx, mid+1, j)
So now, you could replace only the valid indices without any comparisons at all:
for ind in range(binSearch(l, idx)):
    l[idx[ind]] = 0
Note that this approach takes O(log m) time to apply binSearch on idx in the first place
This would work if idx were already sorted. However, if that is an invalid assumption, then you might want to sort it yourself, which would cost O(m log m) time, which would be slower than the aforementioned O(m) implementation.
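As an alternative to a hand-written binary search, the standard library's bisect module can find the same cut-off point. A minimal sketch, assuming idx is sorted in ascending order and contains no negative values; the function name zero_valid_indices is hypothetical.

```python
from bisect import bisect_left

def zero_valid_indices(values, sorted_idx):
    """Set values[i] = 0 for every i in sorted_idx that is a valid index of values."""
    # position of the first entry that is >= len(values), i.e. out of range
    cutoff = bisect_left(sorted_idx, len(values))
    for position in range(cutoff):
        values[sorted_idx[position]] = 0
    return values

l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(zero_valid_indices(l, [2, 5, 7, 12, 40]))
# [1, 2, 0, 4, 5, 0, 7, 0, 9, 10]
```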
Yet, if idx were sufficiently large, you could try a distributed approach, with multiprocessing:
import multiprocessing as mp
def replace(l, idx):
    numWorkers = mp.cpu_count()*2 -1
    # leave room for one None sentinel per worker
    qIn = mp.Queue(maxsize=len(idx) + numWorkers)
    qOut = mp.Queue()
    procs = [mp.Process(target=slave, args=(l, qIn, qOut)) for _ in range(numWorkers)]
    for p in procs:
        p.start()
    for i in idx:
        qIn.put(i)
    # tell every worker that there is no more input
    for _ in procs:
        qIn.put(None)
    numFinished = 0
    while numFinished != numWorkers:
        i = qOut.get()
        if i is None:
            numFinished += 1
            continue
        l[i] = 0
    for p in procs:
        p.join()
def slave(L, qIn, qOut):
    # forward only the valid indices, then signal completion
    for i in iter(qIn.get, None):
        if i< len(L):
            qOut.put(i)
    qOut.put(None)
Of course, you could further improve this by adding the binSearch to the distributed solution as well, but I'll leave that to you.
Don't create another list for index. Instead:
l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
index = 1
while index < len(l):
    if index == 2:
        l[index] = 0
    elif index == 5:
        l[index] = 0
    elif index == 7:
        l[index] = 0
    index += 1
print(l)
You do not have to use "elif" statements if you combine them all on one line with an "or" statement. For example:
l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
index = 1
while index < len(l):
    if (index == 2) or (index == 5) or (index == 7):
        l[index] = 0
    index += 1
print(l)
I think this is perfectly fine. You could write a list comprehension, like this:
[v if i not in idx else 0 for i, v in enumerate(l)]
Or change it in place by iterating over l
for i, v in enumerate(l):
    if i in idx:
        l[i] = 0
But I find that harder to read, and very likely slower. I don't think any other solution will beat yours by a significant margin, ignoring CPU caching.

N random, contiguous and non-overlapping subsequences each of length

I'm trying to get n random and non-overlapping slices of a sequence where each subsequence is of length l, preferably in the order they appear.
This is the code I have so far and it's gotten more and more messy with each attempt to make it work, needless to say it doesn't work.
def rand_parts(seq, n, l):
    """
    return n random non-overlapping partitions each of length l.
    If n * l > len(seq) raise error.
    """
    if n * l > len(seq):
        raise Exception('length of seq too short for given n, l arguments')
    if not isinstance(seq, list):
        seq = list(seq)
    gaps = [0] * (n + 1)
    for g in xrange(len(seq) - (n * l)):
        gaps[random.randint(0, len(gaps) - 1)] += 1
    result = []
    for i, g in enumerate(gaps):
        x = g + (i * l)
        result.append(seq[x:x+l])
        if i < len(gaps) - 1:
            gaps[i] += x
    return result
For example if we say rand_parts([1, 2, 3, 4, 5, 6], 2, 2) there are 6 possible results that it could return from the following diagram:
[1, 2, 3, 4, 5, 6]
 ____  ____
[1, 2, 3, 4, 5, 6]
 ____     ____
[1, 2, 3, 4, 5, 6]
 ____        ____
[1, 2, 3, 4, 5, 6]
    ____  ____
[1, 2, 3, 4, 5, 6]
    ____     ____
[1, 2, 3, 4, 5, 6]
       ____  ____
So [[3, 4], [5, 6]] would be acceptable but [[3, 4], [4, 5]] wouldn't because it's overlapping and [[2, 4], [5, 6]] also wouldn't because [2, 4] isn't contiguous.
I encountered this problem while doing a little code golfing so for interests sake it would also be nice to see both a simple solution and/or an efficient one, not so much interested in my existing code.
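import random
def rand_parts(seq, n, l):
    # candidate start positions, as if every slice had length 1
    indices = range(len(seq) - (l - 1) * n)
    result = []
    offset = 0
    for i in sorted(random.sample(indices, n)):
        i += offset
        result.append(seq[i:i+l])
        # each slice taken pushes the following ones l - 1 further right
        offset += l - 1
    return result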
To understand this, first consider the case l == 1. Then it's basically just returning a random.sample() of the input data in sorted order; in this case the offset variable is always 0.
The case where l > 1 is an extension of the previous case. We use random.sample() to pick up positions, but maintain an offset to shift successive results: in this way, we make sure that they are non-overlapping ranges --- i.e. they start at a distance of at least l of each other, rather than 1.
Many solutions can be hacked for this problem, but one has to be careful if the sequences are to be strictly random. For example, it's wrong to begin by picking a random number between 0 and len(seq)-n*l and say that the first sequence will start there, then work recursively.
The problem is equivalent to selecting randomly n+1 integer numbers such that their sum is equal to len(seq)-l*n. (These numbers will be the "gaps" between your sequences.) To solve it, you can see this question.
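The gap formulation above could be sketched as follows. This is only an illustration under assumptions: the helper names random_gaps and rand_parts_by_gaps are hypothetical, n is assumed to be at least 1, and the gaps are drawn with the usual "stars and bars" trick (choosing n bar positions among free + n slots), which is one way to pick the n+1 gap sizes uniformly.

```python
import random

def random_gaps(total, parts):
    """Return parts + 1 non-negative integers that sum to total."""
    # choose where the 'bars' go among total + parts slots
    bars = sorted(random.sample(range(total + parts), parts))
    gaps = [bars[0]]
    gaps += [b - a - 1 for a, b in zip(bars, bars[1:])]
    gaps.append(total + parts - 1 - bars[-1])
    return gaps

def rand_parts_by_gaps(seq, n, l):
    """n non-overlapping slices of length l, in order of appearance."""
    free = len(seq) - n * l
    if free < 0:
        raise ValueError('length of seq too short for given n, l arguments')
    gaps = random_gaps(free, n)
    result = []
    pos = 0
    for gap in gaps[:-1]:       # the last gap is the unused tail
        pos += gap
        result.append(seq[pos:pos + l])
        pos += l
    return result

print(rand_parts_by_gaps([1, 2, 3, 4, 5, 6], 2, 2))
```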
This worked for me in Python 3.3.2. It should be backwards compatible with Python 2.7.
from random import randint as r
def greater_than(n, lis, l):
    for element in lis:
        if n < element + l:
            return False
    return True
def rand_parts(seq, n, l):
    """
    return n random non-overlapping partitions each of length l.
    If n * l > len(seq) raise error.
    """
    if n * l > len(seq):
        raise(Exception('length of seq too short for given n, l arguments'))
    if not isinstance(seq, list):
        seq = list(seq)
    # Setup
    left_to_do = n
    tried = []
    result = []
    # The main loop
    while left_to_do > 0:
        while True:
            index = r(0, len(seq) - 1)
            if greater_than(index, tried, l) and index <= len(seq) - left_to_do * l:
                tried.append(index)
                break
        left_to_do -= 1
        result.append(seq[index:index+l])
    # Done
    return result
a = [1, 2, 3, 4, 5, 6]
print(rand_parts(a, 3, 2))
The above code will always print [[1, 2], [3, 4], [5, 6]]
If you do it recursively it's much simpler. Take the first part from this range (so that the rest will still fit):
[0:total_len - (number_of_parts - 1) * (len_of_parts)]
and then recurse with what is left to do:
rand_parts(seq - beginning_to_end_of_part_you_grabbed, n - 1, l)
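A possible reading of that recursion in runnable form is sketched below. The function name rand_parts_recursive is hypothetical, and seq is assumed to support slicing. Note that, as another answer here points out, picking the first start uniformly and then recursing does not make every valid result equally likely.

```python
import random

def rand_parts_recursive(seq, n, l):
    """n non-overlapping slices of length l, chosen left to right."""
    if n == 0:
        return []
    last_start = len(seq) - n * l   # latest start that leaves room for the rest
    if last_start < 0:
        raise ValueError('length of seq too short for given n, l arguments')
    start = random.randint(0, last_start)
    first = seq[start:start + l]
    # recurse on what remains after the part just taken
    return [first] + rand_parts_recursive(seq[start + l:], n - 1, l)

print(rand_parts_recursive([1, 2, 3, 4, 5, 6], 2, 2))
```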
First of all, I think you need to clarify what you mean by the term random.
How can you generate a truly random list of sub-sequences when you are placing specific restrictions on the sub-sequences themselves?
As far as I know, the best "randomness" anyone can achieve in this context is generating all lists of sub-sequences that satisfy your criteria, and selecting from the pool however many you need in a random fashion.
Now based on my experience from an algorithms class that I've taken a few years ago, your problem seems to be a typical example which could be solved using a greedy algorithm making these big (but likely?) assumptions about what you were actually asking in the first place:
What you actually meant by random is not that a list of sub-sequence should be generated randomly (which is kind of contradictory as I said before), but that any of the solutions that could be produced is just as valid as the rest (e.g. any of the 6 solutions is valid from input [1,2,3,4,5,6] and you don't care which one)
Restating the above, you just want any one of the possible solutions that could be generated, and you want an algorithm that can output one of these valid answers.
Assuming the above here is a greedy algorithm which generates one of the possible lists of sub-sequences in linear time (excluding sorting, which is O(n*log(n))):
def subseq(seq, count, length):
    s = sorted(list(set(seq)))
    result = []
    subseq = []
    for n in s:
        if len(subseq) == length:
            result.append(subseq)
            if len(result) == count:
                return result
            subseq = [n]
        elif len(subseq) == 0:
            subseq.append(n)
        elif subseq[-1] + 1 == n:
            subseq.append(n)
        elif subseq[-1] + 1 < n:
            subseq = [n]
    print("Impossible!")
The gist of the algorithm is as follows:
One of your requirements is that there cannot be any overlaps, and this ultimately implies you need to deal with unique numbers and unique numbers only. So I use the set() operation to get rid of all the duplicates. Then I sort it.
The rest is pretty straightforward, imo. I just iterate over the sorted list and form sub-sequences greedily.
If the algorithm can't form enough sub-sequences, it prints "Impossible!"
Hope this was what you were looking for.
EDIT: For some reason I wrongly assumed that there couldn't be repeating values in a sub-sequence, this one allows it.
def subseq2(seq, count, length):
    s = sorted(seq)
    result = []
    subseq = []
    for n in s:
        if len(subseq) == length:
            result.append(subseq)
            if len(result) == count:
                return result
            subseq = [n]
        elif len(subseq) == 0:
            subseq.append(n)
        elif subseq[-1] + 1 == n or subseq[-1] == n:
            subseq.append(n)
        elif subseq[-1] + 1 < n:
            subseq = [n]
    print("Impossible!")
