Finding the subarray with the minimum range


Given an array of N positive integers, indexed from 0 to N - 1, how can I find a contiguous subarray of length K with the minimum possible range? In other words, max(subarray) - min(subarray) should be minimised. If there are multiple answers, any is fine.
For example, find the subarray of length 2 with the smallest range in [4, 1, 2, 6].
The answer would be [1, 2], as 2 - 1 = 1 is the smallest range of all possible contiguous subarrays.
The other subarrays are [4, 1] (range 3) and [2, 6] (range 4).
I'm using Python, and so far I've tried a linear search with the min() and max() functions, but it doesn't seem efficient. I've thought of using a min-heap, but I'm not sure how to implement it or whether it would even work. Any help would be much appreciated.
edit: code added
# N = length of array_of_heights, K = subarray length
N, K = map(int, input().split(' '))
array_of_heights = [int(i) for i in input().split(' ')]
min_min = 100000000000000000000
# iterates through all subarrays in the array_of_heights of length K
for i in range(N + 1 - K):
    subarray = array_of_heights[i : i + K]
    min_min = min(max(subarray)-min(subarray), min_min)
print(min_min)

There is a linear-time, O(N), algorithm for getting the min or max in a moving window of a specified size (your implementation has O(N*K) complexity).
Using deque from the collections module, you can maintain two parallel deques that track the minimum and maximum for the current window position, and retrieve the best difference in a single pass through the list.
import collections
def mindiff(a, k):
    dqmin = collections.deque()
    dqmax = collections.deque()
    best = 100000000
    for i in range(len(a)):
        if len(dqmin) > 0 and dqmin[0] <= i - k:
            dqmin.popleft()
        while len(dqmin) > 0 and a[dqmin[-1]] > a[i]:
            dqmin.pop()
        dqmin.append(i)
        if len(dqmax) > 0 and dqmax[0] <= i - k:
            dqmax.popleft()
        while len(dqmax) > 0 and a[dqmax[-1]] < a[i]:
            dqmax.pop()
        dqmax.append(i)
        if i >= k - 1:
            best = min(best, a[dqmax[0]]-a[dqmin[0]])
    return best
print(mindiff([4, 1, 2, 6], 2))
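The question asks for the subarray itself, not only its range. A minimal sketch of the same two-deque idea that also remembers where the best window starts could look like this; the name min_range_window and the (start, window) return value are assumptions made for illustration, not part of the answer above.

```python
from collections import deque

def min_range_window(a, k):
    """Return (start, window) for a length-k window of a with the smallest max - min."""
    if not 1 <= k <= len(a):
        raise ValueError("k must be between 1 and len(a)")
    dqmin, dqmax = deque(), deque()  # indices; values increasing / decreasing
    best_range, best_start = None, 0
    for i, x in enumerate(a):
        # drop indices that have slid out of the window [i - k + 1, i]
        if dqmin and dqmin[0] <= i - k:
            dqmin.popleft()
        if dqmax and dqmax[0] <= i - k:
            dqmax.popleft()
        # keep dqmin increasing and dqmax decreasing in value
        while dqmin and a[dqmin[-1]] > x:
            dqmin.pop()
        dqmin.append(i)
        while dqmax and a[dqmax[-1]] < x:
            dqmax.pop()
        dqmax.append(i)
        if i >= k - 1:
            current = a[dqmax[0]] - a[dqmin[0]]
            if best_range is None or current < best_range:
                best_range, best_start = current, i - k + 1
    return best_start, a[best_start:best_start + k]

print(min_range_window([4, 1, 2, 6], 2))  # (1, [1, 2])
```

When several windows share the smallest range, this sketch keeps the first one, which the question allows.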

You could use numpy to improve execution time.
Example:
import numpy as np
def f1(l,k):
    subs = np.array([l[i:i+k] for i in range(len(l)-k+1)])
    return np.min(subs.max(axis=1) - subs.min(axis=1))
Small test (f2 is your function here).
>>> arr = np.random.randint(100,size=10000)
>>> timeit.timeit("f1(arr,4)",setup="from __main__ import f1,f2,np,arr",number=1)
0.01172515214420855
>>> timeit.timeit("f2(arr,4)",setup="from __main__ import f1,f2,np,arr",number=1)
14.226237731054425

Related

Sum by Factors From Codewars.com

Synopsis: my code runs well with simple lists, but after the 4 basic tests its execution times out.
Since I don't want to look at other people's solutions, I'm asking for help: could someone show me which part of the code is hurting the execution time, so that I can focus on modifying only that part?
Note: I don't want a final solution, I just want to know which part of the code I have to change, please.
Exercise:
Given an array of positive or negative integers
I= [i1,..,in]
you have to produce a sorted array P of the form
[ [p, sum of all ij of I for which p is a prime factor (p positive) of ij] ...]
P will be sorted by increasing order of the prime numbers. The final result has to be given as a string in Java, C# or C++ and as an array of arrays in other languages.
Example:
I = [12, 15] # result = [[2, 12], [3, 27], [5, 15]]
[2, 3, 5] is the list of all prime factors of the elements of I, hence the result.
Notes: It can happen that a sum is 0 if some numbers are negative!
Example: I = [15, 30, -45] 5 divides 15, 30 and (-45) so 5 appears in the result, the sum of the numbers for which 5 is a factor is 0 so we have [5, 0] in the result amongst others.
def sum_for_list(lst):
    if len(lst) == 0:
        return []
    max = sorted(list(map(lambda x: abs(x), lst)), reverse = True)[0]
    #create the list with the primes, already filtered
    primes = []
    for i in range (2, max + 1):
        for j in range (2, i):
            if i % j == 0:
                break
        else:
            for x in lst:
                if x % i == 0:
                    primes.append([i])
                    break
    #i add the sums to the primes
    for i in primes:
        sum = 0
        for j in lst:
            if j % i[0] == 0:
                sum += j
        i.append(sum)
    return primes
I tried to simplify the code as much as I could, but got the same result.
I also tried other ways to iterate in the first step:
# Find the maximum value in the list
from functools import reduce
max = reduce(lambda x,y: abs(x) if abs(x)>abs(y) else abs(y), lst)

Finding the maximum increase within a given sequence

I am currently working on a function that takes a sequence and returns the maximum increase from one element to the other at a higher index. However, the function is not returning the correct maximum increase.
I have put a for loop inside a for loop, then tried to return the maximum value out of all the differences, which did not work (it said 'int' object is not iterable)
def max_increase(seq):
i = 0
maximum_increase = 0
for i in range(len(seq)):
difference = 0
for j in range(i + 1, len(seq)):
difference = seq[j] - seq[i]
if 0 <= maximum_increase < difference:
maximum_increase = difference
return maximum_increase
For max_increase([1,2,3,5,0]), it should return 4 since from the differences list [1,2,4,-1,1,3,-2,2,-3,-5], the maximum is 4. However, my function returns a negative value, -1.

You have an indentation problem. This fixes it:
def max_increase(seq):
    i = 0
    maximum_increase = 0
    for i in range(len(seq)):
        difference = 0
        for j in range(i + 1, len(seq)):
            difference = seq[j] - seq[i]
            if 0 <= maximum_increase < difference:
                maximum_increase = difference
    return maximum_increase

Given you have received help in debugging your code already, here is a short pythonic solution to the problem:
>>> l=[1,2,3,5,0]
>>> inc = (i-el for p, el in enumerate(l) for i in l[p:])
>>> max(inc)
4
but even better is one that avoids creating unnecessary slices (at the cost of reversing the sequence):
import itertools as it
def incs(seq):
    pr = []
    for el in reversed(seq):
        print(f"{el} is compared with {pr}")
        yield (i-el for i in pr)
        pr.append(el)
seq = [1, 2, 3, 5, 0]
print("The max inc is", max(it.chain.from_iterable(incs(seq))))
which produces
0 is compared with []
5 is compared with [0]
3 is compared with [0, 5]
2 is compared with [0, 5, 3]
1 is compared with [0, 5, 3, 2]
The max inc is 4
Note: if the increase is meant as the distance between the two numbers, i.e. never negative regardless of their order, then change the yield line to
yield (abs(i-el) for i in pr)

You can use:
my_l = [1,2,3,5,0]
max((e for i in range(len(my_l)) for e in (j - my_l[i] for j in my_l[i + 1:])))
output:
4
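All of the approaches above compare every pair of elements, so they take quadratic time. A minimal single-pass sketch keeps the smallest value seen so far and compares each later element against it. The name max_increase_linear is hypothetical, and returning 0 when the sequence never increases is an assumption taken from the initial value of maximum_increase in the question.

```python
def max_increase_linear(seq):
    """Largest seq[j] - seq[i] with i < j, or 0 if the sequence never increases."""
    best = 0
    lowest = None  # smallest element seen so far
    for value in seq:
        if lowest is None or value < lowest:
            lowest = value
        elif value - lowest > best:
            best = value - lowest
    return best

print(max_increase_linear([1, 2, 3, 5, 0]))  # 4
```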

Trying to optimize this code: iterating over a list to replace its values

I am trying to do a challenge in Python, the challenge consists of :
Given an array X of positive integers, its elements are to be transformed by running the following operation on them as many times as required:
if X[i] > X[j] then X[i] = X[i] - X[j]
When no more transformations are possible, return its sum ("smallest possible sum").
Basically you pick two non-equal numbers from the array and replace the larger of them with their difference. You repeat this until all numbers in the array are the same.
I tried a basic approach using min and max, but there is another constraint, which is time. I always get a timeout because my code is not optimized and takes too long to execute. Can you please suggest some ways to make it run faster?
def solution(array):
    while len(set(array)) != 1:
        array[array.index(max(array))] = max(array) - min(array)
    return sum(array)
Thank you so much !

EDIT
I will avoid spoiling the challenge... because I didn't find the solution in Python. But here's the general design of an algorithm that works in Kotlin (in 538 ms). In Python I'm stuck in the middle of the performance tests.
Some thoughts:
First, the idea to remove the minimum from the other elements is good: the modulo (we remove the minimum as long as it is possible) will be small.
Second, if this minimum is 1, the array will be soon full of 1s and the result is N (the len of the array).
Third, if all elements are equal, the result is N times the value of one element.
The algorithm
The idea is to keep two indices: i is the current index, which cycles over 0..N-1, and k is the index of the current minimum.
At the beginning, k = i = 0 and the minimum is m = arr[0]. We advance i until one of the following happens:
i == k => we made a full cycle without updating k, return N*m;
arr[i] == 1 => return N;
arr[i] < m => update k and m;
arr[i] > m => compute the new value of arr[i] (that is, arr[i] % m, or m if arr[i] is a multiple of m). If that's not m, it's arr[i] % m < m: update k and m;
arr[i] == m => pass.
Basically, we use a rolling minimum and compute the modulos on the fly until all elements are the same. That spares us from periodically computing the min of the array.
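The steps above can be sketched in Python as follows. This is only an illustration of the described design: the name smallest_sum_rolling is hypothetical, the input is assumed to hold positive integers, and nothing here is claimed to pass the challenge's time limits.

```python
def smallest_sum_rolling(arr):
    """Rolling-minimum sketch; assumes arr holds positive integers."""
    arr = list(arr)      # work on a copy
    n = len(arr)
    if n == 0:
        return 0
    k, m = 0, arr[0]     # index and value of the current minimum
    if m == 1:
        return n
    i = 1 % n            # first index after k (wraps to 0 when n == 1)
    while i != k:        # stop after a full cycle without updating k
        x = arr[i]
        if x == 1:
            return n
        if x < m:
            k, m = i, x
        elif x > m:
            r = x % m
            if r == 0:
                arr[i] = m       # a multiple of m reduces to m itself
            elif r == 1:
                return n         # everything will collapse to 1
            else:
                arr[i] = r       # remainder is a new, smaller minimum
                k, m = i, r
        i = (i + 1) % n
    return n * m

print(smallest_sum_rolling([22, 14, 6, 2]))  # 8
```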
PREVIOUS ANSWER
As @BallpointBen wrote, you'll get n times the GCD of all the numbers. But that's cheating ;)! If you want to find a solution by hand, you can optimize your code.
As long as you haven't found N identical numbers, you call the set, max (twice!), min and index functions on array. Those functions are pretty expensive. The number of iterations depends on the array.
Imagine the array is sorted in reverse order: [22, 14, 6, 2]. You can replace 22 by 22-14, 14 by 14-6 and 6 by 6-2, and get: [8, 8, 4, 2]. Sort again and replace again (only where neighbours differ): [8, 4, 2, 2], then [4, 2, 2, 2] and finally [2, 2, 2, 2]. Actually, in the first pass 14 could be replaced by 14-2*6 = 2 (as in the classic GCD computation), giving the following sequence:
[22, 14, 6, 2]
[8, 2, 2, 2]
[2, 2, 2, 2]
The convergence is fast.
def solution2(arr):
    N = len(arr)
    end = False
    while not end:
        arr = sorted(arr, reverse=True)
        end = True
        for i in range(1, N):
            while arr[i-1] > arr[i]:
                arr[i-1] -= arr[i]
                end = False
    return sum(arr)
A benchmark:
import random
import timeit
arr = [4*random.randint(1, 100000) for _ in range(100)] # GCD will be 4 or a multiple of 4
assert solution(list(arr)) == solution2(list(arr))
print(timeit.timeit(lambda: solution(list(arr)), number=100))
print(timeit.timeit(lambda: solution2(list(arr)), number=100))
Output:
2.5396839629975148
0.029025810996245127
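For comparison, the GCD observation mentioned in this answer can be written directly with the standard library. This is a minimal sketch that assumes a non-empty list of positive integers; the name smallest_sum_gcd is hypothetical.

```python
from functools import reduce
from math import gcd

def smallest_sum_gcd(arr):
    """n times the GCD of all elements; assumes arr is non-empty."""
    return len(arr) * reduce(gcd, arr)

print(smallest_sum_gcd([22, 14, 6, 2]))  # 8
```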

def solution(a):
    N = len(a)
    end = False
    while not end:
        a = sorted(a, reverse=True)
        small = min(a)
        end = True
        for i in range(1, N):
            if a[i-1] > small:
                a[i-1] = a[i-1]%small if a[i-1]%small !=0 else small
                end = False
    return sum(a)
I made it faster with a slight change.

This solution worked for me. I iterate over the list only once. First I find the minimum m; then, while iterating over the list, I replace each element with its remainder modulo m. If a remainder equals 1, the result is trivially 1 multiplied by the length of the list; otherwise, if the remainder is less than the minimum, I replace m with it and continue. Once the iteration is finished, the result is the minimum multiplied by the length of the list. Note that a single pass is not enough for every input: for [6, 10, 15] the code returns 9, while the correct answer is 3, so the pass would have to be repeated until no element changes.
Here is the code:
def solution(a):
    L = len(a)
    if L == 1:
        return a[0]
    m=min(a)
    for i in range(L):
        if a[i] != m:
            if a[i] % m != 0:
                a[i] = a[i]%m
                if a[i]<m:
                    m=a[i]
            elif a[i] % m == 0:
                a[i] -= m * (a[i] // m - 1)
            if a[i]==1:
                return 1*L
    return m*L

Is there a python function that returns the first positive int that does not occur in list?

I'm trying to design a function that, given an array A of N integers, returns the smallest positive integer (greater than 0) that does not occur in A.
This code works fine but has a high order of complexity. Is there another solution that reduces the order of complexity?
Note: 10000000 is the range of the integers in array A. I tried the sort function, but does it reduce the complexity?
def solution(A):
    # start at 1: the answer must be greater than 0
    for i in range(1, 10000000):
        if(A.count(i)) <= 0:
            return(i)

The following is O(n logn):
a = [2, 1, 10, 3, 2, 15]
a.sort()
if a[0] > 1:
    print(1)
else:
    for i in range(1, len(a)):
        if a[i] > a[i - 1] + 1:
            print(a[i - 1] + 1)
            break
If you don't like the special handling of 1, you could just append zero to the array and have the same logic handle both cases:
a = sorted(a + [0])
for i in range(1, len(a)):
    if a[i] > a[i - 1] + 1:
        print(a[i - 1] + 1)
        break
Caveats (both trivial to fix and both left as an exercise for the reader):
Neither version handles empty input.
The code assumes there are no negative numbers in the input.
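One way to address both caveats, as a minimal sketch rather than the answerer's own fix: walk the sorted values while tracking the last positive value accepted so far. The name first_missing_positive_sorted is hypothetical. It also covers the case where the input has no gap at all.

```python
def first_missing_positive_sorted(values):
    """O(n log n) sketch: smallest integer > 0 that is not in values."""
    prev = 0  # largest value of the unbroken run 1, 2, ..., prev found so far
    for x in sorted(values):
        if x <= prev:
            continue      # skips non-positive numbers and duplicates
        if x > prev + 1:
            break         # gap found right after prev
        prev = x
    return prev + 1

print(first_missing_positive_sorted([2, 1, 10, 3, 2, 15]))  # 4
print(first_missing_positive_sorted([]))                    # 1
```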

O(n) time and O(n) space:
def solution(A):
    count = [0] * len(A)
    for x in A:
        if 0 < x <= len(A):
            count[x-1] = 1 # count[0] is to count 1
    for i in range(len(count)):
        if count[i] == 0:
            return i+1
    return len(A)+1 # only if A is a permutation of 1, 2, ..., len(A)

This should be O(n). Utilizes a temporary set to speed things along.
a = [2, 1, 10, 3, 2, 15]
#use a set of only the positive numbers for lookup
temp_set = set()
for i in a:
    if i > 0:
        temp_set.add(i)
#iterate from 1 upto length of set +1 (to ensure edge case is handled)
for i in range(1, len(temp_set) + 2):
    if i not in temp_set:
        print(i)
        break

My proposal is a recursive function inspired by quicksort.
Each step divides the input sequence into two sublists (lt = less than pivot; ge = greater than or equal to pivot) and decides which of the sublists is to be processed in the next step. Note that there is no sorting.
The idea is that a set of integers such that lo <= n < hi contains "gaps" only if it has fewer than (hi - lo) elements.
The input sequence must not contain duplicates. A set can be passed directly.
# all cseq items > 0 assumed, no duplicates!
def find(cseq, cmin=1):
    # cmin = possible minimum not ruled out yet
    size = len(cseq)
    if size <= 1:
        return cmin+1 if cmin in cseq else cmin
    lt = []
    ge = []
    pivot = cmin + size // 2
    for n in cseq:
        (lt if n < pivot else ge).append(n)
    return find(lt, cmin) if cmin + len(lt) < pivot else find(ge, pivot)
test = set(range(1,100))
print(find(test)) # 100
test.remove(42)
print(find(test)) # 42
test.remove(1)
print(find(test)) # 1

Inspired by various solutions and comments above, about 20%-50% faster in my (simplistic) tests than the fastest of them (though I'm sure it could be made faster), and handling all the corner cases mentioned (non-positive numbers, duplicates, and empty list):
import numpy
def firstNotPresent(l):
    positive = numpy.fromiter(set(l), dtype=int) # deduplicate
    positive = positive[positive > 0] # only keep positive numbers
    positive.sort()
    top = positive.size + 1
    if top == 1: # empty list
        return 1
    sequence = numpy.arange(1, top)
    try:
        # where() gives a 0-based position; the missing number is the
        # 1-based sequence value stored at that position
        return sequence[numpy.where(sequence < positive)[0][0]]
    except IndexError: # no numbers are missing, top is next
        return top
The idea is: if you enumerate the positive, deduplicated, sorted list starting from one, the first time the index is less than the list value, the index value is missing from the list, and hence is the lowest positive number missing from the list.
This and the other solutions I tested against (those from adrtam, Paritosh Singh, and VPfB) all appear to be roughly O(n), as expected. (It is, I think, fairly obvious that this is a lower bound, since every element in the list must be examined to find the answer.) Edit: looking at this again, of course the big-O for this approach is at least O(n log(n)), because of the sort. It's just that the sort is so fast, comparatively speaking, that it looked linear overall.

N random, contiguous and non-overlapping subsequences each of length

I'm trying to get n random and non-overlapping slices of a sequence where each subsequence is of length l, preferably in the order they appear.
This is the code I have so far and it's gotten more and more messy with each attempt to make it work, needless to say it doesn't work.
def rand_parts(seq, n, l):
    """
    return n random non-overlapping partitions each of length l.
    If n * l > len(seq) raise error.
    """
    if n * l > len(seq):
        raise Exception('length of seq too short for given n, l arguments')
    if not isinstance(seq, list):
        seq = list(seq)
    gaps = [0] * (n + 1)
    for g in xrange(len(seq) - (n * l)):
        gaps[random.randint(0, len(gaps) - 1)] += 1
    result = []
    for i, g in enumerate(gaps):
        x = g + (i * l)
        result.append(seq[x:x+l])
        if i < len(gaps) - 1:
            gaps[i] += x
    return result
For example if we say rand_parts([1, 2, 3, 4, 5, 6], 2, 2) there are 6 possible results that it could return from the following diagram:
[1, 2, 3, 4, 5, 6]
 ____  ____
[1, 2, 3, 4, 5, 6]
 ____     ____
[1, 2, 3, 4, 5, 6]
 ____        ____
[1, 2, 3, 4, 5, 6]
    ____  ____
[1, 2, 3, 4, 5, 6]
    ____     ____
[1, 2, 3, 4, 5, 6]
       ____  ____
So [[3, 4], [5, 6]] would be acceptable but [[3, 4], [4, 5]] wouldn't because it's overlapping and [[2, 4], [5, 6]] also wouldn't because [2, 4] isn't contiguous.
I encountered this problem while doing a little code golfing, so for interest's sake it would also be nice to see both a simple solution and an efficient one; I'm not so much interested in fixing my existing code.

import random
def rand_parts(seq, n, l):
    # Python 2 code; on Python 3 use range instead of xrange
    indices = xrange(len(seq) - (l - 1) * n)
    result = []
    offset = 0
    for i in sorted(random.sample(indices, n)):
        i += offset
        result.append(seq[i:i+l])
        offset += l - 1
    return result
To understand this, first consider the case l == 1. Then it's basically just returning a random.sample() of the input data in sorted order; in this case the offset variable is always 0.
The case where l > 1 is an extension of the previous case. We use random.sample() to pick positions, but maintain an offset to shift successive results: in this way, we make sure that they are non-overlapping ranges, i.e. they start at a distance of at least l from each other, rather than 1.
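To see the offset trick at work, here is a hedged Python 3 port together with a small check on the example from the question. The name rand_parts_py3, the explicit length check and the 2000-trial loop are assumptions added for illustration only.

```python
import random

def rand_parts_py3(seq, n, l):
    """Python 3 port of the sample-and-offset approach described above."""
    if n * l > len(seq):
        raise ValueError("seq too short for the given n and l")
    # one candidate start per "compressed" position: each part counts as 1 slot
    indices = range(len(seq) - (l - 1) * n)
    result = []
    offset = 0
    for i in sorted(random.sample(indices, n)):
        i += offset                  # expand back to a real index in seq
        result.append(seq[i:i + l])
        offset += l - 1
    return result

# Collect the distinct outcomes over many trials
seen = set()
for _ in range(2000):
    parts = rand_parts_py3([1, 2, 3, 4, 5, 6], 2, 2)
    seen.add(tuple(tuple(p) for p in parts))
for outcome in sorted(seen):
    print(outcome)
```

With n = 2 and l = 2 on six elements, random.sample picks 2 of 4 compressed positions, which gives exactly the six layouts drawn in the question's diagram.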

Many solutions can be hacked for this problem, but one has to be careful if the sequences are to be strictly random. For example, it's wrong to begin by picking a random number between 0 and len(seq)-n*l and say that the first sequence will start there, then work recursively.
The problem is equivalent to selecting randomly n+1 integer numbers such that their sum is equal to len(seq)-l*n. (These numbers will be the "gaps" between your sequences.) To solve it, you can see this question.

This worked for me in Python 3.3.2. It should be backwards compatible with Python 2.7.
from random import randint as r
def greater_than(n, lis, l):
    for element in lis:
        if n < element + l:
            return False
    return True
def rand_parts(seq, n, l):
    """
    return n random non-overlapping partitions each of length l.
    If n * l > len(seq) raise error.
    """
    if n * l > len(seq):
        raise(Exception('length of seq too short for given n, l arguments'))
    if not isinstance(seq, list):
        seq = list(seq)
    # Setup
    left_to_do = n
    tried = []
    result = []
    # The main loop
    while left_to_do > 0:
        while True:
            index = r(0, len(seq) - 1)
            if greater_than(index, tried, l) and index <= len(seq) - left_to_do * l:
                tried.append(index)
                break
        left_to_do -= 1
        result.append(seq[index:index+l])
    # Done
    return result
a = [1, 2, 3, 4, 5, 6]
print(rand_parts(a, 3, 2))
The above code will always print [[1, 2], [3, 4], [5, 6]]

If you do it recursively it's much simpler. Take the first part from (so the rest will fit):
[0:total_len - (number_of_parts - 1) * (len_of_parts)]
and then recurse with what is left to do:
rand_parts(seq - beginning_to_end_of_part_you_grabbed, n - 1, l)
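The recursive idea can be made runnable along these lines. This is only a sketch: the name rand_parts_recursive is hypothetical, and, as another answer on this page points out, choosing the first start uniformly and then recursing does not make every valid layout equally likely.

```python
import random

def rand_parts_recursive(seq, n, l):
    """Pick the first part so the remaining n - 1 parts still fit, then recurse."""
    if n == 0:
        return []
    last_start = len(seq) - n * l   # latest start that leaves room for the rest
    if last_start < 0:
        raise ValueError("seq too short for the given n and l")
    start = random.randint(0, last_start)
    first = seq[start:start + l]
    # recurse on everything after the part just taken
    return [first] + rand_parts_recursive(seq[start + l:], n - 1, l)

print(rand_parts_recursive([1, 2, 3, 4, 5, 6], 2, 2))
```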

First of all, I think you need to clarify what you mean by the term random.
How can you generate a truly random list of sub-sequences when you are placing specific restrictions on the sub-sequences themselves?
As far as I know, the best "randomness" anyone can achieve in this context is generating all lists of sub-sequences that satisfy your criteria, and selecting from the pool however many you need in a random fashion.
Now, based on my experience from an algorithms class that I took a few years ago, your problem seems to be a typical example of one that could be solved using a greedy algorithm, given these big (but likely?) assumptions about what you were actually asking in the first place:
What you actually meant by random is not that a list of sub-sequence should be generated randomly (which is kind of contradictory as I said before), but that any of the solutions that could be produced is just as valid as the rest (e.g. any of the 6 solutions is valid from input [1,2,3,4,5,6] and you don't care which one)
Restating the above, you just want any one of the possible solutions that could be generated, and you want an algorithm that can output one of these valid answers.
Assuming the above here is a greedy algorithm which generates one of the possible lists of sub-sequences in linear time (excluding sorting, which is O(n*log(n))):
def subseq(seq, count, length):
    s = sorted(list(set(seq)))
    result = []
    subseq = []
    for n in s:
        if len(subseq) == length:
            result.append(subseq)
            if len(result) == count:
                return result
            subseq = [n]
        elif len(subseq) == 0:
            subseq.append(n)
        elif subseq[-1] + 1 == n:
            subseq.append(n)
        elif subseq[-1] + 1 < n:
            subseq = [n]
    print("Impossible!")
The gist of the algorithm is as follows:
One of your requirements is that there cannot be any overlaps, and this ultimately implies you need to deal with unique numbers and unique numbers only. So I use the set() operation to get rid of all the duplicates. Then I sort it.
The rest is pretty straightforward, in my opinion. I just iterate over the sorted list and form sub-sequences greedily.
If the algorithm can't form enough sub-sequences, it prints "Impossible!".
Hope this was what you were looking for.
EDIT: For some reason I wrongly assumed that there couldn't be repeating values in a sub-sequence, this one allows it.
def subseq2(seq, count, length):
    s = sorted(seq)
    result = []
    subseq = []
    for n in s:
        if len(subseq) == length:
            result.append(subseq)
            if len(result) == count:
                return result
            subseq = [n]
        elif len(subseq) == 0:
            subseq.append(n)
        elif subseq[-1] + 1 == n or subseq[-1] == n:
            subseq.append(n)
        elif subseq[-1] + 1 < n:
            subseq = [n]
    print("Impossible!")
