Find largest substring of numbers within a tolerance level - python

I have the following input:
a tolerance level T
Number of numbers N
N numbers
The task is to find the longest period within those N numbers such that they are within the tolerance level. More precisely, given a left and a right bound l and r of a substring and any two elements a1 and a2 between those bounds, it must hold that |a1 - a2| <= T. How can I do this in an efficient way? My approach is:
def getLength(T, N, numbers):
    max_length = 1
    for i in range(0, N-1):
        start = numbers[i]
        numlist = [start]
        for j in range(i+1, N):
            end = numbers[j]
            numlist.append(end)
            if (max(numlist) - min(numlist)) > T:
                break
            if (j-i+1) > max_length:
                max_length = j-i+1
    return max_length
EDIT: To make it clear. The code works as expected. However, it is not efficient enough. I would like to do it more efficiently.

First of all, I'm not sure that your code does what you describe in your question. Secondly, it takes (much) less than a second to process 12,000 random numbers.
Regardless, it can be sped up by not calling min() and max() on the numlist every time a new element is appended to it. Instead you can just update the current minimum and maximum variables with a couple of if statements.
Here is code showing that being done, along with a simple framework I wrote for timing performance:
def getLength(T, N, numbers):
    max_length = 1
    for i in range(N-1):
        start = numbers[i]
        numlist = [start]
        min_numlist = max_numlist = start  # Added variables.
        for j in range(i+1, N):
            end = numbers[j]
            numlist.append(end)
            # Inefficient - replaced.
            # if (max(numlist) - min(numlist)) > T:
            #     break
            # Update extremities.
            if end > max_numlist:
                max_numlist = end
            if end < min_numlist:
                min_numlist = end
            if max_numlist-min_numlist > T:
                break
            if j-i+1 > max_length:
                max_length = j-i+1
    return max_length

if __name__ == '__main__':
    import random
    import time

    random.seed(42)  # Use hardcoded seed to get same numbers each time run.
    T = 100
    N = 12000
    numbers = [random.randrange(1000) for _ in range(N)]
    starttime = time.time()
    max_length = getLength(T, N, numbers)
    stoptime = time.time()
    print('max length: {}'.format(max_length))
    print('processing {:,d} elements took {:.5f} secs'.format(N, stoptime-starttime))
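Note that even with this change the algorithm is still quadratic in the worst case (e.g. when all numbers lie within T of each other, the inner loop never breaks early). If that matters, the classic linear-time approach for this kind of problem is a sliding window over two monotonic deques that track the window's maximum and minimum. The following is only a sketch of that alternative, my addition rather than part of the original answer:

from collections import deque

def get_length_sliding(T, numbers):
    # maxdq holds indices of a non-increasing run (front = window maximum);
    # mindq holds indices of a non-decreasing run (front = window minimum).
    maxdq, mindq = deque(), deque()
    left = 0
    best = 0
    for right, value in enumerate(numbers):
        while maxdq and numbers[maxdq[-1]] <= value:
            maxdq.pop()
        maxdq.append(right)
        while mindq and numbers[mindq[-1]] >= value:
            mindq.pop()
        mindq.append(right)
        # Shrink the window from the left until max - min is within tolerance.
        while numbers[maxdq[0]] - numbers[mindq[0]] > T:
            if maxdq[0] == left:
                maxdq.popleft()
            if mindq[0] == left:
                mindq.popleft()
            left += 1
        best = max(best, right - left + 1)
    return best

Each index enters and leaves each deque at most once, so the whole scan runs in O(N).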

Related

Performance improvement for calculating the powerset of a list of integers

I am trying to compute the powerset of a list of prime numbers. I have already done some research, and the preferred way of doing this seems to be using a line like
itertools.chain.from_iterable(itertools.combinations(primes, r) for r in range(2, len(primes) + 1))
and then iterating over all combinations to get the products with math.prod(). All in all, the code currently looks like this:
import itertools
import math

number = 200
p1 = []
# calculate all primes below the specified number
for i in range(2, number + 1):
    isPrime = True
    for prime in p1:
        if i % prime == 0:
            isPrime = False
    if isPrime:
        p1.append(i)

Pp = []
myIterable = itertools.chain.from_iterable(itertools.combinations(p1, r) for r in range(2, len(p1) + 1))
# convert the iterable to an integer array of products -- the code below is extremely slow and should be improved
for x in myIterable:
    newValue = math.prod(x)
    if newValue <= number:
        Pp.append(newValue)
This works, but it is not feasible for any "number" greater than 100 because of the high execution time. The problem is the last for loop, which takes forever to compute. Everything else performs reasonably well. The powerset has to be restricted to sets whose products are less than or equal to number, as done with the last if statement, or else memory will explode.
The solution to this problem was to create a pointer array, which crawls through the prime array until the product of the pointed primes gets too high. The needed helper functions can be implemented like this:
def calcProductOfPointers(pointerArray, dataArray):
    prod = 1
    for pointer in pointerArray:
        prod *= dataArray[pointer]
    return prod

def incrementPointer(pointerArray, dataArray, threshold):
    ret = False
    for i in range(1, len(pointerArray) + 1):
        index = len(pointerArray) - i
        pointerArray[index] += 1
        if calcProductOfPointers(pointerArray, dataArray) <= threshold and pointerArray[index] < len(dataArray):
            ret = True
            break
        elif index > 0:
            pointerArray[index] = pointerArray[index - 1] + 2
        else:
            break
    return ret
And then the iteration over all powersets can be substituted with this code:
Pp = []
for i in range(2, len(p1) + 1):  # start at a minimum of 2 prime factors
    primePointers = []
    for index in range(i):
        primePointers.append(index)
    if calcProductOfPointers(primePointers, p1) > number:
        break
    while calcProductOfPointers(primePointers, p1) <= number:
        Pp.append(calcProductOfPointers(primePointers, p1))
        if not incrementPointer(primePointers, p1, number):
            break
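The same bound-and-prune idea can also be written as a short recursive depth-first search over the ascending prime list. This is only an equivalent sketch added for comparison, not the code from the answer; it reuses p1 and number from the question:

def bounded_products(primes, limit):
    # Collect products of two or more distinct primes that stay <= limit,
    # cutting off any branch whose running product already exceeds the limit.
    results = []
    def extend(start, product, count):
        for idx in range(start, len(primes)):
            nxt = product * primes[idx]
            if nxt > limit:
                break  # primes are sorted ascending, so later branches only grow
            if count + 1 >= 2:
                results.append(nxt)
            extend(idx + 1, nxt, count + 1)
    extend(0, 1, 0)
    return results

Pp = bounded_products(p1, number)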

How to count the number of unique numbers in sorted array using Binary Search?

I am trying to count the number of unique numbers in a sorted array using binary search. I need to find the boundary where one number changes to the next in order to count them. I was thinking of doing this without using recursion. Is there an iterative approach?
def unique(x):
    start = 0
    end = len(x)-1
    count = 0
    # This is the current number we are looking for
    item = x[start]
    while start <= end:
        middle = (start + end)//2
        if item == x[middle]:
            start = middle+1
        elif item < x[middle]:
            end = middle - 1
            # when item is greater, change to the next number
            count += 1
        # if the number
    return count

unique([1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,5,5,5,5,5,5,5,5,5,5])
Thank you.
Edit: Even if the runtime benefit over O(n) is negligible, what is my binary search missing? It's confusing when not looking for an actual item. How can I fix this?
Working code exploiting binary search (returns 3 for the given example).
As discussed in the comments, the complexity is about O(k*log(n)), where k is the number of unique items, so this approach works well when k is small compared with n, and might become worse than a linear scan when k ~ n.
def countuniquebs(A):
    n = len(A)
    t = A[0]
    l = 1
    count = 0
    while l < n - 1:
        r = n - 1
        while l < r:
            m = (r + l) // 2
            if A[m] > t:
                r = m
            else:
                l = m + 1
        count += 1
        if l < n:
            t = A[l]
    return count

print(countuniquebs([1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,5,5,5,5,5,5,5,5,5,5]))
I wouldn't quite call it "using a binary search", but this binary divide-and-conquer algorithm works in O(k*log(n)/log(k)) time, which is better than a repeated binary search, and never worse than a linear scan:
def countUniques(A, start, end):
    length = end - start
    if length < 1:
        return 0
    if A[start] == A[end-1]:
        return 1
    if length < 3:
        return 2
    mid = start + length//2
    # The two halves share index mid, so the value there is counted
    # once in each half; subtract 1 to compensate.
    return countUniques(A, start, mid+1) + countUniques(A, mid, end) - 1

A = [1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,3,4,5,5,5,5,5,5,5,5,5,5]
print(countUniques(A, 0, len(A)))

Speeding up the iteration of a generator over a list

I have two functions found on the internet. Both implement the Sieve of Eratosthenes, but one is a generator and one is a straight list creator. I'm seeing a speed difference when iterating, though, and am confused, because the generator is faster when you create the full list from it. I'm wondering if it's possible to iterate over the generator faster, as that would be very beneficial to me in trying to speed up some code. Here are the functions:
# Generator
def primes_sieve2(limit):
    a = [True] * limit
    a[0] = a[1] = False
    for (i, isprime) in enumerate(a):
        if isprime:
            yield i
            for n in range(i*i, limit, i):
                a[n] = False

# Creates complete list
def SieveOfEratosthenes(n):
    # Create a boolean array "prime[0..n]" and initialize
    # all entries in it as true. A value prime[i] will
    # finally be false if i is not a prime, else true.
    prime = [True for i in range(n+1)]
    p = 2
    while (p * p <= n):
        # If prime[p] is not changed, then it is a prime
        if (prime[p] == True):
            # Update all multiples of p
            for i in range(p * p, n+1, p):
                prime[i] = False
        p += 1
    size = 0
    for p in range(2, n+1):
        if prime[p]:
            size += 1
    #vv = np.zeros(size, dtype='int64')
    vv = []
    count = 0
    # Collect all prime numbers
    for p in range(2, n+1):
        if prime[p]:
            #vv[count] = p
            vv.append(p)
            count += 1
    return vv
Now, here are the timing differences. The generator is fast at creating the list but slower when iterating over it, which is confusing me, so I thought I'd ask whether I'm doing something wrong and why this is the output. Here are the speed differences:
# The generator is slower at iteration
import time
start = time.time()
small_primes2 = primes_sieve2(10000000)
for xx in small_primes2:
    xx
end = time.time()
print(end-start)
2.776512861251831

small_primes3 = SieveOfEratosthenes(10000000)
import time
start = time.time()
for xx in small_primes3:
    xx
end = time.time()
print(end-start)
0.043845176696777344

# The generator is faster at creating the full list when not iterating.
import time
start = time.time()
small_primes3 = SieveOfEratosthenes(10000000)
end = time.time()
print(end-start)
3.6824228763580322

import time
start = time.time()
small_primes2 = list(primes_sieve2(10000000))
end = time.time()
print(end-start)
2.7591686248779297
You can see the generator creates the list about a second faster, but it is about 2.7 seconds slower when iterating over it (2.78 s versus 0.04 s). Is this expected, or am I doing something incorrectly?
I thought that since the generator could create the list faster than the non-generator sieve, I could iterate over it faster, but that is not the case.
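A likely explanation (my note; the original thread does not include an answer here): primes_sieve2 is a generator, so calling it performs no sieving at all, and all of the work in its body runs while you iterate. The 2.78 s iteration therefore includes the entire sieve, whereas the 0.04 s loop over the prebuilt list excludes the 3.68 s spent constructing it, so on total time the generator actually wins (about 2.78 s versus 3.72 s). A quick way to see the laziness, reusing primes_sieve2 from above:

import time

start = time.time()
gen = primes_sieve2(10000000)  # returns immediately: no sieve work has run yet
print('creating the generator took {:.6f} secs'.format(time.time() - start))

start = time.time()
first = next(gen)  # the first yield triggers the first chunk of sieve work
print('first prime ({}) after {:.6f} secs'.format(first, time.time() - start))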

Longest Arithmetic Progression

Given a list of numbers arr (not sorted) , find the Longest Arithmetic Progression in it.
Constraints: 1 ≤ arr.size() ≤ 10^3 and -10^9 ≤ arr[i] ≤ 10^9.
Examples:
arr = [7,6,1,9,7,9,5,6,1,1,4,0] -------------- output = [7,6,5,4]
arr = [4,4,6,7,8,13,45,67] -------------- output = [4,6,8]
from itertools import combinations

def arithmeticProgression2(a):
    n = len(a)
    diff = ((y-x, x) for x, y in combinations(a, 2))
    dic = []
    for d, n in diff:
        k = []
        seq = a
        while n in seq:
            k.append(n)
            i = seq.index(n)
            seq = seq[i+1:]
            n += d
        dic.append(k)
    maxx = max([len(k) for k in dic])
    for x in dic:
        if len(x) == maxx:
            return x
In case arr.size() is big enough, my code runs for more than 4000 ms.
Example:
arr = [randint(-10**9, 10**9) for i in range(10**3)]
runtime > 4000 ms
How can I reduce the space complexity of the above solution?
One of the things that makes the code slow is that you build each series from scratch for every pair, which is not necessary:
you don't actually need to build k each time. If you just keep the step, the length, and the start (or end) value of a progression, you know enough. Only build the progression explicitly for the final result.
by doing this for each pair, you also create series whose start point is in fact in the middle of a longer series (having the same step), so you partly do double work, and work that is not useful, since the progression that starts earlier will evidently be longer than the one currently being analysed.
It makes your code run in O(n³) time instead of the possible O(n²).
The following seems to return the result much faster in O(n²), using dynamic programming:
def longestprogression(data):
    if len(data) < 3:
        return data
    maxlen = 0       # length of longest progression so far
    endvalue = None  # last value of longest progression
    beststep = None  # step of longest progression
    # progressions ending at index i, keyed by their step size,
    # with the progression length as value
    dp = [{} for _ in range(len(data))]
    # iterate all possible ending pairs of progressions
    for j in range(1, len(data)):
        for i in range(j):
            step = data[j] - data[i]
            if step in dp[i]:
                curlen = dp[i][step] + 1
            else:
                curlen = 2
            dp[j][step] = curlen
            if curlen > maxlen:
                maxlen = curlen
                endvalue = data[j]
                beststep = step
    # rebuild the longest progression from the values we maintained
    # (note: this reconstruction assumes beststep != 0)
    return list(reversed(range(endvalue, endvalue - maxlen * beststep, -beststep)))
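For a quick check against the first example from the question (my addition, not part of the original answer):

print(longestprogression([7,6,1,9,7,9,5,6,1,1,4,0]))  # prints [7, 6, 5, 4]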

Down to zero problem - getting time exceeded error

Trying to solve a HackerRank problem:
You are given Q queries. Each query consists of a single number N. In each move, you can perform one of two operations on N: if N = a×b (a ≠ 1, b ≠ 1), you can change N to max(a, b); or you can decrease the value of N by 1.
Determine the minimum number of moves required to reduce the value of N to 0.
I have used a BFS approach to solve this:
a. generating all prime numbers using a sieve
b. using the prime numbers, I can simply avoid calculating the factors
c. I enqueue N-1 along with all the factors to get to zero
d. I have also used previous results so as not to enqueue already-encountered values
This is still giving me "time limit exceeded". Any idea? I have also added comments in the code.
import math

# find all the prime numbers with a sieve
primes = [1]*(1000000+1)
primes[0] = 0
primes[1] = 0
for i in range(2, 1000000+1):
    if primes[i] == 1:
        j = 2
        while i*j < 1000000:
            primes[i*j] = 0
            j += 1

n = int(input())
for i in range(n):
    memoize = [-1 for i in range(1000000)]
    count = 0
    n = int(input())
    queue = []
    queue.append((n, count))
    while len(queue):
        data, count = queue.pop(0)
        if data <= 1:
            count += 1
            break
        # if it is a prime number then just enqueue N-1
        if primes[data] == 1 and memoize[data-1] == -1:
            queue.append((data-1, count+1))
            memoize[data-1] = 1
            continue
        # enqueue N-1 along with all the factors
        queue.append((data-1, count+1))
        sqr = int(math.sqrt(data))
        for i in range(sqr, 1, -1):
            if data%i == 0:
                div = max(int(data/i), i)
                if memoize[div] == -1:
                    memoize[div] = 1
                    queue.append((div, count+1))
    print(count)
There are two large causes of slowness with this code.
Clearing an array is slower than clearing a set
The first problem is this line:
memoize= [-1 for i in range(1000000)]
this prepares 1 million integers and is executed for each of your 1000 test cases. A faster approach is to simply use a Python set to indicate which values have already been visited.
Unnecessary loop being executed
The second problem is this line:
if primes[data] == 1 and memoize[data-1] == -1:
If you have a prime number that has already been visited, this condition is false and you fall through to the slow loop searching for factors, which will never find any (because the number is prime).
Faster code
In fact, the improvement from using sets is so large that you don't even need your prime-testing code, and the following code passes all tests within the time limit:
import math

n = int(input())
for i in range(n):
    memoize = set()
    count = 0
    n = int(input())
    queue = []
    queue.append((n, count))
    while len(queue):
        data, count = queue.pop(0)
        if data <= 1:
            if data == 1:
                count += 1
            break
        if data-1 not in memoize:
            memoize.add(data-1)
            queue.append((data-1, count+1))
        sqr = int(math.sqrt(data))
        for i in range(sqr, 1, -1):
            if data%i == 0:
                div = max(int(data/i), i)
                if div not in memoize:
                    memoize.add(div)
                    queue.append((div, count+1))
    print(count)
Alternatively, there's an O(n*sqrt(n)) time and O(n) space complexity solution that passes all the test cases just fine.
The idea is to cache the minimum counts for every non-negative integer up to 1,000,000 (the maximum possible input number in the question) !!!BEFORE!!! running any query. After doing so, for each query just return the minimum count for the given number stored in the cache, so retrieving a result has O(1) time complexity per query.
To find minimal counts for each number (let's call it down2ZeroCounts), we should consider several cases:
0 and 1 have minimal counts of 0 and 1, respectively.
A prime number p has no factors other than 1 and itself. Hence, its minimal count is 1 plus the minimal count of p - 1, or more formally down2ZeroCounts[p] = down2ZeroCounts[p - 1] + 1.
For a composite number num it's a bit more complicated. For any pair of factors a > 1, b > 1 such that num = a*b, a single move takes num to max(a, b), so the minimal count of num is the minimum of down2ZeroCounts[max(a, b)] + 1 over all such pairs and of down2ZeroCounts[num - 1] + 1.
So, we can gradually build minimal counts for each number in ascending order. The minimal count of each subsequent number is calculated from the optimal counts of lower numbers, so in the end a list of optimal counts is built.
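For example, for num = 4 the candidates are down2ZeroCounts[3] + 1 = 4 (the decrement move) and down2ZeroCounts[max(2, 2)] + 1 = down2ZeroCounts[2] + 1 = 3 (the factor move), so down2ZeroCounts[4] = 3, matching the optimal sequence 4 → 2 → 1 → 0.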
To better understand the approach please check the code:
from __future__ import print_function
import os
import sys

maxNumber = 1000000
down2ZeroCounts = [None] * 1000001

def cacheDown2ZeroCounts():
    down2ZeroCounts[0] = 0
    down2ZeroCounts[1] = 1
    currentNum = 2
    while currentNum <= maxNumber:
        if down2ZeroCounts[currentNum] is None:
            down2ZeroCounts[currentNum] = down2ZeroCounts[currentNum - 1] + 1
        else:
            down2ZeroCounts[currentNum] = min(down2ZeroCounts[currentNum - 1] + 1, down2ZeroCounts[currentNum])
        for i in xrange(2, currentNum + 1):
            product = i * currentNum
            if product > maxNumber:
                break
            elif down2ZeroCounts[product] is not None:
                down2ZeroCounts[product] = min(down2ZeroCounts[product], down2ZeroCounts[currentNum] + 1)
            else:
                down2ZeroCounts[product] = down2ZeroCounts[currentNum] + 1
        currentNum += 1

def downToZero(n):
    return down2ZeroCounts[n]

if __name__ == '__main__':
    fptr = open(os.environ['OUTPUT_PATH'], 'w')
    q = int(raw_input())
    cacheDown2ZeroCounts()
    for q_itr in xrange(q):
        n = int(raw_input())
        result = downToZero(n)
        fptr.write(str(result) + '\n')
    fptr.close()
