I have a problem. In my code I need to do the following:
I have an array like this: [1, 5, 7, 9, 10, 11, 14, 15]
Now I need to determine in the most efficient way what the closest values are for a given x. For example:
array = [1, 5, 7, 9, 10, 11, 14, 15]
above, below = myFunction(array, 8)
Should return 7 and 9
If there is no number higher or lower than the given number, the value that couldn't be determined should be None. If the given number is itself in the array, for example 7, the values 5 and 9 should be returned.
I know there are min() and max() functions, but I couldn't find anything that fits my case. The closest thing to what I want was looping through the array manually, but I was wondering whether there is a more efficient way, because this code will be executed around 500,000 times, so speed is important!
Is there an efficient way to determine those values?
This answer is based on @Mark Tolonen's hint about bisect.
Let's say that you need to append one number on every iteration, but the initial array is the same.
All elements in the array should be unique. The initial array may be unsorted.
import bisect as b

init_array = [5, 9, 10, 11, 14, 15, 1, 7]
init_array.sort()
numbers_to_append = [22, 4, 12, 88, 99, 109, 999, 1000, 1001]
numbers_to_check = [12, 55, 23, 55, 0, 55, 10, 999]

for idx, n in enumerate(numbers_to_check):
    # index of the first element that is >= n
    i = b.bisect_left(init_array, n)
    # get above
    if i >= len(init_array):
        above = None
    else:
        if init_array[i] == n:
            # exact match: the value above is the next element, if any
            try:
                above = init_array[i + 1]
            except IndexError:
                above = None
        else:
            above = init_array[i]
    # get below
    below = init_array[i - 1] if i - 1 >= 0 and init_array[i - 1] - n < 0 else None
    print('---------')
    print('array is', init_array)
    print('given x is', n)
    print('below is', below)
    print('above is', above)
    print('appending', numbers_to_append[idx])
    # check that the number we are trying to append doesn't already exist in the array
    # WARNING: this check may be slow and is added only to show that we need to add only unique numbers
    # a real check should be based on the real conditions and the numbers that we expect to add
    i_to_append = b.bisect_left(init_array, numbers_to_append[idx])
    if numbers_to_append[idx] not in init_array[i_to_append:i_to_append+1]:
        init_array.insert(i_to_append, numbers_to_append[idx])
output:
---------
array is [1, 5, 7, 9, 10, 11, 14, 15]
given x is 12
below is 11
above is 14
appending 22
---------
array is [1, 5, 7, 9, 10, 11, 14, 15, 22]
given x is 55
below is 22
above is None
appending 4
---------
array is [1, 4, 5, 7, 9, 10, 11, 14, 15, 22]
given x is 23
below is 22
above is None
appending 12
---------
array is [1, 4, 5, 7, 9, 10, 11, 12, 14, 15, 22]
given x is 55
below is 22
above is None
appending 88
---------
array is [1, 4, 5, 7, 9, 10, 11, 12, 14, 15, 22, 88]
given x is 0
below is None
above is 1
appending 99
---------
array is [1, 4, 5, 7, 9, 10, 11, 12, 14, 15, 22, 88, 99]
given x is 55
below is 22
above is 88
appending 109
---------
array is [1, 4, 5, 7, 9, 10, 11, 12, 14, 15, 22, 88, 99, 109]
given x is 10
below is 9
above is 11
appending 999
---------
array is [1, 4, 5, 7, 9, 10, 11, 12, 14, 15, 22, 88, 99, 109, 999]
given x is 999
below is 109
above is None
appending 1000
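The branching above can probably be condensed. The following is only a sketch, assuming the list is already sorted in ascending order and that the strict neighbours are wanted (so an exact match such as 7 yields 5 and 9, as the question asks). The name `strict_neighbours` is made up for this example; `bisect_left` and `bisect_right` are the standard-library functions.

```python
import bisect

def strict_neighbours(sorted_values, x):
    """Return (below, above): nearest values strictly less / greater than x.

    Hypothetical helper; assumes sorted_values is sorted ascending.
    """
    lo = bisect.bisect_left(sorted_values, x)    # first index with value >= x
    hi = bisect.bisect_right(sorted_values, x)   # first index with value > x
    below = sorted_values[lo - 1] if lo > 0 else None
    above = sorted_values[hi] if hi < len(sorted_values) else None
    return below, above

array = [1, 5, 7, 9, 10, 11, 14, 15]
print(strict_neighbours(array, 8))   # (7, 9)
print(strict_neighbours(array, 7))   # (5, 9)
print(strict_neighbours(array, 0))   # (None, 1)
print(strict_neighbours(array, 20))  # (15, None)
```

Using both bisect variants avoids the special case for an exact match: everything before `lo` is smaller than x and everything from `hi` onward is larger.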
If your array is not sorted, you will indeed be forced to scan the entire array.
But if it is sorted you can use a binary search approach that will run in O(log(n)), something like this:
def search_nearest_elems(array, elem):
    """
    Uses dichotomic search approach.
    Runs in O(log(n)), with n = len(array)
    """
    i = len(array) // 2
    left, right = 0, len(array) - 1
    while right - left > 1:
        if array[i] == elem:
            return elem, elem
        elif array[i] > elem:
            right, i = i, (left + i) // 2
        else:
            left, i = i, (right + i) // 2
    return array[left], array[right]

array = [1, 5, 7, 9, 10, 11, 14, 15]
assert search_nearest_elems(array, 8) == (7, 9)
assert search_nearest_elems(array, 9) == (9, 9)
assert search_nearest_elems(array, 14.5) == (14, 15)
assert search_nearest_elems(array, 2) == (1, 5)
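For the unsorted case mentioned at the start of this answer, the full scan can at least be done in a single pass. The sketch below is an illustration only: `neighbours_unsorted` is a hypothetical name, and unlike search_nearest_elems it returns the strict neighbours and uses None when nothing lies below or above x.

```python
def neighbours_unsorted(values, x):
    """Single O(n) pass over unsorted input; hypothetical helper."""
    below = above = None
    for v in values:
        if v < x and (below is None or v > below):
            below = v      # largest value seen so far that is still < x
        elif v > x and (above is None or v < above):
            above = v      # smallest value seen so far that is still > x
    return below, above

print(neighbours_unsorted([14, 1, 9, 5, 15, 7, 11, 10], 8))  # (7, 9)
```

This costs O(n) per lookup, so for many lookups on the same data, sorting once and using a binary search is likely the better trade-off.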
import bisect

def find_nearest(arr,value):
    arr.sort()
    idx = bisect.bisect_left(arr, value)
    if idx == 0:
        return None, arr[idx]
    elif idx == len(arr):
        return arr[idx - 1], None
    elif arr[idx] == value:
        return arr[idx], arr[idx]
    else:
        return arr[idx - 1], arr[idx]

array = [1, 5, 7, 9, 10, 11, 14, 15]
print(find_nearest(array, 0))
print(find_nearest(array, 4))
print(find_nearest(array, 8))
print(find_nearest(array, 10))
print(find_nearest(array, 20))
Output:
(None, 1)
(1, 5)
(7, 9)
(10, 10)
(15, None)
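Since the question mentions roughly 500,000 lookups, one possible refinement is to sort once up front instead of on every call. The sketch below assumes the data is non-empty and does not change between lookups. `make_finder` is a hypothetical name, and the return convention of find_nearest is kept as-is (an exact match is returned as both values).

```python
import bisect

def make_finder(values):
    """Sort once, then return a lookup function that only does the bisect step."""
    arr = sorted(values)  # one-off cost; the caller's list is left untouched

    def find(value):
        idx = bisect.bisect_left(arr, value)
        if idx == 0:
            return None, arr[idx]
        if idx == len(arr):
            return arr[idx - 1], None
        if arr[idx] == value:
            return arr[idx], arr[idx]
        return arr[idx - 1], arr[idx]

    return find

find = make_finder([5, 9, 10, 11, 14, 15, 1, 7])
print([find(q) for q in [0, 4, 8, 10, 20]])
# [(None, 1), (1, 5), (7, 9), (10, 10), (15, None)]
```

Each lookup is then a single O(log n) binary search, with the O(n log n) sort paid only once.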
Helper source
I have a list of many unsorted numbers, for example:
import random
N = 1000000
x = [random.randint(0, N) for i in range(N)]
I only want the k smallest values. Currently this is my approach:
def f1(x, k):  # O(n log n)
    return sorted(x)[:k]
This performs lots of redundant operations, as we are sorting the remaining N-k elements too. Enumerating doesn't work either:
def f2(x, k):  # O(n log n)
    y = []
    for idx, val in enumerate(sorted(x)):
        if idx == k: break
        y.append(val)
    return y
Verifying that enumerating doesn't help:
import time

if 1:  ## Time taken = 0.6364126205444336
    st1 = time.time()
    y = f1(x, 3)
    et1 = time.time()
    print('Time taken = ', et1 - st1)
if 1:  ## Time taken = 0.6330435276031494
    st2 = time.time()
    y = f2(x, 3)
    et2 = time.time()
    print('Time taken = ', et2 - st2)
I probably need a generator that keeps returning the next minimum of the list, and since getting the next minimum should be an O(1) operation, the function f3() should be just O(k), right?
What GENERATOR function will work best in this case?
def f3(x, k):  # O(k)
    y = []
    for idx, val in enumerate( GENERATOR ):
        if idx == k: break
        y.append(val)
    return y
EDIT 1 :
The analysis shown here is wrong; please ignore it and jump to Edit 3.
Lowest bound possible: In terms of time complexity I think this is the lowest bound achievable, but as it modifies the original list, it isn't the solution for my problem.
def f3(x, k):  # O(k) Time
    y = []
    idx = 0
    while idx < k:
        curr_min = min(x)
        x.remove(curr_min)  # This removes from the original list
        y.append(curr_min)
        idx += 1
    return y

if 1:  ## Time taken = 0.07096505165100098
    st3 = time.time()
    y = f3(x, 3)
    et3 = time.time()
    print('Time taken = ', et3 - st3)
O(N) Time | O(N) Storage: Best solution so far. However, it requires a copy of the original list, hence O(N) time and storage. Having an iterator that gets the next minimum k times would be O(1) storage and O(k) time.
def f3(x, k):  # O(N) Time | O(N) Storage
    y = []
    idx = 0
    while idx < k:
        curr_min = min(x)
        x.remove(curr_min)
        y.append(curr_min)
        idx += 1
    return y

if 1:  ## Time taken = 0.0814204216003418
    st3 = time.time()
    y = f3(x, 3)
    et3 = time.time()
    print('Time taken = ', et3 - st3)
EDIT 2 :
Thanks for pointing out my mistakes above: getting the minimum of a list should be O(n), not O(1).
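Given that correction, a generator that yields successive minima cannot be O(1) per step on a plain list, but a heap gets reasonably close: building it is O(n) and each following minimum costs O(log n). The following is a minimal sketch, assuming a copy of the list is acceptable; `iter_ascending` and `k_smallest` are names invented for this example.

```python
import heapq
from itertools import islice

def iter_ascending(values):
    """Yield the items of values in ascending order, lazily."""
    heap = list(values)      # copy, so the caller's list is not modified
    heapq.heapify(heap)      # O(n)
    while heap:
        yield heapq.heappop(heap)  # O(log n) per item

def k_smallest(values, k):
    # Take only the first k items from the lazy generator
    return list(islice(iter_ascending(values), k))

data = [7, 3, 9, 1, 4, 8]
print(k_smallest(data, 3))  # [1, 3, 4]
print(data)                 # [7, 3, 9, 1, 4, 8]
```

The overall cost for the k smallest is O(n + k log n) time and O(n) extra storage for the copy.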
EDIT 3 :
Here's a full script of the analysis after using the recommended solutions. This raised more questions:
1) Is constructing x as a heap using heapq.heappush slower than appending to a list with list.append and then calling heapq.heapify on it?
2) Does heapq.nsmallest slow down if x is already a heap?
3) Current conclusion: don't heapq.heapify the current list; just use heapq.nsmallest.
import time, random, heapq
import numpy as np

class Timer:
    def __init__(self, description):
        self.description = description
    def __enter__(self):
        self.start = time.perf_counter()
        return self
    def __exit__(self, *args):
        end = time.perf_counter()
        print(f"The time for '{self.description}' took: {end - self.start}.")

def f3(x,k):
    y = []
    idx=0
    while idx<k:
        curr_min = min(x)
        x.remove(curr_min)
        y.append(curr_min)
        idx += 1
    return y

def f_sort(x, k):
    y = []
    for idx,val in enumerate( sorted(x) ):
        if idx == k: break
        y.append(val)
    return y

def f_heapify_pop(x, k):
    heapq.heapify(x)
    return [heapq.heappop(x) for _ in range(k)]

def f_heap_pop(x, k):
    return [heapq.heappop(x) for _ in range(k)]

def f_heap_nsmallest(x, k):
    return heapq.nsmallest(k, x)

def f_np_partition(x, k):
    return np.partition(x, k)[:k]
if True : ## Constructing list vs heap
    N=1000000
    # N= 500000
    x_main = [random.randint(0,N) for i in range(N)]
    with Timer('constructing list') as t:
        x=[]
        for curr_val in x_main:
            x.append(curr_val)
    with Timer('constructing heap') as t:
        x_heap=[]
        for curr_val in x_main:
            heapq.heappush(x_heap, curr_val)
    with Timer('heapify x from a list') as t:
        x_heapify=[]
        for curr_val in x_main:
            x_heapify.append(curr_val)
        heapq.heapify(x_heapify)
    with Timer('x list to numpy') as t:
        x_np = np.array(x)
    """
    N=1000000
    The time for 'constructing list' took: 0.2717265225946903.
    The time for 'constructing heap' took: 0.45691753178834915.
    The time for 'heapify x from a list' took: 0.4259336367249489.
    The time for 'x list to numpy' took: 0.14815033599734306.
    """
if True : ## Performing experiments on list vs heap
    TRIALS = 10
    ## Experiments on x as list :
    with Timer('f3') as t:
        for _ in range(TRIALS):
            y = f3(x.copy(), 30)
        print(y)
    with Timer('f_sort') as t:
        for _ in range(TRIALS):
            y = f_sort(x.copy(), 30)
        print(y)
    with Timer('f_np_partition on x') as t:
        for _ in range(TRIALS):
            y = f_np_partition(x.copy(), 30)
        print(y)
    ## Experiments on x as list, but converted to heap in place :
    with Timer('f_heapify_pop on x') as t:
        for _ in range(TRIALS):
            y = f_heapify_pop(x.copy(), 30)
        print(y)
    with Timer('f_heap_nsmallest on x') as t:
        for _ in range(TRIALS):
            y = f_heap_nsmallest(x.copy(), 30)
        print(y)
    ## Experiments on x_heap as heap :
    with Timer('f_heap_pop on x_heap') as t:
        for _ in range(TRIALS):
            y = f_heap_pop(x_heap.copy(), 30)
        print(y)
    with Timer('f_heap_nsmallest on x_heap') as t:
        for _ in range(TRIALS):
            y = f_heap_nsmallest(x_heap.copy(), 30)
        print(y)
    ## Experiments on x_np as numpy array :
    with Timer('f_np_partition on x_np') as t:
        for _ in range(TRIALS):
            y = f_np_partition(x_np.copy(), 30)
        print(y)
    #
    """
    Experiments on x as list :
    [0, 1, 1, 4, 5, 5, 5, 6, 6, 7, 7, 7, 10, 10, 11, 11, 12, 12, 12, 13, 13, 14, 18, 18, 19, 19, 21, 22, 24, 25]
    The time for 'f3' took: 10.180440502241254.
    [0, 1, 1, 4, 5, 5, 5, 6, 6, 7, 7, 7, 10, 10, 11, 11, 12, 12, 12, 13, 13, 14, 18, 18, 19, 19, 21, 22, 24, 25]
    The time for 'f_sort' took: 9.054768254980445.
    [ 1 5 5 1 0 4 5 6 7 6 7 7 12 12 11 13 11 12 13 18 10 14 10 18 19 19 21 22 24 25]
    The time for 'f_np_partition on x' took: 1.2620676811784506.
    Experiments on x as list, but converted to heap in place :
    [0, 1, 1, 4, 5, 5, 5, 6, 6, 7, 7, 7, 10, 10, 11, 11, 12, 12, 12, 13, 13, 14, 18, 18, 19, 19, 21, 22, 24, 25]
    The time for 'f_heapify_pop on x' took: 0.8628390356898308.
    [0, 1, 1, 4, 5, 5, 5, 6, 6, 7, 7, 7, 10, 10, 11, 11, 12, 12, 12, 13, 13, 14, 18, 18, 19, 19, 21, 22, 24, 25]
    The time for 'f_heap_nsmallest on x' took: 0.5187360178679228.
    Experiments on x_heap as heap :
    [0, 1, 1, 4, 5, 5, 5, 6, 6, 7, 7, 7, 10, 10, 11, 11, 12, 12, 12, 13, 13, 14, 18, 18, 19, 19, 21, 22, 24, 25]
    The time for 'f_heap_pop on x_heap' took: 0.2054140530526638.
    [0, 1, 1, 4, 5, 5, 5, 6, 6, 7, 7, 7, 10, 10, 11, 11, 12, 12, 12, 13, 13, 14, 18, 18, 19, 19, 21, 22, 24, 25]
    The time for 'f_heap_nsmallest on x_heap' took: 0.6638103127479553.
    [ 1 5 5 1 0 4 5 6 7 6 7 7 12 12 11 13 11 12 13 18 10 14 10 18 19 19 21 22 24 25]
    The time for 'f_np_partition on x_np' took: 0.2107151597738266.
    """
This is a classic problem for which the generally accepted solution is a data structure known as a heap. Below I have run 10 trials for each of the algorithms f3 and f_heap. As the second argument, k, gets larger, the gap between the two becomes even greater. For k = 3, f3 takes 0.76 seconds and f_heap takes 0.54 seconds, but with k = 30 these become 6.33 seconds and 0.54 seconds respectively.
import time, random, heapq

class Timer:
    def __init__(self, description):
        self.description = description
    def __enter__(self):
        self.start = time.perf_counter()
        return self
    def __exit__(self, *args):
        end = time.perf_counter()
        print(f"The time for {self.description} took: {end - self.start}.")

def f3(x,k): # O(k*N) time: each min() and remove() scans the list
    y = []
    idx=0
    while idx<k:
        curr_min = min(x)
        x.remove(curr_min)
        y.append(curr_min)
        idx += 1
    return y

def f_heap(x, k): # O(N + k log N): heapify is O(N), each heappop is O(log N)
    # if you do not need to retain a heap and just need the k smallest, then:
    #return heapq.nsmallest(k, x)
    heapq.heapify(x)
    return [heapq.heappop(x) for _ in range(k)]

N=1000000
x = [random.randint(0,N) for i in range(N)]
TRIALS = 10
with Timer('f3') as t:
    for _ in range(TRIALS):
        y = f3(x.copy(), 30)
print(y)
print()
with Timer('f_heap') as t:
    for _ in range(TRIALS):
        y = f_heap(x.copy(), 30)
print(y)
Prints:
The time for f3 took: 6.3301973.
[0, 1, 1, 7, 9, 11, 11, 13, 13, 14, 17, 18, 18, 18, 19, 20, 20, 21, 23, 24, 25, 25, 26, 27, 28, 28, 29, 30, 30, 31]
The time for f_heap took: 0.5372357999999995.
[0, 1, 1, 7, 9, 11, 11, 13, 13, 14, 17, 18, 18, 18, 19, 20, 20, 21, 23, 24, 25, 25, 26, 27, 28, 28, 29, 30, 30, 31]
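Before comparing timings it can be worth confirming that the candidate implementations actually agree. The following is a small sanity check, not a benchmark; the function names and the seeded test data are invented for this sketch, and each function works on a copy or a read-only view so the input list stays intact.

```python
import heapq
import random

def k_smallest_sorted(x, k):
    # Reference implementation: full sort, then slice
    return sorted(x)[:k]

def k_smallest_heap_pop(x, k):
    heap = list(x)           # copy so the input is not turned into a heap
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(k)]

def k_smallest_nsmallest(x, k):
    # heapq.nsmallest returns the k smallest items in ascending order
    return heapq.nsmallest(k, x)

rng = random.Random(0)       # seeded so the check is repeatable
data = [rng.randint(0, 1000) for _ in range(10_000)]
expected = k_smallest_sorted(data, 30)
assert k_smallest_heap_pop(data, 30) == expected
assert k_smallest_nsmallest(data, 30) == expected
print("all three approaches agree")
```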
A Python Demo
Update
Selecting the k smallest using numpy.partition, as suggested by @user2357112supportsMonica, is indeed very fast if you are already dealing with a NumPy array. But if you are starting with an ordinary list and factor in the time to convert it to a NumPy array just to use numpy.partition, then it is slower than using the heapq methods:
import numpy as np

def f_np_partition(x, k):
    # np.partition only guarantees the first k items are the k smallest, not their order
    return sorted(np.partition(x, k)[:k])

with Timer('f_np_partition') as t:
    for _ in range(TRIALS):
        x_np = np.array(x)
        y = f_np_partition(x_np.copy(), 30) # don't really need to copy
print(y)
The relative timings:
The time for f3 took: 7.2039111.
[0, 2, 2, 3, 3, 3, 5, 6, 6, 6, 9, 9, 10, 10, 10, 11, 11, 12, 13, 13, 14, 16, 16, 16, 16, 17, 17, 18, 19, 20]
The time for f_heap took: 0.35521280000000033.
[0, 2, 2, 3, 3, 3, 5, 6, 6, 6, 9, 9, 10, 10, 10, 11, 11, 12, 13, 13, 14, 16, 16, 16, 16, 17, 17, 18, 19, 20]
The time for f_np_partition took: 0.8379164999999995.
[0, 2, 2, 3, 3, 3, 5, 6, 6, 6, 9, 9, 10, 10, 10, 11, 11, 12, 13, 13, 14, 16, 16, 16, 16, 17, 17, 18, 19, 20]
I have a list of hours starting from 0 (0 is midnight).
hour = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
I want to generate a sequence of 3 consecutive hours randomly. Example:
[3,6]
or
[15, 18]
or
[23,2]
and so on. random.sample does not achieve what I want!
import random
hourSequence = sorted(random.sample(range(1,24), 2))
Any suggestions?
I'm not exactly sure what you want, but probably:
import random
s = random.randint(0, 23)
r = [s, (s+3)%24]
r
Out[14]: [16, 19]
Note: none of the other answers take into consideration the possible sequence [23, 0, 1].
Here is an approach using itertools from the Python standard library:
from itertools import islice, cycle
from random import choice
hours = list(range(24)) # List w/ 24h
hours_cycle = cycle(hours) # Transform the list in to a cycle
select_init = islice(hours_cycle, choice(hours), None) # Select a iterator on a random position
# Get the next 3 values for the iterator
select_range = []
for i in range(3):
    select_range.append(next(select_init))
print(select_range)
This prints sequences of three values from your hours list in a circular way, so the results can also include, for example, [23, 0, 1].
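The same circular behaviour can also be sketched with plain modulo arithmetic instead of an iterator. This is only an illustration: `consecutive_hours` and `random_consecutive_hours` are hypothetical names, and a 24-hour day is assumed by default.

```python
import random

def consecutive_hours(start, length=3, hours_in_day=24):
    """Return `length` consecutive hours from `start`, wrapping past midnight."""
    return [(start + offset) % hours_in_day for offset in range(length)]

def random_consecutive_hours(length=3, hours_in_day=24):
    # randrange picks a start hour in 0..hours_in_day-1
    return consecutive_hours(random.randrange(hours_in_day), length, hours_in_day)

print(consecutive_hours(23))       # [23, 0, 1]
print(random_consecutive_hours())  # three consecutive hours, random start
```

Keeping the start hour as a parameter makes the wrap-around case easy to check without relying on the random draw.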
You can try this:
import random
hour = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
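index = random.randint(0, len(hour) - 1)
# wrap around with modulo so that index + 3 never runs past the end of the list
l = [hour[index], hour[(index + 3) % len(hour)]]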
print(l)
You can pick a random element from the hour list you already created and take the element that is 3 places after it:
import random

def random_sequence_endpoints(l, span):
    i = random.choice(range(len(l)))
    # index into the list that was passed in, wrapping around at the end
    return [l[i], l[(i + span) % len(l)]]

hour = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
result = random_sequence_endpoints(hour, 3)
This works not only for the hours list above but for any other list, whatever elements it contains.