I have one list representing the points in time at which a value changes, and another list with the values:
indexes_list = [5, 6, 8, 9, 12, 15]
# [ 5 6 8 9 12 15]
values_list = [i * 10 for i in range(6)]
# [ 0 10 20 30 40 50]
I want to create the "full" list, which in the above example is:
expanded_values = [0, 0, 0, 0, 0, 0, 10, 20, 20, 30, 40, 40, 40, 50, 50, 50]
# [ 0 0 0 0 0 0 10 20 20 30 40 40 40 50 50 50]
I wrote something, but it feels wrong and I guess there is a better, more pythonic way of doing that:
result = []
for i in range(len(values_list)):
    if i == 0:
        tmp = [values_list[i]] * (indexes_list[i] + 1)
    else:
        tmp = [values_list[i]] * (indexes_list[i] - indexes_list[i - 1])
    result += tmp
# result = [0, 0, 0, 0, 0, 0, 10, 20, 20, 30, 40, 40, 40, 50, 50, 50]
Use:
indexes_array = [5, 6, 8, 9, 12, 15]
values_array = [i * 10 for i, _ in enumerate(range(6))]
diffs = indexes_array[:1] + [j - i for i, j in zip(indexes_array, indexes_array[1:])]
res = [v for i, v in zip(diffs, values_array) for _ in range(i)]
print(res)
Output
[0, 0, 0, 0, 0, 10, 20, 20, 30, 40, 40, 40, 50, 50, 50]
As an alternative, you could use the pairwise recipe with a twist:
from itertools import tee
def pairwise(iterable, prepend):
    a, b = tee(iterable)
    yield prepend, next(b, None)
    yield from zip(a, b)
indices = [5, 6, 8, 9, 12, 15]
values = [i * 10 for i, _ in enumerate(range(6))]
differences = [second - first for first, second in pairwise(indices, prepend=0)]
res = [v for i, v in zip(differences, values) for _ in range(i)]
print(res)
Output
[0, 0, 0, 0, 0, 10, 20, 20, 30, 40, 40, 40, 50, 50, 50]
Finally if you are doing numerical work I advise that you use numpy, as below:
import numpy as np
indices = [5, 6, 8, 9, 12, 15]
values = [i * 10 for i, _ in enumerate(range(6))]
differences = np.diff(indices, prepend=0)
res = np.repeat(values, differences).tolist()
print(res)
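Note that the question's expected list has six leading zeros (16 items, so the change point itself still carries the value), while the snippets above produce five (15 items). The following is a minimal sketch of a helper covering both conventions; the name `expand` and its `inclusive` flag are made up for illustration and are not part of any library.

```python
from itertools import chain, repeat

def expand(indices, values, inclusive=True):
    """Repeat each value for the length of the run that ends at its index."""
    # Hypothetical helper. With inclusive=True the first run covers
    # positions 0..indices[0]; otherwise it covers 0..indices[0]-1.
    start = -1 if inclusive else 0
    bounds = [start, *indices]
    counts = [hi - lo for lo, hi in zip(bounds, bounds[1:])]
    return list(chain.from_iterable(repeat(v, n) for v, n in zip(values, counts)))

indices = [5, 6, 8, 9, 12, 15]
values = [0, 10, 20, 30, 40, 50]
print(expand(indices, values))                   # 16 items, six leading zeros
print(expand(indices, values, inclusive=False))  # 15 items, five leading zeros
```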
I would argue that it is pythonic to use the appropriate library, which in this case is pandas:
import pandas as pd
indexes_array = [5, 6, 8, 9, 12, 15]
values_array = [i * 10 for i in range(6)]
series = pd.Series(values_array, indexes_array).reindex(
range(indexes_array[-1] + 1), method='backfill')
series
0 0
1 0
2 0
3 0
4 0
5 0
6 10
7 20
8 20
9 30
10 40
11 40
12 40
13 50
14 50
15 50
dtype: int64
See the reindex documentation for details.
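For readers without pandas, the backfill lookup can be sketched with the standard library: each position takes the value attached to the first change point at or after it. This is only an illustration of the idea, not how pandas implements `reindex`.

```python
from bisect import bisect_left

indices = [5, 6, 8, 9, 12, 15]
values = [0, 10, 20, 30, 40, 50]

# For every position, find the first change point >= position and use its value.
expanded = [values[bisect_left(indices, pos)] for pos in range(indices[-1] + 1)]
print(expanded)
```

This does one binary search per position, so it is mainly useful as an explanation; the difference-based versions do less work.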
Try this:
indexes_array = [5, 6, 8, 9, 12, 15]
# [ 5 6 8 9 12 15]
values_array = [i * 10 for i, _ in enumerate(range(6))]
# [ 0 10 20 30 40 50]
result = []
last_ind = 0
zipped = zip(indexes_array, values_array)
for ind, val in zipped:
    count = ind - last_ind
    last_ind = ind
    for i in range(count):
        result.append(val)
print(result)
Output:
[0, 0, 0, 0, 0, 10, 20, 20, 30, 40, 40, 40, 50, 50, 50]
Try this:
indexes_array = [5, 6, 8, 9, 12, 15]
values_array = [i * 10 for i, _ in enumerate(range(6))]
output = []
for x in range(len(indexes_array)):
    if x == 0:
        output.extend([values_array[x]] * indexes_array[x])
    else:
        output.extend([values_array[x]] * (indexes_array[x] - indexes_array[x - 1]))
print(output)
The output is :
[0, 0, 0, 0, 0, 10, 20, 20, 30, 40, 40, 40, 50, 50, 50]
Related
I am trying to change the value of x depending on the length of my list
How can x count up by 5 through each iteration and then back to 0 after the 10th round?
list = {"one", "two".....fifty} #example shortened
listLen = len(list) # 50
for i in range(0, listLen): # 0 - 49
    x = ??? # +5 max 45
    ops.update(location=x)
Desired outcome:
0. x = 0
1. x = 5
2. x = 10
...
9. x = 45
10. x = 0
11. x = 5
12. x = 10
...
19. x = 45
...
(0,5,10,15,20,25,30,35,40,45,0,5,10,15,20,25,30,35,40,45
0,5,10,15,20,25,30,35,40,45,0,5,10,15,20,25,30,35,40,45
0,5,10,15,20,25,30,35,40,45)
Try this
outcome = []
lst = list(range(1, 51))  # stands in for the 50-element list
steps = range(0, len(lst), 5)  # 0, 5, ..., 45
for a in range(len(lst)):
    outcome.append(steps[a % 10])  # wrap around after every 10th item
print(outcome)
Output
[0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 0, 5, 10, 15, 20, 25, 30, 35, 40, 45,
0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 0, 5, 10, 15, 20, 25, 30, 35, 40, 45,
0, 5, 10, 15, 20, 25, 30, 35, 40, 45]
Try this:
ans = []
listLen = 50  # length of the list from the question
x = 0
for i in range(listLen):  # 0 - 49
    ans.append(x)
    x += 5
    if x > 45:  # after the 10th value, start again from 0
        x = 0
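The wrap-around can also be computed directly from the loop index with the modulo operator, or taken from `itertools.cycle`. This is a small sketch; the function name `stepped_positions` and its parameters are made up for illustration.

```python
from itertools import cycle, islice

def stepped_positions(count, step=5, period=10):
    """Return count values that go 0, step, 2*step, ... and restart every period items."""
    return [(i % period) * step for i in range(count)]

positions = stepped_positions(50)
# The same sequence, produced by cycling over 0, 5, ..., 45.
cycled = list(islice(cycle(range(0, 50, 5)), 50))
print(positions[:12])
```

Inside the original loop this amounts to `x = (i % 10) * 5`.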
I have a problem. In my code I need to do the following:
I have an array like this: [1, 5, 7, 9, 10, 11, 14, 15]
Now I need to determine in the most efficient way what the closest values are for a given x. For example:
array = [1, 5, 7, 9, 10, 11, 14, 15]
above, below = myFunction(array, 8)
Should return 7 and 9
If there is no number higher or lower than the given number, the value that couldn't be determined should be None. If the given number is itself in the array, for example 7, the values 5 and 9 should be returned.
I know there are min() and max() functions, but I couldn't find anything that fits my case. The closest thing to what I want was looping through the array manually, but I was wondering if there is a more efficient way, because this code will be executed around 500,000 times, so speed is important!
Is there an efficient way to determine those values?
This answer is based on @Mark Tolonen's hint about bisect.
Let's say that you need to append one number on every iteration, starting from the same initial array.
All elements in the array should be unique. The initial array need not be sorted.
import bisect as b
init_array = [5, 9, 10, 11, 14, 15, 1, 7]
init_array.sort()
numbers_to_append = [22, 4, 12, 88, 99, 109, 999, 1000, 1001]
numbers_to_check = [12, 55, 23, 55, 0, 55, 10, 999]
for step, n in enumerate(numbers_to_check):
    # get index of closer right to below
    i = b.bisect_left(init_array, n)
    # get above
    if i >= len(init_array):
        above = None
    else:
        if init_array[i] == n:
            try:
                above = init_array[i + 1]
            except IndexError:
                above = None
        else:
            above = init_array[i]
    # get below
    below = init_array[i - 1] if i - 1 >= 0 and init_array[i - 1] - n < 0 else None
    print('---------')
    print('array is', init_array)
    print('given x is', n)
    print('below is', below)
    print('above is', above)
    print('appending', numbers_to_append[step])
    # check if number trying to append doesn't exists in array
    # WARNING: this check may be slow and added only for showing that we need to add only unique numbers
    # real check should be based on real conditions and numbers that we suppose to add
    i_to_append = b.bisect_left(init_array, numbers_to_append[step])
    if numbers_to_append[step] not in init_array[i_to_append:i_to_append+1]:
        init_array.insert(i_to_append, numbers_to_append[step])
output:
---------
array is [1, 5, 7, 9, 10, 11, 14, 15]
given x is 12
below is 11
above is 14
appending 22
---------
array is [1, 5, 7, 9, 10, 11, 14, 15, 22]
given x is 55
below is 22
above is None
appending 4
---------
array is [1, 4, 5, 7, 9, 10, 11, 14, 15, 22]
given x is 23
below is 22
above is None
appending 12
---------
array is [1, 4, 5, 7, 9, 10, 11, 12, 14, 15, 22]
given x is 55
below is 22
above is None
appending 88
---------
array is [1, 4, 5, 7, 9, 10, 11, 12, 14, 15, 22, 88]
given x is 0
below is None
above is 1
appending 99
---------
array is [1, 4, 5, 7, 9, 10, 11, 12, 14, 15, 22, 88, 99]
given x is 55
below is 22
above is 88
appending 109
---------
array is [1, 4, 5, 7, 9, 10, 11, 12, 14, 15, 22, 88, 99, 109]
given x is 10
below is 9
above is 11
appending 999
---------
array is [1, 4, 5, 7, 9, 10, 11, 12, 14, 15, 22, 88, 99, 109, 999]
given x is 999
below is 109
above is None
appending 1000
Assuming that your array is not sorted, you will be indeed forced to scan the entire array.
But if it is sorted you can use a binary search approach that will run in O(log(n)), something like this:
def search_nearest_elems(array, elem):
    """
    Uses dichotomic search approach.
    Runs in O(log(n)), with n = len(array)
    """
    i = len(array) // 2
    left, right = 0, len(array) - 1
    while right - left > 1:
        if array[i] == elem:
            return elem, elem
        elif array[i] > elem:
            right, i = i, (left + i) // 2
        else:
            left, i = i, (right + i) // 2
    return array[left], array[right]
array = [1, 5, 7, 9, 10, 11, 14, 15]
assert search_nearest_elems(array, 8) == (7, 9)
assert search_nearest_elems(array, 9) == (9, 9)
assert search_nearest_elems(array, 14.5) == (14, 15)
assert search_nearest_elems(array, 2) == (1, 5)
import bisect
def find_nearest(arr, value):
    arr.sort()
    idx = bisect.bisect_left(arr, value)
    if idx == 0:
        return None, arr[idx]
    elif idx == len(arr):
        return arr[idx - 1], None
    elif arr[idx] == value:
        return arr[idx], arr[idx]
    else:
        return arr[idx - 1], arr[idx]
array = [1, 5, 7, 9, 10, 11, 14, 15]
print(find_nearest(array, 0))
print(find_nearest(array, 4))
print(find_nearest(array, 8))
print(find_nearest(array, 10))
print(find_nearest(array, 20))
Output:
(None, 1)
(1, 5)
(7, 9)
(10, 10)
(15, None)
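The question also asks that a value already present in the array (for example 7) yields its neighbours (5 and 9), and that a missing side is None. A minimal sketch of that behaviour, assuming the list is already sorted, combines `bisect_left` and `bisect_right`; the function name `neighbours` and the (below, above) return order are choices made here for illustration.

```python
from bisect import bisect_left, bisect_right

def neighbours(sorted_values, x):
    """Return (below, above): the closest values strictly less than and strictly greater than x."""
    lo = bisect_left(sorted_values, x)   # index of the first element >= x
    hi = bisect_right(sorted_values, x)  # index of the first element > x
    below = sorted_values[lo - 1] if lo > 0 else None
    above = sorted_values[hi] if hi < len(sorted_values) else None
    return below, above

array = [1, 5, 7, 9, 10, 11, 14, 15]
print(neighbours(array, 8))   # (7, 9)
print(neighbours(array, 7))   # (5, 9)
print(neighbours(array, 0))   # (None, 1)
print(neighbours(array, 20))  # (15, None)
```

Sorting once up front and reusing the sorted list keeps each of the roughly 500,000 lookups at O(log n).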
Helper source
I have an array [ 0 10 15 20 10 0 35 25 15 35 0 30 20 25 30 0] and I need to insert each element of another array, [5,7,8,15], at positions that step by 5, so that the final array looks like [ 0 10 15 20 5 10 0 35 25 7 15 35 0 30 8 20 25 30 0 15] (length 20).
I am trying with this code:
import numpy as np
arr_fla = [0, 10, 15, 20, 10, 0, 35, 25, 15, 35, 0, 30, 20, 25, 30, 0]
arr_split = [5, 7, 8, 15]
node = 5
node_len = node * (node - 1)
for w in range(node, node_len, 5):
    for v in arr_split:
        arr_fla = np.insert(arr_fla, w, v)
print(arr_fla)
The result I am getting is
'[ 0 10 15 20 10 15 8 7 5 0 15 8 7 5 35 15 8 7 5 25 15 35 0 30
20 25 30 0]' length 28
Can someone please tell me where I am going wrong.
If the sizes line up as cleanly as in your example you can use reshape ...
np.reshape(arr_fla,(len(arr_split),-1))
# array([[ 0, 10, 15, 20],
# [10, 0, 35, 25],
# [15, 35, 0, 30],
# [20, 25, 30, 0]])
... append arr_split as a new column ...
np.c_[np.reshape(arr_fla,(len(arr_split),-1)),arr_split]
# array([[ 0, 10, 15, 20, 5],
# [10, 0, 35, 25, 7],
# [15, 35, 0, 30, 8],
# [20, 25, 30, 0, 15]])
... and flatten again ...
np.c_[np.reshape(arr_fla,(len(arr_split),-1)),arr_split].ravel()
# array([ 0, 10, 15, 20, 5, 10, 0, 35, 25, 7, 15, 35, 0, 30, 8, 20, 25,
# 30, 0, 15])
I have corrected it:
import numpy as np
arr_fla = [0,10,15,20,10,0,35,25,15,35,0,30,20,25,30,0]
arr_split = [5,7,8,15]
node = 5
for w in range(len(arr_split)):
    # each earlier insert shifts later positions by one, hence (w+1)*node-1
    arr_fla = np.insert(arr_fla, (w+1)*node-1, arr_split[w])
print(arr_fla)
'''
Output:
[ 0 10 15 20 5 10 0 35 25 7 15 35 0 30 8 20 25 30 0 15]
'''
In your code:
for v in arr_split:
This gets all the elements at once (in total w times), but you need just one element at a time. Thus you do not need an extra for loop.
You want to have a counter that keeps going up every time you insert the item from your second array arr_split.
Try this code. My assumption is that your last element can be inserted directly as the original array has only 16 elements.
arr_fla = [0,10,15,20,10,0,35,25,15,35,0,30,20,25,30,0]
arr_split = [5,7,8,15]
j = 0 #use this as a counter to insert from arr_split
#start iterating from 4th position as you want to insert in the 5th position
for i in range(4,len(arr_fla),5):
    arr_fla.insert(i,arr_split[j]) #insert at the 5th position every time
    #every time you insert an element, the array size increase
    j +=1 #increase the counter by 1 so you can insert the next element
arr_fla.append(arr_split[j]) #add the final element to the original array
print(arr_fla)
Output:
[0, 10, 15, 20, 5, 10, 0, 35, 25, 7, 15, 35, 0, 30, 8, 20, 25, 30, 0, 15]
You could split the list into even chunks, append one split value to each chunk, and reassemble the whole (credit to Ned Batchelder for the chunk function). Each group of the result holds node - 1 original values plus one inserted value, so the chunk size is node - 1:
arr_fla = [0,10,15,20,10,0,35,25,15,35,0,30,20,25,30,0]
arr_split = [5,7,8,15]
node = 5
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
tmp_arr = chunks(arr_fla, node - 1)
arr_out = []
for index, chunk in enumerate(tmp_arr):
    if index < len(arr_split): # make sure arr_split is not exhausted
        chunk.append(arr_split[index]) # we use the index of the chunks list to access the split number to insert
    arr_out += chunk
print(arr_out)
Outputs:
[0, 10, 15, 20, 5, 10, 0, 35, 25, 7, 15, 35, 0, 30, 8, 20, 25, 30, 0, 15]
You can change it to the following and give it a try.
import numpy as np
arr_fla = [0, 10, 15, 20, 10, 0, 35, 25, 15, 35, 0, 30, 20, 25, 30, 0]
arr_split = [5, 7, 8, 15]
index = 4
for ele in arr_split:
    arr_fla = np.insert(arr_fla, index, ele)
    index += 5
print(arr_fla)
The result is:
[ 0 10 15 20 5 10 0 35 25 7 15 35 0 30 8 20 25 30 0 15]
As for what went wrong in your code, I think there are two problems:
The second loop is not needed; it makes np.insert put every element of arr_split at the same position.
The first position should be 4, not 5.
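As a loop-free variant, `np.insert` also accepts a sequence of indices. Those indices refer to positions in the original array, before any insertion, so no manual shifting is needed. The sketch below assumes groups of four original values; the variable names `group` and `positions` are introduced here for illustration.

```python
import numpy as np

arr_fla = np.array([0, 10, 15, 20, 10, 0, 35, 25, 15, 35, 0, 30, 20, 25, 30, 0])
arr_split = [5, 7, 8, 15]
group = 4  # number of original values between two inserted values (assumption)

# Indices are relative to the original array: before 4, 8, 12 and at the end (16).
positions = np.arange(group, len(arr_fla) + 1, group)
result = np.insert(arr_fla, positions, arr_split)
print(result)
```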
I have a matrix (3x5) where a number is randomly selected in this matrix. I want to swap the selected number with the one down-right. I'm able to locate the index of the randomly selected number but not sure how to replace it with the one that is down then right. For example, given the matrix:
[[169 107 229 317 236]
[202 124 114 280 106]
[306 135 396 218 373]]
and the selected number is 280 (which is in position [1,3]), needs to be swapped with 373 on [2,4]. I'm having issues on how to move around with the index. I can hard-code it but it becomes a little more complex when the number to swap is randomly selected.
If the selected number is on [0,0], then hard-coded would look like:
selected_task = tard_generator1[0,0]
right_swap = tard_generator1[1,1]
tard_generator1[1,1] = selected_task
tard_generator1[0,0] = right_swap
Any suggestions are welcome!
How about something like
chosen = (1, 2)
right_down = chosen[0] + 1, chosen[1] + 1
matrix[chosen], matrix[right_down] = matrix[right_down], matrix[chosen]
will output:
>>> a
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
>>> index = (1, 2)
>>> right_down = index[0] + 1, index[1] + 1
>>> a[index], a[right_down] = a[right_down], a[index]
>>> a
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 13, 8, 9],
[10, 11, 12, 7, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
There should be a boundary check, but it's omitted here.
Try this:
import numpy as np
def swap_rdi(mat, index):
    row, col = index
    rows, cols = mat.shape
    # the down-right neighbour must exist
    assert row + 1 != rows and col + 1 != cols
    mat[row, col], mat[row+1, col+1] = mat[row+1, col+1], mat[row, col]
    return mat  # return the matrix so the caller can print it
Example:
mat = np.matrix([[1,2,3], [4,5,6]])
print('Before:\n{}'.format(mat))
print('After:\n{}'.format(swap_rdi(mat, (0,1))))
Outputs:
Before:
[[1 2 3]
[4 5 6]]
After:
[[1 6 3]
[4 5 2]]
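Since the selected cell is random, it may sit in the last row or last column, where no down-right neighbour exists. The following sketch handles that case by leaving the matrix untouched and reporting it; it uses plain nested lists, and the function name `swap_down_right` and the choice to return a boolean are assumptions made for illustration.

```python
def swap_down_right(matrix, row, col):
    """Swap matrix[row][col] with its down-right neighbour; return False if there is none."""
    rows, cols = len(matrix), len(matrix[0])
    if row + 1 >= rows or col + 1 >= cols:
        return False  # cell is on the bottom or right edge, nothing to swap with
    matrix[row][col], matrix[row + 1][col + 1] = matrix[row + 1][col + 1], matrix[row][col]
    return True

matrix = [
    [169, 107, 229, 317, 236],
    [202, 124, 114, 280, 106],
    [306, 135, 396, 218, 373],
]
print(swap_down_right(matrix, 1, 3))  # True: 280 and 373 trade places
print(swap_down_right(matrix, 2, 4))  # False: bottom-right corner has no neighbour
```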
I have a list of many unsorted numbers, for example:
import random
N=1000000
x = [random.randint(0,N) for i in range(N)]
I only want the k smallest values. Currently this is my approach:
def f1(x,k): # O(nlogn)
    return sorted(x)[:k]
This performs lots of redundant operations, as we are sorting the remaining N-k elements too. Enumerating doesn't work either:
def f2(x,k): # O(nlogn)
    y = []
    for idx,val in enumerate( sorted(x) ):
        if idx == k: break
        y.append(val)
    return y
Verifying enumerating doesn't help:
if 1 : ## Time taken = 0.6364126205444336
    st1 = time.time()
    y = f1(x,3)
    et1 = time.time()
    print('Time taken = ', et1-st1)
if 1 : ## Time taken = 0.6330435276031494
    st2 = time.time()
    y = f2(x,3)
    et2 = time.time()
    print('Time taken = ', et2-st2)
I probably need a generator that continually returns the next minimum of the list. Since getting the next minimum should be an O(1) operation, the function f3() should be just O(k), right?
What GENERATOR function will work best in this case?
def f3(x,k): # O(k)
    y = []
    for idx,val in enumerate( GENERATOR ):
        if idx == k: break
        y.append(val)
    return y
EDIT 1 :
The analysis shown here is wrong, please ignore it and jump to Edit 3.
Lowest bound possible: In terms of time complexity I think this is the lower bound achievable, but as it will modify the original list, it isn't the solution for my problem.
def f3(x,k): # O(k) Time
    y = []
    idx=0
    while idx<k:
        curr_min = min(x)
        x.remove(curr_min) # This removes from the original list
        y.append(curr_min)
        idx += 1
    return y
if 1 : ## Time taken = 0.07096505165100098
    st3 = time.time()
    y = f3(x,3)
    et3 = time.time()
    print('Time taken = ', et3-st3)
O(N) Time | O(N) Storage: Best solution so far. However, it requires a copy of the original list, resulting in O(N) time and storage. An iterator that gets the next minimum k times would be O(1) storage and O(k) time.
def f3(x,k): # O(N) Time | O(N) Storage
    y = []
    idx=0
    while idx<k:
        curr_min = min(x)
        x.remove(curr_min)
        y.append(curr_min)
        idx += 1
    return y
if 1 : ## Time taken = 0.0814204216003418
    st3 = time.time()
    y = f3(x,3)
    et3 = time.time()
    print('Time taken = ', et3-st3)
EDIT 2 :
Thanks for pointing out my above mistakes, getting minimum of a list should be O(n), not O(1).
EDIT 3 :
Here's a full script of the analysis after using the recommended solution. This raised more questions:
1) Is constructing x as a heap using heapq.heappush slower than appending to a list with list.append and then calling heapq.heapify on it?
2) Does heapq.nsmallest slow down if x is already a heap?
3) Current conclusion: don't heapq.heapify the current list; just call heapq.nsmallest on it.
import time, random, heapq
import numpy as np
class Timer:
    def __init__(self, description):
        self.description = description
    def __enter__(self):
        self.start = time.perf_counter()
        return self
    def __exit__(self, *args):
        end = time.perf_counter()
        print(f"The time for '{self.description}' took: {end - self.start}.")
def f3(x,k):
    y = []
    idx=0
    while idx<k:
        curr_min = min(x)
        x.remove(curr_min)
        y.append(curr_min)
        idx += 1
    return y
def f_sort(x, k):
    y = []
    for idx,val in enumerate( sorted(x) ):
        if idx == k: break
        y.append(val)
    return y
def f_heapify_pop(x, k):
    heapq.heapify(x)
    return [heapq.heappop(x) for _ in range(k)]
def f_heap_pop(x, k):
    return [heapq.heappop(x) for _ in range(k)]
def f_heap_nsmallest(x, k):
    return heapq.nsmallest(k, x)
def f_np_partition(x, k):
    return np.partition(x, k)[:k]
if True : ## Constructing list vs heap
    N=1000000
    # N= 500000
    x_main = [random.randint(0,N) for i in range(N)]
    with Timer('constructing list') as t:
        x=[]
        for curr_val in x_main:
            x.append(curr_val)
    with Timer('constructing heap') as t:
        x_heap=[]
        for curr_val in x_main:
            heapq.heappush(x_heap, curr_val)
    with Timer('heapify x from a list') as t:
        x_heapify=[]
        for curr_val in x_main:
            x_heapify.append(curr_val)
        heapq.heapify(x_heapify)
    with Timer('x list to numpy') as t:
        x_np = np.array(x)
"""
N=1000000
The time for 'constructing list' took: 0.2717265225946903.
The time for 'constructing heap' took: 0.45691753178834915.
The time for 'heapify x from a list' took: 0.4259336367249489.
The time for 'x list to numpy' took: 0.14815033599734306.
"""
if True : ## Performing experiments on list vs heap
    TRIALS = 10
    ## Experiments on x as list :
    with Timer('f3') as t:
        for _ in range(TRIALS):
            y = f3(x.copy(), 30)
        print(y)
    with Timer('f_sort') as t:
        for _ in range(TRIALS):
            y = f_sort(x.copy(), 30)
        print(y)
    with Timer('f_np_partition on x') as t:
        for _ in range(TRIALS):
            y = f_np_partition(x.copy(), 30)
        print(y)
    ## Experiments on x as list, but converted to heap in place :
    with Timer('f_heapify_pop on x') as t:
        for _ in range(TRIALS):
            y = f_heapify_pop(x.copy(), 30)
        print(y)
    with Timer('f_heap_nsmallest on x') as t:
        for _ in range(TRIALS):
            y = f_heap_nsmallest(x.copy(), 30)
        print(y)
    ## Experiments on x_heap as heap :
    with Timer('f_heap_pop on x_heap') as t:
        for _ in range(TRIALS):
            y = f_heap_pop(x_heap.copy(), 30)
        print(y)
    with Timer('f_heap_nsmallest on x_heap') as t:
        for _ in range(TRIALS):
            y = f_heap_nsmallest(x_heap.copy(), 30)
        print(y)
    ## Experiments on x_np as numpy array :
    with Timer('f_np_partition on x_np') as t:
        for _ in range(TRIALS):
            y = f_np_partition(x_np.copy(), 30)
        print(y)
#
"""
Experiments on x as list :
[0, 1, 1, 4, 5, 5, 5, 6, 6, 7, 7, 7, 10, 10, 11, 11, 12, 12, 12, 13, 13, 14, 18, 18, 19, 19, 21, 22, 24, 25]
The time for 'f3' took: 10.180440502241254.
[0, 1, 1, 4, 5, 5, 5, 6, 6, 7, 7, 7, 10, 10, 11, 11, 12, 12, 12, 13, 13, 14, 18, 18, 19, 19, 21, 22, 24, 25]
The time for 'f_sort' took: 9.054768254980445.
[ 1 5 5 1 0 4 5 6 7 6 7 7 12 12 11 13 11 12 13 18 10 14 10 18 19 19 21 22 24 25]
The time for 'f_np_partition on x' took: 1.2620676811784506.
Experiments on x as list, but converted to heap in place :
[0, 1, 1, 4, 5, 5, 5, 6, 6, 7, 7, 7, 10, 10, 11, 11, 12, 12, 12, 13, 13, 14, 18, 18, 19, 19, 21, 22, 24, 25]
The time for 'f_heapify_pop on x' took: 0.8628390356898308.
[0, 1, 1, 4, 5, 5, 5, 6, 6, 7, 7, 7, 10, 10, 11, 11, 12, 12, 12, 13, 13, 14, 18, 18, 19, 19, 21, 22, 24, 25]
The time for 'f_heap_nsmallest on x' took: 0.5187360178679228.
Experiments on x_heap as heap :
[0, 1, 1, 4, 5, 5, 5, 6, 6, 7, 7, 7, 10, 10, 11, 11, 12, 12, 12, 13, 13, 14, 18, 18, 19, 19, 21, 22, 24, 25]
The time for 'f_heap_pop on x_heap' took: 0.2054140530526638.
[0, 1, 1, 4, 5, 5, 5, 6, 6, 7, 7, 7, 10, 10, 11, 11, 12, 12, 12, 13, 13, 14, 18, 18, 19, 19, 21, 22, 24, 25]
The time for 'f_heap_nsmallest on x_heap' took: 0.6638103127479553.
[ 1 5 5 1 0 4 5 6 7 6 7 7 12 12 11 13 11 12 13 18 10 14 10 18 19 19 21 22 24 25]
The time for 'f_np_partition on x_np' took: 0.2107151597738266.
"""
This is a classic problem for which the generally accepted solution is a data structure known as a heap. Below I have done 10 trials for each of the algorithms f3 and f_heap. As the value of the second argument, k, gets larger, the gap between the two running times becomes even greater. For k = 3, algorithm f3 takes .76 seconds and algorithm f_heap takes .54 seconds. But with k = 30 these values become 6.33 seconds and .54 seconds, respectively.
import time, random, heapq
class Timer:
    def __init__(self, description):
        self.description = description
    def __enter__(self):
        self.start = time.perf_counter()
        return self
    def __exit__(self, *args):
        end = time.perf_counter()
        print(f"The time for {self.description} took: {end - self.start}.")
def f3(x,k): # O(N) Time | O(N) Storage
    y = []
    idx=0
    while idx<k:
        curr_min = min(x)
        x.remove(curr_min)
        y.append(curr_min)
        idx += 1
    return y
def f_heap(x, k): # O(n + k log n): heapify is O(n), each pop is O(log n)
    # if you do not need to retain a heap and just need the k smallest, then:
    #return heapq.nsmallest(k, x)
    heapq.heapify(x)
    return [heapq.heappop(x) for _ in range(k)]
N=1000000
x = [random.randint(0,N) for i in range(N)]
TRIALS = 10
with Timer('f3') as t:
    for _ in range(TRIALS):
        y = f3(x.copy(), 30)
print(y)
print()
with Timer('f_heap') as t:
    for _ in range(TRIALS):
        y = f_heap(x.copy(), 30)
print(y)
Prints:
The time for f3 took: 6.3301973.
[0, 1, 1, 7, 9, 11, 11, 13, 13, 14, 17, 18, 18, 18, 19, 20, 20, 21, 23, 24, 25, 25, 26, 27, 28, 28, 29, 30, 30, 31]
The time for f_heap took: 0.5372357999999995.
[0, 1, 1, 7, 9, 11, 11, 13, 13, 14, 17, 18, 18, 18, 19, 20, 20, 21, 23, 24, 25, 25, 26, 27, 28, 28, 29, 30, 30, 31]
A Python Demo
Update
Selecting the k smallest using numpy.partition, as suggested by @user2357112supportsMonica, is indeed very fast if you are already dealing with a numpy array. But if you are starting with an ordinary list and factor in the time to convert it to a numpy array just to use the numpy.partition method, then it is slower than using the heapq methods:
import numpy as np
def f_np_partition(x, k):
    return sorted(np.partition(x, k)[:k])
with Timer('f_np_partition') as t:
    for _ in range(TRIALS):
        x_np = np.array(x)
        y = f_np_partition(x_np.copy(), 30) # don't really need to copy
print(y)
The relative timings:
The time for f3 took: 7.2039111.
[0, 2, 2, 3, 3, 3, 5, 6, 6, 6, 9, 9, 10, 10, 10, 11, 11, 12, 13, 13, 14, 16, 16, 16, 16, 17, 17, 18, 19, 20]
The time for f_heap took: 0.35521280000000033.
[0, 2, 2, 3, 3, 3, 5, 6, 6, 6, 9, 9, 10, 10, 10, 11, 11, 12, 13, 13, 14, 16, 16, 16, 16, 17, 17, 18, 19, 20]
The time for f_np_partition took: 0.8379164999999995.
[0, 2, 2, 3, 3, 3, 5, 6, 6, 6, 9, 9, 10, 10, 10, 11, 11, 12, 13, 13, 14, 16, 16, 16, 16, 17, 17, 18, 19, 20]
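The question asked for a generator that hands out the next minimum on demand. A heap can back such a generator; here is a minimal sketch, assuming it is acceptable to copy the input once. The names `iter_ascending` and `k_smallest` are made up for illustration. Setup costs O(n) for the copy and heapify, and each value yielded costs O(log n), so it is not the O(1) per item hoped for in the question.

```python
import heapq
from itertools import islice

def iter_ascending(values):
    """Lazily yield the items of values in ascending order without modifying the input."""
    heap = list(values)   # O(n) copy so the caller's list is left untouched
    heapq.heapify(heap)   # O(n)
    while heap:
        yield heapq.heappop(heap)  # O(log n) per item

def k_smallest(values, k):
    # Consume only the first k items of the lazy stream.
    return list(islice(iter_ascending(values), k))

data = [5, 1, 4, 1, 3, 9, 2]
print(k_smallest(data, 3))  # [1, 1, 2]
print(data)                 # unchanged: [5, 1, 4, 1, 3, 9, 2]
```

When k is known up front, `heapq.nsmallest(k, values)` does the same job in one call; the generator form is mainly useful when the number of items needed is not known in advance.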