For my project I need to repeatedly find the indices of timestamps in lists and if the exact timestamp
is not in the list I need to find the index of the timestamp right before the one I'm looking for.
I tried looping through the list, but that's very slow:
def find_item_index(arr, x):
    '''
    returns index of x in ordered list.
    If x is between two items in the list, the index of the lower one is returned.
    '''
    for index in range(len(arr)):
        if arr[index] <= x < arr[index+1]:
            return index
    raise ValueError(f'{x} not in array.')
I also tried to do it recursively, but that was even slower:
def find_item_index_recursive(arr, x, index = 0):
    '''
    returns index of x in ordered list.
    If x is between two items in the list, the index of the lower one is returned.
    '''
    length = len(arr)
    if length == 1:
        return index
    if arr[length // 2] < x:
        return find_item_index_recursive(arr[length // 2:], x, index + length // 2)
    else:
        return find_item_index_recursive(arr[:length // 2], x, index)
    raise ValueError(f'{x} not in array.')
Is there a faster way to do this?
Sort the list and keep track of whether it's sorted before bothering to do any work with it
if not arr_is_sorted:     # create me somewhere!
    arr.sort()            # inplace sort
    arr_is_sorted = True  # unset if you're unsure if the array is sorted
With a sorted list, you can binary search to find the insertion point efficiently, in O(log n) time. There's a convenient standard-library module for this, bisect!
import bisect
insertion_point = bisect.bisect_left(arr, x)
Inserting at that point keeps the array sorted, so you don't need to re-sort it unless you make unrelated changes to it (ideally you would never make an unordered insertion, so it will always be sorted)
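For the question as asked (the index of the timestamp at or just before x), the insertion point needs one small adjustment. A minimal sketch, assuming the list is already sorted; the helper name index_at_or_before is made up for illustration:

```python
import bisect

def index_at_or_before(arr, x):
    """Index of the last item in sorted `arr` that is <= x (hypothetical helper)."""
    # bisect_right returns the insertion point after any items equal to x,
    # so the item just before that point is the last one that is <= x.
    pos = bisect.bisect_right(arr, x) - 1
    if pos < 0:
        raise ValueError(f'{x} is smaller than every item in the list.')
    return pos
```

With the list [50, 99, 100, 200], an exact hit such as 99 gives its own index (1), and a value between two items such as 150 gives the index of the lower one (2).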
Here's a complete example of how to use bisect
>>> l = [100,50,200,99]
>>> l.sort()
>>> l
[50, 99, 100, 200]
>>> import bisect
>>> bisect.bisect_left(l, 55)
1
>>> bisect.bisect_left(l, 201)
4
Then you can use arr.insert(position, value) to put the value into the list
>>> l
[50, 99, 100, 200]
>>> value = 55
>>> l.insert(bisect.bisect_left(l, value), value)
>>> l
[50, 55, 99, 100, 200]
You can prevent duplicate insertions by checking if that position is already equal
>>> pos = bisect.bisect_left(l, value)
>>> if pos == len(l) or l[pos] != value: # length check avoids IndexError
... l.insert(pos, value)
this should work fast I think:
(I am assuming that your timestamps are sorted?)
def find_item_index(arr, x):
    '''
    returns index of x in ordered list.
    If x is between two items in the list, the index of the lower one is returned.
    '''
    l = len(arr)
    i = l//2
    j = i//2
    while(j>0):
        if x<arr[i]:
            i-= j
        else:
            i+= j
        j = j//2
    return i
Edit: I just checked. Compared to your first version it is faster for longer lists. I expect it to be at least 4 times faster, and even 10 times faster as the list gets longer.
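One caveat: the halving-step loop above can stop short of some positions. With an 8-element list, for example, i starts at 4 and can only move by 2 and then by 1, so it never reaches index 0. A sketch of a conventional lo/hi binary search that covers every index, assuming the list is sorted (this is a swapped-in variant, not the code timed above):

```python
def find_item_index(arr, x):
    """Index of the last item in sorted `arr` that is <= x."""
    lo, hi = 0, len(arr)
    # Invariant: every item before lo is <= x, every item from hi on is > x.
    while lo < hi:
        mid = (lo + hi) // 2
        if arr[mid] <= x:
            lo = mid + 1   # arr[mid] qualifies, keep looking to the right
        else:
            hi = mid       # arr[mid] is too large, keep looking to the left
    if lo == 0:
        raise ValueError(f'{x} not in array.')
    return lo - 1
```

Each pass halves the remaining range, so the number of comparisons grows with log2 of the list length.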
Numpy searchsorted is usually involved in these cases:
np.searchsorted([1,2,8,9], 5) # Your case
> 2
np.searchsorted([1,2,8,9], (-1, 2, 100)) #Other cases
> array([0, 1, 4])
For values that are not in the list, the returned index is the insertion point, i.e. the position of the nearest element to the right. If you need the nearest element to the left instead, pass side='right' and subtract 1 from the result.
Lists have a built-in method, index, which gives you the index of an element. If the element is not found, it raises a ValueError.
try:
    index = list1.index(element_to_search)
except ValueError as e:
    print('element not found')
I have two sorted lists containing float values. The first contains the values I am interested in (l1) and the second list contains values I want to search (l2). However, I am not looking for exact matches; I tolerate differences based on a function. Since I have to do this search very often (>>100000) and the lists can be quite large (~5000 and ~200000 elements), I am really interested in runtime. At first, I thought I could somehow use numpy.isclose(), but my tolerance is not fixed: it depends on the value of interest. Several nested for loops work, but are really slow. I am sure that there is some efficient way to do this.
#check if two floats are close enough to match
def matching(mz1, mz2):
    if abs( (1-mz1/mz2) * 1000000) <= 2:
        return True
    return False
#imagine another huge for loop around everything
l1 = [132.0317, 132.8677, 132.8862, 133.5852, 133.7507]
l2 = [132.0317, 132.0318, 132.8678, 132.8861, 132.8862, 133.5851999, 133.7500]
d = {i:[] for i in l1}
for i in l1:
    for j in l2:
        if matching(i, j):
            d[i].append(j)
fyi: As an alternative to the matching function, I could also create a dictionary first, mapping the values of interest from l1 to the window (min ,max) I would allow. e.g. {132.0317:(132.0314359366, 132.0319640634), ...}, but I think checking for each value from l2 if it lies within one of the windows from this dictionary would be even slower...
This would be how to generate the dictionary containing min/max values for each value from l1:
def calcMinMaxMZ(mz, delta_ppm=2):
    minmz = mz- (mz* +delta_ppm)/1000000
    maxmz = mz- (mz* -delta_ppm)/1000000
    return minmz, maxmz
minmax_d = {mz:calcMinMaxMZ(mz, delta_ppm=2) for mz in l1}
The result may be a dictionary like this:
d = {132.0317: [132.0317, 132.0318], 132.8677: [132.8678], 132.8862: [132.8862, 132.8861], 133.5852: [133.5851999], 133.7507: []}
But I actually do much more when there is a match.
Any help is appreciated!
I re-implemented the for loop using itertools. For it to work, the inputs must be sorted. For the benchmark I generated 1000 items from the range <130.0, 135.0> for l1 and 100_000 items from the same range for l2:
from timeit import timeit
from itertools import tee
from random import uniform
#check if two floats are close enough to match
def matching(mz1, mz2):
    if abs( (1-mz1/mz2) * 1000000) <= 2:
        return True
    return False
#imagine another huge for loop around everything
l1 = sorted([uniform(130.00, 135.00) for _ in range(1000)])
l2 = sorted([uniform(130.00, 135.00) for _ in range(100_000)])
def method1():
    d = {i:[] for i in l1}
    for i in l1:
        for j in l2:
            if matching(i, j):
                d[i].append(j)
    return d
def method2():
    iter_2, last_match = tee(iter(l2))
    d = {}
    for i in l1:
        d.setdefault(i, [])
        found = False
        while True:
            j = next(iter_2, None)
            if j is None:
                break
            if matching(i, j):
                d[i].append(j)
                if not found:
                    iter_2, last_match = tee(iter_2)
                    found = True
            else:
                if found:
                    break
        iter_2, last_match = tee(last_match)
    return d
print(timeit(lambda: method1(), number=1))
print(timeit(lambda: method2(), number=1))
Prints:
16.900722101010615
0.030588202003855258
If you transpose your formula to produce a range of mz2 values for a given mz1, you could use a binary search to find the first match in the sorted l2 list, then work your way up sequentially until you reach the end of the range.
def getRange(mz1):
    minimum = mz1/(1+2/1000000)
    maximum = mz1/(1-2/1000000)
    return minimum,maximum
l1 = [132.0317, 132.8677, 132.8862, 133.5852, 133.7507]
l2 = [132.0317, 132.0318, 132.8678, 132.8862, 132.8861, 133.5851999, 133.7500]
l2 = sorted(l2)
from bisect import bisect_left
d = { mz1:[] for mz1 in l1 }
for mz1 in l1:
    lo,hi = getRange(mz1)
    i = bisect_left(l2,lo)
    while i < len(l2) and l2[i]<= hi:
        d[mz1].append(l2[i])
        i+=1
Sorting l2 will cost O(NlogN) and the dictionary creation will cost O(MlogN) where N is len(l2) and M is len(l1). You will only be applying the tolerance/range formula M times instead of N*M times which should save a lot of processing.
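The sequential walk up to the end of the range can also be replaced by a second binary search, so that each window becomes a single slice. A minimal sketch, assuming the searched list is already sorted; matches_in_window and its ppm parameter are names invented here, and the bounds use the same transposed formula as getRange:

```python
from bisect import bisect_left, bisect_right

def matches_in_window(sorted_values, mz1, ppm=2.0):
    """Return the items of `sorted_values` within +/- ppm of mz1 (hypothetical helper)."""
    lo = mz1 / (1 + ppm / 1_000_000)   # smallest mz2 that still matches
    hi = mz1 / (1 - ppm / 1_000_000)   # largest mz2 that still matches
    start = bisect_left(sorted_values, lo)   # first index with value >= lo
    stop = bisect_right(sorted_values, hi)   # first index with value > hi
    return sorted_values[start:stop]

l1 = [132.0317, 132.8677, 132.8862, 133.5852, 133.7507]
l2 = sorted([132.0317, 132.0318, 132.8678, 132.8862, 132.8861, 133.5851999, 133.7500])
d = {mz1: matches_in_window(l2, mz1) for mz1 in l1}
```

Both searches are O(log N); the slice itself costs time proportional to the number of matches it copies.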
Your lists are already sorted, so you can maybe use a paradigm similar to the "Merge" part of MergeSort: keep track of the current position in both lists (idx1 and idx2), and when the pair at those positions is acceptable, process it and advance only idx2.
d = {i:[] for i in l1}
idx1, idx2 = 0, 0
while idx1 < len(l1):
    # check the bound first, otherwise l2[idx2] raises IndexError at the end of l2
    while idx2 < len(l2) and matching(l1[idx1], l2[idx2]):
        d[l1[idx1]].append(l2[idx2])
        idx2 += 1
    idx1 += 1
print(d)
# {132.0317: [132.0317, 132.0318], 132.8677: [132.8678], 132.8862: [132.8862, 132.8861], 133.5852: [133.5851999], 133.7507: []}
this is O(len(l1) + len(l2)), since it executes exactly once for each element of both lists.
The big caveat here is that this never "steps back" - if the current element of l1 matches the current element of l2 but the next element of l1 would also match the current element of l2, then the latter does not get listed. Fixing this might require adding some sort of "look-back" functionality (which would drive the complexity class up by a magnitude of n in the worst case, but would still be quicker than iterating through both lists repeatedly). However, it does work for your given dataset.
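One way to add that look-back without rescanning from the beginning is to remember where the current window starts and begin each scan from there. A sketch under the assumption that both lists are sorted; match_sorted and ppm are names made up for this example, and the window bounds are the match condition solved for the l2 value:

```python
def match_sorted(l1, l2, ppm=2.0):
    """Map each value of sorted l1 to the values of sorted l2 within +/- ppm."""
    result = {}
    start = 0  # first index of l2 that can still match this or a later l1 value
    for mz1 in l1:
        lo = mz1 / (1 + ppm / 1_000_000)
        hi = mz1 / (1 - ppm / 1_000_000)
        # Skip l2 values below the window; they are below every later window too.
        while start < len(l2) and l2[start] < lo:
            start += 1
        # Collect matches from a separate cursor so `start` stays available
        # for the next l1 value, whose window may overlap this one.
        matches = []
        j = start
        while j < len(l2) and l2[j] <= hi:
            matches.append(l2[j])
            j += 1
        result[mz1] = matches
    return result
```

Because start never moves backwards, the extra work beyond one pass over each list is limited to the values shared by overlapping windows.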
I've been working on some quick and dirty scripts for doing some of my chemistry homework, and one of them iterates through lists of a constant length where all the elements sum to a given constant. For each, I check if they meet some additional criteria and tack them on to another list.
I figured out a way to meet the sum criteria, but it looks horrendous, and I'm sure there's some type of teachable moment here:
# iterate through all 11-element lists where the elements sum to 8.
for a in range(8+1):
    for b in range(8-a+1):
        for c in range(8-a-b+1):
            for d in range(8-a-b-c+1):
                for e in range(8-a-b-c-d+1):
                    for f in range(8-a-b-c-d-e+1):
                        for g in range(8-a-b-c-d-e-f+1):
                            for h in range(8-a-b-c-d-e-f-g+1):
                                for i in range(8-a-b-c-d-e-f-g-h+1):
                                    for j in range(8-a-b-c-d-e-f-g-h-i+1):
                                        k = 8-(a+b+c+d+e+f+g+h+i+j)
                                        x = [a,b,c,d,e,f,g,h,i,j,k]
                                        # see if x works for what I want
Here's a recursive generator that yields the lists in lexicographic order. Leaving exact as True gives the requested result where every sum==limit; setting exact to False gives all lists with 0 <= sum <= limit. The recursion takes advantage of this option to produce the intermediate results.
def lists_with_sum(length, limit, exact=True):
    if length:
        for l in lists_with_sum(length-1, limit, False):
            gap = limit-sum(l)
            for i in range(gap if exact else 0, gap+1):
                yield l + [i]
    else:
        yield []
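The generator above recomputes sum(l) for every prefix. A variant that passes the remaining total down the recursion avoids that; this is only a sketch, and the name compositions is an assumption of this example:

```python
def compositions(length, total):
    """Yield every tuple of `length` non-negative ints that sums to `total`."""
    if length == 0:
        if total == 0:
            yield ()          # the empty tuple is the only tuple summing to 0
        return
    if length == 1:
        yield (total,)        # the last slot must take whatever is left
        return
    for first in range(total + 1):
        # Fix the first element, then fill the rest with the remaining total.
        for rest in compositions(length - 1, total - first):
            yield (first,) + rest
```

For the original problem, `for x in compositions(11, 8):` would replace the ten nested loops, and the tuples come out in lexicographic order.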
Generic, recursive solution:
def get_lists_with_sum(length, my_sum):
    if my_sum == 0:
        return [[0 for _ in range(length)]]
    if not length:
        return [[]]
    elif length == 1:
        return [[my_sum]]
    else:
        lists = []
        for i in range(my_sum+1):
            rest = my_sum - i
            sublists = get_lists_with_sum(length-1, rest)
            for sl in sublists:
                sl.insert(0, i)
                lists.append(sl)
        return lists
print(get_lists_with_sum(11, 8))
I found this code on this site to find the second largest number:
def second_largest(numbers):
    m1, m2 = None, None
    for x in numbers:
        if x >= m1:
            m1, m2 = x, m1
        elif x > m2:
            m2 = x
    return m2
Source: Get the second largest number in a list in linear time
Is it possible to modify this code to find the second smallest number? So for example
print second_smallest([1, 2, 3, 4])
2
a = [6,5,4,4,2,1,10,1,2,48]
s = set(a)  # keeps only the distinct elements of the list/tuple (a set itself is unordered)
print(sorted(s)[1])  # sort the distinct values, then take the second one
The function can indeed be modified to find the second smallest:
def second_smallest(numbers):
    m1 = m2 = float('inf')
    for x in numbers:
        if x <= m1:
            m1, m2 = x, m1
        elif x < m2:
            m2 = x
    return m2
The old version relied on a Python 2 implementation detail that None is always sorted before anything else (so it tests as 'smaller'); I replaced that with using float('inf') as the sentinel, as infinity always tests as larger than any other number. Ideally the original function should have used float('-inf') instead of None there, to not be tied to an implementation detail other Python implementations may not share.
Demo:
>>> def second_smallest(numbers):
...     m1 = m2 = float('inf')
...     for x in numbers:
...         if x <= m1:
...             m1, m2 = x, m1
...         elif x < m2:
...             m2 = x
...     return m2
...
>>> print(second_smallest([1, 2, 3, 4]))
2
Outside of the function you found, it's almost as efficient to use the heapq.nsmallest() function to return the two smallest values from an iterable, and from those two pick the second (or last) value. I've included a variant of the unique_everseen() recipe to filter out duplicate numbers:
from heapq import nsmallest
from itertools import filterfalse
def second_smallest(numbers):
    s = set()
    sa = s.add
    un = (sa(n) or n for n in filterfalse(s.__contains__, numbers))
    return nsmallest(2, un)[-1]
Like the above implementation, this is an O(N) solution; maintaining the heap invariant at each step takes O(log K) time, but K is a constant here (2)!
Whatever you do, do not use sorting; that takes O(NlogN) time.
Or just use heapq:
import heapq
def second_smallest(numbers):
    return heapq.nsmallest(2, numbers)[-1]
second_smallest([1, 2, 3, 4])
# Output: 2
As per the Python in-built function sorted
sorted(my_list)[0]
gives back the smallest number, and sorted(my_list)[1] does accordingly for the second smallest, and so on and so forth.
My favourite way of finding the second smallest number is to eliminate the smallest number from the list; the minimum of what remains is then the second smallest element. The code for the task is as below:
mylist=[1,2,3,4]
smallest = min(mylist)  # compute the minimum once, not on every iteration
mylist=[x for x in mylist if x != smallest]  # drops every occurrence of the min element
print(min(mylist))
Solution that returns second unique number in list with no sort:
def sec_smallest(numbers):
    smallest = float('+inf')
    small = float('+inf')
    for i in numbers:
        if i < smallest:
            small = smallest
            smallest = i
        elif i < small and i != smallest:
            small = i
    return small
print('Sec_smallest:', sec_smallest([1, 2, -8, -8, -2, 0]))
Yes, except that code relies on a small quirk (that raises an exception in Python 3): the fact that None compares as smaller than a number.
Another value that works is float("-inf"), which is a number that is smaller than any other number.
If you use that instead of None, and just change -inf to +inf and > to <, there's no reason it wouldn't work.
Edit: another possibility would be to simply write -x in all the comparisons on x, e.g. do if -x <= m1: et cetera.
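The negation idea can also be applied from the outside, leaving the comparisons untouched. A small sketch, assuming the original function has been switched to the float('-inf') sentinel so that it runs on Python 3:

```python
def second_largest(numbers):
    # Same loop as the original, with float('-inf') in place of None.
    m1 = m2 = float('-inf')
    for x in numbers:
        if x >= m1:
            m1, m2 = x, m1
        elif x > m2:
            m2 = x
    return m2

def second_smallest(numbers):
    # The second smallest x is the second largest -x, negated back.
    return -second_largest(-x for x in numbers)
```

This keeps one copy of the scanning logic, at the cost of negating every value once.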
input_list = [3, 1, 4, 4, 5, 5, 5, 0, 2, 2]
#input_list = [6,6,6,6,6]
#input_list = [7, 2, 0, 9, -1, 8]
mi= min(input_list)
second_min = float('inf')
for i in input_list:
    if i != mi:
        if i<second_min:
            second_min=i
if second_min == float('inf'):
    print('not present')
else:
    print(second_min)
# Repeated numbers in the list are handled correctly.
I'd like to add another, more general approach:
Here's a recursive way of finding the i-th minimums of a given list of numbers
def find_i_minimums(numbers,i):
    minimum = float('inf')
    if i==0:
        return []
    less_than_i_minimums = find_i_minimums(numbers,i-1)
    for element in numbers:
        if element not in less_than_i_minimums and element < minimum:
            minimum = element
    return less_than_i_minimums + [minimum]
For example,
>>> find_i_minimums([0,7,4,5,21,2,6,1],3) # finding the 3 minimal values of the given list
[0, 1, 2]
( And if you want only the i-th minimum number you'd extract the final value of the list )
The time complexity of the above algorithm is bad though: it is O(N*i^2). The recursion depth is i, and at each recursive call we go over all N values in the 'numbers' list and check whether each one is in a list of length at most i-1, so the total cost is an arithmetic sum that gives the complexity mentioned above.
Here's a similar but alternative-implementation whose time-complexity is O(N*i) on average. It uses python's built-in 'set' data-structure:
def find_i_minimums(numbers,i):
    minimum = float('inf')
    if i==0:
        return set()
    less_than_i_minimums = find_i_minimums(numbers,i-1)
    for element in numbers:
        if element not in less_than_i_minimums and element < minimum:
            minimum = element
    return less_than_i_minimums.union({minimum})
If your 'i' is small, you can use the implementations above and then extract how many minimums you want ( or if you want the second minimum, then in your case run the code for i=2 and just extract the last element from the output data-structure ).
But if 'i' is for example greater than log(N) , I'd recommend sorting the list of numbers itself ( for example, using mergesort whose complexity is O(N*log(N)) at worst case ) and then taking the i-th element. Why so? because as stated, the run-time of the algorithm above is not great for larger values of 'i'.
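For comparison, the standard library can produce the same result. A minimal sketch using heapq.nsmallest on the distinct values; the function name ith_minimum and the 1-based meaning of i are assumptions of this example:

```python
import heapq

def ith_minimum(numbers, i):
    """Return the i-th smallest distinct value (i = 1 is the minimum)."""
    # nsmallest returns up to i values in ascending order.
    smallest = heapq.nsmallest(i, set(numbers))
    if len(smallest) < i:
        raise ValueError(f'fewer than {i} distinct values')
    return smallest[-1]
```

Called as ith_minimum([0,7,4,5,21,2,6,1], 3), it returns the last of the three minimal values from the example above.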
You might find this code easy and understandable
def secsmall(numbers):
    small = max(numbers)
    smallest = min(numbers)  # computed once instead of on every iteration
    for i in range(len(numbers)):
        if numbers[i]>smallest:
            if numbers[i]<small:
                small = numbers[i]
    return small
I am assuming "numbers" is a list name.
Find the first and the second smallest numbers in an integer array
arr= [1,2,3,4,5,6,7,-1,0,-2,-10]
def minSecondmin(arr,n):
    i=1
    if arr[i-1] < arr[i]:
        f = arr[i-1]
        s = arr[i]
    else:
        f=arr[i]
        s=arr[i-1]
    for i in range(2,n):
        if arr[i]<f:
            s=f
            f = arr[i]
        elif arr[i]<s:
            s=arr[i]
    return f,s
minSecondmin(arr,len(arr))
l = [41,9000,123,1337]
# second smallest
sorted(l)[1]
123
# second biggest
sorted(l)[-2]
1337
Here we want to keep an invariant while we scan the list of numbers, for every sublist it must be
m1<=m2<={all other elements}
the minimum length of a list for which the question (2nd smallest) is sensible is 2, so we establish the invariant examining the first and the second element of the list (no need for magic numbers), next we iterate on all the remaining numbers, maintaining our invariant.
def second_smaller(numbers):
    # if len(numbers)<2: return None or otherwise raise an exception
    m1, m2 = numbers[:2]
    if m2<m1: m1, m2 = m2, m1
    for x in numbers[2:]:
        if x <= m1:
            m1, m2 = x, m1
        elif x < m2:
            m2 = x
    return m2
Addendum
BTW, the same reasoning should be applied to the second_largest function mentioned by the OP
I am writing the code which is using recursion to find the second smallest element in a list.
def small(l):
    small.counter += 1
    min = l[0]
    emp = []
    for i in range(len(l)):
        if l[i] < min:
            min = l[i]
    for i in range(len(l)):
        if min == l[i]:
            emp.append(i)
    if small.counter == 2:
        print("The Second smallest element is:" + str(min))
    else:
        # remove every occurrence of the current minimum, then recurse
        for j in range(0, len(emp)):
            l.remove(min)
        small(l)
small.counter = 0
list=[-1-1-1-1-1-1-1-1-1,1,1,1,1,1]
small(list)
You can test it with various input integers.
There is an easy way to do it: first sort the list, then take the second item.
def solution(a_list):
    a_list.sort()
    print(a_list[1])
solution([1, 2, -8, -2, -10])
You can use the built-in function 'sorted':
def second_smallest(numbers):
    count = 0
    l = []
    for i in sorted(numbers):  # visit the values in ascending order
        if(i not in l):
            l.append(i)
            count+=1
        if(count==2):
            break
    return max(l)
To find the second smallest element in the list, you can use the following approach, which also works if two or more elements are repeated.
def second_smallest(numbers):
    s = sorted(set(numbers))
    return s[1]
Here is:
def find_second_smallest(a: list) -> int:
    first = second = float('inf')  # a single float cannot be unpacked into two names
    for i in range(len(a)):
        if a[i] < first:
            first, second = a[i], first
        elif a[i] < second and a[i] != first:
            second = a[i]
    return second
input: [1, 1, 1, 2]
output: 2
This code also works fine to find the second smallest number in a list.
First we sort the values in the list; after that we initialize the variable with the value at the second index.
l1 = [12,32,4,34,64,3,43]
for i in range(0,len(l1)):
    for j in range(0,i+1):
        if l1[i]<l1[j]:
            l1[i],l1[j]=l1[j],l1[i]
min_val = l1[1]
for k in l1:
    if min_val>k:
        break
print(min_val)
def SecondSmallest(x):
    lowest=min(x[0],x[1])
    lowest2 = max(x[0],x[1])
    for item in x:
        if item < lowest:
            lowest2 = lowest
            lowest = item
        elif lowest2 > item and item > lowest:
            lowest2 = item
    return lowest2
SecondSmallest([10,1,-1,2,3,4,5])
I have homework in which I must use recursion to find all occurrences of a number/letter/word in a list and return their indices in the original list. I have searched this site for previously answered questions, but I couldn't find any answer regarding recursion with the option to continue checking the list even after the first occurrence has been found.
should look like this pretty much:
>>> find_value( [4,7,5,3,2,5,3,7,8,6,5,6], 5)
[2,5,10]
my code so far goes like this:
def find_all(x,y):
    if len(x) == 1 and x[0] == y:
        return [i for i, y in enumerate(x)]
    return find_all(x[1:],y)
Though it only minimize the list and gives me the same [0] as the index.. which is true, for the divided list.. this way I will never get the original index..
Thanks
- if this already exist, I am sorry for i have searched and couldn't find.
Here is a simple non-recursive solution:
def find_value(l, lookfor):
    return [i for i, v in enumerate(l) if v == lookfor]
As a piece of advice for your homework -- just pass the progress through the list as an optional third argument to find_all:
def find_value(list, lookfor, position=0)
... and add one to position each time you recurse.
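Spelled out, that hint might look like the sketch below. It is one possible reading, not the only one: the parameter is named lst here (instead of list, to avoid shadowing the built-in), and the recursion walks the index forward instead of slicing the list.

```python
def find_value(lst, lookfor, position=0):
    # Base case: we have walked past the end of the list.
    if position == len(lst):
        return []
    # Solve the rest of the list first, one position further along.
    rest = find_value(lst, lookfor, position + 1)
    # Prepend the current index if this element matches.
    return ([position] if lst[position] == lookfor else []) + rest
```

Because the original list is never sliced, position is always an index into the original list. For very long lists, Python's recursion limit applies.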
The point of assigning a homework is usually so that you can explore the problem and learn from it. In this case, it is recursion which is usually hard for beginners.
The point of recursion is to construct an answer for a larger problem from a solution of a smaller ones. So it is best to start off with the smallest one possible:
def find_all(haystack, needle):
    if not haystack:
        # no occurrences can happen
        return []
If the list is not empty, we can check if the first element is what we are looking for:
    if haystack[0] == needle:
        occurrences = [0]  # the index of the first element is always 0
    else:
        occurrences = []
We will also need the solution to the smaller problem:
    recursive_occurences = find_all(haystack[1:], needle)
Now the problem you have noticed is that the indices that are returned are always 0. That's because they are indices in the smaller list. If an item has index 0 in a smaller list, it means its index in the largest list is actually 1 (this is the main part your program was missing), thus:
    for x in recursive_occurences:
        occurrences.append(x+1)
And return the complete answer:
    return occurrences
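Put together, the pieces above form the following function (the same steps, only assembled, with the first-element check condensed into one line):

```python
def find_all(haystack, needle):
    if not haystack:
        # an empty list contains no occurrences
        return []
    # index 0 is an occurrence if the first element matches
    occurrences = [0] if haystack[0] == needle else []
    # indices found in the shorter list are shifted by one in this list
    for x in find_all(haystack[1:], needle):
        occurrences.append(x + 1)
    return occurrences
```

On the list from the question, find_all([4,7,5,3,2,5,3,7,8,6,5,6], 5) gives [2, 5, 10].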
I hope this helps you a bit, so you can do your next homework on your own.
Here are several solutions:
In one go, ugly, but working:
def find_value(lst, elt):
    return [x + 1
            for x in ([] if not lst else
                      (([-1] if lst[0] == elt else []) +
                       find_value(lst[1:], elt)))]
prettier, but with hidden index param:
def find_value(lst, elt, idx=0):
    return [] if not lst else \
        (([idx] if lst[0] == elt else []) +
         find_value(lst[1:], elt, idx + 1))
pretty?, long with inner recursive function... more maintainable ?
def find_value(lst, elt):
    def _rec(lst, elt, idx):
        if not lst:
            return []
        res = [idx] if lst[0] == elt else []
        return res + _rec(lst[1:], elt, idx + 1)
    return _rec(lst, elt, idx=0)
There is a very simple solution to this problem, even if you are using recursion to solve the assignment:
>>> def find_value(array, value):
...     *head, tail = array
...     array = find_value(head, value) if head else []
...     return array + [len(head)] if tail == value else array
...
>>> find_value([4, 7, 5, 3, 2, 5, 3, 7, 8, 6, 5, 6], 5)
[2, 5, 10]
>>>