Most efficient way to remove entries in a list


I have a massive 4D data set, spread throughout 4 variables, x_list, y_list, z_list, and i_list. Each is a list of N scalars, with X, Y, and Z representing the point's position in space, and I representing intensity.
I already have a function that picks through and marks negligible points (those whose intensity is too low) for deletion, by setting their intensity to 0. However, when I run this on my 2-million point set, the deletion process takes hours.
Currently, I am using the .pop(index) method to remove the data points, because it does so very cleanly. Here is the code:
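counter = 0
i = 0
# A while loop is used because popping from i_list while a for loop
# iterates over it would make the loop skip elements
while i < len(i_list):
    if i_list[i] == 0:
        x_list.pop(i)
        y_list.pop(i)
        z_list.pop(i)
        i_list.pop(i)
        counter += 1
        print(counter, "points removed")
    else:
        i += 1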
How can I do this more efficiently?

I think it'll be faster to create a new empty list for each existing list, and append items to them if i_list[i] != 0. Look up the time complexity of the operations you're doing, and you'll see that deleting an item from the middle of a list is O(n), whereas appending is amortized O(1). Currently you're doing a lot of O(n) deletes with a pretty large n, which will be very slow.
So something like:
new_x = []
new_y = []
new_z = []
new_i = []
for index in range(len(i_list)):
    if i_list[index] != 0:
        new_x.append(x_list[index])
        new_y.append(y_list[index])
        new_z.append(z_list[index])
        new_i.append(i_list[index])
Going further, you should look into numpy arrays, where subsetting to find the set of items where i_list != 0 would be very fast.
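A minimal NumPy sketch of that, assuming the four lists have equal length and NumPy is installed (the sample values below are made up): convert each list to an array once, build a boolean mask from the intensities, and index every array with the same mask.

```python
import numpy as np

# Made-up sample data; in practice these would be the 2-million-element lists
x_list = [0.0, 1.0, 2.0, 3.0]
y_list = [0.5, 1.5, 2.5, 3.5]
z_list = [9.0, 8.0, 7.0, 6.0]
i_list = [0.0, 4.2, 0.0, 1.1]

x = np.asarray(x_list)
y = np.asarray(y_list)
z = np.asarray(z_list)
intensity = np.asarray(i_list)

# Boolean mask: True where the point should be kept
keep = intensity != 0

# Indexing with the mask builds new, filtered arrays in one vectorized step each
x, y, z, intensity = x[keep], y[keep], z[keep], intensity[keep]

print(x.tolist(), intensity.tolist())  # [1.0, 3.0] [4.2, 1.1]
```

The mask is computed once and reused, so the whole filter is a handful of vectorized passes instead of millions of Python-level pops.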

You should use del:
array = [1, 2, 3]
del array[0]
gives: [2, 3]
And most importantly, calling print() inside a loop over a large data set will kill your performance: a large share of the time is spent on printing. Here's an example:
>>> from time import time
>>> def test1(n):
...     for i in range(n):
...         print(i)
...
>>> def test2(n):
...     for i in range(n):
...         i += 1
...
>>> def wrapper():
...     t1 = time()
...     test1(1000)
...     t2 = time()
...     test2(1000)
...     t3 = time()
...     print("Test1: %s\ntest2: %s: " % (t2-t1, t3-t2))
...
>>> wrapper()
And the output is:
(lots of numbers)
Test1: 0.46030712127685547
test2: 0.0:

This is a job for the happy list comprehension!
x_prime_list = [x for (index, x) in enumerate(x_list)
                if i_list[index] != 0]
This pairs up members of x_list with their ordinal position using enumerate(). It puts each member x into the new list if and only if i_list[index] is not zero (otherwise it adds nothing to the list).
The advantage list comprehensions have over the equivalent code you posted is that the interpreter handles the looping and appending internally, instead of looking up and calling the .append method on every iteration, so there is less per-item overhead.
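Since all four lists have to be filtered by the same condition, one option (a sketch, assuming the lists have equal length; the sample data are made up) is to walk them together with zip() in a single pass and then unzip the survivors:

```python
# Made-up sample data: four parallel lists of equal length
x_list = [0.0, 1.0, 2.0, 3.0]
y_list = [0.5, 1.5, 2.5, 3.5]
z_list = [9.0, 8.0, 7.0, 6.0]
i_list = [0.0, 4.2, 0.0, 1.1]

# Keep each (x, y, z, i) tuple whose intensity is non-zero, in one pass
kept = [p for p in zip(x_list, y_list, z_list, i_list) if p[3] != 0]

# Unzip back into four lists; handle the case where nothing survives
if kept:
    x_list, y_list, z_list, i_list = map(list, zip(*kept))
else:
    x_list, y_list, z_list, i_list = [], [], [], []

print(x_list, i_list)  # [1.0, 3.0] [4.2, 1.1]
```

This reads the intensity list only once, instead of once per coordinate list as four separate comprehensions would.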

Related

If the input number is in the list add its index to a new one

I want to check if the input number is in the list, and if so - add its index in the original list to the new one. If it's not in the list - I want to add a -1.
I tried using the for loop and adding it like that, but it is kind of bad on the speed of the program.
n = int(input())
k = [int(x) for x in input().split()]
z = []
m = int(input())
for i in range(m):
    a = int(input())
    if a in k: z.append(k.index(a))
    else: z.append(-1)
The input should look like this:
3
2 1 3
1
8
3
And the output should be:
1
-1
2
How can I do what I'm trying to do more efficiently/quickly?

There are many approaches to this problem. This is typical when you're first starting out in programming: the simpler the problem, the more options you have. Which option to choose depends on what you have and what you want.
In this case we're expecting user input of this form:
3
2 1 3
1
8
3
One approach is to build a dict to use for lookups instead of using list operations. Dict lookups take constant time on average, whereas a in k and k.index(a) each scan the list, so this gives better performance overall. You can use enumerate to get both the index i and the value x from the list read from user input. Then use int(x) as the key and associate it with the index.
The key should always be the data you have, and the value should always be the data you want. (We have a value, we want the index)
n = int(input())
k = {}
for i, x in enumerate(input().split()):
    k[int(x)] = i
z = []
for i in range(n):
    a = int(input())
    if a in k:
        z.append(k[a])
    else:
        z.append(-1)
print(z)
k looks like:
{2: 0, 1: 1, 3: 2}
This way you can look up k[3] and get 2 in O(1) (constant) time on average.
(See: Python: List vs Dict for look up table)
There is a structure known as defaultdict which allows you to specify behaviour when a key is not present in the dictionary. This is particularly helpful in this case, as we can just request from the defaultdict and it will return the desired value either way.
from collections import defaultdict
n = int(input())
k = defaultdict(lambda: -1)
for i, x in enumerate(input().split()):
    k[int(x)] = i
z = []
for i in range(n):
    a = int(input())
    z.append(k[a])
print(z)
While this does not speed up your program, it does make your second for loop easier to read. It also makes it easier to move into the comprehension in the next section.
(See: How does collections.defaultdict work?)
With these things in place, we can use, yes, a list comprehension to speed up the construction of z very slightly. (See: Are list-comprehensions and functional functions faster than “for loops”?)
from collections import defaultdict
n = int(input())
k = defaultdict(lambda: -1)
for i, x in enumerate(input().split()):
    k[int(x)] = i
z = [k[int(input())] for i in range(n)]
print(z)
All code snippets print z as a list:
[1, -1, 2]
See Printing list elements on separated lines in Python if you'd like different print outs.
Note: The index function will find the index of the first occurrence of the value in a list. Because of the way the dict is built, the index of the last occurrence will be stored in k. If you need to mimic index exactly you should ensure that a later index does not overwrite a previous one.
for i, x in enumerate(input().split()):
    x = int(x)
    if x not in k:
        k[x] = i
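Putting that note into a self-contained function might look like the sketch below (the name lookup_indices is hypothetical, and the lists are passed in directly instead of being read with input()). dict.setdefault stores an index only the first time a value is seen, which mirrors list.index:

```python
def lookup_indices(values, queries):
    """Return the index of each query's first occurrence in values, or -1."""
    first_index = {}
    for i, v in enumerate(values):
        # setdefault leaves an existing entry alone, so the first occurrence wins
        first_index.setdefault(v, i)
    # dict.get with a default replaces the try/except or the defaultdict
    return [first_index.get(q, -1) for q in queries]


print(lookup_indices([2, 1, 3], [1, 8, 3]))  # [1, -1, 2]
print(lookup_indices([5, 7, 5], [5, 7]))     # [0, 1]
```

Using dict.get also avoids a side effect of defaultdict, which inserts every missing key it is asked about.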

Adapt this solution for your problem.
def test(list1,value):
    try:
        return list1.index(value)
    except ValueError as e:
        print(e)
        return -1
list1=[2, 1, 3]
in1 = [1,8,3]
res= [test(list1,i) for i in in1]
print(res)
Output:
8 is not in list
[1, -1, 2]

Matching two lists containing slightly differing float values by allowing a tolerance

I have two sorted lists containing float values. The first contains the values I am interested in (l1) and the second list contains values I want to search (l2). However, I am not looking for exact matches and I am tolerating differences based on a function. Since I have to do this search very often (>>100000) and the lists can be quite large (~5000 and ~200000 elements), I am really interested in runtime. At first, I thought I could somehow use numpy.isclose(), but my tolerance is not fixed; it depends on the value of interest. Several nested for loops work, but are really slow. I am sure that there is some efficient way to do this.
#check if two floats are close enough to match
def matching(mz1, mz2):
    if abs( (1-mz1/mz2) * 1000000) <= 2:
        return True
    return False
#imagine another huge for loop around everything
l1 = [132.0317, 132.8677, 132.8862, 133.5852, 133.7507]
l2 = [132.0317, 132.0318, 132.8678, 132.8861, 132.8862, 133.5851999, 133.7500]
d = {i:[] for i in l1}
for i in l1:
    for j in l2:
        if matching(i, j):
            d[i].append(j)
fyi: As an alternative to the matching function, I could also create a dictionary first, mapping the values of interest from l1 to the window (min, max) I would allow, e.g. {132.0317: (132.0314359366, 132.0319640634), ...}, but I think checking for each value from l2 whether it lies within one of the windows from this dictionary would be even slower...
This would be how to generate the dictionary containing min/max values for each value from l1:
def calcMinMaxMZ(mz, delta_ppm=2):
    minmz = mz - mz * delta_ppm / 1000000
    maxmz = mz + mz * delta_ppm / 1000000
    return minmz, maxmz
minmax_d = {mz:calcMinMaxMZ(mz, delta_ppm=2) for mz in l1}
The result may be a dictionary like this:
d = {132.0317: [132.0317, 132.0318], 132.8677: [132.8678], 132.8862: [132.8862, 132.8861], 133.5852: [133.5851999], 133.7507: []}
But in practice I do much more when there is a match.
Any help is appreciated!

I re-implemented the for loop using itertools. For this to work, the inputs must be sorted. For the benchmark I generated 1000 items from the range [130.0, 135.0] for l1 and 100_000 items from the same range for l2:
from timeit import timeit
from itertools import tee
from random import uniform
#check if two floats are close enough to match
def matching(mz1, mz2):
    if abs( (1-mz1/mz2) * 1000000) <= 2:
        return True
    return False
#imagine another huge for loop around everything
l1 = sorted([uniform(130.00, 135.00) for _ in range(1000)])
l2 = sorted([uniform(130.00, 135.00) for _ in range(100_000)])
def method1():
    d = {i:[] for i in l1}
    for i in l1:
        for j in l2:
            if matching(i, j):
                d[i].append(j)
    return d
def method2():
    iter_2, last_match = tee(iter(l2))
    d = {}
    for i in l1:
        d.setdefault(i, [])
        found = False
        while True:
            j = next(iter_2, None)
            if j is None:
                break
            if matching(i, j):
                d[i].append(j)
                if not found:
                    iter_2, last_match = tee(iter_2)
                    found = True
            else:
                if found:
                    break
        iter_2, last_match = tee(last_match)
    return d
print(timeit(lambda: method1(), number=1))
print(timeit(lambda: method2(), number=1))
Prints:
16.900722101010615
0.030588202003855258

If you transpose your formula to produce a range of mz2 values for a given mz1, you could use a binary search to find the first match in the sorted l2 list, then work your way up sequentially until you reach the end of the range.
def getRange(mz1):
    minimum = mz1/(1+2/1000000)
    maximum = mz1/(1-2/1000000)
    return minimum,maximum
l1 = [132.0317, 132.8677, 132.8862, 133.5852, 133.7507]
l2 = [132.0317, 132.0318, 132.8678, 132.8862, 132.8861, 133.5851999, 133.7500]
l2 = sorted(l2)
from bisect import bisect_left
d = { mz1:[] for mz1 in l1 }
for mz1 in l1:
    lo,hi = getRange(mz1)
    i = bisect_left(l2,lo)
    while i < len(l2) and l2[i]<= hi:
        d[mz1].append(l2[i])
        i+=1
Sorting l2 will cost O(N log N), and building the dictionary will cost O(M log N + K), where N is len(l2), M is len(l1) and K is the total number of matches. You will only be applying the tolerance/range formula M times instead of N*M times, which should save a lot of processing.
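A variant of the same idea, sketched under the assumption that you only need the matching values themselves: bisect_right finds the end of each window as well, so every window becomes a single slice of l2. The bounds are the ones from getRange() above; window_matches is a hypothetical name.

```python
from bisect import bisect_left, bisect_right


def window_matches(l1, l2, ppm=2):
    """Map each value of l1 to the values of l2 that lie within +/- ppm of it."""
    l2 = sorted(l2)  # bisect only works on sorted data; sort a copy to be safe
    d = {}
    for mz1 in l1:
        # Same bounds as getRange(): mz2 in [mz1/(1+ppm/1e6), mz1/(1-ppm/1e6)]
        lo = mz1 / (1 + ppm / 1_000_000)
        hi = mz1 / (1 - ppm / 1_000_000)
        # Two binary searches delimit the window; the slice copies only the matches
        d[mz1] = l2[bisect_left(l2, lo):bisect_right(l2, hi)]
    return d


l1 = [132.0317, 132.8677, 132.8862, 133.5852, 133.7507]
l2 = [132.0317, 132.0318, 132.8678, 132.8862, 132.8861, 133.5851999, 133.7500]
d = window_matches(l1, l2)
print(d)
# {132.0317: [132.0317, 132.0318], 132.8677: [132.8678], 132.8862: [132.8861, 132.8862], 133.5852: [133.5851999], 133.7507: []}
```

If you do more work per match, iterate over the slice instead of storing it.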

Your lists are already sorted, so you could use a paradigm similar to the "merge" step of merge sort: keep an index into each list (idx1 and idx2), and while the current pair is acceptable, process it and advance only idx2; once it isn't, advance idx1.
d = {i:[] for i in l1}
idx1, idx2 = 0, 0
while idx1 < len(l1):
    # check the bound first, so l2[idx2] is never read past the end
    while idx2 < len(l2) and matching(l1[idx1], l2[idx2]):
        d[l1[idx1]].append(l2[idx2])
        idx2 += 1
    idx1 += 1
print(d)
# {132.0317: [132.0317, 132.0318], 132.8677: [132.8678], 132.8862: [132.8861, 132.8862], 133.5852: [133.5851999], 133.7507: []}
This is O(len(l1) + len(l2)), since each index advances at most once per element of its list.
The big caveat here is that this never "steps back": if the current element of l1 matches the current element of l2 and the next element of l1 would also match that element of l2, the latter match does not get listed. It also never skips an element of l2 that lies below the current window, so a single non-matching low value in l2 stalls idx2 for the rest of the run. Fixing this might require adding some sort of "look-back" functionality (which could drive the complexity up by a factor of n in the worst case, but would still be quicker than iterating through both lists repeatedly). However, it does work for your given dataset.
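One way to get that look-back without losing linear behaviour, sketched under the assumption that both lists are sorted ascending: because l1 is sorted, the lower bound of each tolerance window only moves forward, so a start pointer into l2 never needs to step back, while a second pointer scans the current window. Overlapping windows then share matches, and low values in l2 are skipped instead of stalling. The function name merge_matches is hypothetical; the bounds reuse the getRange() formula from the bisect answer.

```python
def merge_matches(l1, l2, ppm=2):
    """Two-pointer sweep over sorted l1 and sorted l2; overlapping windows allowed."""
    d = {}
    start = 0  # first index in l2 that could still match this or a later mz1
    for mz1 in l1:
        lo = mz1 / (1 + ppm / 1_000_000)
        hi = mz1 / (1 - ppm / 1_000_000)
        # Skip l2 values below this window; they are below every later window too
        while start < len(l2) and l2[start] < lo:
            start += 1
        # Scan the window without moving start, so the next mz1 can reuse these values
        j = start
        matches = []
        while j < len(l2) and l2[j] <= hi:
            matches.append(l2[j])
            j += 1
        d[mz1] = matches
    return d


l1 = [132.0317, 132.8677, 132.8862, 133.5852, 133.7507]
l2 = [132.0317, 132.0318, 132.8678, 132.8861, 132.8862, 133.5851999, 133.7500]
print(merge_matches(l1, l2))
```

The total work is O(len(l1) + len(l2) + K), where K is the number of (mz1, mz2) pairs reported, because start only moves forward and the inner scan touches only values that end up in the result (plus one stopping comparison per mz1).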

python intersection of lists while not having the same index

I have a curious case, and after some time I have not come up with an adequate solution.
Say you have two lists and you need to find items that have the same index.
x = [1,4,5,7,8]
y = [1,3,8,7,9]
I am able to get a correct intersection of those which appear in both lists with the same index by using the following:
matches = [i for i, (a,b) in enumerate(zip(x,y)) if a==b]
This would return:
[0,3]
I am able to get a simple intersection of both lists with the following (and in many other ways, this is just an example):
intersected = set(x) & set(y)
This would return this set:
{1, 7, 8}
Here's the question. I'm looking for ideas on how to get a list of items (as in the second list) that excludes the matches above, i.e. items that appear in both lists but not at the same position.
In other words, I'm looking for items in x that do not share the same index in y.
The desired result would be the index position of "8" in y, or [2]
Thanks in advance

You're so close: iterate through y; look for a value that is in x, but not at the same position:
offset = [i for i, a in enumerate(y) if a in x and a != x[i] ]
Result:
[2]
Including the suggested upgrade from pault, with respect to Martijn's comment ... the pre-processing reduces the complexity, in case of large lists:
>>> both = set(x) & set(y)
>>> offset = [i for i, a in enumerate(y) if a in both and a != x[i] ]
As PaulT pointed out, this is still quite readable at OP's posted level.

I'd create a dictionary of indices for the first list, then use that to test if the second value is a) in that dictionary, and b) the current index is not present:
def non_matching_indices(x, y):
    x_indices = {}
    for i, v in enumerate(x):
        x_indices.setdefault(v, set()).add(i)
    return [i for i, v in enumerate(y) if i not in x_indices.get(v, {i})]
The above takes O(len(x) + len(y)) time; a single full scan through the one list, then another full scan through the other, where each test to include i is done in constant time.
You really don't want to use a value in x containment test here, because that requires a scan (a loop) over x to see whether that value is in the list or not. That takes O(len(x)) time, and you do that for each value in y, which means that the function takes O(len(x) * len(y)) time.
You can see the speed differences when you run a time trial with a larger list filled with random data:
>>> import random, timeit
>>> def using_in_x(x, y):
...     return [i for i, a in enumerate(y) if a in x and a != x[i]]
...
>>> x = random.sample(range(10**6), 1000)
>>> y = random.sample(range(10**6), 1000)
>>> for f in (using_in_x, non_matching_indices):
...     timer = timeit.Timer("f(x, y)", f"from __main__ import f, x, y")
...     count, total = timer.autorange()
...     print(f"{f.__name__:>20}: {total / count * 1000:6.3f}ms")
...
          using_in_x: 10.468ms
non_matching_indices:  0.630ms
So with two lists of 1000 numbers each, using value in x testing takes roughly 16 times as long to complete the task.

x = [1,4,5,7,8]
y = [1,3,8,7,9]
result = []
for e in x:
    if e in y and x.index(e) != y.index(e):
        result.append((x.index(e), y.index(e), e))
print(result) # list of (x_position, y_position, value) tuples
This version goes item by item through the first list and checks whether the item is also in the second list. If it is, it compares the indices for the found item in both lists and if they are different then it stores both indices and the item value as a tuple with three values in the result list.
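Because e in y, x.index(e) and y.index(e) each scan a list, the loop above is quadratic. A sketch that produces the same (x_position, y_position, value) tuples in linear time, assuming the values are hashable, precomputes the first index of each value in both lists (first occurrence, matching what list.index returns). The helper name first_positions is hypothetical.

```python
x = [1, 4, 5, 7, 8]
y = [1, 3, 8, 7, 9]


def first_positions(seq):
    """Map each value to the index of its first occurrence, like seq.index()."""
    pos = {}
    for i, v in enumerate(seq):
        pos.setdefault(v, i)
    return pos


x_pos = first_positions(x)
y_pos = first_positions(y)

result = []
for e in x:
    # Dict lookups replace the linear scans done by `in` and .index()
    if e in y_pos and x_pos[e] != y_pos[e]:
        result.append((x_pos[e], y_pos[e], e))

print(result)  # [(4, 2, 8)]
```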

Python: Fastest Way to compare arrays elementwise

I am looking for the fastest way to output the index of the first difference of two arrays in Python. For example, let's take the following two arrays:
test1 = [1, 3, 5, 8]
test2 = [1]
test3 = [1, 3]
Comparing test1 and test2, I would like to output 1, while the comparison of test1 and test3 should output 2.
In other words, I'm looking for an equivalent of the following statement that also works with varying array lengths:
import numpy as np
np.where(np.where(test1 == test2, test1, 0) == '0')[0][0]
Any help is appreciated.

For lists this works:
from itertools import zip_longest
def find_first_diff(list1, list2):
    for index, (x, y) in enumerate(zip_longest(list1, list2,
                                               fillvalue=object())):
        if x != y:
            return index
zip_longest pads the shorter list with None or with a provided fill value. The standard zip does not work if the difference is caused by different list lengths rather than actual different values in the lists.
On Python 2 use izip_longest.
Updated: Created unique fill value to avoid potential problems with None as list value. object() is unique:
>>> o1 = object()
>>> o2 = object()
>>> o1 == o2
False
This pure Python approach might be faster than a NumPy solution. This depends on the actual data and other circumstances.
Converting a list into a NumPy array also takes time. This might actually take longer than finding the index with the function above. If you are not going to use the NumPy array for other calculations, the conversion might cause considerable overhead.
NumPy always searches the full array. If the difference comes early, you do a lot more work than you need to.
NumPy creates a bunch of intermediate arrays. This costs memory and time.
NumPy needs to construct intermediate arrays with the maximum length. Comparing many small with very large arrays is unfavorable here.
In general, in many cases NumPy is faster than a pure Python solution. But each case is a bit different and there are situations where pure Python is faster.

With NumPy arrays (which will be faster for big arrays), you could first check the lengths and then check the overlapping parts, slicing the longer array to the length of the shorter one, something like the following:
import numpy as np
# Convert to arrays so that != compares element-wise (a no-op for arrays)
a, b = np.asarray(test1), np.asarray(test2)
n = min(len(a), len(b))
x = np.where(a[:n] != b[:n])[0]
if len(x) > 0:
    ans = x[0]
elif len(a) != len(b):
    ans = n
else:
    ans = None
EDIT - despite this being voted down I will leave my answer up here in case someone else needs to do something similar.
If the starting arrays are large and already NumPy arrays, this is the fastest method. I also had to modify Andy's code to get it to work. In order: 1. my suggestion, 2. Paidric's (now removed, but the most elegant), 3. Andy's accepted answer, 4. zip (non-NumPy), 5. vanilla Python without zip, as per leekaiinthesky
0.1ms, 9.6ms, 0.6ms, 2.8ms, 2.3ms
If the conversion to ndarray is included in the timing, then the non-NumPy, no-zip method is fastest:
7.1ms, 17.1ms, 7.7ms, 2.8ms, 2.3ms
and even more so if the difference between the two lists is at around index 1,000 rather than 10,000
7.1ms, 17.1ms, 7.7ms, 0.3ms, 0.2ms
import timeit
setup = """
import numpy as np
from itertools import zip_longest
list1 = [1 for i in range(10000)] + [4, 5, 7]
list2 = [1 for i in range(10000)] + [4, 4]
test1 = np.array(list1)
test2 = np.array(list2)
def find_first_diff(l1, l2):
    for index, (x, y) in enumerate(zip_longest(l1, l2, fillvalue=object())):
        if x != y:
            return index
def findFirstDifference(list1, list2):
    minLength = min(len(list1), len(list2))
    for index in range(minLength):
        if list1[index] != list2[index]:
            return index
    return minLength
"""
fn = ["""
n = min(len(test1), len(test2))
x = np.where(test1[:n] != test2[:n])[0]
if len(x) > 0:
    ans = x[0]
elif len(test1) != len(test2):
    ans = n
else:
    ans = None""",
"""
x = np.where(np.in1d(list1, list2) == False)[0]
if len(x) > 0:
    ans = x[0]
else:
    ans = None""",
"""
x = test1
y = np.resize(test2, x.shape)
x = np.where(np.where(x == y, x, 0) == 0)[0]
if len(x) > 0:
    ans = x[0]
else:
    ans = None""",
"""
ans = find_first_diff(list1, list2)""",
"""
ans = findFirstDifference(list1, list2)"""]
for f in fn:
    print(timeit.timeit(f, setup, number = 1000))

Here is one way to do it:
def compare_lists(lista, listb):
    """
    Compare two lists and return the first index where they differ. If
    they are equal, return the list length.
    """
    for position, (a, b) in enumerate(zip(lista, listb)):
        if a != b:
            return position
    return min([len(lista), len(listb)])
The algorithm is simple: zip the two lists (in Python 3 zip is already lazy; on Python 2 use the more efficient itertools.izip), then compare them element by element.
The enumerate function gives the index position, which we can return if a discrepancy is found.
If we exit the for loop without returning, one of two things is true:
The two lists are identical. In this case, we want to return the length of either list.
The lists are of different lengths and are equal up to the length of the shorter list. In this case, we want to return the length of the shorter list.
In either case, the min(...) expression is what we want.
This function has a bug: if you compare two empty lists, it returns 0, which seems wrong. I'll leave it to you to fix it as an exercise.

The fastest algorithm would compare every element up to the first difference and no more. So iterating through the two lists pairwise like this would give you:
def findFirstDifference(list1, list2):
    minLength = min(len(list1), len(list2))
    for index in range(minLength):
        if list1[index] != list2[index]:
            return index
    return minLength # the two lists agree where they both have values, so return the next index
Which gives the output you want:
print(findFirstDifference(test1, test3))
> 2

Thanks for all of your suggestions, I just found a much simpler way for my problem which is:
import numpy as np
x = np.array(test1)
y = np.resize(np.array(test2), x.shape)
np.where(np.where(x == y, x, 0) == 0)[0][0]

Here's an admittedly not very pythonic, numpy-free stab:
b = list(zip(test1, test2))
c = 0
while b:
    if b[0][0] != b[0][1]:
        break
    c = c + 1
    b = b[1:]
print(c)

For Python 3.x:
def first_diff_index(ls1, ls2):
l = min(len(ls1), len(ls2))
return next((i for i in range(l) if ls1[i] != ls2[i]), l)
(on Python 2, substitute xrange for range)

Find the smallest positive number not in list

I have a list in python like this:
myList = [1,14,2,5,3,7,8,12]
How can I easily find the first unused value? (in this case '4')

I came up with several different ways:
Iterate the first number not in set
I didn't want to get the shortest code (which might be the set-difference trickery) but something that could have a good running time.
This might be one of the best approaches proposed here; my tests show that it can be substantially faster than the set-difference approach, especially if the gap is near the beginning:
from itertools import count, filterfalse # ifilterfalse on py2
A = [1,14,2,5,3,7,8,12]
print(next(filterfalse(set(A).__contains__, count(1))))
The list is turned into a set, whose __contains__(x) method corresponds to x in A. count(1) creates a counter that counts up from 1 indefinitely. filterfalse then consumes numbers from the counter until it finds one that is not in the set, and next() returns that first number.
Timing for len(a) = 100000, randomized and the sought-after number is 8:
>>> timeit(lambda: next(filterfalse(set(a).__contains__, count(1))), number=100)
0.9200698399945395
>>> timeit(lambda: min(set(range(1, len(a) + 2)) - set(a)), number=100)
3.1420603669976117
Timing for len(a) = 100000, ordered and the first free is 100001
>>> timeit(lambda: next(filterfalse(set(a).__contains__, count(1))), number=100)
1.520096342996112
>>> timeit(lambda: min(set(range(1, len(a) + 2)) - set(a)), number=100)
1.987783643999137
(note that this is Python 3, where range behaves like Python 2's xrange)
Use heapq
The asymptotically good answer: heapq with enumerate
from heapq import heapify, heappop
from functools import partial
# A = [1,2,3] also works
A = [1,14,2,5,3,7,8,12]
end = 2 ** 61 # these are different and neither of them can be the
sentinel = 2 ** 62 # first gap (unless you have 2^64 bytes of memory).
heap = list(A)
heap.append(end)
heapify(heap)
print(next(n for n, v in enumerate(
iter(partial(heappop, heap), sentinel), 1) if n != v))
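Now, the one above is asymptotically attractive, but even though CPython's heapq has a C implementation, the per-element heappop calls and the generator expression still add Python-level overhead, so it is most probably slower than alternatives that do most of their work in C (building a set, sorting).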
Just sort and enumerate to find the first not matching
Or the simple answer with good constants for O(n lg n)
next(i for i, e in enumerate(sorted(A) + [ None ], 1) if i != e)
This might be fastest of all if the list is almost sorted, because of how Python's Timsort works, but for randomized input the set-difference approach and iterating for the first number not in the set are faster.
The + [ None ] is necessary for the edge cases of there being no gaps (e.g. [1,2,3]).

This makes use of set difference (the range goes up to len(l) + 1, so a list without gaps still leaves one candidate):
>>> l = [1,2,3,5,7,8,12,14]
>>> m = range(1,len(l)+2)
>>> min(set(m)-set(l))
4

I would suggest using a generator expression with enumerate to determine the missing element. The list has to be sorted for this to work:
>>> next(a for a, b in enumerate(sorted(myList), min(myList)) if a != b)
4
enumerate pairs each element with a running index, so the goal is to find the first element that differs from its index.
Note that this does not assume the values start at any particular number (here, 1); if they always start at 1, you can simplify the expression to:
>>> next(a for a, b in enumerate(sorted(myList), 1) if a != b)
4

A for loop over the list will do it.
l = [1,14,2,5,3,7,8,12]
for i in range(1, max(l) + 2):  # + 2 so the loop always breaks, even with no gap
    if i not in l: break
print(i) # result 4

Don't know how efficient, but why not use an xrange as a mask and use set minus?
>>> myList = [1,14,2,5,3,7,8,12]
>>> min(set(xrange(1, len(myList) + 1)) - set(myList))
4
You're only creating a set as big as myList, so it can't be that bad :)
This won't work for "full" lists:
>>> myList = range(1, 5)
>>> min(set(xrange(1, len(myList) + 1)) - set(myList))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: min() arg is an empty sequence
But the fix to return the next value is simple (add one more to the masked set):
>>> min(set(xrange(1, len(myList) + 2)) - set(myList))
5

import itertools as it
next(i for i in it.count(1) if i not in mylist)
I like this because it reads very closely to what you're trying to do: "start counting at 1, keep going until you reach a number that isn't in the list, then tell me that number". However, this is quadratic, since testing i not in mylist is linear.
Solutions using enumerate are linear, but rely on the list being sorted and no value being repeated. Sorting first makes it O(n log n) overall, which is still better than quadratic. Alternatively, you could put the values into a set first (repeated values do no harm there):
myset = set(mylist)
next(i for i in it.count(1) if i not in myset)
Since set containment checks are roughly constant time, this will be linear overall.

I just solved this in a probably non pythonic way
def solution(A):
    # Const-ish to improve readability
    MIN = 1
    if not A: return MIN
    # Save re-computing MAX
    MAX = max(A)
    # Check every candidate from 1 up to (but not including) the maximum
    for num in range(1, MAX):
        # The first missing candidate is the smallest one, so return it
        # (to find the greatest missing number instead, count backwards from MAX)
        if num not in A: return num
    # No gap below MAX: return MAX + 1, or MIN if every value is < 1
    return max(MIN, MAX+1)
I think it reads quite well

My effort, no itertools. It assumes the list is sorted, and sets "current" to one less than the value you are expecting.
values = [1,2,3,4,5,7,8]
current = values[0]-1
for i in values:
    if i != current+1:
        print(current+1)
        break
    current = i

The naive way is to traverse the list, which is an O(n) solution. However, if the list is sorted (and has no duplicates), you can take advantage of that to perform a modified binary search. Basically, you are looking for the last position i where A[i] == i + 1 (with zero-based indices).
In Python, this looks something like:
def binarysearch(A):
    start = 0
    end = len(A) - 1
    result = 0  # stays 0 if A[0] != 1, so the answer is 1
    while start <= end:
        mid = (start + end) // 2
        if A[mid] == mid + 1:
            result = A[mid]
            start = mid + 1
        else:  # A[mid] > mid + 1, since A[mid] can't be less than mid + 1
            end = mid - 1
    return result + 1
This is an O(log n) solution. It assumes the list is sorted and contains distinct positive integers, so that A[i] can never be smaller than i + 1.
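EDIT: if the list is not sorted, you can use Python's heapq module to store the list in a min-heap and then pop the elements one by one (this also assumes there are no duplicates):
import heapq

def first_missing(A):
    H = list(A)          # copy, so A itself is not reordered
    heapq.heapify(H)     # heapify works in place and returns None
    count = 1
    for _ in range(len(A)):
        if heapq.heappop(H) != count:
            return count
        count += 1
    return count         # no gap: one past the largest value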

sort + reduce to the rescue!
from functools import reduce # python3
myList = [1,14,2,5,3,7,8,12]
res = 1 + reduce(lambda x, y: x if y-x>1 else y, sorted(myList), 0)
print(res)
Unfortunately, it won't stop after a match is found and will iterate over the whole list.
Faster (but less fun) is to use for loop:
myList = [1,14,2,5,3,7,8,12]
res = 0
for num in sorted(myList):
    if num - res > 1:
        break
    res = num
res = res + 1
print(res)

You can try this:
for i in range(1,max(arr1)+2):
    if i not in arr1:
        print(i)
        break

Easy to read, easy to understand, gets the job done:
def solution(A):
    smallest = 1
    unique = set(A)
    # Walk the values in ascending order; set iteration order is not guaranteed
    for value in sorted(unique):
        if value == smallest:
            smallest += 1
    return smallest

Keep incrementing a counter in a loop until you find the first positive integer that's not in the list.
def getSmallestIntNotInList(number_list):
    """Returns the smallest positive integer that is not in a given list"""
    i = 0
    while True:
        i += 1
        if i not in number_list:
            return i
print(getSmallestIntNotInList([1,14,2,5,3,7,8,12]))
# 4
I found that this had the fastest performance compared to the other answers on this post, at least for the small 8-element example list. I tested using timeit in Python 3.10.8. My performance results can be seen below:
import timeit
def findSmallestIntNotInList(number_list):
    # Infinite while-loop until first number is found
    i = 0
    while True:
        i += 1
        if i not in number_list:
            return i
t = timeit.Timer(lambda: findSmallestIntNotInList([1,14,2,5,3,7,8,12]))
print('Execution time:', t.timeit(100000), 'seconds')
# Execution time: 0.038100800011307 seconds
import timeit
def findSmallestIntNotInList(number_list):
    # Loop with a range to len(number_list)+1
    for i in range (1, len(number_list)+1):
        if i not in number_list:
            return i
t = timeit.Timer(lambda: findSmallestIntNotInList([1,14,2,5,3,7,8,12]))
print('Execution time:', t.timeit(100000), 'seconds')
# Execution time: 0.05068870005197823 seconds
import timeit
def findSmallestIntNotInList(number_list):
    # Loop with a range to max(number_list) (by silgon)
    # https://stackoverflow.com/a/49649558/3357935
    for i in range (1, max(number_list)):
        if i not in number_list:
            return i
t = timeit.Timer(lambda: findSmallestIntNotInList([1,14,2,5,3,7,8,12]))
print('Execution time:', t.timeit(100000), 'seconds')
# Execution time: 0.06317249999847263 seconds
import timeit
from itertools import count, filterfalse
def findSmallestIntNotInList(number_list):
    # iterate the first number not in set (by Antti Haapala -- Слава Україні)
    # https://stackoverflow.com/a/28178803/3357935
    return(next(filterfalse(set(number_list).__contains__, count(1))))
t = timeit.Timer(lambda: findSmallestIntNotInList([1,14,2,5,3,7,8,12]))
print('Execution time:', t.timeit(100000), 'seconds')
# Execution time: 0.06515420007053763 seconds
import timeit
def findSmallestIntNotInList(number_list):
    # Use property of sets (by Bhargav Rao)
    # https://stackoverflow.com/a/28176962/3357935
    m = range(1, len(number_list))
    return min(set(m)-set(number_list))
t = timeit.Timer(lambda: findSmallestIntNotInList([1,14,2,5,3,7,8,12]))
print('Execution time:', t.timeit(100000), 'seconds')
# Execution time: 0.08586219989228994 seconds

The easiest way would be just to loop through the sorted list and check whether the index is equal to the value, and if not, return the index as the solution.
This has O(n log n) complexity because of the sorting:
for index, value in enumerate(sorted(myList)):
    if index != value:  # compare values with !=, not identity with "is not"
        print(index)
        break
Another option is to use Python sets, which are somewhat like dictionaries with keys but no values. As with dictionary keys, set membership can be checked in constant time on average, which makes the whole solution linear, O(n):
mySet = set(myList)
for i in range(len(mySet)):
    if i not in mySet:
        print(i)
        break
Edit:
If the solution should also deal with lists where no number is missing (e.g. [0,1]) and output the next following number and should also correctly consider 0, then a complete solution would be:
def find_smallest_positive_number_not_in_list(myList):
    mySet = set(myList)
    for i in range(1, max(mySet)+2):
        if i not in mySet:
            return i

A solution that returns all those values is
free_values = set(range(1, max(L))) - set(L)
It does a full scan, but those loops are implemented in C, so unless the list or its maximum value is huge this will likely be a win over more sophisticated algorithms that perform the looping in Python.
Note that if this search is needed to implement "reuse" of IDs, then keeping a free list around and maintaining it up to date (i.e. adding numbers to it when deleting entries and picking from it when reusing entries) is often a good idea.
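A minimal sketch of that free-list idea, assuming all IDs are positive integers handed out by a single allocator: released IDs go onto a min-heap so the smallest free one is reused first, and new IDs come from a counter only when the heap is empty. The class IdAllocator and its methods are hypothetical, not from a library. Both operations cost O(log k), where k is the number of released IDs, instead of a scan over all used IDs.

```python
import heapq


class IdAllocator:
    """Hypothetical helper: always hands out the smallest unused positive ID."""

    def __init__(self):
        self._next = 1          # smallest ID that has never been handed out
        self._free = []         # min-heap of released IDs (all smaller than _next)
        self._free_set = set()  # mirrors _free, to reject double releases cheaply

    def acquire(self):
        if self._free:
            # Reuse the smallest released ID first
            ident = heapq.heappop(self._free)
            self._free_set.discard(ident)
            return ident
        ident = self._next
        self._next += 1
        return ident

    def release(self, ident):
        if not 1 <= ident < self._next or ident in self._free_set:
            raise ValueError(f"ID {ident} is not currently allocated")
        heapq.heappush(self._free, ident)
        self._free_set.add(ident)


ids = IdAllocator()
first = [ids.acquire() for _ in range(3)]  # [1, 2, 3]
ids.release(2)
reused = ids.acquire()                     # 2, the released ID
fresh = ids.acquire()                      # 4, since 1 to 3 are in use
```

Since every free ID is either on the heap or at least _next, the heap's minimum (or _next when the heap is empty) is exactly the smallest unused ID.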

The following solution loops all numbers in between 1 and the length of the input list and breaks the loop whenever a number is not found inside it. Otherwise the result is the length of the list plus one.
listOfNumbers=[1,14,2,5,3,7,8,12]
for i in range(1, len(listOfNumbers)+1):
    if i not in listOfNumbers:
        nextNumber=i
        break
else:
    nextNumber=len(listOfNumbers)+1
