I have a list where I would like to compare each element of the list with every other element. I know this can be done with a nested loop, but the time complexity is O(n^2). Is there any way to improve the time complexity and make the comparisons more efficient?
For example:
I have a list in which I would like to count the differing digits between each pair of elements. Consider the list array=['100','110','010','011','100'] (written as strings, since 010 is not a valid integer literal in Python 3). array[0] is the same as array[4] (i.e. 100 and 100), array[0] has 1 digit that differs from array[1] (i.e. 100 and 110), and array[0] has 3 digits that differ from array[3] (i.e. 100 and 011). Defining two integers as similar when they are identical or differ in exactly one digit, I would like to return a list where each element counts the other integers that are similar to the corresponding input (i.e. difference in digits <= 1).
For the input list array=['100','110','010','011','100'], my expected output is [2,3,2,1,2]. Here output[0] is 2 because array[0] is similar to array[1] and array[4] (i.e. apart from 100 itself, the list contains 2 other similar values: 110 and 100).
Here is my code, which works but is very inefficient (O(n^2)):
def diff(a,b):
    difference= [i for i in range(len(a)) if a[i]!=b[i]]
    return len(difference)

def find_similarity_int(array):
    # write your code in Python 3.6
    res=[0]*len(array)
    string=[]
    for n in array:
        string.append(str(n))
    for i in range(0,len(string)):
        for j in range(i+1,len(string)):
            count=diff(string[i],string[j])
            if(count<=1):
                res[i]=res[i]+1
                res[j]=res[j]+1
    return res
input_list=['100','110','010','011','100']
output=find_similarity_int(input_list)
print("The similarity metrics for the given list is : ",output)
Output:
The similarity metrics for the given list is : [2, 3, 2, 1, 2]
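The diff helper above computes what is usually called the Hamming distance between two equal-length strings. As a minimal sketch (the name hamming is hypothetical, and it assumes both strings have the same length, because zip() silently stops at the shorter one), the same count can be written with zip() and sum():

```python
def hamming(a: str, b: str) -> int:
    # Count the positions at which two equal-length strings differ
    return sum(x != y for x, y in zip(a, b))

print(hamming('100', '110'))  # 1
print(hamming('100', '011'))  # 3
print(hamming('100', '100'))  # 0
```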
Could anyone please suggest an efficient way to make the comparison, preferably with just 1 loop? Thanks!
If the values contain only binary digits, you can get an O(n*m) solution (where m is the width of the values) using a multiset (Counter from collections). For each number, add up the multiset counts of every value that differs from it by exactly one bit, plus the number of its duplicates:
from collections import Counter

def simCount(L):
    counts = Counter(L)     # multiset of distinct values / count
    result = []
    for n in L:
        r = counts[n]-1                              # duplicates
        for i,b in enumerate(n):                     # 1 bit changes
            r += counts[n[:i]+"01"[b=="0"]+n[i+1:]]  # count others
        result.append(r)                             # sum of similars
    return result
Output:
A = ['100','110','010','011','100']
print(simCount(A)) # [2, 3, 2, 1, 2]
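The same multiset idea should carry over to decimal digits by trying every alternative digit at each position instead of a single bit flip. The sketch below is an assumed generalization, not part of the answer above: sim_count_digits and its alphabet parameter are hypothetical names, every value is assumed to be a string of the same length, and the cost grows to roughly n*m*k lookups for an alphabet of k symbols.

```python
from collections import Counter

def sim_count_digits(values, alphabet="0123456789"):
    # Hypothetical generalization: values are equal-length strings over `alphabet`
    counts = Counter(values)  # multiset: value -> number of occurrences
    result = []
    for v in values:
        r = counts[v] - 1  # exact duplicates, excluding v itself
        for i, ch in enumerate(v):
            for d in alphabet:
                if d != ch:  # every value that differs from v in position i only
                    r += counts[v[:i] + d + v[i + 1:]]
        result.append(r)
    return result

print(sim_count_digits(['100', '110', '010', '011', '100']))  # [2, 3, 2, 1, 2]
```

Counter returns 0 for keys it has never seen, so the lookups for non-existent neighbours need no special handling.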
To avoid the string manipulations on every item, you can convert them to integers and use bitwise operators to make the 1-bit changes:
from collections import Counter

def simCount(L):
    bits   = [1<<i for i in range(len(L[0]))]  # bit masks
    L      = [int(n,2) for n in L]             # numeric values
    counts = Counter(L)                        # multiset n:count
    result = []
    for n in L:
        result.append(counts[n]-1)             # duplicates
        for b in bits:                         # 1 bit changes
            result[-1] += counts[b^n]          # sum similars
    return result
A = ['100','110','010','011','100']
print(simCount(A)) # [2, 3, 2, 1, 2]
I was trying an online test. The test asked me to write a function that, given a list of up to 100000 integers in the range 1 to 100000, finds the first missing integer.
For example, if the list is [1,4,5,2], the output should be 3.
I iterated over the list as follows:
def find_missing(num):
    for i in range(1, 100001):
        if i not in num:
            return i
The feedback I received was that the code is not efficient at handling big lists.
I am quite new and couldn't find an answer: how can I iterate more efficiently?
The first improvement would be to make your solution linear by using a set for the repeated membership test:
def find_missing(nums):
    s = set(nums)  # O(1) average-time membership test
    for i in range(1, 100001):
        if i not in s:
            return i
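Because the values are bounded (1 to 100000 in the question), a flag array is another linear option that avoids hashing altogether. This is a sketch under that assumption; find_missing_flags and its upper parameter are hypothetical names:

```python
def find_missing_flags(nums, upper=100000):
    # Assumes every value lies in 1..upper; present[v] == 1 marks that v occurs
    present = bytearray(upper + 2)  # indices 0..upper+1, all zero
    for n in nums:
        present[n] = 1
    # Index upper+1 is never marked, so next() always finds an answer
    return next(i for i in range(1, upper + 2) if not present[i])

print(find_missing_flags([1, 4, 5, 2]))  # 3
```

A bytearray uses one byte per possible value, so about 100 KB for the bounds in the question.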
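Given how well optimized Python's C-implemented sorting is, you could also do something like:
def find_missing(nums):
    s = sorted(set(nums))
    # first position whose value is not its 1-based index; len(s) + 1 if there is no gap
    return next((i for i, n in enumerate(s, 1) if i != n), len(s) + 1)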
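But both of these are fairly space-inefficient, as they create a new collection. You can avoid that with an in-place sort:
from itertools import groupby

def find_missing(nums):
    nums.sort()  # in-place
    # groupby collapses duplicates, so the i-th group key should equal i;
    # with no gap, the answer follows the largest value (nums must be non-empty)
    return next((i for i, (k, _) in enumerate(groupby(nums), 1) if i != k), nums[-1] + 1)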
For a range of consecutive integers, the sum is given by Gauss's formula. Assuming nums is sorted, has no duplicates, and is missing exactly one number between nums[0] and nums[-1]:
# sum of all numbers up to and including nums[-1] minus
# sum of all numbers up to but not including nums[0]
expected = nums[-1] * (nums[-1] + 1) // 2 - nums[0] * (nums[0] - 1) // 2
If a number is missing, the actual sum will be
actual = sum(nums)
The difference is the missing number:
result = expected - actual
This computation is O(n), which is as efficient as you can get: expected is an O(1) computation, while actual has to add up all the elements.
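The snippets above assume nums is sorted. For an unsorted list such as [1, 4, 5, 2], min() and max() can supply the range bounds instead, which keeps the whole computation O(n). A minimal sketch, assuming no duplicates and exactly one value missing between the minimum and the maximum (find_missing_gauss is a hypothetical name):

```python
def find_missing_gauss(nums):
    # Assumes: no duplicates, exactly one value missing strictly between min and max
    lo, hi = min(nums), max(nums)
    expected = hi * (hi + 1) // 2 - lo * (lo - 1) // 2  # sum of lo..hi inclusive
    return expected - sum(nums)

print(find_missing_gauss([1, 4, 5, 2]))  # 3
```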
A somewhat slower approach with the same complexity would be to step along the sorted sequence in lockstep with either a range or itertools.count:
for a, e in zip(nums, range(nums[0], len(nums) + nums[0])):
    if a != e:
        return e  # or break if not in a function
Notice the difference between a single comparison, a != e, and a linear containment check like e in nums, which on average has to iterate through half of nums to get the answer.
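The itertools.count variant mentioned above can be sketched as follows, assuming nums is sorted and duplicate-free. Returning nums[-1] + 1 when no gap is found is an added assumption, not part of the original answer (find_missing_lockstep is a hypothetical name):

```python
from itertools import count

def find_missing_lockstep(nums):
    # Assumes nums is sorted ascending and has no duplicates
    for a, e in zip(nums, count(nums[0])):  # e runs nums[0], nums[0]+1, ...
        if a != e:
            return e
    return nums[-1] + 1  # assumption: no gap inside, so the next value is missing

print(find_missing_lockstep([1, 2, 4, 5]))  # 3
```

zip() stops at the end of nums, so the unbounded count() iterator is safe here.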
You can use Counter to count the occurrences of every element in your list. The smallest number with 0 occurrences is your output. For example:
from collections import Counter

def find_missing(your_list):
    count = Counter(your_list)
    keys = count.keys()  # every distinct element (in insertion order, not sorted)
    main_list = list(range(1, 100001))  # the values from 1 to 100000
    missing_numbers = list(set(main_list) - set(keys))
    your_output = min(missing_numbers)
    return your_output
I have a list of points, and I want to keep points only if the distance between them is greater than a certain threshold. Starting from the first point, if the distance between the first and second points is less than the threshold, I would remove the second point and then compute the distance between the first and third points. If that distance is also less than the threshold, compare the first and fourth points; otherwise, move on to the distance between the third and fourth points, and so on.
So for example, if the threshold is 2 and I have
list = [1, 2, 5, 6, 10]
then I would expect
new_list = [1, 5, 10]
Thank you!
Not a fancy one-liner, but you can just iterate over the values in the list and append each one to a new list if its distance from the last value in the new list (new[-1]) is at least the threshold:
lst = range(10)
diff = 3
new = []
for n in lst:
    if not new or abs(n - new[-1]) >= diff:
        new.append(n)
Afterwards, new is [0, 3, 6, 9].
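Wrapped in a function and applied to the list from the question, the same loop should produce the expected result. This is a sketch; thin_out is a hypothetical name, and it keeps a value when the distance is at least the threshold (use > instead of >= if the distance must be strictly greater):

```python
def thin_out(values, threshold):
    # Keep a value only if it is at least `threshold` away from the last kept value
    kept = []
    for v in values:
        if not kept or abs(v - kept[-1]) >= threshold:
            kept.append(v)
    return kept

print(thin_out([1, 2, 5, 6, 10], 2))  # [1, 5, 10]
print(thin_out(range(10), 3))         # [0, 3, 6, 9]
```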
Concerning your comment "What if i had instead a list of coordinates (x,y)?": In this case you do exactly the same thing, except that instead of just comparing the numbers, you have to find the Euclidean distance between two points. So, assuming lst is a list of (x,y) pairs:
if not new or ((n[0]-new[-1][0])**2 + (n[1]-new[-1][1])**2)**.5 >= diff:
Alternatively, you can convert your (x,y) pairs into complex numbers. For those, basic operations such as addition, subtraction and absolute value are already defined, so you can just use the above code again.
lst = [complex(x,y) for x,y in lst]
new = []
for n in lst:
    if not new or abs(n - new[-1]) >= diff:  # same as in the first version
        new.append(n)
print(new)
Now, new is a list of complex numbers representing the points: [0j, (3+3j), (6+6j), (9+9j)]
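On Python 3.8 and later, math.dist() computes the Euclidean distance between two points of any dimension, so the tuple version does not need the explicit square root. A minimal sketch, with thin_out_points as a hypothetical name and pts modelled on the diagonal points implied by the output above:

```python
import math

def thin_out_points(points, threshold):
    # points: sequence of coordinate tuples; math.dist requires Python 3.8+
    kept = []
    for p in points:
        if not kept or math.dist(p, kept[-1]) >= threshold:
            kept.append(p)
    return kept

pts = [(i, i) for i in range(10)]
print(thin_out_points(pts, 3))  # [(0, 0), (3, 3), (6, 6), (9, 9)]
```

Unlike the complex-number trick, this also works for (x, y, z) tuples.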
While the solution by tobias_k works, it is not the most efficient (in my opinion, though I may be overlooking something). It depends on list order and does not take into account that the element that is close (within the threshold) to the largest number of other elements should be eliminated last. The element with the fewest such proximities should be considered and checked first. The approach I suggest will likely retain the maximum number of points that lie outside the specified threshold from the other elements in the given list. It works well for lists of vectors, and therefore for x,y or x,y,z coordinates. If you want to use it with a list of scalars, simply add this line to the code: orig_list=np.array(orig_list)[:,np.newaxis].tolist()
Please see the solution below:
import numpy as np

thresh = 2.0
orig_list=[[1,2], [5,6], ...]  # your list of points
nsamp = len(orig_list)
arr_matrix = np.array(orig_list)
# np.float was removed in NumPy 1.24; the builtin float is equivalent here
distance_matrix = np.zeros([nsamp, nsamp], dtype=float)
for ii in range(nsamp):
    # Euclidean distance from every point to point ii
    distance_matrix[:, ii] = np.apply_along_axis(lambda x: np.linalg.norm(np.array(x)-np.array(arr_matrix[ii, :])),
                                                 1,
                                                 arr_matrix)
# number of points within the threshold of each point (including itself)
n_proxim = np.apply_along_axis(lambda x: np.count_nonzero(x < thresh),
                               0,
                               distance_matrix)
# visit the points with the fewest close neighbours first
idx = np.argsort(n_proxim).tolist()
idx_out = list()
for ii in idx:
    for jj in range(ii+1):
        if ii not in idx_out:
            if distance_matrix[ii, jj] < thresh:
                if ii != jj:
                    idx_out.append(jj)
# pop from the highest index down so the remaining indices stay valid
pop_idx = sorted(np.unique(idx_out).tolist(),
                 reverse=True)
for pop_id in pop_idx:
    orig_list.pop(pop_id)
nsamp = len(orig_list)
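The apply_along_axis loop above builds the distance matrix one column at a time. NumPy broadcasting can build the whole matrix in a single vectorized expression; the sketch below (pairwise_distances is a hypothetical helper) covers that step and the neighbour count only, not the full elimination loop:

```python
import numpy as np

def pairwise_distances(points):
    # Euclidean distance between every pair of rows, via broadcasting
    arr = np.asarray(points, dtype=float)                  # shape (n, d)
    delta = arr[:, np.newaxis, :] - arr[np.newaxis, :, :]  # shape (n, n, d)
    return np.sqrt((delta ** 2).sum(axis=-1))              # shape (n, n)

thresh = 2.0
distance_matrix = pairwise_distances([[0, 0], [3, 4], [6, 8]])
# number of points within thresh of each point (each point counts itself)
n_proxim = np.count_nonzero(distance_matrix < thresh, axis=0)
print(distance_matrix[0, 1], n_proxim.tolist())  # 5.0 [1, 1, 1]
```

The intermediate delta array has n*n*d elements, so for very large point sets the column-by-column loop may be gentler on memory.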
I found this code on this site to find the second largest number:
def second_largest(numbers):
    m1, m2 = None, None
    for x in numbers:
        if x >= m1:
            m1, m2 = x, m1
        elif x > m2:
            m2 = x
    return m2
Source: Get the second largest number in a list in linear time
Is it possible to modify this code to find the second smallest number? For example:
print(second_smallest([1, 2, 3, 4]))
2
a = [6,5,4,4,2,1,10,1,2,48]
s = set(a)  # removes duplicates; a set is unordered, so sort it before indexing
print(sorted(s)[1])
The function can indeed be modified to find the second smallest:
def second_smallest(numbers):
    m1 = m2 = float('inf')
    for x in numbers:
        if x <= m1:
            m1, m2 = x, m1
        elif x < m2:
            m2 = x
    return m2
The old version relied on a Python 2 implementation detail that None is always sorted before anything else (so it tests as 'smaller'); I replaced that with using float('inf') as the sentinel, as infinity always tests as larger than any other number. Ideally the original function should have used float('-inf') instead of None there, to not be tied to an implementation detail other Python implementations may not share.
Demo:
>>> def second_smallest(numbers):
...     m1 = m2 = float('inf')
...     for x in numbers:
...         if x <= m1:
...             m1, m2 = x, m1
...         elif x < m2:
...             m2 = x
...     return m2
...
>>> print(second_smallest([1, 2, 3, 4]))
2
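Applying the float('-inf') suggestion from the note above to the original second_largest gives a version that also runs on Python 3. A minimal sketch; note that, like the original, it counts duplicates, so [4, 4, 1] yields 4:

```python
def second_largest(numbers):
    # float('-inf') compares smaller than every number, so no None sentinel is needed
    m1 = m2 = float('-inf')
    for x in numbers:
        if x >= m1:
            m1, m2 = x, m1
        elif x > m2:
            m2 = x
    return m2

print(second_largest([1, 2, 3, 4]))  # 3
print(second_largest([4, 4, 1]))     # 4
```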
Outside of the function you found, it's almost just as efficient to use the heapq.nsmallest() function to return the two smallest values from an iterable, and from those two pick the second (or last) value. I've included a variant of the unique_everseen() recipe to filter out duplicate numbers:
from heapq import nsmallest
from itertools import filterfalse

def second_smallest(numbers):
    s = set()
    sa = s.add
    un = (sa(n) or n for n in filterfalse(s.__contains__, numbers))
    return nsmallest(2, un)[-1]
Like the above implementation, this is an O(N) solution: maintaining the heap takes O(log K) time per step, but K is a constant here (2).
Whatever you do, do not use sorting; that takes O(N log N) time.
Or just use heapq:
import heapq

def second_smallest(numbers):
    return heapq.nsmallest(2, numbers)[-1]

second_smallest([1, 2, 3, 4])
# Output: 2
Using Python's built-in function sorted,
sorted(my_list)[0]
gives back the smallest number, sorted(my_list)[1] the second smallest, and so on.
My favourite way of finding the second smallest number is to eliminate the smallest number from the list and then take the minimum of what remains, which is the second smallest element of the original list. The code for the task is as below:
mylist=[1,2,3,4]
smallest=min(mylist)  # compute the minimum once, not on every iteration
mylist=[x for x in mylist if x!=smallest]  # removes every occurrence of the min
print(min(mylist))
Solution that returns the second smallest distinct number in the list, without sorting:
def sec_smallest(numbers):
    smallest = float('+inf')
    small = float('+inf')
    for i in numbers:
        if i < smallest:
            small = smallest
            smallest = i
        elif i < small and i != smallest:
            small = i
    return small

print('Sec_smallest:', sec_smallest([1, 2, -8, -8, -2, 0]))
Yes, except that code relies on a small quirk (that raises an exception in Python 3): the fact that None compares as smaller than a number.
Another value that works is float("-inf"), which is a number that is smaller than any other number.
If you use that instead of None, and change -inf to +inf, >= to <=, and > to <, there's no reason it wouldn't work.
Edit: another possibility would be to negate x in all the comparisons and assignments, e.g. if -x >= m1: m1, m2 = -x, m1, et cetera, and then return -m2.
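A minimal sketch of the negation idea from the edit above: the second_largest loop is kept unchanged except that it works on -x throughout, and the final result is negated back. It assumes a float('-inf') sentinel rather than None:

```python
def second_smallest(numbers):
    # Same loop as second_largest, but every comparison and assignment uses -x
    m1 = m2 = float('-inf')
    for x in numbers:
        if -x >= m1:
            m1, m2 = -x, m1
        elif -x > m2:
            m2 = -x
    return -m2  # undo the negation

print(second_smallest([1, 2, 3, 4]))  # 2
```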
input_list = [3, 1, 4, 4, 5, 5, 5, 0, 2, 2]
#input_list = [6,6,6,6,6]
#input_list = [7, 2, 0, 9, -1, 8]
# Works even if the list contains duplicate numbers.
mi= min(input_list)
second_min = float('inf')
for i in input_list:
    if i != mi:
        if i<second_min:
            second_min=i
if second_min == float('inf'):
    print('not present')
else:
    print(second_min)
I'd like to add another, more general approach.
Here's a recursive way of finding the i smallest values (the i minima) of a given list of numbers:
def find_i_minimums(numbers,i):
    minimum = float('inf')
    if i==0:
        return []
    less_than_i_minimums = find_i_minimums(numbers,i-1)
    for element in numbers:
        if element not in less_than_i_minimums and element < minimum:
            minimum = element
    return less_than_i_minimums + [minimum]
For example,
>>> find_i_minimums([0,7,4,5,21,2,6,1],3) # finding the 3 minimal values of the given list
[0, 1, 2]
(If you only want the i-th minimum, extract the final value of the list.)
The time complexity of the above algorithm is poor, though: it is O(N*i^2). The recursion depth is i, and at each recursive call we go over all N values in numbers and check whether each candidate is already in a list of up to i-1 minima, so the total cost is an arithmetic sum, N*(0 + 1 + ... + (i-1)), which gives the complexity above.
Here's a similar alternative implementation whose time complexity is O(N*i) on average. It uses Python's built-in set data structure:
def find_i_minimums(numbers,i):
    minimum = float('inf')
    if i==0:
        return set()
    less_than_i_minimums = find_i_minimums(numbers,i-1)
    for element in numbers:
        if element not in less_than_i_minimums and element < minimum:
            minimum = element
    return less_than_i_minimums | {minimum}
If i is small, you can use the implementations above and then extract as many minima as you want (for the second minimum, run the code with i=2 and take the last element of the list version, or the max() of the set version, since a set has no order).
But if i is, for example, greater than log(N), I'd recommend sorting the list of numbers itself (for example, using mergesort, whose worst-case complexity is O(N*log(N))) and then taking the i-th element. Why? Because, as stated, the run time of the algorithms above is not great for larger values of i.
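For small i there is also a standard-library alternative to the recursive functions: heapq.nsmallest keeps a heap of size i, which should cost about O(N log i). Passing it a set drops duplicates first. A sketch; i_smallest_distinct is a hypothetical name:

```python
import heapq

def i_smallest_distinct(numbers, i):
    # set() drops duplicates; nsmallest returns the i smallest in ascending order
    return heapq.nsmallest(i, set(numbers))

print(i_smallest_distinct([0, 7, 4, 5, 21, 2, 6, 1], 3))  # [0, 1, 2]
second_min = i_smallest_distinct([3, 1, 1, 2], 2)[-1]    # the i-th minimum is the last item
```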
You might find this code easy to understand:
def secsmall(numbers):
    smallest = min(numbers)  # compute once instead of on every iteration
    small = max(numbers)
    for i in range(len(numbers)):
        if numbers[i]>smallest:
            if numbers[i]<small:
                small = numbers[i]
    return small
I am assuming numbers is a list.
Find the first and second smallest numbers in an integer array:
arr= [1,2,3,4,5,6,7,-1,0,-2,-10]

def minSecondmin(arr,n):
    # seed f (smallest) and s (second smallest) from the first two elements
    i=1
    if arr[i-1] < arr[i]:
        f = arr[i-1]
        s = arr[i]
    else:
        f=arr[i]
        s=arr[i-1]
    for i in range(2,n):
        if arr[i]<f:
            s=f
            f = arr[i]
        elif arr[i]<s:
            s=arr[i]
    return f,s

print(minSecondmin(arr,len(arr)))
l = [41,9000,123,1337]
# second smallest
sorted(l)[1]
123
# second biggest
sorted(l)[-2]
1337
Here we want to keep an invariant while we scan the list of numbers: for every prefix scanned so far it must hold that
m1<=m2<={all other elements}
The minimum length of a list for which the question (2nd smallest) makes sense is 2, so we establish the invariant by examining the first and second elements of the list (no need for magic numbers); then we iterate over all the remaining numbers, maintaining the invariant.
def second_smaller(numbers):
    # if len(numbers)<2: return None or otherwise raise an exception
    m1, m2 = numbers[:2]
    if m2<m1: m1, m2 = m2, m1
    for x in numbers[2:]:
        if x <= m1:
            m1, m2 = x, m1
        elif x < m2:
            m2 = x
    return m2
Addendum
BTW, the same reasoning should be applied to the second_largest function mentioned by the OP
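Applied to second_largest, the same invariant (m1 >= m2 >= every other element seen so far) can be seeded from the first two elements, with no sentinel values. A sketch; second_larger is a hypothetical name chosen to mirror second_smaller, and raising ValueError for short lists is one of the two options the comment in second_smaller mentions:

```python
def second_larger(numbers):
    # Invariant while scanning: m1 >= m2 >= every other element seen so far
    if len(numbers) < 2:
        raise ValueError("need at least two elements")
    m1, m2 = numbers[:2]
    if m2 > m1:
        m1, m2 = m2, m1
    for x in numbers[2:]:
        if x >= m1:
            m1, m2 = x, m1
        elif x > m2:
            m2 = x
    return m2

print(second_larger([1, 2, 3, 4]))  # 3
```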
Here is code that uses recursion to find the second smallest element in a list:
def small(l):
    small.counter+=1
    m=l[0]  # renamed from min to avoid shadowing the builtin
    emp=[]
    for i in range(len(l)):
        if l[i]<m:
            m=l[i]
    for i in range(len(l)):
        if m==l[i]:
            emp.append(i)
    if small.counter==2:
        print("The Second smallest element is:"+str(m))
    else:
        for j in range(0,len(emp)):
            l.remove(m)  # drop every copy of the current minimum (mutates l)
        small(l)
small.counter = 0
nums=[-1,-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1]
small(nums)
You can test it with various input integers.
There is an easy way to do this: first sort the list, then take its second item.
def solution(a_list):
    a_list.sort()  # sorts in place, modifying the caller's list
    print(a_list[1])
solution([1, 2, -8, -2, -10])
You can use the built-in function sorted:
def second_smallest(numbers):
    count = 0
    l = []
    for i in sorted(numbers):  # visit the values in ascending order
        if(i not in l):
            l.append(i)
            count+=1
        if(count==2):
            break
    return max(l)
To find the second smallest number in a list, you can use the following approach, which also works when two or more elements are repeated:
def second_smallest(numbers):
    s = sorted(set(numbers))
    return s[1]
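Here is a single-pass version that skips repeats of the smallest value:
def find_second_smallest(a: list) -> int:
    first = second = float('inf')  # a single float cannot be unpacked into two names
    for i in range(len(a)):
        if a[i] < first:
            first, second = a[i], first
        elif a[i] < second and a[i] != first:
            second = a[i]
    return second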
input: [1, 1, 1, 2]
output: 2
This code also works to find the second smallest number in a list.
First we sort the values in the list (here with a simple nested-loop sort), and then we take the value at index 1.
l1 = [12,32,4,34,64,3,43]
# simple nested-loop sort into ascending order
for i in range(0,len(l1)):
    for j in range(0,i+1):
        if l1[i]<l1[j]:
            l1[i],l1[j]=l1[j],l1[i]
min_val = l1[1]  # second element of the sorted list
print(min_val)
def SecondSmallest(x):
    lowest=min(x[0],x[1])
    lowest2 = max(x[0],x[1])
    for item in x:
        if item < lowest:
            lowest2 = lowest
            lowest = item
        elif lowest2 > item and item > lowest:
            lowest2 = item
    return lowest2

SecondSmallest([10,1,-1,2,3,4,5])