Can someone please let me know the time complexity of the code below?
from math import sqrt

nums = [1, 2, 4, 6, 180, 290, 1249]
ll = []
l = []
for i in nums:
    for j in range(1, int(sqrt(i)) + 1):
        if i % j == 0:
            l.append(j)
    ll.append(l.copy())
    l.clear()
print(ll)
There are three main operations that are going to determine the time complexity.
The outer loop for i in nums is O(N) where N = len(nums)
The inner loop for j in range(1, int(sqrt(i))+1) runs O(sqrt(i)) times for each i
Within the first loop we also have ll.append(l.copy()), where l.copy() is an O(k) operation where k == len(l)
Let N = len(nums), M = sqrt(max(nums)), and K = the length of the longest list l being copied.
Since the inner loop and the copy both sit directly inside the outer loop, this starts us at O(N * (M + K))
However, K depends on M and will always be smaller (K is the number of factors of i that are <= sqrt(i)), so we can effectively ignore it.
This results in a complexity of O(N * M), where N = len(nums) and M = sqrt(max(nums))
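As a rough sanity check, here is a minimal sketch that counts the total number of inner-loop iterations for the nums list above and compares it to the N * sqrt(max(nums)) estimate:

from math import sqrt

nums = [1, 2, 4, 6, 180, 290, 1249]

# total times the inner loop body runs
inner_iterations = sum(int(sqrt(i)) for i in nums)

# upper bound predicted by the O(N * M) analysis
bound = len(nums) * int(sqrt(max(nums)))

print(inner_iterations, "<=", bound)  # 71 <= 245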
I am trying to count the number of unique numbers in a sorted array using binary search. I need to find the boundary where one number changes to the next in order to count. I was thinking of doing this without using recursion. Is there an iterative approach?
def unique(x):
    start = 0
    end = len(x) - 1
    count = 0
    # This is the current number we are looking for
    item = x[start]
    while start <= end:
        middle = (start + end) // 2
        if item == x[middle]:
            start = middle + 1
        elif item < x[middle]:
            end = middle - 1
        # when the item is greater, change to the next number
        count += 1
        # if the number
    return count

unique([1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,5,5,5,5,5,5,5,5,5,5])
Thank you.
Edit: Even if the runtime benefit over O(n) is negligible, what is my binary search missing? It's confusing when I'm not looking for an actual item. How can I fix this?
Working code exploiting binary search (returns 3 for given example).
As discussed in the comments, the complexity is about O(k*log(n)), where k is the number of unique items, so this approach works well when k is small compared with n, and might become worse than a linear scan when k ~ n.
def countuniquebs(A):
    n = len(A)
    if n == 0:
        return 0
    t = A[0]
    l = 0
    count = 0
    while l < n:
        # binary search for the first index after l whose value is greater than t
        r = n
        while l < r:
            m = (r + l) // 2
            if A[m] > t:
                r = m
            else:
                l = m + 1
        count += 1          # one distinct value (t) has been fully skipped
        if l < n:
            t = A[l]        # next distinct value
    return count
print(countuniquebs([1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,5,5,5,5,5,5,5,5,5,5]))
I wouldn't quite call it "using a binary search", but this binary divide-and-conquer algorithm works in O(k*log(n)/log(k)) time, which is better than a repeated binary search, and never worse than a linear scan:
def countUniques(A, start, end):
    length = end - start
    if length < 1:
        return 0
    if A[start] == A[end-1]:
        return 1
    if length < 3:
        return 2
    mid = start + length // 2
    # the two halves overlap at index mid, so subtract 1 for the shared value
    return countUniques(A, start, mid+1) + countUniques(A, mid, end) - 1
A = [1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,3,4,5,5,5,5,5,5,5,5,5,5]
print(countUniques(A,0,len(A)))
I need to write a function that returns the number of ways of reaching a certain number by adding numbers from a list. For example:
print(p([3,5,8,9,11,12,20], 20))
should return: 5
The code I wrote is:
def pow(lis):
    power = [[]]
    for lst in lis:
        for po in power:
            power = power + [list(po) + [lst]]
    return power

def p(lst, n):
    counter1 = 0
    counter2 = 0
    power_list = pow(lst)
    print(power_list)
    for p in power_list:
        for j in p:
            counter1 += j
        if counter1 == n:
            counter2 += 1
            counter1 == 0
        else:
            counter1 == 0
    return counter2
pow() is a function that returns all of the subsets of the list and p should return the number of ways to reach the number n. I keep getting an output of zero and I don't understand why. I would love to hear your input for this.
Thanks in advance.
There are two typos in your code: counter1 == 0 is a boolean expression; it does not reset anything.
This version should work:
def p(lst, n):
    counter2 = 0
    power_list = pow(lst)
    for p in power_list:
        counter1 = 0  # reset the counter for every new subset
        for j in p:
            counter1 += j
        if counter1 == n:
            counter2 += 1
    return counter2
As tobias_k and Faibbus mentioned, you have a typo: counter1 == 0 instead of counter1 = 0, in two places. The counter1 == 0 produces a boolean object of True or False, but since you don't assign the result of that expression, the result gets thrown away. It doesn't raise a SyntaxError, since an expression that isn't assigned is legal Python.
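To illustrate the difference, here is a tiny sketch (the variable name is just for demonstration):

counter1 = 5
counter1 == 0      # evaluates to False and is immediately discarded
print(counter1)    # still 5
counter1 = 0       # assignment actually resets the variable
print(counter1)    # 0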
As John Coleman and B. M. mention, it's not efficient to create the full powerset and then test each subset to see if it has the correct sum. This approach is ok if the input sequence is small, but it's very slow for even moderately sized sequences, and if you actually create a list containing the subsets rather than using a generator and testing the subsets as they're yielded, you'll soon run out of RAM.
B. M.'s first solution is quite efficient since it doesn't produce subsets that are larger than the target sum. (I'm not sure what B. M. is doing with that dict-based solution...).
But we can enhance that approach by sorting the list of sums. That way we can break out of the inner for loop as soon as we detect a sum that's too high. True, we need to sort the sums list on each iteration of the outer for loop, but fortunately Python's TimSort is very efficient, and it's optimized to handle sorting a list that contains sorted sub-sequences, so it's ideal for this application.
def subset_sums(seq, goal):
    sums = [0]
    for x in seq:
        subgoal = goal - x
        temp = []
        for y in sums:
            if y > subgoal:
                break
            temp.append(y + x)
        sums.extend(temp)
        sums.sort()
    return sum(1 for y in sums if y == goal)
# test
lst = [3, 5, 8, 9, 11, 12, 20]
total = 20
print(subset_sums(lst, total))
lst = range(1, 41)
total = 70
print(subset_sums(lst, total))
output
5
28188
With lst = range(1, 41) and total = 70, this code is around 3 times faster than the B.M. lists version.
A one-pass solution with one counter, which minimizes additions.
def one_pass_sum(L, target):
    sums = [0]
    cnt = 0
    for x in L:
        for y in sums[:]:
            z = x + y
            if z <= target:
                sums.append(z)
                if z == target:
                    cnt += 1
    return cnt
This way, if n = len(L), you make fewer than 2^n additions, versus n/2 * 2^n when calculating all the sums.
EDIT:
A more efficient solution that just counts the ways. The idea is that if there are k ways to make z-x, there are k more ways to make z once x arrives.
def enhanced_sum_with_lists(L, target):
    cnt = [1] + [0]*target  # 1 way to make 0
    for x in L:
        for z in range(target, x-1, -1):  # [target, ..., x+1, x]
            cnt[z] += cnt[z-x]
    return cnt[target]
But the order is important: z must be iterated in descending order here to get the correct counts (thanks to PM 2Ring).
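To see why, here is a small contrasting sketch (the ascending variant is hypothetical, only for illustration, and it reuses enhanced_sum_with_lists from above): iterating z in ascending order lets the same x contribute more than once to a sum, so it counts combinations with repetition rather than subsets.

def ascending_variant(L, target):
    # same DP table, but z iterated in ascending order
    cnt = [1] + [0]*target
    for x in L:
        for z in range(x, target + 1):
            cnt[z] += cnt[z-x]  # cnt[z-x] may already include x itself
    return cnt[target]

print(ascending_variant([2], 4))        # 1 (counts 2+2, reusing the element)
print(enhanced_sum_with_lists([2], 4))  # 0 (no subset of [2] sums to 4)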
This can be very fast (n*target additions) for big lists.
For example:
>>> enhanced_sum_with_lists(range(1,100),2500)
875274644371694133420180815
is obtained in 61 ms. It would take the age of the universe to compute it with the first method.
from itertools import chain, combinations

def powerset_generator(i):
    for subset in chain.from_iterable(combinations(i, r) for r in range(len(i)+1)):
        yield set(subset)

def count_sum(s, cnt):
    return sum(1 for i in powerset_generator(s) if sum(k for k in i) == cnt)

print(count_sum(set([3,5,8,9,11,12,20]), 20))
So I'm working on some practice problems and having trouble reducing the complexity. I am given an array of distinct integers a[] and a threshold value T. I need to find the number of triplets i,j,k such that a[i] < a[j] < a[k] and a[i] + a[j] + a[k] <= T. I've gotten this down from O(n^3) to O(n^2 log n) with the following python script. I'm wondering if I can optimize this any further.
import sys
import bisect

first_line = sys.stdin.readline().strip().split(' ')
num_numbers = int(first_line[0])
threshold = int(first_line[1])
count = 0
if num_numbers < 3:
    print count
else:
    numbers = sys.stdin.readline().strip().split(' ')
    numbers = map(int, numbers)
    numbers.sort()
    for i in xrange(num_numbers - 2):
        for j in xrange(i+1, num_numbers - 1):
            k_1 = threshold - (numbers[i] + numbers[j])
            if k_1 < numbers[j]:
                break
            else:
                cross_thresh = bisect.bisect(numbers, k_1) - (j+1)
                if cross_thresh > 0:
                    count += cross_thresh
    print count
In the above example, the first input line simply provides the count of numbers and the threshold. The next line is the full list. If the list has fewer than 3 elements, no triplets can exist, so we print 0. Otherwise, we read in the full list of integers, sort it, and process it as follows: we iterate over every pair i and j (such that i < j) and compute the highest value k_1 that a third element could take without breaking numbers[i] + numbers[j] + k_1 <= T. We then find the index s of the first element in the list that violates this condition and add the number of elements between j and s to the count. For 30,000 elements in a list, this takes about 7 minutes to run. Is there any way to make it faster?
You are performing binary search for each (i,j) pair to find the corresponding value for k. Hence O(n^2 log(n)).
I can suggest an algorithm that will have the worst case time complexity of O(n^2).
Assume the list is sorted from left to right and elements are numbered from 1 to n. Then the pseudo code is:
for i = 1 to n - 2:
    j = i + 1
    find maximal k with binary search
    while j < k:
        j = j + 1
        find maximal k with linear search to the left, starting from last k position
The reason this has the worst case time complexity of O(n^2) and not O(n^3) is because the position k is monotonically decreasing. Thus even with linear scanning, you are not spending O(n) for each (i,j) pair. Rather, you are spending a total of O(n) time to scan for k for each distinct i value.
O(n^2) version implemented in Python (based on wookie919's answer):
def triplets(N, T):
    N = sorted(N)
    result = 0
    for i in xrange(len(N)-2):
        k = len(N)-1
        for j in xrange(i+1, len(N)-1):
            while k >= 0 and N[i]+N[j]+N[k] > T:
                k -= 1
            result += max(k, j) - j
    return result

import random
sample = random.sample(xrange(1000000), 30000)
print triplets(sample, 500000)
What is the fastest way to sort an array of whole integers bigger than 0 and less than 100000 in Python, without using built-in functions like sort?
I'm looking at the possibility of combining 2 sort functions depending on input size.
If you are interested in asymptotic time, then counting sort or radix sort provide good performance.
However, if you are interested in wall clock time you will need to compare performance between different algorithms using your particular data sets, as different algorithms perform differently with different datasets. In that case, it's always worth trying quicksort:
def qsort(inlist):
    if inlist == []:
        return []
    else:
        pivot = inlist[0]
        lesser = qsort([x for x in inlist[1:] if x < pivot])
        greater = qsort([x for x in inlist[1:] if x >= pivot])
        return lesser + [pivot] + greater
Source: http://rosettacode.org/wiki/Sorting_algorithms/Quicksort#Python
Since you know the range of numbers, you can use Counting Sort, which will be linear in time.
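A minimal counting-sort sketch for the stated range (values between 1 and 99999); the function name is my own:

def counting_sort(items, max_value=100000):
    # one bucket per possible value: O(n + max_value) time, O(max_value) extra space
    counts = [0] * max_value
    for item in items:
        counts[item] += 1
    result = []
    for value, count in enumerate(counts):
        result.extend([value] * count)
    return result

print(counting_sort([3, 1, 7, 3, 2]))  # [1, 2, 3, 3, 7]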
Radix sort theoretically runs in linear time (sort time grows roughly in direct proportion to array size), but in practice Quicksort is probably more suited, unless you're sorting absolutely massive arrays.
If you want to make quicksort a bit faster, you can use insertion sort when the array size becomes small.
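A rough sketch of that hybrid idea; the 16-element cutoff and the function names are my own choices, not a tuned recommendation:

def insertion_sort(a, lo, hi):
    # sort a[lo:hi+1] in place; fast for short ranges
    for i in range(lo + 1, hi + 1):
        key = a[i]
        j = i - 1
        while j >= lo and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key

def hybrid_quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= 16:          # small range: switch to insertion sort
        insertion_sort(a, lo, hi)
        return
    pivot = a[(lo + hi) // 2]
    i, j = lo, hi
    while i <= j:                  # Hoare-style partition around the pivot
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    hybrid_quicksort(a, lo, j)
    hybrid_quicksort(a, i, hi)

data = [5, 3, 8, 1, 9, 2, 7]
hybrid_quicksort(data)
print(data)  # [1, 2, 3, 5, 7, 8, 9]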
It would probably be helpful to understand the concepts of algorithmic complexity and Big-O notation too.
Early versions of Python used a hybrid of samplesort (a variant of quicksort with a large sample size) and binary insertion sort as the built-in sorting algorithm. This proved to be somewhat unstable. So, from Python 2.3 onward, the built-in sort uses an adaptive mergesort algorithm (Timsort).
Order of mergesort (average) = O(n log n).
Order of mergesort (worst) = O(n log n).
But order of quicksort (worst) = O(n^2).
If you use list = [ .............. ], then list.sort() uses this adaptive mergesort algorithm.
For a comparison between sorting algorithms you can read the wiki.
For a detailed comparison, see comp.
I might be a little late to the show, but there's an interesting article that compares different sorts at https://www.linkedin.com/pulse/sorting-efficiently-python-lakshmi-prakash
One of the main takeaways is that while the default sort does great, we can do a little better with a compiled version of quicksort. This requires the Numba package.
Here's a link to the Github repo:
https://github.com/lprakash/Sorting-Algorithms/blob/master/sorts.ipynb
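This is not the code from that repo, just a minimal sketch of the general idea, assuming Numba and NumPy are installed: an in-place recursive quicksort compiled with @njit so the hot loop runs as machine code rather than Python bytecode.

import numpy as np
from numba import njit

@njit
def _qsort(a, lo, hi):
    # in-place quicksort of a[lo..hi] using a Hoare-style partition
    if lo >= hi:
        return
    pivot = a[(lo + hi) // 2]
    i, j = lo, hi
    while i <= j:
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    _qsort(a, lo, j)
    _qsort(a, i, hi)

@njit
def nb_quicksort(a):
    _qsort(a, 0, a.shape[0] - 1)
    return a

data = np.random.randint(1, 100000, size=1000000)
print(nb_quicksort(data.copy())[:10])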
We can use counting sort with a dictionary to minimize the additional space usage and keep the running time low as well. The count sort is much slower for small input arrays because of the Python vs. C implementation overhead; it starts to overtake the regular sort when the size of the array (COUNT) is about 1 million.
If you really want huge speedups for smaller size inputs, implement the count sort in C and call it from Python.
(Fixed a bug which Aaron (+1) helped catch ...)
The Python-only implementation below compares the 2 approaches...
import random
import time

COUNT = 3000000

array = [random.randint(1,100000) for i in range(COUNT)]
random.shuffle(array)

array1 = array[:]
start = time.time()
array1.sort()
end = time.time()
time1 = (end-start)
print 'Time to sort = ', time1*1000, 'ms'

array2 = array[:]
start = time.time()
ardict = {}
for a in array2:
    try:
        ardict[a] += 1
    except:
        ardict[a] = 1
indx = 0
for a in sorted(ardict.keys()):
    b = ardict[a]
    array2[indx:indx+b] = [a for i in xrange(b)]
    indx += b
end = time.time()
time2 = (end-start)
print 'Time to count sort = ', time2*1000, 'ms'
print 'Ratio =', time2/time1
The built-in functions are best, but since you can't use them, have a look at this:
http://en.wikipedia.org/wiki/Quicksort
def sort(l):
    p = 0
    while p < len(l) - 1:
        if l[p] > l[p+1]:
            l[p], l[p+1] = l[p+1], l[p]
            if not (p == 0):
                p = p - 1
        else:
            p += 1
    return l
This is an algorithm that I created, but it is really fast. Just do sort(l),
l being the list that you want to sort.
#fmark
Some benchmarking of a Python merge-sort implementation I wrote against the Python quicksorts from http://rosettacode.org/wiki/Sorting_algorithms/Quicksort#Python
and from the top answer.
The size of the list and the size of the numbers in the list are irrelevant:
merge sort wins, though it uses the builtin int() to floor the midpoint.
import numpy as np

x = list(np.random.rand(100))

# TEST 1, merge_sort
def merge(l, p, q, r):
    n1 = q - p + 1
    n2 = r - q
    left = l[p : p + n1]
    right = l[q + 1 : q + 1 + n2]
    i = 0
    j = 0
    k = p
    while k < r + 1:
        if i == n1:
            l[k] = right[j]
            j += 1
        elif j == n2:
            l[k] = left[i]
            i += 1
        elif left[i] <= right[j]:
            l[k] = left[i]
            i += 1
        else:
            l[k] = right[j]
            j += 1
        k += 1

def _merge_sort(l, p, r):
    if p < r:
        q = int((p + r)/2)
        _merge_sort(l, p, q)
        _merge_sort(l, q+1, r)
        merge(l, p, q, r)

def merge_sort(l):
    _merge_sort(l, 0, len(l)-1)

# TEST 2
def quicksort(array):
    _quicksort(array, 0, len(array) - 1)

def _quicksort(array, start, stop):
    if stop - start > 0:
        pivot, left, right = array[start], start, stop
        while left <= right:
            while array[left] < pivot:
                left += 1
            while array[right] > pivot:
                right -= 1
            if left <= right:
                array[left], array[right] = array[right], array[left]
                left += 1
                right -= 1
        _quicksort(array, start, right)
        _quicksort(array, left, stop)

# TEST 3
def qsort(inlist):
    if inlist == []:
        return []
    else:
        pivot = inlist[0]
        lesser = qsort([x for x in inlist[1:] if x < pivot])
        greater = qsort([x for x in inlist[1:] if x >= pivot])
        return lesser + [pivot] + greater

def test1():
    merge_sort(x)

def test2():
    quicksort(x)

def test3():
    qsort(x)

if __name__ == '__main__':
    import timeit
    print('merge_sort:', timeit.timeit("test1()", setup="from __main__ import test1, x;", number=10000))
    print('quicksort:', timeit.timeit("test2()", setup="from __main__ import test2, x;", number=10000))
    print('qsort:', timeit.timeit("test3()", setup="from __main__ import test3, x;", number=10000))
Bucket sort with bucket size = 1. Memory is O(m) where m = the range of values being sorted. Running time is O(n) where n = the number of items being sorted. When the integer type used to record counts is bounded, this approach will fail if any value appears more than MAXINT times.
def sort(items):
    seen = [0] * 100000
    for item in items:
        seen[item] += 1
    index = 0
    for value, count in enumerate(seen):
        for _ in range(count):
            items[index] = value
            index += 1